Our first step was to create ISs for the wild-type sequence of ASXL1 and for the 76 sequences with AASs. Second, we calculated ISM scores for each frequency in the IS. In the third step, we performed the Mann-Whitney U test on the scores for the frequency with the highest amplitude in the IS of the wild-type sequence, F(0.036). This frequency did not significantly discriminate between SNPs and mutations, so we applied the same test to the next-highest peak frequency in the spectrum. We continued this procedure until we identified the IS peak frequency F(0.476), which discriminates disease-related mutations (p = 0.018) (Figure 3(a)). Compared with the wild type, 75% of sequences with SNPs had lower amplitude values at this frequency, and 77% of sequences with mutations had higher values (Figure 4).
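The three-step frequency search described above can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in this section. Residues are encoded numerically by a caller-supplied table (ISM conventionally uses EIIP values, which are not reproduced here). The IS is taken as the unnormalised DFT amplitude at frequencies k/N. The ISM score at a frequency is assumed to be the amplitude at that bin, and "peaks" are local maxima of the wild-type spectrum. The Mann-Whitney U test uses the normal approximation with tie correction and no continuity correction. All function names are hypothetical.

```python
import cmath
import math
from typing import Dict, List, Optional, Sequence, Tuple


def informational_spectrum(seq: str, encoding: Dict[str, float]) -> Tuple[List[float], List[float]]:
    """Return (frequencies, amplitudes) for k = 1..N//2, with frequency k/N.

    ASSUMPTION: `encoding` maps each residue to a number (ISM conventionally
    uses EIIP values); amplitudes are left unnormalised.
    """
    x = [encoding[aa] for aa in seq]
    n = len(x)
    freqs, amps = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        freqs.append(k / n)
        amps.append(abs(coeff))
    return freqs, amps


def mann_whitney_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected,
    no continuity correction)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the mean of ranks i+1..j+1
        rank_sum_a += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # every value tied: no evidence of a difference
        return 1.0
    z = (u_a - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def peak_bins(amps: Sequence[float]) -> List[int]:
    """Indices of local maxima, highest amplitude first (edge bins may be peaks)."""
    n = len(amps)
    peaks = [
        k for k in range(n)
        if (k == 0 or amps[k] >= amps[k - 1]) and (k == n - 1 or amps[k] >= amps[k + 1])
    ]
    return sorted(peaks, key=lambda k: amps[k], reverse=True)


def select_frequency(
    freqs: Sequence[float],
    wt_amps: Sequence[float],
    snp_amps: Sequence[Sequence[float]],
    mut_amps: Sequence[Sequence[float]],
    alpha: float = 0.05,
) -> Optional[Tuple[float, float]]:
    """Walk wild-type peaks from highest to lowest and return (frequency, p) for
    the first one whose scores separate SNP from mutation sequences at `alpha`."""
    for k in peak_bins(wt_amps):
        # ASSUMPTION: the ISM score at a frequency is the IS amplitude at that bin.
        p = mann_whitney_p([s[k] for s in snp_amps], [s[k] for s in mut_amps])
        if p < alpha:
            return freqs[k], p
    return None


def fraction_above(scores: Sequence[float], wt_value: float) -> float:
    """Share of variant sequences whose score exceeds the wild-type value."""
    return sum(v > wt_value for v in scores) / len(scores)
```

Because AASs are substitutions, every variant sequence has the wild-type length. Their spectra therefore share the same frequency bins, which is what lets `select_frequency` compare SNP and mutation scores bin by bin.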
Figure 3: Process for the selection of significant frequencies from the spectra of ASXL1 (a), EZH2 (b), DNMT3A (c), and TET2 (d).
Figure 4: Distribution of ISM scores.

EZH2 is frequently mutated in lymphoid malignancies, with a hot spot at Tyr641 [45]. In myeloid malignancies, however, mutations are spread throughout the entire sequence with no hot spot. The ISM algorithm identified frequency F(0.411), which significantly discriminates sequences with SNPs from those with mutations (p = 0.003) (Figure 3(b)). Six SNP-containing sequences had amplitude values at this frequency below the wild-type value, while approximately half of the sequences with mutations had higher amplitude values than the wild type (Figure 4).

In the DNMT3A sequence, 6 SNPs and 41 mutations were separated at IS frequency F(0.071) (p = 0.041) (Figure 3(c)). In contrast to ASXL1 and EZH2, the majority of sequences with SNPs (83%) had amplitude values above the wild-type value. More than half of the sequences with mutations (51%) had amplitude values below it (Figure 4).

Finally, we analyzed 45 TET2 sequences with SNPs and 121 with mutations. IS frequency F(0.491) proved to be a significant classifier (p = 0.025) (Figure 3(d)), separating sequences with SNPs (60% below the wild-type value) from those with mutations (55% above the wild-type value) (Figure 4). TET2 variations make up the largest proportion of our dataset, so we used them to cross-validate our frequency-selection method. We randomly split them into five groups and, in each run, submitted a different combination of four groups to the ISM-based algorithm.
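The five-group resampling used to check the stability of frequency selection on TET2 can be sketched as below. This section does not say whether the split was stratified by SNP/mutation label, so the sketch splits plainly at random. `select` is a hypothetical stand-in for the full ISM frequency search. `leave_one_group_out` and `frequency_stability` are hypothetical names, and the fixed seed is only there to make the sketch reproducible.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def leave_one_group_out(items: Sequence[T], n_groups: int = 5, seed: int = 0) -> List[List[T]]:
    """Shuffle `items`, deal them into `n_groups` groups, and return every
    subset formed by leaving exactly one group out."""
    rng = random.Random(seed)  # fixed seed for reproducibility of the sketch
    shuffled = list(items)
    rng.shuffle(shuffled)
    groups = [shuffled[g::n_groups] for g in range(n_groups)]
    return [
        [item for g, group in enumerate(groups) if g != held_out for item in group]
        for held_out in range(n_groups)
    ]


def frequency_stability(
    variants: Sequence[T],
    select: Callable[[List[T]], Hashable],
    n_groups: int = 5,
    seed: int = 0,
) -> Counter:
    """Count how often `select` (e.g. the ISM frequency search) picks each
    frequency across the leave-one-group-out subsets."""
    return Counter(select(subset) for subset in leave_one_group_out(variants, n_groups, seed))
```

With 166 TET2 variants, each subset holds 132 or 133 of them. A stable selection procedure yields a counter with a single frequency picked in all five runs, which is the pattern the text reports for F(0.491).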
All analyses identified F(0.491) as the most important frequency, which indicates minimal bias in our performance evaluation.

4.4. Performance of ISM Algorithm on AASs outside CFDs and Comparison with PolyPhen-2 and SIFT

This research focuses on predicting the functional effects of AASs in nCFDs. We compared the predictive power of the ISM algorithm with that of the commonly used MSA-based tools PolyPhen-2 and SIFT on a subset of our data containing 108 SNPs and 51 mutations.