The red transcript represents the novel TAR. Each of the other colors represents an ortholog pair in the two species.

Taken together, these results suggest that: 1) the isolated novel sequences are conserved at the sequence level (and are therefore likely to be transcribed) relative to the other H. capsulatum strains in most cases, and relative to B. dermatitidis in about half of the cases; 2) transcripts with deeply conserved sequence across the Onygenales also tend to be predicted as genes in most of these fungi; and 3) for about half of the isolated novel sequences, a corresponding gene prediction exists in
another genome, highlighting differences in the prediction pipelines, while the other half represent truly novel discoveries of this tiling experiment.

Using standard expression profiling and sequence homology to enrich gene validation

To complement our tiling arrays, we took advantage of our archive of expression data compiled across several distinct growth conditions, including iron limitation, and all three morphologies (yeast, mycelia, and conidia). We surveyed whether gene predictions were detected in these expression
profiling experiments, which employed whole-genome oligonucleotide microarrays where each prediction was represented by one or two gene-optimized 70-mer probes. Additionally, we used INPARANOID [12] to determine if gene predictions had homologs in other fungi. This validation by inferred homology to genes in other fungi relied on sequence conservation independent of expression pattern. The validation criteria for each strategy are given in the methods section and the results are summarized in Figure 7 (detailed per-gene
results are available as Additional file 1, Table S1 and may be browsed interactively at http://histo.ucsf.edu). By these criteria, 8,115 non-repeat predicted proteins were validated by gene expression and 7,129 were validated by sequence homology.

Figure 7 A majority of predicted genes are validated by multiple methods. Summary of genes validated by tiling (red), homology (blue), or expression (white). The circles on the right indicate special, disjoint classes: novel, tiling-detected transcripts with no corresponding gene prediction (yellow); predicted genes not validated by any method (green); and predicted genes with significant overlap to repeat regions (excluded from the analysis) (brown).

Genes that were validated by tiling, gene expression, and sequence homology represented the largest category of predictions (5,379 genes) and accounted for 56% of the non-repeat predicted gene set. The next largest category was the 1,404 genes validated by gene expression and sequence conservation but not by the tiling experiment (15% of the non-repeat predicted gene set), followed by 845 genes (9%) validated only by expression array, and 487 genes (5%) validated by expression and tiling but not sequence conservation.
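The categories reported here form a disjoint partition of the predicted genes by which validation methods detected them. The following is a minimal Python sketch of that tally, assuming each method yields a set of gene identifiers. The function name `venn_categories` and the toy gene IDs are hypothetical illustrations, not part of the study's pipeline. The final lines check one piece of arithmetic using only counts quoted in the text: the four categories that include expression sum to the reported expression-validated total.

```python
from collections import Counter


def venn_categories(all_genes, tiling, expression, homology):
    """Count genes in each disjoint combination of validation methods.

    Each key is a tuple naming the methods that validated a gene;
    the empty tuple means the gene was not validated by any method.
    """
    methods = (
        ("tiling", tiling),
        ("expression", expression),
        ("homology", homology),
    )
    counts = Counter()
    for gene in all_genes:
        key = tuple(name for name, validated in methods if gene in validated)
        counts[key] += 1
    return counts


# Hypothetical toy data (not from the study).
all_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
tiling = {"g1", "g4"}
expression = {"g1", "g2", "g3", "g4"}
homology = {"g1", "g2"}

counts = venn_categories(all_genes, tiling, expression, homology)

# Consistency check on counts quoted in the text: every category that
# includes expression, summed, should equal the expression-validated total.
reported_expression_categories = {
    ("tiling", "expression", "homology"): 5379,
    ("expression", "homology"): 1404,
    ("expression",): 845,
    ("tiling", "expression"): 487,
}
expression_total = sum(reported_expression_categories.values())
print(expression_total)  # 8115, matching the reported figure
```

Keying each category by the tuple of methods that detected a gene keeps the classes mutually exclusive by construction, which is the property the circles in Figure 7 rely on.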