For sequences longer than 300 bp, the annotation rate was 39.0%, while for sequences longer than 1 kb it increased to 67.6% (Table 2). Additionally, sequences without annotations may represent poorly conserved regions (e.g., untranslated regions (UTRs)) in P. yessoensis.

Table 2 Functional annotation of the P. yessoensis transcriptome.

Secondly, Gene Ontology (GO) [28] analysis was carried out. GO provides a dynamic, controlled vocabulary and hierarchical relationships for representing information on molecular function, cellular component and biological process, allowing a coherent annotation of gene products. Of the 21,414 sequences annotated in Swiss-Prot, 15,530 (72.5%) were assigned one or more GO terms. In total, 81,121 GO assignments were obtained, with 37.1% for biological processes, 32.4% for molecular functions, and 30.4% for cellular components. For biological processes, genes involved in cellular process (GO:0009987) and metabolic process (GO:0008152) were highly represented. For molecular functions, binding (GO:0005488) was the most represented GO term, followed by catalytic activity (GO:0003824). Regarding cellular components, the most represented categories were cells (GO:0005623) and organelles (GO:0043226) (Fig. 2).

Figure 2 Functional annotation of assembled sequences based on gene ontology (GO) categorization.

Besides GO analysis, KEGG [29] pathway mapping based on Enzyme Commission (EC) numbers was also carried out for the assembled sequences; this is an alternative approach to categorizing gene functions, with an emphasis on biochemical pathways.
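As a quick sanity check on the GO figures reported above, the following minimal Python sketch recomputes the proportion of Swiss-Prot-annotated sequences that received GO terms and sums the three category shares. The variable and function names are illustrative assumptions, not part of the original analysis; only the numbers are taken from the text.

```python
# Figures quoted in the text; all names below are illustrative only.
swissprot_annotated = 21_414   # sequences annotated in Swiss-Prot
with_go_terms = 15_530         # of those, sequences assigned >= 1 GO term

# Reported share of GO assignments per top-level category (percent).
go_share_percent = {
    "biological process": 37.1,
    "molecular function": 32.4,
    "cellular component": 30.4,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Proportion of Swiss-Prot-annotated sequences with GO terms.
go_rate = percent(with_go_terms, swissprot_annotated)

# Sum of the three reported category shares, rounded to one decimal place.
share_total = round(sum(go_share_percent.values()), 1)

print(f"GO annotation rate: {go_rate}%")
print(f"Sum of category shares: {share_total}%")
```

The recomputed rate matches the reported 72.5%, and the three shares sum to 99.9% rather than 100%, which is consistent with each value having been rounded to one decimal place.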
EC numbers were assigned to 4,846 unique sequences, which were involved in 244 different pathways. A summary of the sequences involved in these pathways is provided in Table S2. Of these 4,846 sequences with KEGG annotation, 45.5% were classified into genetic information processing (GIP), with most of them involved in replication and repair; folding, sorting and degradation; transcription; and translation. Sequences classified into metabolism accounted for 42.8% of the KEGG-annotated sequences. The well-represented metabolic pathways were enzyme families, carbohydrate metabolism, amino acid metabolism, and energy metabolism. Cellular processes were represented by 18.3% of the KEGG-annotated sequences.
Cell motility, cell growth and death, the immune system, and the endocrine system were well represented. Additionally, 15.2% of the sequences were classified into environmental information processing (EIP), including signal transduction, signaling molecules and interaction, and membrane transport.

Functional genes involved in growth, reproduction, stress and immunity

For many aquaculture animals such as scallops, economic traits like growth and reproduction are of particular interest to researchers.