There are 3 different protein datasets for a unigene build.

 + Proteins predicted by longest6frame (an SGN script that translates the sequence in all 6 reading frames and keeps the longest)
 + Proteins predicted by estscan (http://www.ch.embnet.org/software/ESTScan2.html)
 + Preferred proteins (for each unigene, both methods are compared and the longest protein is kept)

For each dataset there are two files (cds and protein).
A version of the preferred protein dataset, with annotations compatible with the ProteinPilot program, is also provided.

----------------------------------------
Report:
----------------------------------------
Files:
----------------------------------------
Version 2:
----------------------------------------

* cds sequences:
    - cds fasta file: Nicotiana_tabacum_cds_predicted_by_estscan.v2.fasta
    - number of sequences: 80117
    - total bases: 40799580
    - average sequence length: 509
    - maximum sequence length: 6291
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Nicotiana_tabacum_protein_predicted_by_estscan.v2.fasta
    - number of sequences: 80117
    - total amino acids: 13599860
    - average sequence length: 169
    - maximum sequence length: 2097
    - minimum sequence length: 17

* cds sequences:
    - cds fasta file: Nicotiana_tabacum_cds_predicted_by_longest6frame.v2.fasta
    - number of sequences: 87142
    - total bases: 35692089
    - average sequence length: 409
    - maximum sequence length: 6087
    - minimum sequence length: 39

* protein sequences:
    - protein fasta file: Nicotiana_tabacum_protein_predicted_by_longest6frame.v2.fasta
    - number of sequences: 87142
    - total amino acids: 11879953
    - average sequence length: 136
    - maximum sequence length: 2029
    - minimum sequence length: 13

* cds sequences:
    - cds fasta file: Nicotiana_tabacum_cds_predicted_by_preferred.v2.fasta
    - number of sequences: 84602
    - total bases: 43366017
    - average sequence length: 512
    - maximum sequence length: 6291
    - minimum sequence length: 39

* protein sequences:
    - protein fasta file: Nicotiana_tabacum_protein_predicted_by_preferred.v2.fasta
    - number of sequences: 84602
    - total amino acids: 14448294
    - average sequence length: 170
    - maximum sequence length: 2097
    - minimum sequence length: 13

----------------------------------------
Version 1:
----------------------------------------

* cds sequences:
    - cds fasta file: Nicotiana_tabacum_cds_predicted_by_estscan.v1.fasta
    - number of sequences: 18899
    - total bases: 11950869
    - average sequence length: 632
    - maximum sequence length: 6090
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Nicotiana_tabacum_protein_predicted_by_estscan.v1.fasta
    - number of sequences: 18899
    - total amino acids: 3983623
    - average sequence length: 210
    - maximum sequence length: 2030
    - minimum sequence length: 17

* cds sequences:
    Note: this dataset has no longest6frame cds file.

* protein sequences:
    - protein fasta file: Nicotiana_tabacum_protein_predicted_by_longest6frame.v1.fasta
    - number of sequences: 26877
    - total amino acids: 4464603
    - average sequence length: 166
    - maximum sequence length: 1972
    - minimum sequence length: 12

* cds sequences:
    - cds fasta file: Nicotiana_tabacum_cds_predicted_by_preferred.v1.fasta
    - number of sequences: 10541
    - total bases: 7094820
    - average sequence length: 673
    - maximum sequence length: 6090
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Nicotiana_tabacum_protein_predicted_by_preferred.v1.fasta
    - number of sequences: 25398
    - total amino acids: 4596017
    - average sequence length: 180
    - maximum sequence length: 2030
    - minimum sequence length: 12

----------------------------------------
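
The figures in the report can be recomputed from any of the listed fasta files. The script that produced the report is not included here, so the following Python snippet is only a minimal sketch of one way to do it; the function name fasta_stats and the dictionary keys are hypothetical. The averages in the report appear to be truncated to whole numbers (for example 40799580 / 80117 = 509.25 is reported as 509, and 13599860 / 80117 = 169.75 as 169), so the sketch uses integer division. Whether the original counts include characters such as a trailing stop symbol is not stated, so every non-header character is counted here.

```python
def fasta_stats(lines):
    """Summarise FASTA records read from an iterable of text lines.

    Hypothetical helper: not the script used to build the report.
    """
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header closes the previous record (if any) and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence line: count every character (assumption, see lead-in).
            current += len(line)
    if current is not None:
        lengths.append(current)

    if not lengths:
        return {"sequences": 0, "total": 0, "average": 0, "maximum": 0, "minimum": 0}

    total = sum(lengths)
    return {
        "sequences": len(lengths),
        "total": total,
        # Integer division, matching the truncated averages in the report.
        "average": total // len(lengths),
        "maximum": max(lengths),
        "minimum": min(lengths),
    }


# Small in-memory example with two records of length 6 and 3.
example = [">seq1", "ATGC", "AT", ">seq2", "ATG"]
print(fasta_stats(example))
```

To check a real file, pass an open file handle instead of the list, for example the handle returned by open("Nicotiana_tabacum_cds_predicted_by_estscan.v2.fasta"), and compare the result with the corresponding block of the report.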