There are three protein datasets for each unigene build:

  + Proteins predicted by longest6frame (an SGN script that translates
    the sequence in all six reading frames and keeps the longest ORF)
  + Proteins predicted by ESTScan
    (http://www.ch.embnet.org/software/ESTScan2.html)
  + Preferred proteins (for each unigene, the predictions of both
    methods are compared and the longest protein is kept)

For each dataset there are two files (cds and protein).

A version of the preferred protein dataset with annotations compatible
with the ProteinPilot program is also provided.

----------------------------------------
Report:
----------------------------------------

+ Unigene Build: Tomato species #2
  Files:

    * cds sequences:
        - cds fasta file: tomato_species_cds_predicted_by_estscan.v2.fasta
        - number of sequences: 39967
        - total bases: 26098701
        - average sequence length: 653
        - maximum sequence length: 5235
        - minimum sequence length: 51

    * protein sequences:
        - protein fasta file: tomato_species_protein_predicted_by_estscan.v2.fasta
        - number of sequences: 39967
        - total amino acids: 8699567
        - average sequence length: 217
        - maximum sequence length: 1745
        - minimum sequence length: 17

    * cds sequences:
        - cds fasta file: tomato_species_cds_predicted_by_longest6frame.v2.fasta
        - number of sequences: 43366
        - total bases: 24435660
        - average sequence length: 563
        - maximum sequence length: 4395
        - minimum sequence length: 51

    * protein sequences:
        - protein fasta file: tomato_species_protein_predicted_by_longest6frame.v2.fasta
        - number of sequences: 43366
        - total amino acids: 8133858
        - average sequence length: 187
        - maximum sequence length: 1465
        - minimum sequence length: 17

    * cds sequences:
        - cds fasta file: tomato_species_cds_predicted_by_preferred.v2.fasta
        - number of sequences: 42257
        - total bases: 27618501
        - average sequence length: 653
        - maximum sequence length: 5235
        - minimum sequence length: 51

    * protein sequences:
        - protein fasta file: tomato_species_protein_predicted_by_preferred.v2.fasta
        - number of sequences: 42257
        - total amino acids: 9200584
        - average sequence length: 217
        - maximum sequence length: 1745
        - minimum sequence length: 17

+ Unigene Build: Tomato species #1
  Files:

    * cds sequences:
        - cds fasta file: tomato_species_cds_predicted_by_estscan.v1.fasta
        - number of sequences: 34008
        - total bases: 24292557
        - average sequence length: 714
        - maximum sequence length: 5226
        - minimum sequence length: 51

    * protein sequences:
        - protein fasta file: tomato_species_protein_predicted_by_estscan.v1.fasta
        - number of sequences: 34008
        - total amino acids: 8064426
        - average sequence length: 237
        - maximum sequence length: 1741
        - minimum sequence length: 9

    * protein sequences:
        - protein fasta file: tomato_species_protein_predicted_by_longest6frame.v1.fasta
        - number of sequences: 35426
        - total amino acids: 6893059
        - average sequence length: 194
        - maximum sequence length: 1522
        - minimum sequence length: 21

    * cds sequences:
        - cds fasta file: tomato_species_cds_predicted_by_preferred.v1.fasta
        - number of sequences: 20519
        - total bases: 16100889
        - average sequence length: 784
        - maximum sequence length: 5226
        - minimum sequence length: 111

    * protein sequences:
        - protein fasta file: tomato_species_protein_predicted_by_preferred.v1.fasta
        - number of sequences: 34829
        - total amino acids: 8366356
        - average sequence length: 240
        - maximum sequence length: 1741
        - minimum sequence length: 21

----------------------------------------
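
The figures in the report can be checked against any of the FASTA files
listed above. The following is a minimal sketch, not the SGN script that
produced the report: the function names read_fasta and fasta_stats are
hypothetical, and the average is computed with integer division on the
assumption that the report truncates it (the reported averages are
consistent with total divided by number of sequences, rounded down).

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # The identifier is the first word of the header line.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_stats(handle):
    """Return the five per-file figures listed in the report.

    Assumption: the average is truncated to an integer, as the reported
    values suggest. An empty file yields zeros for every field.
    """
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return {"count": 0, "total": 0, "average": 0, "maximum": 0, "minimum": 0}
    total = sum(lengths)
    return {
        "count": len(lengths),              # number of sequences
        "total": total,                     # total bases or amino acids
        "average": total // len(lengths),   # average sequence length
        "maximum": max(lengths),            # maximum sequence length
        "minimum": min(lengths),            # minimum sequence length
    }


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("<fasta file>") for real data.
    demo = io.StringIO(">u1\nATGAAA\nTTT\n>u2 some description\nATG\n")
    print(fasta_stats(demo))
```

To check a real file, pass an open file handle for one of the FASTA
files named in the report instead of the in-memory example.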