There are 3 different protein datasets for a unigene build.

 + Proteins predicted by longest6frame (an SGN script that translates the sequence in the 6 reading frames and keeps the longest ORF)
 + Proteins predicted by estscan (http://www.ch.embnet.org/software/ESTScan2.html)
 + Preferred proteins (for each unigene, both methods are compared and the longest protein is kept)

For each dataset there are two files (cds and protein).
A version of the preferred protein dataset with annotations compatible with the ProteinPilot program is also provided.

----------------------------------------
Report:
----------------------------------------
Files:
----------------------------------------
Version 3:
----------------------------------------

* cds sequences:
    - cds fasta file: tomato_species_cds_predicted_by_estscan.v3.fasta
    - number of sequences: 15185
    - total bases: 8624949
    - average sequence length: 567
    - maximum sequence length: 2790
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: tomato_species_protein_predicted_by_estscan.v3.fasta
    - number of sequences: 15185
    - total amino acids: 2874983
    - average sequence length: 189
    - maximum sequence length: 930
    - minimum sequence length: 17

* cds sequences:
    - cds fasta file: tomato_species_cds_predicted_by_longest6frame.v3.fasta
    - number of sequences: 16913
    - total bases: 7785630
    - average sequence length: 460
    - maximum sequence length: 2388
    - minimum sequence length: 36

* protein sequences:
    - protein fasta file: tomato_species_protein_predicted_by_longest6frame.v3.fasta
    - number of sequences: 16913
    - total amino acids: 2589487
    - average sequence length: 153
    - maximum sequence length: 796
    - minimum sequence length: 11

* cds sequences:
    - cds fasta file: tomato_species_cds_predicted_by_preferred.v3.fasta
    - number of sequences: 16046
    - total bases: 9029244
    - average sequence length: 562
    - maximum sequence length: 2790
    - minimum sequence length: 36

* protein sequences:
    - protein fasta file: tomato_species_protein_predicted_by_preferred.v3.fasta
    - number of sequences: 16046
    - total amino acids: 3007795
    - average sequence length: 187
    - maximum sequence length: 930
    - minimum sequence length: 11

----------------------------------------
Version 2:
----------------------------------------

* cds sequences:
    - cds fasta file: Coffea_canephora_cds_predicted_by_estscan.v2.fasta
    - number of sequences: 12126
    - total bases: 7029897
    - average sequence length: 579
    - maximum sequence length: 2775
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Coffea_canephora_protein_predicted_by_estscan.v2.fasta
    - number of sequences: 12126
    - total amino acids: 2343299
    - average sequence length: 193
    - maximum sequence length: 925
    - minimum sequence length: 17

* cds sequences:
    - cds fasta file: Coffea_canephora_cds_predicted_by_longest6frame.v2.fasta
    - number of sequences: 16105
    - total bases: 7807119
    - average sequence length: 484
    - maximum sequence length: 2541
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Coffea_canephora_protein_predicted_by_longest6frame.v2.fasta
    - number of sequences: 16105
    - total amino acids: 2597012
    - average sequence length: 161
    - maximum sequence length: 847
    - minimum sequence length: 17

* cds sequences:
    - cds fasta file: Coffea_canephora_cds_predicted_by_preferred.v2.fasta
    - number of sequences: 15721
    - total bases: 8118714
    - average sequence length: 516
    - maximum sequence length: 2775
    - minimum sequence length: 54

* protein sequences:
    - protein fasta file: Coffea_canephora_protein_predicted_by_preferred.v2.fasta
    - number of sequences: 15721
    - total amino acids: 2703941
    - average sequence length: 171
    - maximum sequence length: 925
    - minimum sequence length: 17

----------------------------------------
Version 1:
----------------------------------------

* cds sequences:
    - cds fasta file: Coffea_canephora_cds_predicted_by_estscan.v1.fasta
    - number of sequences: 12531
    - total bases: 6968484
    - average sequence length: 556
    - maximum sequence length: 2628
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Coffea_canephora_protein_predicted_by_estscan.v1.fasta
    - number of sequences: 12531
    - total amino acids: 2311318
    - average sequence length: 184
    - maximum sequence length: 875
    - minimum sequence length: 15

* cds sequences:
    - cds fasta file: Coffea_canephora_cds_predicted_by_preferred.fasta
    - number of sequences: 12507
    - total bases: 6958710
    - average sequence length: 556
    - maximum sequence length: 2628
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Coffea_canephora_protein_predicted_by_preferred.fasta
    - number of sequences: 12507
    - total amino acids: 2308044
    - average sequence length: 184
    - maximum sequence length: 875
    - minimum sequence length: 15

NOTE: There are no cds/protein sequences for the longest6frame method.

----------------------------------------
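
Example (illustrative only):

The preferred-protein selection and the per-file statistics in the report can be sketched roughly as follows. This is not the SGN pipeline code: the function names (read_fasta, fasta_stats, pick_preferred) are hypothetical, the sequence identifier is assumed to be the first word of each FASTA header, and ties between the two methods are assumed to go to estscan. Averages use integer division, which seems consistent with the reported values (for example 8624949 / 15185 = 567.99, reported as 567), but that is an inference from the numbers, not a documented rule.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the identifier is the first word of the header.
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_stats(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Compute the five figures listed for each file in the report."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        return {"number of sequences": 0, "total residues": 0,
                "average sequence length": 0,
                "maximum sequence length": 0,
                "minimum sequence length": 0}
    total = sum(lengths)
    return {
        "number of sequences": len(lengths),
        "total residues": total,
        # Assumption: the average is truncated, not rounded.
        "average sequence length": total // len(lengths),
        "maximum sequence length": max(lengths),
        "minimum sequence length": min(lengths),
    }


def pick_preferred(estscan: Dict[str, str],
                   longest6frame: Dict[str, str]) -> Dict[str, str]:
    """For each unigene, keep the longer of the two predicted proteins."""
    preferred: Dict[str, str] = {}
    for unigene in sorted(set(estscan) | set(longest6frame)):
        a = estscan.get(unigene, "")
        b = longest6frame.get(unigene, "")
        # Assumption: on equal length the estscan prediction is kept.
        preferred[unigene] = a if len(a) >= len(b) else b
    return preferred


if __name__ == "__main__":
    # Tiny made-up inputs, not taken from the real datasets.
    estscan = dict(read_fasta([">U1 example", "MKTAY", "IAK", ">U2", "MSE"]))
    six_frame = dict(read_fasta([">U2", "MSEQLL", ">U3", "MA"]))
    preferred = pick_preferred(estscan, six_frame)
    for key, value in fasta_stats(preferred.items()).items():
        print(f"- {key}: {value}")
```

To run it on one of the files listed above, pass an open file handle to read_fasta, for example fasta_stats(read_fasta(open("tomato_species_protein_predicted_by_preferred.v3.fasta"))).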