There are three protein datasets for each unigene build:

 + Proteins predicted by longest6frame (an SGN script that translates the
   sequence in all six reading frames and keeps the longest translation)
 + Proteins predicted by ESTScan
   (http://www.ch.embnet.org/software/ESTScan2.html)
 + Preferred proteins (for each unigene, both methods are compared and the
   longer protein is kept)

Each dataset consists of two files (cds and protein).
A version of the preferred protein dataset with annotations compatible
with the ProteinPilot program is also provided.

----------------------------------------
Report:
----------------------------------------
Files:
----------------------------------------
Version 1:
----------------------------------------

* cds sequences:
    - cds fasta file: Nicotiana_benthamiana_cds_predicted_by_estscan.v1.fasta
    - number of sequences: 15097
    - total bases: 7680720
    - average sequence length: 508
    - maximum sequence length: 7221
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Nicotiana_benthamiana_protein_predicted_by_estscan.v1.fasta
    - number of sequences: 15097
    - total amino acids: 2560240
    - average sequence length: 169
    - maximum sequence length: 2407
    - minimum sequence length: 17

* cds sequences:
    - cds fasta file: Nicotiana_benthamiana_cds_predicted_by_longest6frame.v1.fasta
    - number of sequences: 16585
    - total bases: 7053909
    - average sequence length: 425
    - maximum sequence length: 7227
    - minimum sequence length: 57

* protein sequences:
    - protein fasta file: Nicotiana_benthamiana_protein_predicted_by_longest6frame.v1.fasta
    - number of sequences: 16585
    - total amino acids: 2347146
    - average sequence length: 141
    - maximum sequence length: 2409
    - minimum sequence length: 19

* cds sequences:
    - cds fasta file: Nicotiana_benthamiana_cds_predicted_by_preferred.v1.fasta
    - number of sequences: 16024
    - total bases: 8160357
    - average sequence length: 509
    - maximum sequence length: 7227
    - minimum sequence length: 60

* protein sequences:
    - protein fasta file: Nicotiana_benthamiana_protein_predicted_by_preferred.v1.fasta
    - number of sequences: 16024
    - total amino acids: 2718145
    - average sequence length: 169
    - maximum sequence length: 2409
    - minimum sequence length: 19

----------------------------------------
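
The figures in the report above can be re-derived from any of the listed
fasta files. The following is a minimal Python sketch of one way to do so;
it is not the SGN script that produced the report. The function names
read_fasta and fasta_stats are hypothetical, and the average is computed
with integer division on the assumption that the reported averages are
truncated (which matches the totals and counts listed above).

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: the identifier is taken to be the first
    whitespace-delimited token after '>'.
    """
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Return the summary figures used in the report for one fasta file."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    total = sum(lengths)
    count = len(lengths)
    return {
        "number of sequences": count,
        "total residues": total,  # bases for cds, amino acids for protein
        # Assumption: averages are truncated, not rounded.
        "average sequence length": total // count if count else 0,
        "maximum sequence length": max(lengths) if lengths else 0,
        "minimum sequence length": min(lengths) if lengths else 0,
    }
```

To use it, open one of the fasta files listed above and pass the file
handle to fasta_stats, for example
fasta_stats(open("Nicotiana_benthamiana_cds_predicted_by_estscan.v1.fasta")).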