ITAG1 Tomato Genome Release

Contents:
  1. Introduction
  2. Files in this release
  3. Links and other resources
  4. Release statistics

== 1. Introduction ==

The International Tomato Annotation Group (ITAG) is pleased to announce
ITAG1, the first release of the official Tomato genome annotation. It
covers approximately 85% of the genome and contains 33,926 gene models.
This release file set was generated on March 10, 2010.

In this release, 29,320 (86.4%) of the gene models are supported by
homology to either existing EST/cDNA sequences or protein sequences,
with 16,763 (49.4%) supported by both. All of the gene models are
annotated with best-guess text descriptions of their function, and
24,721 (72.9%) have associated Gene Ontology terms describing their
function. See section 4 for more statistics describing this release.

Please send comments or questions about these annotations to:

  itag@sgn.cornell.edu

== 2. Files in this release ==

ITAG1_README.txt
    Overview of the release, with some statistics.

ITAG1_cdna.fasta
    fasta-format sequence file of cDNA sequences.

ITAG1_cdna_alignments.gff3
    GFF version 3 file containing alignments of existing EST and cDNA
    sequences to the genome.

ITAG1_cds.fasta
    fasta-format sequence file of CDS sequences.

ITAG1_de_novo_gene_finders.gff3
    GFF version 3 file containing predictions from several de novo
    gene finders. These were integrated into the final gene models by
    EuGene.

ITAG1_gene_models.gff3
    GFF version 3 file containing gene models in this release.

ITAG1_infernal.gff3
    GFF version 3 file containing INFERNAL transcript features in this
    release.

ITAG1_genomic.conf
    GBrowse configuration file used at SGN to display annotations on
    genomic sequences from this release.

ITAG1_genomic.fasta
    fasta-format sequence file of genomic contig sequences.

ITAG1_genomic_all.gff3
    GFF version 3 file containing all genomic annotations in this
    release.

ITAG1_protein.conf
    GBrowse configuration file used at SGN to display annotations on
    protein sequences from this release.

ITAG1_protein_functional.gff3
    GFF version 3 file containing functional annotations to protein
    sequences.

ITAG1_proteins.fasta
    fasta-format sequence file of protein sequences.

ITAG1_sgn_data.gff3
    GFF version 3 file containing alignments to sequences related to
    data on SGN. Currently contains alignments to SGN unigenes, SGN
    marker sequences, and SGN locus sequences.

== 3. Links and other resources ==

Sequences and annotations can also be viewed and searched on SGN:

  http://solgenomics.net/gbrowse/

The fully annotated chromosome sequences in GFF version 3 format, along
with Fasta files of cDNA, CDS, genomic and protein sequences, and lists
of genes are available from the SGN ftp site at:

  ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG1_release/

For those who are not familiar with the GFF3 file format, the format
specification can be found here:

  http://www.sequenceontology.org/gff3.shtml

A graphical display of the Tomato sequence and annotation can be viewed
using SGN's genome browser. Browse the chromosomes, search for names or
short sequences and view search hits on the whole genome, in a close-up
view or on a nucleotide level:

  http://solgenomics.net/gbrowse/

SGN's BLAST services have also been updated with this dataset,
available at:

  http://solgenomics.net/tools/blast/

ITAG is committed to the continual improvement of the Tomato genome
annotation and actively encourages the community to contact us with new
data, corrections and suggestions.

Announcements of new releases, updates of data, tools, and other
developments from ITAG can be found on SGN:

  http://solgenomics.net/

== 4. Release statistics ==

4.1 Proportion of Genome Annotated

  Estimated genome size:       930 Mbp
  Size of annotated assembly:  793 Mbp
  Est. proportion of genome:   85%

4.2 Structural Annotation

  Gene model count:   33,926
  Exon count:        160,924
  Intron count:      126,998

  Gene model length (bp)
  ---------------------------------------------------
  Min            150
  Max         53,768
  Range       53,618
  Mean       3,381.7
  StdDev     3,473.7
  Median       2,358

  Frequency Distribution:
         Bin   Frequency
       6,852      29,939
      13,554       3,299
      20,257         535
      26,959         103
      33,661          34
      40,364          10
      47,066           5
      53,768           1

  Intergenic distance (bp)
  ---------------------------------------------------
  Min              3
  Max      2,976,229
  Range    2,976,226
  Mean      39,231.9
  StdDev   100,418.6
  Median      10,489

  Frequency Distribution:
         Bin   Frequency
     372,031      31,721
     744,060         453
   1,116,088         104
   1,488,116          23
   1,860,144           5
   2,976,229           1

  Exons per gene model
  ---------------------------------------------------
  Min              1
  Max             69
  Range           68
  Mean           4.7
  StdDev         4.5
  Median           3

  Frequency Distribution:
         Bin   Frequency
          10      29,682
          18       3,654
          26         483
          35          77
          44          24
          52           3
          60           2
          69           1

  Exon length (bp)
  ---------------------------------------------------
  Min              2
  Max         10,472
  Range       10,470
  Mean         290.9
  StdDev       421.5
  Median         154

  Frequency Distribution:
         Bin   Frequency
       1,311     155,987
       2,620       4,104
       3,928         619
       5,237         137
       6,546          49
       7,854          16
       9,163           8
      10,472           4

  Intron length (bp)
  ---------------------------------------------------
  Min             42
  Max         20,706
  Range       20,664
  Mean         536.7
  StdDev       864.8
  Median         217

  Frequency Distribution:
         Bin   Frequency
       2,625     123,860
       5,208       2,420
       7,791         458
      10,374         159
      12,957          68
      15,540          29
      18,123           1
      20,706           3

4.3 Functional Annotation

  Gene models with GO terms:                     24,721 ( 72.9%)
  Unique GO terms associated:                     5,702
  Genes with splice variants:                         0
  Gene models with functional description text:  33,926 (100.0%)

  Gene Ontology terms associated, per gene model
  ---------------------------------------------------
  Min              0
  Max             52
  Range           52
  Mean           4.3
  StdDev         4.6
  Median           3

  Frequency Distribution:
         Bin   Frequency
           6      25,071
          13       7,326
          20       1,229
          26         236
          32          49
          39          13
          52           2

4.4 Gene model supporting evidence

  ESTs/cDNAs aligned to the genome:                215,243

  Gene models with cDNA OR protein support:         29,320 ( 86.4%)
  Gene models with cDNA homology support:           18,304 ( 54.0%)
  Gene models without cDNA homology support:        15,622 ( 46.0%)
  Gene models with protein homology support:        27,779 ( 81.9%)
  Gene models without protein homology support:      6,147 ( 18.1%)
  Gene models with both cDNA and protein support:   16,763 ( 49.4%)
  Gene models with only cDNA homology support:       1,541 (  4.5%)
  Gene models with only protein homology support:   11,016 ( 32.5%)
  Gene models with no homology support:              4,606 ( 13.6%)
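
The rows in section 4.4 are related by simple arithmetic: every gene
model falls into exactly one of four disjoint categories (both, cDNA
only, protein only, no support), and the remaining rows are sums of
those. The following Python sketch is an illustration only and is not
part of the ITAG pipeline. The function name evidence_summary and the
row labels are made up for this example; the counts are copied from
section 4.4.

```python
# Counts copied from section 4.4 of this README.
TOTAL_GENE_MODELS = 33926
BOTH = 16763          # both cDNA and protein homology support
CDNA_ONLY = 1541      # only cDNA homology support
PROTEIN_ONLY = 11016  # only protein homology support
NO_SUPPORT = 4606     # no homology support


def evidence_summary(total, both, cdna_only, protein_only, none):
    """Derive the aggregate rows of section 4.4 from the four disjoint
    categories. (Hypothetical helper, for illustration only.)"""
    if both + cdna_only + protein_only + none != total:
        raise ValueError("categories do not sum to the gene model count")
    counts = {
        "cDNA OR protein support": both + cdna_only + protein_only,
        "cDNA homology support": both + cdna_only,
        "protein homology support": both + protein_only,
        "no homology support": none,
    }
    # Map each row label to (count, percentage of all gene models).
    return {name: (n, round(100.0 * n / total, 1)) for name, n in counts.items()}


summary = evidence_summary(
    TOTAL_GENE_MODELS, BOTH, CDNA_ONLY, PROTEIN_ONLY, NO_SUPPORT
)
for name, (n, pct) in summary.items():
    print(f"{name:26s} {n:>7,d} ({pct:5.1f}%)")
```

The same check can be rerun on counts derived from a later release to
confirm that the evidence categories still add up to the gene model
total.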