Files in ITAG2.3_release

ITAG2.3_assembly.gff3
GFF version 3 file containing the location of SL2.40 contigs and scaffolds on the pseudomolecules.
ITAG2.3_cdna_alignments.gff3
GFF version 3 file containing alignments of existing EST and cDNA sequences to the genome.
ITAG2.3_cdna.fasta
FASTA-format sequence file of the cDNA sequence produced by each gene model.
ITAG2.3_cds.fasta
FASTA-format sequence file of the CDS sequence produced by each gene model.
ITAG2.3_changed_genes.txt
List of genes that have had structural changes since the last release (ITAG2).
ITAG2.3_deleted_genes.txt
List of genes that were in the ITAG2 release, but have been deleted for this release.
ITAG2.3_dropped_features.gff3
GFF version 3 file listing features from the ITAG2 release that could not be remapped from the SL2.31 assembly to the SL2.40 assembly.
ITAG2.3_protein_functional.gff3
GFF version 3 file containing functional annotations of protein sequences.
ITAG2.3_gene_list.txt
Text file containing a list of (versioned) gene identifiers present in this release.
ITAG2.3_de_novo_gene_finders.gff3
GFF version 3 file containing predictions from several de novo gene finders. These are intermediate data, used by EuGene to decide the final consensus gene models.
ITAG2.3_genomic.fasta
FASTA-format sequence file of genomic pseudomolecule sequences from the SL2.40 assembly.
ITAG2.3_genomic_reagents.gff3
GFF version 3 file containing alignments to other genomic sequences from tomato: genomic clones, other genome builds, etc.
ITAG2.3_infernal.gff3
GFF version 3 file containing Infernal RNA results; see http://infernal.janelia.org/
ITAG2.3_metadata.ini
Machine-readable INI-format file of metadata about each of the files in this release.
ITAG2.3_gene_models.gff3
GFF version 3 file containing gene models in this release, along with associated GO terms and text functional descriptions.
ITAG2.3_new_genes.txt
List of genes that are new in this release, and were not in the ITAG2 release.
ITAG2.3_other_genomes.gff3
GFF version 3 file containing alignments to genomic sequences from other organisms.
ITAG2.3_proteins.fasta
FASTA-format sequence file of the protein sequence produced by each gene model.
ITAG2.3_README.txt
Overview of the release, with some statistics.
ITAG2.3_protein_reference.gff3
GFF version 3 file containing reference features for each of the protein sequences in this release.
ITAG2.3_repeats_aggressive.gff3
Repetitive elements, identified by RepeatMasker without the '-nolow' option set, meaning that low-complexity regions are masked. Repeat dataset used is available at ftp://ftp.sgn.cornell.edu/genomes/Solanum_lycopersicum/repeats/mipsREdat_8.8_eudico_TEs.masked.gz.
ITAG2.3_repeats.gff3
Repetitive elements, identified by RepeatMasker with the '-nolow' option set, meaning that low-complexity regions are not masked. Repeat dataset used is available at ftp://ftp.sgn.cornell.edu/genomes/Solanum_lycopersicum/repeats/mipsREdat_8.8_solanaceae_TE.masked.gz.
ITAG2.3_sgn_data.gff3
GFF version 3 file containing alignments to sequences related to data on SGN. Currently contains alignments to SGN unigenes, SGN marker sequences, and SGN locus sequences.
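
Most of the files listed above are in GFF version 3 format, which stores one feature per line in nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). As a rough illustration of how such a file could be read, here is a minimal Python sketch that parses feature lines and counts feature types. The function names and the example lines are hypothetical and made up for illustration; they are not taken from the ITAG2.3 files, and the actual sequence names, sources, and attribute keys in this release may differ.

```python
from collections import Counter
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines, comment/directive lines (starting
    with '#'), and any line that does not have exactly nine columns.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        return None
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Column 9 holds semicolon-separated key=value pairs, URL-encoded.
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key.strip())] = unquote(value.strip())

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


def count_feature_types(lines):
    """Count how many features of each type appear in an iterable of lines."""
    counts = Counter()
    for line in lines:
        feature = parse_gff3_line(line)
        if feature is not None:
            counts[feature["type"]] += 1
    return counts


# Made-up example lines (NOT real ITAG2.3 content), used only to show the format.
example = [
    "##gff-version 3",
    "chrDemo\tdemo_source\tgene\t100\t900\t.\t+\t.\tID=gene:demo1;Name=demo1",
    "chrDemo\tdemo_source\tmRNA\t100\t900\t.\t+\t.\tID=mRNA:demo1.1;Parent=gene:demo1",
    "chrDemo\tdemo_source\tCDS\t150\t800\t.\t+\t0\tID=CDS:demo1.1;Parent=mRNA:demo1.1",
]
counts = count_feature_types(example)

# With a downloaded file, the same function could be applied to an open handle:
#     with open("ITAG2.3_gene_models.gff3") as handle:
#         counts = count_feature_types(handle)
```

Reading the file line by line keeps memory use low, which may matter for the larger alignment files. For anything beyond quick inspection, a dedicated GFF3 library is likely a better choice than this sketch.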

By downloading pre-publication tomato genome sequence or ITAG annotation files, you indicate your acceptance of the following data access agreement.

Data access agreement 

PLEASE READ BEFORE ACCESSING THE PRE-PUBLICATION TOMATO GENOME SEQUENCE OR ANNOTATIONS: The International Tomato Genome Sequencing Consortium is pleased to make available a pre-publication draft assembly of the tomato genome for use by public and private research communities as a resource to enable plant biology discovery and improve the human condition through improved agriculture. This assembly was produced by the Dutch/French assembly team and includes both 454 data and Sanger sequence data (BAC-ends, fosmid-ends and Selected BAC Mixture sequences).

We caution you that the current assembly is a "work-in-progress" and as such is subject to modification, some of it likely to be substantial, prior to publication release (anticipated for mid-2011). Therefore we encourage you to carefully and independently validate any conclusions you may draw from this sequence. We will update this resource as improvements in the assembly are made. We welcome feedback about your successes, as well as any feedback that may assist us in improving the quality and accuracy of this sequence.

This pre-publication tomato genome data is made available with the understanding that users will respect the rights of those who contributed to this effort to describe the tomato genome in a peer-reviewed publication. This description includes whole-genome-level analyses of genes, gene families, repetitive sequences, etc. We encourage you to review the NIH-NHGRI guidelines on distribution and use of pre-publication genome sequence at http://www.genome.gov/page.cfm?pageID=10506537. Any use of the tomato genome data prior to its publication should credit "The International Tomato Genome Sequencing Consortium". If you are uncertain about how to credit the use of the sequence or about its appropriate use, please do not hesitate to contact Joyce Van Eck.

To download ITAG pre-publication annotation data, please provide your contact information.
