Release notes

Version 2.30 (2010-08-09)

Changes

  • Sequenced BACs were integrated in the scaffolds of assembly version 2.10.
    • Version 548 of tomato BAC sequences (downloaded from SGN) was used as the basis for BAC integration. Phase 1 BACs (23.2 Mb) were excluded. Phase 2 and phase 3 BACs were assembled into contigs and further processed for the integration, resulting in 117.5 Mb of sequence.
    • Out of the available 117.5 Mb of BAC sequence, 116.6 Mb was integrated. During the integration 4.5 Mb of Ns in the assembly were replaced by real DNA sequence.
    • For further information see the "BAC integration summary".
  • After the BAC integration the scaffolds were ordered and oriented on the 12 chromosomes using the two physical maps (KeyGene WGP and Arizona SNaPshot), the Kazusa genetic map, and multiple FISH maps.

Technical details

  • The new scaffolds were placed and oriented on the 12 chromosomes by integration of the two physical maps (KeyGene WGP and Arizona SNaPshot), the Kazusa genetic map, and multiple FISH maps.

    • Where possible, the scaffolds were placed and oriented on one of the 12 chromosomes. Corresponding MULTI-FASTA and AGP files were produced. Unoriented scaffolds have "0" as orientation in the AGP files and have the "+" orientation in the corresponding FASTA files. Scaffolds linked by the physical maps have 'yes' in the linkage column in the AGP files. Gaps between adjacent scaffolds on chromosomes are of type 'U' (undefined size) and size 100 (following the NCBI specifications).
    • Intra-scaffold gaps between two contigs that were produced during clone-end scaffolding with Bambus are of size 60 and type 'U'. The real size of these gaps is unknown.
    • All scaffolds that could not be placed on any of the 12 tomato chromosomes by either the genetic or physical maps were placed on an artificial "chromosome 0". The scaffolds on this chromosome are ordered from large to small and are unoriented.
    • All sequences are in upper case.
    • The orientation of all contigs in the scaffolds is always "+" because the contigs are reconstructed from the scaffolds.
    • The orientation of all contigs in the chromosomes is either "+" or "-", never "0", as the contigs have an order and orientation in the scaffolds.
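
The AGP conventions listed above can be checked with a short script. The following is a minimal sketch, assuming the standard tab-separated AGP column layout (object, start, end, part number, component type, then either component fields or gap fields). The sample records, scaffold names, and gap_type values are invented for illustration and are not taken from the released files.

```python
from collections import Counter

# Hypothetical AGP records (tab-separated). All names and coordinates are invented.
SAMPLE_AGP = "\n".join([
    "chr01\t1\t5000\t1\tW\tscaffold_A\t1\t5000\t+",
    "chr01\t5001\t5100\t2\tU\t100\tcontig\tno",
    "chr01\t5101\t8100\t3\tW\tscaffold_B\t1\t3000\t0",
    "chr01\t8101\t8200\t4\tU\t100\tcontig\tyes",
    "chr01\t8201\t9200\t5\tW\tscaffold_C\t1\t1000\t-",
])


def parse_agp(text):
    """Yield one dict per AGP record, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        rec = {"object": f[0], "start": int(f[1]), "end": int(f[2]), "type": f[4]}
        if f[4] in ("N", "U"):
            # Gap line: column 6 is the gap length, column 8 the linkage flag.
            rec.update(is_gap=True, gap_length=int(f[5]), linkage=f[7])
        else:
            # Component line: column 6 is the component ID, column 9 the orientation.
            rec.update(is_gap=False, component=f[5], orientation=f[8])
        yield rec


records = list(parse_agp(SAMPLE_AGP))

# Count scaffolds per orientation; "0" marks unoriented scaffolds.
orientations = Counter(r["orientation"] for r in records if not r["is_gap"])
# Gaps whose flanking scaffolds are linked by the physical maps.
linked_gaps = sum(1 for r in records if r["is_gap"] and r["linkage"] == "yes")
# Lengths of all gaps of type 'U' (unknown size).
unknown_gaps = [r["gap_length"] for r in records if r["type"] == "U"]

print(dict(orientations), linked_gaps, unknown_gaps)
```

Run against a real chromosome AGP file, the same counts would show how many scaffolds are unoriented and how many neighbouring scaffolds are linked by the physical maps.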

Assembly stats

  • Contigs (only the contigs that make up the scaffolds): 26,874 sequences, 737.7 Mb, 50% of assembly in 1,996 contigs of 87,129 bp or longer
  • Scaffolds: 3,232 sequences, 781.4 Mb, 50% of assembly in 17 scaffolds of 16.5 Mb or longer
  • Pseudomolecules: 12 chromosomes and "chromosome 0", 781.7 Mb. 91 scaffolds were placed on chromosomes 1 to 12; 53 of these are also oriented.
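
The "50% of assembly in N sequences of X bp or longer" figures correspond to what is commonly reported as an N50-style statistic. The sketch below shows one way to compute such a pair from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions, not the script used to produce these notes.

```python
def n50_stats(lengths):
    """Return (count, min_length) for the fewest longest sequences
    that together cover at least 50% of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("lengths must not be empty")


# Toy lengths (invented): total is 300 bp, so half is 150 bp.
toy_lengths = [100, 80, 60, 40, 20]
count, min_len = n50_stats(toy_lengths)
print(f"50% of assembly in {count} sequences of {min_len} bp or longer")
```

For the toy input the two longest sequences (100 + 80 bp) already cover half of the 300 bp total, so the function reports 2 sequences of 80 bp or longer.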

Version 2.10 (2010-06-25)

Changes

  • The scaffolds from assembly 1.50 were further linked together by the clone ends (BAC and fosmid).

  • The new scaffolds were placed and oriented on the 12 chromosomes by integration of the two physical maps (KeyGene WGP and Arizona SNaPshot), the Kazusa genetic map, and multiple FISH maps.

    • Where possible, the scaffolds were placed and oriented on one of the 12 chromosomes. Corresponding MULTI-FASTA and AGP files were produced. Unoriented scaffolds have "0" as orientation in the AGP files and have the "+" orientation in the corresponding FASTA files. Scaffolds linked by the physical maps have 'yes' in the linkage column in the AGP files. Gaps between adjacent scaffolds on chromosomes are of type 'U' (undefined size) and size 100 (following the NCBI specifications).
    • All scaffolds that could not be placed on any of the 12 tomato chromosomes by either the genetic or physical maps were placed on an artificial "chromosome 0". The scaffolds on this chromosome are ordered from large to small and are unoriented.
    • All sequences are in upper case.
    • The orientation of all contigs in the scaffolds is always "+" because the contigs are reconstructed from the scaffolds.
    • The orientation of all contigs in the chromosomes is either "+" or "-", never "0", as the contigs have an order and orientation in the scaffolds.
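
To illustrate how the orientation and gap rules above translate into sequence, the following sketch rebuilds one chromosome sequence from AGP rows and a dictionary of scaffold sequences. It is an illustrative reconstruction under stated assumptions, not the pipeline used for this release: the helper names, scaffold IDs, and sequences are hypothetical, and orientation "0" is written in the "+" direction as described for the FASTA files.

```python
# Translation table for complementing upper-case DNA (all sequences are upper case).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_object(agp_rows, scaffolds):
    """Concatenate the parts of ONE AGP object, given in part order.

    agp_rows  -- list of AGP rows, each already split on tabs
    scaffolds -- dict mapping component ID to its sequence
    """
    parts = []
    for f in agp_rows:
        if f[4] in ("N", "U"):
            # Gap line: emit a run of Ns of the stated length.
            parts.append("N" * int(f[5]))
        else:
            # Component line: slice with 1-based inclusive coordinates.
            seq = scaffolds[f[5]][int(f[6]) - 1:int(f[7])]
            # Only "-" is reversed; "+" and "0" (unoriented) are kept as-is.
            parts.append(reverse_complement(seq) if f[8] == "-" else seq)
    return "".join(parts)


# Hypothetical toy input: two short scaffolds separated by a 100 bp 'U' gap.
toy_scaffolds = {"s1": "AACC", "s2": "GGT"}
toy_rows = [
    ["chrX", "1", "4", "1", "W", "s1", "1", "4", "+"],
    ["chrX", "5", "104", "2", "U", "100", "contig", "no"],
    ["chrX", "105", "107", "3", "W", "s2", "1", "3", "-"],
]
chromosome = build_object(toy_rows, toy_scaffolds)
print(len(chromosome), chromosome[:4], chromosome[-3:])
```

Treating "0" like "+" is a deliberate choice here: it mirrors the statement that unoriented scaffolds appear in the "+" orientation in the FASTA files.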

Assembly stats

  • Contigs (only the contigs that make up the scaffolds): 29,736 sequences, 733.0 Mb, 50% of assembly in 2,754 contigs of 69,257 bp or longer
  • Scaffolds: 3,433 sequences, 781.3 Mb, 50% of assembly in 17 scaffolds of 16.5 Mb or longer
  • Pseudomolecules: 12 chromosomes and "chromosome 0", 781.6 Mb
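
Because the contig totals count only contigs that make up the scaffolds, the difference between the scaffold total and the contig total gives a rough estimate of the gap (N) content. The sketch below applies this to the version 2.10 figures and to the version 2.30 figures listed earlier in these notes. Reading the difference as gap content is an assumption, and all values are limited by the 0.1 Mb rounding of the reported totals.

```python
# Totals in Mb, copied from the assembly stats of versions 2.10 and 2.30.
STATS_MB = {
    "2.10": {"scaffolds": 781.3, "contigs": 733.0},
    "2.30": {"scaffolds": 781.4, "contigs": 737.7},
}


def gap_mb(version):
    """Estimated gap content: scaffold total minus contig total (Mb)."""
    s = STATS_MB[version]
    return round(s["scaffolds"] - s["contigs"], 1)


# Reduction in estimated gap content between the two versions.
closed = round(gap_mb("2.10") - gap_mb("2.30"), 1)
print(gap_mb("2.10"), gap_mb("2.30"), closed)
```

The resulting reduction of about 4.6 Mb is in line with the 4.5 Mb of Ns reported as replaced during BAC integration in version 2.30, given the rounding of the inputs.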

Version 2.00 (2010-06-16)

Version 2.00 was retracted because of some technical errors in the data. Please do not use version 2.00 for any analysis.

Version 1.50 (2010-05-14)

Changes

  • The contigs from assembly 1.03 were polished using the SOLiD and Solexa data. Polishing included single-base error correction and indel correction (mostly in homopolymers).
  • Contamination from E. coli and vector sequences was removed. Organellar sequences were separated (and are thus not included in this data set).
  • Several structural inconsistencies were solved.
  • Contigs from fully sequenced BACs were integrated.
  • Superscaffolds were built using clone-end information (BAC and fosmid ends).
  • Several gaps were filled using the alternative CABOG assembly (same input as Newbler 1.03, polished with the Illumina k-mers).

Assembly stats

  • Contigs (only the contigs that make up the scaffolds): 29,736 sequences, 733.0 Mb, 50% of assembly in 2,754 contigs of 69,257 bp or longer
  • Scaffolds: 3,584 sequences, 781.2 Mb, 50% of assembly in 27 scaffolds of 7.8 Mb or longer

Version 1.03 (2010-01-22)

Changes

  • During the assembly we screened for E. coli sequences to prevent the E. coli contamination (from the SBM data) that was found in version 1.00.
  • Two new 454 runs (3kb and 20kb) were added to the assembly
  • This assembly was made with an updated version of the assembler

Assembler

  • This assembly was made with Newbler version 2.3-PostRelease-01/11/2010

Assembly input

  • A new filtered 454 data set was created because of the addition of 2 new 454 runs (3kb and 20kb)
    • 56 million reads, 20.8 Gb, approx. 22.0X coverage
  • SBM and clone-end input were the same as in version 1.00 (see below)

Assembly stats

  • Contigs: 110,872 sequences, 762.0 Mb, 50% of assembly in 3,641 contigs of 55,730 bp or longer
  • Scaffolds: 3,761 sequences, 781.7 Mb, 50% of assembly in 52 scaffolds of 4.4 Mb or longer

Version 1.00 (2009-11-27)

Assembler

  • This assembly was made with Newbler version 2.3-PostRelease-11/19/2009

Assembly input

  • This assembly contains 454 sequences, Selected BAC clone Mixture (SBM) sequences, and BAC/fosmid end sequences.
    • 454 (filtered): 55 million reads, 20.5 Gb, approx. 21.6X coverage
    • SBM (filtered): 3.8 million reads, 3.1 Gb, approx. 3.3X coverage
    • BAC ends (filtered): 308,490 sequences (incl. 135,271 pairs), 180 Mb, approx. 0.18X coverage
    • Fosmid ends (filtered): 151,299 sequences (incl. 64,722 pairs), 83 Mb, approx. 0.087X coverage
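
The coverage values above are the ratio of sequenced bases to genome size. The genome size used is not stated in these notes, so the sketch below works backwards and derives the size implied by each input line. The function name is illustrative, and the results are an inference from the rounded figures, not an official estimate.

```python
# (total bases, reported coverage) for each input data set in version 1.00.
INPUTS = {
    "454": (20.5e9, 21.6),
    "SBM": (3.1e9, 3.3),
    "BAC ends": (180e6, 0.18),
    "Fosmid ends": (83e6, 0.087),
}


def implied_genome_size(bases, coverage):
    """Genome size (bp) implied by coverage = bases / genome size."""
    return bases / coverage


implied = {name: implied_genome_size(b, c) for name, (b, c) in INPUTS.items()}
for name, size in implied.items():
    print(f"{name}: approx. {size / 1e6:.0f} Mb")
```

All four lines imply a genome size in the range of roughly 940 to 1,000 Mb, which suggests that a single estimate of about 950 Mb was used for the coverage calculations.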

Assembly stats

  • Contigs: 118,692 sequences, 762.5 Mb, 50% of assembly in 4,238 contigs of 47,298 bp or longer
  • Scaffolds: 7,409 sequences, 794.6 Mb, 50% of assembly in 50 scaffolds of 4.5 Mb or longer

Raw data