IN ATTENDANCE
===============
Glenn Bryan (UK)
Lukas Mueller (US)
Robert Buels (US)
Remi Martin (US,DE)
Joyce Van Eck (US)
Jim Giovannoni (US)
Hans de Jong (NL)
Dora Szinay (NL)
Dan Milbourne (Ireland)
Giovanni Giuliano (IT)
Mingsheng Chen (CN)
Clare Riddle (UK)
Jeanne Jacobs (NZ)
Nagendra Singh (IN, NRCPB)
Alessandro Vezzi (IT)
Sung-Hwan Jo (KR)
Doil Choi (KR)
Mark Fiers (NL)
Rene Klein-Lankhorst (NL, EU-SOL)
Mondher Bouzayen
Sanwen Huang (CN)
Klaus Mayer

MINUTES
============
Meeting called to order at 9:01 am, 1/13/08
Opening introduction given by Joyce Van Eck

CHR 10 UPDATE - Joyce Van Eck
===============================
19 BACs have been sequenced to date
8 are in progress
We (US group) are sequencing chromosomes 1 and 10

Fosmid End Sequencing
-----------------------
1152 done by the US
50,000 are being done with EU-SOL funding
50,000 are being done by the Italian group
We will end up with about 300,000 sequenced clones, giving approx. 12X coverage

FISH Localization
-------------------
19 BACs have been localized with FISH

Latest Sequencing News
--------------------------
Sequencing will be done by the Roe group at U. Oklahoma
Plan to sequence 560 BACs
Funded by NSF Plant Sequencing

Status of Sequencing Publication
----------------------------------
Lukas: It's important to publish this interim paper to raise our profile with the wider community and the funding agencies. In a week or two I will send out a draft for comments and additions.

Jim Giovannoni: Adding to that, I've talked to Harry Klee (sp?); he is interested in seeing it published in the Plant Journal, as a full paper.

Current Outreach Activities
-------------------------------
(Joyce Van Eck speaking)
- We've been developing some classroom programs for outreach
- The Sol Family goes to School: educate students about plant families and the modern agricultural supply chain
- Bioinformatics Summer Internship: SGN hires undergraduates and high-schoolers; it has been a very successful program
- Potato micro-tuber formation: high-school level biology lesson
- Participate in high school teacher workshops

Future Outreach Directions
------------------------------
Tomato Breeders Toolbox - on SGN
Interactive genome sequencing puzzle

Rene Klein-Lankhorst: Note, there's been a change in the fosmid end sequencing: it was not actually funded by EU-SOL.

Jim Giovannoni: Thank you all for depositing your BAC sequences in SGN; our progress looks much better from the outside now. I want to reiterate how vital it is to keep depositing them. It's the best way to help us (the US project) secure our NSF funding.

CHR 2 SEQUENCING - Doil Choi, KRIBB
================================
143 BACs have been submitted to GenBank, about 15.5 Mb in all, in 14 singletons and 17 BAC contigs.
We have found some recombination hot spots, and possibly a recombination cold spot.
We also found an instance of either inter-chromosomal duplication or a chimeric BAC. The BACs involved are LE_HBa0155D20, LE_HBa0203P08, and LE_HBa0320D08. 155D20 has definitely been FISHed to chromosome 2.
(much discussion ensues)

CHR 3 UPDATE - Mingsheng Chen
===============================
Our funding for 200 BACs has now come through; we have 14 BACs finished in the database now.
47 seed BACs have been localized with FISH on chr 3
With FPC, we have 29 contigs covering 10 Mb of euchromatic sequence.

CHR 4 UPDATE - Clare Riddle
===============================
In December, we added 107,681 tomato reads to the trace archive at trace.ensembl.org
44 chromosome 4 FPC contigs
91 clones completed so far
Approx 8.7 Mb of finished, assembled sequence so far.
Given the most recent FISH results, there are 3 contigs, totalling 1.5 Mb, that were determined to be on different chromosomes. One of them is on chromosome 11.

Hans de Jong: With the new fosmid libraries, and with the BACs that people have at the ends of their regions, would it be possible to reach into the telomeres with them?
CR: Yes, it should be possible.
Lukas: Do you think you'll be able to finish the chromosome this year?
CR: Probably not getting it down to just 2 contigs, no.
HdJ: The genome is starting to look like it's definitely more complex than Arabidopsis's.

CHR 5 UPDATE - Nagendra Singh, NRCPB
==============================
Like the other projects, we have also learned how important it is to make absolutely sure that the BAC belongs to your chromosome.
We have 13 seed BACs, validated by fingerprinting, PCR for markers, PFGE size estimation, IL mapping, and FISH mapping.
Extension BAC validation by:
- PCR with 3 pairs of primers from overlapping region
- Purity check with HindIII fingerprinting of 8 clones
- End sequencing for verification of seed BAC
- PFGE size determination
- IL mapping
We found that nearly 50% of the clones that were supposed to be there according to FPC are actually there.
We have 13 extension BACs for our seed BACs, and some of them are starting to form good contigs.
5 new seed BACs generated by BLASTing BAC ends against markers.

CHR 6 UPDATE - Rene Klein-Lankhorst
=============================
155 BACs sequenced, 64 seeds plus 91 extensions
140 BACs in 12 major contigs
Short arm is completely covered, with 3 small gaps
Long arm has 5 major gaps left
16 Mb of sequenced DNA, 12 Mb of unique assembled DNA
Total euchromatin size is approx 20 Mb, so we're about 60% done.
76 BACs available on SGN, 68 in phase 1/2, 8 in phase 3
3 novel seed BACs were found with a targeted screening of the MboI BAC library with overgo markers, work done by Joyce Van Eck
21 AFLP markers plus 10 SNP markers selected for the different gaps.
16 BACs have been retrieved with these markers

Gap closure of chr 6 BACs
------------------------
- most of the last 6 months of effort has been in closing gaps in the unfinished BACs
- out of 800 gaps in the BACs, 400 have been closed in the past 6 months

CHR 7 UPDATE - Mondher Bouzayen
=============================
Estimated 27 Mb euchromatin, estimated 250-270 BACs needed to cover this.
To select seed BACs, we used overgo assays, FPC contigging from China, in silico marker searches, SYNGENTA markers, and 3D-DNA pools of BAC libraries.
CNRGV in Toulouse generated these 3D pools, with 1/2 of the HindIII library and the whole MboI library. These are available to the community; order them at http://cnrgv.toulouse.inra.fr
Our BAC selection strategy has been mostly based on the Zamir ILs, but the BACs are also being validated with FISH (Wageningen - de Jong, US - Stack, China - Chen).
Delineation of the euchromatin/heterochromatin regions of chr 7 is being done at Wageningen.
97 BACs are in contigs on chromosome 7, 25 BACs are singletons. One nice contig has 12 members and spans 30 cM.
We will also make 3D pools available for the fosmid clones.

CHR 11 UPDATE - Sanwen Huang
=============================
Working on both tomato and potato BACs.
So far, 15 potato and 10 tomato BACs have been sequenced.
Doing comparative sequencing of tomato and potato makes it easier to find extension BACs.
Also doing comparative analyses of homologous BACs. Most of the euchromatin is highly conserved, but the heterochromatin is less conserved.

CHR 12 UPDATE - Alessandro Vezzi
==============================
Funded by the Italian Govt (FIRB and Agronanotech projects) and EU-SOL.
Two sequencing pipelines are active, one at U. Padua, one at a private company.
Keeping sequencing synchronization information in an SQL database with Perl web scripts.
Currently 67 BACs at various stages. 19 BACs in HTGS3.
Developed extension BAC selection software, PABS-Select, that integrates with the SGN BLAST interface.
PABS is available at http://tomato.cribi.unipd.it

BAC FISH and repeat bar-coding technology for tomato and potato - Hans de Jong
=============================
There have been some recent advances in FISH technology
- cell spreading methods
- high-res FISH pachytene morphology
- digital image processing improvements
- chromosome straightening
- direct 5-color FISH
We have FISHed a total of 300 BACs for our European partners.
The majority of the tomato genome is repetitive: 1/4 of it is highly repetitive, 1/2 is medium-repetitive, and 1/4 is low-copy or single-copy.
Seven chromatin classes are observed in tomato.
Cot blocking is effective only when the repetitive content of the BAC is not too high.
For the Tomato project, we want 3 anchor points on each chromosome: the telomeres, the centromeres, and the euchromatin.
FISH with BAC LE_HBa0057J04 paints the centromeres of tomato chromosomes very nicely, plus some spots in other regions.
Nowadays, heterochromatin is better understood as an epigenetic state of euchromatin.
The inversions we see between the genetic and physical maps are simply a consequence of the genetic map algorithms. They are nothing to worry much about, but they mean that we have to be careful in using the genetic map to localize BACs.
Through cross-species FISH, we've found some evidence for 3 chromosome 6 inversions between tomato and potato: a large 6S inversion, a smaller 6S inversion nested within that, and then a smaller heterochromatic inversion near the centromere.
There may be significant repeat polymorphisms between different varieties of tomato, be careful!
There are lots of indications of gene duplications in tomato.
BAC selection can also be biased by the abundant repeats!

TOWARD AN INTEGRATED PHYSICAL MAP - Mingsheng Chen
==========================
Strategy:
- reassemble FPC data for HindIII and MboI, able to pull together more contigs
- got 4360 total contigs
- manual editing of build, contig merging by BAC end searching and marker information, got the contigs down to 4156
- 837 markers (781 framework markers)
- Overgo markers
- 3D pooling - data results available publicly
- Fingerprint Simulated Digest (FSD) - in-silico digest, compare position, check integrity of the physical map
Was able to get 187 Mb worth of contigs
WebFPC installation available: http://tomato.genetics.ac.cn

ROLE OF FISH IN SEQUENCING THE TOMATO GENOME - Steven Stack
===============================================
10 countries in the project
Tomato genome is intermediate in size between maize and Arabidopsis.
77% of the DNA is heterochromatin
23% is euchromatin
HindIII library has 15X coverage, average 117 kb per clone
Chromosomal in situ Suppression (CISS) Hybridization: along with the labeled probe, include a 50-100-fold excess of masking Cot100 DNA
CISS is very important; these experiments don't really work without it
Functions of FISH in Sequencing:
1. determine locations of anchor BACs
2. define euchromatin-heterochromatin borders
3. determine distances in Mb
4. locate problem BACs
The euchromatin/heterochromatin border is not a distinct, exact thing. Looking at the sequence itself, it's just a continuous decline in gene density approaching zero.
So far, have FISHed 177 BACs and got successful localizations for 126 of those.
51 failed to localize because they either gave no signal or gave more than one signal in spite of CISS hybridization.
17.5% of the BACs were FISHed to the wrong chromosome
Of these, 11 have been checked by sequencing
- 7 were overgo false positives
- 1 was due to a picking error
- 1 was due to a typographical error
- 2 were due to mapping errors
The mapping errors suggest that about 3% of BACs (17.5% x 2/11) are mislocated because of mistakes in the EXPEN-2000 map. But this isn't a random sample, it was troublesome BACs only, so the actual proportion of mismapped BACs is probably lower. So you CAN have quite a lot of faith in the EXPEN-2000 map.
Future Direction: Increased emphasis on defining the sizes of gaps in sequencing
- within BACs
- between contigs
- between contigs and euchromatin-heterochromatin borders

DISCUSSION
=================================
Giovanni Giuliano goes over the current SOL sequence finishing standards.

Mondher Bouzayen: Remember, we are not strictly limited to the euchromatin; our objective is to sequence as many genes as possible.
RKL: From the funding agencies' perspective, though, we have said that we will sequence the euchromatin.
Jim Giovannoni: We need to have a uniform definition of when we stop sequencing.
MB: I think in one or two months we will be finishing this part, and you can include it in the publication.
Steven Stack: It's something that each project will need to decide as they're sequencing. If the genes start decreasing and repeats start increasing, then they can decide to stop.
Lukas: I have an interesting slide from Daniel Buchan that shows gene and repeat content along chromosome 4. As you see, the two arms are very different: the short arm has a nearly stepwise transition into heterochromatin, while the long arm has a much more gradual transition. It's going to vary between chromosomes and even between arms of the same chromosome.
MB: It seems there's no problem in discriminating between euchromatin and heterochromatin using repeat content, but in terms of gene content, it's not so clear.
JG: Right now we should concentrate on doing what we've promised. We should try to make things as simple as possible for our funding agencies.
MB: We're discovering a lot of things using BAC-by-BAC sequencing that we wouldn't see if we were doing whole-genome shotgun. I think this is an important point to stress to funding agencies.

Outline of Publication Manuscript - Lukas Mueller
---------------------------------
Introduction
  Project description, scope, size of euchromatin, etc.
  Common sequencing standards
Methods
  F2-2000 map
  BAC library construction
  BAC end sequencing and fingerprint construction
  Full BAC sequencing
  FPC
  IL mapping
  Repeat identification
  ITAG annotation pipeline
Results
  Library stats, including anchoring and FPC data
  IL mapping examples (Italy, India)
  FISH -> border BACs discussion
  Full BAC status
  Annotation, repeat content, genome structure (ITAG)
  Identification of transcription factor (Akhilesh)
  Tomato genome tools
Discussion

I will send out this draft copy tonight.

ITAG Overview
===================
ITAG uses a distributed pipeline system, with analyses being run by many different centers.
The ITAG results will probably be available by the end of the month.

=====
GET SLIDES FROM
_ Joyce Van Eck
_ Doil Choi
_ Mingsheng Chen X 2
_ Clare Riddle
_ Rene Klein-Lankhorst
_ Alessandro Vezzi
_ Hans de Jong
_ Steven Stack
_ Lukas publication outline