1/11/2009 10am - 3:30pm, Town and Country Resort, Le Sommet Room, San Diego, CA, USA

ATTENDANCE
----------------
Ying Wang
Jim Giovannoni
Carl Braun (Monsanto)
Giorgio Valle
Doil Choi
Byung-Cheorl Kang
Cheol-Gu Hur
Eric Ganko (Syngenta)
Todd Vision
Mondher Bouzayen
Giovanni Giuliano
Zhangjun Fei
Roeland Van Hamm
Hongbin Zhou (Heinz)
Dora Szinay (afternoon session)
Hans de Jong (afternoon session)

PRESENTATIONS
----------------

==== TODD VISION ======
Talk about Mimulus guttatus sequencing status update, initial analyses of Mimulus sequence.

Mimulus is in the lamiid (euasterid I) clade.
Mimulus is self-compatible. High recombination rate. Small genome. Rich in SNPs.
Type locality for Mimulus: Iron Mountain, OR

Mimulus genome projects:
- NSF Frontiers in Biological Research (2003) - physical and genetic maps, markers, transformation protocols
- DOE JGI - community sequences and WGS

Physical maps - two 11X libraries for guttatus, one 10X library for lewisii
500K ESTs from guttatus, 30K each from 3 other species
RepeatMasker masks about 60% of current Mimulus assembly

======== ROELAND VAN HAMM ============
Chr. 6 update, next-gen sequencing update

Have 1 large supercontig of 23 BACs, 9 singletons.
They have quite a few large gaps that they haven't been able to bridge.
Project is out of money, and would still need 160 more BACs if they stuck to the same strategy, because chr6 has turned out to be much bigger than expected.

Tomato project overall progress has been slow; many other chromosomes are having trouble.
Next gen initiative
- will make a 10X Genome Analyzer-generated AFLP map
- a 30X 454 Titanium genome sequence set, combination of shotgun and paired-end runs
- 30X SOLiD read set, paired ends
- 3 million Sanger reads from Kazusa BAC pool
- will attempt assembly of all this data together, then anchor all of the contigs to the new physical map using the new AFLP sequence tags

AFLP physical map
- made with existing BAC libraries, plus a new 4X random-sheared BAC set
- faster
- cheaper
- 25bp tags on average every 250bp, which should be sufficient to anchor all the contigs

3 types of libraries being produced for 454:
- shotgun (Wag. and Padua)
- 3kb and 20kb paired-end (Roche)
- will get a 30X coverage using these

Proof of this concept:
- this technique was tested on Arabidopsis and Drosophila
- we are using more coverage than the tested runs

Assembly
- will use existing 66Mb to benchmark the procedure (same thing done on Vitis genome)

ALL DATA WILL BE RELEASED TO THE SOL SEQUENCING CONSORTIUM

Estimated timeline
- production of SOLiD and 454 data: Oct 08 - Apr 09
- production of physical map: Jan - Aug 2009
- assembly and physical map anchoring: May - Sep 2009
- release of data to SOL sequencing consortium: Sep 2009

QUESTIONS

Doil Choi: I am still not clear on exactly what our strategy is for publishing our final results.

Giovannoni: I can see two possible paths: one publication all together with the draft assembly, followed by chromosome-specific follow-up papers, or alternatively one big final paper.

Ying Wang: Will we be treating the 454 data the same as the BAC data for the purpose of making chromosome assemblies?

consensus: Data will be released as soon as it is validated and organized sufficiently for confident release. Official annotation on the next-gen contigs will come from ITAG, but other sites have their own annotation pipelines and are also completely free to annotate the next-gen contigs.

============ STEVE STACK ============
Update on FISH in Tomato Sequencing

Only about 23% of the tomato DNA is in the euchromatin.

FISH service to other member countries from the Stack lab has been suspended for a while now, but it looks like funding will soon be restored so that the service can resume.

234 total BACs FISHed by Stack lab to date.
70 failed.
164 successfully localized.
24 FISHed to wrong chromosomes. Of these, 11 were checked by sequencing:
7 were due to overgo false positives.
1 was due to a picking error.
1 was due to a typographical error.
2 were due to mapping errors.

Again, funding will be coming through soon for doing this as a free service to the sequencing community.

============= DORA SZINAY ============
Update on FISH in Tomato Sequencing
Update on Cross-species FISH work

There seem to be two main repeat types in chromosome 7.

Cross-species FISH:
- can reveal rearrangements between species

Found a novel nested inversion between tomato and potato on chromosome 6.
In the coming year, they will investigate wild tomato species.

============= JOYCE VAN ECK ============
Chromosomes 1 and 10 update

Funding will be coming through for the rest of the BACs, probably in March - 600 more BACs. Money will be spread over 3 years.

In the past, 23 BACs were sequenced as a trial with 3 different companies. Since then, more BACs have been completed with 454 with Bruce Roe.
53 more BACs are anchored and ready to sequence.

On SGN, we're developing a Breeder's Toolbox to make SGN more accessible for use by breeders.

============ DOIL CHOI ================
Chromosome 2 update

Since the Cologne meeting, have tried several strategies for filling the gaps in Chr2.
To date, about 190 BACs sequenced. 183 in phase 3. 11 have been sequenced with 454.
10 BACs have not been successfully localized to chromosomes.

Have been trying to find fosmids to fill gaps. Found some candidate fosmids.
Used some more marker sets. Found some more candidate BACs.
BLASTed contig ends against the SBM (selected BAC mixture) sequence set. Got some more candidate BACs.
So, we've made some progress at finding more candidate extension BACs.

Plan:
- Might also use syntenic regions from grape to try to find more candidate BACs.
- Have used fosmid ends and SBM to find further BACs.

After sequencing this round, we're going to stop and wait for the results of the next-gen sequencing.

================= YING WANG ==========
Chromosome 3

Have 75-76 BACs in phase 3.
Have identified 285 BACs to be IL mapped.

================ GERARD BISHOP ===========
Update on Chromosome 4 sequencing

Estimate is still 19Mb of euchromatin.
119 BACs definitely on chr4.
57 BACs are under confirmation.
Still 60 markers on chr4 for which no BACs have been identified.

================ MONDHER BOUZAYEN ========
Update on Chromosome 7 progress

Generated 3D DNA pools and macroarray filters from BAC and fosmid libraries.
Also, macroarray filters for the EcoRI library and half of the fosmid library.
Both of these tools are available for use by the community.

Did an analysis of the sequence produced by other projects; for some of them, nearly half of the sequenced BACs appear to be heterochromatic.

Plan to do 454 Titanium sequencing with Long-PET starting this Feb.
Also will continue sequencing BACs with 454.
After doing the WGS, we will sequence more BACs and fosmids to fill the gaps on chr 7.

================= GIOVANNI GIULIANO =======
Chromosome 12 progress update

Have been working for some time on IL mapping. In the beginning, we did it by sequencing. Then we tried doing CAPS, which is faster but has a much lower success rate. Now we are doing a PCR-based method, designing primers on the pennellii sequence. Works about 80% of the time. Then we confirm with sequencing.

Currently 80 BACs in various stages, which is around 70% of the projected euchromatin.
Almost all of our seed BACs have been FISH mapped by Dora.
We still have some gaps on both the short arm and the long arm that we need to fill.

consensus:

========== DISCUSSION =========

G.Giuliano: I think there's a danger, now that there's the next-gen project, that people might be tempted to start dropping the BAC-by-BAC sequencing. It's unfortunate that the US project has only now gotten funded for BAC-by-BAC, when the UK and other projects are starting to come to an end.

HDJ: Now we're starting to get to the biology phase.

GG: We need to think of ways to get contiguous sequences to a high quality and publish all of them together, along with the whole-genome shotgun, all at the same time. That would be my preference.

RVH: Maybe we should see the BAC-by-BAC as a second phase that will come after the WGS, because all the gap closing will take a long time. I would prefer publishing the WGS results, and then going for the high quality.

D.Choi: I don't think we can wait until all the chromosomes are ready. Some of the countries are still basically at the initial phase.

GG: My experience is that there is a lag phase, and then things suddenly start to catch up.

M.Bouzayen: Now that we have more tools available, I think it will take much less time for all of the lagging projects to catch up. I think RVH is probably right. I favor publishing as soon as we have a good draft sequence, so the consortium can be seen to produce something sooner. The nice sequence will come afterward, BAC by BAC. Once we finish this first phase (and Jim G. has already suggested this), we can consider some groups helping other groups to get it done BAC-by-BAC to high quality.

RVH: But the next-gen can also speed up the BAC sequencing. After the draft sequence, we can make a large effort to select as many BACs as possible and sequence them in large batches, and finish all the chrs at the same time.

MB: This makes it even more important that the next-gen is available to all as soon as possible.

Ying Wang: For assembling the next-gen, you still have to use all the currently available BAC-by-BAC data. I agree that we should publish the draft sequence and then have chromosome-specific papers later.

GG: So that means there should be no more chromosome-specific papers until the next-gen is done, and we need to get those who are not present to agree to this also.

GG: Different issue: does it really make sense for us to continue the approach of carefully physically mapping each BAC before sequencing? Would we go faster if we sent things to sequencing much earlier?

GG: So the general agreement is that we should do one paper with a draft sequence and chromosome data, and then follow with chromosome-specific finishing papers?