ATTENDANCE
Christine Nicholson (UK)
Joyce Van Eck (US)
Roeland van Ham (NL)
Karen McLaren (UK)
Mario Caccamo (UK)
Jim Giovannoni (US)
Doil Choi (KR)
Rene Klein-Lankhorst (NL)
Byoungchorl Kang (KR)
Robert Buels (US)
Antonio Granell (ES)
Paramjit Khurana (IN)
Mark van Haaren (NL)
Giorgio Valle (IT)
Mara Ercolano (IT)
Giovanni Giuliano (IT)
Zhangjun Fei (US)
Glenn Bryan (UK)

NOTES
Called to order at 10:05.

====== 1st presentation - Jim Giovannoni =======
US TOMATO SEQUENCING UPDATE
The US sequencing project has not done much sequencing yet; we have been concentrating on supporting other projects. In addition to Rod Wing's HindIII library from Arizona, we've generated 2 more libraries, the EcoRI and MboI libraries. We've sequenced 17 BACs so far, and they're all on SGN. We've gotten enough support from NSF to keep going, and we have a proposal in for full funding to sequence the 3 chromosomes.
NEW: we've finished the fosmid library, and we've been picking clones; 150,000 clones are already plated. We don't have money yet to sequence them. In the budget for the proposal we put in, we're proposing to end-sequence 400,000 of those clones. We picked that number after talking with Agencourt about how much is typically done for a library of that size.
Future plans:
- array and end-sequence the fosmid library
- do the full sequencing of the US portion (about 550 BACs)
- remember, Steve Stack can do 10 FISH experiments for each project
Please don't hesitate to send out lab reminders about

=========== 2nd presentation, Doil Choi ==========
CHROMOSOME 2 UPDATE
Based on our new calculations, we think the euchromatin of chromosome 2 is about 22 MB, not 26 MB.
We did a lot of FISH.
BAC extensions: about 67% of seed BACs can be extended with a BAC-end BLAST. Some of the extensions were also checked with dual FISH, and we did fiber-FISH for overlapping extended BACs. We know the telomeric region, and also the order of the heterochromatic and euchromatic regions in chr 2. One of our BACs overlaps with the heterochromatic region of chr 2.
Based on our 89 finished BACs: we've finished about 100 BACs, about 10.3 MB, so we think our chromosome is about 46% finished. Right now we've stopped, because we haven't been able to extend any more BACs.
We tried hunting for new seed BACs by blasting BAC ends against the tomato repeat DB. We have a 2 MB gap around the north of chr 2, and we're trying to get a seed BAC in there. First, we blasted against BAC ends and designed primers from 64 BAC-end sequences: 2 of them showed no product, 5 gave different-sized products, 2 were M82-only, and 55 of them were OK.
Don't forget: September 9-13, SOL meeting in Korea on Jeju Island.

=========== presentation 3, Christine Nicholson ========
CHROMOSOME 4 UPDATE
I'll talk about our map progress, the strategies we're using for additional coverage, our sequencing progress, and finishing.
Recap of our fingerprinting of the MboI library: don't forget that it's there. It's available on SGN and via FTP from Sanger. (add sanger FTP link here)
In situations where we can't extend, we're walking out with hybridization; we grid the libraries in-house.
Continuing with chromosome 4 verification techniques:
- colony PCR to check overlaps with other BACs
- ...
Additional resources that could be very useful (wish-list):
- more FISH
- size gaps using FISH
- more mapped/anchored markers
- more overgos on SGN against the MboI library
- an end-sequenced sheared fosmid library will be very useful
Currently we have 6.5 MB of sequence, 6 MB unique. We expect to have about 10 MB in the next month.
We have over 2 MB of finished sequence, 19 BACs. All will be finished to HTGS3, and we have an in-house QC checking team.
Our clones have a 2 kb overlap between adjacent clones, which minimizes redundancy and is sufficient for final assembly.
We have an extensive software suite; we use GAP4 to edit the clones.
We have some points for the discussion; we'll come back to these later on.
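Christine's point that adjacent clones overlap by only about 2 kb is about keeping redundant sequence low. As a minimal illustration, the sketch below uses a hypothetical helper, `total_and_unique`, with made-up clone coordinates (not chromosome 4 data) to compute total sequenced bases versus unique bases for a tiling path by merging overlapping clone intervals.

```python
def total_and_unique(clones):
    """Return (total bp sequenced, unique bp) for clones placed as (start, end).

    Total counts every base of every clone; unique merges overlapping
    intervals so that shared bases are counted only once.
    """
    total = sum(end - start for start, end in clones)
    unique = 0
    cur_start = cur_end = None
    for start, end in sorted(clones):
        if cur_end is None or start > cur_end:
            # First clone or a gap: close off the previous merged block.
            if cur_end is not None:
                unique += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        unique += cur_end - cur_start
    return total, unique

# Hypothetical tiling path: three ~100 kb BACs with 2 kb overlaps.
tiling = [(0, 100_000), (98_000, 198_000), (196_000, 296_000)]
print(total_and_unique(tiling))  # (300000, 296000)
```

With three ~100 kb clones and two 2 kb overlaps, only 4 kb of the 300 kb sequenced is redundant.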
=========== 4th presentation, Paramjit Khurana =======
CHROMOSOME 5 UPDATE
We have about 12 MB to be sequenced, about 111 BACs.
Structure of our group: 3 institutes are involved: University of Delhi South Campus, National Research Centre for Plant Biotechnology, and National Centre for Plant Genome Research.
5 or 6 of our BACs are phase 3. We have already found some genes on our sequenced BAC clones.
Summary: we are using the introgression lines to map BAC clones to chr 5. We've mapped 44 clones; 6 clones have been submitted to NCBI, and 5 have been sent out for FISH.

=========== 5th presentation, Roeland van Ham ===========
CHROMOSOME 6 UPDATE
The short arm is 2.7 MB and the long arm is 17.7 MB (euchromatin).
We've got 64 phase 1/2 BACs and 5 phase 3 BACs, so we have about 6.9 MB.
We've done a 454 sequencing pilot of 8 potato BACs, and it looks quite promising: about 2.5x more contigs than traditional Sanger shotgun.
We'll purchase a new 454 machine early this year with a 100 nt read length, and we'll use it later this year.
We have 2 novel candidate seed BACs in the FISH pipeline. We've done a lot of FISH; using pooled FISH, we've been able to figure out where some of our gaps are. We would certainly benefit from more seed BACs.
Summary: reasonably good distribution of seed BACs, no oceans, just a couple of 'seas'; we're doing FISH to find seeds for these gaps.

============= presentation 6, Antonio Granell ==========
CHROMOSOME 9 UPDATE
Estimated euchromatin is 16 MB, with an estimated 164 BACs to sequence.
We got 15 seed BACs verified on chromosome 9; Steve Stack did some FISH for us and confirmed them on chromosome 9.
We have a big gap in the long arm of chr 9, and the distribution of our seed BACs is not that even.
So far, we have completed 21 BACs, with 12 in progress. We are also having trouble getting extension BACs, so we really need more seed BACs. For this, we'll take advantage of some additional overgo screening being done at Cornell, and also mapping on the Zamir IL lines.
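The BAC counts quoted for chromosomes 5 and 9 can be sanity-checked by dividing euchromatin size by the number of BACs, which gives the implied average net contribution of each BAC. The sketch below only does that arithmetic with the reported figures; it ignores gaps and variation in overlap, and it reads the chromosome 6 figure of "about 6.9" as MB, which is an assumption. The helper names are hypothetical.

```python
def implied_kb_per_bac(euchromatin_mb, n_bacs):
    """Average net (non-redundant) sequence contributed per BAC, in kb."""
    return euchromatin_mb * 1000 / n_bacs

def fraction_done(done_mb, euchromatin_mb):
    """Fraction of the euchromatin covered by sequenced BACs."""
    return done_mb / euchromatin_mb

# Figures as reported: (euchromatin in MB, BACs to sequence).
targets = {"chromosome 5": (12.0, 111), "chromosome 9": (16.0, 164)}
for chrom, (size_mb, n_bacs) in targets.items():
    print(f"{chrom}: ~{implied_kb_per_bac(size_mb, n_bacs):.0f} kb per BAC")

# Chromosome 6: 2.7 MB short arm + 17.7 MB long arm; "about 6.9" assumed to be MB.
print(f"chromosome 6: {fraction_done(6.9, 2.7 + 17.7):.0%} sequenced")
```

Both BAC estimates come out close to 100 kb per BAC (about 108 kb and 98 kb), and the chromosome 6 figure corresponds to roughly a third of its euchromatin.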
=============== presentation 7, Mara Ercolano ===========
CHROMOSOME 12 UPDATE
IL mapping: in our experience doing this, there can be ambiguities involving the pennellii sequence, but we only found this twice.
Goal: mapping 200 new seed BACs from the sets of confirmed-euchromatic BACs.
We have a total of 20 seed BACs and 32 extension BACs. 7 BACs have been submitted to SGN, and 45 more BACs are in our sequencing pipeline.
There is a bioinformatics platform supporting annotation at the University of Naples.

================= DISCUSSION ==========
Joyce: Let's have about 10 minutes of discussion.
JG: I have a couple of questions; 1 is for Doil Choi. Is your BAC on the telomere spanning the border, or what?
DC: It's close to the telomere, but we haven't yet proven it experimentally.
JG: I ask this because this is a common issue for all of us: telling when we're getting to a border.
DC: This BAC contains telomeric sequences.
CN: Is the sequence available?
DC: It's in SGN.
CN: Do you have the name of it?
DC: It's C02HBa257H21.
DC: Our project has an average of 6.8 kb per gene, with only 58% supported by ESTs.
GG: That's a very low percentage of predicted genes with EST coverage. Perhaps that means our coverage of the transcriptome is very low.
JG: Another question I have: you had that one BAC whose sequence appeared to place it on chr 2, but it was then mapping onto chromosome 4.

============ presentation 8, Rene Klein-Lankhorst =======
Eric from Syngenta is not here yet, so I will improvise and tell a little bit about some help that Syngenta is offering to the EU-SOL project.
Budget for the entire project: about $34.5 million (25.9 million euros)
Work package 5.4, tomato genome sequencing
- budget: 2 million euros
- de-bottleneck European genome projects
Work package 6.1, tomato genome bioinformatics
- budget: 1.3 million euros
Work package 6.3, integrated Solanaceae data resource and analysis
- budget: 1.2 million dollars
- tools for analysis
- link the EU-SOL database to international Solanaceae databases
De-bottlenecking operations:
- Italy
  - FISH validation
  - outsourcing of sequencing up to phase III
- Spain
  - identification of novel seed BACs
  - FISH validation of seed BACs
  - outsourcing some of the sequencing
- Netherlands
  - closure of ~200 BACs
Hans de Jong will provide FISH services to all European countries (only 300-400 dollars per BAC).
BIG ANNOUNCEMENT: Syngenta markers will now be available for use by the entire project.

===== Transfer floor to Mark van Haaren ========
Keygene is working with Syngenta, De Ruiter, etc. to make a map with a lot of AFLP markers. We got a question from Rene about whether we have markers for some of his gaps, and we do. We discussed it with our shareholders, and they have agreed to free up these markers for use in getting new seed BACs. We will make these markers available for chromosome 6 gap filling first.
So the workflow is:
- identify gaps in your chromosome
- ask Keygene if they have any markers
- they will look, and release the markers that you need

========== Rene ========
Rene asked Syngenta the same question, and they seem to have anchored about 1/4 of the tomato genome. They've built 736 contigs from the HindIII library, and we'll get access to these mapped contigs on the tomato genome. So basically, Syngenta is giving us 1000 novel seed BACs, about 260 MB. Syngenta has its own 19,000-marker database: they used their marker sequences to search the BAC-end sequence data, then went to the FPC data to anchor the BACs.
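The anchoring procedure described above (marker sequences searched against BAC-end sequences, with hits then carried onto FPC contigs) can be sketched as follows. This is a minimal illustration, not Syngenta's pipeline: exact shared k-mers stand in for a BLAST search, and the function names, data layout and toy sequences are all hypothetical.

```python
from collections import defaultdict

def kmers(seq, k):
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def anchor_contigs(markers, bac_ends, bac_to_contig, k=12):
    """Anchor fingerprint contigs to genetic-map positions via BAC-end hits.

    markers:       {marker_id: (sequence, (chromosome, cM))}
    bac_ends:      {bac_id: [end_sequence, ...]}
    bac_to_contig: {bac_id: contig_id}, e.g. taken from an FPC build
    Returns {contig_id: {(chromosome, cM, marker_id), ...}}.
    """
    anchors = defaultdict(set)
    for marker_id, (marker_seq, (chrom, cm)) in markers.items():
        probe = kmers(marker_seq, k)
        for bac_id, ends in bac_ends.items():
            # Stand-in for a BLAST hit: any shared k-mer counts as a match.
            if any(probe & kmers(end, k) for end in ends):
                contig = bac_to_contig.get(bac_id)
                if contig is not None:
                    anchors[contig].add((chrom, cm, marker_id))
    return dict(anchors)

def candidate_seeds(anchors, bac_to_contig):
    """Every BAC in an anchored contig becomes a candidate seed BAC."""
    return sorted(b for b, c in bac_to_contig.items() if c in anchors)

# Toy data (hypothetical IDs and sequences).
markers = {"M1": ("ACGTACGTTTGACCA", ("chr6", 42.0))}
bac_ends = {"BAC_A": ["GGGACGTACGTTTGACCAGG"], "BAC_B": ["TTTTTTTTTTTTTTTT"]}
bac_to_contig = {"BAC_A": "ctg12", "BAC_B": "ctg99", "BAC_C": "ctg12"}

anchors = anchor_contigs(markers, bac_ends, bac_to_contig)
print(anchors)                                  # {'ctg12': {('chr6', 42.0, 'M1')}}
print(candidate_seeds(anchors, bac_to_contig))  # ['BAC_A', 'BAC_C']
```

BAC_C has no marker hit of its own but sits in the same fingerprint contig as BAC_A, which is how a single marker hit can turn a whole contig into anchored seed material.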
Workflow:
- make a request to Rene for the anchoring position of a BAC; he will forward it to Syngenta, then get back to you
JG: This will make it much easier to find seed BACs.

=============== Jim Giovannoni ==========
TOOLKIT FOR BAC EXTENSIONS
The question, in light of this announcement, is whether we still want to pursue some of these ways of generating more seed BACs:
1. mapping of random low-copy BACs to create a pool of new seed BACs for all 12 chromosomes
   - low-copy BACs identified by Tabata et al.
   - BAC-end sequences mapped to lyc × pen IL lines
2. sequencing 2 million+ reads derived from low-copy BACs defined by Tabata et al.
3. identification of low-copy cosmids (US) to be end-sequenced by Japan
4. BAC overgo screen of the MboI library
5. industry-anchored BACs
RKL: I think we should continue with the BAC overgo experiment.
CN: How will the 2 million reads from low-copy BACs be released?
JG: At least in GenBank, most likely through SGN as well.

==== JG: Goals of the international tomato genome sequencing project ====
The NSF hit us pretty hard on the question of 'how do you know you're done with the project?'. I'll go over some of our answers, which seem to have made them pretty happy. We were asked for some assurance of how good our estimates are. There have been a lot of cytological measurements; estimates are that about 23% of the genome is euchromatic, about 212 MB. Through a bioinformatics method (Zhangjun Fei), the estimate is 239 MB. We estimate that about 85% of tomato genes will be recovered through the tomato genome sequencing project. Our completion criteria for the entire project are tied to the precedent set by the rice project: ours is complete when it is comparable to what rice has.
GG: We should start thinking about the best strategy for getting a whole-genome sequence.
JG: We'd have to have a physical map somehow. That's the foundation.
CN: Keep in mind that there will probably be heterochromatic islands in the euchromatin. Perhaps we should define heterochromatin
in terms of gene density.
DC: We have to decide what we'll do for hunting for new BACs.
GG: Should we have a unified protocol, or 2 or 3 different ways?
DC: We should make a standard protocol and distribute it to all the countries. We are more interested in selecting the BACs in chr 2.
GG: I suggest we take another couple of weeks to validate the protocols and then decide what's best. We are aiming at mapping both ends of each BAC, to avoid hybrid BACs.
RKL: Do we have any indication of the percentage of chimeric BACs? What percentage they occur in, etc.?

======== presentation 9, Nevin Young =========
I am the coordinator of the Medicago project. This presentation is an update on the progress of the Medicago project.
Medicago is a legume, closely related to alfalfa. It was chosen for sequencing because it does nitrogen fixation and it has very little heterochromatin. The sequencing strategy was BAC-by-BAC of the entire euchromatin. We originally estimated about 200 MB of euchromatin; we now know it's more like 300 MB. It's not a crop, it's only a model organism.
The genome currently is 2000 BACs, 200 MB, 350 contigs, 275 scaffolds, with 407 gaps remaining (with 60% of the genome complete). Of those ~400 gaps, about 1/3 don't have an obvious next step to bridge them.

======== presentation 11, Giovanni Giuliano ========
TOMATO MICROARRAY PLATFORMS
Arrays: Tom1, Tom2, Affy-public, Affy-Syngenta, Custom (NimbleGen (Ohio State), Agilent, CombiMatrix)
If you do an experiment with the Affy-Syngenta microarray, you get access only to _your_ genes that are being regulated.
Take-home message: none of these platforms covers the entire tomato transcriptome. There is a need for a comprehensive one.
--
We're dealing with about 46K unigenes in the public databases, covering about 28 MB. There are a projected 35-38K tomato genes, and the projected transcriptome is about 100-125 MB. We're still missing a lot of transcribed regions: not that many genes are missing, but lots of transcribed regions are.
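The coverage gap described here follows from simple arithmetic on the quoted figures: 28 MB of unigene sequence against a projected 100-125 MB transcriptome. The short sketch below just performs that division (the helper name is hypothetical) and also shows the implied average unigene length of 28 MB spread over ~46K unigenes.

```python
def coverage_fraction(covered_mb, total_mb):
    """Share of the projected transcriptome represented by unigene sequence."""
    return covered_mb / total_mb

unigene_mb = 28.0      # sequence covered by the public unigenes
n_unigenes = 46_000    # approximate number of public unigenes

for transcriptome_mb in (100.0, 125.0):  # projected transcriptome size range
    share = coverage_fraction(unigene_mb, transcriptome_mb)
    print(f"{transcriptome_mb:.0f} MB transcriptome: {share:.0%} covered")

# Average unigene length in bp (28 MB spread over ~46K unigenes).
print(f"mean unigene length: ~{unigene_mb * 1e6 / n_unigenes:.0f} bp")
```

This gives roughly 22-28% of the projected transcribed sequence, even though the unigene count already exceeds the projected 35-38K genes. One plausible reading, consistent with the remark above, is that most genes are touched but many are represented only by short fragments of about 600 bp on average.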
According to Doil, more than 40% of your gene models had no corresponding EST alignments. There are a number of efforts under way to get more of the transcriptome (see slides).
Affy proposes: 2.6M probes (and mismatch probes), full transcript, custom design, with a minimum order of 540 microarray chips.
I want a publicly available array that covers all available unigenes, with upgradability at no cost as the genome sequence progresses. Bonus features I would like are miRNA probes, antisense probes, probes for validation of gene models, and SNP mapping.
How can we do it? (see 'How to do it' slide for diagram)

========== presentation 12, Matthew Lorence ==========
Explains the types of available Affy array designs (see slides).
The proposed new design covers more than 35K transcripts, which is about 3x more than the existing chips.
Reasonable price: no design fee, but a large minimum order: the first 500 at 200 euros/array, the next 500 at 150/array, and the remainder at 125/array.

============ presentation 13, Christian Bachem =========
POTATO GENOME PROJECT UPDATE
I work for Richard Visser (WUR). The PGSC contains 14 labs and aims to sequence the complete potato genome by 2010. In addition to the benefits accruing from the sequencing, we are also interested in building up genomics capacity: we're building a network of research labs that can go on to keep working on this data.
Strategy: BAC-by-BAC, chromosome by chromosome.
A diploid genotype is being used (RH89-039-16).
We're generating a physical map based on a 10K-marker AFLP map. We're basing our genetic map on 250 offspring, with 384 AFLP primer combinations used for mapping. The resulting map is subdivided into bins (chromosomal segments) within which we can't differentiate between the markers, and each bin represents a single crossover event (0.8 cM). Published in Genetics.
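Bin mapping as described here groups markers that the mapping population cannot separate. A minimal sketch of the idea, assuming fully scored presence/absence data with no missing values or scoring errors (which real binning has to handle): markers with identical scores across all offspring share a bin, and neighbouring bins differ by recombination in at least one offspring. The function names and marker IDs are hypothetical, and this is not the PGSC's actual procedure.

```python
from collections import defaultdict

def bin_markers(scores):
    """Group markers whose scores are identical across every offspring.

    scores: {marker_id: string with one '0'/'1' score per offspring}
    Returns a list of bins (lists of marker IDs), in first-seen order.
    """
    bins = defaultdict(list)
    for marker_id, pattern in scores.items():
        bins[pattern].append(marker_id)
    return list(bins.values())

def recombinants(pattern_a, pattern_b):
    """Number of offspring whose scores differ between two patterns."""
    return sum(a != b for a, b in zip(pattern_a, pattern_b))

# Toy population of 5 offspring (the potato map used 250).
scores = {"m1": "11001", "m2": "11001", "m3": "10001", "m4": "10001"}
print(bin_markers(scores))             # [['m1', 'm2'], ['m3', 'm4']]
print(recombinants("11001", "10001"))  # 1
```

The two toy bins are separated by a single recombinant offspring, which is the sense in which adjacent bins mark one crossover event.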
Physical map: 76K BACs fingerprinted, 7000 contigs in the fingerprint map, about 40 anchored contigs per chromosome.
The Netherlands has sequenced ~100 BACs, and other partners are starting up. The physical map browser is available to all partners, and a BAC registry and genome browser are being set up soon.
We've been doing FISH, and it corresponds well with our physical mapping data.
Tomato is much further along in the process than potato, but we may have some solutions to some of the problems you're having. We should meet and increasingly integrate the two projects.
-- questions
For the moment, we are aiming at finishing BACs to phase 2, for both heterochromatic and euchromatic ones.
In terms of the AFLP anchoring, there is no major difference in the quality of contigs; it doesn't give abnormal distributions.
JG: Are there plans to go beyond the single library you have now?
CB: Yes. I'm worried about the fact that you tomato people have so many more libraries, but another library would mean putting it through the whole fingerprinting and mapping process. We're considering some kind of smaller library in the later stages (fosmid, etc.). What do you think, Jim?
JG: Well, Rod Wing is very adamant about needing multiple libraries; I guess it depends on how many gaps you're willing to tolerate. Just with positional cloning, we've run into situations where certain places are not covered by any BAC, even with 3 libraries. I'd say use multiple libraries with different enzymes, and then a sheared fosmid library. Agencourt's routine strategy is 2 full BAC libraries and a deep fosmid library. No matter what, there will be some regions you won't be able to get. It's a cost/benefit analysis: how many gaps will you fill with another library, versus how much will it cost to make it. I was about to ask you whether you'd figured out some way to do it with just one library!
CB: No, we'll have to cross that bridge when we come to it.
DC: What does the heterochromatin look like in potato?
CB: I don't really know; it looks just like tomato.
RVH: The genome sizes are equal, so it must be the same.
CB: We've also done some cross-hybridization with tomato plants, and that appears to work very well; that might help us a bit with the hetero/eu boundaries. We're also trying to get a handle on genetic versus physical distances.
RKL: Are analogous chromosomes numbered the same in tomato and potato?
CB: Yes. Some of them are hard to tell apart, so we'll use BACs, and their analogues in tomato, to distinguish them.
---
JVE: Any more discussion topics?
KM: Yes, we have some concerns about the guidelines document. Many of them are practical things about how we approach finishing issues, and how we annotate sequences publicly to say what's going on with the finishing.
KM: We should have a workshop on finishing in Korea.
JG: Is there a sense that people are all doing things differently, or is there just a need for somebody to take the lead, with everybody happy to follow?
RB: If you have some recommendations before then, we can probably just integrate them into the guidelines document.
JVE: Karen, you could just email me some suggestions and I'll forward them to the SGN people.