Coffee data on SGN


[Nov 7, 2005]. Cornell University and Nestlé SA are releasing 47,000 coffee (Coffea canephora var. robusta) EST sequences to the public on the Sol Genomics Network.


An EST database has been generated for coffee based on sequences from approximately 47,000 cDNA clones derived from five different stages/tissues, with a special focus on developing seeds. When computationally assembled, these sequences correspond to 13,175 unigenes, which were analyzed with respect to functional annotation, expression profile and evolution. Compared with Arabidopsis, the coffee unigenes encode a higher proportion of proteins related to protein modification/turnover and metabolism, an observation that may explain the high diversity of metabolites found in coffee and related species. Several gene families were found to be either expanded or unique to coffee when compared with Arabidopsis. A high proportion of these families encode proteins assigned to functions related to disease resistance. Such families may have expanded and evolved rapidly under the intense pathogen pressure experienced by a tropical, perennial species like coffee. Finally, the coffee gene repertoire was compared with that of Arabidopsis and solanaceous species (e.g. tomato). Unlike Arabidopsis, tomato has a nearly perfect gene-for-gene match with coffee. These results are consistent with the facts that coffee and tomato have a similar genome size, chromosome karyotype (tomato, n=12; coffee, n=11) and chromosome architecture. Moreover, both belong to the Asterid I clade of dicot plant families. Thus, the biology of coffee (family Rubiaceae) and tomato (family Solanaceae) may be united into one common network of shared discoveries, resources and information.

The dataset is described in detail in the following publication:

Coffee and tomato share common gene repertoires as revealed by deep sequencing of seed and cherry transcripts. Lin C, Mueller LA, McCarthy J, Crouzillat D, Petiard V, Tanksley SD. Theor Appl Genet. 2005 Nov 5;:1-17.

A PDF file of this article is available as Open Access, free of charge to anyone.

Data access

The coffee ESTs, computationally derived unigenes, protein sequences, protein domains and Gene Ontology annotations can be downloaded immediately from the SGN FTP server. The data will be added to the SGN database in the near future.