Conserved Ortholog Set (COS) markers

We have used a computational method to screen the tomato EST database against the Arabidopsis genome sequence in order to find a set of highly conserved, single-copy genes that can be used as markers for comparative mapping between the tomato and Arabidopsis genomes. So far we have identified approximately 1000 of these Conserved Ortholog Set (COS) markers.

In this section of SGN, you will find sequence and clone information for each COS marker, as well as its matching counterpart in the Arabidopsis genome. We have surveyed these COS markers and mapped some of them onto the tomato genome to build a tomato-Arabidopsis comparative map. This mapping information is now available on SGN.

Because these COS markers are so highly conserved, they may also be useful for comparative mapping in other dicot species. The computational screen by which the COS markers were identified is described below:

  • Compare tomato ESTs against the Arabidopsis genome (specifically, the BAC tiling path from TAIR) using TBLASTX.
  • Identify single best matches (E < 1e-15) between a single tomato EST (or its associated contig) and a single Arabidopsis BAC. Each tomato EST must pass the following criteria:
    • it hits an Arabidopsis sequence with an E-value below 1e-15
    • it is NOT part of a domain, i.e. an Arabidopsis region hit by several solanaceous sequences
    • its next best Arabidopsis hit must be of lower significance (delta e > 10)
  • ESTs that pass all of these criteria are classified as conserved orthologs; all others are considered potentially paralogous and are eliminated.
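The screen above can be sketched as a filter over parsed TBLASTX hits. This is a minimal Python sketch under stated assumptions, not the pipeline SGN actually ran: the `Hit` record layout and its `region_id` field (used to detect "domains") are hypothetical, and "delta e > 10" is assumed to mean that the next best BAC's E-value is more than ten orders of magnitude weaker than the best one.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

E_CUTOFF = 1e-15   # "< e -15": a hit must have an E-value below this
MIN_DELTA_E = 10   # assumed reading of "delta e > 10": orders of magnitude between best and next best
E_FLOOR = 1e-300   # BLAST can report E = 0.0; clamp so log10 is defined


@dataclass(frozen=True)
class Hit:
    """One parsed TBLASTX hit (hypothetical record layout)."""
    est_id: str     # tomato EST or associated contig
    bac_id: str     # Arabidopsis BAC from the TAIR tiling path
    region_id: str  # hypothetical ID of the Arabidopsis region hit, shared across ESTs
    evalue: float


def log_e(evalue: float) -> float:
    return math.log10(max(evalue, E_FLOOR))


def select_cos_candidates(hits: list[Hit]) -> dict[str, str]:
    """Return {est_id: bac_id} for ESTs that pass all three criteria."""
    # Keep the best hit per (EST, BAC) pair so several HSPs on one BAC count once.
    best: dict[tuple[str, str], Hit] = {}
    for h in hits:
        key = (h.est_id, h.bac_id)
        if key not in best or h.evalue < best[key].evalue:
            best[key] = h

    # Assumed "domains": Arabidopsis regions hit significantly by more than one tomato sequence.
    ests_per_region: dict[str, set[str]] = defaultdict(set)
    for h in hits:
        if h.evalue < E_CUTOFF:
            ests_per_region[h.region_id].add(h.est_id)
    domains = {r for r, ests in ests_per_region.items() if len(ests) > 1}

    by_est: dict[str, list[Hit]] = defaultdict(list)
    for h in best.values():
        by_est[h.est_id].append(h)

    selected: dict[str, str] = {}
    for est, est_hits in by_est.items():
        est_hits.sort(key=lambda hit: hit.evalue)
        top = est_hits[0]
        if top.evalue >= E_CUTOFF:       # criterion 1: significant hit
            continue
        if top.region_id in domains:     # criterion 2: not part of a domain
            continue
        if len(est_hits) > 1:            # criterion 3: next best BAC much less significant
            if log_e(est_hits[1].evalue) - log_e(top.evalue) <= MIN_DELTA_E:
                continue
        selected[est] = top.bac_id       # candidate conserved ortholog
    return selected


# Made-up hits for illustration only.
example_hits = [
    Hit("EST1", "BAC_A", "A:1", 1e-40),
    Hit("EST1", "BAC_B", "B:7", 1e-35),  # next best only 5 orders weaker: possible paralog
    Hit("EST2", "BAC_C", "C:3", 1e-50),
    Hit("EST2", "BAC_D", "D:2", 1e-12),
    Hit("EST3", "BAC_E", "E:5", 1e-30),
    Hit("EST4", "BAC_E", "E:5", 1e-25),  # same region as EST3: a domain
    Hit("EST5", "BAC_F", "F:1", 1e-10),  # not significant
    Hit("EST6", "BAC_G", "G:2", 0.0),    # single, very strong hit
]
print(select_cos_candidates(example_hits))  # prints {'EST2': 'BAC_C', 'EST6': 'BAC_G'}
```

In this sketch an EST with only one significant BAC hit passes the third criterion trivially, since there is no next best hit to compare against.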

Functional annotation of COS markers

COS markers were functionally annotated using the MIPS role categories as defined at the time of the analysis. Because these role descriptions continue to change, copies of the role categories and of the COS marker annotations used in the analysis are provided below:

MIPS role categories at time of COS marker annotation

The list of COS markers is available in both online and plain text formats:

COS list online - online format with links to sequences and tiling path.
Old plain text download - plain text format for download.
Old plain text download 2 - plain text format for download.

Note: for the copy number and tomato map position columns in the table linked above:

  • Copy No is the number of copies the COS marker shows in a Southern blot survey: S (single) means 1-2 copies, L (low) means 3-4 copies, and M (multiple) means more than 4 copies.
  • Map position is where the COS marker mapped on the tomato-Arabidopsis synteny map. For example, the map position of T1675 is 01.005, meaning chromosome 1, 5 cM down from the top. Most COS markers have only one map position; if a COS marker has a second or third map position, then two or three loci could be mapped.
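The two columns described above can be decoded with a small helper. This is a minimal Python sketch, assuming (from the single T1675 example) that the part of a map position before the dot is the chromosome number and the part after it is the zero-padded distance in whole cM; `MapPosition`, `parse_map_position`, and `COPY_NUMBER_CLASSES` are hypothetical names, not part of SGN.

```python
from dataclasses import dataclass

# Copy-number codes from the Southern blot survey, as described above.
COPY_NUMBER_CLASSES = {
    "S": "single (1-2 copies)",
    "L": "low (3-4 copies)",
    "M": "multiple (more than 4 copies)",
}


@dataclass(frozen=True)
class MapPosition:
    chromosome: int
    cm: int  # distance in cM from the top of the chromosome


def parse_map_position(code: str) -> MapPosition:
    """Parse a position such as "01.005" into chromosome 1, 5 cM.

    Assumes "CC.MMM": chromosome before the dot, whole cM after it.
    Raises ValueError if the string does not contain exactly one dot.
    """
    chrom_part, cm_part = code.strip().split(".")
    return MapPosition(chromosome=int(chrom_part), cm=int(cm_part))


print(parse_map_position("01.005"))       # MapPosition(chromosome=1, cm=5)
print(COPY_NUMBER_CLASSES["L"])           # low (3-4 copies)
```

Parsing the two halves as integers, rather than reading the string as one decimal number, keeps "01.005" from being misread as chromosome 1 at 0.005 cM.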