This directory contains coding sequence (CDS) and protein predictions from the SGN unigene builds. Each build has its own sub-directory, named after the corresponding species, which contains two files in FASTA format: one with the predicted CDS sequences and the other with the predicted protein sequences. Where a species-specific matrix has been developed, it is provided as a third file.

Methods used:

We used ESTScan version 3.0 to process SGN unigene build 3. The ESTScan model was built from 483 EMBL tomato (Lycopersicon esculentum) nuclear coding sequences using the software provided with ESTScan. After building the model, we evaluated its ability to determine the correct reading frame, identify the coding region, and correct frameshift errors. The test sequences were published tomato cDNA sequences with known start and stop positions, published tomato cDNA sequences with deliberately introduced frameshift errors, and tomato ESTs whose coding regions were identified by alignment to Arabidopsis homologs.

With the tomato model, ESTScan found the correct frame with high accuracy, and most of the coding regions it identified were correct. However, it may miss 3 to 5 amino acids at both ends. In particular, when a non-coding region is short (fewer than 30 nucleotides), ESTScan may not be able to distinguish it from the coding region. ESTScan corrected the frameshift errors in most of the test sequences, not necessarily at the exact insertion/deletion position but usually within 10 nucleotides upstream or downstream of it.

Chenwei Lin (Cornell University) developed the tomato model for ESTScan and ran the ESTScan analyses.

Please contact sgn-feedback@sgn.cornell.edu if you have any questions or comments.
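
Example of reading the files:

The following is a minimal sketch, not part of the SGN pipeline, of how the CDS and protein FASTA files from one build might be loaded and matched by sequence identifier. It assumes that both files use the same identifier as the first token of each header line, which should be checked against the actual files. The identifiers and sequences shown are made up for illustration.

```python
from typing import Dict, Iterable, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {identifier: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier is the first token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped; join them.
            records[name] += line
    return records


def pair_predictions(
    cds: Dict[str, str], protein: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {identifier: (cds, protein)} for identifiers in both files."""
    return {k: (cds[k], protein[k]) for k in cds if k in protein}


# Made-up records standing in for the contents of the two files.
cds_text = """>SGN-U000001 example record
ATGGCTAAA
TTTTAA
>SGN-U000002
ATGCCC
"""
protein_text = """>SGN-U000001
MAKF
"""

cds = read_fasta(cds_text.splitlines())
protein = read_fasta(protein_text.splitlines())
pairs = pair_predictions(cds, protein)
```

To read from disk instead, pass an open file handle to read_fasta, since it accepts any iterable of lines.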