This directory contains FASTA files for several genomes, with added annotations giving the secretion predictions made by the programs SignalP and TargetP. For details of these two prediction programs, see:

  Olof Emanuelsson, Søren Brunak, Gunnar von Heijne, Henrik Nielsen. Locating proteins in the cell using TargetP, SignalP and related tools. Nature Protocols 2, 953-971 (2007).

Here is an example of a sequence ID line containing these secretion-related annotations:

>LOC_Os01g02160.1|13101.m00154|protein hydroxyproline-rich glycoprotein family protein, putative, expressed signalPnn:0.935,YES,21 signalPhmm:1.000,0.000,SP,21 targetP:0.022,0.024,0.942,0.021,S,1

Interpretation of these annotations:

signalPnn:0.935,YES,21
  Results from the neural-network part of SignalP. 0.935 is the D-score. YES means a signal peptide is predicted; this call is based only on the D-score (YES iff D-score >= 0.43). 21 is the predicted length of the signal peptide.

signalPhmm:1.000,0.000,SP,21
  Results from the hidden Markov model part of SignalP. 1.000 is p_sp, the "probability" associated with the prediction that the sequence has a signal peptide (SP). 0.000 is p_sa, the "probability" associated with the prediction that it has a signal anchor (SA). SP is the prediction. To obtain it, compute p_ns = 1 - (p_sp + p_sa) and predict SP, SA or NS (not secreted), whichever of p_sp, p_sa and p_ns is greatest. 21 is the predicted length of the SP or SA.

targetP:0.022,0.024,0.942,0.021,S,1
  The first four numbers are the scores for the four possible targeting predictions. In order, these are C (chloroplast), M (mitochondrion), S (secreted) and _ (other). S is the prediction, based simply on which of the four scores is greatest. 1 is the reliability class, a score indicating the strength of the prediction, from 1 (strongest) to 5 (weakest). It is based on the difference between the two largest of the four targeting scores: a difference of 0.8-1.0 gives 1, 0.6-0.8 gives 2, and so on.
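The decision rules above can be checked directly against an ID line. The Python sketch below parses the three annotation fields and re-derives each call. It assumes that every annotation is a whitespace-separated token of the form name:v1,v2,... exactly as in the example line.

The function names are hypothetical and are not part of SignalP or TargetP. Two further details are assumptions rather than statements from this README. First, the NO label for a negative signalPnn call. Second, the exact handling of the reliability-class boundaries, which is extrapolated from the "0.8-1.0 -> 1, 0.6-0.8 -> 2, and so on" pattern.

```python
# Hypothetical helpers; the field layout follows the example ID line in this README.
TARGETP_LABELS = ("C", "M", "S", "_")  # chloroplast, mitochondrion, secreted, other


def parse_annotations(header):
    """Return the signalPnn, signalPhmm and targetP fields of an ID line as a dict."""
    fields = {}
    for token in header.split():
        name, sep, values = token.partition(":")
        if not sep:
            continue  # ordinary description word, not an annotation
        parts = values.split(",")
        if name == "signalPnn":
            fields[name] = {"dscore": float(parts[0]), "prediction": parts[1],
                            "length": int(parts[2])}
        elif name == "signalPhmm":
            fields[name] = {"p_sp": float(parts[0]), "p_sa": float(parts[1]),
                            "prediction": parts[2], "length": int(parts[3])}
        elif name == "targetP":
            fields[name] = {"scores": [float(p) for p in parts[:4]],
                            "prediction": parts[4], "rc": int(parts[5])}
    return fields


def nn_prediction(dscore, cutoff=0.43):
    """SignalP-NN call: YES iff the D-score reaches the cutoff (NO otherwise is assumed)."""
    return "YES" if dscore >= cutoff else "NO"


def hmm_prediction(p_sp, p_sa):
    """SignalP-HMM call: whichever of SP, SA, NS has the largest probability."""
    probs = {"SP": p_sp, "SA": p_sa, "NS": 1.0 - (p_sp + p_sa)}
    return max(probs, key=probs.get)


def targetp_prediction(scores):
    """TargetP call: the label whose score is the largest of the four."""
    return TARGETP_LABELS[max(range(4), key=lambda i: scores[i])]


def targetp_reliability_class(scores):
    """Reliability class 1 (strongest) to 5 (weakest) from the gap between the top two scores."""
    top, second = sorted(scores, reverse=True)[:2]
    diff = top - second
    for rc, bound in enumerate((0.8, 0.6, 0.4, 0.2), start=1):
        if diff > bound:  # boundary handling is an assumption
            return rc
    return 5


header = (">LOC_Os01g02160.1|13101.m00154|protein hydroxyproline-rich glycoprotein "
          "family protein, putative, expressed signalPnn:0.935,YES,21 "
          "signalPhmm:1.000,0.000,SP,21 targetP:0.022,0.024,0.942,0.021,S,1")
ann = parse_annotations(header)
print(nn_prediction(ann["signalPnn"]["dscore"]))                              # YES
print(hmm_prediction(ann["signalPhmm"]["p_sp"], ann["signalPhmm"]["p_sa"]))   # SP
print(targetp_prediction(ann["targetP"]["scores"]),
      targetp_reliability_class(ann["targetP"]["scores"]))                    # S 1
```

For the example line, the re-derived calls agree with the stored ones. The D-score of 0.935 gives YES, and p_sp = 1.000 gives SP. The TargetP gap of 0.942 - 0.024 = 0.918 gives reliability class 1.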
The FASTA files included are:

Oryza sativa
  Based on ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_6.0/all.dir/all.pep (67393 sequences).
  Rice.pep.1.sptp.fasta            First gene models only; 56591 genes.
  Rice.pep.1.sptp.YES.fasta        Subset of the above; the 8071 genes with a signal peptide predicted by SignalP.
  Rice.pep.1.sptp.YES.S.fasta      Subset of the above; the 5607 genes with a signal peptide predicted by both SignalP and TargetP.

Arabidopsis thaliana
  Tair8.pep.1.sptp.fasta           First gene models only; 27201 genes.
  Tair8.pep.1.sptp.YES.fasta       Subset of the above; 5194 genes with a signal peptide predicted by SignalP.
  Tair8.pep.1.sptp.YES.S.fasta     Subset of the above; 4555 genes with a signal peptide predicted by both SignalP and TargetP.
  Tair9.pep.sptp.fasta             33410 gene models.
  Tair9.pep.1.sptp.fasta           First gene models only; 27343 genes.
  Tair9.pep.1.sptp.YES.fasta       Subset of the above; 5226 genes with a signal peptide predicted by SignalP.
  Tair9.pep.1.sptp.YES.S.fasta     Subset of the above; 4555 genes with a signal peptide predicted by both SignalP and TargetP.

Brachypodium distachyon
  Brachy.pep.1.sptp.fasta          First gene models only; 25532 genes.
  Brachy.pep.1.sptp.YES.fasta      Subset of the above; 4923 genes with a signal peptide predicted by SignalP.
  Brachy.pep.1.sptp.YES.S.fasta    Subset of the above; 3689 genes with a signal peptide predicted by both SignalP and TargetP.

Solanum lycopersicum
  ITAG1.pep.sptp.fasta             33926 gene models.
  ITAG1.pep.sptp.YES.fasta         Subset of the above; 5756 genes with a signal peptide predicted by SignalP.
  ITAG1.pep.sptp.YES.S.fasta       Subset of the above; 5051 genes with a signal peptide predicted by both SignalP and TargetP.
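A subset such as *.YES.fasta or *.YES.S.fasta could in principle be regenerated from its parent file with a simple header filter. The Python sketch below does this under an assumption that this README does not confirm. It reads "predicted by SignalP" as a signalPnn call of YES. It reads "predicted by both SignalP and TargetP" as additionally requiring a targetP prediction of S. The function names and the three toy records are hypothetical and were invented for illustration.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence_lines) pairs from an iterable of FASTA text lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def field(header, name):
    """Return the comma-separated values of annotation 'name:...' in a header, or None."""
    m = re.search(r"(?:^|\s)" + re.escape(name) + r":(\S+)", header)
    return m.group(1).split(",") if m else None


def is_yes(header):
    """Assumed criterion for *.YES files: the SignalP-NN call is YES."""
    nn = field(header, "signalPnn")
    return nn is not None and nn[1] == "YES"


def is_yes_s(header):
    """Assumed criterion for *.YES.S files: SignalP-NN YES and TargetP prediction S."""
    tp = field(header, "targetP")
    return is_yes(header) and tp is not None and tp[4] == "S"


def filter_fasta(lines, keep):
    """Return FASTA text containing only the records whose header satisfies keep()."""
    out = []
    for header, seq in read_fasta(lines):
        if keep(header):
            out.append(header)
            out.extend(seq)
    return "\n".join(out) + ("\n" if out else "")


# Toy records, invented for illustration only.
example = """>a|x signalPnn:0.935,YES,21 signalPhmm:1.000,0.000,SP,21 targetP:0.022,0.024,0.942,0.021,S,1
MKK
LLA
>b|y signalPnn:0.100,NO,0 signalPhmm:0.000,0.000,NS,0 targetP:0.050,0.800,0.100,0.050,M,2
MSS
>c|z signalPnn:0.800,YES,25 signalPhmm:0.900,0.050,SP,25 targetP:0.600,0.050,0.300,0.050,C,4
MAA
"""
print(filter_fasta(example.splitlines(), is_yes), end="")    # records a and c
print(filter_fasta(example.splitlines(), is_yes_s), end="")  # record a only
```

To apply it to one of the files listed above, pass an open file handle in place of the toy lines, for example inside `with open("Tair9.pep.1.sptp.fasta") as fh:`. Whether the resulting counts match the ones listed depends on the assumed criteria being the ones actually used.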