
             How to load overgo plates with Robert's code 

                          by Beth Skwarecki


CAUTION
=======

    I expect to rewrite many of these scripts. This document explains
how the code works as of the date it was checked in to CVS, which is
to say, after Robert wrote the scripts but before I messed with them
in any substantial way. Changes are only to PERLLIB paths and some
database configuration kinds of things. 


IF YOU ARE IMPATIENT
====================

   There is a cheat sheet / quick ref at the bottom of this file.


WHAT TO DO
==========

   Loading a plate has two parts, plus a prerequisite (step 0):
      0. Building the physical database
      1. Loading the definition file for each new plate
      2. Loading the overgo results for each new plate

   Building the database is out of the scope of this document. The
required scripts are rumored to live in
/data/shared/physical_mapping/Build . The README in that directory may
be of interest to you, if you're interested in that sort of thing. 

   A plate definition file is a tab-separated grid indicating the
names and placement of markers on the given plate. The number of the
plate is indicated in the filename (not in the contents of the
file). The file itself has no row or column headings. A plate
definition can be loaded at any time before loading its overgo
results. Typically a definition file is loaded as soon as the plate is
designed, and the results are loaded as soon as they are
available. Plate definitions are browsable on the website.
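
   As a quick sanity check before loading, the layout described above
can be verified with a few lines of shell. This is only a sketch:
check_plate_grid is a hypothetical helper (it is not part of
sgn-tools), and it assumes that every row of a definition file has the
same number of tab-separated columns, which this document does not
actually promise.

```shell
# Hypothetical helper, not part of sgn-tools.
# Assumption: a definition file is a rectangular grid, so every row
# should have the same number of tab-separated fields as the first row.
check_plate_grid() {
    awk -F'\t' '
        NR == 1 { cols = NF }
        NF != cols {
            printf "line %d: %d columns, expected %d\n", NR, NF, cols
            bad = 1
        }
        END {
            if (NR == 0) { print "file is empty"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

   For example, "check_plate_grid plate_12 && echo ok" prints "ok"
only if plate_12 passes the check.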

   The results file is a spreadsheet (converted to a tab-separated
format) containing information about hits between pools and BACs. For
more details on what this means, read (or ask somebody) about the
overgo process. After this data is loaded, several other scripts must
be run to "deconvolute" the BAC/probe matches, and then to detect the
"plausibility" of BACs and BAC contigs (more on what that means
later).



HOW TO DO IT
============

Your working environment
------------------------

   The tools necessary to load plates are (should be) in
sgn-tools/physical/ . Make sure that the executables are in your $PATH
and the perl modules are in your $PERLLIB (recall that you'll want
to type 'make' in your sgn-tools checkout directory).

   Your working directory should be laid out something like this:

         beth@siren:/home/beth/work/plate_loading% ls -R
         .:
         CSV_type2
         plate_12
         plate_13
         plate_14

         ./CSV_type2:
         overgo_plate12_result.csv
         overgo_plate13_result.csv
         overgo_plate14_result.csv

For the remainder of this tutorial we will assume you are working in
the directory shown above. Plate #12 will be used as an example.


Files
-----

   plate_12 and friends are the plate definition files. They begin
life as a .xls emailed to you by a biologist. Prepare them by: 

    - exporting the .xls to tab-delimited (gnumeric can do this)
    - removing any row or column headers
    - naming the file as above: plate_XX, where XX is the plate
      number. 

   overgo_plate12_result.csv and friends are the plate result
files. They, too, begin life as a .xls emailed to you by a
biologist. Prepare them by:

    - exporting the .xls to tab-delimited (gnumeric can do this)
    - KEEPING row and column headers
    - naming the file as above, and placing it in a directory named
      CSV_type2 (this directory does not need to be inside your
      working directory).
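
   Since the plate number lives only in the filename, it is worth
checking the names before loading anything. The helpers below are a
sketch, not part of sgn-tools; the function names are made up for this
example, and they assume the plate number is written as plain digits.

```shell
# Hypothetical helpers, not part of sgn-tools.
# Assumption: the plate number in a filename is written as plain digits.

# Print the plate number of a definition file named plate_XX,
# or return non-zero if the name does not match.
definition_plate_number() {
    n=$(basename "$1" | sed -n 's/^plate_\([0-9][0-9]*\)$/\1/p')
    [ -n "$n" ] && echo "$n"
}

# Same idea for a results file named overgo_plateXX_result.csv.
result_plate_number() {
    n=$(basename "$1" | sed -n 's/^overgo_plate\([0-9][0-9]*\)_result\.csv$/\1/p')
    [ -n "$n" ] && echo "$n"
}
```

   Both functions accept a bare filename or a full path. A name that
does not follow the convention (plate_12.xls, say) prints nothing and
returns a failure status.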


Running the scripts
-------------------

    Load a plate definition file. This adds the plate to overgo_plates
in the physical database and its markers to probe_markers. If the
plate already exists, you will be asked whether to overwrite the old
definition.

    % load_physical_db.pl -addplate=plate_12


    Load the plate results. Note that the directory to be supplied as
an argument is the *parent* of the CSV_type2 dir (in other words, if
the files live in /home/beth/CSV_type2, you will want to pass
/home/beth as your argument). 

    WARNING: This will load ALL of the result files in that directory. 

    % load_plate_results.pl <dir>
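
    Because the argument is the parent directory and everything in
CSV_type2 gets loaded, it can help to work out the argument and look
at the directory contents first. The following is a sketch with
made-up helper names (they are not part of sgn-tools). It assumes that
anything sitting in CSV_type2 may be picked up by the loader, since
this document does not say how the script selects files.

```shell
# Hypothetical helpers, not part of sgn-tools.

# Given the path of a CSV_type2 directory, print its parent, which is
# the directory to pass to load_plate_results.pl.
results_parent_dir() {
    if [ "$(basename "$1")" != "CSV_type2" ]; then
        echo "not a CSV_type2 directory: $1" >&2
        return 1
    fi
    dirname "$1"
}

# List everything in <dir>/CSV_type2 so it can be reviewed before loading.
# Assumption: anything listed here may be picked up by the loader.
list_result_files() {
    ls -1 "$1/CSV_type2"
}
```

    With the example from above, "results_parent_dir
/home/beth/CSV_type2" prints /home/beth.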


    Run the deconvolution algorithm. This sorts out which BACs matched
which marker (stored in overgo_associations) and identifies
conflicting results (stored in tentative_overgo_associations and
tentative_overgo_association_conflict_groups).

   At the time of writing, only one overgo version has ever been used
(that would be version 1). The logfile is optional; if none is
specified, the script will log to deconvolution_err in the current
directory.

    % deconvolution_algorithm.pl <version> [<logfile>]


    Identify plausible BACs. A BAC with hits to multiple probes is
"implausible" if those probes correspond to markers that are more
than a certain centimorgan distance apart on the map. At the time of
this writing, that distance is 5 cM.

    % detect_plausible_bac_clusters.pl <version>
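
    To make the rule concrete, here is a small sketch of the distance
test. It is an illustration only and makes no claim about how the real
script is written. The function name is made up, the marker positions
are passed by hand rather than read from the database, all markers are
assumed to sit on the same map, and "implausible" is taken to mean that
the spread of positions is strictly greater than the threshold.

```shell
# Illustration only; not how detect_plausible_bac_clusters.pl works inside.
# Usage: bac_is_plausible <max_cM> <position_cM>...
# Succeeds if the markers span no more than <max_cM>, fails otherwise.
# Assumption: all positions are on the same map.
bac_is_plausible() {
    max_cm=$1
    shift
    printf '%s\n' "$@" | awk -v max_cm="$max_cm" '
        NR == 1     { lo = hi = $1 + 0 }   # start with the first position
        $1 + 0 < lo { lo = $1 + 0 }        # track the smallest position
        $1 + 0 > hi { hi = $1 + 0 }        # track the largest position
        END { exit ((hi - lo > max_cm + 0) ? 1 : 0) }
    '
}
```

    With the 5 cM threshold, "bac_is_plausible 5 10.0 12.5 14.0"
succeeds (the markers span 4 cM), while "bac_is_plausible 5 10 30"
fails.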


    Identify plausible BAC contigs. A contig is an assembly of BACs;
the contigs are constructed by software called FPC (which we at SGN
don't run ourselves; we are given the result files). A contig is
considered "implausible" if the adjacent markers on two adjacent BACs
are more than a certain centimorgan distance apart (in other words,
if the "last" marker on one BAC is too far away from the "first"
marker on the next BAC). Currently this value is 5 cM.

    There are several ways to indicate the version to be used. In
keeping with our earlier examples, you can indicate that you want
overgo version #1 with "-o 1". 

    % detect_plausible_bac_contigs.pl -o <version>
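
    The adjacency rule for contigs can be sketched the same way. Again
this is an illustration under assumptions, not the real script: the
function name and the input format are invented for the example. Input
is one line per BAC, in contig order, giving the map position (in cM)
of that BAC's "first" and "last" marker, and all markers are assumed to
be on the same map.

```shell
# Illustration only; not how detect_plausible_bac_contigs.pl works inside.
# Usage: contig_is_plausible <max_cM> < bacs.txt
# Hypothetical input format: "<first_marker_cM> <last_marker_cM>" per BAC,
# one BAC per line, in contig order.
contig_is_plausible() {
    awk -v max_cm="$1" '
        NR > 1 {
            # distance from the previous BAC last marker to this BAC first marker
            gap = $1 - prev_last
            if (gap < 0) gap = -gap
            if (gap > max_cm + 0) bad = 1
        }
        { prev_last = $2 + 0 }
        END { exit (bad ? 1 : 0) }
    '
}
```

    For example, a contig of two BACs given as "10 12" and "13 15"
passes with a threshold of 5, because the gap between 12 and 13 is
1 cM. Replace the second BAC with "40 42" and the check fails.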


   Now, your plates should be correctly loaded. Check this by viewing
the plate on the website. The page in question is something like:
http://localhost/cgi-bin/maps/physical/list_overgo_plate_probes.pl?plate_no=12



CHEAT SHEET / QUICK REF
=======================


# plate definition file: tab-delimited, no headers, filename plate_12

load_physical_db.pl -addplate=plate_12

# results file: tab-delimited, filename
# CSV_type2/overgo_plate12_result.csv

load_plate_results.pl <dir> # dir that is the parent of CSV_type2

deconvolution_algorithm.pl <version> [<logfile>] # usually version 1

detect_plausible_bac_clusters.pl <version> 

detect_plausible_bac_contigs.pl -o <version> 
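
# The whole sequence, as a dry run. This is a sketch: plate_load_commands
# is a made-up helper (not part of sgn-tools) that only PRINTS the
# commands listed above in order, so you can read them over before
# running anything by hand.

```shell
# Hypothetical helper: prints the commands, runs nothing.
# Usage: plate_load_commands <definition_file> <results_parent_dir> <version>
plate_load_commands() {
    echo "load_physical_db.pl -addplate=$1"
    echo "load_plate_results.pl $2"
    echo "deconvolution_algorithm.pl $3"
    echo "detect_plausible_bac_clusters.pl $3"
    echo "detect_plausible_bac_contigs.pl -o $3"
}
```

# e.g. plate_load_commands plate_12 /home/beth/work/plate_loading 1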