Insert into submit_user (17), organism (66), common_name (11), and library.
library:
Insert with shortname (shortname, library_id):
cccl	91
cccp	92
cccs18w	93
cccwc22w	94
cccs30w	95
cccs42w	96
cccs46w	97

Delete the changes in library and submit_user. Do the inserts into these two tables in sgn_submit.
submit_user_id 21


In the ACE files, the clone names were followed by the trimmed_seq_id.

The organization of chromatogram files in CGN is very different from SGN. I reorganized the CGN chromatograms.
Some plates were sequenced more than once. The easiest way to keep track of these ESTs is to use the CGN seq_id (or trimmed_seq_id). However, it is difficult to find a place to keep this information in SGN so that we can track them.
So I use clone_name_trimmed_seq_id as the clone_name for SGN.
Risk: this way, repeated sequencing results from the same clone are treated as different clones.

List of repeated plates
cccp14  10011314        10011510
cccs30w17       lin_10021132    lin_10021930
cccs30w19       lin_10021931    lin_10021133
cccs30w23       lin_10019622    lin_10019623    lin_10018898
cccs30w24       lin_10018899    lin_10019624    lin_10019626
cccs30w7        10013479        lin_10013875
cccs30w9        10015585        lin_10015585

update seqread set clone_name=replace(clone_name, '_', '-') where library_id = 93;
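A minimal Python sketch of this naming scheme, assuming the CGN clone name is a string and trimmed_seq_id is an integer. It builds the underscore form, applies the same underscore-to-hyphen replacement as the UPDATE above, and splits a name back into its parts. The function names are hypothetical, and the split assumes a CGN clone name never itself ends in a separator followed by digits.

```python
import re
from typing import Tuple

# Suffix pattern: the last "_" or "-" followed only by digits (the trimmed_seq_id).
_SUFFIX = re.compile(r"(.+)[-_](\d+)")


def sgn_clone_name(cgn_clone_name: str, trimmed_seq_id: int) -> str:
    """Build the SGN clone_name as <CGN clone name>_<trimmed_seq_id>."""
    return f"{cgn_clone_name}_{trimmed_seq_id}"


def hyphenate(clone_name: str) -> str:
    """Python equivalent of SQL replace(clone_name, '_', '-')."""
    return clone_name.replace("_", "-")


def split_clone_name(sgn_name: str) -> Tuple[str, int]:
    """Recover (CGN clone name, trimmed_seq_id) from either separator form.

    Assumes the CGN clone name itself never ends in a separator plus digits.
    """
    match = _SUFFIX.fullmatch(sgn_name)
    if match is None:
        raise ValueError(f"no trimmed_seq_id suffix in {sgn_name!r}")
    return match.group(1), int(match.group(2))


name = hyphenate(sgn_clone_name("cccs46w19m10", 45548))
print(name)                    # cccs46w19m10-45548
print(split_clone_name(name))  # ('cccs46w19m10', 45548)
```

Splitting on the last separator, rather than the first, keeps any hyphen inside the clone name attached to the clone part.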

The hqi from qc_report is used as $start in est.pl: substr $string, $start ...
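A Python counterpart of that substr call, shown as a hedged sketch. It assumes hqi_start is a 0-based offset, as Perl's substr expects, and that the substring length is hqi_length. Both points are assumptions, because the line above elides the rest of the call.

```python
def high_quality_insert(seq: str, hqi_start: int, hqi_length: int) -> str:
    """Python counterpart of Perl substr($seq, $hqi_start, $hqi_length).

    Assumes hqi_start is a 0-based offset, as Perl's substr expects.
    """
    if hqi_start < 0 or hqi_length < 0:
        raise ValueError("hqi_start and hqi_length must be non-negative")
    if hqi_start + hqi_length > len(seq):
        raise ValueError("high-quality region runs past the end of the sequence")
    return seq[hqi_start:hqi_start + hqi_length]


print(high_quality_insert("NNACGTACGTNN", 2, 8))  # ACGTACGT
```

Perl's substr would silently return a shorter string when the region overruns the sequence. This sketch raises instead, so bad coordinates surface during loading.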

CGN quality evaluation codes:
E. coli contamination (4)
Low complexity (7)
Low overall quality (5)
Too ambiguous (3)
Insert too short (2)

SGN EST flag codes (applicable to CGN data)
Insert too short (0x4)
High expected error (0x8)
Low complexity (0x10)
E coli or cloning host contamination (0x20)

SGN EST status codes (applicable to CGN data)
Contaminants not assessed (0x20)
Chimera not assessed (0x40)
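A sketch of translating CGN quality codes into the SGN flag bits listed above. CGN codes 2, 4, and 7 match SGN flags by name. Mapping "too ambiguous" (3) and "low overall quality" (5) to "high expected error" (0x8) is an assumption, not something these notes establish.

```python
# SGN EST flag bits listed in these notes.
INSERT_TOO_SHORT = 0x4
HIGH_EXPECTED_ERROR = 0x8
LOW_COMPLEXITY = 0x10
ECOLI_CONTAMINATION = 0x20

# CGN quality code -> SGN flag bit. Codes 3 and 5 have no same-named SGN
# flag; mapping them to HIGH_EXPECTED_ERROR is an assumption.
CGN_TO_SGN_FLAG = {
    2: INSERT_TOO_SHORT,      # Insert too short
    3: HIGH_EXPECTED_ERROR,   # Too ambiguous (assumed mapping)
    4: ECOLI_CONTAMINATION,   # E. coli contamination
    5: HIGH_EXPECTED_ERROR,   # Low overall quality (assumed mapping)
    7: LOW_COMPLEXITY,        # Low complexity
}


def sgn_flags(cgn_codes):
    """OR together the SGN flag bits for every CGN code on one EST.

    Raises KeyError for a CGN code that has no mapping.
    """
    flags = 0
    for code in cgn_codes:
        flags |= CGN_TO_SGN_FLAG[code]
    return flags


print(hex(sgn_flags([2, 4])))  # 0x24
```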

What to put in qc_report.hqi_start and hqi_length if the clone is E. coli contamination, low complexity, or low overall quality?
Enter the quality-trim result.

How to incorporate the polyA trim from CGN into SGN?
The trimmed_sequence in CGN already has the polyA removed! (good)

Load qc_report
Only hqi_start and hqi_length can be retrieved from CGN. So set qstart, istart, and hqi_start all to hqi_start, and set q_end and i_end according to hqi_length.
Many qc_status values were NULL, so leave qc_status NULL for all CGN ESTs.
I can't find out what vs_status means. However, for most ESTs this value is 2, so set it to 2 for CGN ESTs.
vector_tokens is read from the tab-delimited trimming file. It is too hard to recover it from CGN, so leave it NULL.
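A sketch of the qc_report values described above, built from CGN's two numbers. The column names follow the spelling used in these notes, not a checked schema. Treating q_end and i_end as inclusive end coordinates (start + length - 1) is an assumption.

```python
from typing import Dict, Optional


def cgn_qc_report_row(hqi_start: int, hqi_length: int) -> Dict[str, Optional[int]]:
    """Fill the qc_report fields these notes describe from CGN's two values.

    Assumes q_end/i_end are inclusive end coordinates (start + length - 1);
    column names follow the spelling used in these notes.
    """
    end = hqi_start + hqi_length - 1
    return {
        "qstart": hqi_start,
        "q_end": end,
        "istart": hqi_start,
        "i_end": end,
        "hqi_start": hqi_start,
        "hqi_length": hqi_length,
        "qc_status": None,      # left NULL for all CGN ESTs
        "vs_status": 2,         # the value most existing ESTs carry
        "vector_tokens": None,  # not recoverable from CGN
    }


print(cgn_qc_report_row(10, 400)["q_end"])  # 409
```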

cccwc22w: 11656 in the list but 11660 in the database.
Just some duplication:
cccwc22w20b14, cccwc22w5d18, cccwc22w5d18, cccwc22w6p22


cccs46w: 10904 in the list but 10907 in the database.
cccs46w19m10, cccs46w24j23, cccs46w25l9, and cccs46w30i8 are missing from the list, and cccs46w9g1 is missing from the database.
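Count mismatches like these can be explained by three sets: names duplicated in the database, names present only in the database, and names present only in the list. This Python sketch assumes both sides are plain clone names without the trimmed_seq_id suffix. The toy clone "cccs46w1a1" is made up.

```python
from collections import Counter
from typing import Dict, Iterable, List


def compare_clone_lists(listed: Iterable[str], in_db: Iterable[str]) -> Dict[str, List[str]]:
    """Explain a count mismatch between a clone list and the database."""
    db_counts = Counter(in_db)
    listed_set = set(listed)
    return {
        # Names that appear more than once among database rows.
        "duplicated_in_db": sorted(n for n, c in db_counts.items() if c > 1),
        "missing_from_list": sorted(set(db_counts) - listed_set),
        "missing_from_db": sorted(listed_set - set(db_counts)),
    }


# Toy data; "cccs46w1a1" is a made-up clone name.
report = compare_clone_lists(
    ["cccs46w1a1", "cccs46w9g1"],
    ["cccs46w1a1", "cccs46w19m10", "cccs46w19m10"],
)
print(report)
```

The database count minus the list count equals the extra duplicate rows, plus the names missing from the list, minus the names missing from the database. For cccs46w that is 0 + 4 - 1 = 3, which matches 10907 - 10904.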

group_id for the coffee ESTs is 34813
organism_id for Coffea canephora is 27

FastParser has been modified. It now returns the whole string "SGN-EXXX". Modify my ACE files to accommodate the change.
Modify unigene-ace-upload.pl to remove the SGN-E prefix.
Change NULL to "undef".
Change the file load to a loop, due to permissions.
The unigene_ace table has been deleted from SGN. Remove the ACE-file loading part of ungiene-db-load.pl.
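A Python analogue of these loader changes, offered only as a sketch. It strips the "SGN-E" prefix, turns literal "NULL" strings into None (the DB-API counterpart of passing undef through Perl DBI, both of which bind as SQL NULL), and inserts row by row with bound parameters instead of loading a file. The table "member" and its columns are hypothetical, and sqlite3 stands in for the real database.

```python
import sqlite3
from typing import Iterable, List, Optional, Sequence

SGN_EST_PREFIX = "SGN-E"


def est_id_from_fastparser(name: str) -> int:
    """Strip the 'SGN-E' prefix that FastParser now leaves on EST names."""
    if not name.startswith(SGN_EST_PREFIX):
        raise ValueError(f"unexpected EST name: {name!r}")
    return int(name[len(SGN_EST_PREFIX):])


def null_to_none(fields: Sequence[str]) -> List[Optional[str]]:
    """Turn literal 'NULL' strings into None, which the driver binds as SQL NULL."""
    return [None if f == "NULL" else f for f in fields]


def load_in_loop(conn: sqlite3.Connection, rows: Iterable[Sequence[str]]) -> int:
    """Insert rows one at a time with bound parameters instead of a file load,
    so the database server never needs read permission on the file."""
    n = 0
    for fields in rows:
        rest = null_to_none(fields[1:])
        conn.execute(
            "INSERT INTO member (est_id, note) VALUES (?, ?)",  # hypothetical table
            (est_id_from_fastparser(fields[0]), *rest),
        )
        n += 1
    conn.commit()
    return n


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (est_id INTEGER, note TEXT)")
print(load_in_loop(conn, [("SGN-E1001", "NULL"), ("SGN-E1002", "x")]))  # 2
```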

Mar 14
The ESTs and unigenes are all loaded!
Need to fix
1.  EST flags: the ESTs show as not trimmed. Change the status flag from 0x10 to 0x20.
2.  The EST page says "no current unigene builds incorporate this sequence". Set the status of the unigene build to 'C'.
3.  EST organism shows lycopersicum, a mistake in loading the data. Set organism_id to 27.
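Fix 1 swaps one status bit for another. A minimal sketch of that bit operation clears 0x10 and sets 0x20 while leaving every other bit alone. It assumes rows without the 0x10 bit should stay unchanged.

```python
def replace_flag(status: int, old: int = 0x10, new: int = 0x20) -> int:
    """Swap one status bit for another, leaving every other bit untouched.

    Rows without the old bit are returned unchanged.
    """
    if status & old:
        return (status & ~old) | new
    return status


print(hex(replace_flag(0x50)))  # 0x60
```

Masking with `& ~old` rather than subtracting 0x10 avoids corrupting the value if the function is run twice on the same row.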

To do
1.  Clean up the clone_name
2.  Add CGN unigene_id
3.  Blast and functional annotations
4.  Modify the search framework to include search by CGN unigene_id

THERE IS A LEAK!!!
13175 unigenes in CGN but only 13121 unigenes in SGN!
There are only 13121 contigs/singlets in ace-parsed-contig-membership.tdv.

Ran ace-membership.pl on amatxu: the same number (13121).
The lost ones are singlets. I guess we just lost the ACE files. Yes, confirmed.

To recover those unigenes:
Find all the missing unigenes and grab the corresponding ESTs.
Append those missing singlets to ace-parsed-membership.tdv.
Add those sequences to the unigene.seq and unigene.qual files.
Run unigene-upload.pl.
46157, 46158, and 30079: missing clone names?
Clone names are missing from cgn_clone_unigene.
Checked cgn_clone_unigene: only these three cases.
Fix the cgn_clone_unigene file:
cccl4j5-30079	130634	seq_id is 26352. No clone name could be found in the other_identifier table. Recover it from the trace location.
-46157	131087	seq_id is 43748. Only trimmed sequence, no raw sequence.
-46158	131088	seq_id is 43749. Only trimmed sequence, no raw sequence.
These sequences are not in SGN: not in sgn_est, not in seq_qc.

Recover cccl4j5-30079, but not the other two.
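The recovery steps above come down to a set difference between CGN's unigenes and the unigenes in the parsed membership file. This sketch takes a CGN unigene-to-EST mapping and the parsed unigene IDs, and writes the missing singlets as tab-delimited rows. The two-column (unigene_id, EST name) layout is hypothetical, since these notes do not give the real .tdv layout. Blank EST names are reported rather than written, mirroring the unrecoverable entries.

```python
import csv
import io
from typing import Dict, List, Set, Tuple


def missing_singlets(
    cgn_members: Dict[str, List[str]], parsed_ids: Set[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Find CGN unigenes absent from the parsed membership file.

    Returns (rows to append, unigene ids skipped because the EST name is blank).
    Raises ValueError if a missing unigene is not a singlet.
    """
    rows: List[Tuple[str, str]] = []
    skipped: List[str] = []
    for unigene_id in sorted(set(cgn_members) - parsed_ids):
        ests = cgn_members[unigene_id]
        if len(ests) != 1:
            raise ValueError(f"unigene {unigene_id} is not a singlet: {ests}")
        if not ests[0]:
            skipped.append(unigene_id)  # no clone name to recover
            continue
        rows.append((unigene_id, ests[0]))
    return rows, skipped


def to_tdv(rows: List[Tuple[str, str]]) -> str:
    """Serialize rows as tab-delimited text with one row per line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows, skipped = missing_singlets(
    {"1": ["estA"], "2": ["estB"], "3": [""]}, parsed_ids={"1"}
)
print(to_tdv(rows), skipped)
```

Raising on a missing unigene with several members catches the case where a lost ACE file held a contig, not just singlets.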

Clones with multiple entries in sgn_est
> cccs46w19m10-45548
> cccs46w24j23-48035
> cccs46w25l9-48952
> cccs46w30i8-50212
> cccwc22w20b14-57851
> cccwc22w5d18-50792
> cccwc22w5d18-50792
> cccwc22w6p22-51723


Unigene build date not showing up on the unigene index page.

Log in to sandbox as postgres and run:
vacuum analyze


April 14
Load BLAST results into cxgn.

Problem 1
We can no longer mount /data/shared, since it is now in a different subnet: the subnet in the server room.

Problem 2
The /data/shared databases have been changed. There are no FASTA files anymore.

Problem 3

Defline annotation id is not defined for AT2G32700.3
