Nicotiana benthamiana draft genome sequence v2.6.1

  The N. benthamiana draft genome v2.6.1 is a new assembly based on the Illumina data used for version Niben1.0.1, with the addition of new Illumina mate pair data (20Kb, 15Kb and 5Kb) produced by Prof. Greg Martin and Prof. Lukas Mueller (BTI), PacBio Sequel long reads produced by Dr. Brian Kvitko (UGA) and HiC data donated by Dr. Silin Zhong (CUHK). The result is an assembly with 19 pseudomolecules and 17,620 small scaffolds. You can find more details about this assembly in this presentation <link to the PDF>.

  Assembly
  The assembly was performed using SOAPdenovo2 with the default settings on the Illumina data (PE500bp + MP2Kb + MP5Kb + MP15Kb + MP20Kb), followed by one round of gap filling with GapCloser (Illumina PE500bp) and another round with PBJelly (10X coverage with PacBio Sequel). The assembly was then re-scaffolded with SSPACE-Long (PacBio Sequel) and the gaps were filled with another round of PBJelly. This final assembly was scaffolded into chromosomes using HiC data and the tool 3D-DNA.

  STATS:
Total assembly size (Gb): 3.03
Total number of sequences: 19 chromosomes (>95% anchoring) + 17,620 scaffolds
Longest sequence (Mb): 194.60
L90 (sequences): 18
N90 (Mb): 137.64
BUSCO: C:95.9%[S:47.6%,D:48.3%],F:2.1%,M:2.0%
C:97.7%[S:46.6%,D:51.1%],F:2.0%,M:0.3%,n:1375
Merqury:
Completeness: 98.12%
QV: 29.4
Consensus error rate: 0.001
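
  For readers who want to check how figures of this kind are derived, here is a minimal Python sketch. It assumes the usual conventions: Nx is the length of the shortest sequence in the smallest set of longest sequences covering x% of the assembly, Lx is the number of sequences in that set, and the Merqury QV is Phred-scaled, so error rate = 10^(-QV/10). The function names and the toy lengths are hypothetical and are not part of the assembly pipeline.

```python
def nx_lx(lengths, x=90):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that covers at least x% of the total length;
    Lx is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive numbers")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled consensus quality value to an error rate."""
    return 10 ** (-qv / 10)


# Toy lengths in Mb (hypothetical, NOT the real Niben2.6.1 sequences).
toy_lengths = [50, 40, 30, 20, 10]
n90, l90 = nx_lx(toy_lengths, x=90)
print(f"N90 = {n90} Mb, L90 = {l90} sequences")

# QV reported in the listing above.
print(f"Error rate for QV 29.4: {qv_to_error_rate(29.4):.4f}")
```

  Under that Phred-scale assumption, a QV of 29.4 corresponds to an error rate of about 0.0011, which agrees with the rounded consensus error rate of 0.001 listed above.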

  Annotation
  The annotation was performed with Maker-P. In brief, we downloaded all the RNA-Seq data publicly available for N. benthamiana from NCBI SRA (as of 2017) and mapped the reads to the Niben2.6.1 reference with STAR. An evidence-based set of gene models was created with StringTie. A de novo repeat library was generated with RepeatModeler2. The models created with StringTie were then used to train Augustus using Braker. The models generated with StringTie, the training files from Augustus, the protein sets from different Solanaceae genomes and the repeat library from RepeatModeler2 were used as inputs for Maker-P.

  STATS:
Gene models: 61,328
Exons/gene model: 4.98
Annotated 5' UTRs: 1,901
Annotated 3' UTRs: 244
Average protein length (aa): 333
Gene space size (Mb): 281
Repeats:
Low complexity: 1.27%
DNA Transposons: 1.25%
LINE Transposons: 1.57%
SINE Transposons: 0.36%
LTR/Copia: 4.60%
LTR/Gypsy: 10.73%
Other LTR transposons: 0.16%
rRNA: 0.03%
tRNA: <0.01%
snRNA: <0.01%
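
  The annotation figures above can be put in proportion with a few lines of arithmetic. The Python sketch below is only illustrative: it assumes 1 Gb = 1000 Mb and that the repeat percentages are expressed relative to the total assembly size, neither of which is stated explicitly in this file. The variable names are hypothetical.

```python
# Values copied from the STATS listings in this README.
assembly_size_mb = 3.03 * 1000   # total assembly size, Gb -> Mb (assumed 1 Gb = 1000 Mb)
gene_space_mb = 281
gene_models = 61_328
exons_per_gene = 4.98

# Repeat classes listed with an exact percentage. tRNA and snRNA are
# reported only as <0.01% and are left out of the sum.
repeat_percent = {
    "Low complexity": 1.27,
    "DNA Transposons": 1.25,
    "LINE Transposons": 1.57,
    "SINE Transposons": 0.36,
    "LTR/Copia": 4.60,
    "LTR/Gypsy": 10.73,
    "Other LTR transposons": 0.16,
    "rRNA": 0.03,
}

gene_space_fraction = gene_space_mb / assembly_size_mb * 100
approx_exons = gene_models * exons_per_gene
listed_repeats = sum(repeat_percent.values())

print(f"Gene space as share of assembly: {gene_space_fraction:.2f}%")
print(f"Approximate number of exons: {approx_exons:,.0f}")
print(f"Sum of listed repeat classes: {listed_repeats:.2f}%")
```

  The sum covers only the classes listed in this file, so it should not be read as the total repeat content of the genome.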
  If you have any questions about the assembly or annotation, please contact Aureliano Bombarely (abombarely@ibmcp.upv.es).