| de novo assembler tools |
cannot generate |
complete chromosome sequences |
|
| new draft Nicotiana benthamiana genome |
was assembled from |
Pacific BioScience Highly Accurate Long Read Sequencing reads (PacBio HiFi) |
Nicotiana benthamiana |
| maize B chromosome introgressed into the B73 inbred line
was sequenced using |
combination of chromosome flow sorting, Illumina sequencing, Bionano optical mapping, and high-throughput chromatin conformation capture (Hi-C) |
Zea mays |
| new assembly (80.6 Mb) |
is comparable to |
previous assembly results |
Plasmopara viticola |
| leaf data |
were assembled into |
316537 catalogue loci |
|
| genome assembly |
resulted in |
chromosome 1 with telomere repeat sequence on one end |
|
| pseudochromosome sequence inference |
is based on |
conserved genomic blocks |
|
| Pseudomolecule scaffolding |
produced |
15 scaffolds corresponding to the haploid chromosome number |
Ipomoea purpurea |
| Spinacia oleracea genome |
was used as |
reference genome |
Spinacia oleracea |
| final polished blackgrass genome assembly |
contains |
unanchored sequences |
Alopecurus myosuroides |
| DNA sequences from 144 samples |
were classified into |
seven pseudomolecules of F. vesca |
Fragaria vesca |
| limitation of de novo assembly |
complicates
chromosome-scale structural variation analysis |
|
| short next-generation sequencing (NGS) reads |
would result in |
highly fragmented assemblies |
Hordeum vulgare |
| Oxford Nanopore long reads |
produced |
complete genome assembly |
|
| DM1-3 genome |
is based on
cultivated diploid species Solanum tuberosum group phureja |
Solanum tuberosum |
| scaffolds Plvit038 and Plvit053 |
were confirmed by their juxtaposition on |
primary contig (Primary_000014F) and haplotig (Haplotig_000014F004) |
Plasmopara viticola |
| Oxford Nanopore long reads |
were produced for
Quercus dentata genome assembly |
Quercus dentata |
| Y chromosome |
has assembly length of |
c. 195.2 Mb |
Spinacia oleracea L. subsp. turkestanica |
| Illumina HiSeq 2000 system sequencing to 10× coverage |
resulted in |
60% coverage of chromosome 3B reference |
Triticum aestivum |
| genome assembly |
resulted in |
telomere-to-telomere assemblies of chromosomes 2–7 |
|
| new assembly (80.6 Mb) |
falls intermediate between |
SMRT sequencing assembly (92.94 Mb) and Illumina assembly (74.74 Mb) |
Plasmopara viticola |
| Nanopore reads and Illumina reads |
were used to polish
genome assembly |
|
| 34.79 gigabases of trimmed and self-corrected reads |
produced |
602 Mb assembly in 402 scaffolds |
Ipomoea purpurea |
| scaffold sequences |
are generated by |
de novo assemblers |
|
| Quercus dentata assembled genome |
has contig N50 of |
4.8 Mb |
Quercus dentata |
| two scaffolds (Plvit038 and Plvit053) |
indicate |
close physical proximity of sequences |
Plasmopara viticola |
| long-read sequencing technologies from Oxford Nanopore Technologies and Pacific Biosciences |
allowed |
complete assembly of centromeric DNA regions |
|
| Canu assembler |
produced |
genome assembly |
|
| Amaranthus hypochondriacus assembly |
is |
highly contiguous with 16 chromosome-scale scaffolds |
Amaranthus hypochondriacus |
| karyotype information in Brassicaceae |
was used to develop |
KGBassembler (Karyotype-based Genome assembler for Brassicaceae) |
|
| Quercus dentata assembled genome |
is composed of
312 contigs |
Quercus dentata |
| X/Y-chromosome assembly result |
was scaffolded with
RagTag software |
Spinacia oleracea L. subsp. turkestanica |
| B71 isolate |
has |
reference genome sequence |
Magnaporthe oryzae |
| CrusView pseudochromosome inference feature |
is especially convenient for |
nonmodel species lacking genetic and/or physical map |
|
| genetic anchoring of individual clones |
will enable positioning of |
singleton clones |
Hordeum vulgare |
| single-chromosome sequencing |
produces assembly results much better than |
maize B chromosome assembly |
Spinacia oleracea; Zea mays |
| B71 genome (B71Ref1) |
contained |
five scaffolds from the mini-chromosome |
|
| de novo assembler tools |
cannot generate without |
genetic and/or physical maps |
|
| Hi-C paired-end reads |
were used to assist
assembly correction and chromosome anchoring |
Quercus dentata |
| physical contigs in centromeric regions
lack
a defined relative order
Hordeum vulgare |
| 6,243 (99.4%) of all sequenced BACs |
harbored |
WGS contigs |
Hordeum vulgare |
| haplotigs |
had mean contig size of |
58,872 bp |
Ficus carica |
| B73 RefGen_v1
forms |
61,161 scaffolds |
Zea mays |
| B73 RefGen_v3
includes |
1,844 gene space contigs |
Zea mays |
| B73 RefGen_v2
improved v1 by integrating |
genetic and optical map information |
Zea mays |
| B73 RefGen_v2
has ordered and oriented
approximately 80% of maize genome
Zea mays |
| clones |
are assembled into |
physical contigs |
Hordeum vulgare |
| whole-genome shotgun (WGS) contig morex_contig_94710 |
contains |
photoperiod-H1 |
Hordeum vulgare |
| B73 RefGen_v2
improved v1 by addition of |
fosmid reads |
Zea mays |
| Canu assembler |
is |
non-hybrid assembler |
|
| Scutellaria baicalensis |
has |
diploid reference genome |
Scutellaria baicalensis |
| early Nicotiana benthamiana genome drafts |
were fragmented due to |
short reads |
Nicotiana benthamiana |
| Casuarina equisetifolia |
has |
high-quality genome assembly |
Casuarina equisetifolia |
| Illumina short reads |
corrected errors in |
genome assembly |
Acer truncatum |
| 13 long super-scaffolds (pseudochromosomes) |
represent |
99.44% of final genome assembly |
Acer truncatum |
| two counterpart genes s01133g27051 and s01133g27052 |
were located in |
28 882–34 561 region of seq001133 scaffold sequence |
Ficus carica |
| haplotigs |
had N50 of |
89,539 bp |
Ficus carica |
| long-read phased assemblies |
provided suitable solution with minimal |
assembly and gene prediction errors |
Ficus carica L. |
| Pacific Biosciences (PacBio) long-reads sequencing |
was used to generate |
diploid reference genomes |
Durio zibethinus; Scutellaria baicalensis; Prunus × yedoensis |
| pseudomolecule sequences |
were named |
FER_r1.1.pseudomolecule dataset |
Ficus erecta |
| pseudomolecules |
had sizes varying between |
20.5 and 29.5 Mb |
Marchantia polymorpha |
| 407 primary contigs |
corresponded to |
266,522,563 bp |
Ficus carica |
| resulting super-scaffolds from scf_v1 |
had N50 of |
64.014 Mb |
|
| third generation genomic technologies |
enable |
simple and cost-effective solutions for chromosomal-level assemblies |
|
| FALCON-Unzip algorithm |
uses |
PacBio sequencing data |
|
| additional wheat variety assemblies |
are less contiguous than |
Chinese Spring reference assembly |
Triticum durum; Triticum aestivum |
| ONT sequencing of another Arabidopsis accession |
enabled resolution of |
quantitative trait loci (QTL) previously recalcitrant to BAC sequencing |
Arabidopsis thaliana |
| polyploidy |
is |
ongoing challenge in genome assembly |
|
| next-generation sequencing (NGS) technologies |
facilitate
de novo assemblies of plant genomes |
|
| unphased genomes |
can have |
differing random haplotypes represented in a single genome assembly sequence |
|
| A. truncatum genome |
was assembled into |
13 pseudochromosomes |
Acer truncatum |
| Marchantia polymorpha genome |
consists of |
nearly 3000 scaffolds |
Marchantia polymorpha |
| estimated size of interscaffold gaps |
increased total length of map to |
208 823 686 bp
Marchantia polymorpha |
| de novo draft genome |
is assembled for |
Petunia parodii |
Petunia parodii |
| elite cultivated variety Zhongsizhu 1 |
was subjected to
whole-genome sequencing |
Boehmeria nivea |
| chromosome-level pseudomolecule sequences |
were assembled for |
Ficus erecta |
Ficus erecta |
| Dottato assembly |
had higher percentage of annotated genes than |
previously produced fig genome assembly |
Ficus carica |
| genome-wide comparison between primary contigs and haplotigs |
identified |
903,428 single nucleotide polymorphisms and indels |
Ficus carica |
| hybrid assembly |
was integrated with
high-density genetic map |
|
| sequences with greater than 4–5% variation |
are often retained as |
primary contigs |
Manihot esculenta |
| Hi-C technology |
was employed for
scaffolding of Gastrodia elata genome contigs |
Gastrodia elata |
| repeats in genome sequence |
cause difficulty in
genome assembly |
|
| local similarity between homeologous chromosomes |
can introduce challenges similar to those of
heterozygosity |
|
| Oxford Nanopore Technologies (ONT) nanopore sequencer |
enabled assembly of |
more contiguous and complete versions of Banana reference genome |
|
| genetic map anchoring |
arranged scaffolds into |
eight linkage groups corresponding to eight autosomes |
Marchantia polymorpha |
| assembly of complex plant genomes |
is |
challenging task |
|
| genome anchoring process |
produced |
set of 13 pseudomolecules |
Ficus carica |
| centromere regions |
are typically difficult to assemble due to |
highly repetitive composition |
|
| long-read sequencing |
allowed assembly of |
Ficus erecta genome sequence with high contiguity |
Ficus erecta |
| haplotigs overlapping flanking regions |
probably arise from
heterozygosity and structural variations |
Ficus carica |
| genome anchoring process |
associated |
407 primary contigs to fig chromosomes |
Ficus carica |
| long-read sequencing and high depth coverage |
produced genome of high |
sequence contiguity and accuracy |
Ficus carica L. |
| Hi-C data |
were used to assemble |
chromosome-scale scaffolds |
Acer truncatum |
| primary contigs |
had minimum size of |
20,012 bp |
Ficus carica |
| DBG2OLC and Canu assemblies |
produced |
lower N50 values than FALCON
Ficus carica |
| diploid FALCON-Unzip assembler |
produced |
primary set of contigs |
Ficus carica |
| chromosome-scale scaffolds |
were assembled into |
13 long super-scaffolds (pseudochromosomes) |
Acer truncatum |
| assembly process |
produced |
333,400,567 bp of fig genome sequence |
Ficus carica |
| haplotigs |
were used to estimate
heterozygosity of Dottato cultivar |
Ficus carica |
| chloroplast genome |
is represented by
circularized sequence of 160,594 bp |
Ficus carica |
| bottle gourd reference genomes |
were assembled from |
Illumina short reads |
Lagenaria siceraria |
| k-mers that contained short reads pertaining to missing heterozygous sequence |
were assembled using
SPAdes |
Manihot esculenta |
| haplotig N50 size |
is consistent with |
relatively short dispersed regions of heterozygosity |
Manihot esculenta |
| 'pseudohap' mode |
generates |
pseudo-haplotype contigs by collapsing alternate sequences from phased haplotigs with homozygous sequence from primary assembly |
Manihot esculenta |
| plant mitogenomes |
are often assembled as |
single circular chromosome termed 'master circle' |
|
| old and recent bursts of transposable elements |
render challenging |
plant genome assembly |
|
| primary contigs |
had N50 of |
823,517 bp |
Ficus carica |
| total sequence assembled |
is approximately |
1 Gb |
Manihot esculenta |
| wild emmer genome sequence (WEW_v1.0) |
was assembled from |
whole-genome shotgun (WGS) reads |
Triticum turgidum ssp. dicoccoides |
| de novo assemblies for each individual of a population using 10X Genomics |
have only minor increase in |
costs |
|
| assembly of allopolyploids |
works surprisingly well even without |
dedicated assembly methods |
|
| first plant genome based completely on Pacific Biosciences (PacBio) single molecule real time (SMRT) sequencing |
resulted in |
fourth most contiguous genome at the time |
Oropetium thomaeum |
| quantitative trait loci (QTL) repeat structure |
required |
reads greater than 20 kb |
Arabidopsis thaliana |
| genome assembly |
spanned |
331.6 Mb with 538 contigs and N50 length of 1.9 Mb |
Ficus erecta |
| FALCON assembler |
was chosen as |
most appropriate assembler |
Ficus carica |
| primary assembly |
was used in
downstream annotation and DNA modification analysis |
Ficus carica |
| long-read assembled contigs |
were integrated with
301 single-molecule genome maps from NanoChannel Arrays |
|
| collapsed haplotype representation of the genome |
results in artifacts in |
highly heterozygous plants |
Manihot esculenta |
| modified coords2hp.py script |
resulted in |
one complete set of 9925 contigs comprising approximately 720 Mb for each phase |
Manihot esculenta |
| Rubrum wintersweet genome V1.0 assembly |
has contig N50 of |
8.13 Mb |
Chimonanthus praecox |
| Gastrodia elata genome contigs |
have total length of |
1.043 Gb |
Gastrodia elata |
| Gastrodia elata genome contigs |
cover |
95% of Gastrodia elata genome |
Gastrodia elata |
| Cucumis melo L. ssp. agrestis IVF77 |
was subjected to |
Hi-C assembly |
Cucumis melo |
| physical map linkages |
can generate |
chromosome-scale, fully phased diploid genome assemblies |
|
| Phase1 contigs |
were aligned to
scaffolded Phase0 assembly |
Manihot esculenta |
| 1152 possible ways to assemble contigs 2–15 |
includes |
288 assembly paths that could generate a closed circle |
Glycine max |
| scaffold 16 of DASZ |
was anchored to
chromosome 14 |
Camellia sinensis |
| IWGSC RefSeq v2.1 assembly |
is |
major advance for applied and basic applications |
|
| generation of SLRs for de novo assembly |
requires |
high amounts of short read coverage |
|
| optical map assemblies |
have a bias to break at |
regions with closely-spaced restriction sites |
|
| 10X Genomics linked read approach |
used only |
short-read data of a single sequencing library at modest coverage |
Homo sapiens |
| Oxford Nanopore Technologies (ONT) nanopore sequencer |
enabled assembly of |
more contiguous and complete versions of Sorghum reference genome |
|
| Hi-C |
routinely enables |
resolution of genomes into chromosomes |
|
| linear contigs connected by a single edge |
constitute
simplest instance of genome assembly graph |
|
| pseudomolecule sequence names and directions |
were assigned in accordance with |
previously described high-density genetic map for fig |
Ficus carica |
| final map |
had total length of |
198 443 496 bp
Marchantia polymorpha |
| Potentilla micrantha |
has |
high-quality genome assembly |
Potentilla micrantha |
| centromere regions |
remain largely unknown in |
short-read assemblies |
|
| three pairs of SNPs |
were co-located on |
the same scaffolds |
|
| FALCON-Phase coords2hp.py script |
truncated |
contigs and haplotigs at ends of alignments |
Manihot esculenta |
| optical map-guided assembly |
prompted
revision of approximately 10% of genome sequence |
|
| successful assembly of centromeric and telomeric regions of walnut hybrid |
suggests that |
IWGSC sequence of CS genome can be further refined |
|
| linked short read data from 10X Genomics platform |
enable |
de novo assemblies for each individual of a population |
|
| single molecule sequencing |
enabled |
near complete chromosomes |
|
| single-molecule reads |
produce |
draft assemblies with high contiguity |
|
| long read assembly algorithms |
can disentangle |
divergent haplotypes |
|
| connections between contigs |
are represented by |
edges |
|
| advancements in assembly methods |
enable |
construction of high-quality reference genomes |
|
| long-read sequencing and advanced scaffolding methods |
enabled |
first round of highly contiguous, almost complete T2T genomes |
|
| resulting alternate assembly |
contained over |
341 Mb assembled in haplotigs |
Manihot esculenta |
| complete mitogenome (∼403 kb) from landrace (LR; 'Aiganhuang') |
is reported in |
soybean |
Glycine max |
| remaining 12 CcCIPK genes |
were assigned to |
Unplaced Scaffold |
Cajanus cajan |
| individual research groups |
can sequence and assemble |
genomes they are interested in |
|
| high repetitiveness due to transposable elements |
presents
unique challenges |
|
| de novo assembly |
itself requires |
high coverage |
|
| haplotigs from 10X Genomics assembly |
achieved N50 values of up to 9 Mb for |
assembly of individual sets of chromosomes |
Homo sapiens |
| concept of the reference genome |
is replaced by |
pan-genome |
|
| hairballs with thousands of nodes and edges |
are common and are likely driven by |
complex genome features and high copy number repeats |
|
| NanoChannel Array (BioNano) |
provides |
high-resolution genome maps |
|
| haplotig truncation lengths |
were compared with
contig vs. haplotig alignment lengths |
Manihot esculenta |
| importance of assembling both haplotypes |
suggests |
many large haplotypic SVs might be present with potential impact on gene expression or function |
Manihot esculenta |
| high read depth of contig 15 |
indicates that
sequence of contig 15 is duplicated in the Wm82 mitogenome |
Glycine max |
| Hi-C chromatin contact data |
were used for
assembly of 18 Gastrodia elata chromosomes |
Gastrodia elata |
| Cucumis metuliferus CM27 |
has |
12 chromosomes |
Cucumis metuliferus |
| eggplant |
has |
chromosome-level (CL) assembly based on combination of Illumina, Nanopore, 10× Genomics, and Hi-C scaffolding
Solanum melongena |
| durum wheat cv. Svevo reference sequence |
was published following |
bread wheat cv. Chinese Spring reference sequence |
Triticum turgidum ssp. durum |
| scf_v3 super-scaffolds |
were ordered and oriented through alignment on |
high-density genetic maps |
|
| contigs within scaffolds |
retained |
gaps filled with Ns |
|
| IWGSC RefSeq v2.1 pseudomolecules |
had effective length of |
14 316 999 506 bp |
|
| latest assembly tools |
promise to simplify |
practical assembly procedure |
|
| read pairs generated from the two ends of Hi-C fragments |
come from
closely linked DNA |
|
| assembly of Brassica juncea |
featured |
scaffold N50 of 1.5 Mb |
Brassica juncea |
| assembly of Brassica juncea |
was generated with |
short and long-read sequencing data combined with optical maps |
Brassica juncea |
| physical mapping technologies |
enabled |
high-quality, chromosome scale assemblies |
|
| large SVs between haplotypes |
could affect |
how FALCON-Phase aligns and places haplotigs vs. primary contigs |
Manihot esculenta |
| gaps in IWGSC RefSeq v1.0 assembly |
were closed using |
contigs built from WGS PacBio SMRT CS long reads |
Triticum aestivum |
| CS optical maps |
relocated |
233 scaffolds |
|
| optical mapping |
is |
third generation genomic technology |
|
| contig scaffolding |
typically starts by |
ordering contigs using alignments of paired reads |
|
| sequence contigs |
can be used to control for errors in |
optical maps |
|
| optical map information |
can be integrated during |
sequence assembly |
|
| chromosome conformation capture sequencing (Hi-C) |
is |
elegant solution to challenges of chromosome-scale assembly |
|
| bridging between neighboring polymorphisms
could thereby |
distinguish the homeologous chromosomes |
|
| shotgun sequencing and OLC assemblers |
were adopted for |
papaya genome |
|
| Oxford Nanopore Technologies (ONT) nanopore sequencer |
enabled assembly of |
more contiguous and complete versions of Arabidopsis reference genome |
Arabidopsis thaliana |
| read mapping coverage and sequence homology |
identify |
potential haplotigs |
Manihot esculenta |
| combination of multiple independent sources of information |
highlights importance of |
producing a consensus less likely to suffer from technology-specific issues
|
| challenges of plant genome assembly |
are determined by |
genome size and increased levels of repetitiveness |
|
| long DNA fragments derived from dilution methods |
can be used for |
assembly of haplotypes |
|
| Oxford Nanopore Technologies (ONT) nanopore sequencer |
enabled assembly of |
more contiguous and complete versions of Tomato reference genome |
|
| complex sequences and high copy number repeats (LTRs, centromeres, etc.) |
can create |
indiscernible hairballs of thousands of interconnected nodes with no clear paths |
|
| Dottato assembly |
had higher percentage of annotated repeats than |
previously produced fig genome assembly |
Ficus carica |
| heterozygosity and structural variations |
led to complications during |
assembly process |
Ficus carica |
| one complete set of 9925 contigs for each phase |
included |
almost all original heterozygosity assembled |
Manihot esculenta |
| average read depth of contig 15 |
was about twice
average read depth of other contigs |
Glycine max |
| single-molecule real-time (SMRT) sequencing and Hi-C technology |
were combined to obtain |
high-quality genome assembly of Cucumis metuliferus |
Cucumis metuliferus |
| pseudomolecules_v2.0 along with ChrUn |
constituted |
intermediate IWGSC RefSeq v2.0 assembly |
|
| complex repeats such as ribosomal RNA (rRNA) or centromeric satellite DNA |
can create |
higher-order ambiguities in the graph structure |
|
| long-read assembly |
was supplemented with
short-read contigs (SRC) |
Manihot esculenta |
| contigs 2–15 with large repeats at termini |
can be assembled in
1152 possible ways into a minimal complete genome
Glycine max |
| large, polyploid, repeat-rich genomes |
require |
innovative strategies for assembly |
|
| single-molecule sequencing technologies |
reinforced the need to choose |
sequencing and assembly strategies |
|
| WTDBG2 |
can leverage |
corrected reads |
|
| DNA sequencing advances |
have included |
improved scaffolding |
|
| extra SRC |
included |
missing heterozygous sequence |
Manihot esculenta |
| bioinformatics algorithms for de novo assembly |
are |
available |
|
| high proportion of repeated sequences |
complicates |
assembly of reference-quality genome sequence |
Triticum aestivum; Triticum turgidum ssp. durum |
| Hi-C heat map inspection and telomere signature sequence placement |
was used to adjust and confirm |
final chromosomal scaffolds |
Amaranthus cruentus |
| single reference assembly |
does not reflect |
gene diversity of a species |
|
| structural variant analysis |
is |
emerging frontier in plant genome assembly |
|
| high-density genetic maps |
were traditionally used to anchor |
contigs and scaffolds into chromosomes |
|
| long read assembly algorithms |
can accurately correct |
divergent haplotypes |
|
| long read assembly algorithms |
lead to assemblies that exceed |
monoploid genome size |
|
| coords2hp.py script in FALCON-Phase |
was modified to
force inclusion of entire length of each haplotig |
Manihot esculenta |
| TME7 genome assembly |
successfully assembled |
almost entirety of TME7 genome |
Manihot esculenta |
| Chr-m2 |
has length of |
63 kb |
Glycine max |
| SMRT sequencing and high-throughput chromosome interaction mapping |
produced |
high-quality chromosome-level C. metuliferus genome assembly |
Cucumis metuliferus |
| wild emmer wheat genome sequence |
was first reported for |
wild emmer wheat |
Triticum turgidum ssp. dicoccoides |
| resulting super-scaffolds from scf_v2 alignment to NLRS |
had maximum length of
364.575 Mb |
|
| De Bruijn graph (DBG) assembly methods |
handle |
shorter reads sequenced at greater depth |
|
| polyploid genomes |
present challenges for |
genome sequencing |
|
| CS optical maps |
reoriented |
354 scaffolds |
|
| genome-wide (consensus) maps |
can be used to |
scaffold the contigs of a corresponding sequence assembly |
|
| early genome assemblies |
included |
millions of contigs |
Triticum durum; Triticum aestivum |
| single molecule reads |
provide opportunities to untangle |
genomic regions missed by short read technologies |
|
| ONT sequencing |
reduced assembly to |
40 contigs that spanned chromosome arms (telomere to centromere) |
Arabidopsis thaliana |
| Hi-C assembly |
mapped |
97% of wild ramie sequences into 14 pseudomolecules |
Boehmeria nivea |
| wild ramie assembly |
has |
contig N50 length of 10.51 Mb |
Boehmeria nivea |
| benefits of accurate phasing |
outweigh |
additional minor duplication |
Manihot esculenta |
| complementation between phases |
preserves |
existence of potentially crucial single copy genes |
Manihot esculenta |
| Cucumis metuliferus CM27 |
was subjected to |
Hi-C assembly |
Cucumis metuliferus |
| PacBio-based assembly of Arabis alpina |
was |
337 Mb long |
Arabis alpina |
| heterozygosity |
is |
ongoing challenge in genome assembly |
|
| 903,428 single nucleotide polymorphisms and indels |
accounted for |
overall heterozygosity of approximately 2.7 polymorphisms per kilobase |
Ficus carica |
| potential of MGEs to be shared between bacteria |
complicates
generating high-quality metagenome assemblies |
|
| long-read and short-read sequencing platforms |
generate |
plasmid sequences
|
| FALCON-Phase |
discarded
over 40 Mb of sequence from both primary and haplotig assemblies |
Manihot esculenta |
| polyploidy |
complicates |
assembly of reference-quality genome sequence |
Triticum aestivum; Triticum turgidum ssp. durum |
| long-read sequencing and long-range scaffolding methods |
enable |
assembly of entire chromosomes |
|
| optical mapping and Hi-C data |
are complementary and can help bridge
different regions in the genome |
|
| scaffold N50 values of close to 20 Mb |
outperformed |
contiguity of long-read assembly contigs |
Homo sapiens |
| no existing de novo assembly of an autopolyploid plant
reconstructs
each of the homeologous chromosomes separately
|
| assembly of autopolyploid species
has used |
diploid or even haploid individuals |
|
| high-quality plant reference genomes |
were assembled using |
minimum tiling path of BACs sequenced with Sanger technology |
|
| second generation sequencing technologies such as 454 and Illumina |
spurred the development of |
De Bruijn graph (DBG) assembly methods |
|
| complex plant genomes |
have |
repeat structures greater than 20 kb |
|
| diploid assembly |
was maximized to include
as much haplotypic sequence as possible |
Manihot esculenta |
| 673 super-scaffolds anchored on 21 wheat chromosomes |
produced |
pseudomolecules_v1.1 |
|
| polyploid genomes |
have increased |
repetitiveness |
|
| assembly of the gene space of the hexaploid genome of wheat |
was first attempted with |
simple whole-genome shotgun approaches
Triticum aestivum |
| longer reads |
enable
assembly of longer contigs |
|
| genome size, heterozygosity, and repeat content estimates |
inform |
sequencing strategy and depth |
|
| next-generation frameworks |
integrate |
long-read and short-read data, optical maps, and Hi-C signals |
|
| Bill and Melinda Gates Foundation |
funded |
plant genome assembly and annotation work |
|
| long-read and short-read sequencing platforms |
generate |
circularized genomes |
|
| 'unzip'-emit-style haplotigs |
were used for comparison with
primary assembly |
Manihot esculenta |
| Rubrum wintersweet genome V1.0 assembly |
is approximately |
96.17% of predicted genome size |
Chimonanthus praecox |
| computational strategies for data integration |
are less straightforward than
de novo assemblies |
|
| Peregrine |
uses |
sparse hierarchical minimizers (SHIMMER) |
|
| raw reads |
can be mapped to resolve |
missing haplotype regions |
|
| extraction stage |
involves |
DNA extraction using optimized protocols for LMW or HMW DNA |
|
| long-read sequencing |
overcomes problems with |
short-read sequencing assembly |
|
| SNP and indel calling from aligned Illumina reads |
showed |
similar heterozygosity results |
Ficus carica |
| recent genomic approaches |
have been applied to recover |
complete genomes from herbarium material |
|
| powerful scaffolding strategies |
enable
fully phased, chromosome-scale, telomere-to-telomere assemblies |
|
| de novo assembly of long reads from Nanopore and PacBio libraries |
yielded |
270.2-Mb genome sequence for wild ramie |
Boehmeria nivea |
| purging of primary assembly
used
Purge_dups |
Manihot esculenta |
| primary and haplotig assemblies |
were phased using
Hi-C data and FALCON-Phase |
Manihot esculenta |
| results |
show
accurate production of one full haplotype assembly of TME7 and a second alternate assembly containing most haplotypic variation
Manihot esculenta |
| Rubrum wintersweet genome V1.0 assembly |
comprises |
661 contigs |
Chimonanthus praecox |
| SLR (synthetic long-read) assembly
has unclear performance when
fragments contain repeats
|
| individual fingerprints (or maps) |
can be assembled into |
genome-wide (consensus) maps |
|
| hexaploid genome of wheat |
has size of |
17 Gb |
Triticum aestivum |
| partially phased genomes |
can be collapsed into |
chimeric monoploid |
|
| contig N50 |
is |
quality metric for draft genome assembly |
|
| DNA sequencing advances |
have included |
optical mapping |
|
| Two chromosome-level genomes |
represent |
wild and cultivated ramie |
Boehmeria nivea |
| TME7 assembly |
represents |
significant improvement over previous attempts at assembly of heterozygous African cassava lines |
Manihot esculenta |
| 11 super-scaffolds |
have total length of |
737.03 Mb |
Chimonanthus praecox |
| plant-specific genome assemblers |
consider |
complexity of plant genome |
|
| Oxford Nanopore Technology sequencing data |
were assembled using |
Wtdbg2 |
Amaranthus cruentus |
| first plant genomes assembled from PacBio data alone |
were from |
Arabidopsis thaliana |
Arabidopsis thaliana |
| first whole-genome assemblies using Oxford Nanopore data |
have reached |
N50 values of several hundred kb for fungal genomes
|
| genome assembly algorithms |
are designed to correct, overlap, and polish |
long reads with high error-rates |
|
| Planning stage |
involves |
short-read sequencing and quality control to estimate genome properties |
|
| short-read sequencing |
produced |
hundreds of assemblies with expanded gene-space coverage |
|
| short-read contigs (SRC) |
contained |
additional heterozygous sequence |
Manihot esculenta |
| extra SRC |
contained |
duplicates of already assembled sequences |
Manihot esculenta |
| 'Williams 82' nuclear genome |
is |
first assembled and widely used reference genome in soybean |
Glycine max |
| pseudomolecules_v1.1 |
acquired |
most of intra-scaffold gaps |
|
| sizing gaps of previously unknown sizes |
allows |
local scale to be assessed |
|
| Illumina and 10× Illumina reads |
were used with pilon for polishing to produce |
final assembly |
Amaranthus cruentus |
| chromosome conformation capture |
was introduced to overcome |
weaknesses of short-read assemblies |
|
| contiguity improvement gained by integration of optical maps |
depends on |
different factors including contiguity of the assembly before integration and contiguity of optical consensus maps |
|
| integration of optical maps |
in the best cases led to |
reconstruction of entire chromosomes |
|
| integration of Dovetail Genomics read pairs into the PacBio assembly of A. alpina |
improved assembly contiguity about as well as
optical mapping data
Arabis alpina
| haplotypes |
can also be assembled using
long-range scaffolding methods |
|
| assembly stage |
involves |
long-read sequencing producing progressive genome assemblies |
|
| Large Language Models (LLMs) adapted for biological sequences and genome graphs |
may help transform |
phasing and haplotyping of complex plant genomes |
|
| low sequencing depth for plastid genome |
resulted in generation of only |
selected plastid markers |
Sartidia isaloensis |
| this article |
highlights |
both current best practices and the opportunities on the horizon |
|
| large language models (LLMs) |
may help disentangle |
haplotypes across highly repetitive and centromeric regions |
|
| goal to assemble full heterozygous diploid phased assembly |
sought to |
maximize amount of haplotypic sequence assembled |
Manihot esculenta |
| v3.0/v4.0 assembly |
has anchored to 12 chromosomes |
1.11 Gb of sequences containing 96.4% of predicted genes |
Solanum melongena |
| 673 super-scaffolds sharing two or more markers with genetic maps |
were anchored on |
21 wheat chromosomes |
|
| Hi-C |
enables resolution of genomes into chromosomes even for |
allotetraploids like Eragrostis tef |
Eragrostis tef |
| BAC-by-BAC genome assembly approach |
limited |
high-quality genome assemblies to a few model organisms |
|
| sample selection, preservation, and extraction of high-molecular-weight (HMW) DNA |
continue to represent |
major bottlenecks |
|
| continued progress in plant genome assembly |
promises |
deeper resolution of polyploid complexity and more consistent gene predictions |
|
| high-quality reference genome for Hangzhou Gourd |
was assembled using |
aforementioned approaches |
Lagenaria siceraria |
| assembled genome |
has size of |
1.20 Gb |
Euscaphis japonica |
| SLRs |
have been used to assemble |
a few eukaryotic genomes with genome sizes of several hundred Mb |
|
| very large genomes |
present challenges for |
genome sequencing |
|
| repetitive features such as rDNA (45S, 5S), centromeres, and telomeres |
are difficult features to assemble in |
telomere to telomere (T2T) efforts |
|
| polished hybrid assembly |
is |
assembly ZAAS_Lsic_2.0 |
|
| TME7 genome assembly |
used target haploid size of |
approximately 700 Mb |
Manihot esculenta |
| graph-based assembly approach used in FALCON-Unzip |
complemented by |
other orthogonal tools developed to extract haplotypic sequences |
Manihot esculenta |
| genome for each accession |
was de novo assembled using Megahit into |
contigs larger than 500 bp |
Solanum melongena |
| optical mapping |
is |
one of novel technologies that emerged to improve scaffolding |
|
| revolution in plant genomics |
resulted in |
many lower quality assemblies |
|
| primary and alternative haplotypes |
can be collapsed into |
single, non-redundant but chimeric pseudomolecule |
|
| Verkko |
is |
assembly algorithm |
|
| recent T2T assemblies of Arabidopsis |
provide |
first complete views of entire centromeres |
Arabidopsis thaliana |
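Completeness claims such as T2T are commonly checked by confirming that each chromosome sequence ends in a telomeric array. The following Python sketch assumes the Arabidopsis-type plant telomere repeat TTTAGGG (reverse complement CCCTAAA); the function names and the `window` and `min_copies` thresholds are hypothetical and are not taken from the cited assemblies.

```python
import re

# Arabidopsis-type plant telomere repeat (G-rich strand) and its reverse complement.
TELO_FWD = "TTTAGGG"
TELO_REV = "CCCTAAA"


def longest_tandem_run(seq, motif):
    """Longest run of back-to-back copies of `motif` in `seq`, counted in copies."""
    best = 0
    for run in re.finditer(f"(?:{motif})+", seq):
        best = max(best, len(run.group(0)) // len(motif))
    return best


def has_telomeres(seq, window=10_000, min_copies=20):
    """True if both ends of `seq` carry a telomeric array.

    `window` and `min_copies` are hypothetical thresholds. Both motifs are
    checked at each end so the result does not depend on sequence orientation.
    """
    seq = seq.upper()
    ends = (seq[:window], seq[-window:])
    return all(
        max(longest_tandem_run(end, TELO_FWD), longest_tandem_run(end, TELO_REV)) >= min_copies
        for end in ends
    )


# Toy chromosome: CCCTAAA array, unique sequence, TTTAGGG array.
toy = TELO_REV * 25 + "ACGT" * 100 + TELO_FWD * 25
print(has_telomeres(toy, window=200))                           # True
print(has_telomeres("ACGT" * 100 + TELO_FWD * 25, window=200))  # False
```

Real T2T validation usually also inspects centromeres and rDNA arrays, which this check does not touch.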
| assembly algorithms |
can accept |
HiFi and ONT data |
|
| genome size and heterozygosity ratio |
were estimated using |
tools in GenomeScope v2.0 |
Selaginella kraussiana |
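GenomeScope works from a k-mer frequency histogram and fits a model that also yields heterozygosity. The Python sketch below shows only the simpler, classic peak-based size estimate (total solid k-mers divided by the depth of the main peak); it is not GenomeScope's model, it counts k-mers on one strand only, and the error cutoff `min_depth` and the toy histogram are assumptions for illustration.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map depth -> number of distinct k-mers observed at that depth.

    Simplification: k-mers are counted on the given strand only, whereas
    real k-mer counters usually fold each k-mer with its reverse complement.
    """
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth):
    """Peak-based estimate: total solid k-mers / depth of the main peak.

    K-mers seen fewer than `min_depth` times are treated as sequencing errors.
    """
    solid = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak_depth = max(solid, key=solid.get)  # depth shared by the most k-mers
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Hypothetical histogram: an error spike at depths 1-2 and a main peak at 20x.
hist = {1: 500, 2: 50, 19: 900, 20: 1000, 21: 900}
print(estimate_genome_size(hist, min_depth=3))  # 2800.0
```

In a heterozygous genome a second, half-depth peak appears; separating the two peaks is part of what GenomeScope's model handles and this sketch does not.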
| diploid assembly |
is approximately double the size of |
haploid consensus genome |
Prunus persica |
| draft genome sequences |
have been generated in |
a number of species |
|
| improved assembly algorithms |
enable |
fully phased, chromosome-scale, telomere-to-telomere assemblies |
|
| de novo assembly of the P. tomentosa genome |
might help to |
improve results in the future |
Populus tomentosa |
| Assembly, scaffolding, and gene prediction |
are |
inherently iterative processes with feedback between steps |
|
| Large Language Models (LLMs) adapted for biological sequences and genome graphs |
may help address challenges in |
polyploid and aneuploid plant genomes |
|
| Picea abies genome |
is available at |
ConGenIE.org database |
Picea abies |
| Quercus dentata assembled genome |
has total size of |
893.54 Mb |
Quercus dentata |
| 12 chromosome-level super scaffolds |
have scaffold N50 of |
75 Mb |
Quercus dentata |
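The N50 reported here is a length-weighted median: the length L such that sequences of length L or longer hold at least half of all assembled bases. A minimal Python sketch of that definition; the input lengths are hypothetical toy values, not the Q. dentata scaffolds.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length


# Hypothetical scaffold lengths in Mb.
print(n50([90, 80, 75, 70, 60, 50]))  # 75
```

Comparing 2 * running with the total avoids floating-point division when testing the halfway point.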
| 49,719 reads |
were mapped on |
S. perrieri nuclear ribosomal DNA |
Sartidia perrieri |
| Illumina library construction and short-read sequencing |
were performed using |
Illumina HiSeq 2500 platform |
Selaginella kraussiana |
| CaTrailin4 and Ca5609 genes |
were truncated at 5' ends in |
short-read genome assembly |
Craspedostauros australis |
| diploid-aware procedure |
segregates |
highly divergent alleles into haplotigs |
Plasmopara viticola |
| pseudochromosome sequences |
are inferred from |
scaffold sequences |
|
| contigs and quality-filtered reads from GS FLX platform |
were assembled together using |
newbler v2.5.3 |
Cicer arietinum |
| assembled sequence |
has |
N50 index of 931 |
Cicer arietinum |
| sacred lotus genome |
was sequenced using |
high-quality sequencing data |
Nelumbo nucifera |
| short RenSeq Illumina reads |
can be assembled de novo with high accuracy to generate |
contigs representing the majority of the NB-LRR loci |
|
| clone-by-clone strategy |
should not be abandoned in favor of |
POPSEQ |
Hordeum vulgare |
| 'Kasalath' physical map |
contained |
BAC clones covering 17 physical gaps in latest genomic sequence of 'Nipponbare' |
Oryza sativa |
| Gossypium physical maps |
serve as frameworks for |
anchoring and ordering assembled sequences into reference allotetraploid cotton genome |
Gossypium hirsutum |
| three loops |
comprise |
overlapping or similar sequence reads |
Fritillaria imperialis |
| ABySS assembly |
produced |
304 948 126 bases of assembled sequences |
Cicer arietinum |
| low-copy-number genome assembly (LCG) |
obtained from |
non-repetitive reads |
Triticum aestivum |
| issues of assembly quality |
arose most likely because of |
low read depth and presence of two homoeologous copies for each genomic region |
Hordeum bulbosum |
| next-generation sequencing (NGS) technology |
produces |
hundreds of thousands of sequence contigs |
|
| this study |
identified |
contigs not present in haploid genome |
Citrus sinensis |
| enriched data set |
is mapped onto |
pseudo-chromosomes |
Triticum monococcum |
| reference genome |
was derived from |
doubled monoploid of an adapted diploid S. tuberosum Group Phureja clone |
Solanum tuberosum |
| Pacific Biosciences long-read sequencing combined with Illumina reads |
created |
new high-quality genome assembly |
Craspedostauros australis |
| high-density genetic maps generated by genotyping-by-sequencing |
are essential resources for |
sequencing and assembling the pineapple genome |
Ananas comosus |
| DNA regions encoding four of eight C4 amino acid transitions |
were available for |
at least one Sartidia accession |
Sartidia perrieri; Sartidia dewinteri; Sartidia isaloensis |
| insufficient sequencing coverage |
caused |
difficulties in assembly of complete cpDNA |
Sartidia isaloensis |
| 313,839 reads |
were assigned to |
S. dewinteri chloroplast genome |
Sartidia dewinteri |
| S. isaloensis |
has |
mean sequencing depth on low-copy genes of 0.4× |
Sartidia isaloensis |
| Selaginella moellendorffii genome v.1.0 |
is available at |
Phytozome database |
Selaginella moellendorffii |
| Hap1 genome assembly |
has |
contig N50 of 23.3 Mb |
Prunus persica |
| quality of the assembly after each step |
was assessed using |
BUSCO v5.2.2 |
Selaginella kraussiana |
| WGS assembly |
contains |
81% genome coverage in contigs and 85% genome coverage in scaffolds |
Linum usitatissimum |
| synteny between Arabidopsis and Brassica rapa |
was used to assemble |
Arabidopsis cDNA sequencing dataset into Brassica rapa based pseudo-chromosomes |
Arabidopsis thaliana; Brassica rapa |
| three released NGS cucumber genome assemblies |
each comprise approximately |
4000 de novo assembled scaffolds accounting for approximately 55–65% of the 367 Mbp cucumber genome |
Cucumis sativus |
| primary assembly |
scaffolded using |
bacterial artificial chromosome (BAC)-end sequences |
Cicer arietinum |
| de novo sequence assembly for read mapping |
requires |
two to eight lanes of HiSeq sequencing |
|
| BAC contigs |
varied among |
chromosomes |
Oryza sativa |
| 146 transcripts |
located within |
genomic regions corresponding to physical gaps on 'Kasalath' chromosome 1 |
Oryza sativa |
| 'Kasalath' physical map |
contains BAC clones covering |
genomic regions corresponding to 17 of the 'Nipponbare' gaps |
Oryza sativa |
| six ppc genes (ppc-aL1a, ppc-aL1b, ppc-aL2, ppc-aR, ppc-B1, ppc-B2) |
had few reads recovered in |
Sartidia species |
Sartidia perrieri; Sartidia dewinteri; Sartidia isaloensis |
| 972 bp (6.3%) |
were available for |
all three Sartidia species |
Sartidia perrieri; Sartidia dewinteri; Sartidia isaloensis |
| barley physical map |
contains |
contigs larger than 904 kilobases |
Hordeum vulgare |
| two assemblies available for wheat |
based on |
same set of 454 reads |
Triticum aestivum |
| assemblies from short RenSeq reads |
do not span |
whole gene models |
|
| incomplete assembly from Illumina 76-bp sequences |
is mainly due to |
the number of large gene sub-families and the high sequence similarity between paralogs and alleles |
|
| novel assembly algorithms |
may speed up |
process of data collection and analysis |
|
| construction of large-insert mate-pair libraries |
is not straightforward and often yields |
sub-optimal results |
Hordeum vulgare |
| reads assembled using same program as previously |
resulted in assembly with |
shorter cumulative length (1.6 Gb versus 1.9 Gb) |
Hordeum vulgare |
| genome de novo assembly of the mutant pool |
yielded |
15,364 scaffolds |
Chlamydomonas reinhardtii |
| anchoring of draft genome scaffolds to high-density linkage map |
developed |
chromosome-level cucumber draft genome assembly |
Cucumis sativus |
| large amount of repetitive DNA sequences in most plant genomes |
poses significant technical challenges in |
sequencing quality and assembly accuracy |
|
| protocol for single-chromosome isolation and sequencing |
will probably require further optimization |
to obtain good results in species with larger genomes |
|
| Syntrichia ruralis draft genome |
consists of |
3211 scaffolds |
Syntrichia ruralis |
| new reference genome for Amaranthus hypochondriacus L. |
removes sequencing errors from |
previous assembly |
Amaranthus hypochondriacus |
| genome polishing |
demonstrates successful removal of |
small errors in the assembly |
Amaranthus hypochondriacus |
| genome |
was scaffolded with |
Hi-C proximity-guided assembly |
Beta vulgaris |
| flax BNG map |
could be used as backbone to refine |
flax WGS reference sequence |
Linum usitatissimum |
| 35 candidate false joins |
represented |
7.97% of the improved assembly |
Nelumbo nucifera |
| maize genotype W22 |
has |
whole-genome assembly |
Zea mays |
| large size of C. vulgaris 211/11P mitochondrial DNA |
is consistent with |
mitochondrial genomes of other green algae |
Chlorella vulgaris |
| final merged genome version of Casuarina equisetifolia |
has N50 for scaffold of |
1.06 Mb |
Casuarina equisetifolia |
| centromere-specific tandem repeats |
are notoriously challenging to assemble de novo from |
short-read sequence data |
|
| optical mapping data (~1400×) |
obtained using |
BioNano Genomics technology |
Chlorella vulgaris |
| Populus alba assembly |
consisted of |
464 Mb of genome sequence including 37,901 protein-coding genes |
Populus alba |
| chromosome conformation capture sequencing (Hi-C) |
will be valuable method to further improve |
assembly contiguity |
|
| C. rubella genome assembly |
was assembled using |
BAC and fosmid paired-end link support |
Capsella rubella |
| ONT read-based assemblies |
did not fully assemble |
rDNA arrays and centromeres |
Arabidopsis thaliana |
| One library |
was sequenced in |
100-bp paired-end mode |
Selaginella kraussiana |
| low-quality sequences |
filtered from |
25.88 Gb of reads |
Linum usitatissimum |
| Hi-C raw data (∼213×) |
were used to further assemble |
genome using ALLHiC v0.9.13 software with the parameter "-k 10 -e GATC" |
Selaginella kraussiana |
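Hi-C scaffolders need the restriction motif used to build the library; in ALLHiC, `-e` is understood to give that motif and `-k` the expected number of groups (chromosomes), though the ALLHiC documentation should be checked. GATC is the site recognized by enzymes such as MboI and DpnII. A common preprocessing step is counting sites per contig, for example to normalize Hi-C link counts. This Python sketch shows only the counting; the helper names are hypothetical and it does not reproduce ALLHiC's code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_sites(seq, motif="GATC"):
    """Count (possibly overlapping) occurrences of `motif` on both strands."""
    seq, motif = seq.upper(), motif.upper()
    motifs = {motif, reverse_complement(motif)}  # a single element if palindromic
    count = 0
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            count += 1
            start = seq.find(m, start + 1)
    return count


def sites_per_contig(contigs, motif="GATC"):
    """Map contig name -> number of restriction sites (hypothetical helper)."""
    return {name: count_sites(s, motif) for name, s in contigs.items()}


print(sites_per_contig({"ctg1": "GATCAAGATC", "ctg2": "ACGT"}))  # {'ctg1': 2, 'ctg2': 0}
```

Because GATC is its own reverse complement, each site is counted once; non-palindromic motifs are searched on both strands.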
| S. dewinteri |
lacks |
ppc-aL1b gene |
Sartidia dewinteri |
| Hi-C library |
was sequenced in |
150-bp paired-end mode on the Illumina NovaSeq 6000 platform |
Selaginella kraussiana |
| NextDenovo v2.5.0 with the parameter "genome_size = 129M" |
was used to generate |
draft assembly from the CLR reads (∼128×) |
Selaginella kraussiana |
| chromosomes |
were |
gapless |
Prunus persica |
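Gap-free status is typically verified by checking that no chromosome sequence contains runs of the placeholder base N, the usual FASTA convention for scaffold gaps. A minimal Python sketch, assuming gaps are encoded as N/n runs; it is an illustration, not the procedure used in the peach study.

```python
import re


def find_gaps(seq):
    """Return (start, end) 0-based, half-open coordinates of each run of N/n."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def is_gapless(seq):
    return not find_gaps(seq)


print(find_gaps("ACGTNNNNACGTnnA"))  # [(4, 8), (12, 14)]
print(is_gapless("ACGTACGT"))        # True
```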
| limited capability of next-generation sequencing assembly software |
is reflected in |
absence of complete nested Elbe retrotransposons in draft genome sequence |
|
| diploid assembly |
has comparable |
repeat sequences |
Prunus persica |
| next-generation sequencing assembly software |
has limited capability to resolve |
segmental duplications and homologous repeats with identities exceeding 85% |
|
| Chromosome-level assemblies |
are either not available or |
of unvalidated quality |
Cucumis sativus |
| Sequence information spanning entire gap regions |
allowed |
revision of length of ambiguous fragments |
Solanum tuberosum |
| BAC-based physical map of 'Kasalath' chromosome 1 |
covers |
approximately 93.9% of 'Nipponbare' chromosome 1 sequence |
Oryza sativa |
| 39.4 million unmapped genomic reads |
were subjected to |
de novo assembly |
Citrus sinensis |
| 'ZS11' genome assembly |
had |
~95% (924 Mb) within 2003 longest scaffolds >1000 bp |
Brassica napus |
| relocation of 38 of 64 genes |
was confirmed by |
new version of melon genome (version 3.5.1) |
Cucumis melo |
| scaffold assembly |
was organized with ALLMAPS program into |
20 chromosome-scale pseudomolecules (719.5 Mb) and 41 231 unplaced scaffolds (≥1 kb; 220.3 Mb) |
Glycine latifolia |
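The anchoring rate implied by these figures can be worked out directly: 719.5 Mb in pseudomolecules out of 719.5 + 220.3 = 939.8 Mb gives about 76.6%. This assumes the denominator is only these two categories; scaffolds shorter than 1 kb are not included in the 220.3 Mb, so the rate against the full assembly may be slightly lower.

```python
def anchoring_rate(anchored_mb, unplaced_mb):
    """Fraction of the counted sequence that is placed in pseudomolecules."""
    return anchored_mb / (anchored_mb + unplaced_mb)


rate = anchoring_rate(719.5, 220.3)
print(f"{rate:.1%}")  # 76.6%
```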
| short-read assemblies |
likely underestimate |
repeat prevalence |
|
| flax BNG optical map |
had N50 of |
2.15 Mb |
Linum usitatissimum |
| final pseudo-chromosomes |
generated based on |
scaffold order using consensus map |
Nelumbo nucifera |
| whole-genome sequencing using different approaches and clones |
yielded |
two reference maps |
Spirodela polyrhiza |
| long-read ON sequence-based assemblies |
able to resolve |
tandem repeat sequences |
Spirodela polyrhiza |
| different Wm82 sources |
explain |
differences between Wm82v2 and Wm82v4 assemblies |
Glycine max |
| curated and assembled Thellungiella parvula genome sequence |
has total size of |
137.09 Mb |
Thellungiella parvula |
| in silico mapping |
performed against |
completed genomic sequence of 'Nipponbare' rice |
Oryza sativa |
| PCR-based BAC screening |
extended |
physical size of BAC contigs |
Oryza sativa |
| Hifiasm software |
achieved |
haplotype-resolved assembly |
Prunus persica |
| Hap1 and Hap2 genomes |
are comparable to |
previously published peach genomes |
Prunus persica |
| flax BNG optical map |
showed improvement over |
flax WGS assembly |
Linum usitatissimum |
| BNG contig BNG28 |
was |
2.95 Mb |
Linum usitatissimum |
| anchoring rate and orienting rate using each single map |
were lower than |
anchoring and orienting rates using consensus map |
Nelumbo nucifera |
| integration of Illumina sequencing, long-reads (PacBio), and optical mapping (Bionano Genomics) |
allowed assembly of |
Chlorella vulgaris 211/11P genome |
Chlorella vulgaris |
| draft unordered scaffolds |
contain |
chimeras caused by mis-scaffolding |
|
| long scaffolds of 'China Antique' draft assembly |
generated by |
sequencing of paired-end 20 kb insert library |
Nelumbo nucifera |
| new Spirodela polyrhiza genome map |
is at |
chromosome scale |
Spirodela polyrhiza |
| ON platform |
is |
promising technology for polishing genome assemblies |
|
| three reference-quality de novo assemblies |
include |
improved assembly of northern US accession Wm82 |
Glycine max |
| long-read sequencing technologies |
were coupled with |
highly accurate short-read sequencing |
Casuarina equisetifolia ssp. incana |
| ~14× coverage using PacBio-based long reads |
has potential for improvement in |
contig N50 |
Casuarina equisetifolia ssp. incana |
| PacBio sequencing |
was conducted on |
PacBio Sequel platform using two SMRT cells |
Selaginella kraussiana |
| Hi-C library |
was prepared as described in |
previous literature |
Selaginella kraussiana |
| PacBio long-reads |
used to construct |
high-quality genome assembly |
Ficus carica |
| haplotype phased assembly for fig |
was generated in combination with physical information from |
previous fig genome assembly |
Ficus carica L. |
| significant SNPs |
were found on |
22 different scaffolds |
|
| sequences |
were assigned to |
the chromosomes |
Ficus erecta |
| haplotigs |
had maximum size of |
1,220,129 bp |
Ficus carica |
| Prunus × yedoensis |
has |
diploid reference genome |
Prunus × yedoensis |
| short-read platforms |
cause limitations in |
de novo assembly |
|
| blackgrass genome assembly |
has |
total primary contig length of 3475 Mb |
Alopecurus myosuroides |
| DNA sequences from 144 samples |
were best matched to |
reference genomes of octoploid strawberry Fragaria × ananassa and diploid strawberry F. vesca |
Fragaria × ananassa; Fragaria vesca |
| Primula oreodoxa complex |
comprises |
35 complete plastid genomes |
Primula oreodoxa |
| genetic map |
was anchored to |
v1.3 assembly |
Marchantia polymorpha |
| Syntrichia ruralis |
has |
draft genome of 381.24 Mb |
Syntrichia ruralis |
| de novo assembler tools |
can generate |
contig and/or scaffold sequences |
|
| genome |
was assembled from |
PacBio long reads |
Beta vulgaris |
| genetic linkage maps |
utilized to guide |
pseudomolecule construction |
Glycine latifolia |
| 94 BNG supercontigs |
were obtained from |
148 BNG contigs |
Linum usitatissimum |
| none of these approaches alone |
can lead to |
complete genome sequence assembly with high accuracy |
|
| integration of Bionano data |
resulted in |
genome assembly in which 26 contigs were anchored into 14 scaffolds |
Chlorella vulgaris |
| 14 scaffolds of nuclear genome |
contained |
98.9% of assembled C. vulgaris 211/11P genome |
Chlorella vulgaris |
| PacBio and second-generation sequencing |
were used to construct |
high-quality genome assembly |
Casuarina equisetifolia |
| Illumina HiSeq 2000 system sequencing to 10× coverage |
entirely covered |
30% of genes on chromosome 3B |
Triticum aestivum |
| Hi-C links |
anchored and oriented |
639.1 Mb of contigs onto 18 pseudochromosomes |
Manihot esculenta |
| high-coverage and accurate long-read sequence data and multiple assembly strategies |
resulted in |
two gap-free genome assemblies of xian rice varieties ZS97 and MH63 |
Oryza sativa |
| reference sequence of a single 1 Gb chromosome of hexaploid wheat |
has not been completed |
5 years after publication of a physical map |
Triticum aestivum |
| longer sequence reads |
may speed up |
process of data collection and analysis |
|
| several millions of markers provided by NGS technology |
may be used to bring contigs into |
linear order |
|
| large number of detected SNPs |
is used to integrate |
sequence assembly with two established framework maps |
Hordeum vulgare |
| BAC clone sequencing |
resulted in better assembly of |
'ZS11' genome |
Brassica napus |
| genome assembly |
resulted in |
telomere-to-telomere mini-chromosome |
|
| CrusView |
may be used to perform |
karyotype-based genome assembly |
|
| long-read sequencing technologies |
aid production of |
highly contiguous reference genomes |
Nicotiana benthamiana |
| CrusView performance in pseudochromosome assembly |
depends largely on |
quality of scaffolds and contigs |
|
| 312 contigs |
anchored into |
12 chromosome-level super scaffolds |
Quercus dentata |
| assembly results of X/Y chromosomes |
showed |
good collinearity and high coverage with reference genome |
Spinacia oleracea |
| CaTrailin4 and Ca5609 genes |
contain |
long repetitive stretches |
Craspedostauros australis |
| genome assembly |
resulted in |
circularized mitochondrial genome |
|
| single-chromosome sequencing |
is suitable to verify |
assignment of DNA sequence contigs to individual pseudomolecules |
|
| SPAdes |
was used for the assembly of |
X/Y chromosome using Illumina and Nanopore sequencing data |
Spinacia oleracea |
| SOAPdenovo |
is |
de Bruijn graph-based assembly program |
|
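SOAPdenovo, like SPAdes, builds a de Bruijn graph: reads are cut into k-mers, each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and contigs are spelled along unbranched paths. A toy Python sketch of that idea; it ignores reverse complements, k-mer multiplicity and error correction, and it stops only where a node has several successors (a real assembler also stops where paths merge), so it is not SOAPdenovo's implementation.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix.

    Simplification: reverse complements and k-mer multiplicities are ignored.
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Spell a contig from `start` while each node has exactly one successor."""
    contig, node, visited = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:  # stop instead of looping around a cycle
            break
        visited.add(node)
        contig += node[-1]
    return contig


# Two overlapping toy reads collapse into one contig.
graph = de_bruijn_graph(["ACGTTG", "GTTGCA"], k=4)
print(walk_unambiguous(graph, "ACG"))  # ACGTTGCA
```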
| whole-genome shotgun (WGS) de novo assemblies |
have typically relied on |
long sequencing reads |
|
| chromosomal-level genome assembly for Syntrichia ruralis |
is derived from |
clonally propagated male gametophyte |
Syntrichia ruralis |
| initial genome assembly |
was based on |
21.3 Gb long reads with 62.5× coverage |
Ficus erecta |
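Sequencing depth is total read bases divided by genome size, so the two figures here imply the genome size the depth was computed against: 21.3 Gb / 62.5 ≈ 340.8 Mb. The source does not say which genome-size estimate was used; this Python sketch only does the arithmetic.

```python
def coverage(total_read_bases, genome_size):
    """Sequencing depth: total read bases divided by genome size."""
    return total_read_bases / genome_size


def implied_genome_size(total_read_bases, depth):
    """Genome size that a stated depth implies (inverse of coverage())."""
    return total_read_bases / depth


size = implied_genome_size(21.3e9, 62.5)
print(f"{size / 1e6:.1f} Mb")  # 340.8 Mb
```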
| primary contigs (FER_r1.0pctg) |
have total length of |
331.6 Mb |
Ficus erecta |
| gene models of M. polymorpha draft genome |
were lifted over to |
assembly |
Marchantia polymorpha |
| FragScaff program |
scaffolded |
genome |
Acer truncatum |
| whole genome sequences |
are available for |
Petunia axillaris and Petunia exserta |
Petunia axillaris; Petunia exserta |
| scaffolds Plvit038 and Plvit053 |
were confirmed to be contiguous within |
same genomic region |
Plasmopara viticola |
| CrusView |
provides function to infer |
pseudochromosome sequences |
|
| Y-chromosome |
has length of |
195 246 603 bp |
Spinacia oleracea |
| newly assembled contigs |
were long enough to span |
previously ambiguous regions |
Oryza sativa |
| genome assemblers such as CANU and FALCON |
enable |
building near-complete or gap-free rice genomes |
Oryza sativa |
| HiFi and ONT sequencing |
greatly improved |
assembly continuity |
Citrullus lanatus |
| Hi-C data |
permit |
assembly of large-scale scaffolds into pseudo-chromosomes |
Artemisia annua |
| Tang et al. (2022) |
released |
44 wild and cultivated diploid genome assemblies |
|
| assembly |
can be considered truly complete only if |
full genome sequence is derived for each individual copy |
|
| AT3G41762 |
was found in |
26-kb misassembled region in TAIR10 |
Arabidopsis thaliana |
| Col-PEK assembly |
provides |
long-awaited key resource for the plant community |
Arabidopsis thaliana |
| single-chromosome sequencing |
is better suited for |
accurate chromosome-level assembly of genomes with high content of repetitive sequence |
|
| final polished blackgrass genome assembly |
contains |
seven pseudo-chromosomes |
Alopecurus myosuroides |
| HiFi+HiC assembly |
has higher contiguity than |
new N. benthamiana genome assembly using Chromium™ linked-read sequencing (10× Genomics) and Nanopore sequencing (Oxford Nanopore) |
Nicotiana benthamiana |