
genome assembly

12901 relationships annotated with this phrase. Showing first 500 of 12901.
Source entity Relationship Target entity Species
de novo assembler tools cannot generate complete chromosome sequences
new draft Nicotiana benthamiana genome was assembled from Pacific BioScience Highly Accurate Long Read Sequencing reads (PacBio HiFi) Nicotiana benthamiana
maize B chromosome introgressed into the B73 inbred line was sequenced using a combination of chromosome flow sorting, Illumina sequencing, Bionano optical mapping, and high-throughput chromatin conformation capture (Hi-C) Zea mays
new assembly (80.6 Mb) is comparable to previous assembly results Plasmopara viticola
leaf data were assembled into 316537 catalogue loci
genome assembly resulted in chromosome 1 with telomere repeat sequence on one end
pseudochromosome sequence inference is based on conserved genomic blocks
Pseudomolecule scaffolding produced 15 scaffolds corresponding to the haploid chromosome number Ipomoea purpurea
Spinacia oleracea genome was used as reference genome Spinacia oleracea
final polished blackgrass genome assembly contains unanchored sequences Alopecurus myosuroides
DNA sequences from 144 samples were classified into seven pseudomolecules of F. vesca Fragaria vesca
limitation of de novo assembly makes chromosome-scale structural variation analysis difficult
short next-generation sequencing (NGS) reads would result in highly fragmented assemblies Hordeum vulgare
Oxford Nanopore long reads produced complete genome assembly
DM1-3 genome based on cultivated diploid species Solanum tuberosum group phureja Solanum tuberosum
scaffolds Plvit038 and Plvit053 were confirmed by their juxtaposition on primary contig (Primary_000014F) and haplotig (Haplotig_000014F004) Plasmopara viticola
Oxford Nanopore long reads produced for Quercus dentata genome assembly Quercus dentata
Y chromosome has assembly length of c. 195.2 Mb Spinacia oleracea L. subsp. turkestanica
Illumina HiSeq 2000 system sequencing to 10× coverage resulted in 60% coverage of chromosome 3B reference Triticum aestivum
genome assembly resulted in telomere-to-telomere assemblies of chromosomes 2–7
new assembly (80.6 Mb) falls intermediate between SMRT sequencing assembly (92.94 Mb) and Illumina assembly (74.74 Mb) Plasmopara viticola
Nanopore reads and Illumina reads polished genome assembly
34.79 gigabases of trimmed and self-corrected reads produced 602 Mb assembly in 402 scaffolds Ipomoea purpurea
scaffold sequences are generated by de novo assemblers
Quercus dentata assembled genome has contig N50 of 4.8 Mb Quercus dentata
two scaffolds (Plvit038 and Plvit053) indicate close physical proximity of sequences Plasmopara viticola
long-read sequencing technologies from Oxford Nanopore Technologies and Pacific Biosciences allowed complete assembly of centromeric DNA regions
Canu assembler produced genome assembly
Amaranthus hypochondriacus assembly is highly contiguous with 16 chromosome-scale scaffolds Amaranthus hypochondriacus
karyotype information in Brassicaceae was used to develop KGBassembler (Karyotype-based Genome assembler for Brassicaceae)
Quercus dentata assembled genome composed of 312 contigs Quercus dentata
X/Y-chromosome assembly result was splintered with RagTag software Spinacia oleracea L. subsp. turkestanica
B71 isolate has reference genome sequence Magnaporthe oryzae
CrusView pseudochromosome inference feature is especially convenient for nonmodel species lacking genetic and/or physical map
genetic anchoring of individual clones will enable positioning of singleton clones Hordeum vulgare
single-chromosome sequencing produces assembly results much better than maize B chromosome assembly Spinacia oleracea; Zea mays
B71 genome (B71Ref1) contained five scaffolds from the mini-chromosome
de novo assembler tools cannot generate complete chromosome sequences without genetic and/or physical maps
Hi-C paired-end reads used to assist assembly correction and chromosome anchoring Quercus dentata
physical contigs in centromeric regions lack a resolved relative order Hordeum vulgare
6,243 (99.4%) of all sequenced BACs harbored WGS contigs Hordeum vulgare
haplotigs had mean contig size of 58,872 bp Ficus carica
B73 RefGen_v1 forms 61,161 scaffolds Zea mays
B73 RefGen_v3 includes 1,844 gene space contigs Zea mays
B73 RefGen_v2 improved v1 by integrating genetic and optical map information Zea mays
B73 RefGen_v2 has approximately 80% of maize genome ordered and oriented Zea mays
clones are assembled into physical contigs Hordeum vulgare
whole-genome shotgun (WGS) contig morex_contig_94710 contains photoperiod-H1 Hordeum vulgare
B73 RefGen_v2 improved v1 by addition of fosmid reads Zea mays
Canu assembler is non-hybrid assembler
Scutellaria baicalensis has diploid reference genome Scutellaria baicalensis
early Nicotiana benthamiana genome drafts were fragmented due to short reads Nicotiana benthamiana
Casuarina equisetifolia has high-quality genome assembly Casuarina equisetifolia
Illumina short reads corrected errors in genome assembly Acer truncatum
13 long super-scaffolds (pseudochromosomes) represent 99.44% of final genome assembly Acer truncatum
two counterpart genes s01133g27051 and s01133g27052 were located in 28 882–34 561 region of seq001133 scaffold sequence Ficus carica
haplotigs had N50 of 89,539 bp Ficus carica
long-read phased assemblies provided suitable solution with minimal assembly and gene prediction errors Ficus carica L.
Pacific Biosciences (PacBio) long-reads sequencing was used to generate diploid reference genomes Durio zibethinus; Scutellaria baicalensis; Prunus × yedoensis
pseudomolecule sequences were named FER_r1.1.pseudomolecule dataset Ficus erecta
pseudomolecules had sizes varying between 20.5 and 29.5 Mb Marchantia polymorpha
407 primary contigs corresponded to 266,522,563 bp Ficus carica
resulting super-scaffolds from scf_v1 had N50 of 64.014 Mb
third generation genomic technologies enable simple and cost-effective solutions for chromosomal-level assemblies
FALCON-Unzip algorithm uses PacBio sequencing data
additional wheat variety assemblies are less contiguous than Chinese Spring reference assembly Triticum durum; Triticum aestivum
ONT sequencing of another Arabidopsis accession enabled resolution of quantitative trait loci (QTL) previously recalcitrant to BAC sequencing Arabidopsis thaliana
polyploidy is ongoing challenge in genome assembly
next-generation sequencing (NGS) technologies facilitate de novo assemblies of plant genomes
unphased genomes can have differing random haplotypes represented in a single genome assembly sequence
A. truncatum genome was assembled into 13 pseudochromosomes Acer truncatum
Marchantia polymorpha genome consists of nearly 3000 scaffolds Marchantia polymorpha
estimated size of interscaffold gaps increased total length of map to 208 823 686 bp Marchantia polymorpha
de novo draft genome is assembled for Petunia parodii Petunia parodii
elite cultivated variety Zhongsizhu 1 subjected to whole-genome sequencing Boehmeria nivea
chromosome-level pseudomolecule sequences were assembled for Ficus erecta Ficus erecta
Dottato assembly had higher percentage of annotated genes than previously produced fig genome assembly Ficus carica
genome-wide comparison between primary contigs and haplotigs identified 903,428 single nucleotide polymorphisms and indels Ficus carica
hybrid assembly integrated with high-density genetic map
sequences with greater than 4–5% variation are often retained as primary contigs Manihot esculenta
Hi-C technology employed for scaffolding of Gastrodia elata genome contigs Gastrodia elata
repeats in genome sequence cause difficulty in genome assembly
local similarity between homeologous chromosomes can introduce similar challenges as heterozygosity
Oxford Nanopore Technologies (ONT) nanopore sequencer enabled assembly of more contiguous and complete versions of Banana reference genome
genetic map anchoring arranged scaffolds into eight linkage groups corresponding to eight autosomes Marchantia polymorpha
assembly of complex plant genomes is challenging task
genome anchoring process produced set of 13 pseudomolecules Ficus carica
centromere regions are typically difficult to assemble due to highly repetitive composition
long-read sequencing allowed assembly of Ficus erecta genome sequence with high contiguity Ficus erecta
haplotigs overlapping flanking regions probably due to heterozygosity and structural variations Ficus carica
genome anchoring process associated 407 primary contigs to fig chromosomes Ficus carica
long-read sequencing and high depth coverage produced genome of high sequence contiguity and accuracy Ficus carica L.
Hi-C data were used to assemble chromosome-scale scaffolds Acer truncatum
primary contigs had minimum size of 20,012 bp Ficus carica
DBG2OLC and Canu assemblies produced lower N50 value compared with FALCON Ficus carica
diploid FALCON-Unzip assembler produced primary set of contigs Ficus carica
chromosome-scale scaffolds were assembled into 13 long super-scaffolds (pseudochromosomes) Acer truncatum
assembly process produced 333,400,567 bp of fig genome sequence Ficus carica
haplotigs used to estimate heterozygosity of Dottato cultivar Ficus carica
chloroplast genome represented by circularized sequence of 160,594 bp Ficus carica
bottle gourd reference genomes were assembled from Illumina short reads Lagenaria siceraria
k-mers that contained short reads pertaining to missing heterozygous sequence assembled using SPAdes Manihot esculenta
haplotig N50 size is consistent with relatively short dispersed regions of heterozygosity Manihot esculenta
'pseudohap' mode generates pseudo-haplotype contigs by collapsing alternate sequences from phased haplotigs with homozygous sequence from primary assembly Manihot esculenta
plant mitogenomes are often assembled as single circular chromosome termed 'master circle'
old and recent bursts of transposable elements render plant genome assembly challenging
primary contigs had N50 of 823,517 bp Ficus carica
total sequence assembled is approximately 1 Gb Manihot esculenta
wild emmer genome sequence (WEW_v1.0) was assembled from whole-genome shotgun (WGS) reads Triticum turgidum ssp. dicoccoides
de novo assemblies for each individual of a population using 10X Genomics have only minor increase in costs
assembly of allopolyploids works surprisingly well even without dedicated assembly methods
first plant genome based completely on Pacific Biosciences (PacBio) single molecule real time (SMRT) sequencing resulted in fourth most contiguous genome at the time Oropetium thomaeum
quantitative trait loci (QTL) repeat structure required reads greater than 20 kb Arabidopsis thaliana
genome assembly spanned 331.6 Mb with 538 contigs and N50 length of 1.9 Mb Ficus erecta
FALCON assembler was chosen as most appropriate assembler Ficus carica
primary assembly used in downstream annotation and DNA modification analysis Ficus carica
long-read assembled contigs integrated with 301 single-molecule genome maps from NanoChannel Arrays
collapsed haplotype representation of the genome results in artifacts in highly heterozygous plants Manihot esculenta
modified coords2hp.py script resulted in one complete set of 9925 contigs comprising approximately 720 Mb for each phase Manihot esculenta
Rubrum wintersweet genome V1.0 assembly has contig N50 of 8.13 Mb Chimonanthus praecox
Gastrodia elata genome contigs have total length of 1.043 Gb Gastrodia elata
Gastrodia elata genome contigs cover 95% of Gastrodia elata genome Gastrodia elata
Cucumis melo L. ssp. agrestis IVF77 was subjected to Hi-C assembly Cucumis melo
physical map linkages can generate chromosome-scale, fully phased diploid genome assemblies
Phase1 contigs aligned to scaffolded Phase0 assembly Manihot esculenta
1152 possible ways to assemble contigs 2–15 includes 288 assembly paths that could generate a closed circle Glycine max
scaffold 16 of DASZ anchored to chromosome 14 Camellia sinensis
IWGSC RefSeq v2.1 assembly is major advance for applied and basic applications
generation of SLRs for de novo assembly requires high amounts of short read coverage
optical map assemblies have a bias to break at regions with closely-spaced restriction sites
10X Genomics linked read approach used only short-read data of a single sequencing library at modest coverage Homo sapiens
Oxford Nanopore Technologies (ONT) nanopore sequencer enabled assembly of more contiguous and complete versions of Sorghum reference genome
Hi-C routinely enables resolution of genomes into chromosomes
linear contigs connected by a single edge is simplest instance of genome assembly graph
pseudomolecule sequence names and directions were assigned in accordance with previously described high-density genetic map for fig Ficus carica
final map had total length of 198 443 496 bp Marchantia polymorpha
Potentilla micrantha has high-quality genome assembly Potentilla micrantha
centromere regions remain largely unknown in short-read assemblies
three pairs of SNPs were co-located on the same scaffolds
FALCON-Phase coords2hp.py script truncated contigs and haplotigs at ends of alignments Manihot esculenta
optical map-guided assembly guided revision of approximately 10% of genome sequence
successful assembly of centromeric and telomeric regions of walnut hybrid suggests that IWGSC sequence of CS genome can be further refined
linked short read data from 10X Genomics platform enable de novo assemblies for each individual of a population
single molecule sequencing enabled near complete chromosomes
single-molecule reads produce draft assemblies with high contiguity
long read assembly algorithms can disentangle divergent haplotypes
connections between contigs are represented by edges
advancements in assembly methods enable construction of high-quality reference genomes
long-read sequencing and advanced scaffolding methods enabled first round of highly contiguous, almost complete T2T genomes
resulting alternate assembly contained over 341 Mb assembled in haplotigs Manihot esculenta
complete mitogenome (∼403 kb) from landrace (LR; 'Aiganhuang') is reported in soybean Glycine max
remaining 12 CcCIPK genes were assigned to Unplaced Scaffold Cajanus cajan
individual research groups can sequence and assemble genomes they are interested in
high repetitiveness due to transposable elements displays unique challenges
de novo assembly itself requires high coverage
haplotigs from 10X Genomics assembly achieved N50 values of up to 9 Mb for assembly of individual sets of chromosomes Homo sapiens
concept of the reference genome is replaced by pan-genome
hairballs with thousands of nodes and edges are common and are likely driven by complex genome features and high copy number repeats
NanoChannel Array (BioNano) provides high-resolution genome maps
haplotig truncation lengths compared with contig vs. haplotig alignment lengths Manihot esculenta
importance of assembling both haplotypes suggests many large haplotypic SVs might be present with potential impact on gene expression or function Manihot esculenta
high read depth of contig 15 infers that sequence of contig 15 is duplicated in the Wm82 mitogenome Glycine max
Hi-C chromatin contact data used for assembly of 18 Gastrodia elata chromosomes Gastrodia elata
Cucumis metuliferus CM27 has 12 chromosomes Cucumis metuliferus
eggplant has CL assembly based on combination of Illumina, Nanopore, 10X Genomics, and Hi-C scaffolding Solanum melongena
durum wheat cv. Svevo reference sequence was published following bread wheat cv. Chinese Spring reference sequence Triticum turgidum ssp. durum
scf_v3 super-scaffolds were ordered and oriented through alignment on high-density genetic maps
contigs within scaffolds retained gaps filled with Ns
IWGSC RefSeq v2.1 pseudomolecules had effective length of 14 316 999 506 bp
latest assembly tools promise to simplify practical assembly procedure
read pairs generated from the two ends of Hi-C fragments come from closely linked DNA
assembly of Brassica juncea featured scaffold N50 of 1.5 Mb Brassica juncea
assembly of Brassica juncea was generated with short and long-read sequencing data combined with optical maps Brassica juncea
physical mapping technologies enabled high-quality, chromosome scale assemblies
large SVs between haplotypes could affect how FALCON-Phase aligns and places haplotigs vs. primary contigs Manihot esculenta
gaps in IWGSC RefSeq v1.0 assembly were closed using contigs built from WGS PacBio SMRT CS long reads Triticum aestivum
CS optical maps relocated 233 scaffolds
optical mapping is third generation genomic technology
contig scaffolding typically starts by ordering contigs using alignments of paired reads
sequence contigs can be used to control for errors in optical maps
optical map information can be integrated during sequence assembly
chromosome conformation capture sequencing (Hi-C) is elegant solution to challenges of chromosome-scale assembly
bridging between neighboring polymorphisms could thereby distinguish the homeologous chromosomes
shotgun sequencing and OLC assemblers were adopted for papaya genome
Oxford Nanopore Technologies (ONT) nanopore sequencer enabled assembly of more contiguous and complete versions of Arabidopsis reference genome Arabidopsis thaliana
read mapping coverage and sequence homology identify potential haplotigs Manihot esculenta
combining multiple independent sources of information produces a consensus less likely to suffer from technology-specific issues
challenges of plant genome assembly are determined by genome size and increased levels of repetitiveness
long DNA fragments derived from dilution methods can be used for assembly of haplotypes
Oxford Nanopore Technologies (ONT) nanopore sequencer enabled assembly of more contiguous and complete versions of Tomato reference genome
complex sequences and high copy number repeats (LTRs, centromeres, etc.) can create indiscernible hairballs of thousands of interconnected nodes with no clear paths
Dottato assembly had higher percentage of annotated repeats than previously produced fig genome assembly Ficus carica
heterozygosity and structural variations led to complications during assembly process Ficus carica
one complete set of 9925 contigs for each phase included almost all original heterozygosity assembled Manihot esculenta
average read depth of contig 15 was about twice of average read depth of other contigs Glycine max
single-molecule real-time (SMRT) sequencing and Hi-C technology were combined to obtain high-quality genome assembly of Cucumis metuliferus Cucumis metuliferus
pseudomolecules_v2.0 along with ChrUn constituted intermediate IWGSC RefSeq v2.0 assembly
complex repeats such as ribosomal RNA (rRNA) or centromeric satellite DNA can create higher-order ambiguities in the graph structure
long-read assembly supplemented with short-read contigs (SRC) Manihot esculenta
contigs 2–15 with large repeats at termini can be assembled in 1152 possible ways into a minimal complete genome Glycine max
large, polyploid, repeat-rich genomes require innovative strategies for assembly
single-molecule sequencing technologies reinforced the need to choose sequencing and assembly strategies
WTDBG2 can leverage corrected reads
DNA sequencing advances have included improved scaffolding
extra SRC included missing heterozygous sequence Manihot esculenta
bioinformatics algorithms for de novo assembly are available
high proportion of repeated sequences complicates assembly of reference-quality genome sequence Triticum aestivum; Triticum turgidum ssp. durum
Hi-C heat map inspection and telomere signature sequence placement were used to adjust and confirm final chromosomal scaffolds Amaranthus cruentus
single reference assembly does not reflect gene diversity of a species
structural variant analysis is emerging frontier in plant genome assembly
high-density genetic maps were traditionally used to anchor contigs and scaffolds into chromosomes
long read assembly algorithms can accurately correct divergent haplotypes
long read assembly algorithms lead to assemblies that exceed monoploid genome size
coords2hp.py script in FALCON-Phase modified to force inclusion of entire length of each haplotig Manihot esculenta
TME7 genome assembly successfully assembled almost entirety of TME7 genome Manihot esculenta
Chr-m2 has length of 63 kb Glycine max
SMRT sequencing and high-throughput chromosome interaction mapping produced high-quality chromosome-level C. metuliferus genome assembly Cucumis metuliferus
wild emmer wheat genome sequence was first reported for wild emmer wheat Triticum turgidum ssp. dicoccoides
resulting super-scaffolds from scf_v2 alignment to NLRS had largest recording length of 364.575 Mb
De Bruijn graph (DBG) assembly methods handle shorter reads sequenced at greater depth
polyploid genomes present challenges for genome sequencing
CS optical maps reoriented 354 scaffolds
genome-wide (consensus) maps can be used to scaffold the contigs of a corresponding sequence assembly
early genome assemblies included millions of contigs Triticum durum; Triticum aestivum
single molecule reads provide opportunities to untangle genomic regions missed by short read technologies
ONT sequencing reduced assembly to 40 contigs that spanned chromosome arms (telomere to centromere) Arabidopsis thaliana
Hi-C assembly mapped 97% of wild ramie sequences into 14 pseudomolecules Boehmeria nivea
wild ramie assembly has contig N50 length of 10.51 Mb Boehmeria nivea
benefits of accurate phasing outweigh additional minor duplication Manihot esculenta
complementation between phases preserves existence of potentially crucial single copy genes Manihot esculenta
Cucumis metuliferus CM27 was subjected to Hi-C assembly Cucumis metuliferus
PacBio-based assembly of Arabis alpina was 337 Mb long Arabis alpina
heterozygosity is ongoing challenge in genome assembly
903,428 single nucleotide polymorphisms and indels accounted for overall heterozygosity of approximately 2.7 polymorphisms per kilobase Ficus carica
potential of MGEs to be shared between bacteria makes generating high-quality metagenome assemblies difficult
long-read and short-read sequencing platforms generate plasmids
FALCON-Phase discarded over 40 Mb of sequence from both primary and haplotig assemblies Manihot esculenta
polyploidy complicates assembly of reference-quality genome sequence Triticum aestivum; Triticum turgidum ssp. durum
long-read sequencing and long-range scaffolding methods enable assembly of entire chromosomes
optical mapping and Hi-C data are complementary and can help bridging different regions in the genome
scaffold N50 values of close to 20 Mb outperformed contiguity of long-read assembly contigs Homo sapiens
no de novo assembly of an autoploid plant exists that reconstructs each of the homeologous chromosomes separately
assembly of autoploid species has used diploid or even haploid individuals
high-quality plant reference genomes were assembled using minimum tiling path of BACs sequenced with Sanger technology
second generation sequencing technologies such as 454 and Illumina spurred the development of De Bruijn graph (DBG) assembly methods
complex plant genomes have repeat structures greater than 20 kb
diploid assembly maximized to include as much haplotypic sequence as possible Manihot esculenta
673 super-scaffolds anchored on 21 wheat chromosomes produced pseudomolecules_v1.1
polyploid genomes have increased repetitiveness
assembly of the gene space of the hexaploid genome of wheat was first attempted with simple whole-genome shot-gun approaches Triticum aestivum
longer reads enable assembly of longer contigs
genome size, heterozygosity, and repeat content estimates inform sequencing strategy and depth
next-generation frameworks integrate long-read and short-read data, optical maps, and Hi-C signals
Bill and Melinda Gates Foundation funded plant genome assembly and annotation work
long-read and short-read sequencing platforms generate circularized genomes
'unzip'-emit-style haplotigs used for comparison with primary assembly Manihot esculenta
Rubrum wintersweet genome V1.0 assembly is approximately 96.17% of predicted genome size Chimonanthus praecox
computational strategies for data integration are less straightforward as compared to de novo assemblies
Peregrine uses sparse hierarchical minimizers (SHIMMER)
raw reads can be mapped to resolve missing haplotype regions
extraction stage involves DNA extraction using optimized protocols for LMW or HMW DNA
long-read sequencing overcomes problems with short-read sequencing assembly
SNP and indel calling from aligned Illumina reads showed similar heterozygosity results Ficus carica
recent genomic approaches have been applied to recover complete genomes from herbarium material
powerful scaffolding strategies enable fully phased, chromosome-scale, telomere-to-telomere assemblies
de novo assembly of long reads from Nanopore and PacBio libraries yielded 270.2-Mb genome sequence for wild ramie Boehmeria nivea
goal to purge primary assembly used Purge_dups Manihot esculenta
primary and haplotig assemblies phased using Hi-C data and FALCON-Phase Manihot esculenta
results show accurately produced one full haplotype assembly of TME7 and second alternate assembly containing most haplotypic variation Manihot esculenta
Rubrum wintersweet genome V1.0 assembly comprises 661 contigs Chimonanthus praecox
it is not clear how well SLR assembly works if fragments contain repeats
individual fingerprints (or maps) can be assembled into genome-wide (consensus) maps
hexaploid genome of wheat has size of 17 Gb Triticum aestivum
partially phased genomes can be collapsed into chimeric monoploid
contig N50 is quality metric for draft genome assembly
DNA sequencing advances have included optical mapping
Two chromosome-level genomes represent wild and cultivated ramie Boehmeria nivea
TME7 assembly represents significant improvement over previous attempts at assembly of heterozygous African cassava lines Manihot esculenta
11 super-scaffolds have total length of 737.03 Mb Chimonanthus praecox
plant-specific genome assemblers consider complexity of plant genome
Oxford Nanopore Technology sequencing data were assembled using Wtdbg2 Amaranthus cruentus
first plant genomes assembled from PacBio data alone were from Arabidopsis thaliana Arabidopsis thaliana
first whole-genome assemblies using Oxford Nanopore data have reached N50 values of multiple hundred kb for fungal genomes
genome assembly algorithms are designed to correct, overlap, and polish long reads with high error-rates
Planning stage involves short-read sequencing and quality control to estimate genome properties
short-read sequencing produced hundreds of assemblies with expanded gene-space coverage
short-read contigs (SRC) contained additional heterozygous sequence Manihot esculenta
extra SRC contained duplicates of already assembled sequences Manihot esculenta
'Williams 82' nuclear genome is first assembled and widely used reference genome in soybean Glycine max
pseudomolecules_v1.1 acquired most of intra-scaffold gaps
sizing gaps of previously unknown sizes allows local scale to be assessed
Illumina and 10× Illumina reads were used with pilon for polishing to produce final assembly Amaranthus cruentus
chromosome conformation capture was introduced to overcome weaknesses of short-read assemblies
contiguity improvement gained by integration of optical maps depends on different factors including contiguity of the assembly before integration and contiguity of optical consensus maps
integration of optical maps in the best cases led to reconstruction of entire chromosomes
integration of Dovetail Genomics read pairs into the PacBio assembly of A. alpina improved assembly contiguity about as well as optical mapping data Arabis alpina
haplotypes can also be assembled from long-range scaffolding methods
assembly stage involves long-read sequencing producing progressive genome assemblies
Large Language Models (LLMs) adapted for biological sequences and genome graphs may help transform phasing and haplotyping of complex plant genomes
low sequencing depth for plastid genome resulted in generation of only selected plastid markers Sartidia isaloensis
this article highlights both current best practices and the opportunities on the horizon
large language models (LLMs) may help disentangle haplotypes across highly repetitive and centromeric regions
goal to assemble full heterozygous diploid phased assembly sought to maximize amount of haplotypic sequence assembled Manihot esculenta
v3.0/v4.0 assembly has anchored to 12 chromosomes 1.11 Gb of sequences containing 96.4% of predicted genes Solanum melongena
673 super-scaffolds sharing two or more markers with genetic maps were anchored on 21 wheat chromosomes
Hi-C enables resolution of genomes into chromosomes even for allotetraploids like Eragrostis tef Eragrostis tef
BAC-by-BAC genome assembly approach limited high-quality genome assemblies to a few model organisms
sample selection, preservation, and extraction of high-molecular-weight (HMW) DNA continue to represent major bottlenecks
continued progress in plant genome assembly promises deeper resolution of polyploid complexity and more consistent gene predictions
high-quality reference genome for Hangzhou Gourd was assembled using aforementioned approaches Lagenaria siceraria
assembled genome has size of 1.20 Gb Euscaphis japonica
SLRs have been used to assemble a few eukaryotic genomes with genome sizes of some hundred Mb
very large genomes present challenges for genome sequencing
repetitive features such as rDNA (45 S, 5 S), centromeres, and telomeres are difficult features to assemble in telomere to telomere (T2T) efforts
polished hybrid assembly is assembly ZAAS_Lsic_2.0
TME7 genome assembly used target haploid size of approximately 700 Mb Manihot esculenta
graph-based assembly approach used in FALCON-Unzip complemented by other orthogonal tools developed to extract haplotypic sequences Manihot esculenta
genome for each accession was de novo assembled using Megahit into contigs larger than 500 bp Solanum melongena
optical mapping is one of the novel technologies that emerged to improve scaffolding
revolution in plant genomics resulted in many lower quality assemblies
primary and alternative haplotypes can be collapsed into single, non-redundant but chimeric pseudomolecule
Verkko is assembly algorithm
recent T2T assemblies of Arabidopsis provide first complete views of entire centromeres Arabidopsis thaliana
assembly algorithms can accept HiFi and ONT data
genome size and heterozygosity ratio were estimated using tools in GenomeScope v2.0 Selaginella kraussiana
diploid assembly is approximately double the size of haploid consensus genome Prunus persica
draft genome sequences have been generated in a number of species
improved assembly algorithms enable fully phased, chromosome-scale, telomere-to-telomere assemblies
de novo assembly of the P. tomentosa genome might help to improve results in the future Populus tomentosa
Assembly, scaffolding, and gene prediction are inherently iterative processes with double-headed arrows indicating feedback
Large Language Models (LLMs) adapted for biological sequences and genome graphs may help address challenges in polyploid and aneuploid plant genomes
Picea abies genome is available at ConGenIE.org database Picea abies
Quercus dentata assembled genome has total size of 893.54 Mb Quercus dentata
12 chromosome-level super scaffolds have scaffold N50 of 75 Mb Quercus dentata
49,719 reads were mapped on S. perrieri nuclear ribosomal DNA Sartidia perrieri
Illumina library construction and short-read sequencing were performed using Illumina HiSeq 2500 platform Selaginella kraussiana
CaTrailin4 and Ca5609 genes were truncated at 5' ends in short-read genome assembly Craspedostauros australis
diploid-aware procedure segregates highly divergent alleles into haplotigs Plasmopara viticola
pseudochromosome sequences are inferred from scaffold sequences
contigs and quality-filtered reads from GS FLX platform were assembled together using Newbler v2.5.3 Cicer arietinum
assembled sequence has N50 index of 931 Cicer arietinum
sacred lotus genome sequenced using high-quality sequencing data sacred lotus
short RenSeq Illumina reads can be assembled de novo with high accuracy to generate contigs representing the majority of the NB-LRR loci
clone-by-clone strategy should not be abandoned in favor of POPSEQ Hordeum vulgare
'Kasalath' physical map contained BAC clones covering 17 physical gaps in latest genomic sequence of 'Nipponbare' Oryza sativa
Gossypium physical maps serve as frameworks for anchoring and ordering assembled sequences into reference allotetraploid cotton genome Gossypium hirsutum
three loops comprise overlapping or similar sequence reads Fritillaria imperialis
ABySS assembly produced 304 948 126 bases of assembled sequences Cicer arietinum
low-copy-number genome assembly (LCG) obtained from non-repetitive reads Triticum aestivum
issues of assembly quality are most likely due to low read depth and presence of two homoeologous copies for each genomic region Hordeum bulbosum
next-generation sequencing (NGS) technology produces hundreds of thousands of sequence contigs
this study identified contigs not presented in haploid genome Citrus sinensis
enriched data set is mapped onto pseudo-chromosomes Triticum monococcum
reference genome was derived from doubled monoploid of an adapted diploid S. tuberosum Group Phureja clone Solanum tuberosum
Pacific Biosciences long-read sequencing combined with Illumina reads created new high-quality genome assembly Craspedostauros australis
high-density genetic maps generated by genotype sequencing are essential resources for sequencing and assembling the pineapple genome Ananas comosus
DNA regions encoding four of eight C4 amino acid transitions were available for at least one Sartidia accession Sartidia perrieri; Sartidia dewinteri; Sartidia isaloensis
insufficient sequencing coverage caused difficulties in assembly of complete cpDNA Sartidia isaloensis
313,839 reads were assigned to S. dewinteri chloroplast genome Sartidia dewinteri
S. isaloensis has mean sequencing depth on low-copy genes of 0.4× Sartidia isaloensis
Selaginella moellendorffii genome v.1.0 is available at Phytozome database Selaginella moellendorffii
HAP1 genome assembly has contig N50 of 23.3 Mb Prunus persica
quality of the assembly after each step was assessed using BUSCO v5.2.2 Selaginella kraussiana
WGS assembly contains 81% genome coverage in contigs and 85% genome coverage in scaffolds Linum usitatissimum
synteny between Arabidopsis and Brassica rapa was used to assemble Arabidopsis cDNA sequencing dataset into Brassica rapa based pseudo-chromosomes Arabidopsis thaliana; Brassica rapa
three released NGS cucumber genome assemblies comprise approximately 4000 de novo assembled scaffolds accounting for approximately 55–65% of the 367 Mbp cucumber genome Cucumis sativus
primary assembly scaffolded using bacterial artificial chromosome (BAC)-end sequences Cicer arietinum
de novo sequence assembly for read mapping requires two to eight lanes of HiSeq sequencing
BAC contigs varied among chromosomes Oryza sativa
146 transcripts located within genomic regions corresponding to physical gaps on 'Kasalath' chromosome 1 Oryza sativa
'Kasalath' physical map contains BAC clones covering genomic regions corresponding to 17 of the 'Nipponbare' gaps Oryza sativa
six ppc genes (ppc-aL1a, ppc-aL1b, ppc-aL2, ppc-aR, ppc-B1, ppc-B2) had few reads recovered in Sartidia species Sartidia perrieri; Sartidia dewinteri; Sartidia isaloensis
972 bp (6.3%) were available for all three Sartidia species Sartidia perrieri; Sartidia dewinteri; Sartidia isaloensis
barley physical map contains contigs larger than 904 kilobases Hordeum vulgare
two assemblies available for wheat based on same set of 454 reads Triticum aestivum
assemblies from short RenSeq reads do not span whole gene models
Illumina 76-bp sequences is mainly due to the number of large gene sub-families and the high sequence similarity between paralogs and alleles
novel assembly algorithms may speed up process of data collection and analysis
construction of large-insert mate-pair libraries is not straightforward and often yields sub-optimal results Hordeum vulgare
reads assembled using same program as previously resulted in assembly with shorter cumulative length (1.6 Gb versus 1.9 Gb) Hordeum vulgare
genome de novo assembly of the mutant pool yielded 15,364 scaffolds Chlamydomonas reinhardtii
anchoring of draft genome scaffolds to high-density linkage map developed chromosome-level cucumber draft genome assembly Cucumis sativus
large amount of repetitive DNA sequences in most plant genomes pose significant technical challenges in sequencing quality and assembly accuracy
protocol for single-chromosome isolation and sequencing will probably require further optimization to obtain good results in species with larger genomes
Syntrichia ruralis draft genome consists of 3211 scaffolds Syntrichia ruralis
new reference genome for Amaranthus hypochondriacus L. removes sequencing errors from previous assembly Amaranthus hypochondriacus
genome polishing demonstrates successful removal of small errors in the assembly Amaranthus hypochondriacus
genome was scaffolded with Hi-C proximity-guided assembly Beta vulgaris
flax BNG map could be used as backbone to refine flax WGS reference sequence Linum usitatissimum
35 candidate false joins represented 7.97% of the improved assembly Nelumbo nucifera
maize genotype W22 has whole-genome assembly Zea mays
large size of C. vulgaris 211/11P mitochondrial DNA is consistent with mitochondrial genomes of other green algae Chlorella vulgaris
final merged genome version of Casuarina equisetifolia has N50 for scaffold of 1.06 Mb Casuarina equisetifolia
centromere-specific tandem repeats are notoriously challenging to assemble de novo from short-read sequence data
optical mapping data (~1400×) obtained using BioNano Genomics technology Chlorella vulgaris
Populus alba assembly consisted of 464M of genome including 37,901 protein-coding genes Populus alba
chromosome conformation capture sequencing (Hi-C) will be valuable method to further improve assembly contiguity
C. rubella genome assembly was assembled using BAC and fosmid paired-end link support Capsella rubella
ONT read-based assemblies did not fully assemble rDNA arrays and centromeres Arabidopsis thaliana
One library was sequenced in 100-bp paired-end mode Selaginella kraussiana
low-quality sequences filtered from 25.88 Gb of reads Linum usitatissimum
Hi-C raw data (∼213×) were used to further assemble genome using ALLHiC v0.9.13 software with the parameter "-k 10 -e GATC" Selaginella kraussiana
S. dewinteri lacks ppc-aL1b gene Sartidia dewinteri
Hi-C library was sequenced in 150-bp paired-end mode on the Illumina NovaSeq 6000 platform Selaginella kraussiana
NextDenovo v2.5.0 with the parameter "genome_size = 129M" was used to generate draft assembly from the CLR reads (∼128×) Selaginella kraussiana
chromosomes were gapless Prunus persica
absence of complete nested Elbe retrotransposons in draft genome sequence reflects limited capability of next-generation sequencing assembly software
diploid assembly has comparable repeat sequences Prunus persica
next-generation sequencing assembly software has limited capability to resolve segmental duplications and homologous repeats with identities exceeding 85%
Chromosome-level assemblies are either not available or the quality has not been validated Cucumis sativus
Sequence information spanning entire gap regions allowed revision of length of ambiguous fragments Solanum tuberosum
BAC-based physical map of 'Kasalath' chromosome 1 covers approximately 93.9% of 'Nipponbare' chromosome 1 sequence Oryza sativa
39.4 million unmapped genomic reads were subjected to de novo assembly Citrus sinensis
'ZS11' genome assembly had ~95% (924 Mb) within 2003 longest scaffolds >1000 bp Brassica napus
relocation of 38 of 64 genes was confirmed by new version of melon genome (version 3.5.1) Cucumis melo
scaffold assembly was organized using ALLMAPS program into 20 chromosome-scale pseudomolecules (719.5 Mb) and 41 231 unplaced scaffolds (≥1 kb; 220.3 Mb) Glycine latifolia
short-read assemblies likely underestimate repeat prevalence
flax BNG optical map had N50 of 2.15 Mb Linum usitatissimum
final pseudo-chromosomes generated based on scaffold order using consensus map Nelumbo nucifera
whole-genome sequencing using different approaches and clones yielded two reference maps Spirodela polyrhiza
long-read ON sequence-based assemblies are able to resolve tandem repeat sequences Spirodela polyrhiza
different Wm82 sources explain differences between Wm82v2 and Wm82v4 assemblies Glycine max
curated and assembled Thellungiella parvula genome sequence has total size of 137.09 Mb Thellungiella parvula
in silico mapping performed against completed genomic sequence of 'Nipponbare' rice Oryza sativa
PCR-based BAC screening extended physical size of BAC contigs Oryza sativa
Hifiasm software achieved haplotype-resolved assembly Prunus persica
HAP1 and HAP2 genomes are comparable to previously published peach genomes Prunus persica
flax BNG optical map showed improvement over flax WGS assembly Linum usitatissimum
BNG contig BNG28 was 2.95 Mb Linum usitatissimum
anchoring rate and orienting rate using each single map were lower than anchoring and orienting rates using consensus map Nelumbo nucifera
integration of Illumina sequencing, long-reads (PacBio), and optical mapping (Bionano Genomics) allowed assembly of Chlorella vulgaris 211/11P genome Chlorella vulgaris
draft unordered scaffolds contain chimeras caused by mis-scaffolding
long scaffolds of 'China Antique' draft assembly generated by sequencing of paired-end 20 kb insert library Nelumbo nucifera
new Spirodela polyrhiza genome map is at chromosome scale Spirodela polyrhiza
ON platform is promising technology for polishing genome assemblies
three reference-quality de novo assemblies include improved assembly of northern US accession Wm82 Glycine max
long reads sequencing technologies coupled with highly accurate short sequencing Casuarina equisetifolia ssp. incana
~14× coverage using PacBio-based long reads has potential for improvement in contig N50 Casuarina equisetifolia ssp. incana
PacBio sequencing was conducted on PacBio Sequel platform including two SMRT cells Selaginella kraussiana
Hi-C library was prepared as described in previous literature Selaginella kraussiana
PacBio long-reads used to construct high-quality genome assembly Ficus carica
haplotype phased assembly for fig in combination with physical information from previous fig genome assembly Ficus carica L.
significant SNPs were found on 22 different scaffolds
sequences were assigned to the chromosomes Ficus erecta
haplotigs had maximum size of 1,220,129 bp Ficus carica
Prunus × yedoensis has diploid reference genome Prunus × yedoensis
short-read platforms cause limitations in de novo assembly
blackgrass genome assembly has total primary contig length of 3475 Mb Alopecurus myosuroides
DNA sequences from 144 samples were best matched to reference genomes of octoploid strawberry Fragaria × ananassa and diploid strawberry F. vesca Fragaria × ananassa; Fragaria vesca
Primula oreodoxa complex comprises 35 complete plastid genomes Primula oreodoxa
genetic map was anchored to v1.3 assembly Marchantia polymorpha
Syntrichia ruralis has draft genome of 381.24 Mb Syntrichia ruralis
de novo assembler tools can generate contig and/or scaffold sequences
genome was assembled from PacBio long reads Beta vulgaris
genetic linkage maps utilized to guide pseudomolecule construction Glycine latifolia
94 BNG supercontigs were obtained from 148 BNG contigs Linum usitatissimum
none of these approaches alone can lead to complete genome sequence assembly with high accuracy
integration of Bionano data resulted in genome assembly where 26 contigs were anchored into 14 scaffolds Chlorella vulgaris
14 scaffolds of nuclear genome contained 98.9% of assembled C. vulgaris 211/11P genome Chlorella vulgaris
PacBio and second-generation sequencing were used to construct high-quality genome assembly Casuarina equisetifolia
Illumina HiSeq 2000 system sequencing to 10× coverage entirely covered 30% of genes on chromosome 3B Triticum aestivum
Hi-C links anchored and oriented 639.1 Mb of contigs onto 18 pseudochromosomes Manihot esculenta
high-coverage and accurate long-read sequence data and multiple assembly strategies resulted in two gap-free genome assemblies of xian rice varieties ZS97 and MH63 Oryza sativa
reference sequence of a single 1 Gb chromosome of hexaploid wheat has not been completed 5 years after publication of a physical map Triticum aestivum
longer sequence reads may speed up process of data collection and analysis
several millions of markers provided by NGS technology may be used to bring contigs into linear order
large number of detected SNPs is used to integrate sequence assembly with two established framework maps Hordeum vulgare
BAC clone sequencing resulted in better assembly of 'ZS11' genome Brassica napus
genome assembly resulted in telomere-to-telomere mini-chromosome
CrusView may be used to perform karyotype-based genome assembly
long-read sequencing technologies aid production of highly contiguous reference genomes Nicotiana benthamiana
CrusView performance in pseudochromosome assembly depends largely on quality of scaffolds and contigs
312 contigs anchored into 12 chromosome-level super scaffolds Quercus dentata
assembly results of X/Y chromosomes showed good collinearity and high coverage with reference genome Spinacia oleracea
CaTrailin4 and Ca5609 genes contain long repetitive stretches Craspedostauros australis
genome assembly resulted in circularized mitochondrial genome
single-chromosome sequencing is suitable to verify assignment of DNA sequence contigs to individual pseudomolecules
SPAdes was used for the assembly of X/Y chromosome using Illumina and Nanopore sequencing data Spinacia oleracea
SOAPdenovo is de Bruijn graph-based assembly program
whole-genome shotgun (WGS) de novo assemblies have typically relied on long sequencing reads
chromosomal-level genome assembly for Syntrichia ruralis is derived from clonally propagated male gametophyte Syntrichia ruralis
initial genome assembly was based on 21.3 Gb long reads with 62.5× coverage Ficus erecta
primary contigs (FER_r1.0pctg) have total length of 331.6 Mb Ficus erecta
gene models of M. polymorpha draft genome were lifted over to assembly Marchantia polymorpha
FragScaff program scaffolded genome Acer truncatum
whole genome sequences are available for Petunia axillaris and Petunia exserta Petunia axillaris; Petunia exserta
scaffolds Plvit038 and Plvit053 were confirmed to be contiguous within same genomic region Plasmopara viticola
CrusView provides function to infer pseudochromosome sequences
Y-chromosome has length of 195 246 603 bp Spinacia oleracea
newly assembled contigs were long enough to span previously ambiguous regions Oryza sativa
genome assemblers such as CANU and FALCON enable building near-complete or gap-free rice genomes Oryza sativa
HiFi and ONT sequencing greatly improved assembly continuity Citrullus lanatus
Hi-C data permit assembly of large-scale scaffolds into pseudo-chromosomes Artemisia annua
Tang et al. (2022) released 44 wild and cultivated diploid genome assemblies
assembly can be considered truly complete if a full genome sequence must be derived for each individual copy
(AT3G41762) was found in 26-kb misassembled region in TAIR10 Arabidopsis thaliana
Col-PEK assembly provides long-awaited key resource for the plant community Arabidopsis thaliana
single-chromosome sequencing is better for accurate assembly of chromosome genome in genomes with high content of repetitive sequence
final polished blackgrass genome assembly contains seven pseudo-chromosomes Alopecurus myosuroides
HiFi+HiC assembly has higher contiguity than new N. benthamiana genome assembly using Chromium™ linked-read sequencing (10× Genomics) and Nanopore sequencing (Oxford Nanopore) Nicotiana benthamiana