| polished genome |
was used for |
genome annotation |
Amaranthus hypochondriacus |
| BRAKER2 genes |
were combined with |
predicted coding sequence of full-length transcripts |
Amaranthus hypochondriacus |
| unannotated locus with high similarity to ANR genes |
had no annotation support from |
computational and long-read annotation |
Amaranthus hypochondriacus |
| gene PVIT_0015215.T1 |
corresponds to |
gene g166 |
Plasmopara viticola |
| single-star and no-star genes |
have little |
supporting evidence |
Arabidopsis thaliana |
| pseudogenes without disabling substitutions |
have |
median length of 175 bp |
Arabidopsis thaliana |
| MAKER-P |
updated and revised |
maize B73 RefGen_v3 annotation build (5b+) |
Zea mays |
| MAKER-P update mode |
provides means to refresh |
annotations of established plant genomes |
|
| MAKER-P revision process for 5b+ |
decreased |
5b+ gene set from 39,155 to 38,783 annotations |
Zea mays |
| transporter genes |
have been annotated in |
model and non-model plant genomes |
|
| full-length transcript sequencing data |
was added to |
genome annotation |
Amaranthus hypochondriacus |
| MAKER-P annotations |
are comparable in quality to |
The Arabidopsis Information Resource 10 annotations |
Arabidopsis thaliana |
| previously annotated ncRNAs |
are not transcribed or have |
extremely low transcription levels in RNA-Seq data |
Arabidopsis thaliana |
| repeated genes and other sequences |
result in |
more sequence alignments and gene predictions |
|
| RNA sequencing (RNA-seq) data |
hold great potential for |
annotation of newly sequenced plant genomes |
|
| MAKER-P |
scales to |
even the largest plant genomes |
|
| 5b+ annotation build |
contains |
213 improved genes |
|
| MAKER-P |
uses RNA-seq data to add |
untranslated region (UTR) and exon sequences |
|
| genome annotation v2.2 |
contains |
23 817 annotated genes |
Amaranthus hypochondriacus |
| Rice Annotation Project Database |
contains |
revised annotations for many rice genes |
Oryza sativa |
| SDR of Y chromosome |
had |
219 genes |
Spinacia oleracea L. subsp. turkestanica |
| haplotigs |
contained |
15,642 protein coding gene loci |
Plasmopara viticola |
| plant genomes |
can be difficult targets for |
annotation |
|
| domain families |
differ significantly in |
pseudogene:gene ratio |
Arabidopsis thaliana |
| gene finders trained with unmatched species parameters |
cause losses in |
gene model accuracy |
|
| miRNA prediction pipeline miR-PREFeR of MAKER-P |
follows |
criteria for plant miRNA annotation |
|
| Arabidopsis genome |
contains |
fewest repeats among sequenced plant genomes |
Arabidopsis thaliana |
| relative complexity of many plant genomes |
makes challenging |
creation, quality control, and dissemination of high-quality gene structure annotations |
|
| remaining MAKER-P unique protein-coding gene models |
were broken into |
two classes: multiexon models with confirmed splice sites and single-exon models with domains |
Zea mays |
| amaranth genome |
contains |
three 3R MYB candidate genes |
Amaranthus hypochondriacus |
| polished genome and full-length transcript sequencing data |
enabled production of |
most complete genome annotation of amaranth to date |
Amaranthus hypochondriacus |
| MAKER-P |
can annotate in only a few hours |
large, complex plant genomes |
Zea mays |
| WebApollo |
can be easily deployed in |
classroom for hands-on instruction |
|
| MAKER-P in update mode |
revises |
intron-exon structures of reference annotation data set |
|
| MAKER-P |
defaults to |
original reference annotation |
|
| MAKER-P revised models |
contain |
additional UTR sequence |
Zea mays |
| evidence set |
provides support for |
90% of annotated genes in TAIR10 |
Arabidopsis thaliana |
| MAKER-P improvements |
are made across |
entire TAIR10 data set |
Arabidopsis thaliana |
| single investigator using MAKER-P |
can carry out |
update of existing genome annotations with new RNA-Seq data |
|
| MAKER-P |
will fulfill |
need for automated high-quality genome annotation system |
Zea mays |
| 4,049 multiexon MAKER-P de novo models |
encode |
multiexon transcripts with at least one confirmed splice site |
Zea mays |
| MAKER-P |
guarantees constant, complete analysis of |
RNA-seq data |
|
| MAKER-P throughput |
demonstrates that |
even largest plant genomes could be annotated in reasonable time frame |
|
| pseudogenes |
are an issue, especially for |
plant genomes |
|
| MAKER-P |
uses |
RNA-seq data |
|
| Arabidopsis genome |
encodes |
10 MAP2Ks |
Arabidopsis thaliana |
| 6a build |
was created to provide |
maize community with single annotation build comprising best-possible annotated gene models |
Zea mays |
| MAKER-P on Texas Advanced Computing Center |
can de novo annotate |
Arabidopsis thaliana genome |
Arabidopsis thaliana |
| MAKER-P |
provides solution to |
genome annotation synchronization problem |
|
| MAKER-P update |
extends and modifies |
exon coordinates of TAIR10 gene annotations |
Arabidopsis thaliana |
| MAKER-P |
is based upon |
widely used MAKER genome annotation pipeline |
|
| MAKER-P training process |
uses |
splice-aware aligner Exonerate |
|
| 5b+ miRNA annotations |
were created by |
aligning genomic sequences against miRBase using BLASTN |
Zea mays |
| Copia superfamily |
represented |
8.74% of genome assembly |
Ficus carica |
| MAKER-P tool kit |
is freely available for |
academic use |
|
| maize assembly annotation using 2,172 CPUs |
finished in |
2 h and 53 min |
Zea mays |
| MAKER-P |
can systematically improve upon |
quality of existing V2 annotation build |
Zea mays |
| 5b+ annotation build |
contains |
251 new genes |
|
| 5b+ annotation build |
has higher percentage of models with |
annotated start and stop codons |
|
| AUGUSTUS |
comes pretrained for |
maize |
Zea mays |
| wheat ESTs and homology to rice loci |
used for |
improved and up-to-date annotation of wheat array probesets |
|
| MAKER-P |
includes capability for |
noncoding RNA annotation |
|
| MAKER-P on Texas Advanced Computing Center |
completes annotation in |
less than 3 hours |
|
| MAKER-P annotation of alternatively spliced transcripts |
mirrors |
performance on Arabidopsis genome |
Zea mays |
| MAKER-P default model |
excludes |
fourth exon of AT5G03540.3 |
Arabidopsis thaliana |
| RNA-seq data |
provides means for |
improvement of genome annotations |
|
| better supported genes |
have |
correspondingly more evidence |
Arabidopsis thaliana |
| novel genomes |
often contain |
new classes of repeats absent from RepBase and MAKER's internal repeat library |
|
| MAKER-P tool kit |
contains |
two guided tutorials for repeat library construction |
|
| MAKER-P |
is available to iPlant users as |
supported module on TACC Lonestar cluster |
|
| MAKER-P |
creates |
new gene models |
|
| MAKER-P de novo build |
contains |
5,045 additional annotations not overlapping 5b+ gene models |
Zea mays |
| genome annotations |
fall out of synchronization with |
available evidence |
|
| MAKER-P installed on iPlant resources at Texas Advanced Computing Center (TACC) |
grants ability to rapidly annotate |
new plant genomes |
|
| Arabidopsis thaliana genome |
contains |
expressed sequence tags (ESTs) |
Arabidopsis thaliana |
| MAKER-P |
uses update functionality to |
automatically update TAIR10 annotations |
Arabidopsis thaliana |
| MAKER-P |
includes integrated means for |
tRNA and snoRNAs |
|
| gene finders trained for other genomes |
are challenging and fraught with difficulties |
gene model accuracy |
|
| MAKER-P |
can carry out complete de novo annotation of 17.83-Gb draft loblolly pine genome in less than |
24 h |
Pinus taeda |
| 6a build |
is composed of |
additional new, well-supported genes from MAKER-P de novo build |
Zea mays |
| MAKER-P |
provides means for |
management of existing plant genome annotations |
|
| plant genomes |
can be unusually rich in |
transposable elements |
|
| F-box family |
has |
pseudogene:gene ratio of 152:567 |
Arabidopsis thaliana |
| MAKER-P |
is useful for |
rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes |
|
| MAKER-P performance with custom repeat library |
shows little difference in |
de novo annotation of Arabidopsis |
Arabidopsis thaliana |
| WebApollo database |
can be constructed and placed online within |
hours of finishing annotation run |
|
| regional and whole-genome duplication events |
impact |
gene structure annotation |
|
| maize genome annotation build 6a |
is demonstrably superior to |
existing 5b+ build |
Zea mays |
| 6a annotation build |
includes |
4,466 additional new gene annotations |
|
| MAKER-P |
can be used to train |
Augustus and SNAP |
|
| Arabidopsis thaliana genome |
is |
well annotated |
Arabidopsis thaliana |
| Arabidopsis thaliana |
genome sequencing discovered |
conserved plant eIF4E (AT4G18040) and eIF(iso)4E (AT5G35620) |
Arabidopsis thaliana |
| MAKER-P |
is benchmarked using |
Arabidopsis thaliana genome |
Arabidopsis thaliana |
| MAKER-P |
demonstrates utility for |
annotation of novel plant genomes |
|
| Arabidopsis assembly |
is approximately |
120 Mb |
Arabidopsis thaliana |
| MAKER-P default model |
summarizes |
three transcripts with single consensus transcript |
Arabidopsis thaliana |
| 6a annotation build |
contains |
4,466 new genes |
|
| MAKER-P |
can improve |
existing genome annotation build |
|
| MAKER-P updates |
can be carried out much more rapidly and frequently than |
heretofore possible |
|
| MAKER-P training process |
uses |
RNA-seq data and ESTs |
|
| Rice BAC clones annotated data |
obtained from |
RAP-DB |
Oryza sativa |
| MAKER-P speed and flexibility |
will enable individual iPlant users to generate |
custom genome annotation data sets |
|
| MAKER-P |
is able to |
effectively revise gene models regardless of complexity or quantity of evidence |
|
| repetitive DNA |
masked with |
RepeatMasker |
Oryza sativa |
| MAKER-P annotations |
are comparable in quality to |
maize V2 annotation build |
Zea mays |
| carnivorous bladderwort plant Utricularia gibba |
has |
fewest repeats among sequenced plant genomes |
Utricularia gibba |
| MAKER-P de novo annotation build |
uses |
same evidence data sets as Table I and Figures 1 to 4 |
Zea mays |
| MAKER-P |
can annotate |
noncoding RNAs |
Arabidopsis thaliana |
| MAKER-P |
can manage |
genome annotations |
|
| MAKER-P de novo gene build |
contains |
1,250 fewer genes than TAIR10 |
Arabidopsis thaliana |
| MAKER-P update of TAIR10 gene model |
maintained |
all three transcripts |
Arabidopsis thaliana |
| Arabidopsis thaliana |
has annotated |
66 putative full-length pectin methyl-esterases (PMEs) |
Arabidopsis thaliana |
| false-positive rate of pipeline |
is |
3.1% |
Arabidopsis thaliana |
| Photoperiod-H1 |
is annotated as |
high-confidence gene |
Hordeum vulgare |
| iPlant Cyberinfrastructure |
was used for |
MAKER-P update and revision of maize B73 RefGen_v3 annotation build |
Zea mays |
| 4,049 multiexon MAKER-P de novo models |
many are |
sizable, multiexon gene models that contain domains |
Zea mays |
| pseudogenes and noncoding RNAs |
are absent from |
The Arabidopsis Information Resource 10 build |
Arabidopsis thaliana |
| MAKER-P |
provides basic resource that democratizes |
genome annotation |
|
| highly supported, highly expressed genes |
often have |
some data that strongly support a given transcript model |
Arabidopsis thaliana |
| MAKER-P |
has extended MAKER to include means for annotation of |
pseudogenes and ncRNAs |
|
| MAKER-P |
identified and annotated |
4,466 additional protein-coding genes |
Zea mays |
| 6a annotation build |
includes |
102,370 pseudogene fragments |
|
| MAKER-P de novo build |
is |
quite similar to 5b+ build |
Zea mays |
| 2,192 annotated tRNA genes |
comprise |
1,398 decode standard amino acids, 4 decode seleno-Cys, 7 are suppressor tRNAs, 12 are undetermined, 771 are pseudogenized |
Zea mays |
| 6a build |
includes |
additional 3,006 ncRNA genes and 102,370 pseudogene annotations |
Zea mays |
| MAKER-P on Texas Advanced Computing Center |
can de novo annotate |
Zea mays genome |
Zea mays |
| MAKER-P update |
brings gene models into better agreement with |
available evidence |
Arabidopsis thaliana |
| MAKER-P |
is |
multithreaded, fully message passing interface-compliant annotation engine |
|
| pseudogenes |
impact |
gene structure annotation |
|
| MAKER-P |
can carry out |
complete de novo annotation of 17.83-Gb draft loblolly pine genome |
Pinus taeda |
| 6a build genes |
are more congruent with |
evidence |
|
| Acer truncatum genome |
contains |
868 snRNAs |
Acer truncatum |
| 23 160 genes in v3.1 |
is 5142 more than in |
v2.5 |
Aethionema arabicum |
| MAKER-P |
is installed and benchmarked on |
Texas Advanced Computing Center |
|
| MAKER-P |
results in demonstrable improvements to |
annotations of Arabidopsis genome |
Arabidopsis thaliana |
| Arabidopsis thaliana |
is used as model for |
genome annotation benchmarking |
Arabidopsis thaliana |
| single full-length cDNA |
may confirm |
entire exon-intron structure of annotated transcript |
Arabidopsis thaliana |
| WebApollo |
provides functionality for |
remote editing of annotations |
|
| genetically anchored physical contigs |
provide extended information about |
genomic context and local neighborhood |
Hordeum vulgare |
| annotation |
consists of |
137,208 gene transcripts |
Zea mays |
| all new models |
have |
gene-finder support |
Zea mays |
| MAKER-P |
provides effective means for annotation of |
plant genomes |
|
| 5b annotation build |
differs from |
5b+ annotation build |
|
| MAKER-P |
provides |
management functions |
|
| MAKER-P-trained version of Augustus |
calls |
about 5% more genes |
Zea mays |
| MAKER-P toolkit |
provides |
process for annotation of pseudogenes |
|
| 54.6% of pseudogenes |
have |
one or more MapMan annotations |
Zea mays |
| Gypsy superfamily |
represented |
54.56 Mbp |
Ficus carica |
| Euscaphis japonica genome |
contains |
349 microRNAs |
Euscaphis japonica |
| oligos |
are located 6 kb upstream of |
transcription start sites |
Triticum aestivum |
| wheat pan-'NLRome' |
could be |
larger than estimated |
Triticum aestivum |
| Oryza sativa |
has |
35 pectin methyl-esterases (PMEs) |
Oryza sativa |
| Brachypodium distachyon |
has |
29 pectin methyl-esterases (PMEs) |
Brachypodium distachyon |
| TE detection across final assembly |
is significantly increased compared with |
previous version of fig genome |
Ficus carica L. |
| genome of L. sativa |
has been sequenced and assembled recently |
high-quality, comprehensive reference genome |
Lactuca sativa |
| A. thaliana genome survey |
revealed |
RRM-containing proteins |
Arabidopsis thaliana |
| 20-kb region |
contains |
Loc_08g07740 |
Oryza sativa |
| gene bodies and TEs |
represented |
28.06% and 37.39% of assembly |
Ficus carica |
| barley IBSC genome |
contains |
36 XTH sequences |
Hordeum vulgare |
| transposable elements |
covered |
111.06 Mbp |
Ficus carica |
| tandem repeats |
represent |
12.74 Mbp |
Ficus carica |
| centromeric region search |
isolated |
42 centromeric contigs |
Ficus carica |
| exon |
had total length of |
43.5 Mbp |
Ficus carica |
| intron |
had total length of |
49.55 Mbp |
Ficus carica |
| genome assembly |
searched for |
sex determining region location |
Ficus carica |
| Acer truncatum genome |
contains |
1345 miRNAs |
Acer truncatum |
| simplest genome annotation method |
is based on |
ab initio predictors |
|
| verified gene models from Ae. arabicum v3.0 free of evident errors |
were used to support |
gene prediction |
Aethionema arabicum |
| MAKER |
tries to predict genes based on |
canonical splice sites by default |
Aethionema arabicum |
| protein-coding genes |
predicted |
11,528 5′-UTRs |
Ficus carica |
| 17 765 genes (98.6%) |
were found using BLAST |
v3.1 protein set |
Aethionema arabicum |
| gene annotation |
performed using |
MAKER, AUGUSTUS, and SNAP pipelines |
Manihot esculenta |
| retrotransposons or Class I elements |
represented |
28.3% of genome assembly |
Ficus carica |
| Aethionema arabicum genome version 3 (V3) |
did not predict gene models de novo but lifted over gene models from |
v2.5, which were lifted before from v1.0 |
Aethionema arabicum |
| Trinity |
showed worse results for |
all the other parameters tested |
Aethionema arabicum |
| gene Aa3LG10G286 of v3.0 |
is 37 kbp long and was split into |
four genes in v3.1 |
Aethionema arabicum |
| tandem repeats |
represent |
3.82% of assembly |
Ficus carica |
| DNA transposons or Class II elements |
represented |
5.01% of genome assembly |
Ficus carica |
| highly repetitive 103 bp-long tandem repeats |
considered as |
putative centromeric repeat |
Ficus carica |
| 3′ UTR intron |
was not annotated in |
v2.5 or v3.0 |
Aethionema arabicum |
| GFF from v2.5 and v3.0 |
had formatting problems |
such as missing features or CDSs that were not multiples of three |
Aethionema arabicum |
| Acer truncatum genome |
contains |
744 tRNAs |
Acer truncatum |
| SNAP |
was used as |
ab initio gene predictor |
Aethionema arabicum |
| Repeat annotation in V3.0 |
presents a striking increase compared to |
72 Mbp (42.3% of the nucleotide sequence) in the initial public assembly V1.0 |
Aethionema arabicum |
| free numbers between tens |
can be used in |
future annotations |
Aethionema arabicum |
| interrupted open reading frames (ORFs) |
were migrated in |
both versions, 2.5 and 3.0 |
Aethionema arabicum |
| 1224 genes from v3.0 and 1772 genes from v3.1 |
were labeled as |
putative TEs |
Aethionema arabicum |
| lettuce reference genome |
provides |
high-quality, comprehensive reference genome for analysis of the Compositae family |
Lactuca sativa |
| intron |
had average length of |
367 bp |
Ficus carica |
| 70-kb region flanked by markers SEQ3-1 and SEQ5-1 |
contains |
11 predicted genes |
Oryza sativa |
| transposable elements |
covered |
33.57% of assembly |
Ficus carica |
| transcripts produced by Scallop |
were provided to |
MAKER |
Aethionema arabicum |
| code and tools of Ae. arabicum DB |
are based on |
previous databases such as OliveTreeDB, Physcomitrella patens Gene Model Lookup DB, and Sol Genomics Network |
Aethionema arabicum |
| lift-over method |
could only migrate genes in |
regions that already existed in previous genome sequence versions |
Aethionema arabicum |
| Ae. arabicum genome sequence V3 |
was used as reference for |
gene annotation v3.1 |
Aethionema arabicum |
| MAKER results |
yielded comparable results with regard to |
number of genes supported by PacBio full-length transcripts, BUSCO completeness, Gene Ontology (GO) terms and protein domain evidence |
Aethionema arabicum |
| gene annotation of Ae. arabicum v1.0 |
was lifted over to |
v2.5 |
Aethionema arabicum |
| Chromovirus elements |
typically found in |
centromeric structures |
|
| gene annotation process |
identified |
1685 non-protein-coding genes |
Ficus carica |
| lift-over tools |
are used to migrate |
previous gene versions to new genome versions |
|
| 5520 genes in v2.5 and 4589 genes in v3.0 |
were affected by |
obvious annotation errors or annotated as possible TEs |
Aethionema arabicum |
| 4,354 large INDELs |
another 882 within |
2,000 bp upstream of genes |
Manihot esculenta |
| 2,941 genes |
newly identified in |
new Gastrodia elata genome assembly |
Gastrodia elata |
| 241 201 genes with identical sequence between two assemblies |
were unambiguously mapped |
IWGSC RefSeq v2.1 |
|
| annotation |
is essential for |
research community benefit from sequence data |
Arabidopsis; rice; Medicago; poplar |
| phylogenetic heatmap approach |
was used to classify |
55,801 genes in the MSU7 rice genome |
Oryza sativa |
| gene model annotation v3.1 for Aethionema arabicum |
predicts gene model structure de novo using |
MAKER |
Aethionema arabicum |
| MAKER with Scallop |
produced a 0.7% higher BUSCO completeness and 433 more genes supported by full-length transcripts than |
PASA |
Aethionema arabicum |
| SNAP ab initio prediction |
supported |
two gene models instead of one |
Aethionema arabicum |
| locus name 'Aa31LG1G10' |
comprises |
'Aa' stands for Ae. arabicum, '31' for gene annotation version 3.1, 'LG' for linkage group, followed by the number of the LG, and 'G' for gene followed by the gene number |
Aethionema arabicum |
| analysis of protein domains with InterProScan |
detected |
947 genes in v3.0 containing TE domains |
Aethionema arabicum |
| 23 160 genes in v3.1 |
includes |
5606 genes that did not overlap with the genes of v3.0 |
Aethionema arabicum |
| 20 LsOFP, 22 LsSUN, 10 LsWOX and five LsYABBY genes |
identified in |
HZ genome |
|
| MAKER-P |
is benchmarked using |
Zea mays genome |
Zea mays |
| 6a build genes |
have more |
exons |
|
| Scallop transcriptome representation |
was selected as the input for |
final annotation in MAKER |
Aethionema arabicum |
| 2579 TAPs in v3.1 |
is 449 more than in |
v3.0 |
Aethionema arabicum |
| 18 066 (76.7%) genes |
were assigned |
putative functions |
|
| gene intervals |
spanned |
136 Mb of ChrUn scaffolds |
|
| 105 534 HC and 155 624 LC genes |
were located on |
pseudomolecules |
|
| coding sequence |
defined by |
TAIR8 |
Arabidopsis thaliana |
| MAKER results |
did not indicate superior performance of |
any of the different assemblers |
Aethionema arabicum |
| annotation v3.1 |
generated |
24 932 genes |
Aethionema arabicum |
| annotation pipelines such as MAKER |
can include |
expression data and proteins that were not available when the previous gene annotation was created |
Aethionema arabicum |
| 2728 genes (48.7%) |
correspond to |
fixed version of broken genes in v3.0 |
Aethionema arabicum |
| Arabidopsis genome |
contains different numbers of TFs in |
NAC TF family |
Arabidopsis thaliana |
| Full-length cDNA (FLcDNA) |
provides |
high-resolution evidence to accurately define coding and noncoding features |
|
| ab initio gene predictors |
face challenges in |
defining exon-intron boundaries |
|
| annotation |
requires |
expert oversight to distinguish biologically meaningful features from technical artifacts |
|
| PSY1 gene |
is annotated as |
Model estExt_GenewiseH_1.c_620008 |
Chlamydomonas |
| 375 receptor-like cytoplasmic kinases |
identified in |
rice genome |
Oryza sativa |
| R2R3-MYBs |
were identified in |
Populus trichocarpa genome |
Populus trichocarpa |
| retrotransposons or Class I elements |
represented |
84.95% of repetitive content |
Ficus carica |
| long terminal repeat retrotransposons |
accounted for |
28.03% of total genome assembly |
Ficus carica |
| Euscaphis japonica genome |
contains |
3940 small nuclear RNAs |
Euscaphis japonica |
| repetitive elements |
occupy |
66.36% of Gastrodia elata genome |
Gastrodia elata |
| percentages of subgenome lengths represented by individual super-families and families |
were similar among |
A-, B- and D-subgenomes |
|
| final gene set |
was named according to |
previously introduced nomenclature |
Aethionema arabicum |
| MAKER transcripts with isoforms and PacBio full-length transcripts |
were included in |
Ae. arabicum DB for downloading and for inspection in the genome browser |
Aethionema arabicum |
| lift-over process from v1.0 to v3.0 |
resulted in loss of |
987 gene models |
Aethionema arabicum |
| 3′ UTR intron |
was found in |
v3.1 annotation |
Aethionema arabicum |
| lower number of genes in Ae. arabicum |
could be due to |
in some cases, the concatenation of close genes |
Aethionema arabicum |
| integrated pipeline combining de novo prediction, homology search and RNA-sequencing (RNA-Seq) verification |
identified |
23 541 putative gene models |
|
| Euscaphis japonica genome |
contains |
759 ribosomal RNAs |
Euscaphis japonica |
| gene duplication from whole genome duplication |
limits ability to |
identify accurately which genes are haplotype specific and missing from one annotation |
Manihot esculenta |
| total length of TEs |
increased from |
11 921 309 743 to 12 092 094 168 bp |
|
| rice genome |
contains different numbers of TFs in |
ALFIN-like TF family |
Oryza sativa |
| bottle gourd genome |
contains |
1062 transfer RNAs (tRNAs) |
|
| bottle gourd genome |
contains |
340 small nuclear RNAs |
|
| transposable elements (TEs) |
were reannotated in |
IWGSC RefSeq v2.1 |
Triticum aestivum |
| machine-learning classifier |
predicts in |
fungal gene catalogs |
|
| gene annotation |
included |
transcript evidence from RNA-sequencing (RNA-seq) from 11 tissue types |
Manihot esculenta |
| Phase1 assembly |
identified |
1,159 arrays containing 2,608 genes |
Manihot esculenta |
| annotations from Yuan et al. (2018) and updated annotations |
merged to generate |
21,115 protein-coding genes and 3,664 pseudogenes |
Gastrodia elata |
| bioinformatics approaches to investigate plant genetic structure |
include |
annotating functional elements |
|
| rice genome |
contains different numbers of TFs in |
MADS TF family |
Oryza sativa |
| supervised learning |
is applied to |
predicting regulatory and non-regulatory regions in the maize genome |
Zea mays |
| MAKER tests with short-read assemblers |
provided |
PacBio transcripts separately |
Aethionema arabicum |
| proteins |
supported |
two gene models instead of one |
Aethionema arabicum |
| Marker-Assisted Gene Annotation Transfer for Triticeae (MAGATT) pipeline |
was used for |
gene annotation transfer |
Triticum aestivum; Triticum turgidum ssp. durum |
| 264 876 gene models |
represented |
207 575 intervals containing between 1 and 4 genes |
|
| Arabidopsis MYB TFs |
show huge difference in numbers of |
subfamily members |
Arabidopsis thaliana |
| methods to construct genomewide maps of fitness variation |
may profoundly improve efforts to |
discover functional elements in plant genomes |
|
| over 70% of annotated genes |
were supported by |
high evidence levels |
Manihot esculenta |
| gene intervals |
had average size of |
9.6 kb |
|
| common wheat genome |
contains |
137 SSP genes |
Triticum aestivum |
| Cucumis metuliferus CM27 genome |
contains |
29,214 protein-coding genes |
Cucumis metuliferus |
| 1379 HC and 4216 LC genes |
were located on |
scaffolds assigned to ChrUn |
|
| Arabidopsis genome |
contains different numbers of TFs in |
bZIP TF family |
Arabidopsis thaliana |
| Phase0 assembly |
identified |
1256 arrays containing 2,865 genes |
Manihot esculenta |
| common wheat genome |
contains |
3606 TFs |
Triticum aestivum |
| increase in pseudomolecule length in IWGSC RefSeq v2.1 |
resulted in |
percentage of CS genome accounted for by TEs (85.0%) nearly identical to IWGSC RefSeq v1.0 (84.7%) |
|
| lift-over process from v1.0 through v2.5 to v3.0 |
caused |
formatting inconsistencies and gene structure annotation errors |
Aethionema arabicum |
| co-location of biosynthetic enzymes |
can dramatically increase |
ease of identifying biosynthetic pathway genes |
|
| protein evidence |
were provided to |
MAKER |
Aethionema arabicum |
| functionally enriched, context-aware plant genome annotations |
bridge |
structure and function |
|
| 23 541 putative gene models |
is |
1069 more than predicted from USV |
|
| gene annotation |
annotated |
33,653 and 35,684 genes in Phase0 and Phase1 assemblies, respectively |
Manihot esculenta |
| oligos |
are located 6 kb downstream of |
transcription termination sites |
Triticum aestivum |
| uniform set of Medicago gene annotations |
was generated by |
coordinated international effort |
Medicago truncatula |
| differences in bioinformatic search stringency |
cause |
huge difference in numbers of TF subfamily members |
Arabidopsis thaliana |
| NLR-Annotator |
scans genomic sequences for |
combinations of NLR-associated sequence-motifs |
|
| array sizes |
varied to |
11 and 8 genes in array in Phase0 and Phase1, respectively |
Manihot esculenta |
| MCScanX gene anchor file |
supplied to help |
users identify probable best hits between genes of two phases |
Manihot esculenta |
| Gastrodia elata genome from Yuan et al. (2018) |
has |
86.60% coding genes identical to updated genome |
Gastrodia elata |
| 2792 IWGSC RefSeq v1.0 genes |
could not be identified in |
IWGSC RefSeq v2.1 |
|
| A. cruentus Isoseq transcript long-read data |
complemented |
homology and ab initio gene prediction approaches |
Amaranthus cruentus |
| annotation |
extends into |
structural dimensions |
|
| proteomic data |
gives useful information for |
genome annotation |
|
| BAC clones |
were sequenced and manually annotated for |
genes and TEs |
Oryza spp. |
| empirical data |
may be incorporated to train models, validate gene structures, and identify |
untranslated regions, promoters, enhancers, and other non-coding features |
|
| 20K array and 45K array |
are mapped to |
release 5 of the TIGR Rice Genome Annotation |
Oryza sativa |
| a SAM T99 'superfamily'-level model built and scored |
is one of |
seven procedures used in the identification process |
|
| rice |
has |
769 gene models |
Oryza sativa |
| Volvox carteri genome (v2.1) in Phytozome 12 |
contains computationally annotated |
pherophorin genes |
Volvox carteri |
| evidence-based frameworks |
are rapidly advancing |
the field |
|
| VvWOX genes from 'Chardonnay' |
showed nucleotide differences from |
sequences reported in grapevine databases |
Vitis vinifera |
| many seed samples (234 of 294) |
were included for |
RNA-seq data used in this annotation |
Aethionema arabicum |
| bottle gourd genome |
contains |
155 ribosomal RNAs |
|
| repetitive elements |
identified in |
Gastrodia elata genome |
Gastrodia elata |
| bioinformatics approaches to investigate plant genetic structure |
include |
identifying repetitive sequences |
|
| 2974 IWGSC RefSeq v1.0 genes |
had sequence changed in |
IWGSC RefSeq v2.1 compared with v1.0 |
|
| de novo annotation of TEs with CLARITE |
annotated |
4 199 592 TEs belonging to 506 families |
|
| annotation v3.1 including long-read transcripts |
allows a better detection of |
proximal regulator elements, like introns in the 3′ UTRs |
Aethionema arabicum |
| genome-wide analysis of Triticum urartu |
identified |
1238 transcription factors (TFs) |
Triticum urartu |
| Gastrodia elata genome annotation |
performed using |
homologous genes from closely related species |
Gastrodia elata |
| Gastrodia elata genome annotation |
performed using |
de novo prediction results |
Gastrodia elata |
| BLASTP method |
identified |
fifty-four candidate proteins |
Cajanus cajan |
| experience gained from previous whole-genome efforts |
informed generation of |
uniform set of Medicago gene annotations |
Medicago truncatula |
| Arabidopsis genome |
contains different numbers of TFs in |
ALFIN-like TF family |
Arabidopsis thaliana |
| Helixer |
improves |
exon-intron boundary accuracy |
|
| Tiberius |
enables |
learning both sequence features and structural rules directly from DNA |
|
| chromosome-scale, haplotype-resolved genomes |
are fueling |
innovations in annotation frameworks, including pangenomes, machine learning, and multi-omic integration |
|
| evidence-based frameworks |
integrate |
RNA sequencing |
|
| 7.21-kb heterozygous deletion |
overlaps |
upstream regulatory region of Manes.03G086200 |
Manihot esculenta |
| RNA sequencing (RNA-seq) data |
mapped to |
Gastrodia elata genome assembly |
Gastrodia elata |
| uniform set of Medicago gene annotations and other views of genome data |
has been provided at |
several websites |
Medicago truncatula |
| rice genome |
contains different numbers of TFs in |
WRKY TF family |
Oryza sativa |
| AI-driven approaches |
enable |
predictive and interpretable models of genome function across diverse plant species |
|
| 61 U-box proteins |
identified from |
Arabidopsis genome |
Arabidopsis thaliana |
| scoring an HMM built from the PFAM 'full' alignment |
is one of |
seven procedures used in the identification process |
|
| gene prediction and annotation |
typically begin with |
ab initio, AI-based approaches |
|
| U-box proteins |
identified in |
rice japonica and indica subspecies genomes |
Oryza sativa |
| genome sequencing and functional analysis of rice |
have been |
completed |
Oryza sativa |
| training on rapidly expanding multi-omic datasets |
allows capture of |
subtle genomic signals and regulatory complexity |
|
| annotation data produced in this study and the Ae. arabicum genome sequence V3.0 |
are available via |
web-accessible database |
Aethionema arabicum |
| genes of v3.1 |
are supported by higher numbers of |
protein domains, GO terms and A. thaliana homologs and higher BUSCO completeness |
Aethionema arabicum |
| oligos |
are located on |
predicted coding regions |
Triticum aestivum |
| NLR annotation |
requires |
specialized tools |
|
| artificial intelligence (AI)-driven gene predictors |
outperform |
traditional ab initio tools |
|
| WU-BLAST of known U-box proteins against the genome |
is one of |
seven procedures used in the identification process |
|
| putative VvRop- and VvRab-interacting proteins |
are present in |
Vitis vinifera genome |
Vitis vinifera |
| exon 5 sequence from Glyma 18g43210.1 |
showed |
annotation error from Phytozome |
Glycine max |
| multi-omic datasets |
enhance |
training and identification of features such as promoters |
|
| 150 kb region of chromosome 12 |
contains |
60 predicted genes |
Oryza sativa L. |
| StringTie |
produced very similar results to |
Scallop |
Aethionema arabicum |
| 368 genes from v3.0 |
did not overlap |
with genes in v3.1 |
Aethionema arabicum |
| distance to nearest genes |
measured for |
large indels |
Manihot esculenta |
| new annotation release |
contained |
108 010 HC and 161 535 LC gene models |
|
| IWGSC RefSeq Annotation v2.1 |
contains |
266 753 genes |
|
| D-subgenome |
had under-represented |
Gypsy super-family |
|
| 77 putative U-box proteins |
identified from |
rice genome |
Oryza sativa |
| v3.0/v4.0 annotation |
is underrepresented in |
tandemly repeated genes (e.g., NBS genes) and repetitive elements |
Solanum melongena |
| gapless, haplotype-resolved genomes |
shift the bottleneck from assembly to |
annotation and interpretation |
|
| VvWOX13A |
has uncorrected annotation in |
C-terminus of Pinot Noir ENTAV115 database |
Vitis vinifera |
| VvWOX1 and VvWOX6 genes |
were renamed from |
VvWOX1A and VvWOX1B |
Vitis vinifera |
| large-scale orthology networks |
continue to improve |
functional inference |
|
| high-resolution evidence from full-length cDNA, long- and short-read transcriptomes, and epigenomic layers |
accurately define |
chromatin states |
|
| grapevine draft genome sequences |
were screened using |
AtWOX proteins from Arabidopsis |
Vitis vinifera; Arabidopsis thaliana |
| epigenomic layers such as DNA methylation |
provide |
high-resolution evidence to accurately define coding and noncoding features |
|
| HMM profile |
used for searching |
TIGR rice chromosome pseudomolecules version 5 |
Oryza sativa |
| HvGS gene sequences |
were aligned to |
barley genomic sequence |
Hordeum vulgare |
| Protein function annotation |
was performed by |
InterProScan v5.52–86.0 |
Selaginella kraussiana |
| transposable elements (TEs) |
account for |
35.90% of Hap2 assembly |
Prunus persica |
| RING-domain variants |
were not included in |
curated analysis |
Arabidopsis thaliana |
| transposable elements (TEs) |
account for |
36.32% of Hap1 assembly |
Prunus persica |
| Pereira |
recovered |
215 Pseudoviridae insertions |
Arabidopsis thaliana |
| 104 elements identified as (CMT3, AT1G69770) targets within transposons |
include |
long terminal repeat (LTR), long interspersed element (LINE) and short interspersed element (SINE) retrotransposons, DNA transposons and helitrons |
Arabidopsis thaliana |
| complete rice genome sequence |
indicates presence of |
at least 11 glutelin genes |
Oryza sativa |
| current AI-based methods |
instead report |
primary gene models |
|
| plant genome annotations |
constructed for |
model and crop species |
|
| SWEET transporter search strategies |
validate |
complete set of SWEET transporters from V. vinifera |
Vitis vinifera |
| methylated reads |
mapped to |
genes |
Populus trichocarpa |
| SGN data |
provided chromosome assignment for |
129 loci |
Solanum lycopersicum |
| tomato genome sequence |
provides robust method to identify |
genomic location of unmapped loci |
Solanum lycopersicum |
| evidence-based frameworks |
integrate |
3D genome data |
|
| proteins from same loci of japonica and indica genome |
may have lower protein sequence identity due to |
different annotation procedures |
Oryza sativa subsp. japonica; Oryza sativa subsp. indica |
| innovations in annotation frameworks, including pangenomes, machine learning, and multi-omic integration |
reveal |
functional elements long hidden by fragmented, low-contiguity plant genomes |
|
| four additional RLCKs |
identified by performing search in |
KOME database |
Oryza sativa |
| Arabidopsis |
contains |
21 BTB-nonphototropic hypocotyl (NPH)3 proteins |
Arabidopsis thaliana |
| 21 BTB-NPH3 proteins in Arabidopsis |
represent |
over 25% of the BTB proteins in this genome |
Arabidopsis thaliana |
| detailed knowledge of the Elbe retrotransposons and their numerous remnants |
enables |
straightforward annotation |
|
| comparative studies of Elbe families |
will support |
annotation of these genomes |
|
| lineage-specific genes |
had average GC content much higher than |
other genes |
Cicer arietinum |
| transcription factor genes |
belonging to |
transcription factor families |
Cicer arietinum |
| full-length cDNA or RNA-Seq contigs |
aligned against |
Morex assembly |
Hordeum vulgare |
| three candidate SSK genes |
were obtained from |
apple genome |
Malus domestica |
| CCHC domain-containing transcription factors |
relatively low fraction in |
chickpea draft genome |
Cicer arietinum |
| Rice GT Database |
now contains |
622 loci |
Oryza sativa |
| Twenty-one NB-ARC HMMs |
matched |
EL10 genome without de novo predicted proteins |
Beta vulgaris |
| 38 349 Dub candidate proteins |
were identified in |
1642 fungal genomes |
|
| full-length transcript sequencing data |
improves |
accuracy |
|
| new genome annotation |
is comparable in accuracy to |
well-annotated model plant genomes |
Amaranthus hypochondriacus |
| predictions of cysteine-rich domains |
have to be met with skepticism regarding |
reliability of domain predictions |
Arabidopsis thaliana |
| Repbase update |
is used to identify and classify |
repeat sequences |
|
| Pereira analysis |
calculated |
5.60% retrotransposon DNA content |
Arabidopsis thaliana |
| repetitive elements |
were identified using |
RepeatMasker v4.1.2-p1 |
Selaginella kraussiana |
| non-coding RNA prediction |
yielded |
8,299 rRNA genes in Hap1 |
Prunus persica |
| centromere regions |
were successfully predicted in |
assembled chromosomes |
Prunus persica |
| Pereira |
recovered |
219 Athila insertions |
Arabidopsis thaliana |
| number of annotated pseudogenes in Arabidopsis thaliana |
has been increasing dramatically from |
TIGR1 annotation to TAIR7 annotation |
Arabidopsis thaliana |
| nodulation-related genes in chickpea |
lower than |
nodulation genes in Medicago, soybean and pigeonpea |
Cicer arietinum; Medicago truncatula; Glycine max; Cajanus cajan |
| majority of identified R gene loci |
reside in |
poorly or previously unannotated regions of the genome |
Solanum tuberosum |
| de novo transcriptome |
completely covered |
55% of annotated Bd21 genes by at least one unbroken de novo transcript |
Brachypodium distachyon |
| newly identified transcripts |
were annotated by |
Gene Ontology (GO) terms through BLASTx and InterProScan |
Brachypodium distachyon |
| seven further partial NB-LRRs |
were revised to yield |
more complete NB-LRR genes |
Solanum tuberosum |
| RenSeq |
can be used to improve |
existing NB-LRR gene annotations |
|
| genes described in this study |
75% reside in |
poorly or previously unannotated regions of the potato and tomato genome |
Solanum tuberosum; Solanum lycopersicum |
| 19 genes containing an F-box motif |
were found by |
BLAST search of the 'Golden Delicious' genome |
Malus domestica |
| methylated reads |
mapped to |
repeats |
Populus trichocarpa |
| Arabidopsis repeat library |
was constructed using |
approach outlined in basic tutorial |
Arabidopsis thaliana |
| MAKER-P pseudogene analysis |
identified |
4,204 pseudogenes |
Arabidopsis thaliana |
| MAKER-P |
added additional untranslated regions (UTRs) to |
1,393 5b+ gene models |
Zea mays |
| 5b+ annotation build |
included |
213 improved models |
Zea mays |
| version of MAKER-P available within the iPlant Cyberinfrastructure |
can reannotate entire maize genome in less than |
3 h |
Zea mays |
| MAKER-P revision process for 5b+ |
merged |
31 annotations |
Zea mays |
| MAKER-P toolkit |
provides means for |
identification of known and new classes of ncRNAs |
|
| BLASTP search |
identified |
(AtNMD3, NMD3, AT2G03820) sequence |
Oryza sativa |
| MAKER-P installed on iPlant resources at Texas Advanced Computing Center (TACC) |
grants ability to revise and manage |
existing plant genomes |
|
| MAKER-P |
generates |
3,059 gene annotations on maize chromosome 10 |
Zea mays |
| maize B73 RefGen_v3 annotation build (5b+) |
was updated to produce |
6a maize genome annotation |
Zea mays |
| most changes during MAKER-P revision |
are to |
models having lowest (best) AED scores |
Zea mays |
| MAKER-P revised models |
have on average |
more exons |
Zea mays |
| 4,049 multiexon MAKER-P de novo models |
are shorter than |
average 5b+ annotation |
Zea mays |
| MAKER-P |
includes support for |
pseudogene identification |
|
| automated high-quality genome annotation system |
is of the utmost importance for |
high-quality genome annotation |
Zea mays |
| MAKER-P |
was used for |
revision of 5b+ annotations in light of 96 different RNA-seq data sets |
Zea mays |
| 6a build |
is composed of |
MAKER-P updated 5b+ gene models with additional 5′ and 3′ exons and UTR sequences |
Zea mays |
| R-genes |
selected from |
PRG database and NCBI protein database |
|
| INFERNAL |
identified |
non-coding RNAs |
Cicer arietinum |
| polyploid Hordeum species |
advantageous to rely on |
barley assembly as reference |
Hordeum species |
| identified NB-LRRs in Solanum tuberosum clone DM |
increased from 438 to |
755 NB-LRRs |
Solanum tuberosum |
| 95 Or genes |
were reported by |
Engsontia et al. (2014) |
Plutella xylostella |
| repeat library of the Selaginella kraussiana genome |
was constructed ab initio using |
RepeatModeler v2.0.2 with the parameter "-LTRStruct" |
Selaginella kraussiana |
| gene structure annotation dataset |
was filtered using |
AGAT v0.8.0 to remove genes with incomplete structures and those encoding proteins less than 50 amino acids in length |
Selaginella kraussiana |
| MAKER-P |
rapidly updates |
genome annotations |
|
| MAKER-P maize annotations |
compare favorably with |
current chromosome 10 V2 annotations |
Zea mays |
| RefGen_v2 |
includes |
110,028 transcript models in the Working Gene Set |
Zea mays |
| genome projects |
have annotations that embody |
years of manual curation and revision |
|
| MAKER-P |
was used for |
systematic comparison of 5b and 5b+ annotation builds |
Zea mays |
| uniquely mapped reads in 16C samples |
were distributed as |
39% intergenic, 26% intronic, 19% exonic and 16% ribosomal regions |
Solanum lycopersicum |
| Sp9509_oxford_v3 assembly |
predicted |
20 661 protein coding genes |
Spirodela polyrhiza |
| TranSeq reads mapped to reference tomato genome |
could significantly improve |
genome annotation |
Solanum lycopersicum |
| TranSeq and TruSeq analysis |
identified |
new exon |
Solanum lycopersicum |
| many de novo sequenced plant genomes |
suffer from |
extensive fragmentation and poorly defined gene models |
|
| complementary approach combining two RNA-seq methods |
takes advantage of |
standard RNA-seq procedure to obtain complete transcript sequences combined with 3′-end sequencing method |
|
| large portion of gene models in tomato genome |
was |
misannotated at 3′-end |
Solanum lycopersicum |
| uniquely mapped reads in DSN-treated samples |
fell in |
intergenic, intronic, exonic and ribosomal regions |
Solanum lycopersicum |
| regions in B73 and/or PH207 genome with homology to syntenic loci |
were not annotated as |
gene models |
|
| e-value cut-off for genomic comparison |
confirmed effectiveness for |
ubiquiton subfamily comparison |
|
| ubiquiton genes |
could be clustered into |
three groups based on copy-number profiles |
|
| Ub2 subfamily |
containing |
2 Ub domains |
|
| 1327 resistance gene analogues (RGAs) |
were identified on |
15 chromosomes |
Linum usitatissimum |
| TranSeq reads |
are mapped to |
tomato reference genome (ITAG2.4) |
Solanum lycopersicum |
| Chlorella vulgaris 211/11P genome |
has exon and intron average length shorter than |
Chlorella zofingiensis |
Chlorella vulgaris; Chlorella zofingiensis |
| size of domain family with annotated genes |
generally correlates with |
number of pseudogenes |
Arabidopsis thaliana |
| WebApollo database containing TAIR10, MAKER-P de novo, and MAKER-P updated annotations |
is available online at |
http://weatherby.genetics.utah.edu:8080/WebApollo_A_thaliana |
Arabidopsis thaliana |
| MAKER-P de novo build |
contains |
additional pseudogene, ncRNA, and well-supported protein-coding gene models |
Zea mays |
| MAKER-P transcript structure |
reflects |
best-possible gestalt of all evidence for that gene |
Arabidopsis thaliana |
| WebApollo |
can be rapidly deployed in support of |
distributed genome jamborees |
|
| MAKER-P |
was applied to |
much less tractable maize genome |
Zea mays |
| underlying annotations in miRBase |
are |
generally experimentally determined or experimentally verified |
|
| mammalian and fish genomes |
contain |
25 to 50 examples from each of BTB-ZF, BTB-BACK-kelch (BBK) and T1-Kv families |
|
| Caenorhabditis elegans |
contains |
very few BTB-ZF and BBK proteins |
Caenorhabditis elegans |
| annotation of the sugar beet genome sequence |
supported by |
FISH karyotype results |
Beta vulgaris L. |
| classification and analysis results |
are available as |
integrated web resource, the database of Arabidopsis Splicing Related Genes (ASRG) |
Arabidopsis thaliana |
| newly sequenced genomes |
have problems with |
protein annotation |
|
| iterative BLAST searches with Athila query sequences |
estimated |
Athila copy numbers |
Arabidopsis thaliana |
| gene fragments |
were removed from |
repeat library using ProtExcluder v1.1 after comparison of sequences with those in the alluniRefprexp070416 database |
Selaginella kraussiana |
| annotation pipeline |
identified |
26,131 protein-coding genes in Hap2 |
Prunus persica |
| ordered set of manually curated RING domains |
includes |
all bona fide RING domains |
Arabidopsis thaliana |
| CEGMA pipeline |
revealed |
eukaryotic orthologous groups (KOGs) |
Cicer arietinum |
| Arabidopsis GTs |
total number is |
456 loci |
Arabidopsis thaliana |
| training results of Augustus using BRAKER v2.1.6 in etpmode |
were used as |
starting point for training Augustus through the embryophyta_odb10 database using BUSCO |
Selaginella kraussiana |
| chickpea-specific orphan genes |
are fewer than |
orphan genes in rice |
Cicer arietinum; Oryza sativa |
| lineage-specific genes |
compared with |
other genes |
Cicer arietinum |
| NB-LRR complement in Solanum lycopersicum 'Heinz 1706' |
extended to |
394 loci |
Solanum lycopersicum |
| GT48 (LOC_Os03g02756.1) |
is |
clearly a valid rice locus |
Oryza sativa |
| genome-wide approach |
identified |
97 (AtbZIP, bZIP, AT1G68880) transcription factor family members |
Elaeis guineensis |
| inconsistent annotation of genes in B73 or PH207 |
could lead to |
false positive identification of differentially fractionated genes |
|
| proportion of differentially fractionated GWAS hits |
was not significantly different from |
proportion of differentially fractionated genes among non-GWAS hits |
|
| phytochrome proteins of Arabidopsis |
used as query sequences for |
potato reference genome |
Arabidopsis thaliana; Solanum tuberosum |
| ATPase and nucleotide, cation and sugar transporters |
significantly higher in numbers in |
Thellungiella parvula than in Arabidopsis thaliana |
Thellungiella parvula; Arabidopsis thaliana |
| receptor and transporter activity |
more abundant in |
chickpea-specific gene families |
Cicer arietinum |
| CCHC domain-containing transcription factors |
similar to |
other plants |
|
| transcript contig mapping |
collapsed |
target space to 61.6 Mb of non-overlapping intervals |
Hordeum vulgare |
| RenSeq |
reveals NB-LRR loci from |
uncharacterized genomes of Solanaceous species |
Solanum spp. |
| unannotated transcripts |
were filtered for |
sequences with similarity to transposon-associated protein sequences and putative unspliced pre-mRNA transcripts |
Brachypodium distachyon |
| three R2R3-MYB transcription factor encoding genes |
have been identified in |
Medicago truncatula reference genome sequence |
Medicago truncatula |
| annotated assembly of grapevine genome |
found |
261 putative genes located between IBMP locus markers |
Vitis vinifera |
| Ub subfamily |
further divided into |
Ub1, Ub2, Ub3, Ub4 and Ub5 subfamilies |
|
| Glycine latifolia genome annotation |
yielded |
high confidence gene set of 54,475 protein-coding loci |
Glycine latifolia |
| predicted secreted proteins (SPs) |
might represent |
pseudogenes |
Rhizophagus irregularis |
| Sp9509_oxford_v3 assembly |
identified |
801 full-length long terminal repeat (LTR) retrotransposons |
Spirodela polyrhiza |
| 801 full-length LTR retrotransposons |
compared with |
656 identified in previously published Sp9509v3 assembly |
Spirodela polyrhiza |
| 801 full-length LTR retrotransposons in ON assembly |
represent |
22% more LTRs identified in ON assembly |
Spirodela polyrhiza |
| non-coding RNA prediction |
yielded |
501 tRNA genes in Hap1 |
Prunus persica |
| RetroMap |
delimits |
LTR retroelement insertions |
|
| element searches using reverse transcriptase sequences as queries |
will not identify |
elements lacking reverse transcriptase motifs |
|
| list of syntenic gene assignments |
contains |
219 PH207 putative gene models that could not be identified due to assembly gaps |
|
| full-length, solo and fragmented LTRs |
accounted for |
25.73% of all pseudomolecules |
Linum usitatissimum |
| two sequences located on chromosomes 8 and 10 |
exhibited no similarity to |
the carboxyl terminus |
Zea mays |
| use of the TranSeq method in settings such as gene expression analysis in natural diversity sets |
will be reserved for |
species with de novo sequenced genomes |
|
| repetitive sequence annotation |
identified |
repetitive sequences comprising 41.99% of Hap2 |
Prunus persica |
| RetroMap analysis |
calculated |
3.36% retrotransposon DNA content |
Arabidopsis thaliana |
| (ASK1, ATSKP1, SKP1, SKP1A, UIP1, AT1G75950) proteins |
show |
significant expansions in C. elegans and A. thaliana |
Caenorhabditis elegans; Arabidopsis thaliana |
| non-coding RNA prediction |
yielded |
130 miRNA genes in Hap1 |
Prunus persica |
| identified splicing-related genes |
were annotated with respect to |
inferred gene structure |
Arabidopsis thaliana |
| RetroMap |
identified |
210 full-length Pseudoviridae elements |
Arabidopsis thaliana |
| unicellular eukaryotes |
contain |
only a handful of BTB domain proteins |
|