genome annotation

9539 relationships annotated with this phrase. Showing first 500 of 9539.
Source entity Relationship Target entity Species
polished genome was used for genome annotation Amaranthus hypochondriacus
BRAKER2 genes were combined with predicted coding sequence of full-length transcripts Amaranthus hypochondriacus
unannotated locus with high similarity to ANR genes had no annotation support from computational and long-read annotation Amaranthus hypochondriacus
gene PVIT_0015215.T1 corresponds to gene g166 Plasmopara viticola
single-star and no-star genes have little supporting evidence Arabidopsis thaliana
pseudogenes without disabling substitutions have median length of 175 bp Arabidopsis thaliana
MAKER-P updated and revised maize B73 RefGen_v3 annotation build (5b+) Zea mays
MAKER-P update mode provides means to refresh annotations of established plant genomes
MAKER-P revision process for 5b+ decreased 5b+ gene set from 39,155 to 38,783 annotations Zea mays
transporter genes have been annotated in model and non-model plant genomes
full-length transcript sequencing data was added to genome annotation Amaranthus hypochondriacus
MAKER-P annotations are comparable in quality to The Arabidopsis Information Resource 10 annotations Arabidopsis thaliana
previously annotated ncRNAs are not transcribed or have extremely low transcription levels in RNA-Seq data Arabidopsis thaliana
repeated genes and other sequences result in more sequence alignments and gene predictions
RNA sequencing (RNA-seq) data hold great potential for annotation of newly sequenced plant genomes
MAKER-P scales to even the largest plant genomes
5b+ annotation build contains 213 improved genes
MAKER-P uses RNA-seq data to add untranslated region (UTR) and exon sequences
genome annotation v2.2 contains 23 817 annotated genes Amaranthus hypochondriacus
Rice Annotation Project Database contains revised annotations for many rice genes Oryza sativa
SDR of Y chromosome had 219 genes Spinacia oleracea L. subsp. turkestanica
haplotigs contained 15,642 protein coding gene loci Plasmopara viticola
plant genomes can be difficult targets for annotation
domain families differ significantly in pseudogene:gene ratio Arabidopsis thaliana
gene finders trained with unmatched species parameters suffer reduced gene model accuracy
miRNA prediction pipeline miR-PREFeR of MAKER-P follows criteria for plant miRNA annotation
Arabidopsis genome contains fewest repeats among sequenced plant genomes Arabidopsis thaliana
relative complexity of many plant genomes makes creation, quality control, and dissemination of high-quality gene structure annotations challenging
remaining MAKER-P unique protein-coding gene models were broken into two classes: multiexon models with confirmed splice sites and single-exon models with domains Zea mays
amaranth genome contains three 3R MYB candidate genes Amaranthus hypochondriacus
polished genome and full-length transcript sequencing data enabled production of most complete genome annotation of amaranth to date Amaranthus hypochondriacus
MAKER-P can annotate large, complex plant genomes in only a few hours Zea mays
WebApollo can be easily deployed in classroom for hands-on instruction
MAKER-P in update mode revises intron-exon structures of reference annotation data set
MAKER-P defaults to original reference annotation
MAKER-P revised models contain additional UTR sequence Zea mays
evidence set provides support for 90% of annotated genes in TAIR10 Arabidopsis thaliana
MAKER-P improvements are made across entire TAIR10 data set Arabidopsis thaliana
single investigator using MAKER-P can carry out update of existing genome annotations with new RNA-Seq data
MAKER-P will fulfill need for automated high-quality genome annotation system Zea mays
4,049 multiexon MAKER-P de novo models encode multiexon transcripts with at least one confirmed splice site Zea mays
MAKER-P guarantees constant, complete analysis of RNA-seq data
MAKER-P throughput demonstrates that even largest plant genomes could be annotated in reasonable time frame
pseudogenes are an issue, especially for plant genomes
MAKER-P uses RNA-seq data
Arabidopsis genome encodes 10 MAP2Ks Arabidopsis thaliana
6a build was created to provide maize community with single annotation build comprising best-possible annotated gene models Zea mays
MAKER-P on Texas Advanced Computing Center can de novo annotate Arabidopsis thaliana genome Arabidopsis thaliana
MAKER-P provides solution to genome annotation synchronization problem
MAKER-P update extends and modifies exon coordinates of TAIR10 gene annotations Arabidopsis thaliana
MAKER-P is based upon widely used MAKER genome annotation pipeline
MAKER-P training process uses splice-aware aligner Exonerate
5b+ miRNA annotations were created by aligning genomic sequences against miRBase using BLASTN Zea mays
Copia superfamily represented 8.74% of genome assembly Ficus carica
MAKER-P tool kit is freely available for academic use
maize assembly annotation using 2,172 CPUs finished in 2 h and 53 min Zea mays
MAKER-P can systematically improve upon quality of existing V2 annotation build Zea mays
5b+ annotation build contains 251 new genes
5b+ annotation build has higher percentage of models with annotated start and stop codons
AUGUSTUS comes pretrained for maize Zea mays
wheat ESTs and homology to rice loci used for improved and up-to-date annotation of wheat array probesets
MAKER-P includes capability for noncoding RNA annotation
MAKER-P on Texas Advanced Computing Center completes annotation in less than 3 hours
MAKER-P annotation of alternatively spliced transcripts mirrors performance on Arabidopsis genome Zea mays
MAKER-P default model excludes fourth exon of AT5G03540.3 (ATEXO70A1, EXO70A1) Arabidopsis thaliana
RNA-seq data provides means for improvement of genome annotations
better supported genes have correspondingly more evidence Arabidopsis thaliana
novel genomes often contain new classes of repeats absent from RepBase and MAKER's internal repeat library
MAKER-P tool kit contains two guided tutorials for repeat library construction
MAKER-P is available to iPlant users as supported module on TACC Lonestar cluster
MAKER-P creates new gene models
MAKER-P de novo build contains 5,045 additional annotations not overlapping 5b+ gene models Zea mays
genome annotations fall out of synchronization with available evidence
MAKER-P installed on iPlant resources at Texas Advanced Computing Center (TACC) grants ability to rapidly annotate new plant genomes
Arabidopsis thaliana genome contains expressed sequence tags (ESTs) Arabidopsis thaliana
MAKER-P uses update functionality to automatically update TAIR10 annotations Arabidopsis thaliana
MAKER-P includes integrated means for tRNA and snoRNAs
gene finders trained for other genomes are challenging to use and fraught with difficulties for gene model accuracy
MAKER-P can carry out complete de novo annotation of 17.83-Gb draft loblolly pine genome in less than 24 h Pinus taeda
6a build is composed of additional new, well-supported genes from MAKER-P de novo build Zea mays
MAKER-P provides means for management of existing plant genome annotations
plant genomes can be unusually rich in transposable elements
F-box family has pseudogene:gene ratio of 152:567 Arabidopsis thaliana
MAKER-P is useful for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes
MAKER-P performance with custom repeat library shows little difference in de novo annotation of Arabidopsis Arabidopsis thaliana
WebApollo database can be constructed and placed online within hours of finishing annotation run
regional and whole-genome duplication events impact gene structure annotation
maize genome annotation build 6a is demonstrably superior to existing 5b+ build Zea mays
6a annotation build includes 4,466 additional new gene annotations
MAKER-P can be used to train Augustus and SNAP
Arabidopsis thaliana genome is well annotated Arabidopsis thaliana
Arabidopsis thaliana genome sequencing discovered conserved plant (AT.EIF4E1, CUM1, EIF4E, eIF4E1, AT4G18040) and (EIF(ISO)4E, EIF4E2, eIFiso4E, LSP, LSP1, AT5G35620) Arabidopsis thaliana
MAKER-P is benchmarked using Arabidopsis thaliana genome Arabidopsis thaliana
MAKER-P demonstrates utility for annotation of novel plant genomes
Arabidopsis assembly is approximately 120 Mb Arabidopsis thaliana
MAKER-P default model summarizes three transcripts with single consensus transcript Arabidopsis thaliana
6a annotation build contains 4,466 new genes
MAKER-P can improve existing genome annotation build
MAKER-P updates can be carried out much more rapidly and frequently than heretofore possible
MAKER-P training process uses RNA-seq data and ESTs
Rice BAC clones annotated data obtained from RAP-DB Oryza sativa
MAKER-P speed and flexibility will enable individual iPlant users to generate custom genome annotation data sets
MAKER-P is able to effectively revise gene models regardless of complexity or quantity of evidence
repetitive DNA masked with RepeatMasker Oryza sativa
MAKER-P annotations are comparable in quality to maize V2 annotation build Zea mays
carnivorous bladderwort plant Utricularia gibba has fewest repeats among sequenced plant genomes Utricularia gibba
MAKER-P de novo annotation build uses same evidence data sets as Table I and Figures 1 to 4 Zea mays
MAKER-P can annotate noncoding RNAs Arabidopsis thaliana
MAKER-P can manage genome annotations
MAKER-P de novo gene build contains 1,250 fewer genes than TAIR10 Arabidopsis thaliana
MAKER-P update of TAIR10 gene model maintained all three transcripts Arabidopsis thaliana
Arabidopsis thaliana has annotated 66 putative full-length pectin methyl-esterases (PMEs) Arabidopsis thaliana
false-positive rate of pipeline is 3.1% Arabidopsis thaliana
Photoperiod-H1 is annotated as high-confidence gene Hordeum vulgare
iPlant Cyberinfrastructure was used for MAKER-P update and revision of maize B73 RefGen_v3 annotation build Zea mays
many of the 4,049 multiexon MAKER-P de novo models are sizable, multiexon gene models that contain domains Zea mays
pseudogenes and noncoding RNAs are absent from The Arabidopsis Information Resource 10 build Arabidopsis thaliana
MAKER-P provides basic resource that democratizes genome annotation
highly supported, highly expressed genes often have some data that strongly support a given transcript model Arabidopsis thaliana
MAKER-P has extended MAKER to include means for annotation of pseudogenes and ncRNAs
MAKER-P identified and annotated 4,466 additional protein-coding genes Zea mays
6a annotation build includes 102,370 pseudogene fragments
MAKER-P de novo build is quite similar to 5b+ build Zea mays
2,192 annotated tRNA genes comprise 1,398 that decode standard amino acids, 4 that decode seleno-Cys, 7 suppressor tRNAs, 12 of undetermined type, and 771 pseudogenized tRNAs Zea mays
6a build includes additional 3,006 ncRNA genes and 102,370 pseudogene annotations Zea mays
MAKER-P on Texas Advanced Computing Center can de novo annotate Zea mays genome Zea mays
MAKER-P update brings gene models into better agreement with available evidence Arabidopsis thaliana
MAKER-P is multithreaded, fully message passing interface-compliant annotation engine
pseudogenes impact gene structure annotation
MAKER-P can carry out complete de novo annotation of 17.83-Gb draft loblolly pine genome Pinus taeda
6a build genes are more congruent with evidence
Acer truncatum genome contains 868 snRNAs Acer truncatum
23 160 genes in v3.1 is 5142 more than in v2.5 Aethionema arabicum
MAKER-P is installed and benchmarked on Texas Advanced Computing Center
MAKER-P results in demonstrable improvements to annotations of Arabidopsis genome Arabidopsis thaliana
Arabidopsis thaliana is used as model for genome annotation benchmarking Arabidopsis thaliana
single full-length cDNA may confirm entire exon-intron structure of annotated transcript Arabidopsis thaliana
WebApollo provides functionality for remote editing of annotations
genetically anchored physical contigs provide extended information about genomic context and local neighborhood Hordeum vulgare
annotation consists of 137,208 gene transcripts Zea mays
all new models have gene-finder support Zea mays
MAKER-P provides effective means for annotation of plant genomes
5b annotation build differs from 5b+ annotation build
MAKER-P provides management functions
MAKER-P-trained version of Augustus calls about 5% more genes Zea mays
MAKER-P toolkit provides process for annotation of pseudogenes
54.6% of pseudogenes have one or more MapMan annotations Zea mays
Gypsy superfamily represented 54.56 Mbp Ficus carica
Euscaphis japonica genome contains 349 microRNAs Euscaphis japonica
oligos are located 6 kb upstream of transcription start sites Triticum aestivum
wheat pan-'NLRome' could be larger than estimated Triticum aestivum
Oryza sativa has 35 pectin methyl-esterases (PMEs) Oryza sativa
Brachypodium distachyon has 29 pectin methyl-esterases (PMEs) Brachypodium distachyon
TE detection across final assembly is significantly increased compared with previous version of fig genome Ficus carica L.
genome of L. sativa has recently been sequenced and assembled into a high-quality, comprehensive reference genome Lactuca sativa
A. thaliana genome survey revealed RRM-containing proteins Arabidopsis thaliana
20-kb region contains Loc_08g07740 Oryza sativa
gene bodies and TEs represented 28.06% and 37.39% of assembly Ficus carica
barley IBSC genome contains 36 XTH sequences Hordeum vulgare
transposable elements covered 111.06 Mbp Ficus carica
tandem repeats represent 12.74 Mbp Ficus carica
centromeric region search isolated 42 centromeric contigs Ficus carica
exon had total length of 43.5 Mbp Ficus carica
intron had total length of 49.55 Mbp Ficus carica
genome assembly searched for sex determining region location Ficus carica
Acer truncatum genome contains 1345 miRNAs Acer truncatum
simplest genome annotation method is based on ab initio predictors
verified gene models from Ae. arabicum v3.0 free of evident errors were used to support gene prediction Aethionema arabicum
MAKER tries to predict genes based on canonical splice sites by default Aethionema arabicum
protein-coding genes predicted 11,528 5′-UTRs Ficus carica
17 765 genes (98.6%) were found using BLAST v3.1 protein set Aethionema arabicum
gene annotation performed using MAKER, AUGUSTUS, and SNAP pipelines Manihot esculenta
retrotransposons or Class I elements represented 28.3% of genome assembly Ficus carica
Aethionema arabicum genome version 3 (V3) did not predict gene models de novo but lifted over gene models from v2.5, which were lifted before from v1.0 Aethionema arabicum
Trinity showed worse results for all the other parameters tested Aethionema arabicum
gene Aa3LG10G286 of v3.0 is 37 kbp long and was split into four genes in v3.1 Aethionema arabicum
tandem repeats represent 3.82% of assembly Ficus carica
DNA transposons or Class II elements represented 5.01% of genome assembly Ficus carica
highly repetitive 103 bp-long tandem repeats considered as putative centromeric repeat Ficus carica
3′ UTR intron was not annotated in v2.5 or v3.0 Aethionema arabicum
GFF from v2.5 and v3.0 had formatting problems such as missing features or CDSs that were not multiples of three Aethionema arabicum
Acer truncatum genome contains 744 tRNAs Acer truncatum
SNAP was used as ab initio gene predictor Aethionema arabicum
Repeat annotation in V3.0 presents a striking increase compared to 72 Mbp (42.3% of the nucleotide sequence) in the initial public assembly V1.0 Aethionema arabicum
free numbers between tens can be used in future annotations Aethionema arabicum
interrupted open reading frames (ORFs) were migrated in both versions, 2.5 and 3.0 Aethionema arabicum
1224 genes from v3.0 and 1772 genes from v3.1 were labeled as putative TEs Aethionema arabicum
lettuce reference genome provides high-quality, comprehensive reference genome for analysis of the Compositae family Lactuca sativa
intron had average length of 367 bp Ficus carica
70-kb region flanked by markers SEQ3-1 and SEQ5-1 contains 11 predicted genes Oryza sativa
transposable elements covered 33.57% of assembly Ficus carica
transcripts produced by Scallop were provided to MAKER Aethionema arabicum
code and tools of Ae. arabicum DB are based on previous databases such as OliveTreeDB, Physcomitrella patens Gene Model Lookup DB, and Sol Genomics Network Aethionema arabicum
lift-over method could only migrate genes in regions that already existed in previous genome sequence versions Aethionema arabicum
Ae. arabicum genome sequence V3 was used as reference for gene annotation v3.1 Aethionema arabicum
MAKER results yielded comparable results with regard to number of genes supported by PacBio full-length transcripts, BUSCO completeness, Gene Ontology (GO) terms and protein domain evidence Aethionema arabicum
gene annotation of Ae. arabicum v1.0 was lifted over to v2.5 Aethionema arabicum
Chromovirus elements typically found in centromeric structures
gene annotation process identified 1685 non-protein-coding genes Ficus carica
lift-over tools are used to migrate previous gene versions to new genome versions
5520 genes in v2.5 and 4589 genes in v3.0 were affected by obvious annotation errors or annotated as possible TEs Aethionema arabicum
4,354 large INDELs another 882 within 2,000 bp upstream of genes Manihot esculenta
2,941 genes newly identified in new Gastrodia elata genome assembly Gastrodia elata
241 201 genes with identical sequence between two assemblies were unambiguously mapped IWGSC RefSeq v2.1
annotation is essential for research community benefit from sequence data Arabidopsis; rice; Medicago; poplar
phylogenetic heatmap approach was used to classify 55,801 genes in the MSU7 rice genome Oryza sativa
gene model annotation v3.1 for Aethionema arabicum predicts gene model structure de novo using MAKER Aethionema arabicum
MAKER with Scallop produced a 0.7% higher BUSCO completeness and 433 more genes supported by full-length transcripts than PASA Aethionema arabicum
SNAP ab initio prediction supported two gene models instead of one Aethionema arabicum
locus name 'Aa31LG1G10' comprises 'Aa' for Ae. arabicum, '31' for gene annotation version 3.1, 'LG' for linkage group followed by the number of the LG, and 'G' for gene followed by the gene number Aethionema arabicum
analysis of protein domains with InterProScan detected 947 genes in v3.0 containing TE domains Aethionema arabicum
23 160 genes in v3.1 includes 5606 genes that did not overlap with the genes of v3.0 Aethionema arabicum
20 LsOFP, 22 LsSUN, 10 LsWOX and five LsYABBY genes identified in HZ genome
MAKER-P is benchmarked using Zea mays genome Zea mays
6a build genes have more exons
Scallop transcriptome representation was selected as the input for final annotation in MAKER Aethionema arabicum
2579 TAPs in v3.1 is 449 more than in v3.0 Aethionema arabicum
18 066 (76.7%) genes were assigned putative functions
gene intervals spanned 136 Mb of ChrUn scaffolds
105 534 HC and 155 624 LC genes were located on pseudomolecules
coding sequence defined by TAIR8 Arabidopsis thaliana
MAKER results did not indicate superior performance of any of the different assemblers Aethionema arabicum
annotation v3.1 generated 24 932 genes Aethionema arabicum
annotation pipelines such as MAKER make it possible to include expression data and proteins that were not available when the previous gene annotation was created Aethionema arabicum
2728 genes (48.7%) correspond to fixed version of broken genes in v3.0 Aethionema arabicum
Arabidopsis genome contains different numbers of TFs in NAC TF family Arabidopsis thaliana
Full-length cDNA (FLcDNA) provides high-resolution evidence to accurately define coding and noncoding features
ab initio gene predictors face challenges in defining exon-intron boundaries
annotation requires expert oversight to distinguish biologically meaningful features from technical artifacts
(PSY1, AT5G58650) gene is annotated as Model estExt_GenewiseH_1.c_620008 Chlamydomonas
375 receptor-like cytoplasmic kinases identified in rice genome Oryza sativa
R2R3-MYBs were identified in Populus trichocarpa genome Populus trichocarpa
retrotransposons or Class I elements represented 84.95% of repetitive content Ficus carica
long terminal repeat retrotransposons accounted for 28.03% of total genome assembly Ficus carica
Euscaphis japonica genome contains 3940 small nuclear RNAs Euscaphis japonica
repetitive elements occupy 66.36% of Gastrodia elata genome Gastrodia elata
percentages of subgenome lengths represented by individual super-families and families were similar among A-, B- and D-subgenomes
final gene set was named according to previously introduced nomenclature Aethionema arabicum
MAKER transcripts with isoforms and PacBio full-length transcripts were included in Ae. arabicum DB for downloading and for inspection in the genome browser Aethionema arabicum
lift-over process from v1.0 to v3.0 resulted in loss of 987 gene models Aethionema arabicum
3′ UTR intron was found in v3.1 annotation Aethionema arabicum
lower number of genes in Ae. arabicum could be due, in some cases, to the concatenation of close genes Aethionema arabicum
integrated pipeline combining de novo prediction, homology search and RNA-sequencing (RNA-Seq) verification identified 23 541 putative gene models
Euscaphis japonica genome contains 759 ribosomal RNAs Euscaphis japonica
gene duplication from whole genome duplication limits ability to identify accurately which genes are haplotype specific and missing from one annotation Manihot esculenta
total length of TEs increased from 11 921 309 743 to 12 092 094 168 bp
rice genome contains different numbers of TFs in ALFIN-like TF family Oryza sativa
bottle gourd genome contains 1062 transfer RNAs (tRNAs)
bottle gourd genome contains 340 small nuclear RNAs
transposable elements (TEs) were reannotated in IWGSC RefSeq v2.1 Triticum aestivum
machine-learning classifier predicts in fungal gene catalogs
gene annotation included transcript evidence from RNA-sequencing (RNA-seq) from 11 tissue types Manihot esculenta
Phase1 assembly identified 1,159 arrays containing 2,608 genes Manihot esculenta
annotations from Yuan et al. (2018) and updated annotations merged to generate 21,115 protein-coding genes and 3,664 pseudogenes Gastrodia elata
bioinformatics approaches to investigate plant genetic structure include annotating functional elements
rice genome contains different numbers of TFs in MADS TF family Oryza sativa
supervised learning is applied to predicting regulatory and non-regulatory regions in the maize genome Zea mays
MAKER tests with short-read assemblers provided PacBio transcripts separately Aethionema arabicum
proteins supported two gene models instead of one Aethionema arabicum
Marker-Assisted Gene Annotation Transfer for Triticeae (MAGATT) pipeline was used for gene annotation transfer Triticum aestivum; Triticum turgidum ssp. durum
264 876 gene models represented 207 575 intervals containing between 1 and 4 genes
Arabidopsis MYB TFs show huge difference in numbers of subfamily members Arabidopsis thaliana
methods to construct genomewide maps of fitness variation may profoundly improve efforts to discover functional elements in plant genomes
over 70% of annotated genes were supported by high evidence levels Manihot esculenta
gene intervals had average size of 9.6 kb
common wheat genome contains 137 (BSK12, SSP, AT2G17090) genes Triticum aestivum
Cucumis metuliferus CM27 genome contains 29,214 protein-coding genes Cucumis metuliferus
1379 HC and 4216 LC genes were located on scaffolds assigned to ChrUn
Arabidopsis genome contains different numbers of TFs in bZIP TF family Arabidopsis thaliana
Phase0 assembly identified 1256 arrays containing 2,865 genes Manihot esculenta
common wheat genome contains 3606 TFs Triticum aestivum
increase in pseudomolecule length in IWGSC RefSeq v2.1 resulted in percentage of CS genome accounted for by TEs (85.0%) nearly identical to IWGSC RefSeq v1.0 (84.7%)
lift-over process from v1.0 through v2.5 to v3.0 caused formatting inconsistencies and gene structure annotation errors Aethionema arabicum
co-location of biosynthetic enzymes can dramatically increase ease of identifying biosynthetic pathway genes
protein evidences were provided to MAKER Aethionema arabicum
functionally enriched, context-aware plant genome annotations bridge structure and function
23 541 putative gene models is 1069 more than predicted from USV
gene annotation annotated 33,653 and 35,684 genes in Phase0 and Phase1 assemblies, respectively Manihot esculenta
oligos are located 6 kb downstream of transcription termination sites Triticum aestivum
uniform set of Medicago gene annotations was generated by coordinated international effort Medicago truncatula
differences in bioinformatic search stringency cause huge difference in numbers of TF subfamily members Arabidopsis thaliana
NLR-Annotator scans genomic sequences for combinations of NLR-associated sequence-motifs
array sizes varied up to 11 and 8 genes per array in Phase0 and Phase1, respectively Manihot esculenta
MCScanX gene anchor file supplied to help users identify probable best hits between genes of two phases Manihot esculenta
Gastrodia elata genome from Yuan et al. (2018) has 86.60% coding genes identical to updated genome Gastrodia elata
2792 IWGSC RefSeq v1.0 genes could not be identified in IWGSC RefSeq v2.1
A. cruentus Isoseq transcript long-read data complemented homology and ab initio gene prediction approaches Amaranthus cruentus
annotation extends into structural dimensions
proteomic data gives useful information for genome annotation
BAC clones were sequenced and manually annotated for genes and TEs Oryza spp.
empirical data may be incorporated to train models, validate gene structures, and identify untranslated regions, promoters, enhancers, and other non-coding features
20K array and 45K array are mapped to release 5 of the TIGR Rice Genome Annotation Oryza sativa
building and scoring a SAM T99 'superfamily'-level model is one of seven procedures used in the identification process
rice has 769 gene models Oryza sativa
Volvox carteri genome (v2.1) in Phytozome 12 contains computationally annotated pherophorin genes Volvox carteri
evidence-based frameworks are rapidly advancing the field
VvWOX genes from 'Chardonnay' showed nucleotide differences from sequences reported in grapevine databases Vitis vinifera
many seed samples (234 of 294) were included for RNA-seq data used in this annotation Aethionema arabicum
bottle gourd genome contains 155 ribosomal RNAs
repetitive elements identified in Gastrodia elata genome Gastrodia elata
bioinformatics approaches to investigate plant genetic structure include identifying repetitive sequences
2974 IWGSC RefSeq v1.0 genes had sequence changed in IWGSC RefSeq v2.1 compared with v1.0
de novo annotation of TEs with CLARITE annotated 4 199 592 TEs belonging to 506 families
annotation v3.1 including long-read transcripts allows a better detection of proximal regulator elements, like introns in the 3′ UTRs Aethionema arabicum
1238 transcription factors (TFs) have been successfully identified in the whole genome of Triticum urartu Triticum urartu
Gastrodia elata genome annotation performed using homologous genes from closely related species Gastrodia elata
Gastrodia elata genome annotation performed using de novo prediction results Gastrodia elata
BLASTP method identified fifty-four candidate proteins Cajanus cajan
experience gained from previous whole-genome efforts informed generation of uniform set of Medicago gene annotations Medicago truncatula
Arabidopsis genome contains different numbers of TFs in ALFIN-like TF family Arabidopsis thaliana
Helixer improves exon-intron boundary accuracy
Tiberius enables learning both sequence features and structural rules directly from DNA
chromosome-scale, haplotype-resolved genomes are fueling innovations in annotation frameworks, including pangenomes, machine learning, and multi-omic integration
evidence-based frameworks integrate RNA sequencing
7.21-kb heterozygous deletion overlaps upstream regulatory region of Manes.03G086200 Manihot esculenta
RNA sequencing (RNA-seq) data mapped to Gastrodia elata genome assembly Gastrodia elata
uniform set of Medicago gene annotations and other views of genome data has been provided at several websites Medicago truncatula
rice genome contains different numbers of TFs in WRKY TF family Oryza sativa
AI-driven approaches enable predictive and interpretable models of genome function across diverse plant species
61 U-box proteins identified from Arabidopsis genome Arabidopsis thaliana
scoring an HMM built from the PFAM 'full' alignment is one of seven procedures used in the identification process
gene prediction and annotation typically begin with ab initio, AI-based approaches
U-box proteins identified in rice japonica and indica subspecies genomes Oryza sativa
genome sequencing and functional analysis of rice have been completed Oryza sativa
training on rapidly expanding multi-omic datasets allows capture of subtle genomic signals and regulatory complexity
annotation data produced in this study and the Ae. arabicum genome sequence V3.0 are available via web-accessible database Aethionema arabicum
genes of v3.1 are supported by higher numbers of proteins domains, GO terms and A. thaliana homologs and higher BUSCO completeness Aethionema arabicum
oligos are located on predicted coding regions Triticum aestivum
NLR annotation requires specialized tools
artificial intelligence (AI)-driven gene predictors outperform traditional ab initio tools
WU-BLAST of known U-box proteins against the genome is one of seven procedures used in the identification process
putative VvRop- and VvRab-interacting proteins are present in Vitis vinifera genome Vitis vinifera
exon 5 sequence from Glyma 18g43210.1 showed annotation error from phytozome Glycine max
multi-omic datasets enhance training and identification of features such as promoters
150 kb region of chromosome 12 contains 60 predicted genes Oryza sativa L.
StringTie produced very similar results to Scallop Aethionema arabicum
368 genes from v3.0 did not overlap with genes in v3.1 Aethionema arabicum
distance to nearest genes measured for large indels Manihot esculenta
new annotation release contained 108 010 HC and 161 535 LC gene models
IWGSC RefSeq Annotation v2.1 contains 266 753 genes
D-subgenome had under-represented Gypsy super-family
77 putative U-box proteins identified from rice genome Oryza sativa
v3.0/v4.0 annotation is underrepresented in tandemly repeated genes (e.g., NBS genes) and repetitive elements Solanum melongena
gapless, haplotype-resolved genomes shift the bottleneck from assembly to annotation and interpretation
VvWOX13A has uncorrected annotation in C-terminus of Pinot Noir ENTAV115 database Vitis vinifera
VvWOX1 and VvWOX6 genes were renamed from VvWOX1A and VvWOX1B Vitis vinifera
large-scale orthology networks continue to improve functional inference
high-resolution evidence from full-length cDNA, long- and short-read transcriptomes, and epigenomic layers accurately defines chromatin states
grapevine draft genome sequences were screened using AtWOX proteins from Arabidopsis Vitis vinifera; Arabidopsis thaliana
epigenomic layers such as DNA methylation provide high-resolution evidence to accurately define coding and noncoding features
HMM profile used for searching rice chromosome pseudomolecules version 5 of TIGR Oryza sativa
HvGS gene sequences were aligned to barley genomic sequence Hordeum vulgare
Protein function annotation was performed by InterProScan v5.52–86.0 Selaginella kraussiana
transposable elements (TEs) account for 35.90% of HAP2 assembly Prunus persica
RING-domain variants were not included in curated analysis Arabidopsis thaliana
transposable elements (TEs) account for 36.32% of HAP1 assembly Prunus persica
Pereira recovered 215 Pseudoviridae insertions Arabidopsis thaliana
104 elements identified as (CMT3, AT1G69770) targets within transposons include long terminal repeat (LTR), long interspersed element (LINE) and short interspersed element (SINE) retrotransposons, DNA transposons and helitrons Arabidopsis thaliana
complete rice genome sequence indicates presence of at least 11 glutelin genes Oryza sativa
current AI-based methods instead report primary gene models
plant genome annotations constructed for model and crop species
SWEET transporter search strategies validate complete set of SWEET transporters from V. vinifera Vitis vinifera
methylated reads mapped to genes Populus trichocarpa
SGN data provided chromosome assignment for 129 loci Solanum lycopersicum
tomato genome sequence provides robust method to identify genomic location of unmapped loci Solanum lycopersicum
evidence-based frameworks integrate 3D genome data
proteins from same loci of japonica and indica genome may have lower protein sequence identity due to different annotation procedures Oryza sativa subsp. japonica; Oryza sativa subsp. indica
innovations in annotation frameworks, including pangenomes, machine learning, and multi-omic integration reveal functional elements long hidden by fragmented, low-contiguity plant genomes
four additional RLCKs identified by performing search in KOME database Oryza sativa
Arabidopsis contains 21 BTB-nonphototropic hypocotyl (NPH)3 proteins Arabidopsis thaliana
21 BTB-NPH3 proteins in Arabidopsis represent over 25% of the BTB proteins in this genome Arabidopsis thaliana
detailed knowledge of the Elbe retrotransposons and their numerous remnants enables straightforward annotation
comparative studies of Elbe families will support annotation of these genomes
lineage-specific genes had average GC content much higher than other genes Cicer arietinum
transcription factor genes belonging to transcription factor families Cicer arietinum
full-length cDNA or RNA-Seq contigs aligned against Morex assembly Hordeum vulgare
three candidate SSK genes were obtained from apple genome Malus domestica
CCHC domain-containing transcription factors relatively low fraction in chickpea draft genome Cicer arietinum
Rice GT Database now contains 622 loci Oryza sativa
Twenty-one NB-ARC HMMs matched EL10 genome without de novo predicted proteins Beta vulgaris
38 349 Dub candidate proteins were identified in 1642 fungal genomes
full-length transcript sequencing data improves accuracy
new genome annotation is comparable in accuracy to well-annotated model plant genomes Amaranthus hypochondriacus
predictions of cysteine-rich domains have to be met with skepticism regarding reliability of domain predictions Arabidopsis thaliana
Repbase update is used to identify and classify repeat sequences
Pereira analysis calculated 5.60% retrotransposon DNA content Arabidopsis thaliana
repetitive elements were identified using RepeatMasker v4.1.2-p1 Selaginella kraussiana
non-coding RNA prediction yielded 8,299 rRNA genes in HAP1 Prunus persica
centromere regions were successfully predicted in assembled chromosomes Prunus persica
Pereira recovered 219 Athila insertions Arabidopsis thaliana
number of annotated pseudogenes in Arabidopsis thaliana has been increasing dramatically from TIGR1 annotation to TAIR7 annotation Arabidopsis thaliana
nodulation-related genes in chickpea are fewer than nodulation genes in Medicago, soybean and pigeonpea Cicer arietinum; Medicago truncatula; Glycine max; Cajanus cajan
majority of identified R gene loci reside in poorly or previously unannotated regions of the genome Solanum tuberosum
de novo transcriptome completely covered 55% of annotated Bd21 genes by at least one unbroken de novo transcript Brachypodium distachyon
newly identified transcripts were annotated with Gene Ontology (GO) terms through BLASTx and InterProScan Brachypodium distachyon
Further seven partial NB-LRRs were revised to yield more complete NB-LRR genes Solanum tuberosum
RenSeq can be used to improve existing NB-LRR gene annotations
75% of genes described in this study reside in poorly or previously unannotated regions of the potato and tomato genomes Solanum tuberosum; Solanum lycopersicum
19 genes containing an F-box motif were found by BLAST search of the 'Golden Delicious' genome Malus domestica
methylated reads mapped to repeats Populus trichocarpa
Arabidopsis repeat library was constructed using approach outlined in basic tutorial Arabidopsis thaliana
MAKER-P pseudogene analysis identified 4,204 pseudogenes Arabidopsis thaliana
MAKER-P added additional untranslated regions (UTRs) to 1,393 5b+ gene models Zea mays
5b+ annotation build included 213 improved models Zea mays
version of MAKER-P available within the iPlant Cyberinfrastructure can reannotate entire maize genome in less than 3 h Zea mays
MAKER-P revision process for 5b+ merged 31 annotations Zea mays
MAKER-P toolkit provides means for identification of known and new classes of ncRNAs
BLASTP search identified (AtNMD3, NMD3, AT2G03820) sequence Oryza sativa
MAKER-P installed on iPlant resources at Texas Advanced Computing Center (TACC) grants ability to revise and manage existing plant genomes
MAKER-P generates 3,059 gene annotations on maize chromosome 10 Zea mays
maize B73 RefGen_v3 annotation build (5b+) was updated to produce 6a maize genome annotation Zea mays
most changes during MAKER-P revision are to models having lowest (best) AED scores Zea mays
MAKER-P revised models have on average more exons Zea mays
4,049 multiexon MAKER-P de novo models are shorter than the average 5b+ annotation Zea mays
MAKER-P includes support for pseudogene identification
automated high-quality genome annotation system is of the utmost importance for high-quality genome annotation Zea mays
MAKER-P was used for revision of 5b+ annotations in light of 96 different RNA-seq data sets Zea mays
6a build is composed of MAKER-P updated 5b+ gene models with additional 5′ and 3′ exons and UTR sequences Zea mays
R-genes selected from PRG database and NCBI protein database
INFERNAL identified non-coding RNAs Cicer arietinum
for polyploid Hordeum species it is advantageous to rely on barley assembly as reference Hordeum species
identified NB-LRRs in Solanum tuberosum clone DM increased from 438 to 755 NB-LRRs Solanum tuberosum
95 Or genes were reported by Engsontia et al. (2014) Plutella xylostella
repeat library of the Selaginella kraussiana genome was constructed ab initio using RepeatModeler v2.0.2 with the parameter "-LTRStruct" Selaginella kraussiana
gene structure annotation dataset was filtered using AGAT v0.8.0 to remove genes with incomplete structures and those encoding proteins less than 50 amino acids in length Selaginella kraussiana
MAKER-P rapidly updates genome annotations
MAKER-P maize annotations compare favorably with current chromosome 10 V2 annotations Zea mays
RefGen_v2 includes 110,028 transcript models in the Working Gene Set Zea mays
genome projects have annotations that embody years of manual curation and revision
MAKER-P was used for systematic comparison of 5b and 5b+ annotation builds Zea mays
proportions of uniquely mapped reads in 16C samples were 39% in intergenic, 26% in intronic, 19% in exonic and 16% in ribosomal regions Solanum lycopersicum
Sp9509_oxford_v3 assembly predicted 20 661 protein coding genes Spirodela polyrhiza
TranSeq reads mapped to reference tomato genome could significantly improve genome annotation Solanum lycopersicum
TranSeq and TruSeq analysis identified new exon Solanum lycopersicum
many de novo sequenced plant genomes suffer from extensive fragmentation and poorly defined gene models
complementary approach combining two RNA-seq methods takes advantage of standard RNA-seq procedure to obtain complete transcript sequences combined with 3′-end sequencing method
large portion of gene models in tomato genome was misannotated at 3′-end Solanum lycopersicum
uniquely mapped reads in DSN-treated samples fell in intergenic, intronic, exonic and ribosomal regions Solanum lycopersicum
regions in B73 and/or PH207 genome with homology to syntenic loci were not annotated as gene models
e-value cut-off for genomic comparison confirmed effectiveness for ubiquiton subfamily comparison
ubiquiton genes could be clustered into three groups based on copy-number profiles
Ub2 subfamily containing 2 Ub domains
1327 resistance gene analogues (RGAs) were identified on 15 chromosomes Linum usitatissimum
TranSeq reads are mapped to tomato reference genome (ITAG2.4) Solanum lycopersicum
Chlorella vulgaris 211/11P genome has exon and intron average length shorter than Chlorella zofingiensis Chlorella vulgaris; Chlorella zofingiensis
size of domain family with annotated genes generally correlates with number of pseudogenes Arabidopsis thaliana
WebApollo database containing TAIR10, MAKER-P de novo, and MAKER-P updated annotations is available online at http://weatherby.genetics.utah.edu:8080/WebApollo_A_thaliana Arabidopsis thaliana
MAKER-P de novo build contains additional pseudogene, ncRNA, and well-supported protein-coding gene models Zea mays
MAKER-P transcript structure reflects best-possible gestalt of all evidence for that gene Arabidopsis thaliana
WebApollo can be rapidly deployed in support of distributed genome jamborees
MAKER-P was applied to much less tractable maize genome Zea mays
underlying annotations in miRBase are generally experimentally determined or experimentally verified
mammalian and fish genomes contain 25 to 50 examples from each of BTB-ZF, BTB-BACK-kelch (BBK) and T1-Kv families
Caenorhabditis elegans contains very few BTB-ZF and BBK proteins Caenorhabditis elegans
annotation of the sugar beet genome sequence supported by FISH karyotype results Beta vulgaris L.
classification and analysis results are available as integrated web resource, the database of Arabidopsis Splicing Related Genes (ASRG) Arabidopsis thaliana
newly sequenced genomes have problems with protein annotation
iterative BLAST searches with Athila query sequences estimated Athila copy numbers Arabidopsis thaliana
gene fragments were removed from repeat library using Protexcluder v1.1 after comparison of sequences with those in the alluniRefprexp070416 database Selaginella kraussiana
annotation pipeline identified 26,131 protein-coding genes in HAP2 Prunus persica
ordered set of manually curated RING domains includes all bona fide RING domains Arabidopsis thaliana
CEGMA pipeline revealed eukaryotic orthologous groups (KOGs) Cicer arietinum
Arabidopsis GTs total number is 456 loci Arabidopsis thaliana
training results of Augustus using BRAKER v2.1.6 in the etpmode mode were used as starting point for training Augustus through the embryophyta_odb10 database using BUSCO Selaginella kraussiana
chickpea-specific orphan genes are fewer than orphan genes in rice Cicer arietinum; Oryza sativa
lineage-specific genes compared with other genes Cicer arietinum
NB-LRR complement in Solanum lycopersicum 'Heinz 1706' extended to 394 loci Solanum lycopersicum
GT48 (LOC_Os03g02756.1) is clearly a valid rice locus Oryza sativa
genome-wide approach identified 97 (AtbZIP, bZIP, AT1G68880) transcription factor family members Elaeis guineensis
inconsistent annotation of genes in B73 or PH207 could lead to false positive identification of differentially fractionated genes
proportion of differentially fractionated GWAS hits was not significantly different from proportion of differentially fractionated genes among non-GWAS hits
phytochrome proteins of Arabidopsis used as query sequences for potato reference genome Arabidopsis thaliana; Solanum tuberosum
ATPase and nucleotide, cation and sugar transporters are significantly higher in number in Thellungiella parvula than in Arabidopsis thaliana Thellungiella parvula; Arabidopsis thaliana
receptor and transporter activity more abundant in chickpea-specific gene families Cicer arietinum
CCHC domain-containing transcription factors similar to other plants
transcript contig mapping collapsed target space to 61.6 Mb of non-overlapping intervals Hordeum vulgare
RenSeq reveals NB-LRR loci from uncharacterized genomes of Solanaceous species Solanum spp.
unannotated transcripts were filtered for sequences with similarity to transposon-associated protein sequences and putative unspliced pre-mRNA transcripts Brachypodium distachyon
three R2R3-MYB transcription factor encoding genes have been identified in Medicago truncatula reference genome sequence Medicago truncatula
annotated assembly of grapevine genome found 261 putative genes located between IBMP locus markers Vitis vinifera
Ub subfamily further divided into Ub1, Ub2, Ub3, Ub4 and Ub5 subfamilies
Glycine latifolia genome annotation yielded high confidence gene set of 54,475 protein-coding loci Glycine latifolia
predicted secreted proteins (SPs) might represent pseudogenes Rhizophagus irregularis
Sp9509_oxford_v3 assembly identified 801 full-length long terminal repeat (LTR) retrotransposons Spirodela polyrhiza
801 full-length LTR retrotransposons compared with 656 identified in previously published Sp9509v3 assembly Spirodela polyrhiza
801 full-length LTR retrotransposons in ON assembly are 22% more LTRs than previously identified Spirodela polyrhiza
non-coding RNA prediction yielded 501 tRNA genes in HAP1 Prunus persica
RetroMap delimits LTR retroelement insertions
element searches using reverse transcriptase sequences as queries will not identify elements lacking reverse transcriptase motifs
list of syntenic gene assignments contains 219 PH207 putative gene models that could not be identified due to assembly gaps
full-length, solo and fragmented LTRs accounted for 25.73% of all pseudomolecules Linum usitatissimum
two sequences located on chromosomes 8 and 10 exhibited no similarity to the carboxyl terminus Zea mays
use of TranSeq method in settings such as gene expression in natural diversity sets will be reserved for species with de novo sequenced genomes
repetitive sequence annotation identified repetitive sequences comprising 41.99% of HAP2 Prunus persica
RetroMap analysis calculated 3.36% retrotransposon DNA content Arabidopsis thaliana
(ASK1, ATSKP1, SKP1, SKP1A, UIP1, AT1G75950) proteins show significant expansions in C. elegans and A. thaliana Caenorhabditis elegans; Arabidopsis thaliana
non-coding RNA prediction yielded 130 miRNA genes in HAP1 Prunus persica
identified splicing-related genes were annotated with respect to inferred gene structure Arabidopsis thaliana
RetroMap identified 210 full-length Pseudoviridae elements Arabidopsis thaliana
unicellular eukaryotes contain only a handful of BTB domain proteins