PlantConnectome

predicted coding sequence of full-length transcripts

Amaranthus hypochondriacus

BRAKER2 genes

were combined with

Amaranthus hypochondriacus

unannotated locus with high similarity to ANR genes

had no annotation support from

computational and long-read annotation

Amaranthus hypochondriacus

gene PVIT_0015215.T1

corresponds to

gene g166

Plasmopara viticola

single-star and no-star genes

have little

supporting evidence

Arabidopsis thaliana

pseudogenes without disabling substitutions

have

median length of 175 bp

Arabidopsis thaliana

maize B73 RefGen_v3 annotation build (5b+)

updated and revised

Zea mays

MAKER-P update mode

provides means to refresh

annotations of established plant genomes

MAKER-P revision process for 5b+

decreased

5b+ gene set from 39,155 to 38,783 annotations

Zea mays

transporter genes

have been annotated in

model and non-model plant genomes

full-length transcript sequencing data

was added to

The Arabidopsis Information Resource 10 annotations

Amaranthus hypochondriacus

MAKER-P annotations

are comparable in quality to

Arabidopsis thaliana

previously annotated ncRNAs

are not transcribed or have

extremely low transcription levels in RNA-Seq data

Arabidopsis thaliana

repeated genes and other sequences

result in

more sequence alignments and gene predictions

RNA sequencing (RNA-seq) data

hold great potential for

annotation of newly sequenced plant genomes

even the largest plant genomes

allows it to scale to

contains

213 improved genes

untranslated region (UTR) and exon sequences

uses RNA-seq data to add

genome annotation v2.2

contains

23 817 annotated genes

Amaranthus hypochondriacus

Rice Annotation Project Database

contains

revised annotations for many rice genes

Oryza sativa

SDR of Y chromosome

had

219 genes

Spinacia oleracea L. subsp. turkestanica

haplotigs

contained

15,642 protein coding gene loci

Plasmopara viticola

plant genomes

can be difficult targets for

gene finders trained with unmatched species parameters

domain families

differ significantly in

pseudogene:gene ratio

Arabidopsis thaliana

causes suffering in

gene model accuracy

miRNA prediction pipeline miR-PREFeR of MAKER-P

follows

criteria for plant miRNA annotation

fewest repeats among sequenced plant genomes

contains

Arabidopsis thaliana

relative complexity of many plant genomes

makes challenging

creation, quality control, and dissemination of high-quality gene structure annotations

remaining MAKER-P unique protein-coding gene models

were broken into

two classes: multiexon models with confirmed splice sites and single-exon models with domains

Zea mays

amaranth genome

contains

three 3R MYB candidate genes

Amaranthus hypochondriacus

polished genome and full-length transcript sequencing data

enabled production of

most complete genome annotation of amaranth to date

Amaranthus hypochondriacus

large, complex plant genomes

can annotate in only a few hours

Zea mays

WebApollo

can be easily deployed in

classroom for hands-on instruction

MAKER-P in update mode

revises

intron-exon structures of reference annotation data set

original reference annotation

defaults to

MAKER-P revised models

contain

additional UTR sequence

Zea mays

evidence set

provides support for

90% of annotated genes in TAIR10

Arabidopsis thaliana

MAKER-P improvements

are made across

entire TAIR10 data set

Arabidopsis thaliana

single investigator using MAKER-P

can carry out

update of existing genome annotations with new RNA-Seq data

need for automated high-quality genome annotation system

will fulfill

Zea mays

4,049 multiexon MAKER-P de novo models

encode

multiexon transcripts with at least one confirmed splice site

Zea mays

even largest plant genomes could be annotated in reasonable time frame

guarantees constant, complete analysis of

RNA-seq data

MAKER-P throughput

demonstrates that

pseudogenes

are an issue, especially for

uses

encodes

Arabidopsis thaliana

was created to provide

maize community with single annotation build comprising best-possible annotated gene models

Zea mays

MAKER-P on Texas Advanced Computing Center

can de novo annotate

Arabidopsis thaliana

genome annotation synchronization problem

provides solution to

MAKER-P update

extends and modifies

exon coordinates of TAIR10 gene annotations

Arabidopsis thaliana

widely used MAKER genome annotation pipeline

is based upon

MAKER-P training process

uses

splice-aware aligner Exonerate

5b+ miRNA annotations

were created by

aligning genomic sequences against miRBase using BLASTN

Zea mays

Copia superfamily

represented

8.74% of genome assembly

Ficus carica

MAKER-P tool kit

is freely available for

academic use

maize assembly annotation using 2,172 CPUs

finished in

2 h and 53 min

Zea mays

quality of existing V2 annotation build

can systematically improve upon

Zea mays

contains

251 new genes

annotated start and stop codons

has higher percentage of models with

AUGUSTUS

comes pretrained for

maize

Zea mays

wheat ESTs and homology to rice loci

used for

improved and up-to-date annotation of wheat array probesets

MAKER-P on Texas Advanced Computing Center

includes capability for

noncoding RNA annotation

completes annotation in

less than 3 hours

MAKER-P annotation of alternatively spliced transcripts

mirrors

performance on Arabidopsis genome

Zea mays

MAKER-P default model

excludes

fourth exon of AT5G03540.3

Arabidopsis thaliana

RNA-seq data

provides means for

improvement of genome annotations

better supported genes

have

correspondingly more evidence

Arabidopsis thaliana

novel genomes

often contain

new classes of repeats absent from RepBase and MAKER's internal repeat library

MAKER-P tool kit

contains

two guided tutorials for repeat library construction

supported module on TACC Lonestar cluster

is available to iPlant users as

5,045 additional annotations not overlapping 5b+ gene models

creates

new gene models

MAKER-P de novo build

contains

Zea mays

genome annotations

fall out of synchronization with

available evidence

MAKER-P installed on iPlant resources at Texas Advanced Computing Center (TACC)

grants ability to rapidly annotate

new plant genomes

expressed sequence tags (ESTs)

contains

Arabidopsis thaliana

automatically update TAIR10 annotations

uses update functionality to

Arabidopsis thaliana

gene finders trained for other genomes

includes integrated means for

tRNA and snoRNAs

are challenging and fraught with difficulties

gene model accuracy

additional new, well-supported genes from MAKER-P de novo build

can carry out complete de novo annotation of 17.83-Gb draft loblolly pine genome in less than

24 h

Pinus taeda

6a build

is composed of

Zea mays

management of existing plant genome annotations

provides means for

plant genomes

can be unusually rich in

transposable elements

F-box family

has

pseudogene:gene ratio of 152:567

Arabidopsis thaliana

rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes

is useful for

MAKER-P performance with custom repeat library

shows little difference in

de novo annotation of Arabidopsis

Arabidopsis thaliana

WebApollo database

can be constructed and placed online within

hours of finishing annotation run

regional and whole-genome duplication events

impact

gene structure annotation

maize genome annotation build 6a

is demonstrably superior to

existing 5b+ build

Zea mays

6a annotation build

includes

4,466 additional new gene annotations

can be used to train

Augustus and SNAP

conserved plant eIF4E and eIF(iso)4E

well annotated

Arabidopsis thaliana

genome sequencing discovered

Arabidopsis thaliana

is benchmarked using

Arabidopsis thaliana

annotation of novel plant genomes

demonstrates utility for

Arabidopsis assembly

is approximately

120 Mb

Arabidopsis thaliana

MAKER-P default model

summarizes

three transcripts with single consensus transcript

Arabidopsis thaliana

6a annotation build

contains

4,466 new genes

existing genome annotation build

can improve

MAKER-P updates

can be carried out much more rapidly and frequently than

heretofore possible

MAKER-P training process

uses

RNA-seq data and ESTs

Rice BAC clones annotated data

obtained from

RAP-DB

Oryza sativa

MAKER-P speed and flexibility

will enable individual iPlant users to generate

custom genome annotation data sets

effectively revise gene models regardless of complexity or quantity of evidence

is able to

repetitive DNA

masked with

RepeatMasker

Oryza sativa

MAKER-P annotations

are comparable in quality to

maize V2 annotation build

Zea mays

carnivorous bladderwort plant Utricularia gibba

has

fewest repeats among sequenced plant genomes

Utricularia gibba

MAKER-P de novo annotation build

uses

same evidence data sets as Table I and Figures 1 to 4

Zea mays

can annotate

Arabidopsis thaliana

can manage

MAKER-P de novo gene build

contains

1,250 fewer genes than TAIR10

Arabidopsis thaliana

MAKER-P update of TAIR10 gene model

maintained

all three transcripts

Arabidopsis thaliana

has annotated

66 putative full-length pectin methyl-esterases (PMEs)

Arabidopsis thaliana

false-positive rate of pipeline

3.1%

Arabidopsis thaliana

Photoperiod-H1

is annotated as

high-confidence gene

Hordeum vulgare

iPlant Cyberinfrastructure

was used for

MAKER-P update and revision of maize B73 RefGen_v3 annotation build

Zea mays

4,049 multiexon MAKER-P de novo models

many are

sizable, multiexon gene models that contain domains

Zea mays

pseudogenes and noncoding RNAs

are absent from

The Arabidopsis Information Resource 10 build

Arabidopsis thaliana

provides basic resource that democratizes

highly supported, highly expressed genes

often have

some data that strongly support a given transcript model

Arabidopsis thaliana

has extended MAKER to include means for annotation of

pseudogenes and ncRNAs

4,466 additional protein-coding genes

identified and annotated

Zea mays

6a annotation build

includes

102,370 pseudogene fragments

MAKER-P de novo build

quite similar to 5b+ build

Zea mays

2,192 annotated tRNA genes

comprise

1,398 decode standard amino acids, 4 decode seleno-Cys, 7 are suppressor tRNAs, 12 are undetermined, 771 are pseudogenized

Zea mays
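
The tRNA breakdown above can be checked with simple arithmetic. The following is a minimal sketch that only restates the five class counts listed in the entry; the dictionary keys are illustrative labels, not identifiers from any annotation file.

```python
# Class counts copied from the entry above (labels are illustrative).
trna_classes = {
    "standard amino acids": 1398,
    "seleno-Cys": 4,
    "suppressor": 7,
    "undetermined": 12,
    "pseudogenized": 771,
}

# The classes should add up to the 2,192 annotated tRNA genes.
total = sum(trna_classes.values())

# Share of annotated tRNA genes that are pseudogenized.
pseudo_fraction = trna_classes["pseudogenized"] / total

print(total, round(pseudo_fraction, 3))
```

The five classes sum to the stated total, so the breakdown is internally consistent.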

6a build

includes

additional 3,006 ncRNA genes and 102,370 pseudogene annotations

Zea mays

MAKER-P on Texas Advanced Computing Center

can de novo annotate

Zea mays genome

Zea mays

MAKER-P update

brings gene models into better agreement with

available evidence

Arabidopsis thaliana

multithreaded, fully message passing interface-compliant annotation engine

pseudogenes

impact

gene structure annotation

complete de novo annotation of 17.83-Gb draft loblolly pine genome

can carry out

Pinus taeda

6a build genes

are more congruent with

evidence

Acer truncatum genome

contains

Acer truncatum

is 5142 more than in

Aethionema arabicum

is installed and benchmarked on

Texas Advanced Computing Center

annotations of Arabidopsis genome

results in demonstrable improvements to

Arabidopsis thaliana

is used as model for

genome annotation benchmarking

Arabidopsis thaliana

single full-length cDNA

may confirm

entire exon-intron structure of annotated transcript

Arabidopsis thaliana

WebApollo

provides functionality for

remote editing of annotations

genetically anchored physical contigs

provide extended information about

genomic context and local neighborhood

Hordeum vulgare

consists of

137,208 gene transcripts

Zea mays

all new models

have

gene-finder support

Zea mays

MAKER-P-trained version of Augustus

provides effective means for annotation of

differs from

provides

calls

about 5% more genes

Zea mays

MAKER-P toolkit

provides

process for annotation of pseudogenes

54.6% of pseudogenes

have

one or more MapMan annotations

Zea mays

Gypsy superfamily

represented

54.56 Mbp

Ficus carica

Euscaphis japonica genome

contains

349 microRNAs

Euscaphis japonica

oligos

are located 6 kb upstream of

transcription start sites

Triticum aestivum

wheat pan-'NLRome'

could be

larger than estimated

Triticum aestivum

Oryza sativa

has

35 pectin methyl-esterases (PMEs)

Oryza sativa

Brachypodium distachyon

has

29 pectin methyl-esterases (PMEs)

Brachypodium distachyon

TE detection across final assembly

is significantly increased compared with

previous version of fig genome

Ficus carica L.

genome of L. sativa

has been sequenced and assembled recently

high-quality, comprehensive reference genome

Lactuca sativa

A. thaliana genome survey

revealed

RRM-containing proteins

Arabidopsis thaliana

20-kb region

contains

Loc_08g07740

Oryza sativa

gene bodies and TEs

represented

28.06% and 37.39% of assembly

Ficus carica

barley IBSC genome

contains

36 XTH sequences

Hordeum vulgare

transposable elements

covered

111.06 Mbp

Ficus carica

tandem repeats

represent

12.74 Mbp

Ficus carica

centromeric region search

isolated

42 centromeric contigs

Ficus carica

had total length of

Ficus carica

had total length of

Ficus carica

searched for

sex determining region location

Ficus carica

Acer truncatum genome

contains

1345 miRNAs

Acer truncatum

simplest genome annotation method

is based on

ab initio predictors

verified gene models from Ae. arabicum v3.0 free of evident errors

were used to support

gene prediction

Aethionema arabicum

canonical splice sites by default

tries to predict genes based on

Aethionema arabicum

protein-coding genes

predicted

11,528 5′-UTRs

Ficus carica

17 765 genes (98.6%)

were found using BLAST

v3.1 protein set

Aethionema arabicum

gene annotation

performed using

MAKER, AUGUSTUS, and SNAP pipelines

Manihot esculenta

retrotransposons or Class I elements

represented

28.3% of genome assembly

Ficus carica

Aethionema arabicum genome version 3 (V3)

did not predict gene models de novo but lifted over gene models from

v2.5, which were lifted before from v1.0

Aethionema arabicum

Trinity

showed worse results for

all the other parameters tested

Aethionema arabicum

gene Aa3LG10G286 of v3.0

is 37 kbp long and was split into

four genes in v3.1

Aethionema arabicum

tandem repeats

represent

3.82% of assembly

Ficus carica

DNA transposons or Class II elements

represented

5.01% of genome assembly

Ficus carica

highly repetitive 103 bp-long tandem repeats

considered as

putative centromeric repeat

Ficus carica

3′ UTR intron

was not annotated in

v2.5 or v3.0

Aethionema arabicum

GFF from v2.5 and v3.0

had formatting problems

such as missing features or CDSs that were not multiples of three

Aethionema arabicum
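
One of the formatting problems named above, CDSs whose length is not a multiple of three, can be detected mechanically. The sketch below is an assumption-laden illustration, not the method used for the Ae. arabicum annotation: it assumes GFF3-style lines with nine tab-separated columns and a single `Parent` attribute per CDS feature, and the function name and example records are hypothetical.

```python
from collections import defaultdict

def cds_lengths_not_multiple_of_three(gff_lines):
    """Return parent IDs whose summed CDS length is not divisible by 3."""
    totals = defaultdict(int)
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        start, end = int(cols[3]), int(cols[4])
        # Parse key=value attributes; multi-parent CDSs are not handled here.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        # GFF3 coordinates are 1-based and inclusive.
        totals[parent] += end - start + 1
    return sorted(p for p, length in totals.items() if length % 3 != 0)

# Hypothetical records: tx1 has 9 + 6 = 15 coding bases, tx2 has 10.
example = [
    "chr1\tdemo\tCDS\t1\t9\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\tdemo\tCDS\t21\t26\t.\t+\t0\tID=cds2;Parent=tx1",
    "chr1\tdemo\tCDS\t101\t110\t.\t+\t0\tID=cds3;Parent=tx2",
]
flagged = cds_lengths_not_multiple_of_three(example)
print(flagged)
```

A flagged transcript is only a candidate error; partial gene models at contig edges can legitimately have incomplete codons.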

Acer truncatum genome

contains

744 tRNAs

Acer truncatum

SNAP

was used as

ab initio gene predictor

Aethionema arabicum

Repeat annotation in V3.0

presents a striking increase compared to

72 Mbp (42.3% of the nucleotide sequence) in the initial public assembly V1.0

Aethionema arabicum

unused gene numbers between multiples of ten

can be used in

future annotations

Aethionema arabicum

interrupted open reading frames (ORFs)

were migrated in

both versions, 2.5 and 3.0

Aethionema arabicum

1224 genes from v3.0 and 1772 genes from v3.1

were labeled as

putative TEs

Aethionema arabicum

lettuce reference genome

provides

high-quality, comprehensive reference genome for analysis of the Compositae family

Lactuca sativa

intron

had average length of

367 bp

Ficus carica

70-kb region flanked by markers SEQ3-1 and SEQ5-1

contains

11 predicted genes

Oryza sativa

transposable elements

covered

33.57% of assembly

Ficus carica

transcripts produced by Scallop

were provided to

code and tools of Ae. arabicum DB

Aethionema arabicum

are based on

previous databases such as OliveTreeDB, Physcomitrella patens Gene Model Lookup DB, and Sol Genomics Network

Aethionema arabicum

lift-over method

could only migrate genes in

regions that already existed in previous genome sequence versions

Aethionema arabicum

Ae. arabicum genome sequence V3

was used as reference for

gene annotation v3.1

Aethionema arabicum

MAKER results

yielded comparable results with regard to

number of genes supported by PacBio full-length transcripts, BUSCO completeness, Gene Ontology (GO) terms and protein domain evidence

Aethionema arabicum

gene annotation of Ae. arabicum v1.0

was lifted over to

v2.5

Aethionema arabicum

Chromovirus elements

typically found in

centromeric structures

gene annotation process

identified

1685 non-protein-coding genes

Ficus carica

lift-over tools

are used to migrate

previous gene versions to new genome versions

5520 genes in v2.5 and 4589 genes in v3.0

were affected by

obvious annotation errors or annotated as possible TEs

Aethionema arabicum

4,354 large INDELs

another 882 within

2,000 bp upstream of genes

Manihot esculenta

2,941 genes

newly identified in

new Gastrodia elata genome assembly

Gastrodia elata

241 201 genes with identical sequence between two assemblies

were unambiguously mapped

IWGSC RefSeq v2.1

research community benefit from sequence data

is essential for

Arabidopsis; rice; Medicago; poplar

phylogenetic heatmap approach

was used to classify

55,801 genes in the MSU7 rice genome

Oryza sativa

gene model annotation v3.1 for Aethionema arabicum

predicts gene model structure de novo using

SNAP ab initio prediction

Aethionema arabicum

MAKER with Scallop

produced a 0.7% higher BUSCO completeness and 433 more genes supported by full-length transcripts than

PASA

Aethionema arabicum

supported

two gene models instead of one

Aethionema arabicum

locus name 'Aa31LG1G10'

comprises

'Aa' stands for Ae. arabicum, '31' for gene annotation version 3.1, 'LG' for linkage group, followed by the number of the LG, and 'G' for gene followed by the gene number

Aethionema arabicum
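
The naming scheme described above can be illustrated with a small parser. This is a sketch under stated assumptions: the regular expression and the function name `parse_locus_name` are hypothetical, and the pattern assumes a two-digit version code as in the v3.1 example, so it would not match older names such as 'Aa3LG10G286'.

```python
import re

# Assumed layout: 'Aa' + two-digit annotation version + 'LG' + linkage
# group number + 'G' + gene number, e.g. 'Aa31LG1G10'.
LOCUS_RE = re.compile(r"^Aa(?P<version>\d{2})LG(?P<lg>\d+)G(?P<gene>\d+)$")

def parse_locus_name(name):
    """Split a v3.1-style locus name into its documented components."""
    m = LOCUS_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised locus name: {name!r}")
    major, minor = m.group("version")  # '31' -> '3', '1'
    return {
        "species": "Aethionema arabicum",
        "annotation_version": f"{major}.{minor}",
        "linkage_group": int(m.group("lg")),
        "gene_number": int(m.group("gene")),
    }

parsed = parse_locus_name("Aa31LG1G10")
print(parsed)
```

For 'Aa31LG1G10' this yields annotation version 3.1, linkage group 1 and gene number 10.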

analysis of protein domains with InterProScan

detected

947 genes in v3.0 containing TE domains

Aethionema arabicum

23 160 genes in v3.1

includes

5606 genes that did not overlap with the genes of v3.0

Aethionema arabicum

20 LsOFP, 22 LsSUN, 10 LsWOX and five LsYABBY genes

identified in

is benchmarked using

Zea mays

have more

Scallop transcriptome representation

was selected as the input for

final annotation in MAKER

Aethionema arabicum

is 449 more than in

Aethionema arabicum

were assigned

spanned

136 Mb of ChrUn scaffolds

105 534 HC and 155 624 LC genes

were located on

defined by

Arabidopsis thaliana

did not indicate superior performance of

any of the different assemblers

Aethionema arabicum

annotation v3.1

generated

24 932 genes

Aethionema arabicum

annotation pipelines such as MAKER

is possible to include

expression data and proteins that were not available when the previous gene annotation was created

Aethionema arabicum

2728 genes (48.7%)

correspond to

fixed version of broken genes in v3.0

Aethionema arabicum

Full-length cDNA (FLcDNA)

contains different numbers of TFs in

NAC TF family

Arabidopsis thaliana

provides

high-resolution evidence to accurately define coding and noncoding features

ab initio gene predictors

face challenges in

defining exon-intron boundaries

expert oversight to distinguish biologically meaningful features from technical artifacts

requires

PSY1 gene

is annotated as

Model estExt_GenewiseH_1.c_620008

Chlamydomonas

375 receptor-like cytoplasmic kinases

identified in

Populus trichocarpa genome

Oryza sativa

R2R3-MYBs

were identified in

Populus trichocarpa

retrotransposons or Class I elements

represented

84.95% of repetitive content

Ficus carica

long terminal repeat retrotransposons

accounted for

28.03% of total genome assembly

Ficus carica

Euscaphis japonica genome

contains

3940 small nuclear RNAs

Euscaphis japonica

repetitive elements

occupy

66.36% of Gastrodia elata genome

Gastrodia elata

percentages of subgenome lengths represented by individual super-families and families

were similar among

A-, B- and D-subgenomes

final gene set

was named according to

previously introduced nomenclature

Aethionema arabicum

MAKER transcripts with isoforms and PacBio full-length transcripts

were included in

Ae. arabicum DB for downloading and for inspection in the genome browser

Aethionema arabicum

lift-over process from v1.0 to v3.0

resulted in loss of

987 gene models

Aethionema arabicum

3′ UTR intron

was found in

v3.1 annotation

Aethionema arabicum

lower number of genes in Ae. arabicum

could be due to

in some cases, the concatenation of close genes

Aethionema arabicum

integrated pipeline combining de novo prediction, homology search and RNA-sequencing (RNA-Seq) verification

identified

23 541 putative gene models

Euscaphis japonica genome

contains

759 ribosomal RNAs

Euscaphis japonica

gene duplication from whole genome duplication

limits ability to

identify accurately which genes are haplotype specific and missing from one annotation

Manihot esculenta

total length of TEs

increased from

11 921 309 743 to 12 092 094 168 bp

1062 transfer RNAs (tRNAs)

contains different numbers of TFs in

ALFIN-like TF family

Oryza sativa

bottle gourd genome

contains

bottle gourd genome

contains

340 small nuclear RNAs

transposable elements (TEs)

were reannotated in

IWGSC RefSeq v2.1

Triticum aestivum

machine-learning classifier

predicts in

fungal gene catalogs

gene annotation

included

transcript evidence from RNA-sequencing (RNA-seq) from 11 tissue types

Manihot esculenta

Phase1 assembly

identified

1,159 arrays containing 2,608 genes

Manihot esculenta

annotations from Yuan et al. (2018) and updated annotations

merged to generate

21,115 protein-coding genes and 3,664 pseudogenes

Gastrodia elata

bioinformatics approaches to investigate plant genetic structure

include

annotating functional elements

predicting regulatory and non-regulatory regions in the maize genome

contains different numbers of TFs in

MADS TF family

Oryza sativa

supervised learning

is applied to

Zea mays

MAKER tests with short-read assemblers

provided

PacBio transcripts separately

Aethionema arabicum

proteins

supported

two gene models instead of one

Aethionema arabicum

Marker-Assisted Gene Annotation Transfer for Triticeae (MAGATT) pipeline

was used for

gene annotation transfer

Triticum aestivum; Triticum turgidum ssp. durum

264 876 gene models

represented

207 575 intervals containing between 1 and 4 genes

Arabidopsis MYB TFs

show huge difference in numbers of

subfamily members

Arabidopsis thaliana

methods to construct genomewide maps of fitness variation

may profoundly improve efforts to

discover functional elements in plant genomes

over 70% of annotated genes

were supported by

Manihot esculenta

had average size of

contains

137 SSP genes

Triticum aestivum

Cucumis metuliferus CM27 genome

contains

29,214 protein-coding genes

Cucumis metuliferus

1379 HC and 4216 LC genes

were located on

scaffolds assigned to ChrUn

bZIP TF family

contains different numbers of TFs in

Arabidopsis thaliana

Phase0 assembly

identified

1256 arrays containing 2,865 genes

Manihot esculenta

common wheat genome

contains

3606 TFs

Triticum aestivum

increase in pseudomolecule length in IWGSC RefSeq v2.1

resulted in

percentage of CS genome accounted for by TEs (85.0%) nearly identical to IWGSC RefSeq v1.0 (84.7%)

lift-over process from v1.0 through v2.5 to v3.0

caused

formatting inconsistencies and gene structure annotation errors

Aethionema arabicum

co-location of biosynthetic enzymes

can dramatically increase

ease of identifying biosynthetic pathway genes

protein evidence

were provided to

functionally enriched, context-aware plant genome annotations

Aethionema arabicum

bridge

structure and function

23 541 putative gene models

1069 more than predicted from USV

gene annotation

annotated

33,653 and 35,684 genes in Phase0 and Phase1 assemblies, respectively

Manihot esculenta

oligos

are located 6 kb downstream of

transcription termination sites

Triticum aestivum

uniform set of Medicago gene annotations

was generated by

coordinated international effort

Medicago truncatula

differences in bioinformatic search stringency

cause

huge difference in numbers of TF subfamily members

Arabidopsis thaliana

NLR-Annotator

scans genomic sequences for

combinations of NLR-associated sequence-motifs

array sizes

varied to

11 and 8 genes in array in Phase0 and Phase1, respectively

Manihot esculenta

MCScanX gene anchor file

supplied to help

users identify probable best hits between genes of two phases

Manihot esculenta

Gastrodia elata genome from Yuan et al. (2018)

has

86.60% coding genes identical to updated genome

Gastrodia elata

2792 IWGSC RefSeq v1.0 genes

could not be identified in

IWGSC RefSeq v2.1

A. cruentus Isoseq transcript long-read data

complemented

homology and ab initio gene prediction approaches

Amaranthus cruentus

extends into

structural dimensions

proteomic data

gives useful information for

genes and TEs

BAC clones

were sequenced and manually annotated for

Oryza spp.

empirical data

may be incorporated to train models, validate gene structures, and identify

untranslated regions, promoters, enhancers, and other non-coding features

20K array and 45K array

are mapped to

release 5 of the TIGR Rice Genome Annotation

Oryza sativa

a SAM T99 'superfamily'-level model built and scored

is one of

seven procedures used in the identification process

rice

has

769 gene models

Oryza sativa

Volvox carteri genome (v2.1) in Phytozome 12

contains computationally annotated

pherophorin genes

Volvox carteri

evidence-based frameworks

are rapidly advancing

the field

VvWOX genes from 'Chardonnay'

showed nucleotide differences from

sequences reported in grapevine databases

Vitis vinifera

many seed samples (234 of 294)

were included for

RNA-seq data used in this annotation

Aethionema arabicum

bottle gourd genome

contains

155 ribosomal RNAs

repetitive elements

identified in

Gastrodia elata genome

Gastrodia elata

bioinformatics approaches to investigate plant genetic structure

include

identifying repetitive sequences

2974 IWGSC RefSeq v1.0 genes

had sequence changed in

IWGSC RefSeq v2.1 compared with v1.0

de novo annotation of TEs with CLARITE

annotated

4 199 592 TEs belonging to 506 families

annotation v3.1 including long-read transcripts

allows a better detection of

proximal regulatory elements, such as introns in the 3′ UTRs

Aethionema arabicum

transcription factors (TFs) in the whole genome of Triticum urartu

have been successfully identified

1238 transcription factors

Triticum urartu

Gastrodia elata genome annotation

performed using

homologous genes from closely related species

Gastrodia elata

Gastrodia elata genome annotation

performed using

de novo prediction results

Gastrodia elata

BLASTP method

identified

fifty-four candidate proteins

Cajanus cajan

experience gained from previous whole-genome efforts

informed generation of

uniform set of Medicago gene annotations

Medicago truncatula

exon-intron boundary accuracy

contains different numbers of TFs in

ALFIN-like TF family

Arabidopsis thaliana

Helixer

improves

Tiberius

enables

learning both sequence features and structural rules directly from DNA

chromosome-scale, haplotype-resolved genomes

are fueling

innovations in annotation frameworks, including pangenomes, machine learning, and multi-omic integration

evidence-based frameworks

integrate

RNA sequencing

7.21-kb heterozygous deletion

overlaps

upstream regulatory region of Manes.03G086200

Manihot esculenta

RNA sequencing (RNA-seq) data

mapped to

Gastrodia elata genome assembly

Gastrodia elata

uniform set of Medicago gene annotations and other views of genome data

has been provided at

several websites

Medicago truncatula

predictive and interpretable models of genome function across diverse plant species

contains different numbers of TFs in

WRKY TF family

Oryza sativa

AI-driven approaches

enable

61 U-box proteins

identified from

scoring an HMM built from the PFAM 'full' alignment

Arabidopsis thaliana

is one of

seven procedures used in the identification process

gene prediction and annotation

typically begin with

ab initio, AI-based approaches

U-box proteins

identified in

rice japonica and indica subspecies genomes

Oryza sativa

genome sequencing and functional analysis of rice

have been

completed

Oryza sativa

training on rapidly expanding multi-omic datasets

allows

capture subtle genomic signals and regulatory complexity

annotation data produced in this study and the Ae. arabicum genome sequence V3.0

are available via

web-accessible database

Aethionema arabicum

genes of v3.1

are supported by higher numbers of

protein domains, GO terms and A. thaliana homologs and higher BUSCO completeness

Aethionema arabicum

oligos

are located on

predicted coding regions

Triticum aestivum

NLR annotation

requires

specialized tools

artificial intelligence (AI)-driven gene predictors

outperform

traditional ab initio tools

WU-BLAST of known U-box proteins against the genome

is one of

seven procedures used in the identification process

putative VvRop- and VvRab-interacting proteins

are present in

Vitis vinifera genome

Vitis vinifera

exon 5 sequence from Glyma 18g43210.1

showed

annotation error from Phytozome

Glycine max

multi-omic datasets

enhance

training and identification of features such as promoters

150 kb region of chromosome 12

contains

60 predicted genes

Oryza sativa L.

StringTie

produced very similar results to

Scallop

Aethionema arabicum

368 genes from v3.0

did not overlap

with genes in v3.1

Aethionema arabicum

distance to nearest genes

measured for

large indels

Manihot esculenta

new annotation release

contained

108 010 HC and 161 535 LC gene models

IWGSC RefSeq Annotation v2.1

contains

266 753 genes

D-subgenome

had under-represented

Gypsy super-family

77 putative U-box proteins

identified from

tandemly repeated genes (e.g., NBS genes) and repetitive elements

Oryza sativa

v3.0/v4.0 annotation

is underrepresented in

Solanum melongena

gapless, haplotype-resolved genomes

shifts bottleneck from assembly to

annotation and interpretation

VvWOX13A

has uncorrected annotation in

C-terminus of Pinot Noir ENTAV115 database

Vitis vinifera

VvWOX1 and VvWOX6 genes

were renamed from

VvWOX1A and VvWOX1B

Vitis vinifera

large-scale orthology networks

continue to improve

functional inference

high-resolution evidence from full-length cDNA, long- and short-read transcriptomes, and epigenomic layers

accurately define

chromatin states

grapevine draft genome sequences

were screened using

AtWOX proteins from Arabidopsis

Vitis vinifera; Arabidopsis thaliana

epigenomic layers such as DNA methylation

provide

high-resolution evidence to accurately define coding and noncoding features

HMM profile

used for searching

rice chromosomes pseudomolecules version 5 of TIGR

Oryza sativa

HvGS gene sequences

were aligned to

barley genomic sequence

Hordeum vulgare

Protein function annotation

was performed by

InterProScan v5.52–86.0

Selaginella kraussiana

transposable elements (TEs)

account for

35.90% of Hap2 assembly

Prunus persica

RING-domain variants

were not included in

curated analysis

Arabidopsis thaliana

transposable elements (TEs)

account for

36.32% of HAP1 assembly

Prunus persica

Pereira

recovered

215 Pseudoviridae insertions

Arabidopsis thaliana

104 elements identified as (CMT3, AT1G69770) targets within transposons

include

long terminal repeat (LTR), long interspersed element (LINE) and short interspersed element (SINE) retrotransposons, DNA transposons and helitrons

Arabidopsis thaliana

complete rice genome sequence

indicates presence of

at least 11 glutelin genes

Oryza sativa

current AI-based methods

instead report

primary gene models

plant genome annotations

constructed for

model and crop species

SWEET transporter search strategies

validate

complete set of SWEET transporters from V. vinifera

Vitis vinifera

methylated reads

mapped to

genes

Populus trichocarpa

SGN data

provided chromosome assignment for

129 loci

Solanum lycopersicum

tomato genome sequence

provides robust method to identify

genomic location of unmapped loci

Solanum lycopersicum

evidence-based frameworks

integrate

3D genome data

proteins from same loci of japonica and indica genome

may have lower protein sequence identity due to

different annotation procedures

Oryza sativa subsp. japonica; Oryza sativa subsp. indica

innovations in annotation frameworks, including pangenomes, machine learning, and multi-omic integration

reveal

functional elements long hidden by fragmented, low-contiguity plant genomes

four additional RLCKs

identified by performing search in

KOME database

Oryza sativa

Arabidopsis

contains

21 BTB-nonphototropic hypocotyl (NPH)3 proteins

Arabidopsis thaliana

21 BTB-NPH3 proteins in Arabidopsis

represent

over 25% of the BTB proteins in this genome

Arabidopsis thaliana

detailed knowledge of the Elbe retrotransposons and their numerous remnants

enables

straightforward annotation

comparative studies of Elbe families

will support

annotation of these genomes

lineage-specific genes

had average GC content much higher than

other genes

Cicer arietinum

transcription factor genes

belonging to

transcription factor families

Cicer arietinum

full-length cDNA or RNA-Seq contigs

aligned against

Morex assembly

Hordeum vulgare

three candidate SSK genes

were obtained from

apple genome

Malus domestica

CCHC domain-containing transcription factors

represent a relatively low fraction in

chickpea draft genome

Cicer arietinum

Rice GT Database

now contains

622 loci

Oryza sativa

Twenty-one NB-ARC HMMs

matched

EL10 genome without de novo predicted proteins

Beta vulgaris

38 349 Dub candidate proteins

were identified in

1642 fungal genomes

full-length transcript sequencing data

improves

accuracy

new genome annotation

is comparable in accuracy to

well-annotated model plant genomes

Amaranthus hypochondriacus

predictions of cysteine-rich domains

have to be met with skepticism

reliability of domain predictions

Arabidopsis thaliana

Repbase Update

is used to identify and classify

repeat sequences

Pereira analysis

calculated

5.60% retrotransposon DNA content

Arabidopsis thaliana

repetitive elements

were identified using

RepeatMasker v4.1.2-p1

Selaginella kraussiana

non-coding RNA prediction

yielded

8,299 rRNA genes in HAP1

Prunus persica

centromere regions

were successfully predicted in

assembled chromosomes

Prunus persica

Pereira

recovered

219 Athila insertions

Arabidopsis thaliana

number of annotated pseudogenes in Arabidopsis thaliana

has been increasing dramatically

TIGR1 annotation to TAIR7 annotation

Arabidopsis thaliana

nodulation-related genes in chickpea

are fewer than

nodulation genes in Medicago, soybean and pigeonpea

Cicer arietinum; Medicago truncatula; Glycine max; Cajanus cajan

majority of identified R gene loci

reside in

poorly or previously unannotated regions of the genome

Solanum tuberosum

de novo transcriptome

completely covered

55% of annotated Bd21 genes by at least one unbroken de novo transcript

Brachypodium distachyon

newly identified transcripts

were annotated by

Gene Ontology (GO) terms through BLASTx and InterProScan

Brachypodium distachyon

Seven further partial NB-LRRs

were revised to yield

more complete NB-LRR genes

Solanum tuberosum

RenSeq

can be used to improve

existing NB-LRR gene annotations

genes described in this study

75% reside in

poorly or previously unannotated regions of the potato and tomato genome

Solanum tuberosum; Solanum lycopersicum

19 genes containing an F-box motif

were found by

BLAST search of the 'Golden Delicious' genome

Malus domestica

methylated reads

mapped to

repeats

Populus trichocarpa

Arabidopsis repeat library

was constructed using

approach outlined in basic tutorial

Arabidopsis thaliana

MAKER-P pseudogene analysis

identified

4,204 pseudogenes

Arabidopsis thaliana

added additional untranslated regions (UTRs) to

1,393 5b+ gene models

Zea mays

version of MAKER-P available within the iPlant Cyberinfrastructure

included

213 improved models

Zea mays

can reannotate entire maize genome in less than

3 h

Zea mays

MAKER-P revision process for 5b+

merged

31 annotations

Zea mays

MAKER-P toolkit

provides means for

identification of known and new classes of ncRNAs

BLASTP search

identified

(AtNMD3, NMD3, AT2G03820) sequence

Oryza sativa

MAKER-P installed on iPlant resources at Texas Advanced Computing Center (TACC)

grants ability to revise and manage

existing plant genomes

3,059 gene annotations on maize chromosome 10

generates

Zea mays

maize B73 RefGen_v3 annotation build (5b+)

was updated to produce

6a maize genome annotation

Zea mays

most changes during MAKER-P revision

are to

models having lowest (best) AED scores

Zea mays
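
MAKER reports annotation edit distance (AED) as a score between 0 and 1 that measures how well a gene model agrees with its aligned evidence; 0 means complete agreement, which is why the lowest scores are the best. The sketch below follows the commonly cited definition, AED = 1 - (SN + SP)/2, computed per nucleotide. The function name, the 1-based inclusive interval format and the toy coordinates are assumptions made for illustration; this is not MAKER-P's own code.

```python
def annotation_edit_distance(model_exons, evidence_exons):
    """Nucleotide-level AED between a gene model and its supporting evidence.

    Both arguments are lists of (start, end) intervals, 1-based and inclusive
    (an assumed, GFF-like convention).
    """
    def bases(intervals):
        # Expand intervals into the set of covered base positions.
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end + 1))
        return covered

    model = bases(model_exons)
    evidence = bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # nothing to compare: treat as complete disagreement

    shared = len(model & evidence)
    sensitivity = shared / len(evidence)  # fraction of evidence the model covers
    specificity = shared / len(model)     # fraction of the model backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Hypothetical model and evidence that overlap over half their length.
    print(annotation_edit_distance([(1, 100)], [(51, 150)]))
```

Expanding intervals into sets keeps the sketch short; for whole-genome use an interval-arithmetic version would be more memory-efficient.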

MAKER-P revised models

have on average

more exons

Zea mays

4,049 multiexon MAKER-P de novo models

although shorter than

average 5b+ annotation

Zea mays

pseudogene identification

includes support for

automated high-quality genome annotation system

is of the utmost importance for

high-quality genome annotation

Zea mays

revision of 5b+ annotations in light of 96 different RNA-seq data sets

was used for

Zea mays

6a build

is composed of

MAKER-P updated 5b+ gene models with additional 5′ and 3′ exons and UTR sequences

Zea mays

R-genes

selected from

PRG database and NCBI protein database

INFERNAL

identified

non-coding RNAs

Cicer arietinum

polyploid Hordeum species

advantageous to rely on

barley assembly as reference

Hordeum species

identified NB-LRRs in Solanum tuberosum clone DM

increased from 438 to

755 NB-LRRs

Solanum tuberosum

95 Or genes

were reported by

Engsontia et al. (2014)

Plutella xylostella

repeat library of the Selaginella kraussiana genome

was constructed ab initio using

RepeatModeler v2.0.2 with the parameter "-LTRStruct"

Selaginella kraussiana

gene structure annotation dataset

was filtered using

AGAT v0.8.0 to remove genes with incomplete structures and those encoding proteins less than 50 amino acids in length

Selaginella kraussiana
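
The length criterion in the filtering step above (dropping genes whose encoded protein is shorter than 50 amino acids) can be illustrated with a minimal sketch in plain Python. It does not call AGAT and does not check for incomplete gene structures. The function names, the FASTA-text input and the handling of a trailing `*` stop symbol are assumptions.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first whitespace-delimited token as the identifier.
            name = header.split()[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_short_proteins(proteins, min_length=50):
    """Keep proteins with at least `min_length` residues.

    A trailing '*' (stop symbol) is ignored when measuring length; this is an
    assumption about how the protein file was written.
    """
    kept = {}
    for name, seq in proteins.items():
        if len(seq.rstrip("*")) >= min_length:
            kept[name] = seq
    return kept


if __name__ == "__main__":
    # Hypothetical two-record input: one protein below and one at the cutoff.
    demo = ">short\n" + "M" * 49 + "\n>ok\n" + "M" * 50 + "*\n"
    print(sorted(filter_short_proteins(read_fasta(demo))))
```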

MAKER-P maize annotations

rapidly updates

genome annotations

compare favorably with

current chromosome 10 V2 annotations

Zea mays

RefGen_v2

includes

110,028 transcript models in the Working Gene Set

Zea mays

genome projects

have annotations that embody

years of manual curation and revision

systematic comparison of 5b and 5b+ annotation builds

was used for

Zea mays

proportion of uniquely mapped reads in 16C samples

were, respectively,

39% intergenic, 26% intronic, 19% exonic and 16% ribosomal regions

Solanum lycopersicum

Sp9509_oxford_v3 assembly

predicted

20 661 protein coding genes

Spirodela polyrhiza

TranSeq reads mapped to reference tomato genome

could significantly improve

TranSeq and TruSeq analysis

Solanum lycopersicum

identified

new exon

Solanum lycopersicum

many de novo sequenced plant genomes

suffer from

extensive fragmentation and poorly defined gene models

complementary approach combining two RNA-seq methods

takes advantage of

standard RNA-seq procedure to obtain complete transcript sequences combined with 3′-end sequencing method

large portion of gene models in tomato genome

was

misannotated at 3′-end

Solanum lycopersicum

uniquely mapped reads in DSN-treated samples

fell in

intergenic, intronic, exonic and ribosomal regions

Solanum lycopersicum

regions in B73 and/or PH207 genome with homology to syntenic loci

were not annotated as

gene models

e-value cut-off for genomic comparison

confirmed effectiveness for

ubiquiton subfamily comparison

ubiquiton genes

could be clustered into

three groups based on copy-number profiles

Ub2 subfamily

containing

2 Ub domains

1327 resistance gene analogues (RGAs)

were identified on

15 chromosomes

Linum usitatissimum

TranSeq reads

are mapped to

tomato reference genome (ITAG2.4)

Solanum lycopersicum

Chlorella vulgaris 211/11P genome

has exon and intron average length shorter than

Chlorella zofingiensis

Chlorella vulgaris; Chlorella zofingiensis

size of domain family with annotated genes

generally correlates with

number of pseudogenes

Arabidopsis thaliana

WebApollo database containing TAIR10, MAKER-P de novo, and MAKER-P updated annotations

is available online at

http://weatherby.genetics.utah.edu:8080/WebApollo_A_thaliana

Arabidopsis thaliana

MAKER-P de novo build

contains

additional pseudogene, ncRNA, and well-supported protein-coding gene models

Zea mays

MAKER-P transcript structure

reflects

best-possible gestalt of all evidence for that gene

Arabidopsis thaliana

WebApollo

can be rapidly deployed in support of

distributed genome jamborees