Most partial domains in proteins are alignment and annotation artifacts.

Abstract:

BACKGROUND: Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2).

RESULTS: We characterized three types of apparent partial domains: split domains, bounded partials, and unbounded partials. We find that bounded partial domains are over-represented in eukaryotes and in lower-quality protein predictions, suggesting that they often result from inaccurate genome assemblies or gene models. We also find that a large percentage of unbounded partial domains produce long alignments, which suggests that their annotation as a partial is an alignment artifact; yet some can be found as partials in other sequence contexts.

CONCLUSIONS: Partial domains are largely the result of alignment and annotation artifacts and should be viewed with caution. The presence of partial domain annotations in proteins should raise the concern that the prediction of the protein's gene may be incomplete. In general, protein domains can be considered the structural building blocks of proteins.
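The 50% criterion in the abstract can be illustrated with a minimal sketch. It assumes each domain annotation records the start and end positions it covers on the family model, plus the model length. The `DomainHit` class, its field names, and the example families are hypothetical and are not taken from Pfam or from the paper's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """Hypothetical record for one domain annotation on a protein."""
    family: str
    model_start: int   # first model position covered (1-based, inclusive)
    model_end: int     # last model position covered (1-based, inclusive)
    model_length: int  # total length of the family model


def model_coverage(hit: DomainHit) -> float:
    """Fraction of the family model covered by this annotation."""
    return (hit.model_end - hit.model_start + 1) / hit.model_length


def is_partial(hit: DomainHit, threshold: float = 0.5) -> bool:
    """Flag annotations covering less than `threshold` of the model.

    The 0.5 default mirrors the "shorter than 50% of their family
    model length" wording in the abstract; it is an assumption here,
    not a parameter documented by the authors.
    """
    return model_coverage(hit) < threshold


# Made-up example annotations, for illustration only.
hits = [
    DomainHit("example_family_A", 1, 95, 100),
    DomainHit("example_family_B", 10, 45, 100),
]
partial_families = [h.family for h in hits if is_partial(h)]
```

In this sketch only the second annotation is flagged, because it covers 36 of 100 model positions.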

journal_name

Genome Biol

journal_title

Genome biology

authors

Triant DA,Pearson WR

doi

10.1186/s13059-015-0656-7

subject

Has Abstract

pub_date

2015-05-15 00:00:00

pages

99

eissn

1474-760X

issn

1474-7596

pii

10.1186/s13059-015-0656-7

journal_volume

16

pub_type

Journal Article
  • Localizing the proteome.

    abstract: The subcellular localization of the entire proteome of an organism, the yeast Saccharomyces cerevisiae, has been revealed for the first time. Comparison with less comprehensive studies of mammalian cells provides insights into the localization of the mammalian proteome. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2003-4-12-240

    authors: Simpson JC,Pepperkok R

    update_date: 2003-01-01 00:00:00

  • Accelerated exon evolution within primate segmental duplications.

    abstract:BACKGROUND:The identification of signatures of natural selection has long been used as an approach to understanding the unique features of any given species. Genes within segmental duplications are overlooked in most studies of selection due to the limitations of draft nonhuman genome assemblies and to the methodologic...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2013-14-1-r9

    authors: Lorente-Galdos B,Bleyhl J,Santpere G,Vives L,Ramírez O,Hernandez J,Anglada R,Cooper GM,Navarro A,Eichler EE,Marques-Bonet T

    update_date: 2013-01-29 00:00:00

  • FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease.

    abstract:BACKGROUND:Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically p...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2008-9-12-r170

    authors: Chen R,Morgan AA,Dudley J,Deshpande T,Li L,Kodama K,Chiang AP,Butte AJ

    update_date: 2008-01-01 00:00:00

  • Inferring protein domain interactions from databases of interacting proteins.

    abstract: We describe domain pair exclusion analysis (DPEA), a method for inferring domain interactions from databases of interacting proteins. DPEA features a log odds score, Eij, reflecting confidence that domains i and j interact. We analyzed 177,233 potential domain interactions underlying 26,032 protein interactions. In to...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-10-r89

    authors: Riley R,Lee C,Sabatti C,Eisenberg D

    update_date: 2005-01-01 00:00:00

  • Membrane transporters and protein traffic networks differentially affecting metal tolerance: a genomic phenotyping study in yeast.

    abstract:BACKGROUND:The cellular mechanisms that underlie metal toxicity and detoxification are rather variegated and incompletely understood. Genomic phenotyping was used to assess the roles played by all nonessential Saccharomyces cerevisiae proteins in modulating cell viability after exposure to cadmium, nickel, and other me...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2008-9-4-r67

    authors: Ruotolo R,Marchini G,Ottonello S

    update_date: 2008-04-07 00:00:00

  • Large-scale and high-confidence proteomic analysis of human seminal plasma.

    abstract:BACKGROUND:The development of mass spectrometric (MS) techniques now allows the investigation of very complex protein mixtures ranging from subcellular structures to tissues. Body fluids are also popular targets of proteomic analysis because of their potential for biomarker discovery. Seminal plasma has not yet receive...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2006-7-5-r40

    authors: Pilch B,Mann M

    update_date: 2006-01-01 00:00:00

  • Dynamic reprogramming of chromatin accessibility during Drosophila embryo development.

    abstract:BACKGROUND:The development of complex organisms is believed to involve progressive restrictions in cellular fate. Understanding the scope and features of chromatin dynamics during embryogenesis, and identifying regulatory elements important for directing developmental processes remain key goals of developmental biology...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2011-12-5-r43

    authors: Thomas S,Li XY,Sabo PJ,Sandstrom R,Thurman RE,Canfield TK,Giste E,Fisher W,Hammonds A,Celniker SE,Biggin MD,Stamatoyannopoulos JA

    update_date: 2011-01-01 00:00:00

  • RNA methylomes reveal the m6A-mediated regulation of DNA demethylase gene SlDML2 in tomato fruit ripening.

    abstract:BACKGROUND:Methylation of nucleotides, notably in the forms of 5-methylcytosine (5mC) in DNA and N6-methyladenosine (m6A) in mRNA, carries important information for gene regulation. 5mC has been elucidated to participate in the regulation of fruit ripening, whereas the function of m6A in this process and the interplay ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-019-1771-7

    authors: Zhou L,Tian S,Qin G

    update_date: 2019-08-06 00:00:00

  • Probing the yeast proteome for RNA-processing factors.

    abstract: A method has been developed to identify proteins required for the biogenesis of non-coding RNA in yeast, using a microarray to screen for aberrant patterns of RNA processing in mutant strains, and new proteins involved in the processing of ribosomal and non-coding RNAs have been found. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2003-4-10-229

    authors: Granneman S,Baserga SJ

    update_date: 2003-01-01 00:00:00

  • Signaling netwErks get the global treatment.

    abstract: Two landmark studies of cell signaling, by RNA interference and phosphoproteomics, provide complementary global views of the pathways downstream of receptor kinases, including those regulated by Erks. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-1-202

    authors: Yaffe MB,White FM

    update_date: 2007-01-01 00:00:00

  • Characterization of background noise in capture-based targeted sequencing data.

    abstract:BACKGROUND:Targeted deep sequencing is increasingly used to detect low-allelic fraction variants; it is therefore essential that errors that constitute baseline noise and impose a practical limit on detection are characterized. In the present study, we systematically evaluate the extent to which errors are incurred dur...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1275-2

    authors: Park G,Park JK,Shin SH,Jeon HJ,Kim NKD,Kim YJ,Shin HT,Lee E,Lee KH,Son DS,Park WY,Park D

    update_date: 2017-07-21 00:00:00

  • Anticipatory evolution and DNA shuffling.

    abstract: DNA shuffling has proven to be a powerful technique for the directed evolution of proteins. A mix of theoretical and applied research has now provided insights into how recombination can be guided to more efficiently generate proteins and even organisms with altered functions. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2002-3-8-reviews1021

    authors: Bacher JM,Reiss BD,Ellington AD

    update_date: 2002-07-31 00:00:00

  • Hemispheric asymmetry in the human brain and in Parkinson's disease is linked to divergent epigenetic patterns in neurons.

    abstract:BACKGROUND:Hemispheric asymmetry in neuronal processes is a fundamental feature of the human brain and drives symptom lateralization in Parkinson's disease (PD), but its molecular determinants are unknown. Here, we identify divergent epigenetic patterns involved in hemispheric asymmetry by profiling DNA methylation in ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-020-01960-1

    authors: Li P,Ensink E,Lang S,Marshall L,Schilthuis M,Lamp J,Vega I,Labrie V

    update_date: 2020-03-09 00:00:00

  • Characterizing human lung tissue microbiota and its relationship to epidemiological and clinical features.

    abstract:BACKGROUND:The human lung tissue microbiota remains largely uncharacterized, although a number of studies based on airway samples suggest the existence of a viable human lung microbiota. Here we characterized the taxonomic and derived functional profiles of lung microbiota in 165 non-malignant lung tissue samples from ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-016-1021-1

    authors: Yu G,Gail MH,Consonni D,Carugno M,Humphrys M,Pesatori AC,Caporaso NE,Goedert JJ,Ravel J,Landi MT

    update_date: 2016-07-28 00:00:00

  • Personal genomes and precision medicine.

    abstract: A report of the fifth annual Personal Genomes and Medical Genomics meeting, held at Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA, November 14-17, 2012. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2012-13-12-324

    authors: Highnam G,Mittelman D

    update_date: 2012-12-19 00:00:00

  • A cell surface interaction network of neural leucine-rich repeat receptors.

    abstract:BACKGROUND:The vast number of precise intercellular connections within vertebrate nervous systems is only partly explained by the comparatively few known extracellular guidance cues. Large families of neural orphan receptor proteins have been identified and are likely to contribute to these recognition processes but du...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2009-10-9-r99

    authors: Söllner C,Wright GJ

    update_date: 2009-01-01 00:00:00

  • Computational identification of the normal and perturbed genetic networks involved in myeloid differentiation and acute promyelocytic leukemia.

    abstract:BACKGROUND:Acute myeloid leukemia (AML) comprises a group of diseases characterized by the abnormal development of malignant myeloid cells. Recent studies have demonstrated an important role for aberrant transcriptional regulation in AML pathophysiology. Although several transcription factors (TFs) involved in myeloid ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2008-9-2-r38

    authors: Chang LW,Payton JE,Yuan W,Ley TJ,Nagarajan R,Stormo GD

    update_date: 2008-01-01 00:00:00

  • Direct measurement of transcription rates reveals multiple mechanisms for configuration of the Arabidopsis ambient temperature response.

    abstract:BACKGROUND:Sensing and responding to ambient temperature is important for controlling growth and development of many organisms, in part by regulating mRNA levels. mRNA abundance can change with temperature, but it is unclear whether this results from changes in transcription or decay rates, and whether passive or activ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2014-15-3-r45

    authors: Sidaway-Lee K,Costa MJ,Rand DA,Finkenstadt B,Penfield S

    update_date: 2014-03-03 00:00:00

  • The greatest catch: big game fishing for mRNA-bound proteins.

    abstract: Purification of proteins cross-linked to mRNAs has identified 800 mRNA-binding proteins and their characteristics. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb4030

    authors: Sibley CR,Attig J,Ule J

    update_date: 2012-07-17 00:00:00

  • The nuclear receptor ERβ engages AGO2 in regulation of gene transcription, RNA splicing and RISC loading.

    abstract:BACKGROUND:The RNA-binding protein Argonaute 2 (AGO2) is a key effector of RNA-silencing pathways. It exerts a pivotal role in microRNA maturation and activity and can modulate chromatin remodeling, transcriptional gene regulation and RNA splicing. Estrogen receptor beta (ERβ) is endowed with oncosuppressive activities,...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1321-0

    authors: Tarallo R,Giurato G,Bruno G,Ravo M,Rizzo F,Salvati A,Ricciardi L,Marchese G,Cordella A,Rocco T,Gigantino V,Pierri B,Cimmino G,Milanesi L,Ambrosino C,Nyman TA,Nassa G,Weisz A

    update_date: 2017-10-06 00:00:00

  • Visualization of pseudogenes in intracellular bacteria reveals the different tracks to gene destruction.

    abstract:BACKGROUND:Pseudogenes reveal ancestral gene functions. Some obligate intracellular bacteria, such as Mycobacterium leprae and Rickettsia spp., carry substantial fractions of pseudogenes. Until recently, horizontal gene transfers were considered to be rare events in obligate host-associated bacteria. RESULTS:We presen...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2008-9-2-r42

    authors: Fuxelius HH,Darby AC,Cho NH,Andersson SG

    update_date: 2008-01-01 00:00:00

  • Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights.

    abstract:BACKGROUND:Transcription factors (TFs) play a central role in regulating gene expression by interacting with cis-regulatory DNA elements associated with their target genes. Recent surveys have examined the DNA binding specificities of most Saccharomyces cerevisiae TFs, but a comprehensive evaluation of their data has b...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2011-12-12-r125

    authors: Gordân R,Murphy KF,McCord RP,Zhu C,Vedenko A,Bulyk ML

    update_date: 2011-12-21 00:00:00

  • Systematic evaluation of CRISPR-Cas systems reveals design principles for genome editing in human cells.

    abstract:BACKGROUND:While CRISPR-Cas systems hold tremendous potential for engineering the human genome, it is unclear how well each system performs against one another in both non-homologous end joining (NHEJ)-mediated and homology-directed repair (HDR)-mediated genome editing. RESULTS:We systematically compare five different...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-018-1445-x

    authors: Wang Y,Liu KI,Sutrisnoh NB,Srinivasan H,Zhang J,Li J,Zhang F,Lalith CRJ,Xing H,Shanmugam R,Foo JN,Yeo HT,Ooi KH,Bleckwehl T,Par YYR,Lee SM,Ismail NNB,Sanwari NAB,Lee STV,Lew J,Tan MH

    update_date: 2018-05-29 00:00:00

  • Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat.

    abstract:BACKGROUND:Bread wheat is one of the most important and broadly studied crops. However, due to the complexity of its genome and incomplete genome collection of wild populations, the bread wheat genome landscape and domestication history remain elusive. RESULTS:By investigating the whole-genome resequencing data of 93 ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-019-1744-x

    authors: Cheng H,Liu J,Wen J,Nie X,Xu L,Chen N,Li Z,Wang Q,Zheng Z,Li M,Cui L,Liu Z,Bian J,Wang Z,Xu S,Yang Q,Appels R,Han D,Song W,Sun Q,Jiang Y

    update_date: 2019-07-12 00:00:00

  • VlincRNAs controlled by retroviral elements are a hallmark of pluripotency and cancer.

    abstract:BACKGROUND:The function of the non-coding portion of the human genome remains one of the most important questions of our time. Its vast complexity is exemplified by the recent identification of an unusual and notable component of the transcriptome - very long intergenic non-coding RNAs, termed vlincRNAs. RESULTS:Here ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2013-14-7-r73

    authors: St Laurent G,Shtokalo D,Dong B,Tackett MR,Fan X,Lazorthes S,Nicolas E,Sang N,Triche TJ,McCaffrey TA,Xiao W,Kapranov P

    update_date: 2013-07-22 00:00:00

  • Interrogation of global mutagenesis data with a genome scale model of Neisseria meningitidis to assess gene fitness in vitro and in sera.

    abstract:BACKGROUND:Neisseria meningitidis is an important human commensal and pathogen that causes several thousand deaths each year, mostly in young children. How the pathogen replicates and causes disease in the host is largely unknown, particularly the role of metabolism in colonization and disease. Completed genome sequenc...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2011-12-12-r127

    authors: Mendum TA,Newcombe J,Mannan AA,Kierzek AM,McFadden J

    update_date: 2011-12-30 00:00:00

  • A comparison of automatic cell identification methods for single-cell RNA sequencing data.

    abstract:BACKGROUND:Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growt...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-019-1795-z

    authors: Abdelaal T,Michielsen L,Cats D,Hoogduin D,Mei H,Reinders MJT,Mahfouz A

    update_date: 2019-09-09 00:00:00

  • An ontology for cell types.

    abstract: We describe an ontology for cell types that covers the prokaryotic, fungal, animal and plant worlds. It includes over 680 cell types. These cell types are classified under several generic categories and are organized as a directed acyclic graph. The ontology is available in the formats adopted by the Open Biological O...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-2-r21

    authors: Bard J,Rhee SY,Ashburner M

    update_date: 2005-01-01 00:00:00

  • Avianbase: a community resource for bird genomics.

    abstract: Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released ...

    journal_title:Genome biology

    pub_type: Letter

    doi:10.1186/s13059-015-0588-2

    authors: Eöry L,Gilbert MT,Li C,Li B,Archibald A,Aken BL,Zhang G,Jarvis E,Flicek P,Burt DW

    update_date: 2015-01-29 00:00:00

  • MicroRNAs and their isomiRs function cooperatively to target common biological pathways.

    abstract:BACKGROUND:Variants of microRNAs (miRNAs), called isomiRs, are commonly reported in deep-sequencing studies; however, the functional significance of these variants remains controversial. Observational studies show that isomiR patterns are non-random, hinting that these molecules could be regulated and therefore functio...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2011-12-12-r126

    authors: Cloonan N,Wani S,Xu Q,Gu J,Lea K,Heater S,Barbacioru C,Steptoe AL,Martin HC,Nourbakhsh E,Krishnan K,Gardiner B,Wang X,Nones K,Steen JA,Matigian NA,Wood DL,Kassahn KS,Waddell N,Shepherd J,Lee C,Ichikawa J,McKer

    update_date: 2011-12-30 00:00:00