Ab-origin: an enhanced tool to identify the sourcing gene segments in germline for rearranged antibodies.

Abstract:

BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding antibody maturation but also for promoting the potential engineering of the therapeutic antibodies. To date, several tools have been developed to perform such "trace-back" calculations. Yet, the predicting ability and processing volume of those tools vary significantly for different sets of data. Moreover, none of them give a confidence for immunoglobulin heavy diversity (IGHD) identification. Developing fast, efficient and enhanced tools is always needed with the booming of immunological data. RESULTS:Here, a program named Ab-origin is presented. It is designed by batch query against germline databases based on empirical knowledge, optimized scoring scheme and appropriate parameters. Special efforts have been paid to improve the identification accuracy of the short and volatile region, IGHD. In particular, a threshold score for certain sensitivity and specificity is provided to give the confidence level of the IGHD identification. CONCLUSION:When evaluated using different sets of both simulated data and experimental data, Ab-origin outperformed all the other five popular tools in terms of prediction accuracy. The features of batch query and confidence indication of IGHD identification would provide extra help to users. The program is freely available at http://mpsq.biosino.org/ab-origin/supplementary.html.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Wang X,Wu D,Zheng S,Sun J,Tao L,Li Y,Cao Z

doi

10.1186/1471-2105-9-S12-S20

subject

Has Abstract

pub_date

2008-12-12 00:00:00

pages

S20

issn

1471-2105

pii

1471-2105-9-S12-S20

journal_volume

9 Suppl 12

pub_type

杂志文章
  • Genome Projector: zoomable genome map with multiple views.

    abstract:BACKGROUND:Molecular biology data exist on diverse scales, from the level of molecules to -omics. At the same time, the data at each scale can be categorised into multiple layers, such as the genome, transcriptome, proteome, metabolome, and biochemical pathways. Due to the highly multi-layer and multi-dimensional natur...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-31

    authors: Arakawa K,Tamaki S,Kono N,Kido N,Ikegami K,Ogawa R,Tomita M

    更新日期:2009-01-23 00:00:00

  • Thresher: determining the number of clusters while removing outliers.

    abstract:BACKGROUND:Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier pro...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1998-9

    authors: Wang M,Abrams ZB,Kornblau SM,Coombes KR

    更新日期:2018-01-08 00:00:00

  • Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient.

    abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-158

    authors: Stoltzfus A,Lapp H,Matasci N,Deus H,Sidlauskas B,Zmasek CM,Vaidya G,Pontelli E,Cranston K,Vos R,Webb CO,Harmon LJ,Pirrung M,O'Meara B,Pennell MW,Mirarab S,Rosenberg MS,Balhoff JP,Bik HM,Heath TA,Midford PE,Brown

    更新日期:2013-05-13 00:00:00

  • WebChem Viewer: a tool for the easy dissemination of chemical and structural data sets.

    abstract:BACKGROUND:Sharing sets of chemical data (e.g., chemical properties, docking scores, etc.) among collaborators with diverse skill sets is a common task in computer-aided drug design and medicinal chemistry. The ability to associate this data with images of the relevant molecular structures greatly facilitates scientifi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-159

    authors: Durrant JD,Amaro RE

    更新日期:2014-05-23 00:00:00

  • TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas.

    abstract:BACKGROUND:Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1419-5

    authors: Cumbo F,Fiscon G,Ceri S,Masseroli M,Weitschek E

    更新日期:2017-01-03 00:00:00

  • ATMAD: robust image analysis for Automatic Tissue MicroArray De-arraying.

    abstract:BACKGROUND:Over the last two decades, an innovative technology called Tissue Microarray (TMA), which combines multi-tissue and DNA microarray concepts, has been widely used in the field of histology. It consists of a collection of several (up to 1000 or more) tissue samples that are assembled onto a single support - ty...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2111-8

    authors: Nguyen HN,Paveau V,Cauchois C,Kervrann C

    更新日期:2018-04-19 00:00:00

  • Application of whole genome data for in silico evaluation of primers and probes routinely employed for the detection of viral species by RT-qPCR using dengue virus as a case study.

    abstract:BACKGROUND:Viral infection by dengue virus is a major public health problem in tropical countries. Early diagnosis and detection are increasingly based on quantitative reverse transcriptase real-time polymerase chain reaction (RT-qPCR) directed against genomic regions conserved between different isolates. Genetic varia...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2313-0

    authors: Vanneste K,Garlant L,Broeders S,Van Gucht S,Roosens NH

    更新日期:2018-09-04 00:00:00

  • Efficient computation of absent words in genomic sequences.

    abstract:BACKGROUND:Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-167

    authors: Herold J,Kurtz S,Giegerich R

    更新日期:2008-03-26 00:00:00

  • Predicting and improving the protein sequence alignment quality by support vector regression.

    abstract:BACKGROUND:For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significant...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-471

    authors: Lee M,Jeong CS,Kim D

    更新日期:2007-12-03 00:00:00

  • AT excursion: a new approach to predict replication origins in viral genomes by locating AT-rich regions.

    abstract:BACKGROUND:Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-163

    authors: Chew DS,Leung MY,Choi KP

    更新日期:2007-05-21 00:00:00

  • Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome.

    abstract:BACKGROUND:Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S1-S66

    authors: Freudenberg J,Wang M,Yang Y,Li W

    更新日期:2009-01-30 00:00:00

  • hsegHMM: hidden Markov model-based allele-specific copy number alteration analysis accounting for hypersegmentation.

    abstract:BACKGROUND:Somatic copy number alternation (SCNA) is a common feature of the cancer genome and is associated with cancer etiology and prognosis. The allele-specific SCNA analysis of a tumor sample aims to identify the allele-specific copy numbers of both alleles, adjusting for the ploidy and the tumor purity. Next gene...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2412-y

    authors: Choo-Wosoba H,Albert PS,Zhu B

    更新日期:2018-11-14 00:00:00

  • libcov: a C++ bioinformatic library to manipulate protein structures, sequence alignments and phylogeny.

    abstract:BACKGROUND:An increasing number of bioinformatics methods are considering the phylogenetic relationships between biological sequences. Implementing new methodologies using the maximum likelihood phylogenetic framework can be a time consuming task. RESULTS:The bioinformatics library libcov is a collection of C++ classe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-138

    authors: Butt D,Roger AJ,Blouin C

    更新日期:2005-06-06 00:00:00

  • Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis.

    abstract:BACKGROUND:A rapidly increasing flow of genomic data requires the development of efficient methods for obtaining its compact representation. Feature extraction facilitates classification, clustering and model analysis for testing and refining biological hypotheses. "Shotgun" metagenome is an analytically challenging ty...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0875-7

    authors: Dubinkina VB,Ischenko DS,Ulyantsev VI,Tyakht AV,Alexeev DG

    更新日期:2016-01-16 00:00:00

  • Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes.

    abstract:BACKGROUND:The landscape of biological and biomedical research is being changed rapidly with the invention of microarrays which enables simultaneous view on the transcription levels of a huge number of genes across different experimental conditions or time points. Using microarray data sets, clustering algorithms have ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-27

    authors: Maulik U,Mukhopadhyay A,Bandyopadhyay S

    更新日期:2009-01-20 00:00:00

  • The G protein-coupled receptors in the pufferfish Takifugu rubripes.

    abstract:BACKGROUND:Guanine protein-coupled receptors (GPCRs) constitute a eukaryotic transmembrane protein family and function as "molecular switches" in the second messenger cascades and are found in all organisms between yeast and humans. They form the single, biggest drug-target family due to their versatility of action and...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S1-S3

    authors: Sarkar A,Kumar S,Sundar D

    更新日期:2011-02-15 00:00:00

  • Fast batch searching for protein homology based on compression and clustering.

    abstract:BACKGROUND:In bioinformatics community, many tasks associate with matching a set of protein query sequences in large sequence datasets. To conduct multiple queries in the database, a common used method is to run BLAST on each original querey or on the concatenated queries. It is inefficient since it doesn't exploit the...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1938-8

    authors: Ge H,Sun L,Yu J

    更新日期:2017-11-21 00:00:00

  • EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.

    abstract:BACKGROUND:Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of seque...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-186

    authors: Smith RP,Buchser WJ,Lemmon MB,Pardinas JR,Bixby JL,Lemmon VP

    更新日期:2008-04-10 00:00:00

  • Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance.

    abstract:BACKGROUND:PSI-BLAST, an extremely popular tool for sequence similarity search, features the utilization of Position-Specific Scoring Matrix (PSSM) constructed from a multiple sequence alignment (MSA). PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate est...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1686-9

    authors: Oda T,Lim K,Tomii K

    更新日期:2017-06-02 00:00:00

  • Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data.

    abstract:BACKGROUND:Time-course microarray experiments are being increasingly used to characterize dynamic biological processes. In these experiments, the goal is to identify genes differentially expressed in time-course data, measured between different biological conditions. These differentially expressed genes can reveal the ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-267

    authors: Jonnalagadda S,Srinivasan R

    更新日期:2008-06-06 00:00:00

  • Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability.

    abstract:BACKGROUND:The canonical code, although prevailing in complex genomes, is not universal. It was shown the canonical genetic code superior robustness compared to random codes, but it is not clearly determined how it evolved towards its current form. The error minimization theory considers the minimization of point mutat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1608-x

    authors: Santos J,Monteagudo Á

    更新日期:2017-03-27 00:00:00

  • DLAD4U: deriving and prioritizing disease lists from PubMed literature.

    abstract:BACKGROUND:Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2463-0

    authors: Shen J,Vasaikar S,Zhang B

    更新日期:2018-12-28 00:00:00

  • Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology.

    abstract:BACKGROUND:Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging vari...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-248

    authors: Masino AJ,Dechene ET,Dulik MC,Wilkens A,Spinner NB,Krantz ID,Pennington JW,Robinson PN,White PS

    更新日期:2014-07-21 00:00:00

  • Anatomy of enzyme channels.

    abstract:BACKGROUND:Enzyme active sites can be connected to the exterior environment by one or more channels passing through the protein. Despite our current knowledge of enzyme structure and function, surprisingly little is known about how often channels are present or about any structural features such channels may have in co...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-014-0379-x

    authors: Pravda L,Berka K,Svobodová Vařeková R,Sehnal D,Banáš P,Laskowski RA,Koča J,Otyepka M

    更新日期:2014-11-18 00:00:00

  • An improved distance measure between the expression profiles linking co-expression and co-regulation in mouse.

    abstract:BACKGROUND:Many statistical algorithms combine microarray expression data and genome sequence data to identify transcription factor binding motifs in the low eukaryotic genomes. Finding cis-regulatory elements in higher eukaryote genomes, however, remains a challenge, as searching in the promoter regions of genes with ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-44

    authors: Kim RS,Ji H,Wong WH

    更新日期:2006-01-26 00:00:00

  • Robust structure measures of metabolic networks that predict prokaryotic optimal growth temperature.

    abstract:BACKGROUND:Metabolic networks reflect the relationships between metabolites (biomolecules) and the enzymes (proteins), and are of particular interest since they describe all chemical reactions of an organism. The metabolic networks are constructed from the genome sequence of an organism, and the graphs can be used to s...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3112-y

    authors: Weber Zendrera A,Sokolovska N,Soula HA

    更新日期:2019-10-15 00:00:00

  • CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation.

    abstract:BACKGROUND:The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in Protein Data Bank (PDB), thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been develo...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-439

    authors: Li GH,Huang JF

    更新日期:2010-08-27 00:00:00

  • Tandem repeats discovery service (TReaDS) applied to finding novel cis-acting factors in repeat expansion diseases.

    abstract:BACKGROUND:Tandem repeats are multiple duplications of substrings in the DNA that occur contiguously, or at a short distance, and may involve some mutations (such as substitutions, insertions, and deletions). Tandem repeats have been extensively studied also for their association with the class of repeat expansion dise...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S4-S3

    authors: Pellegrini M,Renda ME,Vecchio A

    更新日期:2012-03-28 00:00:00

  • Homology modeling, molecular docking, and molecular dynamics simulations elucidated α-fetoprotein binding modes.

    abstract:BACKGROUND:An important mechanism of endocrine activity is chemicals entering target cells via transport proteins and then interacting with hormone receptors such as the estrogen receptor (ER). α-Fetoprotein (AFP) is a major transport protein in rodent serum that can bind and sequester estrogens, thus preventing entry ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-S14-S6

    authors: Shen J,Zhang W,Fang H,Perkins R,Tong W,Hong H

    更新日期:2013-01-01 00:00:00

  • Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure.

    abstract:BACKGROUND:A professional recognition mechanism is required to encourage expedited publishing of an adequate volume of 'fit-for-use' biodiversity data. As a component of such a recognition mechanism, we propose the development of the Data Usage Index (DUI) to demonstrate to data publishers that their efforts of creatin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S15-S3

    authors: Ingwersen P,Chavan V

    更新日期:2011-01-01 00:00:00