Abstract:
BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding antibody maturation but also for promoting the potential engineering of the therapeutic antibodies. To date, several tools have been developed to perform such "trace-back" calculations. Yet, the predicting ability and processing volume of those tools vary significantly for different sets of data. Moreover, none of them give a confidence for immunoglobulin heavy diversity (IGHD) identification. Developing fast, efficient and enhanced tools is always needed with the booming of immunological data. RESULTS:Here, a program named Ab-origin is presented. It is designed by batch query against germline databases based on empirical knowledge, optimized scoring scheme and appropriate parameters. Special efforts have been paid to improve the identification accuracy of the short and volatile region, IGHD. In particular, a threshold score for certain sensitivity and specificity is provided to give the confidence level of the IGHD identification. CONCLUSION:When evaluated using different sets of both simulated data and experimental data, Ab-origin outperformed all the other five popular tools in terms of prediction accuracy. The features of batch query and confidence indication of IGHD identification would provide extra help to users. The program is freely available at http://mpsq.biosino.org/ab-origin/supplementary.html.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Wang X,Wu D,Zheng S,Sun J,Tao L,Li Y,Cao Zdoi
10.1186/1471-2105-9-S12-S20subject
Has Abstractpub_date
2008-12-12 00:00:00pages
S20issn
1471-2105pii
1471-2105-9-S12-S20journal_volume
9 Suppl 12pub_type
杂志文章abstract:BACKGROUND:Molecular biology data exist on diverse scales, from the level of molecules to -omics. At the same time, the data at each scale can be categorised into multiple layers, such as the genome, transcriptome, proteome, metabolome, and biochemical pathways. Due to the highly multi-layer and multi-dimensional natur...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-31
更新日期:2009-01-23 00:00:00
abstract:BACKGROUND:Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier pro...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1998-9
更新日期:2018-01-08 00:00:00
abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-158
更新日期:2013-05-13 00:00:00
abstract:BACKGROUND:Sharing sets of chemical data (e.g., chemical properties, docking scores, etc.) among collaborators with diverse skill sets is a common task in computer-aided drug design and medicinal chemistry. The ability to associate this data with images of the relevant molecular structures greatly facilitates scientifi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-159
更新日期:2014-05-23 00:00:00
abstract:BACKGROUND:Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1419-5
更新日期:2017-01-03 00:00:00
abstract:BACKGROUND:Over the last two decades, an innovative technology called Tissue Microarray (TMA), which combines multi-tissue and DNA microarray concepts, has been widely used in the field of histology. It consists of a collection of several (up to 1000 or more) tissue samples that are assembled onto a single support - ty...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2111-8
更新日期:2018-04-19 00:00:00
abstract:BACKGROUND:Viral infection by dengue virus is a major public health problem in tropical countries. Early diagnosis and detection are increasingly based on quantitative reverse transcriptase real-time polymerase chain reaction (RT-qPCR) directed against genomic regions conserved between different isolates. Genetic varia...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2313-0
更新日期:2018-09-04 00:00:00
abstract:BACKGROUND:Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-167
更新日期:2008-03-26 00:00:00
abstract:BACKGROUND:For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significant...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-471
更新日期:2007-12-03 00:00:00
abstract:BACKGROUND:Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-163
更新日期:2007-05-21 00:00:00
abstract:BACKGROUND:Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S1-S66
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:Somatic copy number alternation (SCNA) is a common feature of the cancer genome and is associated with cancer etiology and prognosis. The allele-specific SCNA analysis of a tumor sample aims to identify the allele-specific copy numbers of both alleles, adjusting for the ploidy and the tumor purity. Next gene...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2412-y
更新日期:2018-11-14 00:00:00
abstract:BACKGROUND:An increasing number of bioinformatics methods are considering the phylogenetic relationships between biological sequences. Implementing new methodologies using the maximum likelihood phylogenetic framework can be a time consuming task. RESULTS:The bioinformatics library libcov is a collection of C++ classe...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-138
更新日期:2005-06-06 00:00:00
abstract:BACKGROUND:A rapidly increasing flow of genomic data requires the development of efficient methods for obtaining its compact representation. Feature extraction facilitates classification, clustering and model analysis for testing and refining biological hypotheses. "Shotgun" metagenome is an analytically challenging ty...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0875-7
更新日期:2016-01-16 00:00:00
abstract:BACKGROUND:The landscape of biological and biomedical research is being changed rapidly with the invention of microarrays which enables simultaneous view on the transcription levels of a huge number of genes across different experimental conditions or time points. Using microarray data sets, clustering algorithms have ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-27
更新日期:2009-01-20 00:00:00
abstract:BACKGROUND:Guanine protein-coupled receptors (GPCRs) constitute a eukaryotic transmembrane protein family and function as "molecular switches" in the second messenger cascades and are found in all organisms between yeast and humans. They form the single, biggest drug-target family due to their versatility of action and...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S1-S3
更新日期:2011-02-15 00:00:00
abstract:BACKGROUND:In bioinformatics community, many tasks associate with matching a set of protein query sequences in large sequence datasets. To conduct multiple queries in the database, a common used method is to run BLAST on each original querey or on the concatenated queries. It is inefficient since it doesn't exploit the...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1938-8
更新日期:2017-11-21 00:00:00
abstract:BACKGROUND:Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of seque...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-186
更新日期:2008-04-10 00:00:00
abstract:BACKGROUND:PSI-BLAST, an extremely popular tool for sequence similarity search, features the utilization of Position-Specific Scoring Matrix (PSSM) constructed from a multiple sequence alignment (MSA). PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate est...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1686-9
更新日期:2017-06-02 00:00:00
abstract:BACKGROUND:Time-course microarray experiments are being increasingly used to characterize dynamic biological processes. In these experiments, the goal is to identify genes differentially expressed in time-course data, measured between different biological conditions. These differentially expressed genes can reveal the ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-267
更新日期:2008-06-06 00:00:00
abstract:BACKGROUND:The canonical code, although prevailing in complex genomes, is not universal. It was shown the canonical genetic code superior robustness compared to random codes, but it is not clearly determined how it evolved towards its current form. The error minimization theory considers the minimization of point mutat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1608-x
更新日期:2017-03-27 00:00:00
abstract:BACKGROUND:Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2463-0
更新日期:2018-12-28 00:00:00
abstract:BACKGROUND:Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging vari...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-248
更新日期:2014-07-21 00:00:00
abstract:BACKGROUND:Enzyme active sites can be connected to the exterior environment by one or more channels passing through the protein. Despite our current knowledge of enzyme structure and function, surprisingly little is known about how often channels are present or about any structural features such channels may have in co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0379-x
更新日期:2014-11-18 00:00:00
abstract:BACKGROUND:Many statistical algorithms combine microarray expression data and genome sequence data to identify transcription factor binding motifs in the low eukaryotic genomes. Finding cis-regulatory elements in higher eukaryote genomes, however, remains a challenge, as searching in the promoter regions of genes with ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-44
更新日期:2006-01-26 00:00:00
abstract:BACKGROUND:Metabolic networks reflect the relationships between metabolites (biomolecules) and the enzymes (proteins), and are of particular interest since they describe all chemical reactions of an organism. The metabolic networks are constructed from the genome sequence of an organism, and the graphs can be used to s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3112-y
更新日期:2019-10-15 00:00:00
abstract:BACKGROUND:The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in Protein Data Bank (PDB), thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been develo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-439
更新日期:2010-08-27 00:00:00
abstract:BACKGROUND:Tandem repeats are multiple duplications of substrings in the DNA that occur contiguously, or at a short distance, and may involve some mutations (such as substitutions, insertions, and deletions). Tandem repeats have been extensively studied also for their association with the class of repeat expansion dise...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S4-S3
更新日期:2012-03-28 00:00:00
abstract:BACKGROUND:An important mechanism of endocrine activity is chemicals entering target cells via transport proteins and then interacting with hormone receptors such as the estrogen receptor (ER). α-Fetoprotein (AFP) is a major transport protein in rodent serum that can bind and sequester estrogens, thus preventing entry ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-S14-S6
更新日期:2013-01-01 00:00:00
abstract:BACKGROUND:A professional recognition mechanism is required to encourage expedited publishing of an adequate volume of 'fit-for-use' biodiversity data. As a component of such a recognition mechanism, we propose the development of the Data Usage Index (DUI) to demonstrate to data publishers that their efforts of creatin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S15-S3
更新日期:2011-01-01 00:00:00