Abstract:
BACKGROUND:The inference of homology between proteins is a key problem in molecular biology The current best approaches only identify approximately 50% of homologies (with a false positive rate set at 1/1000). RESULTS:We present Homology Induction (HI), a new approach to inferring homology. HI uses machine learning to bootstrap from standard sequence similarity search methods. First a standard method is run, then HI learns rules which are true for sequences of high similarity to the target (assumed homologues) and not true for general sequences, these rules are then used to discriminate sequences in the twilight zone. To learn the rules HI describes the sequences in a novel way based on a bioinformatic knowledge base, and the machine learning method of inductive logic programming. To evaluate HI we used the PDB40D benchmark which lists sequences of known homology but low sequence similarity. We compared the HI methodology with PSI-BLAST alone and found HI performed significantly better. In addition, Receiver Operating Characteristic (ROC) curve analysis showed that these improvements were robust for all reasonable error costs. The predictive homology rules learnt by HI by can be interpreted biologically to provide insight into conserved features of homologous protein families. CONCLUSIONS:HI is a new technique for the detection of remote protein homology--a central bioinformatic problem. HI with PSI-BLAST is shown to outperform PSI-BLAST for all error costs. It is expect that similar improvements would be obtained using HI with any sequence similarity method.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Karwath A,King RDdoi
10.1186/1471-2105-3-11keywords:
subject
Has Abstractpub_date
2002-04-23 00:00:00pages
11issn
1471-2105journal_volume
3pub_type
杂志文章abstract:BACKGROUND:Microsatellite (simple sequence repeat - SSR) and single nucleotide polymorphism (SNP) markers are two types of important genetic markers useful in genetic mapping and genotyping. Often, large-scale genomic research projects require high-throughput computer-assisted primer design. Numerous such web-based or ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-253
更新日期:2008-05-29 00:00:00
abstract:BACKGROUND:Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-457
更新日期:2010-09-10 00:00:00
abstract:BACKGROUND:Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map con...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-96
更新日期:2012-05-14 00:00:00
abstract:BACKGROUND:DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with the detection of DNA methylation however have created a bias in the number of methylation studies towards model organisms. Consequently, it rema...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2115-4
更新日期:2018-03-27 00:00:00
abstract:BACKGROUND:Reproducibility of results can have a significant impact on the acceptance of new technologies in gene expression analysis. With the recent introduction of the so-called next-generation sequencing (NGS) technology and established microarrays, one is able to choose between two completely different platforms f...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-176
更新日期:2010-04-08 00:00:00
abstract:BACKGROUND:Cryo-electron tomography (cryo-ET) enables the 3D visualization of cellular organization in near-native state which plays important roles in the field of structural cell biology. However, due to the low signal-to-noise ratio (SNR), large volume and high content complexity within cells, it remains difficult a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2650-7
更新日期:2019-03-29 00:00:00
abstract:BACKGROUND:De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence dif...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03740-x
更新日期:2020-09-14 00:00:00
abstract:BACKGROUND:High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino ac...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-118
更新日期:2010-03-05 00:00:00
abstract:BACKGROUND:Long-range interactions between regulatory DNA elements such as enhancers, insulators and promoters play an important role in regulating transcription. As chromatin contacts have been found throughout the human genome and in different cell types, spatial transcriptional control is now viewed as a general mec...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-414
更新日期:2011-10-25 00:00:00
abstract:BACKGROUND:Infections are often associated to comorbidity that increases the risk of medical conditions which can lead to further morbidity and mortality. SARS is a threat which is similar to MERS virus, but the comorbidity is the key aspect to underline their different impacts. One UK doctor says "I'd rather have HIV ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-333
更新日期:2014-10-24 00:00:00
abstract:BACKGROUND:Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights on the functional role of disordered regions and sequence element(s). However, in the case of many IDPs, the lack of evolutionary conservation of the primary sequence can hamper ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0592-2
更新日期:2015-05-13 00:00:00
abstract:BACKGROUND:Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. There are many tools available for genome comparisons. Unfortunately, most of them are not applicable for the identification of unique genes and the inferen...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-S4-S18
更新日期:2006-12-12 00:00:00
abstract:BACKGROUND:SNP genotyping microarrays have revolutionized the study of complex disease. The current range of commercially available genotyping products contain extensive catalogues of low frequency and rare variants. Existing SNP calling algorithms have difficulty dealing with these low frequency variants, as the under...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-158
更新日期:2014-05-23 00:00:00
abstract:BACKGROUND:Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to f...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-510
更新日期:2008-12-01 00:00:00
abstract:BACKGROUND:Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain datab...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-4-16
更新日期:2003-05-06 00:00:00
abstract:BACKGROUND:The identification of differentially expressed genes (DEGs) from Affymetrix GeneChips arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-391
更新日期:2006-08-25 00:00:00
abstract::Environmental shotgun sequencing (ESS) has potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate E...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-S5-S2
更新日期:2013-01-01 00:00:00
abstract:BACKGROUND:High throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used, and generates very large amounts of data, mainly due to reduced cost and advanced technologies. Quickly assessing the quality of giga-to-tera base levels of sequencing data has become a routine ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3015-y
更新日期:2019-08-15 00:00:00
abstract:BACKGROUND:The exponential growth of gigantic biological data from various sources, such as protein-protein interaction (PPI), genome sequences scaffolding, Mass spectrometry (MS) molecular networking and metabolic flux, demands an efficient way for better visualization and interpretation beyond the conventional, two-d...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-322
更新日期:2013-11-14 00:00:00
abstract:BACKGROUND:The Immunoglobulins (IG) and the T cell receptors (TR) play the key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has provided an opportunity for the deep T cell receptor repertoire profiling. However, a specialised software is req...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0613-1
更新日期:2015-05-28 00:00:00
abstract::We developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. Our technique takes advantages of functional genomics data for the same species under different perturbation conditions, therefore complementary to current popular computational techn...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S11-S1
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:The development of high-throughput sequencing and analysis has accelerated multi-omics studies of thousands of microbial species, metagenomes, and infectious disease pathogens. Omics studies are enabling genotype-phenotype association studies which identify genetic determinants of pathogen virulence and drug...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2580-9
更新日期:2019-01-07 00:00:00
abstract:BACKGROUND:Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2463-0
更新日期:2018-12-28 00:00:00
abstract:BACKGROUND:Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is a substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-22
更新日期:2010-01-12 00:00:00
abstract:BACKGROUND:In recent years, a considerable amount of research effort has been directed to the analysis of biological networks with the availability of genome-scale networks of genes and/or proteins of an increasing number of organisms. A protein-protein interaction (PPI) network is a particular biological network which...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-276
更新日期:2008-06-11 00:00:00
abstract:BACKGROUND:Here we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0887-y
更新日期:2016-01-20 00:00:00
abstract:BACKGROUND:Infectious disease modeling and computational power have evolved such that large-scale agent-based models (ABMs) have become feasible. However, the increasing hardware complexity requires adapted software designs to achieve the full potential of current high-performance workstations. RESULTS:We have found l...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0612-2
更新日期:2015-06-02 00:00:00
abstract:BACKGROUND:Cyclic nucleotides are ubiquitous intracellular messengers. Until recently, the roles of cyclic nucleotides in plant cells have proven difficult to uncover. With an understanding of the protein domains which can bind cyclic nucleotides (CNB and GAF domains) we scanned the completed genomes of the higher plan...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-6
更新日期:2005-01-11 00:00:00
abstract:BACKGROUND:Interaction of a drug or chemical with a biological system can result in a gene-expression profile or signature characteristic of the event. Using a suitably robust algorithm these signatures can potentially be used to connect molecules with similar pharmacological or toxicological properties by gene express...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-258
更新日期:2008-06-02 00:00:00
abstract:BACKGROUND:Most human genes produce several transcripts with different exon contents by using alternative promoters, alternative polyadenylation sites and alternative splice sites. Much effort has been devoted to describing known gene transcripts through the development of numerous databases. Nevertheless, owing to the...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-180
更新日期:2007-06-04 00:00:00