Abstract:
BACKGROUND:Enhancers are stretches of DNA (100-1000 bp) that play a major role in development gene expression, evolution and disease. It has been recently shown that in high-level eukaryotes enhancers rarely work alone, instead they collaborate by forming clusters of cis-regulatory modules (CRMs). Although the binding of transcription factors is sequence-specific, the identification of functionally similar enhancers is very difficult and it cannot be carried out with traditional alignment-based techniques. RESULTS:The use of fast similarity measures, like alignment-free measures, to detect related regulatory sequences is crucial to understand functional correlation between two enhancers. In this paper we study the use of alignment-free measures for the classification of CRMs. However, alignment-free measures are generally tied to a fixed resolution k. Here we propose an alignment-free statistic, called [Formula: see text], that is based on multiple resolution patterns derived from the Entropic Profiles (EPs). The Entropic Profile is a function of the genomic location that captures the importance of that region with respect to the whole genome. As a byproduct we provide a formula to compute the exact variance of variable length word counts, a result that can be of general interest also in other applications. CONCLUSIONS:We evaluate several alignment-free statistics on simulated data and real mouse ChIP-seq sequences. The new statistic, [Formula: see text], is highly successful in discriminating functionally related enhancers and, in almost all experiments, it outperforms fixed-resolution methods. We implemented the new alignment-free measures, as well as traditional ones, in a software called EP-sim that is freely available: http://www.dei.unipd.it/~ciompin/main/EP-sim.html .
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Comin M,Antonello Mdoi
10.1186/s12859-016-0980-2subject
Has Abstractpub_date
2016-03-18 00:00:00pages
130issn
1471-2105pii
10.1186/s12859-016-0980-2journal_volume
17pub_type
杂志文章abstract:BACKGROUND:As numerous diseases involve errors in signal transduction, modern therapeutics often target proteins involved in cellular signaling. Interpretation of the activity of signaling pathways during disease development or therapeutic intervention would assist in drug development, design of therapy, and target ide...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-99
更新日期:2006-02-28 00:00:00
abstract:UNLABELLED: BACKGROUND:Acquiring and exploring whole genome sequence information for a species under investigation is now a routine experimental approach. On most genome browsers, typically, only the DNA sequence, EST support, motif search results, and GO annotations are displayed. However, for many species, a growing...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-447
更新日期:2011-11-15 00:00:00
abstract:BACKGROUND:SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations foun...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-309
更新日期:2009-09-23 00:00:00
abstract:BACKGROUND:Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preli...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S3-S3
更新日期:2011-06-09 00:00:00
abstract:BACKGROUND:Genes encoding transcription factors that constitute gene-regulatory networks and maternal factors accumulating in egg cytoplasm are two classes of essential genes that play crucial roles in developmental processes. Transcription factors control the expression of their downstream target genes by interacting ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0552-x
更新日期:2015-04-10 00:00:00
abstract:BACKGROUND:In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplicat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-28
更新日期:2006-01-19 00:00:00
abstract:BACKGROUND:Leishmaniasis is a virulent parasitic infection that causes a worldwide disease burden. Most treatments have toxic side-effects and efficacy has decreased due to the emergence of resistant strains. The outlook is worsened by the absence of promising drug targets for this disease. We have taken a computationa...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-484
更新日期:2010-09-27 00:00:00
abstract:BACKGROUND:Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-183
更新日期:2009-06-15 00:00:00
abstract:BACKGROUND:Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structu...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1616-x
更新日期:2017-04-11 00:00:00
abstract:BACKGROUND:Viral infectious diseases are the serious threat for human health. The receptor-binding is the first step for the viral infection of hosts. To more effectively treat human viral infectious diseases, the hidden virus-receptor interactions must be discovered. However, current computational methods for predicti...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3278-3
更新日期:2019-12-27 00:00:00
abstract:BACKGROUND:Conservation and variation scores are used when evaluating sites in a multiple sequence alignment, in order to identify residues critical for structure or function. A variety of scores are available today but it is not clear how different scores relate to each other. RESULTS:We applied 25 conservation and v...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-388
更新日期:2010-07-21 00:00:00
abstract:BACKGROUND:The creation of a complete genome-wide map of transcription factor binding sites is essential for understanding gene regulatory networks in vivo. However, current prediction methods generally rely on statistical models that imperfectly model transcription factor binding. Generation of new prediction methods ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-62
更新日期:2011-02-25 00:00:00
abstract::Semantic Web technologies offer a promising framework for integration of disparate biomedical data. In this paper we present the semantic information integration platform under development at the Center for Clinical and Translational Sciences (CCTS) at the University of Texas Health Science Center at Houston (UTHSC-H)...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S2-S2
更新日期:2009-02-05 00:00:00
abstract:BACKGROUND:Elucidating gene regulatory networks is crucial for understanding normal cell physiology and complex pathologic phenotypes. Existing computational methods for the genome-wide "reverse engineering" of such networks have been successful only for lower eukaryotes with simple genomes. Here we present ARACNE, a n...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-S1-S7
更新日期:2006-03-20 00:00:00
abstract:BACKGROUND:Regulation of gene expression, protein synthesis, replication and assembly of many viruses involve RNA-protein interactions. Although some successful computational tools have been reported to recognize RNA binding sites in proteins, the problem of specificity remains poorly investigated. After the nucleotide...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S13-S5
更新日期:2011-01-01 00:00:00
abstract:BACKGROUND:We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how thi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-430
更新日期:2010-08-18 00:00:00
abstract:BACKGROUND:Hierarchical Multi-Label Classification is a classification task where the classes to be predicted are hierarchically organized. Each instance can be assigned to classes belonging to more than one path in the hierarchy. This scenario is typically found in protein function prediction, considering that each pr...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1232-1
更新日期:2016-09-15 00:00:00
abstract:BACKGROUND:The inference of homology between proteins is a key problem in molecular biology The current best approaches only identify approximately 50% of homologies (with a false positive rate set at 1/1000). RESULTS:We present Homology Induction (HI), a new approach to inferring homology. HI uses machine learning to...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-3-11
更新日期:2002-04-23 00:00:00
abstract:BACKGROUND:We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, ma...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-101
更新日期:2004-07-26 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNAs) are a class of endogenous regulatory small RNAs which play an important role in posttranscriptional regulations by targeting mRNAs for cleavage or translational repression. The base-pairing between the 5'-end of miRNA and the target mRNA 3'-UTRs is essential for the miRNA:mRNA recognition....
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-432
更新日期:2007-11-08 00:00:00
abstract:BACKGROUND:We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort. METHODS:We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples e...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S3-S4
更新日期:2009-03-19 00:00:00
abstract:BACKGROUND:With the growing availability of entire genome sequences, an increasing number of scientists can exploit oligonucleotide microarrays for genome-scale expression studies. While probe-design is a major research area, relatively little work has been reported on the optimization of microarray protocols. RESULTS...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-73
更新日期:2011-03-14 00:00:00
abstract:BACKGROUND:Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-228
更新日期:2008-05-06 00:00:00
abstract:BACKGROUND:Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entiti...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1493-3
更新日期:2017-02-02 00:00:00
abstract:BACKGROUND:A relevant problem in drug design is the comparison and recognition of protein binding sites. Binding sites recognition is generally based on geometry often combined with physico-chemical properties of the site since the conformation, size and chemical composition of the protein surface are all relevant for ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-488
更新日期:2010-09-29 00:00:00
abstract:BACKGROUND:Linkage disequilibrium (LD)-the non-random association of alleles at different loci-defines population-specific haplotypes which vary by genomic ancestry. Assessment of allelic frequencies and LD patterns from a variety of ancestral populations enables researchers to better understand population histories as...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3340-1
更新日期:2020-01-10 00:00:00
abstract:BACKGROUND:Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-121
更新日期:2006-03-09 00:00:00
abstract:BACKGROUND:As molecular biology is creating an increasing amount of sequence and structure data, the multitude of software to analyze this data is also rising. Most of the programs are made for a specific task, hence the user often needs to combine multiple programs in order to reach a goal. This can make the data proc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2367-z
更新日期:2018-10-01 00:00:00
abstract:BACKGROUNDS:Next-Generation Sequencing (NGS) is now widely used in biomedical research for various applications. Processing of NGS data requires multiple programs and customization of the processing pipelines according to the data platforms. However, rapid progress of the NGS applications and processing methods urgentl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2676-x
更新日期:2019-02-20 00:00:00
abstract:BACKGROUND:A professional recognition mechanism is required to encourage expedited publishing of an adequate volume of 'fit-for-use' biodiversity data. As a component of such a recognition mechanism, we propose the development of the Data Usage Index (DUI) to demonstrate to data publishers that their efforts of creatin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S15-S3
更新日期:2011-01-01 00:00:00