Abstract:
BACKGROUND: Membrane transport proteins (transporters) play an essential role in every living cell by transporting hydrophilic molecules across hydrophobic membranes. While the sequences of many membrane proteins are known, their structures and functions are still not well characterized or understood, owing to the immense effort needed to characterize them. Therefore, there is a need for advanced computational techniques that use sequence information alone to identify membrane transporter proteins; such predictions can then direct new experiments and hint at the function of a protein. RESULTS: This work proposes an ensemble classifier, TooT-T, that is trained to optimally combine the predictions from homology annotation transfer and machine-learning methods to determine the final prediction. Experimental results obtained by cross-validation and independent testing show that combining the two approaches is more beneficial than employing only one. CONCLUSION: The proposed model outperforms all of the state-of-the-art methods that rely on the protein sequence alone, with respect to accuracy and MCC. TooT-T achieved overall accuracies of 90.07% and 92.22% and MCCs of 0.80 and 0.82 on the training and independent datasets, respectively.
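The two reported metrics can be illustrated with a minimal sketch, assuming MCC here is the standard Matthews correlation coefficient for binary classification. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (not from the paper): 100 proteins, 10 misclassified.
example_accuracy = accuracy(tp=45, tn=45, fp=5, fn=5)
example_mcc = mcc(tp=45, tn=45, fp=5, fn=5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, which is why it is often reported alongside accuracy for binary classifiers.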
journal_name: BMC Bioinformatics
journal_title: BMC bioinformatics
authors: Alballa M, Butler G
doi: 10.1186/s12859-019-3311-6
subject: Has Abstract
pub_date: 2020-04-23 00:00:00
pages: 25
issue: Suppl 3
issn: 1471-2105
pii: 10.1186/s12859-019-3311-6
journal_volume: 21
pub_type: Journal Article
abstract:BACKGROUND:Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Currently the design of RNA molecules, which fold into a specific structure (known as RNA inverse folding) within biotechnological applications, is lacking the feature of incor...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-015-0815-6
update_date:2015-11-18 00:00:00
abstract:BACKGROUND:The frequent exchange of genetic material among prokaryotes means that extracting a majority or plurality phylogenetic signal from many gene families, and the identification of gene families that are in significant conflict with the plurality signal is a frequent task in comparative genomics, and especially ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-123
update_date:2012-06-07 00:00:00
abstract:BACKGROUND:Several studies demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences, the prediction process usually amounting to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations, previously identified from a t...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2403-z
update_date:2018-10-17 00:00:00
abstract:BACKGROUND:Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. RESULTS:I desc...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-015-0590-4
update_date:2015-05-20 00:00:00
abstract:BACKGROUND:SARS coronavirus main proteinase (SARS CoVMpro) is an important enzyme for the replication of Severe Acute Respiratory Syndrome virus. The active site region of SARS CoVMpro is divided into 8 subsites. Understanding the binding mode of SARS CoVMpro with a specific substrate is useful and contributes to struc...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-S1-S48
update_date:2009-01-30 00:00:00
abstract:BACKGROUND:This paper presents a novel approach for Generative Anatomy Modeling Language (GAML). This approach automatically detects the geometric partitions in 3D anatomy that in turn speeds up integrated non-linear optimization model in GAML for 3D anatomy modeling with constraints (e.g. joints). This integrated non-...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-019-2626-7
update_date:2019-03-14 00:00:00
abstract:BACKGROUND:Manual chemical data curation from publications is error-prone, time consuming, and hard to maintain up-to-date data sets. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures usually described in images, information extraction needs to combine structure ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-S17-S9
update_date:2012-01-01 00:00:00
abstract:BACKGROUND:Polychromatic flow cytometry is a popular technique that has wide usage in the medical sciences, especially for studying phenotypic properties of cells. The high-dimensionality of data generated by flow cytometry usually makes it difficult to visualize. The naive solution of simply plotting two-dimensional g...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-017-1662-4
update_date:2017-06-07 00:00:00
abstract:BACKGROUND:Protein function in eukaryotic cells is often controlled in a cell cycle-dependent manner. Therefore, the correct assignment of cellular phenotypes to cell cycle phases is a crucial task in cell biology research. Nuclear proteins whose localization varies during the cell cycle are valuable and frequently use...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-015-0618-9
update_date:2015-05-29 00:00:00
abstract:BACKGROUND:Knowledge of when and in which cells each gene is expressed across multicellular organisms is critical in understanding both gene function and regulation of cell type diversity. However, methods for measuring expression typically involve a trade-off between imaging-based methods, which give the precise locat...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-14-204
update_date:2013-06-22 00:00:00
abstract:BACKGROUND:Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-6-108
update_date:2005-04-27 00:00:00
abstract:BACKGROUND:Periodic phenomena are widespread in biology. The problem of finding periodicity in biological time series can be viewed as a multiple hypothesis testing of the spectral content of a given time series. The exact noise characteristics are unknown in many bioinformatics applications. Furthermore, the observed ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-6-117
update_date:2005-05-13 00:00:00
abstract:BACKGROUND:The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed a...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-15-43
update_date:2014-02-05 00:00:00
abstract:BACKGROUND:Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-16-S11-S4
update_date:2015-01-01 00:00:00
abstract:BACKGROUND:Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on c...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-272
update_date:2010-05-20 00:00:00
abstract:BACKGROUND:As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics was reported superior to the conventional serologic cancer biomarkers. However, its clinical use is not fully validated yet. An important factor to prevent this young technology to become a mainstream cancer diagnos...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-S1-S1
update_date:2010-01-18 00:00:00
abstract:BACKGROUND:An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biolo...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-94
update_date:2012-05-11 00:00:00
abstract:BACKGROUND:To infer gene regulatory networks from time series gene profiles, two important tasks that are related to biological systems must be undertaken. One task is to determine a valid network structure that has topological properties that can influence the network dynamics profoundly. The other task is to optimize...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-15-S15-S8
update_date:2014-01-01 00:00:00
abstract:BACKGROUND:Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-231
update_date:2012-09-12 00:00:00
abstract:BACKGROUND:For the development of genome assembly tools, some comprehensive and efficiently computable validation measures are required to assess the quality of the assembly. The mostly used N50 measure summarizes the assembly results by the length of the scaffold (or contig) overlapping the midpoint of the length-orde...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-255
update_date:2012-10-03 00:00:00
abstract:BACKGROUND:As a consequence of the evolutionary process, data collected from related species tend to be similar. This similarity by descent can obscure subtler signals in the data such as the evidence of constraint on variation due to shared selective pressures. In comparative sequence analysis, for example, sequence s...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-222
update_date:2007-06-26 00:00:00
abstract:BACKGROUND:Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to populatio...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-14-255
update_date:2013-08-21 00:00:00
abstract:BACKGROUND:The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-S8-S5
update_date:2010-10-26 00:00:00
abstract:BACKGROUND:Automatic quantification of neuronal morphology from images of fluorescence microscopy plays an increasingly important role in high-content screenings. However, there exist very few freeware tools and methods which provide automatic neuronal morphology quantification for pharmacological discovery. RESULTS:T...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-230
update_date:2011-06-08 00:00:00
abstract:BACKGROUND:The 5'-terminal cap structure plays an important role in many aspects of mRNA metabolism. Capping enzymes encoded by viruses and pathogenic fungi are attractive targets for specific inhibitors. There is a large body of experimental data on viral and cellular methyltransferases (MTases) that carry out guanine...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-2-2
update_date:2001-01-01 00:00:00
abstract:BACKGROUND:Peptidases are proteolytic enzymes responsible for fundamental cellular activities in all organisms. Apparently about 2-5% of the genes encode for peptidases, irrespectively of the organism source. The basic peptidase function is "protein digestion" and this can be potentially dangerous in living organisms w...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-S1-S3
update_date:2007-03-08 00:00:00
abstract:BACKGROUND:Network enrichment analysis is a powerful method, which allows to integrate gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, they can be computationally slow and a...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1203-6
update_date:2016-09-05 00:00:00
abstract:BACKGROUND:Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs i...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1375-0
update_date:2016-12-22 00:00:00
abstract:BACKGROUND:Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is useful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03906-7
update_date:2020-12-02 00:00:00
abstract:BACKGROUND:Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a nove...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-416
update_date:2008-10-06 00:00:00