Identification of properties important to protein aggregation using feature selection.

Abstract:

BACKGROUND:Protein aggregation is a significant problem in the biopharmaceutical industry (protein drug stability) and is associated medically with over 40 human diseases. Although a number of computational models have been developed for predicting aggregation propensity and identifying aggregation-prone regions in proteins, little systematic research has been done to determine physicochemical properties relevant to aggregation and their relative importance to this important process. Such studies may result in not only accurately predicting peptide aggregation propensities and identifying aggregation prone regions in proteins, but also aid in discovering additional underlying mechanisms governing this process. RESULTS:We use two feature selection algorithms to identify 16 features, out of a total of 560 physicochemical properties, presumably important to protein aggregation. Two predictors (ProA-SVM and ProA-RF) using selected features are built for predicting peptide aggregation propensity and identifying aggregation prone regions in proteins. Both methods are compared favourably to other state-of-the-art algorithms in cross validation. The identified important properties are fairly consistent with previous studies and bring some new insights into protein and peptide aggregation. One interesting new finding is that aggregation prone peptide sequences have similar properties to signal peptide and signal anchor sequences. CONCLUSIONS:Both predictors are implemented in a freely available web application (http://www.abl.ku.edu/ProA/). We suggest that the quaternary structure of protein aggregates, especially soluble oligomers, may allow the formation of new molecular recognition signals that guide aggregate targeting to specific cellular sites.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Fang Y,Gao S,Tai D,Middaugh CR,Fang J

doi

10.1186/1471-2105-14-314

subject

Has Abstract

pub_date

2013-10-28 00:00:00

pages

314

issn

1471-2105

pii

1471-2105-14-314

journal_volume

14

pub_type

杂志文章
  • phyloXML: XML for evolutionary biology and comparative genomics.

    abstract:BACKGROUND:Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree b...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-356

    authors: Han MV,Zmasek CM

    更新日期:2009-10-27 00:00:00

  • Bios2mds: an R package for comparing orthologous protein families by metric multidimensional scaling.

    abstract:BACKGROUND:The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low dimensional space by metric multidimensional scaling (MDS). Applied to p...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-133

    authors: Pelé J,Bécu JM,Abdi H,Chabbert M

    更新日期:2012-06-15 00:00:00

  • Considering scores between unrelated proteins in the search database improves profile comparison.

    abstract:BACKGROUND:Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of statistical significance of detected profile similar...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-399

    authors: Sadreyev RI,Wang Y,Grishin NV

    更新日期:2009-12-04 00:00:00

  • Is searching full text more effective than searching abstracts?

    abstract:BACKGROUND:With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the followin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-46

    authors: Lin J

    更新日期:2009-02-03 00:00:00

  • Combining calls from multiple somatic mutation-callers.

    abstract:BACKGROUND:Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Nonetheless, mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperfor...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-154

    authors: Kim SY,Jacob L,Speed TP

    更新日期:2014-05-21 00:00:00

  • antaRNA--Multi-objective inverse folding of pseudoknot RNA using ant-colony optimization.

    abstract:BACKGROUND:Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Currently the design of RNA molecules, which fold into a specific structure (known as RNA inverse folding) within biotechnological applications, is lacking the feature of incor...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0815-6

    authors: Kleinkauf R,Houwaart T,Backofen R,Mann M

    更新日期:2015-11-18 00:00:00

  • Mapping transcription mechanisms from multimodal genomic data.

    abstract:BACKGROUND:Identification of expression quantitative trait loci (eQTLs) is an emerging area in genomic study. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, raising a new computational challenge due to the tremendous size of data. RESULTS:We ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-S9-S2

    authors: Chang HH,McGeachie M,Alterovitz G,Ramoni MF

    更新日期:2010-10-28 00:00:00

  • SPECS: a non-parametric method to identify tissue-specific molecular features for unbalanced sample groups.

    abstract:BACKGROUND:To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-3407-z

    authors: Everaert C,Volders PJ,Morlion A,Thas O,Mestdagh P

    更新日期:2020-02-17 00:00:00

  • VITCOMIC: visualization tool for taxonomic compositions of microbial communities based on 16S rRNA gene sequences.

    abstract:BACKGROUND:Understanding the community structure of microbes is typically accomplished by sequencing 16S ribosomal RNA (16S rRNA) genes. These community data can be represented by constructing a phylogenetic tree and comparing it with other samples using statistical methods. However, owing to high computational complex...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-332

    authors: Mori H,Maruyama F,Kurokawa K

    更新日期:2010-06-18 00:00:00

  • A note on generalized Genome Scan Meta-Analysis statistics.

    abstract:BACKGROUND:Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed dif...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-32

    authors: Koziol JA,Feng AC

    更新日期:2005-02-17 00:00:00

  • Multi-omic analysis of signalling factors in inflammatory comorbidities.

    abstract:BACKGROUND:Inflammation is a core element of many different, systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and relate...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2413-x

    authors: Xiao H,Bartoszek K,Lio' P

    更新日期:2018-11-30 00:00:00

  • A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection.

    abstract:BACKGROUND:The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire geno...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2077-6

    authors: Wang W,Sun W,Wang W,Szatkiewicz J

    更新日期:2018-03-01 00:00:00

  • Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie.

    abstract:BACKGROUND:Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often not possible to obtain a realistic "gold standard" for systematic evaluation. Here we apply a technique that originates from the softw...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-S16-S15

    authors: Giannoulatou E,Park SH,Humphreys DT,Ho JW

    更新日期:2014-01-01 00:00:00

  • Predicting anatomic therapeutic chemical classification codes using tiered learning.

    abstract:BACKGROUND:The low success rate and high cost of drug discovery requires the development of new paradigms to identify molecules of therapeutic value. The Anatomical Therapeutic Chemical (ATC) Code System is a World Health Organization (WHO) proposed classification that assigns multi-level codes to compounds based on th...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1660-6

    authors: Olson T,Singh R

    更新日期:2017-06-07 00:00:00

  • Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data.

    abstract:BACKGROUND:The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has not been attained in almost any disease area. Here, w...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-3427-8

    authors: Smith AM,Walsh JR,Long J,Davis CB,Henstock P,Hodge MR,Maciejewski M,Mu XJ,Ra S,Zhao S,Ziemek D,Fisher CK

    更新日期:2020-03-20 00:00:00

  • A new method for 2D gel spot alignment: application to the analysis of large sample sets in clinical proteomics.

    abstract:BACKGROUND:In current comparative proteomics studies, the large number of images generated by 2D gels is currently compared using spot matching algorithms. Unfortunately, differences in gel migration and sample variability make efficient spot alignment very difficult to obtain, and, as consequence most of the software ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-460

    authors: Pérès S,Molina L,Salvetat N,Granier C,Molina F

    更新日期:2008-10-28 00:00:00

  • Protein local 3D structure prediction by Super Granule Support Vector Machines (Super GSVM).

    abstract:BACKGROUND:Understanding the relationship between the protein sequence and the 3D structure is a major research area in bioinformatics. The prediction of complete protein tertiary structure based only on sequence information is still an impractical work. This paper aims at revealing the hidden knowledge of the sequence...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S11-S15

    authors: Chen B,Johnson M

    更新日期:2009-10-08 00:00:00

  • A comparative study of conservation and variation scores.

    abstract:BACKGROUND:Conservation and variation scores are used when evaluating sites in a multiple sequence alignment, in order to identify residues critical for structure or function. A variety of scores are available today but it is not clear how different scores relate to each other. RESULTS:We applied 25 conservation and v...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-388

    authors: Johansson F,Toh H

    更新日期:2010-07-21 00:00:00

  • Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data.

    abstract::Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S6-S11

    authors: Kakaradov B,Xiong HY,Lee LJ,Jojic N,Frey BJ

    更新日期:2012-04-19 00:00:00

  • XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data.

    abstract:BACKGROUND:Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2353-5

    authors: Kluin RJC,Kemper K,Kuilman T,de Ruiter JR,Iyer V,Forment JV,Cornelissen-Steijger P,de Rink I,Ter Brugge P,Song JY,Klarenbeek S,McDermott U,Jonkers J,Velds A,Adams DJ,Peeper DS,Krijgsman O

    更新日期:2018-10-04 00:00:00

  • JNets: exploring networks by integrating annotation.

    abstract:BACKGROUND:A common method for presenting and studying biological interaction networks is visualization. Software tools can enhance our ability to explore network visualizations and improve our understanding of biological systems, particularly when these tools offer analysis capabilities. However, most published networ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-95

    authors: Macpherson JI,Pinney JW,Robertson DL

    更新日期:2009-03-26 00:00:00

  • A theorem proving approach for automatically synthesizing visualizations of flow cytometry data.

    abstract:BACKGROUND:Polychromatic flow cytometry is a popular technique that has wide usage in the medical sciences, especially for studying phenotypic properties of cells. The high-dimensionality of data generated by flow cytometry usually makes it difficult to visualize. The naive solution of simply plotting two-dimensional g...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1662-4

    authors: Raj S,Hussain F,Husein Z,Torosdagli N,Turgut D,Deo N,Pattanaik S,Chang CJ,Jha SK

    更新日期:2017-06-07 00:00:00

  • Blazing Signature Filter: a library for fast pairwise similarity comparisons.

    abstract:BACKGROUND:Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2210-6

    authors: Lee JY,Fujimoto GM,Wilson R,Wiley HS,Payne SH

    更新日期:2018-06-11 00:00:00

  • Francisella tularensis novicida proteomic and transcriptomic data integration and annotation based on semantic web technologies.

    abstract:BACKGROUND:This paper summarises the lessons and experiences gained from a case study of the application of semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, us...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S10-S3

    authors: Anwar N,Hunt E

    更新日期:2009-10-01 00:00:00

  • Preliminary nanopore cheminformatics analysis of aptamer-target binding strength.

    abstract:BACKGROUND:Aptamers are nucleic acids selected for their ability to bind to molecules of interest and may provide the basis for a whole new class of medicines. If the aptamer is simply a dsDNA molecule with a ssDNA overhang (a "sticky" end) then the segment of ssDNA that complements that overhang provides a known bindi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-S7-S11

    authors: Thomson K,Amin I,Morales E,Winters-Hilt S

    更新日期:2007-11-01 00:00:00

  • Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.

    abstract:BACKGROUND:The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. RESULTS:...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2103-8

    authors: Müller HM,Van Auken KM,Li Y,Sternberg PW

    更新日期:2018-03-09 00:00:00

  • Conservation of regulatory elements between two species of Drosophila.

    abstract:BACKGROUND:One of the important goals in the post-genomic era is to determine the regulatory elements within the non-coding DNA of a given organism's genome. The identification of functional cis-regulatory modules has proven difficult since the component factor binding sites are small and the rules governing their arra...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-4-57

    authors: Emberly E,Rajewsky N,Siggia ED

    更新日期:2003-11-20 00:00:00

  • NOXclass: prediction of protein-protein interaction types.

    abstract:BACKGROUND:Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investiga...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-27

    authors: Zhu H,Domingues FS,Sommer I,Lengauer T

    更新日期:2006-01-19 00:00:00

  • Effective automated pipeline for 3D reconstruction of synapses based on deep learning.

    abstract:BACKGROUND:The locations and shapes of synapses are important in reconstructing connectomes and analyzing synaptic plasticity. However, current synapse detection and segmentation methods are still not adequate for accurately acquiring the synaptic connectivity, and they cannot effectively alleviate the burden of synaps...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2232-0

    authors: Xiao C,Li W,Deng H,Chen X,Yang Y,Xie Q,Han H

    更新日期:2018-07-13 00:00:00

  • Texture based skin lesion abruptness quantification to detect malignancy.

    abstract:BACKGROUND:Abruptness of pigment patterns at the periphery of a skin lesion is one of the most important dermoscopic features for detection of malignancy. In current clinical setting, abrupt cutoff of a skin lesion determined by an examination of a dermatologist. This process is subjective, nonquantitative, and error-p...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1892-5

    authors: Erol R,Bayraktar M,Kockara S,Kaya S,Halic T

    更新日期:2017-12-28 00:00:00