NSAMD: A new approach to discover structured contiguous substrings in sequence datasets using Next-Symbol-Array.

Abstract:

In many sequence data mining applications, the goal is to find frequent substrings. Some of these applications, such as extracting motifs from protein and DNA sequences, look for frequently occurring approximate contiguous substrings called simple motifs. "Approximate" means that some mismatches are allowed when substrings are tested for similarity, which helps to discover unknown patterns. Structured motifs in DNA sequences are frequent structured contiguous substrings that contain two or more simple motifs. Existing methods for finding simple motifs suffer from problems such as low scalability, high execution time, no guarantee of finding all patterns, and low flexibility in adapting to other applications. Flame is the only algorithm that can find all unknown structured patterns in a dataset, and it has solved most of these problems, but its scalability for very large sequences is still weak. In this research, a new approach named Next-Symbol-Array based Motif Discovery (NSAMD) is presented to improve scalability in extracting all unknown simple and structured patterns. To reach this goal, a new data structure called Next-Symbol-Array is introduced. This data structure changes how NSAMD finds patterns in comparison with Flame and helps to find structured motifs faster. The proposed algorithm is as accurate as Flame and extracts all existing patterns in a dataset. Performance comparisons show that NSAMD outperforms Flame in extracting structured motifs in both execution time (51% faster) and memory usage (an improvement of more than 99%). The proposed algorithm is slower in extracting simple motifs, but the considerable improvement in memory usage (more than 99%) makes NSAMD more scalable than Flame. This advantage is very important in biological applications that involve very large sequences.
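The two notions the abstract relies on, approximate occurrences of a simple motif and a structured motif built from two simple motifs, can be illustrated with a brute-force sketch. This is not NSAMD, Flame, or the Next-Symbol-Array, none of which the abstract specifies in detail. It assumes that "mismatches" means substitutions (Hamming distance) and that the parts of a structured motif are separated by a bounded gap. The function names and the `min_gap`/`max_gap` parameters are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_occurrences(seq: str, motif: str, max_mismatches: int) -> list[int]:
    """Start positions where `motif` matches a window of `seq`
    with at most `max_mismatches` substitutions (assumed mismatch model)."""
    m = len(motif)
    return [
        i
        for i in range(len(seq) - m + 1)
        if hamming(seq[i:i + m], motif) <= max_mismatches
    ]


def structured_occurrences(
    seq: str,
    first: str,
    second: str,
    min_gap: int,
    max_gap: int,
    max_mismatches: int,
) -> list[tuple[int, int]]:
    """Pairs (start of `first`, start of `second`) where `second` begins
    between `min_gap` and `max_gap` symbols after the end of `first`.
    The gap constraint is an illustrative assumption."""
    second_starts = set(approx_occurrences(seq, second, max_mismatches))
    pairs = []
    for i in approx_occurrences(seq, first, max_mismatches):
        end = i + len(first)
        for gap in range(min_gap, max_gap + 1):
            if end + gap in second_starts:
                pairs.append((i, end + gap))
    return pairs


if __name__ == "__main__":
    seq = "ACGTACCTAGGT"
    print(approx_occurrences(seq, "ACGT", 1))
    print(structured_occurrences(seq, "ACGT", "AGGT", 2, 6, 0))
```

A scan like this checks one known candidate motif at a time in time proportional to the sequence length times the motif length. The task described in the abstract is harder, because the motifs are unknown and all frequent ones must be enumerated, which is why index structures and scalability matter there.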

journal_name

Comput Biol Chem

authors

Pari A,Baraani A,Parseh S

doi

10.1016/j.compbiolchem.2016.09.001

subject

Has Abstract

pub_date

2016-10-01 00:00:00

pages

384-395

eissn

1476-9271

issn

1476-928X

pii

S1476-9271(15)30073-6

journal_volume

64

pub_type

Journal Article
  • Multilocus consensus genetic maps (MCGM): formulation, algorithms, and results.

    abstract::In process of creating genetic maps different labs/research groups obtain overlapping parts of the map. Merging these parts into one integrative map is based on looking for maximum shared marker orders among the maps. Really, not all shared markers of such maps have consensus order that obstructs building of the integ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2005.09.007

    authors: Mester DI,Ronin YI,Korostishevsky MA,Pikus VL,Glazman AE,Korol AB

    update_date:2006-02-01 00:00:00

  • Structure based pharmacophore study to identify possible natural selective PARP-1 trapper as anti-cancer agent.

    abstract::Inhibition of poly(ADP-ribose) polymerase-1 (PARP-1) has turned out an innovative approach for cancer therapy due to its involvement in DNA repair pathways. Although several potent PARP-1 inhibitors have been identified, they exhibit high toxicity, resistivity and diverse pharmacological profile in clinical trials, wh...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.04.018

    authors: Kumar C,P T V L,Arunachalam A

    update_date:2019-06-01 00:00:00

  • Study of the structure and binding site features of FaEXPA2, an α-expansin protein involved in strawberry fruit softening.

    abstract::Tissue softening accompanies the ripening of many fruits and initiates the processes of irreversible deterioration. Expansins are plant cell wall proteins that have been proposed to disrupt hydrogen bonds within the cell wall polymer matrix. Several authors have shown that FaEXPA2 is a key gene that shows an increased...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2020.107279

    authors: Valenzuela-Riffo F,Morales-Quintana L

    update_date:2020-05-30 00:00:00

  • Genome-wide identification and expression analysis of StTCP transcription factors of potato (Solanum tuberosum L.).

    abstract::The plant-specific TCP transcription factors, which play critical roles in diverse aspects of biological processes, have been identified and analyzed in various plant species. However, no systematical study of TCP family genes in potato (Solanum tuberosum L.) has been undertaken. In this study, a total of 31 non-redun...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.11.009

    authors: Wang Y,Zhang N,Li T,Yang J,Zhu X,Fang C,Li S,Si H

    update_date:2019-02-01 00:00:00

  • Protein complex prediction by date hub removal.

    abstract::Proteins physically interact with each other and form protein complexes to perform their biological functions. The prediction of protein complexes from protein-protein interaction (PPI) network is usually difficult when the complexes are overlapping with each other in a dense region of the network. To address the prob...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.03.012

    authors: Pyrogova I,Wong L

    update_date:2018-06-01 00:00:00

  • The effect of structure on improvement of the PNA Young modulus: A study of steered molecular dynamics.

    abstract::Prefoldin is a molecular chaperone and acts as a nano-actuator in cargo carriage and drug delivery for disease treatment. Investigating the mechanical properties of nano-actuator helps predict its behavior and measure its performance under various environmental conditions, like external forces that are applied. Accord...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.107133

    authors: Ghasemi RH,Keramati M,Mojarrad MHS

    update_date:2019-12-01 00:00:00

  • Potential protein biomarkers for systemic lupus erythematosus determined by bioinformatics analysis.

    abstract::Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disorder, and its pathogenesis in males and in cases without accompanying lupus nephritis (LN-) is not fully understood. In this study, we identified 90 (82 up- and 8 downregulated) differentially expressed genes (DEGs) common to female LN-, female LN+ a...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.107135

    authors: Kong J,Li L,Zhimin L,Yan J,Ji D,Chen Y,Yuanyuan W,Chen X,Shao H,Wang J,Da Z

    update_date:2019-12-01 00:00:00

  • Patterns of cation binding to the aromatic amino acid R groups in Trp, Tyr, and Phe.

    abstract::Previous joint experimental and theoretical work demonstrates that typically soluble peptides will be rendered insoluble in the presence of saturated sodium ions in aqueous solution due to disruption of cation-π interactions between Trp and Lys. The present work utilizes quantum chemical methods including density func...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2017.12.009

    authors: Scherer SL,Stewart AL,Fortenberry RC

    update_date:2018-02-01 00:00:00

  • Comprehensive comparison of two protein family of P-ATPases (13A1 and 13A3) in insects.

    abstract::The P-type ATPases (P-ATPases) are present in all living cells where they mediate ion transport across membranes on the expense of ATP hydrolysis. Different ions which are transported by these pumps are protons like calcium, sodium, potassium, and heavy metals such as manganese, iron, copper, and zinc. Maintenance of ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2017.04.006

    authors: Seddigh S

    update_date:2017-06-01 00:00:00

  • Immunopeptidome screening to design An immunogenic construct against PRAME positive breast cancer; An in silico study.

    abstract:BACKGROUND:Metastasis is the main cause of breast cancer (BC) lethality, especially in early stages, led to improvements in therapeutic procedures. Lately, by improvements in our perception of biological processes and immune system new classes of vaccines are emerged that grant us the opportunity of designing resolute ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2020.107231

    authors: Afzali F,Minuchehr Z,Jahangiri S,Ranjbar MM

    update_date:2020-04-01 00:00:00

  • Physical quantity of residue electrostatic energy in flavin mononucleotide binding protein dimer.

    abstract::The electrostatic (ES) energy of each residue was for the first time quantitatively evaluated in a flavin mononucleotide binding protein (FBP). A residue electrostatic energy (RES) was obtained as the sum of the ES energies between atoms in each residue and all other atoms in the FBP dimer using atomic coordinates obt...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.01.001

    authors: Nunthaboot N,Nueangaudom A,Lugsanangarm K,Pianwanit S,Kokpol S,Tanaka F

    update_date:2018-02-01 00:00:00

  • In silico study of porphyrin-anthraquinone hybrids as CDK2 inhibitor.

    abstract::Cyclin-Dependent Kinases (CDKs) are known to play crucial roles in controlling cell cycle progression of eukaryotic cell and inhibition of their activity has long been considered as potential strategy in anti-cancer drug research. In the present work, a series of porphyrin-anthraquinone hybrids bearing meso-substituen...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2016.12.005

    authors: Arba M,Ihsan S,Ramadhan OA,Tjahjono DH

    update_date:2017-04-01 00:00:00

  • Optimal hybrid sequencing and assembly: Feasibility conditions for accurate genome reconstruction and cost minimization strategy.

    abstract::Recent advances in high-throughput genome sequencing technologies have enabled the systematic study of various genomes by making whole genome sequencing affordable. Modern sequencers generate a huge number of small sequence fragments called reads, where the read length and the per-base sequencing cost depend on the te...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2017.03.016

    authors: Chen CC,Ghaffari N,Qian X,Yoon BJ

    update_date:2017-08-01 00:00:00

  • A benchmark of optimally folded protein structures using integer programming and the 3D-HP-SC model.

    abstract::The Protein Structure Prediction (PSP) problem comprises, among other issues, forecasting the three-dimensional native structure of proteins using only their primary structure information. Most computational studies in this area use synthetic data instead of real biological data. However, the closer to the real-world,...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.107192

    authors: Hattori LT,Gutoski M,Vargas Benítez CM,Nunes LF,Lopes HS

    update_date:2020-02-01 00:00:00

  • Chemical reaction optimization for solving shortest common supersequence problem.

    abstract::Shortest common supersequence (SCS) is a classical NP-hard problem, where a string to be constructed that is the supersequence of a given string set. The SCS problem has an enormous application of data compression, query optimization in the database and different bioinformatics activities. Due to NP-hardness, the exac...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2016.05.004

    authors: Khaled Saifullah CM,Rafiqul Islam M

    update_date:2016-10-01 00:00:00

  • 1,3-Oxazole derivatives of cytisine as potential inhibitors of glutathione reductase of Candida spp.: QSAR modeling, docking analysis and experimental study of new anti-Candida agents.

    abstract::Natural products as well as their derivatives play a significant role in the discovery of new biologically active compounds in the different areas of our life especially in the field of medicine. The synthesis of compounds produced from natural products including cytisine is one approach for the wider use of natural s...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2020.107407

    authors: Metelytsia LO,Trush MM,Kovalishyn VV,Hodyna DM,Kachaeva MV,Brovarets VS,Pilyo SG,Sukhoveev VV,Tsyhankov SA,Blagodatnyi VM,Semenyuta IV

    update_date:2020-11-05 00:00:00

  • Identification of possible siRNA molecules for TDP43 mutants causing amyotrophic lateral sclerosis: In silico design and molecular dynamics study.

    abstract::The DNA binding protein, TDP43 is a major protein involved in amyotrophic lateral sclerosis and other neurological disorders such as frontotemporal dementia, Alzheimer disease, etc. In the present study, we have designed possible siRNAs for the glycine rich region of tardbp mutants causing ALS disorder based on a syst...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2016.01.001

    authors: Bhandare VV,Ramaswamy A

    update_date:2016-04-01 00:00:00

  • On application of directons to functional classification of genes in prokaryotes.

    abstract::Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some of the popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution gene f...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2008.02.007

    authors: Wu H,Mao F,Olman V,Xu Y

    update_date:2008-06-01 00:00:00

  • A class imbalance-aware Relief algorithm for the classification of tumors using microarray gene expression data.

    abstract::DNA microarray data has been widely used in cancer research due to the significant advantage helped to successfully distinguish between tumor classes. However, typical gene expression data usually presents a high-dimensional imbalanced characteristic, which poses severe challenge for traditional machine learning metho...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.03.017

    authors: He Y,Zhou J,Lin Y,Zhu T

    update_date:2019-06-01 00:00:00

  • Borrowing information from relevant microarray studies for sample classification using weighted partial least squares.

    abstract::With an increasing number of publicly available microarray datasets, it becomes attractive to borrow information from other relevant studies to have more reliable and powerful analysis of a given dataset. We do not assume that subjects in the current study and other relevant studies are drawn from the same population ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2005.04.002

    authors: Huang X,Pan W,Han X,Chen Y,Miller LW,Hall J

    update_date:2005-06-01 00:00:00

  • WITHDRAWN: Identification of microRNA precursor based on gapped n-tuple structure status composition kernel.

    abstract::This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy. ...

    journal_title:Computational biology and chemistry

    pub_type: Retracted Publication

    doi:10.1016/j.compbiolchem.2016.02.010

    authors: Liu B,Fang L

    update_date:2016-02-17 00:00:00

  • A local average connectivity-based method for identifying essential proteins from the network level.

    abstract::Identifying essential proteins is very important for understanding the minimal requirements of cellular survival and development. Fast growth in the amount of available protein-protein interactions has produced unprecedented opportunities for detecting protein essentiality from the network level. Essential proteins ha...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2011.04.002

    authors: Li M,Wang J,Chen X,Wang H,Pan Y

    update_date:2011-06-01 00:00:00

  • PK-means: A new algorithm for gene clustering.

    abstract::Microarray technology has been widely applied in study of measuring gene expression levels for thousands of genes simultaneously. Gene cluster analysis is found useful for discovering the function of gene because co-expressed genes are likely to share the same biological function. K-means is one of well-known clusteri...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2008.03.020

    authors: Du Z,Wang Y,Ji Z

    update_date:2008-08-01 00:00:00

  • Drug-target network and polypharmacology studies of a Traditional Chinese Medicine for type II diabetes mellitus.

    abstract::Many Traditional Chinese Medicines (TCMs) are effective to relieve complicated diseases such as type II diabetes mellitus (T2DM). In this work, molecular docking and network analysis were employed to elucidate the action mechanism of a medical composition which had clinical efficacy for T2DM. We found that multiple ac...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2011.07.003

    authors: Gu J,Zhang H,Chen L,Xu S,Yuan G,Xu X

    update_date:2011-10-12 00:00:00

  • Improving the power to detect differentially expressed genes in comparative microarray experiments by including information from self-self hybridizations.

    abstract::Our ability to detect differentially expressed genes in a microarray experiment can be hampered when the number of biological samples of interest is limited. In this situation, we propose the use of information from self-self hybridizations to acuminate our inference of differential expression. A unified modelling str...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2007.03.005

    authors: Gusnanto A,Tom B,Burns P,Macaulay I,Thijssen-Timmer DC,Tijssen MR,Langford C,Watkins N,Ouwehand W,Berzuini C,Dudbridge F

    update_date:2007-06-01 00:00:00

  • Structure-based virtual screening of influenza virus RNA polymerase inhibitors from natural compounds: Molecular dynamics simulation and MM-GBSA calculation.

    abstract::The resistances of matrix protein 2 (M2) protein inhibitors and neuraminidase inhibitors for influenza virus have attracted much attention and there is an urgent need for new drug. The antiviral drugs that selectively act on RNA polymerase are less prone to resistance and possess fewer side effects on the patient. The...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2020.107241

    authors: Jin Z,Wang Y,Yu XF,Tan QQ,Liang SS,Li T,Zhang H,Shaw PC,Wang J,Hu C

    update_date:2020-04-01 00:00:00

  • Automated prediction of three-way junction topological families in RNA secondary structures.

    abstract::We present an algorithm for automatically predicting the topological family of any RNA three-way junction, given only the information from the secondary structure: the sequence and the Watson-Crick pairings. The parameters of the algorithm have been determined on a data set of 33 three-way junctions whose 3D conformat...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2011.11.001

    authors: Lamiable A,Barth D,Denise A,Quessette F,Vial S,Westhof E

    update_date:2012-04-01 00:00:00

  • Spontaneous formation of annular structures observed in molecular dynamics simulations of polyglutamine peptides.

    abstract::Annular structures have been observed experimentally in aggregates of polyglutamine-containing proteins and other proteins associated with diseases of the brain. Here we report the observation of annular structures in molecular-level simulations of large systems of model polyglutamine peptides. A system of 24 polyglut...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2006.01.003

    authors: Marchut AJ,Hall CK

    update_date:2006-06-01 00:00:00

  • The aspartate aminotransferase-like domain of Firmicutes MocR transcriptional regulators.

    abstract::Bacterial MocR transcriptional regulators possess an N-terminal DNA-binding domain containing a conserved helix-turn-helix module and an effector-binding and/or oligomerization domain at the C-terminus, homologous to fold type-I pyridoxal 5'-phosphate (PLP) enzymes. Since a comprehensive structural analysis of the Moc...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2015.05.003

    authors: Milano T,Contestabile R,Lo Presti A,Ciccozzi M,Pascarella S

    update_date:2015-10-01 00:00:00

  • In silico identification of novel IL-1β inhibitors to target protein-protein interfaces.

    abstract::Interleukin-1β is a drug target in rheumatoid arthritis and several auto-immune disorders. In this study, a set of 48 compounds with the determined IC50 values were used for QSAR analysis by MOE. The QSAR model was developed by using training set of 41 compounds, based on 12 unique descriptors. Model was validated by ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2015.06.004

    authors: Halim SA,Jawad M,Ilyas M,Mir Z,Mirza AA,Husnain T

    update_date:2015-10-01 00:00:00