Optimal hybrid sequencing and assembly: Feasibility conditions for accurate genome reconstruction and cost minimization strategy.

Abstract:

Recent advances in high-throughput genome sequencing technologies have enabled the systematic study of various genomes by making whole genome sequencing affordable. Modern sequencers generate a huge number of small sequence fragments called reads, where the read length and the per-base sequencing cost depend on the technology used. To date, many hybrid genome assembly algorithms have been developed that can take reads from multiple read sources to reconstruct the original genome. However, rigorous investigation of the feasibility conditions for complete genome reconstruction and the optimal sequencing strategy for minimizing the sequencing cost has been conspicuously missing. An important aspect of hybrid sequencing and assembly is that the feasibility conditions for genome reconstruction can be satisfied by different combinations of the available read sources, opening up the possibility of optimally combining the sources to minimize the sequencing cost while ensuring accurate genome reconstruction. In this paper, we derive the conditions for whole genome reconstruction from multiple read sources at a given confidence level and also introduce the optimal strategy for combining reads from different sources to minimize the overall sequencing cost. We show that the optimal read set, which simultaneously satisfies the feasibility conditions for genome reconstruction and minimizes the sequencing cost, can be effectively predicted through constrained discrete optimization. Through extensive evaluations based on several genomes and different read sets, we verify the derived feasibility conditions and demonstrate the performance of the proposed optimal hybrid sequencing and assembly strategy.
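
As a rough illustration of the constrained discrete optimization idea mentioned in the abstract, the following Python sketch searches over integer read counts from two hypothetical read sources and returns the cheapest mix that satisfies a pair of stand-in feasibility constraints. The constraints used here (a minimum total coverage depth and a minimum long-read coverage depth), the `ReadSource` fields, the batch sizes, and all numbers are assumptions made for illustration only; they are not the feasibility conditions derived in the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ReadSource:
    """Hypothetical description of one sequencing technology."""
    name: str
    read_len: int         # bases per read
    cost_per_base: float  # arbitrary cost units
    batch: int            # reads are purchased in multiples of this
    is_long: bool         # counts toward the long-read constraint


def cheapest_read_mix(genome_len, sources, min_total_cov, min_long_cov, max_batches):
    """Exhaustively search batch counts per source (0..max_batches).

    Returns (cost, {source name: read count}) for the cheapest mix that
    meets both assumed coverage constraints, or None if no mix is feasible.
    """
    best = None
    for batches in product(range(max_batches + 1), repeat=len(sources)):
        counts = [b * s.batch for b, s in zip(batches, sources)]
        bases = [n * s.read_len for n, s in zip(counts, sources)]
        total_cov = sum(bases) / genome_len
        long_cov = sum(b for b, s in zip(bases, sources) if s.is_long) / genome_len
        # Assumed stand-in feasibility conditions (not the paper's).
        if total_cov < min_total_cov or long_cov < min_long_cov:
            continue
        cost = sum(b * s.cost_per_base for b, s in zip(bases, sources))
        if best is None or cost < best[0]:
            best = (cost, {s.name: n for s, n in zip(sources, counts)})
    return best


if __name__ == "__main__":
    example_sources = [
        ReadSource("short", 100, 1.0, 10_000, False),
        ReadSource("long", 10_000, 5.0, 100, True),
    ]
    print(cheapest_read_mix(1_000_000, example_sources, 30, 5, 40))
```

Exhaustive search is used only because the toy search space is tiny; for realistic read counts an integer programming solver would be the more natural choice.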

journal_name

Comput Biol Chem

authors

Chen CC,Ghaffari N,Qian X,Yoon BJ

doi

10.1016/j.compbiolchem.2017.03.016

subject

Has Abstract

pub_date

2017-08-01 00:00:00

pages

153-163

eissn

1476-928X

issn

1476-9271

pii

S1476-9271(17)30199-8

journal_volume

69

pub_type

Journal Article
  • Simulating the Monty Hall problem in a DNA sequencing machine.

    abstract::The Monty Hall problem is a decision problem with an answer that is surprisingly counter-intuitive yet provably correct. Here we simulate and prove this decision in a high-throughput DNA sequencing machine, using a simple encoding. All possible scenarios are represented by DNA oligonucleotides, and gameplay decisions ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.107122

    authors: Mamet N,Harari G,Zamir A,Bachelet I

    update_date:2019-12-01 00:00:00

  • Prioritization of potential drug targets against P. aeruginosa by core proteomic analysis using computational subtractive genomics and Protein-Protein interaction network.

    abstract::Pseudomonas aeruginosa is an opportunistic gram-negative bacterium that has the capability to acquire resistance under hostile conditions and become a threat worldwide. It is involved in nosocomial infections. In the current study, potential novel drug targets against P. aeruginosa have been identified using core prot...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.02.017

    authors: Uddin R,Jamil F

    update_date:2018-06-01 00:00:00

  • WITHDRAWN: Identification of microRNA precursor based on gapped n-tuple structure status composition kernel.

    abstract::This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy. ...

    journal_title:Computational biology and chemistry

    pub_type: Retracted Publication

    doi:10.1016/j.compbiolchem.2016.02.010

    authors: Liu B,Fang L

    update_date:2016-02-17 00:00:00

  • PK-means: A new algorithm for gene clustering.

    abstract::Microarray technology has been widely applied in study of measuring gene expression levels for thousands of genes simultaneously. Gene cluster analysis is found useful for discovering the function of gene because co-expressed genes are likely to share the same biological function. K-means is one of well-known clusteri...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2008.03.020

    authors: Du Z,Wang Y,Ji Z

    update_date:2008-08-01 00:00:00

  • QSAR study of pyrazolo[4,3-e][1,2,4]triazine sulfonamides against tumor-associated human carbonic anhydrase isoforms IX and XII.

    abstract::The QSAR models for a set of pyrazolo[4,3-e][1,2,4]triazines incorporating benzenesulfonamide moiety combined directly with the heterocyclic ring or by NH linkage were generated. The inhibitory potency of compounds against human carbonic anhydrase isoforms IX and XII and antiproliferative activity against human MCF-7 ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2017.09.006

    authors: Matysiak J,Skrzypek A,Tarasiuk P,Mojzych M

    update_date:2017-12-01 00:00:00

  • Synthesis, monoamine oxidase inhibitory activity and computational study of novel isoxazole derivatives as potential antiparkinson agents.

    abstract::Monoamine oxidase (MAO) enzymes are one of the most promising targets for the treatment of neurological disorders. A series of phenylisoxazole carbohydrazides was designed, synthesized and screened for both MAO-A and MAO-B inhibition using Amplex Red assays. None of the compounds inhibited the MAO-A activity while mos...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.01.012

    authors: Agrawal N,Mishra P

    update_date:2019-04-01 00:00:00

  • Proteome-wide classification and identification of mammalian-type GPCRs by binary topology pattern.

    abstract::G protein-coupled receptors (GPCRs), a large eukaryotic protein family, have proved difficult to comprehensively detect and functionally identify by homology searches and domain detection, because they are highly divergent and their sequences share strikingly little similarity. Transmembrane (TM) topology pattern anal...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2003.11.003

    authors: Inoue Y,Ikeda M,Shimizu T

    update_date:2004-02-01 00:00:00

  • Identification of evolutionarily conserved Momordica charantia microRNAs using computational approach and its utility in phylogeny analysis.

    abstract::Momordica charantia (bitter gourd, bitter melon) is a monoecious Cucurbitaceae with anti-oxidant, anti-microbial, anti-viral and anti-diabetic potential. Molecular studies on this economically valuable plant are very essential to understand its phylogeny and evolution. MicroRNAs (miRNAs) are conserved, small, non-codi...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2015.04.011

    authors: Thirugnanasambantham K,Saravanan S,Karikalan K,Bharanidharan R,Lalitha P,Ilango S,HairulIslam VI

    update_date:2015-10-01 00:00:00

  • Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature.

    abstract::Bio-entity name recognition is the key step for information extraction from biomedical literature. This paper presents a dictionary-based bio-entity name recognition approach. The approach expands the bio-entity name dictionary via the Abbreviation Definitions identifying algorithm, improves the recall rate through th...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2008.03.008

    authors: Yang Z,Lin H,Li Y

    update_date:2008-08-01 00:00:00

  • Predicting human intestinal absorption of diverse chemicals using ensemble learning based QSAR modeling approaches.

    abstract::Human intestinal absorption (HIA) of the drugs administered through the oral route constitutes an important criterion for the candidate molecules. The computational approach for predicting the HIA of molecules may potentiate the screening of new drugs. In this study, ensemble learning (EL) based qualitative and quanti...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2016.01.005

    authors: Basant N,Gupta S,Singh KP

    update_date:2016-04-01 00:00:00

  • A vibrational entropy term for DNA docking with autodock.

    abstract::DNA interacts with small molecules, from water to endogenous reactive oxygen and nitrogen species, environmental mutagens and carcinogens, and pharmaceutical anticancer molecules. Understanding and predicting the physical interactions of small molecules with DNA via docking is key not only for the comprehension of mol...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.03.027

    authors: McElfresh GW,Deligkaris C

    update_date:2018-06-01 00:00:00

  • ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures.

    abstract::The occurrence of similar structural repeats in a protein structure has evolved through gene duplication. These repeats act as a structural building block and form more than one compact structural and functional unit called a repeat domain. The protein families comprising similar structural repeats are mainly involved...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2010.03.006

    authors: Sabarinathan R,Basu R,Sekar K

    update_date:2010-04-01 00:00:00

  • Aconitum and Delphinium sp. alkaloids as antagonist modulators of voltage-gated Na+ channels. AM1/DFT electronic structure investigations and QSAR studies.

    abstract::Early pharmacological studies of Aconitum and Delphinium sp. alkaloids suggested that these neurotoxins act at site 2 of voltage-gated Na(+) channel and allosterically modulate its function. Understanding structural requirements for these compounds to exhibit binding activity at voltage-gated Na(+) channel has been im...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2007.10.003

    authors: Turabekova MA,Rasulev BF,Levkovich MG,Abdullaev ND,Leszczynski J

    update_date:2008-04-01 00:00:00

  • Interaction of small molecules with the SARS-CoV-2 main protease in silico and in vitro validation of potential lead compounds using an enzyme-linked immunosorbent assay.

    abstract::Caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the COVID-19 pandemic is ongoing, with no proven safe and effective vaccine to date. Further, effective therapeutic agents for COVID-19 are limited, and as a result, the identification of potential small molecule antiviral drugs is of particul...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2020.107408

    authors: Pitsillou E,Liang J,Karagiannis C,Ververis K,Darmawan KK,Ng K,Hung A,Karagiannis TC

    update_date:2020-12-01 00:00:00

  • Ambush hypothesis revisited: Evidences for phylogenetic trends.

    abstract::Recoding events occur in competition with standard readout of the transcript, and are site-specific. Recoding is the reprogramming of mRNA translation by localized alterations in the standard translational rules. Frame-shifting is one class of recoding and defined as protein translations that start not at the first, b...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2009.04.002

    authors: Singh TR,Pardasani KR

    update_date:2009-06-01 00:00:00

  • Functional and structural insights into novel DREB1A transcription factors in common wheat (Triticum aestivum L.): A molecular modeling approach.

    abstract::Triticum aestivum L. known as common wheat is one of the most important cereal crops feeding a large and growing population. Various environmental stress factors including drought, high salinity and heat etc. adversely affect wheat production in a significant manner. Dehydration-responsive element-binding (DREB1A) fac...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2016.07.008

    authors: Kumar A,Kumar S,Kumar U,Suravajhala P,Gajula MN

    update_date:2016-10-01 00:00:00

  • Protein function prediction using neighbor relativity in protein-protein interaction network.

    abstract::There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some researches utilize the proteins interact...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2012.12.003

    authors: Moosavi S,Rahgozar M,Rahimi A

    update_date:2013-04-01 00:00:00

  • Drug-target network and polypharmacology studies of a Traditional Chinese Medicine for type II diabetes mellitus.

    abstract::Many Traditional Chinese Medicines (TCMs) are effective to relieve complicated diseases such as type II diabetes mellitus (T2DM). In this work, molecular docking and network analysis were employed to elucidate the action mechanism of a medical composition which had clinical efficacy for T2DM. We found that multiple ac...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2011.07.003

    authors: Gu J,Zhang H,Chen L,Xu S,Yuan G,Xu X

    update_date:2011-10-12 00:00:00

  • Chemical reaction optimization for solving shortest common supersequence problem.

    abstract::Shortest common supersequence (SCS) is a classical NP-hard problem, where a string to be constructed that is the supersequence of a given string set. The SCS problem has an enormous application of data compression, query optimization in the database and different bioinformatics activities. Due to NP-hardness, the exac...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2016.05.004

    authors: Khaled Saifullah CM,Rafiqul Islam M

    update_date:2016-10-01 00:00:00

  • A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species.

    abstract::Protein function prediction is a crucial task in the post-genomics era due to their diverse irreplaceable roles in a biological system. Traditional methods involved cost-intensive and time-consuming molecular biology techniques but they proved to be ineffective after the outburst of sequencing data through the advent ...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.107147

    authors: Mishra S,Rastogi YP,Jabin S,Kaur P,Amir M,Khatun S

    update_date:2019-12-01 00:00:00

  • AROHap: An effective algorithm for single individual haplotype reconstruction based on asexual reproduction optimization.

    abstract::In this paper, a method for single individual haplotype (SIH) reconstruction using Asexual reproduction optimization (ARO) is proposed. Haplotypes, as a set of genetic variations in each chromosome, contain vital information such as the relationship between human genome and diseases. Finding haplotypes in diploid orga...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2017.12.005

    authors: Olyaee MH,Khanteymoori A

    update_date:2018-02-01 00:00:00

  • Predicting microRNA biological functions based on genes discriminant analysis.

    abstract::Although thousands of microRNAs (miRNAs) have been identified in recent experimental efforts, it remains a challenge to explore their specific biological functions through molecular biological experiments. Since those members from same family share same or similar biological functions, classifying new miRNAs into thei...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2017.09.008

    authors: Ding T,Xu J,Sun M,Zhu S,Gao J

    update_date:2017-12-01 00:00:00

  • Temperature effect on the structure and conformational fluctuations in two zinc knuckles from the mouse mammary tumor virus.

    abstract::Zinc fingers are small protein domains in which zinc plays a structural role, contributing to the stability of the zinc-peptide complex. Zinc fingers are structurally diverse and are present in proteins that perform a broad range of functions in various cellular processes, such as replication and repair, transcription...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.03.005

    authors: Nedjoua D,Krallafa AM

    update_date:2018-06-01 00:00:00

  • Genome re-seqeunce and analysis of Burkholderia glumae strain AU6208 and evidence of toxoflavin: A potential bacterial toxin.

    abstract::Burkholderia glumae, the primary causative agent of bacterial panicle blight in rice, has been reported as an opportunistic pathogen in patients with chronic infections. This study aimed to re-sequence the clinical isolate B. glumae strain AU6208 and comparatively analyze its genome using B. glumae strain BGR1 from ri...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2020.107245

    authors: Hussain A,Shahbaz M,Tariq M,Ibrahim M,Hong X,Naeem F,Khalid Z,Raza HMZ,Bo Z,Bin L

    update_date:2020-06-01 00:00:00

  • Markovian encoding models in human splice site recognition using SVM.

    abstract::Splice site recognition is among the most significant and challenging tasks in bioinformatics due to its key role in gene annotation. Effective prediction of splice site requires nucleotide encoding methods that reveal the characteristics of DNA sequences to provide appropriate features to serve as input of machine le...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.02.005

    authors: Pashaei E,Aydin N

    update_date:2018-04-01 00:00:00

  • Potential protein biomarkers for systemic lupus erythematosus determined by bioinformatics analysis.

    abstract::Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disorder, and its pathogenesis in males and in cases without accompanying lupus nephritis (LN-) is not fully understood. In this study, we identified 90 (82 up- and 8 downregulated) differentially expressed genes (DEGs) common to female LN-, female LN+ a...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2019.107135

    authors: Kong J,Li L,Zhimin L,Yan J,Ji D,Chen Y,Yuanyuan W,Chen X,Shao H,Wang J,Da Z

    update_date:2019-12-01 00:00:00

  • Predicting interspecies transmission of avian influenza virus based on wavelet packet decomposition.

    abstract::Using wavelet packet decomposition, the energy coefficients in the fifth level of viral protein sequences were achieved to predict interspecies transmission. Since avian-origin influenza viruses could have high sequence similarities with human-origin avian influenza virus and could have the phenotype of interspecies t...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.11.029

    authors: Qiang X,Kou Z

    update_date:2019-02-01 00:00:00

  • FWAVina: A novel optimization algorithm for protein-ligand docking based on the fireworks algorithm.

    abstract::Protein-ligand docking is an essential process that has accelerated drug discovery. How to accurately and effectively optimize the predominant position and orientation of ligands in the binding pocket of a target protein is a major challenge. This paper proposed a novel ligand binding pose search method called FWAVina...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2020.107363

    authors: Li J,Song Y,Li F,Zhang H,Liu W

    update_date:2020-10-01 00:00:00

  • C3: An R package for cross-species compendium-based cell-type identification.

    abstract::Cell type identification from an unknown sample can often be done by comparing its gene expression profile against a gene expression database containing profiles of a large number of cell-types. This type of compendium-based cell-type identification strategy is particularly successful for human and mouse samples becau...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2018.10.003

    authors: Kabir MH,Djordjevic D,O'Connor MD,Ho JWK

    update_date:2018-12-01 00:00:00

  • Mutually exclusive binding of APPL(PH) to BAR domain and Reptin regulates β-catenin dependent transcriptional events.

    abstract::Reptin functions in a wide range of biological processes including chromatin remodelling, nucleolar organization and transcriptional regulation of WNT signalling. As β-catenin dependent transcriptional repression and activation events involve binding of Reptin and histone deacetylase 1 to APPL endocytic proteins, this...

    journal_title:Computational biology and chemistry

    pub_type: Journal Article

    doi:10.1016/j.compbiolchem.2013.05.005

    authors: Rashid S,Parveen Z,Ferdous S,Bibi N

    update_date:2013-12-01 00:00:00