Machine learning meets genome assembly.

Abstract:

MOTIVATION:With the recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible for researchers. Several advances have been achieved because of it, especially in the health sciences. However, many challenges which emerge from the complexity of sequencing projects remain unsolved. Among them is the task of assembling DNA fragments from previously unsequenced organisms, which is classified as an NP-hard (nondeterministic polynomial time hard) problem, for which no efficient computational solution with reasonable execution time exists. However, several tools that produce approximate solutions have been used with results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been one of the approaches used in recent years in an attempt to find better solutions to the DNA fragment assembly problem, although still at a low scale. RESULTS:This paper presents a broad review of pioneering literature comprising artificial intelligence-based DNA assemblers-particularly the ones that use machine learning-to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.

journal_name

Brief Bioinform

authors

Padovani de Souza K,Setubal JC,Ponce de Leon F de Carvalho AC,Oliveira G,Chateau A,Alves R

doi

10.1093/bib/bby072

subject

Has Abstract

pub_date

2019-11-27 00:00:00

pages

2116-2129

issue

6

eissn

1467-5463

issn

1477-4054

pii

5074612

journal_volume

20

pub_type

杂志文章,评审
  • A brief history of bioinformatics.

    abstract::It is easy for today's students and researchers to believe that modern bioinformatics emerged recently to assist next-generation sequencing data analysis. However, the very beginnings of bioinformatics occurred more than 50 years ago, when desktop computers were still a hypothesis and DNA could not yet be sequenced. T...

    journal_title:Briefings in bioinformatics

    pub_type: 历史文章,杂志文章,评审

    doi:10.1093/bib/bby063

    authors: Gauthier J,Vincent AT,Charette SJ,Derome N

    更新日期:2019-11-27 00:00:00

  • Dr AFC: drug repositioning through anti-fibrosis characteristic.

    abstract::Fibrosis is a key component in the pathogenic mechanism of a variety of diseases. These diseases involving fibrosis may share common mechanisms and therapeutic targets, and therefore common intervention strategies and medicines may be applicable for these diseases. For this reason, deliberately introducing anti-fibros...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa115

    authors: Wu D,Gao W,Li X,Tian C,Jiao N,Fang S,Xiao J,Xu Z,Zhu L,Zhang G,Zhu R

    更新日期:2020-06-22 00:00:00

  • Comparison of haplotype-based tests for detecting gene-environment interactions with rare variants.

    abstract::Dissecting the genetic mechanism underlying a complex disease hinges on discovering gene-environment interactions (GXE). However, detecting GXE is a challenging problem especially when the genetic variants under study are rare. Haplotype-based tests have several advantages over the so-called collapsing tests for detec...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbz031

    authors: Papachristou C,Biswas S

    更新日期:2020-05-21 00:00:00

  • A bi-Poisson model for clustering gene expression profiles by RNA-seq.

    abstract::With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. We describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene e...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbt029

    authors: Wang N,Wang Y,Hao H,Wang L,Wang Z,Wang J,Wu R

    更新日期:2014-07-01 00:00:00

  • Computational recognition for long non-coding RNA (lncRNA): Software and databases.

    abstract::Since the completion of the Human Genome Project, it has been widely established that most DNA is not transcribed into proteins. These non-protein-coding regions are believed to be moderators within transcriptional and post-transcriptional processes, which play key roles in the onset of diseases. Long non-coding RNAs ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章,评审

    doi:10.1093/bib/bbv114

    authors: Yotsukura S,duVerle D,Hancock T,Natsume-Kitatani Y,Mamitsuka H

    更新日期:2017-01-01 00:00:00

  • A solid quality-control analysis of AB SOLiD short-read sequencing data.

    abstract::Next generation sequencers have greatly improved our ability to mine polymorphisms and mutations out of entire (or portions of) genomes. The reliability of their outputs, though, showed to be very related to the sequencing chemistry and to deeply affect the quality of the downstream analyses. We focus here on the two-...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbs048

    authors: Castellana S,Romani M,Valente EM,Mazza T

    更新日期:2013-11-01 00:00:00

  • Resolving the problem of multiple accessions of the same transcript deposited across various public databases.

    abstract::Maintaining the consistency of genomic annotations is an increasingly complex task because of the iterative and dynamic nature of assembly and annotation, growing numbers of biological databases and insufficient integration of annotations across databases. As information exchange among databases is poor, a 'novel' seq...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbw017

    authors: Weirick T,John D,Uchida S

    更新日期:2017-03-01 00:00:00

  • Conceptual and computational framework for logical modelling of biological networks deregulated in diseases.

    abstract::Mathematical models can serve as a tool to formalize biological knowledge from diverse sources, to investigate biological questions in a formal way, to test experimental hypotheses, to predict the effect of perturbations and to identify underlying mechanisms. We present a pipeline of computational tools that performs ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbx163

    authors: Montagud A,Traynard P,Martignetti L,Bonnet E,Barillot E,Zinovyev A,Calzone L

    更新日期:2019-07-19 00:00:00

  • CyanoPATH: a knowledgebase of genome-scale functional repertoire for toxic cyanobacterial blooms.

    abstract::CyanoPATH is a database that curates and analyzes the common genomic functional repertoire for cyanobacteria harmful algal blooms (CyanoHABs) in eutrophic waters. Based on the literature of empirical studies and genome/protein databases, it summarizes four types of information: common biological functions (pathways) d...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa375

    authors: Du W,Li G,Ho N,Jenkins L,Hockaday D,Tan J,Cao H

    更新日期:2020-12-16 00:00:00

  • Exon array data analysis using Affymetrix power tools and R statistical software.

    abstract::The use of microarray technology to measure gene expression on a genome-wide scale has been well established for more than a decade. Methods to process and analyse the vast quantity of expression data generated by a typical microarray experiment are similarly well-established. The Affymetrix Exon 1.0 ST array is a rel...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbq086

    authors: Lockstone HE

    更新日期:2011-11-01 00:00:00

  • Federating data with Information Integrator.

    abstract::Information Integrator is an extension to IBM's relational database DB2, which uses data federation to provide benefits to molecular biology researchers through two unique capabilities: increased flexibility in combining data from disparate sources, and SQL access to non-SQL data, easing the task of automating data an...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/4.4.375

    authors: Arenson AD

    更新日期:2003-12-01 00:00:00

  • Result verification, code verification and computation of support values in phylogenetics.

    abstract::Verification in phylogenetics represents an extremely difficult subject. Phylogenetic analysis deals with the reconstruction of evolutionary histories of species, and as long as mankind is not able to travel in time, it will not be possible to verify deep evolutionary histories reconstructed with modern computational ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbq079

    authors: Stamatakis A,Izquierdo-Carrasco F

    更新日期:2011-05-01 00:00:00

  • Are dropout imputation methods for scRNA-seq effective for scHi-C data?

    abstract::The prevalence of dropout events is a serious problem for single-cell Hi-C (scHiC) data due to insufficient sequencing depth and data coverage, which brings difficulties in downstream studies such as clustering and structural analysis. Complicating things further is the fact that dropouts are confounded with structura...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa289

    authors: Han C,Xie Q,Lin S

    更新日期:2020-11-17 00:00:00

  • A survey of sequence alignment algorithms for next-generation sequencing.

    abstract::Rapidly evolving sequencing technologies produce data on an unparalleled scale. A central challenge to the analysis of this data is sequence alignment, whereby sequence reads must be compared to a reference. A wide variety of alignment algorithms and software have been subsequently developed over the past two years. I...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章,评审

    doi:10.1093/bib/bbq015

    authors: Li H,Homer N

    更新日期:2010-09-01 00:00:00

  • Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning.

    abstract::Functional annotation of protein sequence with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches of significantly accelerated analysis process and enhanced accuracy are greatly desired. Although a variety of methods have been developed to elevate prote...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbz081

    authors: Hong J,Luo Y,Zhang Y,Ying J,Xue W,Xie T,Tao L,Zhu F

    更新日期:2020-07-15 00:00:00

  • Deep learning for brain disorders: from data processing to disease treatment.

    abstract::In order to reach precision medicine and improve patients' quality of life, machine learning is increasingly used in medicine. Brain disorders are often complex and heterogeneous, and several modalities such as demographic, clinical, imaging, genetics and environmental data have been studied to improve their understan...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa310

    authors: Burgos N,Bottani S,Faouzi J,Thibeau-Sutre E,Colliot O

    更新日期:2020-12-15 00:00:00

  • Structural database resources for biological macromolecules.

    abstract::This Briefing reviews the widely used, currently active, up-to-date databases derived from the worldwide Protein Data Bank (PDB) to facilitate browsing, finding and exploring its entries. These databases contain visualization and analysis tools tailored to specific kinds of molecules and interactions, often including ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbw049

    authors: Abriata LA

    更新日期:2017-07-01 00:00:00

  • Understanding the unimodal distributions of cancer occurrence rates: it takes two factors for a cancer to occur.

    abstract::Data from the SEER reports reveal that the occurrence rate of a cancer type generally follows a unimodal distribution over age, peaking at an age that is cancer-type specific and ranges from 30+ through 70+. Previous studies attribute such bell-shaped distributions to the reduced proliferative potential in senior year...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa349

    authors: Qiu S,An Z,Tan R,He PA,Jing J,Li H,Wu S,Xu Y

    更新日期:2020-12-30 00:00:00

  • Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor.

    abstract::The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge owing to the vast amounts of data and the large variety of preprocessing and filtering steps used before the actual analysis is ca...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbt011

    authors: Rodrigo-Domingo M,Waagepetersen R,Bødker JS,Falgreen S,Kjeldsen MK,Johnsen HE,Dybkær K,Bøgsted M

    更新日期:2014-07-01 00:00:00

  • Molecular dynamics simulations for genetic interpretation in protein coding regions: where we are, where to go and when.

    abstract::The increasing ease with which massive genetic information can be obtained from patients or healthy individuals has stimulated the development of interpretive bioinformatics tools as aids in clinical practice. Most such tools analyze evolutionary information and simple physical-chemical properties to predict whether r...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbz146

    authors: Galano-Frutos JJ,García-Cebollada H,Sancho J

    更新日期:2021-01-18 00:00:00

  • Visualising gene expression in its metabolic context.

    abstract::Relative changes in mRNA as well as protein levels induced by sublethal doses of antibiotics on bacteria are measured and results visualised in the context of metabolic pathway diagrams. The mRNA levels present at a given time point after the addition of the antibiotic are measured using microarrays from Affymetrix. A...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/1.3.297

    authors: Wolf D,Gray CP,de Saizieu A

    更新日期:2000-09-01 00:00:00

  • Methodological aspects of whole-genome bisulfite sequencing analysis.

    abstract::The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation beyond CpG sites and CpG islands. These technologies have opened new avenues to understand the interplay between epigenetic events, chromatin plasticity and gene regulation. ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章,评审

    doi:10.1093/bib/bbu016

    authors: Adusumalli S,Mohd Omar MF,Soong R,Benoukraf T

    更新日期:2015-05-01 00:00:00

  • Elucidating the editome: bioinformatics approaches for RNA editing detection.

    abstract::RNA editing is a widespread co/posttranscriptional mechanism affecting primary RNAs by specific nucleotide modifications, which plays relevant roles in molecular processes including regulation of gene expression and/or the processing of noncoding RNAs. In recent years, the detection of editing sites has been improved ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章,评审

    doi:10.1093/bib/bbx129

    authors: Diroma MA,Ciaccia L,Pesole G,Picardi E

    更新日期:2019-03-22 00:00:00

  • The computational challenges of applying comparative-based computational methods to whole genomes.

    abstract::The explosion in genomic sequence available in public databases has resulted in an unprecedented opportunity for computational whole genome analyses. A number of promising comparative-based approaches have been developed for gene finding, regulatory element discovery and other purposes, and it is clear that these tool...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/3.1.18

    authors: Dubchak I,Pachter L

    更新日期:2002-03-01 00:00:00

  • Vertical integration methods for gene expression data analysis.

    abstract::Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a 'lack of information' problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where verti...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa169

    authors: Wu M,Yi H,Ma S

    更新日期:2020-08-14 00:00:00

  • Capacity building for whole genome sequencing of Mycobacterium tuberculosis and bioinformatics in high TB burden countries.

    abstract:BACKGROUND:Whole genome sequencing (WGS) is increasingly used for Mycobacterium tuberculosis (Mtb) research. Countries with the highest tuberculosis (TB) burden face important challenges to integrate WGS into surveillance and research. METHODS:We assessed the global status of Mtb WGS and developed a 3-week training co...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa246

    authors: Rivière E,Heupink TH,Ismail N,Dippenaar A,Clarke C,Abebe G,Heusden P,Warren R,Meehan CJ,Van Rie A

    更新日期:2020-10-03 00:00:00

  • Proteome-scale analysis of phase-separated proteins in immunofluorescence images.

    abstract::Phase separation is an important mechanism that mediates the spatial distribution of proteins in different cellular compartments. While phase-separated proteins share certain sequence characteristics, including intrinsically disordered regions (IDRs) and prion-like domains, such characteristics are insufficient for ma...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa187

    authors: Yu C,Shen B,You K,Huang Q,Shi M,Wu C,Chen Y,Zhang C,Li T

    更新日期:2020-09-02 00:00:00

  • LARMD: integration of bioinformatic resources to profile ligand-driven protein dynamics with a case on the activation of estrogen receptor.

    abstract::Protein dynamics is central to all biological processes, including signal transduction, cellular regulation and biological catalysis. Among them, in-depth exploration of ligand-driven protein dynamics contributes to an optimal understanding of protein function, which is particularly relevant to drug discovery. Hence, ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbz141

    authors: Yang JF,Wang F,Chen YZ,Hao GF,Yang GF

    更新日期:2020-12-01 00:00:00

  • Optimization of cell lines as tumour models by integrating multi-omics data.

    abstract::Cell lines are widely used as in vitro models of tumorigenesis. However, an increasing number of researchers have found that cell lines differ from their sourced tumour samples after long-term cell culture. The application of unsuitable cell lines in experiments will affect the experimental accuracy and the treatment ...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章,评审

    doi:10.1093/bib/bbw082

    authors: Zhao N,Liu Y,Wei Y,Yan Z,Zhang Q,Wu C,Chang Z,Xu Y

    更新日期:2017-05-01 00:00:00

  • Current development of integrated web servers for preclinical safety and pharmacokinetics assessments in drug development.

    abstract::In drug development, preclinical safety and pharmacokinetics assessments of candidate drugs to ensure the safety profile are a must. While in vivo and in vitro tests are traditionally used, experimental determinations have disadvantages, as they are usually time-consuming and costly. In silico predictions of these pre...

    journal_title:Briefings in bioinformatics

    pub_type: 杂志文章

    doi:10.1093/bib/bbaa160

    authors: Hsiao Y,Su BH,Tseng YJ

    更新日期:2020-08-07 00:00:00