Predictability of drug-induced liver injury by machine learning.

Abstract:

BACKGROUND: Drug-induced liver injury (DILI) is a major concern in drug development, as hepatotoxicity may not be apparent at early stages but can lead to life-threatening consequences. The ability to predict DILI from in vitro data would be a crucial advantage. In 2018, the Critical Assessment of Massive Data Analysis group proposed the CMap Drug Safety challenge focusing on DILI prediction.

METHODS AND RESULTS: The challenge data included Affymetrix GeneChip expression profiles for the two cancer cell lines MCF7 and PC3 treated with 276 drug compounds and empty vehicles. Binary DILI labeling and a recommended train/test split for the development of predictive classification approaches were also provided. We devised three deep learning architectures for DILI prediction on the challenge data and compared them to random forest and multi-layer perceptron classifiers. On a subset of the data and for some of the models, we additionally tested several strategies for balancing the two DILI classes and for identifying alternative informative train/test splits. All the models were trained with the MAQC data analysis protocol (DAP), i.e., 10x5 cross-validation over the training set. In all the experiments, the classification performance in both cross-validation and external validation gave Matthews correlation coefficient (MCC) values below 0.2. We observed minimal differences between the two cell lines. Notably, deep learning approaches did not improve classification performance.

DISCUSSION: We extensively tested multiple machine learning approaches for the DILI classification task, obtaining poor to mediocre performance. The results suggest that the CMap expression data on the two cell lines MCF7 and PC3 are not sufficient for accurate DILI label prediction.

REVIEWERS: This article was reviewed by Maciej Kandula and Paweł P. Labaj.
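The evaluation scheme named in the abstract can be illustrated with a short sketch. It is not the authors' pipeline: "10x5 cross-validation" is read here as 10 repetitions of a stratified 5-fold split, the MCC is computed from its standard confusion-matrix definition, and the function names (`mcc`, `repeated_stratified_folds`) and the toy labels are hypothetical, chosen only for this example.

```python
import math
import random
from collections import defaultdict


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def repeated_stratified_folds(labels, n_repeats=10, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for n_repeats x n_folds stratified CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        # Deal each class round-robin so every fold keeps the class ratio.
        for idxs in by_class.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)
            for pos, i in enumerate(shuffled):
                folds[pos % n_folds].append(i)
        for k in range(n_folds):
            test = sorted(folds[k])
            train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield train, test


if __name__ == "__main__":
    # Toy, made-up labels: 30 positives and 70 negatives (not the challenge data).
    labels = [1] * 30 + [0] * 70
    guess_rng = random.Random(1)
    scores = []
    for train, test in repeated_stratified_folds(labels):
        # Placeholder "model": random guessing, standing in for a real classifier.
        preds = [guess_rng.randint(0, 1) for _ in test]
        scores.append(mcc([labels[i] for i in test], preds))
    print(f"mean MCC over {len(scores)} folds: {sum(scores) / len(scores):.3f}")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why values below 0.2 are read in the abstract as poor to mediocre performance.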

journal_name

Biol Direct

journal_title

Biology direct

authors

Chierici M,Francescatto M,Bussola N,Jurman G,Furlanello C

doi

10.1186/s13062-020-0259-4

subject

Has Abstract

pub_date

2020-02-13 00:00:00

pages

3

issue

1

issn

1745-6150

pii

10.1186/s13062-020-0259-4

journal_volume

15

pub_type

Journal Article
  • Strong association between pseudogenization mechanisms and gene sequence length.

    abstract:Pseudogenes arise from the decay of gene copies following either RNA-mediated duplication (processed pseudogenes) or DNA-mediated duplication (nonprocessed pseudogenes). Here, we show that long protein-coding genes tend to produce more nonprocessed pseudogenes than short genes, whereas the opposite is true f...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-38

    authors: Khachane AN,Harrison PM

    update_date:2009-10-06 00:00:00

  • Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions.

    abstract:BACKGROUND:H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homolo...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-5

    authors: Zhou H,Gao S,Nguyen NN,Fan M,Jin J,Liu B,Zhao L,Xiong G,Tan M,Li S,Wong L

    update_date:2014-04-08 00:00:00

  • Human gammadelta T cell recognition of lipid A is predominately presented by CD1b or CD1c on dendritic cells.

    abstract:BACKGROUND:The gammadelta T cells serve as early immune defense against certain encountered microbes. Only a few gammadelta T cell-recognized ligands from microbial antigens have been identified so far and the mechanisms by which gammadelta T cells recognize these ligands remain unknown. Here we explored the mechanism ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-47

    authors: Cui Y,Kang L,Cui L,He W

    update_date:2009-12-01 00:00:00

  • Plant viruses of the Amalgaviridae family evolved via recombination between viruses with double-stranded and negative-strand RNA genomes.

    abstract:Plant viruses of the recently recognized family Amalgaviridae have monopartite double-stranded (ds) RNA genomes and encode two proteins: an RNA-dependent RNA polymerase (RdRp) and a putative capsid protein (CP). Whereas the RdRp of amalgaviruses has been found to be most closely related to the RdRps of dsRNA viruses o...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-015-0047-8

    authors: Krupovic M,Dolja VV,Koonin EV

    update_date:2015-03-29 00:00:00

  • Comprehensive comparative-genomic analysis of type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes.

    abstract:BACKGROUND:The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for replicons and cells in which they reside, but also possess "normal" cellular functions in v...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-19

    authors: Makarova KS,Wolf YI,Koonin EV

    update_date:2009-06-03 00:00:00

  • IPC - Isoelectric Point Calculator.

    abstract:BACKGROUND:Accurate estimation of the isoelectric point (pI) based on the amino acid sequence is useful for many analytical biochemistry and proteomics techniques such as 2-D polyacrylamide gel electrophoresis, or capillary isoelectric focusing used in combination with high-throughput mass spectrometry. Additionally, p...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-016-0159-9

    authors: Kozlowski LP

    update_date:2016-10-21 00:00:00

  • Description of plant tRNA-derived RNA fragments (tRFs) associated with argonaute and identification of their putative targets.

    abstract:tRNA-derived RNA fragments (tRFs) are 19mer small RNAs that associate with Argonaute (AGO) proteins in humans. However, in plants, it is unknown if tRFs bind with AGO proteins. Here, using public deep sequencing libraries of immunoprecipitated Argonaute proteins (AGO-IP) and bioinformatics approaches, we identified th...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-8-6

    authors: Loss-Morais G,Waterhouse PM,Margis R

    update_date:2013-02-12 00:00:00

  • The multiple personalities of Watson and Crick strands.

    abstract:BACKGROUND:In genetics it is customary to refer to double-stranded DNA as containing a "Watson strand" and a "Crick strand." However, there seems to be no consensus in the literature on the exact meaning of these two terms, and the many usages contradict one another as well as the original definition. Here, we review t...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-7

    authors: Cartwright RA,Graur D

    update_date:2011-02-08 00:00:00

  • A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis.

    abstract:BACKGROUND:Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches--the examination of similarities to known disease genes and/or the evaluation of functional annotation of...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-30

    authors: Lombard Z,Park C,Makova KD,Ramsay M

    update_date:2011-06-13 00:00:00

  • Structural analysis of hubs in human NR-RTK network.

    abstract:BACKGROUND:Currently a huge amount of protein-protein interaction data is available therefore extracting meaningful ones are a challenging task. In a protein-protein interaction network, hubs are considered as key proteins maintaining function and stability of the network. Therefore, studying protein-protein complexes ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-6-49

    authors: Choura M,Rebaï A

    update_date:2011-10-05 00:00:00

  • Origin of the nuclear proteome on the basis of pre-existing nuclear localization signals in prokaryotic proteins.

    abstract:BACKGROUND:The origin of the selective nuclear protein import machinery, which consists of nuclear pore complexes and adaptor molecules interacting with the nuclear localization signals (NLSs) of cargo molecules, is one of the most important events in the evolution of eukaryotic cells. How proteins were selected for im...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-020-00263-6

    authors: Lisitsyna OM,Kurnaeva MA,Arifulin EA,Shubina MY,Musinova YR,Mironov AA,Sheval EV

    update_date:2020-04-28 00:00:00

  • Infinitely long branches and an informal test of common ancestry.

    abstract:BACKGROUND:The evidence for universal common ancestry (UCA) is vast and persuasive. A phylogenetic test has been proposed for quantifying its odds against independently originated sequences based on the comparison between one versus several trees. This test was successfully applied to a well-supported homologous sequen...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-016-0120-y

    authors: de Oliveira Martins L,Posada D

    update_date:2016-04-07 00:00:00

  • Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements.

    abstract:BACKGROUND:In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the ta...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-4-29

    authors: Makarova KS,Wolf YI,van der Oost J,Koonin EV

    update_date:2009-08-25 00:00:00

  • The progene hypothesis: the nucleoprotein world and how life began.

    abstract:In this article, I review the results of studies on the origin of life distinct from the popular RNA world hypothesis. The alternate scenario postulates the origin of the first bimolecular genetic system (a polynucleotide gene and a polypeptide processive polymerase) with simultaneous replication and translation and i...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/s13062-015-0096-z

    authors: Altstein AD

    update_date:2015-11-26 00:00:00

  • Evolution before genes.

    abstract:BACKGROUND:Our current understanding of evolution is so tightly linked to template-dependent replication of DNA and RNA molecules that the old idea from Oparin of a self-reproducing 'garbage bag' ('coacervate') of chemicals that predated fully-fledged cell-like entities seems to be farfetched to most scientists today. ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-7-1

    authors: Vasas V,Fernando C,Santos M,Kauffman S,Szathmáry E

    update_date:2012-01-05 00:00:00

  • The UBR-box and its relationship to binuclear RING-like treble clef zinc fingers.

    abstract:BACKGROUND:The N-end rule pathway is a part of the ubiquitin-dependent proteolytic system wherein N-recognin proteins recognize the amino terminal degradation signals (N-degrons) of the substrate. The type 1 N-degron recognizing UBR-box domain of the eukaryotic Arg/N-end rule pathway is known to possess a novel three-z...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-015-0066-5

    authors: Kaur G,Subramanian S

    update_date:2015-07-17 00:00:00

  • Orphan SelD proteins and selenium-dependent molybdenum hydroxylases.

    abstract:Bacterial and Archaeal cells use selenium structurally in selenouridine-modified tRNAs, in proteins translated with selenocysteine, and in the selenium-dependent molybdenum hydroxylases (SDMH). The first two uses both require the selenophosphate synthetase gene, selD. Examining over 500 complete prokaryotic genomes fi...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-4

    authors: Haft DH,Self WT

    update_date:2008-02-20 00:00:00

  • Optimal treatment and stochastic modeling of heterogeneous tumors.

    abstract:In this work we review past articles that have mathematically studied cancer heterogeneity and the impact of this heterogeneity on the structure of optimal therapy. We look at past works on modeling how heterogeneous tumors respond to radiotherapy, and take a particularly close look at how the optimal radiot...

    journal_title:Biology direct

    pub_type: Journal Article, Review

    doi:10.1186/s13062-016-0142-5

    authors: Badri H,Leder K

    update_date:2016-08-23 00:00:00

  • The manoeuvrability hypothesis to explain the maintenance of bilateral symmetry in animal evolution.

    abstract:BACKGROUND:The overwhelming majority of animal species exhibit bilateral symmetry. However, the precise evolutionary importance of bilateral symmetry is unknown, although elements of the understanding of the phenomenon have been present within the scientific community for decades. PRESENTATION OF THE HYPOTHESIS:Here w...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-7-22

    authors: Holló G,Novák M

    update_date:2012-07-12 00:00:00

  • Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination.

    abstract:Recently Mycobacterium tuberculosis was shown to possess a novel protein modification, in which a small protein Pup is conjugated to the epsilon-amino groups of lysines in target proteins. Analogous to ubiquitin modification in eukaryotes, this remarkable modification recruits proteins for degradation via ar...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-45

    authors: Iyer LM,Burroughs AM,Aravind L

    update_date:2008-11-03 00:00:00

  • Proteomic changes associated with deletion of the Magnaporthe oryzae conidial morphology-regulating gene COM1.

    abstract:BACKGROUND:The rice blast disease caused by Magnaporthe oryzae is a major constraint on world rice production. The conidia produced by this fungal pathogen are the main source of disease dissemination. The morphology of conidia may be a critical factor in the spore dispersal and virulence of M. oryzae in the field. Del...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-5-61

    authors: Bhadauria V,Wang LX,Peng YL

    update_date:2010-11-02 00:00:00

  • Modeling the population dynamics of lemon sharks.

    abstract:BACKGROUND:Long-lived marine megavertebrates (e.g. sharks, turtles, mammals, and seabirds) are inherently vulnerable to anthropogenic mortality. Although some mathematical models have been applied successfully to manage these animals, more detailed treatments are often needed to assess potential drivers of population d...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-23

    authors: White ER,Nagy JD,Gruber SH

    update_date:2014-11-18 00:00:00

  • Evidence-based gene models for structural and functional annotations of the oil palm genome.

    abstract:BACKGROUND:Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-017-0191-4

    authors: Chan KL,Tatarinova TV,Rosli R,Amiruddin N,Azizi N,Halim MAA,Sanusi NSNM,Jayanthi N,Ponomarenko P,Triska M,Solovyev V,Firdaus-Raih M,Sambanthamurthi R,Murphy D,Low EL

    update_date:2017-09-08 00:00:00

  • A highly conserved family of inactivated archaeal B family DNA polymerases.

    abstract:A widespread and highly conserved family of apparently inactivated derivatives of archaeal B-family DNA polymerases is described. Phylogenetic analysis shows that the inactivated forms comprise a distinct clade among archaeal B-family polymerases and that, within this clade, Euryarchaea and Crenarchaea are clearly sep...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-3-32

    authors: Rogozin IB,Makarova KS,Pavlov YI,Koonin EV

    update_date:2008-08-06 00:00:00

  • Pseudo-chaotic oscillations in CRISPR-virus coevolution predicted by bifurcation analysis.

    abstract:BACKGROUND:The CRISPR-Cas systems of adaptive antivirus immunity are present in most archaea and many bacteria, and provide resistance to specific viruses or plasmids by inserting fragments of foreign DNA into the host genome and then utilizing transcripts of these spacers to inactivate the cognate foreign genome. The ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-9-13

    authors: Berezovskaya FS,Wolf YI,Koonin EV,Karev GP

    update_date:2014-07-02 00:00:00

  • The mechanistic and evolutionary aspects of the 2'- and 3'-OH paradigm in biosynthetic machinery.

    abstract:BACKGROUND:The translation machinery underlies a multitude of biological processes within the cell. The design and implementation of the modern translation apparatus on even the simplest course of action is extremely complex, and involves different RNA and protein factors. According to the "RNA world" idea, the critica...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-8-17

    authors: Safro M,Klipcan L

    update_date:2013-07-08 00:00:00

  • Domain enhanced lookup time accelerated BLAST.

    abstract:BACKGROUND:BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/1745-6150-7-12

    authors: Boratyn GM,Schäffer AA,Agarwala R,Altschul SF,Lipman DJ,Madden TL

    update_date:2012-04-17 00:00:00

  • Rotational restriction of nascent peptides as an essential element of co-translational protein folding: possible molecular players and structural consequences.

    abstract:BACKGROUND:A basic tenet of protein science is that all information about the spatial structure of proteins is present in their sequences. Nonetheless, many proteins fail to attain native structure upon experimental denaturation and refolding in vitro, raising the question of the specific role of cellular machinery in ...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-017-0186-1

    authors: Sorokina I,Mushegian A

    update_date:2017-05-31 00:00:00

  • Elusive data underlying debate at the prokaryote-eukaryote divide.

    abstract:BACKGROUND:The origin of eukaryotic cells was an important transition in evolution. The factors underlying the origin and evolutionary success of the eukaryote lineage are still discussed. One camp argues that mitochondria were essential for eukaryote origin because of the unique configuration of internalized bioenerge...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-018-0221-x

    authors: Gerlitz M,Knopp M,Kapust N,Xavier JC,Martin WF

    update_date:2018-10-03 00:00:00

  • PEPstrMOD: structure prediction of peptides containing natural, non-natural and modified residues.

    abstract:BACKGROUND:In the past, many methods have been developed for peptide tertiary structure prediction but they are limited to peptides having natural amino acids. This study describes a method PEPstrMOD, which is an updated version of PEPstr, developed specifically for predicting the structure of peptides containing natur...

    journal_title:Biology direct

    pub_type: Journal Article

    doi:10.1186/s13062-015-0103-4

    authors: Singh S,Singh H,Tuknait A,Chaudhary K,Singh B,Kumaran S,Raghava GP

    update_date:2015-12-21 00:00:00