Gene Expression Value Prediction Based on XGBoost Algorithm.

Abstract:

Gene expression profiling has been widely used to characterize cell status, to reflect the health of the body, and to diagnose genetic diseases. Although the cost of genome-wide expression profiling has been gradually decreasing in recent years, collecting expression profiles for thousands of genes is still very expensive. Because gene expression values are usually highly correlated in humans, the expression values of the remaining target genes can be predicted from the values of 943 landmark genes. Hence, we designed an algorithm for predicting gene expression values based on XGBoost, which integrates multiple tree models and has stronger interpretability. We tested the performance of the XGBoost model on the GEO and RNA-seq datasets and compared the results with those of other existing models. Experiments showed that the XGBoost model achieved a significantly lower overall error than the existing D-GEX algorithm, linear regression, and KNN methods. In conclusion, the XGBoost algorithm outperforms existing models and will be a significant contribution to the toolbox for gene expression value prediction.

journal_name

Front Genet

journal_title

Frontiers in genetics

authors

Li W,Yin Y,Quan X,Zhang H

doi

10.3389/fgene.2019.01077

subject

Has Abstract

pub_date

2019-11-12 00:00:00

pages

1077

issn

1664-8021

journal_volume

10

pub_type

Journal Article
  • Association of Transfer RNA Fragments in White Blood Cells With Antibody Response to Bovine Leukemia Virus in Holstein Cattle.

    abstract: Bovine leukemia virus (BLV) affects cattle health and productivity worldwide, causing abnormal immune function and immunosuppression. Transfer RNA fragments (tRFs) are known to be involved in inhibition of gene expression and have been associated with stress and immune response, tumor growth, and viral infection. The ...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2018.00236

    authors: Taxis TM,Kehrli ME Jr,D'Orey-Branco R,Casas E

    updated: 2018-07-04 00:00:00

  • The Spectrum of Neurological and White Matter Changes and Premutation Status Categories of Older Male Carriers of the FMR1 Alleles Are Linked to Genetic (CGG and FMR1 mRNA) and Cellular Stress (AMPK) Markers.

    abstract: The fragile X premutation (PM) allele contains a CGG expansion of 55-200 repeats in the FMR1 gene's promoter. Male PM carriers have an elevated risk of developing neurological and psychiatric changes, including an approximately 50% risk of the fragile X-associated tremor/ataxia syndrome (FXTAS). The aim of this study ...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2018.00531

    authors: Loesch DZ,Trost N,Bui MQ,Hammersley E,Lay ST,Annesley SJ,Sanislav O,Allan CY,Tassone F,Chen ZP,Ngoei KRW,Kemp BE,Francis D,Fisher PR,Storey E

    updated: 2018-11-12 00:00:00

  • Assessing the Impact of Sample Heterogeneity on Transcriptome Analysis of Human Diseases Using MDP Webtool.

    abstract: Transcriptome analyses have increased our understanding of the molecular mechanisms underlying human diseases. Most approaches aim to identify significant genes by comparing their expression values between healthy subjects and a group of patients with a certain disease. Given that studies normally contain few samples,...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.00971

    authors: Gonçalves ANA,Lever M,Russo PST,Gomes-Correia B,Urbanski AH,Pollara G,Noursadeghi M,Maracaja-Coutinho V,Nakaya HI

    updated: 2019-10-24 00:00:00

  • Neurologic Manifestations as Initial Clinical Presentation of Familial Hemophagocytic Lymphohistiocytosis Type2 Due to PRF1 Mutation in Chinese Pediatric Patients.

    abstract: Familial hemophagocytic lymphohistiocytosis Type 2 (FHL2) associated central nervous system (CNS) involvement is less understood in children, especially when considering neurologic manifestations as part of the initial presentation. We conducted a retrospective review of the clinical manifestations and genetic abnorma...

    journal_title: Frontiers in genetics

    pub_type:

    doi: 10.3389/fgene.2020.00126

    authors: Feng WX,Yang XY,Li JW,Gong S,Wu Y,Zhang WH,Han TL,Zhuo XW,Ding CH,Fang F

    updated: 2020-03-04 00:00:00

  • Expanded Newborn Screening for Inborn Errors of Metabolism by Tandem Mass Spectrometry in Suzhou, China: Disease Spectrum, Prevalence, Genetic Characteristics in a Chinese Population.

    abstract: Expanded newborn screening for inborn errors of metabolism (IEMs) by tandem mass spectrometry (MS/MS) can simultaneously analyze more than 40 metabolites and identify about 50 kinds of IEMs. Next generation sequencing (NGS) targeting hundreds of IEMs-associated genes as a follow-up test in expanded newborn screening...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.01052

    authors: Wang T,Ma J,Zhang Q,Gao A,Wang Q,Li H,Xiang J,Wang B

    updated: 2019-10-29 00:00:00

  • Genome-Wide Identification and Characterization of the bHLH Transcription Factor Family in Pepper (Capsicum annuum L.).

    abstract: Plant basic helix-loop-helix (bHLH) transcription factors are involved in the regulation of various biological processes in plant growth, development, and stress response. However, members of this important transcription factor family have not been systematically identified and analyzed in pepper (Capsicum annuum L.)....

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2020.570156

    authors: Zhang Z,Chen J,Liang C,Liu F,Hou X,Zou X

    updated: 2020-09-25 00:00:00

  • Corrigendum: Exposing the Causal Effect of C-Reactive Protein on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study.

    abstract: [This corrects the article DOI: 10.3389/fgene.2018.00657.]. ...

    journal_title: Frontiers in genetics

    pub_type: Journal Article, Published Erratum

    doi: 10.3389/fgene.2019.00085

    authors: Cheng L,Zhuang H,Yang S,Jiang H,Wang S,Zhang J

    updated: 2019-02-18 00:00:00

  • Causal Inference for Genetic Obesity, Cardiometabolic Profile and COVID-19 Susceptibility: A Mendelian Randomization Study.

    abstract: Background: Cross-sectional observational studies have reported obesity and cardiometabolic co-morbidities as important predictors of coronavirus disease 2019 (COVID-19) hospitalization. The causal impact of these risk factors is unknown at present. Methods: We conducted multivariable logistic regression to evaluate the...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2020.586308

    authors: Aung N,Khanji MY,Munroe PB,Petersen SE

    updated: 2020-11-11 00:00:00

  • Genetic Model to Study the Co-Morbid Phenotypes of Increased Alcohol Intake and Prior Stress-Induced Enhanced Fear Memory.

    abstract: Posttraumatic Stress Disorder (PTSD) is a complex illness, frequently co-morbid with depression, caused by both genetics and the environment. Alcohol Use Disorder (AUD), which also co-occurs with depression, is often co-morbid with PTSD. To date, very few genes have been identified for PTSD and even fewer for PTSD com...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2018.00566

    authors: Lim PH,Shi G,Wang T,Jenz ST,Mulligan MK,Redei EE,Chen H

    updated: 2018-11-27 00:00:00

  • A Quasi-Domesticate Relic Hybrid Population of Saccharomyces cerevisiae × S. paradoxus Adapted to Olive Brine.

    abstract: The adaptation of the yeast Saccharomyces cerevisiae to man-made environments for the fermentation of foodstuffs and beverages illustrates the scientific, social, and economic relevance of microbe domestication. Here we address a yet unexplored aspect of S. cerevisiae domestication, that of the emergence of lineages h...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.00449

    authors: Pontes A,Čadež N,Gonçalves P,Sampaio JP

    updated: 2019-05-29 00:00:00

  • A Mini-Atlas of Gene Expression for the Domestic Goat (Capra hircus).

    abstract: Goats (Capra hircus) are an economically important livestock species providing meat and milk across the globe. They are of particular importance in tropical agri-systems contributing to sustainable agriculture, alleviation of poverty, social cohesion, and utilisation of marginal grazing. There are excellent genetic an...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.01080

    authors: Muriuki C,Bush SJ,Salavati M,McCulloch MEB,Lisowski ZM,Agaba M,Djikeng A,Hume DA,Clark EL

    updated: 2019-11-04 00:00:00

  • Characterizing ncRNAs in Human Pathogenic Protists Using High-Throughput Sequencing Technology.

    abstract: ncRNAs are key genes in many human diseases, including cancer and viral infection, and provide critical functions in pathogenic organisms such as fungi, bacteria, viruses, and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring man...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2011.00096

    authors: Collins LJ

    updated: 2011-12-27 00:00:00

  • Rice Biofortification With Zinc and Selenium: A Transcriptomic Approach to Understand Mineral Accumulation in Flag Leaves.

    abstract: Human malnutrition due to micronutrient deficiencies, particularly with regard to zinc (Zn) and selenium (Se), affects millions of people around the world, and the enrichment of staple foods through biofortification has been successfully used to fight hidden hunger. Rice (Oryza sativa L.) is one of the staple foods m...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2020.00543

    authors: Roda FA,Marques I,Batista-Santos P,Esquível MG,Ndayiragije A,Lidon FC,Swamy BPM,Ramalho JC,Ribeiro-Barros AI

    updated: 2020-07-07 00:00:00

  • Genetic Markers for Stevens-Johnson Syndrome/Toxic Epidermal Necrolysis in the Asian Indian Population: Implications on Prevention.

    abstract: This review attempts to collate all the studies performed in India or comprising a population originating from India and to find out if there is an association between the HLA (human leucocyte antigen) type of an individual and the development of Stevens-Johnson syndrome/toxic epidermal necrolysis (SJS/TEN) subsequent to med...

    journal_title: Frontiers in genetics

    pub_type: Journal Article, Review

    doi: 10.3389/fgene.2020.607532

    authors: Shanbhag SS,Koduri MA,Kannabiran C,Donthineni PR,Singh V,Basu S

    updated: 2021-01-12 00:00:00

  • Genome-Wide Analysis of Alternative Splicing Provides Insights Into Stress Response of the Pacific White Shrimp Litopenaeus vanname.

    abstract: Alternative splicing (AS) can enhance transcript diversity dramatically and play an important role in stress adaptation. Limited research on AS has been reported in the Pacific white shrimp (Litopenaeus vannamei), which is an important aquaculture species worldwide. Here, we performed a genome-wide identificatio...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.00845

    authors: Zhang X,Yuan J,Zhang X,Liu C,Xiang J,Li F

    updated: 2019-09-12 00:00:00

  • A genomic comparison of two termites with different social complexity.

    abstract: The termites evolved eusociality and complex societies before the ants, but have been studied much less. The recent publication of the first two termite genomes provides a unique comparative opportunity, particularly because the sequenced termites represent opposite ends of the social complexity spectrum. Zootermopsis...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2015.00009

    authors: Korb J,Poulsen M,Hu H,Li C,Boomsma JJ,Zhang G,Liebig J

    updated: 2015-03-04 00:00:00

  • Multi-genome alignment for quality control and contamination screening of next-generation sequencing data.

    abstract: The availability of massive amounts of DNA sequence data, from thousands of genomes even in a single project, has had a huge impact on our understanding of biology, but also creates several problems for biologists carrying out those experiments. Bioinformatic analysis of sequence data is perhaps the most obvious challenge ...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2014.00031

    authors: Hadfield J,Eldridge MD

    updated: 2014-02-20 00:00:00

  • Machine Learning on Human Muscle Transcriptomic Data for Biomarker Discovery and Tissue-Specific Drug Target Identification.

    abstract: Over the past several decades, research into the molecular basis of human muscle aging has progressed significantly. However, the development of accessible tissue-specific biomarkers of human muscle aging that may be measured to evaluate the effectiveness of therapeutic interventions is still a major challe...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2018.00242

    authors: Mamoshina P,Volosnikova M,Ozerov IV,Putin E,Skibina E,Cortese F,Zhavoronkov A

    updated: 2018-07-12 00:00:00

  • The Digenic Causality in Familial Hypercholesterolemia: Revising the Genotype-Phenotype Correlations of the Disease.

    abstract: Genetically inherited defects in lipoprotein metabolism affect more than 10 million individuals around the globe, with a preponderance in some regions where consanguinity played a major role in establishing founder mutations. Mutations in four genes have so far been linked to the dominant and recessive forms of the disease....

    journal_title: Frontiers in genetics

    pub_type: Journal Article, Review

    doi: 10.3389/fgene.2020.572045

    authors: Kamar A,Khalil A,Nemer G

    updated: 2021-01-15 00:00:00

  • Employing MCMC under the PPL framework to analyze sequence data in large pedigrees.

    abstract: The increased feasibility of whole-genome (or whole-exome) sequencing has led to renewed interest in using family data to find disease mutations. For clinical phenotypes that lend themselves to study in large families, this approach can be particularly effective, because it may be possible to obtain strong evidence of...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2013.00059

    authors: Huang Y,Thomas A,Vieland VJ

    updated: 2013-04-19 00:00:00

  • Characterization of Two Satellite DNA Families in the Genome of the Oomycete Plant Pathogen Phytophthora parasitica.

    abstract: Satellite DNA is a class of repetitive sequences that are organized in long arrays of tandemly repeated units in most eukaryotes. Long considered as selfish DNA, satellite sequences are now proposed to contribute to genome integrity. Despite their potential impact on the architecture and evolution of the genome, satel...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2020.00557

    authors: Panabières F,Rancurel C,da Rocha M,Kuhn ML

    updated: 2020-06-05 00:00:00

  • ZNF143 in Chromatin Looping and Gene Regulation.

    abstract: ZNF143, a human homolog of the transcriptional activator Staf, is a C2H2-type protein consisting of seven zinc finger domains. As a transcription factor (TF), ZNF143 binds chromatin in a sequence-specific manner and activates the expression of protein-coding and non-coding genes on a genome scale. Although it is ubiqu...

    journal_title: Frontiers in genetics

    pub_type: Journal Article, Review

    doi: 10.3389/fgene.2020.00338

    authors: Ye B,Yang G,Li Y,Zhang C,Wang Q,Yu G

    updated: 2020-04-07 00:00:00

  • Pharmacogenomics of acetaminophen in pediatric populations: a moving target.

    abstract: Acetaminophen (APAP) is widely used as an over-the-counter fever reducer and pain reliever. However, the current therapeutic use of APAP is not optimal. The inter-patient variability in both efficacy and toxicity limits the use of this drug. This is particularly an issue in pediatric populations, where tools for predi...

    journal_title: Frontiers in genetics

    pub_type: Journal Article, Review

    doi: 10.3389/fgene.2014.00314

    authors: Krasniak AE,Knipp GT,Svensson CK,Liu W

    updated: 2014-10-14 00:00:00

  • Biological Network Approach for the Identification of Regulatory Long Non-Coding RNAs Associated With Metabolic Efficiency in Cattle.

    abstract: Background: Genomic regions associated with divergent livestock feed efficiency have been found predominantly outside protein coding sequences. Long non-coding RNAs (lncRNA) can modulate chromatin accessibility and gene expression and act as important metabolic regulators in mammals. By integrating phenotypic, transcript...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.01130

    authors: Nolte W,Weikard R,Brunner RM,Albrecht E,Hammon HM,Reverter A,Kühn C

    updated: 2019-11-22 00:00:00

  • Identification and Analysis of the GASR Gene Family in Common Wheat (Triticum aestivum L.) and Characterization of TaGASR34, a Gene Associated With Seed Dormancy and Germination.

    abstract: Seed dormancy and germination are important agronomic traits in wheat (Triticum aestivum L.) because they determine pre-harvest sprouting (PHS) resistance and thus affect grain production. These processes are regulated by Gibberellic Acid-Stimulated Regulator (GASR) genes. In this study, we identified 37 GASR genes in...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.00980

    authors: Cheng X,Wang S,Xu D,Liu X,Li X,Xiao W,Cao J,Jiang H,Min X,Wang J,Zhang H,Chang C,Lu J,Ma C

    updated: 2019-10-18 00:00:00

  • Genome-Wide Association of Genetic Variants With Refraction, Axial Length, and Corneal Curvature: A Longitudinal Study of Chinese Schoolchildren.

    abstract: Background: Myopia is a common eye disorder that is approaching epidemic proportions worldwide. A genome-wide association study identified AREG (rs12511037), GABRR1 (rs13215566), and PDE10A (rs12206610) as being associated with refractive error in Asian populations. The present study investigated the associations betwee...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2020.00276

    authors: Lin Y,Ding Y,Jiang D,Li C,Huang X,Liu L,Xiao H,Vasudevan B,Chen Y

    updated: 2020-03-25 00:00:00

  • Estimation of Recombination Rate and Maternal Linkage Disequilibrium in Half-Sibs.

    abstract: A livestock population can be characterized by different population genetic parameters, such as linkage disequilibrium and recombination rate between pairs of genetic markers. The population structure, which may be caused by family stratification, has an influence on the estimates of these parameters. An expectation m...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2018.00186

    authors: Hampel A,Teuscher F,Gomez-Raya L,Doschoris M,Wittenburg D

    updated: 2018-06-05 00:00:00

  • Dissecting the Invasion-Associated Long Non-coding RNAs Using Single-Cell RNA-Seq Data of Glioblastoma.

    abstract: Glioblastoma (GBM) is characterized by rapid and lethal infiltration of brain tissue, which is the primary cause of treatment failure and death in GBM. Therefore, understanding the molecular mechanisms of tumor cell invasion is crucial for the treatment of GBM. In this study, we dissected the single-cell RNA-seq dat...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2020.633455

    authors: Pang B,Quan F,Ping Y,Hu J,Lan Y,Pang L

    updated: 2021-01-11 00:00:00

  • DRIM: A Web-Based System for Investigating Drug Response at the Molecular Level by Condition-Specific Multi-Omics Data Integration.

    abstract: Pharmacogenomics is the study of how genes affect a person's response to drugs. Thus, understanding the effect of a drug at the molecular level can be helpful in both drug discovery and personalized medicine. Over the years, transcriptome data upon drug treatment have been collected and several databases compiled before ...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2020.564792

    authors: Oh M,Park S,Lee S,Lee D,Lim S,Jeong D,Jo K,Jung I,Kim S

    updated: 2020-11-12 00:00:00

  • A Combined in silico, in vitro and Clinical Approach to Characterize Novel Pathogenic Missense Variants in PRPF31 in Retinitis Pigmentosa.

    abstract: At least six different proteins of the spliceosome, including PRPF3, PRPF4, PRPF6, PRPF8, PRPF31, and SNRNP200, are mutated in autosomal dominant retinitis pigmentosa (adRP). These proteins have recently been shown to localize to the base of the connecting cilium of the retinal photoreceptor cells, elucidating this fo...

    journal_title: Frontiers in genetics

    pub_type: Journal Article

    doi: 10.3389/fgene.2019.00248

    authors: Wheway G,Nazlamova L,Meshad N,Hunt S,Jackson N,Churchill A

    updated: 2019-03-22 00:00:00