Predicting disease risks from highly imbalanced data using random forest.

Abstract:

BACKGROUND: We present a method that uses the Healthcare Cost and Utilization Project (HCUP) dataset to predict the disease risk of individuals based on their medical diagnosis history. The methodology may be incorporated into a variety of applications, such as risk management, tailored health communication, and decision support systems in healthcare. METHODS: We used the National Inpatient Sample (NIS) data, which is publicly available through HCUP, to train random forest (RF) classifiers for disease prediction. Because the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting, and RF in predicting the risk of eight chronic diseases. RESULTS: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging, and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process. CONCLUSIONS: By combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
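The repeated random sub-sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the details here are assumptions: the majority class is shuffled and cut into chunks the size of the minority class, each chunk is paired with all minority-class records, and the per-sub-sample predictions are combined by majority vote. The function names `balanced_subsamples` and `majority_vote` and the toy labels are hypothetical. In practice, one classifier (a random forest in the paper) would be trained on each sub-sample.

```python
import random
from collections import Counter


def balanced_subsamples(labels, seed=0):
    """Return lists of record indices, each holding equal numbers of both classes.

    Assumption (not stated in the abstract): the majority class is shuffled
    and cut into chunks the size of the minority class, and every chunk is
    paired with all minority-class indices.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (major, _), (minor, n_minor) = counts.most_common(2)
    minority = [i for i, y in enumerate(labels) if y == minor]
    majority = [i for i, y in enumerate(labels) if y == major]
    rng.shuffle(majority)
    subsamples = []
    # Leftover majority indices that cannot fill a whole chunk are dropped.
    for start in range(0, len(majority) - n_minor + 1, n_minor):
        chunk = majority[start:start + n_minor]
        subsamples.append(sorted(chunk + minority))
    return subsamples


def majority_vote(predictions):
    """Combine one predicted label per sub-sample model into a single label."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 3 positive cases among 12 records (hypothetical, for illustration).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
subsamples = balanced_subsamples(labels)
for indices in subsamples:
    print(indices, Counter(labels[i] for i in indices))
```

Each printed sub-sample contains three records of each class, so a classifier trained on it sees a 50/50 class distribution even though positives make up only a quarter of the toy data.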

authors

Khalilia M,Chakraborty S,Popescu M

doi

10.1186/1472-6947-11-51

subject

Has Abstract

pub_date

2011-07-29 00:00:00

pages

51

issn

1472-6947

pii

1472-6947-11-51

journal_volume

11

pub_type

Journal Article
  • Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches.

    abstract:BACKGROUND:Text mining and natural language processing of clinical text, such as notes from electronic health records, requires specific consideration of the specialized characteristics of these texts. Deep learning methods could potentially mitigate domain specific challenges such as limited access to in-domain tools ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-019-0981-y

    authors: Weegar R,Pérez A,Casillas A,Oronoz M

    update_date:2019-12-23 00:00:00

  • Computer-aided DSM-IV-diagnostics - acceptance, use and perceived usefulness in relation to users' learning styles.

    abstract:BACKGROUND:CDSS (computerized decision support system) for medical diagnostics have been studied for long. This study was undertaken to investigate how different preferences of Learning Styles (LS) of psychiatrists might affect acceptance, use and perceived usefulness of a CDSS for diagnostics in psychiatry. METHODS:4...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-5-1

    authors: Bergman LG,Fors UG

    update_date:2005-01-07 00:00:00

  • An algorithm to identify patients with treated type 2 diabetes using medico-administrative data.

    abstract:BACKGROUND:National authorities have to follow the evolution of diabetes to implement public health policies. An algorithm was developed to identify patients with treated type 2 diabetes and estimate its annual prevalence in Luxembourg using health insurance claims when no diagnosis code is available. METHODS:The DIAB...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-11-23

    authors: Renard LM,Bocquet V,Vidal-Trecan G,Lair ML,Couffignal S,Blum-Boisgard C

    update_date:2011-04-14 00:00:00

  • Predicting cardiovascular health trajectories in time-series electronic health records with LSTM models.

    abstract:BACKGROUND:Cardiovascular disease (CVD) is the leading cause of death in the United States (US). Better cardiovascular health (CVH) is associated with CVD prevention. Predicting future CVH levels may help providers better manage patients' CVH. We hypothesized that CVH measures can be predicted based on previous measure...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-020-01345-1

    authors: Guo A,Beheshti R,Khan YM,Langabeer JR 2nd,Foraker RE

    update_date:2021-01-06 00:00:00

  • Comparison of clinical knowledge management capabilities of commercially-available and leading internally-developed electronic health records.

    abstract:BACKGROUND:We have carried out an extensive qualitative research program focused on the barriers and facilitators to successful adoption and use of various features of advanced, state-of-the-art electronic health records (EHRs) within large, academic, teaching facilities with long-standing EHR research and development ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-11-13

    authors: Sittig DF,Wright A,Meltzer S,Simonaitis L,Evans RS,Nichol WP,Ash JS,Middleton B

    update_date:2011-02-17 00:00:00

  • The Computer-based Health Evaluation Software (CHES): a software for electronic patient-reported outcome monitoring.

    abstract:BACKGROUND:Patient-reported Outcomes (PROs) capturing e.g., quality of life, fatigue, depression, medication side-effects or disease symptoms, have become important outcome parameters in medical research and daily clinical practice. Electronic PRO data capture (ePRO) with software packages to administer questionnaires,...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-12-126

    authors: Holzner B,Giesinger JM,Pinggera J,Zugal S,Schöpf F,Oberguggenberger AS,Gamper EM,Zabernigg A,Weber B,Rumpold G

    update_date:2012-11-09 00:00:00

  • Automatic schizophrenic discrimination on fNIRS by using complex brain network analysis and SVM.

    abstract:BACKGROUND:Schizophrenia is a kind of serious mental illness. Due to the lack of an objective physiological data supporting and a unified data analysis method, doctors can only rely on the subjective experience of the data to distinguish normal people and patients, which easily lead to misdiagnosis. In recent years, fu...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-017-0559-5

    authors: Song H,Chen L,Gao R,Bogdan IIM,Yang J,Wang S,Dong W,Quan W,Dang W,Yu X

    update_date:2017-12-20 00:00:00

  • Information sharing across generations and environments (InfoSAGE): study design and methodology protocol.

    abstract:BACKGROUND:Longevity creates increasing care needs for healthcare providers and family caregivers. Increasingly, the burden of care falls to one primary caregiver, increasing stress and reducing health outcomes. Additionally, little has been published on adults', over the age of 75, preferences in the development of he...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-018-0697-4

    authors: Quintana Y,Crotty B,Fahy D,Lipsitz L,Davis RB,Safran C

    update_date:2018-11-20 00:00:00

  • Factors influencing the surgery intentions and choices of women with early breast cancer: the predictive utility of an extended theory of planned behaviour.

    abstract:BACKGROUND:Women diagnosed with early breast cancer (stage I or II) can be offered the choice between mastectomy or breast conservation surgery with radiotherapy due to equivalence in survival rates. A wide variation in the surgical management of breast cancer and a lack of theoretically guided research on this issue h...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-13-92

    authors: Sivell S,Elwyn G,Edwards A,Manstead AS,BresDex group.

    update_date:2013-08-20 00:00:00

  • Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts.

    abstract:BACKGROUND:The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Due to the richness of these annotations, the corpus provides a good...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-016-0294-3

    authors: Verspoor KM,Heo GE,Kang KY,Song M

    update_date:2016-07-18 00:00:00

  • Applying a framework for assessing the health system challenges to scaling up mHealth in South Africa.

    abstract:BACKGROUND:Mobile phone technology has demonstrated the potential to improve health service delivery, but there is little guidance to inform decisions about acquiring and implementing mHealth technology at scale in health systems. Using the case of community-based health services (CBS) in South Africa, we apply a frame...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-12-123

    authors: Leon N,Schneider H,Daviaud E

    update_date:2012-11-05 00:00:00

  • Web-based interactive mapping from data dictionaries to ontologies, with an application to cancer registry.

    abstract:BACKGROUND:The Kentucky Cancer Registry (KCR) is a central cancer registry for the state of Kentucky that receives data about incident cancer cases from all healthcare facilities in the state within 6 months of diagnosis. Similar to all other U.S. and Canadian cancer registries, KCR uses a data dictionary provided by t...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-020-01288-7

    authors: Tao S,Zeng N,Hands I,Hurt-Mueller J,Durbin EB,Cui L,Zhang GQ

    update_date:2020-12-15 00:00:00

  • The caCORE Software Development Kit: streamlining construction of interoperable biomedical information services.

    abstract:BACKGROUND:Robust, programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources are challenging to construct. Such systems require the adoption of common information models, data representations and terminology standards as well as documented applicat...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-6-2

    authors: Phillips J,Chilukuri R,Fragoso G,Warzel D,Covitz PA

    update_date:2006-01-06 00:00:00

  • A cohort study of a tailored web intervention for preconception care.

    abstract:BACKGROUND:Preconception care may be an efficacious tool to reduce risk factors for adverse pregnancy outcomes that are associated with lifestyles and health status before pregnancy. We conducted a web-based cohort study in Italian women planning a pregnancy to assess whether a tailored web intervention may change know...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-14-33

    authors: Agricola E,Pandolfi E,Gonfiantini MV,Gesualdo F,Romano M,Carloni E,Mastroiacovo P,Tozzi AE

    update_date:2014-04-15 00:00:00

  • The challenges of emerging HISs in bridging the communication gaps among physicians and nurses in China: an interview study.

    abstract:BACKGROUND:To explore the current situation, existing problems and possible causes of said problems with regards to physician-nurse communication under an environment of increasingly widespread usage of Hospital Information Systems and to seek out new potential strategies in information technology to improve physician-...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-017-0473-x

    authors: Wen D,Zhang X,Wan J,Fu J,Lei J

    update_date:2017-06-12 00:00:00

  • Development of a validation algorithm for 'present on admission' flagging.

    abstract:BACKGROUND:The use of routine hospital data for understanding patterns of adverse outcomes has been limited in the past by the fact that pre-existing and post-admission conditions have been indistinguishable. The use of a 'Present on Admission' (or POA) indicator to distinguish pre-existing or co-morbid conditions from...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-9-48

    authors: Jackson TJ,Michel JL,Roberts R,Shepheard J,Cheng D,Rust J,Perry C

    update_date:2009-12-01 00:00:00

  • Association between borderline dysnatremia and mortality insight into a new data mining approach.

    abstract:BACKGROUND:Even small variations of serum sodium concentration may be associated with mortality. Our objective was to confirm the impact of borderline dysnatremia for patients admitted to hospital on in-hospital mortality using real life care data from our electronic health record (EHR) and a phenome-wide association a...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-017-0549-7

    authors: Girardeau Y,Jannot AS,Chatellier G,Saint-Jean O

    update_date:2017-11-22 00:00:00

  • Information discovery on electronic health records using authority flow techniques.

    abstract:BACKGROUND:As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieva...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-10-64

    authors: Hristidis V,Varadarajan RR,Biondich P,Weiner M

    update_date:2010-10-22 00:00:00

  • Designing a multifaceted survivorship care plan to meet the information and communication needs of breast cancer patients and their family physicians: results of a qualitative pilot study.

    abstract:BACKGROUND:Following the completion of treatment and as they enter the follow-up phase, breast cancer patients (BCPs) often recount feeling 'lost in transition', and are left with many questions concerning how their ongoing care and monitoring for recurrence will be managed. Family physicians (FPs) also frequently repo...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-13-76

    authors: Haq R,Heus L,Baker NA,Dastur D,Leung FH,Leung E,Li B,Vu K,Parsons JA

    update_date:2013-07-25 00:00:00

  • "Assessment of the social influence and facilitating conditions that support nurses' adoption of hospital electronic information management systems (HEIMS) in Ghana using the unified theory of acceptance and use of technology (UTAUT) model".

    abstract:BACKGROUND:Hospital electronic information management systems (HEIMS) are widely used in Ghana, and hence its performance must be carefully assessed. Nurses as clinical health personnel are the largest cluster of hospital staff and are the pillar of healthcare delivery. Therefore, they play a crucial role in the adopti...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-019-0956-z

    authors: Zhou LL,Owusu-Marfo J,Asante Antwi H,Antwi MO,Kachie ADT,Ampon-Wireko S

    update_date:2019-11-21 00:00:00

  • Initial development of Supportive care Assessment, Prioritization and Recommendations for Kids (SPARK), a symptom screening and management application.

    abstract:BACKGROUND:We developed Supportive care Prioritization, Assessment and Recommendations for Kids (SPARK), a web-based application designed to facilitate symptom screening by children receiving cancer treatments and access to supportive care clinical practice guidelines primarily by healthcare providers. The objective wa...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-018-0715-6

    authors: Cook S,Vettese E,Soman D,Hyslop S,Kuczynski S,Spiegler B,Davis H,Duong N,Ou Wai S,Golabek R,Golabek P,Antoszek-Rallo A,Schechter T,Lee Dupuis L,Sung L

    update_date:2019-01-10 00:00:00

  • Recommended practices for computerized clinical decision support and knowledge management in community settings: a qualitative study.

    abstract:BACKGROUND:The purpose of this study was to identify recommended practices for computerized clinical decision support (CDS) development and implementation and for knowledge management (KM) processes in ambulatory clinics and community hospitals using commercial or locally developed systems in the U.S. METHODS:Guided b...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-12-6

    authors: Ash JS,Sittig DF,Guappone KP,Dykstra RH,Richardson J,Wright A,Carpenter J,McMullen C,Shapiro M,Bunce A,Middleton B

    update_date:2012-02-14 00:00:00

  • BioSunMS: a plug-in-based software for the management of patients information and the analysis of peptide profiles from mass spectrometry.

    abstract:BACKGROUND:With wide applications of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS), statistical comparison of serum peptide profiles and management of patients information play ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-9-13

    authors: Cao Y,Wang N,Ying X,Li A,Wang H,Zhang X,Li W

    update_date:2009-02-17 00:00:00

  • Implementation of informatics for integrating biology and the bedside (i2b2) platform as Docker containers.

    abstract:BACKGROUND:Informatics for Integrating Biology and the Bedside (i2b2) is an open source clinical data analytics platform used at over 200 healthcare institutions for querying patient data. The i2b2 platform has several components with numerous dependencies and configuration parameters, which renders the task of install...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-018-0646-2

    authors: Wagholikar KB,Dessai P,Sanz J,Mendis ME,Bell DS,Murphy SN

    update_date:2018-07-16 00:00:00

  • Customization scenarios for de-identification of clinical notes.

    abstract:BACKGROUND:Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-020-1026-2

    authors: Hartman T,Howell MD,Dean J,Hoory S,Slyper R,Laish I,Gilon O,Vainstein D,Corrado G,Chou K,Po MJ,Williams J,Ellis S,Bee G,Hassidim A,Amira R,Beryozkin G,Szpektor I,Matias Y

    update_date:2020-01-30 00:00:00

  • Methods for identifying 30 chronic conditions: application to administrative data.

    abstract:BACKGROUND:Multimorbidity is common and associated with poor clinical outcomes and high health care costs. Administrative data are a promising tool for studying the epidemiology of multimorbidity. Our goal was to derive and apply a new scheme for using administrative data to identify the presence of chronic conditions ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-015-0155-5

    authors: Tonelli M,Wiebe N,Fortin M,Guthrie B,Hemmelgarn BR,James MT,Klarenbach SW,Lewanczuk R,Manns BJ,Ronksley P,Sargious P,Straus S,Quan H,Alberta Kidney Disease Network.

    update_date:2015-04-17 00:00:00

  • Usability evaluation and adaptation of the e-health Personal Patient Profile-Prostate decision aid for Spanish-speaking Latino men.

    abstract:BACKGROUND:The Personal Patient Profile-Prostate (P3P), a web-based decision aid, was demonstrated to reduce decisional conflict in English-speaking men with localized prostate cancer early after initial diagnosis. The purpose of this study was to explore and enhance usability and cultural appropriateness of a Spanish ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-015-0180-4

    authors: Berry DL,Halpenny B,Bosco JLF,Bruyere J Jr,Sanda MG

    update_date:2015-07-24 00:00:00

  • Increasing utilization of Internet-based resources following efforts to promote evidence-based medicine: a national study in Taiwan.

    abstract:BACKGROUND:Since the beginning of 2007, the National Health Research Institutes has been promoting the dissemination of evidence-based medicine (EBM). The current study examined longitudinal trends of behaviors in how hospital-based physicians and nurses have searched for medical information during the spread of EBM. ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-13-4

    authors: Weng YH,Kuo KN,Yang CY,Lo HL,Shih YH,Chen C,Chiu YW

    update_date:2013-01-07 00:00:00

  • Patient and provider acceptance of telecoaching in type 2 diabetes: a mixed-method study embedded in a randomised clinical trial.

    abstract:BACKGROUND:Despite advances in diagnosis and treatment of type 2 diabetes, suboptimal metabolic control persists. Patient education in diabetes has been proved to enhance self-efficacy and guideline-driven treatment, however many people with type 2 diabetes do not have access to or do not participate in self-management...

    journal_title:BMC medical informatics and decision making

    pub_type: Clinical Trial, Journal Article

    doi:10.1186/s12911-016-0383-3

    authors: Odnoletkova I,Buysse H,Nobels F,Goderis G,Aertgeerts B,Annemans L,Ramaekers D

    update_date:2016-11-09 00:00:00

  • Evaluation of syndromic algorithms for detecting patients with potentially transmissible infectious diseases based on computerised emergency-department data.

    abstract:BACKGROUND:The objective of this study was to ascertain the performance of syndromic algorithms for the early detection of patients in healthcare facilities who have potentially transmissible infectious diseases, using computerised emergency department (ED) data. METHODS:A retrospective cohort in an 810-bed University...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-13-101

    authors: Gerbier-Colomban S,Gicquel Q,Millet AL,Riou C,Grando J,Darmoni S,Potinet-Pagliaroli V,Metzger MH

    update_date:2013-09-03 00:00:00