Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method.

Abstract:

BACKGROUND: Data mining can automate the analysis of the substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting patterns.

METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying it as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance.

RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated the statistical significance of the scores.

CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of knowledge, thus reducing dependence on knowledge elicited from human experts, which is usually a rate-limiting step.
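The comparison described under METHODS can be illustrated with a minimal sketch. The abstract does not give the association measure or the surprise-score formula, so everything below is an assumption made for illustration: association strength is taken to be a log odds ratio from a 2x2 co-occurrence table, the surprise score is the absolute z statistic for the difference between the database and knowledgebase log odds ratios, and the p value comes from a two-sided normal approximation. The function names `log_odds_ratio` and `surprise` and the example counts are hypothetical, and the authors' own implementation was written in Perl and R.

```python
import math


def log_odds_ratio(both, only_a, only_b, neither):
    """Return (log odds ratio, standard error) for a 2x2 co-occurrence table.

    Assumption: association strength is measured as a log odds ratio.
    Adding 0.5 to every cell (Haldane-Anscombe correction) avoids
    division by zero when a cell count is 0.
    """
    a, b, c, d = (x + 0.5 for x in (both, only_a, only_b, neither))
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


def surprise(db_table, kb_table):
    """Return (surprise score, p value) for one pattern.

    Assumption: the surprise score is |z| for the difference between the
    database and knowledgebase log odds ratios, treating the two sources
    as independent. The p value is two-sided under a normal approximation.
    """
    lor_db, se_db = log_odds_ratio(*db_table)
    lor_kb, se_kb = log_odds_ratio(*kb_table)
    z = (lor_db - lor_kb) / math.sqrt(se_db ** 2 + se_kb ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return abs(z), p_value


if __name__ == "__main__":
    # Hypothetical counts: (both, disease only, lab result only, neither).
    patients = (120, 80, 60, 740)     # strong association in the database
    citations = (5, 495, 395, 9105)   # weak association in the literature
    score, p = surprise(patients, citations)
    print(f"surprise score = {score:.2f}, p = {p:.3g}")
```

Under these assumptions, a pattern whose strength agrees in both sources gets a score near zero and is pruned, while a pattern that is strong in one source and weak in the other gets a high score and a small p value. When scoring many patterns at once, a multiple-testing correction would also be needed.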

authors

Siadaty MS,Knaus WA

doi

10.1186/1472-6947-6-13

pub_date

2006-03-07 00:00:00

pages

13

issn

1472-6947

pii

1472-6947-6-13

journal_volume

6

pub_type

Journal Article
  • Evaluation of syndromic algorithms for detecting patients with potentially transmissible infectious diseases based on computerised emergency-department data.

    abstract:BACKGROUND:The objective of this study was to ascertain the performance of syndromic algorithms for the early detection of patients in healthcare facilities who have potentially transmissible infectious diseases, using computerised emergency department (ED) data. METHODS:A retrospective cohort in an 810-bed University...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-13-101

    authors: Gerbier-Colomban S,Gicquel Q,Millet AL,Riou C,Grando J,Darmoni S,Potinet-Pagliaroli V,Metzger MH

    update_date:2013-09-03 00:00:00

  • Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition.

    abstract:BACKGROUND:This paper presents a conditional random fields (CRF) method that enables the capture of specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textual descriptions in cl...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-019-0865-1

    authors: Lee W,Choi J

    update_date:2019-07-15 00:00:00

  • Concordance between decision analysis and matching systematic review of randomized controlled trials in assessment of treatment comparisons: a systematic review.

    abstract:BACKGROUND:Systematic review (SR) of randomized controlled trials (RCT) is the gold standard for informing treatment choice. Decision analyses (DA) also play an important role in informing health care decisions. It is unknown how often the results of DA and matching SR of RCTs are in concordance. We assessed whether th...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article, Review

    doi:10.1186/1472-6947-14-57

    authors: Mhaskar RS,Wao H,Mahony H,Kumar A,Djulbegovic B

    update_date:2014-07-15 00:00:00

  • Optimum binary cut-off threshold of a diagnostic test: comparison of different methods using Monte Carlo technique.

    abstract:BACKGROUND:Using Monte Carlo simulations, we compare different methods (maximizing Youden index, maximizing mutual information, and logistic regression) for their ability to determine optimum binary cut-off thresholds for a ratio-scaled diagnostic test variable. Special attention is given to the stability and precision...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-014-0099-1

    authors: Reibnegger G,Schrabmair W

    update_date:2014-11-25 00:00:00

  • Assessing data availability and quality within an electronic health record system through external validation against an external clinical data source.

    abstract:BACKGROUND:Approximately 20% of deaths in the US each year are attributable to smoking, yet current practices in the recording of this health risk in electronic health records (EHRs) have not led to discernable changes in health outcomes. Several groups have developed algorithms for extracting smoking behaviors from cl...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-019-0864-2

    authors: Palmer EL,Higgins J,Hassanpour S,Sargent J,Robinson CM,Doherty JA,Onega T

    update_date:2019-07-25 00:00:00

  • Socioeconomic and behavioural factors associated with access to and use of Personal Health Records.

    abstract:BACKGROUND:Access to and use of digital technology are more common among people of more advantaged socioeconomic status. These differences might be due to lack of interest, not having physical access or having lower intentions to use this technology. By integrating the digital divide approach and the User Acceptance of...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-020-01383-9

    authors: Paccoud I,Baumann M,Le Bihan E,Pétré B,Breinbauer M,Böhme P,Chauvel L,Leist AK

    update_date:2021-01-13 00:00:00

  • A usability design checklist for Mobile electronic data capturing forms: the validation process.

    abstract:BACKGROUND:New Specific Application Domain (SAD) heuristics or design principles are being developed to guide the design and evaluation of mobile applications in a bid to improve on the usability of these applications. This is because the existing heuristics are rather generic and are often unable to reveal a large num...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-018-0718-3

    authors: Mugisha A,Nankabirwa V,Tylleskär T,Babic A

    update_date:2019-01-09 00:00:00

  • Temporal aggregation impacts on epidemiological simulations employing microcontact data.

    abstract:BACKGROUND:Microcontact datasets gathered automatically by electronic devices have the potential to augment the study of the spread of contagious disease by providing detailed representations of the study population's contact dynamics. However, the impact of data collection experimental design on the subsequent simulation...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-12-132

    authors: Hashemian M,Qian W,Stanley KG,Osgood ND

    update_date:2012-11-15 00:00:00

  • The information imperative: to study the impact of informational discontinuity on clinical decision making among doctors.

    abstract:BACKGROUND:Informational discontinuity can have far reaching consequences like medical errors, increased re-hospitalization rates and adverse events among others. Thus the holy grail of seamless informational continuity in healthcare has been an enigma with some nations going the digital way. Digitization in healthcare...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-020-01190-2

    authors: Gowda NR,Kumar A,Arya SK,H V

    update_date:2020-07-28 00:00:00

  • Assessing measures of comorbidity and functional status for risk adjustment to compare hospital performance for colorectal cancer surgery: a retrospective data-linkage study.

    abstract:BACKGROUND:Comparing outcomes between hospitals requires consideration of patient factors that could account for any observed differences. Adjusting for comorbid conditions is common when studying outcomes following cancer surgery, and a commonly used measure is the Charlson comorbidity index. Other measures of patient...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-015-0175-1

    authors: Dobbins TA,Badgery-Parker T,Currow DC,Young JM

    update_date:2015-07-15 00:00:00

  • An algorithm to identify patients with treated type 2 diabetes using medico-administrative data.

    abstract:BACKGROUND:National authorities have to follow the evolution of diabetes to implement public health policies. An algorithm was developed to identify patients with treated type 2 diabetes and estimate its annual prevalence in Luxembourg using health insurance claims when no diagnosis code is available. METHODS:The DIAB...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-11-23

    authors: Renard LM,Bocquet V,Vidal-Trecan G,Lair ML,Couffignal S,Blum-Boisgard C

    update_date:2011-04-14 00:00:00

  • Health care professionals' attitudes towards evidence-based medicine in the workers' compensation setting: a cohort study.

    abstract:BACKGROUND:Problems may arise during the approval process of treatment after a compensable work injury, which include excess paperwork, delays in approving services, disputes, and allegations of over-servicing. This is perceived as undesirable for injured people, health care professionals and claims managers, and costl...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-017-0460-2

    authors: Elbers NA,Chase R,Craig A,Guy L,Harris IA,Middleton JW,Nicholas MK,Rebbeck T,Walsh J,Willcock S,Lockwood K,Cameron ID

    update_date:2017-05-22 00:00:00

  • Stratification of coronary artery disease patients for revascularization procedure based on estimating adverse effects.

    abstract:BACKGROUND:Percutaneous coronary intervention (PCI) is the most commonly performed treatment for coronary atherosclerosis. It is associated with a higher incidence of repeat revascularization procedures compared to coronary artery bypass grafting surgery. Recent results indicate that PCI is only cost-effective for a su...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-015-0131-0

    authors: Pölsterl S,Singh M,Katouzian A,Navab N,Kastrati A,Ladic L,Kamen A

    update_date:2015-02-14 00:00:00

  • Designing a multifaceted survivorship care plan to meet the information and communication needs of breast cancer patients and their family physicians: results of a qualitative pilot study.

    abstract:BACKGROUND:Following the completion of treatment and as they enter the follow-up phase, breast cancer patients (BCPs) often recount feeling 'lost in transition', and are left with many questions concerning how their ongoing care and monitoring for recurrence will be managed. Family physicians (FPs) also frequently repo...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-13-76

    authors: Haq R,Heus L,Baker NA,Dastur D,Leung FH,Leung E,Li B,Vu K,Parsons JA

    update_date:2013-07-25 00:00:00

  • Automatic schizophrenic discrimination on fNIRS by using complex brain network analysis and SVM.

    abstract:BACKGROUND:Schizophrenia is a kind of serious mental illness. Due to the lack of an objective physiological data supporting and a unified data analysis method, doctors can only rely on the subjective experience of the data to distinguish normal people and patients, which easily lead to misdiagnosis. In recent years, fu...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-017-0559-5

    authors: Song H,Chen L,Gao R,Bogdan IIM,Yang J,Wang S,Dong W,Quan W,Dang W,Yu X

    update_date:2017-12-20 00:00:00

  • A community assessment of privacy preserving techniques for human genomes.

    abstract:To answer the need for the rigorous protection of biomedical data, we organized the Critical Assessment of Data Privacy and Protection initiative as a community effort to evaluate privacy-preserving dissemination techniques for biomedical data. We focused on the challenge of sharing aggregate human genomic data (e.g.,...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-14-S1-S1

    authors: Jiang X,Zhao Y,Wang X,Malin B,Wang S,Ohno-Machado L,Tang H

    update_date:2014-01-01 00:00:00

  • Customization scenarios for de-identification of clinical notes.

    abstract:BACKGROUND:Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-020-1026-2

    authors: Hartman T,Howell MD,Dean J,Hoory S,Slyper R,Laish I,Gilon O,Vainstein D,Corrado G,Chou K,Po MJ,Williams J,Ellis S,Bee G,Hassidim A,Amira R,Beryozkin G,Szpektor I,Matias Y

    update_date:2020-01-30 00:00:00

  • Use of online knowledge base in primary health care and correlation to health care quality: an observational study.

    abstract:BACKGROUND:Evidence-based information available at the point of care improves patient care outcomes. Online knowledge bases can increase the application of evidence-based medicine and influence patient outcome data which may be captured in quality registries. The aim of this study was to explore the effect of use of an...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-020-01313-9

    authors: Gerdesköld C,Toth-Pal E,Wårdh I,Nilsson GH,Nager A

    update_date:2020-11-16 00:00:00

  • Development and pilot feasibility study of a health information technology tool to calculate mortality risk for patients with asymptomatic carotid stenosis: the Carotid Risk Assessment Tool (CARAT).

    abstract:BACKGROUND:Patients with no history of stroke but with stenosis of the carotid arteries can reduce the risk of future stroke with surgery or stenting. At present, a physician's ability to recommend optimal treatments based on an individual's risk profile requires estimating the likelihood that a patient will have a poo...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-015-0141-y

    authors: Faerber AE,Horvath R,Stillman C,O'Connell ML,Hamilton AL,Newhall KA,Likosky DS,Goodney PP

    update_date:2015-03-24 00:00:00

  • Are cancer-related decision aids appropriate for socially disadvantaged patients? A systematic review of US randomized controlled trials.

    abstract:BACKGROUND:Shared decision-making (SDM) is considered a key component of high quality cancer care and may be supported by patient decision aids (PtDAs). Many patients, however, face multiple social disadvantages that may influence their ability to fully participate in SDM or to use PtDAs; additionally, these social dis...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-016-0303-6

    authors: Enard KR,Dolan Mullen P,Kamath GR,Dixon NM,Volk RJ

    update_date:2016-06-06 00:00:00

  • Atrial fibrillation classification based on convolutional neural networks.

    abstract:BACKGROUND:The global age-adjusted mortality rate related to atrial fibrillation (AF) registered a rapid growth in the last four decades, i.e., from 0.8 to 1.6 and 0.9 to 1.7 per 100,000 for men and women during 1990-2010, respectively. In this context, this study uses convolutional neural networks for classifying (dia...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-019-0946-1

    authors: Lee KS,Jung S,Gil Y,Son HS

    update_date:2019-10-29 00:00:00

  • Patient and provider acceptance of telecoaching in type 2 diabetes: a mixed-method study embedded in a randomised clinical trial.

    abstract:BACKGROUND:Despite advances in diagnosis and treatment of type 2 diabetes, suboptimal metabolic control persists. Patient education in diabetes has been proved to enhance self-efficacy and guideline-driven treatment, however many people with type 2 diabetes do not have access to or do not participate in self-management...

    journal_title:BMC medical informatics and decision making

    pub_type: Clinical Trial, Journal Article

    doi:10.1186/s12911-016-0383-3

    authors: Odnoletkova I,Buysse H,Nobels F,Goderis G,Aertgeerts B,Annemans L,Ramaekers D

    update_date:2016-11-09 00:00:00

  • Usability evaluation and adaptation of the e-health Personal Patient Profile-Prostate decision aid for Spanish-speaking Latino men.

    abstract:BACKGROUND:The Personal Patient Profile-Prostate (P3P), a web-based decision aid, was demonstrated to reduce decisional conflict in English-speaking men with localized prostate cancer early after initial diagnosis. The purpose of this study was to explore and enhance usability and cultural appropriateness of a Spanish ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-015-0180-4

    authors: Berry DL,Halpenny B,Bosco JLF,Bruyere J Jr,Sanda MG

    update_date:2015-07-24 00:00:00

  • A practical approach for incorporating dependence among fields in probabilistic record linkage.

    abstract:BACKGROUND:Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. When this assumption is ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-13-97

    authors: Daggy JK,Xu H,Hui SL,Gamache RE,Grannis SJ

    update_date:2013-08-30 00:00:00

  • Design and evaluation of a mobile application to assist the self-monitoring of the chronic kidney disease in developing countries.

    abstract:BACKGROUND:The chronic kidney disease (CKD) is a worldwide critical problem, especially in developing countries. CKD patients usually begin their treatment in advanced stages, which requires dialysis and kidney transplantation, and consequently, affects mortality rates. This issue is faced by a mobile health (mHealth) ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-018-0587-9

    authors: Sobrinho A,da Silva LD,Perkusich A,Pinheiro ME,Cunha P

    update_date:2018-01-12 00:00:00

  • Diagnostic support for selected neuromuscular diseases using answer-pattern recognition and data mining techniques: a proof of concept multicenter prospective trial.

    abstract:BACKGROUND:Diagnosis of neuromuscular diseases in primary care is often challenging. Rare diseases such as Pompe disease are easily overlooked by the general practitioner. We therefore aimed to develop a diagnostic support tool using patient-oriented questions and combined data mining algorithms recognizing answer patt...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article, Multicenter Study

    doi:10.1186/s12911-016-0268-5

    authors: Grigull L,Lechner W,Petri S,Kollewe K,Dengler R,Mehmecke S,Schumacher U,Lücke T,Schneider-Gold C,Köhler C,Güttsches AK,Kortum X,Klawonn F

    update_date:2016-03-08 00:00:00

  • SciReader enables reading of medical content with instantaneous definitions.

    abstract:BACKGROUND:A major problem patients encounter when reading about health related issues is document interpretation, which limits reading comprehension and therefore negatively impacts health care. Currently, searching for medical definitions from an external source is time consuming, distracting, and negatively impacts ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-11-4

    authors: Gradie PR,Litster M,Thomas R,Vyas J,Schiller MR

    update_date:2011-01-25 00:00:00

  • Models predicting the growth response to growth hormone treatment in short children independent of GH status, birth size and gestational age.

    abstract:BACKGROUND:Mathematical models can be used to predict individual growth responses to growth hormone (GH) therapy. The aim of this study was to construct and validate high-precision models to predict the growth response to GH treatment of short children, independent of their GH status, birth size and gestational age. As...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-7-40

    authors: Dahlgren J,Kriström B,Niklasson A,Nierop AF,Rosberg S,Albertsson-Wikland K

    update_date:2007-12-12 00:00:00

  • Using machine learning models to improve stroke risk level classification methods of China national stroke screening.

    abstract:BACKGROUND:With the character of high incidence, high prevalence and high mortality, stroke has brought a heavy burden to families and society in China. In 2009, the Ministry of Health of China launched the China national stroke screening and intervention program, which screens stroke and its risk factors and conducts ...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/s12911-019-0998-2

    authors: Li X,Bian D,Yu J,Li M,Zhao D

    update_date:2019-12-10 00:00:00

  • Brain mapping and detection of functional patterns in fMRI using wavelet transform; application in detection of dyslexia.

    abstract:BACKGROUND:Functional Magnetic Resonance Imaging (fMRI) has been proven to be useful for studying brain functions. However, due to the existence of noise and distortion, mapping between the fMRI signal and the actual neural activity is difficult. Because of the difficulty, differential pattern analysis of fMRI brain im...

    journal_title:BMC medical informatics and decision making

    pub_type: Journal Article

    doi:10.1186/1472-6947-9-S1-S6

    authors: Ji SY,Ward K,Najarian K

    update_date:2009-11-03 00:00:00