Abstract:
:The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
journal_name
J Biomed Informjournal_title
Journal of biomedical informaticsauthors
O'Driscoll A,Belogrudov V,Carroll J,Kropp K,Walsh P,Ghazal P,Sleator RDdoi
10.1016/j.jbi.2015.01.008subject
Has Abstractpub_date
2015-04-01 00:00:00pages
58-64eissn
1532-0464issn
1532-0480pii
S1532-0464(15)00010-6journal_volume
54pub_type
杂志文章abstract:BACKGROUND:Data collection and extraction from noisy text sources such as social media typically rely on keyword-based searching/listening. However, health-related terms are often misspelled in such noisy text sources due to their complex morphology, resulting in the exclusion of relevant data for studies. In this pape...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.11.007
更新日期:2018-12-01 00:00:00
abstract::A graphical representation of DNA sequences in which the distribution of a particular base B=A,C,G,T is represented by a set of discrete lines has been formulated. The methodology of this approach has been borrowed from two areas of physics: spectroscopy and dynamics. Consequently, the set of discrete lines is referre...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.06.001
更新日期:2017-08-01 00:00:00
abstract::Outbreaks of infectious diseases such as influenza are a significant threat to human health. Because there are different strains of influenza which can cause independent outbreaks, and influenza can affect demographic groups at different rates and times, there is a need to recognize and characterize multiple outbreaks...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.08.003
更新日期:2017-09-01 00:00:00
abstract::Recruiting eligible patients for clinical trials is crucial for reliably answering specific questions about medical interventions and evaluation. However, clinical trial recruitment is a bottleneck in clinical research and drug development. Our goal is to provide an approach towards automating this manual and time-con...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103406
更新日期:2020-05-01 00:00:00
abstract::Chronic patients must carry out a rigorous control of diverse factors in their lives. Diet, sport activity, medical analysis or blood glucose levels are some of them. This is a hard task, because some of these controls are performed very often, for instance some diabetics measure their glucose levels several times eve...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.12.015
更新日期:2014-04-01 00:00:00
abstract::End-stage renal disease (ESRD) is the final stage of chronic kidney disease (CKD) and requires hemodialysis (HD) for survival. Intradialytic blood pressure (IBP) measurements are necessary to ensure patient safety during HD treatments and have critical clinical and prognostic significance. Studies on IBP measurements,...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.05.013
更新日期:2018-07-01 00:00:00
abstract::Epidemiological time series forecasting plays an important role in health public systems, due to its ability to allow managers to develop strategic planning to avoid possible epidemics. In this paper, a hybrid learning framework is developed to forecast multi-step-ahead (one, two, and three-month-ahead) meningitis cas...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103575
更新日期:2020-11-01 00:00:00
abstract::We describe a novel experiment that we conducted with the Drug Interaction Knowledge-base (DIKB) to determine which combinations of evidence enable a rule-based theory of metabolic drug-drug interactions to make the most optimal set of predictions. The focus of the experiment was a group of 16 drugs including six memb...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2009.05.010
更新日期:2009-12-01 00:00:00
abstract::The DDIExtraction Shared Task 2013 is the second edition of the DDIExtraction Shared Task series, a community-wide effort to promote the implementation and comparative assessment of natural language processing (NLP) techniques in the field of the pharmacovigilance domain, in particular, to address the extraction of dr...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2014.05.007
更新日期:2014-10-01 00:00:00
abstract::A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metada...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.06.017
更新日期:2017-08-01 00:00:00
abstract:OBJECTIVE:Evaluate the quality of clinical order practice patterns machine-learned from clinician cohorts stratified by patient mortality outcomes. MATERIALS AND METHODS:Inpatient electronic health records from 2010 to 2013 were extracted from a tertiary academic hospital. Clinicians (n = 1822) were stratified into lo...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.09.005
更新日期:2018-10-01 00:00:00
abstract:INTRODUCTION:Machine learning (ML) and natural language processing have great potential to improve information extraction (IE) within electronic medical records (EMRs) for a wide variety of clinical search and summarization tools. Despite ML advancements, clinical adoption of real time IE tools for patient care remains...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103354
更新日期:2020-02-01 00:00:00
abstract::One of the challenging problems in drug discovery is to identify the novel targets for drugs. Most of the traditional methods for drug targets optimization focused on identifying the particular families of "druggable targets", but ignored their topological properties based on the biological pathways. In this study, we...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.02.007
更新日期:2015-04-01 00:00:00
abstract::Nowadays, artificial intelligence plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. However, efficient online medical services are required to promote offline m...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103525
更新日期:2020-09-01 00:00:00
abstract:OBJECTIVES:Major adverse cardiac events (MACE) of acute coronary syndrome (ACS) often occur suddenly resulting in high mortality and morbidity. Recently, the rapid development of electronic medical records (EMR) provides the opportunity to utilize the potential of EMR to improve the performance of MACE prediction. In t...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.01.001
更新日期:2017-02-01 00:00:00
abstract::Named entity recognition is an extremely important and fundamental task of biomedical text mining. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc which often have complex structures, but it is challenging to identify and classify such entities. Machine learning methods like CRF, MEMM and ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2008.12.012
更新日期:2009-10-01 00:00:00
abstract::Community pharmacists are critically placed in the patient care chain being an extended frontline within primary healthcare networks across Europe. They are trained to ensure safe and effective medication use, a crucial and responsible role, extending beyond the common misconception limited to just providing timely ac...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103336
更新日期:2019-12-01 00:00:00
abstract::Patient discharge summaries provide detailed medical information about individuals who have been hospitalized. To make a precise and legitimate assessment of the abundant data, a proper time layout of the sequence of relevant events should be compiled and used to drive a patient-specific timeline, which could further ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.09.007
更新日期:2013-12-01 00:00:00
abstract:OBJECTIVE:To develop a method for profiling the collective populations targeted for recruitment by multiple clinical studies addressing the same medical condition using one eligibility feature each time. METHODS:Using a previously published database COMPACT as the backend, we designed a scalable method for visual aggr...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.01.005
更新日期:2015-04-01 00:00:00
abstract:OBJECTIVE:Our aim is to extract clinically-meaningful phenotypes from longitudinal electronic health records (EHRs) of medically-complex children. This is a fragile set of patients consuming a disproportionate amount of pediatric care resources but who often end up with sub-optimal clinical outcome. The rise in availab...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103125
更新日期:2019-05-01 00:00:00
abstract::We have developed EpDis and MassPred, extendable open source software tools that support bioinformatic research and enable parallel use of different methods for the prediction of T cell epitopes, disorder and disordered binding regions and hydropathy calculation. These tools offer a semi-automated installation of chos...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.01.016
更新日期:2016-04-01 00:00:00
abstract::Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin, and identify elements-e.g. genes, compounds, pathways-that lead to interesting correlations. Normally, they must refer to additional public da...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.02.004
更新日期:2016-04-01 00:00:00
abstract:BACKGROUND:The difficulty of managing patient recruitment and documentation for clinical trials prompts a demand for instruments for closely monitoring these critical but unpredictable processes. Increasingly adopted Electronic Data Capture (EDC) applications provide novel opportunities to reutilize stored information ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.10.010
更新日期:2016-12-01 00:00:00
abstract::Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with great...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,meta分析
doi:10.1016/j.jbi.2008.01.011
更新日期:2008-12-01 00:00:00
abstract::Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal reasons lead to the need of privacy preserving integration concepts. In this article, we focus on anonymization, which plays an important role for ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.12.002
更新日期:2014-08-01 00:00:00
abstract::Real-time Obstructive Sleep Apnea (OSA) episode detection and monitoring are important for society in terms of an improvement in the health of the general population and of a reduction in mortality and healthcare costs. Currently, to diagnose OSA patients undergo PolySomnoGraphy (PSG), a complicated and invasive test ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2014.02.015
更新日期:2014-06-01 00:00:00
abstract::Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biome...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.10.001
更新日期:2014-02-01 00:00:00
abstract::Probabilistic record linkage is a method commonly used to determine whether demographic records refer to the same person. The Fellegi-Sunter method is a probabilistic approach that uses field weights based on log likelihood ratios to determine record similarity. This paper introduces an extension of the Fellegi-Sunter...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2009.08.004
更新日期:2010-02-01 00:00:00
abstract::We introduce a distance (similarity)-based mapping for the visualization of high-dimensional patterns and their relative relationships. The mapping preserves exactly the original distances between points with respect to any two reference patterns in a special two-dimensional coordinate system, the relative distance pl...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2004.07.005
更新日期:2004-10-01 00:00:00
abstract::Chronic Obstructive Pulmonary Disease (COPD) is the fourth leading cause of death worldwide and represents one of the major causes of chronic morbidity. Cigarette smoking is the most important risk factor for COPD. In these patients, the airflow limitation is caused by a mixture of small airways disease and parenchyma...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2009.05.008
更新日期:2009-12-01 00:00:00