HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

Abstract:

The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data", the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprising distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
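The abstract does not spell out how "virtual partitioning" works, so the following is only a minimal sketch of the general "divide and conquer" idea it mentions: splitting both the database and the query set into index ranges and pairing them into independent work units, without physically cutting any file. The function names `index_ranges` and `work_units` are hypothetical, and this is not HBlast's actual implementation.

```python
from itertools import product
from typing import List, Tuple

Range = Tuple[int, int]  # half-open (start, stop) index range


def index_ranges(n_items: int, n_parts: int) -> List[Range]:
    """Split range(n_items) into n_parts contiguous, near-equal ranges."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, extra = divmod(n_items, n_parts)
    ranges = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def work_units(n_db_seqs: int, n_queries: int,
               db_parts: int, query_parts: int) -> List[Tuple[Range, Range]]:
    """Pair every database range with every query range.

    Each pair is an independent task (assumption: one map task per pair);
    the ranges are logical only, so no data is copied or re-split here.
    """
    db_ranges = index_ranges(n_db_seqs, db_parts)
    query_ranges = index_ranges(n_queries, query_parts)
    return list(product(db_ranges, query_ranges))


if __name__ == "__main__":
    for db_range, query_range in work_units(10, 4, db_parts=3, query_parts=2):
        print("database", db_range, "x queries", query_range)
```

In a sketch like this, the number of database and query parts can be tuned separately, which is one way to trade per-task memory use against the total number of tasks.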

journal_name

J Biomed Inform

authors

O'Driscoll A,Belogrudov V,Carroll J,Kropp K,Walsh P,Ghazal P,Sleator RD

doi

10.1016/j.jbi.2015.01.008

subject

Has Abstract

pub_date

2015-04-01 00:00:00

pages

58-64

eissn

1532-0480

issn

1532-0464

pii

S1532-0464(15)00010-6

journal_volume

54

pub_type

Journal Article
  • Consensus and Meta-analysis regulatory networks for combining multiple microarray gene expression datasets.

    abstract::Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with great...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Meta-Analysis

    doi:10.1016/j.jbi.2008.01.011

    authors: Steele E,Tucker A

    update_date:2008-12-01 00:00:00

  • Cognitive simulators for medical education and training.

    abstract::Simulators for honing procedural skills (such as surgical skills and central venous catheter placement) have proven to be valuable tools for medical educators and students. While such simulations represent an effective paradigm in surgical education, there is an opportunity to add a layer of cognitive exercises to the...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2009.02.008

    authors: Kahol K,Vankipuram M,Smith ML

    update_date:2009-08-01 00:00:00

  • Understanding infusion administration in the ICU through Distributed Cognition.

    abstract::To understand how healthcare technologies are used in practice and evaluate them, researchers have argued for adopting the theoretical framework of Distributed Cognition (DC). This paper describes the methods and results of a study in which a DC methodology, Distributed Cognition for Teamwork (DiCoT), was applied to s...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2012.02.003

    authors: Rajkomar A,Blandford A

    update_date:2012-06-01 00:00:00

  • The development and validation of a simulation tool for health policy decision making.

    abstract::Computer simulations have been used to model infectious diseases to examine the outcomes of alternative strategies for managing their spread. Methicillin resistant Staphylococcus aureus (MRSA) skin and soft tissue infections have become prominent in many communities and efforts are underway to reduce the spread of thi...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2010.03.013

    authors: Panchanathan SS,Petitti DB,Fridsma DB

    update_date:2010-08-01 00:00:00

  • A comparison of machine learning methods for the diagnosis of pigmented skin lesions.

    abstract::We analyze the discriminatory power of k-nearest neighbors, logistic regression, artificial neural networks (ANNs), decision trees, and support vector machines (SVMs) on the task of classifying pigmented skin lesions as common nevi, dysplastic nevi, or melanoma. Three different classification tasks were used as benchm...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1006/jbin.2001.1004

    authors: Dreiseitl S,Ohno-Machado L,Kittler H,Vinterbo S,Billhardt H,Binder M

    update_date:2001-02-01 00:00:00

  • A comparison of two methods for retrieving ICD-9-CM data: the effect of using an ontology-based method for handling terminology changes.

    abstract:OBJECTIVE:Most existing controlled terminologies can be characterized as collections of terms, wherein the terms are arranged in a simple list or organized in a hierarchy. These kinds of terminologies are considered useful for standardizing terms and encoding data and are currently used in many existing information sys...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.01.005

    authors: Yu AC,Cimino JJ

    update_date:2011-04-01 00:00:00

  • Balancing volume and duration of information consumption by physicians: The case of health information exchange in critical care.

    abstract:BACKGROUND:The realization of the potential benefits of health information exchange systems (HIEs) for emergency departments (EDs) depends on the way these systems are actually used. The attributes of volume of information and duration of information processing are important for the study of HIE use patterns in the ED,...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.05.007

    authors: Politi L,Codish S,Sagy I,Fink L

    update_date:2017-07-01 00:00:00

  • Integrating cancer diagnosis terminologies based on logical definitions of SNOMED CT concepts.

    abstract::In oncology, the reuse of data is confronted with the heterogeneity of terminologies. It is necessary to semantically integrate these distinct terminologies. The semantic integration by using a third terminology as a support is a conventional approach for the integration of two terminologies that are not very structur...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.08.013

    authors: Nikiema JN,Jouhet V,Mougin F

    update_date:2017-10-01 00:00:00

  • Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients.

    abstract:BACKGROUND:Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.02.016

    authors: Chen J,Yu H

    update_date:2017-04-01 00:00:00

  • Toward analyzing and synthesizing previous research in early prediction of cardiac arrest using machine learning based on a multi-layered integrative framework.

    abstract:BACKGROUND:One of the significant problems in the field of healthcare is the low survival rate of people who have experienced sudden cardiac arrest. Early prediction of cardiac arrest can provide the time required for intervening and preventing its onset in order to reduce mortality. Traditional statistical methods hav...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.10.008

    authors: Layeghian Javan S,Sepehri MM,Aghajani H

    update_date:2018-12-01 00:00:00

  • Characterizing and optimizing human anticancer drug targets based on topological properties in the context of biological pathways.

    abstract::One of the challenging problems in drug discovery is to identify the novel targets for drugs. Most of the traditional methods for drug targets optimization focused on identifying the particular families of "druggable targets", but ignored their topological properties based on the biological pathways. In this study, we...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.02.007

    authors: Zhang J,Wang Y,Shang D,Yu F,Liu W,Zhang Y,Feng C,Wang Q,Xu Y,Liu Y,Bai X,Li X,Li C

    update_date:2015-04-01 00:00:00

  • Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.

    abstract::Neural embeddings are a popular set of methods for representing words, phrases or text as a low dimensional vector (typically 50-500 dimensions). However, it is difficult to interpret these dimensions in a meaningful manner, and creating neural embeddings requires extensive training and tuning of multiple parameters a...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103096

    authors: Smalheiser NR,Cohen AM,Bonifield G

    update_date:2019-02-01 00:00:00

  • Neural network-based approaches for biomedical relation classification: A review.

    abstract::The explosive growth of biomedical literature has created a rich source of knowledge, such as that on protein-protein interactions (PPIs) and drug-drug interactions (DDIs), locked in unstructured free text. Biomedical relation classification aims to automatically detect and classify biomedical relations, which has gre...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2019.103294

    authors: Zhang Y,Lin H,Yang Z,Wang J,Sun Y,Xu B,Zhao Z

    update_date:2019-11-01 00:00:00

  • Role of OpenEHR as an open source solution for the regional modelling of patient data in obstetrics.

    abstract::This work investigates whether openEHR with its reference model, archetypes and templates is suitable for the digital representation of demographic as well as clinical data. Moreover, it elaborates openEHR as a tool for modelling Hospital Information Systems on a regional level based on a national logical infrastruct...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.04.004

    authors: Pahl C,Zare M,Nilashi M,de Faria Borges MA,Weingaertner D,Detschew V,Supriyanto E,Ibrahim O

    update_date:2015-06-01 00:00:00

  • A survey on single and multi omics data mining methods in cancer data classification.

    abstract::Data analytics is routinely used to support biomedical research in all areas, with particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies have been performed...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2020.103466

    authors: Momeni Z,Hassanzadeh E,Saniee Abadeh M,Bellazzi R

    update_date:2020-07-01 00:00:00

  • A machine-learned knowledge discovery method for associating complex phenotypes with complex genotypes. Application to pain.

    abstract:BACKGROUND:The association of genotyping information with common traits is not satisfactorily solved. One of the most complex traits is pain and association studies have failed so far to provide reproducible predictions of pain phenotypes from genotypes in the general population despite a well-established genetic basis...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.07.010

    authors: Lötsch J,Ultsch A

    update_date:2013-10-01 00:00:00

  • A controlled greedy supervised approach for co-reference resolution on clinical text.

    abstract::Identification of co-referent entity mentions inside text has significant importance for other natural language processing (NLP) tasks (e.g. event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of the confusion over different evaluation metrics and partly beca...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.03.007

    authors: Chowdhury MF,Zweigenbaum P

    update_date:2013-06-01 00:00:00

  • Benchmarking relief-based feature selection methods for bioinformatics data mining.

    abstract::Modern biomedical data mining requires feature selection methods that can (1) be applied to large scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. g...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.07.015

    authors: Urbanowicz RJ,Olson RS,Schmitt P,Meeker M,Moore JH

    update_date:2018-09-01 00:00:00

  • Modeling association detection in order to discover compounds to inhibit oral cancer.

    abstract::In the past, algorithms exploiting varying semantics in interactions between biological objects such as genes and diseases have been used in bioinformatics to uncover latent relationships within biological datasets. In this paper, we consider the algorithm Medusa in parallel with binary classification in order to find...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.07.005

    authors: Vittal S,Karthikeyan G

    update_date:2018-08-01 00:00:00

  • Development of a clinician reputation metric to identify appropriate problem-medication pairs in a crowdsourced knowledge base.

    abstract:BACKGROUND:Correlation of data within electronic health records is necessary for implementation of various clinical decision support functions, including patient summarization. A key type of correlation is linking medications to clinical problems; while some databases of problem-medication links are available, they are...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.11.010

    authors: McCoy AB,Wright A,Rogith D,Fathiamini S,Ottenbacher AJ,Sittig DF

    update_date:2014-04-01 00:00:00

  • Information extraction from biomedical text.

    abstract::Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. It requires deeper analysis than key word searches, but its aims fall short of the very hard and long-term problem of full text understanding. Information extraction r...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/s1532-0464(03)00015-7

    authors: Hobbs JR

    update_date:2002-08-01 00:00:00

  • PharmActa: Personalized pharmaceutical care eHealth platform for patients and pharmacists.

    abstract::Community pharmacists are critically placed in the patient care chain being an extended frontline within primary healthcare networks across Europe. They are trained to ensure safe and effective medication use, a crucial and responsible role, extending beyond the common misconception limited to just providing timely ac...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103336

    authors: Spanakis M,Sfakianakis S,Kallergis G,Spanakis EG,Sakkalis V

    update_date:2019-12-01 00:00:00

  • Colorado Care Tablet: the design of an interoperable Personal Health Application to help older adults with multimorbidity manage their medications.

    abstract::Medication errors are common and cause serious health issues during care transitions, particularly for older adults with multiple chronic conditions. In this paper, we discuss the design and evaluation of the Colorado Care Tablet, a Personal Health Application (PHA) that helps older adults and their lay caregivers man...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2010.05.007

    authors: Siek KA,Ross SE,Khan DU,Haverhals LM,Cali SR,Meyers J

    update_date:2010-10-01 00:00:00

  • Reflective Random Indexing and indirect inference: a scalable method for discovery of implicit connections.

    abstract::The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously a...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2009.09.003

    authors: Cohen T,Schvaneveldt R,Widdows D

    update_date:2010-04-01 00:00:00

  • The Analytic Information Warehouse (AIW): a platform for analytics using electronic health record data.

    abstract:OBJECTIVE:To create an analytics platform for specifying and detecting clinical phenotypes and other derived variables in electronic health record (EHR) data for quality improvement investigations. MATERIALS AND METHODS:We have developed an architecture for an Analytic Information Warehouse (AIW). It supports transfor...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.01.005

    authors: Post AR,Kurc T,Cholleti S,Gao J,Lin X,Bornstein W,Cantrell D,Levine D,Hohmann S,Saltz JH

    update_date:2013-06-01 00:00:00

  • Research-IQ: development and evaluation of an ontology-anchored integrative query tool.

    abstract::Investigators in the translational research and systems medicine domains require highly usable, efficient and integrative tools and methods that allow for the navigation of and reasoning over emerging large-scale data sets. Such resources must cover a spectrum of granularity from bio-molecules to population phenotypes...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.07.006

    authors: Borlawsky TB,Lele O,Payne PR

    update_date:2011-12-01 00:00:00

  • Algorithms for rapid outbreak detection: a research synthesis.

    abstract::The threat of bioterrorism has stimulated interest in enhancing public health surveillance to detect disease outbreaks more rapidly than is currently possible. To advance research on improving the timeliness of outbreak detection, the Defense Advanced Research Project Agency sponsored the Bio-event Advanced Leading In...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2004.11.007

    authors: Buckeridge DL,Burkom H,Campbell M,Hogan WR,Moore AW

    update_date:2005-04-01 00:00:00

  • Selecting significant genes by randomization test for cancer classification using gene expression data.

    abstract::Gene selection is an important task in bioinformatics studies, because the accuracy of cancer classification generally depends upon the genes that have biological relevance to the classifying problems. In this work, randomization test (RT) is used as a gene selection method for dealing with gene expression data. In th...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.03.009

    authors: Mao Z,Cai W,Shao X

    update_date:2013-08-01 00:00:00

  • Lexical patterns, features and knowledge resources for coreference resolution in clinical notes.

    abstract::Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approa...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2012.02.012

    authors: Gooch P,Roudsari A

    update_date:2012-10-01 00:00:00

  • Computer mediated reality technologies: A conceptual framework and survey of the state of the art in healthcare intervention systems.

    abstract:INTRODUCTION:The trend of an ageing and growing world population, particularly in developed countries, is expected to continue for decades to come causing an increase in demand for healthcare resources and services. Consequently, demand is growing faster than rises in funding. The UK government, in partnership with the...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2019.103102

    authors: Ibrahim Z,Money AG

    update_date:2019-02-01 00:00:00