Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data.

Abstract:

BACKGROUND: Genome imputation, admixture resolution, and genome-wide association analyses are time-consuming and computationally intensive processes with many composite and requisite steps. Building and installing the programs these analyses require adds further to the analysis time. Scientists who are less versed in programming but want to perform these operations hands-on face a lengthy learning curve before they can use the vast number of programs available.

RESULTS: The Odyssey pipeline was developed to streamline the entire process into easy-to-use steps for scientists working with big data. Odyssey is a simplified, efficient, semi-automated genome-wide imputation and analysis pipeline that prepares raw genetic data and performs pre-imputation quality control, phasing, imputation, post-imputation quality control, population stratification analysis, and genome-wide association with statistical data analysis, including result visualization. It integrates programs such as PLINK, SHAPEIT, Eagle, IMPUTE, Minimac, and several R packages into a seamless, modular workflow controlled via a single user-friendly configuration file. Odyssey was built with compatibility in mind and therefore uses the Singularity container solution, which can be run on Linux, macOS, and Windows platforms. It also scales easily from a simple desktop to a High-Performance System (HPS).

CONCLUSION: Odyssey facilitates efficient and fast automation of genome-wide association analysis and can go from raw genetic data to genome-phenome association visualization and analysis results in 3-8 h on average, depending on the input data, the choice of programs within the pipeline, and the available computer resources. Odyssey was built to be flexible, portable, compatible, scalable, and easy to set up. Biologists less familiar with programming can now work hands-on with their own big data using this easy-to-use pipeline.
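
The idea of a modular workflow driven by one configuration file can be illustrated with a minimal sketch. This is not Odyssey's actual code or configuration format: the stage names below simply mirror the order of steps listed in the abstract, and the `plan_run` helper and its boolean on/off config are hypothetical names introduced here for illustration.

```python
from typing import Dict, List

# Hypothetical stage names, ordered as the steps appear in the abstract.
# They are illustrative labels, not Odyssey's real module names.
STAGES: List[str] = [
    "prepare_raw_data",
    "pre_imputation_qc",
    "phasing",
    "imputation",
    "post_imputation_qc",
    "population_stratification",
    "gwas",
    "visualization",
]


def plan_run(config: Dict[str, bool]) -> List[str]:
    """Return the enabled stages in pipeline order.

    `config` maps a stage name to True/False; stages that are not
    mentioned are treated as enabled. Unknown names raise an error so
    that typos in the configuration are caught early.
    """
    unknown = set(config) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    return [stage for stage in STAGES if config.get(stage, True)]


if __name__ == "__main__":
    # Example: skip the stratification step but run everything else.
    for stage in plan_run({"population_stratification": False}):
        print(stage)
```

Keeping the stage order in one list and the user's choices in one mapping is what makes such a workflow modular: a step can be switched off without touching the others.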

journal_name

BMC Bioinformatics

authors

Eller RJ,Janga SC,Walsh S

doi

10.1186/s12859-019-2964-5

pub_date

2019-06-28 00:00:00

pages

364

issue

1

issn

1471-2105

pii

10.1186/s12859-019-2964-5

journal_volume

20

pub_type

Journal Article
  • Graph regularized L2,1-nonnegative matrix factorization for miRNA-disease association prediction.

    abstract:BACKGROUND:The aberrant expression of microRNAs is closely connected to the occurrence and development of a great deal of human diseases. To study human diseases, numerous effective computational models that are valuable and meaningful have been presented by researchers. RESULTS:Here, we present a computational framew...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3409-x

    authors: Gao Z,Wang YT,Wu QW,Ni JC,Zheng CH

    updated:2020-02-18 00:00:00

  • CNN-based ranking for biomedical entity normalization.

    abstract:BACKGROUND:Most state-of-the-art biomedical entity normalization systems, such as rule-based systems, merely rely on morphological information of entity mentions, but rarely consider their semantic information. In this paper, we introduce a novel convolutional neural network (CNN) architecture that regards biomedical e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1805-7

    authors: Li H,Chen Q,Tang B,Wang X,Xu H,Wang B,Huang D

    updated:2017-10-03 00:00:00

  • Large scale statistical inference of signaling pathways from RNAi and microarray data.

    abstract:BACKGROUND:The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene ex...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-386

    authors: Froehlich H,Fellmann M,Sueltmann H,Poustka A,Beissbarth T

    updated:2007-10-15 00:00:00

  • Multiple sequence alignment accuracy and evolutionary distance estimation.

    abstract:BACKGROUND:Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pair wise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-278

    authors: Rosenberg MS

    updated:2005-11-23 00:00:00

  • Methodology capture: discriminating between the "best" and the rest of community practice.

    abstract:BACKGROUND:The methodologies we use both enable and help define our research. However, as experimental complexity has increased the choice of appropriate methodologies has become an increasingly difficult task. This makes it difficult to keep track of available bioinformatics software, let alone the most suitable proto...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-359

    authors: Eales JM,Pinney JW,Stevens RD,Robertson DL

    updated:2008-09-01 00:00:00

  • MQAPRank: improved global protein model quality assessment by learning-to-rank.

    abstract:BACKGROUND:Protein structure prediction has achieved a lot of progress during the last few decades and a greater number of models for a certain sequence can be predicted. Consequently, assessing the qualities of predicted protein models in perspective is one of the key components of successful protein structure predict...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1691-z

    authors: Jing X,Dong Q

    updated:2017-05-25 00:00:00

  • Discrimination of cell cycle phases in PCNA-immunolabeled cells.

    abstract:BACKGROUND:Protein function in eukaryotic cells is often controlled in a cell cycle-dependent manner. Therefore, the correct assignment of cellular phenotypes to cell cycle phases is a crucial task in cell biology research. Nuclear proteins whose localization varies during the cell cycle are valuable and frequently use...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0618-9

    authors: Schönenberger F,Deutzmann A,Ferrando-May E,Merhof D

    updated:2015-05-29 00:00:00

  • Computational approaches to protein inference in shotgun proteomics.

    abstract:Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Review

    doi:10.1186/1471-2105-13-S16-S4

    authors: Li YF,Radivojac P

    updated:2012-01-01 00:00:00

  • Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection.

    abstract:BACKGROUND:When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done fit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1149-8

    authors: Mayr A,Hofner B,Schmid M

    updated:2016-07-22 00:00:00

  • Taking U out, with two nucleases?

    abstract:BACKGROUND:REX1 and REX2 are protein components of the RNA editing complex (the editosome) and function as exouridylylases. The exact roles of REX1 and REX2 in the editosome are unclear and the consequences of the presence of two related proteins are not fully understood. Here, a variety of computational studies were p...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-305

    authors: Mian IS,Worthey EA,Salavati R

    updated:2006-06-16 00:00:00

  • Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots.

    abstract:BACKGROUND:Analyses of molecular high-throughput data often lack in robustness, i.e. results are very sensitive to the addition or removal of a single observation. Therefore, the identification of extreme observations is an important step of quality control before doing further data analysis. Standard outlier detection...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1645-5

    authors: Kruppa J,Jung K

    updated:2017-05-02 00:00:00

  • The COG database: an updated version includes eukaryotes.

    abstract:BACKGROUND:The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appea...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-4-41

    authors: Tatusov RL,Fedorova ND,Jackson JD,Jacobs AR,Kiryutin B,Koonin EV,Krylov DM,Mazumder R,Mekhedov SL,Nikolskaya AN,Rao BS,Smirnov S,Sverdlov AV,Vasudevan S,Wolf YI,Yin JJ,Natale DA

    updated:2003-09-11 00:00:00

  • Protein complexes identification based on go attributed network embedding.

    abstract:BACKGROUND:Identifying protein complexes from protein-protein interaction (PPI) network is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidences to enhance the quality of predicted complexes. However, it is still a challenge to integrate diffe...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2555-x

    authors: Xu B,Li K,Zheng W,Liu X,Zhang Y,Zhao Z,He Z

    updated:2018-12-20 00:00:00

  • Maximum expected accuracy structural neighbors of an RNA secondary structure.

    abstract:BACKGROUND:Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existent tool can detec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S5-S6

    authors: Clote P,Lou F,Lorenz WA

    updated:2012-04-12 00:00:00

  • Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

    abstract:BACKGROUND:Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing inf...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0418-7

    authors: Bansal V,Libiger O

    updated:2015-01-16 00:00:00

  • Predicting MoRFs in protein sequences using HMM profiles.

    abstract:BACKGROUND:Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1375-0

    authors: Sharma R,Kumar S,Tsunoda T,Patil A,Sharma A

    updated:2016-12-22 00:00:00

  • Components of the antigen processing and presentation pathway revealed by gene expression microarray analysis following B cell antigen receptor (BCR) stimulation.

    abstract:BACKGROUND:Activation of naïve B lymphocytes by extracellular ligands, e.g. antigen, lipopolysaccharide (LPS) and CD40 ligand, induces a combination of common and ligand-specific phenotypic changes through complex signal transduction pathways. For example, although all three of these ligands induce proliferation, only ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-237

    authors: Lee JA,Sinkovits RS,Mock D,Rab EL,Cai J,Yang P,Saunders B,Hsueh RC,Choi S,Subramaniam S,Scheuermann RH,Alliance for Cellular Signaling.

    updated:2006-05-02 00:00:00

  • Determining gene expression on a single pair of microarrays.

    abstract:BACKGROUND:In microarray experiments the numbers of replicates are often limited due to factors such as cost, availability of sample or poor hybridization. There are currently few choices for the analysis of a pair of microarrays where N = 1 in each condition. In this paper, we demonstrate the effectiveness of a new al...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-489

    authors: Reid RW,Fodor AA

    updated:2008-11-21 00:00:00

  • Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

    abstract:BACKGROUND:Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-132

    authors: Peters B,Sette A

    updated:2005-05-31 00:00:00

  • An assessment of catalytic residue 3D ensembles for the prediction of enzyme function.

    abstract:BACKGROUND:The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i. e. non-homologous protein folds and how similar their 3D pos...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0807-6

    authors: Žváček C,Friedrichs G,Heizinger L,Merkl R

    updated:2015-11-04 00:00:00

  • Algorithm-driven artifacts in median polish summarization of microarray data.

    abstract:BACKGROUND:High-throughput measurement of transcript intensities using Affymetrix type oligonucleotide microarrays has produced a massive quantity of data during the last decade. Different preprocessing techniques exist to convert the raw signal intensities measured by these chips into gene expression estimates. Althou...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-553

    authors: Giorgi FM,Bolger AM,Lohse M,Usadel B

    updated:2010-11-11 00:00:00

  • Detecting variants with Metabolic Design, a new software tool to design probes for explorative functional DNA microarray development.

    abstract:BACKGROUND:Microorganisms display vast diversity, and each one has its own set of genes, cell components and metabolic reactions. To assess their huge unexploited metabolic potential in different ecosystems, we need high throughput tools, such as functional microarrays, that allow the simultaneous analysis of thousands...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-478

    authors: Terrat S,Peyretaillade E,Gonçalves O,Dugat-Bony E,Gravelat F,Moné A,Biderre-Petit C,Boucher D,Troquet J,Peyret P

    updated:2010-09-23 00:00:00

  • SegCorr a statistical procedure for the detection of genomic regions of correlated expression.

    abstract:BACKGROUND:Detecting local correlations in expression between neighboring genes along the genome has proved to be an effective strategy to identify possible causes of transcriptional deregulation in cancer. It has been successfully used to illustrate the role of mechanisms such as copy number variation (CNV) or epigene...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1742-5

    authors: Delatola EI,Lebarbier E,Mary-Huard T,Radvanyi F,Robin S,Wong J

    updated:2017-07-11 00:00:00

  • In silico modelling of hormone response elements.

    abstract:BACKGROUND:An important step in understanding the conditions that specify gene expression is the recognition of gene regulatory elements. Due to high diversity of different types of transcription factors and their DNA binding preferences, it is a challenging problem to establish an accurate model for recognition of fun...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-S4-S27

    authors: Stepanova M,Lin F,Lin VC

    updated:2006-12-12 00:00:00

  • Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach.

    abstract:BACKGROUND:Cellular functions are coordinately carried out by groups of genes forming functional modules. Identifying such modules in the transcriptional regulatory network (TRN) of organisms is important for understanding the structure and function of these fundamental cellular networks and essential for the emerging ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-199

    authors: Ma HW,Buer J,Zeng AP

    updated:2004-12-16 00:00:00

  • Pathogenic Bacillus anthracis in the progressive gene losses and gains in adaptive evolution.

    abstract:BACKGROUND:Sequence mutations represent a driving force of adaptive evolution in bacterial pathogens. It is especially evident in reductive genome evolution where bacteria underwent lifestyles shifting from a free-living to a strictly intracellular or host-depending life. It resulted in loss-of-function mutations and/o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S3

    authors: Yu GX

    updated:2009-01-30 00:00:00

  • The GMOseek matrix: a decision support tool for optimizing the detection of genetically modified plants.

    abstract:BACKGROUND:Since their first commercialization, the diversity of taxa and the genetic composition of transgene sequences in genetically modified plants (GMOs) are constantly increasing. To date, the detection of GMOs and derived products is commonly performed by PCR-based methods targeting specific DNA sequences introd...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-256

    authors: Block A,Debode F,Grohmann L,Hulin J,Taverniers I,Kluga L,Barbau-Piednoir E,Broeders S,Huber I,Van den Bulcke M,Heinze P,Berben G,Busch U,Roosens N,Janssen E,Žel J,Gruden K,Morisset D

    updated:2013-08-22 00:00:00

  • Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins.

    abstract:BACKGROUND:Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-470

    authors: Kelly WP,Stumpf MP

    updated:2010-09-20 00:00:00

  • A novel method to identify high order gene-gene interactions in genome-wide association studies: gene-based MDR.

    abstract:BACKGROUND:Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand genetic architecture of complex diseases. After the great success of large scale genome-wide association (GWA) studies using th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S9-S5

    authors: Oh S,Lee J,Kwon MS,Weir B,Ha K,Park T

    updated:2012-06-11 00:00:00

  • DNLC: differential network local consistency analysis.

    abstract:BACKGROUND:The biological network is highly dynamic. Functional relations between genes can be activated or deactivated depending on the biological conditions. On the genome-scale network, subnetworks that gain or lose local expression consistency may shed light on the regulatory mechanisms related to the changing biol...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3046-4

    authors: Lu J,Lu Y,Ding Y,Xiao Q,Liu L,Cai Q,Kong Y,Bai Y,Yu T

    updated:2019-12-24 00:00:00