Biotite: a unifying open source computational biology framework in Python.

Abstract:

BACKGROUND: As molecular biology generates an increasing amount of sequence and structure data, the number of programs for analyzing this data is also rising. Most of these programs are built for a specific task, so users often need to combine several programs to reach a goal. This can make data processing cumbersome, inflexible, and even inefficient because of the overhead of repeated read/write operations. A comprehensive, accessible, and efficient computational biology framework in a scripting language is therefore crucial to overcome these limitations. RESULTS: We have developed the Python package Biotite, a general computational biology framework that represents sequence and structure data based on NumPy ndarrays. The package also contains seamless interfaces to biological databases and external software. The source code is freely accessible at https://github.com/biotite-dev/biotite . CONCLUSIONS: Biotite is unifying in two ways. First, it bundles popular tasks in sequence analysis and structural bioinformatics in a consistently structured package. Second, it addresses two groups of users: novice programmers get easy access to Biotite thanks to its simplicity and comprehensive documentation, while advanced users can profit from its high performance and extensibility. Advanced users can implement their algorithms on top of Biotite, skipping code for general functionality (such as file parsers) and focusing on what makes their software unique.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Kunzmann P,Hamacher K

doi

10.1186/s12859-018-2367-z

subject

Has Abstract

pub_date

2018-10-01 00:00:00

pages

346

issue

1

issn

1471-2105

pii

10.1186/s12859-018-2367-z

journal_volume

19

pub_type

Journal Article
  • Application of text-mining for updating protein post-translational modification annotation in UniProtKB.

    abstract:BACKGROUND:The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-104

    authors: Veuthey AL,Bridge A,Gobeill J,Ruch P,McEntyre JR,Bougueleret L,Xenarios I

    update_date:2013-03-22 00:00:00

  • Simultaneous fitting of real-time PCR data with efficiency of amplification modeled as Gaussian function of target fluorescence.

    abstract:BACKGROUND:In real-time PCR, it is necessary to consider the efficiency of amplification (EA) of amplicons in order to determine initial target levels properly. EAs can be deduced from standard curves, but these involve extra effort and cost and may yield invalid EAs. Alternatively, EA can be extracted from individual ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-95

    authors: Batsch A,Noetel A,Fork C,Urban A,Lazic D,Lucas T,Pietsch J,Lazar A,Schömig E,Gründemann D

    update_date:2008-02-12 00:00:00

  • Optimizing agent-based transmission models for infectious diseases.

    abstract:BACKGROUND:Infectious disease modeling and computational power have evolved such that large-scale agent-based models (ABMs) have become feasible. However, the increasing hardware complexity requires adapted software designs to achieve the full potential of current high-performance workstations. RESULTS:We have found l...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0612-2

    authors: Willem L,Stijven S,Tijskens E,Beutels P,Hens N,Broeckhove J

    update_date:2015-06-02 00:00:00

  • COPASAAR--a database for proteomic analysis of single amino acid repeats.

    abstract:BACKGROUND:Single amino acid repeats make up a significant proportion in all of the proteomes that have currently been determined. They have been shown to be functionally and medically significant, and are associated with cancers and neuro-degenerative diseases such as Huntington's Chorea, where a poly-glutamine repeat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-196

    authors: Depledge DP,Dalby AR

    update_date:2005-08-03 00:00:00

  • On the detection of functionally coherent groups of protein domains with an extension to protein annotation.

    abstract:BACKGROUND:Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-390

    authors: McLaughlin WA,Chen K,Hou T,Wang W

    update_date:2007-10-16 00:00:00

  • Computational algorithms to predict Gene Ontology annotations.

    abstract:BACKGROUND:Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biologi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S6-S4

    authors: Pinoli P,Chicco D,Masseroli M

    update_date:2015-01-01 00:00:00

  • In silico design of targeted SRM-based experiments.

    abstract:Selected reaction monitoring (SRM)-based proteomics approaches enable highly sensitive and reproducible assays for profiling of thousands of peptides in one experiment. The development of such assays involves the determination of retention time, detectability and fragmentation properties of peptides, followed by an op...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S16-S8

    authors: Nahnsen S,Kohlbacher O

    update_date:2012-01-01 00:00:00

  • GlyStruct: glycation prediction using structural properties of amino acid residues.

    abstract:BACKGROUND:Glycation is one of the post-translational modifications (PTMs) in which sugar molecules and residues in protein sequences are covalently bonded. It has recently become one of the clinically important PTMs, attributed to many chronic and age-related complications. Being a non-enzymatic reaction, it is a g...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2547-x

    authors: Reddy HM,Sharma A,Dehzangi A,Shigemizu D,Chandra AA,Tsunoda T

    update_date:2019-02-04 00:00:00

  • BIOSMILE: a semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features.

    abstract:BACKGROUND:Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical fi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-325

    authors: Tsai RT,Chou WC,Su YS,Lin YC,Sung CL,Dai HJ,Yeh IT,Ku W,Sung TY,Hsu WL

    update_date:2007-09-01 00:00:00

  • SplicerAV: a tool for mining microarray expression data for changes in RNA processing.

    abstract:BACKGROUND:Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-108

    authors: Robinson TJ,Dinan MA,Dewhirst M,Garcia-Blanco MA,Pearson JL

    update_date:2010-02-25 00:00:00

  • Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data.

    abstract:Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S6-S11

    authors: Kakaradov B,Xiong HY,Lee LJ,Jojic N,Frey BJ

    update_date:2012-04-19 00:00:00

  • Analysis of density based and fuzzy c-means clustering methods on lesion border extraction in dermoscopy images.

    abstract:BACKGROUND:Computer-aided segmentation and border detection in dermoscopic images is one of the core components of diagnostic procedures and therapeutic interventions for skin cancer. Automated assessment tools for dermoscopy images have become an important research field mainly because of inter- and intra-observer var...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-S6-S26

    authors: Kockara S,Mete M,Chen B,Aydin K

    update_date:2010-10-07 00:00:00

  • Towards integrative gene functional similarity measurement.

    abstract:BACKGROUND:In Gene Ontology, the "Molecular Function" (MF) categorization is a widely used knowledge framework for gene function comparison and prediction. Its structure and annotation provide a convenient way to compare gene functional similarities at the molecular level. The existing gene similarity measures, however...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S2-S5

    authors: Peng J,Wang Y,Chen J

    update_date:2014-01-01 00:00:00

  • FastqPuri: high-performance preprocessing of RNA-seq data.

    abstract:BACKGROUND:RNA sequencing (RNA-seq) has become the standard means of analyzing gene and transcript expression in high-throughput. While previously sequence alignment was a time demanding step, fast alignment methods and even more so transcript counting methods which avoid mapping and quantify gene and transcript expres...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2799-0

    authors: Pérez-Rubio P,Lottaz C,Engelmann JC

    update_date:2019-05-03 00:00:00

  • Enhanced JBrowse plugins for epigenomics data visualization.

    abstract:BACKGROUND:New sequencing techniques require new visualization strategies, as is the case for epigenomics data such as DNA base modifications, small non-coding RNAs, and histone modifications. RESULTS:We present a set of plugins for the genome browser JBrowse that are targeted for epigenomics visualizations. Specifica...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2160-z

    authors: Hofmeister BT,Schmitz RJ

    update_date:2018-04-25 00:00:00

  • Detecting broad domains and narrow peaks in ChIP-seq data with hiddenDomains.

    abstract:BACKGROUND:Correctly identifying genomic regions enriched with histone modifications and transcription factors is key to understanding their regulatory and developmental roles. Conceptually, these regions are divided into two categories, narrow peaks and broad domains, and different algorithms are used to identify each...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-0991-z

    authors: Starmer J,Magnuson T

    update_date:2016-03-24 00:00:00

  • Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers.

    abstract:BACKGROUND:The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-139

    authors: Bornelöv S,Marillet S,Komorowski J

    update_date:2014-05-12 00:00:00

  • Comparing the performance of selected variant callers using synthetic data and genome segmentation.

    abstract:BACKGROUND:High-throughput sequencing has rapidly become an essential part of precision cancer medicine. But validating results obtained from analyzing and interpreting genomic data remains a rate-limiting factor. The gold standard, of course, remains manual validation by expert panels, which is not without its weaknes...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2440-7

    authors: Bian X,Zhu B,Wang M,Hu Y,Chen Q,Nguyen C,Hicks B,Meerzaman D

    update_date:2018-11-19 00:00:00

  • Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics.

    abstract:BACKGROUND:The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-303

    authors: Chepelev LL,Riazanov A,Kouznetsov A,Low HS,Dumontier M,Baker CJ

    update_date:2011-07-26 00:00:00

  • A novel similarity-measure for the analysis of genetic data in complex phenotypes.

    abstract:BACKGROUND:Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effective processing of these data has not been equally as fast. In particular, Machin...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S6-S24

    authors: Lagani V,Montesanto A,Di Cianni F,Moreno V,Landi S,Conforti D,Rose G,Passarino G

    update_date:2009-06-16 00:00:00

  • EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.

    abstract:BACKGROUND:Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of seque...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-186

    authors: Smith RP,Buchser WJ,Lemmon MB,Pardinas JR,Bixby JL,Lemmon VP

    update_date:2008-04-10 00:00:00

  • Evaluation of gene-expression clustering via mutual information distance measure.

    abstract:BACKGROUND:The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare different clustering solutions when using the Mutual Information (MI) measure versus the use of the well known Euclidean distance and Pears...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-111

    authors: Priness I,Maimon O,Ben-Gal I

    update_date:2007-03-30 00:00:00

  • Compromise or optimize? The breakpoint anti-median.

    abstract:BACKGROUND:The median of k≥3 genomes was originally defined to find a compromise genome indicative of a common ancestor. However, in gene order comparisons, the usual definitions based on minimizing the sum of distances to the input genomes lead to degenerate medians reflecting only one of the input genomes. "Near-medi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1340-y

    authors: Larlee CA,Brandts A,Sankoff D

    update_date:2016-12-15 00:00:00

  • The tumor as an organ: comprehensive spatial and temporal modeling of the tumor and its microenvironment.

    abstract:BACKGROUND:Research related to cancer is vast, and continues in earnest in many directions. Due to the complexity of cancer, a better understanding of tumor growth dynamics can be gleaned from a dynamic computational model. We present a comprehensive, fully executable, spatial and temporal 3D computational model of the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1168-5

    authors: Bloch N,Harel D

    update_date:2016-08-24 00:00:00

  • ChemEx: information extraction system for chemical data curation.

    abstract:BACKGROUND:Manual chemical data curation from publications is error-prone and time-consuming, and makes it hard to keep data sets up to date. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures are usually described in images, information extraction needs to combine structure ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S17-S9

    authors: Tharatipyakul A,Numnark S,Wichadakul D,Ingsriswang S

    update_date:2012-01-01 00:00:00

  • Analysis of cancer metabolism with high-throughput technologies.

    abstract:BACKGROUND:Recent advances in genomics and proteomics have allowed us to study the nuances of the Warburg effect--a long-standing puzzle in cancer energy metabolism--at an unprecedented level of detail. While modern next-generation sequencing technologies are extremely powerful, the lack of appropriate data analysis to...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S10-S8

    authors: Markovets AA,Herman D

    update_date:2011-10-18 00:00:00

  • Learning smoothing models of copy number profiles using breakpoint annotations.

    abstract:BACKGROUND:Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious to decide which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using v...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-164

    authors: Hocking TD,Schleiermacher G,Janoueix-Lerosey I,Boeva V,Cappo J,Delattre O,Bach F,Vert JP

    update_date:2013-05-22 00:00:00

  • Insertion and deletion correcting DNA barcodes based on watermarks.

    abstract:BACKGROUND:Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries can individually be labeled with barcodes for a joint sequenc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0482-7

    authors: Kracht D,Schober S

    update_date:2015-02-18 00:00:00

  • Reordering based integrative expression profiling for microarray classification.

    abstract:BACKGROUND:Current network-based microarray analysis uses the information of interactions among concerned genes/gene products, but still considers each gene expression individually. We propose an organized knowledge-supervised approach - Integrative eXpression Profiling (IXP), to improve microarray classification accur...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S2-S1

    authors: Wu X,Huang H,Sonachalam M,Reinhard S,Shen J,Pandey R,Chen JY

    update_date:2012-03-13 00:00:00

  • Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data.

    abstract:BACKGROUND:Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published da...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-203

    authors: Zhang Y,Xuan J,de los Reyes BG,Clarke R,Ressom HW

    update_date:2008-04-21 00:00:00