Abstract:
BACKGROUND:Elevated sequencing error rates are the most predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific sequence patterns. Statistically principled ways to associate sequence patterns with base calling errors have not been previously described. Extant approaches either incur decisive losses in power, due to relating errors with individual genomic positions rather than motifs, or do not properly distinguish between motif-induced and sequence-unspecific sources of errors. RESULTS:Here, for the first time, we describe a statistically rigorous framework for the discovery of motifs that induce sequencing errors. We apply our method to several datasets from Illumina GA IIx, HiSeq 2000, and MiSeq sequencers. We confirm previously known error-causing sequence contexts and report new more specific ones. CONCLUSIONS:Checking for error-inducing motifs should be included into SNP calling pipelines to avoid false positives. To facilitate filtering of sets of putative SNPs, we provide tracks of error-prone genomic positions (in BED format). AVAILABILITY:http://discovering-cse.googlecode.com.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Allhoff M,Schönhuth A,Martin M,Costa IG,Rahmann S,Marschall Tdoi
10.1186/1471-2105-14-S5-S1subject
Has Abstractpub_date
2013-01-01 00:00:00pages
S1issn
1471-2105pii
1471-2105-14-S5-S1journal_volume
14 Suppl 5pub_type
杂志文章abstract:BACKGROUND:High-throughput sequencing can identify numerous potential genomic targets for microbial strain typing, but identification of the most informative combinations requires the use of computational screening tools. This paper describes novel software-- Automated Selection of Typing Target Subsets (AuSeTTS)--that...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-148
更新日期:2013-05-01 00:00:00
abstract:BACKGROUND:We present the algorithm PFClust (Parameter Free Clustering), which is able automatically to cluster data and identify a suitable number of clusters to group them into without requiring any parameters to be specified by the user. The algorithm partitions a dataset into a number of clusters that share some co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-213
更新日期:2013-07-03 00:00:00
abstract:BACKGROUND:Last generations of Single Nucleotide Polymorphism (SNP) arrays allow to study copy-number variations in addition to genotyping measures. RESULTS:MPAgenomics, standing for multi-patient analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation and (ii) selection of genomic ma...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0394-y
更新日期:2014-12-14 00:00:00
abstract:BACKGROUND:In many biomedical applications, there is a need for developing classification models based on noisy annotations. Recently, various methods addressed this scenario by relaying on unreliable annotations obtained from multiple sources. RESULTS:We proposed a probabilistic classification algorithm based on labe...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-S12-S5
更新日期:2013-01-01 00:00:00
abstract:BACKGROUND:As numerous diseases involve errors in signal transduction, modern therapeutics often target proteins involved in cellular signaling. Interpretation of the activity of signaling pathways during disease development or therapeutic intervention would assist in drug development, design of therapy, and target ide...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-99
更新日期:2006-02-28 00:00:00
abstract:BACKGROUND:Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-121
更新日期:2006-03-09 00:00:00
abstract:BACKGROUND:Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-164
更新日期:2005-06-29 00:00:00
abstract:BACKGROUND:RNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years. Such experiments are still relatively costly. As a result, RNA-seq experiments often employ a small number of replicates. Power analysis and sample size calculation are challenging in the context of dif...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0994-9
更新日期:2016-03-31 00:00:00
abstract:BACKGROUND:A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially dev...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2217-z
更新日期:2018-06-19 00:00:00
abstract::Time course gene expression experiments are a popular means to infer co-expression. Many methods have been proposed to cluster genes or to build networks based on similarity measures of their expression dynamics. In this paper we apply a correlation based approach to network reconstruction to three datasets of time se...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-S1-S16
更新日期:2007-03-08 00:00:00
abstract:BACKGROUND:SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations foun...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-309
更新日期:2009-09-23 00:00:00
abstract:BACKGROUND:Mechanistic models that describe the dynamical behaviors of biochemical systems are common in computational systems biology, especially in the realm of cellular signaling. The development of families of such models, either by a single research group or by different groups working within the same area, presen...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-316
更新日期:2014-09-25 00:00:00
abstract:BACKGROUND:Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-133
更新日期:2014-05-07 00:00:00
abstract:BACKGROUND:Genetic variations predispose individuals to hereditary diseases, play important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0651-8
更新日期:2015-07-16 00:00:00
abstract:BACKGROUND:Urokinase, its receptor and the integrins are functionally associated and involved in regulation of cell signaling, migration, adhesion and proliferation. No structural information is available on this potential multimolecular complex. However, the tri-dimensional structure of urokinase, urokinase receptor a...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S2-S8
更新日期:2008-03-26 00:00:00
abstract:BACKGROUND:To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3407-z
更新日期:2020-02-17 00:00:00
abstract:BACKGROUND:The development of high-throughput technologies has produced several large scale protein interaction data sets for multiple species, and significant efforts have been made to analyze the data sets in order to understand protein activities. Considering that the basic units of protein interactions are domain i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-269
更新日期:2006-05-25 00:00:00
abstract:BACKGROUND:Human breast cancer resistance protein (BCRP) is an ATP-binding cassette (ABC) efflux transporter that confers multidrug resistance in cancers and also plays an important role in the absorption, distribution and elimination of drugs. Prediction as to if drugs or new molecular entities are BCRP substrates sho...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-130
更新日期:2013-04-15 00:00:00
abstract:BACKGROUND:Recent advances in genomics and proteomics have allowed us to study the nuances of the Warburg effect--a long-standing puzzle in cancer energy metabolism--at an unprecedented level of detail. While modern next-generation sequencing technologies are extremely powerful, the lack of appropriate data analysis to...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S10-S8
更新日期:2011-10-18 00:00:00
abstract:BACKGROUND:The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-379
更新日期:2009-11-18 00:00:00
abstract:BACKGROUND:We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how thi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-430
更新日期:2010-08-18 00:00:00
abstract:BACKGROUND:With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-3
更新日期:2011-01-04 00:00:00
abstract:BACKGROUND:Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework. RESULTS:Qxpak is a package for versatile statistical genomics, specificall...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-202
更新日期:2011-05-25 00:00:00
abstract:BACKGROUND:Recently copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity for detecting CNVs more accurately. Recently whole exome sequencing (WES) has beco...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1705-x
更新日期:2017-05-31 00:00:00
abstract:BACKGROUND:Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. Increasing studies, including biological sequence analyses in bioinformatics, utilize this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, s...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2284-1
更新日期:2018-07-18 00:00:00
abstract:BACKGROUND:Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two even...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-16-S10-S2
更新日期:2015-01-01 00:00:00
abstract:BACKGROUND:Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3293-4
更新日期:2019-12-27 00:00:00
abstract:BACKGROUND:To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive view of these regula...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-67
更新日期:2011-03-04 00:00:00
abstract:BACKGROUND:Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-238
更新日期:2010-05-11 00:00:00
abstract:BACKGROUND:The frequent exchange of genetic material among prokaryotes means that extracting a majority or plurality phylogenetic signal from many gene families, and the identification of gene families that are in significant conflict with the plurality signal is a frequent task in comparative genomics, and especially ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-123
更新日期:2012-06-07 00:00:00