Abstract:
BACKGROUND:Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements that may exist between the query and reference genomes may result in incorrect scaffolds, if these rearrangements are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes it is possible to detect the presence of inversions given sufficient sequencing coverage and a sufficiently close reference genome. RESULTS:We present a linear-time algorithm that can generate a set of contig scaffolds for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, even though in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of sis, the program that implements the algorithm, to seven other scaffold-generating programs. The results of our tests show that sis has overall better performance. CONCLUSIONS:sis is a new easy-to-use tool to generate contig scaffolds, available both as stand-alone and as a web server. The good performance of sis in our tests adds evidence that large-scale inversions are widespread in prokaryotic genomes.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Dias Z,Dias U,Setubal JCdoi
10.1186/1471-2105-13-96subject
Has Abstractpub_date
2012-05-14 00:00:00pages
96issn
1471-2105pii
1471-2105-13-96journal_volume
13pub_type
杂志文章abstract:BACKGROUND:The total number of known three-dimensional protein structures is rapidly increasing. Consequently, the need for fast structural search against complete databases without a significant loss of accuracy is increasingly demanding. Recently, TopSearch, an ultra-fast method for finding rigid structural relations...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0866-8
更新日期:2016-01-05 00:00:00
abstract:BACKGROUND:High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino ac...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-118
更新日期:2010-03-05 00:00:00
abstract:BACKGROUND:Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often repre...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1740-7
更新日期:2017-07-11 00:00:00
abstract:BACKGROUND:RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-323
更新日期:2011-08-04 00:00:00
abstract:BACKGROUND:While researchers have utilized versions of the Affymetrix human GeneChip for the assessment of expression patterns in non human primate (NHP) samples, there has been no comprehensive sequence analysis study undertaken to demonstrate that the probe sequences designed to detect human transcripts are reliably ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-165
更新日期:2004-10-26 00:00:00
abstract:BACKGROUNDS:Next-Generation Sequencing (NGS) is now widely used in biomedical research for various applications. Processing of NGS data requires multiple programs and customization of the processing pipelines according to the data platforms. However, rapid progress of the NGS applications and processing methods urgentl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2676-x
更新日期:2019-02-20 00:00:00
abstract:BACKGROUND:In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multip...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1273-5
更新日期:2016-10-03 00:00:00
abstract:BACKGROUND:The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein f...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S5-S3
更新日期:2009-05-06 00:00:00
abstract:BACKGROUND:In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many exp...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-112
更新日期:2009-04-20 00:00:00
abstract:BACKGROUND:To date, many of the methods for information extraction of biological information from scientific articles are restricted to the abstract of the article. However, full text articles in electronic version, which offer larger sources of data, are currently available. Several questions arise as to whether the e...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-4-20
更新日期:2003-05-29 00:00:00
abstract:BACKGROUND:Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-58
更新日期:2014-02-26 00:00:00
abstract:BACKGROUND:Selecting a parsimonious set of informative genes to build highly generalized performance classifier is the most important task for the analysis of tumor microarray expression data. Many existing gene pair evaluation methods cannot highlight diverse patterns of gene pairs only used one strategy of vertical c...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0893-0
更新日期:2016-01-20 00:00:00
abstract:BACKGROUND:Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis. RESULTS:In thi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-252
更新日期:2014-07-25 00:00:00
abstract:BACKGROUND:Endometrial cancers (ECs) are one of the most common types of malignant tumor in females. Substantial efforts had been made to identify significantly mutated genes (SMGs) in ECs and use them as biomarkers for the classification of histological subtypes and the prediction of clinical outcomes. However, the im...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1891-6
更新日期:2017-12-28 00:00:00
abstract:BACKGROUND:High-throughput sequencing has rapidly become an essential part of precision cancer medicine. But validating results obtained from analyzing and interpreting genomic data remains a rate-limiting factor. The gold standard, of course, remains manual validation by expert panels, which is not without its weaknes...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2440-7
更新日期:2018-11-19 00:00:00
abstract:BACKGROUND:Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: Synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries, can individually be labeled with barcodes for a joint sequenc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0482-7
更新日期:2015-02-18 00:00:00
abstract:BACKGROUND:Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for the evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0810-y
更新日期:2015-11-04 00:00:00
abstract:BACKGROUND:Emerging and re-emerging infectious diseases such as Zika, SARS, ncovid19 and Pertussis, pose a compelling challenge for epidemiologists due to their significant impact on global public health. In this context, computational models and computer simulations are one of the available research tools that epidemi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03648-6
更新日期:2020-09-16 00:00:00
abstract:BACKGROUND:Single nucleotide polymorphisms (SNPs) are the most abundant genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process allowing the identification of many thousands of mutations for model and non-model organisms. Annotating ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-45
更新日期:2013-02-08 00:00:00
abstract:BACKGROUND:Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared statistical performance of three methods, namely Global Tes...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-431
更新日期:2007-11-07 00:00:00
abstract:BACKGROUND:The BioCreative text mining evaluation investigated the application of text mining methods to the task of automatically extracting information from text in biomedical research articles. We participated in Task 2 of the evaluation. For this task, we built a system to automatically annotate a given protein wit...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-S1-S18
更新日期:2005-01-01 00:00:00
abstract:BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S12-S20
更新日期:2008-12-12 00:00:00
abstract:BACKGROUND:Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks. RESULTS:Based on a homogeneity assumptio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-160
更新日期:2008-03-25 00:00:00
abstract:BACKGROUND:Prostate carcinoma is among the most common types of cancer affecting hundreds of thousands people every year. Once the metastatic form of prostate carcinoma is documented, the majority of patients die from their tumors as opposed to other causes. The key to successful treatment is in the earliest possible d...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S11-S6
更新日期:2009-10-08 00:00:00
abstract:BACKGROUND:New sequencing techniques require new visualization strategies, as is the case for epigenomics data such as DNA base modifications, small non-coding RNAs, and histone modifications. RESULTS:We present a set of plugins for the genome browser JBrowse that are targeted for epigenomics visualizations. Specifica...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2160-z
更新日期:2018-04-25 00:00:00
abstract:BACKGROUND:The analysis of mass spectra suggests that the existence of derivative peaks is strongly dependent on the intensity of the primary peaks. Peak selection from tandem mass spectrum is used to filter out noise and contaminant peaks. It is widely accepted that a valid primary peak tends to have high intensity an...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-346
更新日期:2011-08-17 00:00:00
abstract:BACKGROUND:We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort. METHODS:We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples e...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S3-S4
更新日期:2009-03-19 00:00:00
abstract:BACKGROUND:High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of g...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-14
更新日期:2007-01-18 00:00:00
abstract:BACKGROUND:The adaptation of the CRISPR-Cas9 system to pooled library gene knockout screens in mammalian cells represents a major technological leap over RNA interference, the prior state of the art. New methods for analyzing the data and evaluating results are needed. RESULTS:We offer BAGEL (Bayesian Analysis of Gene...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1015-8
更新日期:2016-04-16 00:00:00
abstract:BACKGROUND:Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging vari...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-248
更新日期:2014-07-21 00:00:00