Abstract:
BACKGROUND:Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words absent from a genome have been addressed in two recent studies. RESULTS:We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 109 down to 105 bp. CONCLUSION:The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 Mb of space. This enables us to perform this type of analysis not only for the largest genomes available so far, but also for the emerging pan- and meta-genome data.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Herold J,Kurtz S,Giegerich Rdoi
10.1186/1471-2105-9-167subject
Has Abstractpub_date
2008-03-26 00:00:00pages
167issn
1471-2105pii
1471-2105-9-167journal_volume
9pub_type
杂志文章abstract:BACKGROUND:Metabolome analysis with GC/MS has meanwhile been established as one of the "omics" techniques. Compound identification is done by comparison of the MS data with compound libraries. Mass spectral libraries in the field of metabolomics ought to connect the relevant mass traces of the metabolites to other rele...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-229
更新日期:2009-07-22 00:00:00
abstract:BACKGROUND:Membrane transport proteins (transporters) play an essential role in every living cell by transporting hydrophilic molecules across the hydrophobic membranes. While the sequences of many membrane proteins are known, their structure and function is still not well characterized and understood, owing to the imm...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3311-6
更新日期:2020-04-23 00:00:00
abstract:BACKGROUND:Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale va...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3208-4
更新日期:2019-12-17 00:00:00
abstract:BACKGROUND:T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S1-S19
更新日期:2008-01-01 00:00:00
abstract:BACKGROUND:The efficient and robust statistical analysis of the shape of plant organs of different cultivars is an important investigation issue in plant breeding and enables a robust cultivar description within the breeding progress. Laserscanning is a highly accurate and high resolution technique to acquire the 3D sh...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03654-8
更新日期:2020-07-29 00:00:00
abstract:BACKGROUND:Cell direct reprogramming technology has been rapidly developed with its low risk of tumor risk and avoidance of ethical issues caused by stem cells, but it is still limited to specific cell types. Direct reprogramming from an original cell to target cell type needs the cell similarity and cell specific regu...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2699-3
更新日期:2019-03-04 00:00:00
abstract:BACKGROUND:Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in medical genetics history, impacting both fundamental research, and diagnostic methods leading to personalized medicine. A plethora of effic...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S14-S9
更新日期:2012-01-01 00:00:00
abstract:BACKGROUND:Two important challenges in the analysis of molecular biology information are data (multi-omic information) integration and the detection of patterns across large scale molecular networks and sequences. They are are actually coupled beause the integration of omic information may provide better means to detec...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2175-5
更新日期:2018-07-09 00:00:00
abstract:BACKGROUND:Increasing number of eQTL (Expression Quantitative Trait Loci) datasets facilitate genetics and systems biology research. Meta-analysis tools are in need to jointly analyze datasets of same or similar issue types to improve statistical power especially in trans-eQTL mapping. Meta-analysis framework is also n...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0392-0
更新日期:2014-11-28 00:00:00
abstract:BACKGROUND:Identification of transcription factors (TFs) responsible for modulation of differentially expressed genes is a key step in deducing gene regulatory pathways. Most current methods identify TFs by searching for presence of DNA binding motifs in the promoter regions of co-regulated genes. However, this strateg...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S10-S19
更新日期:2011-10-18 00:00:00
abstract:BACKGROUND:RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-323
更新日期:2011-08-04 00:00:00
abstract:BACKGROUND:For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significant...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-471
更新日期:2007-12-03 00:00:00
abstract:BACKGROUND:Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so these must be generated. In protein-pr...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S1-S57
更新日期:2009-01-30 00:00:00
abstract:BACKGROUND:In quantitative proteomics, peptide mapping is a valuable approach to combine positional quantitative information with topographical and domain information of proteins. Quantitative proteomic analysis of cell surface shedding is an exemplary application area of this approach. RESULTS:We developed ImproViser...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-207
更新日期:2014-06-19 00:00:00
abstract:BACKGROUND:Homology search is still a significant step in functional analysis for genomic data. Profile Hidden Markov Model-based homology search has been widely used in protein domain analysis in many different species. In particular, with the fast accumulation of transcriptomic data of non-model species and metagenom...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1826-2
更新日期:2017-10-16 00:00:00
abstract:BACKGROUND:The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appea...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-4-41
更新日期:2003-09-11 00:00:00
abstract:BACKGROUND:The sequencing of many genomes and tiling arrays consisting of millions of DNA segments spanning entire genomes have made high-resolution copy number analysis possible. Microarray-based comparative genomic hybridization (array CGH) has enabled the high-resolution detection of DNA copy number aberrations. Whi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-8-203
更新日期:2007-06-14 00:00:00
abstract:BACKGROUND:Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins. Zoonosis can occur not only through direct transmission from vertebrates to humans, but also through intermediate reservoirs or other environm...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-96
更新日期:2011-04-13 00:00:00
abstract:BACKGROUND:The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious pro...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2665-0
更新日期:2019-02-14 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) play key roles in various cellular functions. In addition, some critical inter-species interactions such as host-pathogen interactions and pathogenicity occur through PPIs. Phytopathogenic bacteria infect hosts through attachment to host tissue, enzyme secretion, exopolysa...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-41
更新日期:2008-01-24 00:00:00
abstract:BACKGROUND:Recently there has been a lot of interest in identifying modules at the level of genetic and metabolic networks of organisms, as well as in identifying single genes and reactions that are essential for the organism. A goal of computational and systems biology is to go beyond identification towards an explana...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-118
更新日期:2006-03-08 00:00:00
abstract:BACKGROUND:The nonparametric trend test (NPT) is well suitable for identifying the genetic variants associated with quantitative traits when the trait values do not satisfy the normal distribution assumption. If the genetic model, defined according to the mode of inheritance, is known, the NPT derived under the given g...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-0888-x
更新日期:2016-01-28 00:00:00
abstract:BACKGROUND:Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene e...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-60
更新日期:2008-01-28 00:00:00
abstract:BACKGROUND:The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior are interspecific hybridization events recombining genes of different species. An important approach to analyze such events ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0660-7
更新日期:2015-07-30 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNAs) are single-stranded non-coding RNAs known to regulate a wide range of cellular processes by silencing the gene expression at the protein and/or mRNA levels. Computational prediction of miRNA targets is essential for elucidating the detailed functions of miRNA. However, the prediction speci...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-476
更新日期:2010-09-22 00:00:00
abstract:BACKGROUND:Mechanotransduction in bone cells plays a pivotal role in osteoblast differentiation and bone remodelling. Mechanotransduction provides the link between modulation of the extracellular matrix by mechanical load and intracellular activity. By controlling the balance between the intracellular and extracellular...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3394-0
更新日期:2020-03-18 00:00:00
abstract:BACKGROUND:Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. RESULTS:I desc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0590-4
更新日期:2015-05-20 00:00:00
abstract:BACKGROUND:In protein design, correct use of topology is among the initial and most critical feature. Meticulous selection of backbone topology aids in drastically reducing the structure search space. With ProLego, we present a server application to explore the component aspect of protein structures and provide an intu...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2171-9
更新日期:2018-05-04 00:00:00
abstract:BACKGROUND:Guide-trees are used as part of an essential heuristic to enable the calculation of multiple sequence alignments. They have been the focus of much method development but there has been little effort at determining systematically, which guide-trees, if any, give the best alignments. Some guide-tree constructi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-338
更新日期:2014-10-04 00:00:00
abstract:BACKGROUND:Relative isotope abundance quantification, which can be used for peptide identification and differential peptide quantification, plays an important role in liquid chromatography-mass spectrometry (LC-MS)-based proteomics. However, several major issues exist in the relative isotopic quantification of peptides...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-74
更新日期:2011-03-15 00:00:00