Abstract:
BACKGROUND: Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs) that is widely used to annotate sequence data produced by genome sequencing projects. Pfam models are often very general in the taxa they cover, and it has previously been suggested that such general models may lack some of the specificity or selectivity that kingdom-specific models would provide.
RESULTS: Here we present a general approach to creating domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam) using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes: Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the microsporidian Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate performance in terms of coverage against the original 30 genomes used to train FPfam and against five more recently sequenced fungal genomes that can be considered an independent test set. We show that kingdom-specific models such as FPfam find instances of both novel and well-characterized domains, increase overall coverage, and detect more domains per sequence, typically with higher bit scores than Pfam for the same domain families. An evaluation of the effect of changing E-values on coverage shows that the performance of FPfam is consistent over the range of E-values applied.
CONCLUSION: Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam, and some of the families represented in the test set are not present in FPfam. Therefore, we recommend that general and specific libraries be used together for annotation; we find that using both Pfam and FPfam achieves a significant improvement in coverage.
journal_name: BMC Genomics
journal_title: BMC genomics
authors: Alam I, Hubbard SJ, Oliver SG, Rattray M
doi: 10.1186/1471-2164-8-97
subject: Has Abstract
pub_date: 2007-04-10 00:00:00
pages: 97
issn: 1471-2164
pii: 1471-2164-8-97
journal_volume: 8
pub_type: Journal article