Identification of germ cell-specific genes in mammalian meiotic prophase.

Abstract:

BACKGROUND: Mammalian germ cells undergo meiosis to produce sperm or eggs, haploid cells that are primed to meet and propagate life. Meiosis is initiated by retinoic acid, and meiotic prophase is the first and most complex stage of meiosis, when homologous chromosomes pair to exchange genetic information. Errors in meiosis can lead to infertility and birth defects. However, despite the importance of this process, germ cell-specific gene expression patterns during meiosis remain undefined due to difficulty in obtaining pure germ cell samples, especially in females, where prophase occurs in the embryonic ovary. Indeed, mixed signals from both germ cells and somatic cells complicate gonadal transcriptome studies.

RESULTS: We developed a machine-learning method for identifying germ cell-specific patterns of gene expression in microarray data from mammalian gonads, specifically during meiotic initiation and prophase. At 10% recall, the method detected spermatocyte genes and oocyte genes with 90% and 94% precision, respectively. Our method outperformed gonadal expression levels and gonadal expression correlations in predicting germ cell-specific expression. Top-predicted spermatocyte and oocyte genes were both preferentially localized to the X chromosome and significantly enriched for essential genes. Also identified were transcription factors and microRNAs that might regulate germ cell-specific expression. Finally, we experimentally validated Rps6ka3, a top-predicted X-linked spermatocyte gene. Protein localization studies in the mouse testis revealed germ cell-specific expression of RPS6KA3, mainly detected in the cytoplasm of spermatogonia and prophase spermatocytes.

CONCLUSIONS: We have demonstrated that, through the use of machine-learning methods, it is possible to detect germ cell-specific expression from gonadal microarray data. Results from this study improve our understanding of the transition from germ cells to meiocytes in the mammalian gonad. Further, this approach is applicable to other tissues for which isolating cell populations remains difficult.
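The abstract reports performance as precision at a fixed recall (10%). As a minimal sketch of how such a figure is conventionally read off a ranked gene list, the Python below sorts genes by a classifier score and returns the precision at the first cutoff whose recall reaches the target. It is not the authors' method or code: the function name `precision_at_recall`, the 0/1 reference labels, and the toy scores are hypothetical, and some evaluations instead use interpolated precision (the maximum precision at any recall at or above the target), which can give a different number.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float) -> float:
    """Precision at the shallowest ranking cutoff whose recall reaches target_recall.

    scores: hypothetical classifier scores, higher = more likely germ cell-specific.
    labels: 1 for genes in a (hypothetical) reference set of germ cell-specific genes, else 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank genes from highest to lowest score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    for rank, idx in enumerate(order, start=1):
        true_pos += labels[idx]
        # Stop at the first cutoff where the fraction of recovered positives reaches the target.
        if true_pos / n_pos >= target_recall:
            return true_pos / rank
    # Unreachable: recall is exactly 1.0 after the last gene and target_recall <= 1.0.
    raise AssertionError("recall never reached target")


# Toy example (illustrative numbers, not data from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_recall(scores, labels, 0.5))   # 1.0 (2 of the top 2 are positives)
print(precision_at_recall(scores, labels, 0.75))  # 0.75 (3 of the top 4 are positives)
```

Under this convention, 90% precision at 10% recall would mean that, among the top-ranked genes needed to recover one tenth of the reference spermatocyte genes, nine in ten are themselves reference spermatocyte genes.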

journal_name: BMC Bioinformatics
journal_title: BMC bioinformatics
authors: Li Y,Ray D,Ye P
doi: 10.1186/1471-2105-14-72
subject: Has Abstract
pub_date: 2013-02-27 00:00:00
pages: 72
issn: 1471-2105
pii: 1471-2105-14-72
journal_volume: 14
pub_type: Journal Article

  • Fast batch searching for protein homology based on compression and clustering.

    abstract:BACKGROUND:In the bioinformatics community, many tasks involve matching a set of protein query sequences in large sequence datasets. To conduct multiple queries in the database, a commonly used method is to run BLAST on each original query or on the concatenated queries. This is inefficient since it doesn't exploit the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1938-8

    authors: Ge H,Sun L,Yu J

    updated:2017-11-21 00:00:00

  • Unsupervised fuzzy pattern discovery in gene expression data.

    abstract:BACKGROUND:Discovering patterns from gene expression levels is regarded as a classification problem when tissue classes of the samples are given and solved as a discrete-data problem by discretizing the expression levels of each gene into intervals maximizing the interdependence between that gene and the class labels. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S5-S5

    authors: Wu GP,Chan KC,Wong AK

    updated:2011-01-01 00:00:00

  • Quick, "imputation-free" meta-analysis with proxy-SNPs.

    abstract:BACKGROUND:Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) verify results. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-231

    authors: Meesters C,Leber M,Herold C,Angisch M,Mattheisen M,Drichel D,Lacour A,Becker T

    updated:2012-09-12 00:00:00

  • Filling out the structural map of the NTF2-like superfamily.

    abstract:BACKGROUND:The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-ca...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-327

    authors: Eberhardt RY,Chang Y,Bateman A,Murzin AG,Axelrod HL,Hwang WC,Aravind L

    updated:2013-11-19 00:00:00

  • SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes.

    abstract:BACKGROUND:Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1437-3

    authors: Mägi R,Suleimanov YV,Clarke GM,Kaakinen M,Fischer K,Prokopenko I,Morris AP

    updated:2017-01-11 00:00:00

  • Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest.

    abstract:BACKGROUND:In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure. The width of the Gaus...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0526-z

    authors: Lee J,Lee K,Joung I,Joo K,Brooks BR,Lee J

    updated:2015-03-21 00:00:00

  • Inferring topology from clustering coefficients in protein-protein interaction networks.

    abstract:BACKGROUND:Although protein-protein interaction networks determined with high-throughput methods are incomplete, they are commonly used to infer the topology of the complete interactome. These partial networks often show a scale-free behavior with only a few proteins having many and the majority having only a few conne...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-519

    authors: Friedel CC,Zimmer R

    updated:2006-11-30 00:00:00

  • Ab-origin: an enhanced tool to identify the sourcing gene segments in germline for rearranged antibodies.

    abstract:BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S12-S20

    authors: Wang X,Wu D,Zheng S,Sun J,Tao L,Li Y,Cao Z

    updated:2008-12-12 00:00:00

  • Scuba: scalable kernel-based gene prioritization.

    abstract:BACKGROUND:The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2025-5

    authors: Zampieri G,Tran DV,Donini M,Navarin N,Aiolli F,Sperduti A,Valle G

    updated:2018-01-25 00:00:00

  • Novel computational analysis of protein binding array data identifies direct targets of Nkx2.2 in the pancreas.

    abstract:BACKGROUND:The creation of a complete genome-wide map of transcription factor binding sites is essential for understanding gene regulatory networks in vivo. However, current prediction methods generally rely on statistical models that imperfectly model transcription factor binding. Generation of new prediction methods ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-62

    authors: Hill JT,Anderson KR,Mastracci TL,Kaestner KH,Sussel L

    updated:2011-02-25 00:00:00

  • GenomeBlast: a web tool for small genome comparison.

    abstract:BACKGROUND:Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. There are many tools available for genome comparisons. Unfortunately, most of them are not applicable for the identification of unique genes and the inferen...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-S4-S18

    authors: Lu G,Jiang L,Helikar RM,Rowley TW,Zhang L,Chen X,Moriyama EN

    updated:2006-12-12 00:00:00

  • Discovering biological connections between experimental conditions based on common patterns of differential gene expression.

    abstract:BACKGROUND:Identifying similarities between patterns of differential gene expression provides an opportunity to identify similarities between the experimental and biological conditions that give rise to these gene expression alterations. The growing volume of gene expression data in open data repositories such as the N...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-381

    authors: Gower AC,Spira A,Lenburg ME

    updated:2011-09-27 00:00:00

  • Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees.

    abstract:BACKGROUND:Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms of artificial neural networks (ANN), decision trees (DT) and logistic re...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-100

    authors: Chou HL,Yao CT,Su SL,Lee CY,Hu KY,Terng HJ,Shih YW,Chang YT,Lu YF,Chang CW,Wahlqvist ML,Wetter T,Chu CM

    updated:2013-03-19 00:00:00

  • A multiobjective approach to the genetic code adaptability problem.

    abstract:BACKGROUND:The organization of the canonical code has intrigued researchers since it was first described. If we consider all codes mapping the 64 codons into 20 amino acids and one stop codon, there are more than 1.51×10^84 possible genetic codes. The main question related to the organization of the genetic code is why ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0480-9

    authors: de Oliveira LL,de Oliveira PS,Tinós R

    updated:2015-02-19 00:00:00

  • Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets.

    abstract:BACKGROUND:Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology all...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-241

    authors: Aubry M,Monnier A,Chicault C,de Tayrac M,Galibert MD,Burgun A,Mosser J

    updated:2006-05-04 00:00:00

  • Automatic localization and identification of mitochondria in cellular electron cryo-tomography using faster-RCNN.

    abstract:BACKGROUND:Cryo-electron tomography (cryo-ET) enables the 3D visualization of cellular organization in near-native state which plays important roles in the field of structural cell biology. However, due to the low signal-to-noise ratio (SNR), large volume and high content complexity within cells, it remains difficult a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2650-7

    authors: Li R,Zeng X,Sigmund SE,Lin R,Zhou B,Liu C,Wang K,Jiang R,Freyberg Z,Lv H,Xu M

    updated:2019-03-29 00:00:00

  • Building blocks for automated elucidation of metabolites: natural product-likeness for candidate ranking.

    abstract:BACKGROUND:In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-234

    authors: Jayaseelan KV,Steinbeck C

    updated:2014-07-05 00:00:00

  • Predikin and PredikinDB: a computational framework for the prediction of protein kinase peptide specificity and an associated database of phosphorylation sites.

    abstract:BACKGROUND:We have previously described an approach to predicting the substrate specificity of serine-threonine protein kinases. The method, named Predikin, identifies key conserved substrate-determining residues in the kinase catalytic domain that contact the substrate in the region of the phosphorylation site and so ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-245

    authors: Saunders NF,Brinkworth RI,Huber T,Kemp BE,Kobe B

    updated:2008-05-26 00:00:00

  • A MATLAB tool for pathway enrichment using a topology-based pathway regulation score.

    abstract:BACKGROUND:Handling the vast amount of gene expression data generated by genome-wide transcriptional profiling techniques is a challenging task, demanding an informed combination of pre-processing, filtering and analysis methods if meaningful biological conclusions are to be drawn. For example, a range of traditional s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0358-2

    authors: Ibrahim M,Jassim S,Cawthorne MA,Langlands K

    updated:2014-11-04 00:00:00

  • GSV: a web-based genome synteny viewer for customized data.

    abstract:BACKGROUND:The analysis of genome synteny is a common practice in comparative genomics. With the advent of DNA sequencing technologies, individual biologists can rapidly produce their genomic sequences of interest. Although web-based synteny visualization tools are convenient for biologists to use, none of the existing...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-316

    authors: Revanna KV,Chiu CC,Bierschank E,Dong Q

    updated:2011-08-02 00:00:00

  • Robust joint analysis allowing for model uncertainty in two-stage genetic association studies.

    abstract:BACKGROUND:The cost-efficient two-stage design is often used in genome-wide association studies (GWASs) in searching for genetic loci underlying the susceptibility for complex diseases. Replication-based analysis, which considers data from each stage separately, often suffers from loss of efficiency. Joint test that co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-9

    authors: Pan D,Li Q,Jiang N,Liu A,Yu K

    updated:2011-01-07 00:00:00

  • Effective automated pipeline for 3D reconstruction of synapses based on deep learning.

    abstract:BACKGROUND:The locations and shapes of synapses are important in reconstructing connectomes and analyzing synaptic plasticity. However, current synapse detection and segmentation methods are still not adequate for accurately acquiring the synaptic connectivity, and they cannot effectively alleviate the burden of synaps...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2232-0

    authors: Xiao C,Li W,Deng H,Chen X,Yang Y,Xie Q,Han H

    updated:2018-07-13 00:00:00

  • A weighted string kernel for protein fold recognition.

    abstract:BACKGROUND:Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little simila...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1795-5

    authors: Nojoomi S,Koehl P

    updated:2017-08-25 00:00:00

  • SplicerAV: a tool for mining microarray expression data for changes in RNA processing.

    abstract:BACKGROUND:Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but h...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-108

    authors: Robinson TJ,Dinan MA,Dewhirst M,Garcia-Blanco MA,Pearson JL

    updated:2010-02-25 00:00:00

  • Can Zipf's law be adapted to normalize microarrays?

    abstract:BACKGROUND:Normalization is the process of removing non-biological sources of variation between array experiments. Recent investigations of data in gene expression databases for varying organisms and tissues have shown that the majority of expressed genes exhibit a power-law distribution with an exponent close to -1 (i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-37

    authors: Lu T,Costello CM,Croucher PJ,Häsler R,Deuschl G,Schreiber S

    updated:2005-02-23 00:00:00

  • Normalized N50 assembly metric using gap-restricted co-linear chaining.

    abstract:BACKGROUND:For the development of genome assembly tools, some comprehensive and efficiently computable validation measures are required to assess the quality of the assembly. The most widely used N50 measure summarizes the assembly results by the length of the scaffold (or contig) overlapping the midpoint of the length-orde...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-255

    authors: Mäkinen V,Salmela L,Ylinen J

    updated:2012-10-03 00:00:00

  • RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

    abstract:BACKGROUND:RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-323

    authors: Li B,Dewey CN

    updated:2011-08-04 00:00:00

  • ProLego: tool for extracting and visualizing topological modules in protein structures.

    abstract:BACKGROUND:In protein design, correct use of topology is among the initial and most critical features. Meticulous selection of backbone topology aids in drastically reducing the structure search space. With ProLego, we present a server application to explore the component aspect of protein structures and provide an intu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2171-9

    authors: Khan T,Panday SK,Ghosh I

    updated:2018-05-04 00:00:00

  • Deconvolution of gene expression from cell populations across the C. elegans lineage.

    abstract:BACKGROUND:Knowledge of when and in which cells each gene is expressed across multicellular organisms is critical in understanding both gene function and regulation of cell type diversity. However, methods for measuring expression typically involve a trade-off between imaging-based methods, which give the precise locat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-204

    authors: Burdick JT,Murray JI

    updated:2013-06-22 00:00:00

  • PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling.

    abstract:Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data is used for numerous applications including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S5-S9

    authors: Park DS,Baran Y,Hormozdiari F,Eng C,Torgerson DG,Burchard EG,Zaitlen N

    updated:2015-01-01 00:00:00