Abstract:
BACKGROUND: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomic and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas (TCGA), a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly next-generation sequencing, for more than 30 cancer types. RESULTS: We propose TCGA2BED, a software tool that searches and retrieves TCGA data and converts them into the structured BED format for seamless use and integration. It also supports conversion into the CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1, V2) experimental data of TCGA converted into the BED format, together with their associated clinical and biospecimen metadata in attribute-value text format. CONCLUSIONS: The availability of the valuable TCGA data in BED format reduces the time needed to take advantage of them: it becomes possible to deal efficiently and effectively with huge amounts of cancer genomic data in an integrated way, and to search, retrieve, and extend them with additional information. The BED format helps investigators perform knowledge discovery analyses on all tumor types in TCGA, with the final aim of understanding pathological mechanisms and aiding cancer treatments.
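As a rough illustration of what conversion into BED involves, the following Python sketch formats a single genomic region as a six-column, tab-separated BED line. It is not TCGA2BED's code: the function name `to_bed_line`, the assumption that input coordinates are 1-based and inclusive, and the sample values are all hypothetical.

```python
def to_bed_line(chrom, start, end, name=".", score=0, strand="."):
    """Format one genomic region as a tab-separated BED6 line.

    Assumption (hypothetical input convention): `start` and `end` are
    1-based, inclusive coordinates. BED uses a 0-based start and an
    exclusive end, so only the start is shifted by one.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Use the "chr" prefix for chromosome names if it is missing.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    fields = [chrom, str(start - 1), str(end), name, str(score), strand]
    return "\t".join(fields)


if __name__ == "__main__":
    # Illustrative values only, not taken from TCGA.
    print(to_bed_line("1", 100, 200, "geneA", 0, "+"))
```

Keeping the coordinate shift in one small function makes the convention explicit, which matters when regions from different sources are merged.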
journal_name: BMC Bioinformatics
journal_title: BMC bioinformatics
authors: Cumbo F, Fiscon G, Ceri S, Masseroli M, Weitschek E
doi: 10.1186/s12859-016-1419-5
subject: Has Abstract
pub_date: 2017-01-03 00:00:00
pages: 6
issue: 1
issn: 1471-2105
pii: 10.1186/s12859-016-1419-5
journal_volume: 18
pub_type: Journal Article

abstract:BACKGROUND:Many proteins contain conserved sequence patterns (motifs) that contribute to their functionality. The process of experimentally identifying and validating novel protein motifs can be difficult, expensive, and time consuming. A means for helping to identify in advance the possible function of a novel motif i...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-379
update_date:2011-09-26 00:00:00
abstract:BACKGROUND:Study on long non-coding RNAs (lncRNAs) has been promoted by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from the RNA-Seq data and it remains a challenge to uncover their functions. RESULTS:We present a computational pipeline for detecting novel lncRNAs fro...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-331
update_date:2012-12-13 00:00:00
abstract:BACKGROUND:Typical evolutionary events like recombination, hybridization or gene transfer make necessary the use of phylogenetic networks to properly depict the evolution of DNA and protein sequences. Although several theoretical classes have been proposed to characterize these networks, they make stringent assumptions...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-268
update_date:2010-05-20 00:00:00
abstract:BACKGROUND:Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-287
update_date:2008-06-18 00:00:00
abstract:BACKGROUND:Bacterial pan-genomes, comprised of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers' ability to...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2250-y
update_date:2018-06-27 00:00:00
abstract:BACKGROUND:Regulation of gene expression, protein synthesis, replication and assembly of many viruses involve RNA-protein interactions. Although some successful computational tools have been reported to recognize RNA binding sites in proteins, the problem of specificity remains poorly investigated. After the nucleotide...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-S13-S5
update_date:2011-01-01 00:00:00
abstract:BACKGROUND:Computer-aided segmentation and border detection in dermoscopic images is one of the core components of diagnostic procedures and therapeutic interventions for skin cancer. Automated assessment tools for dermoscopy images have become an important research field mainly because of inter- and intra-observer var...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-S6-S26
update_date:2010-10-07 00:00:00
abstract:BACKGROUND:Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis. RESULTS:In thi...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-15-252
update_date:2014-07-25 00:00:00
abstract:BACKGROUND:The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been im...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-419
update_date:2008-10-07 00:00:00
abstract:BACKGROUND:A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to a...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-S1-S11
update_date:2009-01-30 00:00:00
abstract:BACKGROUND:Cryo-electron microscopy (Cryo-EM) is widely used in the determination of the three-dimensional (3D) structures of macromolecules. Particle picking from 2D micrographs remains a challenging early step in the Cryo-EM pipeline due to the diversity of particle shapes and the extremely low signal-to-noise ratio ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03809-7
update_date:2020-11-09 00:00:00
abstract:BACKGROUND:Protein-DNA interactions are important for many cellular processes, however structural knowledge for a large fraction of known and putative complexes is still lacking. Computational docking methods aim at the prediction of complex architecture given detailed structures of its constituents. They are becoming ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-228
update_date:2012-09-11 00:00:00
abstract:BACKGROUND:Virtual screening methods start to be well established as effective approaches to identify hits, candidates and leads for drug discovery research. Among those, structure based virtual screening (SBVS) approaches aim at docking collections of small compounds in the target structure to identify potent compound...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-168
update_date:2009-06-02 00:00:00
abstract:BACKGROUND:There are a number of different methods for generation of trees and algorithms for phylogenetic analysis in the study of bacterial taxonomy. Genotypic information, such as SSU rRNA gene sequences, now plays a more prominent role in microbial systematics than does phenotypic information. However, the integrat...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-281
update_date:2007-08-02 00:00:00
abstract:BACKGROUND:Improvements in technology have been accompanied by the generation of large amounts of complex data. This same technology must be harnessed effectively if the knowledge stored within the data is to be retrieved. Storing data in ontologies aids its management; ontologies serve as controlled vocabularies that ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-6-74
update_date:2005-03-24 00:00:00
abstract:BACKGROUND:Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes which consists of detecting genes' position and structure, and inferring their function (as well as of other features of genomes). Structural and functional annotation both require the complex...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-6-198
update_date:2005-08-05 00:00:00
abstract:BACKGROUND:The latest generations of Single Nucleotide Polymorphism (SNP) arrays allow the study of copy-number variations in addition to genotyping measures. RESULTS:MPAgenomics, standing for multi-patient analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation and (ii) selection of genomic ma...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-014-0394-y
update_date:2014-12-14 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs that function as post-transcriptional regulators of messenger RNA (mRNA) through base-pairing to 6-8 nucleotide long target sites, usually located within the mRNA 3' untranslated region. A common approach to validate and probe microRNA-mRNA interact...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1057-y
update_date:2016-04-27 00:00:00
abstract:BACKGROUND:Despite the widespread use of high throughput expression platforms and the availability of a desktop implementation of Gene Set Enrichment Analysis (GSEA) that enables non-experts to perform gene set based analyses, the availability of the necessary precompiled gene sets is rare for species other than human....
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-15-146
update_date:2014-05-17 00:00:00
abstract:BACKGROUND:To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive view of these regula...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-67
update_date:2011-03-04 00:00:00
abstract:BACKGROUND:This paper proposes an optimization approach for producing reduced alphabets for peptide classification, using a Genetic Algorithm. The classification task is performed by a multi-classifier system where each classifier (Linear or Radial Basis function Support Vector Machines) is trained using feat...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-45
update_date:2008-01-24 00:00:00
abstract:BACKGROUND:Biological networks are representative of the diverse molecular interactions that occur within cells. Some of the commonly studied biological networks are modeled through protein-protein interactions, gene regulatory, and metabolic pathways. Among these, metabolic networks are probably the most studied, as t...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03564-9
update_date:2020-08-21 00:00:00
abstract:BACKGROUND:Proteins are comprised of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determina...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-7-277
update_date:2006-06-02 00:00:00
abstract:BACKGROUND:Protein crystal structures are potentially over-interpreted since they are routinely refined without any restraint on the upper limit of atomic B-factors. Consequently, some of their atoms, undetected in the electron density maps, are allowed to reach extremely large B-factors, even above 100 square Angstrom...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2083-8
update_date:2018-02-23 00:00:00
abstract:BACKGROUND:Two-dimensional data needs to be processed and analysed in almost any experimental laboratory. Some tasks in this context may be performed with generic software such as spreadsheet programs which are available ubiquitously, others may require more specialised software that requires paid licences. Additionall...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-201
update_date:2012-08-14 00:00:00
abstract:BACKGROUND:Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared statistical performance of three methods, namely Global Tes...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-431
update_date:2007-11-07 00:00:00
abstract:BACKGROUND:Multidimensional protein identification technology (MudPIT)-based shot-gun proteomics has been proven to be an effective platform for functional proteomics. In particular, the various sample preparation methods and bioinformatics tools can be integrated to improve the proteomics platform for applications lik...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-S15-S8
update_date:2012-01-01 00:00:00
abstract:BACKGROUND:High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of g...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-14
update_date:2007-01-18 00:00:00
abstract:BACKGROUND:In recent years, substantial effort has been applied to de novo regulatory motif discovery. At this time, more than 150 software tools exist to detect regulatory binding sites given a set of genomic sequences. As the number of software packages increases, it becomes more important to identify the tools with ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-S9-S6
update_date:2008-08-12 00:00:00
abstract:BACKGROUND:In current comparative proteomics studies, the large number of images generated by 2D gels is currently compared using spot matching algorithms. Unfortunately, differences in gel migration and sample variability make efficient spot alignment very difficult to obtain, and, as consequence most of the software ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-460
update_date:2008-10-28 00:00:00