Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers.

Abstract:

BACKGROUND: Supercomputers have become indispensable infrastructure in science and industry. In particular, most state-of-the-art scientific results rely on massively parallel supercomputers ranked in the TOP500. However, their use is still limited in bioinformatics because they do not provide the asynchronous parallel processing service of Grid Engine. To encourage the use of massively parallel supercomputers in bioinformatics, we developed middleware called Virtual Grid Engine (VGE), which enables software pipelines to automatically perform their tasks as MPI programs. RESULTS: We conducted basic tests to measure the time VGE requires to assign jobs to workers. The results showed that the overhead of the employed algorithm was 246 microseconds and that the software can manage thousands of jobs smoothly on the K computer. We also ran a practical bioinformatics test comprising two tasks: splitting the input FASTQ data and aligning it with BWA. This calculation used 25,055 nodes (200,440 cores) and finished in three hours. CONCLUSION: We identified four important requirements for this kind of software: a non-privileged server program, multiple job handling, dependency control, and usability. We carefully designed the software around these requirements and verified each of them. The software fulfilled all four and achieved good performance in a large-scale analysis.

journal_name

BMC Bioinformatics

authors

Ito S,Yadome M,Nishiki T,Ishiduki S,Inoue H,Yamaguchi R,Miyano S

doi

10.1186/s12859-019-3085-x

pub_date

2019-12-02 00:00:00

pages

591

issue

Suppl 16

issn

1471-2105

pii

10.1186/s12859-019-3085-x

journal_volume

20

pub_type

Journal Article
  • The IronChip evaluation package: a package of perl modules for robust analysis of custom microarrays.

    abstract:BACKGROUND:Gene expression studies greatly contribute to our understanding of complex relationships in gene regulatory networks. However, the complexity of array design, production and manipulations are limiting factors, affecting data quality. The use of customized DNA microarrays improves overall data quality in many...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-112

    authors: Vainshtein Y,Sanchez M,Brazma A,Hentze MW,Dandekar T,Muckenthaler MU

    update_date:2010-03-01 00:00:00

  • Enrichment of homologs in insignificant BLAST hits by co-complex network alignment.

    abstract:BACKGROUND:Homology is a crucial concept in comparative genomics. The algorithm probably most widely used for homology detection in comparative genomics, is BLAST. Usually a stringent score cutoff is applied to distinguish putative homologs from possible false positive hits. As a consequence, some BLAST hits are discar...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-86

    authors: Fokkens L,Botelho SM,Boekhorst J,Snel B

    update_date:2010-02-12 00:00:00

  • Toward an interactive article: integrating journals and biological databases.

    abstract:BACKGROUND:Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-175

    authors: Rangarajan A,Schedl T,Yook K,Chan J,Haenel S,Otis L,Faelten S,DePellegrin-Connelly T,Isaacson R,Skrzypek MS,Marygold SJ,Stefancsik R,Cherry JM,Sternberg PW,Müller HM

    update_date:2011-05-19 00:00:00

  • Graph-representation of oxidative folding pathways.

    abstract:BACKGROUND:The process of oxidative folding combines the formation of native disulfide bond with conformational folding resulting in the native three-dimensional fold. Oxidative folding pathways can be described in terms of disulfide intermediate species (DIS) which can also be isolated and characterized. Each DIS corr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-19

    authors: Agoston V,Cemazar M,Kaján L,Pongor S

    update_date:2005-01-27 00:00:00

  • GeneBins: a database for classifying gene expression data, with application to plant genome arrays.

    abstract:BACKGROUND:To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. RESULTS:We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-87

    authors: Goffard N,Weiller G

    update_date:2007-03-12 00:00:00

  • Survival Online: a web-based service for the analysis of correlations between gene expression and clinical and follow-up data.

    abstract:BACKGROUND:Complex microarray gene expression datasets can be used for many independent analyses and are particularly interesting for the validation of potential biomarkers and multi-gene classifiers. This article presents a novel method to perform correlations between microarray gene expression data and clinico-pathol...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S12-S10

    authors: Corradi L,Mirisola V,Porro I,Torterolo L,Fato M,Romano P,Pfeffer U

    update_date:2009-10-15 00:00:00

  • GenNon-h: generating multiple sequence alignments on nonhomogeneous phylogenetic trees.

    abstract:BACKGROUND:A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software restricts ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-216

    authors: Kedzierska AM,Casanellas M

    update_date:2012-08-28 00:00:00

  • SNP and gene networks construction and analysis from classification of copy number variations data.

    abstract:BACKGROUND:Detection of genomic DNA copy number variations (CNVs) can provide a complete and more comprehensive view of human disease. It is interesting to identify and represent relevant CNVs from a genome-wide data due to high data volume and the complexity of interactions. RESULTS:In this paper, we incorporate the ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S5-S4

    authors: Liu Y,Lee YF,Ng MK

    update_date:2011-01-01 00:00:00

  • Filling out the structural map of the NTF2-like superfamily.

    abstract:BACKGROUND:The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-ca...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-327

    authors: Eberhardt RY,Chang Y,Bateman A,Murzin AG,Axelrod HL,Hwang WC,Aravind L

    update_date:2013-11-19 00:00:00

  • Predicting and improving the protein sequence alignment quality by support vector regression.

    abstract:BACKGROUND:For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significant...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-471

    authors: Lee M,Jeong CS,Kim D

    update_date:2007-12-03 00:00:00

  • The tumor as an organ: comprehensive spatial and temporal modeling of the tumor and its microenvironment.

    abstract:BACKGROUND:Research related to cancer is vast, and continues in earnest in many directions. Due to the complexity of cancer, a better understanding of tumor growth dynamics can be gleaned from a dynamic computational model. We present a comprehensive, fully executable, spatial and temporal 3D computational model of the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1168-5

    authors: Bloch N,Harel D

    update_date:2016-08-24 00:00:00

  • From SNPs to pathways: integration of functional effect of sequence variations on models of cell signalling pathways.

    abstract:BACKGROUND:Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S8-S6

    authors: Bauer-Mehren A,Furlong LI,Rautschka M,Sanz F

    update_date:2009-08-27 00:00:00

  • DePicT Melanoma Deep-CLASS: a deep convolutional neural networks approach to classify skin lesion images.

    abstract:BACKGROUND:Melanoma results in the vast majority of skin cancer deaths during the last decades, even though this disease accounts for only one percent of all skin cancers' instances. The survival rates of melanoma from early to terminal stages is more than fifty percent. Therefore, having the right information at the r...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3351-y

    authors: Nasiri S,Helsper J,Jung M,Fathi M

    update_date:2020-03-11 00:00:00

  • The Korean Bird Information System (KBIS) through open and public participation.

    abstract:BACKGROUND:The importance of biodiversity conservation has been increasing steadily due to its benefits to human beings. Recently, producing and managing biodiversity databases have become much easier because of the information technology (IT) advancement. This made the general public's participation in biodiversity co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S15-S11

    authors: Paik IH,Lim J,Chun BS,Jin SD,Yu JP,Lee JW,Bhak J,Paek WK

    update_date:2009-12-03 00:00:00

  • CONSTAX: a tool for improved taxonomic resolution of environmental fungal ITS sequences.

    abstract:BACKGROUND:One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1952-x

    authors: Gdanetz K,Benucci GMN,Vande Pol N,Bonito G

    update_date:2017-12-06 00:00:00

  • Reverse engineering gene regulatory networks: coupling an optimization algorithm with a parameter identification technique.

    abstract:BACKGROUND:To infer gene regulatory networks from time series gene profiles, two important tasks that are related to biological systems must be undertaken. One task is to determine a valid network structure that has topological properties that can influence the network dynamics profoundly. The other task is to optimize...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S15-S8

    authors: Hsiao YT,Lee WP

    update_date:2014-01-01 00:00:00

  • SpectralNET--an application for spectral graph analysis and visualization.

    abstract:BACKGROUND:Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-260

    authors: Forman JJ,Clemons PA,Schreiber SL,Haggarty SJ

    update_date:2005-10-19 00:00:00

  • libgapmis: extending short-read alignments.

    abstract:BACKGROUND:A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S11-S4

    authors: Alachiotis N,Berger S,Flouri T,Pissis SP,Stamatakis A

    update_date:2013-01-01 00:00:00

  • A database and API for variation, dense genotyping and resequencing data.

    abstract:BACKGROUND:Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and m...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-238

    authors: Rios D,McLaren WM,Chen Y,Birney E,Stabenau A,Flicek P,Cunningham F

    update_date:2010-05-11 00:00:00

  • An evidence-based approach to identify aging-related genes in Caenorhabditis elegans.

    abstract:BACKGROUND:Extensive studies have been carried out on Caenorhabditis elegans as a model organism to elucidate mechanisms of aging and the effects of perturbing known aging-related genes on lifespan and behavior. This research has generated large amounts of experimental data that is increasingly difficult to integrate a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0469-4

    authors: Callahan A,Cifuentes JJ,Dumontier M

    update_date:2015-02-07 00:00:00

  • Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection.

    abstract:BACKGROUND:When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done fit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1149-8

    authors: Mayr A,Hofner B,Schmid M

    update_date:2016-07-22 00:00:00

  • Model based heritability scores for high-throughput sequencing data.

    abstract:BACKGROUND:Heritability of a phenotypic or molecular trait measures the proportion of variance that is attributable to genotypic variance. It is an important concept in breeding and genetics. Few methods are available for calculating heritability for traits derived from high-throughput sequencing. RESULTS:We propose s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1539-6

    authors: Rudra P,Shi WJ,Vestal B,Russell PH,Odell A,Dowell RD,Radcliffe RA,Saba LM,Kechris K

    update_date:2017-03-02 00:00:00

  • PoGO: Prediction of Gene Ontology terms for fungal proteins.

    abstract:BACKGROUND:Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained only to one species, are not a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-215

    authors: Jung J,Yi G,Sukno SA,Thon MR

    update_date:2010-04-29 00:00:00

  • Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    abstract:BACKGROUND:In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multip...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1273-5

    authors: Voillet V,Besse P,Liaubet L,San Cristobal M,González I

    update_date:2016-10-03 00:00:00

  • Amino acid sequence associated with bacteriophage recombination site helps to reveal genes potentially acquired through horizontal gene transfer.

    abstract:BACKGROUND:Horizontal gene transfer, i.e. the acquisition of genetic material from nonparent organism, is considered an important force driving species evolution. Many cases of horizontal gene transfer from prokaryotes to eukaryotes have been registered, but no transfer mechanism has been deciphered so far, although vi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03599-y

    authors: Daugavet MA,Shabelnikov SV,Podgornaya OI

    update_date:2020-07-24 00:00:00

  • R2R--software to speed the depiction of aesthetic consensus RNA secondary structures.

    abstract:BACKGROUND:With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-3

    authors: Weinberg Z,Breaker RR

    update_date:2011-01-04 00:00:00

  • CollapsABEL: an R library for detecting compound heterozygote alleles in genome-wide association studies.

    abstract:BACKGROUND:Compound Heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for an essential proportion of the missing heritability, i.e. heritability of phenotypes so far not accounted for by single genetic vari...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1006-9

    authors: Zhong K,Karssen LC,Kayser M,Liu F

    update_date:2016-04-08 00:00:00

  • Evaluation of high-throughput functional categorization of human disease genes.

    abstract:BACKGROUND:Biological data that are well-organized by an ontology, such as Gene Ontology, enables high-throughput availability of the semantic web. It can also be used to facilitate high throughput classification of biomedical information. However, to our knowledge, no evaluation has been published on automating classi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S3-S7

    authors: Chen JL,Liu Y,Sam LT,Li J,Lussier YA

    update_date:2007-05-09 00:00:00

  • Combining sequence and network information to enhance protein-protein interaction prediction.

    abstract:BACKGROUND:Protein-protein interactions (PPIs) are of great importance in cellular systems of organisms, since they are the basis of cellular structure and function and many essential cellular processes are related to that. Most proteins perform their functions by interacting with other proteins, so predicting PPIs acc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03896-6

    authors: Liu L,Zhu X,Ma Y,Piao H,Yang Y,Hao X,Fu Y,Wang L,Peng J

    update_date:2020-12-16 00:00:00

  • Incorporating biological information in sparse principal component analysis with application to genomic data.

    abstract:BACKGROUND:Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often repre...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1740-7

    authors: Li Z,Safo SE,Long Q

    update_date:2017-07-11 00:00:00