Statistical analysis of unlabeled point sets: comparing molecules in chemoinformatics.

Abstract:

We consider Bayesian methodology for comparing two or more unlabeled point sets. Application of the technique to a set of steroid molecules illustrates its potential utility involving the comparison of molecules in chemoinformatics and bioinformatics. We initially match a pair of molecules, where one molecule is regarded as random and the other fixed. A type of mixture model is proposed for the point set coordinates, and the parameters of the distribution are a labeling matrix (indicating which pairs of points match) and a concentration parameter. An important property of the likelihood is that it is invariant under rotations and translations of the data. Bayesian inference for the parameters is carried out using Markov chain Monte Carlo simulation, and it is demonstrated that the procedure works well on the steroid data. The posterior distribution is difficult to simulate from, due to multiple local modes, and we also use additional data (partial charges on atoms) to help with this task. An approximation is considered for speeding up the simulation algorithm, and the approximating fast algorithm leads to essentially identical inference to that under the exact method for our data. Extensions to multiple molecule alignment are also introduced, and an algorithm is described which also works well on the steroid data set. After all the steroid molecules have been matched, exploratory data analysis is carried out to examine which molecules are similar. Also, further Bayesian inference for the multiple alignment problem is considered.

journal_name

Biometrics

journal_title

Biometrics

authors

Dryden IL,Hirst JD,Melville JL

doi

10.1111/j.1541-0420.2006.00622.x

subject

Has Abstract

pub_date

2007-03-01 00:00:00

pages

237-51

issue

1

eissn

1541-0420

issn

0006-341X

pii

BIOM622

journal_volume

63

pub_type

Journal Article
  • Interval estimation of the risk ratio between a secondary infection, given a primary infection, and the primary infection.

    abstract::This paper discusses interval estimation of the risk ratio (RR) between a secondary infection, given a primary infection, and the primary infection. Three asymptotic closed-form interval estimators are developed using Wald's test statistic, the logarithmic transformation, and Fieller's theorem. The performance of thes...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lui KJ

    update_date:1998-06-01 00:00:00

  • A comparison of several point estimators of the odds ratio in a single 2 x 2 contingency table.

    abstract::The relative performance of the unconditioned maximum likelihood estimators (UMLEs), conditional MLEs (CMLEs), and Jewell-type estimators of the odds ratio (OR) and its logarithm were investigated in sets of single 2 x 2 contingency tables. The tables were generated by complete enumeration of all possible cell frequen...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Walter SD,Cook RJ

    update_date:1991-09-01 00:00:00

  • Numerical discretization-based estimation methods for ordinary differential equation models via penalized spline smoothing with applications in biomedical research.

    abstract::Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space....

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2012.01752.x

    authors: Wu H,Xue H,Kumar A

    update_date:2012-06-01 00:00:00

  • Unbalanced regression analysis with residuals having a covariance structure of intra-class form.

    abstract::Let Yi be an ni X 1 vector of observations, Xi an ni X p matrix of known values, and beta an unknown p X 1 vector, with the structure Yi = Xi beta + epsilon i, where the covariance matrix of epsilon i is of intra-class form, that is Cov (epsilon i) = sigma^2[(1 - rho) Ii + rho e i e i'] where Ii is the ni X ni identity matrix ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Wiorkowski JJ

    update_date:1975-09-01 00:00:00

  • Criteria for the validation of surrogate endpoints in randomized experiments.

    abstract::The validation of surrogate endpoints has been studied by Prentice (1989, Statistics in Medicine 8, 431-440) and Freedman, Graubard, and Schatzkin (1992, Statistics in Medicine 11, 167-178). We extended their proposals in the cases where the surrogate and the final endpoints are both binary or normally distributed. Le...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Buyse M,Molenberghs G

    update_date:1998-09-01 00:00:00

  • A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    abstract::Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable a...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12007

    authors: Schörgendorfer A,Branscum AJ,Hanson TE

    update_date:2013-06-01 00:00:00

  • Doubly robust estimator for net survival rate in analyses of cancer registry data.

    abstract::Cancer population studies based on cancer registry databases are widely conducted to address various research questions. In general, cancer registry databases do not collect information on cause of death. The net survival rate is defined as the survival rate if a subject would not die for any causes other than cancer....

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12568

    authors: Komukai S,Hattori S

    update_date:2017-03-01 00:00:00

  • Further aspects of a Markovian sampling policy for water quality monitoring.

    abstract::In this paper, a Markov process is developed as a mathematical model to study the general problem of quality control monitoring. This approach was previously used by Arnold (1970) in development of sampling plans to study the water quality monitoring of streams. Arnold considered the expected sample size required for ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Smeach SC,Jernigan RW

    update_date:1977-03-01 00:00:00

  • On constrained balance randomization for clinical trials.

    abstract::A method is proposed for calculating the probabilities of assignment of a patient to treatments; it involves minimizing a quadratic criterion subject to a balance constraint. The optimal probabilities are very easy to compute. Numerical illustration is given and comparisons are drawn with the entropy-based methods of ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Titterington DM

    update_date:1983-12-01 00:00:00

  • Validity of tests under covariate-adaptive biased coin randomization and generalized linear models.

    abstract::Some covariate-adaptive randomization methods have been used in clinical trials for a long time, but little theoretical work has been done about testing hypotheses under covariate-adaptive randomization until Shao et al. (2010) who provided a theory with detailed discussion for responses under linear models. In this a...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12062

    authors: Shao J,Yu X

    update_date:2013-12-01 00:00:00

  • Testing for cubic smoothing splines under dependent data.

    abstract::In most research on smoothing splines the focus has been on estimation, while inference, especially hypothesis testing, has received less attention. By defining design matrices for fixed and random effects and the structure of the covariance matrices of random errors in an appropriate way, the cubic smoothing spline a...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01537.x

    authors: Nummi T,Pan J,Siren T,Liu K

    update_date:2011-09-01 00:00:00

  • Performance of generalized estimating equations in practical situations.

    abstract::Moment methods for analyzing repeated binary responses have been proposed by Liang and Zeger (1986, Biometrika 73, 13-22), and extended by Prentice (1988, Biometrics 44, 1033-1048). In their generalized estimating equations (GEE), both Liang and Zeger (1986) and Prentice (1988) estimate the parameters associated with ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lipsitz SR,Fitzmaurice GM,Orav EJ,Laird NM

    update_date:1994-03-01 00:00:00

  • Robustness of group testing in the estimation of proportions.

    abstract::In binomial group testing, unlike one-at-a-time testing, the test unit consists of a group of individuals, and each group is declared to be defective or nondefective. A defective group is one that is presumed to include one or more defective (e.g., infected, positive) individuals and a nondefective group to contain on...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.00231.x

    authors: Hung M,Swallow WH

    update_date:1999-03-01 00:00:00

  • Bayesian models for multivariate current status data with informative censoring.

    abstract::Multivariate current status data consist of indicators of whether each of several events occurs by the time of a single examination. Our interest focuses on inferences about the joint distribution of the event times. Conventional methods for analysis of multiple event-time data cannot be used because all of the event ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2002.00079.x

    authors: Dunson DB,Dinse GE

    update_date:2002-03-01 00:00:00

  • Stability of population growth determined by 2 X 2 Leslie matrix with density-dependent elements.

    abstract::The matrix considered contains four elements, each a function of total number. A special case for which the matrix may be appropriate is when the population may be divided into juveniles and adults, and the survival rates and fecundity are the same for all members of each group. This is true, at least approximately, f...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Cooke D,Leon JA

    update_date:1976-06-01 00:00:00

  • Heterogeneity models of disease susceptibility, with application to diabetic nephropathy.

    abstract::It is not, in general, possible to include all relevant risk factors in a model of survival or disease incidence. This heterogeneity must be accounted for in the interpretation, as it can imply otherwise unexpected results. This is illustrated by diabetic nephropathy, a serious complication experienced by some diabeti...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Hougaard P,Myglegaard P,Borch-Johnsen K

    update_date:1994-12-01 00:00:00

  • Case-control studies of gene-environment interaction: Bayesian design and analysis.

    abstract::With increasing frequency, epidemiologic studies are addressing hypotheses regarding gene-environment interaction. In many well-studied candidate genes and for standard dietary and behavioral epidemiologic exposures, there is often substantial prior information available that may be used to analyze current data as wel...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2009.01357.x

    authors: Mukherjee B,Ahn J,Gruber SB,Ghosh M,Chatterjee N

    update_date:2010-09-01 00:00:00

  • Robust inference for the stepped wedge design.

    abstract::Stepped wedge designed trials are a type of cluster-randomized study in which the intervention is introduced to each cluster in a random order over time. This design is often used to assess the effect of a new intervention as it is rolled out across a series of clinics or communities. Based on a permutation argument, ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13106

    authors: Hughes JP,Heagerty PJ,Xia F,Ren Y

    update_date:2020-03-01 00:00:00

  • Hypothesis testing under mixture models: application to genetic linkage analysis.

    abstract::In this paper we propose a new class of statistics to test a simple hypothesis against a family of alternatives characterized by a mixture model. Unlike the likelihood ratio statistic, whose large sample distribution is still unknown in this situation, these new statistics have a simple asymptotic distribution to whic...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.00065.x

    authors: Liang KY,Rathouz PJ

    update_date:1999-03-01 00:00:00

  • The Jolly-Seber model with tag loss.

    abstract::Tag loss in mark-recapture experiments is a violation of one of the Jolly-Seber model assumptions. It causes bias in parameter estimates and has only been dealt with in an ad hoc manner. We develop methodology to estimate tag retention and abundance in double-tagging mark-recapture experiments. We apply this methodolo...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00523.x

    authors: Cowen L,Schwarz CJ

    update_date:2006-09-01 00:00:00

  • The estimation of maternal genetic variances.

    abstract::The estimation of maternal genetic variances by a multivariate maximum likelihood method is discussed. As an illustration the method is applied to data on Tribolium using a model based on partitioning the maternal genetic effect into additive and dominance components. An alternative model due to Falconer (1965) is als...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Thompson R

    update_date:1976-12-01 00:00:00

  • Bayesian inference for two-phase studies with categorical covariates.

    abstract::In this article, we consider two-phase sampling in the situation in which all covariates are categorical. Two-phase designs are appealing from an efficiency perspective since they allow sampling to be concentrated in informative cells. A number of likelihood-based methods have been developed for the analysis of two-ph...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12019

    authors: Ross M,Wakefield J

    update_date:2013-06-01 00:00:00

  • Statistical methods for classification of human chromosomes.

    abstract::The basic technical facts of human cytogenetics and the laboratory methods employed in chromosome research are explained in simple terms. The main variables used to describe chromosome images are defined and discussed. Three discriminant analysis models for chromosome classification are developed: one in which each ch...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Habbema JD

    update_date:1979-03-01 00:00:00

  • Sensitivity analysis for nonrandom dropout: a local influence approach.

    abstract::Diggle and Kenward (1994, Applied Statistics 43, 49-93) proposed a selection model for continuous longitudinal data subject to nonrandom dropout. It has provoked a large debate about the role for such models. The original enthusiasm was followed by skepticism about the strong but untestable assumptions on which this t...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.00007.x

    authors: Verbeke G,Molenberghs G,Thijs H,Lesaffre E,Kenward MG

    update_date:2001-03-01 00:00:00

  • Identification of differential aberrations in multiple-sample array CGH studies.

    abstract::Most existing methods for identifying aberrant regions with array CGH data are confined to a single target sample. Focusing on the comparison of multiple samples from two different groups, we develop a new penalized regression approach with a fused adaptive lasso penalty to accommodate the spatial dependence of the cl...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01457.x

    authors: Wang HJ,Hu J

    update_date:2011-06-01 00:00:00

  • Sequential equivalence testing and repeated confidence intervals, with applications to normal and binary responses.

    abstract::We propose group sequential tests of the equivalence of two treatments based on ideas related to repeated confidence intervals. These tests adapt readily to unpredictable group sizes, to the possibility of continuing even though a boundary has been crossed, and to nonnormal observations. In comparing two binomial dist...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Jennison C,Turnbull BW

    update_date:1993-03-01 00:00:00

  • To use or not to use? Backward equations in stochastic carcinogenesis models.

    abstract::The method based on the Kolmogorov backward equations of Little (1995, Biometrics 51, 1278-1291) for computing hazard functions for the multistage carcinogenesis models fails when model parameters are time-dependent. In addition to suggesting an alternative method based on the Kolmogorov forward equation, this note hi...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Zheng Q

    update_date:1998-03-01 00:00:00

  • Alternative estimation procedures for Pr(X less than Y) in categorized data.

    abstract::Consider two independent random variables X and Y. The functional R = Pr(X less than Y) [or gamma = Pr(X less than Y) - Pr(Y less than X)] is of practical importance in many situations, including clinical trials, genetics, and reliability. In this paper several approaches to estimation of gamma when X and Y are presen...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Simonoff JS,Hochberg Y,Reiser B

    update_date:1986-12-01 00:00:00

  • Modification of the Greenwood formula for correlated response times.

    abstract::Life-table methodology for interval-censored survival times is used to estimate marginal survival probabilities from data consisting of independent cohorts of correlated responses. We restrict our attention to situations where response times within cohorts are exchangeable and the marginal survival distributions are t...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Kang SS,Koehler KJ

    update_date:1997-09-01 00:00:00

  • Relative risk trees for censored survival data.

    abstract::A method is developed for obtaining tree-structured relative risk estimates for censored survival data. The first step of a full likelihood estimation procedure is used in a recursive partitioning algorithm that adopts most aspects of the widely used Classification and Regression Tree (CART) algorithm of Breiman et al...

    journal_title:Biometrics

    pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial

    doi:

    authors: LeBlanc M,Crowley J

    update_date:1992-06-01 00:00:00