16 results for Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration
in University of Queensland eSpace - Australia
Abstract:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
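As a rough illustration of the decision rule described here (a sketch of the general idea, not the authors' implementation), the local FDR of a gene with score z under a two-component mixture f(z) = pi0*f0(z) + (1 - pi0)*f1(z) is pi0*f0(z)/f(z); the non-null density and the threshold below are hypothetical choices.

```python
import numpy as np
from scipy.stats import norm

def local_fdr(z, pi0, f1):
    """Local FDR for each z-score under a two-component mixture
    f(z) = pi0 * f0(z) + (1 - pi0) * f1(z), with f0 the N(0, 1) null density."""
    f0 = norm.pdf(z)
    f = pi0 * f0 + (1.0 - pi0) * f1(z)
    return pi0 * f0 / f

# Hypothetical example: non-null genes assumed to follow N(2, 1.5^2)
z = np.random.default_rng(0).normal(size=1000)
fdr = local_fdr(z, pi0=0.9, f1=lambda s: norm.pdf(s, loc=2.0, scale=1.5))
declared = fdr < 0.2   # declare a gene differentially expressed if its local FDR is small
```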
Abstract:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and it can be implemented so that the implied global false discovery rate is bounded, as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative, unless it is modified according to the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for the estimation of this probability and its subsequent use in forming a decision rule. The rule can also be formed to take the false negative rate into account.
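One way to read the claim that the local false discovery rate can be implemented so that the implied global rate is bounded is to select genes in increasing order of local FDR for as long as the running average stays below the target level. The sketch below illustrates that rule under this reading; it is not the authors' code.

```python
import numpy as np

def select_by_local_fdr(fdr_local, alpha=0.05):
    """Select genes in increasing order of local FDR for as long as the running
    mean local FDR (an estimate of the global FDR of the selected set) is <= alpha."""
    order = np.argsort(fdr_local)
    running_mean = np.cumsum(fdr_local[order]) / np.arange(1, len(order) + 1)
    k = int(np.sum(running_mean <= alpha))
    return order[:k]   # indices of genes declared differentially expressed
```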
Abstract:
We investigate whether relative contributions of genetic and shared environmental factors are associated with an increased risk of melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method, in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model, in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both produce natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov chain Monte Carlo method for covariate imputation of missing data was used, and the actual implementation of the Bayesian model was based on Gibbs sampling using the freeware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.
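As a simplified sketch of the kind of subject-specific survival model described (not the GEE fit or the BUGS program used in the study), a Weibull proportional-hazards likelihood contribution with a family-level random effect might look as follows; the covariate vector, coefficients and random effect are placeholders.

```python
import numpy as np

def weibull_ph_loglik(t, event, x, beta, shape, b_family):
    """Log-likelihood contribution of one subject under a Weibull proportional-hazards
    model with a family-level random effect b_family.
    hazard: h(t) = shape * t**(shape - 1) * exp(x @ beta + b_family)
    t: age at onset or censoring; event: 1 = melanoma onset observed, 0 = censored."""
    log_hr = x @ beta + b_family
    log_hazard = np.log(shape) + (shape - 1.0) * np.log(t) + log_hr
    cum_hazard = t**shape * np.exp(log_hr)
    return event * log_hazard - cum_hazard
```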
Abstract:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question on the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
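A minimal sketch of the resampling idea, using scikit-learn's GaussianMixture rather than the authors' software: the likelihood ratio statistic for g0 versus g0 + 1 normal components is recomputed on data simulated under the fitted g0-component model to approximate its null distribution. X is assumed to be an n x p array of tissue-sample profiles.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_lrt(X, g0, n_boot=99, seed=0):
    """Parametric-bootstrap test of g0 vs g0 + 1 mixture components."""
    fit = lambda data, g: GaussianMixture(g, n_init=5, random_state=seed).fit(data)
    m0, m1 = fit(X, g0), fit(X, g0 + 1)
    lrt_obs = 2.0 * (m1.score(X) - m0.score(X)) * len(X)   # -2 log lambda
    null = []
    for _ in range(n_boot):
        Xb, _ = m0.sample(len(X))                           # simulate under H0
        null.append(2.0 * (fit(Xb, g0 + 1).score(Xb) - fit(Xb, g0).score(Xb)) * len(Xb))
    p_value = (1 + np.sum(np.array(null) >= lrt_obs)) / (n_boot + 1)
    return lrt_obs, p_value
```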
Abstract:
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of problems in pattern recognition such as image segmentation. However, the EM algorithm requires considerable computational time in its application to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be speeded up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain. (C) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
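A crude stand-in for the multiresolution kd-tree idea (not the algorithm of the paper): partition the voxels into kd-tree leaves and compute one responsibility vector per leaf centroid, weighted by the leaf count, instead of one per voxel. The leaf size and the median-split rule below are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def kd_partition(X, max_leaf=256):
    """Recursively split the data along the widest dimension at the median,
    returning (centroid, count) summaries for each kd-tree leaf."""
    if len(X) <= max_leaf:
        return [(X.mean(axis=0), len(X))]
    d = int(np.argmax(X.max(axis=0) - X.min(axis=0)))   # widest dimension
    cut = np.median(X[:, d])
    left, right = X[X[:, d] <= cut], X[X[:, d] > cut]
    if len(left) == 0 or len(right) == 0:                # degenerate split
        return [(X.mean(axis=0), len(X))]
    return kd_partition(left, max_leaf) + kd_partition(right, max_leaf)

def node_e_step(leaves, weights, means, covs):
    """Approximate E-step: one responsibility vector per leaf centroid,
    weighted by the number of voxels falling in that leaf."""
    resp, counts = [], []
    for centroid, n in leaves:
        dens = np.array([w * multivariate_normal.pdf(centroid, m, c)
                         for w, m, c in zip(weights, means, covs)])
        resp.append(dens / dens.sum())
        counts.append(n)
    return np.array(resp), np.array(counts)
```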
Abstract:
Aims: (1) To quantify the random and predictable components of variability for aminoglycoside clearance and volume of distribution; (2) to investigate models for predicting aminoglycoside clearance in patients with low serum creatinine concentrations; (3) to evaluate the predictive performance of initial dosing strategies for achieving an aminoglycoside target concentration.
Methods: Aminoglycoside demographic, dosing and concentration data were collected from 697 adult patients (≥20 years old) as part of standard clinical care using a target concentration intervention approach for dose individualization. It was assumed that aminoglycoside clearance had a renal and a nonrenal component, with the renal component being linearly related to predicted creatinine clearance.
Results: A two-compartment pharmacokinetic model best described the aminoglycoside data. The addition of weight, age, sex and serum creatinine as covariates reduced the random component of between-subject variability (BSVR) in clearance (CL) from 94% to 36% of population parameter variability (PPV). The final pharmacokinetic parameter estimates for the model with the best predictive performance were: CL, 4.7 l h⁻¹ 70 kg⁻¹; intercompartmental clearance (CLic), 1 l h⁻¹ 70 kg⁻¹; volume of central compartment (V1), 19.5 l 70 kg⁻¹; volume of peripheral compartment (V2), 11.2 l 70 kg⁻¹.
Conclusions: Using a fixed dose of aminoglycoside will achieve 35% of typical patients within 80-125% of a required dose. Covariate-guided predictions increase this up to 61%. However, because we have shown that random within-subject variability (WSVR) in clearance is less than safe and effective variability (SEV), target concentration intervention can potentially achieve safe and effective doses in 90% of patients.
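A schematic of the kind of covariate model described, in which clearance has a nonrenal component plus a renal component proportional to predicted creatinine clearance. The Cockcroft-Gault form is one common choice for the prediction; the clearance coefficients below are placeholders, not the fitted values reported above.

```python
def predicted_creatinine_clearance(age, weight_kg, serum_creatinine_umol_l, female):
    """Cockcroft-Gault predicted creatinine clearance (ml/min), with serum
    creatinine in micromol/L."""
    crcl = (140 - age) * weight_kg * 1.23 / serum_creatinine_umol_l
    return crcl * 0.85 if female else crcl

def predicted_aminoglycoside_cl(age, weight_kg, serum_creatinine_umol_l, female,
                                cl_nonrenal=0.5, renal_slope=0.05):
    """Illustrative clearance model (L/h): a nonrenal component plus a renal
    component linear in predicted creatinine clearance. cl_nonrenal and
    renal_slope are placeholder values, not study estimates."""
    crcl = predicted_creatinine_clearance(age, weight_kg, serum_creatinine_umol_l, female)
    return cl_nonrenal + renal_slope * crcl
```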
Abstract:
Fundamental principles of precaution are legal maxims that ask for preventive actions, perhaps as contingent interim measures while relevant information about causality and harm remains unavailable, to minimize the societal impact of potentially severe or irreversible outcomes. Such principles do not explain how to make choices or how to identify what is protective when incomplete and inconsistent scientific evidence of causation characterizes the potential hazards. Rather, they entrust lower jurisdictions, such as agencies or authorities, to make current decisions while recognizing that future information can contradict the scientific basis that supported the initial decision. After reviewing and synthesizing national and international legal aspects of precautionary principles, this paper addresses the key question: How can society manage potentially severe, irreversible or serious environmental outcomes when variability, uncertainty, and limited causal knowledge characterize its decision-making? A decision-analytic solution is outlined that focuses on risky decisions and accounts for prior states of information and scientific beliefs that can be updated as subsequent information becomes available. As a practical and established approach to causal reasoning and decision-making under risk, inherent to precautionary decision-making, these (Bayesian) methods help decision-makers and stakeholders because they formally account for probabilistic outcomes and new information, and are consistent and replicable. Rational choice of an action from among various alternatives, defined as a choice that makes preferred consequences more likely, requires accounting for costs, benefits and the change in risks associated with each candidate action. Decisions under any form of the precautionary principle reviewed must account for the contingent nature of scientific information, creating a link to the decision-analytic principle of expected value of information (VOI), to show the relevance of new information relative to the initial (and smaller) set of data on which the decision was based. We exemplify this seemingly simple situation using risk management of BSE. As an integral aspect of causal analysis under risk, the methods developed in this paper permit the addition of non-linear, hormetic dose-response models to the current set of regulatory defaults, such as the linear, non-threshold models. This increase in the number of defaults is an important improvement because most of the variants of the precautionary principle require cost-benefit balancing. Specifically, increasing the set of causal defaults accounts for beneficial effects at very low doses. We also show and conclude that quantitative risk assessment dominates qualitative risk assessment, supporting the extension of the set of default causal models.
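The value-of-information argument can be made concrete for a discrete decision problem. The sketch below computes the expected value of perfect information for a hypothetical two-action precautionary choice; the utilities and prior are illustrative only and are not taken from the paper.

```python
import numpy as np

def expected_value_of_perfect_information(utilities, prior):
    """EVPI for a discrete decision problem.
    utilities: (n_actions, n_states) array; prior: probabilities over states.
    EVPI = E_state[max_action U] - max_action E_state[U]."""
    u = np.asarray(utilities, dtype=float)
    prior = np.asarray(prior, dtype=float)
    value_with_info = np.sum(prior * u.max(axis=0))   # best action chosen per state
    value_without = np.max(u @ prior)                 # single action chosen under the prior
    return value_with_info - value_without

# Hypothetical example: act now (fixed cost) vs wait, under two states (hazard real / not)
utilities = [[-10.0, -10.0],    # precautionary action: cost incurred either way
             [-100.0,   0.0]]   # wait: large loss only if the hazard turns out to be real
print(expected_value_of_perfect_information(utilities, prior=[0.2, 0.8]))
```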
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
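As an illustration of the dimension-reduction route mentioned (PCA followed by a normal mixture fit, not the mixture of factor analyzers itself), a sketch using scikit-learn; the number of retained components and clusters are arbitrary choices.

```python
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def cluster_high_dimensional(X, n_clusters, n_factors=2, seed=0):
    """Reduce the dimension first, then fit a normal mixture in the reduced space,
    avoiding singular component-covariance estimates when p is large relative to n."""
    scores = PCA(n_components=n_factors, random_state=seed).fit_transform(X)
    gmm = GaussianMixture(n_components=n_clusters, n_init=10, random_state=seed).fit(scores)
    return gmm.predict(scores)

# Hypothetical usage: X is an n x p matrix (e.g. judges' ratings of the wines)
# labels = cluster_high_dimensional(X, n_clusters=2)
```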
Abstract:
The use of a fully parametric Bayesian method for analysing single patient trials based on the notion of treatment 'preference' is described. This Bayesian hierarchical modelling approach allows for full parameter uncertainty, use of prior information and the modelling of individual and patient sub-group structures. It provides updated probabilistic results for individual patients, and groups of patients with the same medical condition, as they are sequentially enrolled into individualized trials using the same medication alternatives. Two clinically interpretable criteria for determining a patient's response are detailed and illustrated using data from a previously published paper under two different prior information scenarios. Copyright (C) 2005 John Wiley & Sons, Ltd.
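A much-simplified, non-hierarchical sketch of the preference idea: conjugate Beta-Binomial updating of the probability that a patient prefers one treatment, cycle by cycle. The prior and data are hypothetical, and this is not the full hierarchical model of the paper.

```python
from scipy.stats import beta

def posterior_preference(n_prefer_a, n_cycles, prior_a=1.0, prior_b=1.0):
    """Posterior for theta = P(patient prefers treatment A in a cycle) under a
    Beta(prior_a, prior_b) prior, and the posterior probability that the patient
    prefers A overall (theta > 0.5)."""
    post = beta(prior_a + n_prefer_a, prior_b + n_cycles - n_prefer_a)
    return post, 1.0 - post.cdf(0.5)

# Hypothetical patient: treatment A preferred in 5 of 6 treatment cycles
post, p_prefers_a = posterior_preference(n_prefer_a=5, n_cycles=6)
print(p_prefers_a)
```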
Abstract:
The design, development, and use of complex systems models raise a unique class of challenges and potential pitfalls, many of which are commonly recurring problems. Over time, researchers gain experience in this form of modelling, choosing algorithms, techniques, and frameworks that improve the quality, confidence level, and speed of development of their models. This increasing collective experience of complex systems modellers is a resource that should be captured. Fields such as software engineering and architecture have benefited from the development of generic solutions to recurring problems, called patterns. Using pattern development techniques from these fields, insights from communities such as learning and information processing, data mining, bioinformatics, and agent-based modelling can be identified and captured. Collections of such 'pattern languages' would allow knowledge gained through experience to be readily accessible to less-experienced practitioners and to other domains. This paper proposes a methodology for capturing the wisdom of computational modellers by introducing example visualization patterns, and a pattern classification system for analyzing the relationship between micro and macro behaviour in complex systems models. We anticipate that a new field of complex systems patterns will provide an invaluable resource for both practicing and future generations of modellers.
Abstract:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
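Much of the robustness comes from the E-step weight u = (nu + p) / (nu + delta^2) attached to each observation in a t mixture fit, which shrinks the influence of points with large Mahalanobis distance delta^2 from a component mean. A small sketch of that weight, with illustrative inputs:

```python
import numpy as np

def t_downweight(x, mean, cov, dof):
    """E-step weight E[u | x] = (dof + p) / (dof + delta^2) for a p-variate t
    component, where delta^2 is the Mahalanobis distance of x from the component
    mean. Outlying points receive weights well below 1, which is what makes the
    t mixture fit robust relative to the normal mixture."""
    x, mean = np.atleast_1d(x), np.atleast_1d(mean)
    p = len(mean)
    diff = x - mean
    delta2 = float(diff @ np.linalg.solve(np.atleast_2d(cov), diff))
    return (dof + p) / (dof + delta2)

# A point 5 standard deviations from the mean is sharply downweighted:
print(t_downweight(5.0, 0.0, 1.0, dof=4))   # approx 0.17
```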
Abstract:
Motivation: An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have some limitations, owing to the minimal assumptions made, or, with more specific assumptions, are computationally intensive. Results: By converting the value of the test statistic used to test the significance of each gene to a z-score, we propose a simple two-component normal mixture that adequately models the distribution of this score. The usefulness of our approach is demonstrated on three real datasets.
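A minimal sketch of one reading of this approach: fit a two-component normal mixture to the z-scores by EM, holding the null component at the theoretical N(0, 1), and return the posterior probability that each gene is null. The starting values, iteration count and safeguards below are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def posterior_null_prob(z, n_iter=200):
    """Fit f(z) = pi0*N(0,1) + (1-pi0)*N(mu1, sigma1^2) by EM with the null
    component fixed, returning P(gene is null | z) for each gene."""
    z = np.asarray(z, dtype=float)
    pi0, mu1, sigma1 = 0.9, 2.0, 1.0               # crude starting values
    for _ in range(n_iter):
        c0 = pi0 * norm.pdf(z)                     # null component
        c1 = (1.0 - pi0) * norm.pdf(z, mu1, sigma1)
        tau0 = c0 / (c0 + c1)                      # E-step: P(null | z)
        w = 1.0 - tau0                             # M-step for the non-null component
        pi0 = tau0.mean()
        mu1 = np.sum(w * z) / (np.sum(w) + 1e-12)
        sigma1 = max(np.sqrt(np.sum(w * (z - mu1) ** 2) / (np.sum(w) + 1e-12)), 1e-3)
    return tau0
```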