150 results for Statistical inference
Abstract:
In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely affected by the filtering of the data, to the point that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results, in the worst case, meaningless. A possible consequence of this is that these methods can increase their power through the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure for gene expression data from chips that do not provide genome-scale coverage of the expression values of all mRNAs. For this reason, we strongly advise against using GSEA, GSEArot and GAGE for such data sets in biomedical and clinical studies.
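As a hedged illustration of the mechanism the abstract describes, the sketch below runs a generic competitive permutation test on simulated gene scores and then pads the background with unrelated noise genes. It is not the GSEA, GSEArot or GAGE code; the test statistic, sample sizes and effect size are all made up.

```python
# Toy competitive gene set test (not the paper's methods) showing how
# padding the background with unrelated noise genes can shift the
# apparent significance of a fixed gene set. All data are simulated.
import numpy as np

rng = np.random.default_rng(0)

def competitive_pvalue(gene_scores, set_idx, n_perm=2000):
    """Permutation p-value for 'set mean > background mean' (competitive null)."""
    set_mean = gene_scores[set_idx].mean()
    k = len(set_idx)
    perm = np.array([rng.choice(gene_scores, k, replace=False).mean()
                     for _ in range(n_perm)])
    return (1 + (perm >= set_mean).sum()) / (1 + n_perm)

# 50 genes in the set with a mild shift, 500 background genes.
scores = np.concatenate([rng.normal(0.3, 1, 50), rng.normal(0, 1, 500)])
set_idx = np.arange(50)
print("p (original background):", competitive_pvalue(scores, set_idx))

# Pad the background with 5000 pure-noise genes: the set's relative
# standing changes, and with it the apparent significance.
padded = np.concatenate([scores, rng.normal(0, 1, 5000)])
print("p (noise-padded background):", competitive_pvalue(padded, set_idx))
```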
Abstract:
The inference of gene regulatory networks has gained considerable interest in the biology and biomedical communities in recent years. The purpose of this paper is to investigate the influence that environmental conditions can exert on the inference performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I) observational gene expression data under a normal environmental condition, (II) interventional gene expression data from growth in rich media, and (III) interventional gene expression data from a normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.
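A minimal sketch of the comparison setup: infer one network per condition and quantify their agreement. A simple correlation-threshold network stands in for Aracne, BC3NET, CLR, C3NET and MRNET, which are not reimplemented here, and the simulated "conditions" are invented for illustration.

```python
# Infer a network per simulated condition, then compare edge overlap.
import numpy as np

def infer_edges(expr, thresh=0.7):
    """expr: samples x genes. Returns the set of edges with |corr| > thresh."""
    c = np.corrcoef(expr, rowvar=False)
    n = c.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(c[i, j]) > thresh}

def jaccard(a, b):
    return len(a & b) / max(1, len(a | b))

rng = np.random.default_rng(1)
base = rng.normal(size=(80, 30))                    # condition I: observational
rich = base + rng.normal(0, 0.5, size=base.shape)   # condition II: perturbed
spike = base.copy()
spike[40:] += 1.0                                   # condition III: spike-in

edges = {name: infer_edges(x) for name, x in
         [("I", base), ("II", rich), ("III", spike)]}
for a, b in [("I", "II"), ("I", "III"), ("II", "III")]:
    print(f"edge overlap {a}/{b}: {jaccard(edges[a], edges[b]):.2f}")
```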
Abstract:
This brief examines the application of nonlinear statistical process control to the detection and diagnosis of faults in automotive engines. In this statistical framework, the computed score variables may have a complicated nonparametric distribution function, which hampers statistical inference, notably for fault detection and diagnosis. This brief shows that introducing the statistical local approach into nonlinear statistical process control produces statistics that follow a normal distribution, thereby enabling a simple statistical inference for fault detection. Further, for fault diagnosis, this brief introduces a compensation scheme that approximates the fault condition signature. Experimental results from a Volkswagen 1.9-L turbo-charged diesel engine are included.
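A minimal sketch of the idea behind the statistical local approach, as summarized above: even if individual monitoring scores have an awkward distribution, a suitably normalized sum over a window is approximately Gaussian by the central limit theorem, so a plain z-threshold yields a simple fault-detection test. The window size, score distribution and alarm level below are illustrative assumptions, not the brief's actual statistics.

```python
# Non-Gaussian per-sample scores -> approximately N(0,1) window statistic.
import numpy as np

rng = np.random.default_rng(2)

def window_z(scores, mu0, sigma0):
    """Normalized window sum: ~N(0,1) under the fault-free hypothesis (CLT)."""
    n = len(scores)
    return (scores.sum() - n * mu0) / (sigma0 * np.sqrt(n))

# Fault-free scores: skewed (chi-square), i.e. clearly non-Gaussian.
mu0, sigma0 = 3.0, np.sqrt(6.0)               # mean/std of chi2(df=3)
healthy = rng.chisquare(3, size=200)
faulty = rng.chisquare(3, size=200) + 1.5     # additive fault signature

for label, s in [("healthy", healthy), ("faulty", faulty)]:
    z = window_z(s, mu0, sigma0)
    print(f"{label}: z = {z:.2f}, alarm = {abs(z) > 3.09}")  # ~0.1% level
```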
Abstract:
Nonlinear principal component analysis (PCA) based on neural networks has drawn significant attention as a monitoring tool for complex nonlinear processes, but there remains a difficulty in determining the optimal network topology. This paper exploits the advantages of the Fast Recursive Algorithm, whereby the number of nodes, the location of centres, and the weights between the hidden layer and the output layer can be identified simultaneously for radial basis function (RBF) networks. The topology problem for nonlinear PCA based on neural networks can thus be solved. Another problem with nonlinear PCA is that the derived nonlinear scores may not be statistically independent or follow a simple parametric distribution. This hinders its application in process monitoring, since the simplicity of applying predetermined probability distribution functions is lost. This paper proposes the use of a support vector data description and shows that transforming the nonlinear principal components into a feature space allows a simple statistical inference. Results from both simulated and industrial data confirm the efficacy of the proposed method for solving nonlinear principal component problems, compared with linear PCA and kernel PCA.
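A sketch of the monitoring idea described above: extract nonlinear scores, then draw a data description boundary around them instead of assuming a parametric distribution. Here scikit-learn's KernelPCA and OneClassSVM stand in for the paper's RBF-network PCA and support vector data description; the data, kernel parameters and fault point are assumed for illustration.

```python
# Nonlinear scores + one-class boundary as a distribution-free monitor.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)

# Training data on a noisy nonlinear manifold (normal operation).
t = rng.uniform(-2, 2, size=(300, 1))
X_train = np.hstack([t, t**2]) + rng.normal(0, 0.1, size=(300, 2))

scores = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
Z_train = scores.fit_transform(X_train)

# SVDD-style boundary in score space; nu bounds the training outlier rate.
svdd = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(Z_train)

X_new = np.array([[0.5, 0.25], [3.0, -1.0]])   # in-control vs. faulty point
print(svdd.predict(scores.transform(X_new)))   # +1 = normal, -1 = fault
```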
Abstract:
The quick, easy way to master all the statistics you'll ever need. The bad news first: if you want a psychology degree you'll need to know statistics. Now for the good news: Psychology Statistics For Dummies. Featuring jargon-free explanations, step-by-step instructions and dozens of real-life examples, Psychology Statistics For Dummies makes the knotty world of statistics a lot less baffling. Rather than padding the text with concepts and procedures irrelevant to the task, the authors focus only on the statistics psychology students need to know. As an alternative to typical, lead-heavy statistics texts or supplements to assigned course reading, this is one book psychology students won't want to be without.

- Ease into statistics – start out with an introduction to how statistics are used by psychologists, including the types of variables they use and how they measure them
- Get your feet wet – quickly learn the basics of descriptive statistics, such as central tendency and measures of dispersion, along with common ways of graphically depicting information
- Meet your new best friend – learn the ins and outs of SPSS, the most popular statistics software package among psychology students, including how to input, manipulate and analyse data
- Analyse this – get up to speed on statistical analysis core concepts, such as probability and inference, hypothesis testing, distributions, Z-scores and effect sizes
- Correlate that – get the lowdown on common procedures for defining relationships between variables, including linear regressions, associations between categorical data and more
- Analyse by inference – master key methods in inferential statistics, including techniques for analysing independent groups designs and repeated-measures research designs

Open the book and find:
- Ways to describe statistical data
- How to use SPSS statistical software
- Probability theory and statistical inference
- Descriptive statistics basics
- How to test hypotheses
- Correlations and other relationships between variables
- Core concepts in statistical analysis for psychology
- Analysing research designs

Learn to:
- Use SPSS to analyse data
- Master statistical methods and procedures using psychology-based explanations and examples
- Create better reports
- Identify key concepts and pass your course
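As a small taste of the kinds of calculations listed above (z-scores, hypothesis tests, effect sizes), the sketch below works one of each on made-up data; it is not taken from the book, which uses SPSS rather than Python.

```python
# A z-score, a two-sample t-test and Cohen's d on simulated scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.normal(100, 15, 40)   # e.g. test scores, condition A
group_b = rng.normal(108, 15, 40)   # condition B

# z-score of one observation relative to group A
z = (group_a[0] - group_a.mean()) / group_a.std(ddof=1)

# Independent-groups t-test and a standardized effect size
t, p = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"z = {z:.2f}, t = {t:.2f}, p = {p:.4f}, d = {cohens_d:.2f}")
```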
Abstract:
This paper presents an in-depth study of the effect that the composition and properties of recycled coarse aggregates from previous concrete structures, together with the water/cement ratio (w/c) and the replacement ratio of coarse aggregate, have on compressive strength, its evolution through time, and its variability. A rigorous approach through statistical inference based on multiple linear regression has identified the key factors. A predictive equation is given for compressive strength when recycled coarse aggregates are used. The w/c and replacement ratios are the principal factors affecting concrete compressive strength. Their effect is significantly modified by the properties and composition of the recycled aggregates used. An equation that accurately predicts concrete compressive strength in terms of these parameters is presented. Particular attention has been paid to the complex effect that old concrete and adhered mortar have on concrete compressive strength and its mid-term evolution. It has been confirmed that the presence of contaminants tends to increase the variability of compressive strength values.
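A sketch of the statistical machinery the abstract names: a multiple linear regression of compressive strength on the w/c ratio and the replacement ratio, with the usual t-tests on coefficients. The data, coefficients and functional form below are simulated assumptions, not the paper's fitted equation.

```python
# OLS regression with coefficient inference via statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 120
wc = rng.uniform(0.4, 0.7, n)        # water/cement ratio
repl = rng.uniform(0.0, 1.0, n)      # replacement ratio of coarse aggregate
# Hypothetical ground truth: strength drops with w/c and replacement.
strength = 90 - 70 * wc - 12 * repl + rng.normal(0, 3, n)   # MPa

X = sm.add_constant(np.column_stack([wc, repl]))
fit = sm.OLS(strength, X).fit()
print(fit.summary(xname=["const", "w/c", "replacement"]))
```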
Abstract:
Modern biology and medicine aim to identify the molecular and cellular causes of biological functions and diseases. Gene regulatory networks (GRN) inferred from gene expression data are considered an important aid for this research by providing a map of molecular interactions. Hence, GRNs have the potential to enable and enhance basic as well as applied research in the life sciences. In this paper, we introduce a new method called BC3NET for inferring causal gene regulatory networks from large-scale gene expression data. BC3NET is an ensemble method based on bagging the C3NET algorithm, which means it corresponds to a Bayesian approach with noninformative priors. In this study we demonstrate, for a variety of simulated and biological gene expression data sets from S. cerevisiae, that BC3NET is an important enhancement over other inference methods, capable of sensibly capturing biochemical interactions from transcriptional regulation and protein-protein interactions. An implementation of BC3NET is freely available as an R package from the CRAN repository. © 2012 de Matos Simoes, Emmert-Streib.
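A minimal sketch of the bagging idea behind BC3NET (the real implementation is the CRAN package; this is not it). A simple correlation-based inferrer stands in for C3NET, and edges are aggregated over bootstrap networks; the frequency cutoff and toy data are assumptions.

```python
# Bagging a base network inferrer: bootstrap samples, infer, aggregate.
import numpy as np

rng = np.random.default_rng(6)

def infer_once(expr, thresh=0.5):
    """Placeholder base inferrer: edges with |corr| above a threshold."""
    c = np.abs(np.corrcoef(expr, rowvar=False))
    n = c.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if c[i, j] > thresh}

def bagged_network(expr, n_boot=100, keep=0.8):
    counts = {}
    m = expr.shape[0]
    for _ in range(n_boot):
        boot = expr[rng.integers(0, m, m)]        # resample the samples
        for e in infer_once(boot):
            counts[e] = counts.get(e, 0) + 1
    # Keep edges appearing in at least `keep` of the bootstrap networks.
    return {e for e, c in counts.items() if c / n_boot >= keep}

expr = rng.normal(size=(60, 10))
expr[:, 1] = expr[:, 0] + rng.normal(0, 0.3, 60)  # planted interaction 0-1
print(bagged_network(expr))
```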
Abstract:
Motivation: The inference of regulatory networks from large-scale expression data holds great promise because of the potentially causal interpretation of these networks. However, due to the difficulty of establishing reliable methods based on observational data, there is so far only incomplete knowledge about the possibilities and limitations of such inference methods in this context.
Results: In this article, we conduct a statistical analysis investigating differences and similarities of four network inference algorithms, ARACNE, CLR, MRNET and RN, with respect to local network-based measures. We employ ensemble methods that allow assessing inferability down to the level of individual edges. Our analysis reveals the bias of these inference methods with respect to the inference of various network components and hence provides guidance for the interpretation of inferred regulatory networks from expression data. Further, as an application, we predict the total number of regulatory interactions in human B cells and hypothesize about the role of Myc and its targets in molecular information processing.
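A sketch of the edge-level inferability idea: simulate many datasets from a known toy network, re-infer each time, and record how often each true edge is recovered. ARACNE, CLR, MRNET and RN are not reimplemented; a correlation threshold serves as a placeholder inferrer, and the toy chain network is an assumption.

```python
# Per-edge recovery frequency over an ensemble of simulated datasets.
import numpy as np

rng = np.random.default_rng(7)
true_edges = {(0, 1), (1, 2)}   # chain 0 -> 1 -> 2 plus isolated gene 3

def simulate(n=50):
    g0 = rng.normal(size=n)
    g1 = 0.9 * g0 + rng.normal(0, 0.5, n)
    g2 = 0.9 * g1 + rng.normal(0, 0.5, n)
    g3 = rng.normal(size=n)
    return np.column_stack([g0, g1, g2, g3])

def infer(expr, thresh=0.5):
    c = np.abs(np.corrcoef(expr, rowvar=False))
    return {(i, j) for i in range(4) for j in range(i + 1, 4)
            if c[i, j] > thresh}

runs = 200
hits = {e: 0 for e in true_edges}
for _ in range(runs):
    found = infer(simulate())
    for e in true_edges:
        hits[e] += e in found
print({e: hits[e] / runs for e in hits})  # per-edge recovery frequency
```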
Abstract:
Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with binary variables except for a single ternary one. We prove that under epistemic irrelevance the polynomial-time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected topologies). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding their computational complexity. We show that these results remain valid even if we disallow the use of zero probabilities. We also show that the computation of bounds on the probability of the future state in a hidden Markov model is the same whether we assume epistemic irrelevance or strong independence, and we prove an analogous result for inference in Naive Bayes structures. These inferential equivalences are important for practitioners, as hidden Markov models and Naive Bayes networks are used in real applications of imprecise probability.
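A toy illustration of what an inference on a credal network computes: a root A with an interval prior and a child B with interval conditionals. Under strong independence, bounds on P(B=1) are attained at extreme points of the local credal sets, so a network this small can be solved by enumerating vertices. The intervals below are arbitrary assumptions; in general such enumeration is exponential, which is exactly the regime the paper's complexity results address.

```python
# Lower/upper probability of B=1 in a two-node credal network by
# enumerating the extreme points of each interval-valued credal set.
from itertools import product

p_a1 = (0.3, 0.5)        # P(A=1) in [0.3, 0.5]
p_b1_a1 = (0.7, 0.9)     # P(B=1 | A=1) in [0.7, 0.9]
p_b1_a0 = (0.1, 0.2)     # P(B=1 | A=0) in [0.1, 0.2]

vals = []
for pa, pb1, pb0 in product(p_a1, p_b1_a1, p_b1_a0):
    vals.append(pa * pb1 + (1 - pa) * pb0)   # law of total probability
print(f"P(B=1) in [{min(vals):.3f}, {max(vals):.3f}]")
```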