989 resultados para Statistical Robustness


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization is computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems. © 1963-2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We uncovered the underlying energy landscape of the mitogen-activated protein kinases signal transduction cellular network by exploring the statistical natures of the Brownian dynamical trajectories. We introduce a dimensionless quantity: The robustness ratio of energy gap versus local roughness to measure the global topography of the underlying landscape. A high robustness ratio implies funneled landscape. The landscape is quite robust against environmental fluctuations and variants of the intrinsic chemical reaction rates.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Finding a multidimensional potential landscape is the key for addressing important global issues, such as the robustness of cellular networks. We have uncovered the underlying potential energy landscape of a simple gene regulatory network: a toggle switch. This was realized by explicitly constructing the steady state probability of the gene switch in the protein concentration space in the presence of the intrinsic statistical fluctuations due to the small number of proteins in the cell. We explored the global phase space for the system. We found that the protein synthesis rate and the unbinding rate of proteins to the gene were small relative to the protein degradation rate; the gene switch is monostable with only one stable basin of attraction. When both the protein synthesis rate and the unbinding rate of proteins to the gene are large compared with the protein degradation rate, two global basins of attraction emerge for a toggle switch. These basins correspond to the biologically stable functional states. The potential energy barrier between the two basins determines the time scale of conversion from one to the other. We found as the protein synthesis rate and protein unbinding rate to the gene relative to the protein degradation rate became larger, the potential energy barrier became larger. This also corresponded to systems with less noise or the fluctuations on the protein numbers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, scale-free networks and their functional robustness with respect to structural perturbations of the network are studied. Two types of perturbations are distinguished: random perturbations and attacks. The robustness of directed and undirected scale-free networks is studied numerically for two different measures and the obtained results are compared. For random perturbations, the results indicate that the strength of the perturbation plays a crucial role. In general, directed scale-free networks are more robust than undirected scale-free networks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data in a way that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case inexpressive. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide a genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advice not to use GSEA, GSEArot and GAGE for such data sets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We discuss statistical inference problems associated with identification and testability in econometrics, and we emphasize the common nature of the two issues. After reviewing the relevant statistical notions, we consider in turn inference in nonparametric models and recent developments on weakly identified models (or weak instruments). We point out that many hypotheses, for which test procedures are commonly proposed, are not testable at all, while some frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-defined statistical problems and are often associated with a misguided use of asymptotic distributional results. Concerning nonparametric hypotheses, we discuss three basic problems for which such difficulties occur: (1) testing a mean (or a moment) under (too) weak distributional assumptions; (2) inference under heteroskedasticity of unknown form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress that valid inference should be based on proper pivotal functions —a condition not satisfied by standard Wald-type methods based on standard errors — and we discuss recent developments in this field, mainly from the viewpoint of building valid tests and confidence sets. The techniques discussed include alternative proposed statistics, bounds, projection, split-sampling, conditioning, Monte Carlo tests. The possibility of deriving a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for endogenous explanatory variables are stressed as important criteria assessing alternative procedures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: We report an analysis of a protein network of functionally linked proteins, identified from a phylogenetic statistical analysis of complete eukaryotic genomes. Phylogenetic methods identify pairs of proteins that co-evolve on a phylogenetic tree, and have been shown to have a high probability of correctly identifying known functional links. Results: The eukaryotic correlated evolution network we derive displays the familiar power law scaling of connectivity. We introduce the use of explicit phylogenetic methods to reconstruct the ancestral presence or absence of proteins at the interior nodes of a phylogeny of eukaryote species. We find that the connectivity distribution of proteins at the point they arise on the tree and join the network follows a power law, as does the connectivity distribution of proteins at the time they are lost from the network. Proteins resident in the network acquire connections over time, but we find no evidence that 'preferential attachment' - the phenomenon of newly acquired connections in the network being more likely to be made to proteins with large numbers of connections - influences the network structure. We derive a 'variable rate of attachment' model in which proteins vary in their propensity to form network interactions independently of how many connections they have or of the total number of connections in the network, and show how this model can produce apparent power-law scaling without preferential attachment. Conclusion: A few simple rules can explain the topological structure and evolutionary changes to protein-interaction networks: most change is concentrated in satellite proteins of low connectivity and small phenotypic effect, and proteins differ in their propensity to form attachments. Given these rules of assembly, power law scaled networks naturally emerge from simple principles of selection, yielding protein interaction networks that retain a high-degree of robustness on short time scales and evolvability on longer evolutionary time scales.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We address the problem of automatically identifying and restoring damaged and contaminated images. We suggest a novel approach based on a semi-parametric model. This has two components, a parametric component describing known physical characteristics and a more flexible non-parametric component. The latter avoids the need for a detailed model for the sensor, which is often costly to produce and lacking in robustness. We assess our approach using an analysis of electroencephalographic images contaminated by eye-blink artefacts and highly damaged photographs contaminated by non-uniform lighting. These experiments show that our approach provides an effective solution to problems of this type.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Predators and preys often form species networks with asymmetric patterns of interaction. We study the dynamics of a four species network consisting of two weakly connected predator-prey pairs. We focus our analysis on the effects of the cross interaction between the predator of the first pair and the prey of the second pair. This is an example where the predator overlap, which is the proportion of predators that a given prey shares with other preys, is not uniform across the network due to asymmetries in patterns of interaction. We explore the behavior of the system under different interaction strengths and study the dynamics of survival and extinction. In particular, we consider situations in which the four species have initial populations lower than their long-term equilibrium, simulating catastrophic situations in which their abundances are reduced due to human action or environmental change. We show that, under these reduced initial conditions, and depending on the strength of the cross interaction, the populations tend to oscillate before re-equilibrating, disturbing the community equilibrium and sometimes reaching values that are only a small fraction of the equilibrium population, potentially leading to their extinction. We predict that, contrary to one`s intuition, the most likely scenario is the extinction of the less predated preys. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A number of recent works have introduced statistical methods for detecting genetic loci that affect phenotypic variability, which we refer to as variability-controlling quantitative trait loci (vQTL). These are genetic variants whose allelic state predicts how much phenotype values will vary about their expected means. Such loci are of great potential interest in both human and non-human genetic studies, one reason being that a detected vQTL could represent a previously undetected interaction with other genes or environmental factors. The simultaneous publication of these new methods in different journals has in many cases precluded opportunity for comparison. We survey some of these methods, the respective trade-offs they imply, and the connections between them. The methods fall into three main groups: classical non-parametric, fully parametric, and semi-parametric two-stage approximations. Choosing between alternatives involves balancing the need for robustness, flexibility, and speed. For each method, we identify important assumptions and limitations, including those of practical importance, such as their scope for including covariates and random effects. We show in simulations that both parametric methods and their semi-parametric approximations can give elevated false positive rates when they ignore mean-variance relationships intrinsic to the data generation process. We conclude that choice of method depends on the trait distribution, the need to include non-genetic covariates, and the population size and structure, coupled with a critical evaluation of how these fit with the assumptions of the statistical model.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we carry out robust modeling and influence diagnostics in Birnbaum-Saunders (BS) regression models. Specifically, we present some aspects related to BS and log-BS distributions and their generalizations from the Student-t distribution, and develop BS-t regression models, including maximum likelihood estimation based on the EM algorithm and diagnostic tools. In addition, we apply the obtained results to real data from insurance, which shows the uses of the proposed model. Copyright (c) 2011 John Wiley & Sons, Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statistical shape models (SSMs) have been used widely as a basis for segmenting and interpreting complex anatomical structures. The robustness of these models are sensitive to the registration procedures, i.e., establishment of a dense correspondence across a training data set. In this work, two SSMs based on the same training data set of scoliotic vertebrae, and registration procedures were compared. The first model was constructed based on the original binary masks without applying any image pre- and post-processing, and the second was obtained by means of a feature preserving smoothing method applied to the original training data set, followed by a standard rasterization algorithm. The accuracies of the correspondences were assessed quantitatively by means of the maximum of the mean minimum distance (MMMD) and Hausdorf distance (H(D)). Anatomical validity of the models were quantified by means of three different criteria, i.e., compactness, specificity, and model generalization ability. The objective of this study was to compare quasi-identical models based on standard metrics. Preliminary results suggest that the MMMD distance and eigenvalues are not sensitive metrics for evaluating the performance and robustness of SSMs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There are numerous statistical methods for quantitative trait linkage analysis in human studies. An ideal such method would have high power to detect genetic loci contributing to the trait, would be robust to non-normality in the phenotype distribution, would be appropriate for general pedigrees, would allow the incorporation of environmental covariates, and would be appropriate in the presence of selective sampling. We recently described a general framework for quantitative trait linkage analysis, based on generalized estimating equations, for which many current methods are special cases. This procedure is appropriate for general pedigrees and easily accommodates environmental covariates. In this paper, we use computer simulations to investigate the power robustness of a variety of linkage test statistics built upon our general framework. We also propose two novel test statistics that take account of higher moments of the phenotype distribution, in order to accommodate non-normality. These new linkage tests are shown to have high power and to be robust to non-normality. While we have not yet examined the performance of our procedures in the context of selective sampling via computer simulations, the proposed tests satisfy all of the other qualities of an ideal quantitative trait linkage analysis method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.