243 resultados para STOCHASTIC SEARCH
Resumo:
The traditional searching method for model-order selection in linear regression is a nested full-parameters-set searching procedure over the desired orders, which we call full-model order selection. On the other hand, a method for model-selection searches for the best sub-model within each order. In this paper, we propose using the model-selection searching method for model-order selection, which we call partial-model order selection. We show by simulations that the proposed searching method gives better accuracies than the traditional one, especially for low signal-to-noise ratios over a wide range of model-order selection criteria (both information theoretic based and bootstrap-based). Also, we show that for some models the performance of the bootstrap-based criterion improves significantly by using the proposed partial-model selection searching method. Index Terms— Model order estimation, model selection, information theoretic criteria, bootstrap 1. INTRODUCTION Several model-order selection criteria can be applied to find the optimal order. Some of the more commonly used information theoretic-based procedures include Akaike’s information criterion (AIC) [1], corrected Akaike (AICc) [2], minimum description length (MDL) [3], normalized maximum likelihood (NML) [4], Hannan-Quinn criterion (HQC) [5], conditional model-order estimation (CME) [6], and the efficient detection criterion (EDC) [7]. From a practical point of view, it is difficult to decide which model order selection criterion to use. Many of them perform reasonably well when the signal-to-noise ratio (SNR) is high. The discrepancies in their performance, however, become more evident when the SNR is low. In those situations, the performance of the given technique is not only determined by the model structure (say a polynomial trend versus a Fourier series) but, more importantly, by the relative values of the parameters within the model. This makes the comparison between the model-order selection algorithms difficult as within the same model with a given order one could find an example for which one of the methods performs favourably well or fails [6, 8]. Our aim is to improve the performance of the model order selection criteria in cases where the SNR is low by considering a model-selection searching procedure that takes into account not only the full-model order search but also a partial model order search within the given model order. Understandably, the improvement in the performance of the model order estimation is at the expense of additional computational complexity.
Resumo:
This paper presents a robust stochastic model for the incorporation of natural features within data fusion algorithms. The representation combines Isomap, a non-linear manifold learning algorithm, with Expectation Maximization, a statistical learning scheme. The representation is computed offline and results in a non-linear, non-Gaussian likelihood model relating visual observations such as color and texture to the underlying visual states. The likelihood model can be used online to instantiate likelihoods corresponding to observed visual features in real-time. The likelihoods are expressed as a Gaussian Mixture Model so as to permit convenient integration within existing nonlinear filtering algorithms. The resulting compactness of the representation is especially suitable to decentralized sensor networks. Real visual data consisting of natural imagery acquired from an Unmanned Aerial Vehicle is used to demonstrate the versatility of the feature representation.
Resumo:
Process models in organizational collections are typically modeled by the same team and using the same conventions. As such, these models share many characteristic features like size range, type and frequency of errors. In most cases merely small samples of these collections are available due to e.g. the sensitive information they contain. Because of their sizes, these samples may not provide an accurate representation of the characteristics of the originating collection. This paper deals with the problem of constructing collections of process models, in the form of Petri nets, from small samples of a collection for accurate estimations of the characteristics of this collection. Given a small sample of process models drawn from a real-life collection, we mine a set of generation parameters that we use to generate arbitrary-large collections that feature the same characteristics of the original collection. In this way we can estimate the characteristics of the original collection on the generated collections.We extensively evaluate the quality of our technique on various sample datasets drawn from both research and industry.
Resumo:
Genomic and proteomic analyses have attracted a great deal of interests in biological research in recent years. Many methods have been applied to discover useful information contained in the enormous databases of genomic sequences and amino acid sequences. The results of these investigations inspire further research in biological fields in return. These biological sequences, which may be considered as multiscale sequences, have some specific features which need further efforts to characterise using more refined methods. This project aims to study some of these biological challenges with multiscale analysis methods and stochastic modelling approach. The first part of the thesis aims to cluster some unknown proteins, and classify their families as well as their structural classes. A development in proteomic analysis is concerned with the determination of protein functions. The first step in this development is to classify proteins and predict their families. This motives us to study some unknown proteins from specific families, and to cluster them into families and structural classes. We select a large number of proteins from the same families or superfamilies, and link them to simulate some unknown large proteins from these families. We use multifractal analysis and the wavelet method to capture the characteristics of these linked proteins. The simulation results show that the method is valid for the classification of large proteins. The second part of the thesis aims to explore the relationship of proteins based on a layered comparison with their components. Many methods are based on homology of proteins because the resemblance at the protein sequence level normally indicates the similarity of functions and structures. However, some proteins may have similar functions with low sequential identity. We consider protein sequences at detail level to investigate the problem of comparison of proteins. The comparison is based on the empirical mode decomposition (EMD), and protein sequences are detected with the intrinsic mode functions. A measure of similarity is introduced with a new cross-correlation formula. The similarity results show that the EMD is useful for detection of functional relationships of proteins. The third part of the thesis aims to investigate the transcriptional regulatory network of yeast cell cycle via stochastic differential equations. As the investigation of genome-wide gene expressions has become a focus in genomic analysis, researchers have tried to understand the mechanisms of the yeast genome for many years. How cells control gene expressions still needs further investigation. We use a stochastic differential equation to model the expression profile of a target gene. We modify the model with a Gaussian membership function. For each target gene, a transcriptional rate is obtained, and the estimated transcriptional rate is also calculated with the information from five possible transcriptional regulators. Some regulators of these target genes are verified with the related references. With these results, we construct a transcriptional regulatory network for the genes from the yeast Saccharomyces cerevisiae. The construction of transcriptional regulatory network is useful for detecting more mechanisms of the yeast cell cycle.
Resumo:
We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen's inequality for a concave functional--the minimizer over the player's actions of expected loss--defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary. Peter L. Bartlett, Alexander Rakhlin
Resumo:
Maximum-likelihood estimates of the parameters of stochastic differential equations are consistent and asymptotically efficient, but unfortunately difficult to obtain if a closed-form expression for the transitional probability density function of the process is not available. As a result, a large number of competing estimation procedures have been proposed. This article provides a critical evaluation of the various estimation techniques. Special attention is given to the ease of implementation and comparative performance of the procedures when estimating the parameters of the Cox–Ingersoll–Ross and Ornstein–Uhlenbeck equations respectively.
Resumo:
Many traffic situations require drivers to cross or merge into a stream having higher priority. Gap acceptance theory enables us to model such processes to analyse traffic operation. This discussion demonstrated that numerical search fine tuned by statistical analysis can be used to determine the most likely critical gap for a sample of drivers, based on their largest rejected gap and accepted gap. This method shares some common features with the Maximum Likelihood Estimation technique (Troutbeck 1992) but lends itself well to contemporary analysis tools such as spreadsheet and is particularly analytically transparent. This method is considered not to bias estimation of critical gap due to very small rejected gaps or very large rejected gaps. However, it requires a sufficiently large sample that there is reasonable representation of largest rejected gap/accepted gap pairs within a fairly narrow highest likelihood search band.
Resumo:
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0,1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple-agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward. ©2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.