954 resultados para Stochastic Differential Equations, Parameter Estimation, Maximum Likelihood, Simulation, Moments
Resumo:
The purpose of this work was to model lung cancer mortality as a function of past exposure to tobacco and to forecast age-sex-specific lung cancer mortality rates. A 3-factor age-period-cohort (APC) model, in which the period variable is replaced by the product of average tar content and adult tobacco consumption per capita, was estimated for the US, UK, Canada and Australia by the maximum likelihood method. Age- and sex-specific tobacco consumption was estimated from historical data on smoking prevalence and total tobacco consumption. Lung cancer mortality was derived from vital registration records. Future tobacco consumption, tar content and the cohort parameter were projected by autoregressive moving average (ARIMA) estimation. The optimal exposure variable was found to be the product of average tar content and adult cigarette consumption per capita, lagged for 2530 years for both males and females in all 4 countries. The coefficient of the product of average tar content and tobacco consumption per capita differs by age and sex. In all models, there was a statistically significant difference in the coefficient of the period variable by sex. In all countries, male age-standardized lung cancer mortality rates peaked in the 1980s and declined thereafter. Female mortality rates are projected to peak in the first decade of this century. The multiplicative models of age, tobacco exposure and cohort fit the observed data between 1950 and 1999 reasonably well, and time-series models yield plausible past trends of relevant variables. Despite a significant reduction in tobacco consumption and average tar content of cigarettes sold over the past few decades, the effect on lung cancer mortality is affected by the time lag between exposure and established disease. As a result, the burden of lung cancer among females is only just reaching, or soon will reach, its peak but has been declining for I to 2 decades in men. Future sex differences in lung cancer mortality are likely to be greater in North America than Australia and the UK due to differences in exposure patterns between the sexes. (c) 2005 Wiley-Liss, Inc.
Resumo:
Subsequent to the influential paper of [Chan, K.C., Karolyi, G.A., Longstaff, F.A., Sanders, A.B., 1992. An empirical comparison of alternative models of the short-term interest rate. Journal of Finance 47, 1209-1227], the generalised method of moments (GMM) has been a popular technique for estimation and inference relating to continuous-time models of the short-term interest rate. GMM has been widely employed to estimate model parameters and to assess the goodness-of-fit of competing short-rate specifications. The current paper conducts a series of simulation experiments to document the bias and precision of GMM estimates of short-rate parameters, as well as the size and power of [Hansen, L.P., 1982. Large sample properties of generalised method of moments estimators. Econometrica 50, 1029-1054], J-test of over-identifying restrictions. While the J-test appears to have appropriate size and good power in sample sizes commonly encountered in the short-rate literature, GMM estimates of the speed of mean reversion are shown to be severely biased. Consequently, it is dangerous to draw strong conclusions about the strength of mean reversion using GMM. In contrast, the parameter capturing the levels effect, which is important in differentiating between competing short-rate specifications, is estimated with little bias. (c) 2006 Elsevier B.V. All rights reserved.
Resumo:
Testing for simultaneous vicariance across comparative phylogeographic data sets is a notoriously difficult problem hindered by mutational variance, the coalescent variance, and variability across pairs of sister taxa in parameters that affect genetic divergence. We simulate vicariance to characterize the behaviour of several commonly used summary statistics across a range of divergence times, and to characterize this behaviour in comparative phylogeographic datasets having multiple taxon-pairs. We found Tajima's D to be relatively uncorrelated with other summary statistics across divergence times, and using simple hypothesis testing of simultaneous vicariance given variable population sizes, we counter-intuitively found that the variance across taxon pairs in Nei and Li's net nucleotide divergence (pi(net)), a common measure of population divergence, is often inferior to using the variance in Tajima's D across taxon pairs as a test statistic to distinguish ancient simultaneous vicariance from variable vicariance histories. The opposite and more intuitive pattern is found for testing more recent simultaneous vicariance, and overall we found that depending on the timing of vicariance, one of these two test statistics can achieve high statistical power for rejecting simultaneous vicariance, given a reasonable number of intron loci (> 5 loci, 400 bp) and a range of conditions. These results suggest that components of these two composite summary statistics should be used in future simulation-based methods which can simultaneously use a pool of summary statistics to test comparative the phylogeographic hypotheses we consider here.
Resumo:
Determining the dimensionality of G provides an important perspective on the genetic basis of a multivariate suite of traits. Since the introduction of Fisher's geometric model, the number of genetically independent traits underlying a set of functionally related phenotypic traits has been recognized as an important factor influencing the response to selection. Here, we show how the effective dimensionality of G can be established, using a method for the determination of the dimensionality of the effect space from a multivariate general linear model introduced by AMEMIYA (1985). We compare this approach with two other available methods, factor-analytic modeling and bootstrapping, using a half-sib experiment that estimated G for eight cuticular hydrocarbons of Drosophila serrata. In our example, eight pheromone traits were shown to be adequately represented by only two underlying genetic dimensions by Amemiya's approach and factor-analytic modeling of the covariance structure at the sire level. In, contrast, bootstrapping identified four dimensions with significant genetic variance. A simulation study indicated that while the performance of Amemiya's method was more sensitive to power constraints, it performed as well or better than factor-analytic modeling in correctly identifying the original genetic dimensions at moderate to high levels of heritability. The bootstrap approach consistently overestimated the number of dimensions in all cases and performed less well than Amemiya's method at subspace recovery.
Resumo:
We have developed an alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a data set of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we infered trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms alignment-free methods and its phylogenetic accuracy is statistically indistinguishable from alignment-based distances.
Resumo:
A stochastic model for solute transport in aquifers is studied based on the concepts of stochastic velocity and stochastic diffusivity. By applying finite difference techniques to the spatial variables of the stochastic governing equation, a system of stiff stochastic ordinary differential equations is obtained. Both the semi-implicit Euler method and the balanced implicit method are used for solving this stochastic system. Based on the Karhunen-Loeve expansion, stochastic processes in time and space are calculated by means of a spatial correlation matrix. Four types of spatial correlation matrices are presented based on the hydraulic properties of physical parameters. Simulations with two types of correlation matrices are presented.
Resumo:
We present results concerning the application of the Good-Turing (GT) estimation method to the frequentist n-tuple system. We show that the Good-Turing method can, to a certain extent rectify the Zero Frequency Problem by providing, within a formal framework, improved estimates of small tallies. We also show that it leads to better tuple system performance than Maximum Likelihood estimation (MLE). However, preliminary experimental results suggest that replacing zero tallies with an arbitrary constant close to zero before MLE yields better performance than that of GT system.
Resumo:
It is well known that one of the obstacles to effective forecasting of exchange rates is heteroscedasticity (non-stationary conditional variance). The autoregressive conditional heteroscedastic (ARCH) model and its variants have been used to estimate a time dependent variance for many financial time series. However, such models are essentially linear in form and we can ask whether a non-linear model for variance can improve results just as non-linear models (such as neural networks) for the mean have done. In this paper we consider two neural network models for variance estimation. Mixture Density Networks (Bishop 1994, Nix and Weigend 1994) combine a Multi-Layer Perceptron (MLP) and a mixture model to estimate the conditional data density. They are trained using a maximum likelihood approach. However, it is known that maximum likelihood estimates are biased and lead to a systematic under-estimate of variance. More recently, a Bayesian approach to parameter estimation has been developed (Bishop and Qazaz 1996) that shows promise in removing the maximum likelihood bias. However, up to now, this model has not been used for time series prediction. Here we compare these algorithms with two other models to provide benchmark results: a linear model (from the ARIMA family), and a conventional neural network trained with a sum-of-squares error function (which estimates the conditional mean of the time series with a constant variance noise model). This comparison is carried out on daily exchange rate data for five currencies.
Resumo:
Most traditional methods for extracting the relationships between two time series are based on cross-correlation. In a non-linear non-stationary environment, these techniques are not sufficient. We show in this paper how to use hidden Markov models to identify the lag (or delay) between different variables for such data. Adopting an information-theoretic approach, we develop a procedure for training HMMs to maximise the mutual information (MMI) between delayed time series. The method is used to model the oil drilling process. We show that cross-correlation gives no information and that the MMI approach outperforms maximum likelihood.
Resumo:
Automatically generating maps of a measured variable of interest can be problematic. In this work we focus on the monitoring network context where observations are collected and reported by a network of sensors, and are then transformed into interpolated maps for use in decision making. Using traditional geostatistical methods, estimating the covariance structure of data collected in an emergency situation can be difficult. Variogram determination, whether by method-of-moment estimators or by maximum likelihood, is very sensitive to extreme values. Even when a monitoring network is in a routine mode of operation, sensors can sporadically malfunction and report extreme values. If this extreme data destabilises the model, causing the covariance structure of the observed data to be incorrectly estimated, the generated maps will be of little value, and the uncertainty estimates in particular will be misleading. Marchant and Lark [2007] propose a REML estimator for the covariance, which is shown to work on small data sets with a manual selection of the damping parameter in the robust likelihood. We show how this can be extended to allow treatment of large data sets together with an automated approach to all parameter estimation. The projected process kriging framework of Ingram et al. [2007] is extended to allow the use of robust likelihood functions, including the two component Gaussian and the Huber function. We show how our algorithm is further refined to reduce the computational complexity while at the same time minimising any loss of information. To show the benefits of this method, we use data collected from radiation monitoring networks across Europe. We compare our results to those obtained from traditional kriging methodologies and include comparisons with Box-Cox transformations of the data. We discuss the issue of whether to treat or ignore extreme values, making the distinction between the robust methods which ignore outliers and transformation methods which treat them as part of the (transformed) process. Using a case study, based on an extreme radiological events over a large area, we show how radiation data collected from monitoring networks can be analysed automatically and then used to generate reliable maps to inform decision making. We show the limitations of the methods and discuss potential extensions to remedy these.
Resumo:
Diffusion processes are a family of continuous-time continuous-state stochastic processes that are in general only partially observed. The joint estimation of the forcing parameters and the system noise (volatility) in these dynamical systems is a crucial, but non-trivial task, especially when the system is nonlinear and multimodal. We propose a variational treatment of diffusion processes, which allows us to compute type II maximum likelihood estimates of the parameters by simple gradient techniques and which is computationally less demanding than most MCMC approaches. We also show how a cheap estimate of the posterior over the parameters can be constructed based on the variational free energy.
Resumo:
This thesis describes the design and implementation of a new dynamic simulator called DASP. It is a computer program package written in standard Fortran 77 for the dynamic analysis and simulation of chemical plants. Its main uses include the investigation of a plant's response to disturbances, the determination of the optimal ranges and sensitivities of controller settings and the simulation of the startup and shutdown of chemical plants. The design and structure of the program and a number of features incorporated into it combine to make DASP an effective tool for dynamic simulation. It is an equation-oriented dynamic simulator but the model equations describing the user's problem are generated from in-built model equation library. A combination of the structuring of the model subroutines, the concept of a unit module, and the use of the connection matrix of the problem given by the user have been exploited to achieve this objective. The Executive program has a structure similar to that of a CSSL-type simulator. DASP solves a system of differential equations coupled to nonlinear algebraic equations using an advanced mixed equation solver. The strategy used in formulating the model equations makes it possible to obtain the steady state solution of the problem using the same model equations. DASP can handle state and time events in an efficient way and this includes the modification of the flowsheet. DASP is highly portable and this has been demonstrated by running it on a number of computers with only trivial modifications. The program runs on a microcomputer with 640 kByte of memory. It is a semi-interactive program, with the bulk of all input data given in pre-prepared data files with communication with the user is via an interactive terminal. Using the features in-built in the package, the user can view or modify the values of any input data, variables and parameters in the model, and modify the structure of the flowsheet of the problem during a simulation session. The program has been demonstrated and verified using a number of example problems.
Resumo:
This thesis addresses the kineto-elastodynamic analysis of a four-bar mechanism running at high-speed where all links are assumed to be flexible. First, the mechanism, at static configurations, is considered as structure. Two methods are used to model the system, namely the finite element method (FEM) and the dynamic stiffness method. The natural frequencies and mode shapes at different positions from both methods are calculated and compared. The FEM is used to model the mechanism running at high-speed. The governing equations of motion are derived using Hamilton's principle. The equations obtained are a set of stiff ordinary differential equations with periodic coefficients. A model is developed whereby the FEM and the dynamic stiffness method are used conjointly to provide high-precision results with only one element per link. The principal concern of the mechanism designer is the behaviour of the mechanism at steady-state. Few algorithms have been developed to deliver the steady-state solution without resorting to costly time marching simulation. In this study two algorithms are developed to overcome the limitations of the existing algorithms. The superiority of the new algorithms is demonstrated. The notion of critical speeds is clarified and a distinction is drawn between "critical speeds", where stresses are at a local maximum, and "unstable bands" where the mechanism deflections will grow boundlessly. Floquet theory is used to assess the stability of the system. A simple method to locate the critical speeds is derived. It is shown that the critical speeds of the mechanism coincide with the local maxima of the eigenvalues of the transition matrix with respect to the rotational speed of the mechanism.
Resumo:
Optimal design for parameter estimation in Gaussian process regression models with input-dependent noise is examined. The motivation stems from the area of computer experiments, where computationally demanding simulators are approximated using Gaussian process emulators to act as statistical surrogates. In the case of stochastic simulators, which produce a random output for a given set of model inputs, repeated evaluations are useful, supporting the use of replicate observations in the experimental design. The findings are also applicable to the wider context of experimental design for Gaussian process regression and kriging. Designs are proposed with the aim of minimising the variance of the Gaussian process parameter estimates. A heteroscedastic Gaussian process model is presented which allows for an experimental design technique based on an extension of Fisher information to heteroscedastic models. It is empirically shown that the error of the approximation of the parameter variance by the inverse of the Fisher information is reduced as the number of replicated points is increased. Through a series of simulation experiments on both synthetic data and a systems biology stochastic simulator, optimal designs with replicate observations are shown to outperform space-filling designs both with and without replicate observations. Guidance is provided on best practice for optimal experimental design for stochastic response models. © 2013 Elsevier Inc. All rights reserved.
Resumo:
2000 Mathematics Subject Classification: 60J80.