968 results for Statistical Estimation
Abstract:
The success rate of carrier phase ambiguity resolution (AR) is the probability that the ambiguities are successfully fixed to their correct integer values. In existing works, an exact success rate formula for the integer bootstrapping estimator has been used as a sharp lower bound for the integer least squares (ILS) success rate. Rigorous computation of the success rate for the more general ILS solutions has been considered difficult, because of the complexity of the ILS ambiguity pull-in region and the computational load of integrating the multivariate probability density function. The contributions of this work are twofold. First, the pull-in region, mathematically expressed as the vertices of a polyhedron, is represented by a multi-dimensional grid, at which the cumulative probability can be integrated with the multivariate normal cumulative distribution function (mvncdf) available in Matlab. The bivariate case is studied, where the pull-in region is usually defined as a hexagon and the probability is easily obtained using mvncdf at all the grid points within the convex polygon. Second, the paper compares the computed integer rounding and integer bootstrapping success rates and the lower and upper bounds of the ILS success rates to the actual ILS AR success rates obtained from a 24 h GPS data set for a 21 km baseline. The results demonstrate that the upper bound of the ILS AR probability given in the existing literature agrees well with the actual ILS success rate, although the success rate computed with the integer bootstrapping method is quite a sharp approximation to the actual ILS success rate. The results also show that variations or uncertainty of the unit-weight variance estimates from epoch to epoch significantly affect the success rates computed by the different methods, and thus deserve more attention in order to obtain useful success probability predictions.
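The exact bootstrapping success rate referred to above is commonly written as P = prod_i (2*Phi(1/(2*sigma_i)) - 1), where sigma_i is the conditional standard deviation of the i-th ambiguity given the ones already fixed. The following is a minimal Python sketch for the bivariate case only. The function names and the covariance values in the usage note are illustrative assumptions, not taken from the paper.

```python
import math


def bootstrap_factor(sigma):
    """Probability that one (conditional) ambiguity rounds to the correct
    integer: 2*Phi(1/(2*sigma)) - 1, expressed through the error function."""
    return math.erf(1.0 / (2.0 * sigma * math.sqrt(2.0)))


def bootstrap_success_rate_2d(var1, var2, cov12):
    """Bootstrapped success rate for two ambiguities, fixing ambiguity 1 first.

    var1, var2, cov12 are the entries of a 2x2 ambiguity covariance matrix
    (in cycles squared). This is a sketch, not the paper's implementation.
    """
    if var1 <= 0:
        raise ValueError("var1 must be positive")
    # Conditional variance of ambiguity 2 once ambiguity 1 is fixed.
    cond_var2 = var2 - cov12 ** 2 / var1
    if cond_var2 <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return bootstrap_factor(math.sqrt(var1)) * bootstrap_factor(math.sqrt(cond_var2))
```

For example, bootstrap_success_rate_2d(0.09, 0.09, 0.03) evaluates the formula for a hypothetical covariance matrix. The result depends on which ambiguity is fixed first, which is one reason decorrelation is usually applied before bootstrapping.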
Abstract:
Many traffic situations require drivers to cross or merge into a stream having higher priority. Gap acceptance theory enables us to model such processes to analyse traffic operation. This discussion demonstrated that numerical search fine-tuned by statistical analysis can be used to determine the most likely critical gap for a sample of drivers, based on their largest rejected gap and accepted gap. This method shares some common features with the Maximum Likelihood Estimation technique (Troutbeck 1992) but lends itself well to contemporary analysis tools such as spreadsheets and is particularly analytically transparent. This method is considered not to bias estimation of the critical gap due to very small rejected gaps or very large rejected gaps. However, it requires a sufficiently large sample that there is reasonable representation of largest rejected gap/accepted gap pairs within a fairly narrow highest likelihood search band.
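The abstract does not spell out the search procedure, so the following Python sketch is only a simplified illustration of the general idea of a numerical search over candidate critical gaps. It assumes each driver contributes one (largest rejected gap, accepted gap) pair and scores a candidate by how many drivers it is consistent with. The scoring rule, the function name and the step size are assumptions, not the authors' method.

```python
def most_consistent_critical_gap(pairs, step=0.1):
    """Grid-search a candidate critical gap t (seconds).

    pairs: list of (largest_rejected_gap, accepted_gap) per driver.
    A driver is counted as consistent with t when
    largest_rejected_gap < t <= accepted_gap.
    Returns (best_t, number_of_consistent_drivers); ties keep the smallest t.
    """
    lo = min(rejected for rejected, _ in pairs)
    hi = max(accepted for _, accepted in pairs)
    n_steps = int(round((hi - lo) / step))
    best_t, best_count = lo, -1
    for i in range(n_steps + 1):
        t = lo + i * step
        count = sum(1 for rejected, accepted in pairs if rejected < t <= accepted)
        if count > best_count:
            best_t, best_count = t, count
    return best_t, best_count
```

A likelihood-based refinement would replace the simple count with a probability model for the critical gap distribution. The count is used here only to keep the search transparent.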
Abstract:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations.
This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals.
These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term-by-term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days' data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
Abstract:
We consider the development of statistical models for prediction of constituent concentration of riverine pollutants, which is a key step in load estimation from frequent flow rate data and less frequently collected concentration data. We consider how to capture the impacts of past flow patterns via the average discounted flow (ADF), which discounts the past flux based on the time elapsed: more recent fluxes are given more weight. However, the effectiveness of ADF depends critically on the choice of the discount factor, which reflects the unknown environmental cumulating process of the concentration compounds. We propose to choose the discount factor by maximizing the adjusted R² values or the Nash-Sutcliffe model efficiency coefficient. The R² values are also adjusted to take account of the number of parameters in the model fit. The resulting optimal discount factor can be interpreted as a measure of constituent exhaustion rate during flood events. To evaluate the performance of the proposed regression estimators, we examine two different sampling scenarios by resampling fortnightly and opportunistically from two real daily datasets, which come from two United States Geological Survey (USGS) gaging stations located in the Des Plaines River and Illinois River basins. The generalized rating-curve approach produces biased estimates of the total sediment loads by -30% to 83%, whereas the new approaches produce relatively much lower biases, ranging from -24% to 35%. This substantial improvement in the estimates of the total load is due to the fact that predictability of concentration is greatly improved by the additional predictors.
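The two selection criteria named above have standard definitions, sketched below in Python together with one plausible form of a discounted flow average. The exponential weighting in discounted_flow is an assumption (the abstract does not give the ADF formula), and all function names are illustrative.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total_about_mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2, penalising the number of predictors in the fit."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def discounted_flow(flows, delta):
    """Assumed ADF form: weighted mean of current and past flows with
    weights delta**k for a flow k steps in the past (0 <= delta <= 1)."""
    result = []
    weighted_sum = 0.0
    weight_total = 0.0
    for q in flows:
        # Age every earlier flow by one step, then add the newest flow.
        weighted_sum = delta * weighted_sum + q
        weight_total = delta * weight_total + 1.0
        result.append(weighted_sum / weight_total)
    return result
```

Under these assumptions, choosing the discount factor amounts to looping over candidate delta values, refitting the concentration regression with discounted_flow(flows, delta) as a predictor, and keeping the delta with the largest adjusted R² or Nash-Sutcliffe value.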
Abstract:
The problem of estimating the time-dependent statistical characteristics of a random dynamical system is studied under two different settings. In the first, the system dynamics is governed by a differential equation parameterized by a random parameter, while in the second, this is governed by a differential equation with an underlying parameter sequence characterized by a continuous time Markov chain. We propose, for the first time in the literature, stochastic approximation algorithms for estimating various time-dependent process characteristics of the system. In particular, we provide efficient estimators for quantities such as the mean, variance and distribution of the process at any given time as well as the joint distribution and the autocorrelation coefficient at different times. A novel aspect of our approach is that we assume that information on the parameter model (i.e., its distribution in the first case and transition probabilities of the Markov chain in the second) is not available in either case. This is unlike most other work in the literature that assumes availability of such information. Also, most of the prior work in the literature is geared towards analyzing the steady-state system behavior of the random dynamical system while our focus is on analyzing the time-dependent statistical characteristics which are in general difficult to obtain. We prove the almost sure convergence of our stochastic approximation scheme in each case to the true value of the quantity being estimated. We provide a general class of strongly consistent estimators for the aforementioned statistical quantities with regular sample average estimators being a specific instance of these. We also present an application of the proposed scheme on a widely used model in population biology. Numerical experiments in this framework show that the time-dependent process characteristics as obtained using our algorithm in each case exhibit excellent agreement with exact results. 
(C) 2010 Elsevier Inc. All rights reserved.
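The remark that regular sample averages are a special case of the stochastic approximation estimators can be illustrated with the basic recursion x_{n+1} = x_n + a_n (Y_n - x_n). The Python sketch below is an assumption-laden illustration, not the paper's algorithm: the toy dynamics x(t) = exp(-theta * t) and the uniform sampling of theta are hypothetical, and the estimator only sees sampled values, never the parameter distribution.

```python
import math
import random


def sa_mean(samples, step=lambda n: 1.0 / (n + 1)):
    """Stochastic approximation estimate of a mean.

    Iterates estimate += step(n) * (y_n - estimate). With the default
    step 1/(n+1) the recursion reproduces the ordinary sample average.
    """
    estimate = 0.0
    for n, y in enumerate(samples):
        estimate += step(n) * (y - estimate)
    return estimate


def sample_state_at(t, rng):
    """Hypothetical black-box simulator: solution of dx/dt = -theta * x,
    x(0) = 1, observed at time t, with a randomly drawn theta."""
    theta = rng.uniform(0.5, 1.5)
    return math.exp(-theta * t)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_state_at(2.0, rng) for _ in range(10000)]
    print("estimated E[x(2)]:", sa_mean(draws))
```

Other step-size sequences that satisfy the usual stochastic approximation conditions (steps summing to infinity, squared steps summing to a finite value) can be passed through the step argument.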
Abstract:
The size of the shear transformation zone (STZ) that initiates the elastic to plastic transition in a Zr-based bulk metallic glass was estimated by conducting a statistical analysis of the first pop-in event during spherical nanoindentation. A series of experiments led us to a successful description of the distribution of shear strength for the transition and its dependence on the loading rate. From the activation volume determined by statistical analysis the STZ size was estimated based on a cooperative shearing model. (C) 2012 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
Abstract:
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally, the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
Abstract:
A statistical–dynamical regionalization approach is developed to assess possible changes in wind storm impacts. The method is applied to North Rhine-Westphalia (Western Germany) using the FOOT3DK mesoscale model for dynamical downscaling and ECHAM5/OM1 global circulation model climate projections. The method first classifies typical weather developments within the reanalysis period using a K-means clustering algorithm. Most historical wind storms are associated with four weather developments (primary storm-clusters). Mesoscale simulations are performed for representative elements of all clusters to derive a regional wind climatology. Additionally, 28 historical storms affecting Western Germany are simulated. Empirical functions are estimated to relate wind gust fields and insured losses. Transient ECHAM5/OM1 simulations show an enhanced frequency of primary storm-clusters and storms for 2060–2100 compared to 1960–2000. Accordingly, wind gusts increase over Western Germany, reaching locally +5% for 98th wind gust percentiles (A2-scenario). Consequently, storm losses are expected to increase substantially (+8% for A1B-scenario, +19% for A2-scenario). Regional patterns show larger changes over north-eastern parts of North Rhine-Westphalia than for western parts. For storms with return periods above 20 yr, loss expectations for Germany may increase by a factor of 2. These results document the method's functionality to assess future changes in loss potentials in regional terms.
Abstract:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in the parameter estimation because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles - marginal likelihood, extended likelihood, Bayesian analysis - via simulation studies. Real data on contact wrestling are used for illustration.
Abstract:
Anaerobic threshold (AT) is usually estimated as a change point problem by visual analysis of the cardiorespiratory response to incremental dynamic exercise. In this study, two-phase linear (TPL) models of the linear-linear and linear-quadratic type were used for the estimation of AT. The correlation coefficient between the classical and statistical approaches was 0.88, and 0.89 after outlier exclusion. The TPL models provide a simple method for estimating AT that can be easily implemented on a digital computer for the automatic pattern recognition of AT.
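A change point of the linear-linear type can be located by trying every admissible split of the data, fitting a straight line to each side by least squares, and keeping the split with the smallest total squared error. The Python sketch below illustrates that idea under simplifying assumptions: the two segments are fitted independently (no continuity constraint at the join), the x values are sorted and distinct, and the function names are illustrative rather than taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def two_phase_split(xs, ys, min_points=3):
    """Return the index k at which the second segment starts.

    Every split leaving at least min_points observations on each side
    is tried; the one with the smallest combined SSE is returned.
    """
    best_k, best_sse = None, float("inf")
    for k in range(min_points, len(xs) - min_points + 1):
        sse_left = fit_line(xs[:k], ys[:k])[2]
        sse_right = fit_line(xs[k:], ys[k:])[2]
        if sse_left + sse_right < best_sse:
            best_k, best_sse = k, sse_left + sse_right
    return best_k
```

With xs as work rate or time and ys as the cardiorespiratory variable, xs[two_phase_split(xs, ys)] gives a candidate threshold location. A linear-quadratic variant would replace the right-hand fit with a quadratic fit.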