903 resultados para Geo-statistical model
Resumo:
Coprime and nested sampling are well known deterministic sampling techniques that operate at rates significantly lower than the Nyquist rate, and yet allow perfect reconstruction of the spectra of wide sense stationary signals. However, theoretical guarantees for these samplers assume ideal conditions such as synchronous sampling, and ability to perfectly compute statistical expectations. This thesis studies the performance of coprime and nested samplers in spatial and temporal domains, when these assumptions are violated. In spatial domain, the robustness of these samplers is studied by considering arrays with perturbed sensor locations (with unknown perturbations). Simplified expressions for the Fisher Information matrix for perturbed coprime and nested arrays are derived, which explicitly highlight the role of co-array. It is shown that even in presence of perturbations, it is possible to resolve $O(M^2)$ under appropriate conditions on the size of the grid. The assumption of small perturbations leads to a novel ``bi-affine" model in terms of source powers and perturbations. The redundancies in the co-array are then exploited to eliminate the nuisance perturbation variable, and reduce the bi-affine problem to a linear underdetermined (sparse) problem in source powers. This thesis also studies the robustness of coprime sampling to finite number of samples and sampling jitter, by analyzing their effects on the quality of the estimated autocorrelation sequence. A variety of bounds on the error introduced by such non ideal sampling schemes are computed by considering a statistical model for the perturbation. They indicate that coprime sampling leads to stable estimation of the autocorrelation sequence, in presence of small perturbations. Under appropriate assumptions on the distribution of WSS signals, sharp bounds on the estimation error are established which indicate that the error decays exponentially with the number of samples. The theoretical claims are supported by extensive numerical experiments.
Resumo:
Aiming to obtain empirical models for the estimation of Syrah leaf area a set of 210 fruiting shoots was randomly collected during the 2013 growing season in an adult experimental vineyard, located in Lisbon, Portugal. Samples of 30 fruiting shoots were taken periodically from the stage of inflorescences visible to veraison (7 sampling dates). At the lab, from each shoot, primary and lateral leaves were separated and numbered according to node insertion. For each leaf, the length of the central and lateral veins was recorded and then the leaf area was measured by a leaf area meter. For single leaf area estimation the best statistical models uses as explanatory variable the sum of the lengths of the two lateral leaf veins. For the estimation of leaf area per shoot it was followed the approach of Lopes & Pinto (2005), based on 3 explanatory variables: number of primary leaves and area of the largest and smallest leaves. The best statistical model for estimation of primary leaf area per shoot uses a calculated variable obtained from the average of the largest and smallest primary leaf area multiplied by the number of primary leaves. For lateral leaf area estimation another model using the same type of calculated variable is also presented. All models explain a very high proportion of variability in leaf area. Our results confirm the already reported strong importance of the three measured variables (number of leaves and area of the largest and smallest leaf) as predictors of the shoot leaf area. The proposed models can be used to accurately predict Syrah primary and secondary leaf area per shoot in any phase of the growing cycle. They are inexpensive, practical, non-destructive methods which do not require specialized staff or expensive equipment.
Resumo:
Deep bed filtration occurs in several industrial and environmental processes like water filtration and soil contamination. In petroleum industry, deep bed filtration occurs near to injection wells during water injection, causing injectivity reduction. It also takes place during well drilling, sand production control, produced water disposal in aquifers, etc. The particle capture in porous media can be caused by different physical mechanisms (size exclusion, electrical forces, bridging, gravity, etc). A statistical model for filtration in porous media is proposed and analytical solutions for suspended and retained particles are derived. The model, which incorporates particle retention probability, is compared with the classical deep bed filtration model allowing a physical interpretation of the filtration coefficients. Comparison of the obtained analytical solutions for the proposed model with the classical model solutions allows concluding that the larger the particle capture probability, the larger the discrepancy between the proposed and the classical models
Resumo:
Several deterministic and probabilistic methods are used to evaluate the probability of seismically induced liquefaction of a soil. The probabilistic models usually possess some uncertainty in that model and uncertainties in the parameters used to develop that model. These model uncertainties vary from one statistical model to another. Most of the model uncertainties are epistemic, and can be addressed through appropriate knowledge of the statistical model. One such epistemic model uncertainty in evaluating liquefaction potential using a probabilistic model such as logistic regression is sampling bias. Sampling bias is the difference between the class distribution in the sample used for developing the statistical model and the true population distribution of liquefaction and non-liquefaction instances. Recent studies have shown that sampling bias can significantly affect the predicted probability using a statistical model. To address this epistemic uncertainty, a new approach was developed for evaluating the probability of seismically-induced soil liquefaction, in which a logistic regression model in combination with Hosmer-Lemeshow statistic was used. This approach was used to estimate the population (true) distribution of liquefaction to non-liquefaction instances of standard penetration test (SPT) and cone penetration test (CPT) based most updated case histories. Apart from this, other model uncertainties such as distribution of explanatory variables and significance of explanatory variables were also addressed using KS test and Wald statistic respectively. Moreover, based on estimated population distribution, logistic regression equations were proposed to calculate the probability of liquefaction for both SPT and CPT based case history. Additionally, the proposed probability curves were compared with existing probability curves based on SPT and CPT case histories.
Resumo:
The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
Resumo:
The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference is plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering, solves the MAP problem as well as Gibbs sampling, while requiring only a fraction of the computational effort. (For freely available code that implements the MAP-DP algorithm for Gaussian mixtures see http://www.maxlittle.net/.) Unlike related small variance asymptotics (SVA), our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it to variational, SVA and sampling approaches from both a computational complexity perspective as well as in terms of clustering performance. We demonstrate the wide applicabiity of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model whose performance contrasts favorably with a recently proposed hybrid SVA approach. Similarly, we show how our algorithm can applied to a semiparametric mixed-effects regression model where the random effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.
Resumo:
The Mobile Emissions Assessment System for Urban and Regional Evaluation (MEASURE) model provides an external validation capability for hot stabilized option; the model is one of several new modal emissions models designed to predict hot stabilized emission rates for various motor vehicle groups as a function of the conditions under which the vehicles are operating. The validation of aggregate measurements, such as speed and acceleration profile, is performed on an independent data set using three statistical criteria. The MEASURE algorithms have proved to provide significant improvements in both average emission estimates and explanatory power over some earlier models for pollutants across almost every operating cycle tested.
Resumo:
A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. We tackle in this paper the problem of adaptivity to this condition in the context of model selection, in a general learning framework. Actually, we consider a weaker version of this condition that allows one to take into account that learning within a small model can be much easier than within a large one. Requiring this “strong margin adaptivity” makes the model selection problem more challenging. We first prove, in a general framework, that some penalization procedures (including local Rademacher complexities) exhibit this adaptivity when the models are nested. Contrary to previous results, this holds with penalties that only depend on the data. Our second main result is that strong margin adaptivity is not always possible when the models are not nested: for every model selection procedure (even a randomized one), there is a problem for which it does not demonstrate strong margin adaptivity.
Resumo:
Purpose. To create a binocular statistical eye model based on previously measured ocular biometric data. Methods. Thirty-nine parameters were determined for a group of 127 healthy subjects (37 male, 90 female; 96.8% Caucasian) with an average age of 39.9 ± 12.2 years and spherical equivalent refraction of −0.98 ± 1.77 D. These parameters described the biometry of both eyes and the subjects' age. Missing parameters were complemented by data from a previously published study. After confirmation of the Gaussian shape of their distributions, these parameters were used to calculate their mean and covariance matrices. These matrices were then used to calculate a multivariate Gaussian distribution. From this, an amount of random biometric data could be generated, which were then randomly selected to create a realistic population of random eyes. Results. All parameters had Gaussian distributions, with the exception of the parameters that describe total refraction (i.e., three parameters per eye). After these non-Gaussian parameters were omitted from the model, the generated data were found to be statistically indistinguishable from the original data for the remaining 33 parameters (TOST [two one-sided t tests]; P < 0.01). Parameters derived from the generated data were also significantly indistinguishable from those calculated with the original data (P > 0.05). The only exception to this was the lens refractive index, for which the generated data had a significantly larger SD. Conclusions. A statistical eye model can describe the biometric variations found in a population and is a useful addition to the classic eye models.
Resumo:
The need for a house rental model in Townsville, Australia is addressed. Models developed for predicting house rental levels are described. An analytical model is built upon a priori selected variables and parameters of rental levels. Regression models are generated to provide a comparison to the analytical model. Issues in model development and performance evaluation are discussed. A comparison of the models indicates that the analytical model performs better than the regression models.
Resumo:
Vacuum circuit breaker (VCB) overvoltage failure and its catastrophic failures during shunt reactor switching have been analyzed through computer simulations for multiple reignitions with a statistical VCB model found in the literature. However, a systematic review (SR) that is related to the multiple reignitions with a statistical VCB model does not yet exist. Therefore, this paper aims to analyze and explore the multiple reignitions with a statistical VCB model. It examines the salient points, research gaps and limitations of the multiple reignition phenomenon to assist with future investigations following the SR search. Based on the SR results, seven issues and two approaches to enhance the current statistical VCB model are identified. These results will be useful as an input to improve the computer modeling accuracy as well as the development of a reignition switch model with point-on-wave controlled switching for condition monitoring
Resumo:
Accelerated aging experiments have been conducted on a representative oil-pressboard insulation model to investigate the effect of constant and sequential stresses on the PD behavior using a built-in phase resolved partial discharge analyzer. A cycle of the applied voltage starting from the zero of the positive half cycle was divided into 16 equal phase windows (Φ1 to Φ16) and partial discharge (PD) magnitude distribution in each phase was determined. Based on the experimental results, three stages of aging mechanism were identified. Gumbel's extreme value distribution of the largest element was used to model the first stage of aging process. Second and subsequent stages were modeled using two-parameter Weibull distribution. Spearman's non-parametric rank correlation test statistic and Kolmogrov-Smirnov two sample test were used to relate the aging process of each phase with the corresponding process of the full cycle. To bring out clearly the effect of stress level, its duration and test procedure on the distribution parameters and hence of the aging process, non-parametric ANOVA techniques like Kruskal-Wallis and Fisher's LSD multiple comparison tests were used. Results of the analysis show that two phases (Φ13 and Φ14) near the vicinity of the negative voltage peak were found to contribute significantly to the aging process and their aging mechanism also correlated well with that of the corresponding full cycle mechanism. Attempts have been made to relate these results with the published work of other workers