939 resultados para robust estimation statistics
Resumo:
We consider the problem of interaction neighborhood estimation from the partial observation of a finite number of realizations of a random field. We introduce a model selection rule to choose estimators of conditional probabilities among natural candidates. Our main result is an oracle inequality satisfied by the resulting estimator. We use then this selection rule in a two-step procedure to evaluate the interacting neighborhoods. The selection rule selects a small prior set of possible interacting points and a cutting step remove from this prior set the irrelevant points. We also prove that the Ising models satisfy the assumptions of the main theorems, without restrictions on the temperature, on the structure of the interacting graph or on the range of the interactions. It provides therefore a large class of applications for our results. We give a computationally efficient procedure in these models. We finally show the practical efficiency of our approach in a simulation study.
Resumo:
The crossflow filtration process differs of the conventional filtration by presenting the circulation flow tangentially to the filtration surface. The conventional mathematical models used to represent the process have some limitations in relation to the identification and generalization of the system behaviour. In this paper, a system based on artificial neural networks is developed to overcome the problems usually found in the conventional mathematical models. More specifically, the developed system uses an artificial neural network that simulates the behaviour of the crossflow filtration process in a robust way. Imprecisions and uncertainties associated with the measurements made on the system are automatically incorporated in the neural approach. Simulation results are presented to justify the validity of the proposed approach. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Asymmetric discrete triangular distributions are introduced in order to extend the symmetric ones serving for discrete associated kernels in the nonparametric estimation for discrete functions. The extension from one to two orders around the mode provides a large family of discrete distributions having a finite support. Establishing a bridge between Dirac and discrete uniform distributions, some different shapes are also obtained and their properties are investigated. In particular, the mean and variance are pointed out. Applications to discrete kernel estimators are given with a solution to a boundary bias problem. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The zero-inflated negative binomial model is used to account for overdispersion detected in data that are initially analyzed under the zero-Inflated Poisson model A frequentist analysis a jackknife estimator and a non-parametric bootstrap for parameter estimation of zero-inflated negative binomial regression models are considered In addition an EM-type algorithm is developed for performing maximum likelihood estimation Then the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and some ways to perform global influence analysis are derived In order to study departures from the error assumption as well as the presence of outliers residual analysis based on the standardized Pearson residuals is discussed The relevance of the approach is illustrated with a real data set where It is shown that zero-inflated negative binomial regression models seems to fit the data better than the Poisson counterpart (C) 2010 Elsevier B V All rights reserved
Resumo:
In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped in k intervals so that ties are eliminated. Thus, the data modeling is performed by considering the discrete models of lifetime regression. The model parameters are estimated by using the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, diagnostic measures based on case deletion, which are denominated global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence, are used. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed and compared to the performance of the four link functions of the regression models for grouped survival data for different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed by using the proposed regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
A significant problem in the collection of responses to potentially sensitive questions, such as relating to illegal, immoral or embarrassing activities, is non-sampling error due to refusal to respond or false responses. Eichhorn & Hayre (1983) suggested the use of scrambled responses to reduce this form of bias. This paper considers a linear regression model in which the dependent variable is unobserved but for which the sum or product with a scrambling random variable of known distribution, is known. The performance of two likelihood-based estimators is investigated, namely of a Bayesian estimator achieved through a Markov chain Monte Carlo (MCMC) sampling scheme, and a classical maximum-likelihood estimator. These two estimators and an estimator suggested by Singh, Joarder & King (1996) are compared. Monte Carlo results show that the Bayesian estimator outperforms the classical estimators in almost all cases, and the relative performance of the Bayesian estimator improves as the responses become more scrambled.
Resumo:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
Resumo:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
Resumo:
This article presents Monte Carlo techniques for estimating network reliability. For highly reliable networks, techniques based on graph evolution models provide very good performance. However, they are known to have significant simulation cost. An existing hybrid scheme (based on partitioning the time space) is available to speed up the simulations; however, there are difficulties with optimizing the important parameter associated with this scheme. To overcome these difficulties, a new hybrid scheme (based on partitioning the edge set) is proposed in this article. The proposed scheme shows orders of magnitude improvement of performance over the existing techniques in certain classes of network. It also provides reliability bounds with little overhead.
Resumo:
Objective: To compare percentage body fat (%BF) for a given body mass index (BMI) among New Zealand European, Maori and Pacific Island children. To develop prediction equations based on bioimpedance measurements for the estimation of fat-free mass (FFM) appropriate to children in these three ethnic groups. Design: Cross-sectional study. Purposive sampling of schoolchildren aimed at recruiting three children of each sex and ethnicity for each year of age. Double cross-validation of FFM prediction equations developed by multiple regression. Setting: Local schools in Auckland. Subjects: Healthy European, Maori and Pacific Island children (n = 172, 83 M, 89 F, mean age 9.4 +/- 2.8(s. d.), range 5 - 14 y). Measurements: Height, weight, age, sex and ethnicity were recorded. FFM was derived from measurements of total body water by deuterium dilution and resistance and reactance were measured by bioimpedance analysis. Results: For fixed BMI, the Maori and Pacific Island girls averaged 3.7% lower % BF than European girls. For boys a similar relation was not found since BMI did not significantly influence % BF of European boys ( P = 0.18). Based on bioimpedance measurements a single prediction equation was developed for all children: FFM (kg) = 0.622 height (cm)(2)/ resistance +0.234 weight (kg)+1.166, R-2 = 0.96, s. e. e. = 2.44 kg. Ethnicity, age and sex were not significant predictors. Conclusions: A robust equation for estimation of FFM in New Zealand European, Maori and Pacific Island children in the 5 - 14 y age range that is more suitable than BMI for the determination of body fatness in field studies has been developed. Sponsorship: Maurice and Phyllis Paykel Trust, Auckland University of Technology Contestable Grants Fund and the Ministry of Health.
Resumo:
Tese de Doutoramento, Matemática (Investigação Operacional), 23 de Setembro de 2006, Universidade dos Açores.
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Resumo:
Probability and Statistics—Selected Problems is a unique book for senior undergraduate and graduate students to fast review basic materials in Probability and Statistics. Descriptive statistics are presented first, and probability is reviewed secondly. Discrete and continuous distributions are presented. Sample and estimation with hypothesis testing are presented in the last two chapters. The solutions for proposed excises are listed for readers to references.
Resumo:
Proceedings of the International Conference on Computer Vision Theory and Applications, 361-365, 2013, Barcelona, Spain
Resumo:
Research Project submited as partial fulfilment for the Master Degree in Statistics and Information Management