837 results for ROBUST ESTIMATES
Abstract:
Optimal robust M-estimates of a multidimensional parameter are described using Hampel's infinitesimal approach. The optimal estimates are derived by minimizing a measure of efficiency under the model, subject to a bounded measure of infinitesimal robustness. For this purpose we define measures of efficiency and infinitesimal sensitivity based on the Hellinger distance. We show that these two measures coincide with similar ones defined by Yohai using the Kullback-Leibler divergence, and therefore the corresponding optimal estimates coincide too. We also give an example where we fit a negative binomial distribution to a real dataset of "days of stay in hospital" using the optimal robust estimates.
Abstract:
We consider robust parametric procedures for univariate discrete distributions, focusing on the negative binomial model. The procedures are based on three steps. First, a very robust, but possibly inefficient, estimate of the model parameters is computed. Second, this initial model is used to identify outliers, which are then removed from the sample. Third, a corrected maximum likelihood estimator is computed with the remaining observations. The final estimate inherits the breakdown point (bdp) of the initial one and its efficiency can be significantly higher. Analogous procedures were proposed in [1], [2], [5] for the continuous case. A comparison of the asymptotic bias of various estimates under point contamination points to the minimum Neyman's chi-squared disparity estimate as a good choice for the initial step. Various minimum disparity estimators were explored by Lindsay [4], who showed that the minimum Neyman's chi-squared estimate has a 50% bdp under point contamination; in addition, it is asymptotically fully efficient at the model. However, the finite sample efficiency of this estimate under the uncontaminated negative binomial model is usually much lower than 100% and the bias can be strong. We show that its performance can then be greatly improved using the three-step procedure outlined above. In addition, we compare the final estimate with the procedure described in
Abstract:
There is a lack of a common concept on how to estimate transmissibility of Chlamydia trachomatis from cross-sectional sexual partnership studies. Using a mathematical model that takes into account the dynamics of chlamydia transmission and sexual partnership formation, we report refined estimates of chlamydia transmissibility in heterosexual partnerships.
Abstract:
Sustainable water resources management depends on sound information about the impacts of climate change. This information is, however, not easily derived because natural runoff variability interferes with the climate change signal. This study presents a procedure that leads to robust estimates of the magnitude and Time Of Emergence (TOE) of climate-induced hydrological change that also account for the natural variability contained in the time series. First, the natural variability of 189 mesoscale catchments in Switzerland is sampled for 10 ENSEMBLES scenarios for the control period (1984–2005) and two scenario periods (near future: 2025–2046, far future: 2074–2095) by applying a bootstrap procedure. Then, the sampling distributions of mean monthly runoff are tested for significant differences with the Wilcoxon-Mann–Whitney test and for effect size with Cliff's delta d. Finally, the TOE of a climate-change-induced hydrological change is declared when at least eight out of the ten hydrological projections differ significantly from natural variability. The results show that the TOE occurs in the near future period, except for high-elevation catchments in late summer. The significant hydrological projections in the near future correspond, however, to only minor runoff changes. In the far future, hydrological change is statistically significant and runoff changes are substantial. Temperature change is the most important factor determining hydrological change in this mountainous region. Therefore, hydrological change depends strongly on a catchment's mean elevation. The fact that hydrological changes are predicted to be robust already in the near future highlights the importance of accounting for these changes in water resources planning.
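The effect-size and emergence steps described in this abstract can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names `cliffs_delta` and `has_emerged` are hypothetical, the significance flags are assumed to come from a separately run Wilcoxon-Mann-Whitney test, and the runoff values in the usage note are invented for illustration only.

```python
from itertools import product


def cliffs_delta(x, y):
    """Cliff's delta: share of pairs with x > y minus share of pairs with x < y."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))


def has_emerged(significant_flags, required=8):
    """Emergence rule from the abstract: at least `required` projections
    (eight out of ten there) differ significantly from natural variability.
    The flags themselves are assumed to come from a significance test."""
    return sum(1 for flag in significant_flags if flag) >= required
```

For example, with invented bootstrap means `control = [10.2, 11.0, 9.8, 10.5]` and `scenario = [11.5, 12.1, 11.9, 12.4]`, every scenario value exceeds every control value, so `cliffs_delta(scenario, control)` is 1.0, the largest possible effect size.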
Abstract:
Nonlinear regression problems can often be reduced to linearity by transforming the response variable (e.g., using the Box-Cox family of transformations). The classic estimates of the parameter defining the transformation as well as of the regression coefficients are based on the maximum likelihood criterion, assuming homoscedastic normal errors for the transformed response. These estimates are nonrobust in the presence of outliers and can be inconsistent when the errors are nonnormal or heteroscedastic. This article proposes new robust estimates that are consistent and asymptotically normal for any unimodal and homoscedastic error distribution. For this purpose, a robust version of conditional expectation is introduced for which the prediction mean squared error is replaced with an M scale. This concept is then used to develop a nonparametric criterion to estimate the transformation parameter as well as the regression coefficients. A finite sample estimate of this criterion based on a robust version of smearing is also proposed. Monte Carlo experiments show that the new estimates compare favorably with respect to the available competitors.
Abstract:
Factor analysis, a frequently used technique for multivariate data inspection, is also widely used for compositional data analysis. The usual approach is to apply a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1), with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Under the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛ^T + ψ (2), where ψ = Cov(e) is diagonal. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimate of Cov(y). The observed clr-transformed data Y are realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y leads to robust estimates of Λ and ψ in (2); see Pison et al. (2003). Well-known robust covariance estimators with good statistical properties, such as the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely on a full-rank data matrix Y, which is not the case for clr-transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr-transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y) = V C(Z) V^T, where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in model (2) can be estimated (Basilevsky, 1994), and the results have a direct interpretation since the links to the original variables are preserved. The above procedure will be applied to data from geochemistry. We are particularly interested in comparing the results with those of Reimann et al. (2002) for the Kola project data
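The transformations and the back-transformation C(Y) = V C(Z) V^T can be sketched in plain Python. This is only an illustration under stated assumptions: the particular orthonormal basis built by `ilr_basis` is one common choice and not necessarily the one used by the authors, all function names are hypothetical, and the robust covariance estimator itself (e.g. the MCD) is not implemented here, so `C_z` is assumed to be supplied by such an estimator.

```python
import math


def clr(x):
    """Centered logratio transform of one composition with positive parts."""
    logs = [math.log(v) for v in x]
    centre = sum(logs) / len(logs)
    return [value - centre for value in logs]


def ilr_basis(D):
    """D x (D-1) matrix V with orthonormal columns that each sum to zero."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):
        scale = math.sqrt(j / (j + 1))
        for i in range(j):
            V[i][j - 1] = scale / j  # first j entries of column j
        V[j][j - 1] = -scale         # entry j + 1 of column j
    return V


def ilr(x, V):
    """Isometric logratio coordinates z = V^T clr(x)."""
    y = clr(x)
    return [sum(V[i][k] * y[i] for i in range(len(y))) for k in range(len(V[0]))]


def back_transform_cov(C_z, V):
    """Back-transform a covariance matrix from ilr to clr space: V C(Z) V^T."""
    D, d = len(V), len(V[0])
    VC = [[sum(V[i][k] * C_z[k][l] for k in range(d)) for l in range(d)]
          for i in range(D)]
    return [[sum(VC[i][l] * V[j][l] for l in range(d)) for j in range(D)]
            for i in range(D)]
```

Because every column of V sums to zero, the rows of the back-transformed matrix also sum to zero. This reflects the singularity of clr-transformed data that motivates estimating the covariance in ilr coordinates in the first place.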
Abstract:
Conservation strategies for long-lived vertebrates require accurate estimates of parameters relative to the populations' size, numbers of non-breeding individuals (the “cryptic” fraction of the population) and the age structure. Frequently, visual survey techniques are used to make these estimates but the accuracy of these approaches is questionable, mainly because of the existence of numerous potential biases. Here we compare data on population trends and age structure in a bearded vulture (Gypaetus barbatus) population from visual surveys performed at supplementary feeding stations with data derived from population matrix-modelling approximations. Our results suggest that visual surveys overestimate the number of immature (<2 years old) birds, whereas subadults (3–5 y.o.) and adults (>6 y.o.) were underestimated in comparison with the predictions of a population model using a stable-age distribution. In addition, we found that visual surveys did not provide conclusive information on true variations in the size of the focal population. Our results suggest that although long-term studies (i.e. population matrix modelling based on capture-recapture procedures) are a more time-consuming method, they provide more reliable and robust estimates of population parameters needed in designing and applying conservation strategies. The findings shown here are likely transferable to the management and conservation of other long-lived vertebrate populations that share similar life-history traits and ecological requirements.
Abstract:
Matrix population models, elasticity analysis and loop analysis can potentially provide powerful techniques for the analysis of life histories. Data from a capture-recapture study on a population of southern highland water skinks (Eulamprus tympanum) were used to construct a matrix population model. Errors in elasticities were calculated by using the parametric bootstrap technique. Elasticity and loop analyses were then conducted to identify the life history stages most important to fitness. The same techniques were used to investigate the relative importance of fast versus slow growth, and rapid versus delayed reproduction. Mature water skinks were long-lived, but there was high immature mortality. The most sensitive life history stage was the subadult stage. It is suggested that life history evolution in E. tympanum may be strongly affected by predation, particularly by birds. Because our population declined over the study, slow growth and delayed reproduction were the optimal life history strategies over this period. Although the techniques of evolutionary demography provide a powerful approach for the analysis of life histories, there are formidable logistical obstacles in gathering enough high-quality data for robust estimates of the critical parameters.
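As an illustration of the elasticity analysis mentioned in this abstract, the sketch below computes the elasticities of the asymptotic growth rate for a small stage-structured matrix, using the standard formula e_ij = a_ij v_i w_j / (λ ⟨v, w⟩). Here w is the stable stage distribution and v the reproductive-value vector. The function names and the example matrix are hypothetical and not taken from the study, and power iteration is assumed to converge, which holds for primitive matrices.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dominant_eigen(A, iterations=500):
    """Power iteration: dominant eigenvalue and eigenvector scaled to sum 1."""
    n = len(A)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        y = mat_vec(A, x)
        lam = sum(y)  # equals the growth factor because sum(x) == 1
        if lam == 0:
            raise ValueError("iteration collapsed to the zero vector")
        x = [value / lam for value in y]
    return lam, x


def elasticities(A):
    """Return (lambda, matrix of elasticities of lambda to each entry of A)."""
    n = len(A)
    lam, w = dominant_eigen(A)                     # right eigenvector
    _, v = dominant_eigen([list(c) for c in zip(*A)])  # left eigenvector
    vw = sum(vi * wi for vi, wi in zip(v, w))
    E = [[A[i][j] * v[i] * w[j] / (lam * vw) for j in range(n)]
         for i in range(n)]
    return lam, E


# Hypothetical three-stage matrix (immature, subadult, adult); values invented.
A = [[0.0, 1.5, 2.0],
     [0.3, 0.0, 0.0],
     [0.0, 0.6, 0.8]]
growth_rate, E = elasticities(A)
```

The elasticities sum to one, so each entry can be read as the proportional contribution of that transition to the growth rate. The stage with the largest column total is the one to which the growth rate is most sensitive in proportional terms.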
Abstract:
BACKGROUND: The early detection of medullary thyroid carcinoma (MTC) can improve patient prognosis, because histological stage and patient age at diagnosis are highly relevant prognostic factors. As a consequence, delay in the diagnosis and/or incomplete surgical treatment should correlate with a poorer prognosis for patients. Few papers have evaluated the specific capability of fine-needle aspiration cytology (FNAC) to detect MTC, and small series have been reported. This study conducts a meta-analysis of published data on the diagnostic performance of FNAC in MTC to provide more robust estimates. RESEARCH DESIGN AND METHODS: A comprehensive computer literature search of the PubMed/MEDLINE, Embase and Scopus databases was conducted by searching for the terms 'medullary thyroid' AND 'cytology', 'FNA', 'FNAB', 'FNAC', 'fine needle' or 'fine-needle'. The search was updated until 21 March 2014, and no language restrictions were used. RESULTS: Fifteen relevant studies and 641 MTC lesions that had undergone FNAC were included. The detection rate (DR) of FNAC in patients with MTC (diagnosed as 'MTC' or 'suspicious for MTC') on a per lesion-based analysis ranged from 12·5% to 88·2%, with a pooled estimate of 56·4% (95% CI: 52·6-60·1%). The included studies were statistically heterogeneous in their estimates of DR (I-square >50%). Egger's regression intercept for DR pooling was 0·03 (95% CI: -3·1 to 3·2, P = 0·9). The study that reported the largest MTC series had a DR of 45%. Data on immunohistochemistry for calcitonin in diagnosing MTC were inconsistent for the meta-analysis. CONCLUSIONS: The presented meta-analysis demonstrates that FNAC is able to detect approximately one-half of MTC lesions. These findings suggest that other techniques may be needed in combination with FNAC to diagnose MTC and avoid false negative results.
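The heterogeneity statistic quoted in this abstract (I-square) is commonly derived from Cochran's Q. The sketch below shows that calculation under the assumption of inverse-variance fixed-effect pooling. The abstract does not state which pooling model the authors used, and the function name `heterogeneity` is hypothetical.

```python
def heterogeneity(estimates, variances):
    """Return (pooled estimate, Cochran's Q, I-squared in percent).

    Assumes inverse-variance weights; per-study variances must be positive.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is the share of Q in excess of its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared
```

With this definition, identical study estimates give Q = 0 and I-square = 0%, and values above 50% are often read as substantial heterogeneity, which is the threshold the abstract refers to.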
Abstract:
This paper investigates a simple procedure to estimate robustly the mean of an asymmetric distribution. The procedure removes the observations which are larger or smaller than certain limits and takes the arithmetic mean of the remaining observations, the limits being determined with the help of a parametric model, e.g., the Gamma, the Weibull or the Lognormal distribution. The breakdown point, the influence function, the (asymptotic) variance, and the contamination bias of this estimator are explored and compared numerically with those of competing estimates.
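A minimal sketch of the idea behind this procedure is given below, assuming a lognormal model. The abstract does not specify how the limits are determined, so the choices here are assumptions: the model is fitted robustly through the median and the scaled median absolute deviation of the log data, and the limits are symmetric model quantiles controlled by the hypothetical parameter `alpha`. No correction for trimming bias is attempted.

```python
import math
from statistics import NormalDist, mean, median


def model_trimmed_mean(data, alpha=0.005):
    """Mean of the observations inside model-based limits (lognormal model).

    Returns the estimate and the (lower, upper) limits that were used.
    """
    logs = [math.log(value) for value in data]
    mu = median(logs)
    # 1.4826 makes the MAD consistent for the normal standard deviation.
    sigma = 1.4826 * median([abs(value - mu) for value in logs])
    if sigma == 0:
        raise ValueError("scale estimate is zero; limits are degenerate")
    z = NormalDist().inv_cdf(1 - alpha)
    lower = math.exp(mu - z * sigma)
    upper = math.exp(mu + z * sigma)
    kept = [value for value in data if lower <= value <= upper]
    return mean(kept), (lower, upper)


# Invented sample with one gross outlier (100.0).
sample = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.05, 100.0]
estimate, limits = model_trimmed_mean(sample)
```

In this invented sample the value 100.0 falls above the upper limit and is removed, so the estimate is the arithmetic mean of the remaining seven observations, whereas the ordinary mean would be dominated by the outlier.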
Abstract:
For a wide range of environmental, hydrological, and engineering applications there is a fast-growing need for high-resolution imaging. In this context, waveform tomographic imaging of crosshole georadar data is a powerful method able to provide images of pertinent electrical properties in near-surface environments with unprecedented spatial resolution. In contrast, conventional ray-based tomographic methods, which consider only a very limited part of the recorded signal (first-arrival traveltimes and maximum first-cycle amplitudes), suffer from inherent limitations in resolution and may prove to be inadequate in complex environments. For a typical crosshole georadar survey, the potential improvement in resolution when using waveform-based approaches instead of ray-based approaches is in the range of one order of magnitude. Moreover, the spatial resolution of waveform-based inversions is comparable to that of common logging methods. While in exploration seismology waveform tomographic imaging has become well established over the past two decades, it is comparatively underdeveloped in the georadar domain despite corresponding needs. Recently, different groups have presented finite-difference time-domain waveform inversion schemes for crosshole georadar data, which are adaptations and extensions of Tarantola's seminal nonlinear generalized least-squares approach developed for the seismic case. First applications of these new crosshole georadar waveform inversion schemes to synthetic and field data have shown promising results. However, little is known about the limits and performance of such schemes in complex environments.
To this end, the general motivation of my thesis is the evaluation of the robustness and limitations of waveform inversion algorithms for crosshole georadar data in order to apply such schemes to a wide range of real-world problems. One crucial issue in making any waveform scheme applicable and effective for real-world crosshole georadar problems is the accurate estimation of the source wavelet, which is unknown in reality. Waveform inversion schemes for crosshole georadar data require forward simulations of the wavefield in order to iteratively solve the inverse problem. Therefore, accurate knowledge of the source wavelet is critically important for successful application of such schemes. Relatively small differences in the estimated source wavelet shape can lead to large differences in the resulting tomograms. In the first part of my thesis, I explore the viability and robustness of a relatively simple iterative deconvolution technique that incorporates the estimation of the source wavelet into the waveform inversion procedure rather than adding additional model parameters to the inversion problem. Extensive tests indicate that this source wavelet estimation technique is simple yet effective, and is able to provide remarkably accurate and robust estimates of the source wavelet in the presence of strong heterogeneity in both the dielectric permittivity and electrical conductivity as well as significant ambient noise in the recorded data. Furthermore, our tests also indicate that the approach is insensitive to the phase characteristics of the starting wavelet, which is not the case when directly incorporating the wavelet estimation into the inverse problem. Another critical issue with crosshole georadar waveform inversion schemes that clearly needs to be investigated is the consequence of the common assumption of frequency-independent electromagnetic constitutive parameters.
This is crucial since in reality, these parameters are known to be frequency-dependent and complex and thus recorded georadar data may show significant dispersive behaviour. In particular, in the presence of water, there is a wide body of evidence showing that the dielectric permittivity can be significantly frequency dependent over the GPR frequency range, due to a variety of relaxation processes. The second part of my thesis is therefore dedicated to the evaluation of the reconstruction limits of a non-dispersive crosshole georadar waveform inversion scheme in the presence of varying degrees of dielectric dispersion. I show that the inversion algorithm, combined with the iterative deconvolution-based source wavelet estimation procedure that is partially able to account for the frequency-dependent effects through an "effective" wavelet, performs remarkably well in weakly to moderately dispersive environments and has the ability to provide adequate tomographic reconstructions.
Abstract:
In this article we present the first empirical analysis on the associations between body size, activity, employment and wages for several European countries. The main advantage of the present work with respect to the previous literature is offered by the comparability of the data and its large geographical coverage. According to our results, for Spanish women, being obese is associated with both a 9% lower wage and probability of being employed, while for Swedish and Danish, obesity is associated with a 12% lower probability of being employed, and a 10% lower wage respectively. In Belgium, obesity is associated with a 19% lower probability of being employed for men. These robust estimates are strongly informative and may be used as a simple statistical rule of thumb to decide the countries in which lab and field experiments should be run.
Abstract:
A major issue in the application of waveform inversion methods to crosshole georadar data is the accurate estimation of the source wavelet. Here, we explore the viability and robustness of incorporating this step into a time-domain waveform inversion procedure through an iterative deconvolution approach. Our results indicate that, at least in non-dispersive electrical environments, such an approach provides remarkably accurate and robust estimates of the source wavelet even in the presence of strong heterogeneity in both the dielectric permittivity and electrical conductivity. Our results also indicate that the proposed source wavelet estimation approach is relatively insensitive to ambient noise and to the phase characteristics of the starting wavelet. Finally, there appears to be little-to-no trade-off between the wavelet estimation and the tomographic imaging procedures.
Abstract:
A major issue in the application of waveform inversion methods to crosshole ground-penetrating radar (GPR) data is the accurate estimation of the source wavelet. Here, we explore the viability and robustness of incorporating this step into a recently published time-domain inversion procedure through an iterative deconvolution approach. Our results indicate that, at least in non-dispersive electrical environments, such an approach provides remarkably accurate and robust estimates of the source wavelet even in the presence of strong heterogeneity of both the dielectric permittivity and electrical conductivity. Our results also indicate that the proposed source wavelet estimation approach is relatively insensitive to ambient noise and to the phase characteristics of the starting wavelet. Finally, there appears to be little to no trade-off between the wavelet estimation and the tomographic imaging procedures.