953 resultados para Statistical inference
Resumo:
Background The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
Resumo:
Modern sample surveys started to spread after statistician at the U.S. Bureau of the Census in the 1940s had developed a sampling design for the Current Population Survey (CPS). A significant factor was also that digital computers became available for statisticians. In the beginning of 1950s, the theory was documented in textbooks on survey sampling. This thesis is about the development of the statistical inference for sample surveys. For the first time the idea of statistical inference was enunciated by a French scientist, P. S. Laplace. In 1781, he published a plan for a partial investigation in which he determined the sample size needed to reach the desired accuracy in estimation. The plan was based on Laplace s Principle of Inverse Probability and on his derivation of the Central Limit Theorem. They were published in a memoir in 1774 which is one of the origins of statistical inference. Laplace s inference model was based on Bernoulli trials and binominal probabilities. He assumed that populations were changing constantly. It was depicted by assuming a priori distributions for parameters. Laplace s inference model dominated statistical thinking for a century. Sample selection in Laplace s investigations was purposive. In 1894 in the International Statistical Institute meeting, Norwegian Anders Kiaer presented the idea of the Representative Method to draw samples. Its idea was that the sample would be a miniature of the population. It is still prevailing. The virtues of random sampling were known but practical problems of sample selection and data collection hindered its use. Arhtur Bowley realized the potentials of Kiaer s method and in the beginning of the 20th century carried out several surveys in the UK. He also developed the theory of statistical inference for finite populations. It was based on Laplace s inference model. R. A. Fisher contributions in the 1920 s constitute a watershed in the statistical science He revolutionized the theory of statistics. In addition, he introduced a new statistical inference model which is still the prevailing paradigm. The essential idea is to draw repeatedly samples from the same population and the assumption that population parameters are constants. Fisher s theory did not include a priori probabilities. Jerzy Neyman adopted Fisher s inference model and applied it to finite populations with the difference that Neyman s inference model does not include any assumptions of the distributions of the study variables. Applying Fisher s fiducial argument he developed the theory for confidence intervals. Neyman s last contribution to survey sampling presented a theory for double sampling. This gave the central idea for statisticians at the U.S. Census Bureau to develop the complex survey design for the CPS. Important criterion was to have a method in which the costs of data collection were acceptable, and which provided approximately equal interviewer workloads, besides sufficient accuracy in estimation.
Resumo:
Paired-tow calibration studies provide information on changes in survey catchability that may occur because of some necessary change in protocols (e.g., change in vessel or vessel gear) in a fish stock survey. This information is important to ensure the continuity of annual time-series of survey indices of stock size that provide the basis for fish stock assessments. There are several statistical models used to analyze the paired-catch data from calibration studies. Our main contributions are results from simulation experiments designed to measure the accuracy of statistical inferences derived from some of these models. Our results show that a model commonly used to analyze calibration data can provide unreliable statistical results when there is between-tow spatial variation in the stock densities at each paired-tow site. However, a generalized linear mixed-effects model gave very reliable results over a wide range of spatial variations in densities and we recommend it for the analysis of paired-tow survey calibration data. This conclusion also applies if there is between-tow variation in catchability.
Resumo:
Amplitude demodulation is an ill-posed problem and so it is natural to treat it from a Bayesian viewpoint, inferring the most likely carrier and envelope under probabilistic constraints. One such treatment is Probabilistic Amplitude Demodulation (PAD), which, whilst computationally more intensive than traditional approaches, offers several advantages. Here we provide methods for estimating the uncertainty in the PAD-derived envelopes and carriers, and for learning free-parameters like the time-scale of the envelope. We show how the probabilistic approach can naturally handle noisy and missing data. Finally, we indicate how to extend the model to signals which contain multiple modulators and carriers.
Resumo:
We study the problem of measuring the uncertainty of CGE (or RBC)-type model simulations associated with parameter uncertainty. We describe two approaches for building confidence sets on model endogenous variables. The first one uses a standard Wald-type statistic. The second approach assumes that a confidence set (sampling or Bayesian) is available for the free parameters, from which confidence sets are derived by a projection technique. The latter has two advantages: first, confidence set validity is not affected by model nonlinearities; second, we can easily build simultaneous confidence intervals for an unlimited number of variables. We study conditions under which these confidence sets take the form of intervals and show they can be implemented using standard methods for solving CGE models. We present an application to a CGE model of the Moroccan economy to study the effects of policy-induced increases of transfers from Moroccan expatriates.
Resumo:
It is well known that standard asymptotic theory is not valid or is extremely unreliable in models with identification problems or weak instruments [Dufour (1997, Econometrica), Staiger and Stock (1997, Econometrica), Wang and Zivot (1998, Econometrica), Stock and Wright (2000, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. One possible way out consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows one to build exact tests and confidence sets only for the full vector of the coefficients of the endogenous explanatory variables in a structural equation, which in general does not allow for individual coefficients. This problem may in principle be overcome by using projection techniques [Dufour (1997, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. AR-types are emphasized because they are robust to both weak instruments and instrument exclusion. However, these techniques can be implemented only by using costly numerical techniques. In this paper, we provide a complete analytic solution to the problem of building projection-based confidence sets from Anderson-Rubin-type confidence sets. The latter involves the geometric properties of “quadrics” and can be viewed as an extension of usual confidence intervals and ellipsoids. Only least squares techniques are required for building the confidence intervals. We also study by simulation how “conservative” projection-based confidence sets are. Finally, we illustrate the methods proposed by applying them to three different examples: the relationship between trade and growth in a cross-section of countries, returns to education, and a study of production functions in the U.S. economy.
Resumo:
We discuss statistical inference problems associated with identification and testability in econometrics, and we emphasize the common nature of the two issues. After reviewing the relevant statistical notions, we consider in turn inference in nonparametric models and recent developments on weakly identified models (or weak instruments). We point out that many hypotheses, for which test procedures are commonly proposed, are not testable at all, while some frequently used econometric methods are fundamentally inappropriate for the models considered. Such situations lead to ill-defined statistical problems and are often associated with a misguided use of asymptotic distributional results. Concerning nonparametric hypotheses, we discuss three basic problems for which such difficulties occur: (1) testing a mean (or a moment) under (too) weak distributional assumptions; (2) inference under heteroskedasticity of unknown form; (3) inference in dynamic models with an unlimited number of parameters. Concerning weakly identified models, we stress that valid inference should be based on proper pivotal functions —a condition not satisfied by standard Wald-type methods based on standard errors — and we discuss recent developments in this field, mainly from the viewpoint of building valid tests and confidence sets. The techniques discussed include alternative proposed statistics, bounds, projection, split-sampling, conditioning, Monte Carlo tests. The possibility of deriving a finite-sample distributional theory, robustness to the presence of weak instruments, and robustness to the specification of a model for endogenous explanatory variables are stressed as important criteria assessing alternative procedures.
Resumo:
Department of Statistics, Cochin University of Science and Technology
Resumo:
This paper uses Colombian household survey data collected over the period 1984-2005 to estimate Gini coe¢ cients along with their corresponding standard errors. We Önd a statistically signiÖcant increase in wage income inequality following the adoption of the liberalisation measures of the early 1990s, and mixed evidence during the recovery years that followed the economic recession of the late 1990s. We also Önd that in several cases the observed di§erences in the Gini coe¢ cients across cities have not been statistically signiÖcant.
Resumo:
Bayesian inference has been used to determine rigorous estimates of hydroxyl radical concentrations () and air mass dilution rates (K) averaged following air masses between linked observations of nonmethane hydrocarbons (NMHCs) spanning the North Atlantic during the Intercontinental Transport and Chemical Transformation (ITCT)-Lagrangian-2K4 experiment. The Bayesian technique obtains a refined (posterior) distribution of a parameter given data related to the parameter through a model and prior beliefs about the parameter distribution. Here, the model describes hydrocarbon loss through OH reaction and mixing with a background concentration at rate K. The Lagrangian experiment provides direct observations of hydrocarbons at two time points, removing assumptions regarding composition or sources upstream of a single observation. The estimates are sharpened by using many hydrocarbons with different reactivities and accounting for their variability and measurement uncertainty. A novel technique is used to construct prior background distributions of many species, described by variation of a single parameter . This exploits the high correlation of species, related by the first principal component of many NMHC samples. The Bayesian method obtains posterior estimates of , K and following each air mass. Median values are typically between 0.5 and 2.0 × 106 molecules cm−3, but are elevated to between 2.5 and 3.5 × 106 molecules cm−3, in low-level pollution. A comparison of estimates from absolute NMHC concentrations and NMHC ratios assuming zero background (the “photochemical clock” method) shows similar distributions but reveals systematic high bias in the estimates from ratios. Estimates of K are ∼0.1 day−1 but show more sensitivity to the prior distribution assumed.