885 results for finite-sample test


Relevance:

80.00%

Publisher:

Abstract:

The advances that have characterized spatial econometrics in recent years are mostly theoretical and have not yet found extensive empirical application. In this work we aim to provide a review of the main tools of spatial econometrics and to show an empirical application of one of the most recently introduced estimators. Despite the numerous alternatives that econometric theory provides for the treatment of spatial (and spatiotemporal) data, empirical analyses are still limited by the lack of corresponding routines in statistical and econometric software. Spatiotemporal modeling represents one of the most recent developments in spatial econometric theory, and the finite sample properties of the estimators that have been proposed are currently being tested in the literature. We compare several estimators (a quasi-maximum likelihood, QML, estimator and some GMM-type estimators) for a fixed effects dynamic panel data model under certain conditions, by means of a Monte Carlo simulation analysis. We focus on different settings, characterized either by fully stable or by quasi-unit root series. We also investigate the extent of the bias caused by non-spatial estimation of a model when the data exhibit different degrees of spatial dependence. Finally, we provide an empirical application of a QML estimator for a time-space dynamic model which includes a temporal, a spatial, and a spatiotemporal lag of the dependent variable. We choose a relevant and prolific field of analysis, in which spatial econometrics has so far found only limited space, in order to explore the value added of considering the spatial dimension of the data. In particular, we study the determinants of cropland value in the Midwestern U.S.A. in the years 1971-2009, taking the present value model (PVM) as the theoretical framework of analysis.
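For reference, a generic time-space dynamic specification of the kind described above can be written as follows (the abstract does not report the exact equation, so the symbols below are illustrative):

```latex
% generic, illustrative time-space dynamic panel specification
y_t = \tau\, y_{t-1} + \rho\, W y_t + \eta\, W y_{t-1} + X_t \beta + \mu + \varepsilon_t ,
```

where $y_t$ is the $N \times 1$ vector of outcomes at time $t$, $W$ the spatial weights matrix, $\tau$, $\rho$, and $\eta$ the temporal, spatial, and spatiotemporal autoregressive parameters, $\mu$ the vector of fixed effects, and $\varepsilon_t$ the disturbance.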

Relevance:

80.00%

Publisher:

Abstract:

Suppose that we are interested in establishing simple but reliable rules for predicting future t-year survivors via censored regression models. In this article, we present inference procedures for evaluating such binary classification rules based on various prediction precision measures quantified by the overall misclassification rate, sensitivity and specificity, and positive and negative predictive values. Specifically, under various working models we derive consistent estimators for the above measures via substitution and cross-validation estimation procedures. Furthermore, we provide large sample approximations to the distributions of these nonsmooth estimators without assuming that the working model is correctly specified. Confidence intervals, for example, for the difference of the precision measures between two competing rules can then be constructed. All the proposals are illustrated with two real examples and their finite sample properties are evaluated via a simulation study.
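As a rough illustration of the precision measures in question, the following is a minimal sketch that assumes fully observed binary outcomes and ignores censoring (which the article's estimators explicitly handle):

```python
import numpy as np

def classification_measures(y_true, y_pred):
    """Overall misclassification rate, sensitivity, specificity, PPV, and NPV
    for binary labels; a naive version that assumes fully observed outcomes."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    return {
        "misclassification": (fp + fn) / y_true.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical rule: predict a t-year event whenever a risk score exceeds 0.5
rng = np.random.default_rng(0)
risk = rng.uniform(size=200)
event = rng.uniform(size=200) < risk
print(classification_measures(event, risk > 0.5))
```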

Relevance:

80.00%

Publisher:

Abstract:

Markov chain Monte Carlo (MCMC) is a method of producing a correlated sample in order to estimate features of a complicated target distribution via simple ergodic averages. A fundamental question in MCMC applications is: when should the sampling stop? That is, when are the ergodic averages good estimates of the desired quantities? We consider a method that stops the MCMC sampling the first time the width of a confidence interval based on the ergodic averages is less than a user-specified value. Hence, calculating Monte Carlo standard errors is a critical step in assessing the output of the simulation. In particular, we consider the regenerative simulation and batch means methods of estimating the variance of the asymptotic normal distribution. We describe sufficient conditions for the strong consistency and asymptotic normality of both methods and investigate their finite sample properties in a variety of examples.
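A minimal sketch of the batch means standard error and the fixed-width stopping check described above (illustrative only; the paper's conditions on batch-size growth and its regenerative alternative are not reproduced):

```python
import numpy as np

def batch_means_se(chain, n_batches=None):
    """Monte Carlo standard error of the ergodic average via batch means
    (illustrative sketch)."""
    chain = np.asarray(chain)
    n = chain.size
    if n_batches is None:
        n_batches = int(np.sqrt(n))                 # a common default choice
    b = n // n_batches                              # batch size
    means = np.array([chain[i * b:(i + 1) * b].mean() for i in range(n_batches)])
    var_hat = b * means.var(ddof=1)                 # estimates the asymptotic variance
    return np.sqrt(var_hat / n)

def stop_sampling(chain, half_width, z=1.96):
    """Fixed-width rule: stop once the CI half-width drops below the threshold."""
    return z * batch_means_se(chain) < half_width

# Example with an AR(1) series standing in for MCMC output
rng = np.random.default_rng(1)
x = np.zeros(50_000)
for t in range(1, x.size):
    x[t] = 0.9 * x[t - 1] + rng.normal()
print(batch_means_se(x), stop_sampling(x, half_width=0.05))
```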

Relevance:

80.00%

Publisher:

Abstract:

In this paper, we consider estimation of the causal effect of a treatment on an outcome from observational data collected in two phases. In the first phase, a simple random sample of individuals is drawn from a population. On these individuals, information is obtained on treatment, outcome, and a few low-dimensional confounders. These individuals are then stratified according to these factors. In the second phase, a random sub-sample of individuals is drawn from each stratum, with known, stratum-specific selection probabilities. On these individuals, a rich set of confounding factors is collected. In this setting, we introduce four estimators: (1) simple inverse weighted, (2) locally efficient, (3) doubly robust, and (4) enriched inverse weighted. We evaluate the finite-sample performance of these estimators in a simulation study. We also use our methodology to estimate the causal effect of trauma care on in-hospital mortality using data from the National Study of Cost and Outcomes of Trauma.
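A minimal sketch of the simple inverse-weighted estimator in such a two-phase design (an illustration under assumed variable names, with a logistic propensity model fitted on the phase-two data; the locally efficient, doubly robust, and enriched versions are not shown):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def two_phase_ipw(df):
    """Simple inverse-weighted estimate of E[Y(1)] - E[Y(0)].
    Assumed columns: outcome y, treatment a, phase-two indicator s,
    known phase-two selection probability pi2, and confounders x1, x2."""
    sub = df[df["s"] == 1]                            # phase-two subsample only
    X = sub[["x1", "x2"]].values
    ps = LogisticRegression().fit(X, sub["a"]).predict_proba(X)[:, 1]
    w = 1.0 / sub["pi2"]                              # known design weight
    w1 = sub["a"] * w / ps                            # treated: 1 / (pi2 * e(x))
    w0 = (1 - sub["a"]) * w / (1 - ps)                # controls: 1 / (pi2 * (1 - e(x)))
    return np.sum(w1 * sub["y"]) / np.sum(w1) - np.sum(w0 * sub["y"]) / np.sum(w0)

# Tiny synthetic example with a true effect of 1
rng = np.random.default_rng(2)
n = 5000
x1, x2 = rng.normal(size=n), rng.binomial(1, 0.4, size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x1 + x2 - 0.5))))
y = a + x1 + 0.5 * x2 + rng.normal(size=n)
pi2 = np.where(a == 1, 0.8, 0.3)                      # stratum-specific selection
s = rng.binomial(1, pi2)
df = pd.DataFrame(dict(y=y, a=a, s=s, pi2=pi2, x1=x1, x2=x2))
print(two_phase_ipw(df))
```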

Relevance:

80.00%

Publisher:

Abstract:

PURPOSE: To prospectively assess the depiction rate and morphologic features of myocardial bridging (MB) of the coronary arteries with 64-section computed tomographic (CT) coronary angiography in comparison with conventional coronary angiography. MATERIALS AND METHODS: Patients were simultaneously enrolled in a prospective study comparing CT and conventional coronary angiography, for which ethics committee approval and informed consent were obtained. One hundred patients (38 women, 62 men; mean age, 63.8 years +/- 11.6 [standard deviation]) underwent 64-section CT and conventional coronary angiography. Fifty additional patients (19 women, 31 men; mean age, 59.2 years +/- 13.2) who underwent CT only were also included. CT images were analyzed for the direct signs (length, depth, and degree of systolic compression), while conventional angiograms were analyzed for the indirect signs (step-down/step-up phenomenon, milking effect, and systolic compression of the tunneled segment). Statistical analysis was performed with Pearson correlation analysis, the Wilcoxon two-sample test, and the Fisher exact test. RESULTS: MB was detected with CT in 26 (26%) of 100 patients and with conventional angiography in 12 patients (12%). Mean tunneled segment length and depth at CT (n = 150) were 24.3 mm +/- 10.0 and 2.6 mm +/- 0.8, respectively. Systolic compression in the 12 patients was 31.3% +/- 11.0 at CT and 28.2% +/- 10.5 at conventional angiography (r = 0.72, P < .001). With CT, a significant correlation was not found between systolic compression and length of the tunneled segment (r = 0.16, P = .25, n = 150) but was found with depth (r = 0.65, P < .01, n = 150). In the 14 patients in whom MB was found at CT but not at conventional angiography, length, depth, and systolic compression were significantly lower than in patients in whom both modalities depicted the anomaly (P < .001, P < .01, and P < .001, respectively). CONCLUSION: The depiction rate of MB is greater with 64-section CT coronary angiography than with conventional coronary angiography. The degree of systolic compression of MB correlates significantly with tunneled segment depth but not length.

Relevance:

80.00%

Publisher:

Abstract:

Several tests exist for the comparison of different groups in the randomized complete block design. However, there is a lack of robust estimators for the location difference between one group and all the others on the original scale. Relative marginal effects are commonly used in this situation, but because they are not on the original scale they are more difficult for less experienced users to interpret and apply. In this paper, two nonparametric estimators for the comparison of one group against the others in the randomized complete block design are presented. Theoretical results such as asymptotic normality, consistency, translation invariance, scale preservation, unbiasedness, and median unbiasedness are derived. The finite sample behavior of these estimators is investigated by simulation under different scenarios. In addition, confidence intervals based on these estimators are discussed, and their behavior is also investigated by simulation.
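The abstract does not give the estimators' explicit form; purely as an illustration of a robust, original-scale location-difference estimate in a randomized complete block design, one might take the median over blocks of the difference between the group of interest and the mean of the remaining groups (a hypothetical sketch, not the authors' estimator):

```python
import numpy as np

def block_location_difference(data, group):
    """Median over blocks of (selected group - mean of the other groups).
    `data` is a (blocks x groups) array. Illustrative only."""
    data = np.asarray(data, dtype=float)
    others = np.delete(data, group, axis=1).mean(axis=1)   # per-block mean of the rest
    return np.median(data[:, group] - others)

# Example: 20 blocks, 4 treatments, treatment 0 shifted upward by 1
rng = np.random.default_rng(3)
y = rng.normal(size=(20, 4)) + rng.normal(size=(20, 1))    # additive block effects
y[:, 0] += 1.0
print(block_location_difference(y, group=0))
```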

Relevance:

80.00%

Publisher:

Abstract:

Prevalent sampling is an efficient and focused approach to the study of the natural history of disease. Right-censored time-to-event data observed in prospective prevalent cohort studies are often subject to left-truncated sampling. Left-truncated samples are not randomly selected from the population of interest and carry a selection bias. Extensive work has focused on estimating the unbiased distribution from left-truncated samples. However, in many applications the exact date of disease onset is not observed. For example, in an HIV infection study the exact HIV infection time is not observable; it is only known that infection occurred between two observable dates. Meeting these challenges motivated our study. We propose parametric models to estimate the unbiased distribution of left-truncated, right-censored time-to-event data with uncertain onset times. We first consider data from length-biased sampling, a special case of left-truncated sampling, and then extend the proposed method to general left-truncated sampling. Given a parametric model, we construct the full likelihood for a biased sample with unobservable onset of disease. The parameters are estimated by maximizing the constructed likelihood, which adjusts for both the selection bias and the unobserved exact onset. Simulations are conducted to evaluate the finite sample performance of the proposed methods. We apply the proposed method to an HIV infection study, estimating the unbiased survival function and covariate coefficients.
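For orientation, the standard parametric likelihood for left-truncated, right-censored data with a known onset date has the form below; the proposed method additionally integrates over the unobserved onset time, which is only known to lie in an interval (symbols are generic):

```latex
% standard left-truncated, right-censored likelihood (generic symbols)
L(\theta) \;=\; \prod_{i=1}^{n}
\left[ \frac{f_\theta(t_i)}{S_\theta(a_i)} \right]^{\delta_i}
\left[ \frac{S_\theta(t_i)}{S_\theta(a_i)} \right]^{1-\delta_i},
```

where $t_i$ is the observed event or censoring time measured from onset, $a_i$ the left-truncation (study entry) time, $\delta_i$ the event indicator, and $f_\theta$ and $S_\theta$ the density and survival function of the assumed parametric model.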

Relevance:

80.00%

Publisher:

Abstract:

In recent years, fractionally differenced processes have received a great deal of attention due to their flexibility in financial applications with long memory. This paper considers a class of models generated by Gegenbauer polynomials, incorporating long memory in the stochastic volatility (SV) components in order to develop the General Long Memory SV (GLMSV) model. We examine the statistical properties of the new model, suggest using spectral likelihood estimation for long memory processes, and investigate the finite sample properties via Monte Carlo experiments. We apply the model to three exchange rate return series. Overall, the results of the out-of-sample forecasts show the adequacy of the new GLMSV model.
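As background, the Gegenbauer long-memory filter underlying such models is typically written as follows (a generic form; the GLMSV specification combines this with SV components and is not reproduced here):

```latex
% generic Gegenbauer filter and its spectral density
(1 - 2u\,B + B^{2})^{d}\, y_t = \varepsilon_t,
\qquad
f(\lambda) = \frac{\sigma_\varepsilon^{2}}{2\pi}
\left\{ 4\,(\cos\lambda - u)^{2} \right\}^{-d},
```

where $B$ is the backshift operator, $|u| \le 1$ locates the spectral pole, and $d$ governs the degree of long memory.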

Relevance:

80.00%

Publisher:

Abstract:

Objectives: To evaluate, in a cohort of older adults, whether there are differences in the life-course circumstances that contribute to a greater chance of survival among men and women aged 60 years and over. Methods: 2,143 older adults interviewed in 2000 were followed for almost 15 years. The statistical analysis used the association test for complex samples (Rao-Scott), the Kaplan-Meier product-limit estimator, the log-rank test, and univariate and multivariate analyses of the Cox proportional hazards model, from which two final models, one per gender, were built at the 5% significance level. Results and discussion: Of the 836,204 older residents of São Paulo represented by the study, 51.6% were still alive (43.0% of the men and 57.7% of the women). In both genders, those who had good material conditions and who performed more than half of the basic and instrumental activities of daily living (BADL/IADL) without difficulty and without help had a greater chance of survival. The same was observed among women with fewer than two limitations from certain diseases, those who were married, those who offered some type of help, and those who took preventive actions regarding their health. Among men, the same occurred for those who reported being the head of the household, those with community participation, those who lived with another resident who had attended or was attending school, those who had worked predominantly as owners or self-employed, and those who reported having had some illness in their first fifteen years of life. Conclusion: The chances of survival differ between genders, and within them, even under apparently similar life-course circumstances experienced by older men and women.

Relevance:

80.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

80.00%

Publisher:

Abstract:

In this article we investigate the asymptotic and finite-sample properties of predictors of regression models with autocorrelated errors. We prove new theorems associated with the predictive efficiency of generalized least squares (GLS) and incorrectly structured GLS predictors. We also establish the form associated with their predictive mean squared errors as well as the magnitude of these errors relative to each other and to those generated from the ordinary least squares (OLS) predictor. A large simulation study is used to evaluate the finite-sample performance of forecasts generated from models using different corrections for the serial correlation.
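A minimal numpy sketch of one such correction, a Cochrane-Orcutt-style GLS fit under AR(1) errors followed by a one-step-ahead forecast that exploits the predictable part of the error (illustrative; the article treats general GLS and incorrectly structured predictors, not only this case):

```python
import numpy as np

def ar1_gls_forecast(y, X, x_next):
    """Illustrative sketch: fit OLS, estimate the AR(1) coefficient from the
    residuals, quasi-difference the data (Cochrane-Orcutt), refit, and forecast
    y_{T+1} = x'b + rho * e_T."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)    # AR(1) estimate
    y_star = y[1:] - rho * y[:-1]                         # quasi-differenced data
    X_star = X[1:] - rho * X[:-1]
    beta_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    e_T = y[-1] - X[-1] @ beta_gls
    return x_next @ beta_gls + rho * e_T                  # GLS predictor

# Example: regression with AR(1) errors, one-step-ahead forecast
rng = np.random.default_rng(4)
T, rho_true = 300, 0.7
X = np.column_stack([np.ones(T), rng.normal(size=T)])
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho_true * u[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0]) + u
print(ar1_gls_forecast(y, X, x_next=np.array([1.0, 0.5])))
```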

Relevance:

80.00%

Publisher:

Abstract:

In this paper we develop a set of novel Markov chain Monte Carlo algorithms for Bayesian smoothing of partially observed non-linear diffusion processes. The sampling algorithms developed herein use a deterministic approximation to the posterior distribution over paths as the proposal distribution for a mixture of an independence and a random walk sampler. The approximating distribution is sampled by simulating an optimized time-dependent linear diffusion process derived from the recently developed variational Gaussian process approximation method. Flexible blocking strategies are introduced to further improve mixing, and thus the efficiency, of the sampling algorithms. The algorithms are tested on two diffusion processes: one with double-well potential drift and another with SINE drift. The new algorithms' accuracy and efficiency are compared with state-of-the-art hybrid Monte Carlo based path sampling. It is shown that in practical, finite-sample applications the algorithm is accurate except in the presence of large observation errors and low observation densities, which lead to a multi-modal structure in the posterior distribution over paths. More importantly, the variational approximation assisted sampling algorithm outperforms hybrid Monte Carlo in terms of computational efficiency, except when the diffusion process is densely observed with small errors, in which case both algorithms are equally efficient.
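A generic sketch of the "mixture of an independence and a random walk sampler" idea, with a fixed Gaussian standing in for the variational path approximation used as the independence proposal (illustrative only; the paper's algorithms operate on discretized diffusion paths with blocking):

```python
import numpy as np
from scipy import stats

def mixture_mh(log_target, q_mean, q_cov, n_iter=20_000, p_indep=0.5,
               rw_scale=0.3, rng=None):
    """Generic illustration: Metropolis-Hastings that proposes either from a
    fixed Gaussian approximation (independence move) or a Gaussian random walk."""
    rng = rng or np.random.default_rng()
    q = stats.multivariate_normal(q_mean, q_cov)
    x = np.array(q_mean, dtype=float)
    lp_x = log_target(x)
    out = np.empty((n_iter, x.size))
    for i in range(n_iter):
        if rng.uniform() < p_indep:                    # independence proposal
            y = q.rvs(random_state=rng)
            lp_y = log_target(y)
            log_alpha = lp_y - lp_x + q.logpdf(x) - q.logpdf(y)
        else:                                          # random walk proposal
            y = x + rw_scale * rng.normal(size=x.size)
            lp_y = log_target(y)
            log_alpha = lp_y - lp_x
        if np.log(rng.uniform()) < log_alpha:
            x, lp_x = y, lp_y
        out[i] = x
    return out

# Example: a banana-shaped 2-D target, Gaussian approximation as proposal
log_target = lambda z: -0.5 * (z[0] ** 2 + (z[1] - z[0] ** 2) ** 2)
draws = mixture_mh(log_target, q_mean=[0.0, 1.0], q_cov=np.diag([1.0, 2.0]))
print(draws.mean(axis=0))
```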

Relevance:

80.00%

Publisher:

Abstract:

In this paper we develop a set of novel Markov chain Monte Carlo algorithms for Bayesian smoothing of partially observed non-linear diffusion processes. The sampling algorithms developed herein use a deterministic approximation to the posterior distribution over paths as the proposal distribution for a mixture of an independence and a random walk sampler. The approximating distribution is sampled by simulating an optimized time-dependent linear diffusion process derived from the recently developed variational Gaussian process approximation method. The novel diffusion bridge proposal derived from the variational approximation allows the use of a flexible blocking strategy that further improves mixing, and thus the efficiency, of the sampling algorithms. The algorithms are tested on two diffusion processes: one with double-well potential drift and another with SINE drift. The new algorithms' accuracy and efficiency are compared with state-of-the-art hybrid Monte Carlo based path sampling. It is shown that in practical, finite-sample applications the algorithm is accurate except in the presence of large observation errors and low observation densities, which lead to a multi-modal structure in the posterior distribution over paths. More importantly, the variational approximation assisted sampling algorithm outperforms hybrid Monte Carlo in terms of computational efficiency, except when the diffusion process is densely observed with small errors, in which case both algorithms are equally efficient. © 2011 Springer-Verlag.

Relevance:

80.00%

Publisher:

Abstract:

The paper develops a novel realized matrix-exponential stochastic volatility model of multivariate returns and realized covariances that incorporates asymmetry and long memory (hereafter the RMESV-ALM model). The matrix exponential transformation guarantees the positive definiteness of the dynamic covariance matrix. The contribution of the paper ties in with Robert Basmann’s seminal work on the estimation of highly non-linear model specifications (“Causality tests and observationally equivalent representations of econometric models”, Journal of Econometrics, 1988, 39(1-2), 69–104), especially in developing tests for leverage and spillover effects in the covariance dynamics. Efficient importance sampling is used to maximize the likelihood function of the RMESV-ALM model, and the finite sample properties of the quasi-maximum likelihood estimator of the parameters are analysed. Using high frequency data for three US financial assets, the new model is estimated and evaluated. The forecasting performance of the new model is compared with that of a novel dynamic realized matrix-exponential conditional covariance model. The volatility and co-volatility spillovers are examined via the news impact curves and the impulse response functions from returns to volatility and co-volatility.
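The key property exploited here, that the matrix exponential of any symmetric matrix is symmetric and positive definite, can be checked directly (a minimal sketch, not the RMESV-ALM dynamics themselves):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))
C = (A + A.T) / 2                  # an arbitrary symmetric "log-covariance" matrix
Sigma = expm(C)                    # matrix exponential: a valid covariance matrix
print(np.allclose(Sigma, Sigma.T), np.linalg.eigvalsh(Sigma).min() > 0)
```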

Relevance:

80.00%

Publisher:

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even with the huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
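To make the tensor view concrete, a rank-k PARAFAC (latent class) representation of the joint pmf of three categorical variables is a sum of k weighted outer products of marginal probability vectors; a small numpy sketch of that generic construction (not the collapsed Tucker decomposition proposed in Chapter 2):

```python
import numpy as np

rng = np.random.default_rng(6)
k, d = 3, (4, 5, 2)                        # 3 latent classes; category counts per variable

def simplex(shape):
    """Random probability vectors along the last axis (generic example)."""
    w = rng.gamma(1.0, size=shape)
    return w / w.sum(axis=-1, keepdims=True)

nu = simplex(k)                            # latent class weights
phi = [simplex((k, d_j)) for d_j in d]     # per-class marginals for each variable

# Joint pmf tensor: P[c1, c2, c3] = sum_h nu[h] * phi1[h, c1] * phi2[h, c2] * phi3[h, c3]
P = np.einsum("h,ha,hb,hc->abc", nu, phi[0], phi[1], phi[2])
print(P.shape, np.isclose(P.sum(), 1.0))   # a 4 x 5 x 2 tensor of nonnegative rank <= 3
```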

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and give a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations, and on other common population structure inference problems, is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
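As a loose illustration of approximating a log-linear posterior with a Gaussian, the following Laplace-style sketch uses the posterior mode and an approximate inverse Hessian (related to, but not the same as, the optimal Kullback-Leibler Gaussian approximation derived in Chapter 4; the model and prior below are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical Poisson log-linear model with a Gaussian prior on the coefficients
rng = np.random.default_rng(9)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
theta_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ theta_true))

def neg_log_post(theta, tau2=10.0):
    eta = X @ theta
    return -(y @ eta - np.exp(eta).sum()) + 0.5 * theta @ theta / tau2

res = minimize(neg_log_post, x0=np.zeros(2), method="BFGS")
# BFGS inverse-Hessian approximation used as the covariance of N(mean, cov)
mean, cov = res.x, res.hess_inv
print(mean, np.sqrt(np.diag(cov)))
```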

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel, but comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
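One of the kernel approximations mentioned, replacing the full-data log-likelihood with a rescaled random-subset estimate inside Metropolis-Hastings, can be sketched as follows (a naive illustration of the idea only, not the error-controlled framework of Chapter 6):

```python
import numpy as np

def subsample_mh(data, log_lik_i, log_prior, theta0, n_iter=5000, m=200,
                 step=0.1, rng=None):
    """Naive sketch: random-walk MH in which each acceptance uses a size-m
    random subset of the data, with its log-likelihood rescaled by n/m to mimic
    the full sum. This defines an approximation to the exact transition kernel."""
    rng = rng or np.random.default_rng()
    data = np.asarray(data)
    n = data.shape[0]
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    draws = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        batch = data[rng.choice(n, size=m, replace=False)]   # fresh subset each step

        def approx_log_post(t):
            return (n / m) * np.sum(log_lik_i(batch, t)) + log_prior(t)

        prop = theta + step * rng.normal(size=theta.size)
        if np.log(rng.uniform()) < approx_log_post(prop) - approx_log_post(theta):
            theta = prop
        draws[i] = theta
    return draws

# Example: posterior for a normal mean with unit variance and an N(0, 100) prior
data = np.random.default_rng(7).normal(2.0, 1.0, size=10_000)
log_lik_i = lambda x, t: -0.5 * (x - t[0]) ** 2
log_prior = lambda t: -0.5 * t[0] ** 2 / 100.0
print(subsample_mh(data, log_lik_i, log_prior, theta0=[0.0]).mean())
```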

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
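For concreteness, the truncated normal (Albert-Chib) data augmentation sampler for probit regression discussed here can be sketched as follows (a standard textbook version with a flat prior on the coefficients; the chapter's rare-event mixing analysis is not reproduced):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_da_gibbs(y, X, n_iter=2000, rng=None):
    """Albert-Chib data augmentation Gibbs sampler for probit regression
    (textbook version, flat prior): alternate z | beta, y (truncated normal)
    and beta | z (Gaussian)."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        mu = X @ beta
        # latent utilities: truncated below at 0 if y = 1, above at 0 if y = 0
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        beta_hat = XtX_inv @ (X.T @ z)
        beta = beta_hat + chol @ rng.normal(size=p)    # beta | z ~ N(beta_hat, (X'X)^{-1})
        draws[it] = beta
    return draws

# Example: a low-success-rate design to mimic the rare-event setting
rng = np.random.default_rng(8)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ np.array([-2.2, 0.5]) + rng.normal(size=n) > 0).astype(int)
draws = probit_da_gibbs(y, X, rng=rng)
print(draws[500:].mean(axis=0))
```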