998 results for Conditional Distribution
Abstract:
We extend PML theory to account for information on the conditional moments up to order four, but without assuming a parametric model, to avoid the risk of misspecifying the conditional distribution. The key statistical tool is the quartic exponential family, which allows us to generalize the PML2 and QGPML1 methods proposed in Gourieroux et al. (1984) to PML4 and QGPML2 methods, respectively. An asymptotic theory is developed. The key numerical tool is the Gauss-Freud integration scheme, which solves a computational problem that has previously been raised in several fields. Simulation exercises demonstrate the feasibility and robustness of the methods. [Authors]
Abstract:
Preface. In this thesis we study several questions related to transaction data measured at the individual level. The questions are addressed in the three essays that constitute this thesis. In the first essay we use tick-by-tick data to estimate non-parametrically the jump process of 37 large stocks traded on the Paris Stock Exchange, and of the CAC 40 index. We separate total daily returns into three components (trading continuous, trading jump, and overnight) and characterize each of them. We estimate, at the individual and index levels, the contribution of each return component to total daily variability. For the index, the contribution of jumps is smaller, and it is compensated by the larger contribution of overnight returns. We test formally that individual stocks jump more frequently than the index, and that they do not respond independently to the arrival of news. Finally, we find that daily jumps are larger when their arrival rates are larger. At the contemporaneous level there is a strong negative correlation between jump frequency and trading activity measures. The second essay studies the general properties of the trade- and volume-duration processes for two stocks traded on the Paris Stock Exchange: one very illiquid and one relatively liquid. We estimate a class of autoregressive gamma processes whose conditional distribution belongs to the family of non-central gamma distributions (up to a scale factor). This process was introduced by Gouriéroux and Jasiak and is known as the autoregressive gamma process. We also evaluate the ability of the process to fit the data, using the Diebold, Gunther and Tay (1998) test and the capacity of the model to reproduce the moments of the observed data and the empirical serial correlation and partial serial correlation functions. 
We establish that the model correctly describes the trade duration process of illiquid stocks, but has difficulty fitting the trade duration process of liquid stocks, which exhibit long-memory characteristics. When the model is fitted to volume durations, it fits the data successfully. In the third essay we study the economic relevance of optimal liquidation strategies by calibrating a recent and realistic microstructure model with data from the Paris Stock Exchange. We distinguish the case of parameters that are constant through the day from that of time-varying ones. An optimization problem incorporating this realistic microstructure model is presented and solved. Our model endogenizes the number of trades required before the position is liquidated. A comparative static exercise demonstrates the realism of our model. We find that a sell decision taken in the morning will be liquidated by the early afternoon. If price impacts increase over the day, the liquidation will take place more rapidly.
Abstract:
The efficient price is latent; it is contaminated by microstructure frictions, or noise. We explore the measurement and forecasting of fundamental volatility using high-frequency data. In the first paper, keeping the standard framework of the additive model of noise and efficient price, we show that using trading volume, buy and sell volumes, the trade direction indicator, and the bid-ask spread to absorb the noise improves the precision of volatility estimators. If the noise is only partially absorbed, the residual noise is closer to white noise than the original noise, which reduces the misspecification of the noise characteristics. In the second paper, we start from an empirical fact that we model as a linear form of the microstructure noise variance in the fundamental volatility. Using the representation of the general class of stochastic volatility models, we explore the forecasting performance of different volatility measures under the assumptions of our model. In the third paper, we derive new realized measures using bid and ask prices and volumes. As an alternative to the standard additive model for prices contaminated with microstructure noise, we make assumptions on the distribution of the frictionless price, which is assumed to be bounded by the bid and ask prices.
Abstract:
The methods available for decomposition analysis that can be applied when data are fully observed are not valid when the variable of interest is censored. This may explain the scarcity of such exercises for duration variables, which are usually observed under censoring. This paper proposes an Oaxaca-Blinder type method to decompose differences in the mean in the context of censored data. The validity of this method rests on the identification and estimation of the joint distribution of the duration variable and a set of covariates. In addition, a more general method is proposed that allows other functionals of interest, such as the median or the Gini coefficient, to be decomposed; it is based on specifying the conditional distribution function of the duration variable given a set of covariates. To evaluate the performance of these methods, Monte Carlo experiments are carried out. Finally, the proposed methods are applied to analyse gender gaps in different features of unemployment duration in Spain, such as mean duration, the probability of being long-term unemployed, and the Gini coefficient. The results lead to the conclusion that factors other than observable characteristics (such as human capital or household structure) play a primary role in explaining these gaps.
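For orientation, the textbook Oaxaca-Blinder decomposition for fully observed data can be written as below. This is only the standard uncensored version, not the censored-data method the abstract proposes; the group labels $A$ and $B$, the covariate means $\bar{X}_g$ and the group-wise OLS coefficients $\hat{\beta}_g$ (with an intercept included in $X$) are notation assumed here for illustration.

```latex
\bar{Y}_A - \bar{Y}_B
  = \underbrace{(\bar{X}_A - \bar{X}_B)^{\top}\hat{\beta}_A}_{\text{explained by observable characteristics}}
  + \underbrace{\bar{X}_B^{\top}(\hat{\beta}_A - \hat{\beta}_B)}_{\text{unexplained (coefficients)}}
```

The identity holds because OLS with an intercept gives $\bar{Y}_g = \bar{X}_g^{\top}\hat{\beta}_g$ in each group; adding and subtracting $\bar{X}_B^{\top}\hat{\beta}_A$ yields the two terms.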
Abstract:
In many practical situations where spatial rainfall estimates are needed, rainfall occurs as a spatially intermittent phenomenon. An efficient geostatistical method for rainfall estimation in the case of intermittency has previously been published and comprises the estimation of two independent components: a binary random function for modeling the intermittency and a continuous random function that models the rainfall inside the rainy areas. The final rainfall estimates are obtained as the product of the estimates of these two random functions. However, the published approach does not contain a method for estimating uncertainties. The contribution of this paper is the presentation of the indicator maximum likelihood estimator, from which the local conditional distribution of the rainfall value at any location may be derived using an ensemble approach. From the conditional distribution, representations of uncertainty such as the estimation variance and confidence intervals can be obtained. An approximation to the variance can be calculated more simply by assuming that rainfall intensity is independent of location within the rainy area. The methodology has been validated using simulated and real rainfall data sets. The results of these case studies show good agreement between predicted uncertainties and measured errors obtained from the validation data.
Determinants of fruit and vegetable intake in England: a re-examination based on quantile regression
Abstract:
Objective: To examine the sociodemographic determinants of fruit and vegetable (F&V) consumption in England and determine the differential effects of socioeconomic variables at various parts of the intake distribution, with a special focus on severely inadequate intakes. Design: Quantile regression, expressing F&V intake as a function of sociodemographic variables, is employed. Here, quantile regression flexibly allows variables such as ethnicity to exert effects on F&V intake that vary depending on existing levels of intake. Setting: The 2003 Health Survey of England. Subjects: Data were from 11044 adult individuals. Results: The influence of particular sociodemographic variables is found to vary significantly across the intake distribution. We conclude that women consume more F&V than men, Asians and Blacks more than Whites, and co-habiting individuals more than single-living ones. Increased incomes and education also boost intake. However, the key general finding of the present study is that the influence of most variables is relatively weak in the area of greatest concern, i.e. among those with the most inadequate intakes in any reference group. Conclusions: Our findings emphasise the importance of allowing the effects of socio-economic drivers to vary across the intake distribution. The main finding, that variables which exert significant influence on F&V intake at other parts of the conditional distribution have a relatively weak influence at the lower tail, is cause for concern. It implies that in any defined group, those consuming the least F&V are hard to influence using campaigns or policy levers.
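The idea behind quantile regression can be illustrated with its loss function. The sketch below is not the study's model: it uses made-up intake values, no covariates, and hypothetical function names. It only shows that minimising the check (pinball) loss for a level `tau` recovers the corresponding sample quantile, which is what lets the method target the lower tail of intake separately from the median.

```python
def pinball_loss(y, q, tau):
    """Average check (pinball) loss of predicting the constant q for sample y."""
    total = 0.0
    for v in y:
        u = v - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(y)


def best_constant(y, tau):
    """Minimise the check loss over observed values (a minimiser always lies on a data point)."""
    return min(sorted(set(y)), key=lambda q: pinball_loss(y, q, tau))


# Made-up daily F&V portions, for illustration only.
intakes = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0]
low_tail = best_constant(intakes, 0.1)  # targets the 10th percentile
median = best_constant(intakes, 0.5)    # targets the median
```

In a full quantile regression the constant `q` is replaced by a linear function of the covariates, and a separate coefficient vector is fitted for each `tau`.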
Abstract:
This paper presents semiparametric estimators of changes in inequality measures of a dependent variable's distribution, taking into account possible changes in the distributions of covariates. When we do not impose parametric assumptions on the conditional distribution of the dependent variable given covariates, this problem becomes equivalent to estimating the distributional impacts of interventions (treatments) when selection into the program is based on observable characteristics. The distributional impacts of a treatment are calculated as differences in inequality measures of the potential outcomes of receiving and not receiving the treatment. These differences are called here Inequality Treatment Effects (ITE). The estimation procedure involves a first non-parametric step in which the probability of receiving treatment given covariates, the propensity score, is estimated. In the second step, weighted sample versions of inequality measures are computed, using the inverse probability weighting method to estimate parameters of the marginal distribution of potential outcomes. Root-N consistency, asymptotic normality and semiparametric efficiency are shown for the proposed semiparametric estimators. A Monte Carlo exercise is performed to investigate the finite-sample behavior of the estimator derived in the paper. We also apply our method to the evaluation of a job training program.
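The second step described above can be sketched as follows, taking the Gini coefficient as the inequality measure. This is an illustrative sketch, not the paper's estimator: the propensity scores are assumed to be already estimated and are passed in directly, outcomes are assumed to have a positive mean, and the function names are hypothetical.

```python
def weighted_gini(values, weights):
    """Gini coefficient of a weighted sample via the weighted mean absolute difference.

    Assumes the weighted mean of `values` is strictly positive.
    """
    total_w = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total_w
    mad = 0.0
    for vi, wi in zip(values, weights):
        for vj, wj in zip(values, weights):
            mad += wi * wj * abs(vi - vj)
    mad /= total_w ** 2
    return mad / (2.0 * mean)


def inequality_treatment_effect(y, d, pscore):
    """Difference between IPW-weighted Gini coefficients of treated and untreated outcomes.

    y: outcomes, d: 0/1 treatment indicators, pscore: propensity scores
    (assumed given here; the abstract estimates them non-parametrically).
    """
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    w1 = [1.0 / p for di, p in zip(d, pscore) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    w0 = [1.0 / (1.0 - p) for di, p in zip(d, pscore) if di == 0]
    return weighted_gini(y1, w1) - weighted_gini(y0, w0)
```

Weighting treated units by `1/p` and untreated units by `1/(1-p)` reweights each subsample toward the full population, so the two Gini coefficients refer to the marginal distributions of the two potential outcomes.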
Abstract:
This research compares the Bayesian estimates obtained for the parameters of processes of the ARCH family with normal and Student's t distributions for the conditional distribution of the return series. A non-informative prior distribution was adopted, and a reparameterization of the models under analysis was used to map the parameter space into the real space. The procedure adopts a normal prior distribution for the transformed parameters. The posterior summaries were obtained by Markov chain Monte Carlo (MCMC) simulation methods. The methodology was evaluated on a series of Bovespa Index returns, and the predictive ordinate criterion was employed to select the model that best fits the data. Results show that, as a rule, the proposed Bayesian approach provides satisfactory estimates and that the GARCH process with Student's t distribution fitted the data better.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The aim of this thesis is to apply multilevel regression models in the context of household surveys. The hierarchical structure in this type of data is characterized by many small groups. In recent years, comparative and multilevel analyses in the field of perceived health have become more common. The purpose of this thesis is to develop a multilevel analysis with three levels of hierarchy for the Physical Component Summary outcome in order to: evaluate the magnitude of within and between variance at each level (individual, household and municipality); explore which covariates affect perceived physical health at each level; compare model-based and design-based approaches in order to establish the informativeness of the sampling design; and estimate a quantile regression for hierarchical data. The target population is Italian residents aged 18 years and older. Our study shows a high degree of homogeneity among level-1 units belonging to the same group, with an intraclass correlation of 27% in a level-2 null model. Almost all of the variance is explained by level-1 covariates. In fact, in our model the explanatory variables with the greatest impact on the outcome are disability, inability to work, age and chronic diseases (18 pathologies). An additional analysis is performed using a novel procedure, the "Linear Quantile Mixed Model", also named "Multilevel Linear Quantile Regression". This gives us the possibility of describing the conditional distribution of the response more generally through the estimation of its quantiles, while accounting for the dependence among the observations. This is a great advantage of our models with respect to classic multilevel regression. The median regression with random effects proves to be more efficient than the mean regression in representing the central tendency of the outcome. A more detailed analysis of the conditional distribution of the response at other quantiles highlighted a differential effect of some covariates along the distribution.
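For reference, the intraclass correlation quoted above has a simple form in a two-level null (intercept-only) model. The notation below is assumed for illustration and is not taken from the thesis: $u_j$ is the group-level random intercept with variance $\sigma_u^2$, and $e_{ij}$ is the individual-level residual with variance $\sigma_e^2$.

```latex
y_{ij} = \beta_0 + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2), \qquad
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Under this reading, an intraclass correlation of 27% means that about 27% of the total variance of the outcome lies between groups and the remaining 73% lies within them.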
Abstract:
Professor Sir David R. Cox (DRC) is widely acknowledged as among the most important scientists of the second half of the twentieth century. He inherited the mantle of statistical science from Pearson and Fisher, advanced their ideas, and translated statistical theory into practice so as to forever change the application of statistics in many fields, but especially biology and medicine. The logistic and proportional hazards models he substantially developed are arguably among the most influential biostatistical methods in current practice. This paper looks forward over the period from DRC's 80th to 90th birthdays, to speculate about the future of biostatistics, drawing lessons from DRC's contributions along the way. We consider "Cox's model" (CM) of biostatistics, an approach to statistical science that: formulates scientific questions or quantities in terms of parameters gamma in probability models f(y; gamma) that represent, in a parsimonious fashion, the underlying scientific mechanisms (Cox, 1997); partitions the parameters gamma = (theta, eta) into a subset of interest theta and other "nuisance parameters" eta necessary to complete the probability distribution (Cox and Hinkley, 1974); develops methods of inference about the scientific quantities that depend as little as possible upon the nuisance parameters (Barndorff-Nielsen and Cox, 1989); and thinks critically about the appropriate conditional distribution on which to base inferences. We briefly review exciting biomedical and public health challenges that are capable of driving statistical developments in the next decade. We discuss the statistical models and model-based inferences central to the CM approach, contrasting them with computationally intensive strategies for prediction and inference advocated by Breiman and others (e.g. Breiman, 2001) and with more traditional design-based methods of inference (Fisher, 1935). 
We discuss the hierarchical (multi-level) model as an example of the future challenges and opportunities for model-based inference. We then consider the role of conditional inference, a second key element of the CM. Recent examples from genetics are used to illustrate these ideas. Finally, the paper examines causal inference and statistical computing, two other topics we believe will be central to biostatistics research and practice in the coming decade. Throughout the paper, we attempt to indicate how DRC's work and the "Cox Model" have set a standard of excellence to which all can aspire in the future.
Abstract:
In biostatistical applications interest often focuses on the estimation of the distribution of a time-until-event variable T. If one observes whether or not T exceeds an observed monitoring time at a random number of monitoring times, then the data structure is called interval censored data. We extend this data structure by allowing the presence of a possibly time-dependent covariate process that is observed until end of follow up. If one only assumes that the censoring mechanism satisfies coarsening at random, then, by the curse of dimensionality, typically no regular estimators will exist. To fight the curse of dimensionality we follow the approach of Robins and Rotnitzky (1992) by modeling parameters of the censoring mechanism. We model the right-censoring mechanism by modeling the hazard of the follow up time, conditional on T and the covariate process. For the monitoring mechanism we avoid modeling the joint distribution of the monitoring times by only modeling a univariate hazard of the pooled monitoring times, conditional on the follow up time, T, and the covariate process, which can be estimated by treating the pooled sample of monitoring times as i.i.d. In particular, it is assumed that the monitoring times and the right-censoring times only depend on T through the observed covariate process. We introduce an inverse probability of censoring weighted (IPCW) estimator of the distribution of T and of smooth functionals thereof, which is guaranteed to be consistent and asymptotically normal if correctly specified semiparametric models for the two hazards of the censoring process are available. 
Furthermore, given such correctly specified models for these hazards of the censoring process, we propose a one-step estimator that improves on the IPCW estimator if we correctly specify a lower-dimensional working model for the conditional distribution of T, given the covariate process, and that remains consistent and asymptotically normal if this working model is misspecified. It is shown that the one-step estimator is efficient if each subject is monitored at most once and the working model contains the truth. In general, it is shown that the one-step estimator optimally uses the surrogate information if the working model contains the truth. It is not optimal in using the interval information provided by the current status indicators at the monitoring times, but simulations in Peterson and van der Laan (1997) show that the efficiency loss is small.
Abstract:
In biostatistical applications, interest often focuses on the estimation of the distribution of the time T between two consecutive events. If the initial event time is observed and the subsequent event time is only known to be larger or smaller than an observed monitoring time C, then the data are described by the well-known singly-censored current status model, also known as interval censored data, case I. We extend this current status model by allowing the presence of a time-dependent process, which is partly observed, and by allowing C to depend on T through the observed part of this time-dependent process. Because of the high dimension of the covariate process, no globally efficient estimators exist with good practical performance at moderate sample sizes. We follow the approach of Robins and Rotnitzky (1992) by modeling the censoring variable, given the time variable and the covariate process, i.e., the missingness process, under the restriction that it satisfies coarsening at random. We propose a generalization of the simple current status estimator of the distribution of T and of smooth functionals of the distribution of T, which is based on an estimate of the missingness process. In this estimator the covariates enter only through the estimate of the missingness process. Due to the coarsening at random assumption, the estimator has the interesting property that if we estimate the missingness process more nonparametrically, then we improve its efficiency. We show that by local estimation of an optimal model or optimal function of the covariates for the missingness process, the generalized current status estimator for smooth functionals becomes locally efficient, meaning that it is efficient if the right model or covariate is consistently estimated, and that it is consistent and asymptotically normal in general. Estimation of the optimal model requires estimation of the conditional distribution of T, given the covariates. 
Any (prior) knowledge of this conditional distribution can be used at this stage without any risk of losing root-n consistency. We also propose locally efficient one step estimators. Finally, we show some simulation results.
Abstract:
The Data Envelopment Analysis (DEA) efficiency score obtained for an individual firm is a point estimate without any confidence interval around it. In recent years, researchers have resorted to bootstrapping in order to generate empirical distributions of efficiency scores. This procedure assumes that all firms have the same probability of getting an efficiency score from any specified interval within the [0,1] range. We propose a bootstrap procedure that empirically generates the conditional distribution of efficiency for each individual firm given systematic factors that influence its efficiency. Instead of resampling directly from the pooled DEA scores, we first regress these scores on a set of explanatory variables not included at the DEA stage and bootstrap the residuals from this regression. These pseudo-efficiency scores incorporate the systematic effects of unit-specific factors along with the contribution of the randomly drawn residual. Data from the U.S. airline industry are utilized in an empirical application.
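The resampling scheme described above can be sketched roughly as follows. This is a simplified illustration under several assumptions that are not in the abstract: a single explanatory variable, ordinary least squares for the second-stage regression, DEA scores already computed and passed in as numbers, and hypothetical function names. The sketch does not truncate draws to the [0,1] range; how the original procedure treats such draws is not stated in the abstract.

```python
import random


def ols_line(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def bootstrap_scores(x, scores, n_boot=500, seed=0):
    """Return, for each firm, n_boot pseudo-efficiency scores.

    Each pseudo-score is the firm's fitted value (systematic part)
    plus a residual drawn with replacement from the pooled residuals.
    """
    rng = random.Random(seed)
    intercept, slope = ols_line(x, scores)
    fitted = [intercept + slope * xi for xi in x]
    resid = [s - f for s, f in zip(scores, fitted)]
    return [[f + rng.choice(resid) for _ in range(n_boot)] for f in fitted]
```

Because the fitted value differs by firm, each firm gets its own empirical distribution of pseudo-scores, in contrast to resampling directly from the pooled scores, which gives every firm the same distribution.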
Abstract:
Interaction effects are of scientific interest in many areas of research. A common approach for investigating the interaction effect of two continuous covariates on a response variable is through a cross-product term in multiple linear regression. In epidemiological studies, the two-way analysis of variance (ANOVA) type of method has also been utilized to examine the interaction effect by replacing the continuous covariates with their discretized levels. However, the implications of the model assumptions of either approach have not been examined, and the statistical validation has only focused on the general method, not specifically on the interaction effect. In this dissertation, we investigated the validity of both approaches based on the mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when the interaction effect exists because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant variance assumptions required by ANOVA are not satisfied in the model where the continuous covariates are replaced with their discretized levels. Therefore, naïve application of the ANOVA method may lead to an incorrect conclusion. Given the problems identified above, we proposed a novel method, modified from the traditional ANOVA approach, to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived based on the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. According to the simulation study, the proposed method is more powerful than the least squares regression and the ANOVA method in detecting the interaction effect when data come from a trivariate normal distribution. 
The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and the baseline age-by-weight interaction effect was found to be significant in predicting the change from baseline in NIHSS at Month-3 among patients who received t-PA therapy.
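The abstract does not say how the per-level p-values are combined. As one common possibility, and purely as an assumption for illustration, the sketch below uses Fisher's method, which requires the p-values to be independent. The function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the global null, where k = len(pvalues).
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical p-values, one per level of a discretized covariate.
overall = fisher_combined_pvalue([0.04, 0.20, 0.11])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form survival function.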