196 results for [JEL:C32] Mathematical and Quantitative Methods - Econometric Methods: Multiple
Abstract:
Most methods for small-area estimation are based on composite estimators derived from design- or model-based methods. A composite estimator is a linear combination of a direct and an indirect estimator, with weights that usually depend on unknown parameters which need to be estimated. Although model-based small-area estimators are usually based on random-effects models, the assumption of fixed effects is at face value more appropriate. Model-based estimators are justified by the assumption of random (interchangeable) area effects; in practice, however, areas are not interchangeable. In the present paper we empirically assess the quality of several small-area estimators in the setting in which the area effects are treated as fixed. We consider two settings: one that draws samples from a theoretical population, and another that draws samples from an empirical population of a labor force register maintained by the National Institute of Social Security (NISS) of Catalonia. We distinguish two types of composite estimators: (a) those whose weights involve area-specific estimates of bias and variance, and (b) those whose weights involve a common variance and a common squared-bias estimate for all areas. We assess their precision and discuss alternatives for optimizing composite estimation in applications.
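In its standard generic form (a textbook formulation, not tied to the specific estimators compared in the paper), the composite estimator for area $a$ combines the direct and indirect estimators as
\[
\hat\theta_a^{\mathrm{C}} = w_a\,\hat\theta_a^{\mathrm{dir}} + (1-w_a)\,\hat\theta_a^{\mathrm{ind}},
\qquad
\hat w_a=\frac{\widehat{B}_a^{2}+\widehat{V}\bigl(\hat\theta_a^{\mathrm{ind}}\bigr)}{\widehat{B}_a^{2}+\widehat{V}\bigl(\hat\theta_a^{\mathrm{ind}}\bigr)+\widehat{V}\bigl(\hat\theta_a^{\mathrm{dir}}\bigr)},
\]
where $\widehat{B}_a^{2}$ estimates the squared bias of the indirect estimator and $\widehat{V}$ denotes a variance estimate; estimators of type (b) replace the area-specific quantities by values common to all areas.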
Abstract:
This paper studies the rate of convergence of an appropriate discretization scheme for the solution of the McKean-Vlasov equation introduced by Bossy and Talay. More specifically, we consider approximations of the distribution and of the density of the solution of the stochastic differential equation associated with the McKean-Vlasov equation. The scheme adopted here is a mixed one: Euler/weakly interacting particle system. If $n$ is the number of weakly interacting particles and $h$ is the uniform step in the time discretization, we prove that the rate of convergence of the distribution functions of the approximating sequence in the $L^1(\Omega\times \Bbb R)$ norm and in the sup norm is of the order of $\frac{1}{\sqrt{n}} + h$, while for the densities it is of the order $h + \frac{1}{\sqrt{nh}}$. This result is obtained by carefully employing techniques of Malliavin calculus.
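A minimal sketch of such a mixed Euler/weakly interacting particle scheme is given below for a generic McKean-Vlasov-type SDE in which the law enters the drift through its mean; the drift and diffusion used here are illustrative placeholders, not the equation studied by Bossy and Talay.

```python
import numpy as np

def euler_particle_mckean_vlasov(n=1000, h=0.01, T=1.0, x0=0.0, seed=0,
                                 b=lambda x, m: m - x,     # placeholder drift: reversion to the empirical mean
                                 sigma=lambda x, m: 1.0):  # placeholder (constant) diffusion
    """Mixed Euler / weakly interacting particle scheme (illustrative sketch).

    The law of the solution is replaced by the empirical measure of the n
    particles, and each particle is advanced by an Euler step of size h.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n, x0, dtype=float)
    for _ in range(int(T / h)):
        m = x.mean()                            # interaction through the empirical measure
        dw = rng.normal(0.0, np.sqrt(h), n)     # Brownian increments
        x = x + b(x, m) * h + sigma(x, m) * dw
    return x                                    # particle sample approximating the law at time T

# Empirical distribution function at time T, approximating F_T
sample = euler_particle_mckean_vlasov()
F_hat = lambda t: np.mean(sample <= t)
```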
Abstract:
In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or as a row and column of the Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix. A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
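The modification can be sketched as follows: the row and column masses and the centering are computed from the complete indicator matrix, and only the columns of the selected categories enter the singular value decomposition. A minimal numpy sketch under standard correspondence analysis scaling (an illustration, not the authors' code):

```python
import numpy as np

def subset_mca(Z, cols):
    """Subset correspondence analysis of an indicator matrix Z (sketch).

    Masses and centering come from the FULL matrix, so the geometry of the
    complete data set is retained; only the selected columns are decomposed.
    """
    P = Z / Z.sum()                  # correspondence matrix of the full table
    r = P.sum(axis=1)                # row masses (full table)
    c = P.sum(axis=0)                # column masses (full table)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    S_sub = S[:, cols]               # restrict to the selected categories only
    U, sv, Vt = np.linalg.svd(S_sub, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]           # principal row coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c[cols])[:, None]  # principal column coordinates
    return row_coords, col_coords, sv ** 2                # coordinates and principal inertias
```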
Abstract:
In this paper we explore the effects of the minimum pension program on welfare and retirement in Spain. This is done with a stylized life-cycle model which provides a convenient analytical characterization of optimal behavior. We use data from the Spanish Social Security to estimate the behavioral parameters of the model and then simulate the changes induced by the minimum pension in aggregate retirement patterns. The impact is substantial: there is a threefold increase in retirement at 60 (the age of first entitlement) with respect to the economy without minimum pensions, and total early retirement (at or before 60) is almost 50% larger.
Abstract:
We obtain minimax lower and upper bounds for the expected distortion redundancy of empirically designed vector quantizers. We show that the mean squared distortion of a vector quantizer designed from $n$ i.i.d. data points using any design algorithm is at least $\Omega(n^{-1/2})$ away from the optimal distortion for some distribution on a bounded subset of ${\cal R}^d$. Together with existing upper bounds, this result shows that the minimax distortion redundancy for empirical quantizer design, as a function of the size of the training data, is asymptotically of the order of $n^{-1/2}$. We also derive a new upper bound for the performance of the empirically optimal quantizer.
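The empirically optimal quantizer is the codebook minimizing the empirical mean squared distortion over the $n$ training points; in practice it is usually approximated by Lloyd (k-means) iterations, as in the sketch below (a generic illustration, not the paper's construction).

```python
import numpy as np

def empirical_quantizer(data, k, iters=50, seed=0):
    """Approximate the empirically optimal k-point quantizer by Lloyd iterations.

    Note: Lloyd's algorithm only finds a local optimum, whereas the bounds in
    the paper refer to the globally optimal empirical codebook.
    """
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)                             # nearest-codeword assignment
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = data[labels == j].mean(axis=0)  # centroid update
    d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook, d.min(axis=1).mean()                     # codebook and empirical distortion
```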
Abstract:
In 1952 F. Riesz and Sz.-Nagy published an example of a monotonic continuous function whose derivative is zero almost everywhere, that is to say, a singular function. Moreover, the function was strictly increasing. Their example was built as the limit of a sequence of deformations of the identity function. As an easy consequence of the definition, the derivative, when it existed and was finite, was found to be zero. In this paper we revisit the Riesz-Nagy family of functions and relate it to a system of real number representation which we call (t, t-1) expansions. With the help of these real number expansions we generalize the family. The singularity of the functions is proved through some metrical properties of the expansions used in their definition, which also allow us to give a more precise way of determining when the derivative is 0 or infinity.
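As an illustration only (an assumed parametrization in the same spirit, not necessarily the (t, t-1) expansions of the paper), a strictly increasing singular function can be obtained by repeatedly deforming the identity so that the two halves of each dyadic interval receive increments in the ratio t : 1-t; for t different from 1/2 the limit has derivative zero almost everywhere.

```python
def singular_deformation(x, t=0.3, depth=40):
    """Evaluate a strictly increasing singular function built by giving the two
    halves of each dyadic interval increments in the ratio t : 1-t (t != 1/2).
    Illustrative construction, not the paper's exact family."""
    y, scale = 0.0, 1.0
    for _ in range(depth):
        x *= 2
        if x >= 1:                  # point lies in the right half of the interval
            y += scale * t          # the left half contributed a fraction t of the increment
            scale *= 1 - t          # the remaining fraction goes to the right half
            x -= 1
        else:                       # point lies in the left half
            scale *= t
    return y
```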
Abstract:
Statistical computing when input/output is driven by a Graphical User Interface is considered. A proposal is made for automatic control of computational flow to ensure that only strictly required computations are actually carried out. The computational flow is modeled by a directed graph for implementation in any object-oriented programming language with symbolic manipulation capabilities. A complete implementation example is presented to compute and display frequency-based piecewise linear density estimators such as histograms or frequency polygons.
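The control strategy amounts to a dependency graph in which each node caches its value and recomputes only when something upstream has changed. A minimal sketch (here in Python with illustrative names; the original implementation is object-oriented LISP-STAT):

```python
class Node:
    """Node of a computational-flow graph: recomputes only when marked dirty."""
    def __init__(self, compute, parents=()):
        self.compute, self.parents = compute, list(parents)
        self.children, self._value, self._dirty = [], None, True
        for p in self.parents:
            p.children.append(self)

    def invalidate(self):
        """Mark this node and everything downstream as needing recomputation."""
        if not self._dirty:
            self._dirty = True
            for c in self.children:
                c.invalidate()

    def value(self):
        """Return the cached value, recomputing only if strictly required."""
        if self._dirty:
            self._value = self.compute(*[p.value() for p in self.parents])
            self._dirty = False
        return self._value
```

For the histogram example, a data node would feed a bin-counts node, which in turn feeds a display node; editing the data invalidates all three, while a purely graphical change touches only the display node.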
Abstract:
This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables while preserving all the good properties of principal-axis methods such as principal components and correspondence analysis, based on the singular value decomposition, including the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions. The idea is inspired by the chi-square distance in correspondence analysis, which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing a matrix and displaying its rows and columns in biplots.
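A crude sketch of the weight-estimation step (an assumed least-squares criterion and a generic optimizer, not the estimation procedure developed in the paper) followed by the usual SVD step:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def wmds(X, target_d, k=2):
    """Weighted metric MDS (sketch): find positive column weights w so that
    Euclidean distances between rows of X * sqrt(w) best match target_d
    (a condensed distance vector), then display rows and columns via the SVD."""
    Xc = X - X.mean(axis=0)                       # column-centered data

    def stress(logw):
        w = np.exp(logw)                          # keep the weights positive
        d = pdist(Xc * np.sqrt(w))
        return np.sum((d - target_d) ** 2)        # fit to the original distances

    res = minimize(stress, np.zeros(X.shape[1]), method="L-BFGS-B")
    w = np.exp(res.x)
    U, sv, Vt = np.linalg.svd(Xc * np.sqrt(w), full_matrices=False)
    return U[:, :k] * sv[:k], (Vt.T)[:, :k], w    # row coords, column coords, weights
```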
Abstract:
We introduce a simple new hypothesis testing procedure which, based on an independent sample drawn from a certain density, detects which of $k$ nominal densities the true density is closest to, under the total variation ($L_1$) distance. We obtain a density-free uniform exponential bound for the probability of false detection.
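A classical way of implementing such a selection, sketched below under the assumption that the $k$ nominal densities are tabulated on a common grid, compares every pair on the Scheffé set where one density exceeds the other and returns the density winning the most pairwise contests; this follows the general Scheffé-set recipe and is not necessarily the exact procedure of the paper.

```python
import numpy as np

def select_closest_density(sample, grid, densities):
    """Select which of k nominal densities the sampling density is closest to
    (Scheffe-set tournament; densities is a (k, len(grid)) array of values)."""
    dx = grid[1] - grid[0]
    k = densities.shape[0]
    wins = np.zeros(k, dtype=int)
    idx = np.clip(np.searchsorted(grid, sample), 0, len(grid) - 1)
    for i in range(k):
        for j in range(i + 1, k):
            A = densities[i] > densities[j]           # Scheffe set {f_i > f_j}
            p_i = densities[i, A].sum() * dx          # probability of A under f_i
            p_j = densities[j, A].sum() * dx          # probability of A under f_j
            mu_n = A[idx].mean()                      # empirical measure of A
            wins[i if abs(p_i - mu_n) < abs(p_j - mu_n) else j] += 1
    return int(np.argmax(wins))                       # index of the selected density
```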
Abstract:
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
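A common choice for the fuzzy coding step is triangular membership functions anchored at a few hinge points (for example the minimum, median and maximum of each variable); defuzzification then takes the membership-weighted average of the hinges. The hinge choice in the sketch below is an assumption for illustration, not the paper's prescription.

```python
import numpy as np

def fuzzy_code(x, hinges):
    """Triangular fuzzy coding of a continuous variable into len(hinges) categories;
    memberships are piecewise linear, nonnegative and sum to one for each value."""
    x = np.asarray(x, dtype=float)
    Z = np.zeros((len(x), len(hinges)))
    for j in range(len(hinges) - 1):
        lo, hi = hinges[j], hinges[j + 1]
        inside = (x >= lo) & (x < hi)
        if j == len(hinges) - 2:
            inside |= x == hi                 # include the upper endpoint in the last interval
        t = (x[inside] - lo) / (hi - lo)
        Z[inside, j] = 1 - t                  # membership in the lower category
        Z[inside, j + 1] = t                  # membership in the upper category
    Z[x <= hinges[0], 0] = 1.0                # clamp values outside the hinge range
    Z[x >= hinges[-1], -1] = 1.0
    return Z

def defuzzify(Z, hinges):
    """Recover estimated original values as the membership-weighted hinge average."""
    return Z @ np.asarray(hinges, dtype=float)
```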
Abstract:
This paper presents a comparative analysis of linear and mixed models for short-term forecasting of a real data series with a high percentage of missing data. The data are the series of significant wave heights registered at regular periods of three hours by a buoy placed in the Bay of Biscay. The series is interpolated with a linear predictor which minimizes the forecast mean square error. The linear models are seasonal ARIMA models, and the mixed models have a linear component and a nonlinear seasonal component. The nonlinear component is estimated by a nonparametric regression of data versus time. Short-term forecasts, no more than two days ahead, are of interest because they can be used by the port authorities to notify the fleet. Several models are fitted and compared by their forecasting behavior.
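For a concrete point of reference, a seasonal ARIMA of the kind mentioned above can be fitted with statsmodels; the orders, file name and seasonal period (8 three-hour observations per day) below are illustrative assumptions, not the specifications selected in the paper, and the state-space filter tolerates the missing observations.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical file of 3-hourly significant wave heights, with NaNs for gaps.
y = pd.read_csv("wave_heights.csv", index_col=0).squeeze("columns")

# Illustrative seasonal ARIMA; s = 8 three-hour periods = one day.
model = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 0, 1, 8))
fit = model.fit(disp=False)
print(fit.forecast(steps=16))   # two days ahead = 16 three-hour steps
```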
Abstract:
A tool for user choice of the local bandwidth function for a kernel density estimate is developed using KDE, a graphical object-oriented package for interactive kernel density estimation written in LISP-STAT. The bandwidth function is a cubic spline, whose knots are manipulated by the user in one window, while the resulting estimate appears in another window. A real data illustration of this method raises concerns, because an extremely large family of estimates is available.
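A sketch of the underlying estimator, assuming a Gaussian kernel and a sample-point bandwidth obtained by evaluating the cubic spline at each observation (the interactive LISP-STAT front end is not reproduced here):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_bandwidth_kde(data, knots_x, knots_h, grid):
    """Kernel density estimate whose local bandwidth is a cubic spline through
    user-chosen knots (knots_x, knots_h); Gaussian kernel, sample-point version."""
    h = np.clip(CubicSpline(knots_x, knots_h)(data), 1e-6, None)  # bandwidth at each observation
    u = (grid[:, None] - data[None, :]) / h[None, :]
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)                # Gaussian kernel
    return (K / h[None, :]).mean(axis=1)                          # density on the grid
```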
Abstract:
The Treatise on Quadrature of Fermat (c. 1659), besides containing the first known proof of the computation of the area under a higher parabola, $\int x^{m/n}\,dx$, or under a higher hyperbola, $\int x^{-m/n}\,dx$, with the appropriate limits of integration in each case, has a second part which was not understood by Fermat's contemporaries. This second part of the Treatise is obscure and difficult to read, and even the great Huygens described it as 'published with many mistakes and it is so obscure (with proofs redolent of error) that I have been unable to make any sense of it'. Far from the confusion that Huygens attributes to it, in this paper we try to prove that Fermat, in writing the Treatise, had a very clear goal in mind and managed to attain it by means of a simple and original method. Fermat reduced the quadrature of a great number of algebraic curves to the quadrature of known curves: the higher parabolas and hyperbolas of the first part of the paper. Others he reduced to the quadrature of the circle. We shall see how the clever use of two procedures, quite novel at the time, the change of variables and a particular case of the formula of integration by parts, provides Fermat with the necessary tools to square very easily curves as well known as the folium of Descartes, the cissoid of Diocles or the witch of Agnesi.
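In modern notation, the quadratures of the first part of the Treatise amount to
\[
\int_0^a x^{m/n}\,dx=\frac{a^{\frac{m}{n}+1}}{\frac{m}{n}+1},
\qquad
\int_a^{\infty} x^{-m/n}\,dx=\frac{a^{1-\frac{m}{n}}}{\frac{m}{n}-1}\quad\Bigl(\tfrac{m}{n}>1\Bigr).
\]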
Abstract:
This work is part of a project studying the performance of model-based estimators in a small-area context. We have chosen a simple statistical application in which we estimate the growth rate of occupation for several regions of Spain. We compare three estimators: the direct one, based on straightforward results from the survey (which is unbiased), and a third one which is based on a statistical model and minimizes the mean square error.
Abstract:
Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessing the {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity-penalized empirical risk. The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover. Finite-sample performance bounds are established for the estimates, and these bounds are applied to several nonparametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent, and even possess near-optimal rates of convergence, when each model class has an infinite VC or pseudo-dimension. For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.
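The two-part scheme can be sketched as follows, assuming each model class is given as a finite list of candidate rules and a fixed covering radius; the greedy cover and the logarithmic penalty are illustrative simplifications, not the empirically optimized radii and bounds of the paper.

```python
import numpy as np

def select_rule(model_classes, X, Y, radius=0.1, loss=lambda p, y: (p - y) ** 2):
    """Complexity-penalized model selection over a list of model classes (sketch).

    Half of the data builds an empirical cover of each class; the other half
    picks one candidate per cover; the winner minimizes empirical risk plus a
    penalty based on the cover size.
    """
    n = len(X) // 2
    X1, X2, Y2 = X[:n], X[n:], Y[n:]
    best_rule, best_score = None, np.inf
    for rules in model_classes:
        cover = []                                   # greedy empirical L2 cover on the first half
        for f in rules:
            pred = np.array([f(x) for x in X1])
            if all(np.sqrt(np.mean((pred - c) ** 2)) > radius for _, c in cover):
                cover.append((f, pred))
        risks = [np.mean([loss(f(x), y) for x, y in zip(X2, Y2)]) for f, _ in cover]
        j = int(np.argmin(risks))                    # candidate rule of this class
        score = risks[j] + np.log(max(len(cover), 2)) / len(X2)   # risk + illustrative complexity term
        if score < best_score:
            best_rule, best_score = cover[j][0], score
    return best_rule
```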