199 resultados para Quantitative Methods
Resumo:
This paper studies the rate of convergence of an appropriatediscretization scheme of the solution of the Mc Kean-Vlasovequation introduced by Bossy and Talay. More specifically,we consider approximations of the distribution and of thedensity of the solution of the stochastic differentialequation associated to the Mc Kean - Vlasov equation. Thescheme adopted here is a mixed one: Euler/weakly interactingparticle system. If $n$ is the number of weakly interactingparticles and $h$ is the uniform step in the timediscretization, we prove that the rate of convergence of thedistribution functions of the approximating sequence in the $L^1(\Omega\times \Bbb R)$ norm and in the sup norm is of theorder of $\frac 1{\sqrt n} + h $, while for the densities is ofthe order $ h +\frac 1 {\sqrt {nh}}$. This result is obtainedby carefully employing techniques of Malliavin Calculus.
Resumo:
In this paper we attempt to describe the general reasons behind the world populationexplosion in the 20th century. The size of the population at the end of the century inquestion, deemed excessive by some, was a consequence of a dramatic improvementin life expectancies, attributable, in turn, to scientific innovation, the circulation ofinformation and economic growth. Nevertheless, fertility is a variable that plays acrucial role in differences in demographic growth. We identify infant mortality, femaleeducation levels and racial identity as important exogenous variables affecting fertility.It is estimated that in poor countries one additional year of primary schooling forwomen leads to 0.614 child less per couple on average (worldwide). While it may bepossible to identify a global tendency towards convergence in demographic trends,particular attention should be paid to the case of Africa, not only due to its differentdemographic patterns, but also because much of the continent's population has yet toexperience improvement in quality of life generally enjoyed across the rest of theplanet.
Resumo:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping , a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
Resumo:
In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or row and column of Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix . A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
Resumo:
In this paper we explore the effects of the minimum pension program on welfare andretirement in Spain. This is done with a stylized life-cycle model which provides a convenient analytical characterization of optimal behavior. We use data from the Spanish Social Security to estimate the behavioral parameters of the model and then simulate the changes induced by the minimum pension in aggregate retirement patterns. The impact is substantial: there is threefold increase in retirement at 60 (the age of first entitlement) with respect to the economy without minimum pensions, and total early retirement (before or at 60) is almost 50% larger.
Resumo:
A Method is offered that makes it possible to apply generalized canonicalcorrelations analysis (CANCOR) to two or more matrices of different row and column order. The new method optimizes the generalized canonical correlationanalysis objective by considering only the observed values. This is achieved byemploying selection matrices. We present and discuss fit measures to assessthe quality of the solutions. In a simulation study we assess the performance of our new method and compare it to an existing procedure called GENCOM,proposed by Green and Carroll. We find that our new method outperforms the GENCOM algorithm both with respect to model fit and recovery of the truestructure. Moreover, as our new method does not require any type of iteration itis easier to implement and requires less computation. We illustrate the methodby means of an example concerning the relative positions of the political parties inthe Netherlands based on provincial data.
Resumo:
We obtain minimax lower and upper bounds for the expected distortionredundancy of empirically designed vector quantizers. We show that the meansquared distortion of a vector quantizer designed from $n$ i.i.d. datapoints using any design algorithm is at least $\Omega (n^{-1/2})$ awayfrom the optimal distortion for some distribution on a bounded subset of${\cal R}^d$. Together with existing upper bounds this result shows thatthe minimax distortion redundancy for empirical quantizer design, as afunction of the size of the training data, is asymptotically on the orderof $n^{1/2}$. We also derive a new upper bound for the performance of theempirically optimal quantizer.
Resumo:
Principal curves have been defined Hastie and Stuetzle (JASA, 1989) assmooth curves passing through the middle of a multidimensional dataset. They are nonlinear generalizations of the first principalcomponent, a characterization of which is the basis for the principalcurves definition.In this paper we propose an alternative approach based on a differentproperty of principal components. Consider a point in the space wherea multivariate normal is defined and, for each hyperplane containingthat point, compute the total variance of the normal distributionconditioned to belong to that hyperplane. Choose now the hyperplaneminimizing this conditional total variance and look for thecorresponding conditional mean. The first principal component of theoriginal distribution passes by this conditional mean and it isorthogonal to that hyperplane. This property is easily generalized todata sets with nonlinear structure. Repeating the search from differentstarting points, many points analogous to conditional means are found.We call them principal oriented points. When a one-dimensional curveruns the set of these special points it is called principal curve oforiented points. Successive principal curves are recursively definedfrom a generalization of the total variance.
Resumo:
In 1952 F. Riesz and Sz.Nágy published an example of a monotonic continuous function whose derivative is zero almost everywhere, that is to say, a singular function. Besides, the function was strictly increasing. Their example was built as the limit of a sequence of deformations of the identity function. As an easy consequence of the definition, the derivative, when it existed and was finite, was found to be zero. In this paper we revisit the Riesz-N´agy family of functions and we relate it to a system for real numberrepresentation which we call (t, t-1) expansions. With the help of these real number expansions we generalize the family. The singularity of the functions is proved through some metrical properties of the expansions used in their definition which also allows us to give a more precise way of determining when the derivative is 0 or infinity.
Resumo:
Statistical computing when input/output is driven by a Graphical User Interface is considered. A proposal is made for automatic control ofcomputational flow to ensure that only strictly required computationsare actually carried on. The computational flow is modeled by a directed graph for implementation in any object-oriented programming language with symbolic manipulation capabilities. A complete implementation example is presented to compute and display frequency based piecewise linear density estimators such as histograms or frequency polygons.
Resumo:
This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables as well as preserving all the good properties of principal axis methods such as principal components and correspondence analysis, based on the singular-value decomposition, including the decomposition of variance into components along principal axes which provide the numerical diagnostics known as contributions. The idea is inspired from the chi-square distance in correspondence analysis which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing a matrix and displaying its rows and columns in biplots.
Resumo:
We introduce a simple new hypothesis testing procedure, which,based on an independent sample drawn from a certain density, detects which of $k$ nominal densities is the true density is closest to, under the total variation (L_{1}) distance. Weobtain a density-free uniform exponential bound for the probability of false detection.
Resumo:
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
Resumo:
This paper presents a comparative analysis of linear and mixed modelsfor short term forecasting of a real data series with a high percentage of missing data. Data are the series of significant wave heights registered at regular periods of three hours by a buoy placed in the Bay of Biscay.The series is interpolated with a linear predictor which minimizes theforecast mean square error. The linear models are seasonal ARIMA models and themixed models have a linear component and a non linear seasonal component.The non linear component is estimated by a non parametric regression of dataversus time. Short term forecasts, no more than two days ahead, are of interestbecause they can be used by the port authorities to notice the fleet.Several models are fitted and compared by their forecasting behavior.
Resumo:
A tool for user choice of the local bandwidth function for a kernel density estimate is developed using KDE, a graphical object-oriented package for interactive kernel density estimation written in LISP-STAT. The bandwidth function is a cubic spline, whose knots are manipulated by the user in one window, while the resulting estimate appears in another window. A real data illustration of this method raises concerns, because an extremely large family of estimates is available.