867 results for Panel-data econometrics
Abstract:
Correspondence analysis has found extensive use in ecology, archeology, linguistics and the social sciences as a method for visualizing the patterns of association in a table of frequencies or nonnegative ratio-scale data. Inherent to the method is the expression of the data in each row or each column relative to their respective totals, and it is these sets of relative values (called profiles) that are visualized. This relativization of the data makes perfect sense when the margins of the table represent samples from sub-populations of inherently different sizes. But in some ecological applications sampling is performed on equal areas or equal volumes, so that the absolute levels of the observed occurrences may be of relevance, in which case relativization may not be required. In this paper we define the correspondence analysis of the raw unrelativized data and discuss its properties, comparing this new method to regular correspondence analysis and to a related variant of non-symmetric correspondence analysis.
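Since the profiles and the relativization step are central here, a minimal sketch of regular correspondence analysis may help; the toy abundance table is invented, and the paper's unrelativized variant is not implemented, only the standard relativized version it is compared against.

```python
import numpy as np

# Toy abundance table: rows = sites sampled over equal areas, cols = species.
N = np.array([[10., 2., 0.],
              [ 4., 6., 2.],
              [ 1., 3., 9.]])

P = N / N.sum()                  # correspondence matrix
r = P.sum(axis=1)                # row masses
c = P.sum(axis=0)                # column masses

profiles = P / r[:, None]        # row profiles: each row relative to its total
print(profiles)

# Core CA step: SVD of the matrix of standardized residuals.
S = np.diag(r**-0.5) @ (P - np.outer(r, c)) @ np.diag(c**-0.5)
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_coords = np.diag(r**-0.5) @ U * sv   # principal coordinates of rows
print(row_coords[:, :2])
```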
Abstract:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
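A minimal sketch of the weighted log-ratio step described above, assuming the usual masses-as-weights convention; the table is invented, and this illustrates the weighted double-centering idea rather than the authors' exact implementation.

```python
import numpy as np

N = np.array([[10., 2., 3.],
              [ 4., 6., 2.],
              [ 1., 3., 9.]])

P = N / N.sum()
r = P.sum(axis=1)          # row weights (masses)
c = P.sum(axis=0)          # column weights

L = np.log(P)
# Weighted double-centering: removes weighted row and column means,
# so only the log-ratio structure remains.
Y = L - (L @ c)[:, None] - (r @ L)[None, :] + r @ L @ c

S = np.diag(np.sqrt(r)) @ Y @ np.diag(np.sqrt(c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
rows = np.diag(1 / np.sqrt(r)) @ U * sv    # row coordinates of the log-ratio map
print(rows[:, :2])
```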
Abstract:
This paper presents a method for the measurement of changes in health inequality and income-related health inequality over time in a population.For pure health inequality (as measured by the Gini coefficient) andincome-related health inequality (as measured by the concentration index),we show how measures derived from longitudinal data can be related tocross section Gini and concentration indices that have been typicallyreported in the literature to date, along with measures of health mobilityinspired by the literature on income mobility. We also show how thesemeasures of mobility can be usefully decomposed into the contributions ofdifferent covariates. We apply these methods to investigate the degree ofincome-related mobility in the GHQ measure of psychological well-being inthe first nine waves of the British Household Panel Survey (BHPS). Thisreveals that dynamics increase the absolute value of the concentrationindex of GHQ on income by 10%.
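For readers unfamiliar with the concentration index, a short sketch of its standard "convenient covariance" form may help; the data are simulated, and the paper's longitudinal mobility decomposition is not attempted here.

```python
import numpy as np

rng = np.random.default_rng(0)
income = rng.lognormal(10, 0.5, 1000)
health = 50 + 0.002 * income + rng.normal(0, 5, 1000)   # toy health score

# Fractional rank of income: (2i - 1) / (2n) for the i-th poorest person.
order = np.argsort(income)
n = len(income)
rank = np.empty_like(income)
rank[order] = (2 * np.arange(1, n + 1) - 1) / (2 * n)

# "Convenient covariance" form of the concentration index.
CI = 2 * np.cov(health, rank, bias=True)[0, 1] / health.mean()
print(f"concentration index: {CI:.4f}")
```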
Abstract:
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
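A minimal sketch of triangular fuzzy coding, the building block mentioned above; the three-category scheme and the hinge points are illustrative assumptions, not the paper's exact coding.

```python
import numpy as np

def fuzzy_code(x, low, mid, high):
    """Triangular fuzzy coding of x into three membership values
    that sum to 1 (a 'fuzzy' indicator row, vs. crisp 0/1 coding)."""
    x = np.clip(x, low, high)
    left = np.where(x <= mid, (mid - x) / (mid - low), 0.0)
    right = np.where(x >= mid, (x - mid) / (high - mid), 0.0)
    centre = 1.0 - left - right
    return np.column_stack([left, centre, right])

temps = np.array([2.0, 11.0, 14.5, 23.0])
# Hinges placed at the minimum, median and maximum of the variable (assumed).
print(fuzzy_code(temps, low=2.0, mid=14.5, high=23.0))
```

Defuzzification, as the abstract describes, would invert this coding to recover estimated values of the original variable from the fuzzy categories.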
Abstract:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
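The power-transformation link can be illustrated with the familiar Box-Cox-type family, which tends to the logarithm as its parameter goes to zero; a tiny sketch follows (invented values, one "frame" per parameter value; a real movie would recompute the full map at each step).

```python
import numpy as np

def power_transform(x, alpha):
    """Box-Cox-style family linking power and log transformations:
    (x**alpha - 1)/alpha tends to log(x) as alpha -> 0."""
    if abs(alpha) < 1e-12:
        return np.log(x)
    return (x**alpha - 1.0) / alpha

x = np.array([0.5, 1.0, 2.0, 4.0])
for alpha in [1.0, 0.5, 0.1, 0.01, 0.0]:
    print(alpha, np.round(power_transform(x, alpha), 4))
```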
Abstract:
This paper presents a comparative analysis of linear and mixed models for short-term forecasting of a real data series with a high percentage of missing data. Data are the series of significant wave heights registered at regular periods of three hours by a buoy placed in the Bay of Biscay. The series is interpolated with a linear predictor which minimizes the forecast mean square error. The linear models are seasonal ARIMA models and the mixed models have a linear component and a non-linear seasonal component. The non-linear component is estimated by a non-parametric regression of data versus time. Short-term forecasts, no more than two days ahead, are of interest because they can be used by the port authorities to notify the fleet. Several models are fitted and compared by their forecasting behavior.
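A minimal sketch of fitting a seasonal ARIMA to a gappy 3-hourly series with statsmodels' SARIMAX, whose state-space form tolerates missing values; the series, the model orders and the missingness rate are invented stand-ins, not the buoy data or the paper's chosen models.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Toy stand-in for 3-hourly wave heights (period 8 = one day), with gaps.
rng = np.random.default_rng(1)
t = np.arange(400)
y = 2.0 + np.sin(2 * np.pi * t / 8) + rng.normal(0, 0.3, t.size)
y[rng.random(t.size) < 0.15] = np.nan   # ~15% missing, as in the buoy series

# Seasonal ARIMA; the state-space form handles the NaNs by Kalman filtering.
res = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 0, 1, 8)).fit(disp=False)
print(res.forecast(steps=16))            # two days ahead = 16 three-hour steps
```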
Abstract:
As the prevalence of smoking has decreased to below 20%, health practitioners' interest has shifted towards the prevalence of obesity, and reducing it is one of the major health challenges in decades to come. In this paper we study the impact that the final product of the anti-smoking campaign, that is, smokers quitting the habit, had on average weight in the population. To these ends, we use data from the Behavioral Risk Factors Surveillance System, a large series of independent representative cross-sectional surveys. We construct a synthetic panel that allows us to control for unobserved heterogeneity, and we exploit the exogenous changes in taxes and regulations to instrument the endogenous decision to give up the habit of smoking. Our estimates are very close to estimates issued in the 90s by the US Department of Health, and indicate that a 10% decrease in the incidence of smoking leads to an average weight increase of 2.2 to 3 pounds, depending on the choice of specification. In addition, we find evidence that the effect overshoots in the short run, although a significant part remains even after two years. However, when we split the sample between men and women, we only find a significant effect for men. Finally, the implicit elasticity of quitting smoking to the probability of becoming obese is calculated at 0.58. This implies that the net benefit from reducing the incidence of smoking by 1% is positive even though the cost to society is $0.6 billion.
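The instrumenting step can be sketched with a toy just-identified two-stage least squares; all variables and effect sizes here are simulated stand-ins, not the BRFSS data or the authors' specification.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
tax = rng.normal(0, 1, n)                  # instrument: cigarette tax changes
u = rng.normal(0, 1, n)                    # unobserved confounder
quit_ = (0.8 * tax + u + rng.normal(0, 1, n) > 0).astype(float)
weight = 150 + 2.5 * quit_ + 3 * u + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), quit_])   # endogenous regressor
Z = np.column_stack([np.ones(n), tax])     # instrument matrix

# 2SLS in the just-identified case: beta = (Z'X)^{-1} Z'y.
beta = np.linalg.solve(Z.T @ X, Z.T @ weight)
print(f"IV estimate of quitting effect: {beta[1]:.2f} (true = 2.5)")
```

An OLS regression of weight on quit_ would be biased here because u drives both variables; the instrument restores consistency, which is the role taxes and regulations play in the paper.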
Abstract:
The singular value decomposition and its interpretation as a linear biplot has proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is demonstrated on a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
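A minimal sketch of the log-ratio machinery behind relative variation biplots: centred log-ratios followed by an SVD. The compositions are invented, and this unweighted form is only an approximation of the paper's construction.

```python
import numpy as np

# Six-part colour compositions (toy rows summing to 1).
X = np.array([[0.30, 0.20, 0.15, 0.15, 0.10, 0.10],
              [0.25, 0.25, 0.20, 0.10, 0.10, 0.10],
              [0.40, 0.15, 0.15, 0.10, 0.10, 0.10]])

# Centred log-ratio (clr) transform: log of each part relative to the
# geometric mean of its row, so all pairwise log-ratios are preserved.
L = np.log(X)
clr = L - L.mean(axis=1, keepdims=True)

# Double-centre and decompose; U*s and Vt give the biplot coordinates.
Z = clr - clr.mean(axis=0, keepdims=True)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
print(U[:, :2] * s[:2])      # row (painting) coordinates
print(Vt[:2].T)              # column (colour) directions
```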
Abstract:
Dual scaling of a subjects-by-objects table of dominance data (preferences, paired comparisons and successive categories data) has been contrasted with correspondence analysis, as if the two techniques were somehow different. In this note we show that dual scaling of dominance data is equivalent to the correspondence analysis of a table which is doubled with respect to subjects. We also show that the results of both methods can be recovered from a principal components analysis of the undoubled dominance table which is centred with respect to subject means.
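One common doubling convention can be sketched as follows (assumed here for illustration; the paper's exact construction may differ); correspondence analysis of the resulting table would then reproduce the dual scaling solution, per the abstract.

```python
import numpy as np

# Toy subjects-by-objects ranking table (1 = most preferred of m = 3 objects).
R = np.array([[1, 2, 3],
              [2, 1, 3],
              [3, 2, 1]])
n, m = R.shape

# Doubling with respect to subjects: each subject contributes a "positive"
# row (dominance) and a "negative" complement row, so both rows of a pair
# share the same total and CA treats the poles symmetrically.
doubled = np.empty((2 * n, m))
doubled[0::2] = m - R        # larger = more preferred
doubled[1::2] = R - 1        # larger = less preferred
print(doubled)
```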
Abstract:
The classical binary classification problem is investigated when it is known in advance that the posterior probability function (or regression function) belongs to some class of functions. We introduce and analyze a method which effectively exploits this knowledge. The method is based on minimizing the empirical risk over a carefully selected "skeleton" of the class of regression functions. The skeleton is a covering of the class based on a data-dependent metric, especially fitted for classification. A new scale-sensitive dimension is introduced which is more useful for the studied classification problem than other, previously defined, dimension measures. This fact is demonstrated by performance bounds for the skeleton estimate in terms of the new dimension.
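A heavily simplified sketch of empirical risk minimization over a finite set of candidate posterior functions; the paper's skeleton is a data-dependent cover under a classification-fitted metric, for which the fixed grid below is only a stand-in.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(0, 1, n)
eta_true = np.where(x < 0.3, 0.9, 0.1)           # true posterior P(Y=1|x)
y = (rng.random(n) < eta_true).astype(int)

# Crude "skeleton": a grid over an assumed class of posterior functions
# eta_theta(t) = 0.9 if t < theta else 0.1, each inducing a classifier.
thetas = np.linspace(0.05, 0.95, 19)

def empirical_risk(theta):
    pred = (x < theta).astype(int)    # plug-in classifier of the candidate
    return np.mean(pred != y)

best = min(thetas, key=empirical_risk)
print(f"selected threshold: {best:.2f} (true changepoint at 0.30)")
```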
Abstract:
Structural equation models (SEM) are commonly used to analyze the relationship between variables, some of which may be latent, such as individual "attitude" to and "behavior" concerning specific issues. A number of difficulties arise when we want to compare a large number of groups, each with large sample size, and the manifest variables are distinctly non-normally distributed. Using a specific data set, we evaluate the appropriateness of the following alternative SEM approaches: multiple group versus MIMIC models, continuous versus ordinal variables estimation methods, and normal theory versus non-normal estimation methods. The approaches are applied to the ISSP-1993 Environmental data set, with the purpose of exploring variation in the mean level of variables of "attitude" to and "behavior" concerning environmental issues and their mutual relationship across countries. Issues of both theoretical and practical relevance arise in the course of this application.
Abstract:
In 2007 the first Quality Enhancement Meeting on sampling in the European Social Survey (ESS) took place. The discussion focused on design effects and interviewer effects in face-to-face interviews. Following the recommendations of this meeting, the Spanish ESS team studied the impact of interviewers as a new element in the design effect in the response variance, using the information of the corresponding Sample Design Data Files. Hierarchical multilevel and cross-classified multilevel analyses are conducted in order to estimate the amount of response variation due to PSUs and to interviewers for different questions in the survey. Factors such as the age of the interviewer, gender, workload, training and experience, and respondent characteristics such as age, gender, reluctance to participate, and their possible interactions are also included in the analysis of some specific questions such as trust in politicians and trust in the legal system. Some recommendations related to future sampling designs and the contents of the briefing sessions are derived from this initial research.
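The variance-partition idea can be sketched with a single-level random-intercept model in statsmodels (cross-classifying interviewers with PSUs, as the paper does, needs a crossed-effects tool); the data below are simulated stand-ins.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for ESS-style data: respondents nested in interviewers.
rng = np.random.default_rng(4)
n_int, per = 50, 20
interviewer = np.repeat(np.arange(n_int), per)
u = rng.normal(0, 0.5, n_int)                      # interviewer effects
trust = 5 + u[interviewer] + rng.normal(0, 1.5, n_int * per)
df = pd.DataFrame({"trust": trust, "interviewer": interviewer})

# Random-intercept model; the share of variance at the interviewer level
# is the intraclass correlation (ICC).
res = smf.mixedlm("trust ~ 1", df, groups=df["interviewer"]).fit()
var_int = res.cov_re.iloc[0, 0]
icc = var_int / (var_int + res.scale)
print(f"interviewer ICC: {icc:.3f}")
```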
Abstract:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provide explicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate, and study in particular the connection between the richness of the class of density estimates and the performance bound. For example, our method allows one to pick the bandwidth and kernel order in the kernel estimate simultaneously and still assure that, for all densities, the $L_1$ error of the corresponding kernel estimate is not larger than about three times the error of the estimate with the optimal smoothing factor and kernel, plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variable kernel estimates.
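A small simulation of the $L_1$ benchmark the paper's bounds refer to: exact kernel-estimate $L_1$ errors over a bandwidth grid, computable here because the true density is known. The paper's selector itself (which never sees the true density) is not implemented.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
data = rng.normal(0, 1, n)                 # sample from a known density

grid = np.linspace(-5, 5, 1001)
true_f = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)

def kde(h):
    # Gaussian kernel density estimate evaluated on the grid.
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-z**2 / 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# L1 error for each candidate bandwidth, by a simple Riemann sum; a
# selector meeting the paper's bound would land within about 3x the best.
for h in [0.05, 0.1, 0.2, 0.4, 0.8]:
    l1 = np.abs(kde(h) - true_f).sum() * (grid[1] - grid[0])
    print(f"h = {h:.2f}  L1 error = {l1:.4f}")
```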
Abstract:
BACKGROUND: Prospective data describing the appropriateness of use of colonoscopy based on detailed panel-based clinical criteria are not available. METHODS: In a cohort of 553 consecutive patients referred for colonoscopy to two university-based Swiss outpatient clinics, the percentage of patients who underwent colonoscopy for appropriate, equivocal, and inappropriate indications and the relationship between appropriateness of use and the presence of relevant endoscopic lesions was prospectively assessed. This assessment was based on criteria of the American Society for Gastrointestinal Endoscopy and explicit American and Swiss criteria developed in 1994 by a formal panel process using the RAND/UCLA appropriateness method. RESULTS: The procedures were rated appropriate or equivocal in 72.2% by criteria of the American Society for Gastrointestinal Endoscopy, in 68.5% by explicit American criteria, and in 74.4% by explicit Swiss criteria (not statistically significant, NS). Inappropriate use (overuse) of colonoscopy was found in 27.8%, 31.5%, and 25.6%, respectively (NS). The proportion of appropriate procedures was higher with increasing age. Almost all reasons for using colonoscopy could be assessed by the two explicit criteria sets, whereas 28.4% of reasons for using colonoscopy could not be evaluated by the criteria of the American Society for Gastrointestinal Endoscopy (p < 0.0001). The probability of finding a relevant endoscopic lesion was distinctly higher in the procedures rated appropriate or equivocal than in procedures judged inappropriate. CONCLUSIONS: The rate of inappropriate use of colonoscopy is substantial in Switzerland. Explicit criteria allow assessment of almost all indications encountered in clinical practice. In this study, all sets of appropriateness criteria significantly enhanced the probability of finding a relevant endoscopic lesion during colonoscopy.