199 results for Quantitative methods
Abstract:
The objective of this paper is to compare the performance of two predictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for the LR and NN models. Both models were developed with cross-validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared with the error rate and the area under the receiver operating characteristic curve (Az). The neural network obtained a statistically higher Az than LR with cross-validation. The remaining resampling validation methods did not reveal statistically significant differences between the LR and NN rules. The neural network classifier performs better than the one based on logistic regression. This advantage is well detected by three-fold cross-validation, but remains unnoticed when leave-one-out or bootstrap algorithms are used.
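As a rough illustration of the kind of comparison described (not the paper's data or models), the sketch below contrasts logistic regression and a small neural network by three-fold cross-validated AUC on a synthetic dataset; the dataset, network size and scoring choices are assumptions made only for the example.

    # Hedged sketch: synthetic data and illustrative hyperparameters only.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    # 167 synthetic "patients" with a handful of clinical-style features
    X, y = make_classification(n_samples=167, n_features=8, random_state=0)

    models = {
        "LR": LogisticRegression(max_iter=1000),
        "NN": MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0),
    }

    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=3, scoring="roc_auc")
        print(f"{name}: mean AUC = {auc.mean():.3f} (sd {auc.std():.3f})")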
Abstract:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provide explicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate, and study in particular the connection between the richness of the class of density estimates and the performance bound. For example, our method allows one to pick the bandwidth and kernel order in the kernel estimate simultaneously and still assure that for {\it all densities}, the $L_1$ error of the corresponding kernel estimate is not larger than about three times the error of the estimate with the optimal smoothing factor and kernel, plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variable kernel estimates.
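The quantity being controlled can be made concrete on simulated data with a known density. The sketch below (my illustration, not the paper's density-free selection procedure) computes the $L_1$ error of Gaussian kernel estimates over a grid of bandwidths when the true density is standard normal.

    # Illustration only: the true density is known here, so the L1 error can
    # be computed directly; the paper's method selects h without knowing it.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    data = rng.standard_normal(500)          # sample from the true density
    grid = np.linspace(-5, 5, 2001)
    dx = grid[1] - grid[0]

    def kde(x, sample, h):
        # Gaussian kernel density estimate with bandwidth h, evaluated on x
        return norm.pdf((x[:, None] - sample[None, :]) / h).mean(axis=1) / h

    for h in (0.05, 0.2, 0.5, 1.0):
        l1 = np.sum(np.abs(kde(grid, data, h) - norm.pdf(grid))) * dx
        print(f"h = {h:4.2f}: L1 error = {l1:.3f}")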
Abstract:
We show that the Heston volatility or equivalently the Cox-Ingersoll-Ross process is Malliavin differentiable and give an explicit expression for the derivative. This result assures the applicability of Malliavin calculus in the framework of the Heston stochastic volatility model and the Cox-Ingersoll-Ross model for interest rates.
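For reference, the process in question is the square-root diffusion; in conventional notation (the parameter names are the standard ones, not taken from the paper), the CIR / Heston-variance dynamics read:

    % Standard CIR / Heston-variance dynamics, conventional notation
    \[
      dX_t = \kappa\,(\theta - X_t)\,dt + \sigma\,\sqrt{X_t}\,dW_t,
      \qquad X_0 = x_0 > 0,
    \]
    % with mean-reversion speed $\kappa$, long-run level $\theta$,
    % volatility parameter $\sigma$, and $W$ a standard Brownian motion.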
Abstract:
Although correspondence analysis is now widely available in statistical software packages and applied in a variety of contexts, notably the social and environmental sciences, there are still some misconceptions about this method, as well as unresolved issues that remain controversial to this day. In this paper we hope to settle these matters, namely (i) the way CA measures variance in a two-way table and how to compare variances between tables of different sizes, (ii) the influence, or rather lack of influence, of outliers in the usual CA maps, (iii) the scaling issue and the biplot interpretation of maps, (iv) whether or not to rotate a solution, and (v) the statistical significance of results.
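On point (i), the variance that CA measures, the total inertia of a two-way table, equals the Pearson chi-squared statistic divided by the grand total; the short check below uses a made-up contingency table.

    # Total inertia = chi-squared / grand total (toy table for illustration)
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[20, 35, 10],
                      [30, 25, 40],
                      [15, 20, 25]])

    chi2, _, _, _ = chi2_contingency(table, correction=False)
    print(f"total inertia = {chi2 / table.sum():.4f}")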
Abstract:
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical {\sc vc} dimension, empirical {\sc vc} entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
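The last equivalence can be sketched directly: flip the labels on one half of the sample, (approximately) minimize empirical risk on the relabelled data, and read off the difference between the two half-sample error rates. The data, the linear model class and the use of a large-C linear SVM as an ERM surrogate below are illustrative assumptions, not the paper's setup.

    # Hedged sketch of the maximal-discrepancy computation via ERM with
    # flipped labels; LinearSVC with large C stands in for exact ERM.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=1)
    half = len(y) // 2

    y_flip = y.copy()
    y_flip[half:] = 1 - y_flip[half:]        # flip labels on the second half

    erm = LinearSVC(C=1e6, max_iter=50000).fit(X, y_flip)
    pred = erm.predict(X)

    err_first = np.mean(pred[:half] != y[:half])
    err_second = np.mean(pred[half:] != y[half:])
    print(f"maximal discrepancy (approx.) = {err_second - err_first:.3f}")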
Abstract:
This paper analyzes collective bargaining using Spanish firm-level data. Central to the analysis are the joint determination of wage and strike outcomes in a dynamic framework and the possibility of separate wage equations for strike and non-strike outcomes. Conditional on strikes taking place, we confirm a negative relationship between strike duration and wage changes in a dynamic context. Furthermore, we find selection in the wage equations induced by the strike outcome. In this sense, the possibility of the wage determination process being different in strike and non-strike samples is not rejected by the data. In particular, wage dynamics are of opposite sign in the strike and non-strike equations. Finally, we find evidence of a strike premium in wage changes of 0.33 percentage points.
Abstract:
When the behaviour of a specific hypothesis test statistic is studied by a Monte Carlo experiment, the usual way to describe its quality is by giving the empirical level of the test. As an alternative to this procedure, we use the empirical distribution of the obtained \emph{p}-values and exploit its information both graphically and numerically.
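A minimal version of the idea, under assumptions of my own choosing (a one-sample t-test simulated under its null), is sketched below: collect the Monte Carlo p-values and summarise how far their empirical distribution is from uniform.

    # Simulate the test under the null, keep the p-values, and compare their
    # empirical distribution with U(0,1); the setting is illustrative.
    import numpy as np
    from scipy.stats import ttest_1samp, kstest

    rng = np.random.default_rng(0)
    pvals = np.array([ttest_1samp(rng.standard_normal(30), 0.0).pvalue
                      for _ in range(2000)])

    print("empirical level at 0.05:", np.mean(pvals < 0.05))
    print("KS distance to uniform :", round(kstest(pvals, "uniform").statistic, 4))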
Abstract:
It is shown how correspondence analysis may be applied to a subset of response categories from a questionnaire survey, for example the subset of undecided responses or the subset of responses for a particular category. The idea is to maintain the original relative frequencies of the categories and not re-express them relative to totals within the subset, as would normally be done in a regular correspondence analysis of the subset. Furthermore, the masses and chi-square metric assigned to the data subset are the same as those in the correspondence analysis of the whole data set. This variant of the method, called Subset Correspondence Analysis, is illustrated on data from the ISSP survey on Family and Changing Gender Roles.
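A rough numerical sketch of how I read the procedure: keep the row and column masses (and hence the chi-square metric) of the full table, centre with the full-table expected values, and decompose only the chosen columns. The toy table and column subset are made up, and details may differ from the published algorithm.

    # Rough sketch of subset CA as described above; not guaranteed to match
    # the published algorithm in every detail.
    import numpy as np

    N = np.array([[25., 10.,  5., 12.],
                  [ 8., 30.,  9.,  7.],
                  [12.,  6., 20., 14.]])
    subset_cols = [1, 3]                     # hypothetical subset of categories

    P = N / N.sum()
    r = P.sum(axis=1)                        # row masses from the FULL table
    c = P.sum(axis=0)                        # column masses from the FULL table

    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, sv, Vt = np.linalg.svd(S[:, subset_cols], full_matrices=False)

    row_coords = (U * sv) / np.sqrt(r)[:, None]          # principal row coordinates
    print("principal inertias:", np.round(sv**2, 4))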
Abstract:
We show the equivalence between the use of correspondence analysis (CA) of concatenated tables and the application of a particular version of conjoint analysis called categorical conjoint measurement (CCM). The connection is established using canonical correlation (CC). The second part introduces interaction effects in all three variants of the analysis and shows how to pass between the results of each analysis.
Abstract:
The approximants to regular continued fractions constitute `best approximations' to the numbers they converge to, in two ways known as of the first and of the second kind. This property of continued fractions provides a solution to Gosper's problem of the batting average: if the batting average of a baseball player is 0.334, what is the minimum number of times he has been at bat? In this paper, we tackle, in a sense, the inverse question: given a rational number P/Q, what is the set of all numbers for which P/Q is a `best approximation' of one or the other kind? We prove that in both cases these `Optimality Sets' are intervals, and we give a precise description of their endpoints.
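Gosper's question can be answered by brute force, as in the sketch below (searching for the smallest number of at-bats for which some hit count gives an average that rounds to .334); the paper's point is that continued-fraction convergents characterise such best approximations directly.

    # Smallest q such that some p/q lies in [0.3335, 0.3345), i.e. rounds to .334
    from fractions import Fraction

    low, high = Fraction(3335, 10000), Fraction(3345, 10000)

    q = 1
    while True:
        p = -((-low.numerator * q) // low.denominator)   # exact ceil(low * q)
        if Fraction(p, q) < high:
            print(f"minimum at-bats: {q} (with {p} hits, average {p/q:.4f})")
            break
        q += 1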
Abstract:
In this paper we use a variety of data sources, both micro and macro (time series, cross-section, and panel data), to provide an empirical evaluation of the current level of economic wellbeing of the Spanish elderly and of its determinants. We focus, in particular, on the role played by the pension system and its generosity in terms of minimum pension supplements and non-contributory pensions. In an IV context, we find that actual Social Security benefits contribute substantially to explaining income and consumption poverty levels and the trends of the low income and consumption percentiles. We thus offer support to previous evidence for Spain emphasizing the role of minimum benefit policies.
Abstract:
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
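A naive illustration of the weight-fitting idea (not the majorization algorithm of the paper): given a cases-by-variables matrix and a target dissimilarity matrix, estimate nonnegative variable weights by least squares so that the weighted Euclidean distances approximate the targets. The data and the "true" weights below are simulated for demonstration.

    # Fit nonnegative variable weights to target dissimilarities by generic
    # bound-constrained least squares (illustration, not the paper's algorithm).
    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 4))
    true_w = np.array([2.0, 0.5, 1.0, 0.0])
    D = squareform(pdist(X * np.sqrt(true_w)))       # target dissimilarities

    def stress(w):
        Dw = squareform(pdist(X * np.sqrt(np.clip(w, 0, None))))
        return np.sum((Dw - D) ** 2)

    res = minimize(stress, x0=np.ones(4), bounds=[(0, None)] * 4)
    print("estimated weights:", np.round(res.x, 3))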
Abstract:
We study the effects of the cancellation of a sizeable child benefit in Spain on birth timing and neonatal health. In May 2010, the government announced that a 2,500-euro universal "baby bonus" would stop being paid to babies born starting January 1, 2011. We use detailed micro data from birth certificates from 2000 to 2011, and find that more than 2,000 families were able to anticipate the date of birth of their babies from (early) January 2011 to (late) December 2010 (for a total of about 10,000 births a week nationally). This shifting took place in part via an increase as well as an anticipation of pre-programmed c-sections, seemingly mostly in private clinics. We find that this shifting of birthdates resulted in a significant increase in the number of borderline low birth weight babies, as well as a peak in neonatal mortality. The results suggest that announcement effects are important, and that families and health professionals may face effective trade-offs when deciding on the timing (and method) of birth.
Abstract:
This work proposes novel network analysis techniques for multivariate time series. We define the network of a multivariate time series as a graph where vertices denote the components of the process and edges denote non-zero long-run partial correlations. We then introduce a two-step LASSO procedure, called NETS, to estimate high-dimensional sparse long-run partial correlation networks. This approach is based on a VAR approximation of the process and allows us to decompose the long-run linkages into the contributions of the dynamic and contemporaneous dependence relations of the system. The large sample properties of the estimator are analysed, and we establish conditions for consistent selection and estimation of the non-zero long-run partial correlations. The methodology is illustrated with an application to a panel of U.S. blue chips.
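A much-simplified sketch of the general idea (node-wise Lasso regressions recovering a sparse partial-correlation graph) is given below; it is not the NETS estimator itself, which adds the VAR step so that the recovered linkages are long-run partial correlations. The data and the planted linkage are simulated.

    # Node-wise Lasso ("neighbourhood selection") on simulated data; a stand-in
    # for the sparse-network estimation idea, not the two-step NETS procedure.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    T, N = 400, 10
    Y = rng.standard_normal((T, N))
    Y[:, 1] += 0.8 * Y[:, 0]                 # plant one genuine linkage: 0 -- 1

    adj = np.zeros((N, N), dtype=bool)
    for i in range(N):
        others = np.delete(np.arange(N), i)
        coef = LassoCV(cv=5).fit(Y[:, others], Y[:, i]).coef_
        adj[i, others] = coef != 0

    edges = np.argwhere(np.triu(adj & adj.T, k=1))   # mutually selected edges
    print("selected edges:", edges.tolist())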
Abstract:
This paper provides regression discontinuity evidence on the long-run and intergenerational education impacts of a temporary increase in federal transfers to local governments in Brazil. Revenues and expenditures of the communities benefiting from extra transfers temporarily increased by about 20% during the four-year period from 1982 to the end of 1985. Schooling and literacy gains for directly exposed cohorts, established in previous work that used the 1991 census, are attenuated but persist in the 2000 and 2010 censuses. Children and adolescents of the next generation --born after the extra funding had disappeared-- show gains of about 0.08 standard deviations across the entire score distribution of two nationwide exams at the end of the 2000s. While we find no evidence of persistent improvements in school resources, we document discontinuities in the education levels, literacy rates and incomes of test takers' parents that are consistent with intergenerational human capital spillovers.