50 resultados para Statistics, Nonparametric
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
Las pruebas paramétricas son un tipo de pruebas de significación estadística que cuantifican la asociación o independencia entre una variable cuantitativa y una categórica. Las pruebas paramétricas exigen ciertos requisitos previos para su aplicación: la distribución Normal de la variable cuantitativa en los grupos que se comparan, la homogeneidad de varianzas en las poblaciones de las que proceden los grupos y una n muestral no inferior a 30. Su incumplimiento conlleva la necesidad de recurrir a pruebas estadísticas no paramétricas. Las pruebas paramétricas se clasifican en dos: prueba t (para una muestra o para dos muestras relacionadas o independientes) y prueba ANOVA (para más de dos muestras independientes).
Resumo:
This paper proposes a nonparametric test in order to establish the level of accuracy of theforeign trade statistics of 17 Latin American countries when contrasted with the trade statistics of the main partners in 1925. The Wilcoxon Matched-Pairs Ranks test is used to determine whether the differences between the data registered by exporters and importers are meaningful, and if so, whether the differences are systematic in any direction. The paper tests for the reliability of the data registered for two homogeneous products, petroleum and coal, both in volume and value. The conclusion of the several exercises performed is that we cannot accept the existence of statistically significant differences between the data provided by the exporters and the registered by the importing countries in most cases. The qualitative historiography of Latin American describes its foreign trade statistics as mostly unusable. Our quantitative results contest this view.
Resumo:
A method to estimate an extreme quantile that requires no distributional assumptions is presented. The approach is based on transformed kernel estimation of the cumulative distribution function (cdf). The proposed method consists of a double transformation kernel estimation. We derive optimal bandwidth selection methods that have a direct expression for the smoothing parameter. The bandwidth can accommodate to the given quantile level. The procedure is useful for large data sets and improves quantile estimation compared to other methods in heavy tailed distributions. Implementation is straightforward and R programs are available.
Resumo:
Condence intervals in econometric time series regressions suffer fromnotorious coverage problems. This is especially true when the dependencein the data is noticeable and sample sizes are small to moderate, as isoften the case in empirical studies. This paper suggests using thestudentized block bootstrap and discusses practical issues, such as thechoice of the block size. A particular data-dependent method is proposedto automate the method. As a side note, it is pointed out that symmetricconfidence intervals are preferred over equal-tailed ones, since theyexhibit improved coverage accuracy. The improvements in small sampleperformance are supported by a simulation study.
Resumo:
Connections between Statistics and Archaeology have always appeared veryfruitful. The objective of this paper is to offer an outlook of somestatistical techniques that are being developed in the most recentyears and that can be of interest for archaeologists in the short run.
Resumo:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provideexplicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate,and study in particular the connection between the richness of the classof density estimates and the performance bound. For example, our methodallows one to pick the bandwidth and kernel order in the kernel estimatesimultaneously and still assure that for {\it all densities}, the $L_1$error of the corresponding kernel estimate is not larger than aboutthree times the error of the estimate with the optimal smoothing factor and kernel plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variablekernel estimates.
Resumo:
The most suitable method for estimation of size diversity is investigated. Size diversity is computed on the basis of the Shannon diversity expression adapted for continuous variables, such as size. It takes the form of an integral involving the probability density function (pdf) of the size of the individuals. Different approaches for the estimation of pdf are compared: parametric methods, assuming that data come from a determinate family of pdfs, and nonparametric methods, where pdf is estimated using some kind of local evaluation. Exponential, generalized Pareto, normal, and log-normal distributions have been used to generate simulated samples using estimated parameters from real samples. Nonparametric methods include discrete computation of data histograms based on size intervals and continuous kernel estimation of pdf. Kernel approach gives accurate estimation of size diversity, whilst parametric methods are only useful when the reference distribution have similar shape to the real one. Special attention is given for data standardization. The division of data by the sample geometric mean is proposedas the most suitable standardization method, which shows additional advantages: the same size diversity value is obtained when using original size or log-transformed data, and size measurements with different dimensionality (longitudes, areas, volumes or biomasses) may be immediately compared with the simple addition of ln k where kis the dimensionality (1, 2, or 3, respectively). Thus, the kernel estimation, after data standardization by division of sample geometric mean, arises as the most reliable and generalizable method of size diversity evaluation
Resumo:
The aim of this article is to assess the effects of several territorial characteristics, specifically agglomeration economies, on industrial location processes in the Spanish region of Catalonia. Theoretically, the level of agglomeration causes economies which favour the location of new establishments, but an excessive level of agglomeration might cause diseconomies, since congestion effects arise. The empirical evidence on this matter is inconclusive, probably because the models used so far are not suitable enough. We use a more flexible semiparametric specification, which allows us to study the nonlinear relationship between the different types of agglomeration levels and location processes. Our main statistical source is the REIC (Catalan Manufacturing Establishments Register), which has plant-level microdata on location of new industrial establishments. Keywords: agglomeration economies, industrial location, Generalized Additive Models, nonparametric estimation, count data models.
Resumo:
Given a model that can be simulated, conditional moments at a trial parameter value can be calculated with high accuracy by applying kernel smoothing methods to a long simulation. With such conditional moments in hand, standard method of moments techniques can be used to estimate the parameter. Since conditional moments are calculated using kernel smoothing rather than simple averaging, it is not necessary that the model be simulable subject to the conditioning information that is used to define the moment conditions. For this reason, the proposed estimator is applicable to general dynamic latent variable models. Monte Carlo results show that the estimator performs well in comparison to other estimators that have been proposed for estimation of general DLV models.
Resumo:
This research focuses on a major concern for marketers addressing the claims of inefficiency of the spending on advertising. We examine whether the Internet can help increase overall advertising efficiency. Using a sample from the Spanish automobile industry, we combine a nonparametric method - Data Envelopment Analysis - with recent important insights from statistics and econometrics studies, and we find that online advertising improves the efficiency levels and this effect is more pronounced in the long-term temporal framework.
Resumo:
This paper presents an initial challenge to tackle the every so "tricky" points encountered when dealing with energy accounting, and thereafter illustrates how such a system of accounting can be used when assessing for the metabolic changes in societies. The paper is divided in four main sections. The first three, present a general discussion on the main issues encountered when conducting energy analyses. The last section, subsequently, combines this heuristic approach to the actual formalization of it, in quantitative terms, for the analysis of possible energy scenarios. Section one covers the broader issue of how to account for the relevant categories used when accounting for Joules of energy; emphasizing on the clear distinction between Primary Energy Sources (PES) (which are the physical exploited entities that are used to derive useable energy forms (energy carriers)) and Energy Carriers (EC) (the actual useful energy that is transmitted for the appropriate end uses within a society). Section two sheds light on the concept of Energy Return on Investment (EROI). Here, it is emphasized that, there must already be a certain amount of energy carriers available to be able to extract/exploit Primary Energy Sources to thereafter generate a net supply of energy carriers. It is pointed out that this current trend of intense energy supply has only been possible to the great use and dependence on fossil energy. Section three follows up on the discussion of EROI, indicating that a single numeric indicator such as an output/input ratio is not sufficient in assessing for the performance of energetic systems. Rather an integrated approach that incorporates (i) how big the net supply of Joules of EC can be, given an amount of extracted PES (the external constraints); (ii) how much EC needs to be invested to extract an amount of PES; and (iii) the power level that it takes for both processes to succeed, is underlined. Section four, ultimately, puts the theoretical concepts at play, assessing for how the metabolic performances of societies can be accounted for within this analytical framework.
Resumo:
Abstract. Given a model that can be simulated, conditional moments at a trial parameter value can be calculated with high accuracy by applying kernel smoothing methods to a long simulation. With such conditional moments in hand, standard method of moments techniques can be used to estimate the parameter. Because conditional moments are calculated using kernel smoothing rather than simple averaging, it is not necessary that the model be simulable subject to the conditioning information that is used to define the moment conditions. For this reason, the proposed estimator is applicable to general dynamic latent variable models. It is shown that as the number of simulations diverges, the estimator is consistent and a higher-order expansion reveals the stochastic difference between the infeasible GMM estimator based on the same moment conditions and the simulated version. In particular, we show how to adjust standard errors to account for the simulations. Monte Carlo results show how the estimator may be applied to a range of dynamic latent variable (DLV) models, and that it performs well in comparison to several other estimators that have been proposed for DLV models.
Resumo:
This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The nonparametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness of fit analysis and importantly statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code that has been used in the analysis so that readers, both students and educators, can fully explore the techniques described
Resumo:
Aquest projecte es centra principalment en el detector no coherent d’un GPS. Per tal de caracteritzar el procés de detecció d’un receptor, es necessita conèixer l’estadística implicada. Pel cas dels detectors no coherents convencionals, l’estadística de segon ordre intervé plenament. Les prestacions que ens dóna l’estadística de segon ordre, plasmada en la ROC, són prou bons tot i que en diferents situacions poden no ser els millors. Aquest projecte intenta reproduir el procés de detecció mitjançant l’estadística de primer ordre com a alternativa a la ja coneguda i implementada estadística de segon ordre. Per tal d’aconseguir-ho, s’usen expressions basades en el Teorema Central del Límit i de les sèries Edgeworth com a bones aproximacions. Finalment, tant l’estadística convencional com l’estadística proposada són comparades, en termes de la ROC, per tal de determinar quin detector no coherent ofereix millor prestacions en cada situació.
Resumo:
Given an observed test statistic and its degrees of freedom, one may compute the observed P value with most statistical packages. It is unknown to what extent test statistics and P values are congruent in published medical papers. Methods:We checked the congruence of statistical results reported in all the papers of volumes 409–412 of Nature (2001) and a random sample of 63 results from volumes 322–323 of BMJ (2001). We also tested whether the frequencies of the last digit of a sample of 610 test statistics deviated from a uniform distribution (i.e., equally probable digits).Results: 11.6% (21 of 181) and 11.1% (7 of 63) of the statistical results published in Nature and BMJ respectively during 2001 were incongruent, probably mostly due to rounding, transcription, or type-setting errors. At least one such error appeared in 38% and 25% of the papers of Nature and BMJ, respectively. In 12% of the cases, the significance level might change one or more orders of magnitude. The frequencies of the last digit of statistics deviated from the uniform distribution and suggested digit preference in rounding and reporting.Conclusions: this incongruence of test statistics and P values is another example that statistical practice is generally poor, even in the most renowned scientific journals, and that quality of papers should be more controlled and valued