40 results for non-parametric technique
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
We present a real data set of claim amounts in which costs related to damage are recorded separately from those related to medical expenses. Only claims with positive costs are considered here. Two approaches to density estimation are presented: a classical parametric method and a semi-parametric method based on transformation kernel density estimation. We explore the data set with standard univariate methods. We also propose ways to select the bandwidth and transformation parameters in the univariate case based on Bayesian methods. We indicate how to compare the results of alternative methods, both by looking at the shape of the density over the whole domain and by exploring the density estimates in the right tail.
Abstract:
Land cover classification is a key research field in remote sensing and land change science, as thematic maps derived from remotely sensed data have become the basis for analyzing many socio-ecological issues. However, land cover classification remains a difficult task, and it is especially challenging in heterogeneous tropical landscapes, where such maps are nonetheless of great importance. The present study aims to establish an efficient classification approach to accurately map all broad land cover classes in a large, heterogeneous tropical area of Bolivia, as a basis for further studies (e.g., land cover-land use change). Specifically, we compare the performance of parametric (maximum likelihood), non-parametric (k-nearest neighbour and four different support vector machines, SVM), and hybrid classifiers, using both hard and soft (fuzzy) accuracy assessments. In addition, we test whether the inclusion of a textural index (homogeneity) in the classifications improves their performance. We classified Landsat imagery for two dates corresponding to dry and wet seasons and found that non-parametric classifiers, and particularly SVM classifiers, outperformed both parametric and hybrid classifiers. We also found that the use of the homogeneity index along with reflectance bands significantly increased the overall accuracy of all the classifications, but particularly of SVM algorithms. We observed that improvements in producer’s and user’s accuracies through the inclusion of the homogeneity index differed between land cover classes. Early-growth/degraded forests, pastures, grasslands and savanna were the classes most improved, especially with the SVM radial basis function and SVM sigmoid classifiers, though with both classifiers all land cover classes were mapped with producer’s and user’s accuracies of around 90%. Our approach seems very well suited to accurately map land cover in tropical regions, thus having the potential to contribute to conservation initiatives, climate change mitigation schemes such as REDD+, and rural development policies.
Abstract:
This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The non-parametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel-based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness-of-fit analysis and, importantly, statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code that has been used in the analysis so that readers, both students and educators, can fully explore the techniques described.
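The log-transformation step described above can be sketched in a few lines. This is only an illustration in Python, not the R code from the paper's Appendix: the function name `log_transform_kde`, the rule-of-thumb bandwidth and the claim amounts are all assumptions made for the example. The idea is to estimate a Gaussian kernel density on the log scale and map it back with the change of variables f_X(x) = f_Y(log x) / x.

```python
import math
import statistics

def log_transform_kde(claims, bandwidth=None):
    """Return a density estimate for positive data via a KDE on the log scale."""
    logs = [math.log(c) for c in claims]  # requires strictly positive claims
    n = len(logs)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption, not the paper's selector).
        bandwidth = 1.06 * statistics.stdev(logs) * n ** (-1 / 5)

    def density(x):
        if x <= 0:
            return 0.0
        y = math.log(x)
        # Gaussian kernel density estimate at y on the log scale.
        kde_y = sum(
            math.exp(-0.5 * ((y - yi) / bandwidth) ** 2) for yi in logs
        ) / (n * bandwidth * math.sqrt(2 * math.pi))
        # Change of variables back to the original scale: f_X(x) = f_Y(log x) / x.
        return kde_y / x

    return density

# Made-up claim amounts, heavily right-skewed like typical severity data.
claims = [120.0, 340.0, 95.0, 2200.0, 560.0, 180.0, 15000.0, 410.0]
density = log_transform_kde(claims)
```

Because the smoothing happens on the log scale, the back-transformed estimate puts no mass on negative amounts and adapts its effective bandwidth to the long right tail.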
Abstract:
The objective of this study is to analyse the technical or productive efficiency of the refuse collection services in 75 municipalities located in the Spanish region of Catalonia. The analysis has been carried out using various techniques. First we calculated a deterministic parametric frontier, then a stochastic parametric frontier, and finally various non-parametric approaches (DEA and FDH). The results naturally differ according to the technique used to approximate the frontier. Nevertheless, they appear robust, at least with regard to the ordinal concordance among the efficiency indices obtained by the different approaches, as demonstrated by the statistical tests used. Finally, we searched for any relation between efficiency and the method (public or private) of managing the services. No significant relation was found between the type of management and the efficiency indices.
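Of the non-parametric approaches named above, FDH (free disposal hull) is simple enough to sketch directly. The following is a minimal illustration, not the study's implementation: the function name, the choice of inputs and outputs, and the three municipalities are hypothetical. It computes the standard input-oriented FDH score, the smallest proportional input contraction justified by an observed unit that produces at least as much of every output.

```python
def fdh_input_efficiency(units, index):
    """Input-oriented FDH score for units[index].

    Each unit is (inputs, outputs), both tuples of positive numbers.
    """
    x0, y0 = units[index]
    best = 1.0  # the unit always dominates itself, so the score is at most 1
    for xj, yj in units:
        # Only units producing at least as much of every output are comparable.
        if all(a >= b for a, b in zip(yj, y0)):
            # Smallest proportional input contraction that unit j justifies.
            ratio = max(a / b for a, b in zip(xj, x0))
            best = min(best, ratio)
    return best

# Hypothetical municipalities: (cost, staff) as inputs, (tonnes collected,) as output.
units = [
    ((100.0, 10.0), (50.0,)),
    ((80.0, 8.0), (50.0,)),
    ((120.0, 9.0), (70.0,)),
]
scores = [fdh_input_efficiency(units, i) for i in range(len(units))]
```

A score of 1 means no observed unit dominates the municipality; a score below 1 means some other unit delivers at least the same output with proportionally fewer inputs. DEA would additionally allow convex combinations of units and needs a linear-programming solver, which is why it is left out of this sketch.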
Abstract:
A parametric procedure for the blind inversion of nonlinear channels is proposed, based on a recent method of blind source separation in nonlinear mixtures. Experiments show that the proposed algorithms perform efficiently, even in the presence of hard distortion. The method, based on the minimization of the output mutual information, requires knowledge of the log-derivative of the input distribution (the so-called score function). Each algorithm consists of three adaptive blocks: one devoted to adaptive estimation of the score function, and two other blocks estimating the inverses of the linear and nonlinear parts of the channel, (quasi-)optimally adapted using the estimated score functions. This paper is mainly concerned with the nonlinear part, for which we propose two parametric models, the first based on a polynomial model and the second on a neural network, while [14, 15] proposed non-parametric approaches.
Abstract:
Inductive learning aims at finding general rules that hold true in a database. Targeted learning seeks rules for predicting the value of a variable based on the values of others, as in the case of linear or non-parametric regression analysis. Non-targeted learning finds regularities without a specific prediction goal. We model the product of non-targeted learning as rules that state that a certain phenomenon never happens, or that certain conditions necessitate another. For all types of rules, there is a trade-off between the rule's accuracy and its simplicity. Thus rule selection can be viewed as a choice problem among pairs of degree of accuracy and degree of complexity. However, one cannot in general tell what the feasible set in the accuracy-complexity space is. Formally, we show that finding out whether a point belongs to this set is computationally hard. In particular, in the context of linear regression, finding a small set of variables that attains a certain value of R² is computationally hard. Computational complexity may explain why a person is not always aware of rules that, if asked, she would find valid. This, in turn, may explain why one can change other people's minds (opinions, beliefs) without providing new information.
Abstract:
Study based on a stay at the Royal Brompton Hospital, London, United Kingdom, during October and November 2006. The benefits of beta-adrenergic stimulation in patients with acute lung injury (ALI) are known, but no data are available on a possible anti-inflammatory effect. Exhaled breath condensate (EBC) is a non-invasive technique for collecting samples from the lower respiratory tract and may be useful for monitoring respiratory diseases. Biological markers in the EBC of mechanically ventilated patients with ALI were used to study the possible anti-inflammatory effect of salbutamol. EBC was collected before and after the administration of inhaled salbutamol. Immediately afterwards, conductivity and pH were measured before and after degassing with helium. The concentration of nitrites and nitrates was measured. The samples were lyophilized and stored at -80 °C. The leukotriene B4 concentration was measured after reconstitution of the sample. Results are expressed as mean (standard error of the sample). No differences were detected between the baseline EBC values of the patients with ALI and the reference values of the healthy population of Barcelona. It is therefore concluded that EBC is a non-invasive technique that can be used to monitor mechanically ventilated patients. Inhaled salbutamol significantly increases the EBC pH of patients with ALI, although a direct effect of salbutamol inhalation cannot be ruled out.
Abstract:
Public authorities and road users alike are increasingly concerned by recent trends in road safety outcomes in Barcelona, which is the European city with the highest number of registered Powered Two-Wheel (PTW) vehicles per inhabitant. In this study we explore the determinants of motorcycle and moped accident severity in a large urban area, drawing on Barcelona’s local police database (2002-2008). We apply non-parametric regression techniques to characterize PTW accidents and parametric methods to investigate the factors influencing their severity. Our results show that PTW accident victims are more vulnerable, showing greater degrees of accident severity, than other traffic victims. Speed violations and alcohol consumption lead to the worst health outcomes. Demographic and environment-related risk factors, in addition to helmet use, play an important role in determining accident severity. Thus, this study furthers our understanding of the most vulnerable vehicle types, while our results have direct implications for local policy makers in their fight to reduce the severity of PTW accidents in large urban areas.
Abstract:
Our objective is to analyse fraud as an operational risk for the insurance company. We study the effect of a fraud detection policy on the insurer's results account, quantifying the loss risk from the perspective of claims auditing. From the point of view of operational risk, the study aims to analyse the effect of failing to detect fraudulent claims after investigation. We have chosen Value at Risk (VaR) as the risk measure, with a non-parametric estimation of the loss risk involved in the detection or non-detection of fraudulent claims. The most relevant conclusion is that auditing claims reduces loss risk in the insurance company.
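One common non-parametric estimate of a risk measure like this is the empirical quantile of the observed losses. The sketch below assumes that reading; the abstract does not say which non-parametric estimator the authors used, and the function name and loss figures are invented for illustration.

```python
import math

def empirical_var(losses, level=0.95):
    """Empirical Value at Risk: the smallest loss l with F_n(l) >= level."""
    if not losses or not 0 < level < 1:
        raise ValueError("need data and a level strictly between 0 and 1")
    ordered = sorted(losses)
    # Index of the order statistic whose empirical CDF first reaches `level`.
    k = math.ceil(level * len(ordered)) - 1
    return ordered[k]

# Invented loss amounts, one per audited claim.
losses = [12.0, 3.5, 40.0, 7.2, 1.1, 18.0, 95.0, 5.0, 22.0, 9.9]
var_90 = empirical_var(losses, 0.90)
```

Comparing this quantity for losses with and without auditing would be one way to express the effect of the detection policy, under the assumption that both samples are available.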
Abstract:
Our project aims to analyze the relevance of economic factors (mainly income and other socioeconomic characteristics of Spanish households, and market prices) to the prevalence of obesity in Spain, and to what extent, and under what circumstances, market intervention prices are effective in reducing obesity and improving the quality of the diet. In relation to the existing literature worldwide, this project is the first attempt in Spain to obtain an overall picture of the effectiveness of public policies on food consumption and diet quality, on the one hand, and on the prevalence of obesity, on the other. The project consists of four main parts. The first part is a critical review of the literature on the economic approach to obesity prevalence, diet quality and public intervention policies. Although another important body of the obesity literature deals with physical exercise, we limit our attention to studies related to food consumption, in keeping with the scope of our study and because many published reviews already cover the literature on physical exercise and its effect on obesity prevalence. The second part consists of a parametric and non-parametric analysis of the role of economic factors in obesity prevalence in Spain. The third part attempts to overcome the shortcomings of many diet quality indices developed in recent decades, such as the Healthy Eating Index, the Diet Quality Index, the Healthy Diet Indicator, and the Mediterranean Diet Score, by developing a new obesity-specific diet quality index. The last part concentrates on assessing the effectiveness of market intervention policies in improving the healthiness of the Spanish diet, using the new Exact Affine Stone Index (EASI) demand system.
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts, and thus the metric properties, should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values in compositional data sets is introduced.
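The multiplicative replacement mentioned above can be illustrated with a short sketch. It assumes a single replacement value `delta` for every rounded zero (the published method allows a different value per part), and the function name and example composition are hypothetical. Each zero is set to `delta`, and every non-zero part is multiplied by the common factor 1 - (sum of imputed values) / c, where c is the closure constant.

```python
def multiplicative_replacement(composition, delta):
    """Replace rounded zeros by `delta` and rescale the non-zero parts."""
    total = sum(composition)  # closure constant c (1, 100, 10**6, ...)
    n_zeros = sum(1 for part in composition if part == 0)
    imputed_mass = n_zeros * delta
    if imputed_mass >= total:
        raise ValueError("delta is too large for this composition")
    # Non-zero parts shrink by a common factor, so their ratios are unchanged.
    factor = 1 - imputed_mass / total
    return [delta if part == 0 else part * factor for part in composition]

# Hypothetical five-part composition closed to 1, with two rounded zeros.
x = [0.6, 0.0, 0.3, 0.1, 0.0]
r = multiplicative_replacement(x, delta=0.005)
```

Because all non-zero parts are scaled by the same factor, their ratios (and hence the logratios of any zero-free subcomposition) are unchanged, which is the coherence property the abstract refers to.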
Abstract:
There is hardly a case in exploration geology in which the studied data do not include values below detection limits and/or zero values, and since most geological data follow lognormal distributions, these “zero data” represent a mathematical challenge for interpretation. We need to start by recognizing that there are zero values in geology. For example, the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot coexist with nepheline. Another common essential zero is a North azimuth; however, we can always replace that zero with the value 360°. These are known as “essential zeros”, but what can we do with “rounded zeros”, which result from values below the detection limit of the equipment? Amalgamation, e.g. adding Na2O and K2O as total alkalis, is a solution, but sometimes we need to differentiate between a sodic and a potassic alteration. Pre-classification into groups requires a good knowledge of the distribution of the data and the geochemical characteristics of the groups, which is not always available. Setting the zero values equal to the detection limit of the equipment used will generate spurious distributions, especially in ternary diagrams. The same will occur if we replace the zero values by a small amount using non-parametric or parametric techniques (imputation). The method that we propose takes into consideration the well-known relationships between some elements. For example, in copper porphyry deposits there is always a good direct correlation between the copper values and the molybdenum ones, but while copper will always be above the detection limit, many of the molybdenum values will be “rounded zeros”. So we take the lower quartile of the real molybdenum values and establish a regression equation with copper, and then estimate the “rounded” zero values of molybdenum from their corresponding copper values. The method could be applied to any type of data, provided we first establish their correlation dependency. One of the main advantages of this method is that we do not obtain a fixed value for the “rounded zeros”, but one that depends on the value of the other variable.
Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
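The copper-molybdenum procedure described above can be sketched as follows. The abstract does not state the form of the regression or the scale on which it is fitted, so this example assumes an ordinary least-squares line on the raw values; the function name `impute_rounded_zeros` and the assay values are hypothetical.

```python
import statistics

def impute_rounded_zeros(cu, mo):
    """Fill zero Mo values from Cu via a line fitted to the lowest measured Mo."""
    measured = [(c, m) for c, m in zip(cu, mo) if m > 0]
    # First quartile of the measured Mo values defines the lower-quartile subset.
    q1 = statistics.quantiles([m for _, m in measured], n=4)[0]
    low = [(c, m) for c, m in measured if m <= q1]
    if len(low) < 2:
        raise ValueError("need at least two low-Mo samples to fit a line")
    mean_c = statistics.fmean(c for c, _ in low)
    mean_m = statistics.fmean(m for _, m in low)
    sxx = sum((c - mean_c) ** 2 for c, _ in low)
    if sxx == 0:
        raise ValueError("copper values in the subset are all identical")
    sxy = sum((c - mean_c) * (m - mean_m) for c, m in low)
    # Ordinary least squares on the raw scale (an assumption of this sketch).
    slope = sxy / sxx
    intercept = mean_m - slope * mean_c
    # Measured values are kept; zeros are predicted from the paired Cu value.
    return [m if m > 0 else intercept + slope * c for c, m in zip(cu, mo)]

# Hypothetical assays (ppm); the last two Mo values are rounded zeros.
cu = [1000.0, 1200.0, 1500.0, 1800.0, 2000.0, 2500.0, 3000.0, 3500.0, 4000.0, 600.0, 800.0]
mo = [21.0, 25.0, 31.0, 37.0, 41.0, 51.0, 61.0, 71.0, 81.0, 0.0, 0.0]
filled = impute_rounded_zeros(cu, mo)
```

Each imputed value depends on the sample's own copper content rather than being a single constant. A practical version would also check that the predictions are positive and fall below the detection limit.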
Abstract:
How much would output increase if underdeveloped economies were to increase their levels of schooling? We contribute to the development accounting literature by describing a non-parametric upper bound on the increase in output that can be generated by more schooling. The advantage of our approach is that the upper bound is valid for any number of schooling levels with arbitrary patterns of substitution/complementarity. Another advantage is that the upper bound is robust to certain forms of endogenous technology response to changes in schooling. We also quantify the upper bound for all economies with the necessary data, compare our results with the standard development accounting approach, and provide an update on the results using the standard approach for a large sample of countries.