30 results for non-parametric

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevance:

100.00%

Publisher:

Abstract:

We present a real data set of claim amounts in which costs related to damage are recorded separately from those related to medical expenses. Only claims with positive costs are considered here. Two approaches to density estimation are presented: a classical parametric method and a semi-parametric method based on transformation kernel density estimation. We explore the data set with standard univariate methods. We also propose ways to select the bandwidth and transformation parameters in the univariate case based on Bayesian methods. We indicate how to compare the results of alternative methods, both by looking at the shape of the density over its whole domain and by exploring the density estimates in the right tail.

Relevance:

100.00%

Publisher:

Abstract:

Land cover classification is a key research field in remote sensing and land change science, as thematic maps derived from remotely sensed data have become the basis for analyzing many socio-ecological issues. However, land cover classification remains a difficult task, and it is especially challenging in heterogeneous tropical landscapes, where such maps are nonetheless of great importance. The present study aims to establish an efficient classification approach to accurately map all broad land cover classes in a large, heterogeneous tropical area of Bolivia, as a basis for further studies (e.g., land cover-land use change). Specifically, we compare the performance of parametric (maximum likelihood), non-parametric (k-nearest neighbour and four different support vector machines, SVM), and hybrid classifiers, using both hard and soft (fuzzy) accuracy assessments. In addition, we test whether the inclusion of a textural index (homogeneity) in the classifications improves their performance. We classified Landsat imagery for two dates corresponding to dry and wet seasons and found that non-parametric classifiers, and particularly SVMs, outperformed both parametric and hybrid classifiers. We also found that the use of the homogeneity index along with reflectance bands significantly increased the overall accuracy of all the classifications, but particularly of SVM algorithms. We observed that improvements in producer’s and user’s accuracies through the inclusion of the homogeneity index differed depending on land cover class. Early-growth/degraded forests, pastures, grasslands and savanna were the classes most improved, especially with the SVM radial basis function and SVM sigmoid classifiers, though with both classifiers all land cover classes were mapped with producer’s and user’s accuracies of around 90%. Our approach seems very well suited to accurately mapping land cover in tropical regions, and thus has the potential to contribute to conservation initiatives, climate change mitigation schemes such as REDD+, and rural development policies.

Relevance:

70.00%

Publisher:

Abstract:

This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The non-parametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel-based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness-of-fit analysis and, importantly, statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code that has been used in the analysis, so that readers, both students and educators, can fully explore the techniques described.
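As a rough illustration of the log-transformation step described in this abstract (this is not the R code from the paper's Appendix), the sketch below estimates a density for positive claim costs by running a Gaussian kernel estimator on the logged data and transforming back with the change-of-variables factor 1/x. The function names, the sample values, and the rule-of-thumb bandwidth are all assumptions made for the example.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate built from `sample`."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(y):
        return sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def log_transform_kde(claims, bandwidth=None):
    """Density estimate for positive data via a KDE on the log scale."""
    logs = [math.log(c) for c in claims]
    if bandwidth is None:
        # Rule-of-thumb bandwidth: an assumption, not the paper's selector.
        bandwidth = 1.06 * statistics.stdev(logs) * len(logs) ** (-0.2)
    kde_on_logs = gaussian_kde(logs, bandwidth)

    def density(x):
        if x <= 0:
            return 0.0
        # Change of variables: f_X(x) = f_Y(log x) / x.
        return kde_on_logs(math.log(x)) / x

    return density


# Hypothetical claim costs, made up for illustration only.
claims = [120.0, 250.0, 310.0, 480.0, 900.0, 1500.0, 4200.0, 15000.0]
density = log_transform_kde(claims)
```

Working on the log scale keeps the estimate at zero for non-positive costs and lets a single bandwidth cover both the bulk of small claims and the long right tail, which is the benefit the abstract attributes to transforming before smoothing.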

Relevance:

70.00%

Publisher:

Abstract:

A parametric procedure for the blind inversion of nonlinear channels is proposed, based on a recent method of blind source separation in nonlinear mixtures. Experiments show that the proposed algorithms perform efficiently, even in the presence of hard distortion. The method, based on the minimization of the output mutual information, requires knowledge of the log-derivative of the input distribution (the so-called score function). Each algorithm consists of three adaptive blocks: one devoted to adaptive estimation of the score function, and two others estimating the inverses of the linear and nonlinear parts of the channel, (quasi-)optimally adapted using the estimated score functions. This paper is mainly concerned with the nonlinear part, for which we propose two parametric models, the first based on a polynomial model and the second on a neural network, while [14, 15] proposed non-parametric approaches.

Relevance:

60.00%

Publisher:

Abstract:

Inductive learning aims at finding general rules that hold true in a database. Targeted learning seeks rules for predicting the value of a variable based on the values of others, as in the case of linear or non-parametric regression analysis. Non-targeted learning finds regularities without a specific prediction goal. We model the product of non-targeted learning as rules that state that a certain phenomenon never happens, or that certain conditions necessitate another. For all types of rules, there is a trade-off between the rule's accuracy and its simplicity. Thus rule selection can be viewed as a choice problem among pairs of degree of accuracy and degree of complexity. However, one cannot in general tell what the feasible set in the accuracy-complexity space is. Formally, we show that finding out whether a point belongs to this set is computationally hard. In particular, in the context of linear regression, finding a small set of variables that obtain a certain value of R² is computationally hard. Computational complexity may explain why a person is not always aware of rules that, if asked, she would find valid. This, in turn, may explain why one can change other people's minds (opinions, beliefs) without providing new information.

Relevance:

60.00%

Publisher:

Abstract:

Public authorities and road users alike are increasingly concerned about recent trends in road safety outcomes in Barcelona, which is the European city with the highest number of registered Powered Two-Wheel (PTW) vehicles per inhabitant. In this study we explore the determinants of motorcycle and moped accident severity in a large urban area, drawing on Barcelona’s local police database (2002-2008). We apply non-parametric regression techniques to characterize PTW accidents and parametric methods to investigate the factors influencing their severity. Our results show that PTW accident victims are more vulnerable, showing greater degrees of accident severity, than other traffic victims. Speed violations and alcohol consumption lead to the worst health outcomes. Demographic and environment-related risk factors, in addition to helmet use, play an important role in determining accident severity. Thus, this study furthers our understanding of the most vulnerable vehicle types, while our results have direct implications for local policy makers in their fight to reduce the severity of PTW accidents in large urban areas.

Relevance:

60.00%

Publisher:

Abstract:

Our objective is to analyse fraud as an operational risk for the insurance company. We study the effect of a fraud detection policy on the insurer's results account, quantifying the loss risk from the perspective of claims auditing. From the point of view of operational risk, the study aims to analyse the effect of failing to detect fraudulent claims after investigation. We have chosen Value at Risk (VaR) as the risk measure, with a non-parametric estimation of the loss risk involved in the detection or non-detection of fraudulent claims. The most relevant conclusion is that auditing claims reduces loss risk in the insurance company.
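One simple non-parametric way to estimate a risk measure of this kind is to read it off as an empirical quantile of the observed losses. The sketch below assumes that reading; the abstract does not say which estimator the study uses, and the function name, the confidence level, and the loss figures are hypothetical.

```python
import math


def empirical_var(losses, level=0.95):
    """Smallest observed loss x such that at least a fraction `level`
    of the sample is less than or equal to x (empirical quantile)."""
    if not losses:
        raise ValueError("losses must not be empty")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    ordered = sorted(losses)
    # Small tolerance guards against floating-point noise in n * level.
    rank = math.ceil(len(ordered) * level - 1e-9)
    return ordered[max(rank, 1) - 1]


# Hypothetical loss samples (monetary units), made up for illustration:
# one portfolio where claims are audited and one where they are not.
losses_with_audit = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0]
losses_without_audit = [10.0, 14.0, 19.0, 25.0, 33.0, 41.0, 52.0, 70.0]

var_with_audit = empirical_var(losses_with_audit, level=0.75)
var_without_audit = empirical_var(losses_without_audit, level=0.75)
```

Comparing the two quantiles at the same level is one way to express the abstract's question of whether auditing reduces loss risk, without assuming any parametric form for the loss distribution.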

Relevance:

60.00%

Publisher:

Abstract:

Our project analyzes the relevance of economic factors (mainly income and other socioeconomic characteristics of Spanish households, and market prices) to the prevalence of obesity in Spain, and asks to what extent, and under what circumstances, market intervention policies are effective in reducing obesity and improving diet quality. Relative to the existing literature worldwide, this project is the first attempt in Spain to obtain an overall picture of the effectiveness of public policies on food consumption and diet quality, on the one hand, and on the prevalence of obesity, on the other. The project consists of four main parts. The first part is a critical review of the literature on economic approaches to obesity prevalence, diet quality and public intervention policies. Although another important body of the obesity literature deals with physical exercise, we limit our attention to studies related to food consumption, in keeping with the scope of our study and because many published reviews already cover physical exercise and its effect on obesity prevalence. The second part consists of a parametric and non-parametric analysis of the role of economic factors in obesity prevalence in Spain. The third part seeks to overcome the shortcomings of many diet quality indices developed during recent decades, such as the Healthy Eating Index, the Diet Quality Index, the Healthy Diet Indicator, and the Mediterranean Diet Score, by developing a new obesity-specific diet quality index. The last part concentrates on assessing the effectiveness of market intervention policies in improving the healthiness of the Spanish diet, using the new Exact Affine Stone Index (EASI) demand system.

Relevance:

60.00%

Publisher:

Abstract:

As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zero is usually understood as “a trace too small to measure”, it seems reasonable to replace such zeros by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts, and thus the metric properties, should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values on compositional data sets is introduced.
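The multiplicative idea can be sketched in a few lines: each zero is replaced by a small value, and the non-zero parts are all shrunk by the same factor so that the composition keeps its original total. The version below uses a single replacement value `delta` for every zero, which is a simplifying assumption for the example; the function name and the sample composition are hypothetical.

```python
def multiplicative_replacement(composition, delta):
    """Replace zeros by `delta` and rescale the non-zero parts.

    Each zero part becomes `delta`; each non-zero part x_j becomes
    x_j * (1 - k * delta / c), where k is the number of zeros and c is
    the sum of the composition, so the total stays equal to c.
    """
    total = sum(composition)
    n_zeros = sum(1 for part in composition if part == 0)
    if n_zeros * delta >= total:
        raise ValueError("delta is too large for this composition")
    shrink = 1.0 - n_zeros * delta / total
    return [delta if part == 0 else part * shrink for part in composition]


# Hypothetical four-part composition with one rounded zero.
replaced = multiplicative_replacement([0.5, 0.3, 0.2, 0.0], delta=0.01)
```

Because every non-zero part is multiplied by the same factor, the ratios between them are unchanged, which is why the covariance structure of zero-free subcompositions survives the replacement.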

Relevance:

60.00%

Publisher:

Abstract:

There is hardly a case in exploration geology where the studied data do not include values below detection limits and/or zero values, and since most geological data follow lognormal distributions, these “zero data” represent a mathematical challenge for interpretation. We need to start by recognizing that there are zero values in geology. For example, the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot coexist with nepheline. Another common essential zero is a North azimuth; however, we can always change that zero to the value of 360°. These are known as “essential zeros”, but what can we do with “rounded zeros”, which result from values below the detection limit of the equipment? Amalgamation, e.g. adding Na2O and K2O as total alkalis, is a solution, but sometimes we need to differentiate between a sodic and a potassic alteration. Pre-classification into groups requires a good knowledge of the distribution of the data and the geochemical characteristics of the groups, which is not always available. Setting the zero values equal to the limit of detection of the equipment used will generate spurious distributions, especially in ternary diagrams. The same will occur if we replace the zero values by a small amount using non-parametric or parametric techniques (imputation). The method that we are proposing takes into consideration the well-known relationships between some elements. For example, in copper porphyry deposits, there is always a good direct correlation between the copper values and the molybdenum ones, but while copper will always be above the limit of detection, many of the molybdenum values will be “rounded zeros”. So, we will take the lower quartile of the real molybdenum values and establish a regression equation with copper, and then we will estimate the “rounded” zero values of molybdenum from their corresponding copper values. The method could be applied to any type of data, provided we first establish their correlation dependency. One of the main advantages of this method is that we do not obtain a fixed value for the “rounded zeros”, but one that depends on the value of the other variable.

Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
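The copper and molybdenum procedure described in this abstract might look roughly as follows. Several details are assumptions, since the abstract does not fix them: the regression is an ordinary least squares line on the raw (untransformed) values, the lower quartile is computed with Python's `statistics.quantiles`, and zeros in the molybdenum list stand for values below the detection limit. The function names and the data are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_rounded_zeros(cu, mo):
    """Estimate zero Mo values from Cu, using a line fitted to the
    samples whose measured Mo lies in the lower quartile."""
    observed = [(c, m) for c, m in zip(cu, mo) if m > 0]
    q1 = statistics.quantiles([m for _, m in observed], n=4)[0]
    low = [(c, m) for c, m in observed if m <= q1]
    intercept, slope = fit_line([c for c, _ in low], [m for _, m in low])
    # Measured values are kept; only the zeros are replaced.
    return [m if m > 0 else intercept + slope * c for c, m in zip(cu, mo)]


# Hypothetical assay values: two samples have Mo below detection (0.0).
cu = [50.0, 80.0] + [100.0 * i for i in range(1, 13)]
mo = [0.0, 0.0] + [float(i) for i in range(1, 13)]
imputed = impute_rounded_zeros(cu, mo)
```

Each imputed value depends on the copper content of its own sample, so two rounded zeros generally receive different estimates, which is the advantage the abstract claims over substituting one fixed value. A fitted line can return negative or above-detection-limit estimates for some data, so any real use would need a check on the predicted values.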

Relevance:

60.00%

Publisher:

Abstract:

How much would output increase if underdeveloped economies were to increase their levels of schooling? We contribute to the development accounting literature by describing a non-parametric upper bound on the increase in output that can be generated by more schooling. The advantage of our approach is that the upper bound is valid for any number of schooling levels with arbitrary patterns of substitution/complementarity. Another advantage is that the upper bound is robust to certain forms of endogenous technology response to changes in schooling. We also quantify the upper bound for all economies with the necessary data, compare our results with the standard development accounting approach, and provide an update on the results using the standard approach for a large sample of countries.

Relevance:

60.00%

Publisher:

Abstract:

How much would output increase if underdeveloped economies were to increase their levels of schooling? We contribute to the development accounting literature by describing a non-parametric upper bound on the increase in output that can be generated by more schooling. The advantage of our approach is that the upper bound is valid for any number of schooling levels with arbitrary patterns of substitution/complementarity. Another advantage is that the upper bound is robust to certain forms of endogenous technology response to changes in schooling. We also quantify the upper bound for all economies with the necessary data, compare our results with the standard development accounting approach, and provide an update on the results using the standard approach for a large sample of countries.

Relevance:

60.00%

Publisher:

Abstract:

This paper presents a comparative analysis of linear and mixed models for short-term forecasting of a real data series with a high percentage of missing data. The data are the series of significant wave heights registered at regular three-hour intervals by a buoy placed in the Bay of Biscay. The series is interpolated with a linear predictor which minimizes the forecast mean square error. The linear models are seasonal ARIMA models, and the mixed models have a linear component and a nonlinear seasonal component. The nonlinear component is estimated by a non-parametric regression of data versus time. Short-term forecasts, no more than two days ahead, are of interest because they can be used by the port authorities to notify the fleet. Several models are fitted and compared by their forecasting behavior.
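The abstract does not say which non-parametric regression estimator is used for the seasonal component. As one possible illustration, the sketch below uses a Nadaraya-Watson kernel smoother of the data against time and simply skips missing readings; the choice of estimator, the Gaussian kernel, the bandwidth, and the wave-height figures are all assumptions made for the example.

```python
import math


def nadaraya_watson(times, values, bandwidth):
    """Return a kernel-weighted average smoother m(t).

    Missing observations are passed as None and are left out of the
    weighted average.
    """
    pairs = [(t, v) for t, v in zip(times, values) if v is not None]

    def estimate(t):
        weights = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti, _ in pairs]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("no observations within reach of the kernel")
        return sum(w * v for w, (_, v) in zip(weights, pairs)) / total

    return estimate


# Hypothetical wave heights (metres) every three hours, with two gaps.
hours = [0, 3, 6, 9, 12, 15, 18, 21]
heights = [1.2, 1.4, None, 2.1, 2.4, None, 1.9, 1.5]
smooth = nadaraya_watson(hours, heights, bandwidth=3.0)
height_at_noon = smooth(12.0)
```

Because the estimate at any time is a weighted average of nearby readings, gaps in the record only reduce the number of terms in the average, which suits a series with a high percentage of missing data.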

Relevance:

60.00%

Publisher:

Abstract:

Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessing the {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk. The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover. Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent, and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension. For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.

Relevance:

60.00%

Publisher:

Abstract:

We revisit the debt overhang question. We first use non-parametric techniques to isolate a panel of countries on the downward sloping section of a debt Laffer curve. In particular, overhang countries are ones where a threshold level of debt is reached in sample, beyond which (initial) debt ends up lowering (subsequent) growth. On average, significantly negative coefficients appear when debt face value reaches 60 percent of GDP or 200 percent of exports, and when its present value reaches 40 percent of GDP or 140 percent of exports. Second, we depart from reduced form growth regressions and perform direct tests of the theory on the thus selected sample of overhang countries. In the spirit of event studies, we ask whether, as the overhang level of debt is reached: (i) investment falls precipitously, as it should when it becomes optimal to default, (ii) economic policy deteriorates observably, as it should when debt contracts become unable to elicit effort on the part of the debtor, and (iii) the terms of borrowing worsen noticeably, as they should when it becomes optimal for creditors to pre-empt default and exact punitive interest rates. We find a systematic response of investment, particularly when property rights are weakly enforced, some worsening of the policy environment, and a fall in interest rates. This easing of borrowing conditions happens because lending by the private sector virtually disappears in overhang situations, and multilateral agencies step in with concessional rates. Thus, while debt relief is likely to improve economic policy (and especially investment) in overhang countries, it is doubtful that it would ease their terms of borrowing, or the burden of debt.