51 resultados para Non-parametric density estimator


Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is almost not a case in exploration geology, where the studied data doesn’tincludes below detection limits and/or zero values, and since most of the geological dataresponds to lognormal distributions, these “zero data” represent a mathematicalchallenge for the interpretation.We need to start by recognizing that there are zero values in geology. For example theamount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot co-existswith nepheline. Another common essential zero is a North azimuth, however we canalways change that zero for the value of 360°. These are known as “Essential zeros”, butwhat can we do with “Rounded zeros” that are the result of below the detection limit ofthe equipment?Amalgamation, e.g. adding Na2O and K2O, as total alkalis is a solution, but sometimeswe need to differentiate between a sodic and a potassic alteration. Pre-classification intogroups requires a good knowledge of the distribution of the data and the geochemicalcharacteristics of the groups which is not always available. Considering the zero valuesequal to the limit of detection of the used equipment will generate spuriousdistributions, especially in ternary diagrams. Same situation will occur if we replace thezero values by a small amount using non-parametric or parametric techniques(imputation).The method that we are proposing takes into consideration the well known relationshipsbetween some elements. For example, in copper porphyry deposits, there is always agood direct correlation between the copper values and the molybdenum ones, but whilecopper will always be above the limit of detection, many of the molybdenum values willbe “rounded zeros”. So, we will take the lower quartile of the real molybdenum valuesand establish a regression equation with copper, and then we will estimate the“rounded” zero values of molybdenum by their corresponding copper values.The method could be applied to any type of data, provided we establish first theircorrelation dependency.One of the main advantages of this method is that we do not obtain a fixed value for the“rounded zeros”, but one that depends on the value of the other variable.Key words: compositional data analysis, treatment of zeros, essential zeros, roundedzeros, correlation dependency

Relevância:

100.00% 100.00%

Publicador:

Resumo:

How much would output increase if underdeveloped economies were to increase their levels of schooling? We contribute to the development accounting literature by describing a non-parametric upper bound on the increase in output that can be generated by more schooling. The advantage of our approach is that the upper bound is valid for any number of schooling levels with arbitrary patterns of substitution/complementarity. Another advantage is that the upper bound is robust to certain forms of endogenous technology response to changes in schooling. We also quantify the upper bound for all economies with the necessary data, compare our results with the standard development accounting approach, and provide an update on the results using the standard approach for a large sample of countries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

How much would output increase if underdeveloped economies were toincrease their levels of schooling? We contribute to the development accounting literature by describing a non-parametric upper bound on theincrease in output that can be generated by more schooling. The advantage of our approach is that the upper bound is valid for any number ofschooling levels with arbitrary patterns of substitution/complementarity.Another advantage is that the upper bound is robust to certain forms ofendogenous technology response to changes in schooling. We also quantify the upper bound for all economies with the necessary data, compareour results with the standard development accounting approach, andprovide an update on the results using the standard approach for a largesample of countries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although the histogram is the most widely used density estimator, itis well--known that the appearance of a constructed histogram for a given binwidth can change markedly for different choices of anchor position. In thispaper we construct a stability index $G$ that assesses the potential changesin the appearance of histograms for a given data set and bin width as theanchor position changes. If a particular bin width choice leads to an unstableappearance, the arbitrary choice of any one anchor position is dangerous, anda different bin width should be considered. The index is based on the statisticalroughness of the histogram estimate. We show via Monte Carlo simulation thatdensities with more structure are more likely to lead to histograms withunstable appearance. In addition, ignoring the precision to which the datavalues are provided when choosing the bin width leads to instability. We provideseveral real data examples to illustrate the properties of $G$. Applicationsto other binned density estimators are also discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a comparative analysis of linear and mixed modelsfor short term forecasting of a real data series with a high percentage of missing data. Data are the series of significant wave heights registered at regular periods of three hours by a buoy placed in the Bay of Biscay.The series is interpolated with a linear predictor which minimizes theforecast mean square error. The linear models are seasonal ARIMA models and themixed models have a linear component and a non linear seasonal component.The non linear component is estimated by a non parametric regression of dataversus time. Short term forecasts, no more than two days ahead, are of interestbecause they can be used by the port authorities to notice the fleet.Several models are fitted and compared by their forecasting behavior.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessingthe {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk.The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover.Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent,and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension.For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We revisit the debt overhang question. We first use non-parametric techniques to isolate a panel of countries on the downward sloping section of a debt Laffer curve. In particular, overhang countries are ones where a threshold level of debt is reached in sample, beyond which (initial) debt ends up lowering (subsequent)growth. On average, significantly negative coefficients appear when debt face value reaches 60 percent of GDP or 200 percent of exports, and when its present value reaches 40 percent of GDP or 140 percent of exports. Second, we depart from reduced form growth regressions and perform direct tests of the theory on the thus selected sample of overhang countries. In the spirit of event studies, we ask whether, as overhang level of debt is reached: (i)investment falls precipitously as it should when it becomes optimal to default, (ii) economic policy deteriorates observably, as it should when debt contracts become unable to elicit effort on the part of the debtor, and (iii) the terms of borrowing worsen noticeably, as they should when it becomes optimal for creditors to pre-empt default and exact punitive interest rates. We find a systematic response of investment, particularly when property rights are weakly enforced, some worsening of the policy environment, and a fall in interest rates. This easing of borrowing conditions happens because lending by the private sector virtually disappears in overhang situations, and multilateral agencies step in with concessional rates. Thus, while debt relief is likely to improve economic policy (and especially investment) in overhang countries, it is doubtful that it would ease their terms of borrowing, or the burden of debt.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The goal of this paper is to present an optimal resource allocation model for the regional allocation of public service inputs. Theproposed solution leads to maximise the relative public service availability in regions located below the best availability frontier, subject to exogenous budget restrictions and equality ofaccess for equal need criteria (equity-based notion of regional needs). The construction of non-parametric deficit indicators is proposed for public service availability by a novel application of Data Envelopment Analysis (DEA) models, whose results offer advantages for the evaluation and improvement of decentralised public resource allocation systems. The method introduced in this paper has relevance as a resource allocation guide for the majority of services centrally funded by the public sector in a given country, such as health care, basic and higher education, citizen safety, justice, transportation, environmental protection, leisure, culture, housing and city planning, etc.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the scope of the European project Hydroptimet, INTERREG IIIB-MEDOCC programme, limited area model (LAM) intercomparison of intense events that produced many damages to people and territory is performed. As the comparison is limited to single case studies, the work is not meant to provide a measure of the different models' skill, but to identify the key model factors useful to give a good forecast on such a kind of meteorological phenomena. This work focuses on the Spanish flash-flood event, also known as "Montserrat-2000" event. The study is performed using forecast data from seven operational LAMs, placed at partners' disposal via the Hydroptimet ftp site, and observed data from Catalonia rain gauge network. To improve the event analysis, satellite rainfall estimates have been also considered. For statistical evaluation of quantitative precipitation forecasts (QPFs), several non-parametric skill scores based on contingency tables have been used. Furthermore, for each model run it has been possible to identify Catalonia regions affected by misses and false alarms using contingency table elements. Moreover, the standard "eyeball" analysis of forecast and observed precipitation fields has been supported by the use of a state-of-the-art diagnostic method, the contiguous rain area (CRA) analysis. This method allows to quantify the spatial shift forecast error and to identify the error sources that affected each model forecasts. High-resolution modelling and domain size seem to have a key role for providing a skillful forecast. Further work is needed to support this statement, including verification using a wider observational data set.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this study is to analyse the technical or productive efficiency ofthe refuse collection services in 75 municipalities located in the Spanish regionof Catalonia. The analysis has been carried out using various techniques. Firstly we have calculated a deterministic parametric frontier, then a stochastic parametric frontier, and finally, various non-parametric approaches (DEA and FDH). Concerning the results, these naturally differ according to the technique used to approach the frontier. Nevertheless, they have an appearance of solidity, at least with regard to the ordinal concordance among the indices of efficiency obtained by the different approaches, as is demonstrated by the statistical tests used. Finally, we have attempted to search for any relation existing between efficiency and the method (public or private) of managing the services. No significant relation was found between the type of management and efficiencyindices

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this study is to analyse the technical or productive efficiency ofthe refuse collection services in 75 municipalities located in the Spanish regionof Catalonia. The analysis has been carried out using various techniques. Firstly we have calculated a deterministic parametric frontier, then a stochastic parametric frontier, and finally, various non-parametric approaches (DEA and FDH). Concerning the results, these naturally differ according to the technique used to approach the frontier. Nevertheless, they have an appearance of solidity, at least with regard to the ordinal concordance among the indices of efficiency obtained by the different approaches, as is demonstrated by the statistical tests used. Finally, we have attempted to search for any relation existing between efficiency and the method (public or private) of managing the services. No significant relation was found between the type of management and efficiencyindices

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El objetivo de este trabajo es analizar como ha evolucionado y los efectos que el tipo de propiedad tiene sobre el desempeño de los bancos en aquellos países de la Europa Central y del Este, que en los últimos años han experimentado con gran intensidad el proceso de integración europea. Con este fin, hemos analizado 242 bancos correspondientes a 12 países (10 nuevos miembros de la UE y 2 en fase de negociación). Para verificar la existencia de un efecto derivado del tipo de propiedad, analizamos las dimensiones de la eficiencia bancaria, rentabilidad, costes, e intermediación, mediante la aplicación de distintas técnicas, tanto paramétricas como no paramétricas. Los resultados muestran la existencia de ciertos efectos derivados del tipo de propiedad. Así, entre los principales resultados, destaca que los bancos privatizados tienden a presentar unos niveles de rentabilidad superiores a los presentados por otros tipos de propiedad, mientras que a su vez, los bancos de origen extranjero son los que de media presentan unos menores niveles de costes, si bien esta diferencia no es estadísticamente significativa. Analizamos también la importancia que supone la presencia de un inversor estratégico en la propiedad de los bancos, obteniendo una mejoría que si bien no es significativa en los ratios de rentabilidad, si lo es en relación a los gastos generales de gestión.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Psychophysical studies suggest that humans preferentially use a narrow band of low spatial frequencies for face recognition. Here we asked whether artificial face recognition systems have an improved recognition performance at the same spatial frequencies as humans. To this end, we estimated recognition performance over a large database of face images by computing three discriminability measures: Fisher Linear Discriminant Analysis, Non-Parametric Discriminant Analysis, and Mutual Information. In order to address frequency dependence, discriminabilities were measured as a function of (filtered) image size. All three measures revealed a maximum at the same image sizes, where the spatial frequency content corresponds to the psychophysical found frequencies. Our results therefore support the notion that the critical band of spatial frequencies for face recognition in humans and machines follows from inherent properties of face images, and that the use of these frequencies is associated with optimal face recognition performance.