938 results for Non-parametric regression methods
Abstract:
This paper provides recent evidence about the benefits of attending preschool on future performance. A non-parametric matching procedure is used over two outcomes: math and verbal scores on a national mandatory test (Saber11) in Colombia. It is found that students who had the chance of attending preschool obtain higher scores in math (6.7%) and verbal (5.4%) than those who did not. A considerable fraction of these gaps comes from the upper quintiles of student performance, suggesting that preschool matters when it takes place at high-quality institutions. When we include the number of years at preschool, the gap rises to 12% in verbal and 17% in math.
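The matching idea behind the score comparison can be sketched with a toy nearest-neighbour estimator. Everything below (covariates, outcomes, sample) is invented for illustration and is not the paper's Saber11 data.

```python
# Toy nearest-neighbour matching estimator (hypothetical data, not the
# Saber11 sample). Each treated unit (attended preschool) is matched to the
# control unit with the closest covariate vector; the estimated effect is
# the mean outcome gap over matched pairs.

def nn_match_effect(treated, controls):
    """treated/controls: lists of (covariates, outcome) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    gaps = []
    for cov_t, y_t in treated:
        _, y_c = min(controls, key=lambda c: dist(cov_t, c[0]))
        gaps.append(y_t - y_c)
    return sum(gaps) / len(gaps)

# Hypothetical students: covariates = (income decile, mother's schooling).
treated = [((3, 11), 55.0), ((7, 16), 62.0)]
controls = [((3, 10), 50.0), ((7, 15), 58.0), ((1, 5), 40.0)]
effect = nn_match_effect(treated, controls)  # mean matched score gap
```

In the paper's setting the covariates would be the pre-treatment characteristics of the students, and the matched gap would be expressed as a percentage of the control-group score.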
Abstract:
We document the existence of a Crime Kuznets Curve in US states since the 1970s. As income levels have risen, crime has followed an inverted U-shaped pattern, first increasing and then dropping. The Crime Kuznets Curve is not explained by income inequality. In fact, we show that during the sample period inequality has risen monotonically with income, ruling out the traditional Kuznets Curve. Our finding is robust to adding a large set of controls that are used in the literature to explain the incidence of crime, as well as to controlling for state and year fixed effects. The Curve is also revealed in nonparametric specifications. The Crime Kuznets Curve exists for property crime and for some categories of violent crime.
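The inverted-U shape can be illustrated with a minimal quadratic least-squares fit; the income and crime figures below are made up for illustration, not the US state panel.

```python
# Minimal quadratic fit crime = b0 + b1*income + b2*income^2 via ordinary
# least squares, showing how an inverted U (b2 < 0) and its turning point
# are detected. Data are invented for illustration.

def quad_fit(x, y):
    # Normal equations X'X b = X'y for columns [1, x, x^2].
    cols = [[1.0] * len(x), list(x), [xi * xi for xi in x]]
    A = [[sum(u * v for u, v in zip(c1, c2)) for c2 in cols] for c1 in cols]
    b = [sum(u * yi for u, yi in zip(c, y)) for c in cols]
    n = 3
    # Gaussian elimination with partial pivoting.
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][c] * coef[c] for c in range(i + 1, n))) / A[i][i]
    return coef  # [b0, b1, b2]

income = [1, 2, 3, 4, 5, 6, 7]
crime = [2.0, 3.5, 4.5, 5.0, 4.6, 3.4, 2.1]   # rises, then falls
b0, b1, b2 = quad_fit(income, crime)
peak = -b1 / (2 * b2)  # income level at which crime turns downward
```

In the paper the quadratic term is estimated with state and year fixed effects and a battery of controls; the sketch only shows the curvature test itself.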
Abstract:
This thesis focuses on Computer Vision and, more specifically, on image segmentation, one of the basic stages of image analysis, which consists of dividing the image into a set of visually distinct and uniform regions according to their intensity, colour or texture. A strategy is proposed based on the complementary use of region and boundary information during the segmentation process, an integration that alleviates some of the basic problems of traditional segmentation. The boundary information is first used to identify the number of regions present in the image and to place a seed inside each of them, in order to model the characteristics of the regions statistically and thereby define the region information. This information, together with the boundary information, is used to define an energy function that expresses the properties required of the desired segmentation: uniformity inside the regions and contrast with neighbouring regions at the boundaries. A set of active regions then begins to grow, competing for the pixels of the image, with the goal of optimising the energy function or, in other words, finding the segmentation that best fits the requirements expressed in that function. Finally, this whole process has been cast in a pyramidal structure, which allows the segmentation result to be refined progressively and its computational cost to be reduced. The strategy has been extended to the texture segmentation problem, which involves some basic considerations such as modelling the regions from a set of texture features and extracting the boundary information when texture is present in the image.
Finally, the approach has been extended to image segmentation taking colour and texture properties into account. In this respect, the joint use of non-parametric density-estimation techniques to describe colour, and of texture features based on the co-occurrence matrix, is proposed to model the image regions adequately and completely. The proposal has been evaluated objectively and compared with several integration techniques using synthetic images. In addition, experiments with real images have been included, with very positive results.
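The seed-based region competition can be caricatured by a plain seeded region-growing pass on a toy grey-level image. This sketch replaces the thesis's energy function, statistical region models and pyramid with a simple intensity-closeness criterion, so it only conveys the basic idea.

```python
# Simplified seeded region growing on a toy grey-level image: each seed
# grows by absorbing 4-connected neighbours whose intensity is close to the
# current region mean. A sketch only; the full method optimises an energy
# function combining region and boundary information in a pyramid.

def grow_regions(image, seeds, tol=30):
    h, w = len(image), len(image[0])
    label = [[0] * w for _ in range(h)]           # 0 = unassigned
    stats = {}                                     # region id -> [sum, count]
    frontier = []
    for rid, (r, c) in enumerate(seeds, start=1):
        label[r][c] = rid
        stats[rid] = [image[r][c], 1]
        frontier.append((r, c, rid))
    while frontier:
        r, c, rid = frontier.pop(0)                # breadth-first growth
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and label[nr][nc] == 0:
                mean = stats[rid][0] / stats[rid][1]
                if abs(image[nr][nc] - mean) <= tol:
                    label[nr][nc] = rid
                    stats[rid][0] += image[nr][nc]
                    stats[rid][1] += 1
                    frontier.append((nr, nc, rid))
    return label

# Toy image: a dark region on the left, a bright region on the right.
img = [[10, 12, 200, 205],
       [11, 13, 198, 202],
       [ 9, 14, 201, 199]]
labels = grow_regions(img, seeds=[(1, 0), (1, 3)])
```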
Abstract:
The use of Bayesian methods in the inference of time-frequency representations has, thus far, been limited to offline analysis of signals, using a smoothing-spline-based model of the time-frequency plane. In this paper we introduce a new framework that allows the routine use of Bayesian inference for online estimation of the time-varying spectral density of a locally stationary Gaussian process. The core of our approach is the use of a likelihood inspired by a local Whittle approximation. This choice, along with the use of a recursive algorithm for non-parametric estimation of the local spectral density, permits the use of a particle filter for estimating the time-varying spectral density online. We provide demonstrations of the algorithm through tracking chirps and the analysis of musical data.
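The notion of recursively tracking a time-varying spectrum can be conveyed by an exponentially weighted average of short-window periodograms. This is a deliberately simplified stand-in, not the paper's local-Whittle likelihood or particle filter.

```python
# Bare-bones online spectral tracker: an exponentially-weighted running
# average of short-window periodograms. Illustrates only the idea of
# recursively estimating a time-varying spectrum; the paper's method uses a
# local Whittle likelihood inside a particle filter.
import cmath, math

def periodogram(window):
    n = len(window)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(window))
        spec.append(abs(s) ** 2 / n)
    return spec

def track_spectrum(signal, win=32, alpha=0.3):
    est = None
    for start in range(0, len(signal) - win + 1, win):
        p = periodogram(signal[start:start + win])
        est = p if est is None else [(1 - alpha) * e + alpha * q
                                     for e, q in zip(est, p)]
    return est  # smoothed spectral estimate at the final time point

# A sinusoid with 4 cycles per 32-sample window should dominate bin 4.
sig = [math.sin(2 * math.pi * 4 * t / 32) for t in range(256)]
spec = track_spectrum(sig)
peak_bin = max(range(len(spec)), key=lambda k: spec[k])
```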
Abstract:
This paper models the transmission of shocks between the US, Japanese and Australian equity markets. Tests for the existence of linear and non-linear transmission of volatility across the markets are performed using parametric and non-parametric techniques. In particular the size and sign of return innovations are important factors in determining the degree of spillovers in volatility. It is found that a multivariate asymmetric GARCH formulation can explain almost all of the non-linear causality between markets. These results have important implications for the construction of models and forecasts of international equity returns.
Abstract:
In this paper, we study the role of the volatility risk premium for the forecasting performance of implied volatility. We introduce a non-parametric and parsimonious approach to adjust the model-free implied volatility for the volatility risk premium and implement this methodology using more than 20 years of options and futures data on three major energy markets. Using regression models and statistical loss functions, we find compelling evidence to suggest that the risk-premium-adjusted implied volatility significantly outperforms other models, including its unadjusted counterpart. Our main finding holds for different choices of volatility estimators and competing time-series models, underscoring the robustness of our results.
Abstract:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model's dimensionality to a manageable level, thus leading to efficient estimation. Most of the existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition, which allows the simultaneous projection of an input tensor onto more than one direction along each mode. In practice, multi-dimensional data are often collected under the same or very similar conditions, so that the data share some common latent components but can also have their own independent parameters for each regression task. Therefore, it is beneficial to analyse the regression parameters across all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker Decomposition, which simultaneously identifies not only the common components of parameters across all the regression tasks, but also the independent factors contributing to each particular regression task. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modelling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
Abstract:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups R_B above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower-acuity observer would have seen. The synthesised annual means of R_B are then re-scaled to the full observed RGO group number R_A using a variety of regression techniques. It is found that a very high correlation between R_A and R_B (r_AB > 0.98) does not prevent large errors in the intercalibration (for example, sunspot maximum values can be over 30% too large even for such levels of r_AB). In generating the backbone sunspot number (R_BB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the scatter-plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of quantile-quantile ("Q-Q") plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least-squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
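The Q-Q diagnostic can be sketched as follows: fit synthetic data both with and without a forced zero intercept, then score each residual set by the Pearson correlation of its Q-Q plot against theoretical normal quantiles. The data are synthetic, not the RGO series.

```python
# Sketch of the Q-Q diagnostic: residuals from an ordinary least-squares fit
# versus a fit forced through the origin, each scored by the correlation of
# sorted residuals with normal quantiles. A markedly lower score flags the
# non-normal residuals the paper warns about. Data are synthetic.
import random, statistics

def fit(x, y, through_origin=False):
    if through_origin:
        slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        return 0.0, slope
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def qq_score(residuals):
    n = len(residuals)
    theo = [statistics.NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    obs = sorted(residuals)
    mo, mt = sum(obs) / n, sum(theo) / n
    num = sum((o - mo) * (t - mt) for o, t in zip(obs, theo))
    den = (sum((o - mo) ** 2 for o in obs)
           * sum((t - mt) ** 2 for t in theo)) ** 0.5
    return num / den  # Pearson correlation of the Q-Q plot

rng = random.Random(1)
x = [i / 10 for i in range(1, 101)]
y = [5.0 + 2.0 * a + rng.gauss(0, 0.5) for a in x]   # true intercept is 5
scores = {}
for origin in (False, True):
    intercept, slope = fit(x, y, through_origin=origin)
    res = [b - (intercept + slope * a) for a, b in zip(x, y)]
    scores[origin] = qq_score(res)
```

Because the data truly have a non-zero intercept, the forced-origin fit leaves structured residuals and earns a visibly lower Q-Q correlation.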
Abstract:
Although the sunspot-number series have existed since the mid-19th century, they are still the subject of intense debate, with the largest uncertainty being related to the "calibration" of the visual acuity of individual observers in the past. Daisy-chain regression methods are applied to inter-calibrate the observers, which may lead to significant bias and error accumulation. Here we present a novel method to calibrate the visual acuity of the key observers to the reference data set of Royal Greenwich Observatory sunspot groups for the period 1900-1976, using the statistics of the active-day fraction. For each observer we independently evaluate their observational threshold [S_S], defined such that the observer is assumed to miss all of the groups with an area smaller than S_S and report all the groups larger than S_S. Next, using a Monte-Carlo method, we construct, from the reference data set, a correction matrix for each observer. The correction matrices are significantly non-linear and cannot be approximated by a linear regression or proportionality. We emphasize that corrections based on a linear proportionality between annually averaged data lead to serious biases and distortions of the data. The correction matrices are applied to the original sunspot group records for each day, and finally the composite corrected series is produced for the period since 1748. The corrected series displays secular minima around 1800 (Dalton minimum) and 1900 (Gleissberg minimum), as well as the Modern grand maximum of activity in the second half of the 20th century. The uniqueness of the grand maximum is confirmed for the last 250 years. It is shown that the adoption of a linear relationship between the data of Wolf and Wolfer results in grossly inflated group numbers in the 18th and 19th centuries in some reconstructions.
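The construction of an observer correction matrix can be sketched by Monte-Carlo simulation: draw days with a "true" group count, drop the groups an observer with threshold S_S would miss, and tabulate the conditional distribution of the true count given the observed count. The count and area distributions below are invented for illustration, not the RGO statistics.

```python
# Monte-Carlo sketch of a correction matrix: simulate days with a "true"
# number of sunspot groups whose areas are random, remove the groups below
# the observer's threshold S_S, and tabulate P(true count | observed count).
# All distributions here are illustrative.
import random

def correction_matrix(s_threshold, max_groups=5, n_days=20000, seed=0):
    rng = random.Random(seed)
    counts = [[0] * (max_groups + 1) for _ in range(max_groups + 1)]
    for _ in range(n_days):
        true_n = rng.randint(0, max_groups)
        areas = [rng.expovariate(1 / 50) for _ in range(true_n)]  # mean 50
        observed = sum(1 for a in areas if a >= s_threshold)
        counts[observed][true_n] += 1
    # Normalise each observed-count row into a probability distribution.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

M = correction_matrix(s_threshold=40)
# Expected true count given that 2 groups were observed:
expected_true = sum(t * p for t, p in enumerate(M[2]))
```

The row-wise distributions are visibly non-linear in the observed count, which is the point of the paper's argument against linear proportionality corrections.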
Abstract:
Cobalt is one of the main components of cast metal alloys broadly used in dentistry, making up 45 to 70% of numerous prosthetic works. There is evidence that metal elements cause systemic and local toxicity. The purpose of the present study was to evaluate the effects of cobalt on the junctional epithelium and reduced enamel epithelium of the first superior molar in rats during lactation. To do this, 1-day-old rats were used, whose mothers received 300 mg of cobalt chloride per liter of distilled water in the drinker during lactation. After 21 days, the rat pups were killed with an anesthetic overdose. The heads were separated, fixed in "alfac", decalcified and embedded in paraffin. Frontal sections stained with hematoxylin and eosin were employed. Karyometric methods allowed estimation of the following parameters: largest, smallest and mean diameters, D/d ratio, perimeter, area, volume, volume/area ratio, eccentricity, form coefficient and contour index. Stereological methods allowed evaluation of: cytoplasm/nucleus ratio, cell and cytoplasm volume, cell number density, external surface/basal membrane ratio, thickness of the epithelial layers and surface density. All the collected data were subjected to statistical analysis using the non-parametric Wilcoxon-Mann-Whitney test. After karyometry, the nuclei of the studied tissues showed smaller values for diameters, perimeter, area, volume and volume/area ratio. Stereologically, smaller cells with scarce cytoplasm were observed in the junctional epithelium and in the reduced enamel epithelium, reflected in a greater number of cells per mm³ of tissue. In this study, cobalt caused epithelial atrophy, indicating a direct action on the junctional and enamel epithelium.
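The Wilcoxon-Mann-Whitney statistic itself is easy to sketch; the measurement values below are invented for illustration, not the karyometric data.

```python
# Pure-Python Mann-Whitney U statistic: over all cross-sample pairs, count
# how often a value from the first sample exceeds one from the second,
# counting ties as 1/2. (The significance step, via tables or a normal
# approximation, is omitted.) The measurements are invented.

def mann_whitney_u(a, b):
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

control = [7.1, 6.8, 7.4, 7.0, 6.9]   # e.g. nuclear diameters, control group
cobalt = [6.1, 6.0, 6.4, 6.2, 5.9]    # treated group, systematically smaller
u = mann_whitney_u(control, cobalt)   # maximum possible is 5 * 5 = 25
```

A U at (or near) the maximum of len(a) * len(b), as here, indicates complete separation of the two samples.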
Abstract:
Background: The oral health conditions of indigenous peoples in Amazonia are closely associated with ecological and dietary changes related to interaction with non-Indians. Aim: The study investigated the incidence of caries in an indigenous community from Central Brazil, focusing on gender differences. Subjects and methods: The research was conducted among the Xavante Indians and was based on longitudinal data collected in two surveys (1999 and 2004). The study included 128 individuals, 63 (49.2%) males and 65 (50.8%) females, divided into four age brackets (6-12, 13-19, 20-34, 35-60 years of age). The DMFT (decayed, missing and filled teeth) index and incidences (difference between 1999 and 2004) were calculated for each individual. The proportion of incidence was also calculated. Differences in caries risk between genders and age brackets were compared by parametric and non-parametric tests. Results: There were statistically significant differences in caries incidence between age brackets and genders. The greatest incidence was observed in the 20-34 age bracket, which presented 3.30 new decayed teeth, twice the risk of the 6-12 age bracket (p < 0.01), chosen as reference. While females in most age groups did not show higher risk for caries when compared to males, there was a 4.04-fold higher risk in the 20-34 age bracket (p < 0.01). Conclusion: It is concluded that factors related to the social functions of each sex (gender issues) and differential access to information, health services, and education may help to explain the differences observed in the incidence of caries.
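The incidence comparison reduces to per-person DMFT increments between the two surveys, summarised as a mean incidence and a simple risk ratio between groups. The figures below are invented for illustration, not the Xavante data.

```python
# Sketch of the caries-incidence comparison: per-person DMFT increments
# between the 1999 and 2004 surveys, a mean incidence per group, and the
# risk ratio relative to a reference group. All numbers are invented.

def mean_incidence(dmft_1999, dmft_2004):
    increments = [b - a for a, b in zip(dmft_1999, dmft_2004)]
    return sum(increments) / len(increments)

ref_1999, ref_2004 = [1, 0, 2, 1], [2, 1, 3, 2]          # 6-12 reference
adult_1999, adult_2004 = [4, 5, 3, 6], [8, 9, 6, 8]      # 20-34 bracket
inc_ref = mean_incidence(ref_1999, ref_2004)
inc_adult = mean_incidence(adult_1999, adult_2004)
risk_ratio = inc_adult / inc_ref   # how many times the reference incidence
```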
Abstract:
When missing data occur in studies designed to compare the accuracy of diagnostic tests, a common, though naive, practice is to base the comparison of sensitivity, specificity, as well as of positive and negative predictive values on some subset of the data that fits into methods implemented in standard statistical packages. Such methods are usually valid only under the strong missing completely at random (MCAR) assumption and may generate biased and less precise estimates. We review some models that use the dependence structure of the completely observed cases to incorporate the information of the partially categorized observations into the analysis and show how they may be fitted via a two-stage hybrid process involving maximum likelihood in the first stage and weighted least squares in the second. We indicate how computational subroutines written in R may be used to fit the proposed models and illustrate the different analysis strategies with observational data collected to compare the accuracy of three distinct non-invasive diagnostic methods for endometriosis. The results indicate that even when the MCAR assumption is plausible, the naive partial analyses should be avoided.
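The MCAR caveat can be illustrated by simulation: when gold-standard verification is more likely for test-positive subjects (missing at random rather than completely at random), a complete-case estimate of sensitivity is biased upward. All rates below are invented for illustration.

```python
# Simulation sketch of why complete-case ("naive") analysis fails when data
# are not MCAR: verification of the gold standard is more likely for
# test-positive subjects, which inflates the apparent sensitivity.
# All rates are invented.
import random

rng = random.Random(42)
n = 50000
sens_true, spec_true, prev = 0.8, 0.9, 0.3
tp = fn = 0                       # tallied over complete cases only
for _ in range(n):
    diseased = rng.random() < prev
    positive = rng.random() < (sens_true if diseased else 1 - spec_true)
    verified = rng.random() < (0.9 if positive else 0.2)  # MAR, not MCAR
    if verified and diseased:
        if positive:
            tp += 1
        else:
            fn += 1
sens_naive = tp / (tp + fn)       # well above the true 0.8
```

The models reviewed in the paper instead use the dependence structure of the completely observed cases to recover (approximately) unbiased estimates from the partially categorized observations.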
Abstract:
Background and aims: Evaluating status in patients with motor fluctuations is complex, and occasional observations/measurements do not give an adequate picture of the time spent in different states. We developed a test battery to assess advanced Parkinson patients' status, consisting of diary assessments and motor tests. This battery was constructed and implemented on a handheld computer with built-in mobile communication. In fluctuating patients, it should typically be used several times daily in the home environment, over periods of about one week. The aim of this battery is to provide status information in order to evaluate treatment effects in clinical practice and research, follow up treatments and disease progression, and predict outcome to optimize treatment strategy. Methods: Selection of diary questions was based on a previous study with Duodopa® (DIREQT). Tapping tests (with and without visual cueing) and a spiral drawing test were added. Rapid prototyping was used in development of the user interface. An evaluation with two pilot patients was performed before and after they received new treatments for advanced disease (one received Duodopa® and one received DBS). Speed and the proportion of missed taps were calculated for the tapping tests, and the entropy of the radial drawing velocity was calculated for the spiral tests. Test variables were evaluated using non-parametric statistics. Results: Post-treatment improvement was detected in both patients in many of the test variables. Conclusions: Although validation work remains, preliminary results are promising and the test battery is currently being evaluated in a long-term health economics study with Duodopa® (DAPHNE).
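The spiral-test metric can be sketched as the entropy of a histogram of radial drawing velocities: a smooth drawing concentrates its velocity in few bins (low entropy), a fluctuating one spreads it out (high entropy). The velocity traces and bin range below are invented for illustration.

```python
# Sketch of the spiral-test metric: Shannon entropy of a fixed-range
# histogram of radial drawing velocities. Smooth drawing -> few occupied
# bins -> low entropy; tremulous drawing -> spread-out histogram -> high
# entropy. Traces and bin range are invented.
import math

def entropy(values, bins=8, lo=0.0, hi=2.0):
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

smooth = [1.1 + 0.01 * math.sin(t / 5) for t in range(200)]
tremor = [1.0 + 0.5 * math.sin(t / 5) + 0.4 * math.sin(1.3 * t)
          for t in range(200)]
e_smooth, e_tremor = entropy(smooth), entropy(tremor)
```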
Abstract:
Climate change has resulted in substantial variations in annual extreme rainfall quantiles across different durations and return periods. Predicting future changes in extreme rainfall quantiles is essential for various water resources design, assessment, and decision-making purposes. Current predictions of future rainfall extremes, however, exhibit large uncertainties. According to extreme value theory, rainfall extremes are random variables whose distributions change across return periods; there are therefore uncertainties even under current climate conditions. Regarding future conditions, our large-scale knowledge is obtained using global climate models forced with certain emission scenarios. There are widely known deficiencies in climate models, particularly with respect to precipitation projections. There is also recognition of the limitations of emission scenarios in representing future global change. Apart from these large-scale uncertainties, the downscaling methods also add uncertainty to estimates of future extreme rainfall when they convert the larger-scale projections to the local scale. The aim of this research is to address these uncertainties in future projections of extreme rainfall of different durations and return periods. We combined three emission scenarios with two global climate models and used LARS-WG, a well-known weather generator, to stochastically downscale daily climate-model projections for the city of Saskatoon, Canada, up to 2100. The downscaled projections were further disaggregated into hourly resolution using our new stochastic and non-parametric rainfall disaggregator. Extreme rainfall quantiles can consequently be identified for different durations (1-hour, 2-hour, 4-hour, 6-hour, 12-hour, 18-hour and 24-hour) and return periods (2-year, 10-year, 25-year, 50-year, 100-year) using the Generalized Extreme Value (GEV) distribution.
By providing multiple realizations of future rainfall, we attempt to measure the extent of total predictive uncertainty, which is contributed by climate models, emission scenarios, and downscaling/disaggregation procedures. The results show different proportions of these contributors in different durations and return periods.
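The return-period quantiles in the last step follow directly from the fitted GEV parameters via the standard return-level formula. The parameter values below are illustrative placeholders, not fitted Saskatoon estimates.

```python
# GEV return-level sketch: for fitted location mu, scale sigma and shape xi,
# the T-year quantile is z_T = mu + (sigma/xi) * ((-ln(1 - 1/T))**(-xi) - 1),
# with the Gumbel limit z_T = mu - sigma * ln(-ln(1 - 1/T)) as xi -> 0.
# Parameter values are illustrative, not fitted Saskatoon estimates.
import math

def gev_return_level(mu, sigma, xi, T):
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-9:                  # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Hypothetical 1-hour rainfall parameters (mm): mu=20, sigma=5, xi=0.1.
levels = {T: gev_return_level(20.0, 5.0, 0.1, T)
          for T in (2, 10, 25, 50, 100)}
```

Repeating this over many downscaled/disaggregated realizations is what yields the spread of quantiles used to quantify predictive uncertainty.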
Abstract:
In this thesis, we investigate some aspects of the interplay between economic regulation and the risk of the regulated firm. In the first chapter, the main goal is to understand the implications a mainstream regulatory model (Laffont and Tirole, 1993) has for the systematic risk of the firm. We generalize the model in order to incorporate aggregate risk, and find that the optimal regulatory contract must be severely constrained in order to reproduce real-world systematic risk levels. We also consider the optimal profit-sharing mechanism, with an endogenous sharing rate, to explore the relationship between contract power and beta. We find results compatible with the available evidence that high-powered regimes impose more risk on the firm. In the second chapter, a joint work with Daniel Lima from the University of California, San Diego (UCSD), we start from the observation that regulated firms are subject to some regulatory practices that potentially affect the symmetry of the distribution of their future profits. If these practices are anticipated by investors in the stock market, the pattern of asymmetry in the empirical distribution of stock returns may differ between regulated and non-regulated companies. We review some recently proposed asymmetry measures that are robust to the empirical regularities of return data and use them to investigate whether there are meaningful differences in the distribution of asymmetry between these two groups of companies. In the third and last chapter, three different approaches to the capital asset pricing model of Kraus and Litzenberger (1976) are tested with recent Brazilian data and estimated using the generalized method of moments (GMM) as a unifying procedure. We find that ex-post stock returns generally exhibit statistically significant coskewness with the market portfolio, and hence are sensitive to squared market returns.
However, while the theoretical ground for the preference for skewness is well established and fairly intuitive, we did not find supporting evidence that investors require a premium for bearing this risk factor in Brazil.
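The coskewness moment at the heart of the Kraus-Litzenberger test can be computed directly from return series. The returns below are invented for illustration, not the Brazilian data, and this sample-moment version stands in for the paper's GMM estimation.

```python
# Standardized sample coskewness of an asset with the market:
# E[(r_i - mu_i)(r_m - mu_m)^2] / (sigma_i * sigma_m^2). A negative value
# means the asset tends to fall when market swings (of either sign) are
# large. The return series are invented.
import math

def coskewness(r_asset, r_mkt):
    n = len(r_asset)
    mi = sum(r_asset) / n
    mm = sum(r_mkt) / n
    si = math.sqrt(sum((x - mi) ** 2 for x in r_asset) / n)
    sm = math.sqrt(sum((x - mm) ** 2 for x in r_mkt) / n)
    cosk = sum((x - mi) * (y - mm) ** 2
               for x, y in zip(r_asset, r_mkt)) / n
    return cosk / (si * sm ** 2)

mkt = [0.01, -0.02, 0.03, -0.05, 0.04, 0.00, -0.01, 0.02]
asset = [0.02, -0.03, 0.04, -0.09, 0.05, 0.00, -0.02, 0.03]  # amplifies drops
gamma = coskewness(asset, mkt)
```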