879 results for Predictive regression
Abstract:
The Normal Quantile Transform (NQT) has been used in many hydrological and meteorological applications in order to make the Cumulative Distribution Function (CDF) of observed, simulated and forecast river discharge, water level or precipitation data Gaussian. It is also at the heart of the meta-Gaussian model for assessing the total predictive uncertainty of the Hydrological Uncertainty Processor (HUP) developed by Krzysztofowicz. In the field of geo-statistics this transformation is better known as the Normal-Score Transform. In this paper, some problems caused by small sample sizes when applying the NQT in flood forecasting systems are discussed, and a novel way to solve them, combining extreme value analysis and non-parametric regression methods, is outlined. The method is illustrated with examples of hydrological stream-flow forecasts.
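A minimal sketch of the normal-score / normal quantile transform described above; the plotting position (rank - 0.5)/n and the synthetic discharge data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_quantile_transform(x):
    """Map a sample to standard-normal scores via its empirical CDF.

    The plotting position (rank - 0.5)/n is an illustrative choice; other
    positions, e.g. the Weibull position rank/(n+1), are common in hydrology.
    """
    x = np.asarray(x, dtype=float)
    p = (rankdata(x) - 0.5) / x.size   # empirical non-exceedance probabilities
    return norm.ppf(p)                 # standard-normal quantiles

# Example: skewed discharge-like data become (marginally) Gaussian
discharge = np.random.gamma(shape=2.0, scale=50.0, size=200)
z = normal_quantile_transform(discharge)
```

Back-transforming new values requires interpolating, and for extremes extrapolating, the empirical CDF, which is exactly where the small-sample problems discussed in the paper arise.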
Abstract:
Background: The auditory brainstem response (ABR) is of fundamental importance to the investigation of auditory system behavior, though its interpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analyzing the ABR, clinicians are often interested in the identification of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time when these waves occur (i.e., the wave latency) is a practical tool for the diagnosis of disorders affecting the auditory system. In this context, the aim of this research is to compare ABR manual/visual analysis provided by different examiners. Methods: The ABR data were collected from 10 normal-hearing subjects (5 men and 5 women, aged 20 to 52 years). A total of 160 data samples were analyzed and a pairwise comparison between four distinct examiners was carried out. We performed a statistical study aiming to identify significant differences between the assessments provided by the examiners. For this, we used linear regression in conjunction with the bootstrap as a method for evaluating the relation between the responses given by the examiners. Results: The analysis suggests agreement among examiners but reveals differences between their assessments of the variability of the waves. We quantified the magnitude of the obtained wave latency differences: 18% of the investigated waves presented substantial (large or moderate) differences, and of these 3.79% were considered not acceptable for clinical practice. Conclusions: Our results characterize the variability of the manual analysis of ABR data and highlight the necessity of establishing unified standards and protocols for the analysis of these data. These results may also contribute to the validation and development of automatic systems that are employed in the early diagnosis of hearing loss.
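A minimal sketch, with made-up latency data, of the kind of linear-regression-plus-bootstrap comparison between two examiners described above; the sample sizes and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wave-latency readings (ms) of the same waves by two examiners
lat_a = rng.normal(5.6, 0.3, size=160)
lat_b = lat_a + rng.normal(0.0, 0.05, size=160)   # examiner B re-reads the same waves

def fit_slope_intercept(x, y):
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

# Bootstrap the regression of examiner B's latencies on examiner A's
n_boot = 2000
slopes = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, lat_a.size, size=lat_a.size)   # resample wave pairs
    slopes[b], _ = fit_slope_intercept(lat_a[idx], lat_b[idx])

ci = np.percentile(slopes, [2.5, 97.5])
print(f"slope 95% bootstrap CI: [{ci[0]:.3f}, {ci[1]:.3f}]")  # slope near 1 indicates agreement
```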
Abstract:
In this article, we illustrate experimentally an important consequence of the stochastic component in choice behaviour which has not been acknowledged so far: its potential to produce ‘regression to the mean’ (RTM) effects. We employ a novel approach to individual choice under risk, based on repeated multiple-lottery choices (i.e. choices among many lotteries), to show how the high degree of stochastic variability present in individual decisions can crucially distort certain results through RTM effects. We demonstrate the point in the context of a social comparison experiment.
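A minimal simulation sketch of the regression-to-the-mean mechanism referred to above: selecting subjects on a noisy first measurement makes their second measurement drift back toward the population mean. The numbers are arbitrary and do not reproduce the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each subject has a stable "true" score plus independent noise per session
true_score = rng.normal(0.0, 1.0, size=10_000)
session1 = true_score + rng.normal(0.0, 1.0, size=10_000)   # noisy first measurement
session2 = true_score + rng.normal(0.0, 1.0, size=10_000)   # same truth, fresh noise

# Select the apparently most extreme subjects on session 1 only
top = session1 > np.quantile(session1, 0.9)
print(session1[top].mean())   # high by construction (selection partly on noise)
print(session2[top].mean())   # noticeably closer to the population mean: RTM
```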
Abstract:
We evaluate the predictive power of leading indicators for output growth at horizons up to one year. We use the MIDAS regression approach, as this allows us to combine multiple individual leading indicators in a parsimonious way and to directly exploit the information content of the monthly series to predict quarterly output growth. When we use real-time vintage data, the indicators are found to have significant predictive ability, and this is further enhanced by the use of monthly data on the quarter at the time the forecast is made.
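A minimal sketch of one common MIDAS ingredient, the exponential Almon lag weights that collapse recent monthly observations into a single quarterly regressor; the parameter values are illustrative, and the nonlinear estimation of the weight parameters is omitted.

```python
import numpy as np

def exp_almon_weights(theta1, theta2, n_lags):
    """Exponential Almon lag weights used in many MIDAS specifications."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k**2)
    return w / w.sum()

def midas_regressor(monthly_series, theta1, theta2, n_lags=12):
    """Collapse the last `n_lags` monthly observations into one weighted regressor."""
    w = exp_almon_weights(theta1, theta2, n_lags)
    recent = np.asarray(monthly_series)[-n_lags:][::-1]   # most recent month gets weight w[0]
    return recent @ w

# Example: a single weighted indicator value entering a quarterly growth regression
monthly_indicator = np.random.default_rng(2).normal(size=60)
x_t = midas_regressor(monthly_indicator, theta1=0.1, theta2=-0.05)
```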
Abstract:
There is strong evidence that neonates imitate previously unseen behaviors. These behaviors are predominantly used in social interactions, demonstrating neonates’ ability and motivation to engage with others. Research on neonatal imitation can provide a wealth of information about the early mirror neuron system (MNS): namely, its functional characteristics, its plasticity from birth, and its relation to skills later in development. Though numerous studies document the existence of neonatal imitation in the laboratory, little is known about its natural occurrence during parent-infant interactions and its plasticity as a consequence of experience. We review these critical aspects of imitation, which we argue are necessary for understanding the early action-perception system. We address common criticisms and misunderstandings about neonatal imitation and discuss methodological differences among studies. Recent work reveals that individual differences in neonatal imitation positively correlate with later social, cognitive, and motor development. We propose that such variation in neonatal imitation could reflect important individual differences of the MNS. Although postnatal experience is not necessary for imitation, we present evidence that neonatal imitation is influenced by experience in the first week of life.
Abstract:
When the sensory consequences of an action are systematically altered our brain can recalibrate the mappings between sensory cues and properties of our environment. This recalibration can be driven by both cue conflicts and altered sensory statistics, but neither mechanism offers a way for cues to be calibrated so they provide accurate information about the world, as sensory cues carry no information as to their own accuracy. Here, we explored whether sensory predictions based on internal physical models could be used to accurately calibrate visual cues to 3D surface slant. Human observers played a 3D kinematic game in which they adjusted the slant of a surface so that a moving ball would bounce off the surface and through a target hoop. In one group, the ball’s bounce was manipulated so that the surface behaved as if it had a different slant to that signaled by visual cues. With experience of this altered bounce, observers recalibrated their perception of slant so that it was more consistent with the assumed laws of kinematics and physical behavior of the surface. In another group, making the ball spin in a way that could physically explain its altered bounce eliminated this pattern of recalibration. Importantly, both groups adjusted their behavior in the kinematic game in the same way, experienced the same set of slants and were not presented with low-level cue conflicts that could drive the recalibration. We conclude that observers use predictive kinematic models to accurately calibrate visual cues to 3D properties of the world.
Abstract:
An efficient two-level model identification method aiming at maximising a model's generalisation capability is proposed for a large class of linear-in-the-parameters models from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisation parameters in the elastic net are optimised using a particle swarm optimisation (PSO) algorithm at the upper level by minimising the leave-one-out (LOO) mean square error (LOOMSE). There are two original contributions. Firstly, an elastic net cost function is defined and applied based on orthogonal decomposition, which facilitates the automatic model structure selection process without the need for a predetermined error tolerance to terminate the forward selection process. Secondly, it is shown that the LOOMSE based on the resultant ENOFR models can be analytically computed without actually splitting the data set, and the associated computational cost is small thanks to the ENOFR procedure. Consequently a fully automated procedure is achieved without resorting to any other validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.
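The analytic LOO idea mentioned above can be illustrated, for plain ordinary least squares rather than the paper's ENOFR formulation, with the classical PRESS identity; a minimal sketch:

```python
import numpy as np

def loo_mse_ols(X, y):
    """Leave-one-out MSE of an OLS fit, computed without refitting.

    Uses the classical PRESS identity e_loo_i = e_i / (1 - h_ii),
    where h_ii are the diagonal entries of the hat matrix.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    h_diag = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    loo_resid = resid / (1.0 - h_diag)
    return np.mean(loo_resid ** 2)

# Example on synthetic data
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=100)
print(loo_mse_ols(X, y))
```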
Abstract:
Our digital universe is rapidly expanding: more and more daily activities are digitally recorded, data arrive in streams, need to be analyzed in real time and may evolve over time. In the last decade many adaptive learning algorithms and prediction systems, which can automatically update themselves with new incoming data, have been developed. The majority of those algorithms focus on improving the predictive performance and assume that a model update is always desired as soon as possible and as frequently as possible. In this study we consider a potential model update as an investment decision, which, as in the financial markets, should be taken only if a certain return on investment is expected. We introduce and motivate a new research problem for data streams: cost-sensitive adaptation. We propose a reference framework for analyzing adaptation strategies in terms of costs and benefits. Our framework allows us to characterize and decompose the costs of model updates, and to assess and interpret the gains in performance due to model adaptation for a given learning algorithm on a given prediction task. Our proof-of-concept experiment demonstrates how the framework can aid in analyzing and managing adaptation decisions in the chemical industry.
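A deliberately simplified sketch of the adaptation-as-investment idea: update only when the expected benefit, expressed in the same units as the update cost, exceeds that cost. The function and its arguments are hypothetical illustrations, not the paper's framework.

```python
def should_update(expected_error_without, expected_error_with,
                  value_per_error_unit, update_cost):
    """Update the model only if the expected return exceeds the cost.

    expected_error_*     : predicted loss over the next period without/with adaptation
    value_per_error_unit : value (e.g. monetary) of one unit of loss reduction
    update_cost          : cost of retraining and redeploying the model
    """
    expected_benefit = (expected_error_without - expected_error_with) * value_per_error_unit
    return expected_benefit > update_cost

# Example: a small expected gain does not justify an expensive retrain
print(should_update(10.0, 9.8, value_per_error_unit=50.0, update_cost=100.0))  # False
```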
Abstract:
This paper employs a probit and a Markov switching model using information from the Conference Board Leading Indicator and other predictor variables to forecast the signs of future rental growth in four key U.S. commercial rent series. We find that both approaches have considerable power to predict changes in the direction of commercial rents up to two years ahead, exhibiting strong improvements over a naïve model, especially for the warehouse and apartment sectors. We find that while the Markov switching model appears to be more successful, it lags behind actual turnarounds in market outcomes whereas the probit is able to detect whether rental growth will be positive or negative several quarters ahead.
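A minimal sketch of a probit model for the sign of future rental growth, using synthetic data and statsmodels; the single-indicator specification is an illustrative assumption, not the paper's model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Hypothetical quarterly data: leading-indicator growth and future rent-growth sign
indicator = rng.normal(size=120)
rent_growth = 0.8 * indicator + rng.normal(scale=1.0, size=120)
y = (rent_growth > 0).astype(int)          # 1 = positive rental growth

X = sm.add_constant(indicator)
probit_res = sm.Probit(y, X).fit(disp=False)

# Probability of positive rental growth at a hypothetical indicator reading of 0.5
p_positive = probit_res.predict([[1.0, 0.5]])
print(probit_res.params, p_positive)
```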
Abstract:
An efficient data-based modeling algorithm for nonlinear system identification is introduced for radial basis function (RBF) neural networks with the aim of maximizing generalization capability based on the concept of leave-one-out (LOO) cross validation. Each of the RBF kernels has its own kernel width parameter, and the basic idea is to optimize the multiple pairs of regularization parameters and kernel widths, each of which is associated with a kernel, one at a time within the orthogonal forward regression (OFR) procedure. Thus, each OFR step consists of one model term selection based on the LOO mean square error (LOOMSE), followed by the optimization of the associated kernel width and regularization parameter, also based on the LOOMSE. Since the same LOOMSE is adopted for model selection, as in our previous state-of-the-art local regularization assisted orthogonal least squares (LROLS) algorithm, our proposed new OFR algorithm is also capable of producing a very sparse RBF model with excellent generalization performance. Unlike our previous LROLS algorithm, which requires an additional iterative loop to optimize the regularization parameters as well as an additional procedure to optimize the kernel width, the proposed new OFR algorithm optimizes both the kernel widths and regularization parameters within the single OFR procedure, and consequently the required computational complexity is dramatically reduced. Nonlinear system identification examples are included to demonstrate the effectiveness of this new approach in comparison to the well-known approaches of the support vector machine and the least absolute shrinkage and selection operator, as well as the LROLS algorithm.
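A minimal sketch of the model class described above: a Gaussian RBF network in which every kernel has its own width, fitted by regularised least squares. The greedy OFR term selection and the LOOMSE-based tuning of widths and regularisation parameters are omitted, and all values below are illustrative.

```python
import numpy as np

def rbf_design_matrix(X, centres, widths):
    """Column j is a Gaussian RBF centred at centres[j] with its own width widths[j]."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)   # squared distances
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def fit_regularised_weights(Phi, y, lam=1e-3):
    """Ridge-type weight estimate; lam plays the role of a regularisation parameter."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

# Toy 1-D example: centres at a subset of the data, one width per kernel
rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=100)
centres = X[::10]
widths = np.full(centres.shape[0], 0.8)   # in the paper these are tuned individually
Phi = rbf_design_matrix(X, centres, widths)
w = fit_regularised_weights(Phi, y)
```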
Abstract:
A new class of parameter estimation algorithms is introduced for Gaussian process regression (GPR) models. It is shown that the integration of the GPR model with probability distance measures of (i) the integrated square error and (ii) the Kullback–Leibler (K–L) divergence is analytically tractable. An efficient coordinate descent algorithm is proposed to iteratively estimate the kernel width using golden section search, which includes a fast gradient descent algorithm as an inner loop to estimate the noise variance. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
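A minimal sketch of tuning a squared-exponential kernel width by golden section search. As a stand-in objective it uses the standard GP negative log marginal likelihood rather than the probability-distance criteria proposed in the abstract, and the noise variance is held fixed instead of being estimated by an inner gradient loop.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_marginal_likelihood(log_width, X, y, noise_var=0.1):
    """Standard GP negative log marginal likelihood for an SE kernel (noise fixed)."""
    width = np.exp(log_width)
    d2 = (X[:, None] - X[None, :]) ** 2
    K = np.exp(-0.5 * d2 / width**2) + noise_var * np.eye(X.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * y^T K^{-1} y + 0.5 * log|K| + (n/2) * log(2*pi)
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * X.size * np.log(2 * np.pi)

rng = np.random.default_rng(6)
X = np.sort(rng.uniform(0, 5, size=40))
y = np.sin(X) + rng.normal(scale=0.3, size=40)

res = minimize_scalar(neg_log_marginal_likelihood, args=(X, y),
                      bracket=(-2.0, 1.0), method='golden')
print("estimated kernel width:", np.exp(res.x))
```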
Abstract:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form, such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model’s dimensionality to a manageable level, thus leading to efficient estimation. Most existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition, which allows the simultaneous projection of an input tensor onto more than one direction along each mode. In practice, multi-dimensional data are often collected under the same or very similar conditions, so that the data share some common latent components while each regression task can also have its own independent parameters. It is therefore beneficial to analyse the regression parameters of all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker decomposition, which simultaneously identifies not only the common components of parameters across all the regression tasks, but also the independent factors contributing to each particular regression task. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modeling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
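A minimal sketch of the Tucker structure underlying the proposed model, implemented as a truncated higher-order SVD in plain NumPy; the linked multi-task estimation and the sparsity-preserving regulariser are not shown.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the chosen mode becomes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    Tm = M @ unfold(T, mode)
    shape = [M.shape[0]] + [s for i, s in enumerate(T.shape) if i != mode]
    return np.moveaxis(Tm.reshape(shape), 0, mode)

def hosvd(T, ranks):
    """Truncated higher-order SVD: T is approximated by a small core and factor matrices."""
    factors = []
    for n, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, n), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for n, U in enumerate(factors):
        core = mode_multiply(core, U.T, n)   # core = T x_n U_n^T for every mode
    return core, factors

# Toy example: compress a 10 x 12 x 8 tensor to a (3, 3, 3) core
T = np.random.default_rng(7).normal(size=(10, 12, 8))
core, factors = hosvd(T, ranks=(3, 3, 3))
```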
Abstract:
Species distribution models (SDM) are increasingly used to understand the factors that regulate variation in biodiversity patterns and to help plan conservation strategies. However, these models are rarely validated with independently collected data, and it is unclear whether SDM performance is maintained across distinct habitats and for species with different functional traits. Highly mobile species, such as bees, can be particularly challenging to model. Here, we use independent sets of occurrence data collected systematically in several agricultural habitats to test how the predictive performance of SDMs for wild bee species depends on species traits, habitat type, and sampling technique. We used a species distribution modeling approach parametrized for the Netherlands, with presence records from 1990 to 2010 for 193 Dutch wild bees. For each species, we built a Maxent model based on 13 climate and landscape variables. We tested the predictive performance of the SDMs with independent datasets collected from orchards and arable fields across the Netherlands from 2010 to 2013, using transect surveys or pan traps. Model predictive performance depended on species traits and habitat type. The occurrence of bee species specialized in habitat and diet was better predicted than that of generalist bees. Predictions of habitat suitability were also more precise for habitats that are temporally more stable (orchards) than for habitats that undergo regular alterations (arable fields), particularly for small, solitary bees. As a conservation tool, SDMs are better suited to modeling rarer, specialist species than more generalist ones, and will work best in long-term stable habitats. The variability of complex, short-term habitats is difficult to capture in such models, and historical land use generally has low thematic resolution. To improve SDMs’ usefulness, models require explanatory variables and collection data that include detailed landscape characteristics, for example, variability of crops and flower availability. Additionally, testing SDMs with field surveys should involve multiple collection techniques.
Abstract:
Forecasting wind power is an important part of a successful integration of wind power into the power grid. Forecasts with lead times longer than 6 h are generally made by using statistical methods to post-process forecasts from numerical weather prediction systems. Two major problems complicate this approach: the non-linear relationship between wind speed and power production, and the limited range of power production between zero and the nominal power of the turbine. In practice, these problems are often tackled by using non-linear non-parametric regression models. However, such an approach ignores valuable and readily available information: the turbine manufacturer's power curve. Much of the non-linearity can be directly accounted for by transforming the observed power production into wind speed via the inverse power curve, so that simpler linear regression models can be used. Furthermore, the fact that the transformed power production has a limited range can be taken care of by employing censored regression models. In this study, we evaluate quantile forecasts from a range of methods: (i) using parametric and non-parametric models, (ii) with and without the proposed inverse power curve transformation and (iii) with and without censoring. The results show that with our inverse (power-to-wind) transformation, simpler linear regression models with censoring perform as well as or better than non-linear models with or without the frequently used wind-to-power transformation.
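A minimal sketch of the power-to-wind transformation via an interpolated inverse power curve, followed by a plain linear fit; the power-curve points are invented, and the censored-regression step is omitted (here crudely replaced by clipping at the curve's bounds).

```python
import numpy as np

# Hypothetical manufacturer power curve: wind speed (m/s) -> power (kW)
pc_speed = np.array([3, 5, 7, 9, 11, 13])            # cut-in to rated, monotone region
pc_power = np.array([0, 150, 500, 1100, 1800, 2000])

def power_to_wind(observed_power):
    """Invert the (monotone part of the) power curve by interpolation."""
    p = np.clip(observed_power, pc_power[0], pc_power[-1])   # crude handling of the bounds
    return np.interp(p, pc_power, pc_speed)

# Fit a simple linear model of transformed production on the NWP wind-speed forecast
rng = np.random.default_rng(8)
forecast_speed = rng.uniform(3, 13, size=300)
observed_power = np.interp(forecast_speed + rng.normal(scale=1.0, size=300),
                           pc_speed, pc_power)
equiv_speed = power_to_wind(observed_power)
slope, intercept = np.polyfit(forecast_speed, equiv_speed, deg=1)
```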
Abstract:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups R_B above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower-acuity observer would have seen. The synthesised annual means of R_B are then re-scaled to the full observed RGO group number R_A using a variety of regression techniques. It is found that a very high correlation between R_A and R_B (r_AB > 0.98) does not prevent large errors in the intercalibration (for example, sunspot maximum values can be over 30% too large even for such levels of r_AB). In generating the backbone sunspot number (R_BB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the scatter plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (“Q-Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
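A minimal sketch, with synthetic stand-ins for the R_A and R_B series, of comparing a fit forced through the origin with an ordinary least-squares fit and inspecting residual normality with Q-Q plots via scipy.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
# Synthetic stand-ins for the two group-number series (high correlation, non-zero offset)
r_b = rng.uniform(1, 15, size=100)
r_a = 2.0 + 1.1 * r_b + rng.normal(scale=0.8, size=100)

# Ordinary least-squares fit with a free intercept
slope, intercept = np.polyfit(r_b, r_a, deg=1)
resid_free = r_a - (slope * r_b + intercept)

# Fit forced through the origin: slope = sum(xy) / sum(x^2)
slope0 = (r_b @ r_a) / (r_b @ r_b)
resid_origin = r_a - slope0 * r_b

# Q-Q plots of the residuals from both fits against a normal distribution
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
stats.probplot(resid_free, dist="norm", plot=axes[0])
stats.probplot(resid_origin, dist="norm", plot=axes[1])
plt.show()
```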