114 resultados para Random regression
Resumo:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model’s dimensionality to a manageable level, thus leading to efficient estimation. Most of the existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition which allows the simultaneous projections of an input tensor to more than one direction along each mode. As a matter of fact, multi-dimensional data are collected under the same or very similar conditions, so that data share some common latent components but can also have their own independent parameters for each regression task. Therefore, it is beneficial to analyse regression parameters among all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker Decomposition, which identifies not only the common components of parameters across all the regression tasks, but also independent factors contributing to each particular regression task simultaneously. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modeling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
Resumo:
The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism also does not scale well on large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, novel theoretical analysis of the proposed technique and in-depth experimental study that show that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. Expressiveness of decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.
Resumo:
Let X be a locally compact Polish space. A random measure on X is a probability measure on the space of all (nonnegative) Radon measures on X. Denote by K(X) the cone of all Radon measures η on X which are of the form η =
Resumo:
Background Children with callous-unemotional (CU) traits, a proposed precursor to adult psychopathy, are characterized by impaired emotion recognition, reduced responsiveness to others’ distress, and a lack of guilt or empathy. Reduced attention to faces, and more specifically to the eye region, has been proposed to underlie these difficulties, although this has never been tested longitudinally from infancy. Attention to faces occurs within the context of dyadic caregiver interactions, and early environment including parenting characteristics has been associated with CU traits. The present study tested whether infants’ preferential tracking of a face with direct gaze and levels of maternal sensitivity predict later CU traits. Methods Data were analyzed from a stratified random sample of 213 participants drawn from a population-based sample of 1233 first-time mothers. Infants’ preferential face tracking at 5 weeks and maternal sensitivity at 29 weeks were entered into a weighted linear regression as predictors of CU traits at 2.5 years. Results Controlling for a range of confounders (e.g., deprivation), lower preferential face tracking predicted higher CU traits (p = .001). Higher maternal sensitivity predicted lower CU traits in girls (p = .009), but not boys. No significant interaction between face tracking and maternal sensitivity was found. Conclusions This is the first study to show that attention to social features during infancy as well as early sensitive parenting predict the subsequent development of CU traits. Identifying such early atypicalities offers the potential for developing parent-mediated interventions in children at risk for developing CU traits.
Resumo:
We consider the billiard dynamics in a non-compact set of ℝ d that is constructed as a bi-infinite chain of translated copies of the same d-dimensional polytope. A random configuration of semi-dispersing scatterers is placed in each copy. The ensemble of dynamical systems thus defined, one for each global realization of the scatterers, is called quenched random Lorentz tube. Under some fairly general conditions, we prove that every system in the ensemble is hyperbolic and almost every system is recurrent, ergodic, and enjoys some higher chaotic properties.
Resumo:
We consider the billiard dynamics in a striplike set that is tessellated by countably many translated copies of the same polygon. A random configuration of semidispersing scatterers is placed in each copy. The ensemble of dynamical systems thus defined, one for each global choice of scatterers, is called quenched random Lorentz tube. We prove that under general conditions, almost every system in the ensemble is recurrent.
Resumo:
Forecasting wind power is an important part of a successful integration of wind power into the power grid. Forecasts with lead times longer than 6 h are generally made by using statistical methods to post-process forecasts from numerical weather prediction systems. Two major problems that complicate this approach are the non-linear relationship between wind speed and power production and the limited range of power production between zero and nominal power of the turbine. In practice, these problems are often tackled by using non-linear non-parametric regression models. However, such an approach ignores valuable and readily available information: the power curve of the turbine's manufacturer. Much of the non-linearity can be directly accounted for by transforming the observed power production into wind speed via the inverse power curve so that simpler linear regression models can be used. Furthermore, the fact that the transformed power production has a limited range can be taken care of by employing censored regression models. In this study, we evaluate quantile forecasts from a range of methods: (i) using parametric and non-parametric models, (ii) with and without the proposed inverse power curve transformation and (iii) with and without censoring. The results show that with our inverse (power-to-wind) transformation, simpler linear regression models with censoring perform equally or better than non-linear models with or without the frequently used wind-to-power transformation.
Resumo:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups RB above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number RA using a variety of regression techniques. It is found that a very high correlation between RA and RB (rAB > 0.98) does not prevent large errors in the intercalibration (for example sunspot maximum values can be over 30 % too large even for such levels of rAB). In generating the backbone sunspot number (RBB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the scatter plot origin which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (“Q Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
Resumo:
There remains large disagreement between ice-water path (IWP) in observational data sets, largely because the sensors observe different parts of the ice particle size distribution. A detailed comparison of retrieved IWP from satellite observations in the Tropics (!30 " latitude) in 2007 was made using collocated measurements. The radio detection and ranging(radar)/light detection and ranging (lidar) (DARDAR) IWP data set, based on combined radar/lidar measurements, is used as a reference because it provides arguably the best estimate of the total column IWP. For each data set, usable IWP dynamic ranges are inferred from this comparison. IWP retrievals based on solar reflectance measurements, in the moderate resolution imaging spectroradiometer (MODIS), advanced very high resolution radiometer–based Climate Monitoring Satellite Applications Facility (CMSAF), and Pathfinder Atmospheres-Extended (PATMOS-x) datasets, were found to be correlated with DARDAR over a large IWP range (~20–7000 g m -2 ). The random errors of the collocated data sets have a close to lognormal distribution, and the combined random error of MODIS and DARDAR is less than a factor of 2, which also sets the upper limit for MODIS alone. In the same way, the upper limit for the random error of all considered data sets is determined. Data sets based on passive microwave measurements, microwave surface and precipitation products system (MSPPS), microwave integrated retrieval system (MiRS), and collocated microwave only (CMO), are largely correlated with DARDAR for IWP values larger than approximately 700 g m -2 . The combined uncertainty between these data sets and DARDAR in this range is slightly less MODIS-DARDAR, but the systematic bias is nearly an order of magnitude.