838 results for regression splines
Abstract:
Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models, but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and eliciting feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of input space, often work better since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective and does not involve the domain experts in guiding a meaningful segmentation of the input space. In this paper we address this issue by allowing domain experts to interactively segment the input space using data visualisation. The segmentation output obtained is then further used to develop effective local regression models.
Abstract:
Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.
Abstract:
In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constant variance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming the bias of maximum likelihood.
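The link between maximum likelihood and the sum-of-squares error mentioned in this abstract can be written out as follows. This is a standard derivation sketched under assumed notation (model output $y(x; w)$, targets $t_n$, noise variance $\sigma^2$, sample size $N$); none of these symbols appear in the abstract itself.

```latex
% Constant noise variance: Gaussian likelihood for one target
p(t_n \mid x_n, w) = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{\bigl(t_n - y(x_n; w)\bigr)^2}{2\sigma^2} \right)

% Negative log-likelihood over N independent observations
-\ln L(w) = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \bigl(t_n - y(x_n; w)\bigr)^2
            + \frac{N}{2} \ln\bigl(2\pi\sigma^2\bigr)

% Input-dependent noise variance \sigma^2(x): the squared errors become weighted
-\ln L(w) = \sum_{n=1}^{N} \left[
    \frac{\bigl(t_n - y(x_n; w)\bigr)^2}{2\sigma^2(x_n)}
    + \frac{1}{2} \ln \sigma^2(x_n) \right] + \frac{N}{2}\ln(2\pi)
```

With constant $\sigma^2$ the second term does not depend on $w$, so minimizing the negative log-likelihood over $w$ is the same as minimizing the sum of squares. In the input-dependent case each squared error is weighted by $1/\sigma^2(x_n)$ and the variance function must be estimated jointly with the mean.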
Abstract:
The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n^3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
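The cubic cost of the posterior mean can be seen in a minimal sketch such as the one below. It assumes a zero-mean Gaussian process prior with a squared-exponential covariance and a fixed noise variance; the kernel choice, the function names `rbf_kernel` and `gp_posterior_mean`, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise_var=1e-2, length_scale=1.0):
    """Posterior mean of a zero-mean GP at x_test given noisy training targets."""
    n = len(x_train)
    K = rbf_kernel(x_train, x_train, length_scale) + noise_var * np.eye(n)
    # The Cholesky factorisation of the n x n matrix is the O(n^3) step.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf_kernel(x_test, x_train, length_scale) @ alpha

# Toy data: noise-free samples of sin(x), used only for illustration.
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([1.25, 2.5])
mean = gp_posterior_mean(x_train, y_train, x_test)
```

Once `alpha` has been computed, each additional test point costs only O(n), which is why the factorisation dominates for large n.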
Abstract:
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.
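The change of viewpoint described in this abstract can be checked numerically. The sketch below assumes a polynomial basis (1, x, x^2), an identity prior covariance on the weights and a known noise variance, all chosen purely for illustration; it compares the predictive mean computed from the posterior over weights with the one computed from the induced covariance function k(x, x') = phi(x)^T Sigma_p phi(x').

```python
import numpy as np

def features(x):
    """Polynomial basis (1, x, x^2); an illustrative choice, not from the paper."""
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, 15)
y_train = 0.5 - x_train + 2.0 * x_train ** 2 + 0.1 * rng.standard_normal(15)
x_test = np.linspace(-1.0, 1.0, 5)

noise_var = 0.01          # assumed known noise variance
prior_cov = np.eye(3)     # assumed Gaussian prior covariance over the weights

Phi = features(x_train)       # n x m design matrix
Phi_s = features(x_test)      # test design matrix

# Weight-space view: posterior mean of the weights, then predict.
A = Phi.T @ Phi / noise_var + np.linalg.inv(prior_cov)
w_mean = np.linalg.solve(A, Phi.T @ y_train) / noise_var
mean_weight_space = Phi_s @ w_mean

# Function-space view: GP predictor with the covariance induced by the prior.
K = Phi @ prior_cov @ Phi.T
K_s = Phi_s @ prior_cov @ Phi.T
mean_function_space = K_s @ np.linalg.solve(K + noise_var * np.eye(len(x_train)), y_train)
```

The two predictive means agree up to rounding error. The weight-space form inverts an m x m matrix and the function-space form an n x n matrix, so the cheaper route depends on whether there are more basis functions or more data points.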
Abstract:
The thrust of this report concerns spline theory and some of the background to spline theory and follows the development in (Wahba, 1991). We also review methods for determining hyper-parameters, such as the smoothing parameter, by Generalised Cross Validation. Splines have an advantage over Gaussian Process based procedures in that we can readily impose atmospherically sensible smoothness constraints and maintain computational efficiency. Vector splines enable us to penalise gradients of vorticity and divergence in wind fields. Two similar techniques are summarised and improvements based on robust error functions and restricted numbers of basis functions given. A final, brief discussion of the application of vector splines to the problem of scatterometer data assimilation highlights the problems of ambiguous solutions.
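Choosing a smoothing parameter by Generalised Cross Validation can be illustrated with a small sketch. It does not implement the vector splines of the report; instead it assumes a simpler discrete analogue, a second-difference penalised smoother on an equally spaced grid, and the function names, the grid of candidate values and the simulated data are all hypothetical.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Hat matrix A(lam) = (I + lam * D^T D)^{-1} for a second-difference penalty."""
    D = np.diff(np.eye(n), n=2, axis=0)   # (n - 2) x n second-difference operator
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def gcv_score(y, lam):
    """GCV criterion: n * RSS / (trace(I - A))^2 for the linear smoother A(lam)."""
    n = len(y)
    A = smoother_matrix(n, lam)
    resid = y - A @ y
    return n * (resid @ resid) / (n - np.trace(A)) ** 2

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
truth = np.sin(2.0 * np.pi * x)
y = truth + 0.2 * rng.standard_normal(60)

lams = 10.0 ** np.arange(-2, 5)           # candidate smoothing parameters
scores = np.array([gcv_score(y, lam) for lam in lams])
best_lam = lams[np.argmin(scores)]
fitted = smoother_matrix(len(y), best_lam) @ y
```

GCV needs only the residuals and the trace of the hat matrix, so no held-out data are required; in practice a finer grid or a one-dimensional optimiser would replace the coarse grid used here.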
Abstract:
It is generally assumed when using Bayesian inference methods for neural networks that the input data contains no noise or corruption. For real-world (errors-in-variables) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural network framework which allows for input noise given that some model of the noise process exists. In the limit where this noise process is small and symmetric it is shown, using the Laplace approximation, that there is an additional term to the usual Bayesian error bar which depends on the variance of the input noise process. Further, by treating the true (noiseless) input as a hidden variable and sampling this jointly with the network's weights, using Markov Chain Monte Carlo methods, it is demonstrated that it is possible to infer the unbiased regression over the noiseless input.
Abstract:
Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.
Abstract:
Correlation and regression are two of the statistical procedures most widely used by optometrists. However, these tests are often misused or interpreted incorrectly, leading to erroneous conclusions from clinical experiments. This review examines the major statistical tests concerned with correlation and regression that are most likely to arise in clinical investigations in optometry. First, the use, interpretation and limitations of Pearson's product moment correlation coefficient are described. Second, the least squares method of fitting a linear regression to data and methods for testing how well a regression line fits the data are described. Third, the problems of using linear regression methods in observational studies, where there are errors in measuring the independent variable, and of predicting a new value of Y for a given X, are discussed. Finally, methods for testing whether a non-linear relationship provides a better fit to the data and for comparing two or more regression lines are considered.
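The first two procedures named in this abstract, Pearson's product moment correlation coefficient and the least squares line, can be sketched in a few lines. The function names and the five data points are made up for illustration and do not come from the review.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def least_squares_line(x, y):
    """Slope and intercept of the least squares regression of y on x."""
    xc = x - x.mean()
    slope = (xc @ (y - y.mean())) / (xc @ xc)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Hypothetical measurements, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = pearson_r(x, y)
slope, intercept = least_squares_line(x, y)

# Proportion of variance in y explained by the fitted line.
residuals = y - (slope * x + intercept)
r_squared = 1.0 - (residuals @ residuals) / ((y - y.mean()) @ (y - y.mean()))
```

For simple linear regression the squared correlation coefficient equals the proportion of variance explained by the line, which is one common way to judge how well the line fits.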
Abstract:
Researchers often use 3-way interactions in moderated multiple regression analysis to test the joint effect of 3 independent variables on a dependent variable. However, further probing of significant interaction terms varies considerably and is sometimes error prone. The authors developed a significance test for slope differences in 3-way interactions and illustrate its importance for testing psychological hypotheses. Monte Carlo simulations revealed that sample size, magnitude of the slope difference, and data reliability affected test power. Application of the test to published data yielded detection of some slope differences that were undetected by alternative probing techniques and led to changes of results and conclusions. The authors conclude by discussing the test's applicability for psychological research. Copyright 2006 by the American Psychological Association.
Abstract:
The kinematic mapping of a rigid open-link manipulator is a homomorphism between Lie groups. The homomorphism has solution groups that act on an inverse kinematic solution element. A canonical representation of solution group operators that act on a solution element of three and seven degree-of-freedom (dof) dextrous manipulators is determined by geometric analysis. Seven canonical solution groups are determined for the seven dof Robotics Research K-1207 and Hollerbach arms. The solution element of a dextrous manipulator is a collection of trivial fibre bundles with solution fibres homotopic to the torus. If fibre solutions are parameterised by a scalar, a direct inverse function that maps the scalar and Cartesian base space coordinates to solution element fibre coordinates may be defined. A direct inverse parameterisation of a solution element may be approximated by a local linear map generated by an inverse augmented Jacobian correction of a linear interpolation. The action of canonical solution group operators on a local linear approximation of the solution element of inverse kinematics of dextrous manipulators generates cyclical solutions. The solution representation is proposed as a model of inverse kinematic transformations in primate nervous systems. Simultaneous calibration of a composition of stereo-camera and manipulator kinematic models is under-determined by equi-output parameter groups in the composition of stereo-camera and Denavit Hartenberg (DH) models. An error measure for simultaneous calibration of a composition of models is derived and parameter subsets with no equi-output groups are determined by numerical experiments to simultaneously calibrate the composition of homogeneous or pan-tilt stereo-camera with DH models.
For acceleration of exact Newton second-order re-calibration of DH parameters after a sequential calibration of stereo-camera and DH parameters, an optimal numerical evaluation of DH matrix first order and second order error derivatives with respect to a re-calibration error function is derived, implemented and tested. A distributed object environment for point and click image-based tele-command of manipulators and stereo-cameras is specified and implemented that supports rapid prototyping of numerical experiments in distributed system control. The environment is validated by a hierarchical k-fold cross validated calibration to Cartesian space of a radial basis function regression correction of an affine stereo model. Basic design and performance requirements are defined for scalable virtual micro-kernels that broker inter-Java-virtual-machine remote method invocations between components of secure manageable fault-tolerant open distributed agile Total Quality Managed ISO 9000+ conformant Just in Time manufacturing systems.
Abstract:
A method of determining the spatial pattern of any histological feature in sections of brain tissue which can be measured quantitatively is described and compared with a previously described method. A measurement of a histological feature such as density, area, amount or load is obtained for a series of contiguous sample fields. The regression coefficient (β) is calculated from the measurements taken in pairs, first in pairs of adjacent samples and then in pairs of samples taken at increasing degrees of separation between them, i.e. separated by 2, 3, 4,..., n units. A plot of β versus the degree of separation between the pairs of sample fields reveals whether the histological feature is distributed randomly, uniformly or in clusters. If the feature is clustered, the analysis determines whether the clusters are randomly or regularly distributed, the mean size of the clusters and the spacing of the clusters. The method is simple to apply and interpret and is illustrated using simulated data and studies of the spatial patterns of blood vessels in the cerebral cortex of normal brain, the degree of vacuolation of the cortex in patients with Creutzfeldt-Jakob disease (CJD) and the characteristic lesions present in Alzheimer's disease (AD). Copyright (C) 2000 Elsevier Science B.V.
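The pairing procedure described in this abstract can be sketched as follows. This is one possible reading of "measurements taken in pairs", namely the slope of the regression of the field at position i + k on the field at position i for each separation k; the function name `lagged_beta` and the simulated counts are assumptions, and the paper's own interpretation rules for the resulting plot are not reproduced here.

```python
import numpy as np

def lagged_beta(values, max_lag):
    """Regression slope of values[i + k] on values[i] for k = 1..max_lag."""
    values = np.asarray(values, dtype=float)
    betas = []
    for k in range(1, max_lag + 1):
        a, b = values[:-k], values[k:]     # all pairs of fields separated by k units
        ac = a - a.mean()
        betas.append((ac @ (b - b.mean())) / (ac @ ac))
    return np.array(betas)

# Simulated contiguous sample fields: regularly spaced clusters,
# four fields with lesions followed by four without, repeated ten times.
counts = np.tile([10.0] * 4 + [0.0] * 4, 10)
betas = lagged_beta(counts, max_lag=8)
```

For this simulated regular pattern the slope is strongly negative at a separation of four fields (cluster against gap) and returns to 1 at a separation of eight fields (cluster against the next cluster), so plotting `betas` against separation shows the periodicity directly.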
Spatial pattern analysis of beta-amyloid (Aβ) deposits in Alzheimer disease by linear regression
Abstract:
The spatial patterns of discrete beta-amyloid (Aβ) deposits in brain tissue from patients with Alzheimer disease (AD) were studied using a statistical method based on linear regression, the results being compared with the more conventional variance/mean (V/M) method. Both methods suggested that Aβ deposits occurred in clusters (400 to <12,800 μm in diameter) in all but 1 of the 42 tissues examined. In many tissues, a regular periodicity of the Aβ deposit clusters parallel to the tissue boundary was observed. In 23 of 42 (55%) tissues, the two methods revealed essentially the same spatial patterns of Aβ deposits; in 15 of 42 (36%), the regression method indicated the presence of clusters at a scale not revealed by the V/M method; and in 4 of 42 (9%), there was no agreement between the two methods. Perceived advantages of the regression method are that there is a greater probability of detecting clustering at multiple scales, the dimension of larger Aβ clusters can be estimated more accurately, and the spacing between the clusters may be estimated. However, both methods may be useful, with the regression method providing greater resolution and the V/M method providing greater simplicity and ease of interpretation. Estimates of the distance between regularly spaced Aβ clusters were in the range 2,200-11,800 μm, depending on tissue and cluster size. The regular periodicity of Aβ deposit clusters in many tissues would be consistent with their development in relation to clusters of neurons that give rise to specific neuronal projections.