963 resultados para Regression methods
Resumo:
The role of land cover change as a significant component of global change has become increasingly recognized in recent decades. Large databases measuring land cover change, and the data which can potentially be used to explain the observed changes, are also becoming more commonly available. When developing statistical models to investigate observed changes, it is important to be aware that the chosen sampling strategy and modelling techniques can influence results. We present a comparison of three sampling strategies and two forms of grouped logistic regression models (multinomial and ordinal) in the investigation of patterns of successional change after agricultural land abandonment in Switzerland. Results indicated that both ordinal and nominal transitional change occurs in the landscape and that the use of different sampling regimes and modelling techniques as investigative tools yield different results. Synthesis and applications. Our multimodel inference identified successfully a set of consistently selected indicators of land cover change, which can be used to predict further change, including annual average temperature, the number of already overgrown neighbouring areas of land and distance to historically destructive avalanche sites. This allows for more reliable decision making and planning with respect to landscape management. Although both model approaches gave similar results, ordinal regression yielded more parsimonious models that identified the important predictors of land cover change more efficiently. Thus, this approach is favourable where land cover change pattern can be interpreted as an ordinal process. Otherwise, multinomial logistic regression is a viable alternative.
Resumo:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical pre- liminaries that are discussed prior to explaining PLS and PCR models. Both PLS and PCR are applied to real spectral data and their di erences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophytes species and lakes. The di erences between PLS model for ecological data and PLS for spectral data are noted and explained in this thesis. i
Resumo:
Correlation and regression are two of the statistical procedures most widely used by optometrists. However, these tests are often misused or interpreted incorrectly, leading to erroneous conclusions from clinical experiments. This review examines the major statistical tests concerned with correlation and regression that are most likely to arise in clinical investigations in optometry. First, the use, interpretation and limitations of Pearson's product moment correlation coefficient are described. Second, the least squares method of fitting a linear regression to data and for testing how well a regression line fits the data are described. Third, the problems of using linear regression methods in observational studies, if there are errors associated in measuring the independent variable and for predicting a new value of Y for a given X, are discussed. Finally, methods for testing whether a non-linear relationship provides a better fit to the data and for comparing two or more regression lines are considered.
Resumo:
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression methods, specially allowing for serial correlation, make them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range Generalised Linear Models as well as Generalised Additive Models. In addition, recently critical points in using statistical software for GAM were stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants
Resumo:
Analytical curves are normally obtained from discrete data by least squares regression. The least squares regression of data involving significant error in both x and y values should not be implemented by ordinary least squares (OLS). In this work, the use of orthogonal distance regression (ODR) is discussed as an alternative approach in order to take into account the error in the x variable. Four examples are presented to illustrate deviation between the results from both regression methods. The examples studied show that, in some situations, ODR coefficients must substitute for those of OLS, and, in other situations, the difference is not significant.
Resumo:
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression methods, specially allowing for serial correlation, make them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range Generalised Linear Models as well as Generalised Additive Models. In addition, recently critical points in using statistical software for GAM were stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants
Resumo:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model’s dimensionality to a manageable level, thus leading to efficient estimation. Most of the existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition which allows the simultaneous projections of an input tensor to more than one direction along each mode. As a matter of fact, multi-dimensional data are collected under the same or very similar conditions, so that data share some common latent components but can also have their own independent parameters for each regression task. Therefore, it is beneficial to analyse regression parameters among all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker Decomposition, which identifies not only the common components of parameters across all the regression tasks, but also independent factors contributing to each particular regression task simultaneously. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modeling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
Resumo:
Abstract Background With the development of DNA hybridization microarray technologies, nowadays it is possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a pre-requisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical, deserving judicious consideration. Results Here, we considered three commonly used normalization approaches, namely: Loess, Splines and Wavelets, and two non-parametric regression methods, which have yet to be used for normalization, namely, the Kernel smoothing and Support Vector Regression. The results obtained were compared using artificial microarray data and benchmark studies. The results indicate that the Support Vector Regression is the most robust to outliers and that Kernel is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. Conclusion In face of our results, the Support Vector Regression is favored for microarray normalization due to its superiority when compared to the other methods for its robustness in estimating the normalization curve.
Resumo:
The thesis is concerned with local trigonometric regression methods. The aim was to develop a method for extraction of cyclical components in time series. The main results of the thesis are the following. First, a generalization of the filter proposed by Christiano and Fitzgerald is furnished for the smoothing of ARIMA(p,d,q) process. Second, a local trigonometric filter is built, with its statistical properties. Third, they are discussed the convergence properties of trigonometric estimators, and the problem of choosing the order of the model. A large scale simulation experiment has been designed in order to assess the performance of the proposed models and methods. The results show that local trigonometric regression may be a useful tool for periodic time series analysis.
Resumo:
The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of continuous diagnostic test. To account for factors that might invluence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may render invalid and misleading results. To date practical model checking techniques suitable for validating existing ROC regression models are not yet available. In this paper, we develop cumulative residual based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual process and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the Cystic Fibrosis registry.
Resumo:
This paper studied two different regression techniques for pelvic shape prediction, i.e., the partial least square regression (PLSR) and the principal component regression (PCR). Three different predictors such as surface landmarks, morphological parameters, or surface models of neighboring structures were used in a cross-validation study to predict the pelvic shape. Results obtained from applying these two different regression techniques were compared to the population mean model. In almost all the prediction experiments, both regression techniques unanimously generated better results than the population mean model, while the difference on prediction accuracy between these two regression methods is not statistically significant (α=0.01).
Resumo:
We present a methodology for reducing a straight line fitting regression problem to a Least Squares minimization one. This is accomplished through the definition of a measure on the data space that takes into account directional dependences of errors, and the use of polar descriptors for straight lines. This strategy improves the robustness by avoiding singularities and non-describable lines. The methodology is powerful enough to deal with non-normal bivariate heteroscedastic data error models, but can also supersede classical regression methods by making some particular assumptions. An implementation of the methodology for the normal bivariate case is developed and evaluated.
Resumo:
In previous statnotes, the application of correlation and regression methods to the analysis of two variables (X,Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., 'multiple linear regression’.
Resumo:
Direct quantile regression involves estimating a given quantile of a response variable as a function of input variables. We present a new framework for direct quantile regression where a Gaussian process model is learned, minimising the expected tilted loss function. The integration required in learning is not analytically tractable so to speed up the learning we employ the Expectation Propagation algorithm. We describe how this work relates to other quantile regression methods and apply the method on both synthetic and real data sets. The method is shown to be competitive with state of the art methods whilst allowing for the leverage of the full Gaussian process probabilistic framework.
Resumo:
In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of multi-output regression. This paper provides a survey on state-of-the-art multi-output regression methods, that are categorized as problem transformation and algorithm adaptation methods. In addition, we present the mostly used performance evaluation measures, publicly available data sets for multi-output regression real-world problems, as well as open-source software frameworks.