43 resultados para Stepwise multiple linear regression
em Aston University Research Archive
Resumo:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures since anyone with a data base and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure and knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study.5 This advice should be taken only as a rough guide but it does indicate that the variables included should be selected with great care as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
Resumo:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Resumo:
An investigator may also wish to select a small subset of the X variables which give the best prediction of the Y variable. In this case, the question is how many variables should the regression equation include? One method would be to calculate the regression of Y on every subset of the X variables and choose the subset that gives the smallest mean square deviation from the regression. Most investigators, however, prefer to use a ‘stepwise multiple regression’ procedure. There are two forms of this analysis called the ‘step-up’ (or ‘forward’) method and the ‘step-down’ (or ‘backward’) method. This Statnote illustrates the use of stepwise multiple regression with reference to the scenario introduced in Statnote 24, viz., the influence of climatic variables on the growth of the crustose lichen Rhizocarpon geographicum (L.)DC.
Spatial pattern analysis of beta-amyloid (A beta) deposits in Alzheimer disease by linear regression
Resumo:
The spatial patterns of discrete beta-amyloid (Abeta) deposits in brain tissue from patients with Alzheimer disease (AD) were studied using a statistical method based on linear regression, the results being compared with the more conventional variance/mean (V/M) method. Both methods suggested that Abeta deposits occurred in clusters (400 to <12,800 mu m in diameter) in all but 1 of the 42 tissues examined. In many tissues, a regular periodicity of the Abeta deposit clusters parallel to the tissue boundary was observed. In 23 of 42 (55%) tissues, the two methods revealed essentially the same spatial patterns of Abeta deposits; in 15 of 42 (36%), the regression method indicated the presence of clusters at a scale not revealed by the V/M method; and in 4 of 42 (9%), there was no agreement between the two methods. Perceived advantages of the regression method are that there is a greater probability of detecting clustering at multiple scales, the dimension of larger Abeta clusters can be estimated more accurately, and the spacing between the clusters may be estimated. However, both methods may be useful, with the regression method providing greater resolution and the V/M method providing greater simplicity and ease of interpretation. Estimates of the distance between regularly spaced Abeta clusters were in the range 2,200-11,800 mu m, depending on tissue and cluster size. The regular periodicity of Abeta deposit clusters in many tissues would be consistent with their development in relation to clusters of neurons that give rise to specific neuronal projections.
Resumo:
In previous statnotes, the application of correlation and regression methods to the analysis of two variables (X,Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., 'multiple linear regression’.
Resumo:
In the Bayesian framework, predictions for a regression problem are expressed in terms of a distribution of output values. The mode of this distribution corresponds to the most probable output, while the uncertainty associated with the predictions can conveniently be expressed in terms of error bars. In this paper we consider the evaluation of error bars in the context of the class of generalized linear regression models. We provide insights into the dependence of the error bars on the location of the data points and we derive an upper bound on the true error bars in terms of the contributions from individual data points which are themselves easily evaluated.
Resumo:
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.
Resumo:
Non-linear relationships are common in microbiological research and often necessitate the use of the statistical techniques of non-linear regression or curve fitting. In some circumstances, the investigator may wish to fit an exponential model to the data, i.e., to test the hypothesis that a quantity Y either increases or decays exponentially with increasing X. This type of model is straight forward to fit as taking logarithms of the Y variable linearises the relationship which can then be treated by the methods of linear regression.
Resumo:
In some circumstances, there may be no scientific model of the relationship between X and Y that can be specified in advance and indeed the objective of the investigation may be to provide a ‘curve of best fit’ for predictive purposes. In such an example, the fitting of successive polynomials may be the best approach. There are various strategies to decide on the polynomial of best fit depending on the objectives of the investigation.
Resumo:
1. The techniques associated with regression, whether linear or non-linear, are some of the most useful statistical procedures that can be applied in clinical studies in optometry. 2. In some cases, there may be no scientific model of the relationship between X and Y that can be specified in advance and the objective may be to provide a ‘curve of best fit’ for predictive purposes. In such cases, the fitting of a general polynomial type curve may be the best approach. 3. An investigator may have a specific model in mind that relates Y to X and the data may provide a test of this hypothesis. Some of these curves can be reduced to a linear regression by transformation, e.g., the exponential and negative exponential decay curves. 4. In some circumstances, e.g., the asymptotic curve or logistic growth law, a more complex process of curve fitting involving non-linear estimation will be required.
Resumo:
Purpose - Anterior segment optical coherent tomography (AS-OCT) is used to further examine previous reports that ciliary muscle thickness (CMT) is increased in myopic eyes. With reference to temporal and nasal CMT, interrelationships between biometric and morphological characteristics of anterior and posterior segments are analysed for British-White and British-South-Asian adults with and without myopia. Methods - Data are presented for the right eyes of 62 subjects (British-White n = 39, British-South-Asian n = 23, aged 18–40 years) with a range of refractive error (mean spherical error (MSE (D)) -1.74 ± 3.26; range -10.06 to +4.38) and separated into myopes (MSE (D) <-0.50, range -10.06 to -0.56; n = 30) and non-myopes (MSE (D) =-0.50, -0.50 to +4.38; n = 32). Temporal and nasal ciliary muscle cross-sections were imaged using a Visante AS-OCT. Using Visante software, manual measures of nasal and temporal CMT (NCMT and TCMT respectively) were taken in successive posterior 1 mm steps from the scleral spur over a 3 mm distance (designated NCMT1, TCMT1 et seq). Measures of axial length and anterior chamber depth were taken with an IOLMaster biometer. MSE and corneal curvature (CC) measurements were taken with a Shin-Nippon auto-refractor. Magnetic resonance imaging was used to determine total ocular volume (OV) for 31 of the original subject group. Statistical comparisons and analyses were made using mixed repeated measures anovas, Pearson's correlation coefficient and stepwise forward multiple linear regression. Results - MSE was significantly associated with CMT, with thicker CMT2 and CMT3 being found in the myopic eyes (p = 0.002). In non-myopic eyes TCMT1, TCMT2, NCMT1 and NCMT2 correlated significantly with MSE, AL and OV (p < 0.05). In contrast, myopic eyes failed generally to exhibit a significant correlation between CMT, MSE and axial length but notably retained a significant correlation between OV, TCMT2, TCMT3, NCMT2 and NCMT3 (p < 0.05). OV was found to be a significantly better predictor of TCMT2 and TCMT3 than AL by approximately a factor of two (p < 0.001). Anterior chamber depth was significantly associated with both temporal and nasal CMT2 and CMT3; TCMT1 correlated positively with CC. Ethnicity had no significant effect on differences in CMT. Conclusions - Increased CMT is associated with myopia. We speculate that the lack of correlation in myopic subjects between CMT and axial length, but not between CMT and OV, is evidence that disrupted feedback between the fovea and ciliary apparatus occurs in myopia development.
Resumo:
Linear programming (LP) is the most widely used optimization technique for solving real-life problems because of its simplicity and efficiency. Although conventional LP models require precise data, managers and decision makers dealing with real-world optimization problems often do not have access to exact values. Fuzzy sets have been used in the fuzzy LP (FLP) problems to deal with the imprecise data in the decision variables, objective function and/or the constraints. The imprecisions in the FLP problems could be related to (1) the decision variables; (2) the coefficients of the decision variables in the objective function; (3) the coefficients of the decision variables in the constraints; (4) the right-hand-side of the constraints; or (5) all of these parameters. In this paper, we develop a new stepwise FLP model where fuzzy numbers are considered for the coefficients of the decision variables in the objective function, the coefficients of the decision variables in the constraints and the right-hand-side of the constraints. In the first step, we use the possibility and necessity relations for fuzzy constraints without considering the fuzzy objective function. In the subsequent step, we extend our method to the fuzzy objective function. We use two numerical examples from the FLP literature for comparison purposes and to demonstrate the applicability of the proposed method and the computational efficiency of the procedures and algorithms. © 2013-IOS Press and the authors. All rights reserved.
Resumo:
In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principle objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. ‘Principal components analysis’ (PCA) and ‘factor analysis’ (FA) are also methods of examining the relationships between different variables but they differ from multiple regression in that no distinction is made between the dependent and independent variables, all variables being essentially treated the same. Originally, PCA and FA were regarded as distinct methods but in recent times they have been combined into a single analysis, PCA often being the first stage of a FA. The basic objective of a PCA/FA is to examine the relationships between the variables or the ‘structure’ of the variables and to determine whether these relationships can be explained by a smaller number of ‘factors’. This statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.