29 results for multiple linear regression
in Aston University Research Archive
Abstract:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure and should understand the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined in more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R² is less than 50% should be treated as suspect, since it probably does not indicate the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study.[5] This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
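The two rules of thumb above (treat R² below 50% with suspicion; use at least 5-10 subjects per variable) can be checked with a few lines of NumPy. This is an illustrative sketch only: the function names are hypothetical, ordinary least squares via `numpy.linalg.lstsq` is assumed as the fitting method, and the adjusted R² formula is the standard textbook correction rather than something the abstract specifies.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Ordinary least squares fit of y on the columns of X, with an intercept.

    Returns the coefficient vector (intercept first) and R squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # column of 1s for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares about the mean
    return coef, 1.0 - ss_res / ss_tot

def adjusted_r_squared(r_squared, n, k):
    """Penalise R squared for using k X variables with n observations."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

def minimum_sample_size(k, subjects_per_variable=10):
    """Rough guide only: 5-10 subjects per X variable (10 by default)."""
    return subjects_per_variable * k
```

For four X variables, `minimum_sample_size(4, 5)` and `minimum_sample_size(4)` give 20 and 40, bracketing the guideline; adjusted R² falls when an unimportant variable is added without a matching gain in fit.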
Abstract:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics, and it has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Before being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite for T-cell recognition is therefore that an epitope is also a good MHC binder, so T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for defining quantitative matrices for binding affinity prediction. We apply these methods to peptides that bind the well-studied human MHC allele HLA-A*0201. A matrix obtained by combining the results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201 and recognized 135 (84%) of them.
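One common way to turn multiple linear regression into a quantitative matrix is to one-hot encode each peptide position and regress affinity on the indicators, so that each fitted coefficient is the contribution of one residue at one position. The sketch below assumes that additive encoding, peptides of equal length, and a hypothetical affinity scale; it shows only the regression half of the approach and does not reproduce the discriminant-analysis matrix or how the two were combined.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(peptide):
    """Encode a peptide as a flat 0/1 vector of length 20 * len(peptide)."""
    vec = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for pos, aa in enumerate(peptide):
        vec[pos, AA_INDEX[aa]] = 1.0
    return vec.ravel()

def fit_quantitative_matrix(peptides, affinities):
    """Least-squares fit of an additive position x residue matrix.

    Returns (constant, matrix); matrix[pos, aa] is the contribution of
    residue aa at position pos to the predicted affinity.
    """
    X = np.array([one_hot(p) for p in peptides])
    design = np.column_stack([np.ones(len(peptides)), X])
    # lstsq returns the minimum-norm solution, which copes with the built-in
    # redundancy that each position's indicators always sum to 1
    coef, *_ = np.linalg.lstsq(design, np.asarray(affinities, dtype=float), rcond=None)
    length = len(peptides[0])
    return coef[0], coef[1:].reshape(length, len(AMINO_ACIDS))

def score(peptide, constant, matrix):
    """Predicted affinity = constant + sum of per-position contributions."""
    return constant + sum(matrix[pos, AA_INDEX[aa]] for pos, aa in enumerate(peptide))
```

Candidate peptides can then be ranked by `score`; with real binding data, cross-validation as described in the abstract would be needed to judge how well such a matrix generalises.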
Spatial pattern analysis of beta-amyloid (Aβ) deposits in Alzheimer disease by linear regression
Abstract:
The spatial patterns of discrete beta-amyloid (Aβ) deposits in brain tissue from patients with Alzheimer disease (AD) were studied using a statistical method based on linear regression, and the results were compared with those of the more conventional variance/mean (V/M) method. Both methods suggested that Aβ deposits occurred in clusters (400 to <12,800 µm in diameter) in all but 1 of the 42 tissues examined. In many tissues, the Aβ deposit clusters showed a regular periodicity parallel to the tissue boundary. In 23 of 42 (55%) tissues, the two methods revealed essentially the same spatial patterns of Aβ deposits; in 15 of 42 (36%), the regression method indicated the presence of clusters at a scale not revealed by the V/M method; and in 4 of 42 (9%), there was no agreement between the two methods. The perceived advantages of the regression method are a greater probability of detecting clustering at multiple scales, more accurate estimation of the dimensions of larger Aβ clusters, and the ability to estimate the spacing between clusters. However, both methods may be useful: the regression method provides greater resolution, and the V/M method provides greater simplicity and ease of interpretation. Estimates of the distance between regularly spaced Aβ clusters were in the range 2,200-11,800 µm, depending on tissue and cluster size. The regular periodicity of Aβ deposit clusters in many tissues would be consistent with their development in relation to clusters of neurons that give rise to specific neuronal projections.
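The conventional V/M method can be sketched as follows, assuming (as a hypothetical setup) that deposits have been counted in contiguous sample fields along a strip of tissue and that adjacent fields are pooled into successively larger blocks. The regression method described in the abstract is not reproduced, because its details are not given here.

```python
import numpy as np

def variance_mean_ratios(counts, max_block=None):
    """V/M ratio of counts pooled into blocks of 1, 2, 4, ... adjacent fields.

    counts: deposit counts in contiguous sample fields along a strip.
    Returns {block_size: V/M}. V/M near 1 suggests a random (Poisson)
    pattern, > 1 clustering, and < 1 a regular pattern.
    """
    counts = np.asarray(counts, dtype=float)
    if max_block is None:
        max_block = len(counts) // 4  # keep at least four blocks per estimate
    ratios = {}
    size = 1
    while size <= max_block:
        n_blocks = len(counts) // size
        # sum each run of `size` adjacent fields, discarding any leftover fields
        pooled = counts[: n_blocks * size].reshape(n_blocks, size).sum(axis=1)
        mean = pooled.mean()
        ratios[size] = pooled.var(ddof=1) / mean if mean > 0 else float("nan")
        size *= 2
    return ratios
```

A peak in V/M at a given block size is often read as indicating the approximate cluster size. With counts that repeat every 8 fields, pooling into blocks of 8 removes the variation entirely (V/M = 0), the kind of regular spacing the abstract reports.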
Abstract:
In previous Statnotes, the application of correlation and regression methods to the analysis of two variables (X, Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables and whether it is positive or negative, to test the significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., ‘multiple linear regression’.
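Extending regression from one X variable to several amounts to solving the normal equations for an intercept plus one coefficient per X, then dividing each coefficient by its standard error to test its significance. The function below is a minimal sketch of that textbook formulation; its name is hypothetical, and p-values would still have to be read from a t table with n - k - 1 degrees of freedom.

```python
import numpy as np

def multiple_regression_with_t(X, y):
    """Fit Y = b0 + b1*X1 + ... + bk*Xk by solving the normal equations.

    Returns the coefficients, their standard errors and t statistics (b / SE).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])       # design matrix with intercept
    XtX = A.T @ A
    b = np.linalg.solve(XtX, A.T @ y)          # normal equations (A'A) b = A'y
    residuals = y - A @ b
    s2 = residuals @ residuals / (n - k - 1)   # residual mean square
    se = np.sqrt(np.diag(s2 * np.linalg.inv(XtX)))
    return b, se, b / se
```

A large |t| for one X and a small |t| for another indicates which variables have a significant influence on Y once the others are held constant.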
Abstract:
In the Bayesian framework, predictions for a regression problem are expressed in terms of a distribution of output values. The mode of this distribution corresponds to the most probable output, while the uncertainty associated with the predictions can conveniently be expressed in terms of error bars. In this paper we consider the evaluation of error bars in the context of the class of generalized linear regression models. We provide insights into the dependence of the error bars on the location of the data points and we derive an upper bound on the true error bars in terms of the contributions from individual data points which are themselves easily evaluated.
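For a generalized linear regression model y(x) = wᵀφ(x) with a Gaussian prior w ~ N(0, α⁻¹I) and Gaussian noise of precision β, the standard result is that the predictive variance is σ²(x) = 1/β + φ(x)ᵀA⁻¹φ(x), with A = αI + βΦᵀΦ. The second term is what makes the error bars depend on where the data points lie. The sketch assumes fixed Gaussian basis functions and hand-picked α and β; it does not implement the paper's upper bound.

```python
import numpy as np

def gaussian_basis(x, centres, width):
    """Fixed Gaussian basis functions phi_j(x) plus a constant (bias) term."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    centres = np.asarray(centres, dtype=float)
    phi = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))
    return np.column_stack([np.ones(len(x)), phi])

def posterior_predictive(x_train, t_train, x_test, centres, width, alpha, beta):
    """Predictive mean and error bar for a Bayesian generalized linear model.

    Prior w ~ N(0, alpha^-1 I); Gaussian noise with precision beta.
    """
    Phi = gaussian_basis(x_train, centres, width)
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi      # posterior precision
    A_inv = np.linalg.inv(A)
    w_mp = beta * A_inv @ Phi.T @ np.asarray(t_train, dtype=float)  # most probable weights
    phi_test = gaussian_basis(x_test, centres, width)
    mean = phi_test @ w_mp
    # noise term 1/beta plus parameter-uncertainty term phi^T A^-1 phi
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi_test, A_inv, phi_test)
    return mean, np.sqrt(var)
```

With training inputs in [0, 1], the error bar at x = 1.8 comes out larger than at x = 0.5, and no error bar can fall below the noise floor 1/√β.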
Abstract:
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression and show how, by a change of viewpoint, this method can be seen as a Gaussian process predictor based on priors over functions rather than on priors over parameters. This leads into a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models, and the use of Gaussian processes in classification problems.
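The change of viewpoint can be checked numerically. Bayesian linear regression with prior w ~ N(0, σ_w²I) on a feature map φ induces a Gaussian process with covariance k(x, x') = σ_w²φ(x)ᵀφ(x'), so the weight-space and function-space predictors should give identical predictive means and variances. The quadratic feature map and noise level below are arbitrary choices for illustration.

```python
import numpy as np

def features(x):
    """Illustrative feature map phi(x) = [1, x, x^2] (a quadratic model)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.column_stack([np.ones_like(x), x, x ** 2])

def weight_space_predict(x, y, x_star, sigma_w, sigma_n):
    """Bayesian linear regression: prior over parameters w ~ N(0, sigma_w^2 I)."""
    Phi, phi_s = features(x), features(x_star)
    precision = Phi.T @ Phi / sigma_n ** 2 + np.eye(Phi.shape[1]) / sigma_w ** 2
    cov_w = np.linalg.inv(precision)                  # posterior covariance of w
    mean_w = cov_w @ Phi.T @ y / sigma_n ** 2         # posterior mean of w
    return phi_s @ mean_w, np.einsum("ij,jk,ik->i", phi_s, cov_w, phi_s)

def kernel(a, b, sigma_w):
    """Covariance implied by the same prior: k(x, x') = sigma_w^2 phi(x).phi(x')."""
    return sigma_w ** 2 * features(a) @ features(b).T

def function_space_predict(x, y, x_star, sigma_w, sigma_n):
    """Gaussian process predictor: prior placed directly over functions."""
    K = kernel(x, x, sigma_w) + sigma_n ** 2 * np.eye(len(x))
    k_s = kernel(x, x_star, sigma_w)                  # shape (n_train, n_test)
    mean = k_s.T @ np.linalg.solve(K, y)
    var = np.diag(kernel(x_star, x_star, sigma_w)) - np.sum(k_s * np.linalg.solve(K, k_s), axis=0)
    return mean, var
```

The function-space form only ever touches the covariance function, which is what allows the feature map to be replaced by covariance functions that correspond to infinitely many basis functions.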
Abstract:
Non-linear relationships are common in microbiological research and often necessitate the statistical techniques of non-linear regression or curve fitting. In some circumstances, the investigator may wish to fit an exponential model to the data, i.e., to test the hypothesis that a quantity Y either increases or decays exponentially with increasing X. This type of model is straightforward to fit, because taking logarithms of the Y variable linearises the relationship, which can then be treated by the methods of linear regression.
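The linearisation works because ln Y = ln a + bX when Y = a·e^(bX), so a straight-line fit of ln Y on X estimates ln a as the intercept and b as the slope. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def fit_exponential(x, y):
    """Fit Y = a * exp(b * X) by linear regression of ln(Y) on X.

    Requires every Y > 0. b > 0 indicates exponential growth, b < 0 decay.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log transform needs strictly positive Y values")
    slope, intercept = np.polyfit(x, np.log(y), 1)  # ln Y = ln a + b X
    return np.exp(intercept), slope
```

Least squares on the log scale implicitly assumes errors that are multiplicative on the original scale; if the errors are additive, a direct non-linear fit may be preferable.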
Abstract:
In some circumstances, there may be no scientific model of the relationship between X and Y that can be specified in advance, and indeed the objective of the investigation may be to provide a ‘curve of best fit’ for predictive purposes. In such cases, fitting successive polynomials may be the best approach. There are various strategies for deciding on the polynomial of best fit, depending on the objectives of the investigation.
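One common strategy is to add polynomial terms one at a time and stop when the extra term no longer reduces the residual sum of squares significantly, judged by an F ratio with 1 and n - d - 1 degrees of freedom for a degree-d fit. The sketch assumes that strategy only; the abstract notes there are others, and the function name is hypothetical.

```python
import numpy as np

def successive_polynomials(x, y, max_degree=4):
    """Fit polynomials of degree 1..max_degree and report, for each degree,
    the residual sum of squares and the F ratio for the newly added term.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    results = []
    prev_rss = None
    for d in range(1, max_degree + 1):
        coefs = np.polyfit(x, y, d)
        rss = float(np.sum((y - np.polyval(coefs, x)) ** 2))
        df_resid = n - (d + 1)  # n minus the number of fitted coefficients
        f_ratio = None
        if prev_rss is not None:
            # one extra parameter: F = (RSS_old - RSS_new) / (RSS_new / df_resid)
            f_ratio = (prev_rss - rss) / (rss / df_resid)
        results.append((d, rss, f_ratio))
        prev_rss = rss
    return results
```

Each F ratio is compared with the critical value of F(1, n - d - 1); the first degree whose added term is not significant marks where fitting can stop.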
Abstract:
1. The techniques associated with regression, whether linear or non-linear, are some of the most useful statistical procedures that can be applied in clinical studies in optometry.
2. In some cases, there may be no scientific model of the relationship between X and Y that can be specified in advance, and the objective may be to provide a ‘curve of best fit’ for predictive purposes. In such cases, fitting a general polynomial-type curve may be the best approach.
3. An investigator may have a specific model in mind that relates Y to X, and the data may provide a test of this hypothesis. Some of these curves can be reduced to a linear regression by transformation, e.g., the exponential and negative exponential decay curves.
4. In some circumstances, e.g., the asymptotic curve or logistic growth law, a more complex process of curve fitting involving non-linear estimation will be required.
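Curves such as the asymptotic curve cannot be linearised, so their parameters have to be estimated iteratively. The sketch below assumes the parametrisation Y = a - b·e^(-cX) and uses Gauss-Newton iterations with step halving; both are illustrative choices, not necessarily the procedure described in the Statnote, and sensible starting values are required.

```python
import numpy as np

def fit_asymptotic(x, y, start, n_iter=100):
    """Non-linear least squares fit of Y = a - b*exp(-c*X)
    by Gauss-Newton iterations with step halving.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.asarray(start, dtype=float)

    def residuals(params):
        a, b, c = params
        return y - (a - b * np.exp(-c * x))

    sse = np.sum(residuals(p) ** 2)
    for _ in range(n_iter):
        a, b, c = p
        e = np.exp(-c * x)
        # Jacobian of the model with respect to (a, b, c)
        J = np.column_stack([np.ones_like(x), -e, b * x * e])
        step, *_ = np.linalg.lstsq(J, residuals(p), rcond=None)
        lam = 1.0
        while lam > 1e-8:
            trial = p + lam * step
            trial_sse = np.sum(residuals(trial) ** 2)
            if trial_sse < sse:
                break
            lam /= 2  # halve the step until the fit improves
        else:
            break  # no improving step found: treat as converged
        p, sse = trial, trial_sse
    return p
```

Step halving guarantees that the residual sum of squares never increases; poor starting values can still lead to a local minimum, which is one reason non-linear estimation is described above as a more complex process.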
Abstract:
In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principal objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. ‘Principal components analysis’ (PCA) and ‘factor analysis’ (FA) are also methods of examining the relationships between different variables, but they differ from multiple regression in that no distinction is made between dependent and independent variables; all variables are treated essentially the same. Originally, PCA and FA were regarded as distinct methods, but more recently they have been combined into a single analysis, with PCA often being the first stage of an FA. The basic objective of a PCA/FA is to examine the relationships between the variables, or the ‘structure’ of the variables, and to determine whether these relationships can be explained by a smaller number of ‘factors’. This Statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.
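The PCA stage can be sketched as an eigen-decomposition of the correlation matrix: each eigenvalue is the variance accounted for by one component, and the eigenvalues sum to the number of variables. The sketch assumes cases in rows and variables in columns; the factor-extraction and rotation steps of a full FA are not shown, and the function name is hypothetical.

```python
import numpy as np

def principal_components(data):
    """PCA of the correlation matrix of `data` (rows = cases, columns = variables).

    Returns eigenvalues (largest first), the proportion of total variance
    each component explains, and the loadings (variable-component correlations).
    """
    data = np.asarray(data, dtype=float)
    R = np.corrcoef(data, rowvar=False)   # all variables treated alike: no Y vs X
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh because R is symmetric
    order = np.argsort(eigvals)[::-1]     # sort components, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    proportion = eigvals / eigvals.sum()  # the sum equals the number of variables
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return eigvals, proportion, loadings
```

A common rule of thumb retains components with eigenvalues greater than 1; note that the signs of the loadings are arbitrary.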
Abstract:
BACKGROUND: In the light of sub-optimal uptake of the measles, mumps, and rubella (MMR) vaccination, we investigated the factors that influence mothers' intentions to vaccinate. METHOD: A cross-sectional survey of 300 mothers in Birmingham with children approaching a routine MMR vaccination was conducted using a postal questionnaire to measure intention to vaccinate, psychological variables, knowledge of the vaccine, and socioeconomic status. The vaccination status of the children was obtained from the South Birmingham Child Health Surveillance Unit. RESULTS: The response rate was 59%. Fewer mothers approaching the second MMR vaccination (Group 2) than mothers approaching the first (Group 1) intended to take their children for vaccination (Mann-Whitney U = 2180, P < 0.0001). Group 2 expressed more negative beliefs about the outcome of having the MMR vaccine ('vaccine outcome beliefs') (Mann-Whitney U = 2155, P < 0.0001) and were more likely than Group 1 to believe that the vaccine was 'unsafe' (χ² = 9.114, P = 0.004) and that it rarely protected (χ² = 6.882, P = 0.014). The commonest side-effect cited was general malaise, but 29.8% cited autism. The most trusted source of information was the general practitioner, but the most common source of information on side-effects was television (34.6%). Multiple linear regression revealed that, in Group 1, only 'vaccine outcome beliefs' significantly predicted intention (77.1% of the variance). In Group 2, 'vaccine outcome beliefs', attitude to the MMR vaccine, and prior MMR status all predicted intention (93% of the variance). CONCLUSION: A major reason for the low uptake of the MMR vaccination is that it is not perceived to be important for children's health, particularly the second dose. Health education from GPs is likely to have a considerable impact.
Abstract:
How does a firm choose a proper mode of foreign direct investment (FDI) for entering a foreign market? Which mode of entry performs better? What are the performance implications of joint venture (JV) ownership structure? These important questions face a multinational enterprise (MNE) that decides to enter a foreign market. However, few studies have addressed these issues, and no consistent or conclusive findings have emerged, especially with respect to China. This work comprises five chapters, which provide corresponding answers to the questions above. Specifically, Chapter One is an overall introductory chapter. Chapter Two is about the choice of FDI entry mode in China. Chapter Three examines the relationship between four main entry modes and performance. Chapter Four explores the performance implications of JV ownership structure. Chapter Five is an overall concluding chapter. These empirical studies are based on the most recent and richest data available, which had not been explored in previous studies. The data contain information on 11,765 foreign-invested enterprises in China in seven manufacturing industries in 2000, 10,757 in 1999, and 10,666 in 1998. The four FDI entry modes examined are wholly-owned enterprises (WOEs), equity joint ventures (EJVs), contractual joint ventures (CJVs), and joint stock companies (JSCs). In Chapter Two, a multinomial logit model is established, and techniques of multiple linear regression analysis are employed in Chapters Three and Four. It was found that MNEs prefer the WOE strategy under conditions of a good investment environment, large capital commitment, and small cultural distance. If these conditions are not met, the EJV mode would be of greater use. The relative propensity to pursue the CJV mode increases with a good investment environment, small capital commitment, and small cultural distance. JSCs are not favoured by MNEs when the investment environment improves and when affiliates are located in the coastal areas. MNEs have been found to have a greater preference for an EJV as a mode of entry into the Chinese market in all industries. It is also found that, in terms of return on assets (ROA) and asset turnover, WOEs perform the best, followed by EJVs, CJVs, and JSCs. Finally, minority-owned EJVs or JSCs are found to outperform their majority-owned counterparts in terms of ROA and asset turnover.
Abstract:
This book is aimed primarily at microbiologists who are undertaking research and who require a basic knowledge of statistics to analyse their experimental data. Computer software employing a wide range of data analysis methods is widely available to experimental scientists. The availability of this software, however, makes it essential that investigators understand the basic principles of statistics. Statistical analysis of data can be complex, with many different methods of approach, each of which applies in a particular experimental circumstance. Hence, it is possible to apply an incorrect statistical method to data and to draw the wrong conclusions from an experiment. This book, which has its origin in a series of articles published in the Society for Applied Microbiology journal ‘The Microbiologist’, attempts to present the basic logic of statistics as clearly as possible and thereby to dispel some of the myths that often surround the subject. The 28 ‘Statnotes’ deal with various topics that are likely to be encountered, including the nature of variables, the comparison of means of two or more groups, non-parametric statistics, analysis of variance, correlating variables, and more complex methods such as multiple linear regression and principal components analysis. In each case, the relevant statistical method is illustrated with examples drawn from experiments in microbiological research. The text incorporates a glossary of the most commonly used statistical terms, and there are two appendices designed to help the investigator select the most appropriate test.