895 resultados para nonparametric regression
Resumo:
In many circumstances, it may be of interest to discover whether two or more regression lines are the same. Regression lines may differ in three properties, viz., in residual variance, in slope, and in elevation; all of which can be tested using analysis of covariance. If there are no significant differences between regression lines, an investigator may which to combine the data from different studies and fit a single regression line to the whole of the data.
Resumo:
Non-linear relationships are common in microbiological research and often necessitate the use of the statistical techniques of non-linear regression or curve fitting. In some circumstances, the investigator may wish to fit an exponential model to the data, i.e., to test the hypothesis that a quantity Y either increases or decays exponentially with increasing X. This type of model is straight forward to fit as taking logarithms of the Y variable linearises the relationship which can then be treated by the methods of linear regression.
Resumo:
In some circumstances, there may be no scientific model of the relationship between X and Y that can be specified in advance and indeed the objective of the investigation may be to provide a ‘curve of best fit’ for predictive purposes. In such an example, the fitting of successive polynomials may be the best approach. There are various strategies to decide on the polynomial of best fit depending on the objectives of the investigation.
Resumo:
1. Fitting a linear regression to data provides much more information about the relationship between two variables than a simple correlation test. A goodness of fit test of the line should always be carried out. Hence, r squared estimates the strength of the relationship between Y and X, ANOVA whether a statistically significant line is present, and the ‘t’ test whether the slope of the line is significantly different from zero. 2. Always check whether the data collected fit the assumptions for regression analysis and, if not, whether a transformation of the Y and/or X variables is necessary. 3. If the regression line is to be used for prediction, it is important to determine whether the prediction involves an individual y value or a mean. Care should be taken if predictions are made close to the extremities of the data and are subject to considerable error if x falls beyond the range of the data. Multiple predictions require correction of the P values. 3. If several individual regression lines have been calculated from a number of similar sets of data, consider whether they should be combined to form a single regression line. 4. If the data exhibit a degree of curvature, then fitting a higher-order polynomial curve may provide a better fit than a straight line. In this case, a test of whether the data depart significantly from a linear regression should be carried out.
Resumo:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures since anyone with a data base and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure and knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study.5 This advice should be taken only as a rough guide but it does indicate that the variables included should be selected with great care as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
Resumo:
1. The techniques associated with regression, whether linear or non-linear, are some of the most useful statistical procedures that can be applied in clinical studies in optometry. 2. In some cases, there may be no scientific model of the relationship between X and Y that can be specified in advance and the objective may be to provide a ‘curve of best fit’ for predictive purposes. In such cases, the fitting of a general polynomial type curve may be the best approach. 3. An investigator may have a specific model in mind that relates Y to X and the data may provide a test of this hypothesis. Some of these curves can be reduced to a linear regression by transformation, e.g., the exponential and negative exponential decay curves. 4. In some circumstances, e.g., the asymptotic curve or logistic growth law, a more complex process of curve fitting involving non-linear estimation will be required.
Resumo:
Regression problems are concerned with predicting the values of one or more continuous quantities, given the values of a number of input variables. For virtually every application of regression, however, it is also important to have an indication of the uncertainty in the predictions. Such uncertainties are expressed in terms of the error bars, which specify the standard deviation of the distribution of predictions about the mean. Accurate estimate of error bars is of practical importance especially when safety and reliability is an issue. The Bayesian view of regression leads naturally to two contributions to the error bars. The first arises from the intrinsic noise on the target data, while the second comes from the uncertainty in the values of the model parameters which manifests itself in the finite width of the posterior distribution over the space of these parameters. The Hessian matrix which involves the second derivatives of the error function with respect to the weights is needed for implementing the Bayesian formalism in general and estimating the error bars in particular. A study of different methods for evaluating this matrix is given with special emphasis on the outer product approximation method. The contribution of the uncertainty in model parameters to the error bars is a finite data size effect, which becomes negligible as the number of data points in the training set increases. A study of this contribution is given in relation to the distribution of data in input space. It is shown that the addition of data points to the training set can only reduce the local magnitude of the error bars or leave it unchanged. Using the asymptotic limit of an infinite data set, it is shown that the error bars have an approximate relation to the density of data in input space.
Resumo:
An investigator may also wish to select a small subset of the X variables which give the best prediction of the Y variable. In this case, the question is how many variables should the regression equation include? One method would be to calculate the regression of Y on every subset of the X variables and choose the subset that gives the smallest mean square deviation from the regression. Most investigators, however, prefer to use a ‘stepwise multiple regression’ procedure. There are two forms of this analysis called the ‘step-up’ (or ‘forward’) method and the ‘step-down’ (or ‘backward’) method. This Statnote illustrates the use of stepwise multiple regression with reference to the scenario introduced in Statnote 24, viz., the influence of climatic variables on the growth of the crustose lichen Rhizocarpon geographicum (L.)DC.
Resumo:
The aim of this research work was primarily to examine the relevance of patient parameters, ward structures, procedures and practices, in respect of the potential hazards of wound cross-infection and nasal colonisation with multiple resistant strains of Staphylococcus aureus, which it is thought might provide a useful indication of a patient's general susceptibility to wound infection. Information from a large cross-sectional survey involving 12,000 patients from some 41 hospitals and 375 wards was collected over a five-year period from 1967-72, and its validity checked before any subsequent analysis was carried out. Many environmental factors and procedures which had previously been thought (but never conclusively proved) to have an influence on wound infection or nasal colonisation rates, were assessed, and subsequently dismissed as not being significant, provided that the standard of the current range of practices and procedures is maintained and not allowed to deteriorate. Retrospective analysis revealed that the probability of wound infection was influenced by the patient's age, duration of pre-operative hospitalisation, sex, type of wound, presence and type of drain, number of patients in ward, and other special risk factors, whilst nasal colonisation was found to be influenced by the patient's age, total duration of hospitalisation, sex, antibiotics, proportion of occupied beds in the ward, average distance between bed centres and special risk factors. A multi-variate regression analysis technique was used to develop statistical models, consisting of variable patient and environmental factors which were found to have a significant influence on the risks pertaining to wound infection and nasal colonisation. A relationship between wound infection and nasal colonisation was then established and this led to the development of a more advanced model for predicting wound infections, taking advantage of the additional knowledge of the patient's state of nasal colonisation prior to operation.
Resumo:
In previous statnotes, the application of correlation and regression methods to the analysis of two variables (X,Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., 'multiple linear regression’.
Resumo:
Financial institutes are an integral part of any modern economy. In the 1970s and 1980s, Gulf Cooperation Council (GCC) countries made significant progress in financial deepening and in building a modern financial infrastructure. This study aims to evaluate the performance (efficiency) of financial institutes (banking sector) in GCC countries. Since, the selected variables include negative data for some banks and positive for others, and the available evaluation methods are not helpful in this case, so we developed a Semi Oriented Radial Model to perform this evaluation. Furthermore, since the SORM evaluation result provides a limited information for any decision maker (bankers, investors, etc...), we proposed a second stage analysis using classification and regression (C&R) method to get further results combining SORM results with other environmental data (Financial, economical and political) to set rules for the efficient banks, hence, the results will be useful for bankers in order to improve their bank performance and to the investors, maximize their returns. Mainly there are two approaches to evaluate the performance of Decision Making Units (DMUs), under each of them there are different methods with different assumptions. Parametric approach is based on the econometric regression theory and nonparametric approach is based on a mathematical linear programming theory. Under the nonparametric approaches, there are two methods: Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH). While there are three methods under the parametric approach: Stochastic Frontier Analysis (SFA); Thick Frontier Analysis (TFA) and Distribution-Free Analysis (DFA). The result shows that DEA and SFA are the most applicable methods in banking sector, but DEA is seem to be most popular between researchers. However DEA as SFA still facing many challenges, one of these challenges is how to deal with negative data, since it requires the assumption that all the input and output values are non-negative, while in many applications negative outputs could appear e.g. losses in contrast with profit. Although there are few developed Models under DEA to deal with negative data but we believe that each of them has it is own limitations, therefore we developed a Semi-Oriented-Radial-Model (SORM) that could handle the negativity issue in DEA. The application result using SORM shows that the overall performance of GCC banking is relatively high (85.6%). Although, the efficiency score is fluctuated over the study period (1998-2007) due to the second Gulf War and to the international financial crisis, but still higher than the efficiency score of their counterpart in other countries. Banks operating in Saudi Arabia seem to be the highest efficient banks followed by UAE, Omani and Bahraini banks, while banks operating in Qatar and Kuwait seem to be the lowest efficient banks; this is because these two countries are the most affected country in the second Gulf War. Also, the result shows that there is no statistical relationship between the operating style (Islamic or Conventional) and bank efficiency. Even though there is no statistical differences due to the operational style, but Islamic bank seem to be more efficient than the Conventional bank, since on average their efficiency score is 86.33% compare to 85.38% for Conventional banks. Furthermore, the Islamic banks seem to be more affected by the political crisis (second Gulf War), whereas Conventional banks seem to be more affected by the financial crisis.
Resumo:
Direct quantile regression involves estimating a given quantile of a response variable as a function of input variables. We present a new framework for direct quantile regression where a Gaussian process model is learned, minimising the expected tilted loss function. The integration required in learning is not analytically tractable so to speed up the learning we employ the Expectation Propagation algorithm. We describe how this work relates to other quantile regression methods and apply the method on both synthetic and real data sets. The method is shown to be competitive with state of the art methods whilst allowing for the leverage of the full Gaussian process probabilistic framework.
Resumo:
Factors associated with duration of dementia in a consecutive series of 103 Alzheimer's disease (AD) cases were studied using the Kaplan-Meier estimator and Cox regression analysis (proportional hazard model). Mean disease duration was 7.1 years (range: 6 weeks-30 years, standard deviation = 5.18); 25% of cases died within four years, 50% within 6.9 years, and 75% within 10 years. Familial AD cases (FAD) had a longer duration than sporadic cases (SAD), especially cases linked to presenilin (PSEN) genes. No significant differences in duration were associated with age, sex, or apolipoprotein E (Apo E) genotype. Duration was reduced in cases with arterial hypertension. Cox regression analysis suggested longer duration was associated with an earlier disease onset and increased senile plaque (SP) and neurofibrillary tangle (NFT) pathology in the orbital gyrus (OrG), CA1 sector of the hippocampus, and nucleus basalis of Meynert (NBM). The data suggest shorter disease duration in SAD and in cases with hypertensive comorbidity. In addition, degree of neuropathology did not influence survival, but spread of SP/NFT pathology into the frontal lobe, hippocampus, and basal forebrain was associated with longer disease duration. © 2014 R. A. Armstrong.