907 results for rank regression
Abstract:
In some circumstances, there may be no scientific model of the relationship between X and Y that can be specified in advance; indeed, the objective of the investigation may be to provide a ‘curve of best fit’ for predictive purposes. In such a case, the fitting of successive polynomials may be the best approach. There are various strategies for deciding on the polynomial of best fit, depending on the objectives of the investigation.
Abstract:
1. Fitting a linear regression to data provides much more information about the relationship between two variables than a simple correlation test. A goodness-of-fit test of the line should always be carried out: r squared estimates the strength of the relationship between Y and X, ANOVA tests whether a statistically significant line is present, and the ‘t’ test whether the slope of the line is significantly different from zero. 2. Always check whether the data collected fit the assumptions for regression analysis and, if not, whether a transformation of the Y and/or X variables is necessary. 3. If the regression line is to be used for prediction, it is important to determine whether the prediction involves an individual y value or a mean. Care should be taken if predictions are made close to the extremities of the data; they are subject to considerable error if x falls beyond the range of the data. Multiple predictions require correction of the P values. 4. If several individual regression lines have been calculated from a number of similar sets of data, consider whether they should be combined to form a single regression line. 5. If the data exhibit a degree of curvature, then fitting a higher-order polynomial curve may provide a better fit than a straight line. In this case, a test of whether the data depart significantly from a linear regression should be carried out.
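The three goodness-of-fit quantities this summary names (r squared, the ANOVA test for the presence of a line, and the t test of the slope) can be sketched in a few lines of Python; the data below are illustrative and not taken from the Statnote:

```python
import math

# Hypothetical example data (roughly y = 2x plus small deviations).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
syy = sum((yi - my) ** 2 for yi in y)

b = sxy / sxx                        # slope
a = my - b * mx                      # intercept

ss_reg = b * sxy                     # SS due to regression (1 df)
ss_res = syy - ss_reg                # residual SS (n - 2 df)
r2 = ss_reg / syy                    # strength of the relationship
F = ss_reg / (ss_res / (n - 2))      # ANOVA test for the line
se_b = math.sqrt((ss_res / (n - 2)) / sxx)
t = b / se_b                         # t test of the slope against zero

print(f"slope={b:.3f}, r2={r2:.4f}, F={F:.1f}, t={t:.2f}")
```

For simple linear regression the slope's t statistic satisfies t² = F, so the ANOVA and the t test of the slope are equivalent tests of the same hypothesis.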
Abstract:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure, together with knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study.5 This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
Abstract:
1. The techniques associated with regression, whether linear or non-linear, are some of the most useful statistical procedures that can be applied in clinical studies in optometry. 2. In some cases, there may be no scientific model of the relationship between X and Y that can be specified in advance and the objective may be to provide a ‘curve of best fit’ for predictive purposes. In such cases, the fitting of a general polynomial type curve may be the best approach. 3. An investigator may have a specific model in mind that relates Y to X and the data may provide a test of this hypothesis. Some of these curves can be reduced to a linear regression by transformation, e.g., the exponential and negative exponential decay curves. 4. In some circumstances, e.g., the asymptotic curve or logistic growth law, a more complex process of curve fitting involving non-linear estimation will be required.
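The reduction of an exponential decay curve to a linear regression by transformation, mentioned in point 3, can be sketched as follows; the data are synthetic and noise-free, and the parameter values are made up for illustration:

```python
import math

# Exponential decay y = a * exp(b * x); taking logs gives the linear form
# ln(y) = ln(a) + b * x, which ordinary linear regression can fit.
a_true, b_true = 5.0, -0.7           # illustrative 'true' parameters
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [a_true * math.exp(b_true * xi) for xi in x]

ln_y = [math.log(yi) for yi in y]
n = len(x)
mx, ml = sum(x) / n, sum(ln_y) / n
b = (sum((xi - mx) * (li - ml) for xi, li in zip(x, ln_y))
     / sum((xi - mx) ** 2 for xi in x))
a = math.exp(ml - b * mx)
print(a, b)   # recovers a_true and b_true for noise-free data
```

Note that with noisy data this log transform also changes the error structure, which is why the abstract distinguishes these linearisable curves from those (e.g. the logistic) that need genuine non-linear estimation.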
Abstract:
Purpose: To investigate the correlation between tests of visual function and perceived visual ability recorded with a 'quality-of-life' questionnaire for patients with central field loss. Method: 12 females and 7 males (mean age = 53.1 years; range = 23-80 years) with subfoveal neovascular membranes underwent a comprehensive assessment of visual function. Tests included unaided distance vision, high and low contrast distance logMAR visual acuity (VA), Pelli-Robson contrast sensitivity (at 1 m), near logMAR word VA and text reading speed. All tests were done both monocularly and binocularly. The patients also completed a 28-point questionnaire separated into a 'core' section consisting of general questions about perceived visual function and a 'module' section with specific questions on reading function. Results: Step-wise multiple regression analysis was used to determine which visual function tests were correlated with the patients' perceived visual function and to rank them in order of importance. The visual function test that explains most of the variance in both the 'core' score (66%) and the 'module' score (68%) of the questionnaire is low contrast VA in the better eye (P<0.001 in both cases). Further, distance logMAR VA in both the better and worse eye, and near logMAR VA in both the better eye and binocularly, each account for a significant proportion (P<0.01) of the variance of the module score. Conclusions: The best predictor of both perceived reading ability and of general perceived visual ability in this study is low contrast logMAR VA. The results highlight that distance VA is not the only relevant measure of visual function in relation to a patient's perceived visual performance and should not be considered the sole determinant of surgical or management success.
Abstract:
Most of the new processes involving the utilisation of coal are based on hydroliquefaction, and in order to assess the suitability of the various coals for this purpose and to characterise coals in general, it is desirable to have a detailed and accurate knowledge of their chemical constitution and reactivity. Also, in the consumption of coals as chemical feedstocks, as in hydroliquefaction, it is advantageous to classify the coals in terms of chemical parameters as opposed to, or in addition to, carbonisation parameters. In view of this it is important to identify the functional groups on the coal hydrocarbon skeleton. In this research an attempt was made to characterise coals of various rank (and subsequently their macerals) via methods involving both microwave-driven and bench-top derivatisation of the hydroxyl functionalities present in coal. These hydroxyl groups are predominantly in the form of hindered phenolic groups, with other alcoholic groupings being less important, in the coals studied here. Four different techniques were employed, three of which - stannylation, silylation and methylation - were based on in situ analysis. The fourth technique - acetylation - involved derivatisation followed by analysis of a leaving group. The four techniques were critically compared, and it is concluded that silylation is the most promising technique for the evaluation of the hydroxyl content of middle rank coals and coal macerals. Derivatisation via stannylation using TBTO was impeded by the large steric demand of the reagent, and acetylation did not successfully derivatise the more hindered phenolic groups. Three novel methylation techniques were investigated and two of these show great potential. The information obtained from the techniques was correlated to give a comprehensive insight into the coals and coal macerals studied.
Abstract:
Regression problems are concerned with predicting the values of one or more continuous quantities, given the values of a number of input variables. For virtually every application of regression, however, it is also important to have an indication of the uncertainty in the predictions. Such uncertainties are expressed in terms of error bars, which specify the standard deviation of the distribution of predictions about the mean. Accurate estimation of error bars is of practical importance, especially when safety and reliability are at issue. The Bayesian view of regression leads naturally to two contributions to the error bars. The first arises from the intrinsic noise on the target data, while the second comes from the uncertainty in the values of the model parameters, which manifests itself in the finite width of the posterior distribution over the space of these parameters. The Hessian matrix, which involves the second derivatives of the error function with respect to the weights, is needed for implementing the Bayesian formalism in general and estimating the error bars in particular. A study of different methods for evaluating this matrix is given, with special emphasis on the outer product approximation method. The contribution of the uncertainty in model parameters to the error bars is a finite data size effect, which becomes negligible as the number of data points in the training set increases. A study of this contribution is given in relation to the distribution of data in input space. It is shown that the addition of data points to the training set can only reduce the local magnitude of the error bars or leave it unchanged. Using the asymptotic limit of an infinite data set, it is shown that the error bars have an approximate relation to the density of data in input space.
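A scalar sketch of the two contributions described above, assuming a one-parameter linear model y = w·x with Gaussian noise of precision beta and a Gaussian prior on w of precision alpha (both values made up for illustration). The (here scalar) Hessian takes the outer-product form, and the parameter term shrinks as data are added, matching the abstract's finite-data-size argument:

```python
# Illustrative precisions; not taken from the thesis.
alpha, beta = 0.1, 4.0

def error_bar_var(x_train, x_star):
    # Outer-product (here scalar) Hessian of the regularised error:
    # A = alpha + beta * sum of squared model gradients (dy/dw = x).
    A = alpha + beta * sum(x * x for x in x_train)
    noise_term = 1.0 / beta        # intrinsic noise on the targets
    param_term = x_star ** 2 / A   # finite posterior width over w
    return noise_term + param_term

small_set = error_bar_var([1.0, 2.0], 3.0)          # 2 training points
large_set = error_bar_var([1.0, 2.0] * 50, 3.0)     # 100 training points
print(small_set, large_set)
```

With 100 points the parameter term is nearly negligible and the error bar approaches the intrinsic noise floor 1/beta, while adding data never increases it.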
Abstract:
The recent history of small shop and independent retailing has been one of decline. The most desirable form of assistance is the provision of information which will increase efficiency; the aim of this research is therefore to develop a model of marketing mix effectiveness which may be applied in small-scale retailing. A further aim is to enhance theoretical development in the marketing field. Recent changes in retailing have affected location, product range, pricing and promotion practices. Although a large number of variables representing aspects of the marketing mix may be identified, it is not possible, on the basis of currently available information, to quantify or rank them according to their effect on sales performance. In designing a suitable study a major issue is that of access to a suitable representative sample of small retailers. The public nature of the retail activities involved facilitates the use of a novel observation approach to data collection. A cross-sectional survey research design was used, focussing on a clustered random sample of greengrocers and gent's fashion outfitters in the West Midlands. Linear multiple regression was the main analytical technique. Powerful regression models were evolved for both types of retailing. For greengrocers the major influences on trade are pedestrian traffic and shelf display space. For gent's outfitters they are centrality-to-other-shopping, advertising and shelf display space. The models may be utilised by retailers to determine the relative strength of marketing mix variables. The level of precision is not sufficient to permit cost-benefit analysis. Comparison of the findings for the two distinct kinds of business studied suggests an overall model of marketing mix effectiveness might be based on frequency of purchase, homogeneity of the shopping environment, elasticity of demand and bulk characteristics of the goods sold by a shop.
Abstract:
In this thesis the validity of an Assessment Centre (called 'Extended Interview') operated on behalf of the British police is investigated. This Assessment Centre (AC) is used to select from amongst internal candidates (serving policemen and policewomen) and external candidates (graduates) for places on an accelerated promotion scheme. The literature is reviewed with respect to history, content, structure, reliability, validity, efficiency and usefulness of ACs, and to contextual issues surrounding AC use. The history of, background to and content of police Extended Interviews (EIs) is described, and research issues are identified. Internal validation involved regression of overall EI grades on measures from component tests, exercises, interviews and peer nominations. Four samples numbering 126, 73, 86 and 109 were used in this part of the research. External validation involved regression of three types of criteria - training grades, rank attained, and supervisory ratings - on all EI measures. Follow-up periods for job criteria ranged from 7 to 19 years. Three samples, numbering 223, 157 and 86, were used in this part of the research. In subsidiary investigations, supervisory ratings were factor analysed and criteria intercorrelated. For two of the samples involved in the external validation, clinical/judgemental prediction was compared with mechanical (unit-weighted composite) prediction. Main conclusions are that: (1) EI selection decisions were valid, but only for a job performance criterion; relatively low validity overall was interpreted principally in terms of the questionable job relatedness of the EI procedure; (2) EIs as a whole had more validity than was reflected in final EI decisions; (3) assessors' use of information was not optimal, tending to over-emphasize subjectively derived information, particularly from interviews; and (4) mechanical prediction was superior to clinical/judgemental prediction for five major criteria.
Abstract:
An investigator may also wish to select a small subset of the X variables which give the best prediction of the Y variable. In this case, the question is how many variables should the regression equation include? One method would be to calculate the regression of Y on every subset of the X variables and choose the subset that gives the smallest mean square deviation from the regression. Most investigators, however, prefer to use a ‘stepwise multiple regression’ procedure. There are two forms of this analysis called the ‘step-up’ (or ‘forward’) method and the ‘step-down’ (or ‘backward’) method. This Statnote illustrates the use of stepwise multiple regression with reference to the scenario introduced in Statnote 24, viz., the influence of climatic variables on the growth of the crustose lichen Rhizocarpon geographicum (L.)DC.
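A minimal sketch of the 'step-up' (forward) procedure described above: at each step the X variable giving the largest reduction in the residual sum of squares is added, stopping when the gain is negligible. The data, the stopping threshold, and the function names are all illustrative; a real stepwise analysis would use an F-to-enter criterion rather than a raw tolerance:

```python
def solve(A, b):
    # Gauss-Jordan elimination for the small normal-equations system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rss(X_cols, y):
    # Residual SS after regressing y on an intercept plus the given columns.
    cols = [[1.0] * len(y)] + X_cols
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(c * yi for c, yi in zip(ci, y)) for ci in cols]
    w = solve(A, b)
    fit = [sum(wi * ci[k] for wi, ci in zip(w, cols)) for k in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))

def step_up(X, y, tol=1.0):
    # Forward selection: add the variable that most reduces the residual SS.
    chosen, remaining = [], list(range(len(X)))
    current = rss([], y)
    while remaining:
        best = min(remaining,
                   key=lambda j: rss([X[k] for k in chosen] + [X[j]], y))
        new = rss([X[k] for k in chosen + [best]], y)
        if current - new < tol:      # stop when the gain is negligible
            break
        chosen.append(best)
        remaining.remove(best)
        current = new
    return chosen

# y depends strongly on X0; X1 is effectively noise, so only X0 enters.
X = [[1, 2, 3, 4, 5, 6], [2, 1, 2, 1, 2, 1]]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
print(step_up(X, y))
```

The 'step-down' method works in reverse: start with all variables included and remove, one at a time, the variable whose deletion increases the residual sum of squares least.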
Abstract:
The aim of this research work was primarily to examine the relevance of patient parameters, ward structures, procedures and practices, in respect of the potential hazards of wound cross-infection and nasal colonisation with multiple resistant strains of Staphylococcus aureus, which it is thought might provide a useful indication of a patient's general susceptibility to wound infection. Information from a large cross-sectional survey involving 12,000 patients from some 41 hospitals and 375 wards was collected over a five-year period from 1967-72, and its validity checked before any subsequent analysis was carried out. Many environmental factors and procedures which had previously been thought (but never conclusively proved) to have an influence on wound infection or nasal colonisation rates, were assessed, and subsequently dismissed as not being significant, provided that the standard of the current range of practices and procedures is maintained and not allowed to deteriorate. Retrospective analysis revealed that the probability of wound infection was influenced by the patient's age, duration of pre-operative hospitalisation, sex, type of wound, presence and type of drain, number of patients in ward, and other special risk factors, whilst nasal colonisation was found to be influenced by the patient's age, total duration of hospitalisation, sex, antibiotics, proportion of occupied beds in the ward, average distance between bed centres and special risk factors. A multi-variate regression analysis technique was used to develop statistical models, consisting of variable patient and environmental factors which were found to have a significant influence on the risks pertaining to wound infection and nasal colonisation. 
A relationship between wound infection and nasal colonisation was then established and this led to the development of a more advanced model for predicting wound infections, taking advantage of the additional knowledge of the patient's state of nasal colonisation prior to operation.
Abstract:
In previous Statnotes, the application of correlation and regression methods to the analysis of two variables (X,Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., ‘multiple linear regression’.
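For the two-predictor case, the extension from simple correlation to multiple linear regression can be written directly in terms of the pairwise correlations; the data below are made up for illustration:

```python
import math

def pearson(a, b):
    # Ordinary product-moment correlation between two variables.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

# Illustrative data: y is approximately x1 + x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y  = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# Standardised partial regression coefficients (beta weights).
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Squared multiple correlation R^2 of y with the two predictors jointly.
R2 = beta1 * r_y1 + beta2 * r_y2
print(beta1, beta2, R2)
```

Because x1 and x2 are intercorrelated here, each beta weight is smaller than the corresponding simple correlation, illustrating why partial coefficients, not simple correlations, are needed once there is more than one X variable.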
Abstract:
The practice of evidence-based medicine involves consulting documents from repositories such as Scopus, PubMed, or the Cochrane Library. The most common approach for presenting retrieved documents is in the form of a list, with the assumption that the higher a document is on a list, the more relevant it is. Despite this list-based presentation, it is seldom studied how physicians perceive the importance of the order of documents presented in a list. This paper describes an empirical study that elicited and modeled physicians' preferences with regard to list-based results. Preferences were analyzed using a GRIP method that relies on pairwise comparisons of selected subsets of possible rank-ordered lists composed of 3 documents. The results allow us to draw conclusions regarding physicians' attitudes towards the importance of having documents ranked correctly on a result list, versus the importance of retrieving relevant but misplaced documents. Our findings should help developers of clinical information retrieval applications when deciding how retrieved documents should be presented and how performance of the application should be assessed. © 2012 Springer-Verlag Berlin Heidelberg.
Abstract:
Direct quantile regression involves estimating a given quantile of a response variable as a function of input variables. We present a new framework for direct quantile regression in which a Gaussian process model is learned by minimising the expected tilted loss function. The integration required in learning is not analytically tractable, so to speed up learning we employ the Expectation Propagation algorithm. We describe how this work relates to other quantile regression methods and apply the method to both synthetic and real data sets. The method is shown to be competitive with state-of-the-art methods while allowing for the leverage of the full Gaussian process probabilistic framework.
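The tilted ('pinball') loss that this framework minimises can be checked on the simplest possible predictor: minimising it over a constant recovers the empirical tau-quantile of the sample (for tau = 0.9 and the ten points below, any value between 9 and 10 is optimal). The data and the brute-force search are illustrative only:

```python
def tilted_loss(tau, y, f):
    # rho_tau(r) = tau*r if r >= 0 else (tau - 1)*r, with residual r = y - f.
    return sum(tau * r if r >= 0 else (tau - 1) * r
               for r in (yi - f for yi in y))

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tau = 0.9

# Brute-force search over candidate constant predictors.
candidates = [i / 100 for i in range(0, 1101)]
best = min(candidates, key=lambda f: tilted_loss(tau, y, f))
print(best)
```

The asymmetry of the loss (residuals above the prediction weighted by tau, those below by 1 - tau) is what makes the minimiser a quantile rather than the mean, which is the quantity a squared-error Gaussian process regression would target.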