26 results for Multiple Additive Regression Trees (MART)
in Aston University Research Archive
Abstract:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure, and should know the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study. This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
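The rule of thumb quoted above (5 to 10 subjects per variable) can be turned into a quick planning check. The following is a minimal sketch only; the function names `subjects_needed` and `meets_rule` are hypothetical and do not come from the abstract, and the rule itself is, as the abstract says, only a rough guide.

```python
def subjects_needed(n_variables, per_variable=(5, 10)):
    """Return the (low, high) number of subjects suggested by the rule of thumb."""
    low, high = per_variable
    return n_variables * low, n_variables * high


def meets_rule(n_subjects, n_variables, per_variable=5):
    """True if the sample has at least `per_variable` subjects per variable."""
    return n_subjects >= per_variable * n_variables


# Adding one more variable raises the suggested sample by 5 to 10 subjects.
print(subjects_needed(6))   # suggested range for six variables
print(subjects_needed(7))   # suggested range after adding a seventh
```

The second call illustrates the abstract's point that each extra variable, even an unimportant one, increases the sample size the rule asks for.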
Abstract:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Abstract:
In previous Statnotes, the application of correlation and regression methods to the analysis of two variables (X, Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., 'multiple linear regression'.
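To make the extension from one X variable to several concrete, here is a minimal sketch of an ordinary least squares fit with two X variables, using NumPy's `numpy.linalg.lstsq`. The function name `fit_multiple_regression` and the demonstration data are invented for illustration; they are not taken from the Statnote.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares.

    Returns the coefficient vector (intercept first) and R squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation about the mean
    return coef, 1.0 - ss_res / ss_tot


# Made-up, noise-free data generated from y = 1 + 2*x1 - 3*x2.
X_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
y_demo = 1 + 2 * X_demo[:, 0] - 3 * X_demo[:, 1]
coef_demo, r2_demo = fit_multiple_regression(X_demo, y_demo)
print(coef_demo, r2_demo)
```

Because the demonstration data contain no noise, the fit recovers the generating coefficients and R squared is 1 up to floating-point error; real data would give a lower value. Significance tests for the individual coefficients are not included in this sketch.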
Abstract:
In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principal objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. 'Principal components analysis' (PCA) and 'factor analysis' (FA) are also methods of examining the relationships between different variables, but they differ from multiple regression in that no distinction is made between the dependent and independent variables, all variables being treated essentially the same. Originally, PCA and FA were regarded as distinct methods, but in recent times they have been combined into a single analysis, PCA often being the first stage of an FA. The basic objective of a PCA/FA is to examine the relationships between the variables, or the 'structure' of the variables, and to determine whether these relationships can be explained by a smaller number of 'factors'. This Statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.
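The idea of explaining several variables with a smaller number of components can be sketched with a PCA computed from the singular value decomposition of the centred data. This is an illustration under stated assumptions: the function `pca` and the synthetic data are hypothetical, the analysis uses the covariance (not correlation) structure, and the rotation step of a full factor analysis is not shown.

```python
import numpy as np


def pca(data, n_components):
    """Principal components of `data` (rows = observations, columns = variables)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes; s holds the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = s ** 2 / (len(data) - 1)
    ratio = explained_variance / explained_variance.sum()
    scores = centered @ vt[:n_components].T   # observations projected on the axes
    return scores, vt[:n_components], ratio[:n_components]


# Synthetic data: three variables driven by one latent factor plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
demo = np.hstack([latent, 2 * latent, -latent]) + 0.01 * rng.normal(size=(50, 3))
scores, axes, ratio = pca(demo, n_components=1)
print(ratio)  # share of total variance carried by the first component
```

In this constructed case almost all of the variance falls on the first component, which is the situation in which a single 'factor' explains the relationships among the variables.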
Abstract:
BACKGROUND: In the light of sub-optimal uptake of the measles, mumps, and rubella (MMR) vaccination, we investigated the factors that influence the intentions of mothers to vaccinate. METHOD: A cross-sectional survey of 300 mothers in Birmingham with children approaching a routine MMR vaccination was conducted using a postal questionnaire to measure: intention to vaccinate, psychological variables, knowledge of the vaccine, and socioeconomic status. The vaccination status of the children was obtained from South Birmingham Child Health Surveillance Unit. RESULTS: The response rate was 59%. Fewer mothers approaching the second MMR vaccination (Group 2) intended to take their children for this vaccination than Group 1 (mothers approaching the first MMR vaccination) (Mann-Whitney U = 2180, P < 0.0001). Group 2 expressed more negative beliefs about the outcome of having the MMR vaccine ('vaccine outcome beliefs') (Mann-Whitney U = 2155, P < 0.0001), were more likely to believe it was 'unsafe' (χ² = 9.114, P = 0.004) and that it rarely protected (χ² = 6.882, P = 0.014) than Group 1. The commonest side-effect cited was general malaise, but 29.8% cited autism. The most trusted source of information was the general practitioner but the most common source of information on side-effects was television (34.6%). Multiple linear regression revealed that, in Group 1, only 'vaccine outcome beliefs' significantly predicted intention (77.1% of the variance). In Group 2 'vaccine outcome beliefs', attitude to the MMR vaccine, and prior MMR status all predicted intention (93% of the variance). CONCLUSION: A major reason for the low uptake of the MMR vaccination is that it is not perceived to be important for children's health, particularly the second dose. Health education from GPs is likely to have a considerable impact.
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are 2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
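As a rough illustration of an entropy built from co-occurrences, the sketch below counts pairs of marked points lying within a chosen distance, treats the counts per mark pair as a multinomial distribution, and returns its Shannon entropy. This is an assumption-laden simplification restricted to order 2 (pairs); the paper's exact definition and its higher-order version are not given in the abstract, and the function name `cooccurrence_entropy` and the `radius` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, marks, radius):
    """Shannon entropy (natural log) of mark-pair co-occurrences within `radius`."""
    counts = Counter()
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            # Unordered pair of marks, e.g. ("a", "b") regardless of point order.
            counts[tuple(sorted((marks[i], marks[j])))] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no co-occurrences at this scale
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Made-up example: two close pairs, one mixed ("a", "b") and one uniform ("a", "a").
demo_points = [(0, 0), (0, 1), (5, 5), (5, 6)]
demo_marks = ["a", "b", "a", "a"]
print(cooccurrence_entropy(demo_points, demo_marks, radius=1.5))
```

Low entropy means a few kinds of co-occurrence dominate at that scale; repeating the calculation over several radii gives a simple exploratory profile.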
Abstract:
How does a firm choose a proper mode of foreign direct investment (FDI) for entering a foreign market? Which mode of entry performs better? What are the performance implications of joint venture (JV) ownership structure? These important questions face a multinational enterprise (MNE) that decides to enter a foreign market. However, few studies have been conducted on such issues, and no consistent or conclusive findings have been generated, especially with respect to China. This work is composed of five chapters, providing corresponding answers to the questions given above. Specifically, Chapter One is an overall introductory chapter. Chapter Two is about the choice of entry mode of FDI in China. Chapter Three examines the relationship between four main entry modes and performance. Chapter Four explores the performance implications of JV ownership structure. Chapter Five is an overall concluding chapter. These empirical studies are based on the most recent and richest data, which have never been explored in previous studies. The data contain information on 11,765 foreign-invested enterprises in China in seven manufacturing industries in 2000, 10,757 in 1999, and 10,666 in 1998. The four FDI entry modes examined include wholly-owned enterprises (WOEs), equity joint ventures (EJVs), contractual joint ventures (CJVs), and joint stock companies (JSCs). In Chapter Two, a multinomial logit model is established, and techniques of multiple linear regression analysis are employed in Chapters Three and Four. It was found that MNEs, under the conditions of a good investment environment, large capital commitment and small cultural distance, prefer the WOE strategy. If these conditions are not met, the EJV mode would be of greater use. The relative propensity to pursue the CJV mode increases with a good investment environment, small capital commitment, and small cultural distance.
JSCs are not favoured by MNEs when the investment environment improves and when affiliates are located in the coastal areas. MNEs have been found to have a greater preference for an EJV as a mode of entry into the Chinese market in all industries. It is also found that in terms of return on assets (ROA) and asset turnover, WOEs perform the best, followed by EJVs, CJVs, and JSCs. Finally, minority-owned EJVs or JSCs are found to outperform their majority-owned counterparts in terms of ROA and asset turnover.
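For readers unfamiliar with the multinomial logit model mentioned above, the following minimal sketch shows how such a model converts a linear predictor for each entry mode into choice probabilities. The function name `multinomial_logit_probs` and all numeric values are hypothetical illustrations; no coefficients are reported in the abstract.

```python
import math


def multinomial_logit_probs(utilities):
    """Map {mode: linear predictor} to {mode: probability} via the softmax."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(utilities.values())
    exp_u = {mode: math.exp(u - peak) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}


# Invented linear predictors, with WOE as the baseline (predictor fixed at 0).
demo_utilities = {"WOE": 0.0, "EJV": 0.8, "CJV": -0.5, "JSC": -1.2}
demo_probs = multinomial_logit_probs(demo_utilities)
print(demo_probs)
```

In a fitted model each linear predictor would be a weighted sum of covariates such as investment environment, capital commitment and cultural distance; here the values are simply made up to show the mechanics.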
Abstract:
This book is aimed primarily at microbiologists who are undertaking research and who require a basic knowledge of statistics to analyse their experimental data. Computer software employing a wide range of data analysis methods is widely available to experimental scientists. The availability of this software, however, makes it essential that investigators understand the basic principles of statistics. Statistical analysis of data can be complex with many different methods of approach, each of which applies in a particular experimental circumstance. Hence, it is possible to apply an incorrect statistical method to data and to draw the wrong conclusions from an experiment. The purpose of this book, which has its origin in a series of articles published in the Society for Applied Microbiology journal ‘The Microbiologist’, is an attempt to present the basic logic of statistics as clearly as possible and therefore, to dispel some of the myths that often surround the subject. The 28 ‘Statnotes’ deal with various topics that are likely to be encountered, including the nature of variables, the comparison of means of two or more groups, non-parametric statistics, analysis of variance, correlating variables, and more complex methods such as multiple linear regression and principal components analysis. In each case, the relevant statistical method is illustrated with examples drawn from experiments in microbiological research. The text incorporates a glossary of the most commonly used statistical terms and there are two appendices designed to aid the investigator in the selection of the most appropriate test.
Abstract:
The Aston Eye Study (AES) was instigated in October 2005 to determine the distribution of refractive error and associated ocular biometry in a sample of UK urban school children. The AES is the first study to compare outcome measures separately in White, South Asian and Black children. Children were selected from two age groups (Year 2 children aged 6/7 years, Year 8 children aged 12/13 years) using random cluster sampling of schools in Birmingham, West Midlands, UK. To date, the AES has examined 598 children (302 Year 2, 296 Year 8). Using open-field cycloplegic autorefraction, the overall prevalence of myopia (≤ -0.50 D SER in either eye) determined was 19.6%, with a higher prevalence in older (29.4%) compared to younger (9.9%) children (p<0.001). Using multiple logistic regression models, the risk of myopia was higher in Year 8 South Asian compared to White children and higher in children attending grammar schools relative to comprehensive schools. In addition, the prevalence of uncorrected ametropia was found to be high (Year 8: 12.84%, Year 2: 15.23%), which will be of concern to bodies responsible for the implementation of school vision screening strategies. Biometric data using non-contact partial coherence interferometry revealed a contributory effect of axial length (AL) and central corneal radius (CR) on myopic refraction, resulting in a strong coefficient of determination of the AL/CR ratio on refractive error. Ocular biometric measures did not vary significantly as a function of ethnicity, suggesting a greater miscorrelation of components in susceptible ethnic groups to account for their higher myopia prevalence. Corneal radius was found to be steeper in myopes in both age groups, but was found to flatten with increasing axial length. Due to the inextricable link between myopia and axial elongation, the paradoxical finding of the cornea demands further longitudinal investigation, particularly in relation to myopia onset.
Questionnaire analysis revealed a history of myopia in parents and siblings to be significantly associated with myopia in Year 8 children, with a dose-dependent rise in the odds ratio of myopia evident with increasing number of myopic parents. By classifying socioeconomic status (SES) using Index of Multiple Deprivation values, it was found that Year 8 children from moderately deprived backgrounds were more at risk of myopia compared with children located at both extremities of the deprivation spectrum. However, the main effect of SES weakened following multivariate analysis, with South Asian ethnicity and grammar schooling remaining associated with Year 8 myopia after adjustment.
Abstract:
Background/Aim - People of south Asian origin have an excessive risk of morbidity and mortality from cardiovascular disease. We examined the effect of ethnicity on known risk factors and analysed the risk of cardiovascular events and mortality in UK south Asian and white European patients with type 2 diabetes over a 2 year period. Methods - A total of 1486 south Asian (SA) and 492 white European (WE) subjects with type 2 diabetes were recruited from 25 general practices in Coventry and Birmingham, UK. Baseline data included clinical history, anthropometry and measurements of traditional risk factors – blood pressure, total cholesterol, HbA1c. Multiple linear regression models were used to examine ethnicity differences in individual risk factors. Ten-year cardiovascular risk was estimated using the Framingham and UKPDS equations. All subjects were followed up for 2 years. Cardiovascular events (CVD) and mortality between the two groups were compared. Findings - Significant differences were noted in risk profiles between both groups. After adjustment for clustering and confounding a significant ethnicity effect remained only for higher HbA1c (0.50 [0.22 to 0.77]; P = 0.0004) and lower HDL (-0.09 [-0.17 to -0.01]; P = 0.0266). Baseline CVD history was predictive of CVD events during follow-up for SA (P < 0.0001) but not WE (P = 0.189). Mean age at death was 66.8 (11.8) for SA vs. 74.2 (12.1) for WE, a difference of 7.4 years (95% CI 1.0 to 13.7 years), P = 0.023. The adjusted odds ratio of CVD event or death from CVD was greater but not significantly so in SA than in WE (OR 1.4 [0.9 to 2.2]). Limitations - Fewer events in both groups and short period of follow-up are key limitations. Longer follow-up is required to see if the observed differences between the ethnic groups persist. Conclusion - South Asian patients with type 2 diabetes in the UK have a higher cardiovascular risk and present with cardiovascular events at a significantly younger age than white Europeans.
Enhanced and ethnicity specific targets and effective treatments are needed if these inequalities are to be reduced.
Abstract:
Objective - This study investigated and compared the prevalence of microalbuminuria and overt proteinuria and their determinants in a cohort of UK resident patients of white European or south Asian ethnicity with type 2 diabetes mellitus. Research design and methods - A total of 1978 patients, comprising 1486 of south Asian and 492 of white European ethnicity, in 25 general practices in Coventry and Birmingham inner city areas in England were studied in a cross-sectional study. Demographic and risk factor data were collected and presence of microalbuminuria and overt proteinuria assessed. Main outcome measures - Prevalences of microalbuminuria and overt proteinuria. Results - Urinary albumin:creatinine measurements were available for 1852 (94%) patients. The south Asian group had a lower prevalence of microalbuminuria, 19% vs. 23%, and a higher prevalence of overt proteinuria, 8% vs. 3%, χ² = 15.85, 2 df, P = 0.0004. In multiple logistic regression models, adjusted for confounding factors, significantly increased risk for the south Asian vs. white European patients for overt proteinuria was shown; OR (95% CI) 2.17 (1.05, 4.49), P = 0.0365. For microalbuminuria, an interaction effect for ethnicity and duration of diabetes suggested that risk for south Asian patients was lower in early years following diagnosis; OR for SA vs. WH at durations 0 and 1 year were 0.56 (0.37, 0.86) and 0.59 (0.39, 0.89) respectively. After 20 years' duration, OR = 1.40 (0.63, 3.08). Limitations - Comparability of ethnicity defined groups; statistical methods controlled for differences between groups, but residual confounding may remain. Analyses are based on a single measure of albumin:creatinine ratio. Conclusions - There were significant differences between ethnicity groups in risk factor profiles and microalbuminuria and overt proteinuria outcomes.
Whilst south Asian patients had no excess risk of microalbuminuria, the risk of overt proteinuria was elevated significantly, which might be explained by faster progression of renal dysfunction in patients of south Asian ethnicity.
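The odds ratios and 95% confidence intervals above come from adjusted multiple logistic regression models, which cannot be reproduced from the abstract. As a simpler illustration of how an odds ratio and its interval are read, the sketch below computes an unadjusted odds ratio from a 2x2 table using the standard log-odds (Woolf) approximation. The function name `odds_ratio_ci` and the counts are hypothetical and are not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and approximate 95% CI from a 2x2 table.

    a, b = cases and non-cases in group 1; c, d = cases and non-cases in group 2.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: 20 of 100 affected in group 1, 10 of 100 in group 2.
demo_or, demo_low, demo_high = odds_ratio_ci(20, 80, 10, 90)
print(demo_or, demo_low, demo_high)
```

An interval that excludes 1, such as the reported 2.17 (1.05, 4.49), indicates a statistically significant difference in odds at the 5% level, whereas one that includes 1, such as 1.40 (0.63, 3.08), does not.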
Abstract:
Purpose - Anterior segment optical coherence tomography (AS-OCT) is used to further examine previous reports that ciliary muscle thickness (CMT) is increased in myopic eyes. With reference to temporal and nasal CMT, interrelationships between biometric and morphological characteristics of anterior and posterior segments are analysed for British-White and British-South-Asian adults with and without myopia. Methods - Data are presented for the right eyes of 62 subjects (British-White n = 39, British-South-Asian n = 23, aged 18–40 years) with a range of refractive error (mean spherical error (MSE (D)) -1.74 ± 3.26; range -10.06 to +4.38) and separated into myopes (MSE (D) < -0.50, range -10.06 to -0.56; n = 30) and non-myopes (MSE (D) ≥ -0.50, range -0.50 to +4.38; n = 32). Temporal and nasal ciliary muscle cross-sections were imaged using a Visante AS-OCT. Using Visante software, manual measures of nasal and temporal CMT (NCMT and TCMT respectively) were taken in successive posterior 1 mm steps from the scleral spur over a 3 mm distance (designated NCMT1, TCMT1 et seq). Measures of axial length and anterior chamber depth were taken with an IOLMaster biometer. MSE and corneal curvature (CC) measurements were taken with a Shin-Nippon auto-refractor. Magnetic resonance imaging was used to determine total ocular volume (OV) for 31 of the original subject group. Statistical comparisons and analyses were made using mixed repeated measures ANOVAs, Pearson's correlation coefficient and stepwise forward multiple linear regression. Results - MSE was significantly associated with CMT, with thicker CMT2 and CMT3 being found in the myopic eyes (p = 0.002). In non-myopic eyes TCMT1, TCMT2, NCMT1 and NCMT2 correlated significantly with MSE, AL and OV (p < 0.05). In contrast, myopic eyes failed generally to exhibit a significant correlation between CMT, MSE and axial length but notably retained a significant correlation between OV, TCMT2, TCMT3, NCMT2 and NCMT3 (p < 0.05).
OV was found to be a significantly better predictor of TCMT2 and TCMT3 than AL by approximately a factor of two (p < 0.001). Anterior chamber depth was significantly associated with both temporal and nasal CMT2 and CMT3; TCMT1 correlated positively with CC. Ethnicity had no significant effect on differences in CMT. Conclusions - Increased CMT is associated with myopia. We speculate that the lack of correlation in myopic subjects between CMT and axial length, but not between CMT and OV, is evidence that disrupted feedback between the fovea and ciliary apparatus occurs in myopia development.
Abstract:
The purpose of this study was to investigate cortisol levels as a function of the hypothalamic-pituitary-adrenal axis (HPA) in relation to alexithymia in patients with somatoform disorders (SFD). Diurnal salivary cortisol was sampled in 32 patients with SFD who also underwent a psychiatric examination and filled in questionnaires (Toronto Alexithymia Scale, TAS scale; Screening for Somatoform Symptoms, SOMS scale; Hamilton Depression Scale, HAMD). The mean TAS total score in the sample was 55.6 ± 9.6, 32% of patients being classified as alexithymic on the basis of their TAS scores. Depression scores were moderate (HAMD = 13.2, Beck Depression Inventory, BDI = 16.5). The patients' alexithymia scores (TAS scale Difficulty identifying feelings) correlated significantly positively with their somatization scale scores (Symptom Checklist-90 Revised, SCL-90-R); r = 0.3438 (P < 0.05) and their scores on the Global Severity Index (GSI) on the SCL-90-R; r = 0.781 (P < 0.01). Regression analysis was performed with cortisol variables as the dependent variables. Cortisol levels [measured by the area under the curve-ground (AUC-G), area under the curve-increase (AUC-I) and morning cortisol (MCS)] were best predicted in a multiple linear regression model by lower depressive scores (HAMD) and more psychopathological symptoms (SCL-90-R). No significant correlations were found between the patients' alexithymia scores (TAS) and cortisol levels. The healthy control group (n = 25) demonstrated significantly higher cortisol levels than did the patients with SFD; in both tests P < 0.001 for AUC-G and AUC-I. However, the two groups did not differ in terms of their mean morning cortisol levels (P > 0.05). The results suggest that pre-existing hypocortisolism might possibly be associated with SFD.
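The two area-under-the-curve summaries named above are commonly computed with the trapezoidal rule: AUC-G is the total area with respect to zero, and AUC-I subtracts the rectangle defined by the first sample, leaving only the change from baseline. The sketch below assumes that convention; the abstract does not state which formulas were used, and the function names and sample values are hypothetical.

```python
def auc_ground(times, values):
    """Area under the curve with respect to ground (zero), trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value samples")
    return sum(
        (values[i] + values[i + 1]) / 2 * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def auc_increase(times, values):
    """Area under the curve with respect to the first (baseline) sample."""
    return auc_ground(times, values) - values[0] * (times[-1] - times[0])


# Invented samples: minutes after waking and cortisol in arbitrary units.
demo_times = [0, 30, 60]
demo_cortisol = [10, 20, 10]
print(auc_ground(demo_times, demo_cortisol), auc_increase(demo_times, demo_cortisol))
```

AUC-I can be negative when levels fall below the first sample, which is why the two measures are usually reported together.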
Abstract:
The paper proposes an ISE (Information goal, Search strategy, Evaluation threshold) user classification model based on Information Foraging Theory for understanding user interaction with content-based image retrieval (CBIR). The proposed model is verified by a multiple linear regression analysis based on 50 users' interaction features collected from a task-based user study of interactive CBIR systems. To the best of our knowledge, this is the first principled user classification model in CBIR verified by a formal and systematic qualitative analysis of extensive user interaction data. Copyright 2010 ACM.