908 resultados para Analysis of multiple regression
Resumo:
Multiple regression analysis is a statistical technique which allows to predict a dependent variable from m ore than one independent variable and also to determine influential independent variables. Using experimental data, in this study the multiple regression analysis is applied to predict the room mean velocity and determine the most influencing parameters on the velocity. More than 120 experiments for four different heat source locations were carried out in a test chamber with a high level wall mounted air supply terminal at air change rates 3-6 ach. The influence of the environmental parameters such as supply air momentum, room heat load, Archimedes number and local temperature ratio, were examined by two methods: a simple regression analysis incorporated into scatter matrix plots and multiple stepwise regression analysis. It is concluded that, when a heat source is located along the jet centre line, the supply momentum mainly influences the room mean velocity regardless of the plume strength. However, when the heat source is located outside the jet region, the local temperature ratio (the inverse of the local heat removal effectiveness) is a major influencing parameter.
Resumo:
A combinatorial protocol (CP) is introduced here to interface it with the multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on the restriction of entry of correlated variables to the model development stage. It has been used for the analysis of Selwood et al data set [16], and the obtained models are compared with those reported from GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 value in the range of 0.632-0.518. Also, these models are divergent and unique. Even though, the present study does not share any models with GFA [8], and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. Also a simulation is carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.
Resumo:
The aim of this research work was primarily to examine the relevance of patient parameters, ward structures, procedures and practices, in respect of the potential hazards of wound cross-infection and nasal colonisation with multiple resistant strains of Staphylococcus aureus, which it is thought might provide a useful indication of a patient's general susceptibility to wound infection. Information from a large cross-sectional survey involving 12,000 patients from some 41 hospitals and 375 wards was collected over a five-year period from 1967-72, and its validity checked before any subsequent analysis was carried out. Many environmental factors and procedures which had previously been thought (but never conclusively proved) to have an influence on wound infection or nasal colonisation rates, were assessed, and subsequently dismissed as not being significant, provided that the standard of the current range of practices and procedures is maintained and not allowed to deteriorate. Retrospective analysis revealed that the probability of wound infection was influenced by the patient's age, duration of pre-operative hospitalisation, sex, type of wound, presence and type of drain, number of patients in ward, and other special risk factors, whilst nasal colonisation was found to be influenced by the patient's age, total duration of hospitalisation, sex, antibiotics, proportion of occupied beds in the ward, average distance between bed centres and special risk factors. A multi-variate regression analysis technique was used to develop statistical models, consisting of variable patient and environmental factors which were found to have a significant influence on the risks pertaining to wound infection and nasal colonisation. A relationship between wound infection and nasal colonisation was then established and this led to the development of a more advanced model for predicting wound infections, taking advantage of the additional knowledge of the patient's state of nasal colonisation prior to operation.
Resumo:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Resumo:
When researchers introduce a new test they have to demonstrate that it is valid, using unbiased designs and suitable statistical procedures. In this article we use Monte Carlo analyses to highlight how incorrect statistical procedures (i.e., stepwise regression, extreme scores analyses) or ignoring regression assumptions (e.g., heteroscedasticity) contribute to wrong validity estimates. Beyond these demonstrations, and as an example, we re-examined the results reported by Warwick, Nettelbeck, and Ward (2010) concerning the validity of the Ability Emotional Intelligence Measure (AEIM). Warwick et al. used the wrong statistical procedures to conclude that the AEIM was incrementally valid beyond intelligence and personality traits in predicting various outcomes. In our re-analysis, we found that the reliability-corrected multiple correlation of their measures with personality and intelligence was up to .69. Using robust statistical procedures and appropriate controls, we also found that the AEIM did not predict incremental variance in GPA, stress, loneliness, or well-being, demonstrating the importance for testing validity instead of looking for it.
Resumo:
Transitional cell carcinoma (TCC) of the urothelium is often multifocal and subsequent tumors may occur anywhere in the urinary tract after the treatment of a primary carcinoma. Patients initially presenting a bladder cancer are at significant risk of developing metachronous tumors in the upper urinary tract (UUT). We evaluated the prognostic factors of primary invasive bladder cancer that may predict a metachronous UUT TCC after radical cystectomy. The records of 476 patients who underwent radical cystectomy for primary invasive bladder TCC from 1989 to 2001 were reviewed retrospectively. The prognostic factors of UUT TCC were determined by multivariate analysis using the COX proportional hazards regression model. Kaplan-Meier analysis was also used to assess the variable incidence of UUT TCC according to different risk factors. Twenty-two patients (4.6%). developed metachronous UUT TCC. Multiplicity, prostatic urethral involvement by the bladder cancer and the associated carcinoma in situ (CIS) were significant and independent factors affecting the occurrence of metachronous UUT TCC (P = 0.0425, 0.0082, and 0.0006, respectively). These results were supported, to some extent, by analysis of the UUT TCC disease-free rate by the Kaplan-Meier method, whereby patients with prostatic urethral involvement or with associated CIS demonstrated a significantly lower metachronous UUT TCC disease-free rate than patients without prostatic urethral involvement or without associated CIS (log-rank test, P = 0.0116 and 0.0075, respectively). Multiple tumors, prostatic urethral involvement and associated CIS were risk factors for metachronous UUT TCC, a conclusion that may be useful for designing follow-up strategies for primary invasive bladder cancer after radical cystectomy.
Resumo:
This thesis Entitled “modelling and analysis of recurrent event data with multiple causes.Survival data is a term used for describing data that measures the time to occurrence of an event.In survival studies, the time to occurrence of an event is generally referred to as lifetime.Recurrent event data are commonly encountered in longitudinal studies when individuals are followed to observe the repeated occurrences of certain events. In many practical situations, individuals under study are exposed to the failure due to more than one causes and the eventual failure can be attributed to exactly one of these causes.The proposed model was useful in real life situations to study the effect of covariates on recurrences of certain events due to different causes.In Chapter 3, an additive hazards model for gap time distributions of recurrent event data with multiple causes was introduced. The parameter estimation and asymptotic properties were discussed .In Chapter 4, a shared frailty model for the analysis of bivariate competing risks data was presented and the estimation procedures for shared gamma frailty model, without covariates and with covariates, using EM algorithm were discussed. In Chapter 6, two nonparametric estimators for bivariate survivor function of paired recurrent event data were developed. The asymptotic properties of the estimators were studied. The proposed estimators were applied to a real life data set. Simulation studies were carried out to find the efficiency of the proposed estimators.
Resumo:
Objective: To identify potential prognostic factors for pulmonary thromboembolism (PTE), establishing a mathematical model to predict the risk for fatal PTE and nonfatal PTE.Method: the reports on 4,813 consecutive autopsies performed from 1979 to 1998 in a Brazilian tertiary referral medical school were reviewed for a retrospective study. From the medical records and autopsy reports of the 512 patients found with macroscopically and/or microscopically,documented PTE, data on demographics, underlying diseases, and probable PTE site of origin were gathered and studied by multiple logistic regression. Thereafter, the jackknife method, a statistical cross-validation technique that uses the original study patients to validate a clinical prediction rule, was performed.Results: the autopsy rate was 50.2%, and PTE prevalence was 10.6%. In 212 cases, PTE was the main cause of death (fatal PTE). The independent variables selected by the regression significance criteria that were more likely to be associated with fatal PTE were age (odds ratio [OR], 1.02; 95% confidence interval [CI], 1.00 to 1.03), trauma (OR, 8.5; 95% CI, 2.20 to 32.81), right-sided cardiac thrombi (OR, 1.96; 95% CI, 1.02 to 3.77), pelvic vein thrombi (OR, 3.46; 95% CI, 1.19 to 10.05); those most likely to be associated with nonfatal PTE were systemic arterial hypertension (OR, 0.51; 95% CI, 0.33 to 0.80), pneumonia (OR, 0.46; 95% CI, 0.30 to 0.71), and sepsis (OR, 0.16; 95% CI, 0.06 to 0.40). The results obtained from the application of the equation in the 512 cases studied using logistic regression analysis suggest the range in which logit p > 0.336 favors the occurrence of fatal PTE, logit p < - 1.142 favors nonfatal PTE, and logit P with intermediate values is not conclusive. The cross-validation prediction misclassification rate was 25.6%, meaning that the prediction equation correctly classified the majority of the cases (74.4%).Conclusions: Although the usefulness of this method in everyday medical practice needs to be confirmed by a prospective study, for the time being our results suggest that concerning prevention, diagnosis, and treatment of PTE, strict attention should be given to those patients presenting the variables that are significant in the logistic regression model.
Resumo:
In this thesis, we consider Bayesian inference on the detection of variance change-point models with scale mixtures of normal (for short SMN) distributions. This class of distributions is symmetric and thick-tailed and includes as special cases: Gaussian, Student-t, contaminated normal, and slash distributions. The proposed models provide greater flexibility to analyze a lot of practical data, which often show heavy-tail and may not satisfy the normal assumption. As to the Bayesian analysis, we specify some prior distributions for the unknown parameters in the variance change-point models with the SMN distributions. Due to the complexity of the joint posterior distribution, we propose an efficient Gibbs-type with Metropolis- Hastings sampling algorithm for posterior Bayesian inference. Thereafter, following the idea of [1], we consider the problems of the single and multiple change-point detections. The performance of the proposed procedures is illustrated and analyzed by simulation studies. A real application to the closing price data of U.S. stock market has been analyzed for illustrative purposes.
Resumo:
In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. In addition to recording TOD, the cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also identified for use as the independent variables in the regression analysis. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajectory parame- ters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowledge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace.
Resumo:
The purposes of this study were (1) to validate of the item-attribute matrix using two levels of attributes (Level 1 attributes and Level 2 sub-attributes), and (2) through retrofitting the diagnostic models to the mathematics test of the Trends in International Mathematics and Science Study (TIMSS), to evaluate the construct validity of TIMSS mathematics assessment by comparing the results of two assessment booklets. Item data were extracted from Booklets 2 and 3 for the 8th grade in TIMSS 2007, which included a total of 49 mathematics items and every student's response to every item. The study developed three categories of attributes at two levels: content, cognitive process (TIMSS or new), and comprehensive cognitive process (or IT) based on the TIMSS assessment framework, cognitive procedures, and item type. At level one, there were 4 content attributes (number, algebra, geometry, and data and chance), 3 TIMSS process attributes (knowing, applying, and reasoning), and 4 new process attributes (identifying, computing, judging, and reasoning). At level two, the level 1 attributes were further divided into 32 sub-attributes. There was only one level of IT attributes (multiple steps/responses, complexity, and constructed-response). Twelve Q-matrices (4 originally specified, 4 random, and 4 revised) were investigated with eleven Q-matrix models (QM1 ~ QM11) using multiple regression and the least squares distance method (LSDM). Comprehensive analyses indicated that the proposed Q-matrices explained most of the variance in item difficulty (i.e., 64% to 81%). The cognitive process attributes contributed to the item difficulties more than the content attributes, and the IT attributes contributed much more than both the content and process attributes. The new retrofitted process attributes explained the items better than the TIMSS process attributes. Results generated from the level 1 attributes and the level 2 attributes were consistent. Most attributes could be used to recover students' performance, but some attributes' probabilities showed unreasonable patterns. The analysis approaches could not demonstrate if the same construct validity was supported across booklets. The proposed attributes and Q-matrices explained the items of Booklet 2 better than the items of Booklet 3. The specified Q-matrices explained the items better than the random Q-matrices.
Resumo:
Bibliographical footnotes.
Resumo:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.