834 results for multiple regression analysis


Relevance:

100.00%

Abstract:

Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure, together with knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study. This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
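The sample-size rule of thumb quoted above can be sketched in a few lines of Python. This is only an illustration of the 5-10 subjects-per-variable guide; the function name and the example numbers (60 subjects, 8 variables) are hypothetical and do not come from the abstract.

```python
def subjects_per_variable_check(n_subjects, n_variables, low=5, high=10):
    """Apply the rough 5-10 subjects-per-variable guide (an assumption-laden rule of thumb)."""
    minimum_lenient = low * n_variables    # 5 subjects per variable
    minimum_strict = high * n_variables    # 10 subjects per variable
    return {
        "minimum_lenient": minimum_lenient,
        "minimum_strict": minimum_strict,
        "meets_lenient": n_subjects >= minimum_lenient,
        "meets_strict": n_subjects >= minimum_strict,
    }


# Hypothetical study: 60 subjects and 8 candidate variables.
check = subjects_per_variable_check(n_subjects=60, n_variables=8)
print(check)
```

Under this guide each extra variable raises the suggested minimum by 5 to 10 subjects, which is why an unimportant variable is costly.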

Relevance:

100.00%

Abstract:

Indoor residual spraying (IRS) has become an increasingly popular method of insecticide use for malaria control, and many recent studies have reported on its effectiveness in reducing malaria burden in a single community or region. There is a need for systematic review and integration of the published literature on IRS and the contextual determining factors of its success in controlling malaria. This study reports the findings of a meta-regression analysis based on 13 published studies, which were chosen from more than 400 articles through a systematic search and selection process. The summary relative risk for reducing malaria prevalence was 0.38 (95% confidence interval = 0.31-0.46), which indicated a risk reduction of 62%. However, an excessive degree of heterogeneity was found between the studies. The meta-regression analysis indicates that IRS is more effective with high initial prevalence, multiple rounds of spraying, use of DDT, and in regions with a combination of Plasmodium falciparum and P. vivax malaria.
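As a reading aid, the 62% figure follows directly from the summary relative risk (RR). The same arithmetic applied to the confidence limits is shown for orientation only; those derived percentages are not stated in the abstract.

```latex
\text{risk reduction} = (1 - RR) \times 100\% = (1 - 0.38) \times 100\% = 62\%

% Same arithmetic applied to the 95% confidence limits of RR (0.31 to 0.46):
(1 - 0.46) \times 100\% = 54\% \quad \text{to} \quad (1 - 0.31) \times 100\% = 69\%
```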

Relevance:

100.00%

Abstract:

Objective: To identify potential prognostic factors for pulmonary thromboembolism (PTE), establishing a mathematical model to predict the risk for fatal PTE and nonfatal PTE. Method: The reports on 4,813 consecutive autopsies performed from 1979 to 1998 in a Brazilian tertiary referral medical school were reviewed for a retrospective study. From the medical records and autopsy reports of the 512 patients found with macroscopically and/or microscopically documented PTE, data on demographics, underlying diseases, and probable PTE site of origin were gathered and studied by multiple logistic regression. Thereafter, the jackknife method, a statistical cross-validation technique that uses the original study patients to validate a clinical prediction rule, was performed. Results: The autopsy rate was 50.2%, and PTE prevalence was 10.6%. In 212 cases, PTE was the main cause of death (fatal PTE). The independent variables selected by the regression significance criteria that were more likely to be associated with fatal PTE were age (odds ratio [OR], 1.02; 95% confidence interval [CI], 1.00 to 1.03), trauma (OR, 8.5; 95% CI, 2.20 to 32.81), right-sided cardiac thrombi (OR, 1.96; 95% CI, 1.02 to 3.77), and pelvic vein thrombi (OR, 3.46; 95% CI, 1.19 to 10.05); those most likely to be associated with nonfatal PTE were systemic arterial hypertension (OR, 0.51; 95% CI, 0.33 to 0.80), pneumonia (OR, 0.46; 95% CI, 0.30 to 0.71), and sepsis (OR, 0.16; 95% CI, 0.06 to 0.40). The results obtained from the application of the equation in the 512 cases studied using logistic regression analysis suggest that logit p > 0.336 favors the occurrence of fatal PTE, logit p < -1.142 favors nonfatal PTE, and logit p with intermediate values is not conclusive.
The cross-validation prediction misclassification rate was 25.6%, meaning that the prediction equation correctly classified the majority of the cases (74.4%). Conclusions: Although the usefulness of this method in everyday medical practice needs to be confirmed by a prospective study, for the time being our results suggest that, concerning prevention, diagnosis, and treatment of PTE, strict attention should be given to those patients presenting the variables that are significant in the logistic regression model.
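The three-way decision rule reported in this abstract can be sketched as follows. The two cutoffs are taken from the abstract; the function names are hypothetical. The fitted prediction equation itself is not given in the abstract, so the sketch starts from an already computed logit value. The conversion to a probability assumes that "logit p" denotes the natural-log odds, which the abstract does not state explicitly.

```python
import math

FATAL_CUTOFF = 0.336       # logit p above this favors fatal PTE (from the abstract)
NONFATAL_CUTOFF = -1.142   # logit p below this favors nonfatal PTE (from the abstract)


def classify_logit(logit_p):
    """Apply the reported cutoffs to an already computed logit value."""
    if logit_p > FATAL_CUTOFF:
        return "fatal PTE"
    if logit_p < NONFATAL_CUTOFF:
        return "nonfatal PTE"
    return "not conclusive"


def logit_to_probability(logit_p):
    """Inverse logit, assuming logit p = ln(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-logit_p))


# Hypothetical logit values, for illustration only.
for value in (0.5, 0.0, -2.0):
    print(value, classify_logit(value), round(logit_to_probability(value), 3))
```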

Relevance:

100.00%

Abstract:

The purposes of this study were (1) to validate the item-attribute matrix using two levels of attributes (Level 1 attributes and Level 2 sub-attributes), and (2) through retrofitting the diagnostic models to the mathematics test of the Trends in International Mathematics and Science Study (TIMSS), to evaluate the construct validity of the TIMSS mathematics assessment by comparing the results of two assessment booklets. Item data were extracted from Booklets 2 and 3 for the 8th grade in TIMSS 2007, which included a total of 49 mathematics items and every student's response to every item. The study developed three categories of attributes at two levels: content, cognitive process (TIMSS or new), and comprehensive cognitive process (or IT), based on the TIMSS assessment framework, cognitive procedures, and item type. At level one, there were 4 content attributes (number, algebra, geometry, and data and chance), 3 TIMSS process attributes (knowing, applying, and reasoning), and 4 new process attributes (identifying, computing, judging, and reasoning). At level two, the level 1 attributes were further divided into 32 sub-attributes. There was only one level of IT attributes (multiple steps/responses, complexity, and constructed-response). Twelve Q-matrices (4 originally specified, 4 random, and 4 revised) were investigated with eleven Q-matrix models (QM1 to QM11) using multiple regression and the least squares distance method (LSDM). Comprehensive analyses indicated that the proposed Q-matrices explained most of the variance in item difficulty (i.e., 64% to 81%). The cognitive process attributes contributed to the item difficulties more than the content attributes, and the IT attributes contributed much more than both the content and process attributes. The new retrofitted process attributes explained the items better than the TIMSS process attributes. Results generated from the level 1 attributes and the level 2 attributes were consistent.
Most attributes could be used to recover students' performance, but some attributes' probabilities showed unreasonable patterns. The analysis approaches could not demonstrate whether the same construct validity was supported across booklets. The proposed attributes and Q-matrices explained the items of Booklet 2 better than the items of Booklet 3. The specified Q-matrices explained the items better than the random Q-matrices.

Relevance:

100.00%

Abstract:

Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.

Relevance:

100.00%

Abstract:

An investigator may also wish to select a small subset of the X variables that gives the best prediction of the Y variable. In this case, the question is how many variables the regression equation should include. One method would be to calculate the regression of Y on every subset of the X variables and choose the subset that gives the smallest mean square deviation from the regression. Most investigators, however, prefer to use a ‘stepwise multiple regression’ procedure. There are two forms of this analysis, called the ‘step-up’ (or ‘forward’) method and the ‘step-down’ (or ‘backward’) method. This Statnote illustrates the use of stepwise multiple regression with reference to the scenario introduced in Statnote 24, viz., the influence of climatic variables on the growth of the crustose lichen Rhizocarpon geographicum (L.) DC.
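The ‘step-up’ idea can be sketched in plain Python as follows. This is a minimal illustration, not the procedure used in the Statnote: statistical packages usually decide entry with an F-to-enter or p-value test, whereas this sketch swaps in a simpler stopping rule (stop when the residual mean square no longer falls by at least 5%). The variable names x1 to x3, the data, and the threshold are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def residual_mean_square(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns RSS / (n - p)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, row))) ** 2
              for i, row in enumerate(design))
    return rss / (n - p)


def forward_selection(predictors, y, min_improvement=0.05):
    """Add, one at a time, the predictor that most reduces the residual mean square."""
    selected = []
    remaining = sorted(predictors)
    current = residual_mean_square([], y)
    while remaining:
        scores = {name: residual_mean_square([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] <= min_improvement * current:
            break  # assumed stopping rule: improvement too small to justify another variable
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


# Hypothetical data: y depends on x1 and x2, while x3 is unrelated to y.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, -1, 1, 1, -1, -1, 1],
    "x3": [1, 1, -1, -1, -1, -1, 1, 1],
}
y = [5.5, 0.5, 3.5, 10.5, 12.5, 9.5, 10.5, 19.5]
print(forward_selection(predictors, y))
```

A ‘step-down’ version would start from the full set and drop the variable whose removal increases the residual mean square least.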

Relevance:

100.00%

Abstract:

In previous Statnotes, the application of correlation and regression methods to the analysis of two variables (X, Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., ‘multiple linear regression’.
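For orientation, the extension from one to several X variables can be written as below. The symbols (a, b, b_0 to b_k, e) are conventional notation assumed here and are not taken from the Statnote itself.

```latex
% Simple linear regression with one X variable:
Y = a + bX + e

% Multiple linear regression with k X variables:
Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e
```

Each coefficient b_j is read as the change in Y per unit change in X_j with the other X variables held constant, and e is the residual.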

Relevance:

100.00%

Abstract:

Boards of directors are thought to provide access to a wealth of knowledge and resources for the companies they serve, and are considered important to corporate governance. Under the Resource Based View (RBV) of the firm (Wernerfelt, 1984), boards are viewed as a strategic resource available to firms. As a consequence, there has been a significant research effort aimed at establishing a link between board attributes and company performance. In this thesis I explore and extend the study of interlocking directorships (Mizruchi, 1996; Scott, 1991a) by examining the links between directors’ opportunity networks and firm performance. Specifically, I use resource dependence theory (Pfeffer & Salancik, 1978) and social capital theory (Burt, 1980b; Coleman, 1988) as the basis for a new measure of a board’s opportunity network. I contend that both directors’ formal company ties and their social ties determine a director’s opportunity network, through which they are able to access and mobilise resources for their firms. This approach is based on recent studies that suggest the measurement of interlocks at the director level, rather than at the firm level, may be a more reliable indicator of this phenomenon. This research uses publicly available data drawn from Australia’s top-105 listed companies and their directors in 1999. I employ Social Network Analysis (SNA) (Scott, 1991b), using the UCINET software, to analyse the individual director’s formal and social networks. SNA is used to measure the number of ties a director has to other directors in the top-105 company director network at both one and two degrees of separation, that is, direct ties and indirect (or ‘friend of a friend’) ties. These individual measures of director connectedness are aggregated to produce a board-level network metric for comparison with measures of a firm’s performance using multiple regression analysis. Performance is measured with accounting-based and market-based measures.
Findings indicate that better-connected boards are associated with higher market-based company performance (measured by Tobin’s q). However, weaker and mostly unreliable associations were found for the accounting-based performance measure ROA. Furthermore, formal (or corporate) network ties are a stronger predictor of market performance than total network ties (comprising social and corporate ties). Similarly, strong ties (connectedness at degree-1) are better predictors of performance than weak ties (connectedness at degree-2). My research makes four contributions to the literature on director interlocks. First, it extends a new way of measuring a board’s opportunity network based on the director rather than the company as the unit of interlock. Second, it establishes evidence of a relationship between market-based measures of firm performance and the connectedness of that firm’s board. Third, it establishes that directors’ formal corporate ties matter more to market-based firm performance than their social ties. Fourth, it establishes that directors’ strong direct ties are more important to market-based performance than weak ties. The thesis concludes with implications for research and practice, including a more speculative interpretation of these results. In particular, I raise the possibility of reverse causality, that is, that networked directors seek to join high-performing companies. Thus, the relationship may be a result of symbolic action by companies seeking to increase the legitimacy of their firms rather than a reflection of the social capital available to the companies. This is an important consideration worthy of future investigation.
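The degree-1 and degree-2 counts described in this abstract can be sketched in plain Python. This is not the UCINET workflow used in the thesis. The sketch covers formal ties only (a tie is assumed to exist when two directors sit on the same board), it assumes the board-level metric is a simple sum over board members (the abstract does not say how individual measures were aggregated), and the company and director names are hypothetical.

```python
from itertools import combinations


def director_ties(boards):
    """Map each director to the set of directors sharing at least one board (formal ties)."""
    ties = {}
    for members in boards.values():
        for director in members:
            ties.setdefault(director, set())
        for a, b in combinations(members, 2):
            ties[a].add(b)
            ties[b].add(a)
    return ties


def connectedness(ties, director):
    """Return (direct ties, indirect 'friend of a friend' ties) for one director."""
    direct = ties[director]
    indirect = set()
    for neighbour in direct:
        indirect |= ties[neighbour]
    indirect -= direct | {director}   # keep only directors exactly two steps away
    return len(direct), len(indirect)


def board_connectedness(boards, company):
    """Aggregate director-level counts to the board level by summation (an assumption)."""
    ties = director_ties(boards)
    counts = [connectedness(ties, d) for d in boards[company]]
    return sum(c[0] for c in counts), sum(c[1] for c in counts)


# Hypothetical boards: d3 and d4 each sit on two boards and so link the companies.
boards = {
    "A": ["d1", "d2", "d3"],
    "B": ["d3", "d4"],
    "C": ["d4", "d5"],
}
print(board_connectedness(boards, "A"))
```

The resulting board-level numbers are the kind of predictor that would then enter the multiple regression against a performance measure.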

Relevance:

100.00%

Abstract:

With a growing population and fast urbanization in Australia, maintaining water quality is a challenging task. It is essential to develop an appropriate statistical methodology for analyzing water quality data in order to draw valid conclusions and hence provide useful advice for water management. This paper develops robust rank-based procedures for analyzing nonnormally distributed data collected over time at different sites. To take account of temporal correlations of the observations within sites, we consider the optimally combined estimating functions proposed by Wang and Zhu (Biometrika, 93:459-464, 2006), which lead to more efficient parameter estimation. Furthermore, we apply the induced smoothing method to reduce the computational burden. Smoothing leads to easy calculation of the parameter estimates and their variance-covariance matrix. Analysis of water quality data from Total Iron and Total Cyanophytes shows the differences between the traditional generalized linear mixed models and rank regression models. Our analysis also demonstrates the advantages of the rank regression models for analyzing nonnormal data.

Relevance:

100.00%

Abstract:

This article is motivated by a lung cancer study where a regression model is involved and the response variable is too expensive to measure but the predictor variable can be measured easily at relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. In this article, by using the idea of ranked-set sampling (RSS), we develop sampling strategies that can reduce cost and increase efficiency of the regression analysis for the above-mentioned situation. The developed method is applied retrospectively to a lung cancer study. In the lung cancer study, the interest is in investigating the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatid exchanges. Optimal sampling schemes with different optimality criteria such as A-, D-, and integrated mean square error (IMSE)-optimality are considered in the application. With set size 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is great. For instance, by using the optimal scheme with IMSE-optimality, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred by using SRS.

Relevance:

100.00%

Abstract:

This paper presents an optimization algorithm for an ammonia reactor based on a regression model relating the yield to several parameters, control inputs and disturbances. This model is derived from the data generated by hybrid simulation of the steady-state equations describing the reactor behaviour. The simplicity of the optimization program, along with its ability to take into account constraints on flow variables, makes it well suited to supervisory control applications.

Relevance:

100.00%

Abstract:

This work presents the results of an experimental investigation of semi-solid rheocasting of A356 Al alloy using a cooling slope. The experiments were carried out following the Taguchi method of parameter design (L9 orthogonal array of experiments). Four key process variables (slope angle, pouring temperature, wall temperature, and length of travel of the melt) at three different levels were considered. Regression analysis and analysis of variance (ANOVA) were also performed, respectively, to develop a mathematical model for the evolution of the degree of sphericity of the primary alpha-Al phase and to find the significance and percentage contribution of each process variable to the final degree of sphericity. The best processing condition for optimum degree of sphericity (0.83) was identified as A3, B3, C2, D1, i.e., a slope angle of 60 degrees, a pouring temperature of 650 degrees C, a wall temperature of 60 degrees C, and a 500 mm length of travel of the melt, based on mean response and signal-to-noise ratio (SNR). ANOVA results show that the length of travel has the maximum impact on the evolution of the degree of sphericity. The predicted sphericity obtained from the developed regression model and the values obtained experimentally are found to be in good agreement with each other. The sphericity values obtained from the confirmation experiment, performed at the 95% confidence level, confirm the optimum result and lie within permissible limits. (c) 2014 Elsevier Ltd. All rights reserved.
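The abstract does not say which signal-to-noise formula was used. Since sphericity is being maximized, a natural candidate is the standard Taguchi "larger-the-better" form, sketched below as an assumption. The replicate values in the example are hypothetical and are not data from the paper.

```python
import math


def sn_larger_the_better(values):
    """Taguchi larger-the-better S/N ratio in dB: -10 * log10(mean(1 / y^2))."""
    mean_inverse_square = sum(1.0 / (v * v) for v in values) / len(values)
    return -10.0 * math.log10(mean_inverse_square)


# Hypothetical replicate sphericity measurements for one run of the L9 array.
replicates = [0.80, 0.83, 0.81]
print(round(sn_larger_the_better(replicates), 3))
```

In a Taguchi analysis this ratio is computed for each of the nine runs and then averaged per factor level, and the level with the highest mean S/N is preferred.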

Relevance:

100.00%

Abstract:

This study examined the economic potential of fish farming in the Abeokuta zone of Ogun State in the 2003 production season. Descriptive statistics, cost and returns analysis, and multiple regression analysis were used in analyzing the data. The farmers predominantly practiced monoculture. The study revealed inefficiency in the use of pond size, lime and labour, with over-utilization of fingerlings stocked. The average variable cost of N124.67 constituted 45% of the total, while average fixed cost was N149,802.67 per average farm size. Fish farming was found to be a profitable venture in the study area, with a net income of N761,400.58 for an average pond size of 301.47 sq. m. Based on these findings, it is suggested that, for profit maximization, fish farmers will have to increase their use of fingerlings and fertilizers and decrease their use of lime, labour, and pond size.