9 results for NIRS. Plum. Multivariate calibration. Variables selection

in DigitalCommons@The Texas Medical Center


Relevance:

50.00%

Publisher:

Abstract:

Body fat distribution is a cardiovascular health risk factor in adults. It can be measured by various methods, including anthropometry, but it is not clear which anthropometric index is suitable for epidemiologic studies of fat distribution and cardiovascular disease. The purpose of the present study was to select, from among a series of indices (those traditionally used in the literature and others constructed from the analysis), the measure of body fat distribution that is most highly correlated with lipid-related variables and is independent of overall fatness. Subjects were Mexican-American men and women (N = 1004) from a study of gallbladder disease in Starr County, Texas. Multivariate associations were sought between lipid profile measures (lipids, lipoproteins, and apolipoproteins) and two sets of anthropometric variables (4 circumferences and 6 skinfolds), both to assess the association between lipid-related measures and each set and to guide the construction of indices.

Two indices emerged from the analysis that seemed to be highly correlated with lipid profile measures independent of obesity: 2*arm circumference - thigh skinfold in pre- and post-menopausal women, and the arm/thigh circumference ratio in men. Next, using the sum of all skinfolds to represent obesity, together with the selected body fat distribution indices, the following hypotheses were tested: (1) state of obesity and centrally/upper distributed body fat are equally predictive of lipids, lipoproteins and apolipoproteins, and (2) the correlation among the lipid-related measures is not altered by obesity and body fat distribution.

With respect to the first hypothesis, the present study found that most lipids, lipoproteins and apolipoproteins were significantly associated with both overall fatness and anatomical location of body fat in both sexes and menopausal groups. However, within men and post-menopausal women, certain lipid profile measures (triglyceride and HDLT among post-menopausal women and apos C-II, C-III, and E among men) had substantially higher correlation with body fat distribution than with overall fatness.

With respect to the second hypothesis, both obesity and body fat distribution were found to alter the association among plasma lipid variables in men and women. The data suggested that the patterns of correlations among men and post-menopausal women are more comparable. Among men, correlations involving apo A-I, HDLT, and HDL$_2$ seemed greatly influenced by obesity, and those involving A-II by fat distribution; among post-menopausal women, correlations involving apos A-I and A-II were highly affected by the location of body fat.

Thus, these data point out that obesity and fat distribution not only can affect levels of single measures but also can markedly influence the pattern of relationship among measures. The fact that such changes are seen for both obesity and fat distribution is significant, since the indices employed were chosen because they were independent of one another.

Relevance:

40.00%

Publisher:

Abstract:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$_m$) and non-parametric (PRESS) assessments in the entire sample, and two data-splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.

The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.

Only the random split estimator is conditionally (on $\beta$) unbiased; however, MSEP$_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.

To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
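
The leave-one-out statistic recommended above can be illustrated with a minimal sketch. It assumes a single predictor and ordinary least squares, which is far simpler than the multivariate-normal regressor setting studied in the abstract; the function names and the toy data are hypothetical. It uses the standard identity that each leave-one-out residual equals the ordinary residual divided by one minus the point's leverage, and checks that identity against brute-force refitting.

```python
def press_simple_regression(x, y):
    """Return (PRESS, SSE) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    press = 0.0
    sse = 0.0
    for xi, yi in zip(x, y):
        residual = yi - (a + b * xi)
        # Leverage of this point in simple linear regression.
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # Leave-one-out residual without refitting the model.
        press += (residual / (1.0 - leverage)) ** 2
        sse += residual ** 2
    return press, sse


def press_by_refitting(x, y):
    """Brute-force check: refit the line n times, leaving one point out each time."""
    total = 0.0
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        m = len(xs)
        x_bar = sum(xs) / m
        y_bar = sum(ys) / m
        sxx = sum((v - x_bar) ** 2 for v in xs)
        b = sum((v - x_bar) * (w - y_bar) for v, w in zip(xs, ys)) / sxx
        a = y_bar - b * x_bar
        total += (y[i] - (a + b * x[i])) ** 2
    return total


# Toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
press, sse = press_simple_regression(x, y)
print(press, sse, press_by_refitting(x, y))
```

PRESS is never smaller than the in-sample sum of squared errors, because every leverage lies between 0 and 1; the gap between the two is one way to see how optimistic the in-sample fit is.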

Relevance:

30.00%

Publisher:

Abstract:

A historical prospective study was designed to assess the mean weight status of subjects who participated in a behavioral weight reduction program in 1983 and to determine whether there was an association between the dependent variable, weight change, and any of 31 independent variables after a 2-year follow-up period. Data were obtained by abstracting the subjects' records and from a follow-up questionnaire administered 2 years after program participation. Of the 1460 subjects who participated in the program, 509 (386 females and 123 males) completed and returned the questionnaire. Results showed that mean weight differed significantly (p < 0.001) between baseline and the 2-year follow-up. The mean weight loss at follow-up was 5.8 pounds for the group, 10.7 pounds for males and 4.2 pounds for females. A total of 63.9% of the group, 69.9% of males and 61.9% of females, were still below their initial weight at follow-up. Sixteen of the 31 variables assessed using bivariate analyses were found to be significantly (p ≤ 0.05) associated with weight change at follow-up. These variables were then entered into a multivariate linear regression model. All 16 variables together accounted for 37.9% of the variance of the dependent variable, weight change. Eight of these variables were found to be significantly (p ≤ 0.05) predictive of weight change in the stepwise multivariate process, accounting for 37.1% of the variance. They comprised two baseline variables (percent over ideal body weight at enrollment and occupation) and six follow-up variables (feeling in control of eating habits, percent of body weight lost during treatment, frequency of weight measurement, physical activity, eating in response to emotions, and number of pounds of weight gain needed to resume a diet). It was concluded that clinicians involved in the treatment of obesity, and the subjects themselves, should place greater emphasis on the six follow-up variables to enhance the chances of success at long-term weight loss.
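
As a quick consistency check on the figures reported above, the overall mean weight loss should be the sample-size-weighted average of the male and female means. The short sketch below uses only numbers stated in the abstract; the variable names are illustrative.

```python
n_male, n_female = 123, 386
loss_male, loss_female = 10.7, 4.2  # mean pounds lost at follow-up, as reported

n_total = n_male + n_female
# Weighted average of the two subgroup means.
overall_loss = (n_male * loss_male + n_female * loss_female) / n_total
# Share of the 1460 program participants who returned the questionnaire.
response_rate = n_total / 1460

print(round(overall_loss, 1), round(100 * response_rate, 1))
```

The weighted average rounds to the reported 5.8 pounds, and the implied questionnaire response rate is roughly 35%.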

Relevance:

30.00%

Publisher:

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities.

Random Forests can deal with large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. A novel variable selection method is proposed in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. The study therefore recommends Random Forests for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations.

We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
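
The abstract does not say how the noise level is estimated, so the following is only a sketch of the general idea under a stated assumption: that the noise level is taken as the largest importance score obtained by pure-noise probe variables added to the data. The function name, the probe approach, and all scores are hypothetical and are not taken from the thesis.

```python
def select_above_noise(importances, noise_importances):
    """Keep predictors whose importance exceeds every noise probe's importance.

    importances: dict mapping predictor name -> importance score
    noise_importances: scores obtained by pure-noise probe variables (assumed)
    """
    cutoff = max(noise_importances)
    selected = [name for name, score in importances.items() if score > cutoff]
    # Strongest predictors first.
    selected.sort(key=lambda name: importances[name], reverse=True)
    return cutoff, selected


# Made-up importance scores for illustration only.
importances = {"x1": 0.120, "x2": 0.045, "x3": 0.004, "x4": 0.011}
noise_importances = [0.003, 0.006, 0.005]
cutoff, selected = select_above_noise(importances, noise_importances)
print(cutoff, selected)
```

In this toy case the cut-off is the largest probe score, so `x3` is dropped because it scores no better than noise. A less strict rule, such as a high quantile of the probe scores, would be an equally plausible reading of the abstract.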

Relevance:

30.00%

Publisher:

Abstract:

A multivariate frailty hazard model is developed for joint modeling of three correlated time-to-event outcomes: (1) local recurrence, (2) distant recurrence, and (3) overall survival. The frailty term is introduced to model population heterogeneity. The dependence is modeled by conditioning on a shared frailty that is included in the three hazard functions. Independent variables can be included in the model as covariates. Markov chain Monte Carlo methods are used to estimate the posterior distributions of model parameters. The algorithm used in the present application is the hybrid Metropolis-Hastings algorithm, which updates all parameters simultaneously using evaluations of the gradient of the log posterior density. The performance of this approach is examined in simulation studies using exponential and Weibull distributions. We apply the proposed methods to a study of patients with soft tissue sarcoma, which motivated this research. Our results indicate that patients who received chemotherapy had better overall survival, with a hazard ratio of 0.242 (95% CI: 0.094 - 0.564), and lower risk of distant recurrence, with a hazard ratio of 0.636 (95% CI: 0.487 - 0.860), but no significantly lower risk of local recurrence, with a hazard ratio of 0.799 (95% CI: 0.575 - 1.054). The advantages and limitations of the proposed models, and future research directions, are discussed.
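
To give a feel for a gradient-informed Metropolis-Hastings update, here is a minimal sketch using the Metropolis-adjusted Langevin algorithm (MALA). This is a stand-in technique, not the thesis's hybrid algorithm, and the model is deliberately reduced: one exponential event rate, no censoring, no frailty or covariates, and an assumed Normal(0, 10^2) prior on the log rate. All names and settings are hypothetical.

```python
import math
import random


def log_posterior(theta, n_events, total_time, prior_sd=10.0):
    """Log posterior (up to a constant) of theta = log(rate) for exponential times."""
    return (n_events * theta - math.exp(theta) * total_time
            - theta ** 2 / (2 * prior_sd ** 2))


def grad_log_posterior(theta, n_events, total_time, prior_sd=10.0):
    """Derivative of log_posterior with respect to theta."""
    return n_events - math.exp(theta) * total_time - theta / prior_sd ** 2


def mala_sample(n_events, total_time, n_iter=5000, step=0.2, seed=0):
    """Draw n_iter correlated samples of theta with gradient-shifted proposals."""
    rng = random.Random(seed)
    theta = 0.0
    draws = []
    for _ in range(n_iter):
        # Proposal mean is shifted along the gradient of the log posterior.
        drift = theta + 0.5 * step ** 2 * grad_log_posterior(theta, n_events, total_time)
        proposal = drift + step * rng.gauss(0.0, 1.0)
        back_drift = proposal + 0.5 * step ** 2 * grad_log_posterior(
            proposal, n_events, total_time)
        # Log proposal densities; the normalising constants cancel in the ratio.
        log_q_forward = -(proposal - drift) ** 2 / (2 * step ** 2)
        log_q_backward = -(theta - back_drift) ** 2 / (2 * step ** 2)
        log_alpha = (log_posterior(proposal, n_events, total_time) + log_q_backward
                     - log_posterior(theta, n_events, total_time) - log_q_forward)
        # Metropolis-Hastings accept/reject step.
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta = proposal
        draws.append(theta)
    return draws


# Simulated event times with a true rate of 2 (illustrative data only).
data_rng = random.Random(1)
times = [data_rng.expovariate(2.0) for _ in range(50)]
draws = mala_sample(len(times), sum(times))
kept = draws[1000:]  # discard burn-in
posterior_mean_log_rate = sum(kept) / len(kept)
print(posterior_mean_log_rate, math.log(len(times) / sum(times)))
```

With a weak prior, the posterior mean of the log rate should sit close to the log of the maximum-likelihood rate (events divided by total time), which gives a simple sanity check on the sampler.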

Relevance:

30.00%

Publisher:

Abstract:

Current statistical methods for estimation of parametric effect sizes from a series of experiments are generally restricted to univariate comparisons of standardized mean differences between two treatments. Multivariate methods are presented for the case in which effect size is a vector of standardized multivariate mean differences and the number of treatment groups is two or more. The proposed methods employ a vector of independent sample means for each response variable, which leads to a covariance structure that depends only on correlations among the $p$ responses on each subject. Using weighted least squares theory and the assumption that the observations are from normally distributed populations, multivariate hypotheses analogous to common hypotheses used for testing effect sizes were formulated and tested for treatment effects that are correlated through a common control group, through multiple response variables observed on each subject, or both.

The asymptotic multivariate distribution for correlated effect sizes is obtained by extending univariate methods for estimating effect sizes that are correlated through common control groups. The joint distributions of vectors of effect sizes (from $p$ responses on each subject) from one treatment and one control group, and from several treatment groups sharing a common control group, are derived. Methods are given for estimation of linear combinations of effect sizes when certain homogeneity conditions are met, and for estimation of vectors of effect sizes and confidence intervals from $p$ responses on each subject. Computational illustrations are provided using data from studies of effects of electric field exposure on small laboratory animals.

Relevance:

30.00%

Publisher:

Abstract:

The role of clinical chemistry has traditionally been to evaluate acutely ill or hospitalized patients. Traditional statistical methods have serious drawbacks in that they use univariate techniques. To demonstrate alternative methodology, a multivariate analysis of covariance model was developed and applied to the data from the Cooperative Study of Sickle Cell Disease (CSSCD).

The purpose of developing the model for the laboratory data from the CSSCD was to evaluate the comparability of the results from the different clinics. Several variables were incorporated into the model in order to control for possible differences among the clinics that might confound any real laboratory differences.

Differences for LDH, alkaline phosphatase and SGOT were identified which will necessitate adjustments by clinic whenever these data are used. In addition, aberrant clinic values for LDH, creatinine and BUN were identified.

The use of any statistical technique, including multivariate analysis, without thoughtful consideration may lead to spurious conclusions that may not be corrected for some time, if ever. However, the advantages of multivariate analysis far outweigh its potential problems. If its use increases as it should, its application to the analysis of laboratory data in prospective patient monitoring, quality control programs, and interpretation of data from cooperative studies could well have a major impact on the health and well-being of a large number of individuals.

Relevance:

30.00%

Publisher:

Abstract:

When choosing among models to describe categorical data, the need to consider interactions makes selection more difficult. With just four variables, considering all interactions, there are 166 different hierarchical models and many more non-hierarchical models. Two procedures have been developed for categorical data which will produce the "best" subset or subsets of each model size, where size refers to the number of effects in the model. Both procedures are patterned after the Leaps and Bounds approach used by Furnival and Wilson for continuous data and do not generally require fitting all models. For hierarchical models, likelihood ratio statistics ($G^2$) are computed using iterative proportional fitting, and "best" is determined by comparing, among models with the same number of effects, $\Pr(\chi^2_k \geq G^2_{ij})$, where $k$ is the degrees of freedom for the $i$th model of size $j$. To fit non-hierarchical as well as hierarchical models, a weighted least squares procedure has been developed.

The procedures are applied to published occupational data relating to the occurrence of byssinosis. These results are compared to previously published analyses of the same data. Also, the procedures are applied to published data on symptoms in psychiatric patients and again compared to previously published analyses.

These procedures will make categorical data analysis more accessible to researchers who are not statisticians. The procedures should also encourage more complex exploratory analyses of epidemiologic data and contribute to the development of new hypotheses for study.
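
The likelihood ratio statistic mentioned above can be sketched for the simplest hierarchical model: independence in a two-way table. In that special case the fitted counts have a closed form (row total times column total over the grand total), so no iterative proportional fitting loop is needed; higher-order models like those in the abstract would require it. The function name and the example counts are hypothetical.

```python
import math


def g_squared_independence(table):
    """Likelihood-ratio statistic G^2 and degrees of freedom for independence
    in a two-way table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Fitted count under the independence model.
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes 0 to the sum
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df


# Made-up 2x2 counts for illustration only.
table = [[20, 30], [30, 20]]
g2, df = g_squared_independence(table)
print(g2, df)
```

The statistic is then referred to a chi-square distribution with `df` degrees of freedom, which is the tail probability the abstract uses to rank models of the same size.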

Relevance:

30.00%

Publisher:

Abstract:

In light of the new healthcare regulations, hospitals are increasingly reevaluating their IT integration strategies to meet expanded healthcare information exchange requirements. Nevertheless, hospital executives do not have all the information they need to differentiate between the available strategies and recognize which may better fit their organizational needs.

In the interest of providing the desired information, this study explored the relationships between hospital financial performance, integration strategy selection, and strategy change. The integration strategies examined, applied as binary logistic regression dependent variables and listed from most to least integrated, were Single-Vendor (SV), Best-of-Suite (BoS), and Best-of-Breed (BoB). The financial measurements adopted as independent variables for the models were two administrative labor efficiency ratios and six industry-standard financial ratios designed to provide a broad proxy of hospital financial performance. Furthermore, descriptive statistical analyses were carried out to evaluate recent trends in hospital integration strategy change. Overall, six research questions were proposed for this study.

The first research question asked whether financial performance was related to the selection of integration strategies. The next questions explored whether hospitals were more likely to change strategies or remain the same when there was no external stimulus to change and whether, if they did change, they would prefer strategies closer to the existing ones. These were followed by a question that asked whether financial performance was also related to strategy change. Rounding out the questions, the last two probed whether the new Health Information Technology for Economic and Clinical Health (HITECH) Act had any impact on the frequency and direction of strategy change.

The results confirmed that financial performance is related to both IT integration strategy selection and strategy change, and concurred with prior studies suggesting that hospital and environmental characteristics are associated factors as well. Specifically, this study noted that the most integrated SV strategy is related to increased administrative labor efficiency and the hybrid BoS strategy is associated with improved financial health (based on operating margin and equity financing ratios). On the other hand, no financial indicators were found to be related to the least integrated BoB strategy, except for short-term liquidity (current ratio) when involving strategy change.

Ultimately, this study concluded that when making IT integration strategy decisions hospitals closely follow the resource dependence view of minimizing uncertainty. As each integration strategy may favor certain organizational characteristics, hospitals traditionally preferred not to make strategy changes and, when they did, selected strategies that were more closely related to the existing ones. However, as new regulations further heighten revenue uncertainty while requiring increased information integration, and as evidence already suggests a growing trend of organizations shifting towards more integrated strategies, hospitals may be more limited in their strategy selection choices going forward.