7 results for Multivariate models

in DigitalCommons@The Texas Medical Center


Relevance:

60.00% 60.00%

Publisher:

Abstract:

The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process.

For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
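
The abstract does not say how the regression coefficients from the separately imputed data sets were combined. A common choice is Rubin's rules, and the sketch below assumes that choice purely for illustration. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                             # pooled point estimate
    u_bar = sum(variances) / m                             # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                            # total variance
    return q_bar, t, math.sqrt(t)


# Hypothetical coefficients from three imputed data sets (illustrative only).
estimate, total_var, se = pool_rubin([0.1, 0.2, 0.3], [0.04, 0.04, 0.04])
```

The between-imputation term is what makes the pooled standard error larger than a single-data-set standard error, which is the mechanism that affects the Type I error rates discussed above.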

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The purpose of this study was to determine whether depression is a factor in explaining the difference in sex behaviors among adolescents with different ethnic backgrounds, family and school contexts. We hypothesize that adolescents with a higher number of depressive symptoms are more likely to engage in sexual risk behaviors than adolescents with fewer depressive symptoms. Further, the association between adolescent depression and sexual behaviors is mediated or moderated by individual characteristics and by family and school contexts.

Background. Large ethnic disparities exist in adolescent engagement in risky sexual behaviors, yet there is little in the literature that explains these disparities. Studies of sexual behavior of youths abound, yet there is little literature on the prevalence and correlates of depression or the association between depression and sexual behaviors among different ethnic groups.

Objectives. (1) To determine ethnic differences in the prevalence of depressive symptoms using data collected through the National Longitudinal Study of Adolescent Health (Add Health). (2) To determine predictors of sex risk behaviors among adolescents, including the role of depression. (3) To identify predictors of depression among these adolescents.

Methods. Add Health data from Wave 1 and Wave 2 interviews of 7th–12th graders were analyzed using multivariate models constructed with both depression and sexual behavior as outcome variables. Logistic regression models determined whether and to what extent the independent variables, including depression, sex behaviors, demographic factors, individual and family characteristics, and school context, were related to the probability of engaging in risky sexual behaviors.

Results. Ethnic differences in depressive symptoms did not persist after demographic and contextual variables were included in the model. Sex behaviors all shared the hypothesized relationship with depressive symptoms. The odds of risky sex behaviors increased as the number of depressive symptoms increased. Depression was predicted by marijuana use and having a serious argument with father for males at Wave 1 and by age and future orientation for females. Wave 2 depression was predicted by Wave 1 depression.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The negative outcomes from alcohol misuse have been chronicled for decades in epidemiological studies. Recent research has focused on patterns of drinking. Binge and heavy drinking have been associated with multiple negative outcomes, including surrogate outcomes designed to measure decrements to military readiness. This study is perhaps the first to examine whether binge or heavy drinking patterns are associated with the U.S. military’s overall inability-to-deploy rate or with the individual reasons for being unable to deploy.

The prevalence of binge and heavy drinking and the inability-to-deploy rates were assessed from responses to the 2005 Department of Defense Survey of Health Related Behaviors Among Military Personnel. A secondary analysis of extant data resulted in a final sample size of 13,619 respondents who represented 847,253 active-duty military personnel. Multivariate models were fitted to examine the association between patterns of drinking and individual reasons for the inability to deploy.

Logistic regression showed no association between binge or heavy drinking and greater inability to deploy. However, several individual reasons for the inability to deploy did show an association: Training, Dental Issue, No HIV Test, and Family Situation. No association was noted for the individual reasons Injury, Illness, Leave/Temporary Duty, or Other. Binge and heavy drinkers appear to be more susceptible to psychosocial determinants than to physical determinants as reasons for the inability to deploy.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The sensitivity of Interferon-γ release assays for detection of Mycobacterium tuberculosis (MTB) infection or disease is affected by conditions that depress host immunity (such as HIV). It is critical to determine whether these assays are affected by diabetes and related conditions (i.e., hyperglycemia, chronic hyperglycemia, or being overweight/obese), given that immune impairment is thought to underlie susceptibility to tuberculosis (TB) in people with diabetes. This is important for tuberculosis control because millions of type 2 diabetes patients are at risk for tuberculosis worldwide.

The objective of this study was to identify host characteristics, including diabetes, that may affect the sensitivity of two commercially available Interferon-γ (IFN-γ) release assays (IGRA), the QuantiFERON®-TB Gold (QFT-G) and the T-SPOT®.TB, in active TB patients. We further explored whether IFN-γ secretion in response to MTB antigens (ESAT-6 and CFP-10) is associated with diabetes and its defining characteristics (high blood glucose, high HbA1c, high BMI). To achieve these objectives, the sensitivities of the QFT-G and T-SPOT.TB assays were evaluated in newly diagnosed, tuberculosis-confirmed (by positive smear for acid fast bacilli and/or positive culture for MTB) adults enrolled at Texas and Mexico study sites between March 2006 and April 2009. Univariate and multivariate models were constructed to identify host characteristics associated with IGRA result and level of IFN-γ secretion.

QFT-G was positive in 68% of tuberculosis patients. Those with diabetes, chronic hyperglycemia or obesity were more likely to have a positive QFT-G result, and to secrete higher levels of IFN-γ in response to the mycobacterial antigens (p<0.05). Previous history of BCG vaccination was the only other host characteristic associated with QFT-G result: a higher proportion of non-BCG-vaccinated persons than of vaccinated persons were QFT-G positive. In a separate group of patients, the T-SPOT.TB was 94% sensitive, with similar performance in all tuberculosis patients, regardless of host characteristics.

In summary, we have demonstrated the validity of QFT-G and T-SPOT.TB to support the diagnosis of TB in patients with a range of host characteristics, but most notably in patients with diabetes. We also confirmed that TB patients with diabetes and associated characteristics (chronic hyperglycemia or BMI) secreted higher titers of IFN-γ when stimulated with MTB-specific antigens, in comparison to patients without these characteristics. Together, these findings suggest that the mechanism by which diabetes increases the risk of TB may not be explained by the inability to secrete IFN-γ, a key cytokine for TB control.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU", lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one-value-per-variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes.

The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU", provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model.
Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances.

The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit", presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (i.e., without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
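
The abstract leaves the specific time series operations to the modeler, so the following is only a sketch of one possible trend feature: the least-squares slope of a vital sign over a fixed window. The function names `trend_slope` and `window_features`, the feature set, and the sample readings are all assumptions made for illustration and are not the authors' method.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times (units: value per time unit)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time stamps are identical")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def window_features(times, values):
    """Collapse one raw time series window into a few latent candidate features."""
    return {
        "last": values[-1],                    # the usual one-value-per-variable snapshot
        "mean": sum(values) / len(values),     # level over the window
        "slope": trend_slope(times, values),   # direction and rate of change
    }


# Hypothetical pulse oximetry readings, one per minute (illustrative only).
features = window_features([0, 1, 2, 3], [98, 97, 96, 95])
```

A snapshot model would see only the last value here, while the slope captures that the reading is falling steadily, which is the kind of deterioration the manuscripts argue a point-in-time model cannot represent.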

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A multivariate frailty hazard model is developed for joint modeling of three correlated time-to-event outcomes: (1) local recurrence, (2) distant recurrence, and (3) overall survival. The term frailty is introduced to model population heterogeneity. The dependence is modeled by conditioning on a shared frailty that is included in the three hazard functions. Independent variables can be included in the model as covariates. Markov chain Monte Carlo methods are used to estimate the posterior distributions of model parameters. The algorithm used in the present application is the hybrid Metropolis-Hastings algorithm, which simultaneously updates all parameters with evaluations of the gradient of the log posterior density. The performance of this approach is examined in simulation studies using Exponential and Weibull distributions. We apply the proposed methods to a study of patients with soft tissue sarcoma, which motivated this research. Our results indicate that patients who received chemotherapy had better overall survival, with a hazard ratio of 0.242 (95% CI: 0.094 - 0.564), and lower risk of distant recurrence, with a hazard ratio of 0.636 (95% CI: 0.487 - 0.860), but not significantly better local recurrence, with a hazard ratio of 0.799 (95% CI: 0.575 - 1.054). The advantages and limitations of the proposed models and future research directions are discussed.
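
The abstract does not give the functional form of the three hazards. One common way to write a shared frailty model is the multiplicative form below, shown here only as an assumed illustration. The symbols $w_i$ (frailty of patient $i$), $x_i$ (covariates), $\beta_k$, $\lambda_k$ and $\gamma_k$ are notation introduced for this sketch.

```latex
% Assumed multiplicative shared-frailty form; the abstract does not state it.
% k = 1 (local recurrence), 2 (distant recurrence), 3 (overall survival)
h_k(t \mid w_i, x_i) = w_i \, h_{0k}(t) \, \exp\!\left(x_i^{\top} \beta_k\right), \qquad k = 1, 2, 3

% Weibull baseline hazard; \gamma_k = 1 reduces it to the exponential case
h_{0k}(t) = \lambda_k \, \gamma_k \, t^{\gamma_k - 1}

% Hazard ratio for a binary covariate j (e.g. chemotherapy) on outcome k
\mathrm{HR}_{kj} = \exp\!\left(\beta_{kj}\right)
```

Because the same $w_i$ multiplies all three hazards, a patient with a large frailty is at higher risk for every outcome, which is how this form induces the dependence among the outcomes. Under it, a reported hazard ratio below 1 corresponds to a negative coefficient for that covariate.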

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.

The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.

Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.

To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g., PRESS) be used for assessment.
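
PRESS is the sum of squared leave-one-out prediction errors of a regression model. The sketch below computes it by refitting an ordinary least-squares model with each observation held out in turn. It is a plain-Python illustration under that definition, not the simulation code of the study, and the helper names `solve_linear`, `ols_fit`, `predict` and `press` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of the system
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, row):
    """Predicted response for one observation."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def press(rows, y):
    """Sum of squared errors when each observation is predicted without itself."""
    total = 0.0
    for i in range(len(y)):
        coef = ols_fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(coef, rows[i])) ** 2
    return total


# Hypothetical data with one predictor (illustrative only).
value = press([[0.0], [1.0], [2.0]], [0.0, 0.0, 3.0])
```

For a fixed-form linear model the same quantity can be obtained from a single fit by dividing each ordinary residual by one minus its leverage, which avoids the refitting loop. The explicit loop is used here because it mirrors the definition.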