28 results for exploratory data analysis
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. 
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Abstract:
The discrete-time Markov chain is commonly used to describe changes in health states for chronic diseases in longitudinal studies. Statistical inference for comparing treatment effects or finding determinants of disease progression usually requires estimation of transition probabilities. When the outcome data have missing observations, or when the variable of interest (called a latent variable) cannot be measured directly, estimating the transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing-data mechanisms were considered in empirical studies using methods that include the EM algorithm, Monte Carlo EM, and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated using the schizophrenia example. The public health relevance, the strengths and limitations, and possible future research were also discussed.
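As a rough illustration of the hidden Markov machinery mentioned above, the sketch below computes the likelihood of an observed surrogate sequence with the forward recursion only. It is not the dissertation's procedure: the backward pass and the parameter estimation step are omitted, and the two-state model, its probabilities, and the function name `forward_likelihood` are hypothetical.

```python
def forward_likelihood(init, trans, emit, observations):
    """Return P(observations) for a discrete hidden Markov model.

    init[i]     : probability that the chain starts in hidden state i
    trans[i][j] : probability of moving from hidden state i to j
    emit[i][o]  : probability of observing surrogate value o in state i
    """
    n_states = len(init)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state example (values are illustrative, not from the study).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(init, trans, emit, [0, 1, 0])
```

A full forward-backward estimator would combine these forward terms with a matching backward recursion to obtain the expected transition counts used when re-estimating the transition probabilities.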
Abstract:
This study aimed to develop and validate the Cancer Family Impact Scale (CFIS), an instrument for use in studies investigating relationships among family factors and colorectal cancer (CRC) screening when family history is a risk factor. We used existing data to develop the measure from 1,285 participants (637 families) across the United States who were in the Johns Hopkins Colon Cancer Genetic Testing study. Participants were 94% white with an average age of 50.1 years, and 60% were women. None had a personal CRC history; 80% had one first-degree relative (FDR) with CRC and 20% had more than one FDR with CRC. The study had three aims: (1) to identify the latent factors underlying the CFIS via exploratory factor analysis (EFA); (2) to confirm the findings of the EFA via confirmatory factor analysis (CFA); and (3) to assess the reliability of the scale via Cronbach's alpha. Exploratory analyses were performed on a split half of the sample, and the final model was confirmed on the other half. The EFA suggested the CFIS was an 18-item measure with 5 latent constructs: (1) NEGATIVE: negative effects of cancer on the family; (2) POSITIVE: positive effects of cancer on the family; (3) COMMUNICATE: how families communicate about cancer; (4) FLOW: how information about cancer is conveyed in families; and (5) NORM: how individuals react to family norms about cancer. CFA on the holdout sample showed the CFIS to have a reasonably good fit (chi-square = 389.977, df = 122; RMSEA = 0.058 (0.052-0.065); CFI = 0.902; TLI = 0.877; GFI = 0.939). The overall reliability of the scale was α = 0.65. The reliability of the subscales was: (1) NEGATIVE α = 0.682; (2) POSITIVE α = 0.686; (3) COMMUNICATE α = 0.723; (4) FLOW α = 0.467; and (5) NORM α = 0.732. We concluded that the CFIS is a good measure, with most fit indices above 0.90.
The CFIS could be used to compare theoretically driven hypotheses about the pathways through which family factors could influence health behavior among unaffected individuals at risk due to family history, and also aid in the development and evaluation of cancer prevention interventions including a family component.
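The reliability figures reported for the CFIS are Cronbach's alpha values. A minimal sketch of that statistic, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total score), is shown below. The response matrix is made up for illustration and is not CFIS data; the function name `cronbach_alpha` is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents, 3 Likert-type items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 2, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
```

When every item gives identical scores for each respondent, the function returns 1.0, the upper bound of the statistic.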
Abstract:
Introduction. The HIV/AIDS disease burden disproportionately affects minority populations, specifically African Americans. While sexual risk behaviors play a role in the observed HIV burden, other factors including gender, age, socioeconomics, and barriers to healthcare access may also be contributory. The goal of this study was to determine how far down the HIV/AIDS disease process people of different ethnicities first present for healthcare. The study specifically analyzed the differences in CD4 cell counts at the initial HIV-1 diagnosis with respect to ethnicity. The study also analyzed racial differences in HIV/AIDS risk factors. ^ Methods. This is a retrospective study using data from the Adult Spectrum of HIV Disease (ASD), collected by the City of Houston Department of Health. The ASD database contains information on newly reported HIV cases in the Harris County District Hospitals between 1989 and 2000. Each patient had an initial and a follow-up report. The extracted variables of interest from the ASD data set were CD4 counts at the initial HIV diagnosis, race, gender, age at HIV diagnosis and behavioral risk factors. One-way ANOVA was used to examine differences in baseline CD4 counts at HIV diagnosis between racial/ethnic groups. Chi square was used to analyze racial differences in risk factors. ^ Results. The analyzed study sample was 4767. The study population was 47% Black, 37% White and 16% Hispanic [p<0.05]. The mean and median CD4 counts at diagnosis were 254 and 193 cells per ml, respectively. At the initial HIV diagnosis Blacks had the highest average CD4 counts (285), followed by Whites (233) and Hispanics (212) [p<0.001 ]. These statistical differences, however, were only observed with CD4 counts above 350 [p<0.001], even when adjusted for age at diagnosis and gender [p<0.05]. 
Looking at risk factors, Blacks were mostly affected by intravenous drug use (IVDU) and heterosexuality, whereas Whites and Hispanics were more affected by male homosexuality [p<0.05]. Conclusion. (1) There were statistical differences in CD4 counts with respect to ethnicity, but these differences existed only for CD4 counts above 350 and do not appear to have clinical significance. Unexpectedly, Blacks had the highest CD4 counts, followed by Whites and Hispanics. (2) 50% of this study group clinically had AIDS at their initial HIV diagnosis (median = 193), irrespective of ethnicity. It was not clear from the data analysis whether these observations were due to failure of early HIV surveillance, HIV testing policies, or healthcare access. More studies are needed to address this question. (3) Homosexuality and bisexuality were the biggest risk factors for Whites and Hispanics, whereas Blacks were mostly affected by heterosexuality and IVDU, implying a need for different public health intervention strategies for these racial groups.
Abstract:
National data show that Hispanics report low levels of physical activity. Limited information on barriers to exercise in this population exists in the literature. Surveys were administered to 398 Hispanic participants from two colonias in South Texas to investigate self-reported levels of and perceived barriers to exercise. Results show that 67.6% of respondents did not meet physical activity recommendations of at least 150 minutes per week, as compared to 55.6% nationally. Overall, the most frequently reported barriers included “lack of time”, “very tired” and “lack of self-discipline” to exercise. An exploratory factor analysis of the barriers reported by participants not meeting physical activity recommendations resulted in a three-factor structure. A unidimensional scale was found for participants meeting recommendations. Findings suggest that future interventions should be specific to gender and exercise level to address the high prevalence of inactivity in this population.
Abstract:
As schools are pressured to perform on academics and standardized examinations, they are reluctant to dedicate more time to physical activity. After-school exercise and health programs may provide an opportunity to engage in more physical activity without taking time away from coursework during the day. The current study is a secondary analysis of data from a randomized trial of a 10-week after-school program (six schools, n = 903) that implemented an exercise component based on the CATCH physical activity component and health modules based on the culturally tailored Bienestar health education program. Outcome variables included BMI, aerobic capacity, health knowledge, and healthy food intentions, assessed through path analysis techniques. Both the baseline model (χ2(df = 8) = 16.90, p = .031; RMSEA = .035 (90% CI .010–.058); NNFI = 0.983; CFI = 0.995) and the model incorporating intervention participation (χ2(df = 10) = 11.59, p = .314; RMSEA = .013 (90% CI .010–.039); NNFI = 0.996; CFI = 0.999) proved to be a good fit to the data. Experimental group participation was not predictive of changes in health knowledge, intentions to eat healthy foods, or Body Mass Index, but it was associated with increased aerobic capacity, β = .067, p < .05. School characteristics, including SES and language proficiency, proved to be significantly associated with changes in knowledge and physical indicators. Further study of the effects of school-level variables on intervention outcomes is recommended so that tailored interventions can be developed for the specific characteristics of each participating school.
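The two RMSEA values reported above can be checked against the chi-square statistics with the common point-estimate formula RMSEA = sqrt(max(χ2 − df, 0) / (df · (N − 1))). The sketch below assumes that the full n = 903 was the analysis sample and that the N − 1 form of the formula was used; the abstract states neither, and the function name `rmsea` is illustrative.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values taken from the abstract; n = 903 is assumed to be the analysis sample.
baseline_rmsea = rmsea(16.90, 8, 903)
intervention_rmsea = rmsea(11.59, 10, 903)
```

Under these assumptions both estimates round to the reported values of .035 and .013.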
Abstract:
Introduction. Cancer is the second most common cause of death in the USA (2). Studies have shown a coexistence of cancer and hypogonadism (9,31,13). The majority of patients with cancer develop cachexia, which cannot be solely explained by anorexia seen in these patients. Testosterone is a male sex hormone which is known to increase muscle mass and strength, maintain cancellous bone mass, and increase cortical bone mass, in addition to improving libido, sexual desire, and fantasy (14). If a high prevalence of hypogonadism is detected in male cancer patients, and a significant difference exists in testosterone levels in cancer patients with cachexia versus those without cachexia, testosterone may be administered in future randomized trials to help alleviate cachexia. Study group and design The study group consisted of male cancer patients and non-cancer controls aged between 40 and 70 years. The primary study design was cross-sectional with a sample size of 135. The present data analysis is done on a subset convenience sample of 72 patients recruited between November 2006 and January 2010. ^ Methods. Patients aged 40-70 years with or without a diagnosis of cancer were recruited into the study. All patients with a BMI over 35, significant edema, non-melanomatous skin cancer, current alcohol or illicit drug abuse, concomitant usage of medications interfering with gonadal axis, and anabolic agents, patients on tube feeds or parenteral nutrition within 3 months prior to enrollment were excluded from the study. The study was approved by the Institutional Review Board of Baylor College of Medicine and is being conducted at the Michael E. DeBakey Veterans Affairs Medical Center at Houston. My thesis is a pilot data analysis that employs a smaller subset convenience sample of 72 patients determined by using the data available for the 72 patients (of the intended sample of 135 patients) recruited between November 2006 and January 2010. 
The primary aim of this analysis is to compare the proportion of patients with hypogonadism in the male cancer and non-cancer control groups, and to evaluate if a significant difference exists with respect to testosterone levels in male cancer patients with cachexia versus those without cachexia. The procedures of the study relevant to the current data analysis included blood collection to measure levels of testosterone and measurement of body weight to categorize cancer patients into cancer cachexia and cancer non-cachexia sub-groups. ^ Results. After logarithmic transformation of data of cancer and control groups, the unpaired t test with unequal variances was done. The proportion of patients with hypogonadism in the male cancer and non-cancer control groups was 47.5% and 22.7% with a Pearson chi2 statistic of 1.6036 and a p value of 0.205. Comparing the mean calculated Bioavailable testosterone in male cancer patients and non-cancer controls resulted in a t statistic of 21.83 and a p value less than 0.001. When the cancer group alone was taken, the mean free testosterone, calculated bioavailable testosterone and total testosterone levels in the cancer non-cachexia sub-group were 3.93, 5.09, 103.51 respectively and in the cancer cachexia sub-group were 3.58, 4.17, 84.08 respectively. The unpaired t test with equal variances showed that the two sub-groups had p values of 0.2015, 0.1842, and 0.4894 with respect to calculated bioavailable testosterone, free testosterone, and total testosterone respectively. ^ Conclusions. The small sample size of this exploratory study, resulting in a small power, does not allow us to draw definitive conclusions. For the given sub-sample, the proportion of patients with hypogonadism in the cancer group was not significantly different from that of patients with hypogonadism in the control group. 
Inferences on the prevalence of hypogonadism in male cancer patients could not be made in this paper because the sub-sample is small and therefore not representative of the general population. However, there was a statistically significant difference in calculated bioavailable testosterone levels between male cancer patients and non-cancer controls. Analysis of cachectic and non-cachectic patients within the male cancer group showed no significant difference in testosterone levels (total, free, and calculated bioavailable testosterone) between the two sub-groups. To reiterate, this study is exploratory, and the results may change once the complete dataset is obtained and analyzed. It nevertheless serves as a good template to guide further research and analysis.
Abstract:
Helicobacter pylori infection is frequently acquired during childhood. This microorganism is known to cause gastritis and duodenal ulcer in pediatric patients; however, most infected children remain completely asymptomatic. Currently there is no consensus in favor of treating H. pylori infection in asymptomatic children. The first-line treatment for this population is triple therapy comprising two antibacterial agents and one proton pump inhibitor for a 2-week course. Eradication rates below 75% have been documented with this first-line therapy, but novel tinidazole-containing quadruple sequential therapies seem worth investigating. None of the previous studies on such therapy has been done in the United States of America. As part of an iron deficiency anemia study in asymptomatic H. pylori-infected children of El Paso, Texas, we conducted a secondary analysis of data collected in this trial to assess the effectiveness of this tinidazole-containing sequential quadruple therapy, compared to placebo, in clearing the infection. Subjects were selected from a group of asymptomatic children identified through household visits to 11,365 randomly selected dwelling units. After obtaining parental consent and child assent, a total of 1,821 children 3-10 years of age were screened; 235 tested positive on a novel urine immunoglobulin G antibody test for H. pylori infection and were confirmed as infected using a 13C urea breath test, with a urea hydrolysis rate >10 μg/min as the cut-off value. Of those, 119 study subjects had a complete physical exam and baseline blood work and were randomly allocated to four groups, two of which received active H. pylori eradication medication alone or in combination with iron, while the other two received iron only or placebo only.
Follow-up visits to their houses were made to assess compliance and the occurrence of adverse events, and at 45+ days post-treatment a second urea breath test was performed to assess infection status. Effectiveness was primarily assessed on an intent-to-treat basis (i.e., according to treatment allocation), and the primary outcome was the proportion of children who cleared their infection, using the urea hydrolysis rate cut-off value of >10 μg/min. We also conducted analyses on a per-protocol basis and according to the cytotoxin-associated gene A product status of the H. pylori infection, and we compared the rate of adverse events across the two arms. On intent-to-treat and per-protocol analyses, 44.3% and 52.9%, respectively, of the children receiving the novel quadruple sequential eradication therapy cleared their infection, compared to 12.2% and 15.4%, respectively, in the arms receiving iron or placebo only. These differences were statistically significant (p<0.001). The study medications were well accepted and safe. In conclusion, we found that in this study population of mostly asymptomatic H. pylori-infected children living in the US along the border with Mexico, the quadruple sequential eradication therapy cleared the infection in only half of the children receiving this treatment. Research is needed to assess the antimicrobial susceptibility of the H. pylori strains infecting this population in order to formulate more effective therapies.
Abstract:
Objective. The goal of this study is to characterize the current workforce of CIHs and the lengths of the professional practice careers of past and current CIHs. Methods. This is a secondary analysis of data compiled from all of the nearly 50 annual roster listings of the American Board of Industrial Hygiene (ABIH) for Certified Industrial Hygienists active in each year since 1960. Survival analysis was used to measure the primary outcome of interest, with the Kaplan-Meier method used to estimate the survival function. Study subjects: The population studied is all Certified Industrial Hygienists (CIHs). A CIH is defined by the ABIH as an individual who has achieved the minimum requirements for education and working experience and, through examination, has demonstrated a minimum level of knowledge and competency in the prevention of occupational illnesses. Results. A Cox proportional hazards model analysis was performed by start-time cohort of CIHs, with cohort 1 chosen as the reference cohort. The estimated relative risks of the event (defined as retirement, or absence from 5 consecutive years of listings) for cohorts 2, 3, 4, and 5 relative to cohort 1 were 0.385, 0.214, 0.234, and 0.299, respectively. The results show that cohort 2 (CIHs certified from 1970-1980) has the lowest hazard ratio, which indicates the lowest retirement rate. Conclusion. The number of CIHs still actively practicing up to the end of 2009 increased tremendously starting in 1980 and reached a plateau in recent decades. This indicates that the supply and demand of the profession may have reached equilibrium. More demographic information and variables are needed to actually predict the future number of CIHs needed.
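The Kaplan-Meier method named in the Methods can be sketched as a product-limit estimator over career lengths. The code below is an illustrative implementation, not the study's analysis; the career lengths and the event flags (1 = left the roster, 0 = still listed, i.e. censored) are hypothetical, as is the function name `kaplan_meier`.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations : follow-up time for each subject
    observed  : 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical career lengths in years; 0 marks CIHs still listed (censored).
years_listed = [5, 8, 8, 12, 20]
left_roster = [1, 1, 0, 1, 0]
curve = kaplan_meier(years_listed, left_roster)
```

Censored subjects contribute to the risk set up to their last listed year but never reduce the survival estimate on their own, which is why still-active CIHs can be included.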
Abstract:
The purpose of this study is to descriptively analyze the current program at Ben Taub Pediatric Weight Management Program in Houston, Texas, a program designed to help overweight children ages three to eighteen to lose weight. In Texas, approximately one in every three children is overweight or obese. Obesity is seen at an even greater level within Ben Taub due to the hospital's high rate of service for underserved minority populations (Dehghan et al, 2005; Tyler and Horner, 2008; Hunt, 2009). The weight management program consists of nutritional, behavioral, physical activity, and medical counseling. Analysis will focus on changes in weight, BMI, cholesterol levels, and blood pressure from 2007–2010 for all participants who attended at least two weight management sessions. Recommendations will be given in response to the results of the data analysis.
Abstract:
Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. ^ Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then proceeded to conduct a secondary data analysis using a mixed linear model to handle missing data with three methodologies (a) complete case analysis, (b) multiple imputation with explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoke, years in school, marital status, housing, race/ethnicity, and if participants play on athletic team). Several comparisons were conducted including the following ones: 1) the motivation interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivation interviewing group (MIO) vs. AO, and the intervention of the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.^ Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. Then we evaluated the assumption of missing completely at random by Little's missing completely at random (MCAR) test, in which the Chi-Square test statistic was 167.8 with 125 degrees of freedom, and its associated p-value was p=0.006, which indicated that the data could not be assumed to be missing completely at random. After that, we compared if the three different strategies reached the same results. 
For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. Discussion: The study indicated that, first, missingness was crucial in this study. Second, understanding the assumptions of the model was important, since we could not identify whether the data were missing at random or missing not at random. Therefore, future research should focus on exploring more sensitivity analyses under the missing-not-at-random assumption.
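The p-value reported for Little's MCAR test can be sanity-checked from the test statistic alone. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square upper tail rather than an exact chi-square distribution function, so it is an approximation and not the authors' computation; the function name is illustrative.

```python
import math


def chi2_upper_tail_approx(x, df):
    """Approximate P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the Wilson-Hilferty cube-root transformation to a standard normal.
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Test statistic and degrees of freedom as reported in the abstract.
p_value = chi2_upper_tail_approx(167.8, 125)
```

The approximate value is close to the reported p = 0.006, consistent with rejecting the assumption that the data are missing completely at random.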
Abstract:
Background: Poor communication among health care providers is cited as the most common cause of sentinel events involving patients. Sign-out of patient data at the change of clinician shifts is a component of communication that is especially vulnerable to errors. Sign-outs are particularly extensive and complex in intensive care units (ICUs). There is a paucity of validated tools to assess ICU sign-outs. Objective: To design a valid and reliable survey tool to assess the perceptions of Pediatric ICU (PICU) clinicians about sign-out. Design: Cross-sectional, web-based survey. Setting: Academic hospital, 31-bed PICU. Subjects: Attending faculty, fellows, nurse practitioners and physician assistants. Interventions: A survey was designed with input from a focus group and administered to PICU clinicians. Test-retest reliability, internal consistency and validity of the survey tool were assessed. Measurements and Main Results: Forty-eight PICU clinicians agreed to participate. We had 42 (88%) and 40 (83%) responses in the test and retest phases. The mean scores for the ten survey items ranged from 2.79 to 3.67 on a five-point Likert scale, with no significant test-retest difference and a Pearson correlation between pre and post answers of 0.65. The survey item scores showed internal consistency with a Cronbach's alpha of 0.85. Exploratory factor analysis revealed three constructs: efficacy of sign-out process, recipient satisfaction and content applicability. Seventy-eight percent of clinicians affirmed the need for improvement of the sign-out process and 83% confirmed the need for face-to-face verbal sign-out. A system-based sign-out format was favored by fellows and advanced level practitioners, while attendings preferred a problem-based format (p=0.003). Conclusions: We developed a valid and reliable survey to assess clinician perceptions about the ICU sign-out process.
These results can be used to design a verbal template to improve and standardize the sign-out process.
Abstract:
Self-management is being promoted in cystic fibrosis (CF). However, it has not been well studied. Principal aims of this research were (1) to evaluate psychometric properties of a CF disease status measure, the NIH Clinical Score; (2) to develop and validate a measure of self-management behavior, the SMQ-CF scale, and (3) to examine the relation between self-management and disease status in CF patients over two years.^ In study 1, NIH Clinical Scores for 200 patients were used. The scale was examined for internal consistency, interrater reliability, and content validity using factor analysis. The Cronbach's alpha (.81) and interrater reliability (.90) for the total scale were high. General scale items were less reliable. Factor analysis indicated that most of the variance in disease status is accounted for by Factor 1 which consists of pulmonary disease items.^ The SMQ-CF measures the performance of CF self-management. Pilot testing was done with 98 CF primary caregivers. Internal consistency reliability, social desirability bias, and content validity using factor analysis were examined. Internal consistency was good (alpha =.95). Social desirability correlation was low (r =.095). Twelve factors identified were consistent with conceptual groupings of behaviors. Around two hundred caregivers from two CF centers were surveyed and multivariate analysis of variance was used to assess construct validity. Results confirmed expected relations between self-management, patient age, and disease status. Patient age accounted for 50% and disease status 18% of the variance in the SMQ-CF scale.^ It was hypothesized that self-management would positively affect future disease status. Data from 199 CF patients (control and education intervention groups) were examined. Models of hypothesized relations were tested using LISREL structural equation modeling. Results indicated that the relations between baseline self-management and Time 1 disease status were not significant. 
Significant relations were observed in self-management behaviors from time 1 to time 2, and patterns of significant relations differed between the two groups. This research has contributed to refinements in the ability to measure self-management behavior and disease status outcomes in cystic fibrosis. In addition, it provides the first steps in exploratory behavioral analysis with regard to self-management in this disease.