Many studies in biostatistics deal with binary data. Some of these studies involve correlated observations, which can complicate the analysis of the resulting data. Studies of this kind typically arise when a high degree of commonality exists between test subjects. Two examples are measurements on identical twins and studies of symmetrical organs or appendages, as in ophthalmic studies. If a natural hierarchy exists in the data, multilevel analysis is an appropriate tool for the analysis. Although this type of matching appears ideal for the purposes of comparison, analyzing the resulting data while ignoring the effect of intra-cluster correlation has been shown to produce biased results. ^ This paper will explore the use of multilevel modeling of simulated binary data with predetermined levels of correlation. Data will be generated using the Beta-Binomial method with varying degrees of correlation between the lower-level observations. The data will be analyzed using the multilevel software package MLwiN (Woodhouse et al., 1995). The specified intra-cluster correlations of these data will be compared with the correlations estimated by multilevel analysis to examine the accuracy of this technique in analyzing this type of data. ^
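
The data-generation step described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function names, the cluster size of 2 (as for twins or paired eyes), the mean probability, and the one-way ANOVA estimator used as a check are all assumptions introduced here. Under the Beta-Binomial model with parameters a and b, the intra-cluster correlation is 1 / (a + b + 1), which is what the parameter choice below relies on.

```python
import random


def beta_binomial_clusters(n_clusters, cluster_size, p, rho, seed=0):
    """Simulate clustered binary data with mean p and intra-cluster correlation rho.

    Each cluster gets its own success probability drawn from a Beta
    distribution; observations within the cluster are then independent
    Bernoulli draws with that probability. Requires 0 < rho < 1.
    """
    # Beta parameters chosen so that a / (a + b) = p and 1 / (a + b + 1) = rho.
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        p_cluster = rng.betavariate(a, b)
        data.append([1 if rng.random() < p_cluster else 0
                     for _ in range(cluster_size)])
    return data


def anova_icc(data):
    """One-way ANOVA estimator of the ICC for equal-sized clusters."""
    k = len(data)
    n = len(data[0])
    means = [sum(cluster) / n for cluster in data]
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2
                    for cluster, m in zip(data, means)
                    for y in cluster) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


if __name__ == "__main__":
    pairs = beta_binomial_clusters(n_clusters=10000, cluster_size=2,
                                   p=0.3, rho=0.2, seed=1)
    print("specified ICC: 0.2, estimated ICC:", round(anova_icc(pairs), 3))
```

The ANOVA estimator is only a quick sanity check on the generated data; it stands in for, and is not equivalent to, the multilevel estimates the abstract obtains from MLwiN.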


This study applies the multilevel analysis technique to longitudinal data from a large clinical trial. The technique accounts for the correlation at different levels when modeling repeated blood pressure measurements taken throughout the trial. This modeling allows for closer inspection of the remaining correlation and non-homogeneity of variance in the data. Three methods of modeling the correlation were compared. ^


Hierarchically clustered populations are often encountered in public health research, but the traditional methods used to analyze this type of data are not always adequate. In the case of survival time data, more appropriate methods have only begun to surface in the last couple of decades. Such methods include multilevel statistical techniques which, although more complicated to implement than traditional methods, are better suited to clustered data. ^ One population that is known to exhibit a hierarchical structure is that of patients who use the health care system of the Department of Veterans Affairs, where patients are grouped not only by hospital but also by geographic network (VISN). This project analyzes survival time data sets housed at the Houston Veterans Affairs Medical Center Research Department using two different Cox Proportional Hazards regression models, a traditional model and a multilevel model. VISNs that exhibit significantly higher or lower survival rates than the rest are identified separately for each model. ^ In this particular case, although the results of the two models differ, the differences are not large enough to warrant using the more complex multilevel technique. This is shown by the small estimates of variance associated with levels two and three in the multilevel Cox analysis. Much of the difference between the models in identifying VISNs with high or low survival rates is attributable to computer hardware difficulties rather than to any significant improvement in the model. ^
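
For orientation, one common way to write a three-level Cox model of this kind is shown below. The notation is an assumption introduced here (patient i, hospital j, VISN k, normally distributed random effects); the abstract does not state the exact specification that was fitted.

```latex
h_{ijk}(t) = h_0(t)\,\exp\!\left(\mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_{jk} + v_k\right),
\qquad u_{jk} \sim N(0, \sigma_u^2), \quad v_k \sim N(0, \sigma_v^2)
```

Setting \sigma_u^2 = \sigma_v^2 = 0 recovers the traditional Cox model, which is why small variance estimates at levels two and three indicate that the two models will behave similarly.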


Background. The gap between actual and ideal rates of routine cancer screening in the U.S., particularly for colorectal cancer screening (CRCS) (1;2), is responsible for an unnecessary burden of morbidity and mortality, particularly for disadvantaged groups. Knowledge about the effects of individual and area influences is being advanced by a growing body of research that has examined the association of area socioeconomic status (SES) and cancer screening after controlling for individual SES. The findings from this emerging and heterogeneous research in the cancer screening literature have been mixed. Moreover, multilevel studies in this area have not yet adequately explored the possibility of differential associations by population subgroup, despite some evidence suggesting gender-specific effects. ^ Objectives and methods. This dissertation reports on a systematic review of studies on the association of area SES and cancer screening and a multilevel study of the association between area SES and CRCS. The specific aims of the systematic review are to: (1) describe the study designs, constructs, methods, and measures; (2) describe the association of area SES and cancer screening; and (3) identify neglected areas of research. ^ The empiric study linked a pooled sample of respondents aged ≥50 years without a personal history of colorectal cancer from the 2003 and 2005 California Health Interview Surveys with a comprehensive set of census-tract level area SES measures from the 2000 U.S. Census. Two-level random intercept models were used to test 2 hypotheses: (1) area SES will be associated with adherence to two modalities of CRCS after controlling for individual SES; and (2) gender will moderate the relationship between area socioeconomic status and adherence to both modalities of CRCS. ^ Results. The systematic review identified 19 eligible studies that demonstrated variability in study designs, methods, constructs, and measures. 
The majority of tested associations were either not statistically significant or significant and in the positive direction, indicating that as area SES increased, the odds of CRCS increased. The multilevel study demonstrated that while multiple aspects of area SES were associated with CRCS after controlling for individual SES, associations differed by screening modality and in the case of endoscopy, they also differed by gender. ^ Conclusions. Conceptual and methodologic heterogeneity and weaknesses in the literature to date limit definitive conclusions about the underlying relationships between area SES and cancer screening. The multilevel study provided partial support for both hypotheses. Future research should continue to explore the role of gender as a moderating influence with the aim of identifying the mechanisms linking area SES and cancer prevention behaviors. ^
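
The two hypotheses of the empirical study can be read against a two-level random intercept logit model of roughly the following form. The symbols are illustrative assumptions (respondent i in census tract j, a single area SES measure, a binary gender indicator); the fitted models included further covariates not shown here.

```latex
\operatorname{logit}\Pr(y_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathrm{SES}^{\mathrm{ind}}_{ij}
  + \beta_2\,\mathrm{SES}^{\mathrm{area}}_{j}
  + \beta_3\,\mathrm{female}_{ij}
  + \beta_4\,\bigl(\mathrm{SES}^{\mathrm{area}}_{j} \times \mathrm{female}_{ij}\bigr)
  + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
```

In this notation, hypothesis (1) concerns \beta_2 (area SES after controlling for individual SES) and hypothesis (2) concerns the cross-level interaction \beta_4 (moderation by gender).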


Systemic sclerosis (SSc), or scleroderma, is a complex disease whose etiopathogenesis remains unelucidated. Fibrosis in multiple organs is a key feature of SSc, and studies have shown that the transforming growth factor-β (TGF-β) pathway has a crucial role in fibrotic responses. For a complex disease such as SSc, expression quantitative trait loci (eQTL) analysis is a powerful tool for identifying genetic variations that affect expression of genes involved in the disease. In this study, a multilevel model is described that performs a multivariate eQTL analysis to identify genetic variations (SNPs) specifically associated with the expression of three members of the TGF-β pathway: CTGF, SPARC and COL3A1. The uniqueness of this model is that all three genes are included in one model, rather than one gene being examined at a time. A protein might contribute to multiple pathways, and this approach allows the identification of important genetic variations linked to multiple genes belonging to the same pathway. In this study, 29 SNPs were identified, 16 of which are located in known genes. Exploring the roles of these genes in TGF-β regulation will help elucidate the etiology of SSc, which will in turn help to better manage this complex disease. ^


Health departments, research institutions, policy-makers, and healthcare providers are often interested in knowing the health status of their clients/constituents. Without the resources, financial or administrative, to go out into the community and conduct health assessments directly, these entities frequently rely on data from population-based surveys to supply the information they need. Unfortunately, these surveys are ill-equipped for the job due to sample size and privacy concerns. Small area estimation (SAE) techniques have excellent potential in such circumstances, but have been underutilized in public health due to lack of awareness of, and confidence in applying, their methods. The goal of this research is to make model-based SAE accessible to a broad readership using clear, example-based learning. Specifically, I applied the principles of multilevel, unit-level SAE to describe the geographic distribution of HPV vaccine coverage among females aged 11-26 in Texas. ^ Multilevel (3 level: individual, county, public health region) random-intercept logit models of HPV vaccination (receipt of ≥ 1 dose of Gardasil®) were fit to data from the 2008 Behavioral Risk Factor Surveillance System (outcome and level 1 covariates) and a number of secondary sources (group-level covariates). Sampling weights were scaled (level 1) or constructed (levels 2 & 3), and incorporated at every level. Using the regression coefficients (and standard errors) from the final models, I simulated 10,000 draws of each regression coefficient from the normal distribution and applied them to the logit model to estimate HPV vaccine coverage in each county and respective demographic subgroup. For simplicity, I only provide coverage estimates (and 95% confidence intervals) for counties. ^ County-level coverage among females aged 11-17 varied from 6.8% to 29.0%. For females aged 18-26, coverage varied from 1.9% to 23.8%. 
Aggregated to the state level, these values translate to indirect state estimates of 15.5% and 11.4%, respectively, both of which fall within the confidence intervals for the direct estimates of HPV vaccine coverage in Texas (Females 11-17: 17.7%, 95% CI: 13.6, 21.9; Females 18-26: 12.0%, 95% CI: 6.2, 17.7). ^ Small area estimation has great potential for informing policy, program development and evaluation, and the provision of health services. Harnessing the flexibility of multilevel, unit-level SAE to estimate HPV vaccine coverage among females aged 11-26 in Texas counties, I have provided (1) practical guidance on how to conceptualize and conduct model-based SAE, (2) a robust framework that can be applied to other health outcomes or geographic levels of aggregation, and (3) HPV vaccine coverage data that may inform the development of health education programs, the provision of health services, the planning of additional research studies, and the creation of local health policies. ^
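
The simulation step in this abstract (drawing regression coefficients from normal distributions and passing them through the logit model) can be sketched as below. Everything numeric here is hypothetical, as are the function names; the sketch also assumes independent draws for each coefficient and omits random intercepts and sampling weights, which the actual analysis incorporated.

```python
import math
import random


def inv_logit(z):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_coverage(coefs, std_errs, covariates, n_draws=10000, seed=0):
    """Simulate predicted coverage for one area from coefficient uncertainty.

    coefs / std_errs: point estimates and standard errors (hypothetical here).
    covariates: covariate values for the area, including 1.0 for the intercept.
    Returns (mean, lower 2.5th percentile, upper 97.5th percentile).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Assumption: each coefficient is drawn independently of the others.
        eta = sum(rng.gauss(b, se) * x
                  for b, se, x in zip(coefs, std_errs, covariates))
        draws.append(inv_logit(eta))
    draws.sort()
    point = sum(draws) / n_draws
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return point, lower, upper


if __name__ == "__main__":
    # Hypothetical intercept and one county-level covariate effect.
    estimate, low, high = simulate_coverage(
        coefs=[-1.8, 0.4], std_errs=[0.2, 0.1], covariates=[1.0, 1.0])
    print(f"coverage {estimate:.3f} (95% interval {low:.3f} to {high:.3f})")
```

Taking percentiles of the simulated probabilities, rather than transforming a symmetric interval, keeps the interval inside 0 to 1, which matters for counties with low coverage.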


This study proposed a novel statistical method that models multiple outcomes and the missing data process jointly using item response theory. The method follows the "intent-to-treat" principle in clinical trials and accounts for the correlation between the outcomes and the missing data process. It may provide a good solution for studies of chronic mental disorders. ^ The simulation study demonstrated that if the true model is the proposed model with moderate or strong correlation, ignoring the within correlation may lead to overestimation of the treatment effect and to a type I error rate above the specified level. Even if the within correlation is small, the proposed model performs as well as the naïve response model. Thus, the proposed model is robust across different correlation settings if the data are generated by the proposed model.^


Left ventricular mass (LVM) is a strong predictor of cardiovascular disease (CVD) in adults. However, normal growth of LVM in healthy children is not well understood, and previous results on independent effects of body size and body fatness on LVM have been inconsistent. The purpose of this study was (1) to establish the normal growth curve of LVM from age 8 to age 18, and evaluate the determinants of change in LVM with age, and (2) to assess the independent effects of body size and body fatness on LVM.^ In Project HeartBeat!, 678 healthy children aged 8, 11 and 14 years at baseline were enrolled and examined at 4-monthly intervals for up to 4 years. A synthetic cohort with continuous observations from age 8 to 18 years was constructed. A total of 4608 LVM measurements were made from M-mode echocardiography. The multilevel linear model was used for analysis.^ Sex-specific trajectories of normal growth of LVM from age 8 to 18 were displayed. On average, LVM was 15 g higher in males than in females. Average LVM increased linearly in males from 78 g at age 8 to 145 g at age 18. For females, the trajectory was curvilinear, nearly constant after age 14. No significant racial differences were found. After adjustment for the effects of body size and body fatness, average LVM decreased slightly from age 8 to 18, and sex differences in changes of LVM remained constant.^ The impact of body size on LVM was examined by adding to a basic LVM-sex-age model one of 9 body size indicators. The impact of body fatness was tested by further introducing into each of the 9 LVM models (with one or another of the body size indicators) one of 4 body fatness indicators, yielding 36 models with different body size and body fatness combinations. The results indicated that effects of body size on LVM can be distinguished between fat-free body mass and fat body mass, both being independent, positive predictors. The former is the stronger determinant. 
When a non-fat-free body size indicator is used as predictor, the estimated residual effect of body fatness on LVM becomes negative. ^


In numerous intervention studies and education field trials, random assignment to treatment occurs in clusters rather than at the level of observation. This departure from random assignment of individual units may be due to logistics, political feasibility, or ecological validity. Data within the same cluster or grouping are often correlated. Application of traditional regression techniques, which assume independence between observations, to clustered data produces consistent parameter estimates. However, such estimators are often inefficient compared with methods that incorporate the clustered nature of the data into the estimation procedure (Neuhaus 1993).¹ Multilevel models, also known as random effects or random components models, can be used to account for the clustering of data by estimating higher level, or group, as well as lower level, or individual, variation. Designing a study in which the unit of observation is nested within higher level groupings requires the determination of sample sizes at each level. This study investigates the effects of the design and analysis of various sampling strategies for a 3-level repeated measures design on the parameter estimates when the outcome variable of interest follows a Poisson distribution. ^ Results of this study suggest that second order PQL estimation produces the least biased estimates in the 3-level multilevel Poisson model, followed by first order PQL and then second and first order MQL. The MQL estimates of both fixed and random parameters are generally satisfactory when the level 2 and level 3 variation is less than 0.10. However, as the higher level error variance increases, the MQL estimates become increasingly biased. If convergence of the estimation algorithm is not obtained by the PQL procedure and the higher level error variance is large, the estimates may be significantly biased. In this case bias correction techniques such as bootstrapping should be considered as an alternative procedure. 
For larger sample sizes, those structures with 20 or more units sampled at levels with normally distributed random errors produced more stable estimates with less sampling variance than structures with an increased number of level 1 units. For small sample sizes, sampling fewer units at the level with Poisson variation produces less sampling variation; however, this criterion is no longer important when sample sizes are large. ^ ¹ Neuhaus J (1993). “Estimation Efficiency and Tests of Covariate Effects with Clustered Binary Data”. Biometrics, 49, 989–996. ^
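
To make the data structure concrete, the following sketch generates counts from a 3-level Poisson model with normally distributed random intercepts at levels 2 and 3 and Poisson variation at level 1. The function names, the intercept, the variances, and the sampling structure are illustrative assumptions; no PQL or MQL estimation is attempted here, since those procedures belong to specialized multilevel software.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson count by Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_three_level_poisson(n3, n2, n1, intercept, var3, var2, seed=0):
    """Return (level-3 id, level-2 id, count) rows.

    n3 level-3 units, each with n2 level-2 units, each with n1 repeated
    level-1 counts. The log of the Poisson mean is
    intercept + v_k (level 3) + u_jk (level 2).
    """
    rng = random.Random(seed)
    rows = []
    for k in range(n3):
        v_k = rng.gauss(0.0, math.sqrt(var3))
        for j in range(n2):
            u_jk = rng.gauss(0.0, math.sqrt(var2))
            mean = math.exp(intercept + v_k + u_jk)
            for _ in range(n1):
                rows.append((k, j, poisson_draw(rng, mean)))
    return rows


if __name__ == "__main__":
    rows = simulate_three_level_poisson(n3=20, n2=10, n1=5,
                                        intercept=1.0, var3=0.25, var2=0.25)
    counts = [c for _, _, c in rows]
    m = sum(counts) / len(counts)
    v = sum((c - m) ** 2 for c in counts) / (len(counts) - 1)
    print(f"mean {m:.2f}, variance {v:.2f}")
```

Because the higher-level random effects add variation on top of the Poisson variation, the marginal variance of the counts exceeds their mean, which is the extra-Poisson structure a multilevel model is meant to capture.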


Blood cholesterol and blood pressure development in childhood and adolescence have an important impact on future adult levels of cholesterol and blood pressure, and on increased risk of cardiovascular diseases. The U.S. has higher mortality rates from coronary heart disease than Japan. A longitudinal comparison in children of risk factor development in the two countries provides more understanding about the causes of cardiovascular disease and its prevention. Such comparisons have not been reported in the past. ^ In Project HeartBeat!, 506 non-Hispanic white, 136 black and 369 Japanese children participated in the study in the U.S. and Japan from 1991 to 1995. A synthetic cohort of ages 8 to 18 years was composed of three cohorts with starting ages of 8, 11, and 14. A multilevel regression model was used for data analysis. ^ The study revealed that the Japanese children had significantly higher slopes of mean total cholesterol (TC) and high density lipoprotein (HDL) cholesterol levels than the U.S. children after adjusting for age and sex. The mean TC level of Japanese children was not significantly different from that of white and black children. The mean HDL level of Japanese children was significantly higher than that of white and black children after adjusting for age and sex. The ratio of HDL/TC in Japanese children was significantly higher than in U.S. whites, but not significantly different from that in the black children. The Japanese group had significantly lower mean diastolic blood pressure phase IV (DBP4) and phase V (DBP5) than the two U.S. groups. The Japanese group also showed significantly higher slopes in systolic blood pressure, DBP5 and DBP4 during the study period than both U.S. groups. The differences were independent of height and body mass index. ^ The study provided the first longitudinal comparison of blood cholesterol and blood pressure between U.S. and Japanese children and adolescents. 
It revealed the dynamic process of these factors in the three ethnic groups. ^


The use of group-randomized trials is particularly widespread in the evaluation of health care, educational, and screening strategies. Group-randomized trials represent a subset of a larger class of designs often labeled nested, hierarchical, or multilevel and are characterized by the randomization of intact social units or groups, rather than individuals. The application of random effects models to group-randomized trials requires the specification of fixed and random components of the model. The underlying assumption is usually that these random components are normally distributed. This research is intended to determine if the Type I error rate and power are affected when the assumption of normality for the random component representing the group effect is violated. ^ In this study, simulated data are used to examine the Type I error rate, power, bias and mean squared error of the estimates of the fixed effect and the observed intraclass correlation coefficient (ICC) when the random component representing the group effect possesses a distribution with non-normal characteristics, such as heavy tails or severe skewness. The simulated data are generated with various characteristics (e.g. number of schools per condition, number of students per school, and several within school ICCs) observed in most small, school-based, group-randomized trials. The analysis is carried out using SAS PROC MIXED, Version 6.12, with random effects specified in a random statement and restricted maximum likelihood (REML) estimation specified. The results from the non-normally distributed data are compared to the results obtained from the analysis of data with similar design characteristics but normally distributed random effects. ^ The results suggest that the violation of the normality assumption for the group component by a skewed or heavy-tailed distribution does not appear to influence the estimation of the fixed effect, Type I error, and power. 
Negative biases were detected when estimating the sample ICC and dramatically increased in magnitude as the true ICC increased. These biases were not as pronounced when the true ICC was within the range observed in most group-randomized trials (i.e. 0.00 to 0.05). The normally distributed group effect also resulted in biased ICC estimates when the true ICC was greater than 0.05. However, this may be a result of higher correlation within the data. ^
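
A simulation of this general kind can be sketched as follows. It is an illustration under stated assumptions, not the study's design: the skewed group effect is a centered exponential variable (one of many possible non-normal choices), the total variance is fixed at 1 so that the group-effect variance equals the target ICC, and the ICC is estimated with a simple one-way ANOVA estimator rather than REML in SAS PROC MIXED. All names and settings are hypothetical.

```python
import random


def simulate_groups(n_groups, group_size, icc, skewed, seed=0):
    """Simulate outcome = group effect + residual with total variance 1.

    If skewed is True the standardized group effect is a centered
    exponential variable (mean 0, variance 1, skewness 2); otherwise it
    is standard normal. Residuals are always normal.
    """
    rng = random.Random(seed)
    sd_group = icc ** 0.5
    sd_resid = (1.0 - icc) ** 0.5
    data = []
    for _ in range(n_groups):
        g = rng.expovariate(1.0) - 1.0 if skewed else rng.gauss(0.0, 1.0)
        data.append([sd_group * g + rng.gauss(0.0, sd_resid)
                     for _ in range(group_size)])
    return data


def anova_icc(data):
    """One-way ANOVA estimator of the ICC for equal-sized groups."""
    k = len(data)
    n = len(data[0])
    means = [sum(group) / n for group in data]
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2
                    for group, m in zip(data, means)
                    for y in group) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


if __name__ == "__main__":
    for skewed in (False, True):
        data = simulate_groups(n_groups=2000, group_size=20,
                               icc=0.05, skewed=skewed, seed=5)
        print("skewed" if skewed else "normal", round(anova_icc(data), 4))
```

The example uses far more groups than a typical school-based trial so that the output is stable; studying bias at realistic numbers of groups would require repeating the simulation many times per setting and averaging the estimates.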


Coronary heart disease remains the leading cause of death in the United States, and increased blood cholesterol level has been found to be a major risk factor with roots in childhood. Tracking of cholesterol, i.e., the tendency to maintain a particular cholesterol level relative to the rest of the population, and variability in blood lipid levels with increasing age have implications for cholesterol screening and for assessment of lipid levels in children, with the aim of preventing further rises and, in turn, adult heart disease. In this study the pattern of change in plasma lipids over time and their tracking were investigated. Also investigated were within-person variance and retest reliability (defined as the square root of within-person variance) for plasma total cholesterol, HDL-cholesterol, LDL-cholesterol, and triglycerides, and their relation to age, sex and body mass index among participants from age 8 to 18 years. ^ In Project HeartBeat!, 678 healthy children aged 8, 11 and 14 years at baseline were enrolled and examined at 4-monthly intervals for up to 4 years. We examined the relationship between repeated observations by Pearson's correlations. Age- and sex-specific quintiles were calculated, and the probability that participants remained in the uppermost quintile of their respective distribution was evaluated with life table methods. Plasma total cholesterol, HDL-C and LDL-C at baseline were strongly and significantly correlated with measurements at subsequent visits across the sex and age groups. Plasma triglyceride at baseline was also significantly correlated with subsequent measurements but less strongly than was the case for other plasma lipids. The probability of remaining in the upper quintile was also high (60 to 70%) for plasma total cholesterol, HDL-C and LDL-C. 
^ We used a mixed longitudinal, or synthetic cohort, design with continuous observations from age 8 to 18 years to estimate within-person variance of plasma total cholesterol, HDL-C, LDL-C and triglycerides. A total of 5809 measurements were available for both cholesterol and triglycerides. A multilevel linear model was used. Within-person variance among repeated measures over up to four years of follow-up was estimated for total cholesterol, HDL-C, LDL-C and triglycerides separately. The relationship of within-person and inter-individual variance with age, sex, and body mass index was evaluated. Likelihood ratio tests were conducted by calculating the difference in −2 log(likelihood) between the basic model and alternative models. The square root of within-person variance provided the retest reliability (within-person standard deviation) for plasma total cholesterol, HDL-C, LDL-C and triglycerides. We found 13.6 percent retest reliability for plasma cholesterol, 6.1 percent for HDL-cholesterol, 11.9 percent for LDL-cholesterol and 32.4 percent for triglycerides. Retest reliability of plasma lipids was significantly related to age and body mass index. It increased with increasing body mass index and age. These findings have implications for screening guidelines, as participants in the uppermost quintile tended to maintain their status in each of the age groups during a four-year follow-up. The magnitude of within-person variability of plasma lipids influences the ability to classify children into risk categories recommended by the National Cholesterol Education Program. ^
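
The two calculations named in this abstract, the likelihood ratio test on −2 log(likelihood) and the retest reliability as the square root of within-person variance, can be sketched as below. The function names and the numbers in the usage example are hypothetical, and the test is restricted to the simplest case of nested models that differ by a single parameter, an assumption not stated in the abstract.

```python
import math


def likelihood_ratio_test_1df(deviance_basic, deviance_alternative):
    """Likelihood ratio test for nested models differing by one parameter.

    Each deviance is -2 * log(likelihood) of the fitted model. The test
    statistic is their difference, referred to a chi-square distribution
    with 1 degree of freedom.
    """
    statistic = deviance_basic - deviance_alternative
    # Chi-square(1) upper tail: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


def retest_reliability(within_person_variance):
    """Within-person standard deviation, as defined in the abstract."""
    return math.sqrt(within_person_variance)


if __name__ == "__main__":
    # Hypothetical deviances for a basic model and one with an added term.
    stat, p = likelihood_ratio_test_1df(41250.6, 41243.1)
    print(f"LR statistic {stat:.1f}, p = {p:.4f}")
    print("within-person SD:", retest_reliability(185.0))
```

Comparing models that differ by more than one parameter would need the chi-square distribution with the corresponding degrees of freedom, which the standard library does not provide directly.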


This dissertation was written in the format of three journal articles. Paper 1 examined the influence of change and fluctuation in body mass index (BMI) over an eleven-year period, on changes in serum lipid levels (total, HDL, and LDL cholesterol, triglyceride) in a population of Mexican Americans with type 2 diabetes. Linear regression models containing initial lipid value, BMI and age, BMI change (slope of BMI), and BMI fluctuation (root mean square error) were used to investigate associations of these variables with change in lipids over time. Increasing BMI over time was associated with gains in total and LDL cholesterol and triglyceride levels in women. Fluctuation of BMI was not associated with detrimental lipid profiles. These effects were independent of age and were not statistically significant in men. In Mexican-American women with type 2 diabetes, weight reduction is likely to result in more favorable levels of total and LDL cholesterol and triglyceride, without concern for possible detrimental effects of weight fluctuation. Weight reduction may not be as effective in men, but does not appear to be harmful either. ^ Paper 2 examined the associations of upper and total body fat with total cholesterol, HDL and LDL cholesterol, and triglyceride levels in the same population. Multilevel analysis was used to predict serum lipid levels from total body fat (BMI and triceps skinfold) and upper body fat (subscapular skinfold), while controlling for the effects of sex, age and self-correlations across time. Body fat was not strikingly associated with trends in serum lipid levels. However, upper body fat was strongly associated with triglyceride levels. This suggests that loss of upper body fat may be more important than weight loss in management of the hypertriglyceridemia commonly seen in type 2 diabetes. ^ Paper 3 was a review of the literature reporting associations between weight fluctuation and lipid levels. 
Few studies have reported associations between weight fluctuation and total, LDL, and HDL cholesterol and triglyceride levels. The body of evidence to date suggests that weight fluctuation does not strongly influence levels of total, LDL and HDL cholesterol and triglyceride. ^
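
The two BMI summaries used in Paper 1, BMI change (slope of BMI) and BMI fluctuation (root mean square error), can be computed per participant from a simple regression of BMI on time. The sketch below is an illustration with hypothetical names and data; it assumes an ordinary least squares line and a residual divisor of n − 2, neither of which is specified in the abstract.

```python
import math


def bmi_change_and_fluctuation(years, bmi):
    """Return (slope, rmse) from a least squares line of BMI on time.

    slope: BMI change per year for one participant.
    rmse: root mean square error around the fitted line, used as the
          fluctuation measure (assumed divisor: n - 2).
    """
    n = len(years)
    if n < 3 or n != len(bmi):
        raise ValueError("need at least three paired observations")
    x_mean = sum(years) / n
    y_mean = sum(bmi) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, bmi))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, bmi))
    rmse = math.sqrt(sse / (n - 2))
    return slope, rmse


if __name__ == "__main__":
    # Hypothetical participant measured at four visits over eleven years.
    slope, rmse = bmi_change_and_fluctuation([0, 4, 8, 11],
                                             [29.5, 31.0, 30.2, 32.4])
    print(f"BMI change {slope:.3f} per year, fluctuation {rmse:.3f}")
```

Separating the trend from the scatter around it is what allows the two quantities to enter the lipid regression as distinct predictors.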


Objective. The study reviewed one year of Texas hospital discharge data and Trauma Registry data for the 22 trauma services regions in Texas to identify regional variations in capacity, process of care and clinical outcomes for trauma patients, and to analyze the statistical associations among capacity, process of care, and outcomes. ^ Methods. Cross-sectional study design covering one year of state-wide Texas data. Indicators of trauma capacity, trauma care processes, and clinical outcomes were defined and data were collected on each indicator. Descriptive analyses were conducted of regional variations in trauma capacity, process of care, and clinical outcomes at all trauma centers, at Level I and II trauma centers and at Level III and IV trauma centers. Multilevel regression models were fitted to test the relations among trauma capacity, process of care, and outcome measures at all trauma centers, at Level I and II trauma centers and at Level III and IV trauma centers while controlling for confounders such as age, gender, race/ethnicity, injury severity, level of trauma centers and urbanization. ^ Results. Significant regional variation was found among the 22 trauma services regions across Texas in trauma capacity, process of care, and clinical outcomes. The regional trauma bed rate, the average staffed beds per 100,000, varied significantly by trauma service region. Pre-hospital trauma care processes (EMS time, transfer time, and triage) varied significantly by region. Clinical outcomes including mortality, hospital and intensive care unit length of stay, and hospital charges also varied significantly by region. In multilevel regression analysis, the average trauma bed rate was significantly related to trauma care processes including ambulance delivery time, transfer time, and triage after controlling for age, gender, race/ethnicity, injury severity, level of trauma centers, and urbanization at all trauma centers. 
Among processes of care, only transfer time was significantly associated with the average trauma bed rate by region at Level III and IV trauma centers. Among outcome measures, only trauma mortality was significantly associated with the average trauma bed rate by region at all trauma centers, and only hospital charges were statistically related to trauma bed rate at Level I and II trauma centers. The effect of confounders such as age, gender, race/ethnicity, injury severity, and urbanization on processes and outcomes was found to vary significantly by level of trauma center. ^ Conclusions. Regional variation in trauma capacity, process, and outcomes in Texas was extensive. Trauma capacity, age, gender, race/ethnicity, injury severity, level of trauma centers and urbanization were significantly associated with trauma process and clinical outcomes depending on level of trauma centers. ^ Key words: regionalized trauma systems, trauma capacity, pre-hospital trauma care, process, trauma outcomes, trauma performance, evaluation measures, regional variations ^


Background. In Dr. Mel Greaves's "delayed-infection hypothesis," postponed exposure to common infections increases the likelihood of childhood cancer. Hygienic advancements in developed countries have reduced children's exposure to pathogens, so that children encounter common infectious agents at an older age with an immune system unable to deal with the foreign antigens. Vaccinations may be considered to be simulated infections, as they prompt an antigenic response by the immune system. Vaccinations may regulate the risk of childhood cancer by modulating the immune system. The aim of the study was to determine if children born in Texas counties with higher levels of vaccination coverage were at a reduced risk for childhood cancer.^ Methods. We conducted a case-control study to examine the risk of childhood cancers, specifically leukemia, brain tumors, and non-Hodgkin lymphoma, in relation to vaccination rates in Texas counties. We utilized a multilevel mixed-effects regression model of the individual data from the Texas Cancer Registry (TCR) with group-level exposure data (i.e., the county- and public health region-level vaccination rates).^ Results. Utilizing county-level vaccination rates and controlling for child's sex, birth year, ethnicity, birth weight, and mother's age at child's birth, the hepatitis B vaccine showed negative associations with developing all cancer types (OR = 0.81, 95% CI: 0.67–0.98) and acute lymphoblastic leukemia (ALL) (OR = 0.63, 95% CI: 0.46–0.88). The decreased risk for ALL was also evident for the inactivated polio vaccine (IPV) (OR = 0.67, 95% CI: 0.49–0.92) and the 4-3-1-3-3 vaccination series (OR = 0.62, 95% CI: 0.44–0.87). Using public health region vaccine coverage levels, an inverse association between the Haemophilus influenzae type b (Hib) vaccine and ALL (OR = 0.58, 95% CI: 0.42–0.82) was present. 
Conversely, the measles, mumps, and rubella (MMR) vaccine showed a positive association with developing non-Hodgkin lymphoma (OR = 2.81, 95% CI: 1.27–6.22). ^