7 results for longitudinal Poisson data
at Duke University
Abstract:
Empirical studies of education programs and systems, by nature, rely on measurable student outcomes, which often come in the form of test scores. However, in light of growing evidence about the long-run importance of other student skills and behaviors, the time has come for a broader approach to evaluating education. This dissertation undertakes experimental, quasi-experimental, and descriptive analyses to examine social, behavioral, and health-related mechanisms of the educational process. My overarching research question is simple: which inside- and outside-the-classroom features of schools and educational interventions are most beneficial to students in the long term? Furthermore, how can we apply this evidence to inform policy that could effectively reduce stark social, educational, and economic inequalities?
The first of three studies assesses mechanisms by which the Fast Track project, a randomized intervention in the early 1990s for high-risk children in four communities (Durham, NC; Nashville, TN; rural PA; and Seattle, WA), reduced delinquency, arrests, and health and mental health service utilization from adolescence through young adulthood (ages 12-20). A decomposition of treatment effects indicates that about a third of Fast Track's impact on later crime outcomes can be accounted for by improvements in social and self-regulation skills during childhood (ages 6-11), such as prosocial behavior, emotion regulation, and problem solving. These skills proved less valuable for the prevention of mental and physical health problems.
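As a rough illustration of this kind of treatment-effect decomposition, the sketch below runs a regression-based mediation comparison on synthetic data; the variable names and effect sizes are hypothetical and are not taken from the Fast Track analysis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, n).astype(float)               # randomized treatment indicator
skills = 0.5 * treat + rng.normal(size=n)                 # childhood social/self-regulation skills
crime = -0.3 * treat - 0.4 * skills + rng.normal(size=n)  # later crime outcome

# Total effect: outcome regressed on treatment alone
total = sm.OLS(crime, sm.add_constant(treat)).fit().params[1]

# Direct effect: outcome regressed on treatment plus the candidate mediator
direct = sm.OLS(crime, sm.add_constant(np.column_stack([treat, skills]))).fit().params[1]

# Share of the total effect accounted for by the mediator
print(f"mediated share: {1 - direct / total:.2f}")
```

With randomized treatment, the gap between the total and direct coefficients is one common summary of how much of the effect runs through the measured skills.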
The second study contributes new evidence on how non-instructional investments – such as increased spending on school social workers, guidance counselors, and health services – affect multiple aspects of student performance and well-being. Merging several administrative data sources spanning the 1996-2013 school years in North Carolina, I use an instrumental variables approach to estimate the extent to which local expenditure shifts affect students' academic and behavioral outcomes. My findings indicate that exogenous increases in spending on non-instructional services not only reduce student absenteeism and disciplinary problems (important predictors of long-term outcomes) but also significantly raise student achievement, at magnitudes similar to those of corresponding increases in instructional spending. Furthermore, subgroup analyses suggest that investments in student support personnel (such as social workers, health services, and guidance counselors) in schools with concentrated low-income student populations could go a long way toward closing socioeconomic achievement gaps.
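To make the identification strategy concrete, here is a minimal two-stage least squares sketch on synthetic data, assuming a single instrument; the instrument, variable names, and coefficients are hypothetical rather than the study's actual specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                               # instrument: exogenous funding shift
u = rng.normal(size=n)                               # unobserved confounder
spend = 0.8 * z + 0.5 * u + rng.normal(size=n)       # endogenous non-instructional spending
score = 0.4 * spend + 0.5 * u + rng.normal(size=n)   # student achievement

def two_sls(y, x, z):
    """Manual 2SLS: project x onto z, then regress y on the fitted values."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # first stage
    X = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]    # second-stage slope

print(f"OLS (confounded): {np.polyfit(spend, score, 1)[0]:.3f}")
print(f"2SLS estimate:    {two_sls(score, spend, z):.3f}")  # near the true 0.4
```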
The third study examines individual pathways that lead to high school graduation or dropout. It employs a variety of machine learning techniques, including decision trees, random forests with bagging and boosting, and support vector machines, to predict student dropout using longitudinal administrative data from North Carolina. I consider a large set of predictor measures from grades three through eight, including academic achievement, behavioral indicators, and background characteristics. My findings indicate that the most important predictors include eighth-grade absences, math scores, and age-for-grade, as well as early reading scores. Support vector classification (with a high cost parameter and a low gamma parameter) predicts high school dropout with the highest overall validity in the testing dataset, at 90.1 percent, followed by decision trees with boosting and interaction terms, at 89.5 percent.
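For the flavor of the classification step, here is a minimal scikit-learn sketch of an RBF-kernel support vector classifier with a high cost parameter and a low gamma, as the abstract describes; the features and data are synthetic stand-ins, not the North Carolina records.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
n = 3000
# Illustrative stand-ins for grade 3-8 predictors (absences, math scores, age-for-grade, reading)
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.5).astype(int)  # synthetic dropout flag

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# RBF-kernel SVM with high cost C and low gamma
clf = SVC(kernel="rbf", C=100.0, gamma=0.01).fit(X_tr, y_tr)
print(f"test accuracy: {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```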
Abstract:
DNA methylation is a key epigenetic mechanism involved in the developmental regulation of gene expression. Alterations in DNA methylation are established contributors to inter-individual phenotypic variation and have been associated with disease susceptibility. The degree to which changes in locus-specific DNA methylation are under the influence of heritable and environmental factors is largely unknown. In this study, we quantitatively measured DNA methylation across the promoter regions of the dopamine receptor 4 gene (DRD4), the serotonin transporter gene (SLC6A4/SERT), and the X-linked monoamine oxidase A gene (MAOA) using DNA sampled at both ages 5 and 10 years in 46 MZ twin pairs and 45 DZ twin pairs (total n=182). Our data suggest that DNA methylation differences are already apparent in early childhood, even between genetically identical individuals, and that individual differences in methylation are not stable over time. Our longitudinal-developmental study suggests that environmental influences are important factors accounting for inter-individual DNA methylation differences, and that these influences differ across the genome. The observation of dynamic changes in DNA methylation over time highlights the importance of longitudinal research designs for epigenetic research.
Abstract:
Using data from a longitudinal study of community-dwelling older adults, we analyzed the most extensive set of known correlates of PTSD symptoms obtained from a single sample to examine the measures' independent and combined utility in accounting for PTSD symptom severity. Fifteen measures identified as PTSD risk factors in published meta-analyses and 12 theoretically and empirically supported individual difference and health-related measures were included. Individual difference measures assessed after the trauma, including insecure attachment and factors related to the current trauma memory, such as self-rated severity, event centrality, frequency of involuntary recall, and physical reactions to the memory, accounted for symptom severity better than measures of pre-trauma factors. In an analysis restricted to prospective measures assessed before the trauma, the total variance explained decreased from 56% to 16%. Results support a model of PTSD in which characteristics of the current trauma memory promote the development and maintenance of PTSD symptoms.
Abstract:
Assays that assess cell-mediated immune responses performed under Good Clinical Laboratory Practice (GCLP) guidelines are required to provide specific and reproducible results. Defined validation procedures are required to establish the Standard Operating Procedure (SOP), including pass and fail criteria, as well as to implement positivity criteria. However, little to no guidance is provided on how to perform longitudinal assessment of the key reagents utilized in the assay. Through the External Quality Assurance Program Oversight Laboratory (EQAPOL), an Interferon-gamma (IFN-γ) Enzyme-linked immunosorbent spot (ELISpot) assay proficiency testing program is administered. A limit of acceptable within-site variability was estimated after six rounds of proficiency testing (PT). Previously, a PT send-out-specific within-site variability limit was calculated based on the dispersion (variance/mean) of the nine replicate wells of data. Now an overall 'dispersion limit' for within-site variability in the ELISpot PT program has been calculated as a dispersion of 3.3. The utility of this metric was assessed using a control sample to calculate the within-experiment (precision) and between-experiment (accuracy) variability, to determine whether the dispersion limit could be applied to bridging studies (studies that assess lot-to-lot variations of key reagents) for comparing the accuracy of results with new lots against results with old lots. Finally, simulations were conducted to explore how this dispersion limit could provide guidance on the number of replicate wells needed to assess within- and between-experiment variability and on the appropriate donor reactivity (number of antigen-specific cells) to be used for the evaluation of new reagents. Our bridging study simulations indicate that using a minimum of six replicate wells of a control donor sample with reactivity of at least 150 spot-forming cells per well is optimal. To determine significant lot-to-lot variations, the 3.3 dispersion limit should be used for both between- and within-experiment variability.
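A minimal sketch of the dispersion check described above, assuming six replicate wells of a control donor near 150 spot-forming cells per well; the counts are hypothetical.

```python
import numpy as np

DISPERSION_LIMIT = 3.3  # overall within-site limit reported for the ELISpot PT program

def dispersion(spot_counts):
    """Variance-to-mean ratio of replicate-well spot counts."""
    counts = np.asarray(spot_counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# Hypothetical replicate wells from a control donor (~150 spot-forming cells per well)
old_lot = [148, 155, 161, 150, 143, 157]
new_lot = [151, 139, 166, 158, 149, 144]

for name, wells in [("old lot", old_lot), ("new lot", new_lot)]:
    d = dispersion(wells)
    verdict = "within" if d <= DISPERSION_LIMIT else "exceeds"
    print(f"{name}: dispersion = {d:.2f} ({verdict} the {DISPERSION_LIMIT} limit)")
```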
Abstract:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
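As a sketch of why a Gaussian approximation is natural here: a PG(b, c) random variable has closed-form mean and variance (Polson et al., 2013), and for large shape b it is approximately normal. The snippet below matches those two moments; the dissertation's exact approximation may differ in detail.

```python
import numpy as np

def pg_moments(b, c):
    """Mean and variance of a Pólya-Gamma PG(b, c) random variable.

    Moment formulas follow Polson et al. (2013); as c -> 0 they reduce
    to mean b/4 and variance b/24.
    """
    c = np.asarray(c, dtype=float)
    mean = b / (2.0 * c) * np.tanh(c / 2.0)
    var = b * (np.sinh(c) - c) / (4.0 * c**3 * np.cosh(c / 2.0) ** 2)
    return mean, var

def pg_gaussian_draw(b, c, rng):
    """Gaussian approximation to PG(b, c); reasonable when the shape b
    (here, the count total) is large."""
    mean, var = pg_moments(b, c)
    return rng.normal(mean, np.sqrt(var))

rng = np.random.default_rng(3)
omega = pg_gaussian_draw(b=500, c=0.8, rng=rng)  # e.g., 500 binomial trials
print(f"approximate PG draw: {omega:.2f}")
```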
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age-specific latent natural-ability class and a performance-enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of a player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Abstract:
How do infants learn word meanings? Research has established the impact of both parent and child behaviors on vocabulary development; however, the processes and mechanisms underlying these relationships are still not fully understood. Much of the existing literature focuses on direct paths to word learning, demonstrating that parent speech and child gesture use are powerful predictors of later vocabulary. However, an additional body of research indicates that these relationships do not always replicate, particularly when assessed in different populations, contexts, or developmental periods.
The current study examines the relationships between infant gesture, parent speech, and infant vocabulary over the course of the second year (10-22 months of age). Through the use of detailed coding of dyadic mother-child play interactions and a combination of quantitative and qualitative data analytic methods, the process of communicative development was explored. Findings reveal non-linear patterns of growth in both parent speech content and child gesture use. Analyses of contingency in dyadic interactions reveal that children are active contributors to communicative engagement through their use of gestures, shaping the type of input they receive from parents, which in turn influences child vocabulary acquisition. Recommendations for future studies and the use of nuanced methodologies to assess changes in the dynamic system of dyadic communication are discussed.
Abstract:
Continuous variables are among the major data types collected by survey organizations. They can be incomplete, requiring data collectors to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to aggregate values into cells defined by different features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data.
The first method limits the disclosure risk of continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution, so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. An illustration using a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
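The guarantee that synthetic values reproduce the published totals can be seen through a standard fact: independent Poisson counts conditioned on their sum are multinomial. The sketch below uses that fact directly; the rates and totals are hypothetical, and the thesis's mixture-of-Poissons model is more elaborate.

```python
import numpy as np

def synthesize_cell(total, rates, rng):
    """Draw synthetic non-negative integer counts that sum exactly to `total`.

    Independent Poisson counts with the given rates, conditioned on their sum,
    are multinomial with probabilities proportional to the rates -- so a
    multinomial draw preserves the published marginal total by construction.
    """
    p = np.asarray(rates, dtype=float)
    return rng.multinomial(total, p / p.sum())

rng = np.random.default_rng(4)
# Hypothetical cell: 7 establishments with model-estimated Poisson rates
rates = [120.0, 80.0, 45.0, 30.0, 15.0, 7.0, 3.0]
synthetic = synthesize_cell(total=300, rates=rates, rng=rng)
print(synthetic, "sum =", synthetic.sum())  # sum is always 300
```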
The second method releases synthetic continuous microdata via a nonstandard MI method. Traditionally, MI fits a model to the confidential values and then generates multiple synthetic datasets from this model; the resulting disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than from the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach for limiting the posterior disclosure risk.
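A minimal sketch of the idea of estimating model parameters from intervals alone, assuming a normal model and maximizing the interval-censored likelihood; the data, interval widths, and model are hypothetical simplifications.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, size=500)        # confidential values
lo = x - rng.uniform(0.5, 2.0, size=500)   # protective interval bounds:
hi = x + rng.uniform(0.5, 2.0, size=500)   # only (lo, hi) enter the fit below

def neg_loglik(theta):
    """Interval-censored normal likelihood: P(lo < X < hi) per record."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    p = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

# Initialize at the interval midpoints, which are releasable
start = [np.mean((lo + hi) / 2), 0.0]
fit = minimize(neg_loglik, x0=start, method="Nelder-Mead")
print(f"mu ≈ {fit.x[0]:.2f}, sigma ≈ {np.exp(fit.x[1]):.2f}")  # near (10, 2)
```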
The third method imputes missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence, but separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., largely missing) groups. The sub-model structure for the focused variables is more complex than that for the non-focused ones; at the same time, their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving strongly associated non-focused variables to the focused side can help improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.