10 results for latent growth curve modeling

at Duke University


Relevance:

100.00%

Publisher:

Abstract:

OBJECTIVE: To examine the associations between attention-deficit/hyperactivity disorder (ADHD) symptoms, obesity and hypertension in young adults in a large population-based cohort. DESIGN, SETTING AND PARTICIPANTS: The study population consisted of 15,197 respondents from the National Longitudinal Study of Adolescent Health, a nationally representative sample of adolescents followed from 1995 to 2009 in the United States. Multinomial logistic and logistic models examined the odds of overweight, obesity and hypertension in adulthood in relation to retrospectively reported ADHD symptoms. Latent curve modeling was used to assess the association between symptoms and naturally occurring changes in body mass index (BMI) from adolescence to adulthood. RESULTS: A linear association was identified between the number of inattentive (IN) and hyperactive/impulsive (HI) symptoms and waist circumference, BMI, diastolic blood pressure and systolic blood pressure (all P-values for trend <0.05). Controlling for demographic variables, physical activity, alcohol use, smoking and depressive symptoms, those with three or more HI or IN symptoms had the highest odds of obesity (HI 3+, odds ratio (OR) = 1.50, 95% confidence interval (CI) = 1.22-2.83; IN 3+, OR = 1.21, 95% CI = 1.02-1.44) compared with those with no HI or IN symptoms. HI symptoms at the 3+ level were significantly associated with a higher OR of hypertension (HI 3+, OR = 1.24, 95% CI = 1.01-1.51; HI continuous, OR = 1.04, 95% CI = 1.00-1.09), but associations were nonsignificant when models were adjusted for BMI. Latent growth modeling results indicated that compared with those reporting no HI or IN symptoms, those reporting three or more symptoms had higher initial levels of BMI during adolescence. Only HI symptoms were associated with change in BMI. CONCLUSION: Self-reported ADHD symptoms were associated with adult BMI and change in BMI from adolescence to adulthood, providing further evidence of a link between ADHD symptoms and obesity.
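
The latent growth curve analysis summarized above can be approximated with a random-intercept, random-slope mixed model: each respondent gets an individual BMI trajectory whose level and slope may shift with symptom count. The sketch below uses statsmodels on simulated data; the column names, effect sizes and wave structure are invented for illustration and are not taken from the Add Health study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated long-format panel standing in for the real data:
# one row per respondent per wave (all values are made up).
n, waves = 500, 4
ids = np.repeat(np.arange(n), waves)
age = np.tile(np.arange(waves), n)                  # wave index standing in for age
hi = np.repeat(rng.integers(0, 4, size=n), waves)   # HI symptom count per respondent
intercept = 22 + 0.4 * hi[::waves] + rng.normal(0, 1.5, n)
slope = 0.8 + 0.1 * hi[::waves] + rng.normal(0, 0.2, n)
bmi = np.repeat(intercept, waves) + np.repeat(slope, waves) * age + rng.normal(0, 0.8, n * waves)
data = pd.DataFrame({"id": ids, "age": age, "hi_symptoms": hi, "bmi": bmi})

# Random intercept + random slope on age play the role of the latent
# intercept/slope factors of a linear growth curve; the age:hi_symptoms
# interaction asks whether symptom count shifts the BMI trajectory's slope.
model = smf.mixedlm("bmi ~ age * hi_symptoms", data, groups="id", re_formula="~age")
result = model.fit()
print(result.summary())
```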

Relevance:

40.00%

Publisher:

Abstract:

A common challenge that users of academic databases face is making sense of their query outputs for knowledge discovery. This is exacerbated by the size and growth of modern databases. PubMed, a central index of biomedical literature, contains over 25 million citations, and can output search results containing hundreds of thousands of citations. Under these conditions, efficient knowledge discovery requires a different data structure than a chronological list of articles. It requires a method of conveying what the important ideas are, where they are located, and how they are connected; a method of allowing users to see the underlying topical structure of their search. This paper presents VizMaps, a PubMed search interface that addresses some of these problems. Given search terms, our main backend pipeline extracts relevant words from the title and abstract, and clusters them into discovered topics using Bayesian topic models, in particular Latent Dirichlet Allocation (LDA). It then outputs a visual, navigable map of the query results.
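
The clustering step of such a pipeline can be sketched with scikit-learn's LDA implementation (this is a generic illustration, not the VizMaps code); the tiny document list below stands in for the titles and abstracts returned by a query.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for the titles/abstracts returned by a PubMed query;
# in the real pipeline these would come from the search backend.
documents = [
    "latent growth curve models of adolescent body mass index",
    "attention deficit hyperactivity disorder and obesity risk",
    "topic models for navigating biomedical literature",
    "bayesian nonparametric topic discovery in citation networks",
    "longitudinal trajectories of blood pressure and hypertension",
    "visualization of search results with latent dirichlet allocation",
]

# Bag-of-words counts over the title/abstract text.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# Fit LDA and list the top words of each discovered topic.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(counts)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")
```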

Relevance:

30.00%

Publisher:

Abstract:

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
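
In simplified form, the gain and loss of a binding site along a single branch can be illustrated as a two-state continuous-time Markov chain (site absent vs. present), with transition probabilities over a branch given by the matrix exponential of the rate matrix. The rates below are placeholders, not values from the paper, and the sketch omits the pair-HMM alignment machinery.

```python
import numpy as np
from scipy.linalg import expm

# Two-state chain: state 0 = binding site absent, state 1 = present.
# gain_rate and loss_rate are illustrative placeholders.
gain_rate, loss_rate = 0.3, 0.5
Q = np.array([[-gain_rate, gain_rate],
              [loss_rate, -loss_rate]])

# Transition probabilities along a branch of length t.
t = 1.2
P = expm(Q * t)
print("P(absent  -> present) =", P[0, 1])
print("P(present -> present) =", P[1, 1])
```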

Relevance:

30.00%

Publisher:

Abstract:

We discuss a general approach to dynamic sparsity modeling in multivariate time series analysis. Time-varying parameters are linked to latent processes that are thresholded to induce zero values adaptively, providing natural mechanisms for dynamic variable inclusion/selection. We discuss Bayesian model specification, analysis and prediction in dynamic regressions, time-varying vector autoregressions, and multivariate volatility models using latent thresholding. Application to a topical macroeconomic time series problem illustrates some of the benefits of the approach in terms of statistical and economic interpretations as well as improved predictions. Supplementary materials for this article are available online. © 2013 Copyright Taylor and Francis Group, LLC.
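
The latent thresholding idea can be illustrated with a small simulation: a time-varying coefficient follows an AR(1) latent process and is set exactly to zero whenever its magnitude falls below a threshold, yielding dynamic inclusion/exclusion of the predictor. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi, mu, sigma, d = 200, 0.98, 0.3, 0.05, 0.25  # illustrative values

# Latent AR(1) process for the time-varying coefficient.
beta = np.empty(T)
beta[0] = mu
for t in range(1, T):
    beta[t] = mu + phi * (beta[t - 1] - mu) + sigma * rng.standard_normal()

# Latent thresholding: the effective coefficient is zero whenever the
# latent process lies inside the threshold band, inducing dynamic sparsity.
b = np.where(np.abs(beta) >= d, beta, 0.0)
print("fraction of time points with the predictor excluded:", np.mean(b == 0))
```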

Relevance:

30.00%

Publisher:

Abstract:

Existing theories explain why operons are advantageous in prokaryotes, but their occurrence in metazoans is an enigma. Nematode operon genes, typically consisting of growth genes, are significantly upregulated during recovery from growth-arrested states. This expression pattern is anticorrelated to nonoperon genes, consistent with a competition for transcriptional resources. We find that transcriptional resources are initially limiting during recovery and that recovering animals are highly sensitive to any additional decrease in transcriptional resources. We provide evidence that operons become advantageous because, by clustering growth genes into operons, fewer promoters compete for the limited transcriptional machinery, effectively increasing the concentration of transcriptional resources and accelerating recovery. Mathematical modeling reveals how a moderate increase in transcriptional resources can substantially enhance transcription rate and recovery. This design principle occurs in different nematodes and the chordate C. intestinalis. As transition from arrest to rapid growth is shared by many metazoans, operons could have evolved to facilitate these processes.

Relevance:

30.00%

Publisher:

Abstract:

A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
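
The Pólya-Gamma draws at the heart of such samplers can be approximated with the truncated sum-of-gammas representation of Polson et al. (2013); dedicated samplers are exact and much faster, so the sketch below is purely illustrative and simply checks the draw against the known mean (b / 2z) tanh(z / 2).

```python
import numpy as np

rng = np.random.default_rng(1)

def pg_draw_truncated(b, z, n_terms=200):
    """Approximate draw from PG(b, z) via the truncated sum-of-gammas
    representation of Polson, Scott and Windle (2013)."""
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=n_terms)
    return np.sum(g / ((k - 0.5) ** 2 + (z / (2 * np.pi)) ** 2)) / (2 * np.pi ** 2)

# Sanity check against the known mean E[PG(b, z)] = (b / (2 z)) * tanh(z / 2).
b, z = 5.0, 1.5
draws = np.array([pg_draw_truncated(b, z) for _ in range(5000)])
print("sample mean:     ", draws.mean())
print("theoretical mean:", b / (2 * z) * np.tanh(z / 2))
```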

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.

Relevance:

30.00%

Publisher:

Abstract:

This dissertation addresses advances in three related areas: state-space modeling, sequential Bayesian learning, and decision analysis, with attention to the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas together is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic network studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.

Chapter 1 summarizes the three areas/problems and the key idea of emulation in those areas. Chapter 2 discusses the sequential analysis of latent threshold models, using emulating models that allow analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, which is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network that relies on emulating the whole, dependent network model with independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews these advances and offers concluding remarks.

Relevance:

30.00%

Publisher:

Abstract:

This dissertation documents the results of a theoretical and numerical study of time-dependent storage of energy by melting a phase change material. The heating is provided along invading lines, which change from single-line invasion to tree-shaped invasion.

Chapter 2 identifies the special design feature of distributing energy storage in time-dependent fashion on a territory, when the energy flows by fluid flow from a concentrated source to points (users) distributed equidistantly on the area. The challenge in this chapter is to determine the architecture of distributed energy storage. The chief conclusion is that the finite amount of storage material should be distributed proportionally with the distribution of the flow rate of heating agent arriving on the area. The total time needed by the source stream to ‘invade’ the area is cumulative (the sum of the storage times required at each storage site), and depends on the energy distribution paths and the sequence in which the users are served by the source stream.

Chapter 3 shows theoretically that the melting process consists of two phases: “invasion” thermal diffusion along the invading line, which is followed by “consolidation” as heat diffuses perpendicularly to the invading line. This chapter also reports the duration of both phases and the evolution of the melt layer around the invading line during the two-dimensional and three-dimensional invasion. It also shows that the amount of melted material increases in time according to a curve shaped as an S. These theoretical predictions are validated by means of numerical simulations in Chapter 4, which also shows that the heat transfer rate density increases (i.e., the S curve becomes steeper) as the complexity and number of degrees of freedom of the structure are increased, in accord with the constructal law. The optimal geometric features of the tree structure are detailed in that chapter.

Chapter 5 documents a numerical study of time-dependent melting where the heat transfer is convection dominated, unlike in Chapters 3 and 4, where the melting is ruled by pure conduction. In accord with constructal design, the search is for effective heat-flow architectures. The volume-constrained improvement of the designs for heat flow begins with the simplest structure, where a single line serves as heat source. Next, the heat source is endowed with freedom to change its shape as it grows. The objective of the numerical simulations is to discover the geometric features that lead to the fastest melting process. The results show that the heat transfer rate density increases as the complexity and number of degrees of freedom of the structure are increased. Furthermore, the angles between heat invasion lines have a minor effect on the global performance compared to other degrees of freedom: the number of branching levels, stem length, and branch lengths. The effect of natural convection in the melt zone is documented.

Relevance:

30.00%

Publisher:

Abstract:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
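
A small numerical check of why one might move beyond factorized approximations: for a correlated Gaussian posterior, the KL(q || p)-optimal mean-field Gaussian matches the diagonal of the posterior precision matrix and therefore understates the marginal variances. This is a textbook illustration, not an example taken from the dissertation.

```python
import numpy as np

# Correlated 2-D Gaussian "posterior" (illustrative values only).
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# The KL(q || p)-optimal fully factorized Gaussian has precisions equal to
# the diagonal of the posterior precision matrix, so its marginal variances
# are 1 / diag(Sigma^{-1}), which is smaller than the true marginals when rho != 0.
Lambda = np.linalg.inv(Sigma)
mean_field_var = 1.0 / np.diag(Lambda)
print("true marginal variances:      ", np.diag(Sigma))
print("mean-field marginal variances:", mean_field_var)
```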

Relevance:

30.00%

Publisher:

Abstract:

Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.

This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.

The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.

The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data. We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
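
The augmented-record prior of the second method can be sketched directly: synthetic records reproduce the desired margin on one variable, leave every other variable missing, and their number controls the prior's weight. The variables, categories and probabilities below are hypothetical, and the latent class sampler itself is not shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical categorical survey data with three variables.
survey = pd.DataFrame({
    "education": rng.integers(0, 4, size=1000),
    "employment": rng.integers(0, 3, size=1000),
    "marital": rng.integers(0, 2, size=1000),
})

# Prior belief about the education margin, encoded as n_aug synthetic records
# whose empirical distribution equals the prior; a larger n_aug corresponds
# to a more concentrated (stronger) prior.
n_aug = 200
prior_margin = {0: 0.30, 1: 0.40, 2: 0.20, 3: 0.10}
education_values = np.repeat(
    list(prior_margin), [int(round(p * n_aug)) for p in prior_margin.values()]
)
augmented = pd.DataFrame({
    "education": education_values,
    "employment": np.nan,  # remaining variables are left missing
    "marital": np.nan,
})

# The latent class MCMC sampler is then run on the concatenated data,
# handling the missing entries exactly like ordinary missing values.
full_data = pd.concat([survey, augmented], ignore_index=True)
print(full_data.tail())
```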

The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.