10 results for Modeling Rapport Using Hidden Markov Models

at Duke University


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
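To make the hidden Markov formalism mentioned above concrete, here is a minimal sketch of Viterbi decoding for a single-sequence, two-state HMM that labels each base as "background" or "site". It is not the phylogenetic pair HMM described in the abstract: the state names, the probabilities, and the function name `viterbi` are all hypothetical and chosen only for illustration.

```python
import math

STATES = ("background", "site")

# Hypothetical parameters, chosen only for illustration.
START = {"background": 0.9, "site": 0.1}
TRANS = {
    "background": {"background": 0.9, "site": 0.1},
    "site": {"background": 0.2, "site": 0.8},
}
EMIT = {
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "site": {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05},
}


def viterbi(seq):
    """Return the most probable state path for a DNA string (log space)."""
    if not seq:
        return []
    # scores[s] = best log-probability of any path that ends in state s
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi("AAAA")` labels every base as background under these toy parameters, while a GC-rich run such as `"GCGCGCGCGC"` is labeled as a site throughout.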

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An abstract of a thesis devoted to using helix-coil models to study unfolded states.

Research on polypeptide unfolded states has received much more attention in the last decade or so than it has in the past. Unfolded states are thought to be implicated in various misfolding diseases and likely play crucial roles in protein folding equilibria and folding rates. Structural characterization of unfolded states has proven to be much more difficult than the now well-established practice of determining the structures of folded proteins. This is largely because many core assumptions underlying folded structure determination methods are invalid for unfolded states. This has led to a dearth of knowledge concerning the nature of unfolded state conformational distributions. While many aspects of unfolded state structure are not well known, there does exist a significant body of work stretching back half a century that has been focused on structural characterization of marginally stable polypeptide systems. This body of work represents an extensive collection of experimental data and biophysical models associated with describing helix-coil equilibria in polypeptide systems. Much of the work on unfolded states in the last decade has not been devoted specifically to the improvement of our understanding of helix-coil equilibria, which arguably is the most well characterized of the various conformational equilibria that likely contribute to unfolded state conformational distributions. This thesis seeks to provide a deeper investigation of helix-coil equilibria using modern statistical data analysis and biophysical modeling techniques. The studies contained within seek to provide deeper insights and new perspectives on what we presumably know very well about protein unfolded states.

Chapter 1 gives an overview of recent and historical work on studying protein unfolded states. The study of helix-coil equilibria is placed in the context of the general field of unfolded state research and the basics of helix-coil models are introduced.

Chapter 2 introduces the newest incarnation of a sophisticated helix-coil model. State-of-the-art statistical techniques are employed to estimate the energies of various physical interactions that serve to influence helix-coil equilibria. A new Bayesian model selection approach is utilized to test many long-standing hypotheses concerning the physical nature of the helix-coil transition. Some assumptions made in previous models are shown to be invalid and the new model exhibits greatly improved predictive performance relative to its predecessor.

Chapter 3 introduces a new statistical model that can be used to interpret amide exchange measurements. As amide exchange can serve as a probe for residue-specific properties of helix-coil ensembles, the new model provides a novel and robust method to use these types of measurements to characterize helix-coil ensembles experimentally and test the position-specific predictions of helix-coil models. The statistical model is shown to perform far better than the most commonly used method for interpreting amide exchange data. The estimates of the model obtained from amide exchange measurements on an example helical peptide also show a remarkable consistency with the predictions of the helix-coil model.

Chapter 4 involves a study of helix-coil ensembles through the enumeration of helix-coil configurations. Aside from providing new insights into helix-coil ensembles, this chapter also introduces a new method by which helix-coil models can be extended to calculate new types of observables. Future work on this approach could potentially allow helix-coil models to move into use domains that were previously inaccessible and reserved for other types of unfolded state models that were introduced in chapter 1.
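As a rough illustration of what enumerating helix-coil configurations can look like, the sketch below uses the classic Zimm-Bragg weighting, which is an assumption here: the abstract does not say which helix-coil model the thesis uses. Each helical residue contributes a propagation weight `s`, and each new helical stretch contributes an extra nucleation weight `sigma`. All function names are hypothetical.

```python
from itertools import product


def zimm_bragg_weight(config, s, sigma):
    """Statistical weight of one configuration of 'h' (helix) / 'c' (coil)."""
    weight = 1.0
    prev = "c"  # the chain is treated as preceded by coil
    for state in config:
        if state == "h":
            weight *= s          # propagation weight for every helical residue
            if prev != "h":
                weight *= sigma  # nucleation penalty at the start of a helix
        prev = state
    return weight


def partition_by_enumeration(n, s, sigma):
    """Partition function: sum of weights over all 2**n configurations."""
    return sum(zimm_bragg_weight(c, s, sigma) for c in product("hc", repeat=n))


def mean_helicity(n, s, sigma):
    """Ensemble-averaged fraction of residues in the helical state."""
    z = 0.0
    helical = 0.0
    for c in product("hc", repeat=n):
        w = zimm_bragg_weight(c, s, sigma)
        z += w
        helical += w * c.count("h")
    return helical / (n * z)
```

Brute-force enumeration scales as 2**n, so it is only practical for short peptides; it is shown here because any observable that can be evaluated per configuration can be averaged the same way as `mean_helicity`.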

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A common challenge that users of academic databases face is making sense of their query outputs for knowledge discovery. This is exacerbated by the size and growth of modern databases. PubMed, a central index of biomedical literature, contains over 25 million citations, and can output search results containing hundreds of thousands of citations. Under these conditions, efficient knowledge discovery requires a different data structure than a chronological list of articles. It requires a method of conveying what the important ideas are, where they are located, and how they are connected; a method of allowing users to see the underlying topical structure of their search. This paper presents VizMaps, a PubMed search interface that addresses some of these problems. Given search terms, our main backend pipeline extracts relevant words from the title and abstract, and clusters them into discovered topics using Bayesian topic models, in particular Latent Dirichlet Allocation (LDA). It then outputs a visual, navigable map of the query results.
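The topic-clustering step can be sketched with a small collapsed Gibbs sampler for LDA. This is a generic textbook-style sampler, not the VizMaps pipeline: the function name `lda_gibbs`, the hyperparameter defaults, and the input format (each document as a list of already-extracted words) are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns (vocab, doc_topic_counts, topic_word_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic for every token
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word
```

The highest counts in each row of `topic_word` give a topic's most characteristic words, which is the kind of summary a visual map of query results could be built on.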

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The long-term soil carbon dynamics may be approximated by networks of linear compartments, permitting theoretical analysis of transit time (i.e., the total time spent by a molecule in the system) and age (the time elapsed since the molecule entered the system) distributions. We compute and compare these distributions for different network configurations, ranging from the simple individual compartment, to series and parallel linear compartments, feedback systems, and models assuming a continuous distribution of decay constants. We also derive the transit time and age distributions of some complex, widely used soil carbon models (the compartmental models CENTURY and Rothamsted, and the continuous-quality Q-Model), and discuss them in the context of long-term carbon sequestration in soils. We show how complex models including feedback loops and slow compartments have distributions with heavier tails than simpler models. Power law tails emerge when using continuous-quality models, indicating long retention times for an important fraction of soil carbon. The responsiveness of the soil system to changes in decay constants due to altered climatic conditions or plant species composition is found to be stronger when all compartments respond equally to the environmental change, and when the slower compartments are more sensitive than the faster ones or lose more carbon through microbial respiration. Copyright 2009 by the American Geophysical Union.
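For the simplest configurations named above, the transit time distributions have closed forms. The sketch below assumes first-order decay at steady state, with decay constants and input fractions that are hypothetical. For the series case it further assumes that all carbon leaving one compartment enters the next, with no respiration losses in between. It does not reproduce the CENTURY, Rothamsted, or Q-Model derivations.

```python
import math


def single_density(t, k):
    """Transit time density of one linear compartment: exponential with rate k."""
    return k * math.exp(-k * t)


def parallel_density(t, fractions, ks):
    """Parallel compartments: mixture of exponentials weighted by input fraction."""
    return sum(a * k * math.exp(-k * t) for a, k in zip(fractions, ks))


def mean_transit_time_parallel(fractions, ks):
    """Mean transit time when a fraction a_i of the input enters pool i."""
    return sum(a / k for a, k in zip(fractions, ks))


def mean_transit_time_series(ks):
    """Mean transit time when every molecule passes through each pool in turn."""
    return sum(1.0 / k for k in ks)
```

With hypothetical rates of 0.5 and 0.01 per year and a 70/30 input split, the parallel mean is dominated by the slow pool even though it receives less carbon, which is a small-scale version of the heavy-tail behaviour the abstract describes.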

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.
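As background for the kernel discussed above, here is a minimal sketch of the single-output spectral mixture (SM) kernel for one-dimensional inputs, in which each component is a Gaussian bump in the spectral density. It does not implement the cross-spectral mixture (CSM) kernel itself, which additionally models phase relationships between channels. The function and parameter names are hypothetical.

```python
import math


def sm_kernel(tau, weights, means, variances):
    """Spectral mixture kernel value at lag tau (1-D inputs).

    weights:   non-negative weight of each spectral component
    means:     centre frequency of each component
    variances: spectral variance (bandwidth) of each component
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * tau * m)
        for w, m, v in zip(weights, means, variances)
    )
```

At zero lag the kernel reduces to the sum of the weights, so the weights can be read as the signal power carried by each frequency band.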

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A popular way to account for unobserved heterogeneity is to assume that the data are drawn from a finite mixture distribution. A barrier to using finite mixture models is that parameters that could previously be estimated in stages must now be estimated jointly: using mixture distributions destroys any additive separability of the log-likelihood function. We show, however, that an extension of the EM algorithm reintroduces additive separability, thus allowing one to estimate parameters sequentially during each maximization step. In establishing this result, we develop a broad class of estimators for mixture models. Returning to the likelihood problem, we show that, relative to full information maximum likelihood, our sequential estimator can generate large computational savings with little loss of efficiency.
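For readers unfamiliar with the baseline being extended, the sketch below runs the standard EM algorithm on a two-component, one-dimensional Gaussian mixture. It is not the sequential estimator proposed in the abstract. The function names, the initialisation, and the fixed iteration count are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, iterations=50):
    """Standard EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, sds), each a list of length 2.
    """
    n = len(data)
    grand_mean = sum(data) / n
    pooled_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n) or 1.0
    # Crude initialisation: extremes as means, pooled spread, equal weights
    means = [min(data), max(data)]
    sds = [pooled_sd, pooled_sd]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: posterior probability that each point belongs to each component
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: weighted updates; conditional on the responsibilities,
        # each component's parameters are updated separately
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    return weights, means, sds
```

The M-step loop shows, in the simplest setting, why EM is attractive here: once the responsibilities are fixed, the expected log-likelihood splits into one weighted term per component.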

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Lipoprotein-associated phospholipase A2 (Lp-PLA2) is an emerging risk factor and therapeutic target for cardiovascular disease. The activity and mass of this enzyme are heritable traits, but major genetic determinants have not been explored in a systematic, genome-wide fashion. We carried out a genome-wide association study of Lp-PLA2 activity and mass in 6,668 Caucasian subjects from the population-based Framingham Heart Study. Clinical data and genotypes from the Affymetrix 550K SNP array were obtained from the open-access Framingham SHARe project. Each polymorphism that passed quality control was tested for associations with Lp-PLA2 activity and mass using linear mixed models implemented in the R statistical package, accounting for familial correlations, and controlling for age, sex, smoking, lipid-lowering-medication use, and cohort. For Lp-PLA2 activity, polymorphisms at four independent loci reached genome-wide significance, including the APOE/APOC1 region on chromosome 19 (p = 6 x 10^-24); CELSR2/PSRC1 on chromosome 1 (p = 3 x 10^-15); SCARB1 on chromosome 12 (p = 1 x 10^-8) and ZNF259/BUD13 in the APOA5/APOA1 gene region on chromosome 11 (p = 4 x 10^-8). All of these remained significant after accounting for associations with LDL cholesterol, HDL cholesterol, or triglycerides. For Lp-PLA2 mass, 12 SNPs achieved genome-wide significance, all clustering in a region on chromosome 6p12.3 near the PLA2G7 gene. Our analyses demonstrate that genetic polymorphisms may contribute to inter-individual variation in Lp-PLA2 activity and mass.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

BACKGROUND: Lower concentrations of the insulin-like growth factor binding protein-1 (IGFBP-1) and elevated concentrations of insulin or C-peptide have been associated with an increase in colorectal cancer (CRC) risk. However, few studies have evaluated IGFBP-1 and C-peptide in relation to adenomatous polyps, the only known precursor for CRC. METHODS: Between November 2001 and December 2002, we examined associations between circulating concentrations of insulin, C-peptide, IGFBP-1 and apoptosis among 190 individuals with one or more adenomatous polyps and 488 with no adenomatous polyps using logistic regression models. RESULTS: Individuals with the highest concentrations of C-peptide were more likely to have adenomas (OR = 2.2, 95% CI 1.4-4.0) than those with the lowest concentrations; associations that appeared to be stronger in men (OR = 4.4, 95% CI 1.7-10.9) than women. Individuals with high insulin concentrations also had a higher risk of adenomas (OR = 3.5, 95% CI 1.7-7.4), whereas higher levels of IGFBP-1 were associated with a reduced risk of adenomas in men only (OR = 0.3, 95% CI 0.1-0.7). Overweight and obese individuals with higher C-peptide levels (>1st Q) were at increased risk for lower apoptosis index (OR = 2.5, 95% CI 0.9-7.1), an association that remained strong in overweight and obese men (OR = 6.3, 95% CI 1.0-36.7). Higher levels of IGFBP-1 in overweight and obese individuals were associated with a reduced risk of low apoptosis (OR = 0.3, 95% CI 0.1-1.0). CONCLUSIONS: Associations between these peptides and the apoptosis index in overweight and obese individuals suggest that the mechanism by which C-peptide could induce adenomas may include its anti-apoptotic properties. This study suggests that hyperinsulinemia and IGF hormones predict adenoma risk, and that outcomes associated with colorectal carcinogenesis may be modified by gender.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

© 2016, Serdi and Springer-Verlag France. Objectives: The association between cognitive function and cholesterol levels is poorly understood and inconsistent results exist among the elderly. The purpose of this study is to investigate the association of cholesterol level with cognitive performance among Chinese elderly. Design: A cross-sectional study was implemented in 2012 and data were analyzed using generalized additive models, linear regression models and logistic regression models. Setting: Community-based setting in eight longevity areas in China. Subjects: A total of 2000 elderly aged 65 years and over (mean 85.8±12.0 years) participated in this study. Measurements: Total cholesterol (TC), triglycerides (TG), low density lipoprotein cholesterol (LDL-C) and high density lipoprotein cholesterol (HDL-C) concentration were determined and cognitive impairment was defined as Mini-Mental State Examination (MMSE) score ≤ 23. Results: There was a significant positive linear association between TC, TG, LDL-C, HDL-C and MMSE score in linear regression models. Each 1 mmol/L increase in TC, TG, LDL-C and HDL-C corresponded to a decreased risk of cognitive impairment in logistic regression models. Compared with the lowest tertile, the highest tertile of TC, LDL-C and HDL-C had a lower risk of cognitive impairment. The adjusted odds ratios and 95% CI were 0.73 (0.62–0.84) for TC, 0.81 (0.70–0.94) for LDL-C and 0.81 (0.70–0.94) for HDL-C. There was no gender difference in the protective effects of high TC and LDL-C levels on cognitive impairment. However, for high HDL-C levels the effect was only observed in women. High TC, LDL-C and HDL-C levels were associated with lower risk of cognitive impairment in the oldest old (aged 80 and older), but not in the younger elderly (aged 65 to 79 years). Conclusions: These findings suggest that cholesterol levels within the high normal range are associated with better cognitive performance in Chinese elderly, specifically in the oldest old. With further validation, low cholesterol may serve as a clinical indicator of risk for cognitive impairment in the elderly.