27 results for multimodel inference


Abstract:

BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
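
The idea of folding base-calling quality scores into a probabilistic error model can be illustrated with a small sketch. It is not the authors' assembler: it only uses the standard Phred relation (error probability = 10^(-Q/10)) and assumes, for illustration, a uniform prior over bases, independent reads, and miscalls spread evenly over the three other bases. The function names are hypothetical.

```python
import math

BASES = "ACGT"

def phred_to_error(q):
    """Convert a Phred quality score Q into an error probability 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)

def base_posterior(calls):
    """Posterior over the true base at one aligned position.

    calls: list of (called_base, phred_quality) pairs from overlapping reads.
    Assumptions (illustrative only): uniform prior over A/C/G/T, independent
    reads, a miscall is equally likely to be any of the other three bases,
    and every quality score is strictly positive.
    """
    log_like = {b: 0.0 for b in BASES}
    for called, q in calls:
        e = phred_to_error(q)
        for b in BASES:
            log_like[b] += math.log(1.0 - e if b == called else e / 3.0)
    # Normalise in log space for numerical stability.
    top = max(log_like.values())
    weights = {b: math.exp(v - top) for b, v in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

# Two confident "A" calls outweigh one low-quality "G" call.
post = base_posterior([("A", 30), ("A", 20), ("G", 5)])
print(max(post, key=post.get), round(post["A"], 4))
```

Weighting each call by its quality lets low-confidence reads contribute little evidence, which is what makes shallow coverage more usable than a simple majority vote would.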

Abstract:

We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.
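
A global Metropolis proposal built from a normal mixture can be sketched in one dimension as an independence Metropolis-Hastings sampler. This is a toy illustration under stated assumptions, not the paper's state-space algorithm: the target, the mixture proposal, and all function names are made up here, and the mixture is fixed in advance rather than derived from filtering and smoothing approximations.

```python
import math
import random

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, comps):
    """Density of a normal mixture; comps is a list of (weight, mean, sd)."""
    return sum(w * normal_pdf(x, mu, sd) for w, mu, sd in comps)

def mixture_sample(rng, comps):
    """Pick a component by its weight, then draw from that normal."""
    u, acc = rng.random(), 0.0
    for w, mu, sd in comps:
        acc += w
        if u <= acc:
            return rng.gauss(mu, sd)
    return rng.gauss(comps[-1][1], comps[-1][2])

def independence_mh(target_pdf, proposal, n_iter, rng):
    """Independence Metropolis-Hastings with a normal-mixture proposal q.

    A candidate y replaces the current x with probability
    min(1, p(y) q(x) / (p(x) q(y))); p may be unnormalised.
    """
    x = mixture_sample(rng, proposal)
    draws, accepted = [], 0
    for _ in range(n_iter):
        y = mixture_sample(rng, proposal)
        ratio = (target_pdf(y) * mixture_pdf(x, proposal)) / (
            target_pdf(x) * mixture_pdf(y, proposal))
        if rng.random() < min(1.0, ratio):
            x = y
            accepted += 1
        draws.append(x)
    return draws, accepted / n_iter

# Toy bimodal target and a slightly wider mixture proposal (both assumed).
target = lambda x: mixture_pdf(x, [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)])
proposal = [(0.5, -2.0, 1.5), (0.5, 2.0, 1.5)]
draws, rate = independence_mh(target, proposal, 20000, random.Random(0))
print(f"acceptance rate {rate:.2f}, sample mean {sum(draws) / len(draws):.2f}")
```

The closer the mixture tracks the target, the closer the acceptance rate gets to one, which is why accurate mixture approximations are useful as proposals.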

Abstract:

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.

Abstract:

BACKGROUND: Infection with human papillomavirus (HPV) is associated with uterine cervical intraepithelial neoplasia (CIN) and invasive cancers (ICC). Approximately 80% of ICC cases are diagnosed in under-developed countries. Vaccine development relies on knowledge of HPV genotypes characteristic of LSIL, HSIL and cancer; however, these genotypes remain poorly characterized in many African countries. To contribute to the characterization of HPV genotypes in Northeastern Tanzania, we recruited 215 women from the Reproductive Health Clinic at Kilimanjaro Christian Medical Centre. Cervical scrapes and biopsies were obtained for cytology and HPV DNA detection. RESULTS: 79 out of 215 (36.7%) enrolled participants tested positive for HPV DNA, with a large proportion being multiple infections (74%). The prevalence of HPV infection increased with lesion grade (14% in controls, 67% in CIN1 cases and 88% in CIN2-3). Among ICC cases, 89% had detectable HPV. Overall, 31 HPV genotypes were detected; the three most common HPV genotypes among ICC were HPV16, 35 and 45. In addition to these genotypes, co-infection with HPV18, 31, 33, 52, 58, 68 and 82 was found in 91% of ICC. Among women with CIN2-3, HPV53, 58 and 84/83 were the most common. HPV35, 45, 53/58/59 were the most common among CIN1 cases. CONCLUSIONS: In women with no evidence of cytological abnormalities, the most prevalent genotype was HPV58, with HPV16, 35, 52, 66 and 73 occurring equally. Although numerical constraints limit inference, the finding that 91% of ICC harbor only a small number of HPV genotypes suggests that prevention efforts including vaccine development or adjuvant screening should focus on these genotypes.

Abstract:

BACKGROUND: We have previously shown that a functional polymorphism of the UGT2B15 gene (rs1902023) was associated with increased risk of prostate cancer (PC). Novel functional polymorphisms of the UGT2B17 and UGT2B15 genes have been recently characterized by in vitro assays but have not been evaluated in epidemiologic studies. METHODS: Fifteen functional SNPs of the UGT2B17 and UGT2B15 genes, including cis-acting UGT2B gene SNPs, were genotyped in African American and Caucasian men (233 PC cases and 342 controls). Regression models were used to analyze the association between SNPs and PC risk. RESULTS: After adjusting for race, age and BMI, we found that six UGT2B15 SNPs (rs4148269, rs3100, rs9994887, rs13112099, rs7686914 and rs7696472) were associated with an increased risk of PC in log-additive models (p < 0.05). A SNP cis-acting on UGT2B17 and UGT2B15 expression (rs17147338) was also associated with increased risk of prostate cancer (OR = 1.65, 95% CI = 1.00-2.70); while a stronger association among men with high Gleason sum was observed for SNPs rs4148269 and rs3100. CONCLUSIONS: Although small sample size limits inference, we report novel associations between UGT2B15 and UGT2B17 variants and PC risk. These associations with PC risk in men with high Gleason sum, more frequently found in African American men, support the relevance of genetic differences in the androgen metabolism pathway, which could explain, in part, the high incidence of PC among African American men. Larger studies are required.

Abstract:

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

Abstract:

In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).
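
The role of the NB dispersion parameter r and probability parameter p can be seen in the gamma-Poisson construction of the negative binomial, sketched below. This only simulates counts; it is not the paper's Gibbs or variational algorithm. The parametrisation is an assumption made for the example: rate ~ Gamma(shape r, scale p/(1-p)) and count ~ Poisson(rate), which gives mean r·p/(1-p) and variance r·p/(1-p)^2. Function names are hypothetical.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def nb_draw(rng, r, p):
    """One negative binomial count via the gamma-Poisson mixture.

    Assumed parametrisation: rate ~ Gamma(shape=r, scale=p/(1-p)),
    count ~ Poisson(rate). random.gammavariate takes (shape, scale).
    """
    rate = rng.gammavariate(r, p / (1.0 - p))
    return poisson_draw(rng, rate)

rng = random.Random(1)
r, p = 3.0, 0.4
counts = [nb_draw(rng, r, p) for _ in range(20000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
# Theory under this parametrisation: mean = r*p/(1-p) = 2.0,
# variance = r*p/(1-p)**2, about 3.33, so the counts are overdispersed.
print(f"mean {mean:.2f}, variance {var:.2f}")
```

The variance exceeding the mean is the overdispersion that a plain Poisson model, with its single rate parameter, cannot represent.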

Abstract:

© Institute of Mathematical Statistics, 2014. Motivated by recent findings in the field of consumer science, this paper evaluates the causal effect of debit cards on household consumption using population-based data from the Italy Survey on Household Income and Wealth (SHIW). Within the Rubin Causal Model, we focus on the estimand of population average treatment effect for the treated (PATT). We consider three existing estimators, based on regression, mixed matching and regression, and propensity score weighting, and propose a new doubly-robust estimator. Semiparametric specification based on power series for the potential outcomes and the propensity score is adopted. Cross-validation is used to select the order of the power series. We conduct a simulation study to compare the performance of the estimators. The key assumptions, overlap and unconfoundedness, are systematically assessed and validated in the application. Our empirical results suggest statistically significant positive effects of debit cards on the monthly household spending in Italy.
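
To make the notion of a doubly robust estimator of the effect on the treated concrete, here is a minimal sketch of one widely used form. It is not necessarily the estimator proposed in the paper: the fitted propensity scores and control-outcome predictions are taken as given inputs (the paper fits them with power series), and the function and argument names are hypothetical.

```python
def dr_att(y, t, e_hat, m0_hat):
    """One common doubly robust estimate of the average effect on the treated.

    y      : observed outcomes
    t      : treatment indicators (1 = treated, 0 = control)
    e_hat  : fitted propensity scores P(T=1 | X), assumed strictly below 1
    m0_hat : fitted control-outcome regression E[Y | X, T=0]

    Estimate = sum_i [t_i - (1 - t_i) * e_i / (1 - e_i)] * (y_i - m0_i) / n_treated.
    It is consistent if either the propensity model or the outcome model is
    correctly specified.
    """
    n_treated = sum(t)
    total = 0.0
    for yi, ti, ei, mi in zip(y, t, e_hat, m0_hat):
        weight = ti - (1 - ti) * ei / (1.0 - ei)
        total += weight * (yi - mi)
    return total / n_treated

# Tiny made-up example: the outcome model fits the controls exactly, so the
# estimate reduces to the mean of (y - m0_hat) among the treated.
y = [5.0, 7.0, 2.0, 3.0]
t = [1, 1, 0, 0]
e_hat = [0.6, 0.5, 0.4, 0.5]
m0_hat = [3.0, 4.0, 2.0, 3.0]
print(dr_att(y, t, e_hat, m0_hat))  # 2.5
```

With the outcome predictions set to zero the same formula becomes a pure propensity-weighting estimator, which is the sense in which it combines the two approaches.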

Abstract:

PREMISE OF THE STUDY: Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny; however, to date, these studies have relied almost exclusively on plastid data. METHODS: Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35877 bp), along with rigorous alignment, orthology inference and model selection. KEY RESULTS: Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data. CONCLUSIONS: Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies.

Abstract:

© 2014, The International Biometric Society. A potential avenue to improve healthcare efficiency is to effectively tailor individualized treatment strategies by incorporating patient level predictor information such as environmental exposure, biological, and genetic marker measurements. Many useful statistical methods for deriving individualized treatment rules (ITR) have become available in recent years. Prior to adopting any ITR in clinical practice, it is crucial to evaluate its value in improving patient outcomes. Existing methods for quantifying such values mainly consider either a single marker or semi-parametric methods that are subject to bias under model misspecification. In this article, we consider a general setting with multiple markers and propose a two-step robust method to derive ITRs and evaluate their values. We also propose procedures for comparing different ITRs, which can be used to quantify the incremental value of new markers in improving treatment selection. While working models are used in step I to approximate optimal ITRs, we add a layer of calibration to guard against model misspecification and further assess the value of the ITR non-parametrically, which ensures the validity of the inference. To account for the sampling variability of the estimated rules and their corresponding values, we propose a resampling procedure to provide valid confidence intervals for the value functions as well as for the incremental value of new markers for treatment selection. Our proposals are examined through extensive simulation studies and illustrated with the data from a clinical trial that studies the effects of two drug combinations on HIV-1 infected patients.
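
What "the value of an ITR" means can be illustrated with a simple nonparametric estimate from randomized-trial data: the weighted mean outcome among patients whose assigned treatment agrees with the rule. This is a generic normalised inverse-probability-weighted estimator, shown only as a sketch; it is not the two-step calibrated procedure of the article, and the names, the toy rule, and the known randomization probability are assumptions.

```python
def ipw_value(y, a, x, rule, prob_a1):
    """Normalised inverse-probability-weighted value of a treatment rule.

    y       : observed outcomes (larger is better)
    a       : randomised treatment assignments (0 or 1)
    x       : patient markers passed to the rule
    rule    : function mapping a marker value to a recommended treatment
    prob_a1 : known randomisation probability P(A = 1), strictly in (0, 1)

    Only patients whose assigned treatment matches the rule contribute,
    each weighted by 1 / P(A = assigned treatment). Requires at least one
    such patient.
    """
    num = den = 0.0
    for yi, ai, xi in zip(y, a, x):
        if rule(xi) == ai:
            w = 1.0 / (prob_a1 if ai == 1 else 1.0 - prob_a1)
            num += w * yi
            den += w
    return num / den

# Made-up data: treat when the marker is at least 3.
x = [1, 2, 3, 4]
a = [1, 0, 1, 0]
y = [10.0, 4.0, 6.0, 8.0]
treat_if_high = lambda marker: 1 if marker >= 3 else 0
print(ipw_value(y, a, x, treat_if_high, 0.5))  # 5.0
```

Comparing this quantity for two rules, one using an extra marker and one without it, gives a rough sense of the incremental value the abstract refers to; confidence intervals would still require a resampling step.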

Abstract:

For optimal solutions in health care, decision makers inevitably must evaluate trade-offs, which call for multi-attribute valuation methods. Researchers have proposed using best-worst scaling (BWS) methods, which seek to extract information from respondents by asking them to identify the best and worst items in each choice set. While a companion paper describes the different types of BWS, their applications, and their advantages and downsides, this contribution expounds their relationships with microeconomic theory, which also have implications for statistical inference. This article is devoted to the microeconomic foundations of preference measurement, also addressing issues such as scale invariance and scale heterogeneity. Furthermore, the paper discusses the basics of preference measurement using rating, ranking and stated choice data in the light of the findings of the preceding section. Moreover, the paper gives an introduction to the use of stated choice data and juxtaposes BWS with the microeconomic foundations.