882 results for Bayesian model selection
Abstract:
With the proliferation of social media sites, social streams have proven to contain the most up-to-date information on current events. Therefore, it is crucial to extract events from social streams such as tweets. However, it is not straightforward to adapt existing event extraction systems, since texts in social media are fragmented and noisy. In this paper we propose a simple yet effective Bayesian model, called the Latent Event Model (LEM), to extract structured representations of events from social media. LEM is fully unsupervised and does not require annotated data for training. We evaluate LEM on a Twitter corpus. Experimental results show that the proposed model achieves an F-measure of 83% and outperforms the state-of-the-art baseline by over 7%. © 2014 Association for Computational Linguistics.
Abstract:
Storyline detection from news articles aims at summarizing events described under a certain news topic and revealing how those events evolve over time. It is a difficult task because it requires first the detection of events from news articles published in different time periods and then the construction of storylines by linking events into coherent news stories. Moreover, each storyline has different hierarchical structures which are dependent across epochs. Existing approaches often ignore the dependency of hierarchical structures in storyline generation. In this paper, we propose an unsupervised Bayesian model, called dynamic storyline detection model, to extract structured representations and evolution patterns of storylines. The proposed model is evaluated on a large scale news corpus. Experimental results show that our proposed model outperforms several baseline approaches.
Abstract:
The advances in three related areas of state-space modeling, sequential Bayesian learning, and decision analysis are addressed, with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic networks studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.
Chapter 1 summarizes the three areas/problems and the key idea of emulation in those areas. Chapter 2 discusses the sequential analysis of latent threshold models using emulating models that allow for analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network; the method emulates the whole, dependent network model with independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews those advances and makes concluding remarks.
Abstract:
Testing for differences within data sets is an important issue across various applications. Our work is primarily motivated by the analysis of microbiome composition, which has become increasingly relevant and important with the rise of DNA sequencing. We first review classical frequentist tests that are commonly used in tackling such problems. We then propose a Bayesian Dirichlet-multinomial framework for modeling metagenomic data and for testing underlying differences between samples. A parametric Dirichlet-multinomial model uses an intuitive hierarchical structure that allows for flexibility in characterizing both the within-group variation and the cross-group difference, and provides readily interpretable parameters. A computational method for evaluating the marginal likelihoods under the null and alternative hypotheses is also given. Through simulations, we show that our Bayesian model performs competitively against frequentist counterparts. We illustrate the method by analyzing metagenomic applications using the Human Microbiome Project data.
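As an illustration of comparing marginal likelihoods under a null and an alternative hypothesis, here is a minimal sketch for two count vectors. It is not the hierarchical model the abstract describes: it assumes a plain conjugate setup in which the null gives both samples one shared vector of proportions with a Dirichlet prior, and the alternative gives each sample its own independent vector. All function names and the prior `alpha` are assumptions made for this example.

```python
from math import lgamma


def log_multivariate_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def log_multinomial_coef(counts):
    """Log of the multinomial coefficient for one count vector."""
    return lgamma(sum(counts) + 1) - sum(lgamma(x + 1) for x in counts)


def log_marginal_shared(samples, alpha):
    """Log marginal likelihood when all samples share one proportion vector
    drawn from Dirichlet(alpha) (the assumed null hypothesis)."""
    pooled = [sum(column) for column in zip(*samples)]
    posterior = [a + c for a, c in zip(alpha, pooled)]
    coefs = sum(log_multinomial_coef(s) for s in samples)
    return coefs + log_multivariate_beta(posterior) - log_multivariate_beta(alpha)


def log_marginal_separate(samples, alpha):
    """Log marginal likelihood when each sample has its own proportion vector
    drawn independently from Dirichlet(alpha) (the assumed alternative)."""
    return sum(log_marginal_shared([s], alpha) for s in samples)


def log_bayes_factor(samples, alpha):
    """Log Bayes factor of the alternative against the null; positive values
    favour a difference between the samples."""
    return log_marginal_separate(samples, alpha) - log_marginal_shared(samples, alpha)
```

With a uniform prior such as `alpha = [1.0, 1.0]`, `log_bayes_factor([[50, 0], [0, 50]], alpha)` is positive (the samples look different), while `log_bayes_factor([[25, 25], [25, 25]], alpha)` is negative.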
Abstract:
Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and the Robust prior. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al, such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them to others in the literature via simulation and real examples.
Abstract:
The use of chemical control measures to reduce the impact of parasite and pest species has frequently resulted in the development of resistance. Thus, resistance management has become a key concern in human and veterinary medicine, and in agricultural production. Although it is known that factors such as gene flow between susceptible and resistant populations, drug type, application methods, and costs of resistance can affect the rate of resistance evolution, less is known about the impacts of density-dependent eco-evolutionary processes that could be altered by drug-induced mortality. The overall aim of this thesis was to take an experimental evolution approach to assess how life history traits respond to drug selection, using a free-living dioecious worm (Caenorhabditis remanei) as a model. In Chapter 2, I defined the relationship between C. remanei survival and Ivermectin dose over a range of concentrations, in order to control the intensity of selection used in the selection experiment described in Chapter 4. The dose-response data were also used to appraise curve-fitting methods, using Akaike Information Criterion (AIC) model selection to compare a series of nonlinear models. The type of model fitted to the dose response data had a significant effect on the estimates of LD50 and LD99, suggesting that failure to fit an appropriate model could give misleading estimates of resistance status. In addition, simulated data were used to establish that a potential cost of resistance could be predicted by comparing survival at the upper asymptote of dose-response curves for resistant and susceptible populations, even when differences were as low as 4%. This approach to dose-response modeling ensures that the maximum amount of useful information relating to resistance is gathered in one study. In Chapter 3, I asked how simulations could be used to inform important design choices used in selection experiments. 
Specifically, I focused on the effects of both within- and between-line variation on estimated power, when detecting small, medium and large effect sizes. Using mixed-effect models on simulated data, I demonstrated that commonly used designs with realistic levels of variation could be underpowered for substantial effect sizes. Thus, simulation-based power analysis provides an effective way to avoid under- or overpowering study designs that incorporate variation due to random effects. In Chapter 4, I investigated how Ivermectin dosage and changes in population density affect the rate of resistance evolution. I exposed replicate lines of C. remanei to two doses of Ivermectin (high and low) to assess relative survival of lines selected in drug-treated environments compared to untreated controls over 10 generations. Additionally, I maintained lines where mortality was imposed randomly to control for differences in density between drug treatments and to distinguish between the evolutionary consequences of drug treatment versus ecological processes affected by changes in density-dependent feedback. Intriguingly, both drug-selected and random-mortality lines showed an increase in survivorship when challenged with Ivermectin; the magnitude of this increase varied with the intensity of selection and life-history stage. The results suggest that interactions between density-dependent processes and life history may mediate evolved changes in susceptibility to control measures, which could result in misleading conclusions about the evolution of heritable resistance following drug treatment. In Chapter 5, I investigated whether the apparent changes in drug susceptibility found in Chapter 4 were related to evolved changes in life-history of C. remanei populations after selection in drug-treated and random-mortality environments. Rapid passage of lines in the drug-free environment had no effect on the measured life-history traits. 
In the drug-free environment, adult size and fecundity of drug-selected lines increased compared to the controls but drug selection did not affect lifespan. In the treated environment, drug-selected lines showed increased lifespan and fecundity relative to controls. Adult size of randomly culled lines responded in a similar way to drug-selected lines in the drug-free environment, but no change in fecundity or lifespan was observed in either environment. The results suggest that life histories of nematodes can respond to selection as a result of the application of control measures. Failure to take these responses into account when applying control measures could result in adverse outcomes, such as larger and more fecund parasites, as well as over-estimation of the development of genetically controlled resistance. In conclusion, my thesis shows that there may be a complex relationship between drug selection, density-dependent regulatory processes and life history of populations challenged with control measures. This relationship could have implications for how resistance is monitored and managed if life histories of parasitic species show such eco-evolutionary responses to drug application.
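The AIC-based comparison of dose-response models mentioned for Chapter 2 of the thesis above can be sketched as follows. This is only a sketch of the general technique: the candidate model names, parameter counts and log-likelihood values are made-up placeholders, not results from the thesis.

```python
from math import exp


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each model, normalised to sum to 1."""
    best = min(aic_values.values())
    relative = {name: exp(-0.5 * (value - best)) for name, value in aic_values.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Hypothetical fits: (maximised log-likelihood, number of parameters).
# These numbers are invented purely to show the mechanics.
fits = {
    "two_param_logistic": (-120.4, 2),
    "three_param_logistic": (-115.1, 3),
    "weibull": (-118.9, 3),
}

aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
weights = akaike_weights(aics)
best_model = min(aics, key=aics.get)  # lowest AIC is preferred
```

Because estimates such as LD50 depend on which curve is fitted, a natural design choice is to report them from the lowest-AIC model, or to inspect the Akaike weights when several models have similar support.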
Abstract:
The cerebral cortex presents self-similarity in a proper interval of spatial scales, a property typical of natural objects exhibiting fractal geometry. Its complexity can therefore be characterized by the value of its fractal dimension (FD). This metric has usually been computed with a frequentist approach to probability, using point estimator methods that yield only the optimal values of the FD. In our study, we aimed at retrieving a more complete evaluation of the FD by utilizing a Bayesian model for the linear regression analysis of the box-counting algorithm. We used T1-weighted MRI data of 86 healthy subjects (age 44.2 ± 17.1 years, mean ± standard deviation, 48% males) in order to gain insights into the confidence of our measure and investigate the relationship between mean Bayesian FD and age. Our approach yielded a stronger and significant (P < .001) correlation between mean Bayesian FD and age as compared to the previous implementation. Thus, our results suggest that the Bayesian FD is a more faithful estimate of the fractal dimension of the cerebral cortex than the frequentist FD.
Abstract:
Despite the success of the ΛCDM model in describing the Universe, a possible tension between early- and late-Universe cosmological measurements is calling for new independent cosmological probes. Amongst the most promising ones, gravitational waves (GWs) can provide a self-calibrated measurement of the luminosity distance. However, to obtain cosmological constraints, additional information is needed to break the degeneracy between parameters in the gravitational waveform. In this thesis, we exploit the latest LIGO-Virgo-KAGRA Gravitational Wave Transient Catalog (GWTC-3) of GW sources to constrain the background cosmological parameters together with the astrophysical properties of Binary Black Holes (BBHs), using information from their mass distribution. We expand the public code MGCosmoPop, previously used for the application of this technique, by implementing a state-of-the-art model for the mass distribution, needed to account for the presence of non-trivial features, i.e. a truncated power law with two additional Gaussian peaks, referred to as Multipeak. We then analyse GWTC-3 comparing this model with simpler and more commonly adopted ones, both in the case of fixed and varying cosmology, and assess their goodness-of-fit with different model selection criteria, and their constraining power on the cosmological and population parameters. We also start to explore different sampling methods, namely Markov Chain Monte Carlo and Nested Sampling, comparing their performances and evaluating the advantages of both. We find concurring evidence that the Multipeak model is favoured by the data, in line with previous results, and show that this conclusion is robust to the variation of the cosmological parameters. We find a constraint on the Hubble constant of $H_0 = 61.10^{+38.65}_{-22.43}$ km/s/Mpc (68% C.L.), which shows the potential of this method in providing independent constraints on cosmological parameters. The results obtained in this work have been included in [1].
Abstract:
Background: Neotropical freshwater stingrays (Batoidea: Potamotrygonidae) host a diverse parasite fauna, including cestodes. Both cestodes and their stingray hosts are marine-derived, but the taxonomy of this host/parasite system is poorly understood. Methodology: Morphological and molecular (Cytochrome oxidase I) data were used to investigate diversity in freshwater lineages of the cestode genus Rhinebothrium Linton, 1890. Results were based on a phylogenetic hypothesis for 74 COI sequences and morphological analysis of over 400 specimens. Cestodes studied were obtained from 888 individual potamotrygonids, representing 14 recognized and 18 potentially undescribed species from most river systems of South America. Results: Morphological species boundaries were based mainly on microthrix characters observed with scanning electron microscopy, and were supported by COI data. Four species were recognized, including two redescribed (Rhinebothrium copianullum and R. paratrygoni), and two newly described (R. brooksi n. sp. and R. fulbrighti n. sp.). Rhinebothrium paranaensis Menoret & Ivanov, 2009 is considered a junior synonym of R. paratrygoni because the morphological features of the two species overlap substantially. The diagnosis of Rhinebothrium Linton, 1890 is emended to accommodate the presence of marginal longitudinal septa observed in R. copianullum and R. brooksi n. sp. Patterns of host specificity and distribution ranged from use of few host species in few river basins, to use of as many as eight host species in multiple river basins. Significance: The level of intra-specific morphological variation observed in features such as total length and number of proglottids is unparalleled among other elasmobranch cestodes. This is attributed to the large representation of host and biogeographical samples. It is unclear whether the intra-specific morphological variation observed is unique to this freshwater system. 
Nonetheless, caution is urged when using morphological discontinuities to delimit elasmobranch cestode species because the amount of variation encountered is highly dependent on sample size and/or biogeographical representation.
Abstract:
We consider the problem of interaction neighborhood estimation from the partial observation of a finite number of realizations of a random field. We introduce a model selection rule to choose estimators of conditional probabilities among natural candidates. Our main result is an oracle inequality satisfied by the resulting estimator. We then use this selection rule in a two-step procedure to evaluate the interacting neighborhoods. The selection rule selects a small prior set of possible interacting points, and a cutting step removes the irrelevant points from this prior set. We also prove that the Ising models satisfy the assumptions of the main theorems, without restrictions on the temperature, on the structure of the interacting graph or on the range of the interactions. They therefore provide a large class of applications for our results. We give a computationally efficient procedure in these models. We finally show the practical efficiency of our approach in a simulation study.
Abstract:
We formulated a general unrestricted model of the Brazilian Emerging Markets Bond Index Plus (EMBI+) spreads, a proxy for the country's default risk. Employing algorithms that perform automated model selection, we found that macroeconomic fundamentals, such as the ratio of the current account deficit to gross domestic product, the ratio of the public deficit to gross domestic product, and imports over foreign exchange reserves, can explain a great part of the variation in EMBI+ spreads. There is also robust evidence of systematic contagion from Argentina and Mexico, and evidence that the variance of the spread affects its mean.
Abstract:
Fogo selvagem (FS) is mediated by pathogenic, predominantly IgG4, anti-desmoglein 1 (Dsg1) autoantibodies and is endemic in Limao Verde, Brazil. IgG and IgG subclass autoantibodies were tested in a sample of 214 FS patients and 261 healthy controls by Dsg1 ELISA. For model selection, the sample was randomly divided into training (50%), validation (25%), and test (25%) sets. Using the training and validation sets, IgG4 was chosen as the best predictor of FS, with index values above 6.43 classified as FS. On the test set, IgG4 had a sensitivity of 92% (95% confidence interval (95% CI): 82-95%), a specificity of 97% (95% CI: 89-100%), and an area under the curve of 0.97 (95% CI: 0.94-1.00). The IgG4 positive predictive value (PPV) in Limao Verde (3% FS prevalence) was 49%. The sensitivity, specificity, and PPV of IgG anti-Dsg1 were 87, 91, and 23%, respectively. The IgG4-based classifier was validated by testing 11 FS patients before and after clinical disease and 60 Japanese pemphigus foliaceus patients. It classified 21 of 96 normal individuals from a Limao Verde cohort as having FS serology. On the basis of its PPV, half of the 21 individuals may currently have preclinical FS and could develop clinical disease in the future. Identifying individuals during preclinical FS will enhance our ability to identify the etiological agent(s) triggering FS.
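The reported positive predictive values follow from the stated sensitivity, specificity and 3% prevalence via Bayes' rule. The sketch below reproduces that arithmetic; the function name is an assumption for this example, and the inputs are the rounded figures quoted in the abstract.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) from Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Rounded values quoted in the abstract, with 3% FS prevalence.
ppv_igg4 = positive_predictive_value(0.92, 0.97, 0.03)  # about 0.49
ppv_igg = positive_predictive_value(0.87, 0.91, 0.03)   # about 0.23
```

The gap between the two PPVs comes almost entirely from specificity: at 3% prevalence most tested individuals are healthy, so moving from 91% to 97% specificity removes most of the false positives.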
Abstract:
Emotional lability and mood dysregulation characterize bipolar disorder (BD), yet no study has examined effective connectivity between parahippocampal gyrus and prefrontal cortical regions in ventromedial and dorsal/lateral neural systems subserving mood regulation in BD. Participants comprised 46 individuals (age range: 18-56 years): 21 with a DSM-IV diagnosis of BD, type I, currently remitted; and 25 age- and gender-matched healthy controls (HC). Participants performed an event-related functional magnetic resonance imaging paradigm, viewing mild and intense happy and neutral faces. We employed dynamic causal modeling (DCM) to identify significant alterations in effective connectivity between BD and HC. Bayesian model selection was used to determine the best model. The right parahippocampal gyrus (PHG) and right subgenual cingulate gyrus (sgCG) were included as representative regions of the ventromedial neural system. The right dorsolateral prefrontal cortex (DLPFC) region was included as representative of the dorsal/lateral neural system. Right PHG-sgCG effective connectivity was significantly greater in BD than HC, reflecting more rapid, forward PHG-sgCG signaling in BD than HC. There was no between-group difference in sgCG-DLPFC effective connectivity. In BD, abnormally increased right PHG-sgCG effective connectivity and reduced right PHG activity to emotional stimuli suggest a dysfunctional ventromedial neural system implicated in early stimulus appraisal, encoding and automatic regulation of emotion that may represent a pathophysiological functional neural mechanism for mood dysregulation in BD. © 2009 Elsevier Ireland Ltd. All rights reserved.
Abstract:
Objective: Several limitations of published bioelectrical impedance analysis (BIA) equations have been reported. The aims were to develop a new prediction equation in a multiethnic, elderly population and to cross-validate it, along with some published BIA equations, for estimating fat-free mass using deuterium oxide dilution as the reference method. Design and setting: Cross-sectional study of elderly subjects from five developing countries. Methods: Total body water (TBW) measured by deuterium dilution was used to determine fat-free mass (FFM) in 383 subjects. Anthropometric and BIA variables were also measured. Only 377 subjects were included in the analysis; they were randomly divided into development and cross-validation groups after stratification by gender. Stepwise model selection was used to generate the model and Bland-Altman analysis was used to test agreement. Results: FFM = 2.95 - 3.89 (Gender) + 0.514 (Ht²/Z) + 0.090 (Waist) + 0.156 (Body weight). The model fit parameters R², total F-ratio, and SEE were 0.88, 314.3, and 3.3, respectively. None of the published BIA equations met the criteria for agreement. The new BIA equation underestimated FFM by just 0.3 kg in the cross-validation sample. The mean FFM by TBW and the mean FFM by the new BIA equation were not significantly different; 95% of the differences were between the limits of agreement of -6.3 to 6.9 kg of FFM. There was no significant association between the mean of the differences and their averages (r = 0.008 and p = 0.2). Conclusions: This new BIA equation offers a valid option compared with some of the current published BIA equations to estimate FFM in elderly subjects from five developing countries.
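The prediction equation in the Results can be written as a small function. This is a sketch only: the abstract does not state how Gender is coded or which units the predictors use, so the function takes the coded gender value as given and leaves units to the caller. The function and parameter names are assumptions for this example.

```python
def predicted_ffm(gender_code, height, impedance, waist, body_weight):
    """Fat-free mass from the equation quoted in the abstract.

    gender_code: numeric gender coding (the coding scheme is NOT given in
                 the abstract; pass whatever the original study used).
    height, impedance: used as Ht^2 / Z; units are not stated in the abstract.
    waist, body_weight: units are not stated in the abstract.
    """
    height_sq_over_z = height ** 2 / impedance
    return (
        2.95
        - 3.89 * gender_code
        + 0.514 * height_sq_over_z
        + 0.090 * waist
        + 0.156 * body_weight
    )
```

Any use of this function on real data would need the coding and units from the full paper, since the coefficients are only meaningful in the scale on which they were estimated.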
Abstract:
The bridled nailtail wallaby is restricted to one locality in central Queensland, Australia. The population declined severely during a major drought between 1991 and 1995. We investigated age-specific covariates of survival and proximate causes of mortality from 1994 to 1997, using mark-recapture and radio-tagging techniques at two study sites. Using a matrix population model, we also modelled the effect of drought on age-specific survival and the intrinsic rate of population increase. The only significant covariate of survival for adults was a measure of health unrelated to drought. Rainfall, food, predator activity, year, sex and habitat were not associated with variation in adult survival. Juvenile survival was negatively affected by drought, and predation was the proximate cause of most juvenile deaths. The matrix projection model showed that the observed juvenile survivorship during the drought was low enough to have produced a population decline, although fecundity and survival of other age classes were high throughout the study. © 2001 Elsevier Science Ltd. All rights reserved.