897 results for DNA Sequence, Hidden Markov Model, Bayesian Model, Sensitive Analysis, Markov Chain Monte Carlo
Abstract:
In this study a new, fully non-linear approach to Local Earthquake Tomography is presented. Local Earthquake Tomography (LET) is a non-linear inversion problem that allows the joint determination of earthquake parameters and velocity structure from arrival times of waves generated by local sources. Since the early developments of seismic tomography, several inversion methods have been developed to solve this problem in a linearized way. In the framework of Monte Carlo sampling, we developed a new code based on the Reversible Jump Markov Chain Monte Carlo sampling method (Rj-McMc). It is a trans-dimensional approach in which the number of unknowns, and thus the model parameterization, is treated as one of the unknowns. I show that our new code overcomes major limitations of linearized tomography, opening a new perspective in seismic imaging. Synthetic tests demonstrate that our algorithm is able to produce a robust and reliable tomography without the need to make subjective a-priori assumptions about starting models and parameterization. Moreover, it provides a more accurate estimate of uncertainties about the model parameters. Therefore, it is very suitable for investigating the velocity structure in regions that lack accurate a-priori information. Synthetic tests also reveal that the lack of any regularization constraints allows more information to be extracted from the observed data and that the velocity structure can also be detected in regions where the density of rays is low and standard linearized codes fail. I also present high-resolution Vp and Vp/Vs models in two widely investigated regions: the Parkfield segment of the San Andreas Fault (California, USA) and the area around the Alto Tiberina fault (Umbria-Marche, Italy). In both cases, the models obtained with our code show a substantial improvement in the data fit compared with the models obtained from the same data set with the linearized inversion codes.
Abstract:
Oscillations between high and low values of the membrane potential (UP and DOWN states respectively) are a ubiquitous feature of cortical neurons during slow wave sleep and anesthesia. Nevertheless, only a surprisingly small number of quantitative studies have dealt with this phenomenon’s implications for computation. Here we present a novel theory that explains on a detailed mathematical level the computational benefits of UP states. The theory is based on random sampling by means of interspike intervals (ISIs) of the exponential integrate and fire (EIF) model neuron, such that each spike is considered a sample, whose analog value corresponds to the spike’s preceding ISI. As we show, the EIF’s exponential sodium current, which kicks in when balancing a noisy membrane potential around values close to the firing threshold, leads to a particularly simple, approximate relationship between the neuron’s ISI distribution and input current. Approximation quality depends on the frequency spectrum of the current and is improved upon increasing the voltage baseline towards threshold. Thus, the conceptually simpler leaky integrate and fire neuron, which lacks such an additional current boost, performs consistently worse than the EIF and does not improve when the voltage baseline is increased. For the EIF in contrast, the presented mechanism is particularly effective in the high-conductance regime, which is a hallmark feature of UP states. Our theoretical results are confirmed by accompanying simulations, which were conducted for input currents of varying spectral composition. Moreover, we provide analytical estimations of the range of ISI distributions the EIF neuron can sample from at a given approximation level. Such samples may be considered by any algorithmic procedure that is based on random sampling, such as Markov Chain Monte Carlo or message-passing methods.
Finally, we explain how spike-based random sampling relates to existing computational theories about UP states during slow wave sleep and present possible extensions of the model in the context of spike-frequency adaptation.
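As a rough illustration of how interspike intervals arise from an EIF neuron, the following minimal sketch integrates a noisy EIF membrane equation with a simple Euler scheme and collects the ISIs. All parameter values (time constant, thresholds, drive `mu`, noise level `sigma`) and the function name `simulate_eif_isis` are assumptions chosen for illustration; they are not taken from the study, and the sketch does not reproduce its sampling theory.

```python
import math
import random


def simulate_eif_isis(mu=16.0, sigma=2.0, t_max=1000.0, dt=0.1, seed=0):
    """Euler integration of a noisy EIF neuron; returns the ISIs in ms.

    All parameter values are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    tau, e_leak = 10.0, -65.0        # membrane time constant (ms), leak potential (mV)
    v_thresh, delta_t = -50.0, 2.0   # soft threshold (mV), slope factor (mV)
    v_cut, v_reset = -30.0, -65.0    # numerical spike cutoff and reset value (mV)
    v, last_spike, isis = e_leak, None, []
    for step in range(int(t_max / dt)):
        t = step * dt
        # Exponential term: the additional current boost near threshold.
        boost = delta_t * math.exp((v - v_thresh) / delta_t)
        drift = (-(v - e_leak) + boost + mu) / tau
        v += drift * dt + sigma * math.sqrt(dt / tau) * rng.gauss(0.0, 1.0)
        if v >= v_cut:               # spike: record the interval and reset
            if last_spike is not None:
                isis.append(t - last_spike)
            last_spike = t
            v = v_reset
    return isis


isis = simulate_eif_isis()
print(len(isis), "intervals, mean ISI (ms):", sum(isis) / len(isis))
```

Each recorded interval plays the role of one sample in the sense described above. Dropping the `boost` term turns the model into a leaky integrate and fire neuron with a hard threshold at `v_cut`, which gives a simple way to compare the two ISI distributions.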
Abstract:
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short edge lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.
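The effect of placing prior weight on an unresolved topology can be sketched with Bayes' rule over a small discrete set of trees. The marginal likelihood values below are hypothetical numbers chosen only to mimic a four-taxon case in which the star tree is true; the function name `topology_posterior` is likewise an assumption, and the sketch replaces the reversible-jump sampler with direct enumeration.

```python
def topology_posterior(marginal_likelihoods, priors):
    """Posterior over topologies: prior times marginal likelihood, normalised."""
    joint = {name: marginal_likelihoods[name] * priors[name]
             for name in marginal_likelihoods}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Hypothetical relative marginal likelihoods (not from the paper).
likelihoods = {"T1": 3.0, "T2": 1.0, "T3": 0.5, "star": 20.0}

# Conventional analysis: only fully resolved trees receive prior mass.
resolved_only = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3, "star": 0.0}
# Polytomy-aware analysis: half of the prior mass on the star tree.
with_polytomy = {"T1": 1 / 6, "T2": 1 / 6, "T3": 1 / 6, "star": 0.5}

post_resolved = topology_posterior(likelihoods, resolved_only)
post_polytomy = topology_posterior(likelihoods, with_polytomy)
print("P(T1) without star prior:", round(post_resolved["T1"], 3))
print("P(T1) with star prior:   ", round(post_polytomy["T1"], 3))
```

With these made-up inputs, the arbitrary resolution `T1` dominates when the star tree is excluded and loses most of its posterior mass once the star tree is allowed.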
Abstract:
A maximum likelihood estimator based on the coalescent for unequal migration rates and different subpopulation sizes is developed. The method uses a Markov chain Monte Carlo approach to investigate possible genealogies with branch lengths and with migration events. Properties of the new method are shown by using simulated data from a four-population n-island model and a source–sink population model. Our estimation method as coded in migrate is tested against genetree; both programs deliver a very similar likelihood surface. The algorithm converges to the estimates fairly quickly, even when the Markov chain is started from unfavorable parameters. The method was used to estimate gene flow in the Nile valley by using mtDNA data from three human populations.
Abstract:
We perform a detailed modelling of the post-outburst surface emission of the low magnetic field magnetar SGR 0418+5729. The dipolar magnetic field of this source, B = 6 × 10^12 G estimated from its spin-down rate, is in the observed range of magnetic fields for normal pulsars. The source is further characterized by a high pulse fraction and a single-peak profile. Using synthetic temperature distribution profiles, and fully accounting for the general-relativistic effects of light deflection and gravitational redshift, we generate synthetic X-ray spectra and pulse profiles that we fit to the observations. We find that asymmetric and symmetric surface temperature distributions can reproduce the observed pulse profiles and spectra of SGR 0418 equally well. Nonetheless, the modelling allows us to place constraints on the system geometry (i.e. the angles ψ and ξ that the rotation axis makes with the line of sight and the dipolar axis, respectively), as well as on the spot size and temperature contrast on the neutron star surface. After performing an analysis iterating between the pulse profile and spectra, as done in similar previous works, we further employed, for the first time in this context, a Markov-Chain Monte Carlo approach to extract constraints on the model parameters from the pulse profiles and spectra simultaneously. We find that, to reproduce the observed spectrum and flux modulation: (a) the angles must be restricted to 65° ≲ ψ + ξ ≲ 125° or 235° ≲ ψ + ξ ≲ 295°; (b) the temperature contrast between the poles and the equator must be at least a factor of ∼6; and (c) the size of the hottest region ranges between 0.2 and 0.7 km (including uncertainties on the source distance). Lastly, we interpret our findings within the context of internal and external heating models.
Abstract:
Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that approximates the true variance well.
Abstract:
We present results that compare the performance of neural networks trained with two Bayesian methods, (i) the Evidence Framework of MacKay (1992) and (ii) a Markov Chain Monte Carlo method due to Neal (1996) on a task of classifying segmented outdoor images. We also investigate the use of the Automatic Relevance Determination method for input feature selection.
Abstract:
Statistical methodology is proposed for comparing molecular shapes. In order to account for the continuous nature of molecules, classical shape analysis methods are combined with techniques used for predicting random fields in spatial statistics. Applying a modification of Procrustes analysis, Bayesian inference is carried out using Markov chain Monte Carlo methods for the pairwise alignment of the resulting molecular fields. Superimposing entire fields rather than the configuration matrices of nuclear positions thereby solves the problem that there is usually no clear one-to-one correspondence between the atoms of the two molecules under consideration. Using a similar concept, we also propose an adaptation of the generalised Procrustes analysis algorithm for the simultaneous alignment of multiple molecular fields. The methodology is applied to a dataset of 31 steroid molecules.
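For orientation, classical (ordinary) Procrustes alignment of two landmark configurations can be sketched in two dimensions as below. This is the textbook landmark-based version, which assumes a known one-to-one correspondence between points; it is not the field-based modification proposed in the paper, and the function names and the example triangle are assumptions made for illustration.

```python
import math


def center(points):
    """Translate a 2D configuration so that its centroid sits at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def procrustes_angle(source, target):
    """Rotation angle that best maps centred `source` onto centred `target`
    in the least-squares sense, given matched landmarks."""
    a, b = center(source), center(target)
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    return math.atan2(cross, dot)


def rotate(points, theta):
    """Rotate 2D points counter-clockwise by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical configuration: a triangle, rotated by 0.7 rad and shifted.
shape = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
moved = [(x + 3.0, y - 1.0) for x, y in rotate(shape, 0.7)]
print("recovered angle:", procrustes_angle(shape, moved))
```

The dependence on matched landmarks is exactly the limitation the abstract points to: for two different molecules there is usually no obvious pairing of atoms, which motivates superimposing continuous fields instead.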
Abstract:
Scientific curiosity, exploration of georesources and environmental concerns are pushing the geoscientific research community toward subsurface investigations of ever-increasing complexity. This review explores various approaches to formulate and solve inverse problems in ways that effectively integrate geological concepts with geophysical and hydrogeological data. Modern geostatistical simulation algorithms can produce multiple subsurface realizations that are in agreement with conceptual geological models and statistical rock physics can be used to map these realizations into physical properties that are sensed by the geophysical or hydrogeological data. The inverse problem consists of finding one or an ensemble of such subsurface realizations that are in agreement with the data. The most general inversion frameworks are presently often computationally intractable when applied to large-scale problems and it is necessary to better understand the implications of simplifying (1) the conceptual geological model (e.g., using model compression); (2) the physical forward problem (e.g., using proxy models); and (3) the algorithm used to solve the inverse problem (e.g., Markov chain Monte Carlo or local optimization methods) to reach practical and robust solutions given today's computer resources and knowledge. We also highlight the need to not only use geophysical and hydrogeological data for parameter estimation purposes, but also to use them to falsify or corroborate alternative geological scenarios.
Abstract:
The fundamental objective for health research is to determine whether changes should be made to clinical decisions. Decisions made by veterinary surgeons in the light of new research evidence are known to be influenced by their prior beliefs, especially their initial opinions about the plausibility of possible results. In this paper, clinical trial results for a bovine mastitis control plan were evaluated within a Bayesian context, to incorporate a community of prior distributions that represented a spectrum of clinical prior beliefs. The aim was to quantify the effect of veterinary surgeons’ initial viewpoints on the interpretation of the trial results. A Bayesian analysis was conducted using Markov chain Monte Carlo procedures. Stochastic models included a financial cost attributed to a change in clinical mastitis following implementation of the control plan. Prior distributions were incorporated that covered a realistic range of possible clinical viewpoints, including scepticism, enthusiasm and uncertainty. Posterior distributions revealed important differences in the financial gain that clinicians with different starting viewpoints would anticipate from the mastitis control plan, given the actual research results. For example, a severe sceptic would ascribe a probability of 0.50 for a return of <£5 per cow in an average herd that implemented the plan, whereas an enthusiast would ascribe this probability for a return of >£20 per cow. Simulations using increased trial sizes indicated that if the original study was four times as large, an initial sceptic would be more convinced about the efficacy of the control plan but would still anticipate less financial return than an initial enthusiast would anticipate after the original study. In conclusion, it is possible to estimate how clinicians’ prior beliefs influence their interpretation of research evidence. 
Further research on the extent to which different interpretations of evidence result in changes to clinical practice would be worthwhile.
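The idea of a community of priors can be sketched with a conjugate normal model, in which each clinician's prior on the financial return per cow is combined with the same trial estimate. This is a simplified stand-in for the MCMC analysis described above, and every number below (prior means and spreads, trial estimate, standard errors) is a hypothetical value chosen for illustration, not a result from the study.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal update: precision-weighted average of prior and data."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / data_se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var ** 0.5


# Hypothetical priors on the return per cow (in pounds): (mean, sd).
priors = {"sceptic": (0.0, 5.0), "enthusiast": (25.0, 5.0)}
trial_mean, trial_se = 15.0, 6.0   # hypothetical trial estimate
larger_se = trial_se / 2.0         # a trial four times as large halves the SE

original = {who: normal_posterior(m, s, trial_mean, trial_se)[0]
            for who, (m, s) in priors.items()}
larger = {who: normal_posterior(m, s, trial_mean, larger_se)[0]
          for who, (m, s) in priors.items()}
print("posterior means, original trial:", original)
print("posterior means, 4x larger trial:", larger)
```

Under these assumed inputs the sceptic's posterior mean moves towards the trial estimate as the trial grows, yet still sits below what the enthusiast anticipates after the original trial, which mirrors the qualitative pattern reported in the abstract.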
Abstract:
Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, they are deemed inappropriate due to the presence of zeros and/or ones. To evade this, we introduce a class of general proportion density, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.
Abstract:
Background: Hepatitis C virus (HCV) is an important human pathogen affecting around 3% of the human population. In Brazil, it is estimated that there are approximately 2 to 3 million HCV chronic carriers. There are few reports of HCV prevalence in Rondonia State (RO), but it was estimated at 9.7% from 1999 to 2005. The aim of this study was to characterize HCV genotypes in 58 chronic HCV infected patients from Porto Velho, Rondonia (RO), Brazil. Methods: A fragment of 380 bp of the NS5B region was amplified by nested PCR for genotyping analysis. Viral sequences were characterized by phylogenetic analysis using reference sequences obtained from GenBank (n = 173). Sequences were aligned using Muscle software and edited in the SE-AL software. Phylogenetic analyses were conducted using Bayesian Markov chain Monte Carlo simulation (MCMC) to obtain the MCC tree using BEAST v. 1.5.3. Results: From 58 anti-HCV positive samples, 22 were positive for the NS5B fragment and successfully sequenced. Genotype 1b was the most prevalent in this population (50%), followed by 1a (27.2%), 2b (13.6%) and 3a (9.0%). Conclusions: This study is the first report of HCV genotypes from Rondonia State and subtype 1b was found to be the most prevalent. This subtype is mostly found among people who have a previous history of blood transfusion but more detailed studies with a larger number of patients are necessary to understand the HCV dynamics in the population of Rondonia State, Brazil.
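The reported genotype percentages can be checked against the 22 sequenced samples with a short calculation. The per-genotype counts below are not stated in the abstract; they are an assumption, inferred as the only whole-number split of 22 that matches the reported figures, and the helper name is hypothetical.

```python
import math

# Assumed counts (inferred, not reported): they sum to the 22 sequenced samples.
counts = {"1b": 11, "1a": 6, "2b": 3, "3a": 2}
total = sum(counts.values())


def truncated_percent(count, n):
    """Percentage truncated (not rounded) to one decimal place."""
    return math.floor(1000 * count / n) / 10


percentages = {genotype: truncated_percent(c, total)
               for genotype, c in counts.items()}
print(total, percentages)
```

Under this assumption the reported values appear to be truncated rather than rounded to one decimal place, which would also explain why they sum to 99.8% rather than 100%.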
Abstract:
Background: GB virus C (GBV-C) is an enveloped positive-sense ssRNA virus belonging to the Flaviviridae family. Studies on the genetic variability of GBV-C reveal the existence of six genotypes: genotype 1 predominates in West Africa, genotype 2 in Europe and America, genotype 3 in Asia, genotype 4 in Southwest Asia, genotype 5 in South Africa and genotype 6 in Indonesia. The aim of this study was to determine the frequency and genotypic distribution of GBV-C in the Colombian population. Methods: Two groups were analyzed: i) 408 Colombian blood donors infected with HCV (n = 250) and HBV (n = 158) from Bogota and ii) 99 indigenous people with HBV infection from Leticia, Amazonas. A fragment of 344 bp from the 5' untranslated region (5' UTR) was amplified by nested RT PCR. Viral sequences were genotyped by phylogenetic analysis using reference sequences from each genotype obtained from GenBank (n = 160). Bayesian phylogenetic analyses were conducted using a Markov chain Monte Carlo (MCMC) approach to obtain the MCC tree using BEAST v. 1.5.3. Results: Among blood donors, from 158 HBsAg positive samples, 5.06% (n = 8) were positive for GBV-C and from 250 anti-HCV positive samples, 3.2% (n = 8) were positive for GBV-C. Also, 7.7% (n = 7) GBV-C positive samples were found among indigenous people from Leticia. A phylogenetic analysis revealed the presence of the following GBV-C genotypes among blood donors: 2a (41.6%), 1 (33.3%), 3 (16.6%) and 2b (8.3%). All genotype 1 sequences were found in co-infection with HBV and 4/5 genotype 2a sequences were found in co-infection with HCV. All sequences from indigenous people from Leticia were classified as genotype 3. The presence of GBV-C infection was not correlated with sex (p = 0.43), age (p = 0.38) or origin (p = 0.17). Conclusions: A high frequency of GBV-C genotypes 1 and 2 was found in blood donors.
The presence of genotype 3 in the indigenous population was previously reported from the Santa Marta region in Colombia and in native people from Venezuela and Bolivia. This fact may be correlated with the ancient migrations of Asian people to South America.
Abstract:
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.
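A maximum parsimony search needs an objective function that scores each candidate tree. One standard choice is Fitch's small-parsimony algorithm, sketched below for rooted binary trees written as nested tuples. Whether the tree sampler uses this exact scoring routine is not stated in the abstract, so the code, the tiny alignment and the function names should be read as illustrative assumptions.

```python
def fitch_site(tree, states):
    """Fitch pass for one alignment column.

    `tree` is a leaf name (str) or a pair (left, right); `states` maps each
    leaf name to its character. Returns (candidate state set, change count).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                      # children agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical four-taxon alignment with two sites.
alignment = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
tree_ab = (("A", "B"), ("C", "D"))
tree_ac = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_ab, alignment), parsimony_score(tree_ac, alignment))
```

In a simulated annealing search, a score of this kind would be evaluated on each proposed tree, with worse trees accepted at a probability that shrinks as the temperature is lowered.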