898 resultados para Hidden Markov chain


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article presents a statistical method for detecting recombination in DNA sequence alignments, which is based on combining two probabilistic graphical models: (1) a taxon graph (phylogenetic tree) representing the relationship between the taxa, and (2) a site graph (hidden Markov model) representing interactions between different sites in the DNA sequence alignments. We adopt a Bayesian approach and sample the parameters of the model from the posterior distribution with Markov chain Monte Carlo, using a Metropolis-Hastings and Gibbs-within-Gibbs scheme. The proposed method is tested on various synthetic and real-world DNA sequence alignments, and we compare its performance with the established detection methods RECPARS, PLATO, and TOPAL, as well as with two alternative parameter estimation schemes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs, forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amount of unstructured texts. This could lead to determine, for instance, the user satisfaction degree about products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches bank on a Markov Chain based model, which is language independent and whose killing features are simplicity and generality, which make it interesting with respect to previous sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification areas, comparing performance with those of other two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in literature, with reference to both single-domain and cross-domain tasks, in $2$-classes (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, because this work also shows the way to walk in order to enhance performance, that is, a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-classes Single-Domain Sentiment Classification, another future work will regard validating these results also in tasks with more than $2$ classes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with the already known Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of generalization we draw learning curves in simplified situations for these algorithms and compare their performances.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider in this paper the optimal stationary dynamic linear filtering problem for continuous-time linear systems subject to Markovian jumps in the parameters (LSMJP) and additive noise (Wiener process). It is assumed that only an output of the system is available and therefore the values of the jump parameter are not accessible. It is a well known fact that in this setting the optimal nonlinear filter is infinite dimensional, which makes the linear filtering a natural numerically, treatable choice. The goal is to design a dynamic linear filter such that the closed loop system is mean square stable and minimizes the stationary expected value of the mean square estimation error. It is shown that an explicit analytical solution to this optimal filtering problem is obtained from the stationary solution associated to a certain Riccati equation. It is also shown that the problem can be formulated using a linear matrix inequalities (LMI) approach, which can be extended to consider convex polytopic uncertainties on the parameters of the possible modes of operation of the system and on the transition rate matrix of the Markov process. As far as the authors are aware of this is the first time that this stationary filtering problem (exact and robust versions) for LSMJP with no knowledge of the Markov jump parameters is considered in the literature. Finally, we illustrate the results with an example.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper analyzes the geography of regional competitiveness in manufacturing in Brazil. The authors estimate stochastic frontiers to calculate regional efficiency of representative firms in 137 regions in the period 2000-2006, in four sectors defined by technological intensity. The efficiency results are analyzed using Markov Spatial Transition Matrices to provide insights into the transition of regions between efficiency levels, considering their local spatial context. The results indicate that geography plays an important role in manufacturing competitiveness. In particular, regions with more competitive neighbors are more likely to improve their relative efficiency (pull effect) over time, and regions with less competitive neighbors are more likely to lose relative efficiency (drag effect). The authors find that the pull effect is stronger than the drag effect.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper develops the model of Bicego, Grosso, and Otranto (2008) and applies Hidden Markov Models to predict market direction. The paper draws an analogy between financial markets and speech recognition, seeking inspiration from the latter to solve common issues in quantitative investing. Whereas previous works focus mostly on very complex modifications of the original hidden markov model algorithm, the current paper provides an innovative methodology by drawing inspiration from thoroughly tested, yet simple, speech recognition methodologies. By grouping returns into sequences, Hidden Markov Models can then predict market direction the same way they are used to identify phonemes in speech recognition. The model proves highly successful in identifying market direction but fails to consistently identify whether a trend is in place. All in all, the current paper seeks to bridge the gap between speech recognition and quantitative finance and, even though the model is not fully successful, several refinements are suggested and the room for improvement is significant.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present an integrated system for real-time automatic detection of human actions from video. The proposed approach uses the boundary of humans as the main feature for recognizing actions. Background subtraction is performed using Gaussian mixture model. Then, features are extracted from silhouettes and Vector Quantization is used to map features into symbols (bag of words approach). Finally, actions are detected using the Hidden Markov Model. The proposed system was validated using a newly collected real- world dataset. The obtained results show that the system is capable of achieving robust human detection, in both indoor and outdoor environments. Moreover, promising classification results were achieved when detecting two basic human actions: walking and sitting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

L'Anàlisi de la supervivència s'utilitza en diferents camps per analitzar el temps transcorregut entre dos esdeveniments. El que distingeix l'anàlisi de la supervivència d'altres àrees de l'estadística és que les dades normalment estan censurades. La censura en un interval apareix quan l'esdeveniment final d'interès no és directament observable i només se sap que el temps de fallada està en un interval concret. Un esquema de censura més complex encara apareix quan tant el temps inicial com el temps final estan censurats en un interval. Aquesta situació s'anomena doble censura. En aquest article donem una descripció formal d'un mètode bayesà paramètric per a l'anàlisi de dades censurades en un interval i dades doblement censurades així com unes indicacions clares de la seva utilització o pràctica. La metodologia proposada s'ilustra amb dades d'una cohort de pacients hemofílics que es varen infectar amb el virus VIH a principis dels anys 1980's.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Among the largest resources for biological sequence data is the large amount of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error prone coding regions have shown good performance in detecting and predicting these while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon usage bias based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a non-equidistant Q rate matrix formula and an adaptive numerical algorithm for a continuous time Markov chain to approximate jump-diffusions with affine or non-affine functional specifications. Our approach also accommodates state-dependent jump intensity and jump distribution, a flexibility that is very hard to achieve with other numerical methods. The Kolmogorov-Smirnov test shows that the proposed Markov chain transition density converges to the one given by the likelihood expansion formula as in Ait-Sahalia (2008). We provide numerical examples for European stock option pricing in Black and Scholes (1973), Merton (1976) and Kou (2002).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (like predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. A wealth of useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completelyabsent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and byMartín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involvedparts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method isintroduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that thetheoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approachhas reasonable properties from a compositional point of view. In particular, it is “natural” in the sense thatit recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in thesame paper a substitution method for missing values on compositional data sets is introduced

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ground-penetrating radar (GPR) has the potential to provide valuable information on hydrological properties of the vadose zone because of their strong sensitivity to soil water content. In particular, recent evidence has suggested that the stochastic inversion of crosshole GPR data within a coupled geophysical-hydrological framework may allow for effective estimation of subsurface van-Genuchten-Mualem (VGM) parameters and their corresponding uncertainties. An important and still unresolved issue, however, is how to best integrate GPR data into a stochastic inversion in order to estimate the VGM parameters and their uncertainties, thus improving hydrological predictions. Recognizing the importance of this issue, the aim of the research presented in this thesis was to first introduce a fully Bayesian inversion called Markov-chain-Monte-carlo (MCMC) strategy to perform the stochastic inversion of steady-state GPR data to estimate the VGM parameters and their uncertainties. Within this study, the choice of the prior parameter probability distributions from which potential model configurations are drawn and tested against observed data was also investigated. Analysis of both synthetic and field data collected at the Eggborough (UK) site indicates that the geophysical data alone contain valuable information regarding the VGM parameters. However, significantly better results are obtained when these data are combined with a realistic, informative prior. A subsequent study explore in detail the dynamic infiltration case, specifically to what extent time-lapse ZOP GPR data, collected during a forced infiltration experiment at the Arrenaes field site (Denmark), can help to quantify VGM parameters and their uncertainties using the MCMC inversion strategy. The findings indicate that the stochastic inversion of time-lapse GPR data does indeed allow for a substantial refinement in the inferred posterior VGM parameter distributions. In turn, this significantly improves knowledge of the hydraulic properties, which are required to predict hydraulic behaviour. Finally, another aspect that needed to be addressed involved the comparison of time-lapse GPR data collected under different infiltration conditions (i.e., natural loading and forced infiltration conditions) to estimate the VGM parameters using the MCMC inversion strategy. The results show that for the synthetic example, considering data collected during a forced infiltration test helps to better refine soil hydraulic properties compared to data collected under natural infiltration conditions. When investigating data collected at the Arrenaes field site, further complications arised due to model error and showed the importance of also including a rigorous analysis of the propagation of model error with time and depth when considering time-lapse data. Although the efforts in this thesis were focused on GPR data, the corresponding findings are likely to have general applicability to other types of geophysical data and field environments. Moreover, the obtained results allow to have confidence for future developments in integration of geophysical data with stochastic inversions to improve the characterization of the unsaturated zone but also reveal important issues linked with stochastic inversions, namely model errors, that should definitely be addressed in future research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Time-lapse geophysical measurements are widely used to monitor the movement of water and solutes through the subsurface. Yet commonly used deterministic least squares inversions typically suffer from relatively poor mass recovery, spread overestimation, and limited ability to appropriately estimate nonlinear model uncertainty. We describe herein a novel inversion methodology designed to reconstruct the three-dimensional distribution of a tracer anomaly from geophysical data and provide consistent uncertainty estimates using Markov chain Monte Carlo simulation. Posterior sampling is made tractable by using a lower-dimensional model space related both to the Legendre moments of the plume and to predefined morphological constraints. Benchmark results using cross-hole ground-penetrating radar travel times measurements during two synthetic water tracer application experiments involving increasingly complex plume geometries show that the proposed method not only conserves mass but also provides better estimates of plume morphology and posterior model uncertainty than deterministic inversion results.