906 resultados para Synchronous hidden Markov models
Resumo:
Context tree models have been introduced by Rissanen in [25] as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics. The present paper investigates non-asymptotic properties of two popular procedures of context tree estimation: Rissanen's algorithm Context and penalized maximum likelihood. First showing how they are related, we prove finite horizon bounds for the probability of over- and under-estimation. Concerning overestimation, no boundedness or loss-of-memory conditions are required: the proof relies on new deviation inequalities for empirical probabilities of independent interest. The under-estimation properties rely on classical hypotheses for processes of infinite memory. These results improve on and generalize the bounds obtained in Duarte et al. (2006) [12], Galves et al. (2008) [18], Galves and Leonardi (2008) [17], Leonardi (2010) [22], refining asymptotic results of Buhlmann and Wyner (1999) [4] and Csiszar and Talata (2006) [9]. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
When building genetic maps, it is necessary to choose from several marker ordering algorithms and criteria, and the choice is not always simple. In this study, we evaluate the efficiency of algorithms try (TRY), seriation (SER), rapid chain delineation (RCD), recombination counting and ordering (RECORD) and unidirectional growth (UG), as well as the criteria PARF (product of adjacent recombination fractions), SARF (sum of adjacent recombination fractions), SALOD (sum of adjacent LOD scores) and LHMC (likelihood through hidden Markov chains), used with the RIPPLE algorithm for error verification, in the construction of genetic linkage maps. A linkage map of a hypothetical diploid and monoecious plant species was simulated containing one linkage group and 21 markers with fixed distance of 3 cM between them. In all, 700 F(2) populations were randomly simulated with and 400 individuals with different combinations of dominant and co-dominant markers, as well as 10 and 20% of missing data. The simulations showed that, in the presence of co-dominant markers only, any combination of algorithm and criteria may be used, even for a reduced population size. In the case of a smaller proportion of dominant markers, any of the algorithms and criteria (except SALOD) investigated may be used. In the presence of high proportions of dominant markers and smaller samples (around 100), the probability of repulsion linkage increases between them and, in this case, use of the algorithms TRY and SER associated to RIPPLE with criterion LHMC would provide better results. Heredity (2009) 103, 494-502; doi:10.1038/hdy.2009.96; published online 29 July 2009
Resumo:
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Resumo:
Surrogate methods for detecting lateral gene transfer are those that do not require inference of phylogenetic trees. Herein I apply four such methods to identify open reading frames (ORFs) in the genome of Escherichia coli K12 that may have arisen by lateral gene transfer. Only two of these methods detect the same ORFs more frequently than expected by chance, whereas several intersections contain many fewer ORFs than expected. Each of the four methods detects a different non-random set of ORFs. The methods may detect lateral ORFs of different relative ages; testing this hypothesis will require rigorous inference of trees. (C) 2001 Federation of European Microbiological Societies. Published by Elsevier Science BN. All rights reserved.
Resumo:
This paper uses an infinite hidden Markov model (IIHMM) to analyze U.S. inflation dynamics with a particular focus on the persistence of inflation. The IHMM is a Bayesian nonparametric approach to modeling structural breaks. It allows for an unknown number of breakpoints and is a flexible and attractive alternative to existing methods. We found a clear structural break during the recent financial crisis. Prior to that, inflation persistence was high and fairly constant.
Resumo:
We propose and validate a multivariate classification algorithm for characterizing changes in human intracranial electroencephalographic data (iEEG) after learning motor sequences. The algorithm is based on a Hidden Markov Model (HMM) that captures spatio-temporal properties of the iEEG at the level of single trials. Continuous intracranial iEEG was acquired during two sessions (one before and one after a night of sleep) in two patients with depth electrodes implanted in several brain areas. They performed a visuomotor sequence (serial reaction time task, SRTT) using the fingers of their non-dominant hand. Our results show that the decoding algorithm correctly classified single iEEG trials from the trained sequence as belonging to either the initial training phase (day 1, before sleep) or a later consolidated phase (day 2, after sleep), whereas it failed to do so for trials belonging to a control condition (pseudo-random sequence). Accurate single-trial classification was achieved by taking advantage of the distributed pattern of neural activity. However, across all the contacts the hippocampus contributed most significantly to the classification accuracy for both patients, and one fronto-striatal contact for one patient. Together, these human intracranial findings demonstrate that a multivariate decoding approach can detect learning-related changes at the level of single-trial iEEG. Because it allows an unbiased identification of brain sites contributing to a behavioral effect (or experimental condition) at the level of single subject, this approach could be usefully applied to assess the neural correlates of other complex cognitive functions in patients implanted with multiple electrodes.
Resumo:
We present MBIS (Multivariate Bayesian Image Segmentation tool), a clustering tool based on the mixture of multivariate normal distributions model. MBIS supports multichannel bias field correction based on a B-spline model. A second methodological novelty is the inclusion of graph-cuts optimization for the stationary anisotropic hidden Markov random field model. Along with MBIS, we release an evaluation framework that contains three different experiments on multi-site data. We first validate the accuracy of segmentation and the estimated bias field for each channel. MBIS outperforms a widely used segmentation tool in a cross-comparison evaluation. The second experiment demonstrates the robustness of results on atlas-free segmentation of two image sets from scan-rescan protocols on 21 healthy subjects. Multivariate segmentation is more replicable than the monospectral counterpart on T1-weighted images. Finally, we provide a third experiment to illustrate how MBIS can be used in a large-scale study of tissue volume change with increasing age in 584 healthy subjects. This last result is meaningful as multivariate segmentation performs robustly without the need for prior knowledge.
Resumo:
Although sources in general nonlinear mixturm arc not separable iising only statistical independence, a special and realistic case of nonlinear mixtnres, the post nonlinear (PNL) mixture is separable choosing a suited separating system. Then, a natural approach is based on the estimation of tho separating Bystem parameters by minimizing an indcpendence criterion, like estimated mwce mutual information. This class of methods requires higher (than 2) order statistics, and cannot separate Gaarsian sources. However, use of [weak) prior, like source temporal correlation or nonstationarity, leads to other source separation Jgw rithms, which are able to separate Gaussian sourra, and can even, for a few of them, works with second-order statistics. Recently, modeling time correlated s011rces by Markov models, we propose vcry efficient algorithms hmed on minimization of the conditional mutual information. Currently, using the prior of temporally correlated sources, we investigate the fesihility of inverting PNL mixtures with non-bijectiw non-liacarities, like quadratic functions. In this paper, we review the main ICA and BSS results for riunlinear mixtures, present PNL models and algorithms, and finish with advanced resutts using temporally correlated snu~sm
Resumo:
We propose a deep study on tissue modelization andclassification Techniques on T1-weighted MR images. Threeapproaches have been taken into account to perform thisvalidation study. Two of them are based on FiniteGaussian Mixture (FGM) model. The first one consists onlyin pure gaussian distributions (FGM-EM). The second oneuses a different model for partial volume (PV) (FGM-GA).The third one is based on a Hidden Markov Random Field(HMRF) model. All methods have been tested on a DigitalBrain Phantom image considered as the ground truth. Noiseand intensity non-uniformities have been added tosimulate real image conditions. Also the effect of ananisotropic filter is considered. Results demonstratethat methods relying in both intensity and spatialinformation are in general more robust to noise andinhomogeneities. However, in some cases there is nosignificant differences between all presented methods.
Resumo:
BACKGROUND: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate to assess the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. RESULTS: We have developed a web-based platform that gives a user-friendly access to two phylogenetic-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model to allow for the generalization of the alphabet used in the Markov model, which can now analyse both nucleotide and amino acid data sets. The simulation of coevolving positions is novel and builds upon the developments of the Coev model. It allows user to simulate pairs of dependent nucleotide or amino acid positions. CONCLUSIONS: The main focus of our paper is the new simulation method we present for coevolving positions. The implementation of this method is embedded within the web platform Coev-web that is freely accessible at http://coev.vital-it.ch/, and was tested in most modern web browsers.
Resumo:
This thesis considers optimization problems arising in printed circuit board assembly. Especially, the case in which the electronic components of a single circuit board are placed using a single placement machine is studied. Although there is a large number of different placement machines, the use of collect-and-place -type gantry machines is discussed because of their flexibility and increasing popularity in the industry. Instead of solving the entire control optimization problem of a collect-andplace machine with a single application, the problem is divided into multiple subproblems because of its hard combinatorial nature. This dividing technique is called hierarchical decomposition. All the subproblems of the one PCB - one machine -context are described, classified and reviewed. The derived subproblems are then either solved with exact methods or new heuristic algorithms are developed and applied. The exact methods include, for example, a greedy algorithm and a solution based on dynamic programming. Some of the proposed heuristics contain constructive parts while others utilize local search or are based on frequency calculations. For the heuristics, it is made sure with comprehensive experimental tests that they are applicable and feasible. A number of quality functions will be proposed for evaluation and applied to the subproblems. In the experimental tests, artificially generated data from Markov-models and data from real-world PCB production are used. The thesis consists of an introduction and of five publications where the developed and used solution methods are described in their full detail. For all the problems stated in this thesis, the methods proposed are efficient enough to be used in the PCB assembly production in practice and are readily applicable in the PCB manufacturing industry.
Resumo:
Thèse numérisée par la Division de la gestion de documents et des archives de l'Université de Montréal
Resumo:
Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.
Resumo:
Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers. This paper presents a report on the development of a speaker independent, continuous transcription system for Malayalam. The system employs Hidden Markov Model (HMM) for acoustic modeling and Mel Frequency Cepstral Coefficient (MFCC) for feature extraction. It is trained with 21 male and female speakers in the age group ranging from 20 to 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%, when tested with a set of continuous speech data.