876 resultados para Modeling Rapport Using Hidden Markov Models
Resumo:
TIGRFAMs is a collection of protein families featuring curated multiple sequence alignments, hidden Markov models and associated information designed to support the automated functional identification of proteins by sequence homology. We introduce the term ‘equivalog’ to describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families where possible, and otherwise into protein families with other hierarchically defined homology types. TIGRFAMs currently contains over 800 protein families, available for searching or downloading at www.tigr.org/TIGRFAMs. Classification by equivalog family, where achievable, complements classification by orthology, superfamily, domain or motif. It provides the information best suited for automatic assignment of specific functions to proteins from large-scale genome sequencing projects.
Resumo:
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
Resumo:
Este trabalho apresenta um sistema neural modular, que processa separadamente informações de contexto espacial e temporal, para a tarefa de reprodução de sequências temporais. Para o desenvolvimento do sistema neural foram considerados redes neurais recorrentes, modelos estocásticos, sistemas neurais modulares e processamento de informações de contexto. Em seguida, foram estudados três modelos com abordagens distintas para aprendizagem de seqüências temporais: uma rede neural parcialmente recorrente, um exemplo de sistema neural modular e um modelo estocástico utilizando a teoria de modelos markovianos escondidos. Com base nos estudos e modelos apresentados, esta pesquisa propõe um sistema formado por dois módulos sucessivos distintos. Uma rede de propagação direta (módulo estimador de contexto espacial) realiza o processamento de contexto espacial identificando a seqüência a ser reproduzida e fornecendo um protótipo do contexto para o segundo módulo. Este é formado por uma rede parcialmente recorrente (módulo de reprodução de sequências temporais) para aprender as informações de contexto temporal e reproduzir em suas saídas a seqüência identificada pelo módulo anterior. Para a finalidade mencionada, este mestrado utiliza a distribuição de Gibbs na saída do módulo para contexto espacial de forma que este forneça probabilidades de contexto espacial, indicando o grau de certeza do módulo e possibilitando a utilização de procedimentos especiais para os casos de dúvida. O sistema neural foi testado em conjuntos contendo trajetórias abertas, fechadas, e com diferentes situações de ambigüidade e complexidade. Duas situações distintas foram avaliadas: (a) capacidade do sistema em reproduzir trajetórias a partir de pontos iniciais treinados; e (b) capacidade de generalização do sistema reproduzindo trajetórias considerando pontos iniciais ou finais em situações não treinadas. A situação (b) é um problema de difícil ) solução em redes neurais devido à falta de contexto temporal, essencial na reprodução de seqüências. Foram realizados experimentos comparando o desempenho do sistema modular proposto com o de uma rede parcialmente recorrente operando sozinha e um sistema modular neural (TOTEM). Os resultados sugerem que o sistema proposto apresentou uma capacidade de generalização significamente melhor, sem que houvesse uma deterioração na capacidade de reproduzir seqüências treinadas. Esses resultados foram obtidos em sistema mais simples que o TOTEM.
Resumo:
Questions of "viability" evaluation of innovation projects are considered in this article. As a method of evaluation Hidden Markov Models are used. Problem of determining model parameters, which reproduce test data with highest accuracy are solving. For training the model statistical data on the implementation of innovative projects are used. Baum-Welch algorithm is used as a training algorithm.
Resumo:
The chromodomain is 40-50 amino acids in length and is conserved in a wide range of chromatic and regulatory proteins involved in chromatin remodeling. Chromodomain-containing proteins can be classified into families based on their broader characteristics, in particular the presence of other types of domains, and which correlate with different subclasses of the chromodomains themselves. Hidden Markov model (HMM)-generated profiles of different subclasses of chromodomains were used here to identify sequences encoding chromodomain-containing proteins in the mouse transcriptome and genome. A total of 36 different loci encoding proteins containing chromodomains, including 17 novel loci, were identified. Six of these loci (including three apparent pseudogenes, a novel HP1 ortholog, and two novel Msl-3 transcription factor-like proteins) are not present in the human genome, whereas the human genome contains four loci (two CDY orthologs and two apparent CDY pseuclogenes) that are not present in mouse. A number of these loci exhibit alternative splicing to produce different isoforms, including 43 novel variants, some of which lack the chromodomain. The likely functions of these proteins are discussed in relation to the known functions of other chromodomain-containing proteins within the same family.
Resumo:
MULTIPRED is a web-based computational system for the prediction of peptide binding to multiple molecules ( proteins) belonging to human leukocyte antigens (HLA) class I A2, A3 and class II DR supertypes. It uses hidden Markov models and artificial neural network methods as predictive engines. A novel data representation method enables MULTIPRED to predict peptides that promiscuously bind multiple HLA alleles within one HLA supertype. Extensive testing was performed for validation of the prediction models. Testing results show that MULTIPRED is both sensitive and specific and it has good predictive ability ( area under the receiver operating characteristic curve A(ROC) > 0.80). MULTIPRED can be used for the mapping of promiscuous T-cell epitopes as well as the regions of high concentration of these targets termed T-cell epitope hotspots. MULTIPRED is available at http:// antigen.i2r.a-star.edu.sg/ multipred/.
Resumo:
This work introduces a Gaussian variational mean-field approximation for inference in dynamical systems which can be modeled by ordinary stochastic differential equations. This new approach allows one to express the variational free energy as a functional of the marginal moments of the approximating Gaussian process. A restriction of the moment equations to piecewise polynomial functions, over time, dramatically reduces the complexity of approximate inference for stochastic differential equation models and makes it comparable to that of discrete time hidden Markov models. The algorithm is demonstrated on state and parameter estimation for nonlinear problems with up to 1000 dimensional state vectors and compares the results empirically with various well-known inference methodologies.
Resumo:
Il riconoscimento delle gesture è un tema di ricerca che sta acquisendo sempre più popolarità, specialmente negli ultimi anni, grazie ai progressi tecnologici dei dispositivi embedded e dei sensori. Lo scopo di questa tesi è quello di utilizzare alcune tecniche di machine learning per realizzare un sistema in grado di riconoscere e classificare in tempo reale i gesti delle mani, a partire dai segnali mioelettrici (EMG) prodotti dai muscoli. Inoltre, per consentire il riconoscimento di movimenti spaziali complessi, verranno elaborati anche segnali di tipo inerziale, provenienti da una Inertial Measurement Unit (IMU) provvista di accelerometro, giroscopio e magnetometro. La prima parte della tesi, oltre ad offrire una panoramica sui dispositivi wearable e sui sensori, si occuperà di analizzare alcune tecniche per la classificazione di sequenze temporali, evidenziandone vantaggi e svantaggi. In particolare, verranno considerati approcci basati su Dynamic Time Warping (DTW), Hidden Markov Models (HMM), e reti neurali ricorrenti (RNN) di tipo Long Short-Term Memory (LSTM), che rappresentano una delle ultime evoluzioni nel campo del deep learning. La seconda parte, invece, riguarderà il progetto vero e proprio. Verrà impiegato il dispositivo wearable Myo di Thalmic Labs come caso di studio, e saranno applicate nel dettaglio le tecniche basate su DTW e HMM per progettare e realizzare un framework in grado di eseguire il riconoscimento real-time di gesture. Il capitolo finale mostrerà i risultati ottenuti (fornendo anche un confronto tra le tecniche analizzate), sia per la classificazione di gesture isolate che per il riconoscimento in tempo reale.
Resumo:
The study of acoustic communication in animals often requires not only the recognition of species specific acoustic signals but also the identification of individual subjects, all in a complex acoustic background. Moreover, when very long recordings are to be analyzed, automatic recognition and identification processes are invaluable tools to extract the relevant biological information. A pattern recognition methodology based on hidden Markov models is presented inspired by successful results obtained in the most widely known and complex acoustical communication signal: human speech. This methodology was applied here for the first time to the detection and recognition of fish acoustic signals, specifically in a stream of round-the-clock recordings of Lusitanian toadfish (Halobatrachus didactylus) in their natural estuarine habitat. The results show that this methodology is able not only to detect the mating sounds (boatwhistles) but also to identify individual male toadfish, reaching an identification rate of ca. 95%. Moreover this method also proved to be a powerful tool to assess signal durations in large data sets. However, the system failed in recognizing other sound types.
Resumo:
The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) and serial analysis of gene expression (SAGE) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
Resumo:
Thesis (Master's)--University of Washington, 2016-06
Resumo:
A potentially renewable and sustainable source of energy is the chemical energy associated with solvation of salts. Mixing of two aqueous streams with different saline concentrations is spontaneous and releases energy. The global theoretically obtainable power from salinity gradient energy due to World’s rivers discharge into the oceans has been estimated to be within the range of 1.4-2.6 TW. Reverse electrodialysis (RED) is one of the emerging, membrane-based, technologies for harvesting the salinity gradient energy. A common RED stack is composed by alternately-arranged cation- and anion-exchange membranes, stacked between two electrodes. The compartments between the membranes are alternately fed with concentrated (e.g., sea water) and dilute (e.g., river water) saline solutions. Migration of the respective counter-ions through the membranes leads to ionic current between the electrodes, where an appropriate redox pair converts the chemical salinity gradient energy into electrical energy. Given the importance of the need for new sources of energy for power generation, the present study aims at better understanding and solving current challenges, associated with the RED stack design, fluid dynamics, ionic mass transfer and long-term RED stack performance with natural saline solutions as feedwaters. Chronopotentiometry was used to determinate diffusion boundary layer (DBL) thickness from diffusion relaxation data and the flow entrance effects on mass transfer were found to avail a power generation increase in RED stacks. Increasing the linear flow velocity also leads to a decrease of DBL thickness but on the cost of a higher pressure drop. Pressure drop inside RED stacks was successfully simulated by the developed mathematical model, in which contribution of several pressure drops, that until now have not been considered, was included. The effect of each pressure drop on the RED stack performance was identified and rationalized and guidelines for planning and/or optimization of RED stacks were derived. The design of new profiled membranes, with a chevron corrugation structure, was proposed using computational fluid dynamics (CFD) modeling. The performance of the suggested corrugation geometry was compared with the already existing ones, as well as with the use of conductive and non-conductive spacers. According to the estimations, use of chevron structures grants the highest net power density values, at the best compromise between the mass transfer coefficient and the pressure drop values. Finally, long-term experiments with natural waters were performed, during which fouling was experienced. For the first time, 2D fluorescence spectroscopy was used to monitor RED stack performance, with a dedicated focus on following fouling on ion-exchange membrane surfaces. To extract relevant information from fluorescence spectra, parallel factor analysis (PARAFAC) was performed. Moreover, the information obtained was then used to predict net power density, stack electric resistance and pressure drop by multivariate statistical models based on projection to latent structures (PLS) modeling. The use in such models of 2D fluorescence data, containing hidden, but extractable by PARAFAC, information about fouling on membrane surfaces, considerably improved the models fitting to the experimental data.
Resumo:
This work proposes an original contribution to the understanding of shermen spatial behavior, based on the behavioral ecology and movement ecology paradigms. Through the analysis of Vessel Monitoring System (VMS) data, we characterized the spatial behavior of Peruvian anchovy shermen at di erent scales: (1) the behavioral modes within shing trips (i.e., searching, shing and cruising); (2) the behavioral patterns among shing trips; (3) the behavioral patterns by shing season conditioned by ecosystem scenarios; and (4) the computation of maps of anchovy presence proxy from the spatial patterns of behavioral mode positions. At the rst scale considered, we compared several Markovian (hidden Markov and semi-Markov models) and discriminative models (random forests, support vector machines and arti cial neural networks) for inferring the behavioral modes associated with VMS tracks. The models were trained under a supervised setting and validated using tracks for which behavioral modes were known (from on-board observers records). Hidden semi-Markov models performed better, and were retained for inferring the behavioral modes on the entire VMS dataset. At the second scale considered, each shing trip was characterized by several features, including the time spent within each behavioral mode. Using a clustering analysis, shing trip patterns were classi ed into groups associated to management zones, eet segments and skippers' personalities. At the third scale considered, we analyzed how ecological conditions shaped shermen behavior. By means of co-inertia analyses, we found signi cant associations between shermen, anchovy and environmental spatial dynamics, and shermen behavioral responses were characterized according to contrasted environmental scenarios. At the fourth scale considered, we investigated whether the spatial behavior of shermen re ected to some extent the spatial distribution of anchovy. Finally, this work provides a wider view of shermen behavior: shermen are not only economic agents, but they are also foragers, constrained by ecosystem variability. To conclude, we discuss how these ndings may be of importance for sheries management, collective behavior analyses and end-to-end models.
Resumo:
This paper develops a general stochastic framework and an equilibrium asset pricing model that make clear how attitudes towards intertemporal substitution and risk matter for option pricing. In particular, we show under which statistical conditions option pricing formulas are not preference-free, in other words, when preferences are not hidden in the stock and bond prices as they are in the standard Black and Scholes (BS) or Hull and White (HW) pricing formulas. The dependence of option prices on preference parameters comes from several instantaneous causality effects such as the so-called leverage effect. We also emphasize that the most standard asset pricing models (CAPM for the stock and BS or HW preference-free option pricing) are valid under the same stochastic setting (typically the absence of leverage effect), regardless of preference parameter values. Even though we propose a general non-preference-free option pricing formula, we always keep in mind that the BS formula is dominant both as a theoretical reference model and as a tool for practitioners. Another contribution of the paper is to characterize why the BS formula is such a benchmark. We show that, as soon as we are ready to accept a basic property of option prices, namely their homogeneity of degree one with respect to the pair formed by the underlying stock price and the strike price, the necessary statistical hypotheses for homogeneity provide BS-shaped option prices in equilibrium. This BS-shaped option-pricing formula allows us to derive interesting characterizations of the volatility smile, that is, the pattern of BS implicit volatilities as a function of the option moneyness. First, the asymmetry of the smile is shown to be equivalent to a particular form of asymmetry of the equivalent martingale measure. Second, this asymmetry appears precisely when there is either a premium on an instantaneous interest rate risk or on a generalized leverage effect or both, in other words, whenever the option pricing formula is not preference-free. Therefore, the main conclusion of our analysis for practitioners should be that an asymmetric smile is indicative of the relevance of preference parameters to price options.
Resumo:
This paper proposes a new reconstruction method for diffuse optical tomography using reduced-order models of light transport in tissue. The models, which directly map optical tissue parameters to optical flux measurements at the detector locations, are derived based on data generated by numerical simulation of a reference model. The reconstruction algorithm based on the reduced-order models is a few orders of magnitude faster than the one based on a finite element approximation on a fine mesh incorporating a priori anatomical information acquired by magnetic resonance imaging. We demonstrate the accuracy and speed of the approach using a phantom experiment and through numerical simulation of brain activation in a rat's head. The applicability of the approach for real-time monitoring of brain hemodynamics is demonstrated through a hypercapnic experiment. We show that our results agree with the expected physiological changes and with results of a similar experimental study. However, by using our approach, a three-dimensional tomographic reconstruction can be performed in ∼3 s per time point instead of the 1 to 2 h it takes when using the conventional finite element modeling approach