897 resultados para DNA Sequence, Hidden Markov Model, Bayesian Model, Sensitive Analysis, Markov Chain Monte Carlo
Resumo:
The folding mechanism of a 125-bead heteropolymer model for proteins is investigated with Monte Carlo simulations on a cubic lattice. Sequences that do and do not fold in a reasonable time are compared. The overall folding behavior is found to be more complex than that of models for smaller proteins. Folding begins with a rapid collapse followed by a slow search through the semi-compact globule for a sequence-dependent stable core with about 30 out of 176 native contacts which serves as the transition state for folding to a near-native structure. Efficient search for the core is dependent on structural features of the native state. Sequences that fold have large amounts of stable, cooperative structure that is accessible through short-range initiation sites, such as those in anti-parallel sheets connected by turns. Before folding is completed, the system can encounter a second bottleneck, involving the condensation and rearrangement of surface residues. Overly stable local structure of the surface residues slows this stage of the folding process. The relation of the results from the 125-mer model studies to the folding of real proteins is discussed.
Ultra-fast excited state dynamics in green fluorescent protein: multiple states and proton transfer.
Resumo:
The green fluorescent protein (GFP) of the jellyfish Aequorea Victoria has attracted widespread interest since the discovery that its chromophore is generated by the autocatalytic, posttranslational cyclization and oxidation of a hexapeptide unit. This permits fusion of the DNA sequence of GFP with that of any protein whose expression or transport can then be readily monitored by sensitive fluorescence methods without the need to add exogenous fluorescent dyes. The excited state dynamics of GFP were studied following photo-excitation of each of its two strong absorption bands in the visible using fluorescence upconversion spectroscopy (about 100 fs time resolution). It is shown that excitation of the higher energy feature leads very rapidly to a form of the lower energy species, and that the excited state interconversion rate can be markedly slowed by replacing exchangeable protons with deuterons. This observation and others lead to a model in which the two visible absorption bands correspond to GFP in two ground-state conformations. These conformations can be slowly interconverted in the ground state, but the process is much faster in the excited state. The observed isotope effect suggests that the initial excited state process involves a proton transfer reaction that is followed by additional structural changes. These observations may help to rationalize and motivate mutations that alter the absorption properties and improve the photo stability of GFP.
Resumo:
An experimental strategy to facilitate correction of single-base mutations of episomal targets in mammalian cells has been developed. The method utilizes a chimeric oligonucleotide composed of a contiguous stretch of RNA and DNA residues in a duplex conformation with double hairpin caps on the ends. The RNA/DNA sequence is designed to align with the sequence of the mutant locus and to contain the desired nucleotide change. Activity of the chimeric molecule in targeted correction was tested in a model system in which the aim was to correct a point mutation in the gene encoding the human liver/bone/kidney alkaline phosphatase. When the chimeric molecule was introduced into cells containing the mutant gene on an extrachromosomal plasmid, correction of the point mutation was accomplished with a frequency approaching 30%. These results extend the usefulness of the oligonucleotide-based gene targeting approaches by increasing specific targeting frequency. This strategy should enable the design of antiviral agents.
Resumo:
Parallel recordings of spike trains of several single cortical neurons in behaving monkeys were analyzed as a hidden Markov process. The parallel spike trains were considered as a multivariate Poisson process whose vector firing rates change with time. As a consequence of this approach, the complete recording can be segmented into a sequence of a few statistically discriminated hidden states, whose dynamics are modeled as a first-order Markov chain. The biological validity and benefits of this approach were examined in several independent ways: (i) the statistical consistency of the segmentation and its correspondence to the behavior of the animals; (ii) direct measurement of the collective flips of activity, obtained by the model; and (iii) the relation between the segmentation and the pair-wise short-term cross-correlations between the recorded spike trains. Comparison with surrogate data was also carried out for each of the above examinations to assure their significance. Our results indicated the existence of well-separated states of activity, within which the firing rates were approximately stationary. With our present data we could reliably discriminate six to eight such states. The transitions between states were fast and were associated with concomitant changes of firing rates of several neurons. Different behavioral modes and stimuli were consistently reflected by different states of neural activity. Moreover, the pair-wise correlations between neurons varied considerably between the different states, supporting the hypothesis that these distinct states were brought about by the cooperative action of many neurons.
Resumo:
Este trabalho apresenta um sistema neural modular, que processa separadamente informações de contexto espacial e temporal, para a tarefa de reprodução de sequências temporais. Para o desenvolvimento do sistema neural foram considerados redes neurais recorrentes, modelos estocásticos, sistemas neurais modulares e processamento de informações de contexto. Em seguida, foram estudados três modelos com abordagens distintas para aprendizagem de seqüências temporais: uma rede neural parcialmente recorrente, um exemplo de sistema neural modular e um modelo estocástico utilizando a teoria de modelos markovianos escondidos. Com base nos estudos e modelos apresentados, esta pesquisa propõe um sistema formado por dois módulos sucessivos distintos. Uma rede de propagação direta (módulo estimador de contexto espacial) realiza o processamento de contexto espacial identificando a seqüência a ser reproduzida e fornecendo um protótipo do contexto para o segundo módulo. Este é formado por uma rede parcialmente recorrente (módulo de reprodução de sequências temporais) para aprender as informações de contexto temporal e reproduzir em suas saídas a seqüência identificada pelo módulo anterior. Para a finalidade mencionada, este mestrado utiliza a distribuição de Gibbs na saída do módulo para contexto espacial de forma que este forneça probabilidades de contexto espacial, indicando o grau de certeza do módulo e possibilitando a utilização de procedimentos especiais para os casos de dúvida. O sistema neural foi testado em conjuntos contendo trajetórias abertas, fechadas, e com diferentes situações de ambigüidade e complexidade. Duas situações distintas foram avaliadas: (a) capacidade do sistema em reproduzir trajetórias a partir de pontos iniciais treinados; e (b) capacidade de generalização do sistema reproduzindo trajetórias considerando pontos iniciais ou finais em situações não treinadas. A situação (b) é um problema de difícil ) solução em redes neurais devido à falta de contexto temporal, essencial na reprodução de seqüências. Foram realizados experimentos comparando o desempenho do sistema modular proposto com o de uma rede parcialmente recorrente operando sozinha e um sistema modular neural (TOTEM). Os resultados sugerem que o sistema proposto apresentou uma capacidade de generalização significamente melhor, sem que houvesse uma deterioração na capacidade de reproduzir seqüências treinadas. Esses resultados foram obtidos em sistema mais simples que o TOTEM.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
The chromodomain is 40-50 amino acids in length and is conserved in a wide range of chromatic and regulatory proteins involved in chromatin remodeling. Chromodomain-containing proteins can be classified into families based on their broader characteristics, in particular the presence of other types of domains, and which correlate with different subclasses of the chromodomains themselves. Hidden Markov model (HMM)-generated profiles of different subclasses of chromodomains were used here to identify sequences encoding chromodomain-containing proteins in the mouse transcriptome and genome. A total of 36 different loci encoding proteins containing chromodomains, including 17 novel loci, were identified. Six of these loci (including three apparent pseudogenes, a novel HP1 ortholog, and two novel Msl-3 transcription factor-like proteins) are not present in the human genome, whereas the human genome contains four loci (two CDY orthologs and two apparent CDY pseuclogenes) that are not present in mouse. A number of these loci exhibit alternative splicing to produce different isoforms, including 43 novel variants, some of which lack the chromodomain. The likely functions of these proteins are discussed in relation to the known functions of other chromodomain-containing proteins within the same family.
Resumo:
In this paper a Markov chain based analytical model is proposed to evaluate the slotted CSMA/CA algorithm specified in the MAC layer of IEEE 802.15.4 standard. The analytical model consists of two two-dimensional Markov chains, used to model the state transition of an 802.15.4 device, during the periods of a transmission and between two consecutive frame transmissions, respectively. By introducing the two Markov chains a small number of Markov states are required and the scalability of the analytical model is improved. The analytical model is used to investigate the impact of the CSMA/CA parameters, the number of contending devices, and the data frame size on the network performance in terms of throughput and energy efficiency. It is shown by simulations that the proposed analytical model can accurately predict the performance of slotted CSMA/CA algorithm for uplink, downlink and bi-direction traffic, with both acknowledgement and non-acknowledgement modes.
Resumo:
IEEE 802.11 standard has achieved huge success in the past decade and is still under development to provide higher physical data rate and better quality of service (QoS). An important problem for the development and optimization of IEEE 802.11 networks is the modeling of the MAC layer channel access protocol. Although there are already many theoretic analysis for the 802.11 MAC protocol in the literature, most of the models focus on the saturated traffic and assume infinite buffer at the MAC layer. In this paper we develop a unified analytical model for IEEE 802.11 MAC protocol in ad hoc networks. The impacts of channel access parameters, traffic rate and buffer size at the MAC layer are modeled with the assistance of a generalized Markov chain and an M/G/1/K queue model. The performance of throughput, packet delivery delay and dropping probability can be achieved. Extensive simulations show the analytical model is highly accurate. From the analytical model it is shown that for practical buffer configuration (e.g. buffer size larger than one), we can maximize the total throughput and reduce the packet blocking probability (due to limited buffer size) and the average queuing delay to zero by effectively controlling the offered load. The average MAC layer service delay as well as its standard deviation, is also much lower than that in saturated conditions and has an upper bound. It is also observed that the optimal load is very close to the maximum achievable throughput regardless of the number of stations or buffer size. Moreover, the model is scalable for performance analysis of 802.11e in unsaturated conditions and 802.11 ad hoc networks with heterogenous traffic flows. © 2012 KSI.
Resumo:
Protein-DNA interactions are an essential feature in the genetic activities of life, and the ability to predict and manipulate such interactions has applications in a wide range of fields. This Thesis presents the methods of modelling the properties of protein-DNA interactions. In particular, it investigates the methods of visualising and predicting the specificity of DNA-binding Cys2His2 zinc finger interaction. The Cys2His2 zinc finger proteins interact via their individual fingers to base pair subsites on the target DNA. Four key residue positions on the a- helix of the zinc fingers make non-covalent interactions with the DNA with sequence specificity. Mutating these key residues generates combinatorial possibilities that could potentially bind to any DNA segment of interest. Many attempts have been made to predict the binding interaction using structural and chemical information, but with only limited success. The most important contribution of the thesis is that the developed model allows for the binding properties of a given protein-DNA binding to be visualised in relation to other protein-DNA combinations without having to explicitly physically model the specific protein molecule and specific DNA sequence. To prove this, various databases were generated, including a synthetic database which includes all possible combinations of the DNA-binding Cys2His2 zinc finger interactions. NeuroScale, a topographic visualisation technique, is exploited to represent the geometric structures of the protein-DNA interactions by measuring dissimilarity between the data points. In order to verify the effect of visualisation on understanding the binding properties of the DNA-binding Cys2His2 zinc finger interaction, various prediction models are constructed by using both the high dimensional original data and the represented data in low dimensional feature space. Finally, novel data sets are studied through the selected visualisation models based on the experimental DNA-zinc finger protein database. The result of the NeuroScale projection shows that different dissimilarity representations give distinctive structural groupings, but clustering in biologically-interesting ways. This method can be used to forecast the physiochemical properties of the novel proteins which may be beneficial for therapeutic purposes involving genome targeting in general.
Detecting Precipitation Climate Changes: An Approach Based on a Stochastic Daily Precipitation Model
Resumo:
2002 Mathematics Subject Classification: 62M10.
Resumo:
In this work, we present our understanding about the article of Aksoy [1], which uses Markov chains to model the flow of intermittent rivers. Then, we executed an application of his model in order to generate data for intermittent streamflows, based on a data set of Brazilian streams. After that, we build a hidden Markov model as a proposed new approach to the problem of simulation of such flows. We used the Gamma distribution to simulate the increases and decreases in river flows, along with a two-state Markov chain. The motivation for us to use a hidden Markov model comes from the possibility of obtaining the same information that the Aksoy’s model provides, but using a single tool capable of treating the problem as a whole, and not through multiple independent processes
Resumo:
This research explores Bayesian updating as a tool for estimating parameters probabilistically by dynamic analysis of data sequences. Two distinct Bayesian updating methodologies are assessed. The first approach focuses on Bayesian updating of failure rates for primary events in fault trees. A Poisson Exponentially Moving Average (PEWMA) model is implemnented to carry out Bayesian updating of failure rates for individual primary events in the fault tree. To provide a basis for testing of the PEWMA model, a fault tree is developed based on the Texas City Refinery incident which occurred in 2005. A qualitative fault tree analysis is then carried out to obtain a logical expression for the top event. A dynamic Fault Tree analysis is carried out by evaluating the top event probability at each Bayesian updating step by Monte Carlo sampling from posterior failure rate distributions. It is demonstrated that PEWMA modeling is advantageous over conventional conjugate Poisson-Gamma updating techniques when failure data is collected over long time spans. The second approach focuses on Bayesian updating of parameters in non-linear forward models. Specifically, the technique is applied to the hydrocarbon material balance equation. In order to test the accuracy of the implemented Bayesian updating models, a synthetic data set is developed using the Eclipse reservoir simulator. Both structured grid and MCMC sampling based solution techniques are implemented and are shown to model the synthetic data set with good accuracy. Furthermore, a graphical analysis shows that the implemented MCMC model displays good convergence properties. A case study demonstrates that Likelihood variance affects the rate at which the posterior assimilates information from the measured data sequence. Error in the measured data significantly affects the accuracy of the posterior parameter distributions. Increasing the likelihood variance mitigates random measurement errors, but casuses the overall variance of the posterior to increase. Bayesian updating is shown to be advantageous over deterministic regression techniques as it allows for incorporation of prior belief and full modeling uncertainty over the parameter ranges. As such, the Bayesian approach to estimation of parameters in the material balance equation shows utility for incorporation into reservoir engineering workflows.
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Resumo:
Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.