873 results for DNA Sequence, Hidden Markov Model, Bayesian Model, Sensitive Analysis, Markov Chain Monte Carlo
Abstract:
The discrete-time Markov chain is commonly used to describe changes in health states for chronic diseases in a longitudinal study. Statistical inference for comparing treatment effects or finding determinants of disease progression usually requires estimation of transition probabilities. When the outcome data have missing observations, or when the variable of interest (called a latent variable) cannot be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing-data mechanisms were considered in empirical studies using methods that include the EM algorithm, Monte Carlo EM, and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated with the Schizophrenia example. The relevance to public health, the strengths and limitations, and possible future research were also discussed.
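The forward-backward procedure named in this abstract can be sketched as follows. This is a minimal illustration for a discrete hidden Markov model with known parameters, not the dissertation's estimation code; the function name `forward_backward` and the two-state example values are assumptions made for the sketch.

```python
import math

def forward_backward(obs, init, trans, emit):
    """Scaled forward-backward pass for a discrete HMM (illustrative sketch).

    obs   : list of observed symbol indices
    init  : init[i]     = P(state_0 = i)
    trans : trans[i][j] = P(state_{t+1} = j | state_t = i)
    emit  : emit[i][k]  = P(observation = k | state = i)
    Returns (posteriors, log_likelihood), where posteriors[t][i] is
    P(state_t = i | all observations).
    """
    n, T = len(init), len(obs)
    alpha, scales = [], []
    # Forward pass: normalise at every step to avoid numerical underflow.
    for t in range(T):
        if t == 0:
            raw = [init[i] * emit[i][obs[0]] for i in range(n)]
        else:
            raw = [sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                   * emit[j][obs[t]] for j in range(n)]
        c = sum(raw)
        scales.append(c)
        alpha.append([r / c for r in raw])
    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) / scales[t + 1] for i in range(n)]
    posteriors = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    log_likelihood = sum(math.log(c) for c in scales)
    return posteriors, log_likelihood

# Hypothetical example: two latent health states, two surrogate readings.
init = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
posteriors, loglik = forward_backward([0, 0, 1, 1], init, trans, emit)
```

The posteriors returned here are the quantities an EM (Baum-Welch) update would use to re-estimate the transition probabilities; that step is omitted from the sketch.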
Abstract:
As the requirements for health care hospitalization have become more demanding, so has the discharge planning process become a more important part of the health services system. A thorough understanding of hospital discharge planning can, then, contribute to our understanding of the health services system. This study involved the development of a process model of discharge planning from hospitals. Model building involved the identification of factors used by discharge planners to develop aftercare plans, and the specification of the roles of these factors in the development of the discharge plan. The factors in the model were concatenated in 16 discrete decision sequences, each of which produced an aftercare plan. The sample for this study comprised 407 inpatients admitted to the M. D. Anderson Hospital and Tumor Institution at Houston, Texas, who were discharged to any site within Texas during a 15-day period. Allogeneic bone marrow donors were excluded from the sample. The factors considered in the development of discharge plans were recorded by discharge planners and were used to develop the model. Data analysis consisted of sorting the discharge plans using the plan development factors until for some combination and sequence of factors all patients were discharged to a single site. The arrangement of factors that led to that aftercare plan became a decision sequence in the model. The model constructs the same discharge plans as those developed by hospital staff for every patient in the study. Tests of the validity of the model should be extended to other patients at the MDAH, to other cancer hospitals, and to other inpatient services. Revisions of the model based on these tests should be of value in the management of discharge planning services and in the design and development of comprehensive community health services.
Abstract:
In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this proposed study, we extended the study of Chan et al. (2008) on recovering the latent slope in a simple regression model to a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of the latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces an estimator that is efficient in the multiple regression model, especially when the measurement error variance of the surrogate variable is large.
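To make the problem this abstract addresses concrete, the sketch below simulates a simple regression in which the true covariate is replaced by a noisy surrogate. It does not implement the Bayesian Monte Carlo estimator proposed in the study. It only shows the well-known attenuation of the naive slope and a textbook correction that assumes the error variance is known. All names and parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=0):
    """Return (naive_slope, corrected_slope) for one simulated data set."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]         # latent covariate
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]          # observed surrogate
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]   # outcome
    naive = ols_slope(w, y)
    # Reliability ratio var(x) / (var(x) + var(u)); assumed known here.
    reliability = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)
    return naive, naive / reliability

naive, corrected = simulate()
```

With equal latent and error variances the reliability ratio is 0.5, so the naive slope is expected to be about half of the true slope, and dividing by the ratio recovers it approximately.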
Abstract:
Nuclear matrix binding assays (NMBAs) define certain DNA sequences as matrix attachment regions (MARs), which often have cis-acting epigenetic regulatory functions. We used NMBAs to analyze the functionally important 15q11-q13 imprinting center (IC). We find that the IC is composed of an unusually high density of MARs, located in close proximity to the germ line elements that are proposed to direct imprint switching in this region. Moreover, we find that the organization of MARs is the same at the homologous mouse locus, despite extensive divergence of DNA sequence. MARs of this size are not usually associated with genes but rather with heterochromatin-forming areas of the genome. In contrast, the 15q11-q13 region contains multiple transcribed genes and is unusual for being subject to genomic imprinting, causing the maternal chromosome to be more transcriptionally silent, methylated, and late replicating than the paternal chromosome. We suggest that the extensive MAR sequences at the IC are organized as heterochromatin during oogenesis, an organization disrupted during spermatogenesis. Consistent with this model, multicolor fluorescence in situ hybridization to halo nuclei demonstrates a strong matrix association of the maternal IC, whereas the paternal IC is more decondensed, extending into the nuclear halo. This model also provides a mechanism for spreading of the imprinting signal, because heterochromatin at the IC on the maternal chromosome may exert a suppressive position effect in cis. We propose that the germ line elements at the 15q11-q13 IC mediate their effects through the candidate heterochromatin-forming DNA identified in this study.
Abstract:
The transcription factors nuclear factor of activated T cells (NFAT) and activator protein 1 (AP-1) coordinately regulate cytokine gene expression in activated T-cells by binding to closely juxtaposed sites in cytokine promoters. The structural basis for cooperative binding of NFAT and AP-1 to these sites, and indeed for the cooperative binding of transcription factors to composite regulatory elements in general, is not well understood. Mutagenesis studies have identified a segment of AP-1, which lies at the junction of its DNA-binding and dimerization domains (basic region and leucine zipper, respectively), as being essential for protein–protein interactions with NFAT in the ternary NFAT/AP-1/DNA complex. In a model of the ternary complex, the segment of NFAT nearest AP-1 is the Rel insert region (RIR), a feature that is notable for its hypervariability in size and in sequence amongst members of the Rel transcription factor family. Here we have used mutational analysis to study the role of the NFAT RIR in binding to DNA and AP-1. Parallel yeast one-hybrid screening assays in combination with alanine-scanning mutagenesis led to the identification of four amino acid residues in the RIR of NFAT2 (also known as NFATC1 or NFATc) that are essential for cooperativity with AP-1 (Ile-544, Glu-545, Thr-551, and Ile-553), and three residues that are involved in interactions with DNA (Lys-538, Arg-540, and Asn-541). These results were confirmed and extended through in vitro binding assays. We thus conclude that the NFAT RIR plays an essential dual role in DNA recognition and cooperative binding to AP-1 family transcription factors.
Abstract:
FokI is a member of an unusual class of restriction enzymes that recognize a specific DNA sequence and cleave nonspecifically a short distance away from that sequence. FokI consists of an N-terminal DNA recognition domain and a C-terminal cleavage domain. The bipartite nature of FokI has led to the development of artificial enzymes with novel specificities. We have solved the structure of FokI to 2.3 Å resolution. The structure reveals a dimer, in which the dimerization interface is mediated by the cleavage domain. Each monomer has an overall conformation similar to that found in the FokI–DNA complex, with the cleavage domain packing alongside the DNA recognition domain. In corroboration with the cleavage data presented in the accompanying paper in this issue of Proceedings, we propose a model for FokI DNA cleavage that requires the dimerization of FokI on DNA to cleave both DNA strands.
Abstract:
We introduce a quantitative framework for assessing the generation of crossovers in DNA shuffling experiments. The approach uses free energy calculations and complete sequence information to model the annealing process. Statistics obtained for the annealing events then are combined with a reassembly algorithm to infer crossover allocation in the reassembled sequences. The fraction of reassembled sequences containing zero, one, two, or more crossovers and the probability that a given nucleotide position in a reassembled sequence is the site of a crossover event are estimated. Comparisons of the predictions against experimental data for five example systems demonstrate good agreement despite the fact that no adjustable parameters are used. An in silico case study of a set of 12 subtilases examines the effect of fragmentation length, annealing temperature, sequence identity and number of shuffled sequences on the number, type, and distribution of crossovers. A computational verification of crossover aggregation in regions of near-perfect sequence identity and the presence of synergistic reassembly in family DNA shuffling is obtained.
Abstract:
Yeast co-expressing rat APOBEC-1 and a fragment of human apolipoprotein B (apoB) mRNA assembled functional editosomes and deaminated C6666 to U in a mooring sequence-dependent fashion. The occurrence of APOBEC-1-complementing proteins suggested a naturally occurring mRNA editing mechanism in yeast. Previously, a hidden Markov model identified seven yeast genes encoding proteins possessing putative zinc-dependent deaminase motifs. Here, only CDD1, a cytidine deaminase, is shown to have the capacity to carry out C→U editing on a reporter mRNA. This is only the second report of a cytidine deaminase that can use mRNA as a substrate. CDD1-dependent editing was growth phase regulated and demonstrated mooring sequence-dependent editing activity. Candidate yeast mRNA substrates were identified based on their homology with the mooring sequence-containing tripartite motif at the editing site of apoB mRNA and their ability to be edited by ectopically expressed APOBEC-1. Naturally occurring yeast mRNAs edited to a significant extent by CDD1 were, however, not detected. We propose that CDD1 be designated an orphan C→U editase until its native RNA substrate, if any, can be identified and that it be added to the CDAR (cytidine deaminase acting on RNA) family of editing enzymes.
Abstract:
The folding mechanism of a 125-bead heteropolymer model for proteins is investigated with Monte Carlo simulations on a cubic lattice. Sequences that do and do not fold in a reasonable time are compared. The overall folding behavior is found to be more complex than that of models for smaller proteins. Folding begins with a rapid collapse followed by a slow search through the semi-compact globule for a sequence-dependent stable core with about 30 out of 176 native contacts which serves as the transition state for folding to a near-native structure. Efficient search for the core is dependent on structural features of the native state. Sequences that fold have large amounts of stable, cooperative structure that is accessible through short-range initiation sites, such as those in anti-parallel sheets connected by turns. Before folding is completed, the system can encounter a second bottleneck, involving the condensation and rearrangement of surface residues. Overly stable local structure of the surface residues slows this stage of the folding process. The relation of the results from the 125-mer model studies to the folding of real proteins is discussed.
Ultra-fast excited state dynamics in green fluorescent protein: multiple states and proton transfer.
Abstract:
The green fluorescent protein (GFP) of the jellyfish Aequorea victoria has attracted widespread interest since the discovery that its chromophore is generated by the autocatalytic, posttranslational cyclization and oxidation of a hexapeptide unit. This permits fusion of the DNA sequence of GFP with that of any protein whose expression or transport can then be readily monitored by sensitive fluorescence methods without the need to add exogenous fluorescent dyes. The excited state dynamics of GFP were studied following photo-excitation of each of its two strong absorption bands in the visible using fluorescence upconversion spectroscopy (about 100 fs time resolution). It is shown that excitation of the higher energy feature leads very rapidly to a form of the lower energy species, and that the excited state interconversion rate can be markedly slowed by replacing exchangeable protons with deuterons. This observation and others lead to a model in which the two visible absorption bands correspond to GFP in two ground-state conformations. These conformations can be slowly interconverted in the ground state, but the process is much faster in the excited state. The observed isotope effect suggests that the initial excited state process involves a proton transfer reaction that is followed by additional structural changes. These observations may help to rationalize and motivate mutations that alter the absorption properties and improve the photostability of GFP.
Abstract:
An experimental strategy to facilitate correction of single-base mutations of episomal targets in mammalian cells has been developed. The method utilizes a chimeric oligonucleotide composed of a contiguous stretch of RNA and DNA residues in a duplex conformation with double hairpin caps on the ends. The RNA/DNA sequence is designed to align with the sequence of the mutant locus and to contain the desired nucleotide change. Activity of the chimeric molecule in targeted correction was tested in a model system in which the aim was to correct a point mutation in the gene encoding the human liver/bone/kidney alkaline phosphatase. When the chimeric molecule was introduced into cells containing the mutant gene on an extrachromosomal plasmid, correction of the point mutation was accomplished with a frequency approaching 30%. These results extend the usefulness of the oligonucleotide-based gene targeting approaches by increasing specific targeting frequency. This strategy should enable the design of antiviral agents.
Abstract:
Parallel recordings of spike trains of several single cortical neurons in behaving monkeys were analyzed as a hidden Markov process. The parallel spike trains were considered as a multivariate Poisson process whose vector firing rates change with time. As a consequence of this approach, the complete recording can be segmented into a sequence of a few statistically discriminated hidden states, whose dynamics are modeled as a first-order Markov chain. The biological validity and benefits of this approach were examined in several independent ways: (i) the statistical consistency of the segmentation and its correspondence to the behavior of the animals; (ii) direct measurement of the collective flips of activity, obtained by the model; and (iii) the relation between the segmentation and the pair-wise short-term cross-correlations between the recorded spike trains. Comparison with surrogate data was also carried out for each of the above examinations to assure their significance. Our results indicated the existence of well-separated states of activity, within which the firing rates were approximately stationary. With our present data we could reliably discriminate six to eight such states. The transitions between states were fast and were associated with concomitant changes of firing rates of several neurons. Different behavioral modes and stimuli were consistently reflected by different states of neural activity. Moreover, the pair-wise correlations between neurons varied considerably between the different states, supporting the hypothesis that these distinct states were brought about by the cooperative action of many neurons.
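The segmentation idea in this abstract, hidden states with state-specific Poisson firing rates and first-order Markov transitions, can be sketched with Viterbi decoding. This is a minimal illustration that takes the rates and transition probabilities as given. The paper's own fitting procedure is not described here, and every name and number below is a hypothetical choice.

```python
import math

def log_poisson(k, rate):
    """Log probability of observing k events under a Poisson(rate) law."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def viterbi_poisson(counts, log_init, log_trans, rates):
    """Most likely hidden-state path for multivariate Poisson spike counts.

    counts[t][n] : spike count of neuron n in time bin t
    rates[s][n]  : expected count of neuron n per bin in state s
    log_init[s], log_trans[i][j] : log initial and transition probabilities
    """
    n_states = len(rates)

    def log_emit(s, t):
        # Neurons are treated as independent given the hidden state.
        return sum(log_poisson(k, r) for k, r in zip(counts[t], rates[s]))

    score = [log_init[s] + log_emit(s, 0) for s in range(n_states)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][j] + log_emit(j, t))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical example: two neurons, two states with opposite rate patterns.
rates = [[1.0, 8.0], [8.0, 1.0]]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
counts = [[0, 9], [1, 7], [9, 0], [8, 2]]
path = viterbi_poisson(counts, log_init, log_trans, rates)
```

In this toy input the first two bins match the rate vector of state 0 and the last two match state 1, so the decoded path switches state once, between the second and third bins.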
Abstract:
This work presents a modular neural system that processes spatial and temporal context information separately for the task of reproducing temporal sequences. In developing the neural system, recurrent neural networks, stochastic models, modular neural systems, and context-information processing were considered. Three models with distinct approaches to learning temporal sequences were then studied: a partially recurrent neural network, an example of a modular neural system, and a stochastic model based on the theory of hidden Markov models. Based on these studies and models, this research proposes a system formed by two distinct successive modules. A feedforward network (the spatial-context estimator module) processes the spatial context, identifying the sequence to be reproduced and supplying a context prototype to the second module. The second module is a partially recurrent network (the temporal-sequence reproduction module) that learns the temporal context information and reproduces at its outputs the sequence identified by the preceding module. To this end, this master's work applies the Gibbs distribution at the output of the spatial-context module so that it provides spatial-context probabilities, indicating the module's degree of certainty and enabling special procedures for cases of doubt. The neural system was tested on sets containing open and closed trajectories with different degrees of ambiguity and complexity. Two distinct situations were evaluated: (a) the system's ability to reproduce trajectories from trained starting points; and (b) the system's ability to generalize, reproducing trajectories from starting or ending points in untrained situations. Situation (b) is a hard problem for neural networks because of the lack of temporal context, which is essential for reproducing sequences. Experiments compared the performance of the proposed modular system with that of a partially recurrent network operating alone and with that of a modular neural system (TOTEM). The results suggest that the proposed system had significantly better generalization ability, with no deterioration in its ability to reproduce trained sequences. These results were obtained with a simpler system than TOTEM.
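The use of a Gibbs distribution at the output of the spatial-context module can be sketched as a softmax over the module's output activations, followed by a check on how confident the winning class is. This is an illustrative reading of the abstract, not the thesis implementation. The temperature parameter, the doubt threshold, and the function names are assumptions.

```python
import math

def gibbs_probabilities(activations, temperature=1.0):
    """Map raw output activations to probabilities with a Gibbs (softmax) law."""
    peak = max(activations)  # subtract the maximum for numerical stability
    weights = [math.exp((a - peak) / temperature) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]

def classify_with_doubt(activations, threshold=0.7, temperature=1.0):
    """Return (best_index, best_probability, in_doubt).

    in_doubt is True when the most probable context falls below the
    hypothetical confidence threshold, signalling that a special
    procedure should handle the case.
    """
    probs = gibbs_probabilities(activations, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best], probs[best] < threshold

confident = classify_with_doubt([5.0, 0.0, 0.0])   # one clear winner
ambiguous = classify_with_doubt([1.0, 0.9, 0.0])   # two close candidates
```

A lower temperature sharpens the distribution and a higher one flattens it, so the threshold and temperature would have to be tuned together in any real use.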
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
The chromodomain is 40-50 amino acids in length and is conserved in a wide range of chromatic and regulatory proteins involved in chromatin remodeling. Chromodomain-containing proteins can be classified into families based on their broader characteristics, in particular the presence of other types of domains, which correlate with different subclasses of the chromodomains themselves. Hidden Markov model (HMM)-generated profiles of different subclasses of chromodomains were used here to identify sequences encoding chromodomain-containing proteins in the mouse transcriptome and genome. A total of 36 different loci encoding proteins containing chromodomains, including 17 novel loci, were identified. Six of these loci (including three apparent pseudogenes, a novel HP1 ortholog, and two novel Msl-3 transcription factor-like proteins) are not present in the human genome, whereas the human genome contains four loci (two CDY orthologs and two apparent CDY pseudogenes) that are not present in mouse. A number of these loci exhibit alternative splicing to produce different isoforms, including 43 novel variants, some of which lack the chromodomain. The likely functions of these proteins are discussed in relation to the known functions of other chromodomain-containing proteins within the same family.