94 results for Modeling Rapport Using Hidden Markov Models
at Université de Lausanne, Switzerland
Abstract:
Among the largest resources for biological sequence data is the large amount of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting these regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with error correction based on codon usage bias into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
Abstract:
Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (like predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. A wealth of useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
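The decoding step mentioned above (predicting where a motif occurs in a sequence) is commonly done with the Viterbi algorithm. The following is a minimal sketch of that idea and not MAMOT's actual implementation or file format: the two-state model (`B` for background, `M` for an AT-rich motif) and all probabilities are hypothetical values chosen only for illustration.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a non-empty observation string."""
    # V[t][s] = best log-probability of any path that ends in state s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model: uniform background versus an AT-rich motif state
STATES = ("B", "M")
START = {"B": 0.5, "M": 0.5}
TRANS = {"B": {"B": 0.9, "M": 0.1}, "M": {"B": 0.1, "M": 0.9}}
EMIT = {
    "B": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "M": {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45},
}

print("".join(viterbi("GCGCGCATATATATGCGC", STATES, START, TRANS, EMIT)))
```

Working in log space avoids numerical underflow on long sequences; the sketch assumes that no probability in the model is exactly zero.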
Abstract:
Difficult tracheal intubation assessment is an important research topic in anesthesia, as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose we propose a method based on active appearance models (AAM) and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential, as it drastically improves the performance of classification, which is obtained using SVM with an RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out cross-validation test, providing a key element of an automatic difficult intubation assessment system.
Abstract:
OBJECTIVE: To better understand the structure of the Patient Assessment of Chronic Illness Care (PACIC) instrument. More specifically to test all published validation models, using one single data set and appropriate statistical tools. DESIGN: Validation study using data from cross-sectional survey. PARTICIPANTS: A population-based sample of non-institutionalized adults with diabetes residing in Switzerland (canton of Vaud). MAIN OUTCOME MEASURE: French version of the 20-items PACIC instrument (5-point response scale). We conducted validation analyses using confirmatory factor analysis (CFA). The original five-dimension model and other published models were tested with three types of CFA: based on (i) a Pearson estimator of variance-covariance matrix, (ii) a polychoric correlation matrix and (iii) a likelihood estimation with a multinomial distribution for the manifest variables. All models were assessed using loadings and goodness-of-fit measures. RESULTS: The analytical sample included 406 patients. Mean age was 64.4 years and 59% were men. Median of item responses varied between 1 and 4 (range 1-5), and range of missing values was between 5.7 and 12.3%. Strong floor and ceiling effects were present. Even though loadings of the tested models were relatively high, the only model showing acceptable fit was the 11-item single-dimension model. PACIC was associated with the expected variables of the field. CONCLUSIONS: Our results showed that the model considering 11 items in a single dimension exhibited the best fit for our data. A single score, in complement to the consideration of single-item results, might be used instead of the five dimensions usually described.
Abstract:
Because data on rare species usually are sparse, it is important to have efficient ways to sample additional data. Traditional sampling approaches are of limited value for rare species because a very large proportion of randomly chosen sampling sites are unlikely to shelter the species. For these species, spatial predictions from niche-based distribution models can be used to stratify the sampling and increase sampling efficiency. New data sampled are then used to improve the initial model. Applying this approach repeatedly is an adaptive process that may allow increasing the number of new occurrences found. We illustrate the approach with a case study of a rare and endangered plant species in Switzerland and a simulation experiment. Our field survey confirmed that the method helps in the discovery of new populations of the target species in remote areas where the predicted habitat suitability is high. In our simulations the model-based approach provided a significant improvement (by a factor of 1.8 to 4 times, depending on the measure) over simple random sampling. In terms of cost this approach may save up to 70% of the time spent in the field.
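The gain from model-guided sampling can be illustrated with a small calculation. This is a simplified sketch rather than the authors' procedure: it visits the sites with the highest predicted suitability instead of building formal strata, and the function names and site probabilities are hypothetical.

```python
def expected_occurrences(presence_prob, sites):
    """Expected number of sites where the species is found among the visited sites."""
    return sum(presence_prob[i] for i in sites)

def compare_designs(presence_prob, predicted_suitability, n_visits):
    """Return expected finds for (simple random sampling, model-guided sampling)."""
    n_sites = len(presence_prob)
    # Simple random sampling: every site is equally likely to be visited
    random_expect = n_visits * sum(presence_prob) / n_sites
    # Model-guided sampling: visit the sites ranked highest by the distribution model
    ranked = sorted(range(n_sites), key=lambda i: predicted_suitability[i], reverse=True)
    guided_expect = expected_occurrences(presence_prob, ranked[:n_visits])
    return random_expect, guided_expect

# Hypothetical landscape: the species is absent or unlikely at most sites
true_prob = [0.0, 0.0, 0.0, 0.0, 0.05, 0.05, 0.1, 0.2, 0.6, 0.9]
model_prediction = [0.1, 0.0, 0.2, 0.1, 0.1, 0.3, 0.2, 0.5, 0.7, 0.8]  # imperfect model

random_expect, guided_expect = compare_designs(true_prob, model_prediction, n_visits=3)
print(f"random: {random_expect:.2f} expected finds, model-guided: {guided_expect:.2f}")
```

In the adaptive scheme described above, the newly sampled presences and absences would then be used to refit the model before the next round of site selection.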
Abstract:
Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits from 462,500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
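Of the signature types listed above, regular expressions are the simplest to illustrate. The sketch below converts a PROSITE-style pattern into a Python regular expression; it covers only the common syntax elements (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors), and the function name and test sequence are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("{", "[^").replace("}", "]")  # {ED} = any residue but E or D
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) = 2 to 4 repeats
        parts.append(element)
    return "".join(parts)

regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
print(regex)
print(bool(re.search(regex, "MKAGVKLMNKQ")))
```

Profiles, fingerprints and HMMs carry position-specific scores rather than a yes/no match, so they cannot be reduced to a regular expression in this way.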
Abstract:
BACKGROUND: Membrane-bound organelles are a defining feature of eukaryotic cells, and play a central role in most of their fundamental processes. The Rab G proteins are the single largest family of proteins that participate in the traffic between organelles, with 66 Rabs encoded in the human genome. Rabs direct the organelle-specific recruitment of vesicle tethering factors, motor proteins, and regulators of membrane traffic. Each organelle or vesicle class is typically associated with one or more Rabs, with the Rabs present in a particular cell reflecting that cell's complement of organelles and trafficking routes. RESULTS: Through iterative use of hidden Markov models and tree building, we classified Rabs across the eukaryotic kingdom to provide the most comprehensive view of Rab evolution obtained to date. A strikingly large repertoire of at least 20 Rabs appears to have been present in the last eukaryotic common ancestor (LECA), consistent with the 'complexity early' view of eukaryotic evolution. We were able to place these Rabs into six supergroups, giving a deep view into eukaryotic prehistory. CONCLUSIONS: Tracing the fate of the LECA Rabs revealed extensive losses, with many extant eukaryotes having fewer Rabs, and none having the full complement. We found that other Rabs have expanded and diversified, including a large expansion at the dawn of metazoans, which could be followed to provide an account of the evolutionary history of all human Rabs. Some Rab changes could be correlated with differences in cellular organization, and the relative lack of variation in other families of membrane-traffic proteins suggests that it is the changes in Rabs that primarily underlie the variation in organelles between species and cell types.
Abstract:
The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) and serial analysis of gene expression (SAGE) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
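A quantitative binding-site model of this kind can be illustrated with a plain position weight matrix (PWM), which is a simpler stand-in for the generalized profile and HMM training used in the study. In this sketch the aligned sites, the pseudocount and the uniform background are hypothetical toy values, not data from the paper.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed base frequency against a uniform background
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum the column scores of seq, which must be as long as the matrix."""
    return sum(column[base] for column, base in zip(pwm, seq))

def best_hit(pwm, seq):
    """Return (score, start) of the best-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))

# Hypothetical aligned ligands (toy data for illustration only)
toy_sites = ["TTGGC", "TTGGC", "TTGGA", "CTGGC"]
pwm = build_pwm(toy_sites)
print(best_hit(pwm, "AAATTGGCAAA"))
```

A PWM assumes that positions contribute independently to binding, so it cannot capture the non-independent base preferences that the covariance analysis above reports.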
Abstract:
The geodynamic forces acting in the Earth's interior manifest themselves in a variety of ways. Volcanoes are amongst the most impressive examples in this respect, but like with an iceberg, they only represent the tip of a more extensive system hidden underground. This system consists of a source region where melt forms and accumulates, feeder connections in which magma is transported towards the surface, and different reservoirs where it is stored before it eventually erupts to form a volcano. A magma represents a mixture of melt and crystals. The latter can be extracted from the source region, or form anywhere along the path towards their final crystallization place. They will retain information of the overall plumbing system. The host rocks of an intrusion, in contrast, provide information at the emplacement level. They record the effects of thermal and mechanical forces imposed by the magma. For a better understanding of the system, both parts - magmatic and metamorphic petrology - have to be integrated. I will demonstrate in my thesis that information from both is complementary. It is an iterative process, using constraints from one field to better constrain the other. Reading the history of the host rocks is not always straightforward. This is shown in chapter two, where a model for the formation of clustered garnets observed in the contact aureole is proposed. Fragments of garnets, older than the intrusive rocks are overgrown by garnet crystallizing due to the reheating during emplacement of the adjacent pluton. The formation of the clusters is therefore not a single event as generally assumed but the result of a two-stage process, namely the alteration of the old grains and the overgrowth and amalgamation of new garnet rims. This makes an important difference when applying petrological methods such as thermobarometry, geochronology or grain size distributions. The thermal conditions in the aureole are a strong function of the emplacement style of the pluton. 
Therefore, it is necessary to understand the pluton before drawing conclusions about its aureole. A study investigating the intrusive rocks by means of field, geochemical, geochronological and structural methods is presented in chapter three. This provided important information about the assembly of the intrusion, but also new insights on the nature of large, homogeneous plutons and the structure of the plumbing system in general. The incremental nature of the emplacement of the Western Adamello tonalite is documented, and the existence of an intermediate reservoir beneath homogeneous plutons is proposed. In chapter four it is demonstrated that information extracted from the host rock provides further constraints on the emplacement process of the intrusion. The temperatures obtained by combining field observations with phase petrology modeling are used together with thermal models to constrain the magmatic activity in the immediate intrusion. Instead of using the thermal models to control the petrology result, the inverse is done. The model parameters were changed until a match with the aureole temperatures was obtained. It is shown that only a few combinations give a positive match and that temperature estimates from the aureole can constrain the frequency of magmatic activity in ancient magmatic systems. In the fifth chapter, the Anisotropy of Magnetic Susceptibility of intrusive rocks is compared to 3D tomography. The obtained signal is a function of the shape and distribution of ferromagnetic grains, and is often used to infer flow directions of magma. It turns out that the signal is dominated by the shape of the magnetic crystals, and where they form tight clusters, also by their distribution. This is in good agreement with the predictions made in the theoretical and experimental literature. In the sixth chapter arguments for partial melting of host rock carbonates are presented. 
While at first very surprising, this is to be expected when considering the prior results from the intrusive study and experiments from the literature. Partial melting is documented by compelling microstructures, geochemical and structural data. The necessary conditions are far from extreme and this process might be more frequent than previously thought. The carbonate melt is highly mobile and can move along grain boundaries, infiltrating other rocks and ultimately altering the existing mineral assemblage. Finally, a mineralogical curiosity is presented in chapter seven. The mineral assemblage magnesite and calcite is in apparent equilibrium. It is well known that these two carbonates are not stable together in the system CaO-MgO-FeO-CO2. Indeed, magnesite and calcite should react to dolomite during metamorphism. The presented explanation for this "forbidden" assemblage is that a calcite melt infiltrated the magnesite-bearing rock along grain boundaries and caused the peculiar microstructure. This is supported by isotopic disequilibrium between calcite and magnesite. A further implication of partially molten carbonates is that the host rock drastically loses its strength, so that its physical properties may be comparable to those of the intrusive rocks. This contrasting behavior of the host rock may ease the emplacement of the intrusion. We see that the circle closes and the iterative process of better constraining the emplacement could start again. - The Earth is in perpetual motion, and the tectonic forces associated with this motion manifest themselves in various forms. Volcanoes are one of the most impressive examples, but, like icebergs, the lavas erupted at the surface represent only the tip of a vast system hidden at depth. 
This system consists of a source region, where the source rock melts and produces magma; this magma can accumulate in the source region or be transported through various conduits into reservoirs where it is stored. The magma can crystallize in situ to produce plutonic rocks, or it can be erupted at the surface. A magma is a mixture of a liquid and crystals. These crystals can be extracted from the source or form anywhere along the path to the final site of crystallization. Studying these crystals can therefore provide information about the entire magmatic system. The host rocks, in contrast, provide information about the emplacement level of the intrusion, because they record the thermal and mechanical effects imposed by the magma. For a better understanding of the system, the two parts, magmatic and metamorphic, must be integrated. This thesis aims to show that the information obtained from studying the magmatic rocks and the host rocks is complementary. It is an iterative process that uses the constraints from one field to improve the understanding of the other. Understanding the history of the host rocks is not always straightforward. This is demonstrated in chapter two, which proposes a model for the formation of the garnets observed as clusters in the contact aureole. Fragments of garnet older than the intrusive rocks show an overgrowth zone generated by the heat supplied during emplacement of the adjacent pluton. The formation of the garnet clusters is therefore not the result of a single event, as is usually described, but of a two-stage process: alteration of old grains, which fractures these garnets, followed by the formation of overgrowth zones around the resulting fragments, which explains the observed clustered texture. 
This two-stage interpretation is important because it leads to notable differences when applying petrological methods such as thermobarometry or geochronology, or when studying the relative grain size distribution. The thermal conditions in the contact aureole depend strongly on the emplacement mode of the intrusion, which is why the pluton must be understood before conclusions are drawn about its contact aureole. A field study of the intrusive rocks, together with geochemical, geochronological and structural studies, is presented in the third chapter. This study provides important information about the formation of the intrusion, as well as new insights into the nature of large homogeneous plutons and the structure of magmatic systems in general. Incremental emplacement is demonstrated, and the existence of an intermediate reservoir beneath homogeneous plutons is proposed. The fourth chapter of this thesis illustrates how information extracted from the host rocks can be used to explain the emplacement of the intrusion. The temperatures obtained by combining field observations with the metamorphic assemblage are used together with thermal models to constrain the magmatic activity in direct contact with this aureole. Instead of using the thermal model to verify the petrological result, an inverse approach was chosen. The model parameters were changed until a match with the temperatures observed in the contact aureole was obtained. This shows that only a few combinations can explain the temperatures, and that the frequency of magmatic activity in an ancient magmatic system can be constrained in this way. 
In the fifth chapter, the processes controlling the anisotropy of magnetic susceptibility of intrusive rocks are explained using images of the mineral distribution in the rocks obtained by 3D tomography. The signal associated with the anisotropy of magnetic susceptibility is a function of the shape and distribution of ferromagnetic grains. This signal is frequently used to determine the flow direction of a magma. In agreement with other studies in the literature, the results show that the signal is dominated by the shape of the magnetic crystals, as well as by the distribution of clusters of these minerals in the rock. In the sixth chapter, a study of the partial melting of carbonates in the host rocks is presented. Although the presence of carbonate liquids in contact aureoles has been proposed on the basis of laboratory experiments, our study clearly demonstrates their existence in nature. Partial melting is documented by microstructures characteristic of the presence of liquids, as well as by geochemical and structural data. The necessary conditions are far from extreme, and this process could be more frequent than expected. Carbonate liquids are highly mobile and can move along grain boundaries before infiltrating other rocks and modifying their mineral assemblages. Finally, a mineralogical curiosity is presented in chapter seven. An assemblage of magnesite and calcite in apparent equilibrium is observed. It is well known that these two carbonates are not stable together in the system CaO-MgO-FeO-CO2. Indeed, magnesite and calcite should react to produce dolomite during metamorphism. 
The explanation presented for this seemingly "forbidden" assemblage is that a carbonate liquid from the adjacent rocks infiltrated this rock and is responsible for this microstructure. Another implication of the presence of molten carbonates is that the host rock shows a drastic decrease in strength, so that its physical properties become comparable to those of the intrusive rock. This change in the rheological properties of the host rocks may facilitate the emplacement of the intrusive rocks. Together, these studies demonstrate the iterative process used and the value of studying both the intrusive rocks and the host rocks in order to understand the mechanisms by which magmas are emplaced within the Earth's crust.
Abstract:
BACKGROUND: Cleavage of messenger RNA (mRNA) precursors is an essential step in mRNA maturation. The signal recognized by the cleavage enzyme complex has been characterized as an A rich region upstream of the cleavage site containing a motif with consensus AAUAAA, followed by a U or UG rich region downstream of the cleavage site. RESULTS: We studied these signals using exhaustive databases of cleavage sites obtained from aligning raw expressed sequence tags (EST) sequences to genomic sequences in Homo sapiens and Drosophila melanogaster. These data show that the polyadenylation signal is highly conserved in human and fly. In addition, de novo motif searches generated a refined description of the U-rich downstream sequence (DSE) element, which shows more divergence between the two species. These refined motifs are applied, within a Hidden Markov Model (HMM) framework, to predict mRNA cleavage sites. CONCLUSION: We demonstrate that the DSE is a specific motif in both human and Drosophila. These findings shed light on the sequence correlates of a highly conserved biological process, and improve in silico prediction of 3' mRNA cleavage and polyadenylation sites.
Abstract:
Regulatory gene networks contain generic modules, like those involving feedback loops, which are essential for the regulation of many biological functions (Guido et al. in Nature 439:856-860, 2006). We consider a class of self-regulated genes which are the building blocks of many regulatory gene networks, and study the steady-state distribution of the associated Gillespie algorithm by providing efficient numerical algorithms. We also study a regulatory gene network of interest in gene therapy, using mean-field models with time delays. Convergence of the related time-nonhomogeneous Markov chain is established for a class of linear catalytic networks with feedback loops.
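For readers unfamiliar with the Gillespie algorithm, the sketch below simulates one self-repressing gene by direct stochastic simulation. It is not the efficient numerical method of the paper: the reaction scheme (production, degradation, repression by the gene's own product, reactivation) and all rate constants are hypothetical choices for illustration.

```python
import random

def gillespie_self_regulated(t_end, k_prod=10.0, k_deg=1.0, k_off=0.05, k_on=1.0, seed=0):
    """Simulate a self-repressing gene; return a list of (time, gene_on, protein)."""
    rng = random.Random(seed)
    t, gene_on, protein = 0.0, 1, 0
    trajectory = [(t, gene_on, protein)]
    while True:
        rates = [
            k_prod * gene_on,           # protein production while the gene is on
            k_deg * protein,            # protein degradation
            k_off * protein * gene_on,  # the protein switches its own gene off
            k_on * (1 - gene_on),       # spontaneous reactivation of the gene
        ]
        total = sum(rates)
        # Waiting time to the next reaction is exponential with rate `total`
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its rate
        r = rng.random() * total
        if r < rates[0]:
            protein += 1
        elif r < rates[0] + rates[1]:
            protein -= 1
        elif r < rates[0] + rates[1] + rates[2]:
            gene_on = 0
        else:
            gene_on = 1
        trajectory.append((t, gene_on, protein))
    return trajectory

trajectory = gillespie_self_regulated(t_end=50.0)
print(len(trajectory), "events; final state:", trajectory[-1][1:])
```

A time-weighted histogram of the protein counts over a long run gives an empirical estimate of the steady-state distribution, which is the quantity the paper computes numerically.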
Abstract:
One of the most important issues in molecular biology is to understand the regulatory mechanisms that control gene expression. Gene expression is often regulated by proteins called transcription factors, which bind to short (5 to 20 base pairs), degenerate segments of DNA. Experimental efforts towards understanding the sequence specificity of transcription factors are laborious and expensive, but can be substantially accelerated with the use of computational predictions. This thesis describes the use of algorithms and resources for transcription factor binding site analysis in addressing quantitative modelling, where probabilistic models are built to represent binding properties of a transcription factor and can be used to find new functional binding sites in genomes. Initially, an open-access database (HTPSELEX) was created, holding high-quality binding sequences for two eukaryotic families of transcription factors, namely CTF/NF1 and LEF1/TCF. The binding sequences were elucidated using a recently described experimental procedure called HTP-SELEX, which allows the generation of a large number (> 1000) of binding sites using mass sequencing technology. For each HTP-SELEX experiment we also provide accurate primary experimental information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, and assembled clone sequences of binding sequences. The database also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The database is available at http://wwwisrec.isb-sib.ch/htpselex/ and ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex. The Expectation-Maximisation (EM) algorithm is one of the frequently used methods to estimate probabilistic models to represent the sequence specificity of transcription factors. 
We present computer simulations in order to estimate the precision of EM-estimated models as a function of data set parameters (such as the length of the initial sequences, the number of initial sequences, and the percentage of nonbinding sequences). We observed a remarkable robustness of the EM algorithm with regard to the length of the training sequences and the degree of contamination. The HTPSELEX database and the benchmarked results of the EM algorithm formed part of the foundation for the subsequent project, in which a statistical framework based on hidden Markov models was developed to represent the sequence specificity of the transcription factors CTF/NF1 and LEF1/TCF using the HTP-SELEX experimental data. The hidden Markov model framework is capable of both predicting and classifying CTF/NF1 and LEF1/TCF binding sites. A covariance analysis of the binding sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism. We next tested the LEF1/TCF model by computing binding scores for a set of LEF1/TCF binding sequences for which relative affinities were determined experimentally using non-linear regression. The predicted and experimentally determined binding affinities correlated well.
Abstract:
OBJECTIVE: To investigate the evolution of delirium in nursing home (NH) residents and its possible predictors. DESIGN: Post-hoc analysis of a prospective cohort assessment. SETTING: Ninety NHs in Switzerland. PARTICIPANTS: 14,771 NH residents. MEASUREMENTS: The Resident Assessment Instrument Minimum Data Set and the Nursing Home Confusion Assessment Method were used to determine follow-up of subsyndromal or full delirium in NH residents, using discrete Markov chain modeling to describe long-term trajectories and multiple logistic regression analyses to determine predictors of the trajectories. RESULTS: We identified four major types of delirium time courses in NH. Increasing severity of cognitive impairment and of depressive symptoms at the initial assessment predicted the different delirium time courses. CONCLUSION: More pronounced cognitive impairment and depressive symptoms at the initial assessment are associated with different subsequent evolutions of delirium. The presence and evolution of delirium in the first year after NH admission predicted the subsequent course of delirium until death.
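The core of discrete Markov chain modeling is estimating transition probabilities between states from observed sequences of assessments. The sketch below shows that estimation step only, under the assumption of a first-order, time-homogeneous chain; the state labels and the example trajectories are hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Estimate first-order transition probabilities from lists of observed states."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        # Count each pair of consecutive assessments (state now, state next)
        for current, following in zip(trajectory, trajectory[1:]):
            counts[current][following] += 1
    # Normalize each row of counts into probabilities
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

# Hypothetical assessment sequences for three residents
example = [
    ["none", "subsyndromal", "full", "full"],
    ["none", "none", "subsyndromal", "none"],
    ["subsyndromal", "full", "subsyndromal", "subsyndromal"],
]
for state, row in estimate_transitions(example).items():
    print(state, row)
```

States that are never left in the data, such as an absorbing end state, do not appear as rows in the result because no outgoing transitions are observed for them.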