943 resultados para Markov Model


Relevância:

60.00% 60.00%

Publicador:

Resumo:

The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it became urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures came out when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to which extent the characteristic path length and clustering coefficient of the protein contacts network are values that reveal characteristic features of protein contact maps. Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. A commonly adopted procedure for annotating protein sequences relies on the "inheritance through homology" based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which had been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but that do not necessarily share the same function, to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validate a system that contributes to sequence annotation by taking advantage of a validated transfer through inheritance procedure of the molecular functions and of the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicity answers to the problem of multi-domain proteins annotation and allows a fine grain division of the whole set of proteomes used, that ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins when present can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences considering information available in the present data bases of molecular functions and structures.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of a bigger and yet unsolved problem of protein structure prediction. Improvement in the prediction of disulfide bonding states of cysteine residues will help in putting a constraint in the three dimensional (3D) space of the respective protein structure, and thus will eventually help in the prediction of 3D structure of proteins. Results of this work will have direct implications in site-directed mutational studies of proteins, proteins engineering and the problem of protein folding. We have used a combination of Artificial Neural Network (ANN) and Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN) as a machine learning technique to develop our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs and considering Eukaryotes and Prokaryotes separately we have reached to a remarkable accuracy of 94% on cysteine basis for both Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on protein basis for Eukaryotic dataset and Prokaryotic dataset respectively. These accuracies are best so far ever reached by any existing prediction methods, and thus our prediction method has outperformed all the previously developed approaches and therefore is more reliable. Most interesting part of this thesis work is the differences in the prediction performances of Eukaryotes and Prokaryotes at the basic level of input coding when ‘profile’ information was given as input to our prediction method. And one of the reasons for this we discover is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, where as Prokaryotic bonded examples lack it.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background Levels of differentiation among populations depend both on demographic and selective factors: genetic drift and local adaptation increase population differentiation, which is eroded by gene flow and balancing selection. We describe here the genomic distribution and the properties of genomic regions with unusually high and low levels of population differentiation in humans to assess the influence of selective and neutral processes on human genetic structure. Methods Individual SNPs of the Human Genome Diversity Panel (HGDP) showing significantly high or low levels of population differentiation were detected under a hierarchical-island model (HIM). A Hidden Markov Model allowed us to detect genomic regions or islands of high or low population differentiation. Results Under the HIM, only 1.5% of all SNPs are significant at the 1% level, but their genomic spatial distribution is significantly non-random. We find evidence that local adaptation shaped high-differentiation islands, as they are enriched for non-synonymous SNPs and overlap with previously identified candidate regions for positive selection. Moreover there is a negative relationship between the size of islands and recombination rate, which is stronger for islands overlapping with genes. Gene ontology analysis supports the role of diet as a major selective pressure in those highly differentiated islands. Low-differentiation islands are also enriched for non-synonymous SNPs, and contain an overly high proportion of genes belonging to the 'Oncogenesis' biological process. Conclusions Even though selection seems to be acting in shaping islands of high population differentiation, neutral demographic processes might have promoted the appearance of some genomic islands since i) as much as 20% of islands are in non-genic regions ii) these non-genic islands are on average two times shorter than genic islands, suggesting a more rapid erosion by recombination, and iii) most loci are strongly differentiated between Africans and non-Africans, a result consistent with known human demographic history.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper proposes a sequential coupling of a Hidden Markov Model (HMM) recognizer for offline handwritten English sentences with a probabilistic bottom-up chart parser using Stochastic Context-Free Grammars (SCFG) extracted from a text corpus. Based on extensive experiments, we conclude that syntax analysis helps to improve recognition rates significantly.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A simulation model adopting a health system perspective showed population-based screening with DXA, followed by alendronate treatment of persons with osteoporosis, or with anamnestic fracture and osteopenia, to be cost-effective in Swiss postmenopausal women from age 70, but not in men. INTRODUCTION: We assessed the cost-effectiveness of a population-based screen-and-treat strategy for osteoporosis (DXA followed by alendronate treatment if osteoporotic, or osteopenic in the presence of fracture), compared to no intervention, from the perspective of the Swiss health care system. METHODS: A published Markov model assessed by first-order Monte Carlo simulation was refined to reflect the diagnostic process and treatment effects. Women and men entered the model at age 50. Main screening ages were 65, 75, and 85 years. Age at bone densitometry was flexible for persons fracturing before the main screening age. Realistic assumptions were made with respect to persistence with intended 5 years of alendronate treatment. The main outcome was cost per quality-adjusted life year (QALY) gained. RESULTS: In women, costs per QALY were Swiss francs (CHF) 71,000, CHF 35,000, and CHF 28,000 for the main screening ages of 65, 75, and 85 years. The threshold of CHF 50,000 per QALY was reached between main screening ages 65 and 75 years. Population-based screening was not cost-effective in men. CONCLUSION: Population-based DXA screening, followed by alendronate treatment in the presence of osteoporosis, or of fracture and osteopenia, is a cost-effective option in Swiss postmenopausal women after age 70.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The past decade has seen the energy consumption in servers and Internet Data Centers (IDCs) skyrocket. A recent survey estimated that the worldwide spending on servers and cooling have risen to above $30 billion and is likely to exceed spending on the new server hardware . The rapid rise in energy consumption has posted a serious threat to both energy resources and the environment, which makes green computing not only worthwhile but also necessary. This dissertation intends to tackle the challenges of both reducing the energy consumption of server systems and by reducing the cost for Online Service Providers (OSPs). Two distinct subsystems account for most of IDC’s power: the server system, which accounts for 56% of the total power consumption of an IDC, and the cooling and humidifcation systems, which accounts for about 30% of the total power consumption. The server system dominates the energy consumption of an IDC, and its power draw can vary drastically with data center utilization. In this dissertation, we propose three models to achieve energy effciency in web server clusters: an energy proportional model, an optimal server allocation and frequency adjustment strategy, and a constrained Markov model. The proposed models have combined Dynamic Voltage/Frequency Scaling (DV/FS) and Vary-On, Vary-off (VOVF) mechanisms that work together for more energy savings. Meanwhile, corresponding strategies are proposed to deal with the transition overheads. We further extend server energy management to the IDC’s costs management, helping the OSPs to conserve, manage their own electricity cost, and lower the carbon emissions. We have developed an optimal energy-aware load dispatching strategy that periodically maps more requests to the locations with lower electricity prices. A carbon emission limit is placed, and the volatility of the carbon offset market is also considered. Two energy effcient strategies are applied to the server system and the cooling system respectively. With the rapid development of cloud services, we also carry out research to reduce the server energy in cloud computing environments. In this work, we propose a new live virtual machine (VM) placement scheme that can effectively map VMs to Physical Machines (PMs) with substantial energy savings in a heterogeneous server cluster. A VM/PM mapping probability matrix is constructed, in which each VM request is assigned with a probability running on PMs. The VM/PM mapping probability matrix takes into account resource limitations, VM operation overheads, server reliability as well as energy effciency. The evolution of Internet Data Centers and the increasing demands of web services raise great challenges to improve the energy effciency of IDCs. We also express several potential areas for future research in each chapter.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

BACKGROUND/AIMS: While several risk factors for the histological progression of chronic hepatitis C have been identified, the contribution of HCV genotypes to liver fibrosis evolution remains controversial. The aim of this study was to assess independent predictors for fibrosis progression. METHODS: We identified 1189 patients from the Swiss Hepatitis C Cohort database with at least one biopsy prior to antiviral treatment and assessable date of infection. Stage-constant fibrosis progression rate was assessed using the ratio of fibrosis Metavir score to duration of infection. Stage-specific fibrosis progression rates were obtained using a Markov model. Risk factors were assessed by univariate and multivariate regression models. RESULTS: Independent risk factors for accelerated stage-constant fibrosis progression (>0.083 fibrosis units/year) included male sex (OR=1.60, [95% CI 1.21-2.12], P<0.001), age at infection (OR=1.08, [1.06-1.09], P<0.001), histological activity (OR=2.03, [1.54-2.68], P<0.001) and genotype 3 (OR=1.89, [1.37-2.61], P<0.001). Slower progression rates were observed in patients infected by blood transfusion (P=0.02) and invasive procedures or needle stick (P=0.03), compared to those infected by intravenous drug use. Maximum likelihood estimates (95% CI) of stage-specific progression rates (fibrosis units/year) for genotype 3 versus the other genotypes were: F0-->F1: 0.126 (0.106-0.145) versus 0.091 (0.083-0.100), F1-->F2: 0.099 (0.080-0.117) versus 0.065 (0.058-0.073), F2-->F3: 0.077 (0.058-0.096) versus 0.068 (0.057-0.080) and F3-->F4: 0.171 (0.106-0.236) versus 0.112 (0.083-0.142, overall P<0.001). CONCLUSIONS: This study shows a significant association of genotype 3 with accelerated fibrosis using both stage-constant and stage-specific estimates of fibrosis progression rates. This observation may have important consequences for the management of patients infected with this genotype.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Mathematical models of disease progression predict disease outcomes and are useful epidemiological tools for planners and evaluators of health interventions. The R package gems is a tool that simulates disease progression in patients and predicts the effect of different interventions on patient outcome. Disease progression is represented by a series of events (e.g., diagnosis, treatment and death), displayed in a directed acyclic graph. The vertices correspond to disease states and the directed edges represent events. The package gems allows simulations based on a generalized multistate model that can be described by a directed acyclic graph with continuous transition-specific hazard functions. The user can specify an arbitrary hazard function and its parameters. The model includes parameter uncertainty, does not need to be a Markov model, and may take the history of previous events into account. Applications are not limited to the medical field and extend to other areas where multistate simulation is of interest. We provide a technical explanation of the multistate models used by gems, explain the functions of gems and their arguments, and show a sample application.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Sterols are an essential class of lipids in eukaryotes, where they serve as structural components of membranes and play important roles as signaling molecules. Sterols are also of high pharmacological significance: cholesterol-lowering drugs are blockbusters in human health, and inhibitors of ergosterol biosynthesis are widely used as antifungals. Inhibitors of ergosterol synthesis are also being developed for Chagas's disease, caused by Trypanosoma cruzi. Here we develop an in silico pipeline to globally evaluate sterol metabolism and perform comparative genomics. We generate a library of hidden Markov model-based profiles for 42 sterol biosynthetic enzymes, which allows expressing the genomic makeup of a given species as a numerical vector. Hierarchical clustering of these vectors functionally groups eukaryote proteomes and reveals convergent evolution, in particular metabolic reduction in obligate endoparasites. We experimentally explore sterol metabolism by testing a set of sterol biosynthesis inhibitors against trypanosomatids, Plasmodium falciparum, Giardia, and mammalian cells, and by quantifying the expression levels of sterol biosynthetic genes during the different life stages of T. cruzi and Trypanosoma brucei. The phenotypic data correlate with genomic makeup for simvastatin, which showed activity against trypanosomatids. Other findings, such as the activity of terbinafine against Giardia, are not in agreement with the genotypic profile.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The ability to determine what activity of daily living a person performs is of interest in many application domains. It is possible to determine the physical and cognitive capabilities of the elderly by inferring what activities they perform in their houses. Our primary aim was to establish a proof of concept that a wireless sensor system can monitor and record physical activity and these data can be modeled to predict activities of daily living. The secondary aim was to determine the optimal placement of the sensor boxes for detecting activities in a room. A wireless sensor system was set up in a laboratory kitchen. The ten healthy participants were requested to make tea following a defined sequence of tasks. Data were collected from the eight wireless sensor boxes placed in specific places in the test kitchen and analyzed to detect the sequences of tasks performed by the participants. These sequence of tasks were trained and tested using the Markov Model. Data analysis focused on the reliability of the system and the integrity of the collected data. The sequence of tasks were successfully recognized for all subjects and the averaged data pattern of tasks sequences between the subjects had a high correlation. Analysis of the data collected indicates that sensors placed in different locations are capable of recognizing activities, with the movement detection sensor contributing the most to detection of tasks. The central top of the room with no obstruction of view was considered to be the best location to record data for activity detection. Wireless sensor systems show much promise as easily deployable to monitor and recognize activities of daily living.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Ecological speciation is the process by which reproductively isolated populations emerge as a consequence of divergent natural or ecologically-mediated sexual selection. Most genomic studies of ecological speciation have investigated allopatric populations, making it difficult to infer reproductive isolation. The few studies on sympatric ecotypes have focused on advanced stages of the speciation process after thousands of generations of divergence. As a consequence, we still do not know what genomic signatures of the early onset of ecological speciation look like. Here, we examined genomic differentiation among migratory lake and resident stream ecotypes of threespine stickleback reproducing in sympatry in one stream, and in parapatry in another stream. Importantly, these ecotypes started diverging less than 150 years ago. We obtained 34,756 SNPs with restriction-site associated DNA sequencing and identified genomic islands of differentiation using a Hidden Markov Model approach. Consistent with incipient ecological speciation, we found significant genomic differentiation between ecotypes both in sympatry and parapatry. Of 19 islands of differentiation resisting gene flow in sympatry, all were also differentiated in parapatry and were thus likely driven by divergent selection among habitats. These islands clustered in quantitative trait loci controlling divergent traits among the ecotypes, many of them concentrated in one region with low to intermediate recombination. Our findings suggest that adaptive genomic differentiation at many genetic loci can arise and persist in sympatry at the very early stage of ecotype divergence, and that the genomic architecture of adaptation may facilitate this.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of the inferences about copy number can be affected by many factors including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity to detect small CNVs due to the high false positive rate when calling CNVs based on the intensity values. Association tests based on detected CNVs therefore lack power even if the CNVs affecting disease risk are common. In this research, by combining an existing Hidden Markov Model (HMM) and the logistic regression model, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases. We showed that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than an existing popular algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This report describes the development of a Markov model for comparing percutaneous radiofrequency ablation (RFA) and stereotactic body radiation therapy (SBRT) in terms of their cost-utility in treating isolated liver metastases from colorectal cancer. The model is based on data from multiple retrospective and prospective studies, available data on different utility states associated with treatment and complications, as well as publicly available Medicare costs. The purpose of this report is to establish a well-justified model for clinical management decisions. In comparison with SBRT, RFA is the most cost-effective treatment for this patient population. From the societal perspective, SBRT may be an acceptable alternative with an ICER of $28,673/QALY. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Este trabajo de Tesis ha abordado el objetivo de dar robustez y mejorar la Detección de Actividad de Voz en entornos acústicos adversos con el fin de favorecer el comportamiento de muchas aplicaciones vocales, por ejemplo aplicaciones de telefonía basadas en reconocimiento automático de voz, aplicaciones en sistemas de transcripción automática, aplicaciones en sistemas multicanal, etc. En especial, aunque se han tenido en cuenta todos los tipos de ruido, se muestra especial interés en el estudio de las voces de fondo, principal fuente de error de la mayoría de los Detectores de Actividad en la actualidad. Las tareas llevadas a cabo poseen como punto de partida un Detector de Actividad basado en Modelos Ocultos de Markov, cuyo vector de características contiene dos componentes: la energía normalizada y la variación de la energía. Las aportaciones fundamentales de esta Tesis son las siguientes: 1) ampliación del vector de características de partida dotándole así de información espectral, 2) ajuste de los Modelos Ocultos de Markov al entorno y estudio de diferentes topologías y, finalmente, 3) estudio e inclusión de nuevas características, distintas de las del punto 1, para filtrar los pulsos de pronunciaciones que proceden de las voces de fondo. Los resultados de detección, teniendo en cuenta los tres puntos anteriores, muestran con creces los avances realizados y son significativamente mejores que los resultados obtenidos, bajo las mismas condiciones, con otros detectores de actividad de referencia. This work has been focused on improving the robustness at Voice Activity Detection in adverse acoustic environments in order to enhance the behavior of many vocal applications, for example telephony applications based on automatic speech recognition, automatic transcription applications, multichannel systems applications, and so on. In particular, though all types of noise have taken into account, this research has special interest in the study of pronunciations coming from far-field speakers, the main error source of most activity detectors today. The tasks carried out have, as starting point, a Hidden Markov Models Voice Activity Detector which a feature vector containing two components: normalized energy and delta energy. The key points of this Thesis are the following: 1) feature vector extension providing spectral information, 2) Hidden Markov Models adjustment to environment and study of different Hidden Markov Model topologies and, finally, 3) study and inclusion of new features, different from point 1, to reject the pronunciations coming from far-field speakers. Detection results, taking into account the above three points, show the advantages of using this method and are significantly better than the results obtained under the same conditions by other well-known voice activity detectors.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

For most of us, speaking in a non-native language involves deviating to some extent from native pronunciation norms. However, the detailed basis for foreign accent (FA) remains elusive, in part due to methodological challenges in isolating segmental from suprasegmental factors. The current study examines the role of segmental features in conveying FA through the use of a generative approach in which accent is localised to single consonantal segments. Three techniques are evaluated: the first requires a highly-proficiency bilingual to produce words with isolated accented segments; the second uses cross-splicing of context-dependent consonants from the non-native language into native words; the third employs hidden Markov model synthesis to blend voice models for both languages. Using English and Spanish as the native/non-native languages respectively, listener cohorts from both languages identified words and rated their degree of FA. All techniques were capable of generating accented words, but to differing degrees. Naturally-produced speech led to the strongest FA ratings and synthetic speech the weakest, which we interpret as the outcome of over-smoothing. Nevertheless, the flexibility offered by synthesising localised accent encourages further development of the method.