7 resultados para hierarchical hidden Markov model

em Helda - Digital Repository of University of Helsinki


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Advancements in the analysis techniques have led to a rapid accumulation of biological data in databases. Such data often are in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data give promises of answering various biologically relevant questions in more detail than what has been possible before. For example, one may wish to identify areas in an amino acid sequence, which are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet with the challenge of deriving meaning from the increasing amounts of data. Our main concern is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets provide also a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Markov random fields (MRF) are popular in image processing applications to describe spatial dependencies between image units. Here, we take a look at the theory and the models of MRFs with an application to improve forest inventory estimates. Typically, autocorrelation between study units is a nuisance in statistical inference, but we take an advantage of the dependencies to smooth noisy measurements by borrowing information from the neighbouring units. We build a stochastic spatial model, which we estimate with a Markov chain Monte Carlo simulation method. The smooth values are validated against another data set increasing our confidence that the estimates are more accurate than the originals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Elucidating the mechanisms responsible for the patterns of species abundance, diversity, and distribution within and across ecological systems is a fundamental research focus in ecology. Species abundance patterns are shaped in a convoluted way by interplays between inter-/intra-specific interactions, environmental forcing, demographic stochasticity, and dispersal. Comprehensive models and suitable inferential and computational tools for teasing out these different factors are quite limited, even though such tools are critically needed to guide the implementation of management and conservation strategies, the efficacy of which rests on a realistic evaluation of the underlying mechanisms. This is even more so in the prevailing context of concerns over climate change progress and its potential impacts on ecosystems. This thesis utilized the flexible hierarchical Bayesian modelling framework in combination with the computer intensive methods known as Markov chain Monte Carlo, to develop methodologies for identifying and evaluating the factors that control the structure and dynamics of ecological communities. These methodologies were used to analyze data from a range of taxa: macro-moths (Lepidoptera), fish, crustaceans, birds, and rodents. Environmental stochasticity emerged as the most important driver of community dynamics, followed by density dependent regulation; the influence of inter-specific interactions on community-level variances was broadly minor. This thesis contributes to the understanding of the mechanisms underlying the structure and dynamics of ecological communities, by showing directly that environmental fluctuations rather than inter-specific competition dominate the dynamics of several systems. This finding emphasizes the need to better understand how species are affected by the environment and acknowledge species differences in their responses to environmental heterogeneity, if we are to effectively model and predict their dynamics (e.g. for management and conservation purposes). The thesis also proposes a model-based approach to integrating the niche and neutral perspectives on community structure and dynamics, making it possible for the relative importance of each category of factors to be evaluated in light of field data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Action, Power and Experience in Organizational Change - A Study of Three Major Corporations This study explores change management and resistance to change as social activities and power displays through worker experiences in three major Finnish corporations. Two important sensitizing concepts were applied. Firstly, Richard Sennett's perspective on work in the new form of capitalism, and its shortcomings - the lack of commitment and freedom accompanied by the disruption to lifelong career planning and the feeling of job insecurity - offered a fruitful starting point for a critical study. Secondly, Michel Foucault's classical concept of power, treated as anecdotal, interactive and nonmeasurable, provided tools for analyzing change-enabling and resisting acts. The study bridges the gap between management and social sciences. The former have usually concentrated on leadership issues, best practices and goal attainment, while the latter have covered worker experiences, power relations and political conflicts. The study was motivated by three research questions. Firstly, why people resist or support changes in their work, work environment or organization, and the kind of analyses these behavioural choices are based on. Secondly, the kind of practical forms which support for, and resistance to change take, and how people choose the different ways of acting. Thirdly, how the people involved experience and describe their own subject position and actions in changing environments. The examination focuses on practical interpretations and action descriptions given by the members of three major Finnish business organizations. The empirical data was collected during a two-year period in the Finnish Post Corporation, the Finnish branch of Vattenfal Group, one of the leading European energy companies, and the Mehiläinen Group, the leading private medical service provider in Finland. It includes 154 non-structured thematic interviews and 309 biographies concentrating on personal experiences of change. All positions and organizational levels were represented. The analysis was conducted using the grounded theory method introduced by Straus and Corbin in three sequential phases, including open, axial and selective coding processes. As a result, there is a hierarchical structure of categories, which is summarized in the process model of change behaviour patterns. Key ingredients are past experiences and future expectations which lead to different change relations and behavioural roles. Ultimately, they contribute to strategic and tactical choices realized as both public and hidden forms of action. The same forms of action can be used in both supporting and resisting change, and there are no specific dividing lines either between employer and employee roles or between different hierarchical positions. In general, however, it is possible to conclude that strategic choices lead more often to public forms of action, whereas tactical choices result in hidden forms. The primary goal of the study was to provide knowledge which has practical applications in everyday business life, HR and change management. The results, therefore, are highly applicable to other organizations as well as to less change-dominated situations, whenever power relations and conflicting interests are present. A sociological thesis on classical business management issues can be of considerable value in revealing the crucial social processes behind behavioural patterns. Keywords: change management, organizational development, organizational resistance, resistance to change, change management, labor relations, organization, leadership

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The open development model of software production has been characterized as the future model of knowledge production and distributed work. Open development model refers to publicly available source code ensured by an open source license, and the extensive and varied distributed participation of volunteers enabled by the Internet. Contemporary spokesmen of open source communities and academics view open source development as a new form of volunteer work activity characterized by hacker ethic and bazaar governance . The development of the Linux operating system is perhaps the best know example of such an open source project. It started as an effort by a user-developer and grew quickly into a large project with hundreds of user-developer as contributors. However, in hybrids , in which firms participate in open source projects oriented towards end-users, it seems that most users do not write code. The OpenOffice.org project, initiated by Sun Microsystems, in this study represents such a project. In addition, the Finnish public sector ICT decision-making concerning open source use is studied. The purpose is to explore the assumptions, theories and myths related to the open development model by analysing the discursive construction of the OpenOffice.org community: its developers, users and management. The qualitative study aims at shedding light on the dynamics and challenges of community construction and maintenance, and related power relations in hybrid open source, by asking two main research questions: How is the structure and membership constellation of the community, specifically the relation between developers and users linguistically constructed in hybrid open development? What characterizes Internet-mediated virtual communities and how can they be defined? How do they differ from hierarchical forms of knowledge production on one hand and from traditional volunteer communities on the other? The study utilizes sociological, psychological and anthropological concepts of community for understanding the connection between the real and the imaginary in so-called virtual open source communities. Intermediary methodological and analytical concepts are borrowed from discourse and rhetorical theories. A discursive-rhetorical approach is offered as a methodological toolkit for studying texts and writing in Internet communities. The empirical chapters approach the problem of community and its membership from four complementary points of views. The data comprises mailing list discussion, personal interviews, web page writings, email exchanges, field notes and other historical documents. The four viewpoints are: 1) the community as conceived by volunteers 2) the individual contributor s attachment to the project 3) public sector organizations as users of open source 4) the community as articulated by the community manager. I arrive at four conclusions concerning my empirical studies (1-4) and two general conclusions (5-6). 1) Sun Microsystems and OpenOffice.org Groupware volunteers failed in developing necessary and sufficient open code and open dialogue to ensure collaboration thus splitting the Groupware community into volunteers we and the firm them . 2) Instead of separating intrinsic and extrinsic motivations, I find that volunteers unique patterns of motivations are tied to changing objects and personal histories prior and during participation in the OpenOffice.org Lingucomponent project. Rather than seeing volunteers as a unified community, they can be better understood as independent entrepreneurs in search of a collaborative community . The boundaries between work and hobby are blurred and shifting, thus questioning the usefulness of the concept of volunteer . 3) The public sector ICT discourse portrays a dilemma and tension between the freedom to choose, use and develop one s desktop in the spirit of open source on one hand and the striving for better desktop control and maintenance by IT staff and user advocates, on the other. The link between the global OpenOffice.org community and the local end-user practices are weak and mediated by the problematic IT staff-(end)user relationship. 4) Authoring community can be seen as a new hybrid open source community-type of managerial practice. The ambiguous concept of community is a powerful strategic tool for orienting towards multiple real and imaginary audiences as evidenced in the global membership rhetoric. 5) The changing and contradictory discourses of this study show a change in the conceptual system and developer-user relationship of the open development model. This change is characterized as a movement from hacker ethic and bazaar governance to more professionally and strategically regulated community. 6) Community is simultaneously real and imagined, and can be characterized as a runaway community . Discursive-action can be seen as a specific type of online open source engagement. Hierarchies and structures are created through discursive acts. Key words: Open Source Software, open development model, community, motivation, discourse, rhetoric, developer, user, end-user