17 resultados para Robust Probabilistic Model, Dyslexic Users, Rewriting, Question-Answering

em Helda - Digital Repository of University of Helsinki


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering systems are an important research problem because as the amount of natural language text in digital format grows all the time, the need for novel methods for pinpointing important knowledge from the vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is developed also. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of patterns and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The the new methods developed in this thesis require no manual creation of answer extraction patterns. As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is concerned with using the bootstrap to obtain improved critical values for the error correction model (ECM) cointegration test in dynamic models. In the paper we investigate the effects of dynamic specification on the size and power of the ECM cointegration test with bootstrap critical values. The results from a Monte Carlo study show that the size of the bootstrap ECM cointegration test is close to the nominal significance level. We find that overspecification of the lag length results in a loss of power. Underspecification of the lag length results in size distortion. The performance of the bootstrap ECM cointegration test deteriorates if the correct lag length is not used in the ECM. The bootstrap ECM cointegration test is therefore not robust to model misspecification.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Atmospheric particles affect the radiation balance of the Earth and thus the climate. New particle formation from nucleation has been observed in diverse atmospheric conditions but the actual formation path is still unknown. The prevailing conditions can be exploited to evaluate proposed formation mechanisms. This study aims to improve our understanding of new particle formation from the view of atmospheric conditions. The role of atmospheric conditions on particle formation was studied by atmospheric measurements, theoretical model simulations and simulations based on observations. Two separate column models were further developed for aerosol and chemical simulations. Model simulations allowed us to expand the study from local conditions to varying conditions in the atmospheric boundary layer, while the long-term measurements described especially characteristic mean conditions associated with new particle formation. The observations show statistically significant difference in meteorological and back-ground aerosol conditions between observed event and non-event days. New particle formation above boreal forest is associated with strong convective activity, low humidity and low condensation sink. The probability of a particle formation event is predicted by an equation formulated for upper boundary layer conditions. The model simulations call into question if kinetic sulphuric acid induced nucleation is the primary particle formation mechanism in the presence of organic vapours. Simultaneously the simulations show that ignoring spatial and temporal variation in new particle formation studies may lead to faulty conclusions. On the other hand, the theoretical simulations indicate that short-scale variations in temperature and humidity unlikely have a significant effect on mean binary water sulphuric acid nucleation rate. The study emphasizes the significance of mixing and fluxes in particle formation studies, especially in the atmospheric boundary layer. The further developed models allow extensive aerosol physical and chemical studies in the future.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Advancements in the analysis techniques have led to a rapid accumulation of biological data in databases. Such data often are in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data give promises of answering various biologically relevant questions in more detail than what has been possible before. For example, one may wish to identify areas in an amino acid sequence, which are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet with the challenge of deriving meaning from the increasing amounts of data. Our main concern is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets provide also a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The ProFacil model is a generic process model defined as a framework model showing the links between the facilities management process and the building end user’s business process. The purpose of using the model is to support more detailed process modelling. The model has been developed using the IDEF0 modelling method. The ProFacil model describes business activities from the generalized point of view as management-, support-, and core processes and their relations. The model defines basic activities in the provision of a facility. Examples of these activities are “operate facilities”, “provide new facilities”, “provide re-build facilities”, “provide maintained facilities” and “perform dispose of facilities”. These are all generic activities providing a basis for a further specialisation of company specific FM activities and their tasks. A facilitator can establish a specialized process model using the ProFacil model and interacting with company experts to describe their company’s specific processes. These modelling seminars or interviews will be done in an informal way, supported by the high-level process model as a common reference.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The research question of this thesis was how knowledge can be managed with information systems. Information systems can support but not replace knowledge management. Systems can mainly store epistemic organisational knowledge included in content, and process data and information. Certain value can be achieved by adding communication technology to systems. All communication, however, can not be managed. A new layer between communication and manageable information was named as knowformation. Knowledge management literature was surveyed, together with information species from philosophy, physics, communication theory, and information system science. Positivism, post-positivism, and critical theory were studied, but knowformation in extended organisational memory seemed to be socially constructed. A memory management model of an extended enterprise (M3.exe) and knowformation concept were findings from iterative case studies, covering data, information and knowledge management systems. The cases varied from groups towards extended organisation. Systems were investigated, and administrators, users (knowledge workers) and managers interviewed. The model building required alternative sets of data, information and knowledge, instead of using the traditional pyramid. Also the explicit-tacit dichotomy was reconsidered. As human knowledge is the final aim of all data and information in the systems, the distinction between management of information vs. management of people was harmonised. Information systems were classified as the core of organisational memory. The content of the systems is in practice between communication and presentation. Firstly, the epistemic criterion of knowledge is not required neither in the knowledge management literature, nor from the content of the systems. Secondly, systems deal mostly with containers, and the knowledge management literature with applied knowledge. Also the construction of reality based on the system content and communication supports the knowformation concept. Knowformation belongs to memory management model of an extended enterprise (M3.exe) that is divided into horizontal and vertical key dimensions. Vertically, processes deal with content that can be managed, whereas communication can be supported, mainly by infrastructure. Horizontally, the right hand side of the model contains systems, and the left hand side content, which should be independent from each other. A strategy based on the model was defined.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A Priviledged Gender? The Question of Authority in the Feminist Theology of Elisabeth Schüssler Fiorenza. Elisbeth Schüssler Fiorenza (b. 1938) is one of the pioneers of Christian feminist theology. The aim of this study is to analyze what she understands by the authority of women as (re)interpreters of the Bible and the Christian tradition. The research method is conceptual analysis, and the sources consist of Schüssler Fiorenza s key writings from 1975 to 2006. The starting point of the study is Schüssler Fiorenza s definition of the task of feminist theology as claiming women s intellectual-religious authority. It is assumed that the issue of authority offers an angle from which Schüssler Fiorenza s feminist theology can be understood as a whole. It is also supposed that the notion of authority opens up a perspective on the way Schüssler Fiorenza dialogues with non-theological feminist theory in her writings. The analysis is first directed to five key concepts of Schüssler Fiorenza s work: authority, patriarchy, androcentrism, gender and women-church, i.e. the ekklesia of wo/men. Special attention is given to her gender-theoretical considerations and her neologism wo/men, by which she refers to women and marginalized men. The aim of this conceptual analysis is to clarify her thought on the subjects of feminist theology. In addition, Schüssler Fiorenza s dialogue with other feminist scholars is evaluated. It is argued that eclecticism characterizes her way of treating non-theological feminist theory. Rewriting early Christian history from a feminist perspective is at the core of Schüssler Fiorenza s scholarship. From her early writings on, she argues for women s authority to define the Christian religion, past and present. In the 1990s Schüssler Fiorenza s theoretical background is feminist standpoint epistemology, and she represents feminist women as an epistemologically priviledged group. Later she claims to defend the epistemic authority of all those wo/men women and men who want to produce emancipatory knowledge. The analysis of Schüssler Fiorenza s work on feminist biblical interpretation shows that her stated aim to regard both women and men as subjects of feminist theology is not realized in the actual descriptions of her hermeneutical model. In fact, Schüssler Fiorenza argues for the authority of feminist women to interpret the Bible in their own interests . Thus in her work, women seem to figure as representatives of the priviledged gender in the field of biblical and theological knowledge.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of this dissertation is to provide conceptual tools for the social scientist for clarifying, evaluating and comparing explanations of social phenomena based on formal mathematical models. The focus is on relatively simple theoretical models and simulations, not statistical models. These studies apply a theory of explanation according to which explanation is about tracing objective relations of dependence, knowledge of which enables answers to contrastive why and how-questions. This theory is developed further by delineating criteria for evaluating competing explanations and by applying the theory to social scientific modelling practices and to the key concepts of equilibrium and mechanism. The dissertation is comprised of an introductory essay and six published original research articles. The main theses about model-based explanations in the social sciences argued for in the articles are the following. 1) The concept of explanatory power, often used to argue for the superiority of one explanation over another, compasses five dimensions which are partially independent and involve some systematic trade-offs. 2) All equilibrium explanations do not causally explain the obtaining of the end equilibrium state with the multiple possible initial states. Instead, they often constitutively explain the macro property of the system with the micro properties of the parts (together with their organization). 3) There is an important ambivalence in the concept mechanism used in many model-based explanations and this difference corresponds to a difference between two alternative research heuristics. 4) Whether unrealistic assumptions in a model (such as a rational choice model) are detrimental to an explanation provided by the model depends on whether the representation of the explanatory dependency in the model is itself dependent on the particular unrealistic assumptions. Thus evaluating whether a literally false assumption in a model is problematic requires specifying exactly what is supposed to be explained and by what. 5) The question of whether an explanatory relationship depends on particular false assumptions can be explored with the process of derivational robustness analysis and the importance of robustness analysis accounts for some of the puzzling features of the tradition of model-building in economics. 6) The fact that economists have been relatively reluctant to use true agent-based simulations to formulate explanations can partially be explained by the specific ideal of scientific understanding implicit in the practise of orthodox economics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research has its background on Knowledge Practices Laboratory (KP-Lab) research project. One of the aims of KP-Lab is to create virtual and technological tools to support the interventionist who use the Change Laboratory method as a developmental tool. In this research I studied the interventionists' mirror material practices which are context and theory bound and for that reason they pose challenges on the development of new tools. I focused on the gathering and working on the mirror material. The purpose of this study was to find out what kind of user knowledge does the research on narratives of mirror material practices in Change Laboratory provide to the developers of virtual tools? I answered this question by three sub questions: 1. What kind of mirror material the interventionists gather and analyze in different phases of developmental cycle of Change Laboratory project? 2. What kind of knowledge the different mirror materials contain and how is the knowledge transformed when the mirror material is analyzed and worked on? 3. What kind of tools the interventionists use and create when gathering and working on the mirror material? What wishes do the interventionists have on tools? I interviewed five interventionists in four different projects. I created narratives from the document supported interviews. Then I analyzed the narratives in three steps: first I placed the mirror materials in the developmental cycle. Secondly, I analyzed the mirror materials by placing them in a table by the form of the knowledge. Thirdly, I examined the tools the interventionists had used and created and what wishes they had on virtual tools. This research showed that different user groups of Change Laboratory method have different needs. All interventionists transform knowledge from one form to another so they seem to need especially tools by which they can analyze and transform knowledge. It seems that standardized model of gathering and analysing mirror material is not meaningful because mirror material is constructed in accordance with the object developed. This research also shows that the mirror material has a social function. This finding should be also noted when developing virtual tools together with actual users.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. Increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it needs experience to be conducted properly, it is labour intensive as an on-farm method and the results are subjective. A four balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up in the University of Helsinki Research farm Suitia. The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking was calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data was divided in two parts and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validating dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Due to computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms rely on asymptotic analysis and with the first two model classes we also discuss how to compute exact rational number solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d etre for the visual system. Following the hypothesis that the optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis we construct a variety of sta- tistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex such as simple and complex cells. We show that by representing images with phase invariant, complex cell-like units, a better statistical description of the vi- sual environment is obtained than with linear simple cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the nat- ural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally we show how models like those pre- sented here can be extended to capture whole visual scenes rather than just small image patches. By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.