8 results for textual data
in Helda - Digital Repository of University of Helsinki
Abstract:
The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as the task of learning regularities or patterns in data in order to extract useful information or to learn the underlying concept. An important sub-field of machine learning is multi-view learning, where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such a scenario would be to study a biological concept using several biological measurements such as gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning to novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications, such as matching of metabolites between humans and mice and matching of sentences between documents in two languages.
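As a rough illustration of what inferring a one-to-one correspondence between two views involves, the sketch below treats matching as an assignment problem. It is not the algorithm of the thesis, whose details the abstract does not give. It assumes a precomputed cost matrix (a hypothetical `cost[i][j]` holding the dissimilarity between sample i of the first view and sample j of the second), and the function name `best_matching` is illustrative only. The exhaustive search is practical only for a handful of samples.

```python
from itertools import permutations


def best_matching(cost):
    """Find the one-to-one matching with the lowest total cost.

    cost[i][j] is an assumed, precomputed dissimilarity between sample i
    in view A and sample j in view B (square matrix).
    Returns (perm, total), where perm[i] is the view-B index matched to
    view-A sample i.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    # Brute force: try every possible one-to-one assignment.
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Toy example with three samples per view (values are made up).
toy_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
matching, total_cost = best_matching(toy_cost)
```

For realistic sample counts, a polynomial-time assignment solver would replace the brute-force loop. The harder part, which the thesis addresses, is obtaining a meaningful notion of cost when the two views are not directly comparable.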
Abstract:
Purpose: This study examined craft from the standpoint of phenomenological philosophy, interpreting it through Maurice Merleau-Ponty’s phenomenology of the body. The main focus was the physical phase of the craft process, in which a product is made from material. The aim was to interpret corporality in craft. No previous research in craft science has focused on the lived body. Physical, bodily making is inalienable in craft, but it is not articulated. Recent discussion has focused on craft as a “whole”, which emphasizes the design part of the process, so that craft becomes conceptualized through theories of art and design. The axiomatic yet silenced basis of craft, corporality, deserves to be examined as well. This study therefore answers two questions: how does craft manifest itself in the light of the phenomenology of the body, and what is corporality in craft? Methods: In this study I cultivated a phenomenological attitude and turned my exploring eye on craft “in itself”. In addition, I restrained myself from mere making and placed myself in the position of observing the occurrence of craft in order to describe it verbally. I studied Maurice Merleau-Ponty’s phenomenology of the body through his principal work (2002) and earlier interpretations of it. The interpretation and understanding of textual data were based on Gadamer’s hermeneutics, and the four-part composition of the study followed Koski’s (1995) version of the Gadamerian process of textual interpretation. Conclusions: In the construction of bodily phenomenology, craft was to be contemplated as a mutual relationship between the maker and the world, materializing in bodily making. At the moment of making, a human being becomes one with his craft, and the connection between the maker, the material and the equipment appears as communication. The operational dimension was distinctive in the intentionality of craft, which operates in many ways, including in craft products. 
The synesthesia and synergy of craft were emphasized, and craft as bodily practice came to life through them. The moment of making appeared as a situation generating time and space, where throwing oneself into making may give the maker an experience of rising above the dualism of mind and body. The conception of the implicit nature of craft knowledge was strengthened. In the light of the interpretation it was possible to conceptualize craft as a performance and making “in itself” as a work of art. In that case craft appeared as bodily expression, which as an experience approaches art without, after all, being art. The concept of the aesthetic was located in making as well. The bodily and phenomenological viewpoint on craft gave material for critically contemplating the concept of “whole craft” (kokonainen käsityö) and provided a different kind of understanding of craft as making.
Abstract:
In this study, which pertains to the field of social gerontology and family research, I analyse the meaning of everyday life as perceived by elderly couples living at home. I use the ethnographic approach, with the aim of interpreting meanings from the elderly people's personal point of view and increasing understanding of their way of life. The study deepens our conception of what gives purpose to the everyday life of elderly people. The number of elderly couples is growing and, to an increasing extent, a couple will live and cope together to a ripe old age. Such coping can also be viewed as an important resource for society. Ethnography tries to get close to people's life practices. I examine the day-to-day life of elderly couples based on textual data, which I obtained by visiting the homes of 16 couples in a total of five small municipalities in Southern Finland. The couples had married soon after the war or in the early 1950s. I found that the aspiration towards continuity, which unites the concepts of place and home, housework and a long marriage, is the most important notion connecting the discussion themes. The results show that in the opinion of the elderly, the concept of a good life is intertwined with a long marriage spent at home, as well as its values. Old people find that they lead an independent life if they feel that they can hold on to the key features of their way of life. Elderly couples' ability to cope with everyday life involves taking care of housework and other tasks around the home together. This means that they support one another and have common goals and aspirations. Daily tasks provide substance in the lives of elderly couples. Each day has its rhythm, and the pace of this rhythm is set by routine and habits. Satisfaction stems from being able to do something one is good at. The couples have also revised the division of housework. 
Men have learned to perform new tasks around the house when their wives can no longer manage them by themselves. Some tasks are given up. Day-to-day life at home and around the house provides room for men's participation. Mutual support and care between husband and wife can also protect them from having to resort to outside or official help. Old couples integrate their life experiences and memories, as well as present and future risks and opportunities. They wish to carry on their lives as before, and still think that their present life corresponds with their idea of a good life. Key words: elderly couples, continuity theory of aging, everyday life, social gerontology, family research
Abstract:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering is an important research problem because, as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusion of the research is that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context (plain words, part-of-speech tags, punctuation marks and capitalization patterns) can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand-crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns. 
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
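To make the idea of a similarity metric based on edit distance concrete, here is a minimal sketch that computes a token-level Levenshtein distance between two answer contexts and normalizes it into a similarity score. The abstract does not say which edit distance variant or normalization the thesis uses, so the unit costs, the length normalization, the `<ANSWER>` placeholder token and the function names are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (unit costs assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the distance to [0, 1]; dividing by the longer length is an assumed choice."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical answer contexts with the answer replaced by a placeholder token.
ctx1 = ["was", "born", "in", "<ANSWER>"]
ctx2 = ["was", "born", "on", "<ANSWER>"]
score = similarity(ctx1, ctx2)
```

A pairwise similarity of this kind is the sort of input a hierarchical clustering step could use to group similar contexts before they are aligned into patterns.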
Abstract:
This is a corpus-based study of medieval English herbals, which are texts conveying information on medicinal plants. Herbals belong to the medieval medical register. The study charts intertextual parallels within the medieval genre, and between herbals and other contemporary medical texts. It seeks to answer the questions of where and how herbal texts are linked to each other and to other medical writing. The theoretical framework of the study draws on intertextuality and genre studies, manuscript studies, corpus linguistics, and multi-dimensional text analysis. The method combines qualitative and quantitative analyses of textual material from three historical special-language corpora of Middle and Early Modern English, one of which was compiled for the purposes of this study. The text material contains over 800,000 words of medical texts. The time span of the material is from c. 1330 to 1550. Text material is retrieved from the corpora by using plant name lists as search criteria. The raw data is filtered through qualitative analysis, which produces input for the quantitative analysis, multi-dimensional scaling (MDS). In MDS, the textual space formed by parallel text passages is observed, and the observations are explained by a qualitative analysis. This study concentrates on evidence of material and structural intertextuality. The analysis shows patterns of affinity between the texts of the herbal genre, and between herbals and other texts in the medical register. Herbals are most closely linked with recipe collections and regimens of health: these account for over 95 per cent of the intertextual links between herbals and other medical writing. Links to surgical texts or to specialised medical texts are very few. This can be explained by the history of the herbal genre: as herbals carry information on medical ingredients, herbs, they are relevant for genres that are related to pharmacological therapy. 
Conversely, herbals draw material from recipe collections in order to illustrate the medicinal properties of the herbs they describe. The study points out the close relationship between medical recipes and recipe-like passages in herbals (recipe paraphrases). The examples of recipe paraphrases show that they may have been perceived as indirect instruction. Keywords: medieval herbals, early English medicine, corpus linguistics, intertextuality, manuscript studies
Abstract:
This study examines both theoretically and empirically how well the theories of Norman Holland, David Bleich, Wolfgang Iser and Stanley Fish can explain readers' interpretations of literary texts. The theoretical analysis concentrates on their views on language from the point of view of Wittgenstein's Philosophical Investigations. This analysis shows that many of the assumptions related to language in these theories are problematic. The empirical data show that readers often form very similar interpretations. Thus the study challenges the common assumption that literary interpretations tend to be idiosyncratic. The empirical data consist of freely worded written answers to questions on three short stories. The interpretations were made by 27 Finnish university students. Some of the questions addressed issues that were discussed in large parts of the texts, some referred to issues that were mentioned only in passing or implied. The short stories were "The Witch à la Mode" by D. H. Lawrence, "Rain in the Heart" by Peter Taylor and "The Hitchhiking Game" by Milan Kundera. According to Fish, readers create both the formal features of a text and their interpretation of it according to an interpretive strategy. People who agree form an interpretive community. However, a typical answer usually contains ideas repeated by several readers as well as observations not mentioned by anyone else. Therefore it is very difficult to determine which readers belong to the same interpretive community. Moreover, readers with opposing opinions often seem to pay attention to the same textual features and even acknowledge the possibility of an opposing interpretation; therefore they do not seem to create the formal features of the text in different ways. Iser suggests that an interpretation emerges from the interaction between the text and the reader when the reader determines the implications of the text and in this way fills the "gaps" in the text. 
Iser believes that the text guides the reader, but as he also believes that meaning is on a level beyond words, he cannot explain how the text directs the reader. The similarity in the interpretations and the fact that the agreement is strongest when related to issues that are discussed broadly in the text do, however, support his assumption that readers are guided by the text. In Bleich's view, all interpretations have personal motives and each person has an idiosyncratic language system. The situation where a person learns a word determines the most important meaning it has for that person. In order to uncover the personal etymologies of words, Bleich asks his readers to associate freely on the basis of a text and note down all the personal memories and feelings that the reading experience evokes. Bleich's theory of the idiosyncratic language system seems to rely on a misconceived notion of the role that ostensive definitions have in language use. The readers' responses show that spontaneous associations to personal life seem to colour the readers' interpretations, but such instances are rather rare. According to Holland, an interpretation reflects the reader's identity theme. Language use is regulated by shared rules, but everyone follows the rules in his or her own way. Words mean different things to different people. The problem with this view is that if there is any basis for language use, it seems to be the shared way of following linguistic rules. Wittgenstein suggests that our understanding of words is related to the shared ways of using words and our understanding of human behaviour. This view seems to give better grounds for understanding similarity and differences in literary interpretations than the theories of Holland, Bleich, Fish and Iser.
Abstract:
In this paper, I look into a grammatical phenomenon found among speakers of the Cambridgeshire dialect of English. According to my hypothesis, the phenomenon is a new entry into the past BE verb paradigm in the English language. I claim that the structure I have found complements the two existing verb forms, was and were, with a third verb form that I have labelled ‘intermediate past BE’. The paper is divided into two parts. In the first section, I introduce the theoretical ground for the study of variation, which is founded on empiricist principles. In variationist linguistics, the main claim is that heterogeneous language use is structured and ordered. Over the last 50 years of modern linguistics, this claim has been controversial. In the 1960s, the generativist movement spearheaded by Noam Chomsky diverted attention away from grammatical theories that are based on empirical observations. The generativists steered away from language diversity, variation and change in favour of generalisations, abstractions and universalist claims. The theoretical part of my paper goes through the main points of the variationist agenda and concludes that abandoning the concept of language variation in linguistics is harmful for both theory and methodology. In the method part of the paper, I present the Helsinki Archive of Regional English Speech (HARES) corpus. It is an audio archive that contains interviews conducted in England in the 1970s and 1980s. The interviews were done in accordance with methods generally used in traditional dialectology. The informants are mostly elderly men who have lived in the same region throughout their lives and who left school at an early age. The interviews are actually conversations: the interviewer allowed the informant to pick the topic of conversation to induce a maximally relaxed and comfortable atmosphere and thus allow the most natural dialect variant to emerge in the informant’s speech. 
The corpus chapter of the paper introduces some of the transcription and annotation problems associated with spoken language corpora (especially those containing dialectal speech). Questions surrounding the concept of variation are present in this part of the paper too, since transcription work in particular is troubled by the fundamental problem of having to describe the fluctuations of everyday speech in text. In the empirical section of the paper, I use HARES to analyse the speech of four informants, with special focus on the emergence of the intermediate past BE variant. My observations and the subsequent analysis permit me to claim that my hypothesis seems to hold. The intermediate variant occupies almost all contexts where one would expect was or were in the informants’ speech. This means that the new variant is integrated into the speakers’ grammars and exemplifies the kind of variation that is at the heart of this paper.