23 results for Word Sense Disambiguation, WSD, Natural Language Processing

in Helda - Digital Repository of the University of Helsinki


Relevance:

100.00%

Abstract:

The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: Firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publication 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
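
The idea that words with similar contexts tend to have similar meanings can be illustrated with a minimal sketch. The code below is not the method of the thesis; it assumes a toy corpus, a fixed window of two tokens, and plain co-occurrence counts compared by cosine similarity. The names `context_vectors` and `cosine` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (an assumption for illustration, not data from the thesis).
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the ball".split(),
]

vecs = context_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])    # words used in similar contexts
sim_cat_milk = cosine(vecs["cat"], vecs["milk"])  # words used in different contexts
```

In this toy setting "cat" and "dog" share most of their contexts and therefore score higher than "cat" and "milk". How contexts are best represented is exactly the open question the abstract raises, so the window size and raw counts here should be read as placeholders.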

Relevance:

100.00%

Abstract:

After more than half a century of research, natural language processing has become a very important field within computer science. Several scientifically important problems have been solved, and practical applications have reached the software market. Word sense disambiguation means finding the correct meaning of an ambiguous word. The context, the surrounding words, and knowledge of the subject domain are factors that can be used to disambiguate a word. Automatic summarization means shortening a text without losing the relevant information. Relevant sentences can be extracted from the text, or a new, shorter text can be generated on the basis of facts in the original text. The thesis gives a general overview and a brief history of language processing and compares some methods for word sense disambiguation and automatic summarization. The similarities and differences between the two problem areas are highlighted, and the position of the methods within computer science is examined.
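
As an illustration of how surrounding words can be used to disambiguate a word, the following is a minimal sketch of a simplified Lesk-style overlap heuristic. It is one classic approach and not necessarily among the methods compared in the thesis. The sense labels, glosses, and stopword list are hypothetical and exist only for this example.

```python
def simplified_lesk(context_tokens, sense_glosses, stopwords=frozenset()):
    """Return the sense whose gloss shares the most words with the context."""
    context = {t.lower() for t in context_tokens} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(context & gloss_words)
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical two-sense inventory for the ambiguous word "bank".
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or lake",
}
STOP = frozenset({"a", "an", "the", "that", "and", "or", "of", "to", "from"})

sense_1 = simplified_lesk(
    "she went to the bank to deposit money".split(), BANK_SENSES, STOP
)
sense_2 = simplified_lesk(
    "they fished from the bank of the river".split(), BANK_SENSES, STOP
)
```

The heuristic relies on exact word overlap, so "deposit" does not match "deposits" in the gloss; a practical system would at least add lemmatization and a real sense inventory.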

Relevance:

100.00%

Abstract:

This paper introduces the META-NORD project, which develops the Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, its overall approach and its specific focus lines on wordnets, terminology resources and treebanks are described. Moreover, results achieved in the first five months of the project, i.e. language whitepapers, metadata specification and IPR, are presented.

Relevance:

100.00%

Abstract:

This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs:
(i) Computational complexity of grammars under limiting parameters. In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchy and the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction.
(ii) Linguistically applicable structural representations. Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right-branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are parsable in linear time.
(iii) Compilation and simplification of linguistic constraints. Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched.
These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing. Keywords: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars
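
The core operation behind an intersection grammar, keeping only the strings that every constraint automaton accepts, can be sketched with the standard product construction for deterministic automata. This is a simplified illustration under stated assumptions: the `DFA` class, the tag alphabet (DET, NOUN, VERB) and both constraints are hypothetical, and the FSIGs studied in the dissertation work on annotated surface strings with far richer constraints.

```python
class DFA:
    """Partial deterministic automaton; a missing transition means rejection."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = dict(transitions)  # (state, symbol) -> state

    def accepts(self, symbols):
        state = self.start
        for symbol in symbols:
            state = self.transitions.get((state, symbol))
            if state is None:
                return False
        return state in self.accepting


def intersect(a, b):
    """Product construction restricted to reachable state pairs."""
    start = (a.start, b.start)
    transitions, accepting = {}, set()
    stack, seen = [start], {start}
    while stack:
        p, q = stack.pop()
        if p in a.accepting and q in b.accepting:
            accepting.add((p, q))
        for (state, symbol), p_next in a.transitions.items():
            if state != p:
                continue
            q_next = b.transitions.get((q, symbol))
            if q_next is None:
                continue  # b blocks this symbol, so the product does too
            pair = (p_next, q_next)
            transitions[((p, q), symbol)] = pair
            if pair not in seen:
                seen.add(pair)
                stack.append(pair)
    return DFA(start, accepting, transitions)


# Hypothetical constraint 1: every DET is immediately followed by a NOUN.
det_noun = DFA(0, {0}, {
    (0, "DET"): 1, (0, "NOUN"): 0, (0, "VERB"): 0,
    (1, "NOUN"): 0,
})
# Hypothetical constraint 2: the tag sequence contains exactly one VERB.
one_verb = DFA(0, {1}, {
    (0, "DET"): 0, (0, "NOUN"): 0, (0, "VERB"): 1,
    (1, "DET"): 1, (1, "NOUN"): 1,
})

grammar = intersect(det_noun, one_verb)
ok = grammar.accepts(["DET", "NOUN", "VERB", "DET", "NOUN"])
bad_order = grammar.accepts(["DET", "VERB", "NOUN"])
no_verb = grammar.accepts(["NOUN", "NOUN"])
```

Each constraint stays small and local, and the combined grammar is obtained only by intersection, which is the design idea the term "intersection grammar" refers to.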

Relevance:

100.00%

Abstract:

FinnWordNet is a wordnet for Finnish that complies with the format of the Princeton WordNet (PWN) (Fellbaum, 1998). It was built by having human translators translate the Princeton WordNet 3.0 synsets into Finnish. It is open source and contains 117,000 synsets. The Finnish translations were inserted into the PWN structure, resulting in a bilingual lexical database. In natural language processing (NLP), wordnets have been used for infusing computers with semantic knowledge, assuming that humans already have a sufficient amount of this knowledge. In this paper we present a case study of using wordnets as an electronic dictionary. We tested whether native Finnish speakers benefit from using a wordnet while completing English sentence completion tasks. We found that using either an English wordnet or a bilingual English-Finnish wordnet significantly improves performance in the task. This should be taken into account when setting standards and comparing human and computer performance on these tasks.

Relevance:

100.00%

Abstract:

We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
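
The interplay of a weighted lexicon and a bigram model can be sketched as a shortest-path search in which weights are negative log-probabilities, so they add along a path and the lowest total wins. The code below is a plain Viterbi search and not the authors' transducer implementation; every weight, the `unseen` penalty, and the function name `viterbi` are assumptions made for illustration. The guesser for unknown words and the use of lemmas in the bigram model are left out.

```python
def viterbi(words, lexicon, bigram, start="<s>", unseen=10.0):
    """Return (total weight, tag sequence) of the minimum-weight tagging.

    lexicon: word -> {tag: weight}; bigram: (previous tag, tag) -> weight.
    Weights behave like negative log-probabilities: lower is better.
    """
    # best[tag] = (weight, path) of the cheapest analysis ending in `tag`
    best = {start: (0.0, [])}
    for word in words:
        step = {}
        for tag, lexical_weight in lexicon[word].items():
            candidates = [
                (weight + bigram.get((prev, tag), unseen) + lexical_weight,
                 path + [tag])
                for prev, (weight, path) in best.items()
            ]
            step[tag] = min(candidates, key=lambda c: c[0])
        best = step
    return min(best.values(), key=lambda c: c[0])


# Hypothetical weights; "can" is lexically more likely a VERB than a NOUN.
LEXICON = {
    "the": {"DET": 0.1},
    "can": {"NOUN": 1.2, "VERB": 0.4},
    "rusts": {"VERB": 0.3, "NOUN": 2.0},
}
BIGRAM = {
    ("<s>", "DET"): 0.5,
    ("DET", "NOUN"): 0.3, ("DET", "VERB"): 4.0,
    ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 1.5,
    ("VERB", "VERB"): 2.5, ("VERB", "NOUN"): 1.0,
}

weight, tags = viterbi(["the", "can", "rusts"], LEXICON, BIGRAM)
```

In this toy example the bigram weights override the lexical preference for "can" as a verb, because a verb directly after a determiner is heavily penalized.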

Relevance:

100.00%

Abstract:

Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.

Relevance:

100.00%

Abstract:

This paper presents results from a study on the production of Finnish prosody. The effects of word order and tonal shape on prosody were studied in utterances produced by 8 native Finnish speakers. Predictions formulated with regard to results from an earlier study pertaining to the perception of prominence were tested. These predictions had to do with the tonal shape of the utterances in the form of a flat hat pattern and the effect of word order on the so-called top-line declination within an adverbial phrase in the utterances. The results from the experiment give support to the following claims: the temporal domain of prosodic focus is the whole utterance, word order reversal from unmarked to marked has an effect on the production of prosody, and the production of the tonal aspects of focus in Finnish follows a basic flat hat pattern. That is, the prominence of a word can be produced by an f0 rise or a fall, depending on the location of the word in an utterance. The basic accentual shape of a Finnish word is then not a pointed rise/fall hat shape as claimed before, since it can vary depending on the syllable structure and the position within an utterance.

Relevance:

100.00%

Abstract:

This research is based on the problems in secondary school algebra I have noticed in my own work as a teacher of mathematics. Algebra does not touch the pupil; it remains knowledge that is not used or tested. Furthermore, the performance level in algebra is quite low. This study presents a model for 7th grade algebra instruction in order to make algebra more natural and useful to students. I refer to the instruction model as Idea-based Algebra (IDEAA). The basic ideas of the IDEAA model are 1) to combine children's own informal mathematics with scientific mathematics ("math math") and 2) to structure algebra content as a "map of big ideas", not as a traditional sequence of powers, polynomials, equations, and word problems. This research project is a kind of design process or design research. As such, the project has three intertwined goals: research, design and pedagogical practice. I also assume three roles. As a researcher, I want to learn about learning and school algebra, its problems and possibilities. As a designer, I use research in the intervention to develop a shared artefact, the instruction model. In addition, I want to improve the practice through intervention and research. Design research like this is quite challenging. Its goals and means are intertwined and change during the research process. Theory emerges from the inquiry; it is not given a priori. The aim to improve instruction is normative, as one should take into account what "good" means in school algebra. An important part of my study is to work out these paradigmatic questions. The result of the study is threefold. The main result is the instruction model designed in the study. The second result is the theory developed about teaching, learning and algebra. The third result is knowledge of the design process.
The instruction model (IDEAA) is connected to four main features of good algebra education: 1) the situationality of learning, 2) learning as knowledge building, in which natural language and intuitive thinking work as "intermediaries", 3) the emergence and diversity of algebra, and 4) the development of high performance skills at any stage of instruction.

Relevance:

100.00%

Abstract:

Research on reading has been successful in revealing how attention guides eye movements when people read single sentences or text paragraphs in simplified and strictly controlled experimental conditions. However, less is known about reading processes in more naturalistic and applied settings, such as reading Web pages. This thesis investigates online reading processes by recording participants' eye movements. The thesis consists of four experimental studies that examine how location of stimuli presented outside the currently fixated region (Study I and III), text format (Study II), animation and abrupt onset of online advertisements (Study III), and phase of an online information search task (Study IV) affect written language processing. Furthermore, the studies investigate how the goal of the reading task affects attention allocation during reading by comparing reading for comprehension with free browsing, and by varying the difficulty of an information search task. The results show that text format affects the reading process: vertical text (word/line) is read at a slower rate than standard horizontal text, and the mean fixation durations are longer for vertical text than for horizontal text. Furthermore, animated online ads and abrupt ad onsets capture online readers' attention, direct their gaze toward the ads, and disrupt the reading process. Compared to a reading-for-comprehension task, online ads are attended to more in a free browsing task. Moreover, in both tasks abrupt ad onsets result in rather immediate fixations toward the ads. This effect is enhanced when the ad is presented in the proximity of the text being read. In addition, the reading processes vary when Web users proceed in online information search tasks, for example when they are searching for a specific keyword, looking for an answer to a question, or trying to find a subjectively most interesting topic.
A scanning type of behavior is typical at the beginning of the tasks, after which participants tend to switch to a more careful reading state before finishing the tasks in the states referred to as decision states. Furthermore, the results also provided evidence that left-to-right readers extract more parafoveal information to the right of the fixated word than to the left, suggesting that learning biases attentional orienting towards the reading direction.

Relevance:

100.00%

Abstract:

This research deals with direct speech quotations in magazine articles through two questions. As my major research question, I study the functions of speech quotations based on data consisting of six literary-journalistic magazine articles. My minor research question builds on the fact that there is no absolute relation between the sound waves of the spoken language and the graphemes of the written one. Hence, I study the general thoughts on how utterances should be arranged in the written form, based on a large review of literature and textbooks on journalistic writing as well as interviews I have conducted with magazine writers and editors, and the Council of Mass Media in Finland. To support my main research questions, I also examine the reference system of the Finnish language, define the aspects of the literary-journalistic article and study vernacular cues in written speech quotations.
FUNCTIONS OF QUOTATIONS. I demonstrate the results of my analysis with a six-pointed apparatus. It is a continuum which extends from the structural level of text, all the way through the explicit functions, to the implicit functions of the quotation. The explicit functions deal with the question of what the content is, whereas the implicit ones are based mainly on the question of how the content is presented.
1. The speech quotation is a distinctive element in the structure of the magazine article. Thereby it creates a rhythm for the text, such as episodes, paragraphs and clauses.
2. All stories are told through a plot, and in magazine articles, the speech quotations are one of the narrative elements that propel the plot forward.
3. The speech quotations create and intensify the location written in the story. This location can be a physical one but also a social one, in which case it describes the atmosphere and mood in the physical environment and of the story characters.
4. The quotations enhance the plausibility of the facts and assumptions presented in the article, and moreover, when a text is placed between quotation marks, the reader can be assured that the text has been reproduced in the authentic verbatim way.
5. Speech quotations tell about the speaker's unique way of using language and the first-hand experiences of the person quoted.
6. The sixth function of speech quotations is probably the most essential one: the quotations characterize the quoted speaker. In other words, in addition to the propositional content of the utterance, the way in which it has been said transmits a lot of the speaker's character (e.g. nature, generation, behaviour, education, attitudes etc.).
It is important to notice that these six functions of my speech quotation apparatus do not exclude one another. This means that every speech quotation basically includes all of the functions discussed above. However, in practice one or more of them have a principal role, while the others play a subsidiary role.
HOW TO MAKE QUOTATIONS? It is not surprising that the field of journalism (textbooks, literature and interviews) holds heterogeneous and unestablished thoughts on how the spoken language should be arranged in written quotations, which is my minor research question. However, the most frequent and distinctive aspects can be summed up in a couple of phrases: serve the reader and respect the target person. Very common advice on how to arrange the quotations is
- firstly, to delete such vernacular cues (e.g. repetitions and "expletives") that are common in spoken communication but purposeless in the written language,
- secondly, to complete the phonetic word forms of the spoken language into a more reader-friendly form (e.g. punanen → punainen, 'red'), and
- thirdly, to enhance the independence of clauses from the (authentic) context and to strengthen reciprocal links between them.
According to the knowledge of the journalistic field, utterances recorded at different points in time during an interview or a data-collecting session can be transferred as consecutive quotations or even merged together. However, if there is any temporal-spatial location written in the story, the dialogue of the story characters should also be situated in an authentic context, that is, chronologically in the right place in the continuum of the events. To summarize, the way in which the utterances should be arranged into written speech quotations is always situation-specific, and it is strongly based on the author's discretion.