11 resultados para programming language processing

em Helda - Digital Repository of University of Helsinki


Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper introduces the META-NORD project which develops Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, overall approach and specific focus lines on wordnets, terminology resources and treebanks are described. Moreover, results achieved in first five months of the project, i.e. language whitepapers, metadata specification and IPR, are presented.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs: (i) Computational complexity of grammars under limiting parameters In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchyand the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction. (ii) Linguistically applicable structural representations Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are linear time parseable. (iii) Compilation and simplification of linguistic constraints Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched. These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing. Avainsanat: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Automatisk språkprocessering har efter mer än ett halvt sekel av forskning blivit ett mycket viktigt område inom datavetenskapen. Flera vetenskapligt viktiga problem har lösts och praktiska applikationer har nått programvarumarknaden. Disambiguering av ord innebär att hitta rätt betydelse för ett mångtydigt ord. Sammanhanget, de omkringliggande orden och kunskap om ämnesområdet är faktorer som kan användas för att disambiguera ett ord. Automatisk sammanfattning innebär att förkorta en text utan att den relevanta informationen går förlorad. Relevanta meningar kan plockas ur texten, eller så kan en ny, kortare text genereras på basen av fakta i den ursprungliga texten. Avhandlingen ger en allmän översikt och kort historik av språkprocesseringen och jämför några metoder för disambiguering av ord och automatisk sammanfattning. Problemområdenas likheter och skillnader lyfts fram och metodernas ställning inom datavetenskapen belyses.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents results from a study on the production of Finnish prosody. The effect of word order and the tonal shape in the production of Finnish prosody was studied as produced by 8 native Finnish speakers. Predictions formulated with regard to results from an earlier study pertaining to the perception of promi- nence were tested. These predictions had to do with the tonal shape of the utterances in the form of a flat hat pattern and the effect of word order on the so called top-line declination within an adver- bial phrase in the utterances. The results from the experiment give support to the following claims: the temporal domain of prosodic focus is the whole utterance, word order reversal from unmarked to marked has an effect on the production of prosody, and the pro- duction of the tonal aspects of focus in Finnish follows a basic flat hat pattern. That is the prominence of a word can be produced by an f 0 rise or a fall, depending on the location of the word in an utterance. The basic accentual shape of a Finnish word is then not a pointed rise/fall hat shape as claimed before since it can vary depending on the syllable structure and the position within an ut- terance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Research on reading has been successful in revealing how attention guides eye movements when people read single sentences or text paragraphs in simplified and strictly controlled experimental conditions. However, less is known about reading processes in more naturalistic and applied settings, such as reading Web pages. This thesis investigates online reading processes by recording participants eye movements. The thesis consists of four experimental studies that examine how location of stimuli presented outside the currently fixated region (Study I and III), text format (Study II), animation and abrupt onset of online advertisements (Study III), and phase of an online information search task (Study IV) affect written language processing. Furthermore, the studies investigate how the goal of the reading task affects attention allocation during reading by comparing reading for comprehension with free browsing, and by varying the difficulty of an information search task. The results show that text format affects the reading process, that is, vertical text (word/line) is read at a slower rate than a standard horizontal text, and the mean fixation durations are longer for vertical text than for horizontal text. Furthermore, animated online ads and abrupt ad onsets capture online readers attention and direct their gaze toward the ads, and distract the reading process. Compared to a reading-for-comprehension task, online ads are attended to more in a free browsing task. Moreover, in both tasks abrupt ad onsets result in rather immediate fixations toward the ads. This effect is enhanced when the ad is presented in the proximity of the text being read. In addition, the reading processes vary when Web users proceed in online information search tasks, for example when they are searching for a specific keyword, looking for an answer to a question, or trying to find a subjectively most interesting topic. A scanning type of behavior is typical at the beginning of the tasks, after which participants tend to switch to a more careful reading state before finishing the tasks in the states referred to as decision states. Furthermore, the results also provided evidence that left-to-right readers extract more parafoveal information to the right of the fixated word than to the left, suggesting that learning biases attentional orienting towards the reading direction.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Goals. Specific language impairment (SLI) has a negative impact on child s speech and language development and interaction. Disorder may be associated with a wide range of comorbid problems. In clinical speech therapy it is important to see the child as a whole so that the rehabilitation can be targeted properly. The aim of this study was to describe the linguistic-cognitive and comorbid symptoms of children with SLI at the age of five, as well as to provide an overwiew of the developmental disorders in the families. The study is part of a larger research project, which will examine paths of development and quality of life of children with SLI as young adults. Methods. The data consisted of patient documents of 100 5-year old children, who were examined in Lastenlinna mainly at 1998. Majority of the subjects were boys, and children s primary diagnosis was either F80.1 or F80.2, which was most common, or both. The diagnosis and the information about the linguistic-cognitive status and comorbid symptoms were collected from reports of medical doctors and experts of other fields, as well as mentions related to familiality. Linguistic-cognitive symptoms were divided into subclasses of speech motor functions, prosessing of language, comprehension of language and use of language. Comorbid symptoms were divided into subclasses of interaction, activity and attention, emotional and behavior problems and neurologic problems. Statistical analyses were based mainly on Pearson s Chi Square test. Results and conclusions. Problems in language processing and speech motor functions were most common of the linguistic-cognitive symptoms. Most of the children had symptoms from two or three symptom classes, and it seemed that girls had more symptoms than boys. Usually children did not have any comorbid symptoms, or had them from one or three symptom classes. Of the comorbid symptoms the most prevalent ones were problems in activity and attention and neurological symptoms, which consisted mostly of motoric and visuomotoric symptoms. The most common of the comorbid diagnoses was F82, specific developmental disorder of motor function. According to literature children with SLI may have problems in mental health, but the results of this study did not confirm that. Children with diagnosis F80.2 had more linguistic-cognitive and comorbid symptoms than children with diagnosis F80.1. The cluster analyses based on all the symtoms revealed four subgroups of the subjects. Of the subjects 85 percent had a positive family history of developmental disorders, and the most prevalent problem in the families was delayed speech development. This study outlined the symptom profile of children with SLI and laid a foundation for the future longitudinal study. The results suggested that there are differences between linguistic-cognitive symptoms of boys and girls, which is important to notice especially when assessing and diagnosing children with SLI.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

FinnWordNet is a wordnet for Finnish that complies with the format of the Princeton WordNet (PWN) (Fellbaum, 1998). It was built by translating the PrincetonWordNet 3.0 synsets into Finnish by human translators. It is open source and contains 117000 synsets. The Finnish translations were inserted into the PWN structure resulting in a bilingual lexical database. In natural language processing (NLP), wordnets have been used for infusing computers with semantic knowledge assuming that humans already have a sufficient amount of this knowledge. In this paper we present a case study of using wordnets as an electronic dictionary. We tested whether native Finnish speakers benefit from using a wordnet while completing English sentence completion tasks. We found that using either an English wordnet or a bilingual English Finnish wordnet significantly improves performance in the task. This should be taken into account when setting standards and comparing human and computer performance on these tasks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.

Relevância:

80.00% 80.00%

Publicador:

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.