10 resultados para Grammars

em Helda - Digital Repository of University of Helsinki


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs: (i) Computational complexity of grammars under limiting parameters In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchyand the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction. (ii) Linguistically applicable structural representations Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are linear time parseable. (iii) Compilation and simplification of linguistic constraints Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched. These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing. Avainsanat: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We have presented an overview of the FSIG approach and related FSIG gram- mars to issues of very low complexity and parsing strategy. We ended up with serious optimism according to which most FSIG grammars could be decom- posed in a reasonable way and then processed efficiently.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Coordination and juxtaposed sentences The object of this study is the examination of the relations between juxtaposed clauses in contemporary French. The matter in question is sentences which are composed of several clauses adjoined without a conjunction or other connector, as in: Je détournai les yeux, mon c ur se mit à battre. The aim of the study is to determine, which quality is the relation in these sentences and, on the other hand, what is the part of the coordination there. Furthermore, what is this relation of coordination, which, according to some grammars, manifests through a conjunction of coordination, but which, according to some others is marked in juxtaposed sentences through different features. The study is based on a corpus of written French from literary and journalistic text sources. Syntactic, semantic and textual properties in the clauses are discussed. The analysis points to differences so, it has been noted, in each case, if one of the clauses is affirmative and the other negative and if in the second clause, the subject has not been repeated. Also, an analysis has been made on the ground of the tense, mode, phrase structure type, and thematic structure, taking into account, in each case, if the clauses are identical or different. Punctuation has been one of the properties considered. The final aim has been to eliminate gradually, based on the partition of properties, subordinate sentences, so that only the hard core of coordinate sentences remains. In this way, the coordination could be defined similarly as the phoneme is defined as a group of distinctive features. The quantitative analyses have led to the conclusion that the sentences which, from a semantic point of view, are interpreted as coordinating, contain the least of these differences, while the sentences which can be considered as subordinating present the most of these differences. The conditions of coordination are, in that sense, hierarchical, so that the syntactic constraints have to make room for semantic, textual and cognitive factors. It is interesting to notice that everyone has the ability to produce correct coordinating structures and recognize incorrect coordinating structures. This can be explained by the human ability to categorize which has been widely researched in the semantic of prototype. The study suggests that coordination and subordination could be considered as prototypical cognitive categories based on different linguistic and pragmatic features.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This dissertation provides a synchronic grammatical description of Mauwake, a Papuan (Trans-New Guinea) language of about 2000 speakers on the North Coast of the Madang Province in Papua New Guinea. The theoretical background is that of Basic Linguistic Theory (BLT), used extensively in analysing and writing descriptive grammars. The chapters from morphology to clause level are described from form to function; in the later chapters the function is taken more often as the starting point. Any theory-specific terminology is kept to the minimum and formalisms have been avoided in accordance with BLT principles. Mauwake has a classic 5-vowel system and 14 consonant phonemes. With its simple phonology it is a typical representative of the Madang North Coast languages. For a Papuan language there are relatively few morphophonological alternations. Nouns are either alienably or inalienably possessed. There is no obligatory number marking in nouns or noun phrases. Pronouns have several different forms: five for case and three for other functions. The dative pronouns are treated as [+human] locatives, and they have also grammaticalised as possessives. The verbal morphology is agglutinative and mainly suffixal. Unusual features include two distributive suffixes, and the interaction of the derivational benefactive and the inflectional beneficiary suffixes. The applicative suffix has either transitivising or causative but not benefactive function. The switch-reference system distinguishes between simultaneous and sequential action, as well as same or different subject in relation to the following clause. There are several verbs denoting coming and going, and they may combine with one of three prefixes to indicate bringing and taking. Mauwake is a nominative-accusative type language, and the basic constituent order in a clause is SOV. Subject and object are the only syntactic arguments. There is no indirect object, but a clause can have two or even three objects. A nominalised clause with a finite verb functions as a relative clause or a complement clause; one with a nominalised verb has several different functions. Functional domains described include modality, negation, deixis, quantification, possession and comparison. As there are four negators, Mauwake has more variation in negative expressions than is usual in Papuan languages. Clause chaining is the preferred strategy for joining clauses into sentences, but coordination and subordination of finite clauses are also common. The form of a complement clause depends on whether it is of the fact, action or potential type. Tail-head linkage is used as a cohesive device between sentences. The discourse-level features described are topic and focus.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Title of the Master's thesis: Análisis de la preposición hacia y establecimiento de sus equivalentes en finés (trans. Analysis of the Spanish preposition hacia and the finding of its equivalents in Finnish) Abstracts: The aim of this Master thesis is to provide a detailed analysis of the Spanish preposition hacia from a cognitive perspective and to establish its equivalents in Finnish language. In this sense, my purpose is to demonstrate the suitability of both cognitive perspectives and Contrastive Linguistics for semantic analysis. This thesis is divided into five chapters. The first chapter includes a presentation and a critical review of the monolingual lexical processing and semantic analysis of the Spanish preposition hacia in major reference works. Through this chapter it is possible to see both the inadequacies and omissions that are present in all the given definitions. In this sense, this chapter shows that these problems are not but the upper stage of an ontological (and therefore methodological) problem in the treatment of prepositions. The second chapter covers the presentation of the methodological and theoretical perspective adopted for this thesis for the monolingual analysis and definition of the Spanish preposition hacia, following mainly the guidelines established by G. Lakoff (1987) and R. Langacker (2008) in his Cognitive grammar. Taken together, and within the same paradigm, recent analytical and methodological contributions are discussed critically for the treatment of polysemy in language (cf. Tyler ja Evans 2003). In the third chapter, and in accordance with the requirements regarding the use of empirical data from corpora, is my aim to set out a monolingual original analysis of the Spanish preposition hacia in observance of the principles and the methodology spelled out in the second chapter. The main objective of this chapter is to build a full fledged semantic representation of the polysemy of this preposition in order to understand and articulate its meanings with Finnish language (and other possible languages). The fourth chapter, in accordance with the results of chapter 3, examines and describes and establishes the corresponding equivalents in Finnish for this preposition. The results obtained in this chapter are also contrasted with the current bilingual lexicographical definitions found in the most important dictionaries and grammars. Finally, in the fifth chapter of this thesis, the results of this work are discussed critically. In this way, some observations are given regarding both the ontological and theoretical assumptions as well regarding the methodological perspective adopted. I also present some notes for the construction of a general methodology for the semantic analysis of Spanish prepositions to be carried out in further investigations. El objetivo de este trabajo, que caracterizamos como una tarea de carácter comparativo-analítico, es brindar un análisis detallado de la preposición castellana hacia desde una perspectiva cognitiva en tanto y a través del establecimiento de sus equivalentes en finés. Se procura, de esta forma, demostrar la adecuación de una perspectiva cognitiva tanto para el examen como para el establecimiento y articulación de la serie de equivalentes que una partícula, en nuestro caso una preposición, encuentra en otra lengua. De esta forma, y frente a definiciones canónicas que advierten sobre la imposibilidad de una caracterización acabada del conjunto de usos de una preposición, se observa como posible, a través de la aplicación de una metodología teórica-analítica adecuada, la construcción de una definición viable tanto en un nivel jerárquico como descriptivo. La presente tesis se encuentra dividida en cinco capítulos. El primer capítulo comprende una exposición y revisión critica del tratamiento monolingüe lexicográfico y analítico que la preposición hacia ha recibido en las principales obras de referencia, donde se observa que las inadecuaciones y omisiones presentes en la totalidad de las definiciones analizadas representan tan sólo el estadio superior de una problemática de carácter ontológico y, por tanto, metodológico, en el tratamiento de las preposiciones. El capítulo segundo comprende la presentación de la perspectiva teórica metodológica adoptada en esta tesis para el análisis y definición monolingüe de la preposición hacia, teniendo por líneas directrices las propuestas realizadas por G. Lakoff , así como a los fundamentos establecidos por R. Langacker en su propuesta cognitiva para una nueva gramática. En forma conjunta y complementaria, y dentro del mismo paradigma, empleamos, discutimos críticamente y desarrollamos diferentes aportes analítico-metodológicos para el tratamiento de la polisemia en unidades lingüísticas locativas. En el capítulo tercero, y en acuerdo con las exigencias respecto a la utilización de datos empíricos obtenidos a partir de corpus textuales, se expone un análisis original monolingüe de la preposición hacia en observancia de los principios y la metodología explicitada en el capítulo segundo, teniendo por principal objetivo la construcción de una representación semántica de la polisemia de la preposición que comprenda y articule los sentidos prototípicos para ésta especificados. El capítulo cuarto, y en acuerdo con los resultados de nuestro análisis monolingual de la preposición, se examinan, describen y establecen los equivalentes correspondientes en finés para hacia; asimismo, se contrastan en este capítulo los resultados obtenidos con las definiciones lexicográficas bilingües vigentes. Se recogen en el último y quinto capítulo de esta tesis algunas observaciones tanto respecto a los postulados ontológicos y teórico-metodológicos de la perspectiva adoptada, así como algunas notas para la construcción de una metodología general para el análisis semántico preposicional.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, I look into a grammatical phenomenon found among speakers of the Cambridgeshire dialect of English. According to my hypothesis, the phenomenon is a new entry into the past BE verb paradigm in the English language. In my paper, I claim that the structure I have found complements the existing two verb forms, was and were, with a third verb form that I have labelled ‘intermediate past BE’. The paper is divided into two parts. In the first section, I introduce the theoretical ground for the study of variation, which is founded on empiricist principles. In variationist linguistics, the main claim is that heterogeneous language use is structured and ordered. In the last 50 years of history in modern linguistics, this claim is controversial. In the 1960s, the generativist movement spearheaded by Noam Chomsky diverted attention away from grammatical theories that are based on empirical observations. The generativists steered away from language diversity, variation and change in favour of generalisations, abstractions and universalist claims. The theoretical part of my paper goes through the main points of the variationist agenda and concludes that abandoning the concept of language variation in linguistics is harmful for both theory and methodology. In the method part of the paper, I present the Helsinki Archive of Regional English Speech (HARES) corpus. It is an audio archive that contains interviews conducted in England in the 1970s and 1980s. The interviews were done in accordance to methods used generally in traditional dialectology. The informants are mostly elderly male people who have lived in the same region throughout their lives and who have left school at an early age. The interviews are actually conversations: the interviewer allowed the informant to pick the topic of conversation to induce a maximally relaxed and comfortable atmosphere and thus allow the most natural dialect variant to emerge in the informant’s speech. In the paper, the corpus chapter introduces some of the transcription and annotation problems associated with spoken language corpora (especially those containing dialectal speech). Questions surrounding the concept of variation are present in this part of the paper too, as especially transcription work is troubled by the fundamental problem of having to describe the fluctuations of everyday speech in text. In the empirical section of the paper, I use HARES to analyse the speech of four informants, with special focus on the emergence of the intermediate past BE variant. My observations and the subsequent analysis permit me to claim that my hypothesis seems to hold. The intermediate variant occupies almost all contexts where one would expect was or were in the informants’ speech. This means that the new variant is integrated into the speakers’ grammars and exemplifies the kind of variation that is at the heart of this paper.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This dissertation is a synchronic description of adnominal person in the highly synthetic morphological system of Erzya as attested in extensive Erzya-language written-text corpora consisting of nearly 140 publications with over 4.5 million words and over 285,000 unique lexical items. Insight for this description have been obtained from several source grammars in German, Russian, Erzya, Finnish, Estonian and Hungarian, as well as bounteous discussions in the understanding of the language with native speakers and grammarians 1993 2010. Introductory information includes the discussion of the status of Erzya as a lan- guage, the enumeration of phonemes generally used in the transliteration of texts and an in-depth description of adnominal morphology. The reader is then made aware of typological and Erzya-specifc work in the study of adnominal-type person. Methods of description draw upon the prerequisite information required in the development of a two-level morphological analyzer, as can be obtained in the typological description of allomorphic variation in the target language. Indication of original author or dialect background is considered important in the attestation of linguistic phenomena, such that variation might be plotted for a synchronic description of the language. The phonological description includes the establishment of a 6-vowel, 29-consonant phoneme system for use in the transliteration of annotated texts, i.e. two phonemes more than are generally recognized, and numerous rules governing allophonic variation in the language. Erzya adnominal morphology is demonstrated to have a three-way split in stem types and a three-layer system of non-derivative affixation. The adnominal-affixation layers are broken into (a) declension (the categories of case, number and deictic marking); (b) nominal conjugation (non-verb grammatical and oblique-case items can be conjugated), and (c) clitic marking. Each layer is given statistical detail with regard to concatenability. Finally, individual subsections are dedicated to the matters of: possessive declension compatibility in the distinction of sublexica; genitive and dative-case paradigmatic defectivity in the possessive declension, where it is demonstrated to be parametrically diverse, and secondary declension, a proposed typology modifiers without nouns , as compatible with adnominal person.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this dissertation I study language complexity from a typological perspective. Since the structuralist era, it has been assumed that local complexity differences in languages are balanced out in cross-linguistic comparisons and that complexity is not affected by the geopolitical or sociocultural aspects of the speech community. However, these assumptions have seldom been studied systematically from a typological point of view. My objective is to define complexity so that it is possible to compare it across languages and to approach its variation with the methods of quantitative typology. My main empirical research questions are: i) does language complexity vary in any systematic way in local domains, and ii) can language complexity be affected by the geographical or social environment? These questions are studied in three articles, whose findings are summarized in the introduction to the dissertation. In order to enable cross-language comparison, I measure complexity as the description length of the regularities in an entity; I separate it from difficulty, focus on local instead of global complexity, and break it up into different types. This approach helps avoid the problems that plagued earlier metrics of language complexity. My approach to grammar is functional-typological in nature, and the theoretical framework is basic linguistic theory. I delimit the empirical research functionally to the marking of core arguments (the basic participants in the sentence). I assess the distributions of complexity in this domain with multifactorial statistical methods and use different sampling strategies, implementing, for instance, the Greenbergian view of universals as diachronic laws of type preference. My data come from large and balanced samples (up to approximately 850 languages), drawn mainly from reference grammars. The results suggest that various significant trends occur in the marking of core arguments in regard to complexity and that complexity in this domain correlates with population size. These results provide evidence that linguistic patterns interact among themselves in terms of complexity, that language structure adapts to the social environment, and that there may be cognitive mechanisms that limit complexity locally. My approach to complexity and language universals can therefore be successfully applied to empirical data and may serve as a model for further research in these areas.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Pappret conceptualizes parsning med Constraint Grammar på ett nytt sätt som en process med två viktiga representationer. En representation innehåller lokala tvetydighet och den andra sammanfattar egenskaperna hos den lokala tvetydighet klasser. Båda representationer manipuleras med ren finite-state metoder, men deras samtrafik är en ad hoc -tillämpning av rationella potensserier. Den nya tolkningen av parsning systemet har flera praktiska fördelar, bland annat det inåt deterministiska sättet att beräkna, representera och räkna om alla potentiella tillämpningar av reglerna i meningen.