Biblioteca Digital

939 results for Contrastive linguistics

Contributions to the Theory of Finite-State Based Grammars

Relevance:

10.00%

Abstract:

This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIGs) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs:

(i) Computational complexity of grammars under limiting parameters. The computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchy and the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction.

(ii) Linguistically applicable structural representations. For the linguistically applicable representation of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right-branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented; they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are parseable in linear time.

(iii) Compilation and simplification of linguistic constraints. Efficient compilation methods for certain regular operations, such as generalized restriction, are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplification of finite-state representations for parse forests is sketched.

These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing.

Keywords: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars
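
As the name suggests, a finite-state intersection grammar keeps only those annotated strings that every constraint automaton accepts. The following Python sketch illustrates that idea with the textbook product construction for deterministic automata. It is not taken from the dissertation: the `DFA` class, the tag alphabet `DNV` and both toy constraints are assumptions invented for illustration.

```python
class DFA:
    """A complete deterministic finite automaton over a fixed alphabet."""

    def __init__(self, alphabet, transitions, start, accepting):
        self.alphabet = alphabet
        self.transitions = transitions  # maps (state, symbol) -> next state
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, string):
        state = self.start
        for symbol in string:
            state = self.transitions[(state, symbol)]
        return state in self.accepting


def intersect(a, b):
    """Product construction: accepts exactly the strings that both a and b accept."""
    assert a.alphabet == b.alphabet
    start = (a.start, b.start)
    transitions, seen, stack = {}, {start}, [start]
    while stack:
        p, q = stack.pop()
        for symbol in a.alphabet:
            nxt = (a.transitions[(p, symbol)], b.transitions[(q, symbol)])
            transitions[((p, q), symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    accepting = {(p, q) for (p, q) in seen if p in a.accepting and q in b.accepting}
    return DFA(a.alphabet, transitions, start, accepting)


ALPHABET = "DNV"  # hypothetical tags: determiner, noun, verb

# Invented constraint 1: no two verbs in a row (state 2 is a dead state).
no_double_verb = DFA(ALPHABET, {
    (0, "D"): 0, (0, "N"): 0, (0, "V"): 1,
    (1, "D"): 0, (1, "N"): 0, (1, "V"): 2,
    (2, "D"): 2, (2, "N"): 2, (2, "V"): 2,
}, start=0, accepting={0, 1})

# Invented constraint 2: every determiner is immediately followed by a noun.
det_then_noun = DFA(ALPHABET, {
    (0, "D"): 1, (0, "N"): 0, (0, "V"): 0,
    (1, "D"): 2, (1, "N"): 0, (1, "V"): 2,
    (2, "D"): 2, (2, "N"): 2, (2, "V"): 2,
}, start=0, accepting={0})

grammar = intersect(no_double_verb, det_then_noun)
candidates = ["DNV", "DVN", "NVV", "NVDN"]
print([c for c in candidates if grammar.accepts(c)])
```

A real FSIG would intersect many more constraints with an automaton encoding all candidate readings of the sentence; the sketch only shows why adding a constraint can never enlarge the set of surviving analyses.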

Agreement Patterns in English : Diachronic Corpus Studies on Common-Number Pronouns

Relevance:

10.00%

Abstract:

This study reports a diachronic corpus investigation of common-number pronouns used to convey unknown or otherwise unspecified reference. The study charts agreement patterns in these pronouns in various diachronic and synchronic corpora. The objective is to provide base-line data on variant frequencies and distributions in the history of English, as there are no previous systematic corpus-based observations on this topic. This study seeks to answer the questions of how pronoun use is linked with the overall typological development in English and how their diachronic evolution is embedded in the linguistic and social structures in which they are used. The theoretical framework draws on corpus linguistics and historical sociolinguistics, grammaticalisation, diachronic typology, and multivariate analysis of modelling sociolinguistic variation. The method employs quantitative corpus analyses from two main electronic corpora, one from Modern English and the other from Present-day English. The Modern English material is the Corpus of Early English Correspondence, and the time frame covered is 1500-1800. The written component of the British National Corpus is used in the Present-day English investigations. In addition, the study draws supplementary data from other electronic corpora. The material is used to compare the frequencies and distributions of common-number pronouns between these two time periods. The study limits the common-number uses to two subsystems, one anaphoric to grammatically singular antecedents and one cataphoric, in which the pronoun is followed by a relative clause. Various statistical tools are used to process the data, ranging from cross-tabulations to multivariate VARBRUL analyses in which the effects of sociolinguistic and systemic parameters are assessed to model their impact on the dependent variable. 
This study shows how one pronoun type has extended its uses in both subsystems, an increase linked with grammaticalisation and the changes in other pronouns in English through the centuries. The variationist sociolinguistic analysis charts how grammaticalisation in the subsystems is embedded in the linguistic and social structures in which the pronouns are used. The study suggests a scale of two statistical generalisations of various sociolinguistic factors which contribute to grammaticalisation and its embedding at various stages of the process.

Primestoimenno-otnositel'nye konstrukcii v sovremennom russkom jazyke

Relevance:

10.00%

Abstract:

Relative Constructions with Pronominal Heads in Contemporary Russian. Chapter 1 introduces the distinctive syntactic and semantic properties of Russian relative constructions (RCs), which are then divided into two main classes according to the type of the head phrase. The study concentrates on RCs with pronominal heads, which are systematically compared with noun-headed RCs. Chapter 2 clarifies the categorization of pronouns in Russian. The conclusion is that Russian pronouns include only personal, reflexive and wh-pronouns. The remaining words that are traditionally seen as pronouns are actually functional equivalents of determiners. This idea leads to the suggestion that RCs with these determiner-like words as the only constituent of the head phrase are actually headed by zero pronouns. In the other type of RCs with pronominal heads, the head position is occupied by wh-pronouns with clitics expressing different types of indefiniteness and quantification. A comparison of the two types of pronoun-headed RCs shows that the wh-heads and zero-heads share a number of properties with respect to grammatical gender, number and person, as well as to the semantic distinction between animates and inanimates. The rest of Chapter 2 gives an overview of various uses of wh-pronouns in Russian and an experimental analysis of RCs headed by pronominal adverbs. Chapter 3 discusses fundamental differences between RCs with noun heads and RCs with pronominal heads. One of the main findings is that the choice of the relative pronoun (kto 'who' and chto 'what' versus kotoryj 'which') is motivated by a tendency to reproduce as fully as possible the essential grammatical and semantic properties of the antecedent. Chapter 4 gives a detailed description of the determiner-like words and wh-based heads used in the two types of RCs with pronominal heads. In addition, several issues related to the syntax and semantics of free relatives are discussed.
The conclusion is that there is no need to establish a separate category of free relatives in Russian. Chapter 5 discusses the syntax and semantics of correlative and free concessive constructions. They share a number of properties with pronoun-headed RCs and the two are often confused in Russian linguistics. However, a detailed analysis shows that these constructions must be distinguished from RCs. The study combines the methods of functionally-oriented Russian structuralism with some insights from generative syntax.

Zur Sättigung der Valenz in den Kleinen Meldungen des Typus Notiz : Eine pragmatisch fundierte Analyse

Relevance:

10.00%

Abstract:

Valency Realization in Short Excerpts of News Text: A Pragmatically Grounded Analysis. This dissertation is a study of so-called pragmatic valency. The aim of the study is to examine the phenomenon both theoretically, by discussing the research literature, and empirically, on the basis of evidence from a text corpus consisting of 218 short excerpts of news text from the German newspaper Frankfurter Allgemeine Zeitung. In the theoretical part of the study, the central concepts of valency and pragmatic valency are discussed. In the research literature, valency denotes the relation between the verb and its obligatory and optional complements. Pragmatic valency can be defined as modification of the so-called system valency in the parole, including non-realization of an obligatory complement, non-realization of an optional complement and realization of an optional complement. Furthermore, the investigation of pragmatic valency includes the role of adjuncts, elements that are not defined by the valency, in the concrete valency realization. The corpus study investigates the valency behaviour of German verbs in a corpus of about 1500 sentences, combining the methodology and concepts of valency theory, semantics and text linguistics. The analysis focuses on the roughly 600 sentences that show deviations from the system valency, providing over 800 examples of the modification of the system valency as codified in the (valency) dictionaries. The study attempts to answer the following primary question: Why is the system valency modified in the parole? To answer the question, the concept of modification types is introduced. The modification types are recognized using distinctive feature bundles in which each feature, with a negative or a positive value, refers to one reason for the modification treated in the research literature.
For example, the features of irrelevance and relevance, focus, world and text type knowledge, text theme, theme-rheme structure and cohesive chains are applied. The valency approach appears in a new light when explored through corpus-based investigation; both the optionality of complements and the distinction between complements and adjuncts as defined in the present valency approach seem in some respects defective. Furthermore, the analysis indicates that the adjuncts outside the valency domain play a central role in the concrete realization of the valency. Finally, the study suggests a definition of pragmatic valency, based on the modification types introduced in the study and tested in the corpus analysis.

Parsing in two frameworks: finite-state and functional dependency grammar

Relevance:

10.00%

Structure informationnelle et constructions du kabyle : Etude de trois types de phrase dans le cadre de la grammaire constructionnelle

Relevance:

10.00%

Abstract:

Information Structure and Kabyle Constructions: Three Sentence Types in the Construction Grammar Framework. The study examines three Kabyle sentence types and their variants. These sentence types have been chosen because they code the same state of affairs but have different syntactic structures. The sentence types are Dislocated sentence, Cleft sentence, and Canonical sentence. I argue first that a proper description of these sentence types should include information structure and, second, that a description which takes information structure into account is possible in the Construction Grammar framework. The study thus constitutes a testing ground for the applicability of Construction Grammar to a lesser-known language. It does so notably because the three sentence types cannot be differentiated without information structure categories and, consequently, these categories must also be integrated into the grammatical description. The information structure analysis is based on the model outlined by Knud Lambrecht. In that model, information structure is considered a component of sentence grammar that ensures pragmatically correct sentence forms. The work starts with an examination of the three sentence types and of the analyses that have been carried out in André Martinet's functional grammar framework. This introduces the sentence types chosen as the object of study and discusses the difficulties related to their analysis. After a presentation of the state of the art, including earlier and more recent models, the principles and notions of Construction Grammar and of Lambrecht's model are introduced and explained. The information structure analysis is presented in three chapters, each treating one of the three sentence types. The analyses are based on spoken language data and elicitation. Prosody is included in the study when a syntactic structure seems to code two different focus structures.
In such cases, it is pertinent to investigate whether these are coded by prosody. The final chapter presents the constructions that have been established and the problems encountered in analysing them. It also discusses the impact of the study on the theories used and on the theory of syntax in general.

Coordinated Verb Pairs in Texts

Relevance:

10.00%

Abstract:

This dissertation discusses the relation between lexis, grammar and textual organisation. The major premise adopted here is that grammatical structures are motivated both by the semantic potential of words and by text-pragmatic demands. In other words, it is argued that grammatical structures form the interface between lexis and textual organisation, and that linguistic analysis should not concentrate on analysing grammatical structures in isolation, independent of context. From this point of view, grammatical structures are said to be 'well-formed' only in relation to the context they occur in. This study is based on a corpus of three million words of recent Finnish fiction, from which all occurrences of coordinated verb pairs ([V ja V] pairs) containing one of the intransitive motion verbs 'lähteä' (to go), 'mennä' (to go), 'päästä' (to get into), 'nousta' (to get up), and 'laskea' (to go down) were extracted. This set of verbs was established using methods described in earlier work by Lagus & Airola (2001 and 2005). The quantitative analysis of the [V ja V] pairs was used to carry out a qualitative analysis of individual texts. In analysing the texts, an analogy was made between musical and textual structure. The results show, among other things, that individual verbs specialise in different functions when occurring in coordinated verb pairs. For instance, verb pairs that include the verb 'nousta' tend to function as markers of textual boundaries and thus reflect the organisation of narrative substance. The verb 'mennä' has weakened literal meanings but strengthened modal meanings when occurring in [V ja V] pairs, and, in many cases, the verb 'lähteä' in [V ja V] pairs functions as an aspectual marker rather than a pure verb of motion.
The gradient from the concrete sense of motion to more differentiated senses of a verb in [V ja V] pairs, together with the structure-creating potential of the [V ja V] pairs themselves, suggests an ongoing grammaticalisation process of the patterns discussed.

La contextualisation du discours radiophonique par des moyens prosodiques. L’exemple de cinq grands philosophes français du XXe siècle

Relevance:

10.00%

Abstract:

"Contextualisation of radio discourse by prosodic means. The example of five great French philosophers of the 20th century." The dissertation deals with the contextualisation of speech by prosodic means. In other words, it examines how the prosodic features of speech (such as pitch contour, intensity, pauses, duration and rhythm) guide the interpretation of speech alongside word and sentence meanings, which have traditionally received more attention. The work focuses on seven prosodically marked patterns, each consisting of salient changes in one or several parameters. The phenomena are examined from the perspective of their acoustic forms as well as their typical contexts of occurrence and discursive functions. The data consist of radio programmes featuring five great French philosophers of the 20th century: Gaston Bachelard, Albert Camus, Michel Foucault, Maurice Merleau-Ponty and Jean-Paul Sartre. The programmes were broadcast on various radio channels in France between 1948 and 1973. The results show that prosodically marked patterns are multidimensional speech phenomena that play a central role in contextualising what is said: they can, for example, raise or lower the informational value of what is said, express the speaker's strong or weak commitment to what they are saying, or signal the continuation or end of a structural unit. The dissertation also contains contrastive parts in which the phenomena are compared with a melodic figure found in classical piano music and with a prosodic phenomenon of Finnish. The results suggest that a certain kind of melodic pattern is used as a similar structuring device in both speech and classical music. In addition, the results indicate that certain melodic forms are used to create similar implications in two languages as different as Finnish and French. One part of the dissertation deals with the prosodic marking of the full stop and the comma in speech.
According to the results, the full stop and the comma each have their own oral prototype: the full stop is typically marked by a fall in pitch and a pause, and the comma by a rise in pitch and a pause. The most significant results, however, concern cases in which a punctuation mark is interpreted prosodically in an atypical way: both the full stop and the comma appear to have several different oral counterparts, and the functions of the punctuation marks can turn out very differently depending on their prosodic interpretation.

Given and News : Media discourse and the construction of community on national days

Relevance:

10.00%

Abstract:

National anniversaries such as independence days demand precise coordination in order to make citizens change their routines to forego work and spend the day at rest or at festivities that provide social focus and spectacle. The complex social construction of national days is taken for granted and operates as a given in the news media, which are the main agents responsible for coordinating these planned disruptions of normal routines. This study examines the language used in the news to construct the rather unnatural idea of national days and to align people in observing them. The data for the study consist of news stories about the Fourth of July in the New York Times, sampled over 150 years and are supplemented by material from other sources and other countries. The study is multidimensional, applying concepts from pragmatics (speech acts, politeness, information structure), systemic functional linguistics (the interpersonal metafunction and the Appraisal framework) and cognitive linguistics (frames, metaphor) as well as journalism and communications to arrive at an interdisciplinary understanding of how resources for meaning are used by writers and readers of the news stories. The analysis shows that on national anniversaries, nations tend to be metaphorized as persons having birthdays, to whom politeness should be shown. The face of the nation is to be respected in the sense of identifying the nation's interests as one's own (positive face) and speaking of citizen responsibilities rather than rights (negative face). Resources are available for both positive and negative evaluations of events and participants and the newspaper deftly changes footings (Goffman 1981) to demonstrate the required politeness while also heteroglossically allowing for a certain amount of disattention and even protest - within limits, for state holidays are almost never construed as Bakhtinian festivals, as they tend to reaffirm the hierarchy rather than invert it. 
Celebrations are evaluated mainly for impressiveness, and for the essentially contested quality of appropriateness, which covers norms of predictability, size, audience response, aesthetics, and explicit reference to the past. Events may also be negatively evaluated as dull ("banal") or inauthentic ("hoopla"). Audiences are evaluated chiefly in terms of their enthusiasm, or production of appropriate displays for emotional response, for national days are supposed to be occasions of flooding-out of nationalistic feeling. By making these evaluations, the newspaper reinforces its powerful position as an independent critic, while at the same time playing an active role in the construction and reproduction of emotional order embodied in "the nation's birthday." As an occasion for mobilization and demonstrations of power, national days may be seen to stand to war in the relation of play to fighting (Bateson 1955). Evidence from the newspaper's coverage of recent conflicts is adduced to support this analysis. In the course of the investigation, methods are developed for analyzing large collections of newspaper content, particularly topical soft news and feature materials that have hitherto been considered less influential and worthy of study than so-called hard news. In his work on evaluation in newspaper stories, White (1998) proposed that the classic hard news story is focused on an event that threatens the social order, but news of holidays and celebrations in general does not fit this pattern, in fact its central event is a reproduction of the social order. Thus in the system of news values (Galtung and Ruge 1965), national holiday news draws on "ground" news values such as continuity and predictability rather than "figure" news values such as negativity and surprise. 
It is argued that this ground helps form a necessary space for hard news to be seen as important, similar to the way in which the information structure of language is seen to rely on the regular alternation of given and new information (Chafe 1994).

Da hatte das Pferd die Nüstern voll. Gebrauch und Funktion von Phraseologie im Kinderbuch : Untersuchungen zu Erich Kästner und anderen Autoren

Relevance:

10.00%

Abstract:

[Da hatte das Pferd die Nüstern voll. The use and function of phraseology in children's literature. Studies on the works of Erich Kästner and other authors.] It is often assumed that idioms are difficult for children to understand, because their meaning cannot be derived in full from the meanings of the individual words in the construction. Nevertheless, idioms are used extensively and in many different functions in children's literature. This study examines the whole range of the use of phraseology (idioms and proverbs) in German-language children's literature, from the classics of Erich Kästner (1899-1976) to the present day. Using three different corpora (905 idiom examples from six of Kästner's children's books, 333 idioms from two of Kästner's novels for adults, and 580 examples from six children's books by different authors), the study seeks to answer, among others, the following questions: How many and what kinds of idioms are used in the texts? How are idioms placed in the texts, and what kinds of relations to the context are built? What differences in the use of idioms can be observed, first, between the same author's (Kästner's) children's books and books intended for adults and, second, between children's books written by different authors? The study shows that the use of idioms in children's literature varies primarily by author, which shows up as different 'phraseological profiles'. The use of paraphrase (a synonymous non-idiomatic expression placed alongside the idiom) is quite common in all the children's books studied. In Kästner's children's books, paraphrase is clearly more common than in his novels for adults. It thus appears that children's literature takes children's limited phraseological competence into account, whether consciously or unconsciously.

Vyraženie obobščënno-ličnogo značenija v russkom jazyke

Relevance:

10.00%

Abstract:

Expressing Generalized-Personal Meaning in Russian. Based on data from Russian, this doctoral dissertation examines generalized-personal meaning, that is, generic expressions referring to all human beings, people in general, each or any person (e.g. S vozrastom načinaeš' cenit' prostye vešči 'With age you start to appreciate simple things'). The study shares its basic theoretical orientation with functional approaches going 'from meaning to form'. The objective of the thesis is to determine and describe the various linguistic means which can be used by the speaker to express generalized-personal meaning. The main material of the study consists of 2,000 examples collected from modern Russian literature, newspapers, and magazines. The linguistic means of expressing generalized-personal meaning are divided into three main classes. Morphological and lexico-grammatical means (22% of the material) include the use of personal pronouns and personal verbal endings. In Russian, all personal forms except the 3rd person singular can be used in a generalized-personal meaning. Lexical means (14% of the material) involve, above all, pronouns like vse 'all', každyj 'everyone', nikto 'no one', as well as the nouns čelovek 'man' and ljudi 'people'. In emotional speech, generalized-personal meaning can also be conveyed lexically by using utterances like daže idiot znaet 'even an idiot knows'. In rhetorical questions the pronoun kto 'who' can appear in this meaning (cf. Kto ne ljubit moroženoe?! 'Who doesn't like ice cream?!'). The third main class, syntactic means (64% of the material), consists of constructions in which the generic person is not expressed at the surface level. This class mainly includes two-component structures in which the infinitive relates to a modal predicative adverb (e.g. možno 'can, be allowed to', nado 'must'), modal verb (e.g. stoit 'be worth(while)', sleduet 'must, be obliged to'), or predicative adverb ending in -o (e.g. trudno 'it is hard to', neprilično 'is not appropriate').
Other syntactic means are: one-component infinitive structures, so-called embedded structures, structures with a processual noun, passive constructions, and gerund constructions. The different forms of expression available in Russian are not interchangeable in all contexts. Even if a given context tolerates the substitution of one construction for another, the two expressions are never entirely synonymous. In addition to determining the range of forms which can express generalized-personal meaning, the study aims to compare these forms and to specify the conditions and possible restrictions (contextual, semantic, syntactic, stylistic, etc.) associated with the use of each construction. In Russian linguistics, the generalized-personal meaning has not been extensively studied from a functional perspective. The advantage of a meaning-based functional approach is that it gives a comprehensive picture of the diversity and distribution of the phenomenon.

A cross-linguistic study of lexical iconicity and its manifestation in bird names

Relevance:

10.00%

Abstract:

This dissertation is a cross-linguistic study of lexical iconicity. The study is based on a genealogically stratified sample of 237 languages. The aim is to contribute an empirical study to the growing dialogue on different forms of lexical iconicity. The conceptual framework of the present study is based on an analysis of types and means of lexical iconicity in the sample languages. Archaeological and cultural evidence is used to tie lexical iconicity to its context. Phenomena related to lexical iconicity are studied both cross-linguistically and language-specifically. The cognitive difference between imitation and symbolism is essential. Lexical iconicity is not only about the iconic relationship between form and referents, but also about how certain iconic properties may become conventional means used to create sound symbolism. All the sample languages show some evidence of lexical iconicity, demonstrating that it is a universal feature. Nine comparisons of onomatopoeic verbs and nouns, with samples varying between six and 141 languages, show that typologically highly different languages use similar means for creating words based on sound imitation. Two cross-linguistic comparisons of bird names demonstrate that a vast majority of the Eurasian names of the common cuckoo and the world-wide names of crow and raven of the 141 genera are onomatopoeic.

Univariate, bivariate, and multivariate methods in corpus-based lexicography : A study of synonymy

Relevance:

10.00%

Abstract:

In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation to previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques in the end. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation to the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected think lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on the average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. 
In terms of overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context and applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
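
To make the notion of a polytomous outcome concrete, the sketch below shows the prediction step of one common formulation of polytomous (multinomial) logistic regression: each lexeme gets a linear score, and a softmax turns the scores into probabilities over the four verbs. The two contextual features and all weights are invented for illustration; they are not results or variables from the dissertation, and the fitting procedure used there may differ.

```python
import math

LEXEMES = ["ajatella", "miettiä", "pohtia", "harkita"]

# Hypothetical weights per lexeme for the feature vector
# (bias, human_agent, abstract_patient). Not fitted to any data.
WEIGHTS = {
    "ajatella": [0.5, 0.2, 0.0],
    "miettiä": [0.0, 0.1, 0.3],
    "pohtia": [0.0, -0.2, 1.0],
    "harkita": [-0.5, 0.0, 0.4],
}


def predict_proba(features):
    """Return a probability for each lexeme via a softmax over linear scores."""
    scores = {
        lexeme: sum(w * x for w, x in zip(WEIGHTS[lexeme], features))
        for lexeme in LEXEMES
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {lexeme: math.exp(score - top) for lexeme, score in scores.items()}
    total = sum(exps.values())
    return {lexeme: value / total for lexeme, value in exps.items()}


# A context with an abstract patient and no human agent (hypothetical coding).
probabilities = predict_proba([1.0, 0.0, 1.0])
for lexeme, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{lexeme}: {p:.2f}")
```

The output is a distribution rather than a single choice, which matches the probabilistic view described above: most contexts favour one lexeme without excluding the others.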

Èvoljucija sistemy glasnyh fonem v nekotoryh russkih govorah Vologodskoj oblasti

Relevance:

10.00%

Abstract:

The present dissertation analyses 36 local vernaculars of villages surrounding the northern Russian city of Vologda in relation to the system of the vowels in the stressed syllables and those preceding the stressed syllables by using the available dialectological researches. The system in question differs from the corresponding standard Russian system by that the palatalisation of the surrounding consonants affects the vowels much more significantly in the vernaculars, whereas the phonetic difference between the stressed and non-stressed vowels is less obvious in them. The detailed information on the local vernaculars is retrieved from the Dialektologičeskij Atlas Russkogo Jazyka dialect atlas, the data for which were collected, for the most part, in the 1940 s and 1950 s. The theoretical framework of the research consists of a brief cross-section of western sociolinguistic theory related to language change and that of historical linguistics related to the Slavonic vowel development, which includes some new theories concerning the development of the Russian vowel phonemes. The author has collected dialect data in one of the 36 villages and three villages surrounding it. During the fieldwork, speech of nine elderly persons and ten school children was recorded. The speech data were then transcribed with coded information on the corresponding etymological vowels, the phonetic position, and the factual pronunciation at each appearance of vowels in the phonetic positions named above. The data from both of the dialect strata were then systematised to two corresponding systems that were compared with the information retrievable from the dialect atlas and other dialectological literature on the vowel phoneme system of the traditional local vernacular. As a result, it was found out (as hypothesised) that the vernacular vowel phoneme system has approached that of the standard language but has nonetheless not become similar to it. 
The traditional vernacular has one more vowel phoneme than the standard language, whereas the number of vowel phonemes in the speech of the schoolchildren coincides with that of the standard language, although the phonetic realisations differ to some extent. The analysis of the speech of the elderly people showed that the exact number of phonemes in this stratum is quite difficult to define, owing to fluctuation and irregularities in the realisation of the old phoneme that has ceased to exist in the newest stratum. The effect of the quality of the surrounding consonants on the phonetic realisation of the vowel phonemes was found to have diminished, while the dependence of a vowel phoneme's phonetic realisation on its position relative to the word stress has become increasingly evident, as is also the case in the standard language.

Word Sense Discovery and Disambiguation

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and the multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and the discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris' hypothesis by implementing three new methods modeled on it. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
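
The idea that words occurring in similar contexts tend to have similar meanings can be illustrated with a minimal sketch. It is not the method of the thesis: the toy corpus, the fixed word window (a rough stand-in for syntactic usage), and the helper names `context_vectors` and `cosine` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def context_vectors(tokens, window=2):
    """Count, for every word, the words seen within `window` positions of it.

    Assumption: a plain word window approximates 'usage'; the thesis
    itself works with richer context representations.
    """
    vectors = {}
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        counts = vectors.setdefault(word, Counter())
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, invented for this illustration only.
corpus = (
    "the cat drinks milk . the dog drinks water . "
    "the cat eats fish . the dog eats meat ."
).split()

vecs = context_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_milk = cosine(vecs["cat"], vecs["milk"])
print(f"cat/dog:  {sim_cat_dog:.2f}")
print(f"cat/milk: {sim_cat_milk:.2f}")
```

In this toy corpus "cat" and "dog" share more of their contexts than "cat" and "milk", so the first similarity comes out higher. Real word sense discovery would need a large corpus and a more careful context representation.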
