63 resultados para Textual complexity for Romanian language
Resumo:
In this dissertation I study language complexity from a typological perspective. Since the structuralist era, it has been assumed that local complexity differences in languages are balanced out in cross-linguistic comparisons and that complexity is not affected by the geopolitical or sociocultural aspects of the speech community. However, these assumptions have seldom been studied systematically from a typological point of view. My objective is to define complexity so that it is possible to compare it across languages and to approach its variation with the methods of quantitative typology. My main empirical research questions are: i) does language complexity vary in any systematic way in local domains, and ii) can language complexity be affected by the geographical or social environment? These questions are studied in three articles, whose findings are summarized in the introduction to the dissertation. In order to enable cross-language comparison, I measure complexity as the description length of the regularities in an entity; I separate it from difficulty, focus on local instead of global complexity, and break it up into different types. This approach helps avoid the problems that plagued earlier metrics of language complexity. My approach to grammar is functional-typological in nature, and the theoretical framework is basic linguistic theory. I delimit the empirical research functionally to the marking of core arguments (the basic participants in the sentence). I assess the distributions of complexity in this domain with multifactorial statistical methods and use different sampling strategies, implementing, for instance, the Greenbergian view of universals as diachronic laws of type preference. My data come from large and balanced samples (up to approximately 850 languages), drawn mainly from reference grammars. The results suggest that various significant trends occur in the marking of core arguments in regard to complexity and that complexity in this domain correlates with population size. These results provide evidence that linguistic patterns interact among themselves in terms of complexity, that language structure adapts to the social environment, and that there may be cognitive mechanisms that limit complexity locally. My approach to complexity and language universals can therefore be successfully applied to empirical data and may serve as a model for further research in these areas.
Resumo:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering systems are an important research problem because as the amount of natural language text in digital format grows all the time, the need for novel methods for pinpointing important knowledge from the vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is developed also. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of patterns and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The the new methods developed in this thesis require no manual creation of answer extraction patterns. As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
Resumo:
We have presented an overview of the FSIG approach and related FSIG gram- mars to issues of very low complexity and parsing strategy. We ended up with serious optimism according to which most FSIG grammars could be decom- posed in a reasonable way and then processed efficiently.
Resumo:
The work integrates research in the language and terminology of various fields with lexicography, etymology, semantics, word formation, and pragmatics. Additionally, examination of German and Finnish provides the work with perspective of contrastive linguistics and the translation of texts in specialized fields. The work is an attempt to chart the language, vocabulary, different textual types, and essential communication-connected features of this special field. The study is primary concerned with internal communication within the field of ecology, but it also provides a comparison of the public discussion of environmental issues in Germany and Finland. The work attempts to use textual signs to provide a picture of the literary communication used on the different vertical levels in the central text types within the field. The dictionaries in the fields of environmental issues and ecology for the individual text types are examined primarily from the perspective of their quantity and diversity. One central point of the work is to clarify and collect all of the dictionaries in the field that have been compiled thus far in which German and/or Finnish ware included. Ecology and environmental protection are closely linked not only to each other but also to many other scientific fields. Consequently, the language of the environmental field has acquired an abundance of influences and vocabulary from the language of the special fields close to it as well as from that of politics and various areas of public administration. The work also demonstrates how the popularization of environmental terminology often leads to semantic distortion. Traditionally, scientific texts have used the smallest number of expressions, the purpose of which is to appeal to or influence the behavior of the text recipient. Particularly in Germany, those who support or oppose measures to protect the environment have long been making concerted efforts to represent their own views in the language that they use. When discussing controversial issues competing designations for the same referent or concept are used in accordance with the interest group to which the speaker belongs. One of the objectives of the study is to sensitize recipients of texts to notice the euphemistic expressions that occur in German and Finnish texts dealing with issues that are sensitive from the standpoint of environmental policy. One particular feature of the field is the wealth and large number of variants designating the same entry or concept. The terminological doublets formed by words of foreign origin and their German or Finnish language equivalents are quite typical of the field. Methods of corpus linguistics are used to determine the reasons for the large number of variant designations as well as their functionality.
Resumo:
This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs: (i) Computational complexity of grammars under limiting parameters In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchyand the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction. (ii) Linguistically applicable structural representations Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are linear time parseable. (iii) Compilation and simplification of linguistic constraints Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched. These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing. Avainsanat: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars
Resumo:
Kirjallisuuden- ja kulttuurintutkimus on viimeisten kolmen vuosikymmenen aikana tullut yhä enenevässä määrin tietoiseksi tieteen ja taiteen suhteen monimutkaisesta luonteesta. Nykyään näiden kahden kulttuurin tutkimus muodostaa oman kenttänsä, jolla niiden suhdetta tarkastellaan ennen kaikkea dynaamisena vuorovaikutuksena, joka heijastaa kulttuurimme kieltä, arvoja ja ideologisia sisältöjä. Toisin kuin aiemmat näkemykset, jotka pitävät tiedettä ja taidetta toisilleen enemmän tai vähemmän vastakkaisina pyrkimyksinä, nykytutkimus lähtee oletuksesta, jonka mukaan ne ovat kulttuurillisesti rakentuneita diskursseja, jotka kohtaavat usein samankaltaisia todellisuuden mallintamiseen liittyviä ongelmia, vaikka niiden käyttämät metodit eroavatkin toisistaan. Väitöskirjani keskittyy yllä mainitun suhteen osa-alueista popularisoidun tietokirjallisuuden (muun muassa Paul Davies, James Gleick ja Richard Dawkins) käyttämän kielen ja luonnontieteistä ideoita ammentavan kaunokirjallisuuden (muun muassa Jeanette Winterson, Tom Stoppard ja Richard Powers) hyödyntämien keinojen tarkasteluun nojautuen yli 30 teoksen kattavaa aineistoa koskevaan tyylin ja teemojen tekstianalyysiin. Populaarin tietokirjallisuuden osalta tarkoituksenani on osoittaa, että sen käyttämä kieli rakentuu huomattavassa määrin sellaisille rakenteille, jotka tarjoavat mahdollisuuden esittää todellisuutta koskevia argumentteja mahdollisimman vakuuttavalla tavalla. Tässä tehtävässä monilla klassisen retoriikan määrittelemillä kuvioilla on tärkeä rooli, koska ne auttavat liittämään sanotun sisällön ja muodon tiukasti toisiinsa: retoristen kuvioiden käyttö ei näin ollen edusta pelkkää tyylikeinoa, vaan se myös usein kiteyttää argumenttien taustalla olevat tieteenfilosofiset olettamukset ja auttaa vakiinnuttamaan argumentoinnin logiikan. Koska monet aikaisemmin ilmestyneistä tutkimuksista ovat keskittyneet pelkästään metaforan rooliin tieteellisissä argumenteissa, tämä väitöskirja pyrkii laajentamaan tutkimuskenttää analysoimalla myös toisenlaisten kuvioiden käyttöä. Osoitan myös, että retoristen kuvioiden käyttö muodostaa yhtymäkohdan tieteellisiä ideoita hyödyntävään kaunokirjallisuuteen. Siinä missä popularisoitu tiede käyttää retoriikkaa vahvistaakseen sekä argumentatiivisia että kaunokirjallisia ominaisuuksiaan, kuvaa tällainen sanataide tiedettä tavoilla, jotka usein heijastelevat tietokirjallisuuden kielellisiä rakenteita. Toisaalta on myös mahdollista nähdä, miten kaunokirjallisuuden keinot heijastuvat popularisoidun tieteen kerrontatapoihin ja kieleen todistaen kahden kulttuurin dynaamisesta vuorovaikutuksesta. Nykyaikaisen populaaritieteen retoristen elementtien ja kaunokirjallisuuden keinojen vertailu näyttää lisäksi, kuinka tiede ja taide osallistuvat keskusteluun kulttuurimme tiettyjen peruskäsitteiden kuten identiteetin, tiedon ja ajan merkityksestä. Tällä tavoin on mahdollista nähdä, että molemmat ovat perustavanlaatuisia osia merkityksenantoprosessissa, jonka kautta niin tieteelliset ideat kuin ihmiselämän suuret kysymyksetkin saavat kulttuurillisesti rakentuneen merkityksensä.
Resumo:
This dissertation focuses on the mythopoetics of the Soviet writer Andrej Platonov (1899-1951) in his late novel Schastlivaja Moskva (Happy Moscow), written in 1932 1936. The purpose of the work is to reveal the mythopoetic world model in the novel, to characterize the most significant features of Platonov's mythopoetics and finally, to reconstruct the author's myth in the novel by placing the novel in the context of Platonov's oeuvre and Russian literature and culture as a whole. The first chapter provides a representation of the problem and methodology of the work, a short overview of the history of creating and publishing the novel, and a survey of critical work on Platonov done to date. The study utilizes a structuralistic-semiotic approach devised by Tarto-Moscow scholars for analyzing mythopoetic texts and applies the methodology of a conceptual analysis of the mythology of language. The second chapter examines the peculiarities of Platonov's mythopoetics, and its relation to the neomythological paradigm of Russian literature. Some special consideration is given to the character of the scientific utopism of Platonov's myth, to the relation of Platonov's mythopoetic world model with mythopoetic thinking and to the syntagmatical, and paradigmatical aspects of Platonov's myth, in particular to the mythopoetical metasjuzhet and the ambivalent binary structure of myth. The third chapter presents a close examination of the mythopoetics of the novel by discerning the motif structure of the novel, analyzing the characters and main thematic oppositions of Platonov's myth in the novel. It is contended that in every textual level Platonov strives for ambivalency which provides an opportunity to discern his poetics as both utopian and antiutopian. The analysis in the fourth chapter of the key Platonovian ideological concepts revoljucia, kommunizm and socializm confirms this observation. The study concludes that Platonov's myth in the novel is based on the mythologema of his early prose, but reflect the gradual transition from early utopian themes to the intimate "humble" prose of the late 1930's.
Resumo:
This study reports a corpus-based study of medieval English herbals, which are texts conveying information on medicinal plants. Herbals belong to the medieval medical register. The study charts intertextual parallels within the medieval genre, and between herbals and other contemporary medical texts. It seeks to answer questions where and how herbal texts are linked to each other, and to other medical writing. The theoretical framework of the study draws on intertextuality and genre studies, manuscript studies, corpus linguistics, and multi-dimensional text analysis. The method combines qualitative and quantitative analyses of textual material from three historical special-language corpora of Middle and Early Modern English, one of which was compiled for the purposes of this study. The text material contains over 800,000 words of medical texts. The time span of the material is from c. 1330 to 1550. Text material is retrieved from the corpora by using plant name lists as search criteria. The raw data is filtered through qualitative analysis which produces input for the quantitative analysis, multi-dimensional scaling (MDS). In MDS, the textual space that parallel text passages form is observed, and the observations are explained by a qualitative analysis. This study concentrates on evidence of material and structural intertextuality. The analysis shows patterns of affinity between the texts of the herbal genre, and between herbals and other texts in the medical register. Herbals are most closely linked with recipe collections and regimens of health: they comprise over 95 per cent of the intertextual links between herbals and other medical writing. Links to surgical texts, or to specialised medical texts are very few. This can be explained by the history of the herbal genre: as herbals carry information on medical ingredients, herbs, they are relevant for genres that are related to pharmacological therapy. Conversely, herbals draw material from recipe collections in order to illustrate the medicinal properties of the herbs they describe. The study points out the close relationship between medical recipes and recipe-like passages in herbals (recipe paraphrases). The examples of recipe paraphrases show that they may have been perceived as indirect instruction. Keywords: medieval herbals, early English medicine, corpus linguistics, intertextuality, manuscript studies
Resumo:
This study deals with language change and variation in the correspondence of the eighteenth-century Bluestocking circle, a social network which provided learned men and women with an informal environment for the pursuit of scholarly entertainment. Elizabeth Montagu (1718 1800), a notable social hostess and a Shakespearean scholar, was one of their key figures. The study presents the reconstruction of Elizabeth Montagu s social networks from her youth to her later years with a special focus on the Bluestocking circle, and linguistic research on private correspondence between Montagu and her Bluestocking friends and family members between the years 1738 1778. The epistolary language use is investigated using the methods and frameworks of corpus linguistics, historical sociolinguistics, and social network analysis. The approach is diachronic and concerns real-time language change. The research is based on a selection of manuscript letters which I have edited and compiled into an electronic corpus (Bluestocking Corpus). I have also devised a network strength scale in order to quantify the strength of network ties and to compare the results of the linguistic research with the network analysis. The studies range from the reconstruction and analysis of Elizabeth Montagu s most prominent social networks to the analysis of changing morphosyntactic features and spelling variation in Montagu s and her network members correspondence. The linguistic studies look at the use of the progressive construction, preposition stranding and pied piping, and spelling variation in terms of preterite and past participle endings in the regular paradigm (-ed, - d, -d, - t, -t) and full / contracted spellings of auxiliary verbs. The results are analysed in terms of social network membership, sociolinguistic variables of the correspondents, and, when relevant, aspects of eighteenth-century linguistic prescriptivism. The studies showed a slight diachronic increase in the use of the progressive, a significant decrease of the stigmatised preposition stranding and increase of pied piping, and relatively informal but socially controlled epistolary spelling. Certain significant changes in Elizabeth Montagu s language use over the years could be attributed to her increasingly prominent social standing and the changes in her social networks, and the strength of ties correlated strongly with the use of the progressive in the Bluestocking Corpus. Gender, social rank, and register in terms of kinship/friendship had a significant influence in language use, and an effect of prescriptivism could also be detected. Elizabeth Montagu s network ties resulted in language variation in terms of network membership, her own position in a given network, and the social factors that controlled eighteenth-century interaction. When all the network ties are strong, linguistic variation seems to be essentially linked to the social variables of the informants.
Resumo:
This doctoral thesis focuses on the translation of Finnish prose literature into English in the United Kingdom between 1945 and 2003. The subject is approached using translation archaeology, interviews, archival material, detailed text analysis and reception material. The main theoretical framework is Descriptive Translation Studies, and certain sociological theories (Bourdieu s field theory, actor-network theory) are also used. After charting the published translations, two periods of time are selected for closer analysis: an earlier period from 1955 to 1959, involving eight translations, and a later one from 1990 to 2003, with a total of six translations. While these translation numbers may appear low, they are actually rather high in proportion to the total number of 28 one-author literary prose translations published in the UK over the approximately 60 years being studied. The two periods of time, the 1950s and 1990s, are compared in terms of the sociological context of translation activity, the reception of translations and their textual features. The comparisons show that the main changes in translation practice between these two periods are increased completeness (translations in the 1950s group often being shortened by hundreds of pages) and lesser use of indirect translation via an intermediary language (about half of the 1950s translations having been translated via Swedish). Otherwise, translation practices have not changed much: except for large omissions, which are far more frequent in the 1950s, variation within each group is larger than between groups. As to the sociological context, the main changes are an increase in long-term institution-level contacts and an increase in the promotion of foreign translation rights by Finnish publishing houses. This is in contrast to the 1950s when translation rights were mainly sold through personal contacts by individual authors and translators. The reception of translations is difficult to study because of scarce material. However, the 1950s translations were aggressively marketed and therefore obtained far more reviews and reprints than the 1990s translations. Several of the 1950s books, mostly historical novels by Mika Waltari, were mainstream bestsellers at the time, while current translations are frequently made for niche markets. The thesis introduces ample new material on the translation of Finnish prose literature into English in the UK. The results are also relevant to translation from a minority literature into a majority one. As to translation theory, they lead us to question the social nature of translation norms and the assumption of a static target culture. The translations analysed here are located in a very fragmented interculture and gain a stronger position in the Finnish culture than in the British one.