13 results for Dictionaries
in Helda - Digital Repository of University of Helsinki
Abstract:
There are numerous formats for writing spell-checkers for open-source systems, and many languages have been described in these formats. Similarly, TeX hyphenation rules exist for many languages. In this paper we demonstrate a method for converting these spell-checking lexicons and hyphenation rule sets into finite-state automata, and present a new finite-state-based system for writer's tools used in current open-source software such as Firefox, OpenOffice.org and enchant via the spell-checking library voikko.
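To make the idea of a lexicon compiled into an automaton concrete, here is a minimal sketch that assumes the lexicon is a plain list of word forms. It builds a trie-shaped deterministic finite-state acceptor and checks words against it. This is not the paper's conversion method and not voikko's API. Real spell-checker formats (e.g. affix-compressed lexicons) and TeX hyphenation patterns need further compilation steps, and a production system would also minimize the automaton. The names `State`, `build_acceptor`, `accepts` and `count_states` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class State:
    # Outgoing transitions, one per input character (deterministic).
    transitions: dict[str, State] = field(default_factory=dict)
    # True if a word of the lexicon ends in this state.
    final: bool = False


def build_acceptor(words):
    """Build a trie-shaped deterministic finite-state acceptor from a word list."""
    start = State()
    for word in words:
        state = start
        for ch in word:
            # Reuse an existing transition or create a new state.
            state = state.transitions.setdefault(ch, State())
        state.final = True
    return start


def accepts(start, word):
    """Return True if the automaton accepts word, i.e. word is in the lexicon."""
    state = start
    for ch in word:
        state = state.transitions.get(ch)
        if state is None:
            return False  # no transition: the word is not spelled correctly
    return state.final


def count_states(start):
    """Count the states reachable from start (a trie is a tree, so no cycles)."""
    count = 0
    stack = [start]
    while stack:
        state = stack.pop()
        count += 1
        stack.extend(state.transitions.values())
    return count
```

A trie shares only common prefixes. Merging shared suffixes as well (automaton minimization) would shrink it further, which matters for large, morphologically rich lexicons.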
Abstract:
The present research is an investigation into the corpus of personal names and titles found in sources from the Middle Mongolian period, that is, from the 13th to the beginning of the 15th century. The entry for every name or title is divided into three parts: occurrence(s) of the name in Middle Mongolian sources (primary sources), etymology, and occurrence(s) in sources other than Middle Mongolian (secondary sources). Culturally and linguistically the corpus can be divided into six sub-groups: Mongolian, Turkic (Old, Middle and Modern), Arabo-Persian (Islamic), Indo-Iranian and Tibetan (Buddhist), and Chinese. Of these, the largest groups are Mongolian and Turkic, followed by Chinese (mostly titles), Indo-Iranian, Arabo-Persian and Tibetan. With regard to primary and secondary occurrences, the research is based mainly on primary sources, including text publications and dictionaries. Every name or title is documented as completely as possible within a Central Asian framework. However, owing to the divergence of the available sources and to diachronic considerations, each sub-group has been treated slightly differently, but consistently. The corpus of investigated names and titles gives a fairly accurate picture of the multi-ethnic composition of the Mongolian world empire. It also shows the foreign influences on Mongolian names and titles, and in this respect mirrors the influences visible in other parts of Middle Mongolian culture. Furthermore, the corpus reflects the transitional stage of the 13th to 15th centuries in Central Asian history, and thus includes material from the past (Indo-Iranian, Old and Middle Turkic) as well as material that points to the future (Arabo-Persian, Tibetan, Modern Turkic).
Abstract:
The work integrates research on the language and terminology of various fields with lexicography, etymology, semantics, word formation, and pragmatics. In addition, the examination of German and Finnish gives the work a contrastive-linguistic perspective, including the translation of texts in specialized fields. The work attempts to chart the language, vocabulary, text types, and essential communicative features of this special field. The study is primarily concerned with internal communication within the field of ecology, but it also compares the public discussion of environmental issues in Germany and Finland. The work uses textual signs to describe the specialized communication at the different vertical levels of the central text types of the field. Dictionaries of environmental issues and ecology are examined, for each text type, primarily in terms of their number and diversity. A central aim of the work is to identify and collect all dictionaries of the field compiled to date that include German and/or Finnish. Ecology and environmental protection are closely linked not only to each other but also to many other scientific fields. Consequently, the language of the environmental field has absorbed an abundance of influences and vocabulary from neighbouring special fields as well as from politics and various areas of public administration. The work also demonstrates how the popularization of environmental terminology often leads to semantic distortion. Traditionally, scientific texts have contained the fewest expressions intended to appeal to or influence the behavior of the text recipient.
Particularly in Germany, supporters and opponents of environmental protection measures have long made concerted efforts to express their own views through the language they use. In discussions of controversial issues, competing designations for the same referent or concept are used depending on the interest group to which the speaker belongs. One objective of the study is to sensitize readers to the euphemistic expressions that occur in German and Finnish texts on issues that are sensitive from the standpoint of environmental policy. A distinctive feature of the field is the large number of variants designating the same entity or concept. Terminological doublets formed by words of foreign origin and their German or Finnish equivalents are quite typical of the field. Methods of corpus linguistics are used to determine the reasons for the large number of variant designations as well as their functions.
Abstract:
Valency Realization in Short Excerpts of News Text: A Pragmatically Grounded Analysis. This dissertation is a study of so-called pragmatic valency. The aim of the study is to examine the phenomenon both theoretically, by discussing the research literature, and empirically, on the basis of a text corpus consisting of 218 short excerpts of news text from the German newspaper Frankfurter Allgemeine Zeitung. The theoretical part of the study discusses the central concepts of valency and pragmatic valency. In the research literature, valency denotes the relation between the verb and its obligatory and optional complements. Pragmatic valency can be defined as the modification of so-called system valency in parole, including the non-realization of an obligatory complement, the non-realization of an optional complement, and the realization of an optional complement. Furthermore, the investigation of pragmatic valency includes the role of adjuncts, elements that are not determined by valency, in concrete valency realization. The corpus study investigates the valency behaviour of German verbs in a corpus of about 1,500 sentences, combining the methodology and concepts of valency theory, semantics and text linguistics. The analysis focuses on the roughly 600 sentences that deviate from system valency, which provide over 800 examples of modifications of the system valency as codified in (valency) dictionaries. The study attempts to answer the following primary question: Why is system valency modified in parole? To answer this question, the concept of modification types is introduced. The modification types are identified using distinctive feature bundles, in which each feature, with a negative or a positive value, refers to one reason for modification discussed in the research literature.
For example, the features of irrelevance and relevance, focus, world and text-type knowledge, text theme, theme-rheme structure and cohesive chains are applied. The valency approach appears in a new light when explored through corpus-based investigation: both the optionality of complements and the distinction between complements and adjuncts, as defined in the present valency approach, seem defective in some respects. Furthermore, the analysis indicates that adjuncts outside the valency domain play a central role in the concrete realization of valency. Finally, the study proposes a definition of pragmatic valency based on the modification types introduced in the study and tested in the corpus analysis.
Abstract:
The methodology of designing normative terminological products has been described in several guides and international standards. However, this methodology is not always applicable to designing translation-oriented terminological products which differ greatly from normative ones in terms of volume, function, and primary target group. This dissertation has three main goals. The first is to revise and enrich the stock of concepts and terms required in the process of designing an LSP dictionary for translators. The second is to detect, classify, and describe the factors which determine the characteristics of an LSP dictionary for translators and affect the process of its compilation. The third goal is to provide recommendations on different aspects of dictionary design. The study is based on an analysis of dictionaries, dictionary reviews, literature on translation-oriented lexicography, material from several dictionary projects, and the results of questionnaires. Thorough analysis of the concept of a dictionary helped us to compile a list of designable characteristics of a dictionary. These characteristics include target group, function, links to other resources, data carrier, list of lemmata, information about the lemmata, composition of other parts of the dictionary, compression of the data, structure of the data, and access structure. The factors which determine the characteristics of a dictionary have been divided into those derived from the needs of the intended users and those reflecting the restrictions of the real world (e.g. characteristics of the data carrier and organizational factors) and attitudes (e.g. traditions and scientific paradigms). The designer of a dictionary is recommended to take the intended users' needs as the starting point and aim at finding the best compromise between the conflicting factors. 
When designing an LSP dictionary, much depends on the level of knowledge of the intended users about the domain in question as well as their general linguistic competence, LSP competence, and lexicographic competence. This dissertation discusses the needs of LSP translators and the role of the dictionary in the process of translation of an LSP text. It also emphasizes the importance of planning lexicographic products and activities, and addresses many practical aspects of dictionary design.
Abstract:
The work is based on the assumption, proposed by Zellig S. Harris (1954, 1968), that words with similar syntactic usage have similar meanings. We study this assumption from two aspects: first, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and second, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for those meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). This raises the question of how best to represent contexts in order to induce effective context classifiers, because differences in context are the only means we have of separating word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and the multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim to find structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the broader background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and the discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
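Harris's hypothesis can be illustrated with a count-based sketch: each word is represented by the counts of its neighbouring words within a small window, and words are compared by cosine similarity. This is a generic textbook technique assumed here for illustration, not the context representations, classifiers or clustering methods of the thesis publications. The window size and the toy corpus are arbitrary, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, window=1):
    """Map each word type to a Counter of the words seen within +/- window tokens."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing keys, so only shared features contribute.
    dot = sum(count * v[feature] for feature, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    tokens = "the cat eats fish . the dog eats meat . the car needs fuel .".split()
    vecs = context_vectors(tokens, window=1)
    print(round(cosine(vecs["cat"], vecs["dog"]), 3))
    print(round(cosine(vecs["cat"], vecs["car"]), 3))
```

In the toy corpus, "cat" and "dog" share both of their context words and score about 1.0, while "cat" and "car" share only "the" and score about 0.5: similar usage yields high similarity, as the hypothesis predicts.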
Abstract:
In this study I look at what people want to express when they talk about time in Russian and Finnish, and why they use the means they do. The material consists of expressions of time: 1,087 from Russian and 1,141 from Finnish. They have been collected from dictionaries, usage guides, corpora, and the Internet. Here, an expression means an idiomatic set of words in a fixed form, a collocation or a construction. The expressions are studied as lexical entities, without context, and are analysed and categorized according to various features. The theoretical background of the study combines two completely different approaches. Functional Syntax is used to find out what general meanings speakers wish to convey when talking about time and how these meanings are expressed in specific languages. Conceptual metaphor theory is used to explain why the expressions are as they are, i.e. what kinds of conceptual metaphors (transfers from one conceptual domain to another) they include. The study has resulted in a grammatically glossed list of time expressions in Russian and Finnish, a list of 56 general meanings involved in these time expressions, and an account of the means (constructions) these languages have for expressing the general meanings defined. It also includes an analysis of the conceptual metaphors behind the expressions. The general meanings turned out to revolve around duration, point in time, period of time, frequency, sequence, the passing of time, suitable time and the right time, life as time, the limitedness of time, and some other notions with less obvious semantic relations to the others.
Conceptual metaphor analysis of the material has shown that time is conceptualized in Russian and Finnish according to the metaphors Time Is Space (Time Is Container, Time Has Direction, Time Is Cycle, and the Time Line Metaphor), Time Is Resource (and its submapping Time Is Substance), and Time Is Actor; some characteristics are added to these conceptualizations by the secondary metaphors Time Is Nature and Time Is Life. The boundaries between different conceptual metaphors and the connections among them are examined using the theory of conceptual integration (blending theory) and its schemas. The results show that although Russian and Finnish are typologically different, they are very similar both in their speakers' expressive needs concerning time and in the conceptualizations behind expressions of time. This study introduces theoretical and methodological novelties in the nature of the material used, in developing an empirical methodology for conceptual metaphor studies, in the precision with which the boundaries of different conceptual metaphors are defined, and in seeking unity among the different facets of time. Keywords: time, metaphor, time expression, idiom, conceptual metaphor theory, functional syntax, blending theory
Abstract:
Title of the Master's thesis: Análisis de la preposición hacia y establecimiento de sus equivalentes en finés (trans. Analysis of the Spanish preposition hacia and the establishment of its equivalents in Finnish). Abstract: The aim of this Master's thesis is to provide a detailed analysis of the Spanish preposition hacia from a cognitive perspective and to establish its equivalents in Finnish. In doing so, I aim to demonstrate the suitability of both cognitive perspectives and contrastive linguistics for semantic analysis. The thesis is divided into five chapters. The first chapter presents and critically reviews the monolingual lexicographic treatment and semantic analysis of the Spanish preposition hacia in major reference works. This chapter reveals the inadequacies and omissions present in all of the definitions examined, and shows that these problems are merely the surface of an ontological (and therefore methodological) problem in the treatment of prepositions. The second chapter presents the methodological and theoretical perspective adopted in this thesis for the monolingual analysis and definition of hacia, following mainly the guidelines established by G. Lakoff (1987) and by R. Langacker (2008) in his Cognitive Grammar. Within the same paradigm, recent analytical and methodological contributions to the treatment of polysemy are also discussed critically (cf. Tyler and Evans 2003). In the third chapter, in accordance with the requirement to use empirical data from corpora, my aim is to set out an original monolingual analysis of hacia following the principles and methodology set out in the second chapter.
The main objective of this chapter is to build a full-fledged semantic representation of the polysemy of this preposition in order to understand and articulate its meanings in relation to Finnish (and possibly other languages). The fourth chapter, building on the results of chapter 3, examines, describes and establishes the corresponding Finnish equivalents of this preposition. The results obtained in this chapter are also compared with the current bilingual lexicographic definitions found in the most important dictionaries and grammars. Finally, the fifth chapter critically discusses the results of this work. It offers observations on the ontological and theoretical assumptions as well as on the methodological perspective adopted. I also present some notes toward a general methodology for the semantic analysis of Spanish prepositions, to be developed in further investigations.
The aim of this work, which we characterize as a comparative-analytical task, is to provide a detailed analysis of the Spanish preposition hacia from a cognitive perspective, both in itself and through the establishment of its equivalents in Finnish. It thus seeks to demonstrate the suitability of a cognitive perspective both for examining and for establishing and articulating the set of equivalents that a particle, in our case a preposition, has in another language. In this way, and in contrast to canonical definitions that warn that the full range of a preposition's uses cannot be exhaustively characterized, the work shows that an adequate theoretical-analytical methodology makes it possible to construct a definition that is viable at both the hierarchical and the descriptive level. The thesis is divided into five chapters.
The first chapter presents and critically reviews the monolingual lexicographic and analytical treatment that the preposition hacia has received in the main reference works, showing that the inadequacies and omissions present in all the definitions analysed are only the upper layer of an ontological, and therefore methodological, problem in the treatment of prepositions. The second chapter presents the theoretical and methodological perspective adopted in this thesis for the monolingual analysis and definition of hacia, taking as guidelines the proposals of G. Lakoff and the foundations laid by R. Langacker in his cognitive proposal for a new grammar. Jointly and complementarily, and within the same paradigm, we apply, critically discuss and develop various analytical-methodological contributions to the treatment of polysemy in locative linguistic units. In the third chapter, in line with the requirement to use empirical data obtained from text corpora, an original monolingual analysis of hacia is presented following the principles and methodology set out in the second chapter; its main objective is to construct a semantic representation of the preposition's polysemy that encompasses and articulates the prototypical senses specified for it. In the fourth chapter, in accordance with the results of our monolingual analysis, the corresponding Finnish equivalents of hacia are examined, described and established; this chapter also compares the results obtained with current bilingual lexicographic definitions.
The fifth and final chapter of the thesis presents some observations on the ontological and theoretical-methodological postulates of the adopted perspective, as well as some notes toward a general methodology for the semantic analysis of prepositions.
Abstract:
Most of the world's languages lack electronic word-form dictionaries. Linguists who compile such dictionaries could be helped by an efficient morphology workbench that adapts to different environments and uses. A widely usable workbench could ideally be characterized as generally applicable, extensible, and freely available (GEA). It seems that such a solution could be implemented within the framework of finite-state methods. The current work defines the GEA desiderata and begins a series of articles on these desiderata in finite-state morphology. Subsequent parts will review the state of the art and present an action plan for creating a widely usable finite-state morphology workbench.
Abstract:
The main objects of the investigation were the syntactic functions of adjectives. These functions are of interest because an adjective can occur in different modes of use. In total, an adjective can occur in three modes of use: attributive (e.g. a fast car), predicative (e.g. the car is fast) and adverbial (e.g. the car drives fast). Since an adjective cannot always take every function, some dictionaries (especially learners' dictionaries) indicate any restrictions within the lexical entry. The research compared the lexical entries for adjectives in four selected monolingual German dictionaries. The syntactic data for the adjectives were compared in order to identify the differences and commonalities between the entries with respect to the modes of use, and to analyse and assess them. The focus, however, was on the differences in the syntactic information. For each difference, it had to be determined which entry is grammatically correct, or whether an entry is in fact wrong. Where the dictionaries provided no consistent data, this required an empirical analysis of how the adjective is actually used in context. Correct and consistent lexical entries in German dictionaries are very important for supporting learners of German and for ensuring that dictionaries are user-friendly. The investigation showed that in almost half of the cases (over 40%), the syntactic information given for an adjective differed between the dictionaries. Such differences naturally make it very difficult for non-native speakers to understand how an adjective is correctly used.
The main aim of the doctoral thesis was thus to establish and demonstrate the correct syntactic usage of a selected set of adjectives.
Abstract:
592 pp.
Abstract:
Language Documentation and Description as Language Planning: Working with Three Signed Minority Languages. Sign languages are minority languages that typically have low status in society. Language planning has traditionally been controlled from outside the sign-language community. Even though signed languages lack a written form, dictionaries have played an important role in language description and as tools in foreign language learning. The background to the present study on sign language documentation and description as language planning is empirical research in three dictionary projects: Finland-Swedish Sign Language, Albanian Sign Language, and Kosovar Sign Language. The study consists of an introductory article and five detailed studies that address language planning from different perspectives. The theoretical basis of the study is sociocultural linguistics. The research methods used were participant observation, interviews, focus group discussions, and document analysis. The primary research questions are the following: (1) What is the role of dictionary and lexicographic work in language planning, in research on undocumented signed languages, and in relation to the language community as such? (2) What factors pose particular challenges in the documentation of a sign language and should therefore be given special attention during lexicographic work? (3) Is a conventional dictionary a valid tool for describing an undocumented sign language? The results indicate that lexicographic work has a central part to play in language documentation, both as part of basic research on undocumented sign languages and in status planning. Existing dictionary work has contributed new knowledge about the languages and the language communities.
The lexicographic work complements the linguistic advocacy carried out by the community itself, which aims to revitalize the language, empower the community, obtain governmental recognition for the language, and improve the linguistic (human) rights of the language users. The history of signed languages as low-status languages has consequences for language planning and lexicography. One challenge the study discusses is the relationship between the sign-language community and the hearing sign linguist. For the community itself to take the lead in a language planning process, raising linguistic awareness within the community is crucial. The results raise the question of whether lexicographic work is more important for status planning than for corpus planning. The study criticises the use of a conventional dictionary as a tool for describing an undocumented sign language. It discusses differences between signed and spoken/written languages that pose challenges for lexicographic presentation. Alternative electronic lexicographic approaches that include both lexicon and grammar are also discussed. Keywords: sign language, Finland-Swedish Sign Language, Albanian Sign Language, Kosovar Sign Language, language documentation and description, language planning, lexicography