Digital Library

My study examines the development of Finnish vocabulary in the 19th century, the period during which Finnish developed into a multi-domain language of culture and learning. The example material is the vocabulary of one special field, geography. The Finnish literary language emerged in the 16th century, but at first written language was needed mainly in religious contexts. During the 19th century the domains of language use diversified, and new vocabulary was needed for many special fields. Non-fiction was deliberately translated, and texts were written on a range of subjects. My study examines how geographical vocabulary developed over a hundred years, particularly in geography textbooks. The study describes vocabulary development from a theoretically novel starting point by examining lexical variation. Variation is described in detail both as the development of the designations of individual concepts and as a phenomenon in general. The study also draws on a cognitive approach, in particular the theory of sociocognitive terminology. The analysis of the material yields a picture of how the vocabulary developed and became established. The study also describes the ways in which new concepts were named. It considers the relationship between different naming strategies, as well as the role of writers and their contemporaries in establishing the vocabulary. Nineteenth-century geographical vocabulary shows abundant variation; the designations of only a few concepts are established or become established quickly. Describing this variation as lexical variation proved to be a good method in the study. Because the literary language was not yet standardized, the designations show a great deal of contextual variation, for example in spelling. Writers also considered illustrative ways of naming concepts, which gives rise to onomasiological variation. Semasiological variation, in turn, reflects the instability of the concept system. The vocabulary of the material is rooted in Old Literary Finnish, but on this basis a vast amount of new vocabulary is created, or designations previously used in the literary language are given new meanings.
Both the formation of designations from native elements and translation, in which the model comes from another language but the elements of the designation are native, play an important role.

Opinion Mining

Abstract:

In this thesis we study the field of opinion mining by giving a comprehensive review of the research that has been done on this topic. Drawing on this knowledge, we also present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications, and the different scientific approaches that have been used to solve the challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks available for dealing with appraisal language. Then we discuss the relation between opinion mining and computational linguistics, which provides a pre-processing step that is crucial for the accuracy of the subsequent steps of opinion mining. The second part of our thesis deals with the semantics of opinions: we describe the different ways used to collect lists of opinion words, as well as the methods and techniques available for extracting knowledge from opinions present in unstructured textual data. In the part on collecting lists of opinion words, we describe manual, semi-manual and automatic ways of doing so and review the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining, we divide the task into three levels: the document, sentence and feature levels. The techniques presented at the document and sentence levels are divided into supervised and unsupervised approaches, which are used to determine the subjectivity and polarity of texts and sentences at these levels of analysis. At the feature level we describe the techniques available for finding the opinion targets, the polarity of the opinions about these targets, and the opinion holders.
Also at the feature level, we discuss various ways to summarize and visualize the results of this level of analysis. In the third part of our thesis we present a case study of a sales management system that uses free-form text and could benefit from an opinion mining system. Using the knowledge gathered in the review of this field, we provide a theoretical multilevel opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on previous research, we suggest that such a system could support many of the laborious market research tasks carried out by the sales force that uses this sales management system, improving the sales force's insight into its partners and thereby increasing the quality of its sales services and its overall results.
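
The abstract reviews opinion word lists and divides polarity classification into document, sentence and feature levels. As an illustration only, and not the MLOM system proposed in the thesis, the sketch below shows one simple unsupervised, lexicon-based approach: sentence polarity is the sum of positive and negative word scores, a preceding negator flips the next opinion word, and a document label is the majority of its sentence labels. The word lists, the negation rule and the function names are hypothetical placeholders standing in for a real gold-standard lexicon.

```python
import re

# Hypothetical toy opinion lexicons; a real system would use a curated gold-standard list.
POSITIVE = {"good", "great", "excellent", "friendly", "fast"}
NEGATIVE = {"bad", "poor", "slow", "rude", "expensive"}
NEGATORS = {"not", "never", "no"}


def sentence_polarity(sentence: str) -> str:
    """Label a sentence 'positive', 'negative' or 'neutral' from lexicon counts.

    Simplifying assumption: a negator flips the polarity of the next opinion word only.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        if tok in POSITIVE or tok in NEGATIVE:
            value = 1 if tok in POSITIVE else -1
            score += -value if negate else value
            negate = False  # the negator has been consumed by this opinion word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def document_polarity(text: str) -> str:
    """Aggregate sentence labels into a document label by simple majority."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    labels = [sentence_polarity(s) for s in sentences]
    pos, neg = labels.count("positive"), labels.count("negative")
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Majority voting over sentences is only one way to move from the sentence level to the document level; supervised classifiers trained on labelled texts are the alternative family the abstract mentions.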

Machine Learning and Clinical Text. Supporting Health Information Flow

Abstract:

Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and to cost-effective hospital administration. Methods for the automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether, five machine learning applications for three practical cases are described. The first two applications are binary classification and regression for the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding; it is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method benefits not only processing time but also the diversity and quality of evaluation. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration.
Practical cases differ greatly, and hence development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.

Conference Proceedings. Academic Mobility Blending Perspectives - Mobilité Académique Perspectives croisées

Abstract:

On 21, 22 and 23 September 2006, the Department of French Studies at the University of Turku (Finland) organized an international, bilingual (English and French) conference on academic mobility. The aim of the meeting was to provide an international, multidisciplinary forum for debate among the various actors in academic mobility (that is, students, researchers, teaching and administrative staff, etc.). More than fifty speakers contributed, from fields as varied as linguistics, education, didactics, anthropology, sociology, psychology, history and geography, together with five renowned guest speakers. Most of the topics addressed during the conference fell within the following areas: the organization of mobility, the obstacles faced by candidates for mobility, the integration of exchange students, curriculum development, virtual mobility, language learning and teaching, intercultural awareness, skills development, and the perception of the academic mobility system and its impact on actual mobility. The value of the work carried out during the conference lies notably in the fact that it brings together not only the perspectives of international and exchange students (as most previous research on the subject has done), but also those of other groups: teachers, researchers, etc. The following volume contains a first corpus of seventeen articles, divided into three sections: 1. Impacts of student mobility; 2. Language training; 3. Improving academic mobility. Like the conference, the volume is bilingual: eight of the articles are written in French and the other nine in English.
Some authors were unable to attend the conference but nevertheless wished to appear in this volume. In the first section, Sandrine Billaud seeks to identify the main obstacles to student mobility in France (housing, the organization of universities, administrative procedures) and proposes some avenues for improvement. Next comes an article by Dominique Ulma, who examines academic mobility in the Instituts Universitaires de Formation des Maîtres (IUFM, university teacher-training institutes); she focuses in particular on trainees' enthusiasm for mobility and on the benefits that Erasmus mobility brings to this specific type of student. Then, in a third article, Magali Hardoin considers the educational potential of the mobility of trainee teachers and seeks to define its impact on the construction of their professional profile. This is followed by a group of three articles, all based on observations made in Spanish higher education, which deal respectively with the scope of the triple training programme in applied European languages for mobile students (Marián Morón-Martín), the consequences of the presence of foreign students in translation classes (Dimitra Tsokaktsidu), and the realities of integrating American exchange students on a Spanish campus (Guadalupe Soriano Barabino). The last article in the section, based on a study of Japanese institutions, reports on the double-degree programmes that exist between Japanese and foreign institutions and attempts to determine the exact impact of such programmes on Japanese institutions (Mihoko Teshigawara, Riichi Murakami and Yoneo Yano).
The second section is devoted to the relationship between language learning and teaching and academic mobility. In the first article, Martine Eisenbeis looks at multimedia modules based on Cédric Klapisch's film « L’auberge espagnole » (2001), intended for mobile students who wish to learn and/or improve their French through less traditional methods. Next come the articles by Jeanine Gerbault and Sabine Ylönen, which deal with a European project aimed at supporting student mobility by creating a multimedia language and cultural training programme for mobile students (the project is called EUROMOBIL). Then an article by Pascal Schaller examines the different types of activities that students staying abroad engage in as part of their language training. Finally, the section closes with a contribution by Patricia Kohler-Bally on a bilingual programme coordinated by the University of Fribourg (Switzerland). The third and final section offers some lines of reflection aimed at improving the academic mobility of students and teachers; the issues addressed in this context are equality of access to student mobility, the preparation that mobility requires, and intercultural awareness. In the first chapter, Javier Mato and Bego

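D’abord, ensuite, enfin et 0, De plus: Organisation textuelle par des séries linéaires dans les articles de recherche [First, next, finally and 0, In addition: Text organisation by linear sequences in research articles]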

Abstract:

The study examines the signalling of text organisation in research articles (RAs) in French. The work concentrates on a particular type of organisation provided by text sequences, i.e. structures that organise text into items, at least some of which are signalled by markers of addition or order (0 denotes an unmarked item): First… 0… The third point… In addition… / Premièrement… 0… Le troisième point… De plus… By indicating how the text is organised, these structures guide readers through the reading process so that they do not need to interpret the text structure themselves. The aim of the work is to study the factors affecting the marking of text sequences. Why is their structure sometimes signalled explicitly by markers such as secondly, whereas elsewhere such markers are not used? The corpus is manually XML-annotated and consists of 90 RAs (~800,000 words) in French from the fields of linguistics, education and history. The analysis highlights several factors affecting the marking of text sequences. First, exact markers (such as first) seem to be more frequent in sequences where all the items are explicitly signalled by a marker, whereas additive markers (such as moreover) are used in sequences with both explicitly signalled and unmarked items. The marking of explicitly signalled sequences thus seems to be precise and even repetitive, whereas the signalling of sequences with unmarked items is altogether vaguer. Second, the marking of text sequences seems to depend on the length of the text: the longer the text segment, the vaguer the marking. Additive markers and unmarked items are more frequent in longer sequences, which may cover several pages, whereas shorter sequences are often signalled explicitly by exact markers. The types of marker also vary with sequence length.
Anaphoric expressions, such as first, are fairly close to their referents and are used in short sequences; connectors, such as secondly, are frequently used in sequences of intermediate length; and the longest sequences are often signalled by constructions composed of an ordinal and a noun acting as the subject of the sentence: The first item is… Finally, the marking of text organisation also depends on the discipline to which the RA belongs. In linguistics, the marking is fairly frequent and precise: exact markers such as second are the most used, and structures with unmarked items are less common. The marking is similarly frequent in education, but it is less precise than in linguistics, with frequent unmarked items and additive markers. History, on the other hand, is characterised by less frequent marking; moreover, when marking is used in this field, it is also less precise and less explicit.

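Uusien uhkakuvien luominen: tapaus 'kiinalaiset kybersoturit' : lingvistinen uhka-analyysi kyberdiskurssista [Creating new threat scenarios: the case of 'Chinese cyber warriors': a linguistic threat analysis of cyber discourse]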

Abstract:

Since his inauguration, President Barack Obama has emphasized the need for a new cybersecurity policy, pledging to make it a "national security priority". This is a significant change in security discourse after an eight-year war on terror, a term that Obama announced was no longer in use. After several white papers, reports and the release of the so-called 60-day Cybersecurity Review, Obama announced the creation of a "cyber czar" position and a new military cyber command to coordinate American cyber defence and warfare. China, as an alleged cyber rival, has played an important role in the discourse that introduced the need for the new office and the proposed changes in legislation. Earlier research suggests that the dominance of state-centric enemy descriptions paused briefly after 9/11 but soon returned to threat discourse. The focus on China's cyber activities fits this trend. The aim of this study is to analyze the types of modern threat scenarios through a linguistic case study of the reporting on Chinese hackers. The methodology of this threat analysis is based on systemic functional linguistics and is realized as an analysis of the action and being descriptions (verbs) used by the American authorities. The main sources of data include the Cybersecurity Act of 2009, Securing Cyberspace for the 44th Presidency, and the 2008 Report to Congress of the U.S.-China Economic and Security Review Commission. Contrary to the prevailing and popularized terrorism discourse, the results show the comeback of Cold War rhetoric and the establishment of a state-centric threat perception in cyber discourse. Cyber adversaries are described in terms of capacity, technological superiority and untrustworthiness, whereas the ‘self’ is described as vulnerable and weak. The threat of cyber attacks is compared to physical attacks on critical military and civilian infrastructure.
The authorities and the media form a cycle in which each side quotes the other and fosters the other's distrust and rhetoric. The white papers present China's cyber army as an existential threat. This turns cyber discourse into a textbook example of a securitization process. The need for security calls for descriptions of action, which make new rules and regulations acceptable. Cyber discourse has motives and agendas that are separate from real security discourse: the arms race of the 21st century is about unmanned war.

Kurzrezensionen [Short reviews]

Abstract:

Book review

Suomi urbaanina kieliyhteisönä [Finnish as an urban language community]

Onko tiede kirjallisuutta? [Is science literature?]

Universaalitieteen mahdollisuudesta [On the possibility of a universal science]

Abstract:

Title note: Comments on the work Universal history of linguistics: India, China, Arabia, Europe.

Reflexive Space. A Constructionist Model of the Russian Reflexive Marker

Abstract:

This study examines the structure of the Russian Reflexive Marker (-ся/-сь) and offers a usage-based model building on Construction Grammar and a probabilistic view of linguistic structure. Traditionally, reflexive verbs are accounted for relative to non-reflexive verbs. These accounts assume that linguistic structures emerge as pairs. Furthermore, they assume a directionality whereby the semantics and structure of a reflexive verb can be derived from the non-reflexive verb. However, this directionality does not necessarily hold diachronically. Additionally, the semantics and patterns associated with a particular reflexive verb are not always shared with the non-reflexive verb. Thus, a model is proposed that can accommodate the traditional pairs as well as possible deviations without postulating different systems. A random sample of 2,000 instances marked with the Reflexive Marker was extracted from the Russian National Corpus; the sample used in this study contains 819 unique reflexive verbs. This study moves away from the traditional pair account and introduces the concept of the Neighbor Verb. A neighbor verb exists for a reflexive verb if the two share the same phonological form excluding the Reflexive Marker. It is claimed here that the Reflexive Marker constitutes a system in Russian and that the relation between reflexive and neighbor verbs is a cross-paradigmatic relation. Furthermore, the relation between the reflexive and the neighbor verb is argued to be one of symbolic connectivity rather than directionality. In effect, the relation holding between particular instantiations can vary. The theoretical basis of the present study builds on this assumption. Several new variables are examined in order to model the variability of this symbolic connectivity systematically, specifically the degree and strength of connectivity between items. In usage-based models, the lexicon is not an unstructured list of items.
Instead, items are assumed to be interconnected in a network. This interconnectedness is defined as the Neighborhood in this study. Additionally, each verb carves out its own niche within the Neighborhood, and this interconnectedness is modeled through rhyme verbs, which constitute the degree of connectivity of a particular verb in the lexicon. The second component of the degree of connectivity concerns the status of a particular verb relative to its rhyme verbs. The connectivity within the neighborhood of a particular verb varies, and this variability is quantified using the Levenshtein distance. The second property of the lexical network is the strength of connectivity between items. Frequency of use has been one of the primary variables used in functional linguistics to probe this. In addition, a new variable called Constructional Entropy, building on information theory, is introduced in this study. It quantifies the amount of information carried by a particular reflexive verb in one or more argument constructions. The results for lexical connectivity indicate that reflexive verbs have statistically greater neighborhood distances than neighbor verbs. This distributional property can be used to motivate the traditional observation that reflexive verbs tend to have idiosyncratic properties. A set of argument constructions, i.e. generalizations over usage patterns, is proposed for the reflexive verbs in this study. In addition to the variables associated with lexical connectivity, a number of variables proposed in the literature are explored and used as predictors in the model. The second part of this study introduces the use of a machine learning algorithm called Random Forests. The performance of the model indicates that it is capable, to a degree, of disambiguating the proposed argument construction types of the Russian Reflexive Marker. Additionally, a global ranking of the predictors used in the model is offered.
Finally, most construction grammars assume that argument constructions form a network structure. A new method is proposed for establishing a generalization over the argument constructions, referred to as the Linking Construction. In sum, this study explores the structural properties of the Russian Reflexive Marker and sets forth a new model that can accommodate both the traditional pairs and potential deviations from them in a principled manner.
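
The abstract quantifies connectivity within a verb's neighborhood with the Levenshtein distance and introduces Constructional Entropy, a measure grounded in information theory. The sketch below is one possible reading under stated assumptions, not the study's implementation: the neighbor form is obtained by stripping a final -ся/-сь, the Levenshtein distance is the standard character-level edit distance, and Constructional Entropy is assumed to be the Shannon entropy, in bits, of a verb's distribution over argument construction labels. The function names and any construction labels passed in are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional

# Endings of the Russian Reflexive Marker.
REFLEXIVE_ENDINGS = ("ся", "сь")


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def neighbor_form(reflexive_verb: str) -> Optional[str]:
    """Hypothetical helper: the form without the Reflexive Marker, or None if absent."""
    for ending in REFLEXIVE_ENDINGS:
        if reflexive_verb.endswith(ending):
            return reflexive_verb[: -len(ending)]
    return None


def constructional_entropy(construction_labels: List[str]) -> float:
    """Assumed definition: Shannon entropy (bits) over a verb's construction labels."""
    counts = Counter(construction_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this reading, a verb attested in only one argument construction has an entropy of 0 bits, and a verb split evenly between two constructions has 1 bit, so higher values indicate that the verb's usage is spread over more constructions.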

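Das persönliche Glück auf dem unpersönlichen Markt. Deutsche und finnische Kontaktanzeigen im 20. Jahrhundert - eine kontrastive und diachrone Textsortenuntersuchung [Personal happiness on the impersonal market: German and Finnish personal advertisements in the 20th century. A contrastive and diachronic text-type study]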

Abstract:

My dissertation is a diachronic and contrastive study of text types. The research material consists of personal advertisements in the newspapers Süddeutsche Zeitung and Helsingin Sanomat from 1900 to 1999. The material comprises 652 German and 538 Finnish advertisements. The advertisements examined were published in May and were collected from these newspapers every tenth year. The material was analysed using SPSS statistical software. The dissertation analyses the development of this text type over a hundred years in two different cultures, German and Finnish. The aim of the dissertation is to use this material to identify linguistic and cultural similarities and differences in personal advertisements. The starting point is that linguistic expressions reflect the social values of their time, which thus also influence the search for a life partner. The results of the analysis are therefore examined in a broader social context across different decades. The advertisement texts are not, however, examined on the basis of individual social events. The dissertation analyses 13 different information units in the personal advertisements: whether these units occur throughout the period in question, and whether the same information units occur in advertisements in both cultures. The dissertation is thus intralingual, interlingual and intercultural. This method brings out the characteristics typical of this text type at a given time in both cultures. The dissertation is divided into three parts. The first part provides background information on the history of marriage and of the concept of the family, and on the emergence of the German and Finnish press. The second, theoretical part deals with text linguistics and text-type linguistics and with current research in these fields. The third and most extensive part consists of a qualitative and quantitative analysis comprising 11 different research sections.
The study shows that differences can be found within the personal-advertisement text type, for example already in the fact that a German advertisement differs from a Finnish one in length and amount of information. A Finnish advertisement, in its linguistic brevity, relies on the reader understanding the context of the text type. The dissertation also shows that the historical and cultural context of text types should be taken into account when analysing them, since the analysis demonstrates that text types are bound to history and culture.

39 results for synchronic linguistics
