48 results for Saami linguistics


Relevance:

10.00% 10.00%

Publisher:

Abstract:

This thesis studies opinion mining through a comprehensive review of the available research on the topic and, building on that knowledge, presents a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field by discussing its historical roots, its motivations and applications, and the different scientific approaches that have been used to solve the challenging problem of mining opinions. To deal with this large subfield of natural language processing, we first give an abstraction of the opinion mining problem and describe the theoretical frameworks available for dealing with appraisal language. We then discuss the relation between opinion mining and computational linguistics, which provides a crucial pre-processing step for the accuracy of the subsequent steps of opinion mining. The second part of the thesis deals with the semantics of opinions: we describe the different ways of collecting lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions in unstructured textual data. For collecting lists of opinion words, we describe manual, semi-manual and automatic approaches and review the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining, we divide the task into three levels: document, sentence and feature. The techniques presented at the document and sentence levels are divided into supervised and unsupervised approaches, which are used to determine the subjectivity and polarity of texts and sentences. At the feature level we describe the techniques available for finding the opinion targets, the polarity of the opinions about these targets, and the opinion holders.
Also at the feature level, we discuss the various ways to summarize and visualize the results of the analysis. In the third part of the thesis we present a case study of a sales management system that uses free-form text and could benefit from an opinion mining system. Using the knowledge gathered in the review, we propose a theoretical multilevel opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on the previous research, we give some indications that opinion mining could support many of the laborious market research tasks done by the sales force that uses this sales management system, improving their insight into their partners and thereby the quality of their sales services and their overall results.
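
The unsupervised, lexicon-based route to sentence-level and document-level polarity mentioned in the abstract can be sketched as follows. This is only an illustrative sketch, not the MLOM system itself: the lexicon, the negator list and the function names `sentence_polarity` and `document_polarity` are hypothetical, and real systems rely on curated opinion-word lists.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only).
OPINION_LEXICON = {
    "good": 1, "great": 1, "helpful": 1,
    "bad": -1, "slow": -1, "unreliable": -1,
}
NEGATORS = {"not", "never", "no"}


def sentence_polarity(sentence):
    """Sum the lexicon scores of a sentence, flipping the sign of the
    next opinion word that follows a negator."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        value = OPINION_LEXICON.get(token, 0)
        if value:
            score += -value if negate else value
            negate = False  # the negation has been consumed
    return score


def document_polarity(text):
    """Document-level polarity as the sum of its sentence polarities."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(sentence_polarity(s) for s in sentences)
```

A positive score suggests positive polarity, a negative score negative polarity, and zero either a neutral or an uncovered sentence. Feature-level analysis would additionally have to link each opinion word to its target, which this sketch does not attempt.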

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described. The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration.
Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Book review

Relevance:

10.00% 10.00%

Publisher:

Abstract:

On 21, 22 and 23 September 2006, the Department of French Studies of the University of Turku (Finland) organised an international, bilingual (English and French) conference on the theme of academic mobility. The aim of the meeting was to provide an international and multidisciplinary forum that could host debates among the different actors of academic mobility (that is, students, researchers, teaching and administrative staff, etc.). More than fifty speakers contributed, drawn from fields as varied as linguistics, educational sciences, language teaching, anthropology, sociology, psychology, history and geography, together with five renowned speakers. Most of the themes addressed during the conference covered the following areas: the organisation of mobility, the obstacles encountered by candidates for mobility, the integration of exchange students, the development of study programmes, virtual mobility, language learning and teaching, intercultural awareness, the development of competences, and the perception of the academic mobility system and its impact on actual mobility. The value of the work done during the conference lies notably in the fact that it does not bring together only the perspectives of international and exchange students (as is the case in most research already carried out on this subject), but also those of other groups: teachers, researchers, etc. The following contribution contains a first corpus of seventeen articles, divided into three sections: 1. Impacts of student mobility; 2. Language training; 3. Improving academic mobility. Like the conference, the volume that follows is bilingual: eight of the articles are written in French and the other nine in English.
Some authors were unable to attend the conference but nevertheless wished to appear in this volume. In the first section of the volume, Sandrine Billaud sets out to bring to light the main obstacles to student mobility in France (housing, university organisation, administrative procedures) and proposes some avenues for improvement. Next comes an article by Dominique Ulma, who examines academic mobility within the Instituts Universitaires de Formation des Maîtres (IUFM, university institutes for teacher training); she focuses in particular on trainees' enthusiasm for mobility and on the benefits that Erasmus mobility brings to this specific type of student. Then, in a third article, Magali Hardoin considers the educational potential of trainee teachers' mobility and seeks to define its impact on the construction of their professional profile. This is followed by a group of three articles, all based on observations made in Spanish higher education, which deal respectively with the significance for mobile students of the triple training programme in applied European languages (Marián Morón-Martín), the consequences of the presence of foreign students in translation classes (Dimitra Tsokaktsidu), and the realities of the integration of American exchange students on a Spanish campus (Guadalupe Soriano Barabino). The last article in the section, drawn from a study of the situation in Japanese institutions, reports on the state of double-degree programmes between Japanese and foreign institutions and attempts to determine the exact impact of such programmes on Japanese institutions (Mihoko Teshigawara, Riichi Murakami and Yoneo Yano).
The second section is devoted to the relationship between language learning and teaching and academic mobility. In a first article, Martine Eisenbeis looks at multimedia modules based on the film "L'auberge espagnole" by Cédric Klapisch (2001), intended for mobile students wishing to learn and/or improve their French through less conventional methods. Next come the articles by Jeanine Gerbault and Sabine Ylönen, which deal with a European project that aims to support student mobility by creating a multimedia programme of linguistic and cultural training for mobile students (the project is called EUROMOBIL). Then an article by Pascal Schaller looks at the different types of activities that students abroad experience as part of their language training. Finally, the section closes with a contribution by Patricia Kohler-Bally on a bilingual programme coordinated by the University of Fribourg (Switzerland). The third and final section offers some lines of thought for improving the academic mobility of students and teachers; it therefore addresses the questions of equality in access to student mobility, the preparation that mobility requires, and intercultural awareness. In a first chapter, Javier Mato and Bego

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The study examines the signalling of text organisation in research articles (RAs) in French. The work concentrates on a particular type of organisation provided by text sequences, i.e. structures that organise text into items, at least some of which are signalled by markers of addition or order: First… Ø… The third point… In addition… / Premièrement… Ø… Le troisième point… De plus… By indicating the way the text is organised, these structures guide readers in the reading process so that they do not need to interpret the text structure themselves. The aim of the work is to study factors affecting the marking of text sequences. Why is their structure sometimes signalled explicitly by markers such as secondly, whereas in other places such markers are not used? The corpus is manually XML-annotated and consists of 90 RAs (~800 000 words) in French from the fields of linguistics, education and history. The analysis highlights several factors affecting the marking of text sequences. First, exact markers (such as first) seem to be more frequent in sequences where all the items are explicitly signalled by a marker, whereas additive markers (such as moreover) are used in sequences with both explicitly signalled and unmarked items. The marking of explicitly signalled sequences thus seems to be precise and even repetitive, whereas the signalling of sequences with unmarked items is altogether more vague. Second, the marking of text sequences seems to depend on the length of the text. The longer the text segment, the vaguer the marking. Additive markers and unmarked items are more frequent in longer sequences, possibly covering several pages, whereas shorter sequences are often signalled explicitly by exact markers. The marker types also vary according to the sequence length.
Anaphoric expressions, such as first, are fairly close to their referents and are used in short sequences; connectors, such as secondly, are frequently used in sequences of intermediate length; and the longest sequences are often signalled by constructions composed of an ordinal and a noun acting as the subject of the sentence: The first item is… Finally, the marking of text organisation also depends on the discipline the RA belongs to. In linguistics, the marking is fairly frequent and precise; exact markers such as second are the most used, and structures with unmarked items are less common. Similarly, the marking is fairly frequent in education. In this field, however, it is also less precise than in linguistics, with frequent unmarked items and additive markers. History, on the other hand, is characterised by less frequent marking. In addition, when used, the marking in this field is also less precise and less explicit.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Since his inauguration, President Barack Obama has emphasized the need for a new cybersecurity policy, pledging to make it a "national security priority". This is a significant change in security discourse after an eight-year war on terror – a term Obama announced to be no longer in use. After several white papers, reports and the release of the so-called 60-day Cybersecurity Review, Obama announced the creation of a "cyber czar" position and a new military cyber command to coordinate American cyber defence and warfare. China, as an alleged cyber rival, has played an important role in the discourse that introduced the need for the new office and the proposals for changes in legislation. Earlier research suggests that the dominance of state-centric enemy descriptions paused briefly after 9/11 but soon returned to threat discourse. The focus on China's cyber activities fits this trend. The aim of this study is to analyze the type of modern threat scenarios through a linguistic case study on the reporting on Chinese hackers. The methodology of this threat analysis is based on the systemic functional theory of language and is realized as an analysis of the descriptions of action and being (verbs) used by the American authorities. The main sources of data include the Cybersecurity Act 2009, Securing Cyberspace for the 44th Presidency, and the 2008 Report to Congress of the U.S.-China Economic and Security Review Commission. Contrary to the prevailing and popularized terrorism discourse, the results show the comeback of Cold War rhetoric as well as the establishment of a state-centric threat perception in cyber discourse. Cyber adversaries are referred to with descriptions of capacity, technological superiority and untrustworthiness, whereas the ‘self’ is described as vulnerable and weak. The threat of cyber attacks is compared to physical attacks on critical military and civilian infrastructure.
The authorities and the media form a cycle, in which both sides quote each other and foster each other’s distrust and rhetoric. The white papers present China's cyber army as an existential threat. This leads to cyber discourse turning into a school-book example of a securitization process. The need for security demands action descriptions, which makes new rules and regulations acceptable. Cyber discourse has motives and agendas that are separate from real security discourse: the arms race of the 21st century is about unmanned war.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Title explanation: Comments on the work Universal history of linguistics: India, China, Arabia, Europe.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This study examines the structure of the Russian Reflexive Marker (-ся/-сь) and offers a usage-based model building on Construction Grammar and a probabilistic view of linguistic structure. Traditionally, reflexive verbs are accounted for relative to non-reflexive verbs. These accounts assume that linguistic structures emerge as pairs. Furthermore, these accounts assume directionality, where the semantics and structure of a reflexive verb can be derived from the non-reflexive verb. However, this directionality does not necessarily hold diachronically. Additionally, the semantics and the patterns associated with a particular reflexive verb are not always shared with the non-reflexive verb. Thus, a model is proposed that can accommodate the traditional pairs as well as the possible deviations without postulating different systems. A random sample of 2000 instances marked with the Reflexive Marker was extracted from the Russian National Corpus, and the sample used in this study contains 819 unique reflexive verbs. This study moves away from the traditional pair account and introduces the concept of Neighbor Verb. A neighbor verb exists for a reflexive verb if they share the same phonological form excluding the Reflexive Marker. It is claimed here that the Reflexive Marker constitutes a system in Russian and that the relation between the reflexive and neighbor verbs constitutes a cross-paradigmatic relation. Furthermore, the relation between the reflexive and the neighbor verb is argued to be one of symbolic connectivity rather than directionality. Effectively, the relation holding between particular instantiations can vary. The theoretical basis of the present study builds on this assumption. Several new variables are examined in order to systematically model the variability of this symbolic connectivity, specifically the degree and strength of connectivity between items. In usage-based models, the lexicon does not constitute an unstructured list of items.
Instead, items are assumed to be interconnected in a network. This interconnectedness is defined as Neighborhood in this study. Additionally, each verb carves its own niche within the Neighborhood and this interconnectedness is modeled through rhyme verbs constituting the degree of connectivity of a particular verb in the lexicon. The second component of the degree of connectivity concerns the status of a particular verb relative to its rhyme verbs. The connectivity within the neighborhood of a particular verb varies and this variability is quantified by using the Levenshtein distance. The second property of the lexical network is the strength of connectivity between items. Frequency of use has been one of the primary variables in functional linguistics used to probe this. In addition, a new variable called Constructional Entropy is introduced in this study building on information theory. It is a quantification of the amount of information carried by a particular reflexive verb in one or more argument constructions. The results of the lexical connectivity indicate that the reflexive verbs have statistically greater neighborhood distances than the neighbor verbs. This distributional property can be used to motivate the traditional observation that the reflexive verbs tend to have idiosyncratic properties. A set of argument constructions, generalizations over usage patterns, are proposed for the reflexive verbs in this study. In addition to the variables associated with the lexical connectivity, a number of variables proposed in the literature are explored and used as predictors in the model. The second part of this study introduces the use of a machine learning algorithm called Random Forests. The performance of the model indicates that it is capable, up to a degree, of disambiguating the proposed argument construction types of the Russian Reflexive Marker. Additionally, a global ranking of the predictors used in the model is offered. 
Finally, most construction grammars assume that argument constructions form a network structure. A new method is proposed that establishes generalizations over the argument constructions, referred to as Linking Construction. In sum, this study explores the structural properties of the Russian Reflexive Marker, and a new model is set forth that can accommodate both the traditional pairs and potential deviations from them in a principled manner.
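
Two of the quantities named in the abstract, the Levenshtein distance and an entropy over argument constructions, can be illustrated with a short sketch. It rests on assumptions: the abstract does not give the exact definitions of neighborhood distance or Constructional Entropy, so the code below shows only the textbook edit distance and the plain Shannon entropy of a list of construction labels, and the function names are hypothetical.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Textbook dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels,
    e.g. the argument constructions observed with one verb."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under these assumptions, a reflexive verb and its neighbor verb differ by the length of the marker, for example `levenshtein("мыть", "мыться")` is 2, and a verb attested equally often in two constructions has an entropy of one bit.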

Relevance:

10.00% 10.00%

Publisher:

Abstract:

My dissertation is a diachronic and contrastive study of text types. The research material consists of personal advertisements in the newspapers Süddeutsche Zeitung and Helsingin Sanomat during the period 1900–1999. The material comprises 652 German and 538 Finnish advertisements. The advertisements studied were published in May and were collected from the above-mentioned newspapers every tenth year. The material was analysed with the statistical program SPSS. The dissertation analyses the development of this text type over a hundred years in two different cultures, the German and the Finnish. Its purpose is to use this material to find linguistic and cultural similarities and differences in personal advertisements. The starting point is that linguistic expressions reflect the social values of their time, which thus also influence the search for a life partner. The results of the analysis are therefore examined in a broader social context across different decades. The advertisement texts are not, however, examined in terms of individual social events. The dissertation analyses 13 different information units in the advertisements: whether these units occur throughout the period in question and whether the same information units occur in advertisements in both cultures. The dissertation is thus intra- and interlingual as well as intercultural. This method brings out the features that characterise this text type at a given time in both cultures. The dissertation is divided into three parts. The first part gives background information on the history of marriage and the concept of family and on the emergence of the German and Finnish press. The second, theoretical part deals with text linguistics and text-type linguistics and with current research in these fields. The third and most extensive part consists of a qualitative and quantitative analysis comprising 11 different research sections.
The study shows that differences can be found in the text type of personal advertisements, for example in the fact that a German advertisement differs from a Finnish one in length and amount of information. A Finnish advertisement, in its linguistic terseness, relies on the reader understanding the context of the text type. The dissertation also shows that the historical and cultural context of text types should be taken into account when analysing them, since the analysis demonstrates that text types are bound to history and culture.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

There are both similarities and differences between Finnish and German everyday conversations. This study, which belongs to the field of German philology, examines one central activity of everyday conversation, the closing of a telephone call, as produced by speakers of Finnish and German. The data consist of personal, naturally occurring telephone conversations recorded for this study by native speakers of Finnish and German. Twelve Finnish and twelve German calls were selected for the data. Appropriate permission to use the recordings was obtained from all parties. The calls were transcribed according to the GAT transcription system, which is established in the German-speaking area. The theoretical and methodological framework consists of two fields of research, interactional linguistics and language comparison. The interactional-linguistic analysis focuses on observations about the structure of turns and sequences of talk. Cues provided by prosody are used systematically in interpreting the meanings of turns. The result is a conversation-analytic close description of the individual closings, on the basis of which the sequence structure of each closing is defined. All the closings were alike in that each of them contained at least an initiating sequence, a sequence referring to a future meeting, and a sequence leading to the final farewells. On the basis of variation in sequence structure, however, the closings in the data can be divided into groups. Three types of closings were found in both the Finnish and the German data: compact, complex and interrupted closings. The grouping into three types is helpful in the next stage of description, in which the Finnish and German closings are compared with each other. While the study sheds light on the points at which the two data sets converge and differ, it also shows which levels of interaction are suitable as objects of cross-linguistic comparison. So far there has been relatively little discussion of which levels of interaction can be included in cross-linguistic comparison.
The work thus builds a bridge between interactional linguistics and contrastive linguistics.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Can crowdsourcing solutions serve many masters? Can they benefit both laymen and native speakers of minority languages on the one hand and serious linguistic research on the other? How did an infrastructure that was designed to support linguistics turn out to be a solution for raising awareness of native languages? Since 2012 the National Library of Finland has been developing the Digitisation Project for Kindred Languages, in which the key objective is to support a culture of openness and interaction in linguistic research, but also to promote crowdsourcing as a tool for participation of the language community in research. In the course of the project, over 1,200 monographs and nearly 111,000 pages of newspapers in Finno-Ugric languages will be digitised and made available in the Fenno-Ugrica digital collection. This material was published in the Soviet Union in the 1920s and 1930s, and users have had only sporadic access to it. The publication of open-access and searchable materials from this period is a goldmine for researchers. Historians, social scientists and laymen with an interest in specific local publications can now find text materials pertinent to their studies. The linguistically oriented population can also find writings to delight them: (1) lexical items specific to a given publication, and (2) orthographically documented specifics of phonetics. In addition to the open-access collection, we developed an open-source OCR editor that enables the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary since these rare and peripheral prints often include characters that are already archaic, which are neglected by modern OCR software developers but belong to the historical context of kindred languages and are thus an essential part of the linguistic heritage.
When designing the OCR editor, it was essential to consider both the needs of researchers and the capabilities of lay citizens, and to have them participate in the planning and execution of the project from the very beginning. By implementing the feedback from both groups iteratively, it was possible to turn the requested changes into research tools that not only supported linguistic research but also encouraged citizen scientists to face the challenge and work with the crowdsourcing tools for the benefit of research. This presentation will not only deal with the technical aspects, developments and achievements of the infrastructure but will also highlight the way in which the user groups, researchers and lay citizens, were engaged in the process as an active and communicative group of users and how their contributions were made to mutual benefit.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. 
We show that this event extraction system has good performance, reaching the first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing how the developed approach not only shows good performance, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that leads to development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
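
The nested event structure from the abstract's example, CAUSE(A, BIND(B, C)), can be sketched as a small recursive data structure. This is only an illustration and not the TEES graph format: the class names `Entity` and `Event` and the helper functions are hypothetical, and the typed argument roles that the abstract mentions are left out for brevity.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity such as a protein mention."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event with a trigger type and ordered arguments, each of which
    is an entity or another (nested) event."""
    trigger: str
    args: Tuple[Union[Entity, "Event"], ...]


def render(node):
    """Render a node in the TYPE(arg, ...) notation used in the abstract."""
    if isinstance(node, Entity):
        return node.name
    return f"{node.trigger}({', '.join(render(a) for a in node.args)})"


def nesting_depth(node):
    """Number of event layers above the deepest entity."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((nesting_depth(a) for a in node.args), default=0)


# "Protein A causes protein B to bind protein C"
example = Event("CAUSE", (Entity("A"), Event("BIND", (Entity("B"), Entity("C")))))
```

Here `render(example)` reproduces the notation from the abstract, and `nesting_depth(example)` distinguishes a nested event from a flat pairwise relation such as BIND(B, C), which has depth 1.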