20 results for Online corpora
in Helda - Digital Repository of University of Helsinki
Abstract:
This master's thesis uses a corpus to compare the quantitative distribution of proper names across categories in two German online newspapers. The aim is to find out how proper names can be classified and what differences in the newspapers' reporting they reveal. The broader framework is the question of whether proper names can be used to outline the newspapers' content. The corpus was collected from articles on the websites of Frankfurter Allgemeine Zeitung and Süddeutsche Zeitung, http://www.faz.net (FAZ) and http://www.sueddeutsche.de (SZ), published from 2 November 2004 to 8 November 2004. The selected sites are the online editions of Germany's most respected national daily newspapers. Of these, FAZ is regarded as conservative and SZ as liberal. Each corpus deals with the US presidential election of autumn 2004 and contains slightly under 30,000 words from about 40 newspaper articles. A topic-bound corpus was chosen because the goal of the study is to find out, by means of proper names, in which respects FAZ and SZ differ when covering the same topic. The theory part reviews the background of German online newspapers, the text-linguistic theories relevant to the work, and the special characteristics of proper names. It also discusses three earlier classifications from German-language proper name research and one English-language classification from language technology. The shortcomings observed in these motivated combining and modifying existing classifications for this work. The new classification contains four top-level classes (names of beings, geographical names, names of institutions, and names of things), each of which covers two to nine subclasses. The proper names of both corpora are classified on this basis. The quantitative analysis focuses on comparing the top-level classes and subclasses between the newspapers. It also examines the most frequent words in each dataset and in each main class.
Although FAZ and SZ used mostly the same proper names in their reporting, clear differences between the newspapers can be shown at the subclass level, and minor differences in the distribution of proper names across the top-level classes. The chi-square test showed, however, that the distribution of proper names across the top-level classes depends on the newspaper. It can therefore be argued that the chosen medium, among other things, affects the choice of proper names. The frequencies of the proper names suggest that SZ reports in a more varied way than FAZ, which uses proper names in a more concentrated manner. The proper names in the SZ material share a European perspective on the election, whereas FAZ seeks to highlight events in the individual US states. Both the names of persons and the names of institutions mentioned in the newspapers support this claim. Geographically, SZ emphasises the importance of cities and FAZ that of states. The results show that this kind of proper name research can be applied to newspaper texts. The classified proper names partly reflect the content of the material studied and reveal the emphases of the reporting.
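The chi-square test of independence mentioned in this abstract can be illustrated with a minimal sketch. It assumes a 2 x 4 contingency table (newspaper by top-level class). The counts below are invented for illustration and are not the thesis data, and the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if class membership were independent of the newspaper.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts (NOT from the thesis): rows = FAZ, SZ;
# columns = beings, geographical, institutions, things.
counts = [
    [900, 700, 300, 100],
    [850, 600, 400, 150],
]
chi2, dof = chi_square_statistic(counts)

# Chi-square critical value for alpha = 0.05 and 3 degrees of freedom.
CRITICAL_0_05_DF3 = 7.815
print(f"chi2 = {chi2:.2f}, dof = {dof}, newspaper-dependent: {chi2 > CRITICAL_0_05_DF3}")
```

A statistic above the critical value would lead to rejecting the hypothesis that the class distribution is independent of the newspaper, which is the kind of conclusion the abstract reports.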
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research that focuses on news events. TDT deals with segmenting news text into cohesive stories, detecting new, previously unreported events, tracking the development of previously reported events, and grouping together news stories that discuss the same event. The performance of the traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. When comparing documents for event-based similarity, we look not only at matching terms, but also at how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system.
The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experimental results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
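The class-wise comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming cosine similarity for every semantic class and hand-picked weights. The abstract does not state the actual measures or weights, and all names here (`cosine`, `document_similarity`, `WEIGHTS`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical weights per semantic class; the abstract gives no actual values.
WEIGHTS = {"people": 0.3, "organizations": 0.2, "locations": 0.2, "terms": 0.3}


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_similarity(doc_a, doc_b, weights=WEIGHTS):
    """Weighted combination of class-wise similarities, normalized by the weight sum."""
    score = sum(
        w * cosine(doc_a.get(cls, Counter()), doc_b.get(cls, Counter()))
        for cls, w in weights.items()
    )
    return score / sum(weights.values())


# Two toy documents, each represented as semantic class -> term counts.
doc1 = {
    "people": Counter(["bush", "kerry"]),
    "locations": Counter(["ohio"]),
    "terms": Counter(["election", "vote"]),
}
doc2 = {
    "people": Counter(["bush"]),
    "locations": Counter(["ohio"]),
    "terms": Counter(["election", "recount"]),
}
print(round(document_similarity(doc1, doc2), 3))
```

In a fuller version, each class could be given its own measure, for example a taxonomy-based distance for locations, by replacing `cosine` with a per-class function.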
Abstract:
Online content services can greatly benefit from personalisation features that enable delivery of content that is suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purpose of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes the user's actions towards content items to learn the user's preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic and there is no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to some current information needs that are not optimally served by traditional techniques.
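The implicit preference learning described in this abstract might be sketched as below. This is only an illustration under stated assumptions: item profiles are concept-to-weight mappings, and each observed action adds the item's concepts to the user's profile, scaled by an action weight. The abstract does not describe the actual update rule, and every name and value here is hypothetical.

```python
from collections import defaultdict


def update_profile(profile, item_concepts, action_weight=1.0):
    """Add an item's concept weights to the user's profile, scaled by the action's weight."""
    for concept, weight in item_concepts.items():
        profile[concept] += action_weight * weight
    return profile


def normalized(profile):
    """Scale the profile so its weights sum to 1 (an empty profile stays empty)."""
    total = sum(profile.values())
    return {c: w / total for c, w in profile.items()} if total else {}


# Hypothetical item profiles: concept -> relevance weight taken from the taxonomy.
article_sports = {"sports": 0.8, "finland": 0.2}
article_politics = {"politics": 0.7, "finland": 0.3}

user = defaultdict(float)
update_profile(user, article_sports)         # user read a sports article
update_profile(user, article_sports)         # and a second one
update_profile(user, article_politics, 0.5)  # weaker signal, e.g. a brief visit

print(normalized(user))
```

Normalized profiles of this kind could then be passed to a standard clustering algorithm to form the interest groups the abstract mentions.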
Abstract:
Marja Heinonen's dissertation "Verkkomedian käyttö ja tutkiminen. Iltalehti Online 1995-2001" describes the use of the new internet-based news service Iltalehti Online during its first years of existence, 1995-2001. The study focuses on the content of the service and users' attitudes towards the new medium and its contents. Heinonen has also analyzed and described the research methods that can be used to study any new media phenomenon when there is no historical perspective on which to base the research. Heinonen has created a process model for researching the net medium, which is based on a multidimensional approach. She has chosen an iterative research method inspired by Sudweeks and Simoff's CEDA methodology, in which qualitative and quantitative methods take turns, creating both results and new research questions. The dissertation discusses and describes the possibilities of combining several research methods in the study of online news media. On a general level it discusses the methodological possibilities of researching a completely new media form when there is no historical perspective. The result of these discussions is in favour of multidimensional methods. The empirical research was built around three cases concerning Iltalehti Online and its users: log analysis 1996-1999, interviews 1999 and clustering 2000-2001. Even though the results of the different cases were somewhat conflicting, the central results from the analysis of Iltalehti Online 1995-2001 are:
- Reading was strongly determined by gender.
- The structure of Iltalehti Online strongly guided reading.
- People did not make a clear distinction in content between news and entertainment.
- Users created new habits in their everyday life during the first years of using Iltalehti Online. These habits were categorized as follows:
  - break between everyday routines
  - established habit
  - new practice within the rhythm of the day
- In the clustering of the users, sports, culture and celebrities were the most distinguishing contents. Users did not move across these borders as much as within them.
The dissertation contributes to the development of multidimensional research methods for emerging phenomena in the media field. It is also a unique description of a phase of development in media history, based on unique research material. No such information (logs + demographics) is available for any other Finnish online news medium, either from the first years or today.
Abstract:
Recent evidence from adult pronoun comprehension suggests that semantic factors such as verb transitivity affect referent salience and thereby anaphora resolution. We tested whether the same semantic factors influence pronoun comprehension in young children. In a visual world study, 3-year-olds heard stories that began with a sentence containing either a high or a low transitivity verb. Looking behaviour to pictures depicting the subject and object of this sentence was recorded as children listened to a subsequent sentence containing a pronoun. Children showed a stronger preference to look to the subject as opposed to the object antecedent in the low transitivity condition. In addition, there were general preferences (1) to look to the subject in both conditions and (2) to look more at both potential antecedents in the high transitivity condition. This suggests that children, like adults, are affected by semantic factors, specifically semantic prominence, when interpreting anaphoric pronouns.
Abstract:
The paper explores the effect of customer satisfaction with online supporting services on loyalty to providers of an offline core service. Supporting services are provided to customers before, during, or after the purchase of a tangible or intangible core product, and have the purpose of enhancing or facilitating the use of this product. The internet has the potential to dominate all other marketing channels when it comes to the interactive and personalised communication that is considered quintessential for supporting services. Our study shows that the quality of online supporting services powerfully affects satisfaction with the provider and customer loyalty through its effect on online value and enjoyment. Managerial implications are provided.