30 resultados para online classification
em Helda - Digital Repository of University of Helsinki
Resumo:
Tämä pro gradu -tutkielma vertailee korpuksen avulla erisnimien kvantitatiivista jakautumista luokkiin kahdessa saksalaisessa verkkolehdessä. Työn tavoitteena on selvittää, kuinka erisnimiä voidaan luokitella ja mitä eroja niiden avulla on havaittavissa lehtien raportoinnissa. Laajempana kehyksenä toimii kysymys siitä, voidaanko erisnimiä hyödyntäen hahmottaa lehtien sisältöjä. Korpus on kerätty Frankfurter Allgemeine Zeitungin ja Süddeutsche Zeitungin verkkolehtien http: //www.faz.net (FAZ) ja http://www.sueddeutsche.de (SZ) artikkeleista ajalta 2.11.2004-8.11.2004. Valitut sivustot edustavat Saksan arvostetuimpien päivittäisten, koko maan kattavien sanomaleh- tien verkkojulkaisuja. Näistä FAZ:ia pidetään konservatiivisena ja SZ:ia liberaalina lehtenä. Kumpikin korpus käsittelee USA:n presidentinvaaleja syksyllä 2004 ja sisältää hieman alle 30 000 sanaa noin 40 lehtiartikkelista. Aihesidonnaisen korpuksen valinta perustuu erityisesti siihen, että tutkimuksen päämääränä on saada erisnimien avulla selville, miltä osin FAZ ja SZ eroavat toisistaan käsitellessään samaa aihetta. Teoriaosassa käydään läpi saksalaisten verkkolehtien taustaa, työhön liittyviä tekstilingvistisiä teo- rioita sekä erisnimien erikoispiirteitä. Siinä käsitellään myös kolmea aiempaa, saksankielisen eris- nimitutkimuksen luokittelua ja yhtä englanninkielistä, kieliteknologian luokittelua. Näissä havaitut puutteet motivoivat yhdistelemään ja muuttamaan olemassa olevia luokitteluja tätä työtä varten. Uusi luokittelu sisältää neljä yläluokkaa (olentojen, maantieteelliset, instituutioden ja asioiden ni- met), jotka kaikki kattavat kahdesta yhdeksään alaluokkaa. Kummankin korpuksen erisnimet luo- kitellaan tämän perusteella. Kvantitatiivinen analyysi keskittyy ylä- ja alaluokkien vertailuun lehtien välillä. Lisäksi se kattaa sekä kummankin aineiston että pääluokkien frekventimpien sanojen tarkastelun. Vaikka FAZ ja SZ käyttivätkin pääosin samoja erisnimiä raportoinnissaan, voidaan lehtien välillä osoittaa selkeitä eroja alaluokkien kohdalla ja vähäisiä eroja erisnimien jakautumisessa yläluokkiin. chi2 -testin näytti kuitenkin, että erisnimien jakautuminen yläluokkiin on lehtisidonnaista. Siksi voidaan väittää, että muun muassa valittu media vaikuttaa erisnimivalintoihin. Erisnimien frekvenssit antavat ymmärtää, että SZ raportoisi monipuolisemmin kuin FAZ, joka käyttää erisnimiä keskitetymmin. SZ:in aineiston erisnimiä yhdistää eurooppalainen näkökulma vaaleihin, kun taas FAZ pyrkii tuomaan esille tapahtumia USA:n eri osavaltioissa. Niin lehdissä mainitut henkilöiden kuin instituutioden nimet tukevat tätä väitetettä. SZ korostaa maantieteellisesti kaupunkien merkitystä, FAZ osavaltioiden. Saadut tulokset osoittavat, että tämänkaltaisen erisnimitutkimuksen soveltaminen lehtiteksteihin on mahdollista. Luokitellut erisnimet heijastavat osittain käsiteltyjen aineistojen sisältöä ja paljastavat raportoinnin painopisteistä.
Resumo:
Hereditary nonpolyposis colorectal cancer (HNPCC) is the most common known clearly hereditary cause of colorectal and endometrial cancer (CRC and EC). Dominantly inherited mutations in one of the known mismatch repair (MMR) genes predispose to HNPCC. Defective MMR leads to an accumulation of mutations especially in repeat tracts, presenting microsatellite instability. HNPCC is clinically a very heterogeneous disease. The age at onset varies and the target tissue may vary. In addition, families that fulfill the diagnostic criteria for HNPCC but fail to show any predisposing mutation in MMR genes exist. Our aim was to evaluate the genetic background of familial CRC and EC. We performed comprehensive molecular and DNA copy number analyses of CRCs fulfilling the diagnostic criteria for HNPCC. We studied the role of five pathways (MMR, Wnt, p53, CIN, PI3K/AKT) and divided the tumors into two groups, one with MMR gene germline mutations and the other without. We observed that MMR proficient familial CRC consist of two molecularly distinct groups that differ from MMR deficient tumors. Group A shows paucity of common molecular and chromosomal alterations characteristic of colorectal carcinogenesis. Group B shows molecular features similar to classical microsatellite stable tumors with gross chromosomal alterations. Our finding of a unique tumor profile in group A suggests the involvement of novel predisposing genes and pathways in colorectal cancer cohorts not linked to MMR gene defects. We investigated the genetic background of familial ECs. Among 22 families with clustering of EC, two (9%) were due to MMR gene germline mutations. The remaining familial site-specific ECs are largely comparable with HNPCC associated ECs, the main difference between these groups being MMR proficiency vs. deficiency. We studied the role of PI3K/AKT pathway in familial ECs as well and observed that PIK3CA amplifications are characteristic of familial site-specific EC without MMR gene germline mutations. Most of the high-level amplifications occurred in tumors with stable microsatellites, suggesting that these tumors are more likely associated with chromosomal rather than microsatellite instability and MMR defect. The existence of site-specific endometrial carcinoma as a separate entity remains equivocal until predisposing genes are identified. It is possible that no single highly penetrant gene for this proposed syndrome exists, it may, for example be due to a combination of multiple low penetrance genes. Despite advances in deciphering the molecular genetic background of HNPCC, it is poorly understood why certain organs are more susceptible than others to cancer development. We found that important determinants of the HNPCC tumor spectrum are, in addition to different predisposing germline mutations, organ specific target genes and different instability profiles, loss of heterozygosity at MLH1 locus, and MLH1 promoter methylation. This study provided more precise molecular classification of families with CRC and EC. Our observations on familial CRC and EC are likely to have broader significance that extends to sporadic CRC and EC as well.
Resumo:
A new rock mass classification scheme, the Host Rock Classification system (HRC-system) has been developed for evaluating the suitability of volumes of rock mass for the disposal of high-level nuclear waste in Precambrian crystalline bedrock. To support the development of the system, the requirements of host rock to be used for disposal have been studied in detail and the significance of the various rock mass properties have been examined. The HRC-system considers both the long-term safety of the repository and the constructability in the rock mass. The system is specific to the KBS-3V disposal concept and can be used only at sites that have been evaluated to be suitable at the site scale. By using the HRC-system, it is possible to identify potentially suitable volumes within the site at several different scales (repository, tunnel and canister scales). The selection of the classification parameters to be included in the HRC-system is based on an extensive study on the rock mass properties and their various influences on the long-term safety, the constructability and the layout and location of the repository. The parameters proposed for the classification at the repository scale include fracture zones, strength/stress ratio, hydraulic conductivity and the Groundwater Chemistry Index. The parameters proposed for the classification at the tunnel scale include hydraulic conductivity, Q´ and fracture zones and the parameters proposed for the classification at the canister scale include hydraulic conductivity, Q´, fracture zones, fracture width (aperture + filling) and fracture trace length. The parameter values will be used to determine the suitability classes for the volumes of rock to be classified. The HRC-system includes four suitability classes at the repository and tunnel scales and three suitability classes at the canister scale and the classification process is linked to several important decisions regarding the location and acceptability of many components of the repository at all three scales. The HRC-system is, thereby, one possible design tool that aids in locating the different repository components into volumes of host rock that are more suitable than others and that are considered to fulfil the fundamental requirements set for the repository host rock. The generic HRC-system, which is the main result of this work, is also adjusted to the site-specific properties of the Olkiluoto site in Finland and the classification procedure is demonstrated by a test classification using data from Olkiluoto. Keywords: host rock, classification, HRC-system, nuclear waste disposal, long-term safety, constructability, KBS-3V, crystalline bedrock, Olkiluoto
Resumo:
In visual object detection and recognition, classifiers have two interesting characteristics: accuracy and speed. Accuracy depends on the complexity of the image features and classifier decision surfaces. Speed depends on the hardware and the computational effort required to use the features and decision surfaces. When attempts to increase accuracy lead to increases in complexity and effort, it is necessary to ask how much are we willing to pay for increased accuracy. For example, if increased computational effort implies quickly diminishing returns in accuracy, then those designing inexpensive surveillance applications cannot aim for maximum accuracy at any cost. It becomes necessary to find trade-offs between accuracy and effort. We study efficient classification of images depicting real-world objects and scenes. Classification is efficient when a classifier can be controlled so that the desired trade-off between accuracy and effort (speed) is achieved and unnecessary computations are avoided on a per input basis. A framework is proposed for understanding and modeling efficient classification of images. Classification is modeled as a tree-like process. In designing the framework, it is important to recognize what is essential and to avoid structures that are narrow in applicability. Earlier frameworks are lacking in this regard. The overall contribution is two-fold. First, the framework is presented, subjected to experiments, and shown to be satisfactory. Second, certain unconventional approaches are experimented with. This allows the separation of the essential from the conventional. To determine if the framework is satisfactory, three categories of questions are identified: trade-off optimization, classifier tree organization, and rules for delegation and confidence modeling. Questions and problems related to each category are addressed and empirical results are presented. For example, related to trade-off optimization, we address the problem of computational bottlenecks that limit the range of trade-offs. We also ask if accuracy versus effort trade-offs can be controlled after training. For another example, regarding classifier tree organization, we first consider the task of organizing a tree in a problem-specific manner. We then ask if problem-specific organization is necessary.
Resumo:
Open Access -liike pyrkii vapauttamaan tieteellisen tiedon kaupallisuuden rajoitteista edesauttamalla artikkeleiden rinnakkaisversioiden avointa ja esteetöntä verkkotallennusta. Sen mahdollistamiseksi verkkoon perustetaan julkaisuarkistoja, joiden toiminta-ajatuksena on säilöä taustayhteisönsä tieteellinen tuotanto avoimesti ja keskitetysti yhteen paikkaan. Avoimen lähdekoodin arkistosovellukset jakavat sisältönsä OAI-protokollan avulla ja muodostavat näin globaalin virtuaalisen tietoverkon. Suurten tietomäärien käsittelyssä on huomioitava erityisesti kuvailutiedon rooli tehokkaiden hakujen toteuttamisessa sekä tiedon yksilöiminen verkossa erilaisten pysyvien tunnisteiden, kuten Handle:n tai URN:n avulla. Tieteellisen tiedon avoimella saatavuudella on merkittävä vaikutus myös oppimisen näkökulmasta. Julkaisuarkistot tarjoavat oppimateriaalin lisäksi uusia mahdollisuuksia julkaisukanavan ja oppimisymp äristön integroimiseen. Työssä esitellään avoimen saatavuuden keskeisiä teemoja sekä sen käytännön toteutusta varten kehitettyjä teknisiä ratkaisuja. Näiden pohjalta toteutetaan Meilahden kampuksen avoin julkaisuarkisto. Työssä pohditaan myös julkaisuarkistojen soveltuvuutta oppimisprosessin tukemiseen tutkivan- ja sulautuvan oppimisen viitekehyksessä. ACM Computing Classification System (CCS): H.3 [INFORMATION STORAGE AND RETRIEVAL], H.3.7 [Digital Libraries], H.3.3 [Information Search and Retrieval], H.3.5 [Online Information Services], K.3 [COMPUTERS AND EDUCATION], K.3.1 [Computer Uses in Education]
Resumo:
Online content services can greatly benefit from personalisation features that enable delivery of content that is suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purpose of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes user's actions towards content items to learn user's preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic and there is no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques with the objective of providing a source of information for the service provider. Some theoretical background for chosen techniques is presented while the main focus is in finding practical solutions to some of the current information needs, which are not optimally served with traditional techniques.
Resumo:
Climate change contributes directly or indirectly to changes in species distributions, and there is very high confidence that recent climate warming is already affecting ecosystems. The Arctic has already experienced the greatest regional warming in recent decades, and the trend is continuing. However, studies on the northern ecosystems are scarce compared to more southerly regions. Better understanding of the past and present environmental change is needed to be able to forecast the future. Multivariate methods were used to explore the distributional patterns of chironomids in 50 shallow (≤ 10m) lakes in relation to 24 variables determined in northern Fennoscandia at the ecotonal area from the boreal forest in the south to the orohemiarctic zone in the north. Highest taxon richness was noted at middle elevations around 400 m a.s.l. Significantly lower values were observed from cold lakes situated in the tundra zone. Lake water alkalinity had the strongest positive correlation with the taxon richness. Many taxa had preference for lakes either on tundra area or forested area. The variation in the chironomid abundance data was best correlated with sediment organic content (LOI), lake water total organic carbon content, pH and air temperature, with LOI being the strongest variable. Three major lake groups were separated on the basis of their chironomid assemblages: (i) small and shallow organic-rich lakes, (ii) large and base-rich lakes, and (iii) cold and clear oligotrophic tundra lakes. Environmental variables best discriminating the lake groups were LOI, taxon richness, and Mg. When repeated, this kind of an approach could be useful and efficient in monitoring the effects of global change on species ranges. Many species of fast spreading insects, including chironomids, show a remarkable ability to track environmental changes. Based on this ability, past environmental conditions have been reconstructed using their chitinous remains in the lake sediment profiles. In order to study the Holocene environmental history of subarctic aquatic systems, and quantitatively reconstruct the past temperatures at or near the treeline, long sediment cores covering the last 10000 years (the Holocene) were collected from three lakes. Lower temperature values than expected based on the presence of pine in the catchment during the mid-Holocene were reconstructed from a lake with great water volume and depth. The lake provided thermal refuge for profundal, cold adapted taxa during the warm period. In a shallow lake, the decrease in the reconstructed temperatures during the late Holocene may reflect the indirect response of the midges to climate change through, e.g., pH change. The results from three lakes indicated that the response of chironomids to climate have been more or less indirect. However, concurrent shifts in assemblages of chironomids and vegetation in two lakes during the Holocene time period indicated that the midges together with the terrestrial vegetation had responded to the same ultimate cause, which most likely was the Holocene climate change. This was also supported by the similarity in the long-term trends in faunal succession for the chironomid assemblages in several lakes in the area. In northern Finnish Lapland the distribution of chironomids were significantly correlated with physical and limnological factors that are most likely to change as a result of future climate change. The indirect and individualistic response of aquatic systems, as reconstructed using the chironomid assemblages, to the climate change in the past suggests that in the future, the lake ecosystems in the north do not respond in one predictable way to the global climate change. Lakes in the north may respond to global climate change in various ways that are dependent on the initial characters of the catchment area and the lake.
Resumo:
Marja Heinonen s dissertation Verkkomedian käyttö ja tutkiminen. Iltalehti Online 1995-2001 describes the usage of new internet based news service Iltalehti Online during its first years of existence, 1995-2001. The study focuses on the content of the service and users attitudes towards the new media and its contents. Heinonen has also analyzed and described the research methods that can be used in the research of any new media phenomenon when there is no historical perspective to do the research. Heinonen has created a process model for the research of net medium, which is based on a multidimensional approach. She has chosen an iterative research method inspired by Sudweeks and Simoff s CEDA-methodology in which qualitative and quantitative methods take turns both creating results and new research questions. The dissertation discusses and describes the possibilities of combining several research methods in the study of online news media. On general level it discusses the methodological possibilities of researching a completely new media form when there is no historical perspective. The result of these discussions is in favour for the multidimensional methods. The empiric research was built around three cases of Iltalehti Online among its users: log analysis 1996-1999, interviews 1999 and clustering 2000-2001. Even though the results of different cases were somewhat conflicting here are the central results from the analysis of Iltalehti Online 1995-2001: - Reading was strongly determined by the gender. - The structure of Iltalehti Online guided the reading strongly. - People did not make a clear distinction in content between news and entertainment. - Users created new habits in their everyday life during the first years of using Iltalehti Online. These habits were categorized as follows: - break between everyday routines - established habit - new practice within the rhythm of the day - In the clustering of the users sports, culture and celebrities were the most distinguishing contents. Users did not move across these borders as much as within them. The dissertation gives contribution to the development of multidimensional research methods in the field of emerging phenomena in media field. It is also a unique description of a phase of development in media history through an unique research material. There is no such information (logs + demographics) available of any other Finnish online news media. Either from the first years or today.
Resumo:
Recent evidence from adult pronoun comprehension suggests that semantic factors such as verb transitivity affect referent salience and thereby anap- hora resolution. We tested whether the same semantic factors influence pronoun comprehension in young children. In a visual world study, 3-year- olds heard stories that began with a sentence containing either a high or a low transitivity verb. Looking behaviour to pictures depicting the subject and object of this sentence was recorded as children listened to a subsequent sentence containing a pronoun. Children showed a stronger preference to look to the subject as opposed to the object antecedent in the low transitivity condition. In addition there were general preferences (1) to look to the subject in both conditions and (2) to look more at both potential antecedents in the high transitivity condition. This suggests that children, like adults, are affected by semantic factors, specifically semantic prominence, when interpreting anaphoric pronouns.
Resumo:
The paper explores the effect of customer satisfaction with online supporting services on loyalty to providers of an offline core service. Supporting services are provided to customers before, during, or after the purchase of a tangible or intangible core product, and have the purpose of enhancing or facilitating the use of this product. The internet has the potential to dominate all other marketing channels when it comes to the interactive and personalised communication that is considered quintessential for supporting services. Our study shows that the quality of online supporting services powerfully affects satisfaction with the provider and customer loyalty through its effect on online value and enjoyment. Managerial implications are provided.