16 results for Semantic classes
in Helda - Digital Repository of University of Helsinki
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research the focus of which revolves around news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new, previously unreported, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of the traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. Upon comparing documents for event-based similarity, we look not only at matching terms, but also how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. 
The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experiment results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
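The class-wise comparison described in this abstract can be illustrated with a minimal sketch. It assumes each document is a mapping from a semantic class name to a bag of terms, uses cosine similarity as the default per-class measure, and normalizes by the sum of the weights. The class names, weights, and the function names `cosine` and `classwise_similarity` are hypothetical, chosen for illustration only, and are not taken from the thesis.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classwise_similarity(doc_a, doc_b, weights, measures=None):
    """Weighted combination of per-class similarities.

    `measures` may map a class name to its own similarity function
    (for example, a taxonomy-aware measure for locations); any class
    without an entry falls back to cosine similarity.
    """
    measures = measures or {}
    total = 0.0
    weight_sum = 0.0
    for cls, weight in weights.items():
        sim = measures.get(cls, cosine)
        total += weight * sim(doc_a.get(cls, Counter()), doc_b.get(cls, Counter()))
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical documents, split into semantic classes.
doc_1 = {
    "terms": Counter({"earthquake": 2, "rescue": 1}),
    "locations": Counter({"Helsinki": 1}),
}
doc_2 = {
    "terms": Counter({"election": 3}),
    "locations": Counter({"Oslo": 1}),
}
weights = {"terms": 0.5, "locations": 0.3, "people": 0.1, "organizations": 0.1}

score_same = classwise_similarity(doc_1, doc_1, weights)
score_diff = classwise_similarity(doc_1, doc_2, weights)
```

Because each class has its own entry in `measures`, a geographical taxonomy could be plugged in for locations without touching the other classes; the cosine fallback here is only a stand-in.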
Abstract:
A straightforward computation of the list of the words (the 'tail words' of the list) that are distributionally most similar to a given word (the 'head word' of the list) leads to the question: how semantically similar to the head word are the tail words; that is, how similar are their meanings to its meaning? And can we do better? The experiment was done on the nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered to be distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The extent of the similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive; some tail words seem to be semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, their overlap for the semantic classes is large, but the trained classification-tree models have some success in using combinations to predict the semantic class. The training data consists of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. This analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.
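The abstract does not spell out the formula, so the following is only a sketch of the information radius as it is commonly defined for two probability mass functions p and q: D(p || m) + D(q || m), where m is their average and D is the Kullback-Leibler divergence (this is twice the Jensen-Shannon divergence). The function name `information_radius` and the example feature names are hypothetical; each distribution is assumed to sum to 1 over dependency-context features.

```python
from math import log2


def information_radius(p, q):
    """Information radius D(p || m) + D(q || m), with m = (p + q) / 2.

    `p` and `q` map features to probabilities. The result is 0.0 for
    identical distributions and 2.0 (in bits) for disjoint ones.
    """
    total = 0.0
    for feature in set(p) | set(q):
        pk = p.get(feature, 0.0)
        qk = q.get(feature, 0.0)
        mk = (pk + qk) / 2.0
        # Terms with zero probability contribute nothing to the divergence.
        if pk > 0:
            total += pk * log2(pk / mk)
        if qk > 0:
            total += qk * log2(qk / mk)
    return total


# Hypothetical context distributions for two nouns.
head = {"obj_of:read": 0.5, "mod_by:old": 0.5}
tail = {"obj_of:read": 0.5, "mod_by:new": 0.5}

irad_same = information_radius(head, head)
irad_partial = information_radius(head, tail)
```

Unlike the plain Kullback-Leibler divergence, this measure is symmetric and stays finite when one word has contexts the other lacks, which is why it suits sparse distributional data.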
Abstract:
Relative Constructions with Pronominal Heads in Contemporary Russian. Chapter 1 introduces the distinctive syntactic and semantic properties of Russian relative constructions (RCs), which are then divided into two main classes according to the type of the head phrase. The study concentrates on RCs with pronominal heads, which are systematically compared with noun-headed RCs. Chapter 2 clarifies the categorization of pronouns in Russian. The conclusion is that Russian pronouns include only personal, reflexive and wh-pronouns. The remaining words that are traditionally seen as pronouns are actually functional equivalents of determiners. This idea leads to the suggestion that RCs with these determiner-like words as the only constituent of the head phrase are actually headed by zero pronouns. In the other type of RCs with pronominal heads, the head position is occupied by wh-pronouns with clitics expressing different types of indefiniteness and quantification. Comparison of the two types of pronoun-headed RCs shows that the wh-heads and zero-heads share a number of common properties with respect to grammatical gender, number and person as well as to the semantic distinction between animates and inanimates. The rest of Chapter 2 gives an overview of various uses of wh-pronouns in Russian and an experimental analysis of RCs headed by pronominal adverbs. Chapter 3 discusses fundamental differences between RCs with noun and pronominal heads. One of the main findings is that the choice of the relative pronoun (kto 'who' and chto 'what' versus kotoryj 'which') is motivated by a tendency to reproduce maximally the essential grammatical and semantic properties of the antecedent. Chapter 4 gives a detailed description of the determiner-like words and wh-based heads used in the two types of RCs with pronominal heads. In addition, several issues related to the syntax and semantics of free relatives are discussed. 
The conclusion is that there is no need to establish a separate category of free relatives in Russian. Chapter 5 discusses the syntax and semantics of correlative and free concessive constructions. They share a number of properties with pronoun-headed RCs and the two are often confused in Russian linguistics. However, a detailed analysis shows that these constructions must be distinguished from RCs. The study combines the methods of functionally-oriented Russian structuralism with some insights from generative syntax.
Abstract:
Expressing generalized-personal meaning in Russian. Based on data from Russian, this doctoral dissertation examines generalized-personal meaning, that is, generic expressions referring to all human beings, people in general, each or any person (e.g. S vozrastom načinaeš' cenit' prostye vešči 'With age you start to appreciate simple things'). The study shares its basic theoretical orientation with functional approaches going from meaning to form. The objective of the thesis is to determine and describe the various linguistic means which can be used by the speaker to express generalized-personal meaning. The main material of the study consists of 2,000 examples collected from modern Russian literature, newspapers, and magazines. The linguistic means of expressing generalized-personal meaning are divided into three main classes. Morphological and lexico-grammatical means (22% of the material) include the use of personal pronouns and personal verbal endings. In Russian, all personal forms except the 3rd person singular can be used in a generalized-personal meaning. Lexical means (14% of the material) involve, above all, pronouns like vse 'all', každyj 'everyone', nikto 'no one', as well as the nouns čelovek 'man' and ljudi 'people'. In emotional speech, generalized-personal meaning can also be conveyed lexically by using utterances like daže idiot znaet 'even an idiot knows'. In rhetorical questions the pronoun kto 'who' can appear in this meaning (cf. Kto ne ljubit moroženoe?! 'Who doesn't like ice cream?!'). The third main class, syntactic means (64% of the material), consists of constructions in which the generic person is not expressed at the surface level. This class mainly includes two-component structures in which the infinitive relates to a modal predicative adverb (e.g. možno 'can, be allowed to', nado 'must'), modal verb (e.g. stoit 'be worth(while)', sleduet 'must, be obliged to'), or predicative adverb ending in -o (e.g. trudno 'it is hard to', neprilično 'is not appropriate'). 
Other syntactic means are: one-component infinitive structures, so-called embedded structures, structures with a processual noun, passive constructions, and gerund constructions. The different forms of expression available in Russian are not interchangeable in all contexts. Even if a given context tolerates the substitution of one construction for another, the two expressions are never entirely synonymous. In addition to determining the range of forms which can express generalized-personal meaning, the study aims to compare these forms and to specify the conditions and possible restrictions (contextual, semantic, syntactic, stylistic, etc.) associated with the use of each construction. In Russian linguistics, the generalized-personal meaning has not been extensively studied from a functional perspective. The advantage of a meaning-based functional approach is that it gives a comprehensive picture of the diversity and distribution of the phenomenon.
Abstract:
This dissertation is a synchronic description of the phonology and grammar of two dialects of the Rajbanshi language (Eastern Indo-Aryan) as spoken in Jhapa, Nepal. I have primarily confined the analysis to the oral expression, since the emerging literary form is still in its infancy. The grammatical analysis is therefore based, for the most part, on a corpus of oral narrative text which was recorded and transcribed from three informants from north-east Jhapa. An informant, speaking a dialect from south-west Jhapa cross checked this text corpus and provided additional elicited material. I have described the phonology, morphology and syntax of the language, and also one aspect of its discourse structure. For the most part the phonology follows the basic Indo-Aryan pattern. Derivational morphology, compounding, reduplication, echo formation and onomatopoeic constructions are considered, as well as number, noun classes (their assignment and grammatical function), pronouns, and case and postpositions. In verbal morphology I cover causative stems, the copula, primary and secondary agreement, tense, aspect, mood, auxiliary constructions and non-finite forms. The term secondary agreement here refers to genitive agreement, dative-subject agreement and patient (and sometimes patient-agent) agreement. The breaking of default agreement rules has a range of pragmatic inferences. I argue that a distinction, based on formal, semantic and statistical grounds, should be made between conjunct verbs, derivational compound verbs and quasi-aspectual compound verbs. Rajbanshi has an open set of adjectives, and it additionally makes use of a restricted set of nouns which can function as adjectives. Various particles, and the emphatic and conjunctive clitics are also considered. The syntactic structures studied include: non-declarative speech acts, phrase-internal and clause-internal constituent order, negation, subordination, coordination and valence adjustment. 
I explain how the future, present and past tenses in Rajbanshi oral narratives do not seem to maintain a time reference, but rather indicate a distinction between background and foreground information. I call this 'tense neutralisation'.
Abstract:
Based on a one-year ethnographic study of a primary school in Finland with specialised classes in Finnish and English (referred to as bilingual classes by research participants), this research traces patterns of how nationed, raced, classed and gendered differences are produced and gain meaning in school. I examine several aspects of these differences: the ways the teachers and parents make sense of school and of school choice; the repertoires of self put forward by teachers, parents and pupils of the bilingual classes; and the institutional and classroom practices in Sunny Lane School (pseudonym). My purpose is to examine how the construction of differentness is related to the policy of school choice. I approach this question from a knowledge problematic, and explore connections and disjunctions between the interpretations of teachers and those of parents, as well as between what teachers and parents expressed or said and the practices they engaged in. My data consists of fieldnotes generated through a one-year period of ethnographic study in Sunny Lane School, and of ethnographic interviews with teachers and parents primarily of the bilingual classes. This data focuses on the initial stages of the bilingual classes, which included the application and testing processes for these classes, and on Grades 1-3. In my analysis, I pursue poststructural feminist theorisations on questions of knowledge, power and subjectivity, which foreground an understanding of the constitutive force of discourse and the performative, partial, and relational nature of knowledge. I begin by situating my ethnographic field in relation to wider developments, namely, the emergence of school choice and the rhetoric of curricular reform and language education in Finland. 
I move on from there to ask how teachers discuss the introduction of these specialised classes, then trace pupils' paths to these classes, their parents' goals related to school choice, teachers' constructions of the pupils and parents of bilingual classes, and how these shape the ways in which school and classroom practices unfold. School choice, I argue, functioned as a spatial practice, defining who belongs in school and demarcating the position of teachers, parents and pupils in school. Notions of classed and ethnicised differences entered the ways teachers and parents made sense of school choice. Teachers idealised school in terms of social cohesiveness and constructed social cohesion as a task for school to perform. The hopes parents iterated were connected to ensuring their children's futurity, to their perceptions of the advantages of fluency in English, but also to the differences they believed to exist between the social milieus of different schools. Ideals such as open-mindedness and cosmopolitanism were also articulated by parents, and these ideals assumed different content for ethnic majority and minority parents. Teachers discussed the introduction of bilingual classes as being a means to ensure the school's future, and emphasised bilingual classes as fitting into the rubric of Finnish comprehensive schooling which, they maintained, is committed to equality. Parents were expected to accommodate their views and adopt the position of the responsible, supportive parent that was suggested to them by teachers. Teachers assumed a posture of appreciating different cultures, while maintaining Finnishness as common ground in school. Discussion on pupils' knowledge and experience of other countries took place often in bilingual classes, and various cultural theme events were organized on occasion. In school, pupils are taught to identify themselves in terms of cultural belonging. 
The rhetoric promoted by teachers was one of inclusiveness, which was also applied to describe the task of qualifying pupils for bilingual classes, qualifying which pupils can belong. Bilingual classes were idealised as taking a neutral, impartial posture toward difference by ethnic majority teachers and parents, and the relationship of school choice to classed advantage, for example, was something teachers, as well as parents, preferred not to discuss. Pupils were addressed by teachers during lessons in ways that assumed self responsibility and diligence, and they assumed the discursive category of being good, competent pupils made available to them. While this allowed them to position themselves favourably in school, their participation in a bilingual class was marked by the pressure to succeed well in school.
Abstract:
Alzheimer's disease (AD) is characterized by an impairment of the semantic memory responsible for processing meaning-related knowledge. This study was aimed at examining how Finnish-speaking healthy elderly subjects (n = 30) and mildly (n = 20) and moderately (n = 20) demented AD patients utilize semantic knowledge to perform a semantic fluency task, a method of studying semantic memory. In this task subjects are typically given 60 seconds to generate words belonging to the semantic category of animals. Successful task performance requires fast retrieval of subcategory exemplars in clusters (e.g., farm animals: 'cow', 'horse', 'sheep') and switching between subcategories (e.g., pets, water animals, birds, rodents). In this study, the scope of the task was extended to cover various noun and verb categories. The results indicated that, compared with normal controls, both mildly and moderately demented AD patients showed reduced word production, limited clustering and switching, narrowed semantic space, and an increase in errors, particularly perseverations. However, the size of the clusters, the proportion of clustered words, and the frequency and prototypicality of words remained relatively similar across the subject groups. Although the moderately demented patients showed a poorer overall performance than the mildly demented patients in the individual categories, the error analysis appeared unaffected by the severity of AD. The results indicate a semantically rather coherent performance but less specific, effective, and flexible functioning of the semantic memory in mild and moderate AD patients. The findings are discussed in relation to recent theories of word production and semantic representation. Keywords: semantic fluency, clustering, switching, semantic category, nouns, verbs, Alzheimer's disease
Abstract:
It has been suggested that semantic information processing is modularized according to the input form (e.g., visual, verbal, non-verbal sound). A great deal of research has concentrated on detecting a separate verbal module. Also, it has traditionally been assumed in linguistics that the meaning of a single clause is computed before integration into a wider context. Recent research has called these views into question. The present study explored whether it is reasonable to assume separate verbal and nonverbal semantic systems in the light of the evidence from event-related potentials (ERPs). The study also provided information on whether the context influences processing of a single clause before the local meaning is computed. The focus was on an ERP called N400. Its amplitude is assumed to reflect the effort required to integrate an item into the preceding context. For instance, if a word is anomalous in its context, it will elicit a larger N400. N400 has been observed in experiments using both verbal and nonverbal stimuli. Contents of a single sentence were not hypothesized to influence the N400 amplitude. Only the combined contents of the sentence and the picture were hypothesized to influence the N400. The subjects (n = 17) viewed pictures on a computer screen while hearing sentences through headphones. Their task was to judge the congruency of the picture and the sentence. There were four conditions: 1) the picture and the sentence were congruent and sensible, 2) the sentence and the picture were congruent, but the sentence ended anomalously, 3) the picture and the sentence were incongruent but sensible, 4) the picture and the sentence were incongruent and anomalous. Stimuli from the four conditions were presented in a semi-randomized sequence. The subjects' electroencephalography was recorded simultaneously. ERPs were computed for the four conditions. The amplitude of the N400 effect was largest in the incongruent sentence-picture pairs. 
The anomalously ending sentences did not elicit a larger N400 than the sensible sentences. The results suggest that there is no separate verbal semantic system, and that the meaning of a single clause is not processed independent of the context.
Abstract:
DEVELOPING A TEXTILE ONTOLOGY FOR THE SEMANTIC WEB AND CONNECTING IT TO MUSEUM CATALOGING DATA. The goal of the Semantic Web is to share concept-based information in a versatile way on the Internet. This is achievable using formal data structures called ontologies. The goal of this research is to increase the usability of museum cataloging data in information retrieval. The work is interdisciplinary, involving craft science, terminology science, computer science, and museology. In the first part of the dissertation an ontology of concepts of textiles, garments, and accessories is developed for museum cataloging work. The ontology work was done with the help of thesauri, vocabularies, research reports, and standards. The basis of the ontology development was the Museoalan asiasanasto MASA, a thesaurus for museum cataloging work which has been enriched by other vocabularies. Concepts and terms concerning the research object, as well as the material names of textiles, costumes, and accessories, were focused on. The research method was terminological concept analysis complemented by an ontological view of the Semantic Web. The concept structure was based on the hierarchical generic relation. Attention was also paid to other relations between terms and concepts, and between concepts themselves. Altogether 977 concept classes were created. Issues including how to choose and name concepts for the ontology hierarchy and how deep and broad the hierarchy could be are discussed from the viewpoint of the ontology developer and museum cataloger. The second part of the dissertation analyzes why some of the cataloged terms did not match the developed textile ontology. This problem is significant because it prevents automatic ontological content integration of the cataloged data on the Semantic Web. The research datasets, i.e. the cataloged museum data on textile collections, came from three museums: Espoo City Museum, Lahti City Museum and The National Museum of Finland. 
The data included 1803 textile, costume, and accessory objects. Unmatched object and textile material names were analyzed. In the case of the object names six categories (475 cases), and of the material names eight categories (423 cases), were found where automatic annotation was not possible. The most common explanation was that the cataloged field was filled with a long sentence comprised of many terms. Sometimes in the compound term, the object name and material, or the name and the way of usage, were combined. As well, numeric values in the material name cataloging field prevented annotation and so did the absence of a corresponding concept in the ontology. Ready-made drop-down lists of materials used in one cataloging system facilitated the annotation. In the case of naming objects and materials, one should use terms in basic form without attributes. The developed textile ontology has been applied in two cultural portals, MuseumFinland and Culturesampo, where one can search for and browse information based on cataloged data using integrated ontologies in an interoperable way. The textile ontology is also part of the national FinnONTO ontology infrastructure. Keywords: annotation, concept, concept analysis, cataloging, museum collection, ontology, Semantic Web, textile collection, textile material
Abstract:
The research in model theory has extended from the study of elementary classes to non-elementary classes, i.e. to classes which are not completely axiomatizable in elementary logic. The main theme has been the attempt to generalize tools from elementary stability theory to cover more applications arising in other branches of mathematics. In this doctoral thesis we introduce finitary abstract elementary classes, a non-elementary framework of model theory. These classes are a special case of abstract elementary classes (AECs), introduced by Saharon Shelah in the 1980s. We have collected a set of properties for classes of structures, which enable us to develop a 'geometric' approach to stability theory, including an independence calculus, in a very general framework. The thesis studies AECs with amalgamation, joint embedding, arbitrarily large models, countable Löwenheim-Skolem number and finite character. The novel idea is the property of finite character, which enables the use of a notion of a weak type instead of the usual Galois type. Notions of simplicity, superstability, Lascar strong type, primary model and U-rank are introduced for finitary classes. A categoricity transfer result is proved for simple, tame finitary classes: categoricity in any uncountable cardinal transfers upwards and to all cardinals above the Hanf number. Unlike the previous categoricity transfer results of equal generality, the theorem does not assume that the categoricity cardinal is a successor. The thesis consists of three independent papers. All three papers are joint work with Tapani Hyttinen.
Abstract:
Recent evidence from adult pronoun comprehension suggests that semantic factors such as verb transitivity affect referent salience and thereby anaphora resolution. We tested whether the same semantic factors influence pronoun comprehension in young children. In a visual world study, 3-year-olds heard stories that began with a sentence containing either a high or a low transitivity verb. Looking behaviour to pictures depicting the subject and object of this sentence was recorded as children listened to a subsequent sentence containing a pronoun. Children showed a stronger preference to look to the subject as opposed to the object antecedent in the low transitivity condition. In addition there were general preferences (1) to look to the subject in both conditions and (2) to look more at both potential antecedents in the high transitivity condition. This suggests that children, like adults, are affected by semantic factors, specifically semantic prominence, when interpreting anaphoric pronouns.