968 resultados para Semantic relations
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
The extraction of relevant terms from texts is an extensively researched task in Text- Mining. Relevant terms have been applied in areas such as Information Retrieval or document clustering and classification. However, relevance has a rather fuzzy nature since the classification of some terms as relevant or not relevant is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantic and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract, from texts, what I have called generic concepts (all concepts) and postpone the decision about relevance for downstream applications, accordingly to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts on the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology which is explained in Part I of this thesis. In Part II, I will show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, specially for multi-words. In addition, since concepts can be semantically related to other concepts, this allows us to build implicit document descriptors. These applications led to published work. Finally, I will present some work that, although not published yet, is briefly discussed in this document.
Resumo:
Social tagging evolved in response to a need to tag heterogeneous objects, the automated tagging of which is usually not feasible by current technological means. Social tagging can be used for more flexible competence management within organizations. The profiles of employees can be built in the form of groups of tags, as employees tag each other, based on their familiarity of each other’s expertise. This can serve as a replacement for the more traditional competence management approaches, which usually become outdated due to social and organizational hurdles, and obsolete data. These limitations can be overcome by people tagging, as the information revealed by such tags is usually based on most recent employee interaction and knowledge. Task management as part of personal information management aims at the support of users’ individual task handling. This can include collaborating with other individuals, sharing one’s knowledge, both functional and process-related, and distributing documents and web resources. In this context, Task patterns can be used as templates that collect information and experience around tasks associated to it during run time, facilitating agility. The effective collaboration among contributors necessitates the means to find the appropriate individuals to work with on the task, and this can be made possible by using social tagging to describe individual competencies. The goal of this study is to support finding and tagging people within task management, through the effective exploitation of the work/task context. This involves the utilization of knowledge of the workers’ expertise, nature of the task/task pattern and information available from the documents and web resources attached to the task. Vice versa, task management provides an excellent environment for social tagging due to the task context that already provides suitable tags. The study also aims at assisting users of the task management solution with the collaborative construction of light-weight ontology by inferring semantic relations between tags. The thesis project aims at an implementation of people finding & tagging within the java application for task management that consumes web services, which provide the required ontology for the organization.
Resumo:
Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. We show that this event extraction system has good performance, reaching the first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing how the developed approach not only shows good performance, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that leads to development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
Resumo:
Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal
Resumo:
La diversification des résultats de recherche (DRR) vise à sélectionner divers documents à partir des résultats de recherche afin de couvrir autant d’intentions que possible. Dans les approches existantes, on suppose que les résultats initiaux sont suffisamment diversifiés et couvrent bien les aspects de la requête. Or, on observe souvent que les résultats initiaux n’arrivent pas à couvrir certains aspects. Dans cette thèse, nous proposons une nouvelle approche de DRR qui consiste à diversifier l’expansion de requête (DER) afin d’avoir une meilleure couverture des aspects. Les termes d’expansion sont sélectionnés à partir d’une ou de plusieurs ressource(s) suivant le principe de pertinence marginale maximale. Dans notre première contribution, nous proposons une méthode pour DER au niveau des termes où la similarité entre les termes est mesurée superficiellement à l’aide des ressources. Quand plusieurs ressources sont utilisées pour DER, elles ont été uniformément combinées dans la littérature, ce qui permet d’ignorer la contribution individuelle de chaque ressource par rapport à la requête. Dans la seconde contribution de cette thèse, nous proposons une nouvelle méthode de pondération de ressources selon la requête. Notre méthode utilise un ensemble de caractéristiques qui sont intégrées à un modèle de régression linéaire, et génère à partir de chaque ressource un nombre de termes d’expansion proportionnellement au poids de cette ressource. Les méthodes proposées pour DER se concentrent sur l’élimination de la redondance entre les termes d’expansion sans se soucier si les termes sélectionnés couvrent effectivement les différents aspects de la requête. Pour pallier à cet inconvénient, nous introduisons dans la troisième contribution de cette thèse une nouvelle méthode pour DER au niveau des aspects. Notre méthode est entraînée de façon supervisée selon le principe que les termes reliés doivent correspondre au même aspect. Cette méthode permet de sélectionner des termes d’expansion à un niveau sémantique latent afin de couvrir autant que possible différents aspects de la requête. De plus, cette méthode autorise l’intégration de plusieurs ressources afin de suggérer des termes d’expansion, et supporte l’intégration de plusieurs contraintes telles que la contrainte de dispersion. Nous évaluons nos méthodes à l’aide des données de ClueWeb09B et de trois collections de requêtes de TRECWeb track et montrons l’utilité de nos approches par rapport aux méthodes existantes.
Resumo:
The ongoing growth of the World Wide Web, catalyzed by the increasing possibility of ubiquitous access via a variety of devices, continues to strengthen its role as our prevalent information and commmunication medium. However, although tools like search engines facilitate retrieval, the task of finally making sense of Web content is still often left to human interpretation. The vision of supporting both humans and machines in such knowledge-based activities led to the development of different systems which allow to structure Web resources by metadata annotations. Interestingly, two major approaches which gained a considerable amount of attention are addressing the problem from nearly opposite directions: On the one hand, the idea of the Semantic Web suggests to formalize the knowledge within a particular domain by means of the "top-down" approach of defining ontologies. On the other hand, Social Annotation Systems as part of the so-called Web 2.0 movement implement a "bottom-up" style of categorization using arbitrary keywords. Experience as well as research in the characteristics of both systems has shown that their strengths and weaknesses seem to be inverse: While Social Annotation suffers from problems like, e. g., ambiguity or lack or precision, ontologies were especially designed to eliminate those. On the contrary, the latter suffer from a knowledge acquisition bottleneck, which is successfully overcome by the large user populations of Social Annotation Systems. Instead of being regarded as competing paradigms, the obvious potential synergies from a combination of both motivated approaches to "bridge the gap" between them. These were fostered by the evidence of emergent semantics, i. e., the self-organized evolution of implicit conceptual structures, within Social Annotation data. While several techniques to exploit the emergent patterns were proposed, a systematic analysis - especially regarding paradigms from the field of ontology learning - is still largely missing. This also includes a deeper understanding of the circumstances which affect the evolution processes. This work aims to address this gap by providing an in-depth study of methods and influencing factors to capture emergent semantics from Social Annotation Systems. We focus hereby on the acquisition of lexical semantics from the underlying networks of keywords, users and resources. Structured along different ontology learning tasks, we use a methodology of semantic grounding to characterize and evaluate the semantic relations captured by different methods. In all cases, our studies are based on datasets from several Social Annotation Systems. Specifically, we first analyze semantic relatedness among keywords, and identify measures which detect different notions of relatedness. These constitute the input of concept learning algorithms, which focus then on the discovery of synonymous and ambiguous keywords. Hereby, we assess the usefulness of various clustering techniques. As a prerequisite to induce hierarchical relationships, our next step is to study measures which quantify the level of generality of a particular keyword. We find that comparatively simple measures can approximate the generality information encoded in reference taxonomies. These insights are used to inform the final task, namely the creation of concept hierarchies. For this purpose, generality-based algorithms exhibit advantages compared to clustering approaches. In order to complement the identification of suitable methods to capture semantic structures, we analyze as a next step several factors which influence their emergence. Empirical evidence is provided that the amount of available data plays a crucial role for determining keyword meanings. From a different perspective, we examine pragmatic aspects by considering different annotation patterns among users. Based on a broad distinction between "categorizers" and "describers", we find that the latter produce more accurate results. This suggests a causal link between pragmatic and semantic aspects of keyword annotation. As a special kind of usage pattern, we then have a look at system abuse and spam. While observing a mixed picture, we suggest that an individual decision should be taken instead of disregarding spammers as a matter of principle. Finally, we discuss a set of applications which operationalize the results of our studies for enhancing both Social Annotation and semantic systems. These comprise on the one hand tools which foster the emergence of semantics, and on the one hand applications which exploit the socially induced relations to improve, e. g., searching, browsing, or user profiling facilities. In summary, the contributions of this work highlight viable methods and crucial aspects for designing enhanced knowledge-based services of a Social Semantic Web.
Resumo:
Perception is linked to action via two routes: a direct route based on affordance information in the environment and an indirect route based on semantic knowledge about objects. The present study explored the factors modulating the recruitment of the two routes, in particular which factors affecting the selection of paired objects. In Experiment 1, we presented real objects among semantically related or unrelated distracters. Participants had to select two objects that can interact. The presence of distracters affected selection times, but not the semantic relations of the objects with the distracters. Furthermore, participants first selected the active object (e.g. teaspoon) with their right hand, followed by the passive object (e.g. mug), often with their left hand. In Experiment 2, we presented pictures of the same objects with no hand grip, congruent or incongruent hand grip. Participants had to decide whether the two objects can interact. Action decisions were faster when the presentation of the active object preceded the presentation of the passive object, and when the grip was congruent. Interestingly, participants were slower when the objects were semantically but not functionally related; this effect increased with congruently gripped objects. Our data showed that action decisions in the presence of strong affordance cues (real objects, pictures of congruently gripped objects) relied on sensory-motor representation, supporting the direct route from perception-to-action that bypasses semantic knowledge. However, in the case of weak affordance cues (pictures), semantic information interfered with action decisions, indicating that semantic knowledge impacts action decisions. The data support the dual-route account from perception-to-action.
Resumo:
Observational data encodes values of properties associated with a feature of interest, estimated by a specified procedure. For water the properties are physical parameters like level, volume, flow and pressure, and concentrations and counts of chemicals, substances and organisms. Water property vocabularies have been assembled at project, agency and jurisdictional level. Organizations such as EPA, USGS, CEH, GA and BoM maintain vocabularies for internal use, and may make them available externally as text files. BODC and MMI have harvested many water vocabularies alongside others of interest in their domain, formalized the content using SKOS, and published them through web interfaces. Scope is highly variable both within and between vocabularies. Individual items may conflate multiple concerns (e.g. property, instrument, statistical procedure, units). There is significant duplication between vocabularies. Semantic web technologies provide the opportunity both to publish vocabularies more effectively, and achieve harmonization to support greater interoperability between datasets. - Models for vocabulary items (property, substance/taxon, process, unit-of-measure, etc) may be formalized OWL ontologies, supporting semantic relations between items in related vocabularies; - By specializing the ontology elements from SKOS concepts and properties, diverse vocabularies may be published through a common interface; - Properties from standard vocabularies (e.g. OWL, SKOS, PROV-O and VAEM) support mappings between vocabularies having a similar scope - Existing items from various sources may be assembled into new virtual vocabularies However, there are a number of challenges: - use of standard properties such as sameAs/exactMatch/equivalentClass require reasoning support; - items have been conceptualised as both classes and individuals, complicating the mapping mechanics; - re-use of items across vocabularies may conflict with expectations concerning URI patterns; - versioning complicates cross-references and re-use. This presentation will discuss ways to harness semantic web technologies to publish harmonized vocabularies, and will summarise how many of the challenges may be addressed.
Resumo:
This paper presents a proposal for the semantic treatment of ambiguous homographic forms in Brazilian Portuguese, and to offer linguistic strategies for its computational implementation in Systems of Natural Language Processing (SNLP). Pustejovsky's Generative Lexicon was used as a theoretical model. From this model, the Qualia Structure - QS (and the Formal, Telic, Agentive and Constitutive roles) was selected as one of the linguistic and semantic expedients for the achievement of disambiguation of homonym forms. So that analyzed and treated data could be manipulated, we elaborated a Lexical Knowledge Base (LKB) where lexical items are correlated and interconnected by different kinds of semantic relations in the QS and ontological information.
Resumo:
This paper carries out a descriptive study on Portuguese adjectives. Our aim is to describe the semantics of the legal domain adjectives in order to construct an ontology which may improve Information Retrieval Systems. For this, we present an approach based on valency and semantic relations. The ontology proposed here is a first step aiming to build a legal ontology based on top-level concepts. © AEPIA.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)