872 resultados para Semantic extraction


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The number of connected devices collecting and distributing real-world information through various systems, is expected to soar in the coming years. As the number of such connected devices grows, it becomes increasingly difficult to store and share all these new sources of information. Several context representation schemes try to standardize this information, but none of them have been widely adopted. In previous work we addressed this challenge, however our solution had some drawbacks: poor semantic extraction and scalability. In this paper we discuss ways to efficiently deal with representation schemes' diversity and propose a novel d-dimension organization model. Our evaluation shows that d-dimension model improves scalability and semantic extraction.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Based in internet growth, through semantic web, together with communication speed improvement and fast development of storage device sizes, data and information volume rises considerably every day. Because of this, in the last few years there has been a growing interest in structures for formal representation with suitable characteristics, such as the possibility to organize data and information, as well as the reuse of its contents aimed for the generation of new knowledge. Controlled Vocabulary, specifically Ontologies, present themselves in the lead as one of such structures of representation with high potential. Not only allow for data representation, as well as the reuse of such data for knowledge extraction, coupled with its subsequent storage through not so complex formalisms. However, for the purpose of assuring that ontology knowledge is always up to date, they need maintenance. Ontology Learning is an area which studies the details of update and maintenance of ontologies. It is worth noting that relevant literature already presents first results on automatic maintenance of ontologies, but still in a very early stage. Human-based processes are still the current way to update and maintain an ontology, which turns this into a cumbersome task. The generation of new knowledge aimed for ontology growth can be done based in Data Mining techniques, which is an area that studies techniques for data processing, pattern discovery and knowledge extraction in IT systems. This work aims at proposing a novel semi-automatic method for knowledge extraction from unstructured data sources, using Data Mining techniques, namely through pattern discovery, focused in improving the precision of concept and its semantic relations present in an ontology. In order to verify the applicability of the proposed method, a proof of concept was developed, presenting its results, which were applied in building and construction sector.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper we present a description of the role of definitional verbal patterns for the extraction of semantic relations. Several studies show that semantic relations can be extracted from analytic definitions contained in machine-readable dictionaries (MRDs). In addition, definitions found in specialised texts are a good starting point to search for different types of definitions where other semantic relations occur. The extraction of definitional knowledge from specialised corpora represents another interesting approach for the extraction of semantic relations. Here, we present a descriptive analysis of definitional verbal patterns in Spanish and the first steps towards the development of a system for the automatic extraction of definitional knowledge.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This work briefly analyses the difficulties to adopt the Semantic Web, and in particular proposes systems to know the present level of migration to the different technologies that make up the Semantic Web. It focuses on the presentation and description of two tools, DigiDocSpider and DigiDocMetaEdit, designed with the aim of verifYing, evaluating, and promoting its implementation.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Even though the digital processing of documents is increasingly widespread in industry, printed documents are still largely in use. In order to process electronically the contents of printed documents, information must be extracted from digital images of documents. When dealing with complex documents, in which the contents of different regions and fields can be highly heterogeneous with respect to layout, printing quality and the utilization of fonts and typing standards, the reconstruction of the contents of documents from digital images can be a difficult problem. In the present article we present an efficient solution for this problem, in which the semantic contents of fields in a complex document are extracted from a digital image.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Clinical text understanding (CTU) is of interest to health informatics because critical clinical information frequently represented as unconstrained text in electronic health records are extensively used by human experts to guide clinical practice, decision making, and to document delivery of care, but are largely unusable by information systems for queries and computations. Recent initiatives advocating for translational research call for generation of technologies that can integrate structured clinical data with unstructured data, provide a unified interface to all data, and contextualize clinical information for reuse in multidisciplinary and collaborative environment envisioned by CTSA program. This implies that technologies for the processing and interpretation of clinical text should be evaluated not only in terms of their validity and reliability in their intended environment, but also in light of their interoperability, and ability to support information integration and contextualization in a distributed and dynamic environment. This vision adds a new layer of information representation requirements that needs to be accounted for when conceptualizing implementation or acquisition of clinical text processing tools and technologies for multidisciplinary research. On the other hand, electronic health records frequently contain unconstrained clinical text with high variability in use of terms and documentation practices, and without commitmentto grammatical or syntactic structure of the language (e.g. Triage notes, physician and nurse notes, chief complaints, etc). This hinders performance of natural language processing technologies which typically rely heavily on the syntax of language and grammatical structure of the text. This document introduces our method to transform unconstrained clinical text found in electronic health information systems to a formal (computationally understandable) representation that is suitable for querying, integration, contextualization and reuse, and is resilient to the grammatical and syntactic irregularities of the clinical text. We present our design rationale, method, and results of evaluation in processing chief complaints and triage notes from 8 different emergency departments in Houston Texas. At the end, we will discuss significance of our contribution in enabling use of clinical text in a practical bio-surveillance setting.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Folksonomies emerge as the result of the free tagging activity of a large number of users over a variety of resources. They can be considered as valuable sources from which it is possible to obtain emerging vocabularies that can be leveraged in knowledge extraction tasks. However, when it comes to understanding the meaning of tags in folksonomies, several problems mainly related to the appearance of synonymous and ambiguous tags arise, specifically in the context of multilinguality. The authors aim to turn folksonomies into knowledge structures where tag meanings are identified, and relations between them are asserted. For such purpose, they use DBpedia as a general knowledge base from which they leverage its multilingual capabilities.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper we present an automatic system for the extraction of syntactic semantic patterns applied to the development of multilingual processing tools. In order to achieve optimum methods for the automatic treatment of more than one language, we propose the use of syntactic semantic patterns. These patterns are formed by a verbal head and the main arguments, and they are aligned among languages. In this paper we present an automatic system for the extraction and alignment of syntactic semantic patterns from two manually annotated corpora, and evaluate the main linguistic problems that we must deal with in the alignment process.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The extraction of relevant terms from texts is an extensively researched task in Text- Mining. Relevant terms have been applied in areas such as Information Retrieval or document clustering and classification. However, relevance has a rather fuzzy nature since the classification of some terms as relevant or not relevant is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantic and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract, from texts, what I have called generic concepts (all concepts) and postpone the decision about relevance for downstream applications, accordingly to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts on the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology which is explained in Part I of this thesis. In Part II, I will show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, specially for multi-words. In addition, since concepts can be semantically related to other concepts, this allows us to build implicit document descriptors. These applications led to published work. Finally, I will present some work that, although not published yet, is briefly discussed in this document.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Currently the world swiftly adapts to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes like entertainment, information, education or communication. The rapid growth of today’s video archives with sparsely available editorial data creates a big problem of its retrieval. The humans see a video like a complex interplay of cognitive concepts. As a result there is a need to build a bridge between numeric values and semantic concepts. This establishes a connection that will facilitate videos’ retrieval by humans. The critical aspect of this bridge is video annotation. The process could be done manually or automatically. Manual annotation is very tedious, subjective and expensive. Therefore automatic annotation is being actively studied. In this thesis we focus on the multimedia content automatic annotation. Namely the use of analysis techniques for information retrieval allowing to automatically extract metadata from video in a videomail system. Furthermore the identification of text, people, actions, spaces, objects, including animals and plants. Hence it will be possible to align multimedia content with the text presented in the email message and the creation of applications for semantic video database indexing and retrieving.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper analyzes and evaluates, in the context of Ontology learning, some techniques to identify and extract candidate terms to classes of a taxonomy. Besides, this work points out some inconsistencies that may be occurring in the preprocessing of text corpus, and proposes techniques to obtain good terms candidate to classes of a taxonomy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Acquiring lexical information is a complex problem, typically approached by relying on a number of contexts to contribute information for classification. One of the first issues to address in this domain is the determination of such contexts. The work presented here proposes the use of automatically obtained FORMAL role descriptors as features used to draw nouns from the same lexical semantic class together in an unsupervised clustering task. We have dealt with three lexical semantic classes (HUMAN, LOCATION and EVENT) in English. The results obtained show that it is possible to discriminate between elements from different lexical semantic classes using only FORMAL role information, hence validating our initial hypothesis. Also, iterating our method accurately accounts for fine-grained distinctions within lexical classes, namely distinctions involving ambiguous expressions. Moreover, a filtering and bootstrapping strategy employed in extracting FORMAL role descriptors proved to minimize effects of sparse data and noise in our task.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automatic creation of polarity lexicons is a crucial issue to be solved in order to reduce time andefforts in the first steps of Sentiment Analysis. In this paper we present a methodology based onlinguistic cues that allows us to automatically discover, extract and label subjective adjectivesthat should be collected in a domain-based polarity lexicon. For this purpose, we designed abootstrapping algorithm that, from a small set of seed polar adjectives, is capable to iterativelyidentify, extract and annotate positive and negative adjectives. Additionally, the methodautomatically creates lists of highly subjective elements that change their prior polarity evenwithin the same domain. The algorithm proposed reached a precision of 97.5% for positiveadjectives and 71.4% for negative ones in the semantic orientation identification task.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the past, research in ontology learning from text has mainly focused on entity recognition, taxonomy induction and relation extraction. In this work we approach a challenging research issue: detecting semantic frames from texts and using them to encode web ontologies. We exploit a new generation Natural Language Processing technology for frame detection, and we enrich the frames acquired so far with argument restrictions provided by a super-sense tagger and domain specializations. The results are encoded according to a Linguistic MetaModel, which allows a complete translation of lexical resources and data acquired from text, enabling custom transformations of the enriched frames into modular ontology components.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. We show that this event extraction system has good performance, reaching the first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing how the developed approach not only shows good performance, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that leads to development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.