921 results for cross-language information retrieval


Relevance: 100.00%

Abstract:

Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search them and support effective information discovery within them. Keyword querying has emerged as one of the most effective search paradigms. Most work in this area relies on traditional Information Retrieval (IR) techniques, in which each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs. Methods: We built two ranking systems. The traditional BM25 system exploits the content of the EHRs without regard to the associations among the entities they contain. The Clinical ObjectRank (CO) system exploits these associations, using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset from the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, two physicians measured sensitivity and specificity for a set of 11 queries related to congenital cardiac disease. Results: Our pilot evaluation showed that CO outperforms BM25 in sensitivity by 71% on average (65% vs. 38%) while maintaining specificity (64% vs. 61%). Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
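For reference, BM25 (the baseline above) scores each document on its own against the query, using term frequency, inverse document frequency and document-length normalization, with no use of links between entities. The sketch below is a minimal illustration, not the study's system: it assumes whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, the non-negative IDF variant used by Lucene, and a few hypothetical note snippets.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25.

    Each document is scored independently; associations between
    entities are ignored, which is what authority-flow ranking adds.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Length normalization: longer documents need more matches.
            norm = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical note snippets, for illustration only.
notes = [
    "tetralogy of fallot repair follow up",
    "ventricular septal defect closure",
    "routine follow up visit",
]
print(bm25_scores("fallot repair", notes))
```

Because of length normalization, a query term that matches a shorter note contributes more to its score than the same single match in a longer note.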

Relevance: 100.00%

Abstract:

The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data, in order to encourage the global adoption of such data. Current techniques for storing semistructured documents either map them to relational databases or use a combination of flat files and indexes. Both approaches result in a mismatch between the tree structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed the large-scale adoption of XML in actual system implementations. The recent development of lazy parsing techniques is a major step toward improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge in leveraging semistructured data is to perform effective information discovery on it. Previous work has addressed this problem in a generic (i.e., domain-independent) way, but the process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals. The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: we proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives, and we developed a Double-Lazy Parser for semistructured documents, which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model (DOM) parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents.
This goal also had two aims: we presented a framework that exploits domain-specific knowledge, by incorporating domain ontologies, to improve the quality of the information discovery process, and we proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.

Relevance: 100.00%

Abstract:

The overall aim of our research is to develop a clinical information retrieval system that retrieves systematic reviews and underlying clinical studies from the Cochrane Library to support physician decision making. We believe that in order to accomplish this goal we need to develop a mechanism for effectively representing documents that will be retrieved by the application. Therefore, as a first step in developing the retrieval application we have developed a methodology that semi-automatically generates high quality indices and applies them as descriptors to documents from The Cochrane Library. In this paper we present a description and implementation of the automatic indexing methodology and an evaluation that demonstrates that enhanced document representation results in the retrieval of relevant documents for clinical queries. We argue that the evaluation of information retrieval applications should also include an evaluation of the quality of the representation of documents that may be retrieved. ©2010 IEEE.

Relevance: 100.00%

Abstract:

Information architecture supports users in retrieving information in the Web environment. Its design should be centered on the information user, favoring usability. The Faculty of Industrial Engineering and Tourism of the Universidad Central "Marta Abreu" de Las Villas lacks a site that promotes the dissemination of information to its members. The objectives of the study are: 1) to conduct a user study to identify users' information needs, 2) to establish user-centered information architecture guidelines for the institution, 3) to design the information architecture for the institution, and 4) to evaluate the proposed design. Theoretical and empirical methods are used to obtain the results, along with techniques that support the design and evaluation. The intranet of the Faculty of Industrial Engineering and Tourism is designed, and the proposed design is evaluated to validate the results.

Relevance: 100.00%

Abstract:

The Semantic Annotation component is a software application that supports automated text classification, a process grounded in a cohesion-centered representation of discourse that facilitates topic extraction. The component enables the semantic meta-annotation of text resources, including automated classification, thus facilitating information retrieval within the RAGE ecosystem. It is available in the ReaderBench framework (http://readerbench.com/), which integrates advanced Natural Language Processing (NLP) techniques. The component uses Cohesion Network Analysis (CNA) to ensure an in-depth representation of discourse, useful for mining keywords and performing automated text categorization. It automatically classifies documents into the categories of the ACM Computing Classification System (http://dl.acm.org/ccs_flat.cfm), as well as into the categories of a high-level serious games categorization provisionally developed by RAGE. The provided web service already supports English and French, and the entire framework can be extended to support additional languages.

Relevance: 100.00%

Abstract:

The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and to gather all the information relevant to their research. In response to this problem, the BioNLP (Biomedical Natural Language Processing) research community has emerged and strives to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE), and information retrieval (IR) methods that can be applied at large scale to scan the whole publicly available biomedical literature, extract and aggregate the information found within it, and automatically normalize the variability of natural language statements. Among the various tasks, biomedical event extraction has recently received much attention within the BioNLP community. Biomedical event extraction consists of identifying the biological processes and interactions described in biomedical literature and representing them as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction has given rise to a number of event extraction systems, several of which have been applied at large scale (to the full set of PubMed abstracts and PubMed Central Open Access full-text articles), leading to the creation of massive biomedical event databases, each containing millions of events. Since top-ranking event extraction systems are based on machine learning and are trained on the narrow-domain, carefully selected Shared Task training data, their performance drops when they are faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems generate incorrect biomolecular events, which are noticed by end users.
This thesis proposes a novel post-processing approach, combining supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the overall credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes/gene products. We cast hypothesis generation as a supervised network topology prediction problem, i.e., predicting new edges in the network, as well as the types and directions of these edges, using a set of features that can be extracted from large biomedical event networks. Both standard machine learning evaluation and manual evaluation suggest that the problem is indeed learnable. This work won the Best Paper Award at the 5th International Symposium on Languages in Biology and Medicine (LBM 2013).

Relevance: 100.00%

Abstract:

We build a system to support search and visualization on heterogeneous information networks. We first build the system on a specialized heterogeneous information network, DBLP. The system aims to help people, especially computer science researchers, better understand and explore academic information networks. We then extend the system to the Web. Our results are much more intuitive and informative than the simple top-k blue links of traditional search engines, and they provide more meaningful structural results with correlated entities. We also investigate the ranking algorithm and show that personalized PageRank and the proposed Hetero-personalized PageRank outperform both TF-IDF ranking and a mixture of TF-IDF and authority ranking. Our work opens several directions for future research.
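Personalized PageRank, one of the rankers the abstract reports outperforming TF-IDF, can be computed by power iteration in which the random surfer teleports back to a query-specific set of seed nodes instead of to every node uniformly. This is a minimal sketch on a small, hypothetical author/paper/venue graph, with an assumed conventional damping factor of 0.85; it does not implement the paper's Hetero-personalized PageRank, whose details the abstract does not give.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=100, tol=1e-10):
    """Personalized PageRank by power iteration.

    graph: dict mapping each node to a list of its out-neighbors.
    seeds: nodes the surfer teleports back to (e.g. nodes matching the query).
    """
    nodes = list(graph)
    # Teleport vector: uniform over the seed nodes, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for node, out_links in graph.items():
            if out_links:
                # Authority flows evenly along each outgoing edge.
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: return its mass to the seeds.
                for s in seeds:
                    new_rank[s] += damping * rank[node] * teleport[s]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical DBLP-like graph; each undirected link is stored in both directions.
graph = {
    "author:A": ["paper:P1", "paper:P2"],
    "author:B": ["paper:P2"],
    "paper:P1": ["author:A", "venue:V"],
    "paper:P2": ["author:A", "author:B", "venue:V"],
    "venue:V": ["paper:P1", "paper:P2"],
}
ranks = personalized_pagerank(graph, seeds={"paper:P1"})
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

Seeding the teleport vector with query matches is what makes the ranking query-dependent. Entities close to the seeds in the network (here, the author and venue linked to P1) rank above more distant ones such as author B.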

Relevance: 100.00%

Abstract:

International audience

Relevance: 100.00%

Abstract:

Master's dissertation, Language Sciences, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2010

Relevance: 100.00%

Abstract:

This thesis addresses a very important problem in recruitment: matching job offers with candidates. In our case, we have thousands of job offers and millions of profiles collected from dedicated websites and provided by a company specializing in recruitment. Job offers and candidate profiles on professional social networks are generally intended for human readers, namely recruiters and job seekers. Automatically selecting profiles for a job offer therefore runs into certain difficulties, which we sought to resolve in this thesis. We used natural language processing techniques to automatically extract the relevant information from a job offer in order to build a query for searching our profile database. To validate our model for extracting occupation, skills, and experience, we evaluated these three tasks separately against a reference set of one hundred Canadian job offers that we annotated manually. To validate our matching tool, we had a recruitment expert assess the matching results for ten Canadian job offers.

Relevance: 100.00%

Abstract:

Web 2.0 has been enormously successful thanks to the possibility of dynamic user interaction, not only through participation in collaborative elements such as forums, but also through sharing and adding content to the Web. Two clear examples of this paradigm are YouTube and Flickr. The former hosts most of the videos that can be found on the Internet, and the latter has built the largest community of photographers on the Web. Both services work in a similar way: the user contributes content together with associated information. Because these are international communities, the information added by users is written in various languages, so searching for multimedia resources on these sites depends on the language of the query. In this article we present Babxel, a multimedia and multilingual information retrieval system, created as a final-year Computer Engineering project, as an extension and improvement of FlickrBabel. Babxel leverages automatic multilingual translation to generate more search results related to the user's query, with the results obtained from the platforms mentioned above.
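The idea of translating a query into several languages to find more user-tagged resources can be illustrated as follows. Babxel uses automatic machine translation; this minimal sketch swaps in a hypothetical word-by-word bilingual dictionary as a stand-in, and the photo tags, language codes and function names are all hypothetical. Each language version of the query is matched against the tags, and the hits are merged without duplicates.

```python
from typing import Dict, List, Set

# Hypothetical English-to-target dictionaries, standing in for the
# machine translation service a real system would call.
DICTIONARIES: Dict[str, Dict[str, str]] = {
    "es": {"beach": "playa", "dog": "perro"},
    "fr": {"beach": "plage", "dog": "chien"},
}


def translate_query(query: str,
                    dictionaries: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Return the query terms in the source language plus one word-by-word
    translation per target language; unknown words are kept unchanged."""
    terms = query.lower().split()
    versions = {"en": terms}
    for lang, table in dictionaries.items():
        versions[lang] = [table.get(term, term) for term in terms]
    return versions


def multilingual_search(query: str, index: Dict[str, Set[str]],
                        dictionaries: Dict[str, Dict[str, str]]) -> List[str]:
    """Match every language version of the query against user-supplied tags
    and merge the hits, keeping each item once in first-found order."""
    results: List[str] = []
    for terms in translate_query(query, dictionaries).values():
        for item, tags in index.items():
            if all(term in tags for term in terms) and item not in results:
                results.append(item)
    return results


# Hypothetical photos tagged by users writing in different languages.
photos = {
    "photo1": {"playa", "perro"},
    "photo2": {"beach", "dog"},
    "photo3": {"plage", "chien"},
    "photo4": {"playa"},
}
print(multilingual_search("beach dog", photos, DICTIONARIES))
```

A monolingual search for "beach dog" would return only photo2. The translated versions also retrieve the Spanish- and French-tagged photos.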

Relevance: 100.00%

Abstract:

Things change. Words change, meaning changes, and use changes both words and meaning. In information access systems, this means that concept schemes such as thesauri or classification schemes change. They always have. Concept schemes that have survived have evolved over time, moving from one version, often called an edition, to the next. If we want to manage effectively how words and meanings (and, as a consequence, use) change, and if we want to be able to search across versions of concept schemes, we have to track these changes. This paper explores how we might extend SKOS, a World Wide Web Consortium (W3C) draft recommendation, to do that kind of tracking. The Simple Knowledge Organization System (SKOS) Core Guide is sponsored by the Semantic Web Best Practices and Deployment Working Group. The second draft, edited by Alistair Miles and Dan Brickley, was issued in November 2005. SKOS is a “model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, other types of controlled vocabulary and also concept schemes embedded in glossaries and terminologies” in RDF. How SKOS handles versioning of concept schemes is an open issue. The current draft guide suggests using OWL and DCTERMS as mechanisms for concept scheme revision. As it stands, an editor of a concept scheme can make notes or declare in OWL that more than one version exists. This paper adds to SKOS Core by introducing a tracking system for changes in concept schemes. We call this tracking system vocabulary ontogeny. Ontogeny is a biological term for the development of an organism during its lifetime. Here we use the ontogeny metaphor to describe how vocabularies change over their lifetime.
Our purpose here is to create a conceptual mechanism that will track these changes and in so doing enhance information retrieval and prevent document loss through versioning, thereby enabling persistent retrieval.

Relevance: 100.00%

Abstract:

This paper outlines the purposes, predications, functions, and contexts of information organization frameworks, including bibliographic control, information retrieval, resource discovery, resource description, open access scholarly indexing, personal information management protocols, and social tagging, in order to compare and contrast them. For the purposes of this paper, information organization frameworks consist of information organization systems (classification schemes, taxonomies, ontologies, bibliographic descriptions, etc.), the methods of conceiving and creating these systems, and the work processes involved in maintaining them. The paper first outlines the theoretical literature on these information organization frameworks. In conclusion, it establishes the first part of an evaluation rubric for a function, predication, purpose, and context analysis.

Relevance: 100.00%

Abstract:

This paper presents our work for the CHIS task at FIRE 2016. Given a CHIS query and a document associated with it, the task is to classify the sentences in the document as relevant or not relevant to the query, and further to classify the relevant sentences as supporting, neutral, or opposing the claim made in the query. We present two different classification approaches. The first approach uses two models: an information retrieval model first retrieves the sentences relevant to the query, and a supervised classification model then labels the relevant sentences as support, oppose, or neutral. The second approach uses machine learning alone, learning a single model that classifies sentences into four classes (relevant & support, relevant & neutral, relevant & oppose, irrelevant & neutral). Our CHIS submission uses the first approach.
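The first stage of the two-model approach, which retrieves query-relevant sentences before a separate classifier labels their stance, might look like the sketch below. The abstract does not name the retrieval model, so this assumes plain TF-IDF vectors with smoothed IDF, cosine similarity and a hypothetical relevance threshold of 0.1. The example sentences are invented, and the supervised stance classifier of the second stage is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Build TF-IDF vectors (term -> weight dicts) for a list of texts."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that terms occurring in every text keep a small weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: freq * idf[term] for term, freq in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relevant_sentences(query, sentences, threshold=0.1):
    """Stage 1: keep sentences whose similarity to the query reaches the
    (hypothetical) threshold; stage 2 would classify their stance."""
    vectors = tfidf_vectors([query] + sentences)
    query_vec, sentence_vecs = vectors[0], vectors[1:]
    return [s for s, vec in zip(sentences, sentence_vecs)
            if cosine(query_vec, vec) >= threshold]


sentences = [
    "Sun exposure helps the body produce vitamin D.",
    "The clinic opens at nine.",
    "Too much sun exposure raises the risk of skin cancer.",
]
print(relevant_sentences("sun exposure causes skin cancer", sentences))
```

Splitting the pipeline this way lets the stance classifier be trained only on relevant sentences. The single four-class model of the second approach instead learns relevance and stance jointly.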