919 results for Entity retrieval
Abstract:
Background: Single nucleotide polymorphisms (SNPs), among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation are found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in the biomedical literature. The identification of the relevant documents and the extraction of information from them are hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Thus, automatic systems for identifying citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm that identifies variation terms in the texts and maps them to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system to collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases.
The application of OSIRISv1.2 in combination with controlled vocabularies such as MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs to diseases.
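The pattern-based identification of variation terms described in this abstract can be illustrated with a minimal Python sketch. The regular expressions and the function name `find_variation_terms` below are assumptions chosen for illustration; they are not the patterns used by OSIRIS, and the mapping of matched terms to dbSNP identifiers is not shown.

```python
import re

# Illustrative patterns only: these are assumptions, not the OSIRIS patterns.
AA3 = (r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|"
       r"Phe|Pro|Ser|Thr|Trp|Tyr|Val)")
AA1 = r"[ACDEFGHIKLMNPQRSTVWY]"

VARIATION_PATTERNS = [
    # dbSNP reference identifiers such as rs1801133
    ("rs_id", re.compile(r"\brs\d+\b")),
    # amino acid changes in three-letter code, e.g. Ala222Val
    ("protein_3letter", re.compile(rf"\b{AA3}\d+{AA3}\b")),
    # amino acid changes in one-letter code, e.g. V600E
    ("protein_1letter", re.compile(rf"\b{AA1}\d+{AA1}\b")),
    # nucleotide substitutions, e.g. 677C>T or 677C/T
    ("nucleotide_subst", re.compile(r"\b\d+\s?[ACGT]\s?(?:>|/)\s?[ACGT]\b")),
]


def find_variation_terms(text):
    """Return (label, term) pairs for every pattern match, in text order."""
    hits = []
    for label, pattern in VARIATION_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group(0)))
    return [(label, term) for _, label, term in sorted(hits)]


if __name__ == "__main__":
    sentence = ("The rs1801133 polymorphism (677C>T, Ala222Val) "
                "was associated with V600E.")
    for label, term in find_variation_terms(sentence):
        print(label, term)
```

A real system would need many more patterns and a disambiguation step, since short one-letter forms can collide with unrelated tokens such as cell line names.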
Abstract:
This work presents a named-entity-based question answering system for the Malayalam language. Although a vast amount of information is available today in digital form, no effective mechanism exists to give people convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms currently available for information access. Information Retrieval systems typically return a long list of documents in response to a user's query, which the user must skim to determine whether they contain an answer. A Question Answering system, in contrast, allows the user to state an information need as a natural language question and to receive the most appropriate answer as a word, a sentence, or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is then used to find the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which it is to be answered. Various machine learning methods are used to tag the documents, and a rule-based approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala, and is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language with relatively free word order. It also has a productive morphology that allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Part-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter were developed as part of this research work; no such tools were previously available for Malayalam.
Finite State Transducers, High Order Conditional Random Fields, Artificial Immunity System principles, and Support Vector Machines are the techniques used in the design of these document preprocessing tools. This research work describes how named entities are used to represent the documents. Single-sentence questions are used to test the system. The overall precision and recall obtained are 88.5% and 85.9%, respectively. This work can be extended in several directions: the coverage of non-factoid questions can be increased, and the system can be extended to open-domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.
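The interplay of rule-based Question Classification and Named Entity Tagging can be sketched as follows. This is a minimal Python illustration under strong assumptions: the rules, the entity type labels, and the function names are hypothetical, and English cue words stand in for the Malayalam ones, which the abstract does not list.

```python
import re

# Hypothetical rules: (pattern on the question, expected answer entity type).
# English cue words are stand-ins; the abstract does not give the real rules.
RULES = [
    (re.compile(r"^\s*who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^\s*when\b", re.IGNORECASE), "DATE"),
    (re.compile(r"^\s*how\s+(?:many|much)\b", re.IGNORECASE), "NUMBER"),
]


def classify_question(question):
    """Return the expected answer type from the first matching rule."""
    for pattern, entity_type in RULES:
        if pattern.search(question):
            return entity_type
    return "OTHER"


def candidate_answers(question, tagged_sentences):
    """Collect entities whose tag matches the expected answer type.

    tagged_sentences is a list of sentences, each a list of
    (entity_text, entity_tag) pairs produced by some named entity tagger.
    """
    wanted = classify_question(question)
    return [entity
            for sentence in tagged_sentences
            for entity, tag in sentence
            if tag == wanted]


if __name__ == "__main__":
    tagged = [[("Kerala", "LOCATION"), ("40 million", "NUMBER")]]
    print(candidate_answers("How many people speak Malayalam?", tagged))
```

The sketch stops at candidate selection; ranking the candidates and choosing a sentence or paragraph to return would be separate steps.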
Abstract:
Many projects, e.g. VIKEF [13] and KIM [7], present grounded approaches for the use of entities as a means of indexing and retrieving multimedia resources from heterogeneous sources. In this paper, we discuss the state of the art of entity-centric approaches for multimedia indexing and retrieval. A summary of projects employing entity-centric repositories is presented. This paper also looks at a current state-of-the-art authoring environment, Macromedia Authorware, and the possibility of extending this environment for entity-based multimedia authoring.
Abstract:
This work introduces the basic concepts of Natural Language Processing, focusing on Information Extraction and analysing its application areas, its main tasks, and how it differs from Information Retrieval. It then examines the Named Entity Recognition process, concentrating on the main problems of text annotation and on methods for evaluating the quality of entity extraction. Finally, it gives an overview of the open-source language processing platform GATE/ANNIE, describing its architecture and main components, with a closer look at the tools GATE offers for the rule-based approach to Named Entity Recognition.
Abstract:
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query-based IR on documents and for clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that our LRD method performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
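The idea of deriving a relation strength from entity co-occurrence can be sketched in a few lines of Python. The abstract does not give the LRD formula, so the Jaccard coefficient over document sets is used here purely as a stand-in assumption; the function names are also hypothetical.

```python
from collections import Counter
from itertools import combinations


def relation_strengths(docs):
    """Score entity pairs by co-occurrence.

    docs is a list of sets of entity names, one set per document.
    The score is the Jaccard coefficient of the two entities' document
    sets (an assumed stand-in, not the LRD relation strength measure).
    Returns {(a, b): score} with a < b alphabetically.
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for entities in docs:
        doc_freq.update(entities)
        pair_freq.update(combinations(sorted(entities), 2))
    return {
        (a, b): n / (doc_freq[a] + doc_freq[b] - n)
        for (a, b), n in pair_freq.items()
    }


def related_entities(target, strengths, top_k=3):
    """Return the top_k entities most strongly related to target."""
    scored = []
    for (a, b), score in strengths.items():
        if target == a:
            scored.append((score, b))
        elif target == b:
            scored.append((score, a))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top_k]]


if __name__ == "__main__":
    corpus = [
        {"Alice", "ProjectX"},
        {"Alice", "ProjectX", "Acme"},
        {"Bob", "Acme"},
    ]
    print(related_entities("Alice", relation_strengths(corpus)))
```

In a vector-space setting, scores like these could be used to add weight to related entities in a document or query vector, which is the kind of enhancement the abstract describes.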
Abstract:
Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval and controls the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks, such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. To remove the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine (from crawling and indexing documents to query processing) across the network of users (or peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of P2P web search. Federated search systems are a form of distributed information retrieval model that routes the user's information need, formulated as a query, to distributed resources and merges the retrieved result lists into a final list. P2P-IR networks are one form of federated search, routing queries and merging results among participating peers. The query is propagated through distributed nodes to reach the peers that are most likely to contain relevant documents; the retrieved result lists are then merged at different points along the path from the relevant peers to the query initiator (that is, the customer).
However, query routing is considered one of the major challenges and a critical part of P2P-IR networks, because relevant peers might be missed through low-quality peer selection during query routing, which inevitably leads to less effective retrieval results. This motivates this thesis to study and propose query routing techniques to improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise the peers into semantically similar clusters, each managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level, as content representations gathered from cooperative peers, to propose a query routing approach called Inverted PeerCluster Index (IPI) that simulates the conventional inverted index of a centralised corpus to organise the statistics of peers' terms. The results show competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of conventional Information Retrieval models as peer selection approaches, where each peer can be considered a big document of documents. The experimental evaluation shows comparable and significant results and indicates that document retrieval methods are very effective for peer selection, which supports the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results compared with state-of-the-art resource selection methods and competitive results compared with corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea of providing feedback on a specific item in social community networks and managing it for future decision-making.
The system monitors users' behaviour when they click or download documents from the final ranked list as implicit feedback and mines this information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments covering various scenarios, including noisy feedback (i.e., positive feedback on non-relevant documents), to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results in almost all measurement metrics, with an approximate improvement of more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose a single technique, reputation-based approaches are the natural choice, and they can be deployed on any P2P network.
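The idea of treating each peer as one big document and applying a conventional retrieval model for peer selection can be sketched as below. This is a minimal Python illustration: the TF-IDF weighting, the smoothing, and the function names are assumptions made for the example, not the models evaluated in the thesis.

```python
import math
from collections import Counter


def build_peer_index(peers):
    """Collapse each peer's documents into a single bag of terms.

    peers maps a peer id to a list of document texts.
    """
    return {
        peer_id: Counter(" ".join(docs).lower().split())
        for peer_id, docs in peers.items()
    }


def rank_peers(query, peer_index):
    """Rank peers for a query with a simple TF-IDF score.

    The weighting (log-scaled term frequency times a smoothed inverse
    peer frequency) is an assumption chosen for illustration.
    """
    n_peers = len(peer_index)
    scores = {}
    for peer_id, bag in peer_index.items():
        score = 0.0
        for term in query.lower().split():
            # Number of peers whose collapsed document contains the term.
            peer_freq = sum(1 for other in peer_index.values() if term in other)
            if peer_freq and bag[term]:
                idf = math.log((n_peers + 1) / peer_freq)
                score += (1.0 + math.log(bag[term])) * idf
        scores[peer_id] = score
    # Best peer first; ties broken by peer id for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


if __name__ == "__main__":
    network = {
        "p1": ["entity retrieval in medline", "gene variants"],
        "p2": ["football scores", "league table"],
        "p3": ["image retrieval", "retrieval of medical images"],
    }
    print(rank_peers("retrieval images", build_peer_index(network)))
```

A query router would then forward the query only to the top-ranked peers and merge their result lists on the way back to the query initiator.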
Abstract:
A long-standing challenge of content-based image retrieval (CBIR) systems is the definition of a suitable distance function that measures the similarity between images in a given application context in a way that agrees with the human perception of similarity. In this paper, we present a new family of distance functions, called attribute concurrence influence distances (AID), which serve to retrieve images by similarity. These distances address an important aspect of the psychophysical notion of similarity in comparisons of images: the effect of concurrent variations in the values of different image attributes. The AID functions allow for comparisons of feature vectors by choosing one of two parameterized expressions: one targeting weak attribute concurrence influence and the other targeting strong concurrence influence. This paper presents the mathematical definition and implementation of the AID family for a two-dimensional feature space and its extension to any dimension. The composition of the AID family with the L_p distance family is considered in order to propose a procedure for determining the best distance for a specific application. Experimental results involving several sets of medical images demonstrate that, taking as reference the perception of a specialist in the field (a radiologist), the AID functions perform better than the general distance functions commonly used in CBIR.
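For reference, the general L_p (Minkowski) distance family mentioned in this abstract can be written as a short Python sketch, together with a similarity query over feature vectors. The AID functions themselves are not reproduced, since the abstract does not give their definition; the function names and the sample feature vectors are hypothetical.

```python
def lp_distance(x, y, p=2.0):
    """Minkowski (L_p) distance between two equal-length feature vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance and
    p=float("inf") the Chebyshev distance.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1 for a valid metric")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def nearest_images(query_vector, database, k=3, p=2.0):
    """Return the ids of the k images closest to the query vector.

    database maps an image id to its feature vector.
    """
    ranked = sorted(
        database,
        key=lambda image_id: (lp_distance(query_vector, database[image_id], p),
                              image_id),
    )
    return ranked[:k]


if __name__ == "__main__":
    features = {"img_a": [0.0, 0.0], "img_b": [3.0, 4.0], "img_c": [1.0, 1.0]}
    print(nearest_images([0.9, 1.2], features, k=2))
```

Varying `p` changes how strongly a large difference in a single attribute dominates the result, which is one reason the choice of distance matters for perceived similarity.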
Abstract:
In rats, phospholipase A(2) (PLA(2)) activity was found to be increased in the hippocampus immediately after training and retrieval of a contextual fear conditioning paradigm (step-down inhibitory avoidance [IA] task). In the present study we investigated whether PLA(2) is also activated in the cerebral cortex of rats in association with contextual fear learning and retrieval. We observed that IA training induces a rapid (immediately after training) and long-lasting (3 h after training) activation of PLA(2) in both frontal and parietal cortices. However, immediately after retrieval (measured 24 h after training), PLA(2) activity was increased just in the parietal cortex. These findings suggest that PLA(2) activity is differentially required in the frontal and parietal cortices for the mechanisms of contextual learning and retrieval. Because reduced brain PLA(2) activity has been reported in Alzheimer disease, our results suggest that stimulation of PLA(2) activity may offer new treatment strategies for this disease.
Abstract:
Objective: To determine the degree of knowledge that cardiologists from Sao Paulo, Brazil, have regarding a low-prevalence entity associated with a high rate of sudden death: Brugada syndrome. Methods: Two hundred forty-four cardiologists were interviewed using an instrument divided into two parts. In the first, we recorded gender, age, and data related to academic profile. The second, answered only by the professionals who reported having some degree of knowledge of the syndrome, had 28 questions that evaluated their knowledge. The answers were spontaneous, and respondents did not have a chance to consult other sources. We used uni- and multivariate analysis of the average percentage of right and wrong answers and of the influence of the academic profile. Results: The predominant gender was male (61.1%), the average age was 44.32 +/- 10.83 years, 40% had obtained their degree more than 20 years earlier, 44% were educated in public institutions, 69% had a residency in cardiology, 20% had overseas practice, 12% had a postgraduate degree, 41% were linked to an educational institution, 24% had publication(s) in an indexed journal, 17.2% were authors of book chapters, 2.5% had edited books, and 10% were linked to the Brazilian Society of Cardiac Arrhythmias. The average percentage of right answers was 45.7%. Conclusion: The sample studied revealed little knowledge of the entity. A residency in cardiology was the factor of greatest significance for the percentage of right answers. Other significant factors were the interviewee's link to an educational institution or to the Brazilian Society of Cardiac Arrhythmias, and having a specialist degree.
Abstract:
In this work, we take advantage of association rule mining to support two types of medical systems: Content-based Image Retrieval (CBIR) systems and Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of similarity queries. We refer to the association rule-based method for improving CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. An automatically obtained second opinion can either accelerate the diagnostic process or strengthen a hypothesis, increasing the probability that a prescribed treatment will be successful. Two new algorithms are proposed to support the IDEA method: one to pre-process low-level features and one to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, adding to the arsenal of techniques that support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.
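How association rules might link image features to diagnoses can be illustrated with a minimal Python sketch that computes support and confidence for single-antecedent rules. The thresholds, the discretised feature names, and the function names are assumptions for illustration; the FAR and IDEA algorithms themselves are not reproduced here.

```python
from collections import Counter


def mine_rules(records, min_support=0.3, min_confidence=0.8):
    """Mine rules of the form feature -> diagnosis.

    records is a list of (set_of_discretised_features, diagnosis) pairs.
    Only single-feature antecedents are considered, which is a
    simplification made for this sketch.
    Returns sorted (feature, diagnosis, support, confidence) tuples.
    """
    n_records = len(records)
    feature_count = Counter()
    pair_count = Counter()
    for features, diagnosis in records:
        for feature in features:
            feature_count[feature] += 1
            pair_count[(feature, diagnosis)] += 1
    rules = []
    for (feature, diagnosis), count in pair_count.items():
        support = count / n_records               # fraction of all records
        confidence = count / feature_count[feature]  # P(diagnosis | feature)
        if support >= min_support and confidence >= min_confidence:
            rules.append((feature, diagnosis, support, confidence))
    return sorted(rules)


def suggest_diagnosis(features, rules):
    """Return the diagnosis of the most confident matching rule, or None."""
    matching = [rule for rule in rules if rule[0] in features]
    if not matching:
        return None
    best = max(matching, key=lambda rule: (rule[3], rule[2]))
    return best[1]


if __name__ == "__main__":
    training = [
        ({"texture=high", "area=large"}, "malignant"),
        ({"texture=high", "area=small"}, "malignant"),
        ({"texture=low", "area=small"}, "benign"),
        ({"texture=low", "area=large"}, "benign"),
    ]
    rules = mine_rules(training)
    print(rules)
    print(suggest_diagnosis({"texture=high", "area=small"}, rules))
```

In this toy data only the texture feature appears in the surviving rules, which hints at how rule mining can also serve as feature selection: attributes that never appear in a strong rule are candidates for removal from the feature vector.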
Abstract:
The present study compared two heating methods currently used for antigen retrieval (AR) immunostaining: the microwave oven and the steam cooker. Myosin-V, a molecular motor involved in vesicle transport, was used as a neuronal marker in honeybee (Apis mellifera) brains fixed in formalin. Overall, the steam cooker gave the most satisfactory AR results. At 100 °C, tissue morphology was maintained and epitope recovery was evident, while evaporation of the AR solution was markedly reduced; this is important for stabilizing the sodium citrate molarity of the AR buffer and reducing background effects. Standardization of heat-mediated AR of formalin-fixed and paraffin-embedded tissue sections results in more reliable immunostaining of the honeybee brain.
Abstract:
Studies of delayed nonmatching-to-sample (DNMS) performance following lesions of the monkey cortex have revealed a critical circuit of brain regions involved in forming memories and retaining and retrieving stimulus representations. Using event-related functional magnetic resonance imaging (fMRI), we measured brain activity in 10 healthy human participants during performance of a trial-unique visual DNMS task using novel barcode stimuli. The event-related design enabled the identification of activity during the different phases of the task (encoding, retention, and retrieval). Several brain regions identified by monkey studies as being important for successful DNMS performance showed selective activity during the different phases, including the mediodorsal thalamic nucleus (encoding), ventrolateral prefrontal cortex (retention), and perirhinal cortex (retrieval). Regions showing sustained activity within trials included the ventromedial and dorsal prefrontal cortices and occipital cortex. The present study shows the utility of investigating performance on tasks derived from animal models to assist in the identification of brain regions involved in human recognition memory.
Abstract:
This paper discusses a document discovery tool based on Conceptual Clustering by Formal Concept Analysis. The program allows users to navigate e-mail using a visual lattice metaphor rather than a tree. It implements a virtual file structure over e-mail in which files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in e-mail discovery. The system described provides more flexibility in retrieving stored e-mails than is normally available in e-mail clients. The paper discusses how conceptual ontologies can leverage traditional document retrieval systems and aid knowledge discovery in document collections.
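The lattice that Formal Concept Analysis builds over a set of e-mails can be illustrated with a small Python sketch. It enumerates the formal concepts (extent, intent) of a context by closing the object attribute sets under intersection. The function name and the sample e-mail context are hypothetical, and this naive enumeration is only suitable for small contexts.

```python
def formal_concepts(context):
    """Enumerate all formal concepts of a small formal context.

    context maps each object (e.g. an e-mail id) to its set of
    attributes (e.g. keywords). A concept is a pair (extent, intent)
    where the extent is exactly the set of objects sharing every
    attribute in the intent, and the intent is exactly the set of
    attributes shared by every object in the extent.
    """
    all_attributes = frozenset().union(*context.values())
    # Every intent is an intersection of object attribute sets;
    # the full attribute set is the intersection of no objects.
    intents = {all_attributes}
    for attributes in context.values():
        intents |= {frozenset(attributes) & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(
            obj for obj, attributes in context.items() if intent <= attributes
        )
        concepts.append((extent, intent))
    # Order from the most general concept (fewest attributes) downwards.
    return sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1])))


if __name__ == "__main__":
    mailbox = {
        "mail1": {"work", "urgent"},
        "mail2": {"work"},
        "mail3": {"family", "urgent"},
    }
    for extent, intent in formal_concepts(mailbox):
        print(sorted(intent), "->", sorted(extent))
```

In this example `mail1` appears under both the `work` concept and the `urgent` concept, which corresponds to the abstract's point that an item can appear in multiple positions of the virtual structure, unlike in a folder tree.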
Integration of an automatic storage and retrieval system (ASRS) in a discrete-part automation system
Abstract:
This technical report describes the work carried out in a project within the ERASMUS programme. The objective of this project was the Integration of an Automatic Warehouse in a Discrete-Part Automation System. The discrete-part automation system located at the LASCRI (Critical Systems) laboratory at ISEP was extended with automatic storage and retrieval of the manufacturing parts, through the integration of an automatic warehouse and an automatic guided vehicle (AGV).