988 resultados para Named entity recognition


Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper presents an approach for assisting low-literacy readers in accessing Web online information. The oEducational FACILITAo tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models regarding accessibility concerns. Especially, we propose an interaction model and a Web application that explore the natural language processing tasks of lexical elaboration and named entity labeling for improving Web accessibility. We report on the results obtained from a pilot study on usability analysis carried out with low-literacy users. The preliminary results show that oEducational FACILITAo improves the comprehension of text elements, although the assistance mechanisms might also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This fact evokes a future solution in which the correct sense for a complex word in a sentence is identified, solving this pervasive characteristic of natural languages. The pilot study also identified that experienced computer users find the tool to be more useful than novice computer users do.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

O Reconhecimento de Entidades Mencionadas tem como objectivo identificar e classificar entidades, baseando-se em determinadas categorias ou etiquetas, contidas em textos escritos em linguagem natural. O Sistema de Reconhecimento de Entidades Mencionadas implementado na elaboração desta Dissertação pretende identificar localidades presentes em textos informais e definir para cada localidade identificada uma das etiquetas “aldeia", "vila" ou “cidade" numa primeira aproximação ao problema. Numa segunda aproximação tiveram-se em conta as etiquetas "freguesia", "concelho" e "distrito". Para a obtenção das classificações das entidades procedeu-se a uma análise estatística do número de resultados obtidos numa pesquisa de uma entidade precedida por uma etiqueta usando o motor de pesquisa Google Search. ABSTRACT: Named Entitity Recognition has the objective of identifying and classifying entities, according to certain categories or labels, contained in texts written in natural language. The Named Entitity Recognition system implemented in the developing of this dissertation intends to identify localities in informal texts, setting for each one of these localities identified one of the labels "aldeia", ''vila" or "cidade" in a first approach to the problem. ln a second approach the labels "freguesia", "concelho" and "distrito" were taken in consideration. To obtain classifications for the entities a statistical analysis of the number of results returned by a search of an entity preceded by a label using Google search engine was performed.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Tagging has become one of the key activities in next generation websites which allow users selecting short labels to annotate, manage, and share multimedia information such as photos, videos and bookmarks. Tagging does not require users any prior training before participating in the annotation activities as they can freely choose any terms which best represent the semantic of contents without worrying about any formal structure or ontology. However, the practice of free-form tagging can lead to several problems, such as synonymy, polysemy and ambiguity, which potentially increase the complexity of managing the tags and retrieving information. To solve these problems, this research aims to construct a lightweight indexing scheme to structure tags by identifying and disambiguating the meaning of terms and construct a knowledge base or dictionary. News has been chosen as the primary domain of application to demonstrate the benefits of using structured tags for managing the rapidly changing and dynamic nature of news information. One of the main outcomes of this work is an automatically constructed vocabulary that defines the meaning of each named entity tag, which can be extracted from a news article (including person, location and organisation), based on experts suggestions from major search engines and the knowledge from public database such as Wikipedia. To demonstrate the potential applications of the vocabulary, we have used it to provide more functionalities in an online news website, including topic-based news reading, intuitive tagging, clipping and sharing of interesting news, as well as news filtering or searching based on named entity tags. The evaluation results on the impact of disambiguating tags have shown that the vocabulary can help to significantly improve news searching performance. The preliminary results from our user study have demonstrated that users can benefit from the additional functionalities on the news websites as they are able to retrieve more relevant news, clip and share news with friends and families effectively.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Nowadays people heavily rely on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task—cross-lingual link discovery (CLLD) is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross language link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. With the evaluation framework, performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple, but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism of name entity translation is demonstrated for achieving a high precision of English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments for better, automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. It is important in CLLD evaluation to have this framework which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify the system performance in the NTCIR-9 Crosslink task which is the first information retrieval track of this kind.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

80.00% 80.00%

Publicador:

Resumo:

La capacità di estrarre entità da testi, collegarle tra loro ed eliminare possibili ambiguità tra di esse è uno degli obiettivi del Web Semantico. Chiamato anche Web 3.0, esso presenta numerose innovazioni volte ad arricchire il Web con dati strutturati comprensibili sia dagli umani che dai calcolatori. Nel reperimento di questi temini e nella definizione delle entities è di fondamentale importanza la loro univocità. Il nostro orizzonte di lavoro è quello delle università italiane e le entities che vogliamo estrarre, collegare e rendere univoche sono nomi di professori italiani. L’insieme di informazioni di partenza, per sua natura, vede la presenza di ambiguità. Attenendoci il più possibile alla sua semantica, abbiamo studiato questi dati ed abbiamo risolto le collisioni presenti sui nomi dei professori. Arald, la nostra architettura software per il Web Semantico, estrae entità e le collega, ma soprattutto risolve ambiguità e omonimie tra i professori delle università italiane. Per farlo si appoggia alla semantica dei loro lavori accademici e alla rete di coautori desumibile dagli articoli da loro pubblicati, rappresentati tramite un data cluster. In questo docu delle università italiane e le entities che vogliamo estrarre, collegare e rendere univoche sono nomi di professori italiani. Partendo da un insieme di informazioni che, per sua natura, vede la presenza di ambiguità, lo abbiamo studiato attenendoci il più possibile alla sua semantica, ed abbiamo risolto le collisioni che accadevano sui nomi dei professori. Arald, la nostra architettura software per il Web Semantico, estrae entità, le collega, ma soprattutto risolve ambiguità e omonimie tra i professori delle università italiane. Per farlo si appoggia alla semantica dei loro lavori accademici e alla rete di coautori desumibile dagli articoli da loro pubblicati tramite la costruzione di un data cluster.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents the 2006 Miracle team’s approaches to the Ad-Hoc and Geographical Information Retrieval tasks. A first set of runs was obtained using a set of basic components. Then, by putting together special combinations of these runs, an extended set was obtained. With respect to previous campaigns some improvements have been introduced in our system: an entity recognition prototype is integrated in our tokenization scheme, and the performance of our indexing and retrieval engine has been improved. For GeoCLEF, we tested retrieving using geo-entity and textual references separately, and then combining them with different approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes a novel framework of incorporating protein-protein interactions (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon the existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from PPI ontology through inference would be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm to information extraction.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Research endeavors on spoken dialogue systems in the 1990s and 2000s have led to the deployment of commercial spoken dialogue systems (SDS) in microdomains such as customer service automation, reservation/booking and question answering systems. Recent research in SDS has been focused on the development of applications in different domains (e.g. virtual counseling, personal coaches, social companions) which requires more sophistication than the previous generation of commercial SDS. The focus of this research project is the delivery of behavior change interventions based on the brief intervention counseling style via spoken dialogue systems. ^ Brief interventions (BI) are evidence-based, short, well structured, one-on-one counseling sessions. Many challenges are involved in delivering BIs to people in need, such as finding the time to administer them in busy doctors' offices, obtaining the extra training that helps staff become comfortable providing these interventions, and managing the cost of delivering the interventions. Fortunately, recent developments in spoken dialogue systems make the development of systems that can deliver brief interventions possible. ^ The overall objective of this research is to develop a data-driven, adaptable dialogue system for brief interventions for problematic drinking behavior, based on reinforcement learning methods. The implications of this research project includes, but are not limited to, assessing the feasibility of delivering structured brief health interventions with a data-driven spoken dialogue system. Furthermore, while the experimental system focuses on harmful alcohol drinking as a target behavior in this project, the produced knowledge and experience may also lead to implementation of similarly structured health interventions and assessments other than the alcohol domain (e.g. obesity, drug use, lack of exercise), using statistical machine learning approaches. ^ In addition to designing a dialog system, the semantic and emotional meanings of user utterances have high impact on interaction. To perform domain specific reasoning and recognize concepts in user utterances, a named-entity recognizer and an ontology are designed and evaluated. To understand affective information conveyed through text, lexicons and sentiment analysis module are developed and tested.^

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Research endeavors on spoken dialogue systems in the 1990s and 2000s have led to the deployment of commercial spoken dialogue systems (SDS) in microdomains such as customer service automation, reservation/booking and question answering systems. Recent research in SDS has been focused on the development of applications in different domains (e.g. virtual counseling, personal coaches, social companions) which requires more sophistication than the previous generation of commercial SDS. The focus of this research project is the delivery of behavior change interventions based on the brief intervention counseling style via spoken dialogue systems. Brief interventions (BI) are evidence-based, short, well structured, one-on-one counseling sessions. Many challenges are involved in delivering BIs to people in need, such as finding the time to administer them in busy doctors' offices, obtaining the extra training that helps staff become comfortable providing these interventions, and managing the cost of delivering the interventions. Fortunately, recent developments in spoken dialogue systems make the development of systems that can deliver brief interventions possible. The overall objective of this research is to develop a data-driven, adaptable dialogue system for brief interventions for problematic drinking behavior, based on reinforcement learning methods. The implications of this research project includes, but are not limited to, assessing the feasibility of delivering structured brief health interventions with a data-driven spoken dialogue system. Furthermore, while the experimental system focuses on harmful alcohol drinking as a target behavior in this project, the produced knowledge and experience may also lead to implementation of similarly structured health interventions and assessments other than the alcohol domain (e.g. obesity, drug use, lack of exercise), using statistical machine learning approaches. In addition to designing a dialog system, the semantic and emotional meanings of user utterances have high impact on interaction. To perform domain specific reasoning and recognize concepts in user utterances, a named-entity recognizer and an ontology are designed and evaluated. To understand affective information conveyed through text, lexicons and sentiment analysis module are developed and tested.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents work done at Medical Minner Project on the TREC-2011 Medical Records Track. The paper proposes four models for medical information retrieval based on Lucene index approach. Our retrieval engine used an Lucen Index scheme with traditional stopping and stemming, enhanced with entity recognition software on query terms. Our aim in this first competition is to set a broader project that involves the develop of a configurable Apache Lucene-based framework that allows the rapid development of medical search facilities. Results around the track median have been achieved. In this exploratory track, we think that these results are a good beginning and encourage us for future developments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Faunal vocalisations are vital indicators for environmental change and faunal vocalisation analysis can provide information for answering ecological questions. Therefore, automated species recognition in environmental recordings has become a critical research area. This thesis presents an automated species recognition approach named Timed and Probabilistic Automata. A small lexicon for describing animal calls is defined, six algorithms for acoustic component detection are developed, and a series of species recognisers are built and evaluated.The presented automated species recognition approach yields significant improvement on the analysis performance over a real world dataset, and may be transferred to commercial software in the future.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Biomimetic pattern recognition has been proposed for several years, but the discussion of its neuron was not very wide and deep. In this paper, we propose a new more complex neuron named M-neuron and give the application in the last part of the paper.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Peptidoglycan recognition protein (PGRP) specifically binds to peptidoglycan and plays a crucial role in the innate immune responses as a pattern recognition receptor (PRR). The cDNA of a short type PGRP was cloned from scallop Chlamys farreri (named CfPGRP-SI) by homology cloning with degenerate primers, and confirmed by virtual Northern blots. The full length of CfPGRP-SI cDNA was 1073 bp in length, including a 5 ' untranslated region (UTR) of 59 bp, a 3 ' UTR of 255 bp, and an open reading frame (ORF) of 759 bp encoding a polypeptide of 252 amino acids with an estimated molecular mass of 27.88 kDa and a predicted isoelectric point of 8.69. BLAST analysis revealed that CfPGRP-S1 shared high identities with other known PGRPs. A conserved PGRP domain and three zinc-binding sites were present at its C-terminus. The temporal expression of QPGRP-S1 gene in healthy, Vibrio anguillarum-challenged and Micrococcus lysodeikticus-challenged scallops was measured by RT-PCR analysis. The expression of CfPGRP-S1 was upregulated initially in the first 12 h or 24 h either by M. lysodeikticus or V. anguillarum challenge and reached the maximum level at 24 h or 36 h, then dropped progressively, and recovered to the original level as the stimulation decreased at 72 h. There was no significant difference between V. anguillarum and M. lysodeikticus challenge. The results indicated that the CfPGRP-S1 was a constitutive and inducible acute-phase protein which was involved in the immune response against bacterial infection. (c) 2007 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Both animals and mobile robots, or animats, need adaptive control systems to guide their movements through a novel environment. Such control systems need reactive mechanisms for exploration, and learned plans to efficiently reach goal objects once the environment is familiar. How reactive and planned behaviors interact together in real time, and arc released at the appropriate times, during autonomous navigation remains a major unsolved problern. This work presents an end-to-end model to address this problem, named SOVEREIGN: A Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation system. The model comprises several interacting subsystems, governed by systems of nonlinear differential equations. As the animat explores the environment, a vision module processes visual inputs using networks that arc sensitive to visual form and motion. Targets processed within the visual form system arc categorized by real-time incremental learning. Simultaneously, visual target position is computed with respect to the animat's body. Estimates of target position activate a motor system to initiate approach movements toward the target. Motion cues from animat locomotion can elicit orienting head or camera movements to bring a never target into view. Approach and orienting movements arc alternately performed during animat navigation. Cumulative estimates of each movement, based on both visual and proprioceptive cues, arc stored within a motor working memory. Sensory cues are stored in a parallel sensory working memory. These working memories trigger learning of sensory and motor sequence chunks, which together control planned movements. Effective chunk combinations arc selectively enhanced via reinforcement learning when the animat is rewarded. The planning chunks effect a gradual transition from reactive to planned behavior. The model can read-out different motor sequences under different motivational states and learns more efficient paths to rewarded goals as exploration proceeds. Several volitional signals automatically gate the interactions between model subsystems at appropriate times. A 3-D visual simulation environment reproduces the animat's sensory experiences as it moves through a simplified spatial environment. The SOVEREIGN model exhibits robust goal-oriented learning of sequential motor behaviors. Its biomimctic structure explicates a number of brain processes which are involved in spatial navigation.