877 results for Natural Language


Relevance:

60.00%

Publisher:

Abstract:

This project was a step forward in developing and evaluating a novel mathematical model that can deduce the meaning of words from their use in language. The model can be applied to a wide range of natural language applications, including the information seeking process most of us undertake on a daily basis.

Relevance:

60.00%

Publisher:

Abstract:

The INEX workshop is concerned with evaluating the effectiveness of XML retrieval systems. In 2004 a natural language query task was added to the INEX Ad hoc track. Standard INEX Ad hoc topic titles are specified in NEXI -- a simplified and restricted subset of XPath, with a similar feel, and yet with a distinct IR flavour and interpretation. The syntax of NEXI is rigid and it imposes some limitations on the kind of information need that it can faithfully capture. At INEX 2004 the NLP question to be answered was simple -- is it practical to use a natural language query that is the equivalent of the formal NEXI title? The results of this experiment are reported and some information on the future direction of the NLP task is presented.
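
As an illustration (a constructed example, not one of the actual INEX 2004 topics), a NEXI title such as //article[about(.//sec, "query expansion")] requests sections of articles about query expansion; its natural language equivalent in the NLP task might be phrased simply as "Find sections of articles that discuss query expansion."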

Relevance:

60.00%

Publisher:

Abstract:

Process models expressed in BPMN typically rely on a small subset of all available symbols. In our 2008 study, we examined the composition of these subsets and found that the distribution of BPMN symbols in practice closely resembles the frequency distribution of words in natural language. Based on our findings, we offered suggestions on how to make the use of BPMN more manageable and outlined ideas for the further development of BPMN. Since this paper was published, it has provoked spirited debate in the BPM practitioner community, prompted the definition of a modeling standard in the US government, and helped shape the next generation of the BPMN standard.

Relevance:

60.00%

Publisher:

Abstract:

A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that, by explicitly modeling associations between terms, significant improvements in retrieval effectiveness can be achieved over approaches that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations, which capture the likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique based on a formal, corpus-based model of word meaning that captures both syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
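
To make the distinction concrete, here is a minimal sketch (not the article's actual model; the toy corpus, window size, and overlap measure are invented for illustration) of how the two kinds of association can be estimated from co-occurrence statistics:

    from collections import Counter, defaultdict

    def context_windows(tokens, radius=2):
        """Yield (term, context term) pairs within a fixed window."""
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
            for j in range(lo, hi):
                if j != i:
                    yield term, tokens[j]

    corpus = [["doctors", "treat", "patients"],
              ["nurses", "treat", "patients"],
              ["doctors", "diagnose", "disease"]]

    cooc = Counter()                  # syntagmatic: direct co-occurrence counts
    contexts = defaultdict(Counter)   # paradigmatic: per-term context profiles
    for sent in corpus:
        for term, ctx in context_windows(sent):
            cooc[(term, ctx)] += 1
            contexts[term][ctx] += 1

    def paradigmatic_overlap(a, b):
        """Fraction of context profile shared by a and b (0..1); high for
        words that substitute for each other, like 'doctors'/'nurses'."""
        shared = sum((contexts[a] & contexts[b]).values())
        denom = max(sum(contexts[a].values()), sum(contexts[b].values()))
        return shared / denom if denom else 0.0

    print(cooc[("doctors", "treat")])                # syntagmatic evidence
    print(paradigmatic_overlap("doctors", "nurses"))

Syntagmatic evidence comes from terms appearing together; paradigmatic evidence comes from terms sharing the same neighbouring contexts, as substitutable words do.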

Relevance:

60.00%

Publisher:

Abstract:

Many successful query expansion techniques ignore information about the term dependencies that exist within natural language. However, researchers have recently demonstrated that consistent and significant improvements in retrieval effectiveness can be achieved by explicitly modelling term dependencies within the query expansion process. This has created an increased interest in dependency-based models. State-of-the-art dependency-based approaches primarily model term associations known within structural linguistics as syntagmatic associations, which are formed when terms co-occur together more often than by chance. However, structural linguistics proposes that the meaning of a word is also dependent on its paradigmatic associations, which are formed between words that can substitute for each other without affecting the acceptability of a sentence. Given the reliance on word meanings when a user formulates their query, our approach takes the novel step of modelling both syntagmatic and paradigmatic associations within the query expansion process, based on the (pseudo) relevant documents returned in web search. The results demonstrate that this approach can provide significant improvements in web retrieval effectiveness when compared to a strong benchmark retrieval system.
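
The pseudo-relevance feedback setting referred to above can be sketched as follows (a simplified stand-in for the authors' method; the frequency-based term selection is an invented placeholder for their association model):

    from collections import Counter

    def expand_query(query_terms, ranked_docs, k_docs=10, k_terms=5):
        """Pseudo-relevance feedback: treat the top-k retrieved documents
        as relevant and add their most frequent non-query terms."""
        feedback = Counter()
        for doc in ranked_docs[:k_docs]:          # doc: list of tokens
            feedback.update(t for t in doc if t not in query_terms)
        expansion = [t for t, _ in feedback.most_common(k_terms)]
        return list(query_terms) + expansion

    docs = [["hybrid", "cars", "fuel", "economy"], ["fuel", "prices"]]
    print(expand_query(["hybrid", "cars"], docs, k_docs=2, k_terms=2))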

Relevance:

60.00%

Publisher:

Abstract:

The Australian e-Health Research Centre (AEHRC) recently participated in the ShARe/CLEF eHealth Evaluation Lab Task 1. The goal of this task is to identify mentions of disorders in free-text electronic health records and map the disorders to SNOMED CT concepts in the UMLS metathesaurus. This paper details our participation in this ShARe/CLEF task. Our approaches are based on using the clinical natural language processing tool MetaMap and Conditional Random Fields (CRFs) to identify mentions of disorders and then map them to SNOMED CT concepts. Empirical results obtained on the 2013 ShARe/CLEF task highlight that our instance of MetaMap (after filtering irrelevant semantic types), although achieving a high level of precision, is only able to identify a small proportion of disorders (about 21% to 28%) from free-text health records. On the other hand, the addition of the CRF models allows for a much higher recall (57% to 79%) of disorders from free text, without appreciable detriment to precision. When evaluating the accuracy of the mapping of disorders to SNOMED CT concepts in the UMLS, we observe that the mapping obtained by our filtered instance of MetaMap delivers state-of-the-art effectiveness if only spans identified by our system are considered ('relaxed' accuracy).
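
As a rough illustration of the CRF component (a minimal sketch, not the AEHRC pipeline; it uses the third-party sklearn-crfsuite package, and the toy sentence, labels, and features are invented), disorder mention detection can be framed as BIO sequence labelling over tokens:

    import sklearn_crfsuite  # third-party: pip install sklearn-crfsuite

    def token_features(sent, i):
        """Simple surface features for one token; a real system would add
        MetaMap semantic types, dictionary matches, and so on."""
        w = sent[i]
        return {
            "lower": w.lower(),
            "suffix3": w[-3:],
            "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
        }

    # Toy training data: tokens labelled with BIO tags marking disorder spans.
    sents = [["Patient", "denies", "chest", "pain", "."]]
    labels = [["O", "O", "B-Disorder", "I-Disorder", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))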

Relevance:

60.00%

Publisher:

Abstract:

Expert searchers engage with information as information brokers, researchers, reference librarians, information architects, faculty who teach advanced search, and in a variety of other information-intensive professions. Their experiences are characterized by a profound understanding of information concepts and skills, and they have an agile ability to apply this knowledge to interacting with, and having an impact on, the information environment.

This study explored the learning experiences of searchers to understand the acquisition of search expertise. The research question was: what can be learned about becoming an expert searcher from the learning experiences of proficient novice searchers and highly experienced searchers? The key objectives were: (1) to explore the existence of threshold concepts in search expertise; (2) to improve our understanding of how search expertise is acquired and how novice searchers, intent on becoming experts, can learn to search in more expert-like ways.

The participant sample drew from two population groups: (1) highly experienced searchers with a minimum of 20 years of relevant professional experience, including LIS faculty who teach advanced search, information brokers, and search engine developers (11 subjects); and (2) MLIS students who had completed coursework in information retrieval and online searching and demonstrated exceptional ability (9 subjects). Using these two groups allowed a nuanced understanding of the experience of learning to search in expert-like ways, with data from those who search at a very high level as well as those who may be actively developing expertise.

The study used semi-structured interviews, search tasks with think-aloud narratives, and talk-after protocols. Searches were screen-captured with simultaneous audio-recording of the think-aloud narrative. Data were coded and analyzed using NVivo9 and manually. Grounded theory allowed categories and themes to emerge from the data; categories represented conceptual knowledge and attributes of expert searchers. In accord with grounded theory method, once theoretical saturation was achieved, during the final stage of analysis the data were viewed through the lenses of existing theoretical frameworks. For this study, threshold concept theory (Meyer & Land, 2003) was used to explore which concepts might be threshold concepts. Threshold concepts have been used to explore transformative learning portals in subjects ranging from economics to mathematics. A threshold concept has five defining characteristics: transformative (causing a shift in perception), irreversible (unlikely to be forgotten), integrative (unifying separate concepts), troublesome (initially counter-intuitive), and possibly bounded.

Themes that emerged provided evidence of four concepts which had the characteristics of threshold concepts: (1) information environment: the total information environment is perceived and understood; (2) information structures: content, index structures, and retrieval algorithms are understood; (3) information vocabularies: fluency in search behaviors related to language, including natural language, controlled vocabulary, and finesse using proximity, truncation, and other language-based tools; and (4) concept fusion: the integration of the other three threshold concepts, further defined by three properties: visioning (anticipating next moves), being light on one's 'search feet' (the dancing property), and a profound ontological shift (identity as searcher).

In addition to the threshold concepts, findings were reported that were not concept-based, including the praxes and traits of expert searchers. A model of search expertise is proposed with the four threshold concepts at its core that also integrates the traits and praxes elicited from the study, attributes which are likewise long recognized in LIS research as present in professional searchers. The research provides a deeper understanding of the transformative learning experiences involved in the acquisition of search expertise. It adds to our understanding of search expertise in the context of today's information environment and has implications for teaching advanced search, for research more broadly within library and information science, and for methodologies used to explore threshold concepts.

Relevance:

60.00%

Publisher:

Abstract:

Objective: To develop a system for the automatic classification of pathology reports for Cancer Registry notifications. Method: A two-pass approach is proposed to classify whether pathology reports are cancer notifiable or not. The first pass queries pathology HL7 messages for known report types that are received by the Queensland Cancer Registry (QCR), while the second pass analyses the free-text reports and identifies those that are cancer notifiable. Cancer Registry business rules, natural language processing, and symbolic reasoning using the SNOMED CT ontology were adopted in the system. Results: The system was developed on a corpus of 500 histology and cytology reports (47% notifiable) and evaluated on an independent set of 479 reports (52% notifiable). Results show that the system can reliably classify cancer notifiable reports, with a sensitivity, specificity, and positive predictive value (PPV) of 0.99, 0.95, and 0.95, respectively, on the development set, and 0.98, 0.96, and 0.96 on the evaluation set. High sensitivity can be achieved at a slight expense in specificity and PPV. Conclusion: The system demonstrates how medical free-text processing enables the classification of cancer notifiable pathology reports with high reliability for potential use by Cancer Registries and pathology laboratories.
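
A minimal sketch of the two-pass decision logic described above (the report types, evidence terms, and field values are invented placeholders, not the deployed QCR rules):

    # Hypothetical first-pass list of report types already known to the registry.
    KNOWN_NOTIFIABLE_TYPES = {"HISTOLOGY-MALIGNANT", "CYTOLOGY-MALIGNANT"}
    # Hypothetical second-pass evidence terms for the free-text analysis.
    CANCER_EVIDENCE = ("carcinoma", "melanoma", "malignant", "metastatic")

    def is_cancer_notifiable(report_type, free_text):
        """Two-pass check: HL7 report type first, free text second."""
        if report_type in KNOWN_NOTIFIABLE_TYPES:         # pass 1: HL7 metadata
            return True
        text = free_text.lower()                          # pass 2: free text
        return any(term in text for term in CANCER_EVIDENCE)

    print(is_cancer_notifiable("HISTOLOGY-OTHER",
                               "Sections show infiltrating ductal carcinoma."))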

Relevance:

60.00%

Publisher:

Abstract:

The aim of this research is to report initial experimental results and an evaluation of a clinician-driven automated method that can address the issue of misdiagnosis from unstructured radiology reports. Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to dispersed information resources and vast amounts of manual processing of unstructured information, an accurate point-of-care diagnosis is often difficult. A rule-based method that considers the occurrence of clinician-specified keywords related to radiological findings was developed to identify limb abnormalities, such as fractures. A dataset containing 99 narrative reports of radiological findings was sourced from a tertiary hospital. The rule-based method achieved an F-measure of 0.80 and an accuracy of 0.80. While our method achieves promising performance, a number of avenues for improvement were identified using advanced natural language processing (NLP) techniques.
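
For reference, the reported F-measure and accuracy follow from the standard confusion-matrix definitions; a minimal computation (the counts below are invented, not the study's data) looks like:

    def f_measure_and_accuracy(tp, fp, fn, tn):
        """F1 and accuracy from confusion-matrix counts."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        return f1, accuracy

    print(f_measure_and_accuracy(tp=40, fp=10, fn=10, tn=39))  # illustrative counts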

Relevance:

60.00%

Publisher:

Abstract:

Searching for health advice on the web is becoming increasingly common. Because of the great importance of this activity for patients and clinicians, and the effect that incorrect information may have on health outcomes, it is critical to present relevant and valuable information to a searcher. Previous evaluation campaigns on health information retrieval (IR) have provided benchmarks that have been widely used to improve health IR and record these improvements. However, in general these benchmarks have targeted the specialised information needs of physicians and other healthcare workers. In this paper, we describe the development of a new collection for evaluating the effectiveness of IR systems that seek to satisfy the health information needs of patients. Our methodology features a novel way to create statements of patients' information needs using realistic short queries associated with patient discharge summaries, which provide details of patient disorders. We adopt a scenario where the patient then creates a query to seek information relating to these disorders. Thus, discharge summaries provide us with a means to create contextually driven search statements, since they may include details on the stage of the disease, family history, etc. The collection will be used for the first time as part of the ShARe/CLEF 2013 eHealth Evaluation Lab, which focuses on natural language processing and IR for clinical care.

Relevance:

60.00%

Publisher:

Abstract:

Background: Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to dispersed information resources and a vast amount of manual processing of unstructured information, accurate point-of-care diagnosis is often difficult. Aims: The aim of this research is to report an initial experimental evaluation of a clinician-informed automated method for addressing initial misdiagnoses associated with delayed receipt of unstructured radiology reports. Method: A method was developed that resembles clinical reasoning for identifying limb abnormalities. The method consists of a gazetteer of keywords related to radiological findings; it classifies an X-ray report as abnormal if the report contains evidence listed in the gazetteer. A set of 99 narrative reports of radiological findings was sourced from a tertiary hospital. Reports were manually assessed by two clinicians, and discrepancies were validated by a third expert ED clinician; the final manual classification generated by the expert ED clinician was used as the ground truth to empirically evaluate the approach. Results: The automated method, which attempts to identify limb abnormalities by searching for keywords specified by clinicians, achieved an F-measure of 0.80 and an accuracy of 0.80. Conclusion: While the automated clinician-driven method achieved promising performance, a number of avenues for improvement were identified using advanced natural language processing (NLP) and machine learning techniques.
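
The gazetteer-based rule can be sketched in a few lines (the keyword list below is invented for illustration; the study's gazetteer was specified by clinicians):

    # Hypothetical gazetteer of abnormality keywords; the study's list
    # was elicited from ED clinicians.
    GAZETTEER = {"fracture", "dislocation", "avulsion", "subluxation"}

    def classify_xray_report(text):
        """Label a narrative X-ray report abnormal if any gazetteer
        keyword occurs in it."""
        tokens = {t.strip(".,;:()").lower() for t in text.split()}
        return "abnormal" if tokens & GAZETTEER else "normal"

    print(classify_xray_report("Transverse fracture of the distal radius."))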

Relevance:

60.00%

Publisher:

Abstract:

Background: Cancer monitoring and prevention rely on the timely notification of cancer cases. However, abstracting and classifying cancer from the free text of pathology reports and other relevant documents, such as death certificates, are complex and time-consuming activities. Aims: In this paper, approaches for the automatic detection of notifiable cancer cases as the cause of death from free-text death certificates supplied to Cancer Registries are investigated. Method: A number of machine learning classifiers were studied. Features were extracted using natural language techniques and the Medtex toolkit; they encompassed stemmed words, bi-grams, and concepts from the SNOMED CT medical terminology. The baseline consisted of a keyword spotter using keywords extracted from the long descriptions of ICD-10 cancer-related codes. Results: Death certificates with notifiable cancer listed as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved the best performance, with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounted for the first 18 of the top 40 evaluated runs and was the most robust classifier, with a variance of 0.001141, half that of the other classifiers. Conclusion: The selection of features had the greatest influence on classifier performance, although the type of classifier employed also affected performance. In contrast, the feature weighting scheme had a negligible effect on performance. Specifically, stemmed tokens, with or without SNOMED CT concepts, were the most effective features when combined with an SVM classifier.
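
As a rough sketch of the token-stem feature set with an SVM (not the Medtex pipeline; this uses off-the-shelf NLTK and scikit-learn components, and the toy certificates are invented):

    from nltk.stem import PorterStemmer            # pip install nltk
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    stemmer = PorterStemmer()

    def stem_tokens(text):
        """Token-stem feature set: lowercase, split, stem."""
        return [stemmer.stem(t) for t in text.lower().split()]

    # Invented toy certificates; the study used 5,000 real ones.
    texts = ["metastatic carcinoma of the lung", "ischaemic heart disease"]
    notifiable = [1, 0]

    model = make_pipeline(
        TfidfVectorizer(tokenizer=stem_tokens, token_pattern=None),
        LinearSVC(),
    )
    model.fit(texts, notifiable)
    print(model.predict(["malignant neoplasm of the colon"]))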

Relevance:

60.00%

Publisher:

Abstract:

We identify relation completion (RC) as a recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation, RC attempts to link entity pairs between two entity lists under the relation. To accomplish the RC goals, we propose to formulate search queries for each query entity α based on some auxiliary information, so as to detect its target entity β from the set of retrieved documents. For instance, a pattern-based method (PaRE) uses extracted patterns as the auxiliary information in formulating search queries. However, requiring high-quality patterns may decrease the probability of finding suitable target entities. As an alternative, we propose the CoRE method, which uses context terms learned from text surrounding the expression of a relation as the auxiliary information in formulating queries. Experimental results based on several real-world web data collections demonstrate that CoRE reaches a much higher accuracy than PaRE for the purpose of RC.
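
A minimal sketch of context-term query formulation in the spirit of CoRE (the template, weights, and example relation are invented placeholders, not the paper's learned model):

    def formulate_query(alpha, context_weights, k=3):
        """Build a search query for query entity alpha from the k
        highest-weighted context terms learned for the relation."""
        top = sorted(context_weights, key=context_weights.get, reverse=True)[:k]
        return '"{}" {}'.format(alpha, " ".join(top))

    # Hypothetical relation company -> CEO; the weights would be learned
    # from text surrounding known entity pairs.
    ceo_context = {"chief": 0.9, "executive": 0.8, "officer": 0.7, "said": 0.2}
    print(formulate_query("Acme Corp", ceo_context))
    # e.g. "Acme Corp" chief executive officer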

Relevance:

60.00%

Publisher:

Abstract:

Text is the main method of communicating information in the digital age. Messages, blogs, news articles, reviews, and other opinionated information abound on the Internet. People commonly purchase products online and post their opinions about the purchased items. This feedback is displayed publicly to assist others with their purchasing decisions, creating the need for a mechanism with which to extract and summarize useful information for enhancing the decision-making process. Our contribution is to improve the accuracy of extraction by combining techniques from three major areas: data mining, natural language processing, and ontologies. The proposed framework sequentially mines product aspects and users' opinions, groups representative aspects by similarity, and generates an output summary. This paper focuses on the task of extracting product aspects and users' opinions, extracting all possible aspects and opinions from reviews using natural language processing, an ontology, and frequent "tag" sets. The proposed framework, when compared with an existing baseline model, yielded promising results.
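
As a rough illustration of the aspect and opinion extraction step (a simplified stand-in for the framework; it uses NLTK part-of-speech tags, and the heuristic of pairing nouns with adjacent adjectives is an assumption made for the example):

    import nltk  # assumes nltk.download("punkt") and
                 # nltk.download("averaged_perceptron_tagger") have been run

    def aspect_opinion_pairs(review):
        """Treat nouns as candidate aspects and adjacent adjectives as
        candidate opinion words."""
        tagged = nltk.pos_tag(nltk.word_tokenize(review))
        pairs = []
        for i, (word, tag) in enumerate(tagged):
            if tag.startswith("NN"):                        # candidate aspect
                for j in (i - 1, i + 1):                    # check neighbours
                    if 0 <= j < len(tagged) and tagged[j][1].startswith("JJ"):
                        pairs.append((word, tagged[j][0]))  # (aspect, opinion)
        return pairs

    print(aspect_opinion_pairs("The bright screen and long battery impressed me."))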