939 resultados para Focussed retrieval


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Australian e-Health Research Centre and Queensland University of Technology recently participated in the TREC 2012 Medical Records Track. This paper reports on our methods, results and experience using an approach that exploits the concept and inter-concept relationships defined in the SNOMED CT medical ontology. Our concept-based approach is intended to overcome specific challenges in searching medical records, namely vocabulary mismatch and granularity mismatch. Queries and documents are transformed from their term-based originals into medical concepts as defined by the SNOMED CT ontology, this is done to tackle vocabulary mismatch. In addition, we make use of the SNOMED CT parent-child `is-a' relationships between concepts to weight documents that contained concept subsumed by the query concepts; this is done to tackle the problem of granularity mismatch. Finally, we experiment with other SNOMED CT relationships besides the is-a relationship to weight concepts related to query concepts. Results show our concept-based approach performed significantly above the median in all four performance metrics. Further improvements are achieved by the incorporation of weighting subsumed concepts, overall leading to improvement above the median of 28% infAP, 10% infNDCG, 12% R-prec and 7% Prec@10. The incorporation of other relations besides is-a demonstrated mixed results, more research is required to determined which SNOMED CT relationships are best employed when weighting related concepts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

IT-supported field data management benefits on-site construction management by improving accessibility to the information and promoting efficient communication between project team members. However, most of on-site safety inspections still heavily rely on subjective judgment and manual reporting processes and thus observers’ experiences often determine the quality of risk identification and control. This study aims to develop a methodology to efficiently retrieve safety-related information so that the safety inspectors can easily access to the relevant site safety information for safer decision making. The proposed methodology consists of three stages: (1) development of a comprehensive safety database which contains information of risk factors, accident types, impact of accidents and safety regulations; (2) identification of relationships among different risk factors based on statistical analysis methods; and (3) user-specified information retrieval using data mining techniques for safety management. This paper presents an overall methodology and preliminary results of the first stage research conducted with 101 accident investigation reports.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a graph-based method to weight medical concepts in documents for the purposes of information retrieval. Medical concepts are extracted from free-text documents using a state-of-the-art technique that maps n-grams to concepts from the SNOMED CT medical ontology. In our graph-based concept representation, concepts are vertices in a graph built from a document, edges represent associations between concepts. This representation naturally captures dependencies between concepts, an important requirement for interpreting medical text, and a feature lacking in bag-of-words representations. We apply existing graph-based term weighting methods to weight medical concepts. Using concepts rather than terms addresses vocabulary mismatch as well as encapsulates terms belonging to a single medical entity into a single concept. In addition, we further extend previous graph-based approaches by injecting domain knowledge that estimates the importance of a concept within the global medical domain. Retrieval experiments on the TREC Medical Records collection show our method outperforms both term and concept baselines. More generally, this work provides a means of integrating background knowledge contained in medical ontologies into data-driven information retrieval approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper gives an overview of the INEX 2011 Snippet Retrieval Track. The goal of the Snippet Retrieval Track is to provide a common forum for the evaluation of the effectiveness of snippets, and to investigate how best to generate snippets for search results, which should provide the user with sufficient information to determine whether the underlying document is relevant. We discuss the setup of the track, and the evaluation results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

On August 16, 2012 the SIGIR 2012 Workshop on Open Source Information Retrieval was held as part of the SIGIR 2012 conference in Portland, Oregon, USA. There were 2 invited talks, one from industry and one from academia. There were 6 full papers and 6 short papers presented as well as demonstrations of 4 open source tools. Finally there was a lively discussion on future directions for the open source Information Retrieval community. This contribution discusses the events of the workshop and outlines future directions for the community.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Nowadays people heavily rely on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task—cross-lingual link discovery (CLLD) is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross language link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. With the evaluation framework, performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple, but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism of name entity translation is demonstrated for achieving a high precision of English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments for better, automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. It is important in CLLD evaluation to have this framework which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify the system performance in the NTCIR-9 Crosslink task which is the first information retrieval track of this kind.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approx 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This project was a step forward in developing and evaluating a novel, mathematical model that can deduce the meaning of words based on their use in language. This model can be applied to a wide range of natural language applications, including the information seeking process most of us undertake on a daily basis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Information skills instruction for research candidates bas recently been formalised as coursework at the Queensland University of Technology. Feedback solicited from participants suggests that students benefit from such coursework in a number of ways. Their perception of the value of specific content areas to their literature review and thesis presentation is favourable. A small group of students who participated in Interviews identified five ways in which the coursework assisted the research process. As Instructors continue to work with the post·graduate community it would be useful to deepen our understanding of how such instruction is perceived and the benefits which can be derived from it.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Big Data is a rising IT trend similar to cloud computing, social networking or ubiquitous computing. Big Data can offer beneficial scenarios in the e-health arena. However, one of the scenarios can be that Big Data needs to be kept secured for a long period of time in order to gain its benefits such as finding cures for infectious diseases and protecting patient privacy. From this connection, it is beneficial to analyse Big Data to make meaningful information while the data is stored securely. Therefore, the analysis of various database encryption techniques is essential. In this study, we simulated 3 types of technical environments, namely, Plain-text, Microsoft Built-in Encryption, and custom Advanced Encryption Standard, using Bucket Index in Data-as-a-Service. The results showed that custom AES-DaaS has a faster range query response time than MS built-in encryption. Furthermore, while carrying out the scalability test, we acknowledged that there are performance thresholds depending on physical IT resources. Therefore, for the purpose of efficient Big Data management in eHealth it is noteworthy to examine their scalability limits as well even if it is under a cloud computing environment. In addition, when designing an e-health database, both patient privacy and system performance needs to be dealt as top priorities.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Bluetooth technology is being increasingly used to track vehicles throughout their trips, within urban networks and across freeway stretches. One important opportunity offered by this type of data is the measurement of Origin-Destination patterns, emerging from the aggregation and clustering of individual trips. In order to obtain accurate estimations, however, a number of issues need to be addressed, through data filtering and correction techniques. These issues mainly stem from the use of the Bluetooth technology amongst drivers, and the physical properties of the Bluetooth sensors themselves. First, not all cars are equipped with discoverable Bluetooth devices and the Bluetooth-enabled vehicles may belong to some small socio-economic groups of users. Second, the Bluetooth datasets include data from various transport modes; such as pedestrian, bicycles, cars, taxi driver, buses and trains. Third, the Bluetooth sensors may fail to detect all of the nearby Bluetooth-enabled vehicles. As a consequence, the exact journey for some vehicles may become a latent pattern that will need to be extracted from the data. Finally, sensors that are in close proximity to each other may have overlapping detection areas, thus making the task of retrieving the correct travelled path even more challenging. The aim of this paper is twofold. We first give a comprehensive overview of the aforementioned issues. Further, we propose a methodology that can be followed, in order to cleanse, correct and aggregate Bluetooth data. We postulate that the methods introduced by this paper are the first crucial steps that need to be followed in order to compute accurate Origin-Destination matrices in urban road networks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper details the participation of the Australian e- Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab { Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries signi cantly improves the e ectiveness of the language model baseline. Readability priors seem to increase retrieval e ectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely to be because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Entity-oriented retrieval aims to return a list of relevant entities rather than documents to provide exact answers for user queries. The nature of entity-oriented retrieval requires identifying the semantic intent of user queries, i.e., understanding the semantic role of query terms and determining the semantic categories which indicate the class of target entities. Existing methods are not able to exploit the semantic intent by capturing the semantic relationship between terms in a query and in a document that contains entity related information. To improve the understanding of the semantic intent of user queries, we propose concept-based retrieval method that not only automatically identifies the semantic intent of user queries, i.e., Intent Type and Intent Modifier but introduces concepts represented by Wikipedia articles to user queries. We evaluate our proposed method on entity profile documents annotated by concepts from Wikipedia category and list structure. Empirical analysis reveals that the proposed method outperforms several state-of-the-art approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Process-Aware Information Systems (PAISs) support executions of operational processes that involve people, resources, and software applications on the basis of process models. Process models describe vast, often infinite, amounts of process instances, i.e., workflows supported by the systems. With the increasing adoption of PAISs, large process model repositories emerged in companies and public organizations. These repositories constitute significant information resources. Accurate and efficient retrieval of process models and/or process instances from such repositories is interesting for multiple reasons, e.g., searching for similar models/instances, filtering, reuse, standardization, process compliance checking, verification of formal properties, etc. This paper proposes a technique for indexing process models that relies on their alternative representations, called untanglings. We show the use of untanglings for retrieval of process models based on process instances that they specify via a solution to the total executability problem. Experiments with industrial process models testify that the proposed retrieval approach is up to three orders of magnitude faster than the state of the art.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Community-based arts and media movements have been intsrumental in building population-wide creative capacity for cultural development, social participation and social transformation in many parts of the world. Digital storytelling is a form of media practice that was pioneered in the United States at the intersection of these movements. It is described here as a ‘co-creative’ media production method. This description aims to differentiate the approaches to collaborative content creation that are used in community cultural development (CCD) and community media movements from those valued in professional and consumer modes of media production. Yet, the products of co-creative practices, such as digital stories, do not circulate widely through existing media networks or through the newer social media networks that Australian CCD and community media movements anticipated by at least twenty years. The complex politics of story ownership are one of a number of factors that often render ‘publication’ a secondary consideration in the making of digital stories. The possibility of ‘downstream’ use and re-use of stories in other networks is not usually considered in initial planning and development processes. As landmark projects such as Capture Wales indicate, even where stories are made for broadcast outcomes, television can be a problematic window for exhibiting digital stories. Scepticism about the brave new world of reality television and user generated content also circulates in digital storytelling networks, especially when it comes to ethical concerns for managing the risks of harm associated with widespread distribution of digital stories to indiscriminate publics. This publication reports on a collaborative action research project that took a closer look at some of the constraints relating to the problems of re-purposing digital stories for television. It focussed on ‘best practice’ for managing the risks of harm to storytellers in the process of re-purposing digital stories for broadcast on community television.