Biblioteca Digital

Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn from concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors ($\approx$ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
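The evaluation described in this abstract (scoring concept pairs with a model and correlating those scores with human judgments) can be illustrated with a minimal sketch. It assumes cosine similarity over concept vectors and Pearson correlation; the abstract does not state which similarity or correlation measure was used. The concept identifiers, vectors, and human scores below are invented for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical concept embeddings (made-up values, not from the paper).
embeddings = {
    "C_heart_attack": [0.9, 0.1, 0.0],
    "C_myocardial_infarction": [0.8, 0.2, 0.1],
    "C_fracture": [0.1, 0.9, 0.2],
    "C_broken_bone": [0.2, 0.8, 0.3],
    "C_diabetes": [0.1, 0.2, 0.9],
}

# Hypothetical human similarity judgments for concept pairs (made-up values).
judged_pairs = [
    ("C_heart_attack", "C_myocardial_infarction", 4.0),
    ("C_fracture", "C_broken_bone", 3.8),
    ("C_heart_attack", "C_fracture", 1.2),
    ("C_diabetes", "C_broken_bone", 1.0),
]

# Score each pair with the model, then correlate with the human scores.
model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in judged_pairs]
human_scores = [score for _, _, score in judged_pairs]
correlation = pearson(model_scores, human_scores)
print(f"correlation with human judgments: {correlation:.2f}")
```

A correlation near 1 would indicate that the model ranks concept pairs much as the human assessors do, which is the sense in which the abstract reports a value of about 0.9.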

There and back again: Is there a need for GLAM education?

Relevance: 80.00%

(Digital library) education or (digital) library education? An Australian perspective

Relevance: 80.00%

Abstract:

Education for Library and Information professionals in the digital environment has been an important discussion point the world over. However, before designing and implementing a programme for digital library education, it is prudent that the skills and knowledge required to work in this environment are identified to enable informed decisions to be made. Hitherto, there has been very little research which has sought the opinion of both educators and practitioners on this topic, and none with a wide geographical coverage of Australia. This paper presents the key findings of research undertaken at Tallinn University in the first half of 2009.

Approximate nearest-neighbour search with inverted signature slice lists

Relevance: 80.00%

Abstract:

In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.
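The abstract gives no algorithmic detail, so the following is only a simplified sketch of the general idea suggested by the title: cut each binary signature into fixed-width slices, keep an inverted list per (slice position, slice value), and rank the documents retrieved from those lists by Hamming distance. The signature width, slice width, function names, and the rule of matching slices exactly are all assumptions made for illustration; the paper's actual method may differ, for example by also probing nearby slice values.

```python
from collections import defaultdict

SIG_BITS = 16    # assumed signature width for this toy example
SLICE_BITS = 4   # assumed slice width

def slices(sig):
    """Split a signature into (position, value) pairs of SLICE_BITS each."""
    mask = (1 << SLICE_BITS) - 1
    return [(pos, (sig >> (pos * SLICE_BITS)) & mask)
            for pos in range(SIG_BITS // SLICE_BITS)]

def build_index(signatures):
    """Map every (position, slice value) to the documents containing it."""
    index = defaultdict(list)
    for doc_id, sig in signatures.items():
        for key in slices(sig):
            index[key].append(doc_id)
    return index

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

def search(index, signatures, query, k=2):
    """Collect documents sharing at least one exact slice with the query,
    then rank those candidates by Hamming distance to the query."""
    candidates = set()
    for key in slices(query):
        candidates.update(index.get(key, []))
    ranked = sorted(candidates, key=lambda d: (hamming(signatures[d], query), d))
    return ranked[:k]

# Toy signatures (made-up values).
signatures = {
    "doc_a": 0b1010_1100_0011_0101,
    "doc_b": 0b1010_1100_0011_0111,
    "doc_c": 0b0101_0011_1100_1010,
}
index = build_index(signatures)
result = search(index, signatures, 0b1010_1100_0011_0100)
print(result)
```

Only documents that share a slice with the query are ever compared in full, which is what lets this kind of index avoid a linear scan; the cost is that a true neighbour with no exactly matching slice is missed, hence "approximate".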

Parallel streaming signature EM-tree: A clustering algorithm for web scale applications

Relevance: 80.00%

Abstract:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.

Text segmentation in unconstrained hand-drawings in whiteboard photos

Relevance: 80.00%

Abstract:

In this paper we present a robust method to detect handwritten text in unconstrained drawings on normal whiteboards. Unlike printed text on documents, free-form handwritten text has no pattern in terms of size, orientation and font, and it is often mixed with other drawings such as lines and shapes. Unlike handwriting on paper, handwriting on a normal whiteboard cannot be scanned, so the detection has to be based on photos. Our work traces straight edges on photos of the whiteboard and builds a graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio and neighborhood similarity to differentiate handwritten text from other drawings. The experimental results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed on a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios. © 2012 IEEE.

Enhancement of relevant features for text mining

Relevance: 80.00%

Abstract:

With the explosion of information resources, there is a pressing need to understand interesting text features or topics in massive text collections. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance on two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.

An analysis of theories of search and search behavior

Relevance: 80.00%

Abstract:

Theories of search and search behavior can be used to glean insights and generate hypotheses about how people interact with retrieval systems. This paper examines three such theories: the long-standing Information Foraging Theory, along with the more recently proposed Search Economic Theory and the Interactive Probability Ranking Principle. Our goal is to develop a model for ad-hoc topic retrieval using each approach, all within a common framework, in order to (1) determine what predictions each approach makes about search behavior, and (2) show the relationships, equivalences and differences between the approaches. While each approach takes a different perspective on modeling searcher interactions, we show that under certain assumptions, they lead to similar hypotheses regarding search behavior. Moreover, we show that the models are complementary to each other, but operate at different levels (i.e., sessions, patches and situations). We further show how the differences between the approaches lead to new insights into the theories and new models. This contribution not only leads to further theoretical developments, but also enables practitioners to employ one of the three equivalent models depending on the data available.

The paradigm of relevance: Is it time to kill the sacred cow?

Relevance: 80.00%

Abstract:

Prior to embarking on further study into the subject of relevance, it is essential to consider why the concept of relevance has remained inconclusive, despite extensive research and its centrality to the discipline of information science. The approach taken in this paper is to reconstruct the science of information retrieval from first principles, including the problem statement, role, scope and objective. This framework for document selection is put forward as a straw man for comparison with the historical relevance models. The paper examines five influential relevance models from the past 50 years. Each is examined with respect to its treatment of relevance and compared with the first principles model to identify contributions and deficiencies. The major conclusion drawn is that relevance is a significantly overloaded concept which is both confusing and detrimental to the science.

Sustainable Australia : containing travel in master planned estates

Relevance: 40.00%

Abstract:

Low-density suburban development and excessive use of automobiles are associated with serious urban and environmental problems. These problems include traffic congestion, longer commuting times, high automobile dependency, air and water pollution, and increased depletion of natural resources. Master planned development suggests itself as a possible palliative for the ills of low density and high travel. The following study examines the patterns and dynamics of movement in a selection of master planned estates in Australia. The study develops new approaches for assessing the containment of travel within planned development. Its key aim is to clarify and map the relationships between trip generation and urban form and structure. The initial conceptual framework of the paper is developed in a review of literature related to urban form and travel behaviour. These concepts are tested empirically in a pilot study of suburban travel activity in master planned estates. A geographical information systems methodology is used to determine regional journey-to-work patterns and travel containment rates. Factors that influence self-containment patterns are estimated with a regression model. This research is a useful preliminary examination of travel self-containment in Australian master planned estates.

The reliability of information on work-related injuries available from hospitalisation data in Australia

Relevance: 40.00%

Abstract:

Objective: To examine the reliability of work-related activity coding for injury-related hospitalisations in Australia. Method: A random sample of 4373 injury-related hospital separations from 1 July 2002 to 30 June 2004 was obtained from a stratified random sample of 50 hospitals across 4 states in Australia. From this sample, cases were identified as work-related if they contained an ICD-10-AM work-related activity code (U73) allocated by either: (i) the original coder; (ii) an independent auditor, blinded to the original code; or (iii) a research assistant, blinded to both the original and auditor codes, who reviewed narrative text extracted from the medical record. The concordance of activity coding and the number of cases identified as work-related using each method were compared. Results: Of the 4373 cases sampled, 318 cases were identified as being work-related using any of the three methods for identification. The original coder identified 217 and the auditor identified 266 work-related cases (68.2% and 83.6% of the total cases identified, respectively). Around 10% of cases were only identified through the text description review. The original coder and auditor agreed on the assignment of work-relatedness for 68.9% of cases. Conclusions and Implications: The current best estimates of the frequency of hospital admissions for occupational injury underestimate the burden by around 32%. This is a substantial underestimate that has major implications for public policy, and highlights the need for further work on improving the quality and completeness of routine, administrative data sources for a more complete identification of work-related injuries.
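The percentages reported in this abstract can be checked from its own counts. The short sketch below recomputes them; reading the "around 32%" underestimate as the share of work-related cases that the original coder did not identify is an interpretation, not something the abstract states explicitly.

```python
# Counts taken from the abstract.
total_work_related = 318   # cases identified by any of the three methods
original_coder = 217       # cases identified by the original coder
auditor = 266              # cases identified by the independent auditor

share_original = original_coder / total_work_related
share_auditor = auditor / total_work_related

# Assumed interpretation: the underestimate is the share of work-related
# cases that routine (original) coding missed.
underestimate = 1 - share_original

print(f"original coder: {share_original:.1%}")   # 68.2%
print(f"auditor:        {share_auditor:.1%}")    # 83.6%
print(f"missed by original coding: {underestimate:.1%}")  # 31.8%
```

Under that reading, 31.8% rounds to the "around 32%" given in the conclusions.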

The information systems discipline in Australian universities : a contextual framework

Relevance: 40.00%

Abstract:

This chapter presents the contextual framework for the second phase of a multi-method, multiple study of the information systems (IS) academic discipline in Australia. The chapter outlines the genesis of a two-phase Australian study, and positions the study as the precursor to a larger Pacific-Asia study. Analysis of existing literature on the state of IS and on relevant theory underpins a series of individual Australian state case studies summarised in this chapter and represented as separate chapters in the book. This chapter outlines the methodological approach employed, with emphasis on the case-study method of the multiple state studies. The process of multiple peer review of the studies is described. Importantly, this chapter summarises and analyses each of the subsequent chapters of this book, emphasising the role of a framework developed to guide much of the data gathering and analysis. This chapter also highlights the process involved in conducting the meta-analysis reported in the final chapter of this book, and summarises some of the main results of the meta-analysis.

336 results for Information retrieval - Australia
