319 resultados para Information retrieval


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Advances in neural network language models have demonstrated that these models can effectively learn representations of words meaning. In this paper, we explore a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors ($\approx$ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present a robust method to detect handwritten text from unconstrained drawings on normal whiteboards. Unlike printed text on documents, free form handwritten text has no pattern in terms of size, orientation and font and it is often mixed with other drawings such as lines and shapes. Unlike handwritings on paper, handwritings on a normal whiteboard cannot be scanned so the detection has to be based on photos. Our work traces straight edges on photos of the whiteboard and builds graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio and neighborhood similarity to differentiate handwritten text from other drawings. The experiment results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed in a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios. © 2012 IEEE.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

As of today, user-generated information such as online reviews has become increasingly significant for customers in decision making process. Meanwhile, as the volume of online reviews proliferates, there is an insistent demand to help the users tackle the information overload problem. In order to extract useful information from overwhelming reviews, considerable work has been proposed such as review summarization and review selection. Particularly, to avoid the redundant information, researchers attempt to select a small set of reviews to represent the entire review corpus by preserving its statistical properties (e.g., opinion distribution). However, one significant drawback of the existing works is that they only measure the utility of the extracted reviews as a whole without considering the quality of each individual review. As a result, the set of chosen reviews may consist of low-quality ones even its statistical property is close to that of the original review corpus, which is not preferred by the users. In this paper, we proposed a review selection method which takes review quality into consideration during the selection process. Specifically, we examine the relationships between product features based upon a domain ontology to capture the review characteristics based on which to select reviews that have good quality and preserve the opinion distribution as well. Our experimental results based on real world review datasets demonstrate that our proposed approach is feasible and able to improve the performance of the review selection effectively.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Statistical reports of SMEs Internet usage from various countries indicate a steady growth. However, deeper investigation of SME’s e-commerce adoption and usage reveals that a number of SMEs fail to realize the full potential of e-commerce. Factors such as lack of tools and models in Information Systems and Information Technology for SMEs, and lack of technical expertise and specialized knowledge within and outside the SME have the most effect. This study aims to address the two important factors in two steps. First, introduce the conceptual tool for intuitive interaction. Second, explain the implementation process of the conceptual tool with the help of a case study. The subject chosen for the case study is a real estate SME from India. The design and development process of the website for the real estate SME was captured in this case study and the duration of the study was four months. Results indicated specific benefits for web designers and SME business owners. Results also indicated that the conceptual tool is easy to use without the need for technical expertise and specialized knowledge.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the explosion of information resources, there is an imminent need to understand interesting text features or topics in massive text information. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance in two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Theories of search and search behavior can be used to glean insights and generate hypotheses about how people interact with retrieval systems. This paper examines three such theories, the long standing Information Foraging Theory, along with the more recently proposed Search Economic Theory and the Interactive Probability Ranking Principle. Our goal is to develop a model for ad-hoc topic retrieval using each approach, all within a common framework, in order to (1) determine what predictions each approach makes about search behavior, and (2) show the relationships, equivalences and differences between the approaches. While each approach takes a different perspective on modeling searcher interactions, we show that under certain assumptions, they lead to similar hypotheses regarding search behavior. Moreover, we show that the models are complementary to each other, but operate at different levels (i.e., sessions, patches and situations). We further show how the differences between the approaches lead to new insights into the theories and new models. This contribution will not only lead to further theoretical developments, but also enables practitioners to employ one of the three equivalent models depending on the data available.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Prior to embarking on further study into the subject of relevance it is essential to consider why the concept of relevance has remained inconclusive, despite extensive research and its centrality to the discipline of information science. The approach taken in this paper is to reconstruct the science of information retrieval from first principles including the problem statement, role, scope and objective. This framework for document selection is put forward as a straw man for comparison with the historical relevance models. The paper examines five influential relevance models over the past 50 years. Each is examined with respect to its treatment of relevance and compared with the first principles model to identify contributions and deficiencies. The major conclusion drawn is that relevance is a significantly overloaded concept which is both confusing and detrimental to the science.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Context and objectives: Good clinical teaching is central to medical education but there is concern about maintaining this in contemporary, pressured health care environments. This paper aims to demonstrate that good clinical practice is at the heart of good clinical teaching. Methods: Seven roles are used as a framework for analysing good clinical teaching. The roles are medical expert, communicator, collaborator, manager, advocate, scholar and professional. Results: The analysis of clinical teaching and clinical practice demonstrates that they are closely linked. As experts, clinical teachers are involved in research, information retrieval and sharing of knowledge or teaching. Good communication with trainees, patients and colleagues defines teaching excellence. Clinicians can 'teach' collaboration by acting as role models and by encouraging learners to understand the responsibilities of other health professionals. As managers, clinicians can apply their skills to the effective management of learning resources. Similarly skills as advocates at the individual, community and population level can be passed on in educational encounters. The clinicians' responsibilities as scholars are most readily applied to teaching activities. Clinicians have clear roles in taking scholarly approaches to their practice and demonstrating them to others. Conclusion: Good clinical teaching is concerned with providing role models for good practice, making good practice visible and explaining it to trainees. This is the very basis of clinicians as professionals, the seventh role, and should be the foundation for the further development of clinicians as excellent clinical teachers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This chapter presents the contextual framework for the second phase of a multi-method, multiple study of the information systems (IS) academic discipline in Australia. The chapter outlines the genesis of a two-phase Australian study, and positions the study as the precursor to a larger Pacific-Asia study. Analysis of existing literature on the state of IS and on relevant theory underpins a series of individual Australian state case studies summarised in this chapter and represented as separate chapters in the book. This chapter outlines the methodological approach employed, with emphasis on the case-study method of the multiple state studies. The process of multiple peer review of the studies is described. Importantly, this chapter summarises and analyses each of the subsequent chapters of this book, emphasising the role of a framework developed to guide much of the data gathering and analysis. This chapter also highlights the process involved in conducting the meta-analysis reported in the final chapter of this book, and summarises some of the main results of the meta-analysis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We firstly discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation, and an expression language for digital cultural objects. In addition, we discuss our experience on testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Over the last decade, the rapid growth and adoption of the World Wide Web has further exacerbated user needs for e±cient mechanisms for information and knowledge location, selection, and retrieval. How to gather useful and meaningful information from the Web becomes challenging to users. The capture of user information needs is key to delivering users' desired information, and user pro¯les can help to capture information needs. However, e®ectively acquiring user pro¯les is di±cult. It is argued that if user background knowledge can be speci¯ed by ontolo- gies, more accurate user pro¯les can be acquired and thus information needs can be captured e®ectively. Web users implicitly possess concept models that are obtained from their experience and education, and use the concept models in information gathering. Prior to this work, much research has attempted to use ontologies to specify user background knowledge and user concept models. However, these works have a drawback in that they cannot move beyond the subsumption of super - and sub-class structure to emphasising the speci¯c se- mantic relations in a single computational model. This has also been a challenge for years in the knowledge engineering community. Thus, using ontologies to represent user concept models and to acquire user pro¯les remains an unsolved problem in personalised Web information gathering and knowledge engineering. In this thesis, an ontology learning and mining model is proposed to acquire user pro¯les for personalised Web information gathering. The proposed compu- tational model emphasises the speci¯c is-a and part-of semantic relations in one computational model. The world knowledge and users' Local Instance Reposito- ries are used to attempt to discover and specify user background knowledge. From a world knowledge base, personalised ontologies are constructed by adopting au- tomatic or semi-automatic techniques to extract user interest concepts, focusing on user information needs. A multidimensional ontology mining method, Speci- ¯city and Exhaustivity, is also introduced in this thesis for analysing the user background knowledge discovered and speci¯ed in user personalised ontologies. The ontology learning and mining model is evaluated by comparing with human- based and state-of-the-art computational models in experiments, using a large, standard data set. The experimental results are promising for evaluation. The proposed ontology learning and mining model in this thesis helps to develop a better understanding of user pro¯le acquisition, thus providing better design of personalised Web information gathering systems. The contributions are increasingly signi¯cant, given both the rapid explosion of Web information in recent years and today's accessibility to the Internet and the full text world.