914 resultados para pacs: information retrieval techniques


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As of today, user-generated information such as online reviews has become increasingly significant for customers in decision making process. Meanwhile, as the volume of online reviews proliferates, there is an insistent demand to help the users tackle the information overload problem. In order to extract useful information from overwhelming reviews, considerable work has been proposed such as review summarization and review selection. Particularly, to avoid the redundant information, researchers attempt to select a small set of reviews to represent the entire review corpus by preserving its statistical properties (e.g., opinion distribution). However, one significant drawback of the existing works is that they only measure the utility of the extracted reviews as a whole without considering the quality of each individual review. As a result, the set of chosen reviews may consist of low-quality ones even its statistical property is close to that of the original review corpus, which is not preferred by the users. In this paper, we proposed a review selection method which takes review quality into consideration during the selection process. Specifically, we examine the relationships between product features based upon a domain ontology to capture the review characteristics based on which to select reviews that have good quality and preserve the opinion distribution as well. Our experimental results based on real world review datasets demonstrate that our proposed approach is feasible and able to improve the performance of the review selection effectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical reports of SMEs Internet usage from various countries indicate a steady growth. However, deeper investigation of SME’s e-commerce adoption and usage reveals that a number of SMEs fail to realize the full potential of e-commerce. Factors such as lack of tools and models in Information Systems and Information Technology for SMEs, and lack of technical expertise and specialized knowledge within and outside the SME have the most effect. This study aims to address the two important factors in two steps. First, introduce the conceptual tool for intuitive interaction. Second, explain the implementation process of the conceptual tool with the help of a case study. The subject chosen for the case study is a real estate SME from India. The design and development process of the website for the real estate SME was captured in this case study and the duration of the study was four months. Results indicated specific benefits for web designers and SME business owners. Results also indicated that the conceptual tool is easy to use without the need for technical expertise and specialized knowledge.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the explosion of information resources, there is an imminent need to understand interesting text features or topics in massive text information. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance in two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954,1968). We study his assumption from two aspects: Firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publication 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we ?nd translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classi?ers and discovered word sense classifications, and ?nally we present the word sense discovery and disambiguation methods of the publications. This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

DEVELOPING A TEXTILE ONTOLOGY FOR THE SEMANTIC WEB AND CONNECTING IT TO MUSEUM CATALOGING DATA The goal of the Semantic Web is to share concept-based information in a versatile way on the Internet. This is achievable using formal data structures called ontologies. The goal of this re-search is to increase the usability of museum cataloging data in information retrieval. The work is interdisciplinary, involving craft science, terminology science, computer science, and museology. In the first part of the dissertation an ontology of concepts of textiles, garments, and accessories is developed for museum cataloging work. The ontology work was done with the help of thesauri, vocabularies, research reports, and standards. The basis of the ontology development was the Museoalan asiasanasto MASA, a thesaurus for museum cataloging work which has been enriched by other vocabularies. Concepts and terms concerning the research object, as well as the material names of textiles, costumes, and accessories, were focused on. The research method was terminological concept analysis complemented by an ontological view of the Semantic Web. The concept structure was based on the hierarchical generic relation. Attention was also paid to other relations between terms and concepts, and between concepts themselves. Altogether 977 concept classes were created. Issues including how to choose and name concepts for the ontology hierarchy and how deep and broad the hierarchy could be are discussed from the viewpoint of the ontology developer and museum cataloger. The second part of the dissertation analyzes why some of the cataloged terms did not match with the developed textile ontology. This problem is significant because it prevents automatic ontological content integration of the cataloged data on the Semantic Web. The research datasets, i.e. the cataloged museum data on textile collections, came from three museums: Espoo City Museum, Lahti City Museum and The National Museum of Finland. The data included 1803 textile, costume, and accessory objects. Unmatched object and textile material names were analyzed. In the case of the object names six categories (475 cases), and of the material names eight categories (423 cases), were found where automatic annotation was not possible. The most common explanation was that the cataloged field was filled with a long sentence comprised of many terms. Sometimes in the compound term, the object name and material, or the name and the way of usage, were combined. As well, numeric values in the material name cataloging field prevented annotation and so did the absence of a corresponding concept in the ontology. Ready-made drop-down lists of materials used in one cataloging system facilitated the annotation. In the case of naming objects and materials, one should use terms in basic form without attributes. The developed textile ontology has been applied in two cultural portals, MuseumFinland and Culturesampo, where one can search for and browse information based on cataloged data using integrated ontologies in an interoperable way. The textile ontology is also part of the national FinnONTO ontology infrastructure. Keywords: annotation, concept, concept analysis, cataloging, museum collection, ontology, Semantic Web, textile collection, textile material

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Theories of search and search behavior can be used to glean insights and generate hypotheses about how people interact with retrieval systems. This paper examines three such theories, the long standing Information Foraging Theory, along with the more recently proposed Search Economic Theory and the Interactive Probability Ranking Principle. Our goal is to develop a model for ad-hoc topic retrieval using each approach, all within a common framework, in order to (1) determine what predictions each approach makes about search behavior, and (2) show the relationships, equivalences and differences between the approaches. While each approach takes a different perspective on modeling searcher interactions, we show that under certain assumptions, they lead to similar hypotheses regarding search behavior. Moreover, we show that the models are complementary to each other, but operate at different levels (i.e., sessions, patches and situations). We further show how the differences between the approaches lead to new insights into the theories and new models. This contribution will not only lead to further theoretical developments, but also enables practitioners to employ one of the three equivalent models depending on the data available.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. Therefore, we focus on content that is stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content ranging all the way from records of data fields to narrative full-texts, the methods for Information Retrieval are facing a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate the full-text from data. As a result, we are able to both reduce the size of the index by 5-6\% and improve the retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities which are not paid much attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. They are simple to detect in XML documents, but also possible to confuse with other inline-level text. Nonetheless, the search results seem to improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content is associated with the indexed full-text including titles, captions, and references. Experimental results show that for certain types of document collections, at least, the proposed methods help us find the relevant answers. Even when we know nothing about the document structure but the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ett sätt att förbättra resultat i informationssökning är frågeutvidgning. Vid frågeutvidgning utökas användarens ursprungliga fråga med termer som berör samma ämne. Frågor som har stort likhetsvärde med ett dokument kan tänkas beskriva dokumentet väl och kan därför fungera som en källa för goda utvidgningstermer. Om tidigare frågor finns lagrade kan termer som hittas med hjälp av dessa användas som kandidater för frågeutvidgningstermer. I avhandlingen presenteras och jämförs tre metoder för användning av tidigare frågor vid frågeutvidgning. För att evaluera metodernas effektivitet, jämförs de med hjälp av sökmaskinen Lucene och en liten samling dokument som berör cancerforskning. Som jämförelseresultat används de omodifierade frågorna och en enkel pseudorelevansåterkopplingsmetod som inte använder sig av tidigare frågor. Ingen av frågeutvidgningsmetoderna klarade sig speciellt bra, vilket beror på att dokumentsamlingen och testfrågorna utgör en svår omgivning för denna typ av metoder.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Information retrieval of concise and consistent text passages is called passage retrieval. Passages can be used in an information retrieval system to improve its user interface and performance. In this thesis passage retrieval is compared to other forms of information retrieval. Implementation of passage retrieval as a feature of an information retrieval system is discussed. Various existing passage retrieval methods, their implementation and their efficiency are compared. I evaluated two different implementations of passage retrieval: direct passage retrieval and combined passage retrieval. In comparison combined passage retrieval turned out to be more efficient.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Prior to embarking on further study into the subject of relevance it is essential to consider why the concept of relevance has remained inconclusive, despite extensive research and its centrality to the discipline of information science. The approach taken in this paper is to reconstruct the science of information retrieval from first principles including the problem statement, role, scope and objective. This framework for document selection is put forward as a straw man for comparison with the historical relevance models. The paper examines five influential relevance models over the past 50 years. Each is examined with respect to its treatment of relevance and compared with the first principles model to identify contributions and deficiencies. The major conclusion drawn is that relevance is a significantly overloaded concept which is both confusing and detrimental to the science.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Context and objectives: Good clinical teaching is central to medical education but there is concern about maintaining this in contemporary, pressured health care environments. This paper aims to demonstrate that good clinical practice is at the heart of good clinical teaching. Methods: Seven roles are used as a framework for analysing good clinical teaching. The roles are medical expert, communicator, collaborator, manager, advocate, scholar and professional. Results: The analysis of clinical teaching and clinical practice demonstrates that they are closely linked. As experts, clinical teachers are involved in research, information retrieval and sharing of knowledge or teaching. Good communication with trainees, patients and colleagues defines teaching excellence. Clinicians can 'teach' collaboration by acting as role models and by encouraging learners to understand the responsibilities of other health professionals. As managers, clinicians can apply their skills to the effective management of learning resources. Similarly skills as advocates at the individual, community and population level can be passed on in educational encounters. The clinicians' responsibilities as scholars are most readily applied to teaching activities. Clinicians have clear roles in taking scholarly approaches to their practice and demonstrating them to others. Conclusion: Good clinical teaching is concerned with providing role models for good practice, making good practice visible and explaining it to trainees. This is the very basis of clinicians as professionals, the seventh role, and should be the foundation for the further development of clinicians as excellent clinical teachers.

Relevância:

100.00% 100.00%

Publicador: