937 results for Ontologies (Information Retrieval)


Relevance: 80.00%

Abstract:

In this paper we present a robust method to detect handwritten text in unconstrained drawings on normal whiteboards. Unlike printed text in documents, free-form handwritten text has no pattern in terms of size, orientation and font, and it is often mixed with other drawings such as lines and shapes. Unlike handwriting on paper, handwriting on a normal whiteboard cannot be scanned, so the detection has to be based on photos. Our work traces straight edges on photos of the whiteboard and builds a graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio and neighborhood similarity to differentiate handwritten text from other drawings. The experimental results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed on a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios. © 2012 IEEE.
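As an illustration of two of the geometric properties named above, the following minimal sketch computes an aspect ratio and an edge density for one connected component. The abstract does not define these measures, so the definitions here (bounding-box width over height, and edge pixels over bounding-box area) and the name `component_features` are assumptions, not the authors' method.

```python
def component_features(pixels):
    """Compute simple shape features for one connected component.

    `pixels` is a set of (row, col) edge pixels. Both feature
    definitions are assumptions made for this sketch.
    """
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "aspect_ratio": width / height,
        "edge_density": len(pixels) / (width * height),
    }

# A long horizontal stroke: very wide, one pixel high.
line = {(0, c) for c in range(20)}
# A 5x4 box outline standing in for a compact handwritten character.
letter = {(r, c) for r in range(5) for c in range(4) if r in (0, 4) or c in (0, 3)}

line_f = component_features(line)
letter_f = component_features(letter)
print(line_f, letter_f)
```

In this toy input the straight line has an extreme aspect ratio while the character-like shape is compact, which is the kind of difference a classifier over such features could exploit.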

Relevance: 80.00%

Abstract:

User-generated information such as online reviews has become increasingly significant for customers in the decision-making process. Meanwhile, as the volume of online reviews grows, there is a pressing demand to help users tackle the information overload problem. To extract useful information from an overwhelming number of reviews, considerable work has been proposed, such as review summarization and review selection. In particular, to avoid redundant information, researchers attempt to select a small set of reviews to represent the entire review corpus by preserving its statistical properties (e.g., opinion distribution). However, one significant drawback of existing work is that it only measures the utility of the extracted reviews as a whole, without considering the quality of each individual review. As a result, the set of chosen reviews may contain low-quality ones even if its statistical properties are close to those of the original review corpus, which is not preferred by users. In this paper, we propose a review selection method that takes review quality into consideration during the selection process. Specifically, we examine the relationships between product features based on a domain ontology to capture the review characteristics, and use these characteristics to select reviews that are of good quality and also preserve the opinion distribution. Our experimental results on real-world review datasets demonstrate that the proposed approach is feasible and effectively improves the performance of review selection.
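The idea of selecting reviews that both preserve the opinion distribution and have good individual quality can be sketched as follows. This is a simplified illustration, not the paper's method: the quality scores are given as plain numbers rather than derived from a domain ontology, and the function name `select_reviews` and the quota-based strategy are assumptions.

```python
from collections import Counter

def select_reviews(reviews, k):
    """Pick about k review ids, keeping each opinion's share of the corpus.

    `reviews` is a list of (review_id, opinion, quality) tuples. Quality
    scores are hypothetical inputs. Quotas are rounded, so the result can
    differ slightly from k on uneven distributions.
    """
    counts = Counter(opinion for _, opinion, _ in reviews)
    total = len(reviews)
    selected = []
    for opinion, count in counts.items():
        quota = round(k * count / total)
        # Within one opinion class, prefer the highest-quality reviews.
        ranked = sorted(
            (r for r in reviews if r[1] == opinion),
            key=lambda r: r[2],
            reverse=True,
        )
        selected.extend(r[0] for r in ranked[:quota])
    return selected

reviews = [
    ("r1", "positive", 0.9), ("r2", "positive", 0.4),
    ("r3", "positive", 0.7), ("r4", "positive", 0.2),
    ("r5", "negative", 0.8), ("r6", "negative", 0.3),
]
selected = select_reviews(reviews, k=3)
print(selected)
```

With four positive and two negative reviews, a selection of three keeps the 2:1 ratio and takes the best-scoring reviews from each side.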

Relevance: 80.00%

Abstract:

Statistical reports of SMEs' Internet usage from various countries indicate steady growth. However, deeper investigation of SMEs' e-commerce adoption and usage reveals that a number of SMEs fail to realize the full potential of e-commerce. Factors such as the lack of tools and models in Information Systems and Information Technology for SMEs, and the lack of technical expertise and specialized knowledge within and outside the SME, have the most effect. This study aims to address these two important factors in two steps. First, it introduces a conceptual tool for intuitive interaction. Second, it explains the implementation process of the conceptual tool with the help of a case study. The subject chosen for the case study is a real estate SME from India. The design and development process of the website for the real estate SME was captured in this case study, and the duration of the study was four months. Results indicated specific benefits for web designers and SME business owners. Results also indicated that the conceptual tool is easy to use without the need for technical expertise and specialized knowledge.

Relevance: 80.00%

Abstract:

With the explosion of information resources, there is an imminent need to understand interesting text features or topics in massive text information. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance in two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.

Relevance: 80.00%

Abstract:

The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: Firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publication 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
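The underlying assumption, that words used in similar contexts have similar meanings, can be illustrated with a small sketch. It represents each word by a bag of its neighbouring words and compares words by cosine similarity. This is a generic distributional-similarity example under assumed settings (a fixed window of two tokens, raw counts); it is not one of the three methods of the thesis.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

corpus = [
    "she drank hot coffee this morning".split(),
    "she drank hot tea this morning".split(),
    "he parked the car outside".split(),
]
vecs = context_vectors(corpus)
sim_related = cosine(vecs["coffee"], vecs["tea"])
sim_unrelated = cosine(vecs["coffee"], vecs["car"])
print(sim_related, sim_unrelated)
```

In this toy corpus "coffee" and "tea" share all of their contexts, while "coffee" and "car" share none, so the first pair scores far higher.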

Relevance: 80.00%

Abstract:

Theories of search and search behavior can be used to glean insights and generate hypotheses about how people interact with retrieval systems. This paper examines three such theories: the long-standing Information Foraging Theory, along with the more recently proposed Search Economic Theory and the Interactive Probability Ranking Principle. Our goal is to develop a model for ad-hoc topic retrieval using each approach, all within a common framework, in order to (1) determine what predictions each approach makes about search behavior, and (2) show the relationships, equivalences and differences between the approaches. While each approach takes a different perspective on modeling searcher interactions, we show that under certain assumptions, they lead to similar hypotheses regarding search behavior. Moreover, we show that the models are complementary to each other, but operate at different levels (i.e., sessions, patches and situations). We further show how the differences between the approaches lead to new insights into the theories and new models. This contribution will not only lead to further theoretical developments, but will also enable practitioners to employ one of the three equivalent models depending on the data available.

Relevance: 80.00%

Abstract:

XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. Therefore, we focus on content that is stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content, ranging all the way from records of data fields to narrative full-texts, the methods for Information Retrieval are facing a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate the full-text from data. As a result, we are able to both reduce the size of the index by 5-6% and improve the retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities which have received little attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. They are simple to detect in XML documents, but can also be confused with other inline-level text. Nonetheless, the search results seem to improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content is associated with the indexed full-text, including titles, captions, and references. Experimental results show that for certain types of document collections, at least, the proposed methods help us find the relevant answers.
Even when we know nothing about the document structure but the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.
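One way to picture the separation of full-text from data by relating character content to XML tags is the sketch below. It labels each child of the root element by the number of text characters per element in its subtree. The measure, the threshold of 20 characters, and the function names are assumptions chosen for illustration; the thesis does not state its exact criterion in this abstract.

```python
import xml.etree.ElementTree as ET

def text_density(element):
    """Characters of text per element in the subtree rooted at `element`."""
    n_chars = sum(len(t.strip()) for t in element.itertext())
    n_elements = sum(1 for _ in element.iter())  # includes `element` itself
    return n_chars / n_elements

def classify_fragments(xml_string, threshold=20.0):
    """Label each child of the root as 'full-text' or 'data' (assumed cutoff)."""
    root = ET.fromstring(xml_string)
    return {
        child.tag: "full-text" if text_density(child) >= threshold else "data"
        for child in root
    }

doc = """<article>
  <meta><year>2005</year><pages>12</pages><lang>en</lang></meta>
  <body><p>XML is used for many kinds of content, from records of
  data fields to <em>narrative full-texts</em> such as this one.</p></body>
</article>"""

labels = classify_fragments(doc)
print(labels)
```

The record-like `meta` fragment has many tags and little text, while the narrative `body` fragment has long runs of text per tag, so only the latter would be sent to a full-text index under this heuristic.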

Relevance: 80.00%

Abstract:

One way to improve results in information retrieval is query expansion. In query expansion, the user's original query is extended with terms that concern the same topic. Queries that have a high similarity score with a document can be expected to describe the document well and can therefore serve as a source of good expansion terms. If previous queries have been stored, terms found with their help can be used as candidate query expansion terms. This thesis presents and compares three methods for using previous queries in query expansion. To evaluate the effectiveness of the methods, they are compared using the Lucene search engine and a small collection of documents on cancer research. The unmodified queries and a simple pseudo-relevance feedback method that does not use previous queries serve as baselines. None of the query expansion methods performed particularly well, which is attributed to the document collection and the test queries constituting a difficult environment for this type of method.
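A minimal sketch of query expansion from stored past queries follows. It is a simplified stand-in and not one of the three methods compared in the thesis: past queries are matched directly against the new query by Jaccard overlap instead of through document similarity, and the function name `expand_query` and its parameters are hypothetical.

```python
from collections import Counter

def expand_query(query, past_queries, n_terms=2):
    """Append up to n_terms terms taken from past queries that overlap the query.

    Candidate terms are weighted by the Jaccard similarity between the new
    query and the past query they come from (an assumed weighting).
    """
    q = set(query)
    weights = Counter()
    for past in past_queries:
        p = set(past)
        overlap = len(q & p) / len(q | p)
        if overlap == 0:
            continue
        for term in p - q:
            weights[term] += overlap
    # Sort by weight, then alphabetically, so ties resolve deterministically.
    best = sorted(weights, key=lambda t: (-weights[t], t))[:n_terms]
    return list(query) + best

past = [
    ["breast", "cancer", "screening"],
    ["cancer", "chemotherapy", "side", "effects"],
    ["lucene", "index", "format"],
]
expanded = expand_query(["cancer", "treatment"], past, n_terms=2)
print(expanded)
```

The shorter, more focused past query contributes the expansion terms here because it overlaps the new query more strongly relative to its size.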

Relevance: 80.00%

Abstract:

Information retrieval of concise and consistent text passages is called passage retrieval. Passages can be used in an information retrieval system to improve its user interface and performance. In this thesis, passage retrieval is compared to other forms of information retrieval. Implementation of passage retrieval as a feature of an information retrieval system is discussed. Various existing passage retrieval methods, their implementation and their efficiency are compared. I evaluated two different implementations of passage retrieval: direct passage retrieval and combined passage retrieval. In the comparison, combined passage retrieval turned out to be more efficient.
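To make the notion of passage retrieval concrete, the sketch below cuts a document into overlapping fixed-size windows and returns the window containing the most query terms. The window size, stride, and scoring are assumptions for illustration; the abstract does not describe how the direct and combined implementations work, so this sketch does not represent either of them.

```python
def split_passages(tokens, size=6, stride=3):
    """Cut a token list into overlapping windows; the last ones may be shorter."""
    return [tokens[i:i + size] for i in range(0, len(tokens), stride)]

def best_passage(query_terms, tokens, size=6, stride=3):
    """Return the passage with the most query-term occurrences."""
    query = set(query_terms)
    passages = split_passages(tokens, size, stride)
    # Score = number of tokens in the passage that are query terms.
    return max(passages, key=lambda p: sum(1 for t in p if t in query))

text = (
    "the library opens at nine . passage retrieval returns short "
    "text passages instead of whole documents . lunch is at noon"
).split()
best = best_passage(["passage", "retrieval", "passages"], text)
print(" ".join(best))
```

Returning this short window instead of the whole document is what lets a system show the user the relevant part directly.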

Relevance: 80.00%

Abstract:

Current smartphones have a storage capacity of several gigabytes. More and more information is stored on mobile devices. To meet the challenge of information organization, we turn to desktop search. Users often possess multiple devices, and synchronize (subsets of) information between them. This makes file synchronization more important. This thesis presents Dessy, a desktop search and synchronization framework for mobile devices. Dessy uses desktop search techniques, such as indexing, query and index term stemming, and search relevance ranking. Dessy finds files by their content, metadata, and context information. For example, PDF files may be found by their author, subject, title, or text. EXIF data of JPEG files may be used in finding them. User-defined tags can be added to files to organize and retrieve them later. Retrieved files are ranked according to their relevance to the search query. The Dessy prototype uses the BM25 ranking function, which is widely used in information retrieval. Dessy provides an interface for locating files for both users and applications. Dessy is closely integrated with the Syxaw file synchronizer, which provides efficient file and metadata synchronization, optimizing network usage. Dessy supports synchronization of search results, individual files, and directory trees. It allows finding and synchronizing files that reside on remote computers or on the Internet. Dessy is designed to solve the problem of efficient mobile desktop search and synchronization, also supporting remote and Internet search. Remote searches may be carried out offline using a downloaded index, or while connected to the remote machine on a weak network. To secure user data, transmissions between the Dessy client and server are encrypted using symmetric encryption. Symmetric encryption keys are exchanged with RSA key exchange. Dessy emphasizes extensibility. The cryptography can also be extended. Users may tag their files with context tags and control custom file metadata.
Adding new indexed file types, metadata fields, ranking methods, and index types is easy. Finding files is done with virtual directories, which are views into the user’s files, browseable by regular file managers. On mobile devices, the Dessy GUI provides easy access to the search and synchronization system. This thesis includes results of Dessy synchronization and search experiments, including power usage measurements. Finally, Dessy has been designed with mobility and device constraints in mind. It requires only MIDP 2.0 Mobile Java with FileConnection support, and Java 1.5 on desktop machines.
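The BM25 ranking function mentioned above can be sketched in a few lines. This is a generic Okapi BM25 implementation with commonly used default parameters (`k1=1.2`, `b=0.75`) and a smoothed IDF; it is written in Python for brevity and is not Dessy's code, whose parameter choices the abstract does not give.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

files = [
    "holiday photo beach sunset".split(),
    "thesis draft mobile search synchronization".split(),
    "mobile search index notes search ranking".split(),
]
scores = bm25_scores(["mobile", "search"], files)
ranking = sorted(range(len(files)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

The third file ranks first because it repeats a query term, although the gain from repetition is dampened by term-frequency saturation and by the file's greater length.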

Relevance: 80.00%

Abstract:

Prior to embarking on further study into the subject of relevance it is essential to consider why the concept of relevance has remained inconclusive, despite extensive research and its centrality to the discipline of information science. The approach taken in this paper is to reconstruct the science of information retrieval from first principles including the problem statement, role, scope and objective. This framework for document selection is put forward as a straw man for comparison with the historical relevance models. The paper examines five influential relevance models over the past 50 years. Each is examined with respect to its treatment of relevance and compared with the first principles model to identify contributions and deficiencies. The major conclusion drawn is that relevance is a significantly overloaded concept which is both confusing and detrimental to the science.

Relevance: 80.00%

Abstract:

Context and objectives: Good clinical teaching is central to medical education, but there is concern about maintaining this in contemporary, pressured health care environments. This paper aims to demonstrate that good clinical practice is at the heart of good clinical teaching. Methods: Seven roles are used as a framework for analysing good clinical teaching. The roles are medical expert, communicator, collaborator, manager, advocate, scholar and professional. Results: The analysis of clinical teaching and clinical practice demonstrates that they are closely linked. As experts, clinical teachers are involved in research, information retrieval and sharing of knowledge or teaching. Good communication with trainees, patients and colleagues defines teaching excellence. Clinicians can 'teach' collaboration by acting as role models and by encouraging learners to understand the responsibilities of other health professionals. As managers, clinicians can apply their skills to the effective management of learning resources. Similarly, skills as advocates at the individual, community and population level can be passed on in educational encounters. The clinicians' responsibilities as scholars are most readily applied to teaching activities. Clinicians have clear roles in taking scholarly approaches to their practice and demonstrating them to others. Conclusion: Good clinical teaching is concerned with providing role models for good practice, making good practice visible and explaining it to trainees. This is the very basis of clinicians as professionals, the seventh role, and should be the foundation for the further development of clinicians as excellent clinical teachers.

Relevance: 80.00%