956 results for GUIDE-O (Information retrieval system)


Relevance: 100.00%

Abstract:

This paper presents a novel approach using combined features to retrieve images containing specific objects, scenes or buildings. The content of an image is characterized by two kinds of features: Harris-Laplace interest points described by the SIFT descriptor and edges described by the edge color histogram. Edges and corners contain the maximal amount of information necessary for image retrieval. The feature detection in this work is an integrated process: edges are detected directly based on the Harris function; Harris interest points are detected at several scales and Harris-Laplace interest points are found using the Laplace function. The combination of edges and interest points brings efficient feature detection and high recognition ratio to the image retrieval system. Experimental results show this system has good performance. © 2005 IEEE.
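
The idea that one Harris function can separate corners from edges can be illustrated with a minimal sketch. It assumes the textbook response R = det(M) - k * trace(M)^2 over a small window, with k = 0.04 and central-difference gradients; the function name, the window size, and the toy images are hypothetical and are not taken from the paper. Under these assumptions a corner gives a positive response, a straight edge a negative one, and a flat region zero.

```python
def harris_response(img, y, x, k=0.04, radius=1):
    """Harris response at pixel (y, x), summing gradient products over a square window."""
    sxx = syy = sxy = 0.0
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            ix = (img[j][i + 1] - img[j][i - 1]) / 2.0  # horizontal gradient
            iy = (img[j + 1][i] - img[j - 1][i]) / 2.0  # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Toy 8x8 images (hypothetical data): a corner, a vertical edge, a flat patch.
corner = [[1.0 if (j >= 4 and i >= 4) else 0.0 for i in range(8)] for j in range(8)]
edge = [[1.0 if i >= 4 else 0.0 for i in range(8)] for j in range(8)]
flat = [[0.0] * 8 for _ in range(8)]

r_corner = harris_response(corner, 4, 4)  # positive: both gradient directions present
r_edge = harris_response(edge, 4, 4)      # negative: one dominant gradient direction
r_flat = harris_response(flat, 4, 4)      # zero: no gradient at all
```

The multi-scale detection and the Laplace-based scale selection described in the abstract are not covered by this sketch.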

Relevance: 100.00%

Abstract:

Design rationale is an effective way of capturing knowledge, since it records the issues addressed, the options considered, and the arguments used when specific decisions are made during the design process. Design rationale is generally captured by identifying elements and their dependencies, i.e. in a structured way. Current retrieval methods focus mainly on either the classification of rationale or on keyword-based searches of records. Keyword-based retrieval is reasonably effective as the information in design rationale records is mainly described using text. However, most of the current keyword-based retrieval methods discard the implicit structures of these records, resulting either in poor precision of retrieval or in isolated pieces of information that are difficult to understand. This ongoing research aims to go beyond keyword-based retrieval by developing methods and tools to facilitate the provision of useful design knowledge in new design projects. Our first step is to understand the structured information derived from the relationship between lumps of text held in different nodes in the design rationale captured via a software tool currently used in industry, and study how this information can be utilised to improve retrieval performance. Specifically, methods for utilising various structured information are developed and implemented on a prototype keyword-based retrieval system developed in our earlier work. The implementation and evaluation of these methods show that the structured information can be utilised in a number of ways, such as filtering the results and providing more complete information. This allows the retrieval system to present results that are easy to understand, and which closely match designers' queries.
Like design rationale, other methods for representing design knowledge also in essence involve structured information and thus the methods proposed can be generalised to be adapted and applied for the retrieval of other kinds of design knowledge. Copyright © 2002-2012 The Design Society. All rights reserved.
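
The two uses of structure named in the abstract, filtering results and providing more complete information, can be sketched as follows. This is only an illustration under stated assumptions: the `Node` record, the node kinds ("issue", "option", "argument"), the parent links, and the sample texts are all hypothetical and do not reproduce the authors' tool or data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    kind: str                     # hypothetical labels: "issue", "option", "argument"
    text: str
    parent: Optional[str] = None  # id of the node this one responds to


def keyword_hits(nodes, query, kind=None):
    """Plain keyword match, optionally filtered by node kind (structure as a filter)."""
    terms = query.lower().split()
    return [
        n for n in nodes.values()
        if all(t in n.text.lower() for t in terms) and (kind is None or n.kind == kind)
    ]


def with_context(nodes, hit):
    """Return the chain from the root issue down to the hit, so a hit is not shown in isolation."""
    chain = [hit]
    while chain[-1].parent is not None:
        chain.append(nodes[chain[-1].parent])
    return list(reversed(chain))


# Hypothetical rationale record.
records = {
    "i1": Node("i1", "issue", "How should the bracket be fixed to the housing?"),
    "o1": Node("o1", "option", "Use a welded joint", parent="i1"),
    "o2": Node("o2", "option", "Use a bolted joint", parent="i1"),
    "a1": Node("a1", "argument", "A welded joint adds cost but resists vibration", parent="o1"),
}

hit = keyword_hits(records, "vibration")[0]
context_ids = [n.node_id for n in with_context(records, hit)]
option_ids = sorted(n.node_id for n in keyword_hits(records, "joint", kind="option"))
```

Here the query "vibration" matches a single argument, and the context chain returns it together with the option and issue it belongs to.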

Relevance: 100.00%

Abstract:

Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval to Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods for improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show a score normalization methodology that improves keyword search performance by 20% on average. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system. © 2013 IEEE.
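
The abstract does not say which normalization or fusion rule was used. As a rough sketch of the general idea, the code below assumes two common choices: per-keyword sum-to-one normalization and CombSUM fusion from the information retrieval literature. The function names, the hit format `(keyword, location, score)`, and the sample scores are hypothetical, and real systems match hits by time overlap rather than by an exact location key.

```python
from collections import defaultdict


def sum_to_one(hits):
    """Rescale scores so that, for each keyword, the scores of its hits sum to 1."""
    totals = defaultdict(float)
    for keyword, _, score in hits:
        totals[keyword] += score
    return [(kw, loc, s / totals[kw]) for kw, loc, s in hits if totals[kw] > 0]


def comb_sum(systems):
    """Fuse several systems' hit lists by adding the normalized scores of matching hits."""
    fused = defaultdict(float)
    for hits in systems:
        for kw, loc, s in sum_to_one(hits):
            fused[(kw, loc)] += s
    return dict(fused)


# Hypothetical outputs of two ASR-based keyword search systems for one keyword.
system_a = [("k", "utt1", 0.2), ("k", "utt2", 0.2)]
system_b = [("k", "utt1", 0.9), ("k", "utt3", 0.1)]

fused = comb_sum([system_a, system_b])
best = max(fused, key=fused.get)
```

Normalizing before fusing keeps a system with generally low raw scores from being drowned out, and a hit found by both systems ends up ranked first.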

Relevance: 100.00%

Abstract:

This paper presents work on document retrieval based on first-time participation in the CLEF 2001 monolingual retrieval task using French. The experimental findings indicated that Okapi, the text retrieval system in use, can successfully be used for non-English text retrieval. A lot of internal pre-processing is required in the basic search system for conversion into Okapi access formats. Various shell scripts were written to achieve the conversion in a UNIX environment; failure of these would have significantly impeded the overall performance. Based on the experimental findings using Okapi, which was originally designed for English, it was clear that, although most European languages share conventional word boundaries and variant word morphemes formed by the addition of suffixes, there is a significant difference between French and English retrieval depending on the adaptation of the indexing and search strategies in use. No sophisticated method for higher recall and precision, such as stemming techniques, phrase translation or de-compounding, was employed for the experiment, and our results were correspondingly poor. Future participation would include more refined query translation tools.

Relevance: 100.00%

Abstract:

Fado was listed as UNESCO Intangible Cultural Heritage in 2011. This dissertation describes a theoretical model, as well as an automatic system, able to generate instrumental music based on the musics and vocal sounds typically associated with fado’s practice. A description of the phenomenon of fado, its musics and vocal sounds, based on ethnographic and historical sources and empirical data, is presented. The data includes the creation of a digital corpus of musical transcriptions identified as fado, and statistical analysis via music information retrieval techniques. The second part consists of the formulation of a theory and the coding of a symbolic model, as a proof of concept, for the automatic generation of instrumental music based on the music in the corpus.

Relevance: 100.00%

Abstract:

This work is aimed at building an adaptable frame-based system for processing Dravidian languages. There are about 17 languages in this family, and they are spoken by the people of South India. Karaka relations are one of the most important features of Indian languages. They are the semantico-syntactic relations between verbs and other related constituents in a sentence. The karaka relations and surface case endings are analyzed for meaning extraction. This approach is comparable with the broad class of case-based grammars. The efficiency of this approach is put to the test in two applications. One is machine translation and the other is a natural language interface (NLI) for information retrieval from databases. The system mainly consists of a morphological analyzer, a local word grouper, a parser for the source language and a sentence generator for the target language. The contributions of this work include an elegant and compact account of the relation between vibhakthi and karaka roles in Dravidian languages. The same basic mapping also explains simple and complex sentences in these languages. This suggests that the solution is not just ad hoc but has a deeper underlying unity. This methodology could be extended to other free word order languages. Since the frame designed for meaning representation is general, it is adaptable to other languages in this group and to other applications.

Relevance: 100.00%

Abstract:

Sharing of information with those in need of it has always been an idealistic goal of networked environments. With the proliferation of computer networks, information is so widely distributed among systems that it is imperative to have well-organized schemes for retrieval and also discovery. This thesis attempts to investigate the problems associated with such schemes and suggests a software architecture aimed at achieving meaningful discovery. Usage of information elements as a modelling base for efficient information discovery in distributed systems is demonstrated with the aid of a novel conceptual entity called the infotron. The investigations are focused on distributed systems and their associated problems. The study was directed towards identifying a suitable software architecture and incorporating it in an environment where information growth is phenomenal and a proper mechanism for carrying out information discovery becomes feasible. An empirical study undertaken with the aid of an election database of geographically distributed constituencies provided the insights required. This is manifested in the Election Counting and Reporting Software (ECRS) System. The ECRS system is a software system, essentially distributed in nature, designed to prepare reports to district administrators about the election counting process and to generate other miscellaneous statutory reports. Most distributed systems of the nature of ECRS normally possess a "fragile architecture" which makes them liable to collapse with the occurrence of minor faults. This is resolved with the help of the proposed penta-tier architecture, which contains five different technologies at different tiers of the architecture. The results of the experiment conducted and their analysis show that such an architecture would help to keep the different components of the software intact and impermeable to any internal or external faults.
The architecture thus evolved needed a mechanism to support information processing and discovery. This necessitated the introduction of the novel concept of infotrons. Further, when a computing machine has to perform any meaningful extraction of information, it is guided by what is termed an infotron dictionary. The other empirical study was to find out which of the two prominent markup languages, namely HTML and XML, is best suited for the incorporation of infotrons. A comparative study of 200 documents in HTML and XML was undertaken. The result was in favor of XML. The concepts of the infotron and the infotron dictionary, which were developed, were applied to implement an Information Discovery System (IDS). IDS is essentially a system that starts with the infotron(s) supplied as clue(s) and results in brewing the information required to satisfy the need of the information discoverer by utilizing the documents available at its disposal (as information space). The various components of the system and their interaction follow the penta-tier architectural model and can therefore be considered fault-tolerant. IDS is generic in nature, and therefore the characteristics and the specifications were drawn up accordingly. Many subsystems interacted with multiple infotron dictionaries that were maintained in the system. In order to demonstrate the working of the IDS and to discover the information without modification of a typical Library Information System (LIS), an Information Discovery in Library Information System (IDLIS) application was developed. IDLIS is essentially a wrapper for the LIS, which maintains all the databases of the library. The purpose was to demonstrate that the functionality of a legacy system could be enhanced with the augmentation of IDS, leading to an information discovery service. IDLIS demonstrates IDS in action.
IDLIS proves that any legacy system could be augmented with IDS effectively to provide the additional functionality of an information discovery service. Possible applications of IDS and scope for further research in the field are covered.

Relevance: 100.00%

Abstract:

This work presents a Named Entity Based Question Answering System for the Malayalam language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information Retrieval systems typically return a long list of documents in response to a user’s query, which the user must skim to determine whether they contain an answer. A Question Answering System, in contrast, allows the user to state his/her information need as a natural language question and to receive the most appropriate answer in a word, a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. A Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language with relatively free word order. Malayalam also has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as a Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter were developed as a part of this research work. No such tools were previously available for the Malayalam language.
Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single-sentence questions are used to test the system. The overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased, and it can also be extended to include open-domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.

Relevance: 100.00%

Abstract:

Conceptual Information Systems provide a multi-dimensional, conceptually structured view on data stored in relational databases. By restricting the expressiveness of the retrieval language, they allow the visualization of sets of related queries in conceptual hierarchies, hence supporting the search for something of which one has only a vague idea rather than a precise description. Information Retrieval is considered as the process of finding specific objects (documents etc.) that fit some description out of a large set of objects. In some data analysis and knowledge discovery applications, the dual task is of interest: the analyst needs to determine, for a subset of objects, a description for this subset. In this paper we discuss how Conceptual Information Systems can be extended to also support the second task.

Relevance: 100.00%

Abstract:

Traditional content-based image retrieval (CBIR) systems use low-level features such as colors, shapes, and textures of images. However, users make queries based on semantics, which are not easily related to such low-level characteristics. Recent works on CBIR confirm that researchers have been trying to map visual low-level characteristics to high-level semantics. The relation between low-level characteristics and image textual information has motivated this article, which proposes a model for automatic classification and categorization of words associated with images. This proposal considers a self-organizing neural network architecture, which classifies textual information without previous learning. Experimental results compare the performance of the text-based approach to that of an image retrieval system based on low-level features. (c) 2008 Wiley Periodicals, Inc.

Relevance: 100.00%

Abstract:

We present an agent-based system, the Intelligent Financial News Digest System (IFNDS), for analyzing online financial news articles and associated material. The system can abstract, synthesize, digest, and classify the contents, and assess whether a report is favorable to any company discussed in it. It integrates artificial intelligence technologies, including traditional information retrieval and extraction techniques, for the news analysis. It makes use of keyword statistics and backpropagation training data to identify companies named in reportage and whether the coverage is, evaluatively speaking, positive, negative or neutral. The system would be of use to media such as clipping services, to media management, advertising, public relations, public interest, and e-commerce professionals, and to governmental and non-governmental bodies interested in monitoring the media profiles of corporations, products, and issues.

Relevance: 100.00%

Abstract:

Web caching is a widely deployed technique to reduce the load on web servers and to reduce the latency for web browsers. Peer-to-Peer (P2P) web caching has been a hot research topic in recent years as it can create scalable and robust designs for decentralized internet-scale applications. However, many P2P web caching systems suffer from expensive overheads such as lookup and publish messages, and lack locality awareness. In this paper, we present the development of a locality-aware cache diffusion system that makes use of routing table locality, aggregation, and soft state to overcome these limitations. The analysis and experiments show that our cache diffusion system reduces the amount of information processed by nodes, reduces the number of index messages sent by nodes, and improves the locality of cache pointers.

Relevance: 100.00%

Abstract:

Multimedia information is now routinely available in the forms of text, pictures, animation and sound. Although text objects are relatively easy to deal with (in terms of information search and retrieval), other information bearing objects (such as sound, images, animation) are more difficult to index. Our research is aimed at developing better ways of representing multimedia objects by using a conceptual representation based on Schank's conceptual dependencies. Moreover, the representation allows for users' individual interpretations to be embedded in the system. This will alleviate the problems associated with traditional semantic networks by allowing for coexistence of multiple views of the same information. The viability of the approach is tested, and the preliminary results reported.

Relevance: 100.00%

Abstract:

In this paper, we discuss the design aspects of a dynamic distributed directory scheme (DDS) to facilitate efficient and transparent access to information files in mobile environments. The proposed directory interface enables users of mobile computers to view a distributed file system on a network of computers as a globally shared file system. In order to counter some of the limitations of wireless communications, we propose improvised invalidation schemes that avoid false sharing and ensure uninterrupted usage under disconnected and low bandwidth conditions.

Relevance: 100.00%

Abstract:

This paper reports research evaluating the potential and the effects of using annotated paraconsistent logic in automatic indexing. This logic attempts to deal with contradictions and is concerned with studying and developing inconsistency-tolerant systems of logic. Because it is flexible and contains logical states that go beyond the yes/no dichotomy, it supports the hypothesis that the results of indexing could be better than those obtained by traditional methods. Interactions between different disciplines, such as information retrieval, automatic indexing, information visualization, and nonclassical logics, were considered in this research. From the methodological point of view, an algorithm for the treatment of uncertainty and imprecision, developed under paraconsistent logic, was used to modify the values of the weights assigned to the indexing terms of the text collections. The tests were performed on an information visualization system named Projection Explorer (PEx), created at the Institute of Mathematics and Computer Science (ICMC - USP Sao Carlos), with available source code. PEx uses the traditional vector space model to represent the documents of a collection. The results were evaluated by criteria built into the information visualization system itself and demonstrated measurable gains in the quality of the displays, confirming the hypothesis that the use of the para-analyser under the conditions of the experiment can generate more effective clusters of similar documents. This point deserves attention, since the constitution of more significant clusters can be used to enhance information indexing and retrieval. It can be argued that the adoption of non-dichotomous (non-exclusive) parameters provides new possibilities for relating similar information.
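
The logical states "beyond yes and no" can be made concrete with a small sketch. It assumes the usual two-valued annotation of paraconsistent annotated logic, in which a proposition carries a favorable evidence degree mu and a contrary evidence degree lam, both in [0, 1], with degree of certainty mu - lam and degree of contradiction mu + lam - 1. The function names, the 0.5 threshold, and the state labels are assumptions made for illustration; the abstract does not describe how the para-analyser's output was used to modify term weights, so that step is not shown.

```python
def degrees(mu, lam):
    """Return (certainty, contradiction) for favorable evidence mu and contrary evidence lam."""
    certainty = mu - lam
    contradiction = mu + lam - 1.0
    return certainty, contradiction


def state(mu, lam, limit=0.5):
    """Classify an annotation into a coarse logical state (threshold is an assumption)."""
    certainty, contradiction = degrees(mu, lam)
    if certainty >= limit:
        return "true"
    if certainty <= -limit:
        return "false"
    if contradiction >= limit:
        return "inconsistent"    # strong evidence both for and against
    if contradiction <= -limit:
        return "indeterminate"   # little evidence either way
    return "undecided"


# Hypothetical annotations for four indexing terms.
states = [state(0.9, 0.1), state(0.1, 0.9), state(0.9, 0.9), state(0.1, 0.1)]
```

The last two cases are the ones a yes/no scheme cannot express: conflicting evidence and missing evidence are kept apart instead of both being forced into "true" or "false".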