929 resultados para Representation and information retrieval technologies


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Image annotation is a significant step towards semantic based image retrieval. Ontology is a popular approach for semantic representation and has been intensively studied for multimedia analysis. However, relations among concepts are seldom used to extract higher-level semantics. Moreover, the ontology inference is often crisp. This paper aims to enable sophisticated semantic querying of images, and thus contributes to 1) an ontology framework to contain both visual and contextual knowledge, and 2) a probabilistic inference approach to reason the high-level concepts based on different sources of information. The experiment on a natural scene database from LabelMe database shows encouraging results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. To learn to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing sematic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches had to deal with low frequency pattern issues. The measures used by the data mining technique (for example, “support” and “confidences”) to learn the profile have turned out to be not suitable for filtering. They can lead to a mismatch problem. This thesis uses the rough set-based reasoning (term-based) and pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. This system consists of two stages - topic filtering and pattern mining stages. The topic filtering stage is intended to minimize information overloading by filtering out the most likely irrelevant information based on the user profiles. A novel user-profiles learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents were assigned higher scores by the ranking function. Because there is a relatively small amount of documents left after the first stage, the computational cost is markedly reduced; at the same time, pattern discoveries yield more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR bench-marking processes, using the latest version of the Reuters dataset, namely, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system outperforms significantly the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including the BM25, Support Vector Machines (SVM), and Pattern Taxonomy Model (PTM).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A successful urban management system for a Ubiquitous Eco City requires an integrated approach. This integration includes bringing together economic, socio-cultural and urban development with a well orchestrated, transparent and open decision making mechanism and necessary infrastructure and technologies. Rapidly developing information and telecommunication technologies and their platforms in the late 20th Century improves urban management and enhances the quality of life and place. Telecommunication technologies provide an important base for monitoring and managing activities over wired, wireless or fibre-optic networks. Particularly technology convergence creates new ways in which the information and telecommunication technologies are used. The 21st Century is an era where information has converged, in which people are able to access a variety of services, including internet and location based services, through multi-functional devices such as mobile phones and provides opportunities in the management of Ubiquitous Eco Cities. This paper discusses the recent developments in telecommunication networks and trends in convergence technologies and their implications on the management of Ubiquitous Eco Cities and how this technological shift is likely to be beneficial in improving the quality of life and place. The paper also introduces recent approaches on urban management systems, such as intelligent urban management systems, that are suitable for Ubiquitous Eco Cities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We firstly discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation, and an expression language for digital cultural objects. In addition, we discuss our experience on testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract With the phenomenal growth of electronic data and information, there are many demands for the development of efficient and effective systems (tools) to perform the issue of data mining tasks on multidimensional databases. Association rules describe associations between items in the same transactions (intra) or in different transactions (inter). Association mining attempts to find interesting or useful association rules in databases: this is the crucial issue for the application of data mining in the real world. Association mining can be used in many application areas, such as the discovery of associations between customers’ locations and shopping behaviours in market basket analysis. Association mining includes two phases. The first phase, called pattern mining, is the discovery of frequent patterns. The second phase, called rule generation, is the discovery of interesting and useful association rules in the discovered patterns. The first phase, however, often takes a long time to find all frequent patterns; these also include much noise. The second phase is also a time consuming activity that can generate many redundant rules. To improve the quality of association mining in databases, this thesis provides an alternative technique, granule-based association mining, for knowledge discovery in databases, where a granule refers to a predicate that describes common features of a group of transactions. The new technique first transfers transaction databases into basic decision tables, then uses multi-tier structures to integrate pattern mining and rule generation in one phase for both intra and inter transaction association rule mining. To evaluate the proposed new technique, this research defines the concept of meaningless rules by considering the co-relations between data-dimensions for intratransaction-association rule mining. It also uses precision to evaluate the effectiveness of intertransaction association rules. The experimental results show that the proposed technique is promising.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Home Automation (HA) has emerged as a prominent ¯eld for researchers and in- vestors confronting the challenge of penetrating the average home user market with products and services emerging from technology based vision. In spite of many technology contri- butions, there is a latent demand for a®ordable and pragmatic assistive technologies for pro-active handling of complex lifestyle related problems faced by home users. This study has pioneered to develop an Initial Technology Roadmap for HA (ITRHA) that formulates a need based vision of 10-15 years, identifying market, product and technology investment opportunities, focusing on those aspects of HA contributing to e±cient management of home and personal life. The concept of Family Life Cycle is developed to understand the temporal needs of family. In order to formally describe a coherent set of family processes, their relationships, and interaction with external elements, a reference model named Fam- ily System is established that identi¯es External Entities, 7 major Family Processes, and 7 subsystems-Finance, Meals, Health, Education, Career, Housing, and Socialisation. Anal- ysis of these subsystems reveals Soft, Hard and Hybrid processes. Rectifying the lack of formal methods for eliciting future user requirements and reassessing evolving market needs, this study has developed a novel method called Requirement Elicitation of Future Users by Systems Scenario (REFUSS), integrating process modelling, and scenario technique within the framework of roadmapping. The REFUSS is used to systematically derive process au- tomation needs relating the process knowledge to future user characteristics identi¯ed from scenarios created to visualise di®erent futures with richly detailed information on lifestyle trends thus enabling learning about the future requirements. Revealing an addressable market size estimate of billions of dollars per annum this research has developed innovative ideas on software based products including Document Management Systems facilitating automated collection, easy retrieval of all documents, In- formation Management System automating information services and Ubiquitous Intelligent System empowering the highly mobile home users with ambient intelligence. Other product ideas include robotic devices of versatile Kitchen Hand and Cleaner Arm that can be time saving. Materialisation of these products require technology investment initiating further research in areas of data extraction, and information integration as well as manipulation and perception, sensor actuator system, tactile sensing, odour detection, and robotic controller. This study recommends new policies on electronic data delivery from service providers as well as new standards on XML based document structure and format.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Seventy-six librarians participated in a series of focus groups in support of research exploring the skills, knowledge and attributes required by the contemporary library and information professional in a world of every changing technology. The project was funded by the Australian Learning and Teaching Council. Text data mining analysis revealed three main thematic clusters (libraries, people, jobs) and one minor thematic cluster (community). Library 2.0 was broadly viewed by participants as being about change whilst librarian 2.0 was perceived by participants as not a new creation but just about good librarian practices. Participants expressed the general belief that personality traits, not just qualifications, were critical to be a successful librarian or information worker in the future.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In computational linguistics, information retrieval and applied cognition, words and concepts are often represented as vectors in high dimensional spaces computed from a corpus of text. These high dimensional spaces are often referred to as Semantic Spaces. We describe a novel and efficient approach to computing these semantic spaces via the use of complex valued vector representations. We report on the practical implementation of the proposed method and some associated experiments. We also briefly discuss how the proposed system relates to previous theoretical work in Information Retrieval and Quantum Mechanics and how the notions of probability, logic and geometry are integrated within a single Hilbert space representation. In this sense the proposed system has more general application and gives rise to a variety of opportunities for future research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes, that is, accommodating phone recognition errors and, secondly, modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out and a number of novel enhancements are developed that provide for faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for drastically increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS, by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing. Analysis shows that this use of language modelling can, however, be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework is proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing for a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state-of-the-art in phonetic STD, by improving the utility of such systems in a wide range of applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized only knowledge from either a global knowledge base or a user local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.