55 results for Lexical Database
in Queensland University of Technology - ePrints Archive
Abstract:
The intersection of the Social Web and the Semantic Web has put folksonomy in the spotlight for its potential to overcome the knowledge acquisition bottleneck and to provide insight into the "wisdom of the crowds". Folksonomy, which results from collaborative tagging activities, has provided insight into users' understanding of Web resources, which might be useful for searching and organizing purposes. However, collaborative tagging vocabulary poses some challenges, since tags are freely chosen by users and may exhibit synonymy and polysemy problems. To overcome these challenges and boost the potential of folksonomy as emergent semantics, we propose to consolidate the diverse vocabulary into unified entities and concepts. We propose to extract a tag ontology through an ontology learning process to represent the semantics of a tagging community. This paper presents a novel approach to learning the ontology based on the widely used lexical database WordNet. We present personalization strategies to disambiguate the semantics of tags by combining the opinion of WordNet lexicographers with users’ tagging behavior. We provide empirical evaluations by using the semantic information contained in the ontology in a tag recommendation experiment. The results show that using the semantic relationships in the ontology improves the accuracy of the tag recommender.
Abstract:
Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum in both research and commercial areas. Recommender systems are among the most popular Web personalization systems. In recommender systems, choosing the user information from which profiles are built is crucial. In Web 2.0, one facility that can help users organize Web resources of interest is the user tagging system. Exploring user tagging behavior provides a promising way of understanding users’ information needs, since tags are given directly by users. However, the free and relatively uncontrolled vocabulary means that user-defined tags lack standardization and suffer from semantic ambiguity. The relationships among tags also need to be explored, since they are rich and could provide valuable information for better understanding users. In this paper, we propose a novel approach for learning a tag ontology based on the widely used lexical database WordNet, capturing the semantics and the structural relationships of tags. We present personalization strategies to disambiguate the semantics of tags by combining the opinion of WordNet lexicographers with users’ tagging behavior. To personalize further, users are clustered to generate a more accurate ontology for a particular group of users. To evaluate the usefulness of the tag ontology, we use it in a pilot tag recommendation experiment, exploiting the semantic information in the ontology to improve recommendation performance. The initial result shows that the personalized information improves the accuracy of the tag recommendation.
Abstract:
Peer-to-peer systems have been widely used on the Internet. However, most peer-to-peer information systems still lack some important features, for example cross-language IR (Information Retrieval) and collection selection / fusion features. Cross-language IR is a state-of-the-art research area in the IR research community, but it has not yet been used in any real-world IR system. Cross-language IR lets a user issue a query in one language and receive documents in other languages. In a typical peer-to-peer environment, users come from multiple countries, so their collections are in multiple languages. Cross-language IR can help users find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English. With cross-language IR, they can issue one query in Chinese and get documents in both languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be one of the effective approaches to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in web-mining-based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score-based merging strategies are well-known approaches to solving such problems. However, such approaches consider only the content of the remote database and not the retrieval performance of the remote search engine. This thesis presents research on building a peer-to-peer IR system with cross-language IR and advanced collection profiling techniques for fusion features.
In particular, this thesis first presents a new Chinese term measurement and a new Chinese MLU extraction process that works well on small corpora. An approach to selecting MLUs more accurately is also presented. After that, this thesis proposes a collection profiling strategy that can discover not only the collection content but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented in this thesis. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer-to-peer environments. Here, an uncooperative environment is one in which each peer in the system is autonomous: peers are willing to share documents but do not share collection statistics. This is a typical peer-to-peer IR environment. Finally, all of these approaches are combined to build a secure peer-to-peer multilingual IR system that cooperates through X.509 and an email system.
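The normalized-score merging mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's fusion method: it shows plain min-max normalization of each engine's scores, with an optional per-engine weight standing in (as an assumption) for an estimate of retrieval performance. The function names, the weights, and the sample data are all hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one engine's raw scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(per_engine, weights=None):
    """Merge ranked lists from several engines into one list.

    per_engine maps an engine name to {document id: raw score}.
    weights (hypothetical) maps an engine name to a quality estimate;
    engines without an entry get weight 1.0.
    """
    merged = []
    for engine, scores in per_engine.items():
        if not scores:
            continue
        w = 1.0 if weights is None else weights.get(engine, 1.0)
        for doc, s in min_max_normalize(scores).items():
            merged.append((doc, w * s))
    # Highest combined score first.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Made-up raw scores on two incompatible scales.
per_engine = {
    "peer_a": {"d1": 10.0, "d2": 5.0},
    "peer_b": {"d3": 0.9, "d4": 0.1},
}
ranking = merge_results(per_engine, weights={"peer_a": 1.0, "peer_b": 0.5})
```

Normalizing before merging matters because autonomous peers report scores on different scales; the weight is one simple way to let a less reliable engine contribute less to the merged list.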
Abstract:
This paper presents a database of ATP (Alternative Transients Program) simulated waveforms for shunt reactor switching cases with vacuum breakers in motor circuits following interruption of the starting current. The objective is to provide simulated multiple-reignition data for the development of diagnostic and prognostic algorithms, and also to help ATP users with practical study cases and a compilation of component data for shunt reactor switching. The method can easily be applied with different data for the different dielectric curves of circuit breakers and networks. This paper presents design details and discusses some of the available cases and the advantages of such simulated data.
Abstract:
In the previous phase of this project, 2002-059-B Case-Based Reasoning in Construction and Infrastructure Projects, demonstration software was developed using a case-based reasoning engine to access a number of sources of information on the lifetime of metallic building components. One source of information was data from the Queensland Department of Public Housing relating to maintenance operations over a number of years. Maintenance information is seen as a particularly useful source of data about the service life of building components, as it relates to the actual performance of materials in the working environment. If a building was constructed in 1984 and the maintenance records indicate that the guttering was replaced in 2006, then the service life of the gutters was 22 years in that environment. This phase of the project aims to look more deeply at the Department of Housing data, as an example of maintenance records, and to formulate methods for using this data to inform knowledge of service lifetimes.
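The service-life arithmetic in this abstract can be written out as a short sketch. The record layout and function name are assumptions made for illustration; only the 1984 and 2006 guttering example comes from the abstract.

```python
def service_life_years(installed, replaced):
    """Years a component stayed in service, from install year to replacement year."""
    if replaced < installed:
        raise ValueError("replacement year precedes installation year")
    return replaced - installed


# Hypothetical record layout: (component, year installed, year replaced).
records = [
    ("guttering", 1984, 2006),
]

lives = {name: service_life_years(inst, repl) for name, inst, repl in records}
print(lives)  # {'guttering': 22}
```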
Abstract:
Last year European Intellectual Property Review published an article comparing the latest version of the proposed US database legislation, the Collections of Information Antipiracy Bill, with the UK's Copyright and Rights in Database Regulations 1997. Subsequently a new US Bill, the Consumer and Investor Access to Information Act, has emerged, the Antipiracy Bill has been amended, and much debate has occurred, but the US seems no closer to enacting database legislation. This article briefly outlines the background to the US legislative efforts, examines the two Bills and draws some comparisons with the UK Regulations. A study of the US Bills clearly demonstrates the starkly divided opinion on database protection held by the Bills' proponents and the principal lobby groups driving the legislative efforts: the Antipiracy Bill is very protective of database producers' interests, whereas the Access Bill is heavily user-oriented. If the US experience is any indication, it will take a long time to reach any consensus on international harmonisation of this difficult area.
Abstract:
Interacting with technology within a vehicle environment using a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem use only the audio signal, making them susceptible to acoustic noise. An obvious way to circumvent this is to use the visual modality as well. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One current dataset available for such research is the AVICAR [1] database. Unfortunately, this database is largely unusable due to a timing mismatch between the two streams, and in addition no protocol is available. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and have established a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.
Abstract:
This work details the results of a face authentication test (FAT2004) (http://www.ee.surrey.ac.uk/banca/icpr2004) held in conjunction with the 17th International Conference on Pattern Recognition. The contest was held on the publicly available BANCA database (http://www.ee.surrey.ac.uk/banca) according to a defined protocol (E. Bailly-Bailliere et al., June 2003). The competition also had a sequestered part in which institutions had to submit their algorithms for independent testing. Ten institutions submitted results for 13 different verification algorithms. In addition, a standard set of face recognition software packages from the Internet (http://www.cs.colostate.edu/evalfacerec) was used to provide a baseline performance measure.
Abstract:
Data processing for information extraction is increasingly important for Web databases. Due to the sheer size and volume of databases, retrieving the relevant information that users need has become a cumbersome process. Information seekers face information overload: too many results are returned for their queries. Moreover, too few or no results are returned if a specific query is asked. This paper proposes a ranking algorithm that gives higher preference to a user’s current search and also uses profile information in order to obtain the relevant results for a user’s query.
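The abstract does not describe the ranking algorithm itself, so the following is only a hedged sketch of the general idea of weighting the current search above profile information. The term-overlap scoring, the `alpha` parameter, the record layout, and the sample data are all assumptions, not the paper's method.

```python
def rank(records, query_terms, profile_terms, alpha=0.7):
    """Order records by a weighted mix of query match and profile match.

    alpha > 0.5 gives the current search more influence than the profile.
    Each record is a dict with an "id" and a list of "terms" (hypothetical layout).
    """
    q, p = set(query_terms), set(profile_terms)

    def score(terms):
        t = set(terms)
        # Fraction of query terms and of profile terms found in the record.
        q_match = len(t & q) / len(q) if q else 0.0
        p_match = len(t & p) / len(p) if p else 0.0
        return alpha * q_match + (1 - alpha) * p_match

    return sorted(records, key=lambda r: score(r["terms"]), reverse=True)


# Made-up records, query, and profile.
records = [
    {"id": 1, "terms": ["car", "red"]},
    {"id": 2, "terms": ["car", "blue", "sedan"]},
    {"id": 3, "terms": ["bike"]},
]
ordered = rank(records, query_terms=["car"], profile_terms=["sedan", "blue"])
print([r["id"] for r in ordered])  # [2, 1, 3]
```

Both records 1 and 2 match the query, so the profile breaks the tie in favour of record 2; a record matching only the profile could not outrank one matching the query while `alpha` stays above 0.5.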