121 results for Synonymy
Abstract:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. Learning to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches have to deal with low-frequency pattern issues. The measures used by data mining techniques (for example, “support” and “confidence”) to learn the profile have turned out to be unsuitable for filtering. They can lead to a mismatch problem. This thesis uses rough set-based reasoning (term-based) and a pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. The system consists of two stages: topic filtering and pattern mining. The topic filtering stage is intended to minimize information overload by filtering out the most likely irrelevant information based on the user profiles. A novel user-profile learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory.
The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents are assigned higher scores by the ranking function. Because a relatively small number of documents are left after the first stage, the computational cost is markedly reduced; at the same time, pattern discovery yields more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, namely the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system significantly outperforms the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including BM25 and Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
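The two-stage flow described in this abstract (discard likely irrelevant documents first, then rank the survivors with patterns) can be illustrated with a minimal sketch. This is not the thesis's method: `topic_score` and `pattern_score` are hypothetical stand-ins (simple term overlap and term-set containment) for the rough set-based profile and the pattern taxonomy ranking function, and the threshold is an arbitrary value rather than one derived from decision theory.

```python
from typing import FrozenSet, List, Sequence


def topic_score(doc: str, profile_terms: FrozenSet[str]) -> float:
    """Stage 1 stand-in (assumption): fraction of profile terms found in the document."""
    if not profile_terms:
        return 0.0
    words = set(doc.lower().split())
    return len(words & profile_terms) / len(profile_terms)


def pattern_score(doc: str, patterns: Sequence[FrozenSet[str]]) -> int:
    """Stage 2 stand-in (assumption): number of term-set patterns fully contained in the document."""
    words = set(doc.lower().split())
    return sum(1 for pattern in patterns if pattern <= words)


def two_stage_filter(
    docs: Sequence[str],
    profile_terms: FrozenSet[str],
    patterns: Sequence[FrozenSet[str]],
    threshold: float,
) -> List[str]:
    """Drop documents below the topic threshold, then rank the rest by pattern score."""
    # Stage 1: cheap filter that removes the most likely irrelevant documents.
    survivors = [d for d in docs if topic_score(d, profile_terms) >= threshold]
    # Stage 2: the costlier pattern-based ranking runs only on the survivors.
    return sorted(survivors, key=lambda d: pattern_score(d, patterns), reverse=True)


docs = [
    "oil price rises",
    "football match tonight",
    "crude oil price falls in market",
    "oil",
]
profile = frozenset({"oil", "price", "market"})
patterns = [
    frozenset({"oil", "price"}),
    frozenset({"crude", "oil"}),
    frozenset({"price", "market"}),
]
ranked = two_stage_filter(docs, profile, patterns, threshold=0.5)
```

Because the pattern scoring only sees the documents that pass the first stage, its cost scales with the number of survivors rather than with the size of the stream, which is the efficiency argument the abstract makes.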
Abstract:
Tagging has become one of the key activities in next generation websites, which allow users to select short labels to annotate, manage, and share multimedia information such as photos, videos and bookmarks. Tagging does not require any prior training before users participate in the annotation activities, as they can freely choose any terms that best represent the semantics of the content without worrying about any formal structure or ontology. However, the practice of free-form tagging can lead to several problems, such as synonymy, polysemy and ambiguity, which potentially increase the complexity of managing the tags and retrieving information. To solve these problems, this research aims to construct a lightweight indexing scheme to structure tags by identifying and disambiguating the meaning of terms and to construct a knowledge base or dictionary. News has been chosen as the primary domain of application to demonstrate the benefits of using structured tags for managing the rapidly changing and dynamic nature of news information. One of the main outcomes of this work is an automatically constructed vocabulary that defines the meaning of each named entity tag that can be extracted from a news article (including person, location and organisation), based on experts' suggestions from major search engines and the knowledge from public databases such as Wikipedia. To demonstrate the potential applications of the vocabulary, we have used it to provide more functionality in an online news website, including topic-based news reading, intuitive tagging, clipping and sharing of interesting news, as well as news filtering or searching based on named entity tags. The evaluation results on the impact of disambiguating tags have shown that the vocabulary can help to significantly improve news searching performance.
The preliminary results from our user study have demonstrated that users can benefit from the additional functionality on the news website, as they are able to retrieve more relevant news, and clip and share news with friends and family effectively.
Abstract:
Many data mining techniques have been proposed for mining useful patterns in databases. However, how to effectively utilize discovered patterns is still an open research issue, especially in the domain of text mining. Most existing methods adopt term-based approaches. However, they all suffer from the problems of polysemy and synonymy. This paper presents an innovative technique, pattern taxonomy mining, to improve the effectiveness of using discovered patterns for finding useful information. Substantial experiments on RCV1 demonstrate that the proposed solution achieves encouraging performance.
Abstract:
Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase) based approaches should perform better than the term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
Abstract:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough on this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern-based methods on precision, recall and F measures.
Abstract:
Language Modeling (LM) has been successfully applied to Information Retrieval (IR). However, most of the existing LM approaches rely only on term occurrences in documents, queries and document collections. In traditional unigram-based models, terms (or words) are usually considered to be independent. In some recent studies, dependence models have been proposed to incorporate term relationships into LM, so that links can be created between words in the same sentence, and term relationships (e.g. synonymy) can be used to expand the document model. In this study, we further extend this family of dependence models in the following two ways: (1) term relationships are used to expand the query model instead of the document model, so that the query expansion process can be naturally implemented; (2) we exploit more sophisticated inferential relationships extracted with Information Flow (IF). Information flow relationships are not simply pairwise term relationships like those used in previous studies, but hold between a set of terms and another term. They allow for context-dependent query expansion. Our experiments conducted on TREC collections show that we can obtain large and significant improvements with our approach. This study shows that LM is an appropriate framework to implement effective query expansion.
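The idea of expanding the query model with relationships from a set of terms to another term can be sketched as follows. This is only an illustration under stated assumptions, not the paper's formulation: the `relations` table, its strengths, the linear interpolation and the weight `lam` are all hypothetical, and the query is assumed to be non-empty.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def expand_query_model(
    query: List[str],
    relations: Dict[FrozenSet[str], Dict[str, float]],
    lam: float = 0.7,
) -> Dict[str, float]:
    """Interpolate the maximum-likelihood query model with inferred expansion terms.

    A relation fires only when its whole premise set occurs in the query,
    which is what makes the expansion context-dependent.
    Assumes a non-empty query (hypothetical simplification).
    """
    counts = Counter(query)
    total = sum(counts.values())
    original = {term: count / total for term, count in counts.items()}

    query_terms = set(query)
    inferred: Counter = Counter()
    for premise, targets in relations.items():
        if premise <= query_terms:  # premise set fully present in the query
            for term, strength in targets.items():
                inferred[term] += strength

    mass = sum(inferred.values())
    if not mass:
        return original  # nothing inferred: keep the original query model
    expansion = {term: s / mass for term, s in inferred.items()}

    terms = set(original) | set(expansion)
    return {
        t: lam * original.get(t, 0.0) + (1 - lam) * expansion.get(t, 0.0)
        for t in terms
    }


# Hypothetical relations: the premise {"java", "island"} implies travel-related
# terms, while {"java", "program"} would imply programming-related ones.
relations = {
    frozenset({"java", "island"}): {"indonesia": 0.8, "travel": 0.2},
    frozenset({"java", "program"}): {"compiler": 1.0},
}
model = expand_query_model(["java", "island"], relations, lam=0.5)
```

With a pairwise relation on "java" alone, both senses would be added to the query; requiring the full premise set keeps "compiler" out of the expanded model for this query.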
Abstract:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in an adaptive environment. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.
Abstract:
The intersection of the Social Web and the Semantic Web has put folksonomy in the spotlight for its potential in overcoming the knowledge acquisition bottleneck and providing insight into the "wisdom of the crowds". Folksonomy, which results from collaborative tagging activities, has provided insight into users' understanding of Web resources, which might be useful for searching and organizing purposes. However, the collaborative tagging vocabulary poses some challenges, since tags are freely chosen by users and may exhibit synonymy and polysemy problems. In order to overcome these challenges and boost the potential of folksonomy as emergent semantics, we propose to consolidate the diverse vocabulary into consolidated entities and concepts. We propose to extract a tag ontology through an ontology learning process to represent the semantics of a tagging community. This paper presents a novel approach to learning the ontology based on the widely used lexical database WordNet. We present personalization strategies to disambiguate the semantics of tags by combining the opinion of WordNet lexicographers with users’ tagging behavior. We provide empirical evaluations by using the semantic information contained in the ontology in a tag recommendation experiment. The results show that using the semantic relationships in the ontology improves the accuracy of the tag recommender.
Abstract:
Retrieving information from Twitter is always challenging due to its large volume, inconsistent writing and noise. Most existing information retrieval (IR) and text mining methods focus on term-based approaches, but suffer from problems of term variation such as polysemy and synonymy. This problem worsens when such methods are applied to Twitter because of the length limit. Over the years, people have held the hypothesis that pattern-based methods should perform better than term-based methods because they provide more context, but limited studies have been conducted to support this hypothesis, especially on Twitter. This paper presents an innovative framework to address the issue of performing IR in microblogs. The proposed framework discovers patterns in tweets as higher level features to assign weights to low-level features (i.e. terms) based on their distributions in the higher level features. We present experimental results based on the TREC11 microblog dataset and show that our proposed approach significantly outperforms the term-based methods Okapi BM25 and TF-IDF, as well as pattern-based methods, on precision, recall and F measures.
Abstract:
The species from Australia in the genera Carientothrips and Nesothrips are reviewed and an illustrated key is provided. Carientothrips is distinguished based on the unusual form of the maxillary palps. Two species, badius Hood comb.n. and capricornis Mound comb.n., are transferred to Nesothrips from Carientothrips; and Nesothrips melinus Mound syn.n. is synonymised with Carientothrips miskoi Mound. In Carientothrips the following six new species are described: alienatus sp.n., calami sp.n., horni sp.n., palumai sp.n., snowi sp.n., tasmanica sp.n.; while flavitibia Moulton stat.rev. is recalled from synonymy with C. mjobergi (Karny). In Nesothrips four new species are described: barrowi sp.n., brigalowi sp.n., coorongi sp.n., rossi sp.n.; while rhizophorae (Girault) syn.n. is placed as a synonym of minor Bagnall.
A tag-based personalized item recommendation system using tensor modeling and topic model approaches
Abstract:
This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called as tag assignment
Abstract:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large scale of terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, it has often been hypothesized that pattern-based methods should perform better than term-based ones in describing user preferences; yet how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.
Abstract:
Temnoplectron Westwood is revised and five new species are described, four from North Queensland: cooki, finnigani, lewisense, monteithi; and one from New Guinea: wareo. Temnoplectron reyi Paulian is removed from synonymy with T. politulum Macleay, Temnoplectron laevigatum Matthews is placed in synonymy with T. boucomonti Paulian, T. heurni Paulian and T. howdeni Paulian are synonymised with T. atropolitum Gillet, and T. major Paulian is recognised in Australia for the first time. All known species are redescribed. A key is provided for the 19 species of Temnoplectron and new distribution records are noted. A cladistic analysis of the genus is presented, the results of which suggest at least two origins for flightlessness in the genus. The biogeography of Temnoplectron is discussed with reference to isolation of rainforest blocks during periods of maximum aridity.
Abstract:
Taxonomic relationships of the liverwort genus Herbertus in Asia were examined. In addition, the phylogeny of the family Herbertaceae and its close relatives was investigated and analyses conducted of higher level relationships within the entire liverwort phylum. Species of Herbertus show great plasticity in various morphological characters, which has resulted in a large number of described species. This study was the first comprehensive revision of Asian Herbertus, with 12 species recognized for the continent. Eleven names were reduced to synonymy under earlier described species, and one species was excluded from the genus. Herbertus buchii Juslén was described as a new species. Phylogenetic analyses based on both molecular and morphological characters resolved the families Vetaformaceae, Lepicoleaceae, and Herbertaceae (including Mastigophoraceae) as a monophyletic entity. This clade is among the most derived groups within the leafy liverworts and comprises mostly isophyllous plants, all of which have bracteolar antheridia. The relationships of Mastigophoraceae have formerly been controversial. My results confirm the view that this family is closely related to Herbertaceae, Lepicoleaceae, and Vetaformaceae. In the proposed new classification Mastigophoraceae is included in Herbertaceae. Phylogenetic relationships within the liverworts were reconstructed using both chloroplast and nuclear sequences as well as morphological characters. These analyses were the most comprehensive to date at the time of publication. Previously it was believed that liverworts had a common ancestor with an erect, radial gametophyte and a tetrahedral apical cell. The leafy liverworts were arranged based on the assumption that similar structures had repeatedly developed in many different suborders, with evolution proceeding from erect and isophyllous to creeping and anisophyllous plants. The complex thalloid liverworts were assumed to be the most derived group.
By contrast, our studies resolved a clade comprising Treubia and Haplomitrium as the earliest extant liverwort lineage. According to our results the complex thalloids are also an early diverging lineage, and the simple thalloids, traditionally classified together, are a paraphyletic group. Within leafy liverworts, the hypothesis of repeated evolution from isophyllous to anisophyllous plants based on the assumption of a basal unresolved polytomy was rejected. Fundamentally, the leafy liverworts can be divided into three groups. In conflict with the earlier hypotheses, the isophyllous liverworts, including Herbertaceae, were resolved as derived lineages within the liverworts.
Abstract:
This study analyses Augustine's concept of concupiscentia, or evil desire (together with two cognate terms, libido and cupiditas) in the context of his entire oeuvre. With the aid of systematic analysis, the concept and its development are explored in four distinct ways. It is claimed that Augustine used the concept of concupiscentia for several theological purposes, and the task of the study is to represent these distinct functions and their connections to Augustine's general theological and philosophical convictions. The study opens with a survey of terminology. A general overview of the occurrences of the negatively connoted words for desire in Latin literature precedes a corresponding examination of Augustine's own works. In this introductory chapter it is shown that, despite certain preferences in the uses of the words, a sufficient degree of synonymy reigns so as to allow an analysis of the concept without tightly discriminating between the terms. The theological functions of concupiscentia with its distinct contexts are analysed in separate chapters. The function of concupiscentia as a divine punishment is explored first (Ch 3). It is seen how Augustine links together concupiscentia and ideas about divine justice, and finally suggests that in the inordinate, psychologically experienced sexual desire, the original theological disobedience of Adam and Eve can be perceived. Augustine was criticized for this solution already in his own times, and the analysis of the function of concupiscentia as a divine punishment ends in a discussion of the critical response to punitive concupiscentia by Julian of Aeclanum. Augustine also attached to concupiscentia another central theological function by viewing evil desire as an inward originating cause for all external evil actions. In the study, this function is analysed by surveying two formally distinct images of evil desire, i.e. as the root (radix) of all evil, and as a threefold (triplex) matrix of evil actions (Ch 4).
Each of these images was based on a single verse of the Bible (1 Jn 2, 16 and 1 Tim 6, 10). This function of concupiscentia was formed both parallel to, and in answer to, Manichaean insights into concupiscentia. Being familiar with the traditional philosophical discussions on the nature and therapy of emotions, Augustine also situated concupiscentia in this context. It is acknowledged that these philosophical traditions had an obvious impact on his way of explaining psychological processes in connection with concupiscentia. Not only did Augustine implicitly receive and exploit these traditions, but he also explicitly moulded and criticized them in connection with concupiscentia. Eventually, Augustine conceives the philosophical traditions of emotions as partly useful but also partly inadequate to deal with concupiscentia (Ch 5). The role of concupiscentia in connection with divine grace and Christian renewal is analysed in the final chapter of the study. Augustine's gradual development in extending the effects of concupiscentia also into the life of a baptized Christian is elucidated, as are the strong limitations and mitigations Augustine makes to the concept when attaching it to the life under grace (sub gratia). A crucial part in the development of this function is played by Augustine's changing interpretation of Rom 7, and the way concupiscentia appears in Augustine's readings of this text is therefore also analysed. As a result of the analysis of these four distinct functions and contexts of concupiscentia, it is concluded that Augustine's concept of concupiscentia is fairly tightly and coherently connected to his views of central theological importance. In particular, the function of concupiscentia as a punishment and the function of concupiscentia in Christian renewal were both tightly interwoven into Augustine's view of God's being and God's grace.
The study shows the importance of reading Augustine's discussions on evil desire with a constant awareness of their role in their larger context, that is, of their function in each situation. The study warns against too simplistic and unifying readings of Augustine's concupiscentia, emphasizing the need to acknowledge both the necessitating, sinful aspects of concupiscentia, and the domesticated features of concupiscentia during Christian renewal.