841 resultados para Interestingness Measures
Resumo:
Association rule mining is one technique that is widely used when querying databases, especially those that are transactional, in order to obtain useful associations or correlations among sets of items. Much work has been done focusing on efficiency, effectiveness and redundancy. There has also been a focusing on the quality of rules from single level datasets with many interestingness measures proposed. However, with multi-level datasets now being common there is a lack of interestingness measures developed for multi-level and cross-level rules. Single level measures do not take into account the hierarchy found in a multi-level dataset. This leaves the Support-Confidence approach,which does not consider the hierarchy anyway and has other drawbacks, as one of the few measures available. In this paper we propose two approaches which measure multi-level association rules to help evaluate their interestingness. These measures of diversity and peculiarity can be used to help identify those rules from multi-level datasets that are potentially useful.
Resumo:
Association rule mining is one technique that is widely used when querying databases, especially those that are transactional, in order to obtain useful associations or correlations among sets of items. Much work has been done focusing on efficiency, effectiveness and redundancy. There has also been a focusing on the quality of rules from single level datasets with many interestingness measures proposed. However, with multi-level datasets now being common there is a lack of interestingness measures developed for multi-level and cross-level rules. Single level measures do not take into account the hierarchy found in a multi-level dataset. This leaves the Support-Confidence approach, which does not consider the hierarchy anyway and has other drawbacks, as one of the few measures available. In this chapter we propose two approaches which measure multi-level association rules to help evaluate their interestingness by considering the database’s underlying taxonomy. These measures of diversity and peculiarity can be used to help identify those rules from multi-level datasets that are potentially useful.
Resumo:
In today’s electronic world vast amounts of knowledge is stored within many datasets and databases. Often the default format of this data means that the knowledge within is not immediately accessible, but rather has to be mined and extracted. This requires automated tools and they need to be effective and efficient. Association rule mining is one approach to obtaining knowledge stored with datasets / databases which includes frequent patterns and association rules between the items / attributes of a dataset with varying levels of strength. However, this is also association rule mining’s downside; the number of rules that can be found is usually very big. In order to effectively use the association rules (and the knowledge within) the number of rules needs to be kept manageable, thus it is necessary to have a method to reduce the number of association rules. However, we do not want to lose knowledge through this process. Thus the idea of non-redundant association rule mining was born. A second issue with association rule mining is determining which ones are interesting. The standard approach has been to use support and confidence. But they have their limitations. Approaches which use information about the dataset’s structure to measure association rules are limited, but could yield useful association rules if tapped. Finally, while it is important to be able to get interesting association rules from a dataset in a manageable size, it is equally as important to be able to apply them in a practical way, where the knowledge they contain can be taken advantage of. Association rules show items / attributes that appear together frequently. Recommendation systems also look at patterns and items / attributes that occur together frequently in order to make a recommendation to a person. It should therefore be possible to bring the two together. In this thesis we look at these three issues and propose approaches to help. For discovering non-redundant rules we propose enhanced approaches to rule mining in multi-level datasets that will allow hierarchically redundant association rules to be identified and removed, without information loss. When it comes to discovering interesting association rules based on the dataset’s structure we propose three measures for use in multi-level datasets. Lastly, we propose and demonstrate an approach that allows for association rules to be practically and effectively used in a recommender system, while at the same time improving the recommender system’s performance. This especially becomes evident when looking at the user cold-start problem for a recommender system. In fact our proposal helps to solve this serious problem facing recommender systems.
Resumo:
The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time-interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation of this work is the observation that in practice most events are not instantaneous but occur over a period of time and different events may occur concurrently. Thus, there are many practical applications that require mining such temporal correlations between intervals including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Two efficient methods to find frequent arrangements of temporal intervals are described; the first one is tree-based and uses depth first search to mine the set of frequent arrangements, whereas the second one is prefix-based. The above methods apply efficient pruning techniques that include a set of constraints consisting of regular expressions and gap constraints that add user-controlled focus into the mining process. Moreover, based on the extracted patterns a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.
Resumo:
Pattern discovery in a long temporal event sequence is of great importance in many application domains. Most of the previous work focuses on identifying positive associations among time stamped event types. In this paper, we introduce the problem of defining and discovering negative associations that, as positive rules, may also serve as a source of knowledge discovery. In general, an event-oriented pattern is a pattern that associates with a selected type of event, called a target event. As a counter-part of previous research, we identify patterns that have a negative relationship with the target events. A set of criteria is defined to evaluate the interestingness of patterns associated with such negative relationships. In the process of counting the frequency of a pattern, we propose a new approach, called unique minimal occurrence, which guarantees that the Apriori property holds for all patterns in a long sequence. Based on the interestingness measures, algorithms are proposed to discover potentially interesting patterns for this negative rule problem. Finally, the experiment is made for a real application.
Resumo:
Interestingness in Association Rules has been a major topic of research in the past decade. The reason is that the strength of association rules, i.e. its ability to discover ALL patterns given some thresholds on support and confidence, is also its weakness. Indeed, a typical association rules analysis on real data often results in hundreds or thousands of patterns creating a data mining problem of the second order. In other words, it is not straightforward to determine which of those rules are interesting for the end-user. This paper provides an overview of some existing measures of interestingness and we will comment on their properties. In general, interestingness measures can be divided into objective and subjective measures. Objective measures tend to express interestingness by means of statistical or mathematical criteria, whereas subjective measures of interestingness aim at capturing more practical criteria that should be taken into account, such as unexpectedness or actionability of rules. This paper only focusses on objective measures of interestingness.
Resumo:
Cutaneous malignant melanoma (CMM) is a major health issue in Queensland, Australia, which has the world’s highest incidence. Recent molecular and epidemiologic studies suggest that CMM arises through multiple etiological pathways involving gene-environment interactions. Understanding the potential mechanisms leading to CMM requires larger studies than those previously conducted. This article describes the design and baseline characteristics of Q-MEGA, the Queensland Study of Melanoma: Environmental and Genetic Associations, which followed up 4 population-based samples of CMM patients in Queensland, including children, adolescents, men aged over 50, and a large sample of adult cases and their families, including twins. Q-MEGA aims to investigate the roles of genetic and environmental factors, and their interaction, in the etiology of melanoma. Three thousand, four hundred and seventy-one participants took part in the follow-up study and were administered a computer-assisted telephone interview in 2002-2005. Updated data on environmental and phenotypic risk factors, and 2777 blood samples were collected from interviewed participants as well as a subset of relatives. This study provides a large and well-described population-based sample of CMM cases with follow-up data. Characteristics of the cases and repeatability of sun exposure and phenotype measures between the baseline and the follow-up surveys, from 6 to 17 years later, are also described.
Resumo:
This study aimed to develop and assess the reliability and validity of a pair of self-report questionnaires to measure self-efficacy and expectancy associated with benzodiazepine use, the Benzodiazepine Refusal Self- Efficacy Questionnaire (BRSEQ) and the Benzodiazepine Expectancy Questionnaire (BEQ). Internal structure of the questionnaireswas established by principal component analysis (PCA) in a sample of 155 respondents, and verified by confirmatory factor analyses (CFA) in a second independent sample (n=139) using structural equation modeling. The PCA of the BRSEQ resulted in a 16-item, 4-factor scale, and the BEQ formed an 18-item, 2-factor scale. Both scales were internally reliable. CFA confirmed these internal structures and reduced the questionnaires to a 14-item self-efficacy scale and a 12-item expectancy scale. Lower self-efficacy and higher expectancy were moderately associated with higher scores on the SDS-B. The scales provide reliable measures for assessing benzodiazepine self-efficacy and expectancies. Future research will examine the utility of the scales in prospective prediction of benzodiazepine cessation.
Resumo:
Investigated the psychometric properties of the original and alternate sets of the Trail Making Test (TMT) and the Controlled Oral Word Association Test (COWAT; A. L. Benton and D. Hamsher, 1978) in 50 orthopedic and 15 closed head injured (1 yr after trauma) patients (aged 15–59 yrs). Although the alternate forms of both measures proved to be stable and consistent with each other in both groups, only the parallel sets of TMT reliably discriminated the clinical group from controls. Practice effects in the head injured were significant only for Trail B of TMT. Factor analysis of the control group's results identified Verbal Knowledge as a major contributor to performance on COWAT, whereas TMT was more dependent on Rapid Visual Search and Visuomotor Sequencing.
Resumo:
Economists rely heavily on self-reported measures to examine the relationship between income and health. We directly compare survey responses of a self-reported measure of health that is commonly used in nationally representative surveys with objective measures of the same health condition. We focus on hypertension. We find no evidence of an income/health greadient using self-reported hypertension but a sizeable gradient when using objectively measured hypertension. We also find that the probability of a false negative reporting is significantly income graded. Our results suggest that using commonly available self-reported chronic health measures might underestimate true income-related inequalities in health.
Resumo:
In this paper, we classify, review, and experimentally compare major methods that are exploited in the definition, adoption, and utilization of element similarity measures in the context of XML schema matching. We aim at presenting a unified view which is useful when developing a new element similarity measure, when implementing an XML schema matching component, when using an XML schema matching system, and when comparing XML schema matching systems.