971 resultados para Association mining
Open business intelligence: on the importance of data quality awareness in user-friendly data mining
Resumo:
Citizens demand more and more data for making decisions in their daily life. Therefore, mechanisms that allow citizens to understand and analyze linked open data (LOD) in a user-friendly manner are highly required. To this aim, the concept of Open Business Intelligence (OpenBI) is introduced in this position paper. OpenBI facilitates non-expert users to (i) analyze and visualize LOD, thus generating actionable information by means of reporting, OLAP analysis, dashboards or data mining; and to (ii) share the new acquired information as LOD to be reused by anyone. One of the most challenging issues of OpenBI is related to data mining, since non-experts (as citizens) need guidance during preprocessing and application of mining algorithms due to the complexity of the mining process and the low quality of the data sources. This is even worst when dealing with LOD, not only because of the different kind of links among data, but also because of its high dimensionality. As a consequence, in this position paper we advocate that data mining for OpenBI requires data quality-aware mechanisms for guiding non-expert users in obtaining and sharing the most reliable knowledge from the available LOD.
Resumo:
The aim of this paper is to evaluate the efficacy of the application WebBootCaT to create specialised corpora automatically, investigating the translation of articles of association from Italian into English. The first section reflects on the relevant literature and proposes the utility of corpora for translators. The second section discusses the methodology employed, and the third section analyses the results obtained and comments on how language professionals could possibly exploit the application to its full. The fourth section provides a few concrete usage examples of the thus built corpora, to then conclude that WebBootCaT is a genuinely powerful tool that could be implemented by professional translators in order to save time and improve their translations in the long term.
Resumo:
Mode of access: Internet.
Resumo:
Mode of access: Internet.
Resumo:
Frequent Itemsets mining is well explored for various data types, and its computational complexity is well understood. There are methods to deal effectively with computational problems. This paper shows another approach to further performance enhancements of frequent items sets computation. We have made a series of observations that led us to inventing data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper shows results from several experiments that confirmed our general and formally presented observations.
Resumo:
Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed by use of the SPSS Clementine data mining system. Decision Tree Learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. The fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques used were tested by use of crossvalidation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision trees: FPG >=110.7 mg/dl. C- The C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and the individual risk factors such as age and BMI should be considered in defining the new cut-off value.
Resumo:
In order to survive in the increasingly customer-oriented marketplace, continuous quality improvement marks the fastest growing quality organization’s success. In recent years, attention has been focused on intelligent systems which have shown great promise in supporting quality control. However, only a small number of the currently used systems are reported to be operating effectively because they are designed to maintain a quality level within the specified process, rather than to focus on cooperation within the production workflow. This paper proposes an intelligent system with a newly designed algorithm and the universal process data exchange standard to overcome the challenges of demanding customers who seek high-quality and low-cost products. The intelligent quality management system is equipped with the ‘‘distributed process mining” feature to provide all levels of employees with the ability to understand the relationships between processes, especially when any aspect of the process is going to degrade or fail. An example of generalized fuzzy association rules are applied in manufacturing sector to demonstrate how the proposed iterative process mining algorithm finds the relationships between distributed process parameters and the presence of quality problems.
Resumo:
Interestingness in Association Rules has been a major topic of research in the past decade. The reason is that the strength of association rules, i.e. its ability to discover ALL patterns given some thresholds on support and confidence, is also its weakness. Indeed, a typical association rules analysis on real data often results in hundreds or thousands of patterns creating a data mining problem of the second order. In other words, it is not straightforward to determine which of those rules are interesting for the end-user. This paper provides an overview of some existing measures of interestingness and we will comment on their properties. In general, interestingness measures can be divided into objective and subjective measures. Objective measures tend to express interestingness by means of statistical or mathematical criteria, whereas subjective measures of interestingness aim at capturing more practical criteria that should be taken into account, such as unexpectedness or actionability of rules. This paper only focusses on objective measures of interestingness.
Resumo:
A novel association rule mining algorithm is composed, using the unit cube chain decomposition structures introduced in [HAN, 1966; TON, 1976]. [HAN, 1966] established the chain split theory. [TON, 1976] invented an excellent chain computation framework which brings chain split into the practical domain. We integrate these technologies around the rule mining procedures. Effectiveness is related to the intention of low complexity of rules mined. Complexity of the procedure composed is complementary to the known Apriori algorithm which is defacto standard in rule mining area.
Resumo:
his article presents some of the results of the Ph.D. thesis Class Association Rule Mining Using MultiDimensional Numbered Information Spaces by Iliya Mitov (Institute of Mathematics and Informatics, BAS), successfully defended at Hasselt University, Faculty of Science on 15 November 2011 in Belgium
Resumo:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014