152 resultados para Data Mining, Rough Sets, Multi-Dimension, Association Rules, Constraint

em Deakin Research Online - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An accurate estimation of pressure drop due to vehicles inside an urban tunnel plays a pivotal role in tunnel ventilation issue. The main aim of the present study is to utilize computational intelligence technique for predicting pressure drop due to cars in traffic congestion in urban tunnels. A supervised feed forward back propagation neural network is utilized to estimate this pressure drop. The performance of the proposed network structure is examined on the dataset achieved from Computational Fluid Dynamic (CFD) simulation. The input data includes 2 variables, tunnel velocity and tunnel length, which are to be imported to the corresponding algorithm in order to predict presure drop. 10-fold Cross validation technique is utilized for three data mining methods, namely: multi-layer perceptron algorithm, support vector machine regression, and linear regression. A comparison is to be made to show the most accurate results. Simulation results illustrate that the Multi-layer perceptron algorithm is able to accurately estimate the pressure drop.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the advance of computing and electronic technology, quantitative data, for example, continuous data (i.e., sequences of floating point numbers), become vital and have wide applications, such as for analysis of sensor data streams and financial data streams. However, existing association rule mining generally discover association rules from discrete variables, such as boolean data (`O' and `l') and categorical data (`sunny', `cloudy', `rainy', etc.) but very few deal with quantitative data. In this paper, a novel optimized fuzzy association rule mining (OFARM) method is proposed to mine association rules from quantitative data. The advantages of the proposed algorithm are in three folds: 1) propose a novel method to add the smoothness and flexibility of membership function for fuzzy sets; 2) optimize the fuzzy sets and their partition points with multiple objective functions after categorizing the quantitative data; and 3) design a two-level iteration to filter frequent-item-sets and fuzzy association-rules. The new method is verified by three different data sets, and the results have demonstrated the effectiveness and potentials of the developed scheme.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article presents a novel approach to data mining that incorporates both positive and negative association rules into the analysis of outbound travelers. Using datasets collected from three large-scale domestic tourism surveys on Hong Kong residents' outbound pleasure travel, different sets of targeted rules were generated to provide promising information that will allow practitioners and policy makers to better understand the important relationship between condition attributes and target attributes. This article will be of interest to readers who want to understand methods for integrating the latest data mining techniques into tourism research. It will also be of use to marketing managers in destinations to better formulate strategies for receiving outbound travelers from Hong Kong, and possibly elsewhere.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces an incremental FP-Growth approach for Web content based data mining and its application in solving a real world problem The problem is solved in the following ways. Firstly, we obtain the semi-structured data from the Web pages of Chinese car market and structure them and save them in local database. Secondly, we use an incremental FP-Growth algorithm for mining association rules to discover Chinese consumers' car consumption preference. To find more general regularities, an attribute-oriented induction method is also utilized to find customer's consumption preference among a range of car categories. Experimental results have revealed some interesting consumption preferences that are useful for the decision makers to make the policy to encourage and guide car consumption. Although the current data we used may not be the best representative of the actual market in practice, it is still good enough for the decision making purpose in terms of reflecting the real situation of car consumption preference under the two assumptions in the context.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes an optimal strategy for extracting probabilistic rules from databases. Two inductive learning-based statistic measures and their rough set-based definitions: accuracy and coverage are introduced. The simplicity of a rule emphasized in this paper has previously been ignored in the discovery of probabilistic rules. To avoid the high computational complexity of rough-set approach, some rough-set terminologies rather than the approach itself are applied to represent the probabilistic rules. The genetic algorithm is exploited to find the optimal probabilistic rules that have the highest accuracy and coverage, and shortest length. Some heuristic genetic operators are also utilized in order to make the global searching and evolution of rules more efficiently. Experimental results have revealed that it run more efficiently and generate probabilistic classification rules of the same integrity when compared with traditional classification methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is playing an important role in decision making for business activities and governmental administration. Since many organizations or their divisions do not possess the in-house expertise and infrastructure for data mining, it is beneficial to delegate data mining tasks to external service providers. However, the organizations or divisions may lose of private information during the delegating process. In this paper, we present a Bloom filter based solution to enable organizations or their divisions to delegate the tasks of mining association rules while protecting data privacy. Our approach can achieve high precision in data mining by only trading-off storage requirements, instead of by trading-off the level of privacy preserving.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cancer remains a major challenge in modern medicine. Increasing prevalence of cancer, particularly in developing countries, demands better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient population. Current understanding of cancer treatment toxicities is often derived from either “clean” patient cohorts or coarse population statistics. It is difficult to get up-to-date and local assessment of treatment toxicities for specific cancer centres. In this paper, we applied an Apriori-based method for discovering toxicity progression patterns in the form of temporal association rules. Our experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with the pairwise association analysis. Our method is applicable for most cancer centres with even rudimentary electronic medical records and has the potential to provide real-time surveillance and quality assurance in cancer care.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The rough set is a new mathematical approach to imprecision, vagueness and uncertainty. The concept of reduction of the decision table based on the rough sets is very useful for feature selection. The paper describes an application of rough sets method to feature selection and reduction in texture images recognition. The methods applied include continuous data discretization based on Fuzzy c-means and, and rough set method for feature selection and reduction. The trees extractions in the aerial images were applied. The experiments show that the methods presented in this paper are practical and effective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Research into the prevalence of hospitalisation among childhood asthma cases is undertaken, using a data set local to the Barwon region of Victoria. Participants were the parents/guardians on behalf of children aged between 5-11 years. Various data mining techniques are used, including segmentation, association and classification to assist in predicting and exploring the instances of childhood hospitalisation due to asthma. Results from this study indicate that children in inner city and metropolitan areas may overutilise emergency department services. In addition, this study found that the prediction of hospitalisaion for asthma in children was greater for those with a written asthma management plan.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many previous approaches to frequent episode discovery only accept simple sequences. Although a recent approach has been able to nd frequent episodes from complex sequences, the discovered sets are neither condensed nor accurate. This paper investigates the discovery of condensed sets of frequent episodes from complex sequences. We adopt a novel anti-monotonic frequency measure based on non-redundant occurrences, and dene a condensed set, nDaCF (the set of non-derivable approximately closed frequent episodes) within a given maximal error bound of support. We then introduce a series of effective pruning strategies, and develop a method, nDaCF-Miner, for discovering nDaCF sets. Experimental results show that, when the error bound is somewhat high, the discovered nDaCF sets are two orders of magnitude smaller than complete sets, and nDaCF-miner is more efficient than previous mining approaches. In addition, the nDaCF sets are more accurate than the sets found by previous approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is devoted to a combinatorial problem for incidence semirings, which can be viewed as sets of polynomials over graphs, where the edges are the unknowns and the coefficients are taken from a semiring. The construction of incidence rings is very well known and has many useful applications. The present article is devoted to a novel application of the more general incidence semirings. Recent research on data mining has motivated the investigation of the sets of centroids that have largest weights in semiring constructions. These sets are valuable for the design of centroid-based classification systems, or classifiers, as well as for the design of multiple classifiers combining several individual classifiers. Our article gives a complete description of all sets of centroids with the largest weight in incidence semirings.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

METHODS: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

RESULTS: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001).

CONCLUSION: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering is applied in wireless sensor networks for increasing energy efficiency. Clustering methods in wireless sensor networks are different from those in traditional data mining systems. This paper proposes a novel clustering algorithm based on Minimal Spanning Tree (MST) and Maximum Energy resource on sensors named MSTME. Also, specified constrains of clustering in wireless sensor networks and several evaluation metrics are given. MSTME performs better than already known clustering methods of Low Energy Adaptive Clustering Hierarchy (LEACH) and Base Station Controlled Dynamic Clustering Protocol (BCDCP) in wireless sensor networks when they are evaluated by these evaluation metrics. Simulation results show MSTME increases energy efficiency and network lifetime compared with LEACH and BCDCP in two-hop and multi-hop networks, respectively. © World Scientific Publishing Company.