151 results for Incremental mining

in Deakin Research Online - Australia


Relevance:

40.00% 40.00%

Publisher:

Abstract:

This paper presents a real application of Web-content mining using an incremental FP-Growth approach. We first restructure the semi-structured data retrieved from the web pages of the Chinese car market to fit into a local database, and then employ an incremental algorithm to discover association rules that identify car preferences. To find more general regularities, attribute-oriented induction is also used to find customers' consumption preferences. Experimental results show some interesting consumption preference patterns that may help the government make policy to encourage and guide car consumption.
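The incremental idea in the abstract above can be illustrated with a minimal sketch. It is not FP-Growth and not the authors' algorithm: it keeps plain support counts for 1- and 2-itemsets and updates them from each new batch alone, so earlier transactions are never rescanned. The class name, method names, and the sample items are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Keeps support counts for 1- and 2-itemsets as transactions arrive."""

    def __init__(self):
        self.n_transactions = 0
        self.counts = Counter()

    def add_batch(self, transactions):
        # Only the new batch is scanned; counts from earlier batches are reused.
        for transaction in transactions:
            items = sorted(set(transaction))
            self.n_transactions += 1
            for size in (1, 2):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def rules(self, min_support, min_confidence):
        """Return (antecedent, consequent, support, confidence) for item pairs."""
        found = []
        for itemset, count in self.counts.items():
            if len(itemset) != 2:
                continue
            support = count / self.n_transactions
            if support < min_support:
                continue
            a, b = itemset
            for lhs, rhs in ((a, b), (b, a)):
                confidence = count / self.counts[(lhs,)]
                if confidence >= min_confidence:
                    found.append((lhs, rhs, support, confidence))
        return found


# Hypothetical car-market transactions, added in two batches.
counter = IncrementalItemsetCounter()
counter.add_batch([["sedan", "petrol"], ["sedan", "petrol"], ["suv", "diesel"]])
counter.add_batch([["sedan", "diesel"]])
for rule in counter.rules(min_support=0.5, min_confidence=0.6):
    print(rule)
```

An FP-Growth implementation would replace the pair counting with a compressed prefix tree, but the update pattern (touch only the new batch, then re-derive rules from stored counts) is the same.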

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In this paper we propose the Incremental Sequential PAttern Discovery using Equivalence classes (IncSPADE) algorithm, which mines a dynamic database without re-scanning it. To evaluate the algorithm, we conducted experiments on three artificial datasets. The results show that IncSPADE outperformed the benchmark algorithm, SPADE, by up to 20%.
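As a rough illustration of why a vertical layout suits incremental updates, the sketch below stores SPADE-style id-lists (item to list of sequence id and event id pairs) and appends new events without touching old ones. It is not IncSPADE itself; the class and method names are assumptions, and only 2-sequences of single items are handled.

```python
from collections import defaultdict


class VerticalDB:
    """Vertical id-lists: item -> [(sequence_id, event_id), ...]."""

    def __init__(self):
        self.idlists = defaultdict(list)

    def append(self, sid, eid, items):
        # New events extend the id-lists; earlier entries are left untouched.
        for item in items:
            self.idlists[item].append((sid, eid))

    def support(self, item):
        """Number of distinct sequences that contain the item."""
        return len({sid for sid, _ in self.idlists.get(item, [])})

    def sequence_support(self, first, second):
        """Number of sequences in which `first` occurs strictly before `second`."""
        earliest = {}
        for sid, eid in self.idlists.get(first, []):
            earliest[sid] = min(eid, earliest.get(sid, eid))
        return len({
            sid
            for sid, eid in self.idlists.get(second, [])
            if sid in earliest and eid > earliest[sid]
        })


db = VerticalDB()
db.append(1, 1, ["a"])
db.append(1, 2, ["b"])
db.append(2, 1, ["b"])
db.append(2, 2, ["a"])
print(db.sequence_support("a", "b"))  # sequences where a precedes b so far
db.append(2, 3, ["b"])                # an incremental update to sequence 2
print(db.sequence_support("a", "b"))
```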

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper introduces an incremental FP-Growth approach for Web-content-based data mining and its application to a real-world problem. The problem is solved as follows. First, we obtain semi-structured data from the Web pages of the Chinese car market, structure it, and save it in a local database. Second, we use an incremental FP-Growth algorithm for mining association rules to discover Chinese consumers' car consumption preferences. To find more general regularities, an attribute-oriented induction method is also used to find customers' consumption preferences among a range of car categories. Experimental results reveal some interesting consumption preferences that decision makers can use when making policy to encourage and guide car consumption. Although the data we used may not be the best representation of the actual market, it is still good enough for decision making, in that it reflects the real state of car consumption preferences under the two assumptions in the context.
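Attribute-oriented induction, mentioned in the abstract above, can be sketched as one generalization step: replace an attribute's values with a parent concept and merge tuples that become identical. The hierarchy, attribute names, and records below are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical concept hierarchy: specific value -> more general concept.
HIERARCHY = {
    "compact": "small car",
    "subcompact": "small car",
    "suv": "large car",
    "mpv": "large car",
}


def generalize(records, attribute, hierarchy):
    """Lift one attribute to its parent concept and merge identical tuples.

    Returns a Counter mapping each generalized tuple to the number of
    original records it covers.
    """
    merged = Counter()
    for record in records:
        lifted = dict(record)
        # Values missing from the hierarchy are kept unchanged.
        lifted[attribute] = hierarchy.get(record[attribute], record[attribute])
        merged[tuple(sorted(lifted.items()))] += 1
    return merged


records = [
    {"type": "compact", "fuel": "petrol"},
    {"type": "subcompact", "fuel": "petrol"},
    {"type": "suv", "fuel": "diesel"},
]
for generalized, count in generalize(records, "type", HIERARCHY).items():
    print(dict(generalized), count)
```

Repeating the step on further attributes, or climbing higher in the hierarchy, yields progressively more general preference summaries.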

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper presents a novel data mining framework for the exploration and extraction of actionable knowledge from data generated by electricity meters. Although a rich source of information for energy consumption analysis, electricity meters produce a voluminous, fast-paced, transient stream of data that conventional approaches are unable to address entirely. To overcome these issues, a data mining framework needs to incorporate functionality for interim summarization and incremental analysis using intelligent techniques. The proposed Incremental Summarization and Pattern Characterization (ISPC) framework demonstrates this capability. Stream data is structured in a data warehouse based on key dimensions, enabling rapid interim summarization. Independently, the IPCL algorithm incrementally characterizes patterns in stream data and correlates these across time. Finally, characterized patterns are consolidated with interim summarization to facilitate an overall analysis and prediction of energy consumption trends. Results of experiments conducted using actual data from electricity meters confirm the applicability of the ISPC framework.
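Interim summarization over key dimensions can be pictured with a small running-aggregate sketch. This is not the ISPC framework or the IPCL algorithm; it assumes two dimensions (meter and hour of day) and keeps only a count and a total per cell, so each reading is processed once and then discarded. All names are illustrative.

```python
from collections import defaultdict


class InterimSummary:
    """Running count and total of readings per (meter_id, hour) cell."""

    def __init__(self):
        self.cells = defaultdict(lambda: [0, 0.0])  # key -> [count, total_kwh]

    def update(self, meter_id, hour, kwh):
        # Constant-time update; the raw reading does not need to be stored.
        cell = self.cells[(meter_id, hour)]
        cell[0] += 1
        cell[1] += kwh

    def mean(self, meter_id, hour):
        """Mean consumption for a cell, or None if no reading has arrived."""
        count, total = self.cells.get((meter_id, hour), (0, 0.0))
        return total / count if count else None


summary = InterimSummary()
for meter_id, hour, kwh in [("m1", 9, 1.0), ("m1", 9, 3.0), ("m1", 10, 2.0)]:
    summary.update(meter_id, hour, kwh)
print(summary.mean("m1", 9))
```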

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Knowledge discovery in databases (KDD) is defined as an iterative sequence of steps: data pre-processing, data mining, and post data mining. A significant amount of research has produced a variety of algorithms and techniques for each step. However, no single data-mining technique has proven appropriate for every domain and data set. Instead, several techniques may need to be integrated into hybrid systems and used cooperatively during a particular data-mining operation. That is, hybrid solutions are crucial for the success of data mining. This paper presents a hybrid framework for identifying patterns from databases or multi-databases. The framework integrates these techniques for mining tasks from an agent point of view. Based on the experiments conducted, putting different KDD techniques together in the agent-based architecture enables them to be used cooperatively when needed. The proposed framework provides a highly flexible and robust data-mining platform, and the resulting systems demonstrate emergent behaviors, although the framework does not improve the performance of individual KDD techniques.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this article, the authors raise an important proposal for reform to Australia's mining legislation: a nationally-consistent model providing exploration licence holders with a legislative right to be granted a mining lease. This proposed national model will be designed to reflect the present Western Australian system - Western Australia being the only jurisdiction to provide exploration licence holders with the express right to be granted a mining lease on application. The authors believe that the Western Australian system should provide the basis for a national legislative model, given that it is designed to balance appropriately the interests of companies wanting a right to mine to recoup the costs involved in exploring for minerals, and the interests of the public in ensuring that exploration and mining is conducted reasonably.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Current approaches to analyzing security protocols using formal methods require users to predefine authentication goals. Moreover, they are unable to discover potential correlations between secure messages. This research attempts to analyze security protocols using data mining. This is done by extending the idea of association rule mining and converting the verification of protocols into computing the frequency and confidence of inconsistent secure messages. It provides a novel and efficient way to analyze security protocols and to find potential correlations between secure messages. The experiments conducted demonstrate our approaches.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Many organizations struggle with the massive amount of data they collect. Today, data do more than serve as the ingredients for churning out statistical reports. They help support efficient operations in many organizations and, to some extent, provide the competitive intelligence organizations need to survive in today's economy. Data mining can't always deliver timely and relevant results because data are constantly changing. However, stream-data processing might be more effective, judging by the Matrix project.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Background
AMP-activated protein kinase (AMPK) has emerged as a significant signaling intermediary that regulates metabolism in response to energy demand and supply. An investigation into the degree of activation and deactivation of AMPK subunits under exercise can provide valuable data for understanding AMPK. In particular, the effect of AMPK on muscle cellular energy status makes this protein a promising pharmacological target for disease treatment. As more AMPK regulation data are accumulated, data mining techniques can play an important role in identifying frequent patterns in the data. Association rule mining, which is commonly used in market basket analysis, can be applied to AMPK regulation.

Results
This paper proposes a framework that can identify potential correlations, either between the states of isoforms of the α, β and γ subunits of AMPK, or between stimulus factors and the states of isoforms. Our approach is to apply item constraints in the closed interpretation to itemset generation, so that the threshold is specified in terms of the number of results rather than as a fixed threshold value for all itemsets of all sizes. The rules derived from the experiments are roughly analyzed. Most of the extracted association rules have biological meaning, and some of them were previously unknown. They indicate directions for further research.

Conclusion
Our findings indicate that AMPK has a great impact on most metabolic actions that are related to energy demand and supply. Those actions are adjusted via its subunit isoforms under specific physical training. Thus, there are strong co-relationships between AMPK subunit isoforms and exercises. Furthermore, the subunit isoforms are correlated with each other in some cases. The methods developed here could be used when predicting these essential relationships and enable an understanding of the functions and metabolic pathways regarding AMPK.
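The Results paragraph of this abstract describes a threshold expressed as a number of results rather than a fixed support value, combined with item constraints. A minimal sketch of that idea follows; it is an assumption-laden simplification (it does not implement the "closed interpretation" the abstract mentions), and the function name and the sample item labels are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def top_n_itemsets(transactions, required_item, n, max_size=3):
    """Return the n most frequent itemsets that contain required_item.

    The cut-off is the number of results (n), not a minimum support value.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        if required_item not in items:
            continue  # item constraint: skip transactions without the item
        for size in range(2, max_size + 1):
            for itemset in combinations(items, size):
                if required_item in itemset:
                    counts[itemset] += 1
    return counts.most_common(n)


# Hypothetical observations: isoform states and stimulus factors as items.
observations = [
    ["a1_up", "exercise", "b2_up"],
    ["a1_up", "exercise"],
    ["a1_up", "rest"],
]
print(top_n_itemsets(observations, "a1_up", n=1, max_size=2))
```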

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Protein kinases, a family of enzymes, are used by living organisms as important signaling intermediaries for regulating critical biological processes such as memory, hormone response and cell growth. Unbalanced kinases are known to cause cancer and other diseases. Increasing efforts to collect, store and disseminate information about the entire kinase family have produced valuable data sets for understanding cell regulation, but they also pose a big challenge: extracting valuable knowledge about metabolic pathways from the data. Data mining techniques that have been widely used to find frequent patterns in large datasets can be extended and adapted to kinase data as well. This paper proposes a framework for mining frequent itemsets from the collected kinase dataset. An experiment using AMPK regulation data demonstrates that our approaches are useful and efficient in analyzing kinase regulation data.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Current data mining techniques may not be helpful for companies/organizations such as nuclear power plants and earthquake bureaus, which have only small databases. These companies/organizations nevertheless expect to apply data mining techniques to extract useful patterns from their databases to support their decisions. However, the data in these databases, such as the accident database of a nuclear power plant or the earthquake database of an earthquake bureau, may not be large enough to form any patterns. To meet these needs, we present a new mining model in this paper, which is based on collecting knowledge from sources such as the Web, journals, and newspapers.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Data collecting is necessary for some organizations, such as nuclear power plants and earthquake bureaus, which have very small databases. Traditional data collecting obtains the necessary data from internal and external data sources and joins all the data together to create one huge homogeneous database. Because collected data may be untrustworthy, they can disguise really useful patterns in the data. In this paper, breaking away from the traditional data collecting mode, which treats internal and external data equally, we argue that the first step in utilizing external data is to identify quality data in the data sources for the given mining tasks. Pre- and post-analysis techniques are thus advocated for generating quality data.