881 resultados para mining algorithm


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Once multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since they allow applying data mining in multiple tables directly, thus avoiding expensive joining operations and semantic losses, this work proposes an algorithm with multi-relational approach. Methods: Aiming to compare traditional approach performance and multi-relational for mining association rules, this paper discusses an empirical study between PatriciaMine - an traditional algorithm - and its corresponding multi-relational proposed, MR-Radix. Results: This work showed advantages of the multi-relational approach in performance over several tables, which avoids the high cost for joining operations from multiple tables and semantic losses. The performance provided by the algorithm MR-Radix shows faster than PatriciaMine, despite handling complex multi-relational patterns. The utilized memory indicates a more conservative growth curve for MR-Radix than PatriciaMine, which shows the increase in demand of frequent items in MR-Radix does not result in a significant growth of utilized memory like in PatriciaMine. Conclusion: The comparative study between PatriciaMine and MR-Radix confirmed efficacy of the multi-relational approach in data mining process both in terms of execution time and in relation to memory usage. Besides that, the multi-relational proposed algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Background Once multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since they allow applying data mining in multiple tables directly, thus avoiding expensive joining operations and semantic losses, this work proposes an algorithm with multi-relational approach. Methods Aiming to compare traditional approach performance and multi-relational for mining association rules, this paper discusses an empirical study between PatriciaMine - an traditional algorithm - and its corresponding multi-relational proposed, MR-Radix. Results This work showed advantages of the multi-relational approach in performance over several tables, which avoids the high cost for joining operations from multiple tables and semantic losses. The performance provided by the algorithm MR-Radix shows faster than PatriciaMine, despite handling complex multi-relational patterns. The utilized memory indicates a more conservative growth curve for MR-Radix than PatriciaMine, which shows the increase in demand of frequent items in MR-Radix does not result in a significant growth of utilized memory like in PatriciaMine. Conclusion The comparative study between PatriciaMine and MR-Radix confirmed efficacy of the multi-relational approach in data mining process both in terms of execution time and in relation to memory usage. Besides that, the multi-relational proposed algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We present a method to enhance fault localization for software systems based on a frequent pattern mining algorithm. Our method is based on a large set of test cases for a given set of programs in which faults can be detected. The test executions are recorded as function call trees. Based on test oracles the tests can be classified into successful and failing tests. A frequent pattern mining algorithm is used to identify frequent subtrees in successful and failing test executions. This information is used to rank functions according to their likelihood of containing a fault. The ranking suggests an order in which to examine the functions during fault analysis. We validate our approach experimentally using a subset of Siemens benchmark programs.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Abstract Due to recent scientific and technological advances in information sys¬tems, it is now possible to perform almost every application on a mobile device. The need to make sense of such devices more intelligent opens an opportunity to design data mining algorithm that are able to autonomous execute in local devices to provide the device with knowledge. The problem behind autonomous mining deals with the proper configuration of the algorithm to produce the most appropriate results. Contextual information together with resource information of the device have a strong impact on both the feasibility of a particu¬lar execution and on the production of the proper patterns. On the other hand, performance of the algorithm expressed in terms of efficacy and efficiency highly depends on the features of the dataset to be analyzed together with values of the parameters of a particular implementation of an algorithm. However, few existing approaches deal with autonomous configuration of data mining algorithms and in any case they do not deal with contextual or resources information. Both issues are of particular significance, in particular for social net¬works application. In fact, the widespread use of social networks and consequently the amount of information shared have made the need of modeling context in social application a priority. Also the resource consumption has a crucial role in such platforms as the users are using social networks mainly on their mobile devices. This PhD thesis addresses the aforementioned open issues, focusing on i) Analyzing the behavior of algorithms, ii) mapping contextual and resources information to find the most appropriate configuration iii) applying the model for the case of a social recommender. Four main contributions are presented: - The EE-Model: is able to predict the behavior of a data mining algorithm in terms of resource consumed and accuracy of the mining model it will obtain. - The SC-Mapper: maps a situation defined by the context and resource state to a data mining configuration. - SOMAR: is a social activity (event and informal ongoings) recommender for mobile devices. - D-SOMAR: is an evolution of SOMAR which incorporates the configurator in order to provide updated recommendations. Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Ubiquitous computing software needs to be autonomous so that essential decisions such as how to configure its particular execution are self-determined. Moreover, data mining serves an important role for ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource aware and context aware manner since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states as well as parameter settings on the data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm?s execution and we concentrate on decision tree classifier. We also define taxonomy on data mining quality so that tradeoff between prediction accuracy and classification specificity of each behavior model that classifies by a different abstraction of quality, is scored for model selection. Behavior model constituents and class label transformations are formally defined and experimental validation of the proposed approach is also performed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Comunicación presentada en las XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 septiembre 2011.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A novel association rule mining algorithm is composed, using the unit cube chain decomposition structures introduced in [HAN, 1966; TON, 1976]. [HAN, 1966] established the chain split theory. [TON, 1976] invented an excellent chain computation framework which brings chain split into the practical domain. We integrate these technologies around the rule mining procedures. Effectiveness is related to the intention of low complexity of rules mined. Complexity of the procedure composed is complementary to the known Apriori algorithm which is defacto standard in rule mining area.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Given the limitations of different types of remote sensing images, automated land-cover classifications of the Amazon várzea may yield poor accuracy indexes. One way to improve accuracy is through the combination of images from different sensors, by either image fusion or multi-sensor classifications. Therefore, the objective of this study was to determine which classification method is more efficient in improving land cover classification accuracies for the Amazon várzea and similar wetland environments - (a) synthetically fused optical and SAR images or (b) multi-sensor classification of paired SAR and optical images. Land cover classifications based on images from a single sensor (Landsat TM or Radarsat-2) are compared with multi-sensor and image fusion classifications. Object-based image analyses (OBIA) and the J.48 data-mining algorithm were used for automated classification, and classification accuracies were assessed using the kappa index of agreement and the recently proposed allocation and quantity disagreement measures. Overall, optical-based classifications had better accuracy than SAR-based classifications. Once both datasets were combined using the multi-sensor approach, there was a 2% decrease in allocation disagreement, as the method was able to overcome part of the limitations present in both images. Accuracy decreased when image fusion methods were used, however. We therefore concluded that the multi-sensor classification method is more appropriate for classifying land cover in the Amazon várzea.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Traditional dictionary learning algorithms are used for finding a sparse representation on high dimensional data by transforming samples into a one-dimensional (1D) vector. This 1D model loses the inherent spatial structure property of data. An alternative solution is to employ Tensor Decomposition for dictionary learning on their original structural form —a tensor— by learning multiple dictionaries along each mode and the corresponding sparse representation in respect to the Kronecker product of these dictionaries. To learn tensor dictionaries along each mode, all the existing methods update each dictionary iteratively in an alternating manner. Because atoms from each mode dictionary jointly make contributions to the sparsity of tensor, existing works ignore atoms correlations between different mode dictionaries by treating each mode dictionary independently. In this paper, we propose a joint multiple dictionary learning method for tensor sparse coding, which explores atom correlations for sparse representation and updates multiple atoms from each mode dictionary simultaneously. In this algorithm, the Frequent-Pattern Tree (FP-tree) mining algorithm is employed to exploit frequent atom patterns in the sparse representation. Inspired by the idea of K-SVD, we develop a new dictionary update method that jointly updates elements in each pattern. Experimental results demonstrate our method outperforms other tensor based dictionary learning algorithms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In order to survive in the increasingly customer-oriented marketplace, continuous quality improvement marks the fastest growing quality organization’s success. In recent years, attention has been focused on intelligent systems which have shown great promise in supporting quality control. However, only a small number of the currently used systems are reported to be operating effectively because they are designed to maintain a quality level within the specified process, rather than to focus on cooperation within the production workflow. This paper proposes an intelligent system with a newly designed algorithm and the universal process data exchange standard to overcome the challenges of demanding customers who seek high-quality and low-cost products. The intelligent quality management system is equipped with the ‘‘distributed process mining” feature to provide all levels of employees with the ability to understand the relationships between processes, especially when any aspect of the process is going to degrade or fail. An example of generalized fuzzy association rules are applied in manufacturing sector to demonstrate how the proposed iterative process mining algorithm finds the relationships between distributed process parameters and the presence of quality problems.