4 resultados para Data Mining, Rough Sets, Multi-Dimension, Association Rules, Constraint

em Universidade Federal do Rio Grande do Norte(UFRN)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The opening of the Brazilian market of electricity and competitiveness between companies in the energy sector make the search for useful information and tools that will assist in decision making activities, increase by the concessionaires. An important source of knowledge for these utilities is the time series of energy demand. The identification of behavior patterns and description of events become important for the planning execution, seeking improvements in service quality and financial benefits. This dissertation presents a methodology based on mining and representation tools of time series, in order to extract knowledge that relate series of electricity demand in various substations connected of a electric utility. The method exploits the relationship of duration, coincidence and partial order of events in multi-dimensionals time series. To represent the knowledge is used the language proposed by Mörchen (2005) called Time Series Knowledge Representation (TSKR). We conducted a case study using time series of energy demand of 8 substations interconnected by a ring system, which feeds the metropolitan area of Goiânia-GO, provided by CELG (Companhia Energética de Goiás), responsible for the service of power distribution in the state of Goiás (Brazil). Using the proposed methodology were extracted three levels of knowledge that describe the behavior of the system studied, representing clearly the system dynamics, becoming a tool to assist planning activities

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly the partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage, that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out so as to evaluate the proposed methods using various artificially generated data sets, as well as real world data sets. The results obtained were compared with those from a number of well-known methods existent in the literature

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data clustering is applied to various fields such as data mining, image processing and pattern recognition technique. Clustering algorithms splits a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means Algorithm (FCM) is a fuzzy clustering algorithm most used and discussed in the literature. The performance of the FCM is strongly affected by the selection of the initial centers of the clusters. Therefore, the choice of a good set of initial cluster centers is very important for the performance of the algorithm. However, in FCM, the choice of initial centers is made randomly, making it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers, deterministically, the FCM algorithm, and can also be used in variants of the FCM. In this work these initialization methods were applied in variant ckMeans.With the proposed methods, we intend to obtain a set of initial centers which are close to the real cluster centers. With these new approaches startup if you want to reduce the number of iterations to converge these algorithms and processing time without affecting the quality of the cluster or even improve the quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Educational Data Mining is an application domain in artificial intelligence area that has been extensively explored nowadays. Technological advances and in particular, the increasing use of virtual learning environments have allowed the generation of considerable amounts of data to be investigated. Among the activities to be treated in this context exists the prediction of school performance of the students, which can be accomplished through the use of machine learning techniques. Such techniques may be used for student’s classification in predefined labels. One of the strategies to apply these techniques consists in their combination to design multi-classifier systems, which efficiency can be proven by results achieved in other studies conducted in several areas, such as medicine, commerce and biometrics. The data used in the experiments were obtained from the interactions between students in one of the most used virtual learning environments called Moodle. In this context, this paper presents the results of several experiments that include the use of specific multi-classifier systems systems, called ensembles, aiming to reach better results in school performance prediction that is, searching for highest accuracy percentage in the student’s classification. Therefore, this paper presents a significant exploration of educational data and it shows analyzes of relevant results about these experiments.