52 resultados para algoritmos de agrupamento

em Universidade Federal do Rio Grande do Norte(UFRN)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is the Fuzzy C-Means (FCM). This thesis proposes to implement a new way of calculating the cluster centers in the procedure of FCM algorithm which are called ckMeans, and in some variants of FCM, in particular, here we apply it for those variants that use other distances. The goal of this change is to reduce the number of iterations and processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. Also, we developed an algorithm based on ckMeans to manipulate interval data considering interval membership degrees. This algorithm allows the representation of data without converting interval data into punctual ones, as it happens to other extensions of FCM that deal with interval data. In order to validate the proposed methodologies it was made a comparison between a clustering for ckMeans, K-Means and FCM algorithms (since the algorithm proposed in this paper to calculate the centers is similar to the K-Means) considering three different distances. We used several known databases. In this case, the results of Interval ckMeans were compared with the results of other clustering algorithms when applied to an interval database with minimum and maximum temperature of the month for a given year, referring to 37 cities distributed across continents

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using classic clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. This work presents the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly the partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage, that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out so as to evaluate the proposed methods using various artificially generated data sets, as well as real world data sets. The results obtained were compared with those from a number of well-known methods existent in the literature

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The main goal of this work is to investigate the suitability of applying cluster ensemble techniques (ensembles or committees) to gene expression data. More specifically, we will develop experiments with three diferent cluster ensembles methods, which have been used in many works in literature: coassociation matrix, relabeling and voting, and ensembles based on graph partitioning. The inputs for these methods will be the partitions generated by three clustering algorithms, representing diferent paradigms: kmeans, ExpectationMaximization (EM), and hierarchical method with average linkage. These algorithms have been widely applied to gene expression data. In general, the results obtained with our experiments indicate that the cluster ensemble methods present a better performance when compared to the individual techniques. This happens mainly for the heterogeneous ensembles, that is, ensembles built with base partitions generated with diferent clustering algorithms

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Data clustering is applied to various fields such as data mining, image processing and pattern recognition technique. Clustering algorithms splits a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means Algorithm (FCM) is a fuzzy clustering algorithm most used and discussed in the literature. The performance of the FCM is strongly affected by the selection of the initial centers of the clusters. Therefore, the choice of a good set of initial cluster centers is very important for the performance of the algorithm. However, in FCM, the choice of initial centers is made randomly, making it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers, deterministically, the FCM algorithm, and can also be used in variants of the FCM. In this work these initialization methods were applied in variant ckMeans.With the proposed methods, we intend to obtain a set of initial centers which are close to the real cluster centers. With these new approaches startup if you want to reduce the number of iterations to converge these algorithms and processing time without affecting the quality of the cluster or even improve the quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Symbolic Data Analysis (SDA) main aims to provide tools for reducing large databases to extract knowledge and provide techniques to describe the unit of such data in complex units, as such, interval or histogram. The objective of this work is to extend classical clustering methods for symbolic interval data based on interval-based distance. The main advantage of using an interval-based distance for interval-based data lies on the fact that it preserves the underlying imprecision on intervals which is usually lost when real-valued distances are applied. This work includes an approach allow existing indices to be adapted to interval context. The proposed methods with interval-based distances are compared with distances punctual existing literature through experiments with simulated data and real data interval

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This master dissertation presents the study and implementation of inteligent algorithms to monitor the measurement of sensors involved in natural gas custody transfer processes. To create these algoritmhs Artificial Neural Networks are investigated because they have some particular properties, such as: learning, adaptation, prediction. A neural predictor is developed to reproduce the sensor output dynamic behavior, in such a way that its output is compared to the real sensor output. A recurrent neural network is used for this purpose, because of its ability to deal with dynamic information. The real sensor output and the estimated predictor output work as the basis for the creation of possible sensor fault detection and diagnosis strategies. Two competitive neural network architectures are investigated and their capabilities are used to classify different kinds of faults. The prediction algorithm and the fault detection classification strategies, as well as the obtained results, are presented

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent years, the DFA introduced by Peng, was established as an important tool capable of detecting long-range autocorrelation in time series with non-stationary. This technique has been successfully applied to various areas such as: Econophysics, Biophysics, Medicine, Physics and Climatology. In this study, we used the DFA technique to obtain the Hurst exponent (H) of the profile of electric density profile (RHOB) of 53 wells resulting from the Field School of Namorados. In this work we want to know if we can or not use H to spatially characterize the spatial data field. Two cases arise: In the first a set of H reflects the local geology, with wells that are geographically closer showing similar H, and then one can use H in geostatistical procedures. In the second case each well has its proper H and the information of the well are uncorrelated, the profiles show only random fluctuations in H that do not show any spatial structure. Cluster analysis is a method widely used in carrying out statistical analysis. In this work we use the non-hierarchy method of k-means. In order to verify whether a set of data generated by the k-means method shows spatial patterns, we create the parameter Ω (index of neighborhood). High Ω shows more aggregated data, low Ω indicates dispersed or data without spatial correlation. With help of this index and the method of Monte Carlo. Using Ω index we verify that random cluster data shows a distribution of Ω that is lower than actual cluster Ω. Thus we conclude that the data of H obtained in 53 wells are grouped and can be used to characterize space patterns. The analysis of curves level confirmed the results of the k-means

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper artificial neural network (ANN) based on supervised and unsupervised algorithms were investigated for use in the study of rheological parameters of solid pharmaceutical excipients, in order to develop computational tools for manufacturing solid dosage forms. Among four supervised neural networks investigated, the best learning performance was achieved by a feedfoward multilayer perceptron whose architectures was composed by eight neurons in the input layer, sixteen neurons in the hidden layer and one neuron in the output layer. Learning and predictive performance relative to repose angle was poor while to Carr index and Hausner ratio (CI and HR, respectively) showed very good fitting capacity and learning, therefore HR and CI were considered suitable descriptors for the next stage of development of supervised ANNs. Clustering capacity was evaluated for five unsupervised strategies. Network based on purely unsupervised competitive strategies, classic "Winner-Take-All", "Frequency-Sensitive Competitive Learning" and "Rival-Penalize Competitive Learning" (WTA, FSCL and RPCL, respectively) were able to perform clustering from database, however this classification was very poor, showing severe classification errors by grouping data with conflicting properties into the same cluster or even the same neuron. On the other hand it could not be established what was the criteria adopted by the neural network for those clustering. Self-Organizing Maps (SOM) and Neural Gas (NG) networks showed better clustering capacity. Both have recognized the two major groupings of data corresponding to lactose (LAC) and cellulose (CEL). However, SOM showed some errors in classify data from minority excipients, magnesium stearate (EMG) , talc (TLC) and attapulgite (ATP). NG network in turn performed a very consistent classification of data and solve the misclassification of SOM, being the most appropriate network for classifying data of the study. The use of NG network in pharmaceutical technology was still unpublished. NG therefore has great potential for use in the development of software for use in automated classification systems of pharmaceutical powders and as a new tool for mining and clustering data in drug development

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The objective in the facility location problem with limited distances is to minimize the sum of distance functions from the facility to the customers, but with a limit on each distance, after which the corresponding function becomes constant. The problem has applications in situations where the service provided by the facility is insensitive after a given threshold distance (eg. fire station location). In this work, we propose a global optimization algorithm for the case in which there are lower and upper limits on the numbers of customers that can be served

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The bidimensional periodic structures called frequency selective surfaces have been well investigated because of their filtering properties. Similar to the filters that work at the traditional radiofrequency band, such structures can behave as band-stop or pass-band filters, depending on the elements of the array (patch or aperture, respectively) and can be used for a variety of applications, such as: radomes, dichroic reflectors, waveguide filters, artificial magnetic conductors, microwave absorbers etc. To provide high-performance filtering properties at microwave bands, electromagnetic engineers have investigated various types of periodic structures: reconfigurable frequency selective screens, multilayered selective filters, as well as periodic arrays printed on anisotropic dielectric substrates and composed by fractal elements. In general, there is no closed form solution directly from a given desired frequency response to a corresponding device; thus, the analysis of its scattering characteristics requires the application of rigorous full-wave techniques. Besides that, due to the computational complexity of using a full-wave simulator to evaluate the frequency selective surface scattering variables, many electromagnetic engineers still use trial-and-error process until to achieve a given design criterion. As this procedure is very laborious and human dependent, optimization techniques are required to design practical periodic structures with desired filter specifications. Some authors have been employed neural networks and natural optimization algorithms, such as the genetic algorithms and the particle swarm optimization for the frequency selective surface design and optimization. This work has as objective the accomplishment of a rigorous study about the electromagnetic behavior of the periodic structures, enabling the design of efficient devices applied to microwave band. For this, artificial neural networks are used together with natural optimization techniques, allowing the accurate and efficient investigation of various types of frequency selective surfaces, in a simple and fast manner, becoming a powerful tool for the design and optimization of such structures

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problems of combinatory optimization have involved a large number of researchers in search of approximative solutions for them, since it is generally accepted that they are unsolvable in polynomial time. Initially, these solutions were focused on heuristics. Currently, metaheuristics are used more for this task, especially those based on evolutionary algorithms. The two main contributions of this work are: the creation of what is called an -Operon- heuristic, for the construction of the information chains necessary for the implementation of transgenetic (evolutionary) algorithms, mainly using statistical methodology - the Cluster Analysis and the Principal Component Analysis; and the utilization of statistical analyses that are adequate for the evaluation of the performance of the algorithms that are developed to solve these problems. The aim of the Operon is to construct good quality dynamic information chains to promote an -intelligent- search in the space of solutions. The Traveling Salesman Problem (TSP) is intended for applications based on a transgenetic algorithmic known as ProtoG. A strategy is also proposed for the renovation of part of the chromosome population indicated by adopting a minimum limit in the coefficient of variation of the adequation function of the individuals, with calculations based on the population. Statistical methodology is used for the evaluation of the performance of four algorithms, as follows: the proposed ProtoG, two memetic algorithms and a Simulated Annealing algorithm. Three performance analyses of these algorithms are proposed. The first is accomplished through the Logistic Regression, based on the probability of finding an optimal solution for a TSP instance by the algorithm being tested. The second is accomplished through Survival Analysis, based on a probability of the time observed for its execution until an optimal solution is achieved. The third is accomplished by means of a non-parametric Analysis of Variance, considering the Percent Error of the Solution (PES) obtained by the percentage in which the solution found exceeds the best solution available in the literature. Six experiments have been conducted applied to sixty-one instances of Euclidean TSP with sizes of up to 1,655 cities. The first two experiments deal with the adjustments of four parameters used in the ProtoG algorithm in an attempt to improve its performance. The last four have been undertaken to evaluate the performance of the ProtoG in comparison to the three algorithms adopted. For these sixty-one instances, it has been concluded on the grounds of statistical tests that there is evidence that the ProtoG performs better than these three algorithms in fifty instances. In addition, for the thirty-six instances considered in the last three trials in which the performance of the algorithms was evaluated through PES, it was observed that the PES average obtained with the ProtoG was less than 1% in almost half of these instances, having reached the greatest average for one instance of 1,173 cities, with an PES average equal to 3.52%. Therefore, the ProtoG can be considered a competitive algorithm for solving the TSP, since it is not rare in the literature find PESs averages greater than 10% to be reported for instances of this size.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work presents a set of intelligent algorithms with the purpose of correcting calibration errors in sensors and reducting the periodicity of their calibrations. Such algorithms were designed using Artificial Neural Networks due to its great capacity of learning, adaptation and function approximation. Two approaches willbe shown, the firstone uses Multilayer Perceptron Networks to approximate the many shapes of the calibration curve of a sensor which discalibrates in different time points. This approach requires the knowledge of the sensor s functioning time, but this information is not always available. To overcome this need, another approach using Recurrent Neural Networks was proposed. The Recurrent Neural Networks have a great capacity of learning the dynamics of a system to which it was trained, so they can learn the dynamics of a sensor s discalibration. Knowingthe sensor s functioning time or its discalibration dynamics, it is possible to determine how much a sensor is discalibrated and correct its measured value, providing then, a more exact measurement. The algorithms proposed in this work can be implemented in a Foundation Fieldbus industrial network environment, which has a good capacity of device programming through its function blocks, making it possible to have them applied to the measurement process

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The predictive control technique has gotten, on the last years, greater number of adepts in reason of the easiness of adjustment of its parameters, of the exceeding of its concepts for multi-input/multi-output (MIMO) systems, of nonlinear models of processes could be linearised around a operating point, so can clearly be used in the controller, and mainly, as being the only methodology that can take into consideration, during the project of the controller, the limitations of the control signals and output of the process. The time varying weighting generalized predictive control (TGPC), studied in this work, is one more an alternative to the several existing predictive controls, characterizing itself as an modification of the generalized predictive control (GPC), where it is used a reference model, calculated in accordance with parameters of project previously established by the designer, and the application of a new function criterion, that when minimized offers the best parameters to the controller. It is used technique of the genetic algorithms to minimize of the function criterion proposed and searches to demonstrate the robustness of the TGPC through the application of performance, stability and robustness criterions. To compare achieves results of the TGPC controller, the GCP and proportional, integral and derivative (PID) controllers are used, where whole the techniques applied to stable, unstable and of non-minimum phase plants. The simulated examples become fulfilled with the use of MATLAB tool. It is verified that, the alterations implemented in TGPC, allow the evidence of the efficiency of this algorithm