14 results for Data clustering

in Reposit


Relevance:

100.00%

Publisher:

Abstract:

Identifying non-technical losses has been paramount in the last decade. Since datasets contain hundreds of legal and illegal profiles, a method is needed to group the data into subprofiles in order to narrow the search for consumers who commit large frauds. In this context, an electric power company may be interested in examining a specific profile of illegal consumer in greater depth. In this paper, we introduce the Optimum-Path Forest (OPF) clustering technique to this task, and we evaluate its behavior on a dataset provided by a Brazilian electric power company with different values of an OPF parameter. © 2011 IEEE.

Relevance:

70.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

70.00%

Publisher:

Abstract:

The growth in new electronic devices has considerably increased the acquisition of spatial data; hence these data are becoming more and more widely used. As with conventional data, spatial data need to be analyzed so that interesting information can be retrieved from them. Data clustering techniques can therefore be used to extract clusters from a set of spatial data. However, current approaches do not consider the implicit semantics that exist between a region and an object’s attributes. This paper presents an approach that enhances the spatial data mining process so that it can use the semantics that exist within a region. A framework, OntoSDM, was developed that enables spatial data mining algorithms to communicate with ontologies in order to enhance the algorithms’ results. The experiments demonstrated a semantically improved result, generating more interesting clusters and therefore reducing the manual analysis work of an expert.

Relevance:

30.00%

Publisher:

Abstract:

The taxonomy of the N₂-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained from the analysis of phenotypic and genotypic properties. This paper presents an application of a method aimed at identifying possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. Stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions, with three restriction enzymes per region. The proposed method uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is performed by introducing perturbations through sub-sampling techniques. The method proved effective in grouping the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, along with sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.
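The two ingredients named in this abstract, average-linkage clustering and stability under sub-sampling, can be illustrated with a minimal pure-Python sketch. It is not the authors' procedure: the pairwise co-membership agreement score, the `fraction` and `n_rounds` parameters, and the toy one-dimensional data are all assumptions made for illustration.

```python
import random
from itertools import combinations


def average_linkage(dist, n_clusters):
    """Agglomerative clustering on a distance matrix using average linkage."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average of all pairwise distances between the two clusters.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    return clusters


def labels_from(clusters):
    """Map each item index to the index of the cluster that contains it."""
    return {item: k for k, members in enumerate(clusters) for item in members}


def subsample_stability(dist, n_clusters, fraction=0.8, n_rounds=20, seed=0):
    """Assumed stability score: share of item pairs whose co-membership in a
    sub-sampled clustering matches their co-membership in the full clustering."""
    rng = random.Random(seed)
    n = len(dist)
    full = labels_from(average_linkage(dist, n_clusters))
    size = min(n, max(2, n_clusters, int(fraction * n)))
    agree = total = 0
    for _ in range(n_rounds):
        keep = sorted(rng.sample(range(n), size))
        sub = [[dist[i][j] for j in keep] for i in keep]
        sub_labels = labels_from(average_linkage(sub, n_clusters))
        for p, q in combinations(range(size), 2):
            same_full = full[keep[p]] == full[keep[q]]
            same_sub = sub_labels[p] == sub_labels[q]
            agree += same_full == same_sub
            total += 1
    return agree / total


# Hypothetical 1-D measurements forming two well-separated groups.
values = [0.0, 0.5, 1.0, 1.5, 10.0, 10.5, 11.0, 11.5]
dist = [[abs(x - y) for y in values] for x in values]

print(sorted(sorted(c) for c in average_linkage(dist, 2)))
print(subsample_stability(dist, 2))
```

A score close to 1 means the grouping survives the removal of random strains; lower scores suggest that the clusters depend on a few individual samples.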

Relevance:

30.00%

Publisher:

Abstract:

The development of strategies for structural health monitoring (SHM) has become increasingly important because of the need to prevent undesirable damage. This paper describes an approach to this problem using vibration data. It involves a three-stage process: reduction of the time-series data using principal component analysis (PCA), development of a data-based auto-regressive moving average (ARMA) model using data from an undamaged structure, and classification of whether or not the structure is damaged using a fuzzy clustering approach. The approach is applied to data from a benchmark structure from Los Alamos National Laboratory, USA. Two fuzzy clustering algorithms are compared: fuzzy c-means (FCM) and Gustafson-Kessel (GK). It is shown that while both fuzzy clustering algorithms are effective, the GK algorithm marginally outperforms the FCM algorithm. (C) 2008 Elsevier Ltd. All rights reserved.
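As an illustration of the final stage only, the standard fuzzy c-means updates can be sketched in pure Python as follows. This is a minimal sketch, not the paper's pipeline: the PCA and ARMA stages are omitted, the deterministic choice of initial centers is an assumption, and the sample points are hypothetical feature vectors rather than benchmark data.

```python
from math import sqrt


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy c-means with fuzzifier m > 1.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j. Initial centers are taken from evenly spaced data points
    (an assumption chosen here to keep the sketch deterministic).
    """
    n, dim = len(points), len(points[0])
    step = max(n_clusters - 1, 1)
    centers = [list(points[(j * (n - 1)) // step]) for j in range(n_clusters)]
    u = None
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        for p in points:
            d = [max(sqrt(sum((p[t] - c[t]) ** 2 for t in range(dim))), 1e-12)
                 for c in centers]
            new_u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(n_clusters))
                          for j in range(n_clusters)])
        # Center update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(n_clusters):
            w = [row[j] ** m for row in new_u]
            total = sum(w)
            centers.append([sum(w[i] * points[i][t] for i in range(n)) / total
                            for t in range(dim)])
        shift = None if u is None else max(
            abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift is not None and shift < tol:
            break
    return centers, u


# Hypothetical 2-D feature vectors: three "undamaged" and three "damaged".
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, u = fuzzy_c_means(points, 2)
for point, row in zip(points, u):
    print(point, [round(v, 3) for v in row])
```

The Gustafson-Kessel variant differs in that each cluster adapts its own distance norm from a fuzzy covariance matrix, which lets it follow elongated clusters; it is not implemented here.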

Relevance:

30.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

30.00%

Publisher:

Abstract:

The frequency of adenine mononucleotides (A), dinucleotides (AA) and clusters, and the positions of clusters, were studied in 502 molecules of the 5S rRNA. All frequencies were reduced in the evolutionary lines of vertebrates, plants and fungi, in parallel with increasing organismic complexity. No change was observed in invertebrates. All frequencies were increased in mitochondria, plastids and mycoplasmas. The presumed relatives of the ancestors of the organelles, Rhodobacteria alpha and Cyanobacteria, showed intermediate values relative to the eubacterial averages. Firmibacteria showed a very high number of cluster sites. Clusters were more frequent in single-stranded regions in all organisms. The routes of organelles and mycoplasmas accumulated clusters at faster rates in double-stranded regions. Rates of change were higher for AA and clusters than for A in plants, vertebrates and organelles, higher for cluster sites and A in mycoplasmas, and higher for AA and A in fungi. These data indicated that selection pressures acted more strongly on adenine clustering than on adenine frequency. It is proposed that AA and clusters, as sites of lower informational content, have the property of tolerating positional variation in the sites of other molecules (or other regions of the same molecule) that interact with the adenines. This reasoning was consistent with the degrees of genic polymorphism, low in plants and vertebrates and high in invertebrates. In the eubacteria endosymbiotic or parasitic to eukaryotes, the more tolerant RNA would be better adapted to interactions with the homologous nucleus-derived ribosomal proteins; the intermediate values observed in their precursors were interpreted as preadaptive. Among other groups, only the Deinococcus-Thermus eubacteria showed excessive AA and cluster contents, possibly related to their peculiar tolerance to mutagens, and the Ciliates showed excessive AA contents, indicative of retention of primitive characters.
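The kind of count described in this abstract can be sketched as below. The abstract does not define its categories precisely, so the definitions used here are assumptions: a mononucleotide is an isolated A, a dinucleotide is a run of exactly two A's, and a cluster is a run of three or more. The function name and the sequence are hypothetical.

```python
import re


def adenine_profile(sequence):
    """Count adenine runs in an RNA or DNA sequence.

    Assumed definitions (not taken from the paper): run of 1 = mononucleotide,
    run of 2 = dinucleotide, run of 3 or more = cluster.
    """
    seq = sequence.upper()
    profile = {"A": 0, "AA": 0, "clusters": 0, "cluster_sites": []}
    for match in re.finditer(r"A+", seq):
        length = match.end() - match.start()
        if length == 1:
            profile["A"] += 1
        elif length == 2:
            profile["AA"] += 1
        else:
            profile["clusters"] += 1
            # Record the 0-based start position and length of each cluster.
            profile["cluster_sites"].append((match.start(), length))
    profile["adenine_fraction"] = seq.count("A") / len(seq) if seq else 0.0
    return profile


# Hypothetical fragment, not a real 5S rRNA sequence.
print(adenine_profile("GAUACAAGGAAAUCAAAAG"))
```

Normalizing the counts by sequence length, as `adenine_fraction` does for total adenines, would be needed before comparing molecules of different sizes.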

Relevance:

30.00%

Publisher:

Abstract:

Thirteen species of Coffea were studied for five enzyme systems, including alpha and beta esterase, alkaline phosphatase, acid phosphatase, malate dehydrogenase and acid dehydrogenase. Three coefficients of similarity (Simple Matching, Jaccard and Ochiai) and three different clustering methods (Single Linkage, Complete Linkage and Unweighted Pair Group Method with Arithmetic Averages, UPGMA) were used to analyse the data. The phylogenetic relationships among the twelve diploid species, and between them and the tetraploid species C. arabica, showed that similarity among species of the same subsection is not always greater than among species of different subsections. In addition, although there are several similarity groups in common, established by isoenzymatic polymorphism, morphological characteristics, chemical data, crossability and geographic distribution, there is no common trend among the phylogenetic relationships indicated by all these different evaluation procedures.
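The three similarity coefficients named above have standard definitions for binary presence/absence data, sketched here in Python. With a = bands present in both profiles, b and c = bands present in only one, and d = bands absent from both, Simple Matching is (a + d) / (a + b + c + d), Jaccard is a / (a + b + c), and Ochiai is a / sqrt((a + b)(a + c)). The band profiles below are hypothetical and are not data from the study.

```python
from math import sqrt


def contingency(x, y):
    """Return (a, b, c, d) for two equal-length presence/absence profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    a = sum(1 for p, q in zip(x, y) if p and q)          # present in both
    b = sum(1 for p, q in zip(x, y) if p and not q)      # only in x
    c = sum(1 for p, q in zip(x, y) if not p and q)      # only in y
    d = sum(1 for p, q in zip(x, y) if not p and not q)  # absent from both
    return a, b, c, d


def simple_matching(x, y):
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def ochiai(x, y):
    a, b, c, _ = contingency(x, y)
    denom = sqrt((a + b) * (a + c))
    return a / denom if denom else 0.0


# Hypothetical isoenzyme band profiles (1 = band present).
species_1 = [1, 1, 0, 1, 0, 0, 1, 0]
species_2 = [1, 0, 0, 1, 1, 0, 1, 0]

print(simple_matching(species_1, species_2))  # 0.75
print(jaccard(species_1, species_2))          # 0.6
print(ochiai(species_1, species_2))           # 0.75
```

Simple Matching counts shared absences as agreement while Jaccard and Ochiai ignore them, which is one reason the coefficients can rank the same pair of species differently.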

Relevance:

30.00%

Publisher:

Abstract:

The significant volume of work accidents in cities causes a significant loss to society. The development of Spatial Data Mining technologies presents a new perspective for extracting knowledge from the correlation between conventional and spatial attributes. One of the most important Spatial Data Mining techniques is Spatial Clustering, which groups similar spatial objects to find a distribution of patterns, taking into account the geographical position of the objects. Applying this technique to the health area will provide information that can contribute to the planning of more adequate strategies for preventing work accidents. The original contribution of this work is to present an application of tools developed for Spatial Clustering, which supply a set of graphic resources that have helped to discover knowledge and support management in the area of work accidents. © 2011 IEEE.

Relevance:

30.00%

Publisher:

Abstract:

Structural Health Monitoring (SHM) denotes a system with the ability to detect and interpret adverse changes in a structure. One of the critical challenges for the practical implementation of an SHM system is the ability to detect damage under changing environmental conditions. This paper aims to characterize the effects of temperature, load and damage on the sensor measurements obtained with piezoelectric transducer (PZT) patches. Data sets are collected on thin aluminum specimens under different environmental conditions and artificially induced damage states. A fuzzy clustering algorithm is used to organize the sensor measurements into a set of clusters, which makes it possible to attribute the variation in sensor data to temperature, load or any induced damage.

Relevance:

30.00%

Publisher:

Abstract:

Nowadays, organizations face the problem of keeping their information protected, available and trustworthy. In this context, machine learning techniques have been extensively applied to this task. Since manual labeling is very expensive, several works attempt to handle intrusion detection with traditional clustering algorithms. In this paper, we introduce a new pattern recognition technique called Optimum-Path Forest (OPF) clustering to this task. Experiments on three public datasets have shown that the OPF classifier may be a suitable tool for detecting intrusions on computer networks, since it outperformed some state-of-the-art unsupervised techniques. © 2012 IEEE.

Relevance:

30.00%

Publisher:

Abstract:

Many topics related to association mining have received attention in the research community, especially those focused on the discovery of interesting knowledge. A promising approach related to this topic is the application of clustering in the pre-processing step to help the user find the relevant associative patterns of the domain. In this paper, we propose nine metrics to support the evaluation of this kind of approach. The metrics are important since they provide criteria to: (a) analyze the methodologies, (b) identify their positive and negative aspects, (c) carry out comparisons among them and, therefore, (d) help users select the most suitable solution for their problems. Some experiments were carried out to show how the metrics can be used and how useful they are. © 2013 Springer-Verlag GmbH.

Relevance:

30.00%

Publisher:

Abstract:

Issues related to association mining have received attention, especially those aiming to discover and facilitate the search for interesting patterns. A promising approach in this context is the application of clustering in the pre-processing step. In this paper, eleven metrics are proposed to provide an assessment procedure that supports the evaluation of this kind of approach. To propose the metrics, a subjective evaluation was carried out. The metrics are important since they provide criteria to: (a) analyze the methodologies, (b) identify their positive and negative aspects, (c) carry out comparisons among them and, therefore, (d) help users select the most suitable solution for their problems. In addition, the metrics make users think about aspects related to the problems and provide a flexible way to solve them. Some experiments were carried out to show how the metrics can be used and how useful they are.