856 results for Data Driven Clustering


Relevance: 30.00%

Abstract:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK, Multi-Objective Clustering with automatic K-determination, and MOCLE, Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms can often contain a high number of partitions, making it difficult for an expert to manually analyze all of them. To deal with this problem, we present two selection strategies, both based on the corrected Rand index, to choose a subset of solutions. To test them, they are applied to the sets of solutions produced by MOCK and MOCLE on several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both proposed selection strategies are very effective. They can significantly reduce the number of solutions and, at the same time, keep the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.
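As a rough illustration of how the corrected (adjusted) Rand index can be used to thin out a set of partitions, the sketch below computes the index from a contingency count and then keeps a partition only if it is not too similar to one already kept. The greedy filter, the function names, and the `threshold` value are assumptions made for this example; they are not the two selection strategies proposed in the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand(a, b):
    """Adjusted (corrected) Rand index between two hard partitions.

    `a` and `b` are equal-length sequences of cluster labels.
    """
    total = comb(len(a), 2)
    if total == 0:
        return 1.0  # fewer than two points: partitions trivially agree
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_ij - expected) / (max_index - expected)


def select_diverse(partitions, threshold=0.9):
    """Greedy filter (illustrative only): drop near-duplicate partitions."""
    kept = []
    for p in partitions:
        if all(adjusted_rand(p, q) < threshold for q in kept):
            kept.append(p)
    return kept


candidates = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
reduced = select_diverse(candidates)
```

Here the second candidate is a relabeling of the first, so only two of the three partitions survive the filter.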

Relevance: 30.00%

Abstract:

There is a family of well-known external clustering validity indexes to measure the degree of compatibility or similarity between two hard partitions of a given data set, including partitions with different numbers of categories. A unified, fully equivalent set-theoretic formulation for an important class of such indexes was derived and extended to the fuzzy domain in a previous work by the author [Campello, R.J.G.B., 2007. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Lett., 28, 833-841]. However, the proposed fuzzy set-theoretic formulation is not valid as a general approach for comparing two fuzzy partitions of data. Instead, it is an approach for comparing a fuzzy partition against a hard referential partition of the data into mutually disjoint categories. In this paper, generalized external indexes for comparing two data partitions with overlapping categories are introduced. These indexes can be used as general measures for comparing two partitions of the same data set into overlapping categories. An important issue that is seldom touched on in the literature is also addressed in the paper, namely, how to compare two partitions of different subsamples of data. A number of pedagogical examples and three simulation experiments are presented and analyzed in detail. A review of recent related work compiled from the literature is also provided. (c) 2010 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

A conceptual problem that appears in different contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices depending on the application scenario at hand. A variant of the Product-Moment correlation coefficient and a weighted formulation for the Goodman-Kruskal and Kendall's indexes are derived that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help suggest scenarios within the context of clustering analysis that are possibly more appropriate for the application of each index. (C) 2008 Elsevier Inc. All rights reserved.
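To make the idea of a rank-based sequence correlation index concrete, here is a minimal sketch of the standard, unweighted Goodman-Kruskal gamma, which counts concordant and discordant pairs and ignores ties. It is not the weighted formulation derived in the paper, and the function name is chosen for this example only.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Unweighted Goodman-Kruskal gamma between two numeric sequences."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # the pair is ordered the same way in both
        elif sign < 0:
            discordant += 1   # the pair is ordered in opposite ways
        # sign == 0 means a tie in x or y; ties are ignored
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


gamma = goodman_kruskal_gamma([1, 2, 3], [1, 3, 2])
```

Because only the ordering of pairs matters, the index is insensitive to the magnitudes of the values, which is one of the properties the abstract says should be weighed against the application scenario.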

Relevance: 30.00%

Abstract:

This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and the evolutionary algorithms of interest are provided. Examples with computational experiments and statistical analyses are also presented.

Relevance: 30.00%

Abstract:

Clustering quality or validation indices allow the evaluation of the quality of a clustering in order to support the selection of a specific partition or clustering structure in its natural unsupervised environment, where the real solution is unknown or not available. In this paper, we investigate the use of quality indices, mostly based on the concepts of clusters' compactness and separation, for the evaluation of clustering results (partitions in particular). This work intends to offer a general perspective regarding the appropriate use of quality indices for the purpose of clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, key issues regarding the practical use of quality indices are addressed. A general methodological approach is presented that considers the identification of appropriate index thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.

Relevance: 30.00%

Abstract:

Neutron multiplicities for several targets and spallation products of proton-induced reactions in thin targets of interest to an accelerator-driven system, obtained with the CRISP code, are reported. This code is a Monte Carlo calculation that simulates the intranuclear cascade and the evaporation/fission competition processes. Results are compared with experimental data, and the agreement can be considered quite satisfactory over a very broad energy range of incident particles and for different targets.

Relevance: 30.00%

Abstract:

Specific choices about how to represent complex networks can have a substantial impact on the execution time required for the construction and analysis of those structures. In this work we report a comparison of the effects of representing complex networks statically by adjacency matrices or dynamically by adjacency lists. Three theoretical models of complex networks are considered: two types of Erdős-Rényi as well as the Barabási-Albert model. We investigated the effect of the different representations with respect to the construction and measurement of several topological properties (i.e. degree, clustering coefficient, shortest path length, and betweenness centrality). We found that different forms of representation generally have a substantial effect on the execution time, with the sparse representation frequently resulting in remarkably superior performance. (C) 2011 Elsevier B.V. All rights reserved.
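The two representations compared in this abstract can be sketched as follows for an Erdős-Rényi G(n, p) graph, using node degree as the measured property. All function names, the seed, and the parameter values are assumptions made for this example; the code only illustrates the data structures and does not reproduce the paper's experiments or timings.

```python
import random


def erdos_renyi_edges(n, p, seed=0):
    """Sample the edges of an undirected G(n, p) random graph."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]


def as_matrix(n, edges):
    """Static representation: n x n adjacency matrix, O(n^2) memory."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def as_lists(n, edges):
    """Dynamic representation: adjacency lists, O(n + m) memory."""
    adjacency = [[] for _ in range(n)]
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    return adjacency


def degrees_from_matrix(matrix):
    return [sum(row) for row in matrix]        # scans n entries per node


def degrees_from_lists(adjacency):
    return [len(neighbors) for neighbors in adjacency]  # O(1) per node


edges = erdos_renyi_edges(n=20, p=0.3)
matrix_degrees = degrees_from_matrix(as_matrix(20, edges))
list_degrees = degrees_from_lists(as_lists(20, edges))
```

Both representations yield the same degrees; they differ in how much memory they hold and how many entries are scanned, which is where a sparse graph favors the list form.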

Relevance: 30.00%

Abstract:

This paper presents groundwater favorability mapping of a fractured terrain in the eastern portion of São Paulo State, Brazil. Remote sensing, airborne geophysical data, photogeologic interpretation, geologic and geomorphologic maps, and geographic information system (GIS) techniques were used. Cross-tabulation between these maps and well yield data allowed groundwater prospective parameters to be defined for a fractured-bedrock aquifer. These prospective parameters are the basis for the favorability analysis, which follows the knowledge-driven method. A multicriteria analysis (weighted linear combination) was carried out to produce a groundwater favorability map, because the prospective parameters have different weights of importance and each parameter has different classes. The groundwater favorability map was tested by cross-tabulation with new well yield data and spring occurrences. The wells with the highest productivity values, as well as all spring occurrences, are situated in the areas mapped as excellent and good favorability. This shows good coherence between the prospective parameters and the well yield, and the importance of GIS techniques for defining target areas for detailed study and well location. (c) 2008 Elsevier B.V. All rights reserved.
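A weighted linear combination of the kind mentioned here can be sketched for a single map cell as below. The parameter names, class scores, and weights are hypothetical values chosen for illustration; the abstract does not list the actual parameters or weights used in the study.

```python
def favorability(class_scores, weights):
    """Weighted linear combination for one map cell.

    class_scores: score of the class the cell falls in, per parameter
    weights: relative importance of each parameter
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * class_scores[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical parameters and values, for illustration only.
weights = {"lineament_density": 2, "lithology": 1, "slope": 1}
cell = {"lineament_density": 4, "lithology": 2, "slope": 2}
score = favorability(cell, weights)
```

Dividing by the sum of the weights keeps the result on the same scale as the class scores, so cells can be binned into favorability classes afterwards.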

Relevance: 30.00%

Abstract:

As a result of urbanization, stormwater runoff flow rates and volumes increase significantly because of growing impervious land cover and the reduced availability of depression storage. Storage tanks are the basic devices for efficiently controlling the flow rate in drainage systems during wet weather. The concept of vacuum-driven detention tanks presented in the paper makes it possible to increase storage capacity by using the space above the free water surface elevation at the inlet channel. Partial vacuum storage yields cost savings by reducing both the horizontal area of the detention tank and the necessary foundation depth. A simulation model of a vacuum-driven storage tank was developed to estimate the potential benefits of its application in an urban drainage system. Although SWMM5 has no direct options for vacuum tanks, existing functions (i.e. control rules) were used to reflect its operation phases. The rainfall data used in the simulations were recorded at a rain gauge in Czestochowa during the years 2010-2012 at a time interval of 10 minutes. The simulation results give an overview of the practical operation and maintenance cost (energy demand) of vacuum-driven storage tanks as a function of the ratio of vacuum-driven volume to total storage capacity. The following conclusions can be drawn from this investigation: vacuum-driven storage tanks are characterized by uncomplicated construction and control systems, and thus can be applied in newly developed as well as existing urban drainage systems; the application of vacuum in underground detention facilities makes it possible to increase the storage capacity of existing reservoirs by using the space above the maximum depth; the possible increase in storage capacity can reach a few dozen percent at relatively low investment cost; vacuum-driven storage tanks can be included in existing simulation software (i.e. SWMM) using options intended for pumping stations (including control and action rules).

Relevance: 30.00%

Abstract:

This article presents the data-rich findings of an experiment with enlisting patron-driven/demand-driven acquisitions (DDA) of ebooks in two ways. The first experiment compared DDA ebook usage against the circulation of newly ordered hardcopy materials, both overall and as ebook vs. print usage within the same subject areas. The second experiment used DDA ebooks as a backup plan for unfunded requests left over at the end of the fiscal year.

Relevance: 30.00%

Abstract:

Libraries are caught in the middle, between static or shrinking budgets on one hand and ever-expanding user needs on the other. How did we get here, and where do we go from here? This paper will offer two perspectives: Part I will present survey results about changing library purchasing habits in light of changing formats, access, business models and user demands. Data from a previous survey on this topic will be compared and updated. Pricing trends and possible futures will be discussed. Part II will briefly trace the history of libraries' roles in scholarly communication and connecting learners with knowledge. From there, we show an example of phasing in a patron-driven/demand-driven and short-term loan e-book program, complete with incorporating these tools in library instruction, research, and portable device loadability for field work.

Relevance: 30.00%

Abstract:

In this work we present a new clustering method that groups the points of a data set into classes. The method is based on an algorithm that links auxiliary clusters obtained using traditional vector quantization techniques. Several approaches developed during the work are described, based on measures of distance or dissimilarity (divergence) between the auxiliary clusters. The new method requires only two pieces of a priori information: the number of auxiliary clusters Na and a threshold distance dt used to decide whether or not to link auxiliary clusters. The number of classes can be found automatically by the method, based on the chosen threshold distance dt, or it can be given as additional information to help choose the correct threshold. Some analyses are carried out and the results are compared with traditional clustering methods. Different dissimilarity metrics are analyzed and a new one is proposed based on the concept of negentropy. Besides grouping the points of a set into classes, a method is proposed for statistically modeling the classes in order to obtain an expression for the probability that a point belongs to one of the classes. Experiments with several values of Na and dt are run on test sets, and the results are analyzed to study the robustness of the method and to consider heuristics for choosing the correct threshold. The work also explores aspects of information theory applied to the calculation of the divergences, specifically the different measures of information and divergence based on the Rényi entropy. The results obtained with the different metrics are compared and discussed. The work also includes an appendix presenting real applications of the proposed method.
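The linking step can be pictured with the sketch below: auxiliary clusters whose centroids are closer than the threshold dt are joined, and the resulting connected components are taken as classes, so the number of classes falls out of dt. This is an assumption-laden illustration: it uses plain Euclidean distance between centroids instead of the divergence measures studied in the work, it takes the auxiliary centroids as given rather than running vector quantization, and the function name is hypothetical.

```python
from math import dist


def link_auxiliary_clusters(centroids, dt):
    """Return one class label per auxiliary cluster centroid."""
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) < dt:
                parent[find(i)] = find(j)  # link the two auxiliary clusters

    # Relabel roots as consecutive class indices 0, 1, 2, ...
    labels = {}
    return [labels.setdefault(find(i), len(labels))
            for i in range(len(centroids))]


aux_centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
classes = link_auxiliary_clusters(aux_centroids, dt=2.0)
```

With these four centroids and dt = 2.0, the method finds two classes; raising dt above 9.0 would merge everything into one.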

Relevance: 30.00%

Abstract:

This work proposes a collaborative system for marking dangerous points along transport routes and generating alerts for drivers. It consists of a proximity warning system for danger points, fed by drivers through a mobile device equipped with GPS. The system consolidates the data provided by several different drivers and generates a common set of points to be used in the warning system. Although the application is designed to protect drivers, the data it generates can also serve as input for the responsible authorities to improve the signage and repair of public roads.
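A proximity check of the kind such a warning system needs can be sketched with the haversine great-circle distance between the driver's GPS position and each stored danger point. The 500 m alert radius, the function names, and the data layout are assumptions for this example; the abstract does not describe how the system computes proximity or consolidates reports.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearby_danger_points(position, danger_points, radius_m=500):
    """Return the danger points within radius_m of the driver's position."""
    return [point for point in danger_points
            if haversine_m(*position, *point) <= radius_m]


driver = (0.0, 0.0)
reported = [(0.001, 0.0), (1.0, 0.0)]  # roughly 111 m and 111 km away
alerts = nearby_danger_points(driver, reported)
```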