808 resultados para Agglomerative Hierarchical Clustering
Resumo:
In this paper we present a technique for visualising hierarchical and symmetric, multimodal fitness functions that have been investigated in the evolutionary computation literature. The focus of this technique is on landscapes in moderate-dimensional, binary spaces (i.e., fitness functions defined over {0, 1}(n), for n less than or equal to 16). The visualisation approach involves an unfolding of the hyperspace into a two-dimensional graph, whose layout represents the topology of the space using a recursive relationship, and whose shading defines the shape of the cost surface defined on the space. Using this technique we present case-study explorations of three fitness functions: royal road, hierarchical-if-and-only-if (H-IFF), and hierarchically decomposable functions (HDF). The visualisation approach provides an insight into the properties of these functions, particularly with respect to the size and shape of the basins of attraction around each of the local optima.
Resumo:
OBJECTIVE: To estimate the incidence rate of type 1 diabetes in the urban area of Santiago, Chile, from March 21, 1997 to March 20, 1998, and to assess the spatio-temporal clustering of cases during that period. METHODS: All sixty-one incident cases were located temporally (day of diagnosis) and spatially (place of residence) in the area of study. Knox's method was used to assess spatio-temporal clustering of incident cases. RESULTS: The overall incidence rate of type 1 diabetes was 4.11 cases per 100,000 children aged less than 15 years per year (95% confidence interval: 3.06--5.14). The incidence rate seems to have increased since the last estimate of the incidence calculated for the years 1986--1992 in the metropolitan region of Santiago. Different combinations of space-time intervals have been evaluated to assess spatio-temporal clustering. The smallest p-value was found for the combination of critical distances of 750 meters and 60 days (uncorrected p-value = 0.048). CONCLUSIONS: Although these are preliminary results regarding space-time clustering in Santiago, exploratory analysis of the data method would suggest a possible aggregation of incident cases in space-time coordinates.
Resumo:
A definition of medium voltage (MV) load diagrams was made, based on the data base knowledge discovery process. Clustering techniques were used as support for the agents of the electric power retail markets to obtain specific knowledge of their customers’ consumption habits. Each customer class resulting from the clustering operation is represented by its load diagram. The Two-step clustering algorithm and the WEACS approach based on evidence accumulation (EAC) were applied to an electricity consumption data from a utility client’s database in order to form the customer’s classes and to find a set of representative consumption patterns. The WEACS approach is a clustering ensemble combination approach that uses subsampling and that weights differently the partitions in the co-association matrix. As a complementary step to the WEACS approach, all the final data partitions produced by the different variations of the method are combined and the Ward Link algorithm is used to obtain the final data partition. Experiment results showed that WEACS approach led to better accuracy than many other clustering approaches. In this paper the WEACS approach separates better the customer’s population than Two-step clustering algorithm.
Resumo:
With the electricity market liberalization, the distribution and retail companies are looking for better market strategies based on adequate information upon the consumption patterns of its electricity consumers. A fair insight on the consumers’ behavior will permit the definition of specific contract aspects based on the different consumption patterns. In order to form the different consumers’ classes, and find a set of representative consumption patterns we use electricity consumption data from a utility client’s database and two approaches: Two-step clustering algorithm and the WEACS approach based on evidence accumulation (EAC) for combining partitions in a clustering ensemble. While EAC uses a voting mechanism to produce a co-association matrix based on the pairwise associations obtained from N partitions and where each partition has equal weight in the combination process, the WEACS approach uses subsampling and weights differently the partitions. As a complementary step to the WEACS approach, we combine the partitions obtained in the WEACS approach with the ALL clustering ensemble construction method and we use the Ward Link algorithm to obtain the final data partition. The characterization of the obtained consumers’ clusters was performed using the C5.0 classification algorithm. Experiment results showed that the WEACS approach leads to better results than many other clustering approaches.
Resumo:
The present research paper presents five different clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customer’s behaviour. The obtained knowledge can be used to support a decision tool, not only for utilities but also for consumers. Load profiles can be used by the utilities to identify the aspects that cause system load peaks and enable the development of specific contracts with their customers. The framework presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
Resumo:
The growing importance and influence of new resources connected to the power systems has caused many changes in their operation. Environmental policies and several well know advantages have been made renewable based energy resources largely disseminated. These resources, including Distributed Generation (DG), are being connected to lower voltage levels where Demand Response (DR) must be considered too. These changes increase the complexity of the system operation due to both new operational constraints and amounts of data to be processed. Virtual Power Players (VPP) are entities able to manage these resources. Addressing these issues, this paper proposes a methodology to support VPP actions when these act as a Curtailment Service Provider (CSP) that provides DR capacity to a DR program declared by the Independent System Operator (ISO) or by the VPP itself. The amount of DR capacity that the CSP can assure is determined using data mining techniques applied to a database which is obtained for a large set of operation scenarios. The paper includes a case study based on 27,000 scenarios considering a diversity of distributed resources in a 33 bus distribution network.
Resumo:
Clustering analysis is a useful tool to detect and monitor disease patterns and, consequently, to contribute for an effective population disease management. Portugal has the highest incidence of tuberculosis in the European Union (in 2012, 21.6 cases per 100.000 inhabitants), although it has been decreasing consistently. Two critical PTB (Pulmonary Tuberculosis) areas, metropolitan Oporto and metropolitan Lisbon regions, were previously identified through spatial and space-time clustering for PTB incidence rate and risk factors. Identifying clusters of temporal trends can further elucidate policy makers about municipalities showing a faster or a slower TB control improvement.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.
Resumo:
In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal are focused on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Although power-line communication (PLC) is not a new technology, its use to support data communication with timing requirements is still the focus of ongoing research. A new infrastructure intended for communication using power lines from a central location to dispersed nodes using inexpensive devices was presented recently. This new infrastructure uses a two-level hierarchical power-line system, together with an IP-based network. Due to the master-slave behaviour of the PLC medium access, together with the inherent dynamic topology of power-line networks, a mechanism to provide end-to-end communication through the two levels of the power-line system must be provided. In this paper we introduce the architecture of the PLC protocol layer that is being implemented for this end.
Resumo:
This paper focus on a demand response model analysis in a smart grid context considering a contingency scenario. A fuzzy clustering technique is applied on the developed demand response model and an analysis is performed for the contingency scenario. Model considerations and architecture are described. The demand response developed model aims to support consumers decisions regarding their consumption needs and possible economic benefits.
Resumo:
Composition is a practice of key importance in software engineering. When real-time applications are composed it is necessary that their timing properties (such as meeting the deadlines) are guaranteed. The composition is performed by establishing an interface between the application and the physical platform. Such an interface does typically contain information about the amount of computing capacity needed by the application. In multiprocessor platforms, the interface should also present information about the degree of parallelism. Recently there have been quite a few interface proposals. However, they are either too complex to be handled or too pessimistic.In this paper we propose the Generalized Multiprocessor Periodic Resource model (GMPR) that is strictly superior to the MPR model without requiring a too detailed description. We describe a method to generate the interface from the application specification. All these methods have been implemented in Matlab routines that are publicly available.
Resumo:
Consider a single processor and a software system. The software system comprises components and interfaces where each component has an associated interface and each component comprises a set of constrained-deadline sporadic tasks. A scheduling algorithm (called global scheduler) determines at each instant which component is active. The active component uses another scheduling algorithm (called local scheduler) to determine which task is selected for execution on the processor. The interface of a component makes certain information about a component visible to other components; the interfaces of all components are used for schedulability analysis. We address the problem of generating an interface for a component based on the tasks inside the component. We desire to (i) incur only a small loss in schedulability analysis due to the interface and (ii) ensure that the amount of space (counted in bits) of the interface is small; this is because such an interface hides as much details of the component as possible. We present an algorithm for generating such an interface.
Resumo:
OBJECTIVE : To analyze the evolution in the prevalence and determinants of malnutrition in children in the semiarid region of Brazil. METHODS : Data were collected from two cross-sectional population-based household surveys that used the same methodology. Clustering sampling was used to collect data from 8,000 families in Ceará, Northeastern Brazil, for the years 1987 and 2007. Acute undernutrition was calculated as weight/age < -2 standard deviation (SD); stunting as height/age < -2 SD; wasting as weight/height < -2 SD. Data on biological and sociodemographic determinants were analyzed using hierarchical multivariate analyses based on a theoretical model. RESULTS : A sample of 4,513 and 1,533 children under three years of age, in 1987 and 2007, respectively, were included in the analyses. The prevalence of acute malnutrition was reduced by 60.0%, from 12.6% in 1987 to 4.7% in 2007, while prevalence of stunting was reduced by 50.0%, from 27.0% in 1987 to 13.0% in 2007. Prevalence of wasting changed little in the period. In 1987, socioeconomic and biological characteristics (family income, mother’s education, toilet and tap water availability, children’s medical consultation and hospitalization, age, sex and birth weight) were significantly associated with undernutrition, stunting and wasting. In 2007, the determinants of malnutrition were restricted to biological characteristics (age, sex and birth weight). Only one socioeconomic characteristic, toilet availability, remained associated with stunting. CONCLUSIONS : Socioeconomic development, along with health interventions, may have contributed to improvements in children’s nutritional status. Birth weight, especially extremely low weight (< 1,500 g), appears as the most important risk factor for early childhood malnutrition.