801 resultados para Labeling hierarchical clustering
Resumo:
3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon Portugal.
Resumo:
Clustering analysis is a useful tool to detect and monitor disease patterns and, consequently, to contribute for an effective population disease management. Portugal has the highest incidence of tuberculosis in the European Union (in 2012, 21.6 cases per 100.000 inhabitants), although it has been decreasing consistently. Two critical PTB (Pulmonary Tuberculosis) areas, metropolitan Oporto and metropolitan Lisbon regions, were previously identified through spatial and space-time clustering for PTB incidence rate and risk factors. Identifying clusters of temporal trends can further elucidate policy makers about municipalities showing a faster or a slower TB control improvement.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.
Resumo:
In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal are focused on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Although power-line communication (PLC) is not a new technology, its use to support data communication with timing requirements is still the focus of ongoing research. A new infrastructure intended for communication using power lines from a central location to dispersed nodes using inexpensive devices was presented recently. This new infrastructure uses a two-level hierarchical power-line system, together with an IP-based network. Due to the master-slave behaviour of the PLC medium access, together with the inherent dynamic topology of power-line networks, a mechanism to provide end-to-end communication through the two levels of the power-line system must be provided. In this paper we introduce the architecture of the PLC protocol layer that is being implemented for this end.
Resumo:
This paper focus on a demand response model analysis in a smart grid context considering a contingency scenario. A fuzzy clustering technique is applied on the developed demand response model and an analysis is performed for the contingency scenario. Model considerations and architecture are described. The demand response developed model aims to support consumers decisions regarding their consumption needs and possible economic benefits.
Resumo:
Composition is a practice of key importance in software engineering. When real-time applications are composed it is necessary that their timing properties (such as meeting the deadlines) are guaranteed. The composition is performed by establishing an interface between the application and the physical platform. Such an interface does typically contain information about the amount of computing capacity needed by the application. In multiprocessor platforms, the interface should also present information about the degree of parallelism. Recently there have been quite a few interface proposals. However, they are either too complex to be handled or too pessimistic.In this paper we propose the Generalized Multiprocessor Periodic Resource model (GMPR) that is strictly superior to the MPR model without requiring a too detailed description. We describe a method to generate the interface from the application specification. All these methods have been implemented in Matlab routines that are publicly available.
Resumo:
Consider a single processor and a software system. The software system comprises components and interfaces where each component has an associated interface and each component comprises a set of constrained-deadline sporadic tasks. A scheduling algorithm (called global scheduler) determines at each instant which component is active. The active component uses another scheduling algorithm (called local scheduler) to determine which task is selected for execution on the processor. The interface of a component makes certain information about a component visible to other components; the interfaces of all components are used for schedulability analysis. We address the problem of generating an interface for a component based on the tasks inside the component. We desire to (i) incur only a small loss in schedulability analysis due to the interface and (ii) ensure that the amount of space (counted in bits) of the interface is small; this is because such an interface hides as much details of the component as possible. We present an algorithm for generating such an interface.
Resumo:
OBJECTIVE : To analyze the evolution in the prevalence and determinants of malnutrition in children in the semiarid region of Brazil. METHODS : Data were collected from two cross-sectional population-based household surveys that used the same methodology. Clustering sampling was used to collect data from 8,000 families in Ceará, Northeastern Brazil, for the years 1987 and 2007. Acute undernutrition was calculated as weight/age < -2 standard deviation (SD); stunting as height/age < -2 SD; wasting as weight/height < -2 SD. Data on biological and sociodemographic determinants were analyzed using hierarchical multivariate analyses based on a theoretical model. RESULTS : A sample of 4,513 and 1,533 children under three years of age, in 1987 and 2007, respectively, were included in the analyses. The prevalence of acute malnutrition was reduced by 60.0%, from 12.6% in 1987 to 4.7% in 2007, while prevalence of stunting was reduced by 50.0%, from 27.0% in 1987 to 13.0% in 2007. Prevalence of wasting changed little in the period. In 1987, socioeconomic and biological characteristics (family income, mother’s education, toilet and tap water availability, children’s medical consultation and hospitalization, age, sex and birth weight) were significantly associated with undernutrition, stunting and wasting. In 2007, the determinants of malnutrition were restricted to biological characteristics (age, sex and birth weight). Only one socioeconomic characteristic, toilet availability, remained associated with stunting. CONCLUSIONS : Socioeconomic development, along with health interventions, may have contributed to improvements in children’s nutritional status. Birth weight, especially extremely low weight (< 1,500 g), appears as the most important risk factor for early childhood malnutrition.
Resumo:
Wireless Sensor Networks (WSNs) are highly distributed systems in which resource allocation (bandwidth, memory) must be performed efficiently to provide a minimum acceptable Quality of Service (QoS) to the regions where critical events occur. In fact, if resources are statically assigned independently from the location and instant of the events, these resources will definitely be misused. In other words, it is more efficient to dynamically grant more resources to sensor nodes affected by critical events, thus providing better network resource management and reducing endto- end delays of event notification and tracking. In this paper, we discuss the use of a WSN management architecture based on the active network management paradigm to provide the real-time tracking and reporting of dynamic events while ensuring efficient resource utilization. The active network management paradigm allows packets to transport not only data, but also program scripts that will be executed in the nodes to dynamically modify the operation of the network. This presumes the use of a runtime execution environment (middleware) in each node to interpret the script. We consider hierarchical (e.g. cluster-tree, two-tiered architecture) WSN topologies since they have been used to improve the timing performance of WSNs as they support deterministic medium access control protocols.
Resumo:
Biosignals analysis has become widespread, upstaging their typical use in clinical settings. Electrocardiography (ECG) plays a central role in patient monitoring as a diagnosis tool in today's medicine and as an emerging biometric trait. In this paper we adopt a consensus clustering approach for the unsupervised analysis of an ECG-based biometric records. This type of analysis highlights natural groups within the population under investigation, which can be correlated with ground truth information in order to gain more insights about the data. Preliminary results are promising, for meaningful clusters are extracted from the population under analysis. © 2014 EURASIP.
Resumo:
In this paper we present a possible design for a passive RFID tag antenna on paper substrate to be integrated into bottle labels. Considering the application scenario, we verified and determined the permittivity and dissipation factor of the materials in order to simulate all the possible sources that would influence the antenna performance. The measured results reported a maximum reading range of 1.45 m even though the efficiency obtained with the antenna integrated into the bottle was only of 3%. © 2014 IEEE.
Resumo:
This paper focus on a demand response model analysis in a smart grid context considering a contingency scenario. A fuzzy clustering technique is applied on the developed demand response model and an analysis is performed for the contingency scenario. Model considerations and architecture are described. The demand response developed model aims to support consumers decisions regarding their consumption needs and possible economic benefits.
Resumo:
Hierarchical SAPO-11 was synthesized using a commercial Merck carbon as template. Oxidant acid treatments were performed on the carbon matrix in order to investigate its influence on the properties of SAPO-11. Structural, textural and acidic properties of the different materials were evaluated by XRD, SEM, N-2 adsorption, pyridine adsorption followed by IR spectroscopy and thermal analyses. The catalytic behavior of the materials (with 0.5 wt.% Pt, introduced by mechanic mixture with Pt/Al2O3), were studied in the hydroisomerization of n-decane. The hierarchical samples showed higher yields in monobranched isomers than typical microporous SAPO-11, as a direct consequence of the modification on both porosity and acidity, the later one being the most predominant. (C) 2014 Elsevier B.V. All rights reserved.