1000 results for CLUSTER VALIDITY


Relevance: 100.00%

Abstract:

The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices with both methodologies and concluded that the results obtained with the proposed methodology are more representative of the actual capabilities of the compared indices.
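As an illustration of what an internal cluster validity index computes, here is a minimal sketch of the silhouette width in plain Python. The abstract does not name the seven indices it compares, so the choice of the silhouette, the Euclidean distance, and the function name `silhouette` are assumptions made for this example only.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width of a hard partition (higher suggests a better partition)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        if denom > 0:  # identical points everywhere would give 0/0; count as 0
            total += (b - a) / denom
    return total / len(points)

# Toy data (hypothetical): two pairs of points far apart from each other.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
natural = silhouette(pts, [0, 0, 1, 1])   # groups the nearby points together
shuffled = silhouette(pts, [0, 1, 0, 1])  # groups distant points together
print(natural > shuffled)
```

An index like this scores a partition using only the data and the labels, which is why its apparent quality in an experiment can depend on whether the clustering algorithm that produced the partition worked correctly.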

Relevance: 60.00%

Abstract:

We propose a new approach to clustering. Our idea is to map cluster formation to coalition formation in cooperative games, and to use the Shapley value of the patterns to identify clusters and cluster representatives. We show that the underlying game is convex, which leads to an efficient biobjective clustering algorithm that we call BiGC. The algorithm yields high-quality clustering with respect to average point-to-center distance (potential) as well as average intracluster point-to-point distance (scatter). We demonstrate the superiority of BiGC over state-of-the-art clustering algorithms (including center-based and multiobjective techniques) through detailed experimentation using standard cluster validity criteria on several benchmark data sets. We also show that BiGC satisfies key clustering properties such as order independence, scale invariance, and richness.
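The two objectives named in this abstract can be sketched as follows. This is not the BiGC algorithm itself, only one plausible reading of "potential" and "scatter". The Euclidean distance, the use of the cluster mean as the center, and the function names are assumptions.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(x) / len(points) for x in zip(*points))

def potential(clusters):
    """Average distance from each point to the center of its own cluster."""
    n = sum(len(c) for c in clusters)
    return sum(math.dist(p, centroid(c)) for c in clusters for p in c) / n

def scatter(clusters):
    """Average distance over all pairs of points that share a cluster."""
    pairs = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs) if pairs else 0.0

# Hypothetical toy partition: each cluster is a list of 2-D points.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(potential(tight), scatter(tight))
```

A biobjective method aims to keep both quantities low at once, since a partition can have compact centers yet still contain widely separated pairs within a cluster.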

Relevance: 60.00%

Abstract:

A novel image segmentation method based on a constraint satisfaction neural network (CSNN) is presented. The new method uses CSNN-based relaxation but with a modified scheme for scanning the image. In the first level of the algorithm, the pixels are visited at wider intervals and with larger neighborhoods. The intervals between pixels and their neighborhoods are reduced in the following stages of the algorithm. This approach helps form more regular segments quickly and consistently. A cluster validity index that determines the number of segments is also added, making the proposed method a fully automatic unsupervised segmentation scheme. The results are compared quantitatively by means of a novel segmentation evaluation criterion. The results are promising.

Relevance: 60.00%

Abstract:

This paper presents five clustering methods for identifying typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behaviour. The obtained knowledge can support a decision tool, not only for utilities but also for consumers. Utilities can use load profiles to identify the aspects that cause system load peaks and to develop specific contracts with their customers. The framework presented in the paper consists of several steps: data pre-processing, application of the clustering algorithms, and evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.

Relevance: 60.00%

Abstract:

This paper characterizes medium voltage (MV) electric power consumers using a data clustering approach. The aim is to identify typical load profiles by selecting the best partition of a power consumption database from a pool of data partitions produced by several clustering algorithms. The best partition is selected using several cluster validity indices. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behavior. The data-mining-based methodology presented in the paper consists of several steps: data pre-processing, application of the clustering algorithms, and evaluation of the quality of the partitions. To validate the approach, a case study with a real database of 1,022 MV consumers was used.
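Selecting the best partition from a pool can be sketched as below. The abstract does not say which validity indices were used. The Davies-Bouldin index (lower is better) is assumed here as a stand-in, and the pool keys `algo_a` and `algo_b`, the 2-D points, and the function names are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(x) / len(points) for x in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index; lower values suggest a better partition.

    Assumes at least two clusters whose centroids are all distinct.
    """
    centres = [centroid(c) for c in clusters]
    # Mean distance of each cluster's members to their own centroid.
    spread = [sum(math.dist(p, m) for p in c) / len(c)
              for c, m in zip(clusters, centres)]
    k = len(clusters)
    ratios = []
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        ratios.append(max(
            (spread[i] + spread[j]) / math.dist(centres[i], centres[j])
            for j in range(k) if j != i
        ))
    return sum(ratios) / k

def select_best_partition(pool):
    """pool maps an algorithm name to the partition (list of clusters) it produced."""
    return min(pool, key=lambda name: davies_bouldin(pool[name]))

# Hypothetical pool of two partitions of the same four points.
pool = {
    "algo_a": [[(0, 0), (0, 2)], [(10, 0), (10, 2)]],
    "algo_b": [[(0, 0), (10, 0)], [(0, 2), (10, 2)]],
}
print(select_best_partition(pool))
```

With several indices, as in the paper, one would compute each index for every partition and combine the rankings. How the authors combine them is not stated in the abstract.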

Relevance: 60.00%

Abstract:

This paper presents the characterization of high voltage (HV) electric power consumers based on a data clustering approach. The typical load profiles (TLP) are obtained by selecting the best partition of a power consumption database from a pool of data partitions produced by several clustering algorithms. The choice of the best partition is supported by several cluster validity indices. The proposed data-mining (DM) based methodology, which covers all steps of the knowledge discovery in databases (KDD) process, includes an automatic data treatment application that preprocesses the initial database, saving time and improving accuracy during this phase. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ consumption behavior. To validate the approach, a case study with a real database of 185 HV consumers was used.

Relevance: 60.00%

Abstract:

One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to the initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms of interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets.
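The systematic (repetitive) baseline described in this abstract can be sketched as follows. It is not the evolutionary algorithm the paper proposes. The Calinski-Harabasz index is an assumed selection criterion, since the abstract does not name one, and the function names, the seed, and the toy data are hypothetical.

```python
import math
import random

def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = rng.sample(points, k)  # random data points as initial prototypes
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous prototype
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def calinski_harabasz(points, labels):
    """Between/within dispersion ratio; higher suggests a better partition."""
    n, used = len(points), sorted(set(labels))
    k = len(used)
    if k < 2 or k >= n:
        return float("-inf")  # index undefined for these cases
    overall = tuple(sum(x) / n for x in zip(*points))
    within = between = 0.0
    for lab in used:
        members = [p for p, l in zip(points, labels) if l == lab]
        m = tuple(sum(x) / len(members) for x in zip(*members))
        within += sum(math.dist(p, m) ** 2 for p in members)
        between += len(members) * math.dist(m, overall) ** 2
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def systematic_search(points, k_values, restarts, seed=0):
    """Run k-means for every k and several random starts; keep the best partition."""
    rng = random.Random(seed)
    best_score, best_labels = float("-inf"), None
    for k in k_values:
        for _ in range(restarts):
            labels, _centroids = kmeans(points, k, rng)
            score = calinski_harabasz(points, labels)
            if score > best_score:
                best_score, best_labels = score, labels
    return best_score, best_labels

# Hypothetical data: two well-separated groups of five points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
score, labels = systematic_search(data, range(2, 5), restarts=10)
print(len(set(labels)))
```

The cost of this baseline grows with the number of candidate values of k times the number of restarts, which is the overhead the evolutionary approach in the paper is meant to reduce.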

Relevance: 60.00%

Abstract:

This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms of interest are provided. Examples with computational experiments and statistical analyses are also presented.
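To show how a fuzzy validity criterion differs from a hard one, here is a minimal sketch of Bezdek's partition coefficient, which works on membership degrees rather than on crisp labels. The abstract does not name the criterion the paper uses, so this index and the names below are assumptions.

```python
def partition_coefficient(memberships):
    """Bezdek's partition coefficient of a fuzzy partition.

    memberships[i][c] is the degree to which point i belongs to cluster c,
    and each row is assumed to sum to 1. The value ranges from 1/c (fully
    ambiguous memberships) to 1 (crisp memberships).
    """
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n

def pick_number_of_clusters(candidates):
    """candidates maps a cluster count to the membership matrix obtained for it."""
    return max(candidates, key=lambda c: partition_coefficient(candidates[c]))

# Hypothetical membership matrices for two candidate cluster counts.
candidates = {
    2: [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]],
    3: [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]],
}
print(pick_number_of_clusters(candidates))
```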

Relevance: 30.00%

Abstract:

Purpose
The Strengths and Difficulties Questionnaire (SDQ) is a behavioural screening tool for children. The SDQ is increasingly used as the primary outcome measure in population health interventions involving children, but it is not preference based; therefore, its role in allocative economic evaluation is limited. The Child Health Utility 9D (CHU9D) is a generic preference-based health-related quality-of-life measure. This study investigates the applicability of the SDQ outcome measure for use in economic evaluations and examines its relationship with the CHU9D by testing previously published mapping algorithms. The aim of the paper is to explore the feasibility of using the SDQ within economic evaluations of school-based population health interventions.
Methods
Data were available from children participating in a cluster randomised controlled trial of the school-based Roots of Empathy programme in Northern Ireland. Utility was calculated using the original and alternative CHU9D tariffs along with two SDQ mapping algorithms. t tests were performed for pairwise differences in utility values from the preference-based tariffs and mapping algorithms.
Results
Mean (standard deviation) SDQ total difficulties and prosocial scores were 12 (3.2) and 8.3 (2.1). Utility values obtained from the original tariff, alternative tariff, and mapping algorithms using five and three SDQ subscales were 0.84 (0.11), 0.80 (0.13), 0.84 (0.05), and 0.83 (0.04), respectively. Each method for calculating utility produced statistically significantly different values except the original tariff and five SDQ subscale algorithm.
Conclusion
Initial evidence suggests the SDQ and CHU9D are related in some of their measurement properties. The mapping algorithm using five SDQ subscales was found to be optimal in predicting mean child health utility. Future research valuing changes in the SDQ scores would contribute to this research.

Relevance: 30.00%

Abstract:

Modelling the fundamental performance limits of wireless sensor networks (WSNs) is of paramount importance for understanding the behaviour of WSNs under worst-case conditions and for making appropriate design choices. To that end, this paper contributes a methodology for modelling cluster-tree WSNs with a mobile sink. We propose closed-form recurrent expressions for computing the worst-case end-to-end delays, buffering and bandwidth requirements across any source-destination path in the cluster tree, assuming an error-free channel. We show how to apply our theoretical results to the specific case of IEEE 802.15.4/ZigBee WSNs. Finally, we demonstrate the validity and analyze the accuracy of our methodology through a comprehensive experimental study.

Relevance: 30.00%

Abstract:

Background: The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. New method: We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). Results: After validating the pipeline on simulated data, we tested it on data from two experiments: a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership.

Relevance: 30.00%

Abstract:

The aim of the present study was to determine whether under-reporting rates vary between dietary pattern clusters. Subjects were sixty-five Brazilian women. During 3 weeks, anthropometric data were collected, total energy expenditure (TEE) was determined by the doubly labelled water method and diet was measured. Energy intake (EI) and the daily frequency of consumption per 1000 kJ of twenty-two food groups were obtained from a FFQ. These frequencies were entered into a cluster analysis procedure in order to obtain dietary patterns. Under-reporters were defined as those who did not lose more than 1 kg of body weight during the study and presented EI:TEE less than 0.82. Three dietary pattern clusters were identified and named according to their most recurrent food groups: sweet foods (SW), starchy foods (ST) and healthy (H). Subjects from the healthy cluster had the lowest mean EI:TEE (SW = 0.86, ST = 0.71 and H = 0.58; P = 0.003) and EI - TEE (SW = -0.49 MJ, ST = -3.20 MJ and H = -5.09 MJ; P = 0.008). The proportion of under-reporters was 45.2 (95 % CI 35.5, 55.0) % in the SW cluster, 58.3 (95 % CI 48.6, 68.0) % in the ST cluster and 70.0 (95 % CI 61.0, 79) % in the H cluster (P = 0.34). Thus, in Brazilian women, under-reporting of EI is not uniformly distributed among dietary pattern clusters and tends to be more severe among subjects from the healthy cluster. This cluster is more consistent with both dietary guidelines and with what lay individuals usually consider 'healthy eating'.
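The under-reporter definition given in this abstract can be written as a small rule. This is a sketch of that rule only. The function and parameter names are hypothetical, and weight change is assumed to be expressed as final minus initial body weight.

```python
def is_under_reporter(energy_intake_mj, tee_mj, weight_change_kg, cutoff=0.82):
    """Apply the abstract's definition of an under-reporter.

    weight_change_kg is final minus initial body weight over the study
    (negative values mean weight loss). A subject counts as an under-reporter
    when she did not lose more than 1 kg and her EI:TEE ratio is below the cutoff.
    """
    lost_more_than_1kg = weight_change_kg < -1.0
    ratio = energy_intake_mj / tee_mj
    return (not lost_more_than_1kg) and ratio < cutoff

# Hypothetical subjects: (reported EI in MJ, TEE in MJ, weight change in kg).
print(is_under_reporter(7.0, 10.0, -0.5))  # stable weight, EI:TEE = 0.70
print(is_under_reporter(7.0, 10.0, -2.0))  # lost more than 1 kg
```

The weight condition matters because a low EI:TEE ratio in someone who is losing weight may reflect genuine under-eating rather than under-reporting.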

Relevance: 30.00%

Abstract:

This study examined validity evidence for the Australian version of the Neighborhood Environment Walkability Scale (NEWS-AU). A stratified two-stage cluster sampling design was used to recruit 2,650 adults from Adelaide (Australia). The sample was drawn from residential addresses within eight high-walkable and eight low-walkable suburbs matched for socio-economic status (SES). Neighborhood walkability was measured using Geographic Information Systems data on dwelling density, intersection density, net retail area, and land-use mix. Participants completed the NEWS-AU and reported weekly minutes of walking for transport and recreation (International Physical Activity Questionnaire [IPAQ]). Multilevel confirmatory factor analysis (MCFA) was used to define the individual- and Census Collection District (CCD)-level measurement model of the NEWS-AU. Seven individual-level and five CCD-level factors were identified. These measurement models were somewhat similar to those of the original Neighborhood Environment Walkability Scale (NEWS). Patterns of associations between the NEWS-AU factors/scales and the walking measures provided some validity evidence for the instrument.

Relevance: 30.00%

Abstract:

Background: The phenomenology of unipolar and bipolar disorders differs in a number of ways, such as the presence of mixed states and atypical features. Conventional depression rating instruments are designed to capture the characteristics of unipolar depression and have limitations in capturing the breadth of bipolar disorder.

Method: The Bipolar Depression Rating Scale (BDRS) was administered together with the Montgomery-Åsberg Depression Rating Scale (MADRS) and Young Mania Rating Scale (YMRS) in a double-blind randomised placebo-controlled clinical trial of N-acetyl cysteine for bipolar disorder (N = 75).

Results: A factor analysis showed a two-factor solution: depression and mixed symptom clusters. The BDRS has strong internal consistency (Cronbach's alpha = 0.917); the depression cluster showed robust correlation with the MADRS (r = 0.865) and the mixed subscale correlated with the YMRS (r = 0.750).

Conclusion: The BDRS has good internal validity and inter-rater reliability and is sensitive to change in the context of a clinical trial.
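For readers unfamiliar with the internal consistency statistic reported above, the following sketch computes Cronbach's alpha from item scores using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The data layout and the function name are assumptions, and the example scores are invented and unrelated to the BDRS trial.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores[i][r] is the score of respondent r on item i.
    Uses sample variances for both the items and the total scores.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(per_respondent) for per_respondent in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Invented scores: three items answered by four respondents.
scores = [
    [1, 2, 3, 4],
    [2, 2, 3, 5],
    [1, 3, 3, 4],
]
print(round(cronbach_alpha(scores), 3))
```

Values close to 1 indicate that the items vary together across respondents, which is the sense in which the abstract describes the BDRS as internally consistent.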

Relevance: 30.00%

Abstract:

Introduction: While the importance and magnitude of the burden of low back pain upon the individual is well recognized, a systematic understanding of the impact of the condition on individuals is currently hampered by the lack of an organized understanding of what aspects of a person’s life are affected and the lack of comprehensive measures for these effects. The aim of the present study was to develop a conceptual and measurement model of the overall burden of low back pain from the individual’s perspective using a validity-driven approach.
Methods: To define the breadth of low back pain burden we conducted three concept-mapping workshops to generate an item pool. Two face-to-face workshops (Australia) were conducted with people with low back pain and clinicians and policy-makers, respectively. A third workshop (USA) was held with international multidisciplinary experts. Multidimensional scaling, cluster analysis, participant input and thematic analyses organized participants’ ideas into clusters of ideas that then informed the conceptual model.
Results: One hundred and ninety-nine statements were generated. Considerable overlap was observed between groups, and four major clusters were identified (Psychosocial, Physical, Treatment and Employment), each with between two and six subclusters. Content analysis revealed that elements of the Psychosocial cluster were sufficiently distinct to be split into Psychological and Social, and a further cluster of elements termed Positive Effects also emerged. Finally, a hypothesized structure was proposed with six domains and 16 subdomains. New domains not previously considered in the back pain field emerged for psychometric verification: loss of independence, worry about the future, and negative or discriminatory actions by others.
Conclusions: Using a grounded approach, an explicit a priori and testable model of the overall burden of low back pain has been proposed that captures the full breadth of the burden experienced by patients and observed by experts.