65 results for Subjective Clustering
Abstract:
In this paper, a comparative study is carried out on the clustering problem using three nature-inspired algorithms, namely the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Cuckoo Search (CS). Cuckoo search is used with Lévy flight, whose heavy-tail property is exploited here. These algorithms are applied to three standard benchmark datasets and one real-time multi-spectral satellite dataset. The results are tabulated and analysed using various techniques. Finally, we conclude that under the given set of parameters, cuckoo search works efficiently for the majority of the datasets and that Lévy flight plays an important role.
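The heavy-tailed step the abstract refers to is commonly drawn with Mantegna's algorithm; the sketch below is illustrative and not taken from the paper (the function names, beta = 1.5, and the step scale alpha are assumptions):

```python
import math
import random

def levy_step(beta=1.5):
    """Draw one Levy-distributed step via Mantegna's algorithm.

    The heavy tail means occasional very long jumps, which lets a
    cuckoo escape local optima while mostly searching locally.
    """
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def new_candidate(position, best, alpha=0.01, beta=1.5):
    """Propose a new candidate solution by a Levy step scaled by the
    distance to the best-known solution (a common CS update form)."""
    return [x + alpha * levy_step(beta) * (x - b)
            for x, b in zip(position, best)]
```

In a clustering setting, `position` would encode a set of candidate centroids and the fitness would be an intra-cluster distance measure.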
Abstract:
This paper illustrates the application of a new technique based on Support Vector Clustering (SVC) for the direct identification of coherent synchronous generators in large interconnected multi-machine power systems. The clustering is based on coherency measures obtained from the time-domain responses of the generators following system disturbances. The proposed clustering algorithm could be integrated into a wide-area measurement system, enabling fast identification of coherent clusters of generators for the construction of dynamic equivalent models. The proposed method is demonstrated on a practical 15-generator, 72-bus system, an equivalent of the Indian Southern grid, to show the effectiveness of this clustering approach. The effects of short-circuit fault locations on coherency are also investigated.
Abstract:
We address the problem of detecting cells in biological images. The problem is important in many automated image analysis applications. We identify the problem as one of clustering and formulate it within the framework of robust estimation using loss functions. We show how suitable loss functions may be chosen based on a priori knowledge of the noise distribution. Specifically, in the context of biological images, since the measurement noise is not Gaussian, quadratic loss functions yield suboptimal results. We show that by incorporating the Huber loss function, cells can be detected robustly and accurately. To initialize the algorithm, we also propose a seed selection approach. Simulation results show that Huber loss exhibits better performance compared with some standard loss functions. We also provide experimental results on confocal images of yeast cells. The proposed technique exhibits good detection performance even when the signal-to-noise ratio is low.
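As a rough illustration of the robust-loss idea, the Huber loss and the IRLS weight it implies can be written as below (delta is an assumed tuning constant; this is a generic sketch, not the authors' implementation):

```python
def huber_loss(r, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails, so large
    non-Gaussian residuals are penalized less than under squared loss."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)

def huber_weight(r, delta=1.0):
    """Weight implied by the Huber loss in iteratively reweighted
    least squares: 1 inside the quadratic zone, shrinking as 1/|r|
    for outliers, so outlying pixels pull cluster centers less."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a
```

In a clustering update, each point's contribution to its center would be multiplied by `huber_weight` of its residual, which is what makes the center estimates robust.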
Abstract:
When a document corpus is very large, we often need to reduce the number of features. But it is not possible to apply conventional Non-negative Matrix Factorization (NMF) to a billion-by-million matrix, as the matrix may not fit in memory. Here we present a novel Online NMF algorithm. Using Online NMF, we reduce the original high-dimensional space to a low-dimensional space, and then cluster all the documents in the reduced dimension using the k-means algorithm. We show experimentally that by processing small subsets of documents we are able to achieve good performance. The proposed method outperforms existing algorithms.
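A minimal sketch of the batch-by-batch idea, assuming standard multiplicative NMF updates with a shared basis refined per batch (a simplification of a true online NMF scheme; the names and iteration counts are illustrative, not the paper's):

```python
import numpy as np

def online_nmf(batches, k, n_inner=50, eps=1e-9):
    """Factorize a corpus batch by batch: a shared basis H
    (k x n_features) is refined with multiplicative updates on each
    batch, so the full term-document matrix never sits in memory."""
    rng = np.random.default_rng(0)
    H = None
    codes = []
    for X in batches:                 # X: (batch_docs, n_features), non-negative
        if H is None:
            H = rng.random((k, X.shape[1])) + eps
        W = rng.random((X.shape[0], k)) + eps
        for _ in range(n_inner):
            W *= (X @ H.T) / (W @ H @ H.T + eps)   # per-batch low-dim codes
            H *= (W.T @ X) / (W.T @ W @ H + eps)   # refine shared basis
        codes.append(W)
    return np.vstack(codes), H        # (n_docs, k) codes and final basis
```

The rows of the returned code matrix are the reduced-dimension document representations that would then be fed to k-means.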
Abstract:
The role of crystallite size and clustering in influencing the stability of the structures of the large-tetragonality ferroelectric system 0.6BiFeO3-0.4PbTiO3 was investigated. The system exhibits a cubic phase for a crystallite size of approximately 25 nm, three times larger than the critical size reported for one of its end members, PbTiO3. With an increased degree of clustering for the same average crystallite size, partial stabilization of the ferroelectric tetragonal phase takes place. The results suggest that clustering helps in reducing the depolarization energy without the need to increase the crystallite size of free particles.
Abstract:
Clustering has been the most popular method for data exploration. Clustering partitions a data set into sub-partitions based on some measure, say a distance measure, where each partition carries its own significant information. A number of algorithms have been explored for this purpose; one such algorithm is Particle Swarm Optimization (PSO), a population-based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of Particle Swarm Optimization in which each feature of the data set is given significance by adding random weights, which also minimizes any distortions in the dataset. The performance of the proposed algorithm is evaluated using benchmark datasets from the Machine Learning Repository. The experimental results show that our proposed methodology performs significantly better than the previously reported experiments.
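The feature-weighting idea can be sketched as a weighted distance inside the clustering fitness each particle optimizes (a generic sketch; the paper's exact weighting scheme may differ):

```python
def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with per-feature weights, so more
    significant features contribute more to cluster assignment."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))

def fitness(data, centroids, weights):
    """Clustering fitness a PSO particle would minimize: total
    weighted distance of every point to its nearest centroid."""
    return sum(min(weighted_sq_dist(x, c, weights) for c in centroids)
               for x in data)
```

Each particle encodes a set of centroids; the swarm's velocity and position updates then drive this fitness down.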
Abstract:
Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high-density regions or clusters from individual class conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. However, this formulation, and in general such relaxations that depend on the second-order moments, are susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular, a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with the true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.
Abstract:
Learning from Positive and Unlabelled examples (LPU) has emerged as an important problem in data mining and information retrieval applications. Existing techniques are not ideally suited for real-world scenarios where the datasets are linearly inseparable, as they either build linear classifiers or produce non-linear classifiers that fail to achieve the desired performance. In this work, we propose to extend maximum margin clustering ideas and present an iterative procedure to design a non-linear classifier for LPU. In particular, we build a least squares support vector classifier, suitable for handling this problem due to the symmetry of its loss function. Further, we present techniques for appropriately initializing the labels of unlabelled examples and for enforcing the ratio of positive to negative examples while obtaining these labels. Experiments on real-world datasets demonstrate that the non-linear classifier designed using the proposed approach gives significantly better generalization performance than the existing relevant approaches for LPU.
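One plausible sketch of the label-initialization step, assuming distance to the positive centroid as the similarity measure and a user-supplied estimate of the positive ratio (both are assumptions; the paper's procedure may differ):

```python
def init_unlabelled(positives, unlabelled, pos_ratio):
    """Assign +1 to the unlabelled points closest to the positive
    centroid, keeping exactly the requested fraction positive
    (pos_ratio is an assumed, user-supplied estimate)."""
    d = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives)
                for i in range(d)]
    def dist(x):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
    order = sorted(range(len(unlabelled)), key=lambda i: dist(unlabelled[i]))
    n_pos = round(pos_ratio * len(unlabelled))
    labels = [-1] * len(unlabelled)
    for i in order[:n_pos]:
        labels[i] = 1
    return labels
```

The iterative procedure would then retrain the classifier and relabel, holding this positive-to-negative ratio fixed.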
Abstract:
Data clustering is a common technique for statistical data analysis, used in many fields including machine learning and data mining. Clustering is the grouping of a data set, or more precisely the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait according to a defined distance measure. In this paper we present a genetically improved version of the particle swarm optimization algorithm, a population-based heuristic search technique derived from the analysis of particle swarm intelligence and the concepts of genetic algorithms (GA). The algorithm combines PSO concepts such as the velocity and position update rules with GA concepts such as selection, crossover and mutation. The performance of the proposed algorithm is evaluated using benchmark datasets from the Machine Learning Repository. Our method performs better than the k-means and PSO algorithms.
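The hybrid update can be sketched as a standard PSO step followed by GA operators applied to selected particles (all coefficients and rates below are illustrative defaults, not the paper's settings):

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """Standard PSO velocity and position update: inertia plus pulls
    toward the particle's and the swarm's best-known positions."""
    v = [w * vi + c1 * random.random() * (pi - xi)
              + c2 * random.random() * (gi - xi)
         for xi, vi, pi, gi in zip(x, v, pbest, gbest)]
    return [xi + vi for xi, vi in zip(x, v)], v

def crossover_mutate(a, b, p_mut=0.1, scale=0.1):
    """GA operators on two selected particles: one-point crossover
    followed by Gaussian mutation of individual genes."""
    cut = random.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    return [xi + random.gauss(0, scale) if random.random() < p_mut else xi
            for xi in child]
```

Here each particle would encode candidate cluster centroids; selection keeps the fitter particles between generations.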
Abstract:
This study investigates the application of support vector clustering (SVC) for the direct identification of coherent synchronous generators in large interconnected multi-machine power systems. The clustering is based on a coherency measure, which indicates the degree of coherency between any pair of generators. The proposed SVC algorithm processes the coherency measure matrix, formulated from the generator rotor measurements, to cluster the coherent generators. The approach is demonstrated on the IEEE 10-generator, 39-bus system and on an equivalent 35-generator, 246-bus system of the practical Indian southern grid. The effects of the number of data samples and of fault locations on the accuracy of the proposed approach are also examined. An extended comparison with other clustering techniques is included to show the effectiveness of the proposed approach in grouping the data into coherent groups of generators. The effectiveness of the coherent clusters obtained with the proposed approach is compared in terms of a set of clustering validity indicators and a statistical assessment based on the coherency degree of a generator pair.
Abstract:
A controlled laboratory experiment was carried out on forty Indian male college students to evaluate the effect of the indoor thermal environment on occupants' responses and thermal comfort. During the experiment, indoor temperature varied from 21 degrees C to 33 degrees C, and variables such as relative humidity, airflow, air temperature and radiant temperature were recorded along with the subjects' skin (T-sk) and oral (T-core) temperatures. From T-sk and T-core, body temperature (T-b) was evaluated. The subjective Thermal Sensation Vote (TSV) was recorded using the ASHRAE 7-point scale. In the PMV model, Fanger's T-sk equation was used to accommodate the adaptive response. Stepwise regression analysis showed that T-b was a better predictor of TSV than T-sk and T-core. Regional skin temperature responses, a lower sweating threshold temperature with no dripping sweat, and a higher cutaneous sweating threshold temperature were observed as thermal adaptive responses. Using the PMV model, the thermal comfort zone was evaluated as (22.46-25.41) degrees C with a neutral temperature of 23.91 degrees C, whereas using the TSV response a wider comfort zone of (23.25-26.32) degrees C with a neutral temperature of 24.83 degrees C was estimated. It was observed that the PMV model overestimated the actual thermal response. Interestingly, these subjects were found to be less sensitive to heat but more sensitive to cold. A new TSV-PPD relation (PPDnew) was obtained with an asymmetric distribution of the hot-cold thermal sensation response in Indians. (C) 2013 Elsevier Ltd. All rights reserved.
Abstract:
A controlled laboratory experiment was carried out on forty Indian male college students to evaluate the effect of the indoor thermal environment on occupants' responses and thermal comfort. During the experiment, indoor temperature varied from 21 degrees C to 33 degrees C, and variables such as relative humidity, airflow, air temperature and radiant temperature were recorded along with the subjects' physiological parameters (skin (T-sk) and oral (T-c) temperatures) and subjective thermal sensation responses. From T-sk and T-c, body temperature (T-b) was evaluated. The subjective Thermal Sensation Vote (TSV) was recorded using the ASHRAE 7-point scale. In the PMV model, Fanger's T-sk equation was used to accommodate the adaptive response. Stepwise regression analysis showed that T-b was a better predictor of TSV than T-sk and T-c. Regional skin temperature responses, suppressed sweating without dripping, a lower sweating threshold temperature and a higher cutaneous threshold for sweating were observed as thermal adaptive responses. These adaptive responses cannot be captured by the PMV model. To incorporate the subjective adaptive response, the mean skin temperature (T-sk) is considered in the dry heat loss calculation. Along with this, the PMV model and two other methodologies are adopted to calculate PMV values, and the results are compared. However, the recent literature on measuring the sweat rate in Indians is limited, and the assumption of a constant Ersw in the PMV model needs to be corrected. Using the measured T-sk in the PMV model (Method 1), the thermal comfort zone corresponding to -0.5 <= PMV <= 0.5 was evaluated as (22.46-25.41) degrees C with a neutral temperature of 23.91 degrees C; using the TSV response, a wider comfort zone of (23.25-26.32) degrees C with a neutral temperature of 24.83 degrees C was estimated, which widened further with the TSV-PPDnew relation. It was observed that the PMV model overestimated the actual thermal response.
Interestingly, these subjects were found to be less sensitive to heat but more sensitive to cold. A new TSV-PPD relation (PPDnew) was obtained from the population distribution of the TSV response, with an asymmetric distribution of the hot-cold thermal sensation response in Indians. Calculations of human thermal stress according to the steady-state energy balance used in the PMV model seem inadequate for evaluating the thermal sensation of Indians. Relevance to industry: The purpose of this paper is to estimate the thermal comfort zone and optimum temperature for Indians. It also highlights that the PMV model seems inadequate for evaluating subjective thermal perception in Indians. These results can be used in feedback control of HVAC systems in residential and industrial buildings. (C) 2014 Elsevier B.V. All rights reserved.
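The neutral temperature and comfort zone quoted above come from fitting sensation votes against temperature; a minimal sketch of that calculation, assuming a simple linear TSV-temperature fit and a +/-0.5 comfort band (the data in the test are illustrative, not the study's):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit ys ~ a*xs + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def comfort_zone(temps, tsv, band=0.5):
    """Neutral temperature (TSV = 0) and comfort zone (|TSV| <= band)
    from a linear fit of sensation votes against indoor temperature."""
    a, b = fit_line(temps, tsv)
    return -b / a, ((-band - b) / a, (band - b) / a)
```

The PMV-based zone is obtained the same way, with computed PMV values taking the place of the recorded TSV.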
Abstract:
Regionalization approaches are widely used in water resources engineering to identify hydrologically homogeneous groups of watersheds that are referred to as regions. Pooled information from sites (depicting watersheds) in a region forms the basis to estimate quantiles associated with hydrological extreme events at ungauged/sparsely gauged sites in the region. Conventional regionalization approaches can be effective when watersheds (data points) corresponding to different regions can be separated using straight lines or linear planes in the space of watershed-related attributes. In this paper, a kernel-based Fuzzy c-means (KFCM) clustering approach is presented for use in situations where such linear separation of regions cannot be accomplished. The approach uses kernel-based functions to map the data points from the attribute space to a higher-dimensional space where they can be separated into regions by linear planes. A procedure to determine the optimal number of regions with the KFCM approach is suggested. Further, formulations to estimate flood quantiles at ungauged sites with the approach are developed. Effectiveness of the approach is demonstrated through Monte-Carlo simulation experiments and a case study on watersheds in the United States. Comparison of results with those based on conventional Fuzzy c-means clustering, the Region-of-influence approach and a prior study indicates that the KFCM approach outperforms the other approaches in forming regions that are closer to being statistically homogeneous and in estimating flood quantiles at ungauged sites.
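The kernel trick the approach relies on can be sketched for the fuzzy-membership update: with an RBF kernel, the feature-space distance reduces to 2*(1 - K(x, c)). This is a generic KFCM sketch (gamma, the fuzzifier m, and keeping centers in the input space are assumptions, not the paper's exact formulation):

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kfcm_memberships(data, centers, m=2.0, gamma=0.5, eps=1e-12):
    """Fuzzy c-means membership update with kernel-induced distances:
    for an RBF kernel, ||phi(x) - phi(c)||^2 = 2 * (1 - K(x, c))."""
    U = []
    for x in data:
        d = [max(2.0 * (1.0 - rbf(x, c, gamma)), eps) for c in centers]
        row = []
        for j in range(len(centers)):
            s = sum((d[j] / d[k]) ** (1.0 / (m - 1.0))
                    for k in range(len(centers)))
            row.append(1.0 / s)   # memberships sum to 1 across regions
        U.append(row)
    return U
```

In the regionalization setting, `data` would be the watershed attribute vectors and each membership row gives a watershed's degree of belonging to each region.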
Abstract:
The complexity in visualizing volumetric data often limits the scope of direct exploration of scalar fields. Isocontour extraction is a popular method for exploring scalar fields because of its simplicity in presenting features in the data. In this paper, we present a novel representation of contours with the aim of studying the similarity relationship between the contours. The representation maps contours to points in a high-dimensional transformation-invariant descriptor space. We leverage the power of this representation to design a clustering based algorithm for detecting symmetric regions in a scalar field. Symmetry detection is a challenging problem because it demands both segmentation of the data and identification of transformation invariant segments. While the former task can be addressed using topological analysis of scalar fields, the latter requires geometry based solutions. Our approach combines the two by utilizing the contour tree for segmenting the data and the descriptor space for determining transformation invariance. We discuss two applications, query driven exploration and asymmetry visualization, that demonstrate the effectiveness of the approach.
Abstract:
Objective: The aim of this study is to validate the applicability of the PolyVinyliDene Fluoride (PVDF) nasal sensor for assessing nasal airflow in healthy subjects and patients with nasal obstruction, and to correlate the results with the score of a Visual Analogue Scale (VAS). Methods: PVDF nasal sensor and VAS measurements were carried out in 50 subjects (25 healthy subjects and 25 patients). The VAS score of nasal obstruction and the peak-to-peak amplitude (Vp-p) of the nasal cycle measured by the PVDF nasal sensors were analyzed for the right nostril (RN) and left nostril (LN) in both groups. Spearman's rho correlation was calculated. The relationship between PVDF nasal sensor measurements and the severity of nasal obstruction (VAS score) was assessed by ANOVA. Results: In the healthy group, the nasal airflow measured by the PVDF nasal sensor for RN and LN was 51.14 +/- 5.87% and 48.85 +/- 5.87%, respectively. In the patient group, the PVDF nasal sensor indicated lower nasal airflow in the blocked nostrils (RN: 23.33 +/- 10.54% and LN: 32.24 +/- 11.54%). A moderate correlation was observed in the healthy group (r = 0.710, p < 0.001 for RN and r = 0.651, p < 0.001 for LN), and a moderate to strong correlation in the patient group (r = 0.751, p < 0.01 for RN and r = 0.885, p < 0.0001 for LN). Conclusion: The PVDF nasal sensor is a newly developed technique for measuring nasal airflow. A moderate to strong correlation was observed between the PVDF nasal sensor data and VAS scores for nasal obstruction. In the present study, the PVDF nasal sensor technique successfully differentiated between healthy subjects and patients with nasal obstruction. Additionally, it can assess the severity of nasal obstruction in comparison with the VAS. Thus, we propose that the PVDF nasal sensor technique could be used as a new diagnostic method to evaluate nasal obstruction in routine clinical practice. (C) 2015 Elsevier Inc. All rights reserved.
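The correlations reported above are Spearman's rho, which is simply the Pearson correlation of the ranks; for reference, it can be computed as below (a generic sketch using average ranks for ties, not tied to this study's data):

```python
def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation computed on
    the ranks of the two samples, with average ranks for ties."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        i = 0
        while i < len(vs):
            j = i
            while j + 1 < len(vs) and vs[order[j + 1]] == vs[order[i]]:
                j += 1                    # extend over a run of ties
            avg = (i + j) / 2 + 1         # average rank, 1-based
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```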