96 results for height partition clustering
Abstract:
In this paper, a generalisation of the Voronoi partition is used for locational optimisation of facilities having different service capabilities and limited range or reach. The facilities can be stationary, such as base stations in a cellular network, hospitals, schools, etc., or mobile units, such as multiple unmanned aerial vehicles, automated guided vehicles, etc., carrying sensors, or mobile units carrying relief personnel and materials. An objective function for optimal deployment of the facilities is formulated, and its critical points are determined. The locally optimal deployment is shown to be a generalised centroidal Voronoi configuration in which the facilities are located at the centroids of the corresponding generalised Voronoi cells. The problem is formulated for more general mobile facilities, and formal results on the stability, convergence and spatial distribution of the proposed control laws responsible for the motion of the agents carrying the facilities, under constraints on the agents' speed and a limit on the sensor range, are provided. The theoretical results are supported with illustrative simulation results.
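A minimal sketch of the idea behind the locally optimal deployment, as a Lloyd-style iteration toward a generalised centroidal Voronoi configuration, is given below. The capability-weighted distance, the reach cutoff, and all parameter values are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def centroidal_step(agents, caps, reach, grid, density):
        # Distance from every grid point to every agent.
        dist = np.linalg.norm(grid[:, None, :] - agents[None, :, :], axis=2)
        d = dist / caps              # capability-weighted "effective" distance
        d[dist > reach] = np.inf     # points beyond an agent's reach are invisible
        owner = np.argmin(d, axis=1)
        covered = np.isfinite(d[np.arange(len(grid)), owner])
        new_agents = agents.copy()
        for i in range(len(agents)):
            mask = covered & (owner == i)
            if mask.any():           # move agent to its cell's density centroid
                w = density[mask]
                new_agents[i] = (w[:, None] * grid[mask]).sum(0) / w.sum()
        return new_agents

    rng = np.random.default_rng(0)
    xs = np.linspace(0.0, 1.0, 40)
    grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
    density = np.ones(len(grid))                 # uniform demand density
    agents = rng.random((5, 2))                  # five facilities in the unit square
    caps = np.array([1.0, 1.0, 2.0, 1.0, 1.5])   # service capability weights
    for _ in range(30):                          # Lloyd-style fixed-point iteration
        agents = centroidal_step(agents, caps, reach=0.5, grid=grid, density=density)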
Abstract:
Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high-density regions or clusters from individual class conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation depends heavily on the second-order statistics; however, this formulation, and in general such relaxations that depend on the second-order moments, are susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular, a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.
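A sketch of how such a cluster-level chance constraint can be posed as a second-order cone program, using cvxpy. The constraint form and the multiplier kappa = sqrt(eta/(1-eta)) follow the standard Chebyshev relaxation; the variable names and penalty structure are assumptions.

    import cvxpy as cp
    import numpy as np

    def chance_constrained_classifier(mus, sigmas, labels, eta=0.9, C=1.0):
        """Learn (w, b) so that, per cluster with mean mu and covariance Sigma,
        at least a fraction eta of the cluster mass (by Chebyshev's inequality)
        lies on the correct side: y*(w.mu + b) >= 1 - xi + kappa*||Sigma^{1/2} w||."""
        d = mus.shape[1]
        kappa = np.sqrt(eta / (1.0 - eta))     # Chebyshev multiplier
        w, b = cp.Variable(d), cp.Variable()
        xi = cp.Variable(len(mus), nonneg=True)
        cons = []
        for j, (mu, Sig, y) in enumerate(zip(mus, sigmas, labels)):
            L = np.linalg.cholesky(Sig + 1e-9 * np.eye(d))   # Sigma^{1/2} factor
            cons.append(y * (mu @ w + b) >= 1 - xi[j] + kappa * cp.norm(L.T @ w, 2))
        prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
        prob.solve()
        return w.value, b.value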
Abstract:
Learning from Positive and Unlabelled examples (LPU) has emerged as an important problem in data mining and information retrieval applications. Existing techniques are not ideally suited for real-world scenarios where the datasets are linearly inseparable, as they either build linear classifiers or yield non-linear classifiers that fail to achieve the desired performance. In this work, we propose to extend maximum margin clustering ideas and present an iterative procedure to design a non-linear classifier for LPU. In particular, we build a least squares support vector classifier, suitable for handling this problem due to the symmetry of its loss function. Further, we present techniques for appropriately initializing the labels of unlabelled examples and for enforcing the ratio of positive to negative examples while obtaining these labels. Experiments on real-world datasets demonstrate that the non-linear classifier designed using the proposed approach gives significantly better generalization performance than the existing relevant approaches for LPU.
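A sketch in the spirit of the iterative procedure, using a plain ridge (least-squares) classifier as a stand-in for the least squares support vector classifier; the initialization and ratio-enforcement details here are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import Ridge

    def lpu_fit(X_pos, X_unl, pos_ratio=0.3, n_iter=20):
        # Initialise: unlabelled points nearest the positive centroid start positive.
        X = np.vstack([X_pos, X_unl])
        k = max(1, int(pos_ratio * len(X_unl)))    # enforce the assumed +:- ratio
        scores = -np.linalg.norm(X_unl - X_pos.mean(axis=0), axis=1)
        y_unl = np.where(scores >= np.sort(scores)[-k], 1.0, -1.0)
        for _ in range(n_iter):
            y = np.concatenate([np.ones(len(X_pos)), y_unl])
            model = Ridge(alpha=1.0).fit(X, y)     # least-squares surrogate classifier
            scores = model.predict(X_unl)
            new_y = np.where(scores >= np.sort(scores)[-k], 1.0, -1.0)
            if np.array_equal(new_y, y_unl):       # labels stabilised
                break
            y_unl = new_y
        return model, y_unl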
Abstract:
Data clustering is a common technique for statistical data analysis, used in many fields, including machine learning and data mining. Clustering is the grouping of a data set or, more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait according to a defined distance measure. In this paper we present a genetically improved version of the particle swarm optimization (PSO) algorithm, a population-based heuristic search technique derived from the analysis of particle swarm intelligence and the concepts of genetic algorithms (GA). The algorithm combines PSO concepts, such as the velocity and position update rules, with GA concepts, such as selection, crossover and mutation. The performance of the proposed algorithm is evaluated on benchmark datasets from the Machine Learning Repository, where it performs better than both the k-means and PSO algorithms.
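A compact sketch of one way such a GA-PSO hybrid for clustering could look, with each particle encoding k centroids and within-cluster scatter as the fitness; the crossover/mutation rates and update constants are illustrative assumptions.

    import numpy as np

    def sse(centroids, X):
        """Fitness: sum of squared distances to the nearest centroid."""
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        return d.min(axis=1).sum()

    def ga_pso_cluster(X, k=3, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
        rng = np.random.default_rng(0)
        pos = X[rng.integers(len(X), size=(n_particles, k))]   # particles = k centroids
        vel = np.zeros_like(pos)
        pbest, pbest_f = pos.copy(), np.array([sse(p, X) for p in pos])
        g = pbest[pbest_f.argmin()].copy()                     # global best
        for _ in range(iters):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)  # PSO update
            pos = pos + vel
            # GA-style operators: arithmetic crossover with gbest, sparse mutation
            cross = rng.random(n_particles) < 0.2
            pos[cross] = 0.5 * (pos[cross] + g)
            pos += 0.01 * rng.standard_normal(pos.shape) * (rng.random(pos.shape) < 0.1)
            f = np.array([sse(p, X) for p in pos])
            better = f < pbest_f
            pbest[better], pbest_f[better] = pos[better], f[better]
            g = pbest[pbest_f.argmin()].copy()
        return g

    X = np.random.default_rng(1).standard_normal((200, 2))
    centroids = ga_pso_cluster(X, k=3)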
Abstract:
Data clustering groups data so that data which are similar to each other are in the same group and data which are dissimilar to each other are in different groups. Since clustering is generally a subjective activity, it is possible to obtain different clusterings of the same data depending on the need. This paper attempts to find the best clustering of the data by first carrying out feature selection and then using only the selected features for clustering. Particle Swarm Optimization (PSO) is used for clustering, with feature selection carried out simultaneously. The performance of the proposed algorithm is evaluated on benchmark data sets. The experimental results show that the proposed methodology outperforms previous approaches, such as basic PSO and k-means, on the clustering problem.
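One plausible particle encoding for simultaneous feature selection and clustering is sketched below (the 0.5 selection threshold and the scatter fitness are assumptions; the surrounding PSO update loop would be as in the previous sketch).

    import numpy as np

    def decode(particle, k, d, thresh=0.5):
        """Particle = d feature weights followed by k*d centroid coordinates;
        a feature is selected when its weight exceeds the threshold."""
        mask = particle[:d] > thresh
        centroids = particle[d:].reshape(k, d)
        return mask, centroids

    def fitness(particle, X, k, thresh=0.5):
        """Within-cluster scatter computed only over the selected features."""
        mask, cent = decode(particle, k, X.shape[1], thresh)
        if not mask.any():                       # guard: keep at least one feature
            return np.inf
        Xs, cs = X[:, mask], cent[:, mask]
        dist = ((Xs[:, None, :] - cs[None, :, :]) ** 2).sum(-1)
        return dist.min(axis=1).sum()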
Abstract:
We consider a scenario where the communication nodes in a sensor network have limited energy, and the objective is to maximize the aggregate bits transported from sources to respective destinations before network partition due to node deaths. This performance metric is novel, and captures the useful information that a network can provide over its lifetime. The optimization problem that results from our approach is nonlinear; however, we show that it can be converted to a Multicommodity Flow (MCF) problem that yields the optimal value of the metric. Subsequently, we compare the performance of a practical routing strategy, based on Node Disjoint Paths (NDPs), with the ideal corresponding to the MCF formulation. Our results indicate that the performance of NDP-based routing is within 7.5% of the optimal.
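A toy single-commodity version of the energy-constrained flow maximisation, written as a linear program in cvxpy; the paper's formulation is multicommodity, and the topology, energies and budgets below are invented for illustration.

    import cvxpy as cp

    # Maximise total bits delivered from node 0 to node 3 before any node's
    # battery is drained. Per-bit tx/rx energies and budgets are assumptions.
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
    battery = {0: 5.0, 1: 3.0, 2: 3.0, 3: 8.0}   # Joules per node
    e_tx, e_rx = 1e-3, 0.5e-3                     # Joules per bit

    f = cp.Variable(len(edges), nonneg=True)      # bits carried on each edge
    T = cp.Variable(nonneg=True)                  # total bits delivered end to end
    out = lambda v: sum(f[i] for i, (a, _) in enumerate(edges) if a == v)
    inn = lambda v: sum(f[i] for i, (_, b) in enumerate(edges) if b == v)

    cons = [out(0) - inn(0) == T, inn(3) - out(3) == T,   # source/sink balance
            out(1) - inn(1) == 0, out(2) - inn(2) == 0]   # relay conservation
    cons += [e_tx * out(v) + e_rx * inn(v) <= battery[v] for v in battery]
    cp.Problem(cp.Maximize(T), cons).solve()
    print(T.value, f.value)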
Abstract:
This study investigates the application of support vector clustering (SVC) for the direct identification of coherent synchronous generators in large interconnected multi-machine power systems. The clustering is based on a coherency measure, which indicates the degree of coherency between any pair of generators. The proposed SVC algorithm processes the coherency measure matrix, formulated from the generator rotor measurements, to cluster the coherent generators. The proposed approach is demonstrated on the IEEE 10-generator, 39-bus system and an equivalent 35-generator, 246-bus system of the practical Indian southern grid. The effects of the number of data samples and the fault locations on the accuracy of the proposed approach are also examined. An extended comparison with other clustering techniques is included to show the effectiveness of the proposed approach in grouping the data into coherent groups of generators. The effectiveness of the coherent clusters obtained with the proposed approach is compared in terms of a set of clustering validity indicators and a statistical assessment based on the coherency degree of a generator pair.
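A full support vector clustering implementation is involved, so the sketch below uses spectral clustering on a precomputed coherency (affinity) matrix as a plainly named stand-in; the coherency measure itself is an illustrative choice, not the paper's.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def coherency_matrix(delta):
        """delta: (n_gen, n_samples) rotor-angle trajectories. Illustrative
        measure: close to 1 when the angle gap stays nearly constant."""
        n = delta.shape[0]
        C = np.ones((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                gap = delta[i] - delta[j]
                C[i, j] = C[j, i] = 1.0 / (1.0 + np.ptp(gap))   # ptp = max - min
        return C

    def coherent_groups(delta, n_groups):
        # Stand-in for support vector clustering: spectral clustering on the
        # precomputed coherency (affinity) matrix.
        C = coherency_matrix(delta)
        return SpectralClustering(n_clusters=n_groups,
                                  affinity='precomputed').fit_predict(C)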
Abstract:
The effects of the initial height on the temporal persistence probability of steady-state height fluctuations in up-down symmetric linear models of surface growth are investigated. We study the (1 + 1)-dimensional Family model and the (1 + 1)- and (2 + 1)-dimensional larger curvature (LC) model. Both the Family and LC models have up-down symmetry, so the positive and negative persistence probabilities in the steady state, averaged over all values of the initial height h(0), are equal to each other. However, these two probabilities are not equal if one considers a fixed nonzero value of h(0). Plots of the positive persistence probability for negative initial height versus time exhibit power-law behavior if the magnitude of the initial height is larger than the interface width at saturation. By symmetry, the negative persistence probability for positive initial height also exhibits the same behavior. The persistence exponent that describes this power-law decay decreases as the magnitude of the initial height is increased. The dependence of the persistence probability on the initial height, the system size, and the discrete sampling time is found to exhibit scaling behavior.
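A simulation sketch of the measurement, assuming the standard Family-model dynamics (random deposition with relaxation to the lowest neighbour); the system size, times and tie-breaking rule are simplifications.

    import numpy as np

    def family_persistence(L=64, t_meas=500, seed=0):
        """Grow a (1+1)-d Family-model interface to its steady state, then
        measure the positive persistence probability P(t): the fraction of
        initially-positive sites whose height fluctuation has stayed positive
        since t0. Conditioning on a fixed initial height h0, as in the paper,
        would restrict the average to sites with fluctuation near h0 at t0."""
        rng = np.random.default_rng(seed)
        h = np.zeros(L, dtype=np.int64)
        def deposit_monolayers(n):
            for site in rng.integers(L, size=n * L):
                trio = ((site - 1) % L, site, (site + 1) % L)
                h[min(trio, key=lambda s: h[s])] += 1   # relax to lowest neighbour
        deposit_monolayers(2 * L * L)          # run past the ~L^2 saturation time
        alive = (h - h.mean()) > 0             # sites starting with fluctuation > 0
        n0 = max(alive.sum(), 1)
        P = np.empty(t_meas)
        for t in range(t_meas):
            deposit_monolayers(1)              # one monolayer = one time unit
            alive &= (h - h.mean()) > 0
            P[t] = alive.sum() / n0
        return P                                # expect a power-law decay regime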
Abstract:
We consider four-dimensional CFTs which admit a large-N expansion, and whose spectrum contains states whose conformal dimensions do not scale with N. We explicitly reorganise the partition function obtained by exponentiating the one-particle partition function of these states into a heat kernel form for the dual string spectrum on AdS(5). On very general grounds, the heat kernel answer can be expressed in terms of a convolution of the one-particle partition function of the light states in the four-dimensional CFT.
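The exponentiation referred to above is, schematically, the standard multi-particle partition function built from the one-particle partition function z(beta) of the light states (bosonic case shown; fermionic states would enter with an alternating sign (-1)^{n+1}):

    Z(\beta) \;=\; \exp\Big( \sum_{n=1}^{\infty} \frac{1}{n}\, z(n\beta) \Big)

Reorganising the sum over n into a sum over thermal images is what yields the heat-kernel form on AdS(5) described in the abstract.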
Abstract:
In this paper, the effect of local defects, viz., cracks and cutouts, on the buckling behaviour of functionally graded material plates subjected to mechanical and thermal loads is numerically studied. The internal discontinuities, viz., cracks and cutouts, are represented independently of the mesh within the framework of the extended finite element method, and an enriched shear-flexible 4-noded quadrilateral element is used for the spatial discretization. The properties are assumed to vary only in the thickness direction, and the effective properties are estimated using the Mori-Tanaka homogenization scheme. The plate kinematics is based on the first-order shear deformation theory. The influence of various parameters, viz., the crack length and its location, the cutout radius and its position, the plate aspect ratio and the plate thickness, on the critical buckling load is studied. The effect of various boundary conditions is also studied. The numerical results obtained reveal that the critical buckling load decreases with an increase in the crack length, the cutout radius and the material gradient index. This is attributed to the degradation in stiffness due either to the presence of local defects or to the change in the material composition.
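A sketch of the Mori-Tanaka estimate for the through-thickness effective modulus under the usual power-law volume fraction; the metal/ceramic property values are illustrative assumptions, not the paper's data.

    def mori_tanaka_E(z_over_h, n, Em=70e9, nu_m=0.3, Ec=380e9, nu_c=0.3):
        """Effective Young's modulus at thickness coordinate z/h in [-0.5, 0.5]
        for a metal-ceramic FGM plate, with ceramic volume fraction
        Vc = (z/h + 1/2)^n and Mori-Tanaka homogenisation."""
        def K_mu(E, nu):                     # bulk and shear moduli from (E, nu)
            return E / (3 * (1 - 2 * nu)), E / (2 * (1 + nu))
        Km, mum = K_mu(Em, nu_m)
        Kc, muc = K_mu(Ec, nu_c)
        Vc = (z_over_h + 0.5) ** n
        K = Km + Vc * (Kc - Km) / (1 + (1 - Vc) * (Kc - Km) / (Km + 4 * mum / 3))
        f1 = mum * (9 * Km + 8 * mum) / (6 * (Km + 2 * mum))
        mu = mum + Vc * (muc - mum) / (1 + (1 - Vc) * (muc - mum) / (mum + f1))
        return 9 * K * mu / (3 * K + mu)     # E recovered from (K, mu)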
Abstract:
Regionalization approaches are widely used in water resources engineering to identify hydrologically homogeneous groups of watersheds that are referred to as regions. Pooled information from sites (depicting watersheds) in a region forms the basis to estimate quantiles associated with hydrological extreme events at ungauged/sparsely gauged sites in the region. Conventional regionalization approaches can be effective when watersheds (data points) corresponding to different regions can be separated using straight lines or linear planes in the space of watershed-related attributes. In this paper, a kernel-based Fuzzy c-means (KFCM) clustering approach is presented for use in situations where such linear separation of regions cannot be accomplished. The approach uses kernel-based functions to map the data points from the attribute space to a higher-dimensional space where they can be separated into regions by linear planes. A procedure to determine the optimal number of regions with the KFCM approach is suggested. Further, formulations to estimate flood quantiles at ungauged sites with the approach are developed. Effectiveness of the approach is demonstrated through Monte-Carlo simulation experiments and a case study on watersheds in the United States. Comparison of results with those based on conventional Fuzzy c-means clustering, the Region-of-influence approach and a prior study indicates that the KFCM approach outperforms the other approaches in forming regions that are closer to being statistically homogeneous and in estimating flood quantiles at ungauged sites.
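A minimal sketch of kernel fuzzy c-means with a Gaussian kernel, using the standard membership and prototype updates; the kernel width, fuzzifier and initialisation are assumptions.

    import numpy as np

    def kfcm(X, c=3, m=2.0, sigma=1.0, iters=100, seed=0):
        """Kernel fuzzy c-means with K(x, v) = exp(-||x - v||^2 / sigma^2);
        the kernel-induced distance is d^2 = 2(1 - K), and the prototypes
        live in the original attribute space."""
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), c, replace=False)]          # initial prototypes
        for _ in range(iters):
            K = np.exp(-((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) / sigma**2)
            d2 = np.maximum(1.0 - K, 1e-12)                  # proportional to d^2
            U = d2 ** (-1.0 / (m - 1))
            U /= U.sum(axis=1, keepdims=True)                # fuzzy memberships
            W = (U ** m) * K                                 # prototype update weights
            V_new = (W.T @ X) / W.sum(axis=0)[:, None]
            if np.allclose(V_new, V, atol=1e-7):
                break
            V = V_new
        return U, V                                          # memberships define regions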
Abstract:
The effect of structure height on the lightning striking distance is estimated using a lightning strike model that takes into account the effect of connecting leaders. According to the results, the lightning striking distance may differ significantly from the values assumed in the IEC standard for structure heights beyond 30 m. For structure heights smaller than about 30 m, however, the results show that the values assumed by IEC do not differ significantly from the predictions based on a lightning attachment model taking into account the effect of connecting leaders. Since IEC assumes a smaller striking distance than the ones predicted by the adopted model, one can conclude that safety is not compromised in adhering to the IEC standard. Results obtained from the model are also compared with the Collection Volume Method (CVM) and other commonly used lightning attachment models available in the literature. The results show that in the case of CVM the calculated attractive distances are much larger than the ones obtained using the physically based lightning attachment models. This indicates the possibility of compromising the lightning protection procedures when using CVM.
Abstract:
In this paper, we present a novel algorithm for piecewise linear regression which can learn continuous as well as discontinuous piecewise linear functions. The main idea is to repeatedly partition the data and learn a linear model in each partition. The proposed algorithm is similar in spirit to the k-means clustering algorithm. We show that our algorithm can also be viewed as a special case of an EM algorithm for maximum likelihood estimation under a reasonable probability model. We empirically demonstrate the effectiveness of our approach by comparing its performance with that of state-of-the-art algorithms on various datasets.
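A sketch of the repeated partition-and-fit idea for one-dimensional inputs; the hard assignments make it the k-means-like (hard-EM) special case the abstract alludes to. Initialisation and stopping rules are assumptions.

    import numpy as np

    def piecewise_linear_fit(x, y, K=3, iters=50, seed=0):
        """Assign each point to the line that currently predicts it best,
        refit a least-squares line inside each partition, and iterate."""
        rng = np.random.default_rng(seed)
        A = np.column_stack([x, np.ones_like(x)])            # design matrix [x, 1]
        labels = rng.integers(K, size=len(x))                # random initial partition
        coefs = np.zeros((K, 2))
        for _ in range(iters):
            for k in range(K):
                idx = labels == k
                if idx.sum() >= 2:                           # need two points per line
                    coefs[k] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
            resid = (A @ coefs.T - y[:, None]) ** 2          # residual to every line
            new_labels = resid.argmin(axis=1)                # repartition the data
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        return coefs, labels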
Abstract:
The complexity in visualizing volumetric data often limits the scope of direct exploration of scalar fields. Isocontour extraction is a popular method for exploring scalar fields because of its simplicity in presenting features in the data. In this paper, we present a novel representation of contours with the aim of studying the similarity relationship between the contours. The representation maps contours to points in a high-dimensional transformation-invariant descriptor space. We leverage the power of this representation to design a clustering based algorithm for detecting symmetric regions in a scalar field. Symmetry detection is a challenging problem because it demands both segmentation of the data and identification of transformation invariant segments. While the former task can be addressed using topological analysis of scalar fields, the latter requires geometry based solutions. Our approach combines the two by utilizing the contour tree for segmenting the data and the descriptor space for determining transformation invariance. We discuss two applications, query driven exploration and asymmetry visualization, that demonstrate the effectiveness of the approach.
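A sketch of the clustering step, assuming the transformation-invariant contour descriptors have already been computed as in the paper; DBSCAN and its parameters are illustrative stand-ins for the paper's clustering choice.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def symmetric_groups(descriptors, eps=0.1):
        """Nearby points in descriptor space indicate similar -- hence
        potentially symmetric -- contours. Density-based clustering groups
        them; noise points (label -1) have no symmetric counterpart."""
        D = np.asarray(descriptors, dtype=float)
        D = D / np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)
        return DBSCAN(eps=eps, min_samples=2).fit_predict(D)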
Abstract:
Several statistical downscaling models have been developed in the past couple of decades to assess the hydrologic impacts of climate change by projecting the station-scale hydrological variables from large-scale atmospheric variables simulated by general circulation models (GCMs). This paper presents and compares different statistical downscaling models that use multiple linear regression (MLR), positive coefficient regression (PCR), stepwise regression (SR), and support vector machine (SVM) techniques for estimating monthly rainfall amounts in the state of Florida. Mean sea level pressure, air temperature, geopotential height, specific humidity, U wind, and V wind are used as the explanatory variables/predictors in the downscaling models. Data for these variables are obtained from the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis dataset and the Canadian Centre for Climate Modelling and Analysis (CCCma) Coupled Global Climate Model, version 3 (CGCM3) GCM simulations. The principal component analysis (PCA) and the fuzzy c-means clustering method (FCM) are used as part of the downscaling model to reduce the dimensionality of the dataset and to identify the clusters in the data, respectively. Evaluation of the performance of the models using different error and statistical measures indicates that the SVM-based model performed better than all the other models in reproducing most monthly rainfall statistics at 18 sites. Output from the third-generation CGCM3 GCM for the A1B scenario was used for future projections. For the projection period 2001-10, MLR was used to relate variables at the GCM and NCEP grid scales. Use of MLR in linking the predictor variables at the two grid scales yielded better reproduction of monthly rainfall statistics at most of the stations (12 out of 18) than the spatial interpolation technique used in earlier studies.
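A sketch of the SVM-based downscaling chain with scikit-learn (standardise the large-scale predictors, reduce dimensionality with PCA, regress station-scale rainfall with SVR); the fuzzy c-means conditioning step is omitted, and the data and hyperparameters below are invented placeholders.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.standard_normal((360, 24))        # 30 years x 12 months, 24 predictors
    y = rng.gamma(2.0, 50.0, size=360)        # stand-in monthly rainfall (mm)

    model = make_pipeline(StandardScaler(),   # standardise pressure, temp, winds, ...
                          PCA(n_components=8),
                          SVR(kernel='rbf', C=10.0, epsilon=1.0))
    model.fit(X[:300], y[:300])
    pred = model.predict(X[300:])             # downscaled rainfall estimates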