22 resultados para spatial clustering algorithms

em CentAUR: Central Archive University of Reading - UK


Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper is concerned with tensor clustering with the assistance of dimensionality reduction approaches. A class of formulation for tensor clustering is introduced based on tensor Tucker decomposition models. In this formulation, an extra tensor mode is formed by a collection of tensors of the same dimensions and then used to assist a Tucker decomposition in order to achieve data dimensionality reduction. We design two types of clustering models for the tensors: PCA Tensor Clustering model and Non-negative Tensor Clustering model, by utilizing different regularizations. The tensor clustering can thus be solved by the optimization method based on the alternative coordinate scheme. Interestingly, our experiments show that the proposed models yield comparable or even better performance compared to most recent clustering algorithms based on matrix factorization.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Investigation of preferred structures of planetary wave dynamics is addressed using multivariate Gaussian mixture models. The number of components in the mixture is obtained using order statistics of the mixing proportions, hence avoiding previous difficulties related to sample sizes and independence issues. The method is first applied to a few low-order stochastic dynamical systems and data from a general circulation model. The method is next applied to winter daily 500-hPa heights from 1949 to 2003 over the Northern Hemisphere. A spatial clustering algorithm is first applied to the leading two principal components (PCs) and shows significant clustering. The clustering is particularly robust for the first half of the record and less for the second half. The mixture model is then used to identify the clusters. Two highly significant extratropical planetary-scale preferred structures are obtained within the first two to four EOF state space. The first pattern shows a Pacific-North American (PNA) pattern and a negative North Atlantic Oscillation (NAO), and the second pattern is nearly opposite to the first one. It is also observed that some subspaces show multivariate Gaussianity, compatible with linearity, whereas others show multivariate non-Gaussianity. The same analysis is also applied to two subperiods, before and after 1978, and shows a similar regime behavior, with a slight stronger support for the first subperiod. In addition a significant regime shift is also observed between the two periods as well as a change in the shape of the distribution. The patterns associated with the regime shifts reflect essentially a PNA pattern and an NAO pattern consistent with the observed global warming effect on climate and the observed shift in sea surface temperature around the mid-1970s.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Parametric software effort estimation models consisting on a single mathematical relationship suffer from poor adjustment and predictive characteristics in cases in which the historical database considered contains data coming from projects of a heterogeneous nature. The segmentation of the input domain according to clusters obtained from the database of historical projects serves as a tool for more realistic models that use several local estimation relationships. Nonetheless, it may be hypothesized that using clustering algorithms without previous consideration of the influence of well-known project attributes misses the opportunity to obtain more realistic segments. In this paper, we describe the results of an empirical study using the ISBSG-8 database and the EM clustering algorithm that studies the influence of the consideration of two process-related attributes as drivers of the clustering process: the use of engineering methodologies and the use of CASE tools. The results provide evidence that such consideration conditions significantly the final model obtained, even though the resulting predictive quality is of a similar magnitude.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper discusses how numerical gradient estimation methods may be used in order to reduce the computational demands on a class of multidimensional clustering algorithms. The study is motivated by the recognition that several current point-density based cluster identification algorithms could benefit from a reduction of computational demand if approximate a-priori estimates of the cluster centres present in a given data set could be supplied as starting conditions for these algorithms. In this particular presentation, the algorithm shown to benefit from the technique is the Mean-Tracking (M-T) cluster algorithm, but the results obtained from the gradient estimation approach may also be applied to other clustering algorithms and their related disciplines.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper the use of neural networks for the control of dynamical systems is considered. Both identification and feedback control aspects are discussed as well as the types of system for which neural networks can provide a useful technique. Multi-layer Perceptron and Radial Basis function neural network types are looked at, with an emphasis on the latter. It is shown how basis function centre selection is a critical part of the implementation process and that multivariate clustering algorithms can be an extremely useful tool for finding centres.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We propose a geoadditive negative binomial model (Geo-NB-GAM) for regional count data that allows us to address simultaneously some important methodological issues, such as spatial clustering, nonlinearities, and overdispersion. This model is applied to the study of location determinants of inward greenfield investments that occurred during 2003–2007 in 249 European regions. After presenting the data set and showing the presence of overdispersion and spatial clustering, we review the theoretical framework that motivates the choice of the location determinants included in the empirical model, and we highlight some reasons why the relationship between some of the covariates and the dependent variable might be nonlinear. The subsequent section first describes the solutions proposed by previous literature to tackle spatial clustering, nonlinearities, and overdispersion, and then presents the Geo-NB-GAM. The empirical analysis shows the good performance of Geo-NB-GAM. Notably, the inclusion of a geoadditive component (a smooth spatial trend surface) permits us to control for spatial unobserved heterogeneity that induces spatial clustering. Allowing for nonlinearities reveals, in keeping with theoretical predictions, that the positive effect of agglomeration economies fades as the density of economic activities reaches some threshold value. However, no matter how dense the economic activity becomes, our results suggest that congestion costs never overcome positive agglomeration externalities.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We present some additions to a fuzzy variable radius niche technique called Dynamic Niche Clustering (DNC) (Gan and Warwick, 1999; 2000; 2001) that enable the identification and creation of niches of arbitrary shape through a mechanism called Niche Linkage. We show that by using this mechanism it is possible to attain better feature extraction from the underlying population.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, a continuation of a variable radius niche technique called Dynamic Niche Clustering developed by (Gan & Warwick, 1999) is presented. The technique employs a separate dynamic population of overlapping niches that coexists alongside the normal population. An empirical analysis of the updated methodology on a large group of standard optimisation test-bed functions is also given. The technique is shown to perform almost as well as standard fitness sharing with regards to stability and the accuracy of peak identification, but it outperforms standard fitness sharing with regards to time complexity. It is also shown that the technique is capable of forming niches of varying size depending on the characteristics of the underlying peak that the niche is populating.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ensemble clustering (EC) can arise in data assimilation with ensemble square root filters (EnSRFs) using non-linear models: an M-member ensemble splits into a single outlier and a cluster of M−1 members. The stochastic Ensemble Kalman Filter does not present this problem. Modifications to the EnSRFs by a periodic resampling of the ensemble through random rotations have been proposed to address it. We introduce a metric to quantify the presence of EC and present evidence to dispel the notion that EC leads to filter failure. Starting from a univariate model, we show that EC is not a permanent but transient phenomenon; it occurs intermittently in non-linear models. We perform a series of data assimilation experiments using a standard EnSRF and a modified EnSRF by a resampling though random rotations. The modified EnSRF thus alleviates issues associated with EC at the cost of traceability of individual ensemble trajectories and cannot use some of algorithms that enhance performance of standard EnSRF. In the non-linear regimes of low-dimensional models, the analysis root mean square error of the standard EnSRF slowly grows with ensemble size if the size is larger than the dimension of the model state. However, we do not observe this problem in a more complex model that uses an ensemble size much smaller than the dimension of the model state, along with inflation and localisation. Overall, we find that transient EC does not handicap the performance of the standard EnSRF.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the fast development of the Internet, wireless communications and semiconductor devices, home networking has received significant attention. Consumer products can collect and transmit various types of data in the home environment. Typical consumer sensors are often equipped with tiny, irreplaceable batteries and it therefore of the utmost importance to design energy efficient algorithms to prolong the home network lifetime and reduce devices going to landfill. Sink mobility is an important technique to improve home network performance including energy consumption, lifetime and end-to-end delay. Also, it can largely mitigate the hot spots near the sink node. The selection of optimal moving trajectory for sink node(s) is an NP-hard problem jointly optimizing routing algorithms with the mobile sink moving strategy is a significant and challenging research issue. The influence of multiple static sink nodes on energy consumption under different scale networks is first studied and an Energy-efficient Multi-sink Clustering Algorithm (EMCA) is proposed and tested. Then, the influence of mobile sink velocity, position and number on network performance is studied and a Mobile-sink based Energy-efficient Clustering Algorithm (MECA) is proposed. Simulation results validate the performance of the proposed two algorithms which can be deployed in a consumer home network environment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element