174 resultados para Algorithm clustering
Resumo:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from the following URL. https://sites.google.com/site/randomisedbhc/.
Resumo:
We propose a novel model for the spatio-temporal clustering of trajectories based on motion, which applies to challenging street-view video sequences of pedestrians captured by a mobile camera. A key contribution of our work is the introduction of novel probabilistic region trajectories, motivated by the non-repeatability of segmentation of frames in a video sequence. Hierarchical image segments are obtained by using a state-of-the-art hierarchical segmentation algorithm, and connected from adjacent frames in a directed acyclic graph. The region trajectories and measures of confidence are extracted from this graph using a dynamic programming-based optimisation. Our second main contribution is a Bayesian framework with a twofold goal: to learn the optimal, in a maximum likelihood sense, Random Forests classifier of motion patterns based on video features, and construct a unique graph from region trajectories of different frames, lengths and hierarchical levels. Finally, we demonstrate the use of Isomap for effective spatio-temporal clustering of the region trajectories of pedestrians. We support our claims with experimental results on new and existing challenging video sequences. © 2011 IEEE.
Resumo:
We present a novel filtering algorithm for tracking multiple clusters of coordinated objects. Based on a Markov chain Monte Carlo (MCMC) mechanism, the new algorithm propagates a discrete approximation of the underlying filtering density. A dynamic Gaussian mixture model is utilized for representing the time-varying clustering structure. This involves point process formulations of typical behavioral moves such as birth and death of clusters as well as merging and splitting. For handling complex, possibly large scale scenarios, the sampling efficiency of the basic MCMC scheme is enhanced via the use of a Metropolis within Gibbs particle refinement step. As the proposed methodology essentially involves random set representations, a new type of estimator, termed the probability hypothesis density surface (PHDS), is derived for computing point estimates. It is further proved that this estimator is optimal in the sense of the mean relative entropy. Finally, the algorithm's performance is assessed and demonstrated in both synthetic and realistic tracking scenarios. © 2012 Elsevier Ltd. All rights reserved.
Resumo:
Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown and most models require this parameter as an input. Dirichlet process mixture models are appealing as they can infer the number of clusters from the data. However, these models do not deal with high dimensional data well and can encounter difficulties in inference. We present a novel nonparameteric Bayesian kernel based method to cluster data points without the need to prespecify the number of clusters or to model complicated densities from which data points are assumed to be generated from. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points are. We explore some theoretical properties of the model and derive a natural Gibbs based algorithm with MCMC hyperparameter learning. The model is implemented on a variety of synthetic and real world data sets.
Resumo:
Displacement estimation is a key step in the evaluation of tissue elasticity by quasistatic strain imaging. An efficient approach may incorporate a tracking strategy whereby each estimate is initially obtained from its neighbours' displacements and then refined through a localized search. This increases the accuracy and reduces the computational expense compared with exhaustive search. However, simple tracking strategies fail when the target displacement map exhibits complex structure. For example, there may be discontinuities and regions of indeterminate displacement caused by decorrelation between the pre- and post-deformation radio frequency (RF) echo signals. This paper introduces a novel displacement tracking algorithm, with a search strategy guided by a data quality indicator. Comparisons with existing methods show that the proposed algorithm is more robust when the displacement distribution is challenging.
Resumo:
This paper introduces a new technique called species conservation for evolving parallel subpopulations. The technique is based on the concept of dividing the population into several species according to their similarity. Each of these species is built around a dominating individual called the species seed. Species seeds found in the current generation are saved (conserved) by moving them into the next generation. Our technique has proved to be very effective in finding multiple solutions of multimodal optimization problems. We demonstrate this by applying it to a set of test problems, including some problems known to be deceptive to genetic algorithms.
Resumo:
Node placement plays a significant role in the effective and successful deployment of Wireless Sensor Networks (WSNs), i.e., meeting design goals such as cost effectiveness, coverage, connectivity, lifetime and data latency. In this paper, we propose a new strategy to assist in the placement of Relay Nodes (RNs) for a WSN monitoring underground tunnel infrastructure. By applying for the first time an accurate empirical mean path loss propagation model along with a well fitted fading distribution model specifically defined for the tunnel environment, we address the RN placement problem with guaranteed levels of radio link performance. The simulation results show that the choice of appropriate path loss model and fading distribution model for a typical environment is vital in the determination of the number and the positions of RNs. Furthermore, we adapt a two-tier clustering multi-hop framework in which the first tier of the RN placement is modelled as the minimum set cover problem, and the second tier placement is solved using the search-and-find algorithm. The implementation of the proposed scheme is evaluated by simulation, and it lays the foundations for further work in WSN planning for underground tunnel applications. © 2010 IEEE.