54 results for height partition clustering
Abstract:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper. These are available from https://sites.google.com/site/randomisedbhc/.
Abstract:
Clustering behavior is studied in a model of integrate-and-fire oscillators with excitatory pulse coupling. When considering a population of identical oscillators, the main result is a proof of global convergence to a phase-locked clustered behavior. The robustness of this clustering behavior is then investigated in a population of nonidentical oscillators by studying the transition from total clustering to the absence of clustering as the group coherence decreases. A robust intermediate situation of partial clustering, characterized by few oscillators traveling among nearly phase-locked clusters, is of particular interest. The analysis complements earlier studies of synchronization in a closely related model. © 2008 American Institute of Physics.
Abstract:
The magnitude and frequency of vertical fluctuations of the top of an axisymmetric miscible Boussinesq fountain form the focus of this work. We present measurements of these quantities for saline-aqueous fountains in uniform quiescent surroundings. Our results span source Froude numbers 0.3 ≤ Fr0 ≤ 40 and, thereby, encompass very weak, weak, intermediate and forced classes of fountain. We identify distinct scalings, based on known quantities at the fountain source, for the frequency of fountain height fluctuations which collapse our data within bands of Fr0. Notably, our scalings reveal that the (dimensionless) frequency takes a constant value within each band. These results highlight characteristic time scales for the fluctuations which we decompose into a single, physically apparent, length scale and velocity scale within each band. Moreover, within one particular band, spanning source Froude numbers towards the lower end of the full range considered, we identify unexpectedly long-period fluctuations indicating a near balance of inertia and (opposing) buoyancy at the source. Our analysis identifies four distinct classes of fluctuation behaviour (four bands of Fr0) and this classification matches well with existing classifications of fountains based on rise heights. As such, we show that an analysis of the behaviour of the fountain top alone, rather than the entire fountain, provides an alternative approach to classifying fountains. The similarity of classifications based on the two different methods confirms that the boundaries between classes mark tangible changes in the physics of fountains. For high Fr0 we show that the dominant fluctuations occur at the scale of the largest eddies which can be contained within the fountain near its top. Extending this, we develop a Strouhal number, Strtop, based on experimental measures of the fountain top, defined such that Strtop = 1 would suggest the dominant fluctuations are caused by a continual cycle of eddies forming and collapsing at this largest physical scale. For high-Fr0 fountains we find Strtop ≈ 0.9. © 2013 Cambridge University Press.
Abstract:
Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown, yet most models require this parameter as an input. Dirichlet process mixture models are appealing as they can infer the number of clusters from the data. However, these models do not deal with high dimensional data well and can encounter difficulties in inference. We present a novel nonparametric Bayesian kernel-based method to cluster data points without the need to prespecify the number of clusters or to model the complicated densities from which data points are assumed to be generated. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points is. We explore some theoretical properties of the model and derive a natural Gibbs-based algorithm with MCMC hyperparameter learning. The model is implemented on a variety of synthetic and real world data sets.
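
The determinant idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: the squared-exponential kernel, the lengthscale, the example points and all function names are assumptions chosen for illustration. With this kernel, a set of nearby points gives an almost singular submatrix (determinant near 0), while well-separated points give a submatrix close to the identity (determinant near 1).

```python
import math

def rbf_kernel(x, y, lengthscale=1.0):
    """Squared-exponential kernel (an assumed choice, not from the paper)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))

def kernel_submatrix(points, lengthscale=1.0):
    """Kernel matrix restricted to the given subset of points."""
    return [[rbf_kernel(p, q, lengthscale) for q in points] for p in points]

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

# Hypothetical example sets: one tight group, one widely spread group.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]

det_tight = determinant(kernel_submatrix(tight))
det_spread = determinant(kernel_submatrix(spread))
print("tight set:", det_tight)
print("spread set:", det_spread)
```

In this sketch a smaller determinant indicates a more tightly grouped set. How the paper turns such determinants into a clustering model is not stated in the abstract and is not reproduced here.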