869 results for height partition clustering
Abstract:
For many applications, it is necessary to produce speech transcriptions in a causal fashion. To produce high quality transcripts, speaker adaptation is often used. This requires online speaker clustering and incremental adaptation techniques to be developed. This paper presents an integrated approach to online speaker clustering and adaptation which allows efficient clustering of speakers using the same accumulated statistics that are normally used for adaptation. Using a consistent criterion for both clustering and adaptation should yield gains for both stages. The proposed approach is evaluated on a meetings transcription task using audio from multiple distant microphones. Consistent gains over standard clustering and adaptation were obtained. Copyright © 2011 ISCA.
Abstract:
MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
Abstract:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available at https://sites.google.com/site/randomisedbhc/.
Abstract:
Clustering behavior is studied in a model of integrate-and-fire oscillators with excitatory pulse coupling. When considering a population of identical oscillators, the main result is a proof of global convergence to a phase-locked clustered behavior. The robustness of this clustering behavior is then investigated in a population of nonidentical oscillators by studying the transition from total clustering to the absence of clustering as the group coherence decreases. A robust intermediate situation of partial clustering, characterized by few oscillators traveling among nearly phase-locked clusters, is of particular interest. The analysis complements earlier studies of synchronization in a closely related model. © 2008 American Institute of Physics.
Abstract:
The magnitude and frequency of vertical fluctuations of the top of an axisymmetric miscible Boussinesq fountain form the focus of this work. We present measurements of these quantities for saline-aqueous fountains in uniform quiescent surroundings. Our results span source Froude numbers 0.3 ≤ Fr0 ≤ 40 and, thereby, encompass very weak, weak, intermediate and forced classes of fountain. We identify distinct scalings, based on known quantities at the fountain source, for the frequency of fountain height fluctuations which collapse our data within bands of Fr0. Notably, our scalings reveal that the (dimensionless) frequency takes a constant value within each band. These results highlight characteristic time scales for the fluctuations which we decompose into a single, physically apparent, length scale and velocity scale within each band. Moreover, within one particular band, spanning source Froude numbers towards the lower end of the full range considered, we identify unexpectedly long-period fluctuations indicating a near balance of inertia and (opposing) buoyancy at the source. Our analysis identifies four distinct classes of fluctuation behaviour (four bands of Fr0) and this classification matches well with existing classifications of fountains based on rise heights. As such, we show that an analysis of the behaviour of the fountain top alone, rather than the entire fountain, provides an alternative approach to classifying fountains. The similarity of classifications based on the two different methods confirms that the boundaries between classes mark tangible changes in the physics of fountains. For high Fr0 we show that the dominant fluctuations occur at the scale of the largest eddies which can be contained within the fountain near its top. Extending this, we develop a Strouhal number, Strtop, based on experimental measures of the fountain top, defined such that Strtop = 1 would suggest the dominant fluctuations are caused by a continual cycle of eddies forming and collapsing at this largest physical scale. For high-Fr0 fountains we find Strtop ≈ 0.9. © 2013 Cambridge University Press.
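For orientation, the two dimensionless groups named in this abstract are conventionally written as below. The abstract does not state its own definitions, so these are the generic textbook forms and every symbol is an assumption: w_0 is the source velocity, r_0 the source radius, g'_0 the source reduced gravity, f a fluctuation frequency, and L and U a chosen length scale and velocity scale.

```latex
\[
  \mathrm{Fr}_0 = \frac{w_0}{\sqrt{r_0\, g'_0}},
  \qquad
  \mathrm{Str} = \frac{f\,L}{U}
\]
```

Read this way, a Strouhal number of order one means the fluctuation period is comparable to the time L/U taken to traverse the chosen length scale at the chosen velocity, which is consistent with the interpretation of Strtop = 1 given in the abstract.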
Abstract:
Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown and most models require this parameter as an input. Dirichlet process mixture models are appealing as they can infer the number of clusters from the data. However, these models do not deal with high dimensional data well and can encounter difficulties in inference. We present a novel nonparametric Bayesian kernel-based method to cluster data points without the need to prespecify the number of clusters or to model the complicated densities from which data points are assumed to be generated. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points is. We explore some theoretical properties of the model and derive a natural Gibbs-based algorithm with MCMC hyperparameter learning. The model is implemented on a variety of synthetic and real world data sets.
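The determinant idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which kernel is used or how the determinant enters the model, so the RBF kernel, the `length_scale` parameter and all function names below are assumptions made for illustration only. With a unit-diagonal kernel, the determinant of the submatrix for a set of points approaches 0 when the points nearly coincide and approaches 1 when they are far apart.

```python
import math

def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel (an assumed choice, not taken from the abstract)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def kernel_submatrix(points, length_scale=1.0):
    """Kernel matrix restricted to the given subset of points."""
    return [[rbf_kernel(p, q, length_scale) for q in points] for p in points]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

# Synthetic points: one tight group and one widely spread group.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]

det_tight = determinant(kernel_submatrix(tight))
det_spread = determinant(kernel_submatrix(spread))
print("tight group determinant: ", det_tight)
print("spread group determinant:", det_spread)
```

The tight group yields a much smaller determinant than the spread group, so the determinant behaves as an inverse measure of how tightly packed a candidate cluster is.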
Abstract:
The octanol-air partition coefficient (K_OA) is a key descriptor of chemicals partitioning between the atmosphere and environmental organic phases. Quantitative structure-property relationships (QSPR) are necessary to model and predict K_OA from molecular structures. Based on 12 quantum chemical descriptors computed by the PM3 Hamiltonian, using partial least squares (PLS) analysis, a QSPR model for logarithms of K_OA to base 10 (log K_OA) for polychlorinated naphthalenes (PCNs), chlorobenzenes and p,p'-DDT was obtained. The cross-validated Q²(cum) value of the model is 0.973, indicating a good predictive ability of the model. The main factors governing log K_OA of the PCNs, chlorobenzenes, and p,p'-DDT are, in order of decreasing importance, molecular size and the molecules' ability to donate or accept electrons in intermolecular interactions. The intermolecular dispersive interactions play a leading role in governing log K_OA. The more chlorines in PCN and chlorobenzene molecules, the greater the log K_OA values. Increasing E_LUMO (the energy of the lowest unoccupied molecular orbital) of the molecules leads to decreasing log K_OA values, implying possible intermolecular interactions between the molecules under study and octanol molecules. © 2002 Elsevier Science Ltd. All rights reserved.
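A cross-validated Q² of the kind quoted in this abstract is commonly computed as 1 - PRESS/SS, where PRESS is the sum of squared prediction errors on held-out samples and SS is the total sum of squares of the response. The sketch below shows that calculation under stated assumptions: it uses leave-one-out cross-validation and a one-descriptor least-squares line as a stand-in for the PLS model, it takes SS about the overall mean, and the numbers are synthetic rather than measured values. It does not reproduce the cumulative, per-component Q²(cum) of the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def q2_leave_one_out(xs, ys):
    """Q^2 = 1 - PRESS / SS_total, with each sample predicted by a model
    fitted on all the other samples."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        # Hold out sample i and refit on the rest.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1.0 - press / ss_total

# Synthetic descriptor/response pairs, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
response = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
print("leave-one-out Q^2:", q2_leave_one_out(descriptor, response))
```

A value close to 1 means the held-out predictions are nearly as good as the fitted ones, which is the sense in which a Q² of 0.973 indicates good predictive ability.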
Abstract:
A concise quantitative model that incorporates information on both environmental temperature (T) and molecular structures, for the logarithm of the octanol-air partition coefficient (K_OA) to base 10 (log K_OA) of PCDDs, was developed. Partial least squares (PLS) analysis together with 14 quantum chemical descriptors was used to develop the quantitative relationships between structures, environmental temperatures and properties (QRSETP) model. Validation showed that the obtained QRSETP model can be used to predict log K_OA of other PCDDs. Molecular size, environmental temperature (T), q+ (the most positive net atomic charge on hydrogen or chlorine atoms in PCDD molecules) and E_LUMO (the energy of the lowest unoccupied molecular orbital) are the main factors governing log K_OA of the PCDD/Fs under study. The intermolecular dispersive interactions, and thus the size of the molecules, play a leading role in governing log K_OA. The more chlorines in PCDD molecules, the greater the log K_OA values. Increasing E_LUMO values of the molecules leads to decreasing log K_OA values, implying possible intermolecular interactions between the molecules under study and octanol molecules. Greater q+ values result in greater intermolecular electrostatic repulsive interactions between PCDD and octanol molecules and smaller log K_OA values. © 2002 Elsevier Science B.V. All rights reserved.