960 results for Semi-supervised clustering


Relevance: 20.00%

Abstract:

For many applications, it is necessary to produce speech transcriptions in a causal fashion. To produce high quality transcripts, speaker adaptation is often used. This requires online speaker clustering and incremental adaptation techniques to be developed. This paper presents an integrated approach to online speaker clustering and adaptation which allows efficient clustering of speakers using the same accumulated statistics that are normally used for adaptation. Using a consistent criterion for both clustering and adaptation should yield gains for both stages. The proposed approach is evaluated on a meetings transcription task using audio from multiple distant microphones. Consistent gains over standard clustering and adaptation were obtained. Copyright © 2011 ISCA.

Relevance: 20.00%

Abstract:

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.

Relevance: 20.00%

Abstract:

Semi-implicit, second order temporal and spatial finite volume computations of the flow in a differentially heated rotating annulus are presented. For the regime considered, three cyclones and anticyclones separated by a relatively fast moving jet of fluid or "jet stream" are predicted. Two second order methods are compared with first order spatial predictions and with experimental measurements. Velocity vector plots are used to illustrate the predicted flow structure. Computations made using second order central differences are shown to agree best with experimental measurements, and to be stable for integrations over long time periods (> 1000 s). No periodic smoothing is required to prevent divergence.

Relevance: 20.00%

Abstract:

The fundamental aim of clustering algorithms is to partition data points. We consider tasks where the discovered partition is allowed to vary with some covariate such as space or time. One approach would be to use fragmentation-coagulation processes, but these, being Markov processes, are restricted to linear or tree structured covariate spaces. We define a partition-valued process on an arbitrary covariate space using Gaussian processes. We use the process to construct a multitask clustering model which partitions data points in a similar way across multiple data sources, and a time series model of network data which allows cluster assignments to vary over time. We describe sampling algorithms for inference and apply our method to defining cancer subtypes based on different types of cellular characteristics, finding regulatory modules from gene expression data from multiple human populations, and discovering time varying community structure in a social network.

Relevance: 20.00%

Abstract:

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from https://sites.google.com/site/randomisedbhc/.

Relevance: 20.00%

Abstract:

An investigation into the potential for reducing road damage by optimising the design of heavy vehicle suspensions is described. In the first part of the paper two simple mathematical models are used to study the optimisation of conventional passive suspensions. Simple modifications are made to the steel spring suspension of a tandem axle trailer and it is found experimentally that RMS dynamic tyre forces can be reduced by 15% and theoretical road damage by 5.2%. A mathematical model of an air-sprung articulated vehicle is validated, and its suspension is optimised according to the simple models. This vehicle generates about 9% less damage than the leaf-sprung vehicle in the unmodified state and it is predicted that, for the operating conditions examined, the road damage caused by this vehicle can be reduced by a further 5.4%. Finally, it is shown experimentally that computer-controlled semi-active dampers have the potential to reduce road damage by a further 5-6%, compared to an air suspension with optimum passive damping. © Copyright 1994 Society of Automotive Engineers, Inc.

Relevance: 20.00%

Abstract:

The ground movements induced by the construction of supported excavation systems are generally predicted by empirical/semi-empirical methods in the design stage. However, these methods cannot account for the site-specific conditions and for information that becomes available as an excavation proceeds. A Bayesian updating methodology is proposed to update the predictions of ground movements in the later stages of excavation based on recorded deformation measurements. As an application, the proposed framework is used to predict the three-dimensional deformation shapes at four incremental excavation stages of an actual supported excavation project. © 2011 Taylor & Francis Group, London.
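The idea of revising a design-stage prediction as measurements arrive can be illustrated with a minimal sketch. It assumes a single scalar quantity (for example a maximum wall deflection), a normal prior, and independent normal measurement noise with known variance; the abstract does not state the model actually used, so the function name `update_normal` and all numbers below are hypothetical.

```python
def update_normal(prior_mean, prior_var, observations, noise_var):
    """Conjugate normal update for an unknown mean with known noise variance.

    This is a textbook illustration only, not the methodology of the paper.
    """
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    # Precisions (inverse variances) add; the mean is precision-weighted.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Hypothetical numbers: the design-stage estimate is 30 mm (variance 100 mm^2),
# and two later excavation stages each yield one measurement (noise variance 25 mm^2).
stage_measurements = [[42.0], [45.0]]
mean, var = 30.0, 100.0
for obs in stage_measurements:
    # The posterior after one stage becomes the prior for the next stage.
    mean, var = update_normal(mean, var, obs, 25.0)

# Updating with all measurements at once gives the same posterior.
batch_mean, batch_var = update_normal(30.0, 100.0, [42.0, 45.0], 25.0)
```

Updating stage by stage and updating in one batch agree, which is the property that makes this kind of scheme convenient for incremental excavation records.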

Relevance: 20.00%

Abstract:

The design and construction of deep excavations in urban environments is often governed by serviceability limit states related to the risk of damage to adjacent buildings. In current practice, the assessment of excavation-induced building damage has focused on a deterministic approach. This paper presents a component/system reliability analysis framework to assess the probability that specified threshold design criteria for multiple serviceability limit states are exceeded. A recently developed Bayesian probabilistic framework is used to update the predictions of ground movements in the later stages of excavation based on the recorded deformation measurements. An example is presented to show how the serviceability performance for excavation problems can be assessed based on the component/system reliability analysis. © 2011 ASCE.

Relevance: 20.00%

Abstract:

The ground movements induced by the construction of supported excavation systems are generally predicted in the design stage by empirical/semi-empirical methods. However, these methods cannot account for the site-specific conditions and for information that becomes available as an excavation proceeds. A Bayesian updating methodology is proposed to update the predictions of ground movements in the later stages of excavation based on recorded deformation measurements. As an application, the proposed framework is used to predict the three-dimensional deformation shapes at four incremental excavation stages of an actual supported excavation project. Copyright © ASCE 2011.

Relevance: 20.00%

Abstract:

The generalization of the geometric mean of positive scalars to positive definite matrices has attracted considerable attention since the seminal work of Ando. The paper generalizes this framework of matrix means by proposing the definition of a rank-preserving mean for two or an arbitrary number of positive semi-definite matrices of fixed rank. The proposed mean is shown to be geometric in that it satisfies all the expected properties of a rank-preserving geometric mean. The work is motivated by operations on low-rank approximations of positive definite matrices in high-dimensional spaces. © 2012 Elsevier Inc. All rights reserved.
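For orientation, the classical geometric mean of two positive definite matrices, A # B = A^(1/2) (A^(-1/2) B A^(-1/2))^(1/2) A^(1/2), can be sketched in a few lines of NumPy. This covers only the full-rank case that the abstract takes as its starting point; it is not the rank-preserving mean proposed in the paper, and the helper names `spd_power` and `geometric_mean` are illustrative.

```python
import numpy as np


def spd_power(M, p):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    # V @ diag(w**p) @ V.T, written with broadcasting over the columns of V.
    return (V * w**p) @ V.T


def geometric_mean(A, B):
    """Classical geometric mean A # B of two positive definite matrices."""
    A_half = spd_power(A, 0.5)
    A_inv_half = spd_power(A, -0.5)
    inner = A_inv_half @ B @ A_inv_half
    inner = (inner + inner.T) / 2.0  # remove round-off asymmetry before eigh
    return A_half @ spd_power(inner, 0.5) @ A_half


# Two small positive definite matrices chosen only for illustration.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.2], [0.2, 3.0]])
G = geometric_mean(A, B)
```

The inverse square root in this formula is exactly what fails when the matrices are only positive semi-definite, which is the situation the abstract addresses.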

Relevance: 20.00%

Abstract:

Clustering behavior is studied in a model of integrate-and-fire oscillators with excitatory pulse coupling. When considering a population of identical oscillators, the main result is a proof of global convergence to a phase-locked clustered behavior. The robustness of this clustering behavior is then investigated in a population of nonidentical oscillators by studying the transition from total clustering to the absence of clustering as the group coherence decreases. A robust intermediate situation of partial clustering, characterized by few oscillators traveling among nearly phase-locked clusters, is of particular interest. The analysis complements earlier studies of synchronization in a closely related model. © 2008 American Institute of Physics.
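A population of pulse-coupled integrate-and-fire units can be simulated event by event, as in the rough sketch below. The abstract does not give the oscillator dynamics, so the sketch assumes the standard leaky model dx/dt = S - x with threshold 1, reset to 0, and an excitatory kick `eps` per firing; the function name and every parameter value are assumptions, and the sketch makes no claim to reproduce the paper's results.

```python
import numpy as np


def simulate(x0, S=1.5, eps=0.05, n_firings=200):
    """Event-driven simulation of pulse-coupled leaky integrate-and-fire units.

    Assumed model (not taken from the paper): dx/dt = S - x with S > 1,
    firing at x = 1, reset to 0, and each firing adds eps to the other units.
    Returns the final states and the number of distinct states after each event.
    """
    x = np.array(x0, dtype=float)
    cluster_counts = []
    for _ in range(n_firings):
        # Advance every unit by the time the leading unit needs to reach x = 1.
        lead = x.max()
        decay = (S - 1.0) / (S - lead)  # exp(-t) for that time t
        x = S - (S - x) * decay
        fired = x >= 1.0 - 1e-12  # tolerance for floating-point round-off
        # Units pushed over the threshold by a pulse fire in the same instant.
        pending = int(fired.sum())
        while pending > 0:
            x[~fired] += eps * pending
            newly = (~fired) & (x >= 1.0)
            fired |= newly
            pending = int(newly.sum())
        x[fired] = 0.0  # units that fire together are reset together
        cluster_counts.append(len(np.unique(x)))
    return x, cluster_counts


rng = np.random.default_rng(0)
initial_states = rng.uniform(0.0, 1.0, size=20)
final_states, counts = simulate(initial_states)
```

Units that fire in the same instant are reset to the same state and evolve identically afterwards, so the number of distinct states recorded in `counts` cannot increase from one event to the next.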