14 results for multiple data

in the Cambridge University Engineering Department Publications Database


Relevance:

70.00%

Publisher:

Abstract:

We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow-up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/

Relevance:

60.00%

Publisher:

Abstract:

The fundamental aim of clustering algorithms is to partition data points. We consider tasks where the discovered partition is allowed to vary with some covariate such as space or time. One approach would be to use fragmentation-coagulation processes, but these, being Markov processes, are restricted to linear or tree-structured covariate spaces. We define a partition-valued process on an arbitrary covariate space using Gaussian processes. We use the process to construct a multitask clustering model which partitions data points in a similar way across multiple data sources, and a time series model of network data which allows cluster assignments to vary over time. We describe sampling algorithms for inference and apply our method to defining cancer subtypes based on different types of cellular characteristics, finding regulatory modules from gene expression data from multiple human populations, and discovering time-varying community structure in a social network.

Relevance:

30.00%

Publisher:

Abstract:

Simultaneous recording from multiple single neurones presents many technical difficulties. However, obtaining such data has many advantages, which make it highly worthwhile to overcome the technical problems. This report describes methods which we have developed to permit recordings in awake behaving monkeys using the 'Eckhorn' 16 electrode microdrive. Structural magnetic resonance images are collected to guide electrode placement. Head fixation is achieved using a specially designed headpiece, modified for the multiple electrode approach, and access to the cortex is provided via a novel recording chamber. Growth of scar tissue over the exposed dura mater is reduced using an anti-mitotic compound. Control of the microdrive is achieved by a computerised system which permits several experimenters to move different electrodes simultaneously, considerably reducing the load on an individual operator. Neurones are identified as pyramidal tract neurones by antidromic stimulation through chronically implanted electrodes; stimulus control is integrated into the computerised system. Finally, analysis of multiple single unit recordings requires accurate methods to correct for non-stationarity in unit firing. A novel technique for such correction is discussed.

Relevance:

30.00%

Publisher:

Abstract:

Ultrasound elastography tracks tissue displacements under small levels of compression to obtain images of strain, a mechanical property useful in the detection and characterization of pathology. Due to the nature of ultrasound beamforming, only tissue displacements in the direction of beam propagation, referred to as 'axial', are measured to high quality, although an ability to measure other components of tissue displacement is desired to more fully characterize the mechanical behavior of tissue. Previous studies have used multiple one-dimensional (1D) angled axial displacements tracked from steered ultrasound beams to reconstruct improved quality trans-axial displacements within the scan plane ('lateral'). We show that two-dimensional (2D) displacement tracking is not possible with unmodified electronically-steered ultrasound data, and present a method of reshaping frames of steered ultrasound data to retain axial-lateral orthogonality, which permits 2D displacement tracking. Simulated and experimental ultrasound data are used to compare changes in image quality of lateral displacements reconstructed using 1D and 2D tracked steered axial and steered lateral data. Reconstructed lateral displacement image quality generally improves with the use of 2D displacement tracking at each steering angle, relative to axial tracking alone, particularly at high levels of compression. Due to the influence of tracking noise, unsteered lateral displacements exhibit greater accuracy than axial-based reconstructions at high levels of applied strain. © 2011 SPIE.
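
The idea of combining angled axial displacements into a lateral component can be illustrated with simple geometry. The following is a minimal sketch, not the authors' method: it assumes two beams steered symmetrically at +theta and -theta from the unsteered axial direction, each measuring only the projection of the true displacement onto its own beam direction. The function names and the symmetric two-angle setup are assumptions made for illustration.

```python
import numpy as np

def project_onto_beam(u_axial, u_lateral, theta):
    """Displacement component seen along a beam steered by theta (radians)."""
    return u_axial * np.cos(theta) + u_lateral * np.sin(theta)

def reconstruct_from_steered(d_plus, d_minus, theta):
    """Recover axial and lateral displacement from beams steered at +theta and -theta.

    d_plus  = u_axial*cos(theta) + u_lateral*sin(theta)
    d_minus = u_axial*cos(theta) - u_lateral*sin(theta)
    """
    u_axial = (d_plus + d_minus) / (2.0 * np.cos(theta))
    u_lateral = (d_plus - d_minus) / (2.0 * np.sin(theta))
    return u_axial, u_lateral
```

Because the lateral estimate divides by sin(theta), noise in the steered measurements is amplified at small steering angles, which is one reason lateral image quality is harder to achieve than axial quality.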

Relevance:

30.00%

Publisher:

Abstract:

Novel statistical models are proposed and developed in this paper for automated multiple-pitch estimation problems. Point estimates of the parameters of partial frequencies of a musical note are modeled as realizations from a non-homogeneous Poisson process defined on the frequency axis. When several notes are combined, the processes for the individual notes combine to give a new Poisson process whose likelihood is easy to compute. This model avoids the data-association step of linking the harmonics of each note with the corresponding partials and is ideal for efficient Bayesian inference of unknown multiple fundamental frequencies in a signal. © 2011 IEEE.
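
The superposition property described above can be sketched in a few lines. For a Poisson process with intensity lambda(f) on [0, f_max], the log-likelihood of observed points f_1, ..., f_n is the sum of log lambda(f_i) minus the integral of lambda over the interval (up to a term that does not depend on the parameters), and the intensity of several superposed notes is the sum of the per-note intensities. The Gaussian-bump intensity shape, the constant background rate, and all function and parameter names below are assumptions for illustration; the abstract does not specify the paper's intensity function.

```python
import math

def note_intensity(f, f0, n_harmonics=5, weight=1.0, width=2.0):
    """Intensity of one note: Gaussian bumps centred on harmonics of f0 (assumed shape)."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        z = (f - h * f0) / width
        total += weight * math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))
    return total

def note_mass(f0, f_max, n_harmonics=5, weight=1.0, width=2.0):
    """Integral of the note intensity over [0, f_max], using the Gaussian CDF."""
    def cdf(x, mu):
        return 0.5 * (1.0 + math.erf((x - mu) / (width * math.sqrt(2.0))))
    return sum(weight * (cdf(f_max, h * f0) - cdf(0.0, h * f0))
               for h in range(1, n_harmonics + 1))

def log_likelihood(peaks, fundamentals, f_max, background=1e-3):
    """Poisson-process log-likelihood of peak frequencies given candidate fundamentals.

    Intensities of the individual notes simply add, so no peak has to be
    assigned to a particular note or harmonic.
    """
    log_lik = -background * f_max
    for f0 in fundamentals:
        log_lik -= note_mass(f0, f_max)
    for f in peaks:
        lam = background + sum(note_intensity(f, f0) for f0 in fundamentals)
        log_lik += math.log(lam)
    return log_lik
```

Comparing `log_likelihood` across candidate sets of fundamentals gives a simple way to score hypotheses without an explicit data-association step.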

Relevance:

30.00%

Publisher:

Abstract:

We present measurements of grid turbulence using 2D particle image velocimetry taken immediately downstream from the grid at a Reynolds number of $Re_M = 16500$, where M is the rod spacing. A long field of view of 14M × 4M in the down- and cross-stream directions was achieved by stitching multiple cameras together. Two uniform biplanar grids were selected to have the same M and pressure drop but different rod diameter D and cross-section. A large data set ($10^4$ vector fields) was obtained to ensure good convergence of second-order statistics. Estimations of the dissipation rate ε of turbulent kinetic energy (TKE) were found to be sensitive to the number of mean-squared velocity gradient terms included and not whether the turbulence was assumed to adhere to isotropy or axisymmetry. The resolution dependency of different turbulence statistics was assessed with a procedure that does not rely on the dissipation scale η. The streamwise evolution of the TKE components and ε was found to collapse across grids when the rod diameter was included in the normalisation. We argue that this should be the case between all regular grids when the other relevant dimensionless quantities are matched and the flow has become homogeneous across the stream. Two-point space correlation functions at x/M = 1 show evidence of complex wake interactions which exhibit a strong Reynolds number dependence. However, these changes in initial conditions disappear, indicating rapid cross-stream homogenisation. On the other hand, isotropy was, as expected, not found to be established by x/M = 12 for any case studied. © Springer-Verlag 2012.
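
As a point of reference for the dissipation estimates discussed above, the simplest single-term estimator under the isotropy assumption is the textbook relation ε = 15ν⟨(∂u/∂x)²⟩, where ν is the kinematic viscosity and u is the streamwise velocity fluctuation. The sketch below evaluates it on a gridded velocity field with finite differences. It is an illustration of that standard relation only: the abstract does not state which estimators or gradient terms the authors used, and the function name and array layout are assumptions.

```python
import numpy as np

def dissipation_isotropic(u, dx, nu):
    """Single-term isotropic estimate of the TKE dissipation rate.

    u  : 2D array of streamwise velocity fluctuations,
         rows = cross-stream positions, columns = streamwise positions
    dx : streamwise grid spacing
    nu : kinematic viscosity
    """
    du_dx = np.gradient(u, dx, axis=1)  # finite-difference streamwise gradient
    return 15.0 * nu * np.mean(du_dx ** 2)
```

Estimators that include more of the measured gradient terms replace the factor of 15 with a weighted sum of several mean-squared gradients, which is where sensitivity to the number of terms can enter.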

Relevance:

30.00%

Publisher:

Abstract:

We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks.

Relevance:

30.00%

Publisher:

Abstract:

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct (but often complementary) information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.

Relevance:

30.00%

Publisher:

Abstract:

Emissions, fuel burn, and noise are the main drivers for innovative aircraft design. Embedded propulsion systems, such as those used in hybrid-wing body aircraft, can offer fuel burn and noise reduction benefits but the impact of inlet flow distortion on the generation and propagation of turbomachinery noise has yet to be assessed. A novel approach is used to quantify the effects of non-uniform flow on the creation and propagation of multiple pure tone (MPT) noise. The ultimate goal is to conduct a parametric study of S-duct inlets to quantify the effects of inlet design parameters on the acoustic signature. The key challenge is that the effects of distortion transfer, noise source generation and propagation through the non-uniform flow field are inherently coupled such that a simultaneous computation of the aerodynamics and acoustics is required to capture the mechanisms at play. The technical approach is based on a body force description of the fan blade row that is able to capture the distortion transfer and the blade-to-blade flow variations that cause the MPT noise while reducing computational cost. A single, 3-D full-wheel CFD simulation, in which the Euler equations are solved to second-order spatial and temporal accuracy, simultaneously computes the MPT noise generation and its propagation in distorted inlet flow. A new method of producing the blade-to-blade variations in the body force field for MPT noise generation has been developed and validated. The numerical dissipation inherent to the solver is quantified and used to correct for non-physical attenuation in the far-field noise spectra. Source generation, acoustic propagation and acoustic energy transfer between modes are examined in detail. The new method is validated on NASA's Source Diagnostic Test fan and inlet, showing good agreement with experimental data for aerodynamic performance, acoustic source generation, and far-field noise spectra. The next steps involve the assessment of MPT noise in serpentine inlet ducts and the development of a reduced order formulation suitable for incorporation into NASA's ANOPP framework. © 2010 by Jeff Defoe, Alex Narkaj & Zoltan Spakovszky.

Relevance:

30.00%

Publisher:

Abstract:

We demonstrate how a prior assumption of smoothness can be used to enhance the reconstruction of free energy profiles from multiple umbrella sampling simulations using the Bayesian Gaussian process regression approach. The method we derive allows the concurrent use of histograms and free energy gradients and can easily be extended to include further data. In Part I we review the necessary theory and test the method for one collective variable. We demonstrate improved performance with respect to the weighted histogram analysis method and obtain meaningful error bars without any significant additional computation. In Part II we consider the case of multiple collective variables and compare to a reconstruction using least squares fitting of radial basis functions. We find substantial improvements in the regimes of spatially sparse data or short sampling trajectories. A software implementation is made available on www.libatoms.org.
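
The role of a smoothness prior and the origin of the error bars can be illustrated with plain Gaussian process regression. The sketch below is a deliberate simplification: it regresses on noisy function values only, with a squared-exponential kernel, whereas the method described above combines histograms and free energy gradients from umbrella sampling. The kernel choice, hyperparameter values, and function names are assumptions for illustration and are not taken from the paper or from the libatoms software.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: encodes the prior assumption of smoothness."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_std=0.1,
                 length_scale=1.0, variance=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale, variance)
    k_tt = k_tt + noise_std ** 2 * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale, variance)
    k_ss_diag = np.full(len(x_test), variance)  # prior variance at each test point

    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha

    v = np.linalg.solve(chol, k_st.T)
    var = k_ss_diag - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

The posterior standard deviation comes out of the same linear algebra as the mean, which is consistent with the abstract's remark that error bars require no significant additional computation. It shrinks near sampled values of the collective variable and returns to the prior value far from any data.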

Relevance:

30.00%

Publisher:

Abstract:

Adaptation to speaker and environment changes is an essential part of current automatic speech recognition (ASR) systems. In recent years the use of multi-layer perceptrons (MLPs) has become increasingly common in ASR systems. A standard approach to handling speaker differences when using MLPs is to apply a global speaker-specific constrained MLLR (CMLLR) transform to the features prior to training or using the MLP. This paper considers the situation when there are both speaker and channel (communication link) differences in the data. A more powerful transform, front-end CMLLR (FE-CMLLR), is applied to the inputs to the MLP to represent the channel differences. Though global, these FE-CMLLR transforms vary from time-instance to time-instance. Experiments on a channel-distorted dialect Arabic conversational speech recognition task indicate the usefulness of adapting MLP features using both CMLLR and FE-CMLLR transforms. © 2013 IEEE.
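
A CMLLR transform acts on the features as an affine map, x' = A x + b, with one (A, b) pair per speaker in the global case. The sketch below shows that map, together with one possible way a transform could vary from frame to frame: a per-frame weighted combination of several affine transforms. The second function is a hypothetical illustration only; the abstract does not describe how FE-CMLLR selects or combines transforms, and the function names, array shapes, and weighting scheme are all assumptions.

```python
import numpy as np

def apply_cmllr(features, a_matrix, bias):
    """Global affine feature transform: each frame x becomes A @ x + b.

    features : (T, D) array of T frames with D-dimensional features
    a_matrix : (D, D) transform matrix
    bias     : (D,) offset vector
    """
    return features @ a_matrix.T + bias

def apply_frame_weighted(features, a_matrices, biases, weights):
    """Hypothetical frame-varying transform (an assumption, not the paper's method).

    a_matrices : (K, D, D) set of K affine matrices
    biases     : (K, D) set of K offsets
    weights    : (T, K) per-frame weights over the K transforms (rows sum to 1)
    """
    # Apply every transform to every frame: result has shape (T, K, D).
    per_component = np.einsum("kij,tj->tki", a_matrices, features) + biases[None, :, :]
    # Combine the K transformed versions of each frame with its own weights.
    return np.einsum("tk,tki->ti", weights, per_component)
```

With a single transform (K = 1) and all weights equal to one, the frame-weighted version reduces to the global affine map.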