52 results for kernel estimators
Abstract:
Unbiased location- and scale-invariant `elemental' estimators for the GPD tail parameter are constructed. Each involves three log-spacings. The estimators are unbiased for finite sample sizes, even as small as N=3. It is shown that the elementals form a complete basis for unbiased location- and scale-invariant estimators constructed from linear combinations of log-spacings. Preliminary numerical evidence is presented which suggests that elemental combinations can be constructed which are consistent estimators of the tail parameter for samples drawn from the pure GPD family.
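As a rough illustration of the raw ingredient used by these estimators (a sketch only, not the paper's elemental construction), the log-spacings of an ordered sample can be computed as follows; note that they are automatically location-invariant, and their pairwise differences are also scale-invariant:

```python
import numpy as np

def log_spacings(sample):
    """Logs of the gaps between consecutive order statistics.

    Location shifts cancel in the gaps; rescaling the sample only
    adds a constant to every log-spacing, so differences of
    log-spacings are location- and scale-invariant.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    gaps = np.diff(x)          # spacings between order statistics
    return np.log(gaps)
```

Any statistic built from differences of these quantities, such as the three-log-spacing elementals described above, inherits both invariances.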
Abstract:
In a companion paper (McRobie(2013) arxiv:1304.3918), a simple set of `elemental' estimators was presented for the Generalized Pareto tail parameter. Each elemental estimator: involves only three log-spacings; is absolutely unbiased for all values of the tail parameter; is location- and scale-invariant; and is valid for all sample sizes $N$, even as small as $N= 3$. It was suggested that linear combinations of such elementals could then be used to construct efficient unbiased estimators. In this paper, the analogous mathematical approach is taken to the Generalised Extreme Value (GEV) distribution. The resulting elemental estimators, although not absolutely unbiased, are found to have very small bias, and may thus provide a useful basis for the construction of efficient estimators.
Abstract:
Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown, and most models require this parameter as an input. Dirichlet process mixture models are appealing as they can infer the number of clusters from the data. However, these models do not deal well with high-dimensional data and can encounter difficulties in inference. We present a novel nonparametric Bayesian kernel-based method to cluster data points without the need to prespecify the number of clusters or to model complicated densities from which the data points are assumed to be generated. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points is. We explore some theoretical properties of the model and derive a natural Gibbs-based algorithm with MCMC hyperparameter learning. The model is implemented on a variety of synthetic and real world data sets.
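The key insight above can be sketched in a few lines (an illustration with an assumed RBF kernel and hypothetical helper names, not the paper's model): for a positive-definite kernel with unit diagonal, the determinant of the submatrix over a set of points shrinks toward zero as the points move closer together, because their kernel rows become nearly linearly dependent.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel matrix; unit diagonal by construction.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def cluster_cohesion(X, idx, lengthscale=1.0):
    """Determinant of the kernel submatrix over the points in idx.

    Tightly clustered points give nearly identical kernel rows,
    so the determinant is close to 0; well-separated points give
    a nearly diagonal submatrix with determinant close to 1.
    """
    K = rbf_kernel(X[np.asarray(idx)], lengthscale)
    return np.linalg.det(K)
```

A small determinant therefore serves as evidence that the indexed points belong together in one cluster.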
Abstract:
We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lends itself to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.
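The construction can be sketched as a Monte Carlo estimate (an illustrative toy, not the paper's kernels): define the kernel between two points as the fraction of random partitions that place them in the same block. Here each "partition" is just a single random axis-aligned split, an assumption standing in for the richer partitioners (e.g. Random Forest leaves) used in the paper.

```python
import numpy as np

def random_partition_kernel(X, n_partitions=200, rng=None):
    """K[i, j] = fraction of random partitions putting i and j
    in the same block. Each partition here is a toy threshold
    split on a randomly chosen axis (a stand-in partitioner)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    K = np.zeros((n, n))
    for _ in range(n_partitions):
        axis = rng.integers(d)
        lo, hi = X[:, axis].min(), X[:, axis].max()
        t = rng.uniform(lo, hi)
        labels = (X[:, axis] > t).astype(int)   # two blocks per partition
        K += labels[:, None] == labels[None, :]
    return K / n_partitions
```

By construction the result is symmetric with unit diagonal, and it is positive semi-definite because it is an average of co-membership indicator matrices, each of which is PSD.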
Abstract:
MOTIVATION: Synthetic lethal interactions represent pairs of genes whose individual mutations are not lethal, while the double mutation of both genes is lethal. Several studies have shown a correlation between the functional similarity of genes and their distances in networks based on synthetic lethal interactions. However, there is a lack of algorithms for predicting gene function from synthetic lethality interaction networks. RESULTS: In this article, we present a novel technique called kernelROD for gene function prediction from synthetic lethal interaction networks based on kernel machines. We apply our novel algorithm to Gene Ontology functional annotation prediction in yeast. Our experiments show that our method leads to improved gene function prediction compared with state-of-the-art competitors and that combining genetic and congruence networks leads to a further improvement in prediction accuracy.
Abstract:
In recent years there has been a growing interest among the speech research community in the use of spectral estimators which circumvent the traditional quasi-stationary assumption and provide greater time-frequency (t-f) resolution than conventional spectral estimators, such as the short time Fourier power spectrum (STFPS). One distribution in particular, the Wigner distribution (WD), has attracted considerable interest. However, experimental studies have indicated that, despite its improved t-f resolution, employing the WD as the front end of a speech recognition system actually reduces recognition performance; only by explicitly re-introducing t-f smoothing into the WD are recognition rates improved. In this paper we provide an explanation for these findings. By treating the spectral estimation problem as one of optimizing a bias-variance trade-off, we show why additional t-f smoothing improves recognition rates, despite reducing the t-f resolution of the spectral estimator. A practical adaptive smoothing algorithm is presented, which attempts to match the degree of smoothing introduced into the WD with the time-varying quasi-stationary regions within the speech waveform. The recognition performance of the resulting adaptively smoothed estimator is found to be comparable to that of conventional filterbank estimators, yet the average temporal sampling rate of the resulting spectral vectors is reduced by around a factor of 10. © 1992.
Abstract:
The long term goal of our work is to enable rapid prototyping design optimization to take place on geometries of arbitrary size in the spirit of a real-time computer game. In recent papers we have reported the integration of a Level Set based geometry kernel with an octree-based cut-Cartesian mesh generator, RANS flow solver and post-processing all within a single piece of software - and all implemented in parallel with commodity PC clusters as the target. This work has shown that it is possible to eliminate all serial bottlenecks from the CFD Process. This paper reports further progress towards our goal; in particular we report on the generation of viscous layer meshes to bridge the body to the flow across the cut-cells. The Level Set formulation, which underpins the geometry representation, is used as a natural mechanism to allow rapid construction of conformal layer meshes. The guiding principle is to construct the mesh which most closely approximates the body but remains solvable. This apparently novel approach is described and examples given.
Abstract:
Cluster analysis of ranking data, which occurs in consumer questionnaires, voting forms or other inquiries of preferences, attempts to identify typical groups of rank choices. Empirically measured rankings are often incomplete, i.e. different numbers of filled rank positions cause heterogeneity in the data. We propose a mixture approach for clustering of heterogeneous rank data. Rankings of different lengths can be described and compared by means of a single probabilistic model. A maximum entropy approach avoids hidden assumptions about missing rank positions. Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model. Experiments on both synthetic data and real-world data demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.
Abstract:
The application of automated design optimization to real-world, complex geometry problems is a significant challenge - especially if the topology is not known a priori, as in turbine internal cooling. The long term goal of our work is to focus on an end-to-end integration of the whole CFD Process, from solid model through meshing, solving and post-processing, to enable this type of design optimization to become viable & practical. In recent papers we have reported the integration of a Level Set based geometry kernel with an octree-based cut-Cartesian mesh generator, RANS flow solver, post-processing & geometry editing all within a single piece of software - and all implemented in parallel with commodity PC clusters as the target. The cut-cells which characterize the approach are eliminated by exporting a body-conformal mesh guided by the underpinning Level Set. This paper extends this work still further with a simple scoping study showing how the basic functionality can be scripted & automated and then used as the basis for automated optimization of a generic gas turbine cooling geometry. Copyright © 2008 by W.N.Dawes.
Abstract:
Real-world simulation challenges are getting bigger: virtual aero-engines with multistage blade rows coupled with their secondary air systems & with fully featured geometry; environmental flows at meta-scales over resolved cities; synthetic battlefields. It is clear that the future of simulation is scalable, end-to-end parallelism. To address these challenges we have reported in a sequence of papers a series of inherently parallel building blocks based on the integration of a Level Set based geometry kernel with an octree-based cut-Cartesian mesh generator, RANS flow solver, post-processing and geometry management & editing. The cut-cells which characterize the approach are eliminated by exporting a body-conformal mesh driven by the underpinning Level Set and managed by mesh quality optimization algorithms; this permits third party flow solvers to be deployed. This paper continues this sequence by reporting & demonstrating two main novelties: variable depth volume mesh refinement enabling variable surface mesh refinement and a radical rework of the mesh generation into a bottom-up system based on Space Filling Curves. Also reported are the associated extensions to body-conformal mesh export. Everything is implemented in a scalable, parallel manner. As a practical demonstration, meshes of guaranteed quality are generated for a fully resolved, generic aircraft carrier geometry, a cooled disc brake assembly and a B747 in landing configuration. Copyright © 2009 by W.N.Dawes.
Abstract:
The background to this review paper is research we have performed over recent years aimed at developing a simulation system capable of handling large scale, real world applications implemented in an end-to-end parallel, scalable manner. The particular focus of this paper is the use of a Level Set solid modeling geometry kernel within this parallel framework to enable automated design optimization without topological restrictions and on geometries of arbitrary complexity. Also described is another interesting application of Level Sets: their use in guiding the export of a body-conformal mesh from our basic cut-Cartesian background octree mesh; this permits third party flow solvers to be deployed. As practical demonstrations, meshes of guaranteed quality are generated and flow-solved for a B747 in full landing configuration and an automated optimization is performed on a cooled turbine tip geometry. Copyright © 2009 by W.N.Dawes.
Abstract:
A parametric study of spark ignition in a uniform monodisperse turbulent spray is performed with complex chemistry three-dimensional Direct Numerical Simulations in order to improve the understanding of the structure of the ignition kernel. The heat produced by the kernel increases with the amount of fuel evaporated inside the spark volume. Moreover, the heat sink by evaporation is initially higher than the heat release and can have a negative effect on ignition. With the sprays investigated, heat release occurs over a large range of mixture fractions, being high within the nominal flammability limits and finite but low below the lean flammability limit. The burning of very lean regions is attributed to the diffusion of heat and species from regions of high heat release, and from the spark, to lean regions. Two modes of spray ignition are reported. With a relatively dilute spray, nominally flammable material exists only near the droplets. Reaction zones are created locally near the droplets and have a non-premixed character. They spread from droplet to droplet through a very lean interdroplet spacing. With a dense spray, the hot spark region is rich due to substantial evaporation but the cold region remains lean. In between, a large surface of flammable material is generated by evaporation. Ignition occurs there and a large reaction zone propagates from the rich burned region to the cold lean region. This flame is wrinkled due to the stratified mixture fraction field and evaporative cooling. In the dilute spray, the reaction front curvature pdf contains high values associated with single droplet combustion, while in the dense spray, the curvature is lower and closer to the curvature associated with gaseous fuel ignition kernels. © 2011 The Combustion Institute.
Abstract:
Noise and vibration from underground railways is a major source of disturbance to inhabitants near subways. To help designers meet noise and vibration limits, numerical models are used to understand vibration propagation from these underground railways. However, the models commonly assume the ground is homogeneous and neglect to include local variability in the soil properties. Such simplifying assumptions add a level of uncertainty to the predictions which is not well understood. The goal of the current paper is to quantify the effect of soil inhomogeneity on surface vibration. The thin-layer method (TLM) is suggested as an efficient and accurate means of simulating vibration from underground railways in arbitrarily layered half-spaces. Stochastic variability of the soil's elastic modulus is introduced using a Karhunen-Loève (KL) expansion; the modulus is assumed to have a log-normal distribution and a modified exponential covariance kernel. The effect of horizontal soil variability is investigated by comparing the stochastic results for soils varied only in the vertical direction to soils with 2D variability. Results suggest that local soil inhomogeneity can significantly affect surface velocity predictions; 90 percent confidence intervals showing 8 dB averages and peak values up to 12 dB are computed. This is a significant source of uncertainty and should be considered when using predictions from models assuming homogeneous soil properties. Furthermore, the effect of horizontal variability of the elastic modulus on the confidence interval appears to be negligible. This suggests that only vertical variation needs to be taken into account when modelling ground vibration from underground railways. © 2012 Elsevier Ltd. All rights reserved.
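The stochastic-modulus step above can be sketched as follows (an illustrative discrete KL expansion; it assumes a plain exponential covariance for log E as a stand-in for the paper's modified exponential kernel, and all parameter values are placeholders): eigen-decompose the covariance of the log-modulus on a depth grid, keep the leading modes, and exponentiate the resulting Gaussian field to obtain log-normal modulus profiles.

```python
import numpy as np

def lognormal_modulus_samples(z, mu, sigma, corr_len,
                              n_terms=10, n_samples=5, rng=None):
    """Truncated Karhunen-Loeve sketch of a log-normal modulus E(z)."""
    rng = np.random.default_rng(rng)
    # Covariance of log E on the grid (assumed exponential kernel)
    C = np.exp(-np.abs(z[:, None] - z[None, :]) / corr_len)
    w, v = np.linalg.eigh(C)                           # discrete KL modes
    w = w[::-1][:n_terms]                              # leading eigenvalues
    v = v[:, ::-1][:, :n_terms]                        # matching eigenvectors
    xi = rng.standard_normal((n_terms, n_samples))     # iid N(0,1) coefficients
    g = v @ (np.sqrt(np.maximum(w, 0.0))[:, None] * xi)  # Gaussian field
    return np.exp(mu + sigma * g)                      # log-normal modulus
```

Each column is one random modulus profile; feeding such profiles through a deterministic solver (here, the TLM) is the standard Monte Carlo route to the confidence intervals reported above.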
Abstract:
When a thin rectangular plate is restrained on the two long edges and free on the remaining edges, the equivalent stiffness of the restraining joints can be identified from the order of the natural frequencies obtained using the free response of the plate at a single location. This work presents a method to identify the equivalent stiffness of the restraining joints, represented as simply supporting the plate while elastically restraining it in rotation. An integral transform is used to map the autospectrum of the free response from the frequency domain to the stiffness domain in order to identify the equivalent torsional stiffness of the restrained edges of the plate and also the order of natural frequencies. The kernel of the integral transform is built by interpolating data from a finite element model of the plate. The method introduced in this paper can also be applied to plates or shells with different shapes and boundary conditions. © 2011 Elsevier Ltd. All rights reserved.