23 results for Data Modeling
in Cambridge University Engineering Department Publications Database
Abstract:
We present the Gaussian process density sampler (GPDS), an exchangeable generative model for use in nonparametric Bayesian density estimation. Samples drawn from the GPDS are consistent with exact, independent samples from a distribution defined by a density that is a transformation of a function drawn from a Gaussian process prior. Our formulation allows us to infer an unknown density from data using Markov chain Monte Carlo, which gives samples from the posterior distribution over density functions and from the predictive distribution on data space. We describe two such MCMC methods. Both methods also allow inference of the hyperparameters of the Gaussian process.
Abstract:
Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data.
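As a rough illustration of the stick-breaking idea mentioned above, the sketch below draws weights from a single, truncated stick-breaking process. It is not the paper's nested construction: the Beta(1, alpha) break distribution, the truncation level, and the function name stick_breaking_weights are assumptions chosen for illustration only.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Return the first `num_sticks` stick-breaking weights and the leftover mass.

    Assumption: each break fraction is drawn from Beta(1, alpha), the usual
    choice for a Dirichlet-process-style construction.
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)  # piece broken off this round
        remaining *= 1.0 - fraction           # what is left for later pieces
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=10, rng=rng)
```

The weights plus the leftover mass always sum to one, so the truncated weights can serve as approximate mixture proportions. A nested version would, presumably, run one such process over the children of each node and another over the decision to stop at a node.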
Abstract:
A nonparametric Bayesian extension of Factor Analysis (FA) is proposed where observed data $\mathbf{Y}$ is modeled as a linear superposition, $\mathbf{G}$, of a potentially infinite number of hidden factors, $\mathbf{X}$. The Indian Buffet Process (IBP) is used as a prior on $\mathbf{G}$ to incorporate sparsity and to allow the number of latent features to be inferred. The model's utility for modeling gene expression data is investigated using randomly generated data sets based on a known sparse connectivity matrix for E. coli, and on three biological data sets of increasing complexity.
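To make the role of the IBP prior more concrete, here is a minimal sketch of drawing a binary sparsity pattern from the standard one-parameter IBP using its sequential "customers and dishes" description. The function names, the concentration value, and the reading of rows as observed variables and columns as latent factors are assumptions for illustration, not details taken from the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's multiplication method (small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_rows, alpha, rng):
    """Sample a binary matrix from the one-parameter Indian Buffet Process.

    Row i reuses existing feature k with probability m_k / i, where m_k is the
    number of earlier rows using k, then adds Poisson(alpha / i) new features.
    """
    rows = []
    column_counts = []  # m_k for every feature created so far
    for i in range(1, num_rows + 1):
        row = [1 if rng.random() < m / i else 0 for m in column_counts]
        new_features = sample_poisson(alpha / i, rng)
        row.extend([1] * new_features)
        # zip stops at the old width, so only existing features are updated here
        column_counts = [m + z for m, z in zip(column_counts, row)]
        column_counts += [1] * new_features
        rows.append(row)
    width = len(column_counts)
    # Pad earlier rows with zeros for features that were created later
    return [row + [0] * (width - len(row)) for row in rows]


Z = sample_ibp(num_rows=6, alpha=2.0, rng=random.Random(1))
```

The number of columns in Z is random and grows slowly with the number of rows, which is what lets the number of latent features be inferred rather than fixed in advance.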
Abstract:
Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent Variable Model (GPLVM) has successfully been used to find low dimensional manifolds in a variety of complex data. The GPLVM consists of a set of points in a low dimensional latent space, and a stochastic map to the observed space. We show how it can be interpreted as a density model in the observed space. However, the GPLVM is not trained as a density model and therefore yields bad density estimates. We propose a new training strategy and obtain improved generalisation performance and better density estimates in comparative evaluations on several benchmark data sets. © 2010 Springer-Verlag.
Abstract:
The effects of initial soil fabric on the behavior of granular soils are investigated using Distinct Element Method (DEM) numerical simulation. Soil specimens are represented by an assembly of non-uniform sized spheres with different initial contact normal distributions. Isotropically consolidated triaxial compression loading and extension unloading in both undrained and drained conditions are simulated for vertically- and horizontally-sheared specimens. The numerical simulation results are compared qualitatively with published experimental data, and the effects of initial soil fabric on the resulting soil behavior are discussed, including the effects of specimen reconstitution methods, the effects of large preshearing, and anisotropic characteristics in undrained and drained conditions. The effects of initial soil fabric and mode of shearing on the quasi-steady state line are also investigated. The numerical simulation results systematically show that the experimentally observed behaviors of granular soils are due principally to their initial soil fabric. This outcome provides microscopic insight into the observed phenomena. © 2011 Elsevier Ltd.
Abstract:
Simulations of an n-heptane spray autoigniting under conditions relevant to a diesel engine are performed using two-dimensional, first-order conditional moment closure (CMC) with full treatment of spray terms in the mixture fraction variance and CMC equations. The conditional evaporation term in the CMC equations is closed assuming interphase exchange to occur at the droplet saturation mixture fraction values only. Modeling of the unclosed terms in the mixture fraction variance equation is done accordingly. Comparison with experimental data for a range of ambient oxygen concentrations shows that the ignition delay is overpredicted. The trend of increasing ignition delay with decreasing oxygen concentration, however, is correctly captured. Good agreement is found between the computed and measured flame lift-off height for all conditions investigated. Analysis of source terms in the CMC temperature equation reveals that a convective-reactive balance sets in at the flame base, with spatial diffusion terms being important, but not as important as in lifted jet flames in cold air. Inclusion of droplet terms in the governing equations is found to affect the mixture fraction variance field in the region where evaporation is the strongest, and to slightly increase the ignition delay time due to the cooling associated with the evaporation. Both flame propagation and stabilization mechanisms, however, remain unaffected. © 2011 Taylor & Francis.
Abstract:
Reducing energy consumption is a major challenge for energy-intensive industries such as papermaking. A commercially viable energy saving solution is to employ data-based optimization techniques to obtain a set of optimized operational settings that satisfy certain performance indices. The difficulties of this are: 1) problems of this type are inherently multicriteria in the sense that improving one performance index might compromise other important measures; 2) practical systems often exhibit unknown complex dynamics and several interconnections, which make the modeling task difficult; and 3) as the models are acquired from existing historical data, they are valid only locally, and extrapolation carries the risk of increasing process variability. To overcome these difficulties, this paper presents a new decision support system for robust multiobjective optimization of interconnected processes. The plant is first divided into serially connected units to model the process, product quality, energy consumption, and corresponding uncertainty measures. Then a multiobjective gradient descent algorithm is used to solve the problem in line with the user's preference information. Finally, the optimization results are visualized for analysis and decision making. In practice, if further iterations of the optimization algorithm are considered, the validity of the local models must be checked before proceeding. The method is implemented in a MATLAB-based interactive tool, DataExplorer, which supports a range of data analysis, modeling, and multiobjective optimization techniques. The proposed approach was tested in two U.K.-based commercial paper mills, where the aim was to reduce steam consumption and increase productivity while maintaining product quality by optimizing vacuum pressures in the forming and press sections. The experimental results demonstrate the effectiveness of the method. © 2006 IEEE.
Abstract:
Most of the manual labor needed to create the geometric building information model (BIM) of an existing facility is spent converting raw point cloud data (PCD) to a BIM description. Automating this process would drastically reduce the modeling cost. Surface extraction from PCD is a fundamental step in this process. Compact modeling of redundant points in PCD as a set of planes leads to smaller file size and fast interactive visualization on cheap hardware. Traditional approaches for smooth surface reconstruction do not explicitly model the sparse scene structure or significantly exploit the redundancy. This paper proposes a method based on sparsity-inducing optimization to address the planar surface extraction problem. Through sparse optimization, points in PCD are segmented according to their embedded linear subspaces. Within each segmented part, plane models can be estimated. Experimental results on a typical noisy PCD demonstrate the effectiveness of the algorithm.
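The final step described above, estimating a plane model within one segmented part, can be sketched as a total-least-squares fit. This is only an assumed choice of estimator (the abstract does not say how the planes are fitted), it does not cover the sparse-optimization segmentation itself, and the function name fit_plane and the synthetic test points are hypothetical.

```python
import math


def fit_plane(points, iterations=200):
    """Fit a plane to 3-D points; return (centroid, unit normal).

    Assumption: the plane passes through the centroid and its normal is the
    direction of least variance of the centred points.
    """
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    # 3x3 scatter matrix of the centred points
    scatter = [[sum((p[i] - centroid[i]) * (p[j] - centroid[j]) for p in points)
                for j in range(3)] for i in range(3)]
    trace = scatter[0][0] + scatter[1][1] + scatter[2][2]
    # The dominant eigenvector of (trace * I - scatter) is the eigenvector of
    # `scatter` with the smallest eigenvalue, so plain power iteration finds it.
    shifted = [[(trace if i == j else 0.0) - scatter[i][j] for j in range(3)]
               for i in range(3)]
    normal = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        normal = [sum(shifted[i][j] * normal[j] for j in range(3)) for i in range(3)]
        length = math.sqrt(sum(v * v for v in normal))
        normal = [v / length for v in normal]
    return centroid, normal


# Synthetic, noise-free segment lying on the plane z = 0.5x - 0.25y + 2
points = [(float(x), float(y), 0.5 * x - 0.25 * y + 2.0)
          for x in range(5) for y in range(5)]
centroid, normal = fit_plane(points)
```

Storing one centroid and one normal per segment instead of every point is the kind of compaction that leads to the smaller files the abstract mentions. The fixed starting vector for the power iteration is a simplification and would fail for a plane whose normal is exactly orthogonal to (1, 1, 1).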
Abstract:
This paper is aimed at enabling the confident use of existing model test facilities for ultra deepwater application without having to compromise on the widely accepted range of scales currently used by the floating production industry. Passive line truncation has traditionally been the preferred method of creating an equivalent numerical model at reduced depth; however, these techniques tend to capture line dynamic response inaccurately and so fail to reproduce peak tensions. In an attempt to improve the credibility of model test data, the proposed truncation procedure sets up the truncated model based on line dynamic response rather than quasi-static system stiffness. The upper sections of each line are modeled in detail, capturing the wave action zone and all coupling effects with the vessel. These terminate in an approximate analytical model that aims to simulate the remainder of the line. Stages 1 & 2 are used to derive a water depth truncation ratio. Here vibration decay of transverse elastic waves is assessed and it is found that below a certain length criterion, the transverse vibrational characteristics for each line are inertia driven; hence, with respect to these motions, the truncated model can assume a linear damper whose coefficient depends on the local line properties and vibration frequency. Stage 3 endeavors to match the individual line stiffness between the full depth and truncated models. In deepwater it is likely that taut polyester moorings will be used, which are predominantly straight and have high axial stiffness that provides the principal restoring force to static and low frequency vessel motions. Consequently, the natural frequencies of axial vibrations are above the typical wave frequency range, allowing for a quasi-static solution.
In cases of exceptionally large wave frequency vessel motions, localized curvature at the chain seabed segment and tangential skin drag on the polyester rope can increase dynamic peak tensions considerably. The focus of this paper is to develop an efficient scheme, based on analytic formulation, for replicating these forces at the truncation. The paper will close with an example case study of a single mooring under extreme conditions that replicates exactly the static and dynamic characteristics of the full depth line. Copyright © 2012 by the International Society of Offshore and Polar Engineers (ISOPE).
Abstract:
Computational fluid dynamics (CFD) simulations are becoming increasingly widespread with the advent of more powerful computers and more sophisticated software. The aim of these developments is to facilitate more accurate reactor design and optimization methods compared to traditional lumped-parameter models. However, in order for CFD to be a trusted method, it must be validated using experimental data acquired at sufficiently high spatial resolution. This article validates an in-house CFD code by comparison with flow-field data obtained using magnetic resonance imaging (MRI) for a packed bed with a particle-to-column diameter ratio of 2. Flows characterized by inlet Reynolds numbers, based on particle diameter, of 27, 55, 111, and 216 are considered. The code used employs preconditioning to directly solve for pressure in low-velocity flow regimes. Excellent agreement was found between the MRI and CFD data with relative error between the experimentally determined and numerically predicted flow-fields being in the range of 3-9%. © 2012 American Institute of Chemical Engineers (AIChE).
Abstract:
In this study, TiN/La2O3/HfSiON/SiO2/Si gate stacks with thick high-k (HK) and thick pedestal oxide were used. Samples were annealed at different temperatures and times in order to characterize in detail the interaction mechanisms between La and the gate stack layers. Time-of-flight secondary ion mass spectrometry (ToF-SIMS) measurements performed on these samples show a time diffusion saturation of La in the high-k insulator, indicating an La front immobilization due to LaSiO formation at the high-k/interfacial layer. Based on the SIMS data, a technology computer aided design (TCAD) diffusion model including the La time diffusion saturation effect was developed. © 2012 American Institute of Physics.
Abstract:
Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: sentences ($\sim 1$ s); phonemes ($\sim 10^{-1}$ s); glottal pulses ($\sim 10^{-2}$ s); and formants ($\sim 10^{-3}$ s). The auditory system uses information from each of these time-scales to solve complicated tasks such as auditory scene analysis [1]. One route toward understanding how auditory processing accomplishes this analysis is to build neuroscience-inspired algorithms which solve similar tasks and to compare the properties of these algorithms with properties of auditory processing. There is however a discord: current machine-audition algorithms largely concentrate on the shorter time-scale structures in sounds, and the longer structures are ignored. The reason for this is two-fold. Firstly, it is a difficult technical problem to construct an algorithm that utilises both sorts of information. Secondly, it is computationally demanding to simultaneously process data both at high resolution (to extract short temporal information) and for long duration (to extract long temporal information). The contribution of this work is to develop a new statistical model for natural sounds that captures structure across a wide range of time-scales, and to provide efficient learning and inference algorithms. We demonstrate the success of this approach on a missing data task.