19 results for Missing data
in Cambridge University Engineering Department Publications Database
Abstract:
Data in an organisation often contains business secrets that the organisation does not want to release. There are occasions, however, when an organisation needs to release its data, such as when outsourcing work or using the cloud for Data Quality (DQ) tasks like data cleansing. Currently, there is no mechanism that allows organisations to release their data for DQ tasks while ensuring that business-related secrets are suitably protected. The aim of this paper is therefore to present our current progress on determining which methods can modify secret data while retaining its DQ problems. So far we have identified ways in which data swapping and SHA-2 hash-based alteration methods can be used to preserve the missing-data, incorrectly-formatted-value, and domain-violation DQ problems while minimising the risk of disclosing secrets. © 2012 by the AIS/ICIS Administrative Office. All rights reserved.
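To make the hashing idea concrete, here is a minimal Python sketch (not the authors' implementation; the helper name mask_value is illustrative). A SHA-256 digest masks each secret value while passing missing values through unchanged, so the missing-data DQ problem survives release; preserving the format and domain-violation problems would require the further alteration methods the paper investigates.

    import hashlib
    from typing import Optional

    def mask_value(value: Optional[str]) -> Optional[str]:
        # Pass missing values through unchanged, so the missing-data
        # DQ problem is preserved in the released data.
        if value is None:
            return None
        # Replace the secret value with its SHA-256 digest; equal inputs
        # still map to equal outputs, but the content itself is hidden.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    records = ["alice@example.com", None, "bob@example"]
    masked = [mask_value(v) for v in records]
    print(masked)  # two 64-character digests; None stays None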
Abstract:
We define a copula process, which describes the dependencies between arbitrarily many random variables independently of their marginal distributions. As an example, we develop a stochastic volatility model, Gaussian Copula Process Volatility (GCPV), to predict the latent standard deviations of a sequence of random variables. To make predictions we use Bayesian inference, with the Laplace approximation and, as an alternative, Markov chain Monte Carlo; we find the two methods comparable. We also find that our model can outperform GARCH on simulated and financial data, and that, unlike GARCH, GCPV can easily handle missing data, incorporate covariates other than time, and model a rich class of covariance structures.
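As a toy illustration of the copula-process construction (a sketch under simple assumptions, not the paper's code), the following draws a latent Gaussian process, maps it through the Gaussian CDF to obtain uniforms with GP dependence, and then applies an arbitrary marginal, here a gamma distribution standing in for the latent standard deviations:

    import numpy as np
    from scipy.stats import norm, gamma

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 200)

    # Squared-exponential GP covariance supplies the dependence structure.
    K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.05 ** 2)
    f = rng.multivariate_normal(np.zeros_like(t), K + 1e-8 * np.eye(len(t)))

    u = norm.cdf(f)                          # Gaussian copula: GP-dependent uniforms
    sigma = gamma(a=2.0, scale=0.5).ppf(u)   # gamma marginals for the latent std devs

    y = sigma * rng.standard_normal(len(t))  # observations with GP-driven volatility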
Abstract:
We introduce a stochastic process with Wishart marginals: the generalised Wishart process (GWP). It is a collection of positive semi-definite random matrices indexed by an arbitrary dependent variable. We use it to model dynamic (e.g. time-varying) covariance matrices. Unlike existing models, it can capture a diverse class of covariance structures, it can easily handle missing data, the dependent variable can readily include covariates other than time, and it scales well with dimension; there are no free parameters that must be tuned, and the optional parameters are easy to interpret. We describe how to construct the GWP, introduce general procedures for inference and prediction, and show that it outperforms its main competitor, multivariate GARCH, even on financial data that especially suits GARCH. We also show how to predict the mean of a multivariate process while accounting for dynamic correlations.
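The construction can be sketched in a few lines (illustrative assumptions throughout, not the authors' code): with nu independent d-dimensional vectors of GP draws u_i(t) and a scale Cholesky factor L, the sum of outer products Sigma(t) = sum_i L u_i(t) u_i(t)^T L^T is positive semi-definite at every input and has Wishart marginals:

    import numpy as np

    rng = np.random.default_rng(1)
    d, nu = 3, 5                     # matrix dimension and degrees of freedom
    t = np.linspace(0.0, 1.0, 100)

    # One squared-exponential GP covariance shared by all latent functions.
    K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1 ** 2)
    K += 1e-8 * np.eye(len(t))

    # nu * d independent GP draws over the index variable t: shape (nu, d, T).
    u = rng.multivariate_normal(np.zeros(len(t)), K, size=(nu, d))

    V = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])  # scale matrix
    L = np.linalg.cholesky(V)

    # Sigma(t_k) = sum_i L u_i(t_k) u_i(t_k)^T L^T: Wishart marginals,
    # varying smoothly in t because the u_i are GP draws.
    Sigma = np.zeros((len(t), d, d))
    for k in range(len(t)):
        for i in range(nu):
            v = L @ u[i, :, k]
            Sigma[k] += np.outer(v, v)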
Abstract:
Demodulation is an ill-posed problem whenever both carrier and envelope signals are broadband and unknown. Here, we approach this problem using the methods of probabilistic inference. The new approach, called Probabilistic Amplitude Demodulation (PAD), is computationally challenging but improves on existing methods in a number of ways. In contrast to previous approaches to demodulation, it satisfies five key desiderata: PAD has soft constraints because it is probabilistic; PAD is able to adjust automatically to the signal because it learns parameters; PAD is user-steerable because the solution can be shaped by user-specific prior information; PAD is robust to broadband noise because this is modeled explicitly; and PAD's solution is self-consistent, empirically satisfying a Carrier Identity property. Furthermore, the probabilistic view naturally encompasses noise and uncertainty, allowing PAD to cope with missing data and to return error bars on carrier and envelope estimates. Finally, we show that when PAD is applied to a bandpass-filtered signal, the stop-band energy of the inferred carrier is minimal, making PAD well suited to sub-band demodulation. © 2006 IEEE.
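The generative view that PAD inverts can be sketched in a few lines (a toy model under assumed forms for the envelope and carrier, not the paper's implementation): a slowly varying positive envelope multiplies a broadband carrier, with observation noise modeled explicitly:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000

    # Slowly varying latent process, made positive with a softplus.
    slow = np.convolve(rng.standard_normal(n), np.ones(50) / 50, mode="same")
    envelope = np.log1p(np.exp(slow))

    carrier = rng.standard_normal(n)    # broadband carrier
    noise = 0.05 * rng.standard_normal(n)

    # Observed signal; PAD would infer envelope and carrier from y
    # alone, returning error bars on both.
    y = envelope * carrier + noise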
Abstract:
A number of recent scientific and engineering problems require signals to be decomposed into a product of a slowly varying positive envelope and a quickly varying carrier whose instantaneous frequency also varies slowly over time. Although signal processing provides algorithms for so-called amplitude and frequency demodulation (AFD), there are well-known problems with all of the existing methods. Motivated by the fact that AFD is ill-posed, we approach the problem using probabilistic inference. The new approach, called probabilistic amplitude and frequency demodulation (PAFD), models instantaneous frequency using an auto-regressive generalization of the von Mises distribution, and the envelope using Gaussian auto-regressive dynamics with a positivity constraint. A novel form of expectation propagation is used for inference. We demonstrate that, although PAFD is computationally demanding, it outperforms previous approaches on synthetic and real signals in clean, noisy, and missing-data settings.
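A toy generative sketch of the ingredients named above (crude stand-ins, not the paper's model): phase increments follow a mean-reverting von Mises step, so instantaneous frequency drifts slowly around a mean, and the envelope follows Gaussian AR(1) dynamics with a hard positivity floor in place of the paper's constraint:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1000
    omega0, kappa = 0.2, 200.0   # mean angular frequency, concentration

    # Instantaneous frequency: von Mises steps centred on a mean-reverting
    # combination, a crude stand-in for the auto-regressive von Mises model.
    dphi = np.empty(n)
    dphi[0] = omega0
    for k in range(1, n):
        dphi[k] = rng.vonmises(0.9 * dphi[k - 1] + 0.1 * omega0, kappa)
    phase = np.cumsum(dphi)

    # Envelope: Gaussian AR(1) dynamics with a crude positivity floor.
    a = np.empty(n)
    a[0] = 1.0
    for k in range(1, n):
        a[k] = max(1e-3, 0.99 * a[k - 1] + 0.05 * rng.standard_normal())

    y = a * np.cos(phase) + 0.05 * rng.standard_normal(n)  # observed signal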