991 results for Gaussian scale mixture
Abstract:
Preferred structures of planetary wave dynamics are investigated using multivariate Gaussian mixture models. The number of components in the mixture is obtained from order statistics of the mixing proportions, thereby avoiding earlier difficulties related to sample sizes and independence issues. The method is first applied to a few low-order stochastic dynamical systems and to data from a general circulation model. It is then applied to winter daily 500-hPa heights from 1949 to 2003 over the Northern Hemisphere. A spatial clustering algorithm applied to the leading two principal components (PCs) shows significant clustering, which is particularly robust for the first half of the record and less so for the second half. The mixture model is then used to identify the clusters. Two highly significant extratropical planetary-scale preferred structures are obtained within the state space of the first two to four EOFs. The first pattern shows a Pacific-North American (PNA) pattern and a negative North Atlantic Oscillation (NAO), and the second pattern is nearly opposite to the first. It is also observed that some subspaces show multivariate Gaussianity, compatible with linearity, whereas others show multivariate non-Gaussianity. The same analysis applied to two subperiods, before and after 1978, shows similar regime behavior, with slightly stronger support in the first subperiod. In addition, a significant regime shift is observed between the two periods, as well as a change in the shape of the distribution. The patterns associated with the regime shifts reflect essentially a PNA pattern and an NAO pattern, consistent with the observed global warming effect on climate and the observed shift in sea surface temperature around the mid-1970s.
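A minimal sketch of this general workflow, fitting multivariate Gaussian mixtures to the leading two PCs of a height field. The synthetic data, the use of scikit-learn, and the BIC-based comparison of candidate orders are illustrative stand-ins only; the paper selects the number of components via order statistics of the mixing proportions, not BIC.

```python
# Sketch: Gaussian mixtures on the leading two PCs of a (synthetic) height field.
# The paper's order-selection criterion (order statistics of mixing proportions)
# is replaced here by BIC purely for illustration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
heights = rng.standard_normal((5000, 200))   # placeholder: (days, grid points)

pcs = PCA(n_components=2).fit_transform(heights)   # leading two PCs

for k in range(1, 5):
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          n_init=5, random_state=0).fit(pcs)
    # weights_ are the mixing proportions whose order statistics the paper examines
    print(k, gmm.bic(pcs), np.sort(gmm.weights_))
```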
Abstract:
Gaussian multi-scale representation is a mathematical framework that allows images to be analysed at different scales in a consistent manner and derivatives to be handled in a way deeply connected to scale. This paper uses Gaussian multi-scale representation to investigate several aspects of the derivation of atmospheric motion vectors (AMVs) from water vapour imagery. The contribution of different spatial frequencies to the tracking is studied for a range of tracer sizes, and a number of tracer selection methods are presented and compared, using WV 6.2 images from the geostationary satellite MSG-2.
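A minimal sketch of a Gaussian scale-space representation, assuming nothing about the paper's actual implementation: an image is smoothed with Gaussians of increasing width and derivatives are taken at the matching scale with scipy. The random image is a stand-in for real WV 6.2 imagery.

```python
# Sketch: Gaussian scale-space of an image with scale-matched derivatives.
import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.default_rng(1).standard_normal((256, 256))  # stand-in for a WV 6.2 image

scales = [1.0, 2.0, 4.0, 8.0]                                  # standard deviations in pixels
pyramid = [gaussian_filter(image, sigma=s) for s in scales]    # smoothed image at each scale
grad_y  = [gaussian_filter(image, sigma=s, order=(1, 0)) for s in scales]  # d/dy at scale s
grad_x  = [gaussian_filter(image, sigma=s, order=(0, 1)) for s in scales]  # d/dx at scale s

for s, gy, gx in zip(scales, grad_y, grad_x):
    print(f"sigma={s}: mean gradient magnitude {np.hypot(gx, gy).mean():.4f}")
```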
Abstract:
We generalize the popular ensemble Kalman filter to an ensemble transform filter, in which the prior distribution can take the form of a Gaussian mixture or a Gaussian kernel density estimator. The design of the filter is based on a continuous formulation of the Bayesian filter analysis step. We call the new filter algorithm the ensemble Gaussian-mixture filter (EGMF). The EGMF is implemented for three simple test problems (Brownian dynamics in one dimension, Langevin dynamics in two dimensions and the three-dimensional Lorenz-63 model). It is demonstrated that the EGMF is capable of tracking systems with non-Gaussian uni- and multimodal ensemble distributions. Copyright © 2011 Royal Meteorological Society
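For intuition, a minimal one-dimensional sketch of the classical Gaussian-sum analysis step, not the EGMF's continuous formulation: each component of a Gaussian-mixture prior receives its own Kalman update, and the mixture weights are reweighted by the component likelihoods of the observation.

```python
# Sketch: one-dimensional Gaussian-sum Bayesian update with a scalar observation y ~ N(x, R).
import numpy as np

def gaussian_sum_update(weights, means, variances, y, R):
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    innov_var = variances + R
    gains = variances / innov_var                      # per-component Kalman gains
    post_means = means + gains * (y - means)
    post_vars = (1.0 - gains) * variances
    # reweight components by their predictive likelihood of y
    lik = np.exp(-0.5 * (y - means) ** 2 / innov_var) / np.sqrt(2 * np.pi * innov_var)
    post_w = np.asarray(weights) * lik
    return post_w / post_w.sum(), post_means, post_vars

w, m, v = gaussian_sum_update([0.5, 0.5], [-1.0, 2.0], [0.3, 0.3], y=1.5, R=0.2)
print(w, m, v)
```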
On-line Gaussian mixture density estimator for adaptive minimum bit-error-rate beamforming receivers
Abstract:
We develop an on-line Gaussian mixture density estimator (OGMDE) in the complex-valued domain to facilitate an adaptive minimum bit-error-rate (MBER) beamforming receiver for multiple-antenna-based space-division multiple access systems. Specifically, the novel OGMDE is proposed to adaptively model the probability density function of the beamformer’s output by tracking the incoming data sample by sample. With the aid of the proposed OGMDE, our adaptive beamformer is capable of updating the beamformer’s weights sample by sample to directly minimize the achievable bit error rate (BER). We show that this OGMDE-based MBER beamformer outperforms the existing on-line MBER beamformer, known as the least BER beamformer, in terms of both convergence speed and achievable BER.
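A minimal sketch of the sample-by-sample idea only, assuming a generic real-valued stochastic-EM-style recursion rather than the complex-valued OGMDE of the paper: each incoming sample nudges the weights, means and variances of a small mixture with a fixed learning rate.

```python
# Sketch: on-line (streaming) update of a one-dimensional Gaussian mixture density.
import numpy as np

class OnlineGMM1D:
    def __init__(self, means, variances, weights, eta=0.05):
        self.mu = np.asarray(means, dtype=float)
        self.var = np.asarray(variances, dtype=float)
        self.w = np.asarray(weights, dtype=float)
        self.eta = eta

    def update(self, x):
        # responsibilities of each component for the new sample
        pdf = np.exp(-0.5 * (x - self.mu) ** 2 / self.var) / np.sqrt(2 * np.pi * self.var)
        r = self.w * pdf
        r /= r.sum()
        diff = x - self.mu
        # move weights, means and variances toward the new sample
        self.w += self.eta * (r - self.w)
        self.mu += self.eta * r * diff
        self.var += self.eta * r * (diff ** 2 - self.var)

est = OnlineGMM1D(means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
rng = np.random.default_rng(2)
for x in rng.normal(1.0, 0.5, size=2000):      # stream of samples concentrated on one mode
    est.update(x)
print(est.w, est.mu, est.var)
```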
Abstract:
Markov chain Monte Carlo methods are widely used in signal processing and communications for statistical inference and stochastic optimization. In this work, we introduce an efficient adaptive Metropolis-Hastings algorithm to draw samples from generic multimodal and multidimensional target distributions. The proposal density is a mixture of Gaussian densities whose parameters (weights, mean vectors and covariance matrices) are all updated using the previously generated samples through simple recursive rules. Numerical results for the one- and two-dimensional cases are provided.
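A minimal sketch of the idea with a stand-in adaptation step: an independence Metropolis-Hastings sampler whose Gaussian-mixture proposal is periodically refitted to the chain history with EM. The paper instead updates the mixture parameters with simple recursive rules at every iteration; the bimodal target below is purely illustrative.

```python
# Sketch: adaptive independence MH with a Gaussian-mixture proposal refit to the history.
import numpy as np
from sklearn.mixture import GaussianMixture

def log_target(x):
    # bimodal example target: unnormalized mixture of two 2-D Gaussians
    return np.logaddexp(-0.5 * np.sum((x + 2) ** 2), -0.5 * np.sum((x - 2) ** 2))

rng = np.random.default_rng(3)
x = np.zeros(2)
chain = [x]
proposal = GaussianMixture(n_components=2, random_state=0)
proposal.fit(rng.standard_normal((50, 2)) * 3)            # crude initial proposal

for it in range(1, 5001):
    cand = proposal.sample(1)[0][0]
    log_q = proposal.score_samples(np.vstack([cand, x]))  # [log q(cand), log q(x)]
    log_alpha = log_target(cand) - log_target(x) + log_q[1] - log_q[0]
    if np.log(rng.uniform()) < log_alpha:
        x = cand
    chain.append(x)
    if it % 500 == 0:                                      # adapt: refit proposal to history
        proposal = GaussianMixture(n_components=2, random_state=0).fit(np.array(chain))

print(np.array(chain).mean(axis=0))
```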
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, yet uncertainty quantification remains essential in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n typically seen in many fields. The tendency in some areas of industry to dispense with traditional statistical analysis, on the basis that "n = all", is thus of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and it is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is the design and characterization of computational algorithms that scale better in n or p. In the first case, the focus is on joint inference outside the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms and for characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced-rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridges existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and we provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and in other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithms for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size, up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
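A minimal sketch of the rare-event setting, assuming the standard truncated-normal (Albert and Chib) data augmentation sampler for an intercept-only probit model with a flat prior. The high lag-1 autocorrelation of the sampled intercept illustrates the slow mixing discussed above; the data set is synthetic.

```python
# Sketch: truncated-normal data augmentation Gibbs sampler for an intercept-only probit
# model on rare-event data (many observations, few successes).
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)
n, successes = 10000, 20
y = np.zeros(n)
y[:successes] = 1                               # rare events

beta = 0.0                                      # intercept, flat prior
draws = []
for _ in range(2000):
    # augment: z_i ~ N(beta, 1) truncated to (0, inf) if y_i = 1, (-inf, 0) otherwise
    a = np.where(y == 1, -beta, -np.inf)        # standardized lower bounds
    b = np.where(y == 1, np.inf, -beta)         # standardized upper bounds
    z = beta + truncnorm.rvs(a, b, size=n, random_state=rng)
    # conditional for the intercept given z (flat prior): N(mean(z), 1/n)
    beta = rng.normal(z.mean(), 1 / np.sqrt(n))
    draws.append(beta)

draws = np.array(draws[500:])                   # discard burn-in
lag1 = np.corrcoef(draws[:-1], draws[1:])[0, 1]
print("posterior mean:", draws.mean(), "lag-1 autocorrelation:", lag1)
```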
Abstract:
We perform variational studies of the interaction-localization problem to describe the interaction-induced renormalizations of the effective (screened) random potential seen by quasiparticles. Here we present results of careful finite-size scaling studies for the conductance of disordered Hubbard chains at half-filling and zero temperature. While our results indicate that quasiparticle wave functions remain exponentially localized even in the presence of moderate to strong repulsive interactions, we show that interactions produce a strong decrease of the characteristic conductance scale g* signaling the crossover to strong localization. This effect, which cannot be captured by a simple renormalization of the disorder strength, instead reflects a peculiar non-Gaussian form of the spatial correlations of the screened disordered potential, a hitherto neglected mechanism to dramatically reduce the impact of Anderson localization (interference) effects.
Abstract:
This paper proposes a novel computer vision approach that processes video sequences of people walking and then recognises those people by their gait. Human motion carries different kinds of information that can be analysed in various ways. The skeleton carries motion information about human joints, and the silhouette carries information about boundary motion of the human body. Moreover, binary and gray-level images contain different information about human movements. This work proposes to recover these different kinds of information to interpret the global motion of the human body based on four different segmented image models, using a fusion model to improve classification. Our proposed method considers the set of segmented frames of each individual as a distinct class and each frame as an object of this class. The methodology applies background extraction using the Gaussian Mixture Model (GMM), a scale reduction based on the Wavelet Transform (WT) and feature extraction by Principal Component Analysis (PCA). We propose four new schemas for motion information capture: the Silhouette-Gray-Wavelet model (SGW) captures motion based on gray-level variations; the Silhouette-Binary-Wavelet model (SBW) captures motion based on binary information; the Silhouette-Edge-Binary model (SEW) captures motion based on edge information; and the Silhouette-Skeleton-Wavelet model (SSW) captures motion based on skeleton movement. The classification rates obtained separately from these four models are then merged using a newly proposed fusion technique. The results suggest excellent performance in terms of recognising people by their gait.
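A minimal sketch of the first stage only, Gaussian-mixture-model background extraction, using OpenCV's MOG2 subtractor on synthetic frames; the wavelet scale reduction, PCA features, classification and fusion steps are not shown.

```python
# Sketch: GMM background subtraction to extract a moving silhouette from a frame sequence.
import numpy as np
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

rng = np.random.default_rng(5)
for t in range(100):
    # synthetic grayscale frame standing in for real gait video
    frame = (20 * rng.standard_normal((120, 160)) + 100).clip(0, 255).astype(np.uint8)
    frame[40:80, 10 + t:30 + t] = 220          # a bright block "walking" across the scene
    silhouette = subtractor.apply(frame)       # binary foreground mask (the silhouette)

print("foreground pixels in last frame:", int((silhouette > 0).sum()))
```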
Abstract:
A laboratory-scale sequencing batch reactor (SBR) operating for enhanced biological phosphorus removal (EBPR) and fed with a mixture of volatile fatty acids (VFAs) showed stable and efficient EBPR capacity over a four-year period. Phosphorus (P), poly-beta-hydroxyalkanoate (PHA) and glycogen cycling consistent with classical anaerobic/aerobic EBPR were demonstrated, with the order of anaerobic VFA uptake being propionate, acetate, then butyrate. The SBR was operated without pH control, and 63.67 ± 13.86 mg P l⁻¹ was released anaerobically. The P% of the sludge fluctuated between 6% and 10% over the operating period (average of 8.04 ± 1.31%). Four main morphological types of floc-forming bacteria were observed in the sludge during one year of intensive microscopic observation. Two of them were mainly responsible for anaerobic/aerobic P and PHA transformations. Fluorescence in situ hybridization (FISH) and post-FISH chemical staining for intracellular polyphosphate and PHA were used to determine that 'Candidatus Accumulibacter phosphatis' was the most abundant polyphosphate-accumulating organism (PAO), forming large clusters of coccobacilli (1.0-1.5 µm) and comprising 53% of the sludge bacteria. By the same methods, large coccobacillus-shaped gammaproteobacteria (2.5-3.5 µm) from a recently described novel cluster were identified as glycogen-accumulating organisms (GAOs), comprising 13% of the bacteria. Tetrad-forming organisms (TFOs) consistent with the 'G bacterium' morphotype were alphaproteobacteria, but not Amaricoccus spp., and comprised 25% of all bacteria. According to chemical staining, TFOs were occasionally able to store PHA anaerobically and utilize it aerobically.
Abstract:
Anogenital lichen sclerosus is a chronic, inflammatory, mucocutaneous disorder of significant morbidity. Common symptoms include pruritus, pain, dysuria, and dyspareunia, which are frequently difficult to control. Photodynamic therapy (PDT) may be an effective therapeutic option in selected cases refractory to first-line treatment options. However, procedure-related pain is a limiting factor in patient adherence to treatment. Conscious sedation and analgesia with a ready-to-use gas mixture of nitrous oxide and oxygen is useful in short-term procedures: it provides a rapid, effective, and short-lived effect without the need for anesthesiology support. A 75-year-old woman presented with a highly symptomatic, histologically confirmed vulvar lichen sclerosus with at least 15 years of evolution. Pain, pruritus, and dysuria were intense and disabling. Treatment with ultrapotent topical corticosteroids proved ineffective despite patient compliance, and she was referred for PDT. A total of 3 sessions were performed, at a mean interval of 9 weeks, under the analgesic and sedative effect of the nitrous oxide/oxygen gas. Response to treatment was evaluated through a daily, self-reported pain rating scale. Dysuria remitted completely after the first PDT session. An 80% reduction in pruritus and pain was observed after the third session and has been sustained for the past six months without further need for topical corticotherapy. Treatment sessions were well tolerated and pain-free, with no side effects to report. PDT appears to be effective in the symptomatic treatment of vulvar lichen sclerosus. To the authors’ knowledge, this is the first case reporting the use of an inhaled nitrous oxide/oxygen gas mixture during PDT performed in the genital area. Its analgesic and sedative effects may increase patients’ adherence to this painful procedure. Furthermore, given its safety, it can be easily managed in outpatient clinics by trained dermatologists.
Abstract:
Pollution in coastal ecosystems is a serious threat to the biota and to the human populations residing there. Anthropogenic activities in these ecosystems are the main cause of contamination by endocrine-disrupting compounds (EDCs), which can interfere with hormonal regulation and cause adverse effects on growth, stress response and reproduction. Although the chemical nature of many EDCs is unknown, it is believed that most are organic contaminants. As part of an environmental risk assessment for a contaminated estuary (the Sado, SW Portugal), the present work aimed to detect endocrine disruption in a flatfish, Solea senegalensis Kaup, 1858, and its potential relationship to organic toxicants. Animals were collected from two areas of the estuary with distinct influences (industrial and rural) and from an external reference area. To evaluate endocrine disruption, hepatic vitellogenin (VTG) concentrations in males and gonad histology were analysed. As biomarkers of exposure to organic contaminants, cytochrome P450 (CYP1A) induction and ethoxyresorufin-O-deethylase (EROD) activity were determined. The results were contrasted with sediment contamination levels, which are considered low overall, although the area presents a complex mixture of toxicants. Both males and females were found to be sexually immature and showed no significant evidence of degenerative pathologies. However, hepatic VTG concentrations in males from the industrial area of the estuary were higher than in those from the reference area, even reaching levels comparable to those in females, which may indicate an oestrogenic effect resulting from the complex contaminant mixture. These individuals also presented higher levels of CYP1A induction and EROD activity, which is consistent with contamination by organic substances. Taken together, the results suggest that exposure of flatfish to an environment contaminated by mixed toxicants, even at low levels, may cause endocrine disruption and thereby affect populations, which implies the need for further research into the identification of potential EDCs, their sources and their risks at the ecosystem scale.
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming to provide the best possible generalization and predictive ability rather than concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is learned automatically from the data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide an efficient means of modelling local anomalies that typically arise in the early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns, which is a possible limitation for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs-137 activity from measurements taken in the region of Briansk following the Chernobyl accident.
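A minimal sketch of the underlying idea, not the paper's algorithm: a support vector regression whose kernel is a fixed weighted sum of a short-scale and a large-scale RBF kernel. The paper learns the optimal mixture of scales from the data, whereas the weight and length scales below are set by hand, and the data are synthetic rather than the Briansk Cs-137 measurements.

```python
# Sketch: SVR with a two-scale kernel (weighted sum of short- and large-scale RBF kernels).
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

def two_scale_kernel(X, Y, w=0.5, short=10.0, large=0.1):
    # larger gamma corresponds to a shorter spatial length scale
    return w * rbf_kernel(X, Y, gamma=short) + (1 - w) * rbf_kernel(X, Y, gamma=large)

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(300, 2))                       # synthetic sampling locations
y = np.sin(X[:, 0]) + 0.2 * np.sin(8 * X[:, 1]) + 0.05 * rng.standard_normal(300)

model = SVR(kernel=two_scale_kernel, C=10.0, epsilon=0.01).fit(X, y)
print("training R^2:", model.score(X, y))
```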