59 resultados para gaussian mixture model


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a new class of neurofuzzy construction algorithms with the aim of maximizing generalization capability specifically for imbalanced data classification problems based on leave-one-out (LOO) cross validation. The algorithms are in two stages, first an initial rule base is constructed based on estimating the Gaussian mixture model with analysis of variance decomposition from input data; the second stage carries out the joint weighted least squares parameter estimation and rule selection using orthogonal forward subspace selection (OFSS)procedure. We show how different LOO based rule selection criteria can be incorporated with OFSS, and advocate either maximizing the leave-one-out area under curve of the receiver operating characteristics, or maximizing the leave-one-out Fmeasure if the data sets exhibit imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Investigation of preferred structures of planetary wave dynamics is addressed using multivariate Gaussian mixture models. The number of components in the mixture is obtained using order statistics of the mixing proportions, hence avoiding previous difficulties related to sample sizes and independence issues. The method is first applied to a few low-order stochastic dynamical systems and data from a general circulation model. The method is next applied to winter daily 500-hPa heights from 1949 to 2003 over the Northern Hemisphere. A spatial clustering algorithm is first applied to the leading two principal components (PCs) and shows significant clustering. The clustering is particularly robust for the first half of the record and less for the second half. The mixture model is then used to identify the clusters. Two highly significant extratropical planetary-scale preferred structures are obtained within the first two to four EOF state space. The first pattern shows a Pacific-North American (PNA) pattern and a negative North Atlantic Oscillation (NAO), and the second pattern is nearly opposite to the first one. It is also observed that some subspaces show multivariate Gaussianity, compatible with linearity, whereas others show multivariate non-Gaussianity. The same analysis is also applied to two subperiods, before and after 1978, and shows a similar regime behavior, with a slight stronger support for the first subperiod. In addition a significant regime shift is also observed between the two periods as well as a change in the shape of the distribution. The patterns associated with the regime shifts reflect essentially a PNA pattern and an NAO pattern consistent with the observed global warming effect on climate and the observed shift in sea surface temperature around the mid-1970s.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

ABSTRACT Non-Gaussian/non-linear data assimilation is becoming an increasingly important area of research in the Geosciences as the resolution and non-linearity of models are increased and more and more non-linear observation operators are being used. In this study, we look at the effect of relaxing the assumption of a Gaussian prior on the impact of observations within the data assimilation system. Three different measures of observation impact are studied: the sensitivity of the posterior mean to the observations, mutual information and relative entropy. The sensitivity of the posterior mean is derived analytically when the prior is modelled by a simplified Gaussian mixture and the observation errors are Gaussian. It is found that the sensitivity is a strong function of the value of the observation and proportional to the posterior variance. Similarly, relative entropy is found to be a strong function of the value of the observation. However, the errors in estimating these two measures using a Gaussian approximation to the prior can differ significantly. This hampers conclusions about the effect of the non-Gaussian prior on observation impact. Mutual information does not depend on the value of the observation and is seen to be close to its Gaussian approximation. These findings are illustrated with the particle filter applied to the Lorenz ’63 system. This article is concluded with a discussion of the appropriateness of these measures of observation impact for different situations.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The analysis step of the (ensemble) Kalman filter is optimal when (1) the distribution of the background is Gaussian, (2) state variables and observations are related via a linear operator, and (3) the observational error is of additive nature and has Gaussian distribution. When these conditions are largely violated, a pre-processing step known as Gaussian anamorphosis (GA) can be applied. The objective of this procedure is to obtain state variables and observations that better fulfil the Gaussianity conditions in some sense. In this work we analyse GA from a joint perspective, paying attention to the effects of transformations in the joint state variable/observation space. First, we study transformations for state variables and observations that are independent from each other. Then, we introduce a targeted joint transformation with the objective to obtain joint Gaussianity in the transformed space. We focus primarily in the univariate case, and briefly comment on the multivariate one. A key point of this paper is that, when (1)-(3) are violated, using the analysis step of the EnKF will not recover the exact posterior density in spite of any transformations one may perform. These transformations, however, provide approximations of different quality to the Bayesian solution of the problem. Using an example in which the Bayesian posterior can be analytically computed, we assess the quality of the analysis distributions generated after applying the EnKF analysis step in conjunction with different GA options. The value of the targeted joint transformation is particularly clear for the case when the prior is Gaussian, the marginal density for the observations is close to Gaussian, and the likelihood is a Gaussian mixture.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A new coupled cloud physics–radiation parameterization of the bulk optical properties of ice clouds is presented. The parameterization is consistent with assumptions in the cloud physics scheme regarding particle size distributions (PSDs) and mass–dimensional relationships. The parameterization is based on a weighted ice crystal habit mixture model, and its bulk optical properties are parameterized as simple functions of wavelength and ice water content (IWC). This approach directly couples IWC to the bulk optical properties, negating the need for diagnosed variables, such as the ice crystal effective dimension. The parameterization is implemented into the Met Office Unified Model Global Atmosphere 5.0 (GA5) configuration. The GA5 configuration is used to simulate the annual 20-yr shortwave (SW) and longwave (LW) fluxes at the top of the atmosphere (TOA), as well as the temperature structure of the atmosphere, under various microphysical assumptions. The coupled parameterization is directly compared against the current operational radiation parameterization, while maintaining the same cloud physics assumptions. In this experiment, the impacts of the two parameterizations on the SW and LW radiative effects at TOA are also investigated and compared against observations. The 20-yr simulations are compared against the latest observations of the atmospheric temperature and radiative fluxes at TOA. The comparisons demonstrate that the choice of PSD and the assumed ice crystal shape distribution are as important as each other. Moreover, the consistent radiation parameterization removes a long-standing tropical troposphere cold temperature bias but slightly warms the southern midlatitudes by about 0.5 K.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A new sparse kernel density estimator is introduced based on the minimum integrated square error criterion combining local component analysis for the finite mixture model. We start with a Parzen window estimator which has the Gaussian kernels with a common covariance matrix, the local component analysis is initially applied to find the covariance matrix using expectation maximization algorithm. Since the constraint on the mixing coefficients of a finite mixture model is on the multinomial manifold, we then use the well-known Riemannian trust-region algorithm to find the set of sparse mixing coefficients. The first and second order Riemannian geometry of the multinomial manifold are utilized in the Riemannian trust-region algorithm. Numerical examples are employed to demonstrate that the proposed approach is effective in constructing sparse kernel density estimators with competitive accuracy to existing kernel density estimators.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A wind-tunnel study was conducted to investigate ventilation of scalars from urban-like geometries at neighbourhood scale by exploring two different geometries a uniform height roughness and a non-uniform height roughness, both with an equal plan and frontal density of λ p = λ f = 25%. In both configurations a sub-unit of the idealized urban surface was coated with a thin layer of naphthalene to represent area sources. The naphthalene sublimation method was used to measure directly total area-averaged transport of scalars out of the complex geometries. At the same time, naphthalene vapour concentrations controlled by the turbulent fluxes were detected using a fast Flame Ionisation Detection (FID) technique. This paper describes the novel use of a naphthalene coated surface as an area source in dispersion studies. Particular emphasis was also given to testing whether the concentration measurements were independent of Reynolds number. For low wind speeds, transfer from the naphthalene surface is determined by a combination of forced and natural convection. Compared with a propane point source release, a 25% higher free stream velocity was needed for the naphthalene area source to yield Reynolds-number-independent concentration fields. Ventilation transfer coefficients w T /U derived from the naphthalene sublimation method showed that, whilst there was enhanced vertical momentum exchange due to obstacle height variability, advection was reduced and dispersion from the source area was not enhanced. Thus, the height variability of a canopy is an important parameter when generalising urban dispersion. Fine resolution concentration measurements in the canopy showed the effect of height variability on dispersion at street scale. Rapid vertical transport in the wake of individual high-rise obstacles was found to generate elevated point-like sources. A Gaussian plume model was used to analyse differences in the downstream plumes. Intensified lateral and vertical plume spread and plume dilution with height was found for the non-uniform height roughness

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The article considers screening human populations with two screening tests. If any of the two tests is positive, then full evaluation of the disease status is undertaken; however, if both diagnostic tests are negative, then disease status remains unknown. This procedure leads to a data constellation in which, for each disease status, the 2 × 2 table associated with the two diagnostic tests used in screening has exactly one empty, unknown cell. To estimate the unobserved cell counts, previous approaches assume independence of the two diagnostic tests and use specific models, including the special mixture model of Walter or unconstrained capture–recapture estimates. Often, as is also demonstrated in this article by means of a simple test, the independence of the two screening tests is not supported by the data. Two new estimators are suggested that allow associations of the screening test, although the form of association must be assumed to be homogeneous over disease status. These estimators are modifications of the simple capture–recapture estimator and easy to construct. The estimators are investigated for several screening studies with fully evaluated disease status in which the superior behavior of the new estimators compared to the previous conventional ones can be shown. Finally, the performance of the new estimators is compared with maximum likelihood estimators, which are more difficult to obtain in these models. The results indicate the loss of efficiency as minor.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The paper considers meta-analysis of diagnostic studies that use a continuous score for classification of study participants into healthy or diseased groups. Classification is often done on the basis of a threshold or cut-off value, which might vary between studies. Consequently, conventional meta-analysis methodology focusing solely on separate analysis of sensitivity and specificity might be confounded by a potentially unknown variation of the cut-off value. To cope with this phenomena it is suggested to use, instead, an overall estimate of the misclassification error previously suggested and used as Youden’s index and; furthermore, it is argued that this index is less prone to between-study variation of cut-off values. A simple Mantel–Haenszel estimator as a summary measure of the overall misclassification error is suggested, which adjusts for a potential study effect. The measure of the misclassification error based on Youden’s index is advantageous in that it easily allows an extension to a likelihood approach, which is then able to cope with unobserved heterogeneity via a nonparametric mixture model. All methods are illustrated at hand of an example on a diagnostic meta-analysis on duplex doppler ultrasound, with angiography as the standard for stroke prevention.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Much of the atmospheric variability in the North Atlantic sector is associated with variations in the eddy-driven component of the zonal flow. Here we present a simple method to specifically diagnose this component of the flow using the low-level wind field (925–700 hpa ). We focus on the North Atlantic winter season in the ERA-40 reanalysis. Diagnostics of the latitude and speed of the eddy-driven jet stream are compared with conventional diagnostics of the North Atlantic Oscillation (NAO) and the East Atlantic (EA) pattern. This shows that the NAO and the EA both describe combined changes in the latitude and speed of the jet stream. It is therefore necessary, but not always sufficient, to consider both the NAO and the EA in identifying changes in the jet stream. The jet stream analysis suggests that there are three preferred latitudinal positions of the North Atlantic eddy-driven jet stream in winter. This result is in very good agreement with the application of a statistical mixture model to the two-dimensional state space defined by the NAO and the EA. These results are consistent with several other studies which identify four European/Atlantic regimes, comprising three jet stream patterns plus European blocking events.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The paper considers meta-analysis of diagnostic studies that use a continuous Score for classification of study participants into healthy, or diseased groups. Classification is often done on the basis of a threshold or cut-off value, which might vary between Studies. Consequently, conventional meta-analysis methodology focusing solely on separate analysis of sensitivity and specificity might he confounded by a potentially unknown variation of the cut-off Value. To cope with this phenomena it is suggested to use, instead an overall estimate of the misclassification error previously suggested and used as Youden's index and; furthermore, it is argued that this index is less prone to between-study variation of cut-off values. A simple Mantel-Haenszel estimator as a summary measure of the overall misclassification error is suggested, which adjusts for a potential study effect. The measure of the misclassification error based on Youden's index is advantageous in that it easily allows an extension to a likelihood approach, which is then able to cope with unobserved heterogeneity via a nonparametric mixture model. All methods are illustrated at hand of an example on a diagnostic meta-analysis on duplex doppler ultrasound, with angiography as the standard for stroke prevention.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The article considers screening human populations with two screening tests. If any of the two tests is positive, then full evaluation of the disease status is undertaken; however, if both diagnostic tests are negative, then disease status remains unknown. This procedure leads to a data constellation in which, for each disease status, the 2 x 2 table associated with the two diagnostic tests used in screening has exactly one empty, unknown cell. To estimate the unobserved cell counts, previous approaches assume independence of the two diagnostic tests and use specific models, including the special mixture model of Walter or unconstrained capture-recapture estimates. Often, as is also demonstrated in this article by means of a simple test, the independence of the two screening tests is not supported by the data. Two new estimators are suggested that allow associations of the screening test, although the form of association must be assumed to be homogeneous over disease status. These estimators are modifications of the simple capture-recapture estimator and easy to construct. The estimators are investigated for several screening studies with fully evaluated disease status in which the superior behavior of the new estimators compared to the previous conventional ones can be shown. Finally, the performance of the new estimators is compared with maximum likelihood estimators, which are more difficult to obtain in these models. The results indicate the loss of efficiency as minor.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon-known as heterotachy-can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Accelerated failure time models with a shared random component are described, and are used to evaluate the effect of explanatory factors and different transplant centres on survival times following kidney transplantation. Different combinations of the distribution of the random effects and baseline hazard function are considered and the fit of such models to the transplant data is critically assessed. A mixture model that combines short- and long-term components of a hazard function is then developed, which provides a more flexible model for the hazard function. The model can incorporate different explanatory variables and random effects in each component. The model is straightforward to fit using standard statistical software, and is shown to be a good fit to the transplant data. Copyright (C) 2004 John Wiley Sons, Ltd.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The study of motor unit action potential (MUAP) activity from electrornyographic signals is an important stage on neurological investigations that aim to understand the state of the neuromuscular system. In this context, the identification and clustering of MUAPs that exhibit common characteristics, and the assessment of which data features are most relevant for the definition of such cluster structure are central issues. In this paper, we propose the application of an unsupervised Feature Relevance Determination (FRD) method to the analysis of experimental MUAPs obtained from healthy human subjects. In contrast to approaches that require the knowledge of a priori information from the data, this FRD method is embedded on a constrained mixture model, known as Generative Topographic Mapping, which simultaneously performs clustering and visualization of MUAPs. The experimental results of the analysis of a data set consisting of MUAPs measured from the surface of the First Dorsal Interosseous, a hand muscle, indicate that the MUAP features corresponding to the hyperpolarization period in the physisiological process of generation of muscle fibre action potentials are consistently estimated as the most relevant and, therefore, as those that should be paid preferential attention for the interpretation of the MUAP groupings.