916 resultados para High-dimensional indexing
Resumo:
Self-organizing maps (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm involves the use of Euclidean metric in the process of adaptation of the model vectors, thus rendering in theory a whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using Euclidean metric can shed valid results with compositional data. 2. If a modification of the conventional approach replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering) can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs poorer than the conventional version, in particular, when the data is pathological. Moreover, the conventional ap- proach converges faster to a solution, when data is \well-behaved". Key words: Self Organizing Map; Artificial Neural networks; Compositional data
Resumo:
The application of particle filters in geophysical systems is reviewed. Some background on Bayesian filtering is provided, and the existing methods are discussed. The emphasis is on the methodology, and not so much on the applications themselves. It is shown that direct application of the basic particle filter (i.e., importance sampling using the prior as the importance density) does not work in high-dimensional systems, but several variants are shown to have potential. Approximations to the full problem that try to keep some aspects of the particle filter beyond the Gaussian approximation are also presented and discussed.
Resumo:
The identification and visualization of clusters formed by motor unit action potentials (MUAPs) is an essential step in investigations seeking to explain the control of the neuromuscular system. This work introduces the generative topographic mapping (GTM), a novel machine learning tool, for clustering of MUAPs, and also it extends the GTM technique to provide a way of visualizing MUAPs. The performance of GTM was compared to that of three other clustering methods: the self-organizing map (SOM), a Gaussian mixture model (GMM), and the neural-gas network (NGN). The results, based on the study of experimental MUAPs, showed that the rate of success of both GTM and SOM outperformed that of GMM and NGN, and also that GTM may in practice be used as a principled alternative to the SOM in the study of MUAPs. A visualization tool, which we called GTM grid, was devised for visualization of MUAPs lying in a high-dimensional space. The visualization provided by the GTM grid was compared to that obtained from principal component analysis (PCA). (c) 2005 Elsevier Ireland Ltd. All rights reserved.
Resumo:
This paper introduces a new neurofuzzy model construction and parameter estimation algorithm from observed finite data sets, based on a Takagi and Sugeno (T-S) inference mechanism and a new extended Gram-Schmidt orthogonal decomposition algorithm, for the modeling of a priori unknown dynamical systems in the form of a set of fuzzy rules. The first contribution of the paper is the introduction of a one to one mapping between a fuzzy rule-base and a model matrix feature subspace using the T-S inference mechanism. This link enables the numerical properties associated with a rule-based matrix subspace, the relationships amongst these matrix subspaces, and the correlation between the output vector and a rule-base matrix subspace, to be investigated and extracted as rule-based knowledge to enhance model transparency. The matrix subspace spanned by a fuzzy rule is initially derived as the input regression matrix multiplied by a weighting matrix that consists of the corresponding fuzzy membership functions over the training data set. Model transparency is explored by the derivation of an equivalence between an A-optimality experimental design criterion of the weighting matrix and the average model output sensitivity to the fuzzy rule, so that rule-bases can be effectively measured by their identifiability via the A-optimality experimental design criterion. The A-optimality experimental design criterion of the weighting matrices of fuzzy rules is used to construct an initial model rule-base. An extended Gram-Schmidt algorithm is then developed to estimate the parameter vector for each rule. This new algorithm decomposes the model rule-bases via an orthogonal subspace decomposition approach, so as to enhance model transparency with the capability of interpreting the derived rule-base energy level. This new approach is computationally simpler than the conventional Gram-Schmidt algorithm for resolving high dimensional regression problems, whereby it is computationally desirable to decompose complex models into a few submodels rather than a single model with large number of input variables and the associated curse of dimensionality problem. Numerical examples are included to demonstrate the effectiveness of the proposed new algorithm.
Resumo:
A generalized or tunable-kernel model is proposed for probability density function estimation based on an orthogonal forward regression procedure. Each stage of the density estimation process determines a tunable kernel, namely, its center vector and diagonal covariance matrix, by minimizing a leave-one-out test criterion. The kernel mixing weights of the constructed sparse density estimate are finally updated using the multiplicative nonnegative quadratic programming algorithm to ensure the nonnegative and unity constraints, and this weight-updating process additionally has the desired ability to further reduce the model size. The proposed tunable-kernel model has advantages, in terms of model generalization capability and model sparsity, over the standard fixed-kernel model that restricts kernel centers to the training data points and employs a single common kernel variance for every kernel. On the other hand, it does not optimize all the model parameters together and thus avoids the problems of high-dimensional ill-conditioned nonlinear optimization associated with the conventional finite mixture model. Several examples are included to demonstrate the ability of the proposed novel tunable-kernel model to effectively construct a very compact density estimate accurately.
Resumo:
K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints allow to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) in a wide range of the parameters space. BSP-KM is more scalable than KDKM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
Resumo:
A common problem in many data based modelling algorithms such as associative memory networks is the problem of the curse of dimensionality. In this paper, a new two-stage neurofuzzy system design and construction algorithm (NeuDeC) for nonlinear dynamical processes is introduced to effectively tackle this problem. A new simple preprocessing method is initially derived and applied to reduce the rule base, followed by a fine model detection process based on the reduced rule set by using forward orthogonal least squares model structure detection. In both stages, new A-optimality experimental design-based criteria we used. In the preprocessing stage, a lower bound of the A-optimality design criterion is derived and applied as a subset selection metric, but in the later stage, the A-optimality design criterion is incorporated into a new composite cost function that minimises model prediction error as well as penalises the model parameter variance. The utilisation of NeuDeC leads to unbiased model parameters with low parameter variance and the additional benefit of a parsimonious model structure. Numerical examples are included to demonstrate the effectiveness of this new modelling approach for high dimensional inputs.
Resumo:
This report describes the analysis and development of novel tools for the global optimisation of relevant mission design problems. A taxonomy was created for mission design problems, and an empirical analysis of their optimisational complexity performed - it was demonstrated that the use of global optimisation was necessary on most classes and informed the selection of appropriate global algorithms. The selected algorithms were then applied to the di®erent problem classes: Di®erential Evolution was found to be the most e±cient. Considering the speci¯c problem of multiple gravity assist trajectory design, a search space pruning algorithm was developed that displays both polynomial time and space complexity. Empirically, this was shown to typically achieve search space reductions of greater than six orders of magnitude, thus reducing signi¯cantly the complexity of the subsequent optimisation. The algorithm was fully implemented in a software package that allows simple visualisation of high-dimensional search spaces, and e®ective optimisation over the reduced search bounds.
Resumo:
This paper proposes a nonlinear regression structure comprising a wavelet network and a linear term. The introduction of the linear term is aimed at providing a more parsimonious interpolation in high-dimensional spaces when the modelling samples are sparse. A constructive procedure for building such structures, termed linear-wavelet networks, is described. For illustration, the proposed procedure is employed in the framework of dynamic system identification. In an example involving a simulated fermentation process, it is shown that a linear-wavelet network yields a smaller approximation error when compared with a wavelet network with the same number of regressors. The proposed technique is also applied to the identification of a pressure plant from experimental data. In this case, the results show that the introduction of wavelets considerably improves the prediction ability of a linear model. Standard errors on the estimated model coefficients are also calculated to assess the numerical conditioning of the identification process.
Resumo:
A set of random variables is exchangeable if its joint distribution function is invariant under permutation of the arguments. The concept of exchangeability is discussed, with a view towards potential application in evaluating ensemble forecasts. It is argued that the paradigm of ensembles being an independent draw from an underlying distribution function is probably too narrow; allowing ensemble members to be merely exchangeable might be a more versatile model. The question is discussed whether established methods of ensemble evaluation need alteration under this model, with reliability being given particular attention. It turns out that the standard methodology of rank histograms can still be applied. As a first application of the exchangeability concept, it is shown that the method of minimum spanning trees to evaluate the reliability of high dimensional ensembles is mathematically sound.
Resumo:
Mean field models (MFMs) of cortical tissue incorporate salient, average features of neural masses in order to model activity at the population level, thereby linking microscopic physiology to macroscopic observations, e.g., with the electroencephalogram (EEG). One of the common aspects of MFM descriptions is the presence of a high-dimensional parameter space capturing neurobiological attributes deemed relevant to the brain dynamics of interest. We study the physiological parameter space of a MFM of electrocortical activity and discover robust correlations between physiological attributes of the model cortex and its dynamical features. These correlations are revealed by the study of bifurcation plots, which show that the model responses to changes in inhibition belong to two archetypal categories or “families”. After investigating and characterizing them in depth, we discuss their essential differences in terms of four important aspects: power responses with respect to the modeled action of anesthetics, reaction to exogenous stimuli such as thalamic input, and distributions of model parameters and oscillatory repertoires when inhibition is enhanced. Furthermore, while the complexity of sustained periodic orbits differs significantly between families, we are able to show how metamorphoses between the families can be brought about by exogenous stimuli. We here unveil links between measurable physiological attributes of the brain and dynamical patterns that are not accessible by linear methods. They instead emerge when the nonlinear structure of parameter space is partitioned according to bifurcation responses. We call this general method “metabifurcation analysis”. The partitioning cannot be achieved by the investigation of only a small number of parameter sets and is instead the result of an automated bifurcation analysis of a representative sample of 73,454 physiologically admissible parameter sets. Our approach generalizes straightforwardly and is well suited to probing the dynamics of other models with large and complex parameter spaces.
Resumo:
The Asian summer monsoon is a high dimensional and highly nonlinear phenomenon involving considerable moisture transport towards land from the ocean, and is critical for the whole region. We have used daily ECMWF reanalysis (ERA-40) sea-level pressure (SLP) anomalies to the seasonal cycle, over the region 50-145°E, 20°S-35°N to study the nonlinearity of the Asian monsoon using Isomap. We have focused on the two-dimensional embedding of the SLP anomalies for ease of interpretation. Unlike the unimodality obtained from tests performed in empirical orthogonal function space, the probability density function, within the two-dimensional Isomap space, turns out to be bimodal. But a clustering procedure applied to the SLP data reveals support for three clusters, which are identified using a three-component bivariate Gaussian mixture model. The modes are found to appear similar to active and break phases of the monsoon over South Asia in addition to a third phase, which shows active conditions over the Western North Pacific. Using the low-level wind field anomalies the active phase over South Asia is found to be characterised by a strengthening and an eastward extension of the Somali jet whereas during the break phase the Somali jet is weakened near southern India, while the monsoon trough in northern India also weakens. Interpretation is aided using the APHRODITE gridded land precipitation product for monsoon Asia. The effect of large-scale seasonal mean monsoon and lower boundary forcing, in the form of ENSO, is also investigated and discussed. The outcome here is that ENSO is shown to perturb the intraseasonal regimes, in agreement with conceptual ideas.
Resumo:
A potential problem with Ensemble Kalman Filter is the implicit Gaussian assumption at analysis times. Here we explore the performance of a recently proposed fully nonlinear particle filter on a high-dimensional but simplified ocean model, in which the Gaussian assumption is not made. The model simulates the evolution of the vorticity field in time, described by the barotropic vorticity equation, in a highly nonlinear flow regime. While common knowledge is that particle filters are inefficient and need large numbers of model runs to avoid degeneracy, the newly developed particle filter needs only of the order of 10-100 particles on large scale problems. The crucial new ingredient is that the proposal density cannot only be used to ensure all particles end up in high-probability regions of state space as defined by the observations, but also to ensure that most of the particles have similar weights. Using identical twin experiments we found that the ensemble mean follows the truth reliably, and the difference from the truth is captured by the ensemble spread. A rank histogram is used to show that the truth run is indistinguishable from any of the particles, showing statistical consistency of the method.
Resumo:
Particle filters are fully non-linear data assimilation techniques that aim to represent the probability distribution of the model state given the observations (the posterior) by a number of particles. In high-dimensional geophysical applications the number of particles required by the sequential importance resampling (SIR) particle filter in order to capture the high probability region of the posterior, is too large to make them usable. However particle filters can be formulated using proposal densities, which gives greater freedom in how particles are sampled and allows for a much smaller number of particles. Here a particle filter is presented which uses the proposal density to ensure that all particles end up in the high probability region of the posterior probability density function. This gives rise to the possibility of non-linear data assimilation in large dimensional systems. The particle filter formulation is compared to the optimal proposal density particle filter and the implicit particle filter, both of which also utilise a proposal density. We show that when observations are available every time step, both schemes will be degenerate when the number of independent observations is large, unlike the new scheme. The sensitivity of the new scheme to its parameter values is explored theoretically and demonstrated using the Lorenz (1963) model.
Resumo:
In this paper we provide a connection between the geometrical properties of the attractor of a chaotic dynamical system and the distribution of extreme values. We show that the extremes of so-called physical observables are distributed according to the classical generalised Pareto distribution and derive explicit expressions for the scaling and the shape parameter. In particular, we derive that the shape parameter does not depend on the cho- sen observables, but only on the partial dimensions of the invariant measure on the stable, unstable, and neutral manifolds. The shape parameter is negative and is close to zero when high-dimensional systems are considered. This result agrees with what was derived recently using the generalized extreme value approach. Combining the results obtained using such physical observables and the properties of the extremes of distance observables, it is possible to derive estimates of the partial dimensions of the attractor along the stable and the unstable directions of the flow. Moreover, by writing the shape parameter in terms of moments of the extremes of the considered observable and by using linear response theory, we relate the sensitivity to perturbations of the shape parameter to the sensitivity of the moments, of the partial dimensions, and of the Kaplan–Yorke dimension of the attractor. Preliminary numer- ical investigations provide encouraging results on the applicability of the theory presented here. The results presented here do not apply for all combinations of Axiom A systems and observables, but the breakdown seems to be related to very special geometrical configurations.