850 results for High-dimensional data visualization


Relevance:

100.00%

Publisher:

Abstract:

The complexity in visualizing volumetric data often limits the scope of direct exploration of scalar fields. Isocontour extraction is a popular method for exploring scalar fields because of its simplicity in presenting features in the data. In this paper, we present a novel representation of contours with the aim of studying the similarity relationship between the contours. The representation maps contours to points in a high-dimensional, transformation-invariant descriptor space. We leverage the power of this representation to design a clustering-based algorithm for detecting symmetric regions in a scalar field. Symmetry detection is a challenging problem because it demands both segmentation of the data and identification of transformation-invariant segments. While the former task can be addressed using topological analysis of scalar fields, the latter requires geometry-based solutions. Our approach combines the two by utilizing the contour tree for segmenting the data and the descriptor space for determining transformation invariance. We discuss two applications, query-driven exploration and asymmetry visualization, that demonstrate the effectiveness of the approach.
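The abstract does not give the descriptor construction or the contour-tree segmentation, so the following is only a minimal sketch, assuming descriptor vectors for each contour have already been computed: contours whose descriptors fall in the same cluster are candidate transformation-equivalent (symmetric) regions. DBSCAN is used here purely as a hypothetical stand-in for the clustering step.

```python
# Minimal sketch: cluster precomputed contour descriptors to group
# transformation-equivalent (potentially symmetric) contours.
# The descriptor construction itself is assumed to be done elsewhere.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 50 distinct contour descriptors plus 10
# near-identical copies playing the role of symmetric counterparts.
base = rng.normal(size=(50, 64))
copies = base[:10] + 0.01 * rng.normal(size=(10, 64))
descriptors = np.vstack([base, copies])

# Contours whose descriptors lie close together are candidate symmetric copies.
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(descriptors)

for cluster_id in np.unique(labels):
    if cluster_id == -1:
        continue  # noise: contours with no near-duplicate in descriptor space
    members = np.flatnonzero(labels == cluster_id)
    print(f"cluster {cluster_id}: contours {members.tolist()}")
```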

Relevance:

100.00%

Publisher:

Abstract:

The mapping and geospatial analysis of benthic environments are multidisciplinary tasks that have become more accessible in recent years because of advances in technology and cost reductions in survey systems. The complex relationships that exist among physical, biological, and chemical seafloor components require advanced, integrated analysis techniques to enable scientists and others to visualize patterns and, in so doing, allow inferences to be made about benthic processes. Effective mapping, analysis, and visualization of marine habitats are particularly important because the subtidal seafloor environment is not readily viewed directly by eye. Research in benthic environments relies heavily, therefore, on remote sensing techniques to collect effective data. Because many benthic scientists are not mapping professionals, they may not adequately consider the links between data collection, data analysis, and data visualization. Projects often start with clear goals, but may be hampered by the technical details and skills required for maintaining data quality through the entire process from collection through analysis and presentation. The lack of technical understanding of the entire data handling process can represent a significant impediment to success. While many benthic mapping efforts have detailed their methodology as it relates to the overall scientific goals of a project, only a few published papers and reports focus on the analysis and visualization components (Paton et al. 1997, Weihe et al. 1999, Basu and Saxena 1999, Bruce et al. 1997). In particular, the benthic mapping literature often briefly describes data collection and analysis methods, but fails to provide sufficiently detailed explanation of particular analysis techniques or display methodologies so that others can employ them. In general, such techniques are in large part guided by the data acquisition methods, which can include both aerial and water-based remote sensing methods to map the seafloor without physical disturbance, as well as physical sampling methodologies (e.g., grab or core sampling). The terms benthic mapping and benthic habitat mapping are often used synonymously to describe seafloor mapping conducted for the purpose of benthic habitat identification. There is a subtle yet important difference, however, between general benthic mapping and benthic habitat mapping. The distinction is important because it dictates the sequential analysis and visualization techniques that are employed following data collection. In this paper general seafloor mapping for identification of regional geologic features and morphology is defined as benthic mapping. Benthic habitat mapping incorporates the regional scale geologic information but also includes higher resolution surveys and analysis of biological communities to identify the biological habitats. In addition, this paper adopts the definition of habitats established by Kostylev et al. (2001) as a “spatially defined area where the physical, chemical, and biological environment is distinctly different from the surrounding environment.” (PDF contains 31 pages)

Relevance:

100.00%

Publisher:

Abstract:

Four decades of instrumented climate records at D1 on Niwot Ridge suggest that high-elevation data are an important - and even unique - part of the full climate picture. High-elevation data provide information on changing lapse rates as well as model verification for global warming, which is predicted to occur earliest at high latitudes and at high elevations. The D1 records show climatic trends that arguably support global warming, assuming that greater planetary wave amplitude is verification of warming. Lapse rates reflect conditions of air mass stability, atmospheric moisture, and cloud cover, which contribute to feedback processes involving temperature, precipitation, and snowpack. The D1 record shows a period, 1981-1985, when the lapse rate increased significantly, and this change was not detected by other data.

Relevance:

100.00%

Publisher:

Abstract:

Training data for supervised learning neural networks can be clustered such that the input/output pairs in each cluster are redundant. Redundant training data can adversely affect training time. In this paper we apply two clustering algorithms, ART2-A and the Generalized Equality Classifier, to identify training data clusters and thus reduce the training data and training time. The approach is demonstrated for a high-dimensional nonlinear continuous-time mapping. The demonstration shows a six-fold decrease in training time with little or no loss of accuracy on evaluation data.
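The ART2-A and Generalized Equality Classifier algorithms themselves are not described in the abstract; the sketch below illustrates only the general idea, clustering input/output pairs and keeping one representative per cluster, with k-means as a hypothetical stand-in.

```python
# Sketch of the general idea: cluster input/output training pairs and keep one
# representative per cluster to reduce redundancy before network training.
# k-means is a stand-in here; the paper uses ART2-A and the Generalized
# Equality Classifier.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(5000, 10))      # inputs of a nonlinear mapping
y = np.sin(X).sum(axis=1, keepdims=True)          # toy continuous target

pairs = np.hstack([X, y])                         # cluster joint input/output vectors
k = 500                                           # desired reduced set size
km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pairs)

# Keep the training pair closest to each cluster centre.
keep = []
for c in range(k):
    idx = np.flatnonzero(km.labels_ == c)
    if idx.size:
        d = np.linalg.norm(pairs[idx] - km.cluster_centers_[c], axis=1)
        keep.append(idx[np.argmin(d)])

X_red, y_red = X[keep], y[keep]
print(f"reduced training set: {len(keep)} of {len(X)} pairs")
```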

Relevance:

100.00%

Publisher:

Abstract:

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.
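The paper's Bayesian model (extended rank likelihood, parameter-expanded Gibbs sampling) is implemented in the R package bfa; the snippet below is only a rough, non-Bayesian sketch of the copula-factor idea: map each margin to normal scores via its ranks, then fit an ordinary Gaussian factor model, so the factors capture dependence while the marginal shapes are handled semiparametrically.

```python
# Rough, non-Bayesian sketch of the copula-factor idea (not the bfa model):
# rank-transform each margin to normal scores, then fit a Gaussian factor
# model to describe the dependence structure.
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n, p = 500, 6
latent = rng.normal(size=(n, 2))
# Mixed-looking data: skewed and discretized margins sharing latent factors.
raw = latent @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(n, p))
raw[:, :3] = np.exp(raw[:, :3])       # heavily skewed continuous margins
raw[:, 3:] = np.round(raw[:, 3:])     # coarsely discretized margins

# Normal-scores (rank) transform of each margin.
z = norm.ppf(rankdata(raw, axis=0) / (n + 1))

fa = FactorAnalysis(n_components=2, random_state=0).fit(z)
print("factor loadings:\n", np.round(fa.components_.T, 2))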

Relevance:

100.00%

Publisher:

Abstract:

© 2014, Springer-Verlag Berlin Heidelberg. This study assesses the skill of advanced regional climate models (RCMs) in simulating southeastern United States (SE US) summer precipitation and explores the physical mechanisms responsible for the simulation skill at a process level. Analysis of the RCM output for the North American Regional Climate Change Assessment Program indicates that the RCM simulations of summer precipitation show the largest biases and a remarkable spread over the SE US compared to other regions in the contiguous US. The causes of such a spread are investigated by performing simulations using the Weather Research and Forecasting (WRF) model, a next-generation RCM developed by the US National Center for Atmospheric Research. The results show that the simulated biases in SE US summer precipitation are due mainly to the misrepresentation of the modeled North Atlantic subtropical high (NASH) western ridge. In the WRF simulations, the NASH western ridge shifts 7° northwestward when compared to that in the reanalysis ensemble, leading to a dry bias in the simulated summer precipitation according to the relationship between the NASH western ridge and summer precipitation over the southeast. Experiments utilizing the four-dimensional data assimilation technique further suggest that the improved representation of the circulation patterns (i.e., wind fields) associated with the NASH western ridge substantially reduces the bias in the simulated SE US summer precipitation. Our analysis of circulation dynamics indicates that the NASH western ridge in the WRF simulations is significantly influenced by the simulated planetary boundary layer (PBL) processes over the Gulf of Mexico. Specifically, a decrease (increase) in the simulated PBL height tends to stabilize (destabilize) the lower troposphere over the Gulf of Mexico, and thus inhibits (favors) the onset and/or development of convection. Such changes in tropical convection induce a tropical–extratropical teleconnection pattern, which modulates the circulation along the NASH western ridge in the WRF simulations and contributes to the modeled precipitation biases over the SE US. In conclusion, our study demonstrates that the NASH western ridge is an important factor responsible for the RCM skill in simulating SE US summer precipitation. Furthermore, improvements in the PBL parameterizations for the Gulf of Mexico might help advance RCM skill in representing the NASH western ridge circulation and summer precipitation over the SE US.

Relevance:

100.00%

Publisher:

Abstract:

High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable for revealing systems-related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this depends strongly not only on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypothesis testing. Further, we provide recommendations for the selection of such analysis methods.
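As an illustration of approach (I) combined with multiple-testing control, the following minimal sketch runs per-gene two-sample t-tests on synthetic data and applies the Benjamini-Hochberg correction; it is an assumption-laden toy example, not one of the reviewed methods.

```python
# Minimal sketch of approach (I): per-gene two-sample t-tests between two
# phenotype groups, followed by Benjamini-Hochberg correction to control the
# false discovery rate across many hypotheses.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
n_genes, n_per_group = 2000, 20
control = rng.normal(size=(n_genes, n_per_group))
disease = rng.normal(size=(n_genes, n_per_group))
disease[:50] += 1.5                      # 50 genes with a true expression shift

_, pvals = ttest_ind(disease, control, axis=1)
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"genes called differentially expressed: {reject.sum()} (50 truly shifted)")
```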

Relevance:

100.00%

Publisher:

Abstract:

This paper considers a problem of identification for a high-dimensional nonlinear non-parametric system when only a limited data set is available. Algorithms are proposed for this purpose that exploit the relationship between the input variables and the output, and further the inter-dependence of the input variables, so that the importance of the input variables can be established. A key to these algorithms is the non-parametric two-stage input selection algorithm.
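The two-stage algorithm itself is not spelled out in the abstract; the snippet below is a hedged stand-in that ranks candidate inputs by their estimated mutual information with the output, only to illustrate the kind of importance ordering such input selection produces.

```python
# Hedged stand-in for input-variable ranking (not the paper's two-stage
# non-parametric algorithm): score each candidate input by its estimated
# mutual information with the output and keep the top-ranked ones.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(4)
n, p = 300, 20                                   # limited data, many candidate inputs
X = rng.uniform(-1.0, 1.0, size=(n, p))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)  # depends on x0, x1 only

scores = mutual_info_regression(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]
print("inputs ranked by estimated relevance:", ranking[:5])
```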

Relevance:

100.00%

Publisher:

Abstract:

This paper studies the statistical distributions of worldwide earthquakes from 1963 to 2012. A Cartesian grid, dividing the Earth into geographic regions, is considered. Entropy and the Jensen–Shannon divergence are used to analyze and compare real-world data. Hierarchical clustering and multi-dimensional scaling techniques are adopted for data visualization. Entropy-based indices have the advantage of leading to a single parameter expressing the relationships between the seismic data. Classical and generalized (fractional) entropy and Jensen–Shannon divergence are tested. The generalized measures lead to a clear identification of patterns embedded in the data and contribute to a better understanding of earthquake distributions.
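A minimal sketch of the classical part of this pipeline on synthetic per-region event counts follows: Shannon entropy per region, a pairwise Jensen–Shannon divergence matrix, hierarchical clustering, and multidimensional scaling for display. The generalized (fractional) measures used in the paper are not included, and the data here are purely synthetic.

```python
# Sketch of the classical analysis pipeline on synthetic per-region counts:
# per-region Shannon entropy, a pairwise Jensen-Shannon distance matrix,
# hierarchical clustering, and MDS for 2-D visualization.
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.manifold import MDS

rng = np.random.default_rng(5)
n_regions, n_bins = 12, 30
counts = rng.poisson(lam=rng.uniform(1, 10, size=(n_regions, 1)),
                     size=(n_regions, n_bins))
p = counts / counts.sum(axis=1, keepdims=True)   # per-region event histograms

H = np.array([entropy(row, base=2) for row in p])

# Pairwise JS distance (square root of the JS divergence, base 2).
D = np.array([[jensenshannon(p[i], p[j], base=2) for j in range(n_regions)]
              for i in range(n_regions)])

clusters = fcluster(linkage(D[np.triu_indices(n_regions, 1)], method="average"),
                    t=3, criterion="maxclust")
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
print("entropies:", np.round(H, 2))
print("cluster labels:", clusters)
```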

Relevance:

100.00%

Publisher:

Abstract:

The self-organizing map (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm involves the use of the Euclidean metric in the process of adaptation of the model vectors, thus rendering the whole methodology, in theory, incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data. 2. Whether a modification of the conventional approach, replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved". Key words: self-organizing map; artificial neural networks; compositional data
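For reference, the two canonical simplex operators that the modified update substitutes for vector sum and scalar multiplication can be written in a few lines; the SOM-style adaptation step at the end is only a hypothetical illustration of how they combine, not the contribution's actual training loop.

```python
# Aitchison-geometry operators on the simplex: perturbation (the analogue of
# vector addition) and powering (the analogue of scalar multiplication),
# each followed by closure back to the unit simplex.
import numpy as np

def closure(x):
    """Rescale positive parts so each composition sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Perturbation: component-wise product, then closure."""
    return closure(np.asarray(x) * np.asarray(y))

def power(x, a):
    """Powering: raise components to a scalar power, then closure."""
    return closure(np.asarray(x, dtype=float) ** a)

x = closure([0.2, 0.3, 0.5])   # model vector
y = closure([0.6, 0.2, 0.2])   # data point
# Hypothetical SOM-style adaptation step in the simplex: move the model
# vector x a fraction alpha of the way towards y, i.e. x (+) alpha (.) (y (-) x).
alpha = 0.1
step = perturb(x, power(perturb(y, power(x, -1.0)), alpha))
print(step)
```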

Relevance:

100.00%

Publisher:

Abstract:

The influence matrix is used in ordinary least-squares applications for monitoring statistical multiple-regression analyses. Concepts related to the influence matrix provide diagnostics on the influence of individual data on the analysis - the analysis change that would occur by leaving one observation out, and the effective information content (degrees of freedom for signal) in any subset of the analysed data. In this paper, the corresponding concepts have been derived in the context of linear statistical data assimilation in numerical weather prediction. An approximate method to compute the diagonal elements of the influence matrix (the self-sensitivities) has been developed for a large-dimension variational data assimilation system (the four-dimensional variational system of the European Centre for Medium-Range Weather Forecasts). Results show that, in the boreal spring 2003 operational system, 15% of the global influence is due to the assimilated observations in any one analysis, and the complementary 85% is the influence of the prior (background) information, a short-range forecast containing information from earlier assimilated observations. About 25% of the observational information is currently provided by surface-based observing systems, and 75% by satellite systems. Low-influence data points usually occur in data-rich areas, while high-influence data points are in data-sparse areas or in dynamically active regions. Background-error correlations also play an important role: high correlation diminishes the observation influence and amplifies the importance of the surrounding real and pseudo observations (prior information in observation space). Incorrect specifications of background and observation-error covariance matrices can be identified, interpreted and better understood by the use of influence-matrix diagnostics for the variety of observation types and observed variables used in the data assimilation system. Copyright © 2004 Royal Meteorological Society
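The ordinary least-squares analogue that the paper generalizes is compact enough to show directly: the hat matrix H = X (X^T X)^{-1} X^T, whose diagonal elements are the self-sensitivities (leverage of each observation) and whose trace gives the degrees of freedom for signal. The variational-assimilation version itself is not reproduced here.

```python
# OLS analogue of the influence-matrix diagnostics: the hat matrix
# H = X (X^T X)^{-1} X^T.  diag(H) gives the self-sensitivities (leverage);
# trace(H) equals the number of fitted parameters (degrees of freedom for
# signal).
import numpy as np

rng = np.random.default_rng(6)
n, k = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # design matrix
H = X @ np.linalg.solve(X.T @ X, X.T)

self_sensitivities = np.diag(H)
print("sum of self-sensitivities (= number of fitted parameters):",
      round(self_sensitivities.sum(), 3))
print("largest leverage:", round(self_sensitivities.max(), 3))
```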

Relevance:

100.00%

Publisher:

Abstract:

The identification and visualization of clusters formed by motor unit action potentials (MUAPs) is an essential step in investigations seeking to explain the control of the neuromuscular system. This work introduces the generative topographic mapping (GTM), a novel machine learning tool, for clustering of MUAPs, and also extends the GTM technique to provide a way of visualizing MUAPs. The performance of GTM was compared to that of three other clustering methods: the self-organizing map (SOM), a Gaussian mixture model (GMM), and the neural-gas network (NGN). The results, based on the study of experimental MUAPs, showed that the success rates of both GTM and SOM exceeded those of GMM and NGN, and also that GTM may in practice be used as a principled alternative to the SOM in the study of MUAPs. A visualization tool, which we called the GTM grid, was devised for visualization of MUAPs lying in a high-dimensional space. The visualization provided by the GTM grid was compared to that obtained from principal component analysis (PCA). © 2005 Elsevier Ireland Ltd. All rights reserved.
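GTM itself is not reproduced here; the sketch below uses two of the baselines named in the abstract, a Gaussian mixture model for clustering and PCA for a 2-D view, on synthetic MUAP-like feature vectors that stand in for real recordings.

```python
# Sketch using two baselines named above: a Gaussian mixture model for
# clustering and PCA for a 2-D view, applied to synthetic MUAP-like waveforms.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
n_units, waveform_len = 3, 80
templates = rng.normal(size=(n_units, waveform_len))
# 300 noisy repetitions of the unit templates stand in for detected MUAPs.
muaps = np.vstack([t + 0.3 * rng.normal(size=(100, waveform_len))
                   for t in templates])

labels = GaussianMixture(n_components=n_units, random_state=0).fit_predict(muaps)
coords = PCA(n_components=2).fit_transform(muaps)   # 2-D view of the clusters
print("cluster sizes:", np.bincount(labels))
print("2-D coordinates shape:", coords.shape)
```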

Relevance:

100.00%

Publisher:

Abstract:

A common problem in many data-based modelling algorithms, such as associative memory networks, is the curse of dimensionality. In this paper, a new two-stage neurofuzzy system design and construction algorithm (NeuDeC) for nonlinear dynamical processes is introduced to effectively tackle this problem. A new simple preprocessing method is initially derived and applied to reduce the rule base, followed by a fine model detection process based on the reduced rule set using forward orthogonal least squares model structure detection. In both stages, new A-optimality experimental design-based criteria are used. In the preprocessing stage, a lower bound of the A-optimality design criterion is derived and applied as a subset selection metric, while in the latter stage the A-optimality design criterion is incorporated into a new composite cost function that minimises model prediction error as well as penalises the model parameter variance. The utilisation of NeuDeC leads to unbiased model parameters with low parameter variance and the additional benefit of a parsimonious model structure. Numerical examples are included to demonstrate the effectiveness of this new modelling approach for high-dimensional inputs.
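As a small illustration of the A-optimality criterion referred to above (not of NeuDeC itself), the snippet evaluates trace((X^T X)^{-1}) for two hypothetical regressor subsets; the preprocessing lower bound and the composite cost function of the paper are not reproduced.

```python
# Illustration of the A-optimality design criterion: for a candidate regressor
# matrix X, the criterion is trace((X^T X)^{-1}), the total parameter variance
# up to the noise variance.  Smaller is better.
import numpy as np

def a_optimality(X):
    """trace((X^T X)^{-1}) for a candidate design/regressor matrix X."""
    return np.trace(np.linalg.inv(X.T @ X))

rng = np.random.default_rng(8)
full = rng.normal(size=(200, 12))                # hypothetical candidate regressors

# Compare two candidate subsets: adding regressors cannot decrease the
# variance of the existing coefficient estimates, so the larger subset scores
# a higher (worse) A-optimality value on the same data.
subset_a = full[:, :6]
subset_b = full[:, :10]
print("A-optimality, 6 regressors :", round(a_optimality(subset_a), 4))
print("A-optimality, 10 regressors:", round(a_optimality(subset_b), 4))
```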

Relevance:

100.00%

Publisher:

Abstract:

A potential problem with the Ensemble Kalman Filter is the implicit Gaussian assumption at analysis times. Here we explore the performance of a recently proposed fully nonlinear particle filter on a high-dimensional but simplified ocean model, in which the Gaussian assumption is not made. The model simulates the evolution of the vorticity field in time, described by the barotropic vorticity equation, in a highly nonlinear flow regime. While it is common knowledge that particle filters are inefficient and need large numbers of model runs to avoid degeneracy, the newly developed particle filter needs only on the order of 10-100 particles on large-scale problems. The crucial new ingredient is that the proposal density can not only be used to ensure all particles end up in high-probability regions of state space as defined by the observations, but also to ensure that most of the particles have similar weights. Using identical twin experiments we found that the ensemble mean follows the truth reliably, and the difference from the truth is captured by the ensemble spread. A rank histogram is used to show that the truth run is indistinguishable from any of the particles, showing statistical consistency of the method.
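The rank-histogram diagnostic mentioned at the end is simple enough to sketch: at each verification point, record the rank of the truth within the sorted ensemble; a near-flat histogram indicates the truth is statistically indistinguishable from the particles. The particle filter and its proposal density are not reproduced here, and the data below are synthetic.

```python
# Minimal sketch of the rank-histogram consistency check: at each verification
# point, the rank of the truth within the sorted ensemble; a (near-)flat
# histogram indicates the truth is indistinguishable from the particles.
import numpy as np

rng = np.random.default_rng(9)
n_times, n_particles = 2000, 20

# Synthetic stand-in for a consistent ensemble: truth and particles drawn from
# the same distribution at every time.
truth = rng.normal(size=n_times)
ensemble = rng.normal(size=(n_times, n_particles))

ranks = (ensemble < truth[:, None]).sum(axis=1)      # values 0 .. n_particles
hist = np.bincount(ranks, minlength=n_particles + 1)
print("rank histogram counts:", hist)                 # roughly uniform if consistent
```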

Relevance:

100.00%

Publisher:

Abstract:

Learning a low-dimensional manifold from highly nonlinear data of high dimensionality has become increasingly important for discovering intrinsic representations that can be utilized for data visualization and preprocessing. The autoencoder is a powerful dimensionality reduction technique based on minimizing reconstruction error, and it has regained popularity because it has been used efficiently for greedy pretraining of deep neural networks. Compared to the neural network (NN), the superiority of the Gaussian process (GP) has been shown in model inference, optimization, and performance. GPs have been successfully applied in nonlinear dimensionality reduction (DR) algorithms, such as the Gaussian Process Latent Variable Model (GPLVM). In this paper we propose the Gaussian Processes Autoencoder Model (GPAM) for dimensionality reduction by extending the classic NN-based autoencoder to a GP-based autoencoder. More interestingly, the novel model can also be viewed as a back-constrained GPLVM (BC-GPLVM) in which the back-constraint smooth function is represented by a GP. Experiments verify the performance of the newly proposed model.
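The GPAM itself is not reproduced here; the following minimal sketch only illustrates the underlying idea of a GP acting as the decoder, mapping a 2-D latent code (obtained with PCA for simplicity, an assumption rather than the paper's encoder) back to data space and scoring reconstruction error in the role of the autoencoder objective.

```python
# Not the paper's GPAM: a minimal sketch of a GP-based decoder.  A 2-D latent
# code (obtained here simply with PCA) is mapped back to data space by GP
# regression, and reconstruction error plays the role of the autoencoder loss.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(10)
t = rng.uniform(0, 2 * np.pi, size=200)
data = np.column_stack([np.cos(t), np.sin(t), np.sin(2 * t)]) \
       + 0.05 * rng.normal(size=(200, 3))

latent = PCA(n_components=2).fit_transform(data)      # stand-in encoder

decoder = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-2),
                                   normalize_y=True)
decoder.fit(latent, data)                             # GP decoder: latent -> data
recon = decoder.predict(latent)
print("mean squared reconstruction error:",
      round(float(np.mean((recon - data) ** 2)), 5))
```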