6 results for High-dimensional data visualization

in Bulgarian Digital Mathematics Library at IMI-BAS


Relevance:

100.00%

Abstract:

This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high-dimensional, low-sample-size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored using seven datasets where κ < 1. These techniques require special attention to tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
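As a rough illustration of regularization tuned by cross-validation when p is much larger than n, the sketch below fits ridge regression on ±1 class labels in its n × n dual form and picks the penalty by k-fold cross-validation. It is not the method of the paper: the synthetic data, the choice of ridge, the plain k-fold scheme, and the names `make_hdlss`, `ridge_fit_dual`, `cv_error`, and `select_lambda` are all assumptions made for this example.

```python
import numpy as np

def make_hdlss(n=40, p=500, informative=10, seed=0):
    """Synthetic two-class data with p >> n (hypothetical stand-in for microarray data)."""
    rng = np.random.default_rng(seed)
    y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
    X = rng.standard_normal((n, p))
    X[:, :informative] += y[:, None]  # only the first few features carry signal
    return X, y

def ridge_fit_dual(X, y, lam):
    """Ridge weights via the n x n dual system, which is cheap when p >> n."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha  # primal weight vector of length p

def cv_error(X, y, lam, k=5, seed=0):
    """Mean k-fold misclassification rate for one value of the penalty."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w = ridge_fit_dual(X[train], y[train], lam)
        pred = np.sign(X[fold] @ w)
        errors.append(np.mean(pred != y[fold]))
    return float(np.mean(errors))

def select_lambda(X, y, lams, k=5):
    """Return the penalty with the lowest cross-validated error, plus all errors."""
    errs = [cv_error(X, y, lam, k) for lam in lams]
    return lams[int(np.argmin(errs))], errs

if __name__ == "__main__":
    X, y = make_hdlss()
    best, errs = select_lambda(X, y, [0.01, 1.0, 100.0, 10000.0])
    print("selected lambda:", best, "cv errors:", errs)
```

The dual form solves a system whose size depends on n rather than p, which is one reason regularized and kernel methods remain tractable in the HDLSS setting.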

Relevance:

100.00%

Abstract:

We present a test for identifying clusters in high-dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating the validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, when no such clusters exist. Furthermore, in a case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
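The misleading-projection effect described in this abstract can be reproduced with a small simulation. The sketch below draws points from a single spherical normal distribution (so no true clusters exist), runs a basic 2-means, projects the data onto the line through the two centroids, and reports the gap between projected cluster means in units of the pooled within-cluster standard deviation. The separation statistic, the sample size, and the function names are assumptions for this example; this is not the test proposed in the paper.

```python
import numpy as np

def two_means(X, n_iter=50, seed=0):
    """Plain Lloyd iterations for k = 2, started from two random data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(2):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def projected_separation(X, labels, centers):
    """Gap between projected cluster means divided by the pooled within-cluster SD."""
    direction = centers[1] - centers[0]
    direction = direction / np.linalg.norm(direction)
    proj = X @ direction
    a, b = proj[labels == 0], proj[labels == 1]
    pooled_sd = np.sqrt((a.var() * len(a) + b.var() * len(b)) / len(proj))
    return abs(b.mean() - a.mean()) / pooled_sd

def separation_under_null(dim, n=100, seed=0):
    """Apparent separation found by 2-means on pure spherical normal noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    labels, centers = two_means(X, seed=seed)
    return projected_separation(X, labels, centers)

if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(separation_under_null(dim), 2))
```

With the number of points held fixed, the reported separation tends to grow with the dimension even though the data contain no cluster structure, which is why a projected view alone is weak evidence of real clusters.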

Relevance:

100.00%

Abstract:

Shield UI’s advanced framework for creating rich charts and graphs is the first in a line of data visualization components, letting web developers embed rich graphics in their web projects with minimal effort. Built with HTML and CSS3 and packaged as a jQuery plugin, the library fully supports legacy and modern desktop web browsers, as well as the latest mobile devices.

Relevance:

100.00%

Abstract:

Implementation of the GEOSS/GMES initiative requires the creation and integration of service providers, most of which deliver geospatial data output from a Grid system to an interactive user. In this paper, approaches to integrating DOS centers (service providers) used in the Ukrainian segment of GEOSS/GMES are considered, and template solutions for geospatial data visualization subsystems are suggested. The developed patterns are implemented in the DOS center of the Space Research Institute of the National Academy of Science of Ukraine and the National Space Agency of Ukraine (NASU-NSAU).

Relevance:

100.00%

Abstract:

In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active research themes include the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. One of the most accurate approaches, based on dynamic modeling of cluster similarity, is called Chameleon. In this paper we present a modified hierarchical clustering algorithm that uses the main idea of Chameleon, and the effectiveness of the suggested approach is demonstrated by experimental results.

Relevance:

100.00%

Abstract:

2010 Mathematics Subject Classification: 62J99.