12 results for High-Dimensional Space Geometrical Informatics (HDSGI)
in Bulgarian Digital Mathematics Library at IMI-BAS
Abstract:
This research evaluates pattern recognition techniques on a subclass of big data in which the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high-dimensional, low-sample-size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored using seven datasets where κ < 1. These techniques require special attention to tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
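As a rough illustration of the tuning problem described in this abstract, the sketch below fits a ridge-regularized linear classifier on synthetic data with p much larger than n and picks the penalty by k-fold cross-validation. It is not the authors' method or data: the function names (`ridge_fit`, `cv_error`, `select_lambda`), the penalty grid, and the synthetic data set are all assumptions made for the example, and the model has no intercept because the two synthetic classes are balanced.

```python
import numpy as np

def ridge_fit(x, y, lam):
    """Ridge weights via the dual form: solves an n-by-n system, cheap when p >> n."""
    n = x.shape[0]
    alpha = np.linalg.solve(x @ x.T + lam * np.eye(n), y)
    return x.T @ alpha

def cv_error(x, y, lam, n_folds=5, seed=0):
    """Mean misclassification rate of sign(x @ w) over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        w = ridge_fit(x[train], y[train], lam)
        pred = np.sign(x[held_out] @ w)
        errors.append(np.mean(pred != y[held_out]))
    return float(np.mean(errors))

def select_lambda(x, y, grid=(0.01, 1.0, 100.0, 10000.0)):
    """Return the penalty from the (assumed) grid with the lowest CV error."""
    return min(grid, key=lambda lam: cv_error(x, y, lam))

# Synthetic stand-in for an HDLSS data set (assumption, not real microarray data):
# n = 40 samples, p = 2000 features, only the first 10 features carry signal.
rng = np.random.default_rng(1)
n, p = 40, 2000
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
x = rng.standard_normal((n, p))
x[:, :10] += 1.5 * y[:, None]

best_lam = select_lambda(x, y)
print("selected penalty:", best_lam, "CV error:", cv_error(x, y, best_lam))
```

The dual form is used because it inverts an n-by-n matrix rather than a p-by-p one, which is the natural choice when κ = n/p is small.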
Abstract:
We present a test for identifying clusters in high-dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating the validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, even when no such clusters exist. Furthermore, in a case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
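The effect described in this abstract can be reproduced informally. The sketch below draws a single spherical normal sample (so no true clusters exist), runs a plain 2-means, projects the points onto the line joining the two centroids, and reports the gap between the projected group means in units of their pooled standard deviation. It is only an illustration under assumed settings, not the test proposed by the authors; `two_means` and `projected_separation` are hypothetical helper names.

```python
import numpy as np

def two_means(x, rng, n_iter=50):
    """Plain Lloyd's algorithm with k = 2; returns labels and centroids."""
    centers = x[rng.choice(len(x), size=2, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to each of the two centroids.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels, centers

def projected_separation(n, dim, seed=0):
    """Gap between projected cluster means, in pooled standard deviations."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim))  # one spherical normal: no real clusters
    labels, centers = two_means(x, rng)
    direction = centers[1] - centers[0]
    direction /= np.linalg.norm(direction)
    proj = x @ direction
    a, b = proj[labels == 0], proj[labels == 1]
    pooled_sd = np.sqrt((a.var() + b.var()) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

for dim in (2, 50, 1000):
    print(dim, round(projected_separation(100, dim), 2))
```

With the sample size held fixed, the projected gap should grow with the dimension although the data contain no cluster structure, which is why a projection plot alone is weak evidence of cluster validity.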
Abstract:
Dedicated to Professor A.M. Mathai on the occasion of his 75th birthday. Mathematics Subject Classification 2010: 26A33, 44A10, 33C60, 35J10.
Abstract:
2000 Mathematics Subject Classification: 14C05, 14L30, 14E15, 14J35.
Abstract:
Владимир Тодоров, Петър Стоев - This note contains an elementary construction of a set with the properties stated in the title. We note in addition that the set obtained in this way remains totally disconnected even after a finite number of elements are added to it.
Abstract:
2010 Mathematics Subject Classification: 62J99.
Abstract:
2000 Mathematics Subject Classification: 30A05, 33E05, 30G30, 30G35, 33E20.
Abstract:
In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active research themes include the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. One of the most accurate approaches, based on dynamic modeling of cluster similarity, is called Chameleon. In this paper we present a modified hierarchical clustering algorithm that uses the main idea of Chameleon; the effectiveness of the suggested approach is demonstrated by experimental results.
On Multi-Dimensional Random Walk Models Approximating Symmetric Space-Fractional Diffusion Processes
Abstract:
Mathematics Subject Classification: 26A33, 47B06, 47G30, 60G50, 60G52, 60G60.
Abstract:
2010 Mathematics Subject Classification: 53A07, 53A35, 53A10.
Abstract:
An embedding X ⊂ G of a topological space X into a topological group G is called functorial if every homeomorphism of X extends to a continuous group homomorphism of G. It is shown that the interval [0, 1] admits no functorial embedding into a finite-dimensional or metrizable topological group.
Abstract:
2000 Mathematics Subject Classification: 26A33 (primary), 35S15 (secondary)