7 results for Points distribution in high dimensional space
in University of Queensland eSpace - Australia
Abstract:
Indexing high-dimensional datasets has attracted extensive attention from researchers over the last decade. Since R-tree-type index structures are known to suffer from the curse of dimensionality, Pyramid-tree-type index structures, which are based on the B-tree, have been proposed to break the curse. However, for high-dimensional data, the number of pyramids is often insufficient to discriminate between data points, and the effectiveness of the Pyramid tree degrades dramatically as dimensionality increases. In this paper, we focus on one particular aspect of the curse of dimensionality: the region near the surface of a hypercube in a high-dimensional space accounts for nearly 100% of the total hypercube volume as the number of dimensions approaches infinity. We propose a new indexing method based on the surface of dimensionality. We prove that the Pyramid tree technology is a special case of our method. The results of our experiments demonstrate the clear superiority of our novel method.
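The surface-concentration effect mentioned in this abstract can be checked numerically. The sketch below is illustrative only and is not the paper's indexing method: it assumes a unit hypercube [0, 1]^d and a hypothetical shell thickness `eps`, and computes the fraction of the volume lying within `eps` of the surface as 1 - (1 - 2*eps)^d. The function name `shell_fraction` is our own.

```python
def shell_fraction(dims: int, eps: float) -> float:
    """Fraction of the unit hypercube [0, 1]^dims lying within eps of its surface.

    The points farther than eps from every face form an inner cube of side
    (1 - 2*eps), so the shell holds the remaining volume.
    """
    if dims < 1:
        raise ValueError("dims must be at least 1")
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 0.5]")
    inner_side = 1.0 - 2.0 * eps
    return 1.0 - inner_side ** dims


if __name__ == "__main__":
    # eps = 0.05 is an arbitrary illustrative shell thickness.
    for d in (2, 10, 50, 100):
        print(d, round(shell_fraction(d, 0.05), 4))
```

For a fixed shell thickness the fraction grows towards 1 as the number of dimensions increases, which is why almost all uniformly distributed points end up close to the surface.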
Abstract:
In this paper, we propose a novel high-dimensional index method, the BM+-tree, to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a rotary binary hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the M+-tree. Compared with the key dimension concept in the M+-tree, the binary hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post processing step is used after index building to ensure effective filtration. Experimental results using two types of real data sets illustrate a significantly improved filtering efficiency.
Abstract:
Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They have also been observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, described by vectors of specifically defined features. The features are derived from the hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment. (C) 2005 Wiley-Liss, Inc.
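As a rough illustration of what a hydropathy distribution along a sequence looks like, the sketch below computes a sliding-window hydropathy profile. It makes several assumptions: the abstract does not name a hydropathy scale or define its features, so the Kyte-Doolittle scale, the window size, and the function name `hydropathy_profile` are our own choices and not the authors' method.

```python
# Kyte-Doolittle hydropathy values per amino acid (an assumed scale;
# the abstract does not say which scale the authors used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 3) -> list[float]:
    """Mean hydropathy over each window of consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical short peptide: hydrophobic start, charged end.
    print(hydropathy_profile("AILKR", window=3))
```

Feature vectors derived from profiles like this could then be projected onto principal axes, but the specific features used in the paper are not given in the abstract.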
Abstract:
The notorious "dimensionality curse" is a well-known phenomenon for any multi-dimensional index attempting to scale up to high dimensions. One well-known approach to overcoming the degradation in performance as the number of dimensions increases is to reduce the dimensionality of the original dataset before constructing the index. However, identifying the correlation among the dimensions and effectively reducing them are challenging tasks. In this paper, we present an adaptive Multi-level Mahalanobis-based Dimensionality Reduction (MMDR) technique for high-dimensional indexing. Our MMDR technique has four notable features compared to existing methods. First, it discovers elliptical clusters for more effective dimensionality reduction by using only the low-dimensional subspaces. Second, data points in the different axis systems are indexed using a single B+-tree. Third, our technique is highly scalable in terms of data size and dimension. Finally, it is also dynamic and adaptive to insertions. An extensive performance study was conducted using both real and synthetic datasets, and the results show that our technique not only achieves higher precision, but also enables queries to be processed efficiently. Copyright Springer-Verlag 2005
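The Mahalanobis distance that gives MMDR its name can be illustrated in two dimensions. The sketch below is not the MMDR algorithm; it only shows, under our own assumptions (a hypothetical 2-D cluster and helper names `mean_and_cov_2d` and `mahalanobis_2d`), why this distance suits elliptical clusters: two points at the same Euclidean distance from the centre get different Mahalanobis distances depending on the cluster's spread along each direction.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(x, mean, cov):
    """sqrt((x - mean)^T * inverse(cov) * (x - mean)) for a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form inverse of [[sxx, sxy], [sxy, syy]] applied to (dx, dy).
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quad)


if __name__ == "__main__":
    # Hypothetical cluster stretched along the x axis.
    cluster = [(0, 0), (2, 0), (-2, 0), (0, 1), (0, -1)]
    mean, cov = mean_and_cov_2d(cluster)
    # Both query points are at Euclidean distance 2 from the centre.
    print(mahalanobis_2d((2, 0), mean, cov))
    print(mahalanobis_2d((0, 2), mean, cov))
```

The point along the elongated axis is treated as closer to the cluster than the point across it, which is the behaviour an ellipse-aware clustering step relies on.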
Abstract:
1. Many species of delphinids co-occur in space and time. However, little is known of their ecological interactions and the underlying mechanisms that mediate their coexistence. 2. Snubfin dolphins Orcaella heinsohni and Indo-Pacific humpback dolphins Sousa chinensis live in sympatry throughout most of their range in Australian waters. I conducted boat-based surveys in Cleveland Bay, north-east Queensland, to collect data on the space and habitat use of both species. Using Geographic Information Systems, kernel methods and Euclidean distances, I investigated interspecific differences in their space use patterns, behaviour and habitat preferences. 3. Core areas of use (50% kernel range) for both species were located close to river mouths and modified habitat such as dredged channels and breakwaters close to the Port of Townsville. Foraging and travelling were the dominant behavioural activities of snubfin and humpback dolphins within and outside their core areas. 4. Their representative ranges (95% kernel range) overlapped considerably, with shared areas showing strong concordance in the space use by both species. Nevertheless, snubfin dolphins preferred slightly shallower (1-2 m) waters than humpback dolphins (2-5 m). Additionally, shallow areas with seagrass ranked high in the habitat preferences of snubfin dolphins, whereas humpback dolphins favoured dredged channels. 5. Slight differences in habitat preferences appear to be one of the principal factors maintaining the coexistence of snubfin and humpback dolphins. I suggest diet partitioning and interspecific aggression as the major forces determining habitat selection in these sympatric species.