798 results for Data-Intensive Science


Relevance:

30.00%

Abstract:

This work proposes a method for data clustering based on complex network theory. A data set is represented as a network, with different metrics used to establish the connection between each pair of objects. Clusters are obtained using five community detection algorithms. The network-based clustering approach is applied to two real-world databases and two sets of artificially generated data. The results suggest that the exponential of the Minkowski distance is the most suitable metric for quantifying the similarity between pairs of objects. In addition, the community identification method based on greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
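The network-construction step described in this abstract can be illustrated with a minimal sketch. It assumes the similarity between two objects is w = exp(-d), where d is their Minkowski distance, and that an edge is kept only when w reaches a cutoff. The function names, the `threshold` parameter, and the sample points are hypothetical and are not taken from the paper. The community detection step is not shown.

```python
import math


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def similarity_network(points, p=2.0, threshold=0.5):
    """Return weighted edges (i, j, w) with w = exp(-d_ij) >= threshold.

    Both the exp(-d) form and the threshold are assumptions made for
    illustration; the abstract only names the metric.
    """
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            w = math.exp(-minkowski(points[i], points[j], p))
            if w >= threshold:
                edges.append((i, j, w))
    return edges


# Two tight pairs of points that lie far from each other (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
edges = similarity_network(points)
```

With these sample points, only the two nearby pairs are linked, so any community detection algorithm run on the resulting network would see two disconnected groups.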

Relevance:

30.00%

Abstract:

Content-based image retrieval (CBIR) is still a challenging problem because of the inherent complexity of images and the difficulty of choosing the most discriminant descriptors. Recent developments in the field have introduced multidimensional projections to boost accuracy in the retrieval process, but many issues remain unaddressed, such as the introduction of pattern recognition tasks and deeper user intervention to assist in choosing the most discriminant features. In this paper, we present a novel framework for CBIR that combines pattern recognition tasks, class-specific metrics, and multidimensional projection to devise an effective and interactive image retrieval system. User interaction plays an essential role in the computation of the final multidimensional projection from which image retrieval is attained. Results show that the proposed approach outperforms existing methods, making it a very attractive alternative for managing image data sets.

Relevance:

30.00%

Abstract:

We present a new catalogue of galaxy triplets derived from the Sloan Digital Sky Survey (SDSS) Data Release 7. The identification of systems was performed considering galaxies brighter than M_r = -20.5 and imposing constraints on the projected distances, radial velocity differences of neighbouring galaxies and isolation. To improve the identification of triplets, we employed a data pixelization scheme, which allows us to handle large amounts of data as in the SDSS photometric survey. Using spectroscopic and photometric data in the redshift range 0.01 ≤ z ≤ 0.40, we obtain 5901 triplet candidates. We have used a mock catalogue to analyse the completeness and contamination of our methods. The results show a high level of completeness (~80 per cent) and low contamination (~5 per cent). By using photometric and spectroscopic data, we have also addressed the effects of fibre collisions in the spectroscopic sample. We have defined an isolation criterion considering the distance of the triplet brightest galaxy to the closest neighbour cluster, to describe a global environment, as well as the galaxies within a fixed aperture, around the triplet brightest galaxy, to measure the local environment. The final catalogue comprises 1092 isolated triplets of galaxies in the redshift range 0.01 ≤ z ≤ 0.40. Our results show that photometric redshifts provide very useful information, allowing us to complete the sample of nearby systems whose detection is affected by fibre collisions, as well as extending the detection of triplets to large distances, where spectroscopic redshifts are not available.
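The abstract does not describe the pixelization scheme itself. One common reading is a grid that bins objects into cells so that a neighbour search only compares objects in the same or adjacent cells. The sketch below illustrates that generic idea on flat 2D coordinates. It is an assumption for illustration, not the authors' method: a real survey would work with sky coordinates and would also apply the radial velocity and isolation constraints. All names and sample values are hypothetical.

```python
from collections import defaultdict
from itertools import product
import math


def build_grid(positions, cell):
    """Bin point indices into square cells ("pixels") of side `cell`."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return grid


def close_pairs(positions, max_sep):
    """Return sorted pairs (i, j), i < j, separated by at most max_sep.

    With cell size equal to max_sep, every qualifying pair lies in the
    same cell or in two adjacent cells, so only the 3x3 neighbourhood
    of each cell is searched instead of the whole data set.
    """
    grid = build_grid(positions, max_sep)
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j and math.dist(positions[i], positions[j]) <= max_sep:
                        pairs.add((i, j))
    return sorted(pairs)


# Made-up flat coordinates; the last point is far from the others.
positions = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.9), (10.0, 10.0)]
pairs = close_pairs(positions, max_sep=1.0)
```

Candidate triplets could then be assembled from the close pairs, with further cuts applied afterwards. The gain comes from skipping distant cells entirely, which matters for data volumes like those of a photometric survey.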