11 resultados para data gathering algorithm

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

80.00% 80.00%

Publicador:

Resumo:

One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms under interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Conventional procedures employed in the modeling of viscoelastic properties of polymer rely on the determination of the polymer`s discrete relaxation spectrum from experimentally obtained data. In the past decades, several analytical regression techniques have been proposed to determine an explicit equation which describes the measured spectra. With a diverse approach, the procedure herein introduced constitutes a simulation-based computational optimization technique based on non-deterministic search method arisen from the field of evolutionary computation. Instead of comparing numerical results, this purpose of this paper is to highlight some Subtle differences between both strategies and focus on what properties of the exploited technique emerge as new possibilities for the field, In oder to illustrate this, essayed cases show how the employed technique can outperform conventional approaches in terms of fitting quality. Moreover, in some instances, it produces equivalent results With much fewer fitting parameters, which is convenient for computational simulation applications. I-lie problem formulation and the rationale of the highlighted method are herein discussed and constitute the main intended contribution. (C) 2009 Wiley Periodicals, Inc. J Appl Polym Sci 113: 122-135, 2009

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a catalogue of galaxy photometric redshifts and k-corrections for the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7), available on the World Wide Web. The photometric redshifts were estimated with an artificial neural network using five ugriz bands, concentration indices and Petrosian radii in the g and r bands. We have explored our redshift estimates with different training sets, thus concluding that the best choice for improving redshift accuracy comprises the main galaxy sample (MGS), the luminous red galaxies and the galaxies of active galactic nuclei covering the redshift range 0 < z < 0.3. For the MGS, the photometric redshift estimates agree with the spectroscopic values within rms = 0.0227. The distribution of photometric redshifts derived in the range 0 < z(phot) < 0.6 agrees well with the model predictions. k-corrections were derived by calibration of the k-correct_v4.2 code results for the MGS with the reference-frame (z = 0.1) (g - r) colours. We adopt a linear dependence of k-corrections on redshift and (g - r) colours that provide suitable distributions of luminosity and colours for galaxies up to redshift z(phot) = 0.6 comparable to the results in the literature. Thus, our k-correction estimate procedure is a powerful, low computational time algorithm capable of reproducing suitable results that can be used for testing galaxy properties at intermediate redshifts using the large SDSS data base.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Surveys for exoplanetary transits are usually limited not by photon noise but rather by the amount of red noise in their data. In particular, although the CoRoT space-based survey data are being carefully scrutinized, significant new sources of systematic noises are still being discovered. Recently, a magnitude-dependant systematic effect was discovered in the CoRoT data by Mazeh et al. and a phenomenological correction was proposed. Here we tie the observed effect to a particular type of effect, and in the process generalize the popular Sysrem algorithm to include external parameters in a simultaneous solution with the unknown effects. We show that a post-processing scheme based on this algorithm performs well and indeed allows for the detection of new transit-like signals that were not previously detected.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Knowing the best 1D model of the crustal and upper mantle structure is useful not only for routine hypocenter determination, but also for linearized joint inversions of hypocenters and 3D crustal structure, where a good choice of the initial model can be very important. Here, we tested the combination of a simple GA inversion with the widely used HYPO71 program to find the best three-layer model (upper crust, lower crust, and upper mantle) by minimizing the overall P- and S-arrival residuals, using local and regional earthquakes in two areas of the Brazilian shield. Results from the Tocantins Province (Central Brazil) and the southern border of the Sao Francisco craton (SE Brazil) indicated an average crustal thickness of 38 and 43 km, respectively, consistent with previous estimates from receiver functions and seismic refraction lines. The GA + HYPO71 inversion produced correct Vp/Vs ratios (1.73 and 1.71, respectively), as expected from Wadati diagrams. Tests with synthetic data showed that the method is robust for the crustal thickness, Pn velocity, and Vp/Vs ratio when using events with distance up to about 400 km, despite the small number of events available (7 and 22, respectively). The velocities of the upper and lower crusts, however, are less well constrained. Interestingly, in the Tocantins Province, the GA + HYPO71 inversion showed a secondary solution (local minimum) for the average crustal thickness, besides the global minimum solution, which was caused by the existence of two distinct domains in the Central Brazil with very different crustal thicknesses. (C) 2010 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In Information Visualization, adding and removing data elements can strongly impact the underlying visual space. We have developed an inherently incremental technique (incBoard) that maintains a coherent disposition of elements from a dynamic multidimensional data set on a 2D grid as the set changes. Here, we introduce a novel layout that uses pairwise similarity from grid neighbors, as defined in incBoard, to reposition elements on the visual space, free from constraints imposed by the grid. The board continues to be updated and can be displayed alongside the new space. As similar items are placed together, while dissimilar neighbors are moved apart, it supports users in the identification of clusters and subsets of related elements. Densely populated areas identified in the incSpace can be efficiently explored with the corresponding incBoard visualization, which is not susceptible to occlusion. The solution remains inherently incremental and maintains a coherent disposition of elements, even for fully renewed sets. The algorithm considers relative positions for the initial placement of elements, and raw dissimilarity to fine tune the visualization. It has low computational cost, with complexity depending only on the size of the currently viewed subset, V. Thus, a data set of size N can be sequentially displayed in O(N) time, reaching O(N (2)) only if the complete set is simultaneously displayed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The algorithm proposed can deal with data sets presenting different types of clusters, without the need of expertise in cluster analysis. its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with multi-objective Clustering with automatic K-determination (MOCK). the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents the formulation of a combinatorial optimization problem with the following characteristics: (i) the search space is the power set of a finite set structured as a Boolean lattice; (ii) the cost function forms a U-shaped curve when applied to any lattice chain. This formulation applies for feature selection in the context of pattern recognition. The known approaches for this problem are branch-and-bound algorithms and heuristics that explore partially the search space. Branch-and-bound algorithms are equivalent to the full search, while heuristics are not. This paper presents a branch-and-bound algorithm that differs from the others known by exploring the lattice structure and the U-shaped chain curves of the search space. The main contribution of this paper is the architecture of this algorithm that is based on the representation and exploration of the search space by new lattice properties proven here. Several experiments, with well known public data, indicate the superiority of the proposed method to the sequential floating forward selection (SFFS), which is a popular heuristic that gives good results in very short computational time. In all experiments, the proposed method got better or equal results in similar or even smaller computational time. (C) 2009 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Given two strings A and B of lengths n(a) and n(b), n(a) <= n(b), respectively, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B` of B, the length of the longest string that is a subsequence of both A and B. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes O(n(a)n(b)) time and O(n(a) + n(b)) space. After this preparation, it is possible to build that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and a matrix of size O(n(b)(2)) the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem. (C) 2007 Elsevier B.V. All rights reserved.