21 resultados para data gathering algorithm
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms under interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Context. B[e] supergiants are luminous, massive post-main sequence stars exhibiting non-spherical winds, forbidden lines, and hot dust in a disc-like structure. The physical properties of their rich and complex circumstellar environment (CSE) are not well understood, partly because these CSE cannot be easily resolved at the large distances found for B[e] supergiants (typically greater than or similar to 1 kpc). Aims. From mid-IR spectro-interferometric observations obtained with VLTI/MIDI we seek to resolve and study the CSE of the Galactic B[e] supergiant CPD-57 degrees 2874. Methods. For a physical interpretation of the observables (visibilities and spectrum) we use our ray-tracing radiative transfer code (FRACS), which is optimised for thermal spectro-interferometric observations. Results. Thanks to the short computing time required by FRACS (<10 s per monochromatic model), best-fit parameters and uncertainties for several physical quantities of CPD-57 degrees 2874 were obtained, such as inner dust radius, relative flux contribution of the central source and of the dusty CSE, dust temperature profile, and disc inclination. Conclusions. The analysis of VLTI/MIDI data with FRACS allowed one of the first direct determinations of physical parameters of the dusty CSE of a B[e] supergiant based on interferometric data and using a full model-fitting approach. In a larger context, the study of B[e] supergiants is important for a deeper understanding of the complex structure and evolution of hot, massive stars.
Resumo:
A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Conventional procedures employed in the modeling of viscoelastic properties of polymer rely on the determination of the polymer`s discrete relaxation spectrum from experimentally obtained data. In the past decades, several analytical regression techniques have been proposed to determine an explicit equation which describes the measured spectra. With a diverse approach, the procedure herein introduced constitutes a simulation-based computational optimization technique based on non-deterministic search method arisen from the field of evolutionary computation. Instead of comparing numerical results, this purpose of this paper is to highlight some Subtle differences between both strategies and focus on what properties of the exploited technique emerge as new possibilities for the field, In oder to illustrate this, essayed cases show how the employed technique can outperform conventional approaches in terms of fitting quality. Moreover, in some instances, it produces equivalent results With much fewer fitting parameters, which is convenient for computational simulation applications. I-lie problem formulation and the rationale of the highlighted method are herein discussed and constitute the main intended contribution. (C) 2009 Wiley Periodicals, Inc. J Appl Polym Sci 113: 122-135, 2009
Resumo:
The network of HIV counseling and testing centers in São Paulo, Brazil is a major source of data used to build epidemiological profiles of the client population. We examined HIV-1 incidence from November 2000 to April 2001, comparing epidemiological and socio-behavioral data of recently-infected individuals with those with long-standing infection. A less sensitive ELISA was employed to identify recent infection. The overall incidence of HIV-1 infection was 0.53/100/year (95% CI: 0.31-0.85/100/year): 0.77/100/year for males (95% CI: 0.42-1.27/100/year) and 0.22/100/ year (95% CI: 0.05-0.59/100/year) for females. Overall HIV-1 prevalence was 3.2% (95% CI: 2.8-3.7%), being 4.0% among males (95% CI: 3.3-4.7%) and 2.1% among females (95% CI: 1.6-2.8%). Recent infections accounted for 15% of the total (95% CI: 10.2-20.8%). Recent infection correlated with being younger and male (p = 0.019). Therefore, recent infection was more common among younger males and older females.
Resumo:
This paper presents a new statistical algorithm to estimate rainfall over the Amazon Basin region using the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI). The algorithm relies on empirical relationships derived for different raining-type systems between coincident measurements of surface rainfall rate and 85-GHz polarization-corrected brightness temperature as observed by the precipitation radar (PR) and TMI on board the TRMM satellite. The scheme includes rain/no-rain area delineation (screening) and system-type classification routines for rain retrieval. The algorithm is validated against independent measurements of the TRMM-PR and S-band dual-polarization Doppler radar (S-Pol) surface rainfall data for two different periods. Moreover, the performance of this rainfall estimation technique is evaluated against well-known methods, namely, the TRMM-2A12 [ the Goddard profiling algorithm (GPROF)], the Goddard scattering algorithm (GSCAT), and the National Environmental Satellite, Data, and Information Service (NESDIS) algorithms. The proposed algorithm shows a normalized bias of approximately 23% for both PR and S-Pol ground truth datasets and a mean error of 0.244 mm h(-1) ( PR) and -0.157 mm h(-1)(S-Pol). For rain volume estimates using PR as reference, a correlation coefficient of 0.939 and a normalized bias of 0.039 were found. With respect to rainfall distributions and rain area comparisons, the results showed that the formulation proposed is efficient and compatible with the physics and dynamics of the observed systems over the area of interest. The performance of the other algorithms showed that GSCAT presented low normalized bias for rain areas and rain volume [0.346 ( PR) and 0.361 (S-Pol)], and GPROF showed rainfall distribution similar to that of the PR and S-Pol but with a bimodal distribution. Last, the five algorithms were evaluated during the TRMM-Large-Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) 1999 field campaign to verify the precipitation characteristics observed during the easterly and westerly Amazon wind flow regimes. The proposed algorithm presented a cumulative rainfall distribution similar to the observations during the easterly regime, but it underestimated for the westerly period for rainfall rates above 5 mm h(-1). NESDIS(1) overestimated for both wind regimes but presented the best westerly representation. NESDIS(2), GSCAT, and GPROF underestimated in both regimes, but GPROF was closer to the observations during the easterly flow.
Resumo:
The power loss reduction in distribution systems (DSs) is a nonlinear and multiobjective problem. Service restoration in DSs is even computationally hard since it additionally requires a solution in real-time. Both DS problems are computationally complex. For large-scale networks, the usual problem formulation has thousands of constraint equations. The node-depth encoding (NDE) enables a modeling of DSs problems that eliminates several constraint equations from the usual formulation, making the problem solution simpler. On the other hand, a multiobjective evolutionary algorithm (EA) based on subpopulation tables adequately models several objectives and constraints, enabling a better exploration of the search space. The combination of the multiobjective EA with NDE (MEAN) results in the proposed approach for solving DSs problems for large-scale networks. Simulation results have shown the MEAN is able to find adequate restoration plans for a real DS with 3860 buses and 632 switches in a running time of 0.68 s. Moreover, the MEAN has shown a sublinear running time in function of the system size. Tests with networks ranging from 632 to 5166 switches indicate that the MEAN can find network configurations corresponding to a power loss reduction of 27.64% for very large networks requiring relatively low running time.
A hybrid Particle Swarm Optimization - Simplex algorithm (PSOS) for structural damage identification
Resumo:
This study proposes a new PSOS-model based damage identification procedure using frequency domain data. The formulation of the objective function for the minimization problem is based on the Frequency Response Functions (FRFs) of the system. A novel strategy for the control of the Particle Swarm Optimization (PSO) parameters based on the Nelder-Mead algorithm (Simplex method) is presented; consequently, the convergence of the PSOS becomes independent of the heuristic constants and its stability and confidence are enhanced. The formulated hybrid method performs better in different benchmark functions than the Simulated Annealing (SA) and the basic PSO (PSO(b)). Two damage identification problems, taking into consideration the effects of noisy and incomplete data, were studied: first, a 10-bar truss and second, a cracked free-free beam, both modeled with finite elements. In these cases, the damage location and extent were successfully determined. Finally, a non-linear oscillator (Duffing oscillator) was identified by PSOS providing good results. (C) 2009 Elsevier Ltd. All rights reserved
Resumo:
This paper presents a new methodology to estimate unbalanced harmonic distortions in a power system, based on measurements of a limited number of given sites. The algorithm utilizes evolutionary strategies (ES), a development branch of evolutionary algorithms. The problem solving algorithm herein proposed makes use of data from various power quality meters, which can either be synchronized by high technology GPS devices or by using information from a fundamental frequency load flow, what makes the overall power quality monitoring system much less costly. The ES based harmonic estimation model is applied to a 14 bus network to compare its performance to a conventional Monte Carlo approach. It is also applied to a 50 bus subtransmission network in order to compare the three-phase and single-phase approaches as well as the robustness of the proposed method. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this work, the applicability of a new algorithm for the estimation of mechanical properties from instrumented indentation data was studied for thin films. The applicability was analyzed with the aid of both three-dimensional finite element simulations and experimental indentation tests. The numerical approach allowed studying the effect of the substrate on the estimation of mechanical properties of the film, which was conducted based on the ratio h(max)/l between maximum indentation depth and film thickness. For the experimental analysis, indentation tests were conducted on AISI H13 tool steel specimens, plasma nitrated and coated with TiN thin films. Results have indicated that, for the conditions analyzed in this work, the elastic deformation of the substrate limited the extraction of mechanical properties of the film/substrate system. This limitation occurred even at low h(max)/l ratios and especially for the estimation of the values of yield strength and strain hardening exponent. At indentation depths lower than 4% of the film thickness, the proposed algorithm estimated the mechanical properties of the film with accuracy. Particularly for hardness, precise values were estimated at h(max)/l lower than 0.1, i.e. 10% of film thickness. (C) 2010 Published by Elsevier B.V.
Resumo:
Estimation of Taylor`s power law for species abundance data may be performed by linear regression of the log empirical variances on the log means, but this method suffers from a problem of bias for sparse data. We show that the bias may be reduced by using a bias-corrected Pearson estimating function. Furthermore, we investigate a more general regression model allowing for site-specific covariates. This method may be efficiently implemented using a Newton scoring algorithm, with standard errors calculated from the inverse Godambe information matrix. The method is applied to a set of biomass data for benthic macrofauna from two Danish estuaries. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
Interval-censored survival data, in which the event of interest is not observed exactly but is only known to occur within some time interval, occur very frequently. In some situations, event times might be censored into different, possibly overlapping intervals of variable widths; however, in other situations, information is available for all units at the same observed visit time. In the latter cases, interval-censored data are termed grouped survival data. Here we present alternative approaches for analyzing interval-censored data. We illustrate these techniques using a survival data set involving mango tree lifetimes. This study is an example of grouped survival data.
Resumo:
The identification, modeling, and analysis of interactions between nodes of neural systems in the human brain have become the aim of interest of many studies in neuroscience. The complex neural network structure and its correlations with brain functions have played a role in all areas of neuroscience, including the comprehension of cognitive and emotional processing. Indeed, understanding how information is stored, retrieved, processed, and transmitted is one of the ultimate challenges in brain research. In this context, in functional neuroimaging, connectivity analysis is a major tool for the exploration and characterization of the information flow between specialized brain regions. In most functional magnetic resonance imaging (fMRI) studies, connectivity analysis is carried out by first selecting regions of interest (ROI) and then calculating an average BOLD time series (across the voxels in each cluster). Some studies have shown that the average may not be a good choice and have suggested, as an alternative, the use of principal component analysis (PCA) to extract the principal eigen-time series from the ROI(s). In this paper, we introduce a novel approach called cluster Granger analysis (CGA) to study connectivity between ROIs. The main aim of this method was to employ multiple eigen-time series in each ROI to avoid temporal information loss during identification of Granger causality. Such information loss is inherent in averaging (e.g., to yield a single ""representative"" time series per ROI). This, in turn, may lead to a lack of power in detecting connections. The proposed approach is based on multivariate statistical analysis and integrates PCA and partial canonical correlation in a framework of Granger causality for clusters (sets) of time series. We also describe an algorithm for statistical significance testing based on bootstrapping. By using Monte Carlo simulations, we show that the proposed approach outperforms conventional Granger causality analysis (i.e., using representative time series extracted by signal averaging or first principal components estimation from ROIs). The usefulness of the CGA approach in real fMRI data is illustrated in an experiment using human faces expressing emotions. With this data set, the proposed approach suggested the presence of significantly more connections between the ROIs than were detected using a single representative time series in each ROI. (c) 2010 Elsevier Inc. All rights reserved.
Resumo:
We present a catalogue of galaxy photometric redshifts and k-corrections for the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7), available on the World Wide Web. The photometric redshifts were estimated with an artificial neural network using five ugriz bands, concentration indices and Petrosian radii in the g and r bands. We have explored our redshift estimates with different training sets, thus concluding that the best choice for improving redshift accuracy comprises the main galaxy sample (MGS), the luminous red galaxies and the galaxies of active galactic nuclei covering the redshift range 0 < z < 0.3. For the MGS, the photometric redshift estimates agree with the spectroscopic values within rms = 0.0227. The distribution of photometric redshifts derived in the range 0 < z(phot) < 0.6 agrees well with the model predictions. k-corrections were derived by calibration of the k-correct_v4.2 code results for the MGS with the reference-frame (z = 0.1) (g - r) colours. We adopt a linear dependence of k-corrections on redshift and (g - r) colours that provide suitable distributions of luminosity and colours for galaxies up to redshift z(phot) = 0.6 comparable to the results in the literature. Thus, our k-correction estimate procedure is a powerful, low computational time algorithm capable of reproducing suitable results that can be used for testing galaxy properties at intermediate redshifts using the large SDSS data base.
The SARS algorithm: detrending CoRoT light curves with Sysrem using simultaneous external parameters
Resumo:
Surveys for exoplanetary transits are usually limited not by photon noise but rather by the amount of red noise in their data. In particular, although the CoRoT space-based survey data are being carefully scrutinized, significant new sources of systematic noises are still being discovered. Recently, a magnitude-dependant systematic effect was discovered in the CoRoT data by Mazeh et al. and a phenomenological correction was proposed. Here we tie the observed effect to a particular type of effect, and in the process generalize the popular Sysrem algorithm to include external parameters in a simultaneous solution with the unknown effects. We show that a post-processing scheme based on this algorithm performs well and indeed allows for the detection of new transit-like signals that were not previously detected.