33 results for Data sets storage
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
In Information Visualization, adding and removing data elements can strongly impact the underlying visual space. We have developed an inherently incremental technique (incBoard) that maintains a coherent disposition of elements from a dynamic multidimensional data set on a 2D grid as the set changes. Here, we introduce a novel layout that uses pairwise similarity from grid neighbors, as defined in incBoard, to reposition elements on the visual space, free from constraints imposed by the grid. The board continues to be updated and can be displayed alongside the new space. As similar items are placed together, while dissimilar neighbors are moved apart, it supports users in the identification of clusters and subsets of related elements. Densely populated areas identified in the incSpace can be efficiently explored with the corresponding incBoard visualization, which is not susceptible to occlusion. The solution remains inherently incremental and maintains a coherent disposition of elements, even for fully renewed sets. The algorithm considers relative positions for the initial placement of elements, and raw dissimilarity to fine-tune the visualization. It has low computational cost, with complexity depending only on the size of the currently viewed subset, V. Thus, a data set of size N can be sequentially displayed in O(N) time, reaching O(N^2) only if the complete set is simultaneously displayed.
Abstract:
Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
Abstract:
Establishing a few sites in which measurements of soil water storage (SWS) are time stable significantly reduces the efforts involved in determining average values of SWS. This study aimed to apply a new criterion, the mean absolute bias error (MABE), to identify temporally stable sites for mean SWS evaluation. The performance of MABE was compared with that of the commonly used criterion, the standard deviation of relative difference (SDRD). From October 2004 to October 2008, SWS of four soil layers (0-1.0, 1.0-2.0, 2.0-3.0, and 3.0-4.0 m) was measured, using a neutron probe, at 28 sites on a hillslope of the Loess Plateau, China. A total of 37 SWS data sets taken over time were divided into two subsets, the first consisting of 22 dates collected during the calibration period from October 2004 to September 2006, and the second with 15 dates collected during the validation period from October 2006 to October 2008. The results showed that if a critical value of 5% for MABE was defined, more than half the sites were temporally stable for both periods, and the number of temporally stable sites generally increased with soil depth. Compared with SDRD, MABE was more suitable for the identification of time-stable sites for mean SWS prediction. Since the absolute prediction error of drier sites is more sensitive to changes in relative difference in terms of mean SWS prediction, the sites of wet sectors should be preferable for mean SWS prediction for the same changes in relative difference.
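The abstract above does not state the formulas, so the following is only a minimal sketch under assumed definitions: the relative difference of site i on date j is taken as (S_ij - mean_j) / mean_j, SDRD as its standard deviation over dates, and MABE as the mean over dates of |(rel_diff - MRD_i) / (1 + MRD_i)|, where MRD_i is the site's mean relative difference. The function name `time_stability` and the synthetic data are hypothetical and not taken from the study.

```python
import numpy as np

def time_stability(sws):
    """Return (MRD, SDRD, MABE) per site for an array of shape (n_sites, n_dates).

    The definitions used here are assumptions (see the lead-in), not quoted
    from the abstract.
    """
    sws = np.asarray(sws, dtype=float)
    field_mean = sws.mean(axis=0)                   # spatial mean SWS on each date
    rel_diff = (sws - field_mean) / field_mean      # relative difference per site and date
    mrd = rel_diff.mean(axis=1)                     # mean relative difference per site
    sdrd = rel_diff.std(axis=1, ddof=1)             # standard deviation of relative difference
    # Relative error made when the field mean is predicted as S_ij / (1 + MRD_i)
    bias = (rel_diff - mrd[:, None]) / (1.0 + mrd[:, None])
    mabe = np.abs(bias).mean(axis=1)                # mean absolute bias error
    return mrd, sdrd, mabe

# Synthetic example: three sites that are constant multiples of the field mean,
# i.e. perfectly time-stable sites (values are made up for illustration).
base = np.array([300.0, 320.0, 280.0, 310.0])       # hypothetical mean SWS on four dates
scale = np.array([0.8, 1.0, 1.2])                   # dry, average, and wet site
sws = np.outer(scale, base)
mrd, sdrd, mabe = time_stability(sws)
```

Under these assumed definitions, the division by (1 + MRD_i) is what makes drier sites more sensitive: a deviation of 0.05 in relative difference maps to an error of 0.05 / 0.8 = 0.0625 at a site with MRD = -0.2, but only 0.05 / 1.2, roughly 0.042, at a site with MRD = 0.2.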
Abstract:
For the first time, we introduce and study some mathematical properties of the Kumaraswamy Weibull distribution that is a quite flexible model in analyzing positive data. It contains as special sub-models the exponentiated Weibull, exponentiated Rayleigh, exponentiated exponential, Weibull and also the new Kumaraswamy exponential distribution. We provide explicit expressions for the moments and moment generating function. We examine the asymptotic distributions of the extreme values. Explicit expressions are derived for the mean deviations, Bonferroni and Lorenz curves, reliability and Renyi entropy. The moments of the order statistics are calculated. We also discuss the estimation of the parameters by maximum likelihood. We obtain the expected information matrix. We provide applications involving two real data sets on failure times. Finally, some multivariate generalizations of the Kumaraswamy Weibull distribution are discussed. (C) 2010 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
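The abstract does not give the distribution function. As a hedged sketch, the code below assumes the usual Kumaraswamy-generated construction F(x) = 1 - [1 - G(x)^a]^b with a Weibull baseline G(x) = 1 - exp(-(lam x)^c); the parameter names `a`, `b`, `c`, and `lam` are illustrative choices and the paper's own parameterization may differ.

```python
import math

def weibull_cdf(x, c, lam):
    """Weibull baseline CDF G(x) = 1 - exp(-(lam * x)**c) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((lam * x) ** c))

def kw_weibull_cdf(x, a, b, c, lam):
    """Assumed Kumaraswamy Weibull CDF: F(x) = 1 - (1 - G(x)**a)**b."""
    g = weibull_cdf(x, c, lam)
    return 1.0 - (1.0 - g ** a) ** b

# Special cases under this assumed form:
# a = b = 1 gives the Weibull CDF, and b = 1 gives G(x)**a (exponentiated Weibull).
x = 1.5
as_weibull = kw_weibull_cdf(x, a=1.0, b=1.0, c=2.0, lam=0.7)
as_exp_weibull = kw_weibull_cdf(x, a=3.0, b=1.0, c=2.0, lam=0.7)
```

With this form, the sub-models listed in the abstract correspond to fixing some of the shape parameters, for example c = 1 for an exponential baseline.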
Abstract:
Precipitation and temperature climate indices are calculated using the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis and validated against observational data from some stations over Brazil and other data sources. The spatial patterns of the climate indices trends are analyzed for the period 1961-1990 over South America. In addition, the correlation and linear regression coefficients for some specific stations were also obtained in order to compare with the reanalysis data. In general, the results suggest that NCEP/NCAR reanalysis can provide useful information about minimum temperature and consecutive dry days indices at individual grid cells in Brazil. However, some regional differences in the climate indices trends are observed when different data sets are compared. For instance, the NCEP/NCAR reanalysis shows a reversal signal for all rainfall annual indices and the cold night index over Argentina. Despite these differences, maps of the trends for most of the annual climate indices obtained from the NCEP/NCAR reanalysis and BRANT analysis are generally in good agreement with other available data sources and previous findings in the literature for large areas of southern South America. The pattern of trends for the precipitation annual indices over the 30 years analyzed indicates a change to wetter conditions over southern and southeastern parts of Brazil, Paraguay, Uruguay, central and northern Argentina, and parts of Chile and a decrease over southwestern South America. All over South America, the climate indices related to the minimum temperature (warm or cold nights) have clearly shown a warming tendency; however, no consistent changes in maximum temperature extremes (warm and cold days) have been observed. Therefore, one must be careful before suggesting any trends for warm or cold days.
Abstract:
Astronomy has evolved almost exclusively by the use of spectroscopic and imaging techniques, operated separately. With the development of modern technologies, it is possible to obtain data cubes in which one combines both techniques simultaneously, producing images with spectral resolution. Extracting information from them can be quite complex, and hence the development of new methods of data analysis is desirable. We present a method of analysis of data cubes (data from single field observations, containing two spatial and one spectral dimension) that uses Principal Component Analysis (PCA) to express the data in the form of reduced dimensionality, facilitating efficient information extraction from very large data sets. PCA transforms the system of correlated coordinates into a system of uncorrelated coordinates ordered by principal components of decreasing variance. The new coordinates are referred to as eigenvectors, and the projections of the data onto these coordinates produce images we will call tomograms. The association of the tomograms (images) to eigenvectors (spectra) is important for the interpretation of both. The eigenvectors are mutually orthogonal, and this information is fundamental for their handling and interpretation. When the data cube shows objects that present uncorrelated physical phenomena, the eigenvectors' orthogonality may be instrumental in separating and identifying them. By handling eigenvectors and tomograms, one can enhance features, extract noise, compress data, extract spectra, etc. We applied the method, for illustration purposes only, to the central region of the low ionization nuclear emission region (LINER) galaxy NGC 4736, and demonstrate that it has a type 1 active nucleus, not known before. Furthermore, we show that it is displaced from the centre of its stellar bulge.
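The procedure described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: it assumes the cube is stored as an array of shape (ny, nx, n_lambda), treats each spatial pixel as one observation, and uses a random cube as stand-in data. The function name `pca_tomography` is hypothetical.

```python
import numpy as np

def pca_tomography(cube):
    """PCA of a data cube with two spatial axes and one spectral axis.

    Returns the eigenvalues (decreasing variance), the eigenvectors as
    columns (one spectrum per component), and the tomograms (one image per
    component) obtained by projecting the data onto the eigenvectors.
    """
    ny, nx, n_lambda = cube.shape
    data = cube.reshape(ny * nx, n_lambda).astype(float)  # one row per spatial pixel (copy)
    data -= data.mean(axis=0)                             # subtract the mean spectrum
    cov = data.T @ data / (data.shape[0] - 1)             # spectral covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                     # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    tomograms = (data @ eigvecs).reshape(ny, nx, n_lambda)
    return eigvals, eigvecs, tomograms

# Stand-in data: a small random cube (values have no physical meaning).
rng = np.random.default_rng(1)
cube = rng.normal(size=(8, 9, 5))
eigvals, eigvecs, tomograms = pca_tomography(cube)
```

In this sketch, `tomograms[:, :, k]` is the image associated with the spectrum `eigvecs[:, k]`, which mirrors the pairing of tomograms and eigenvectors that the abstract emphasizes.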
Abstract:
Phylogenetic analyses of representative species from the five genera of Winteraceae (Drimys, Pseudowintera, Takhtajania, Tasmannia, and Zygogynum s.l.) were performed using ITS nuclear sequences and a combined data-set of ITS + psbA-trnH + rpS16 sequences (sampling of 30 and 15 species, respectively). Indel informativity using simple gap coding or gaps as a fifth character was examined in both data-sets. Parsimony and Bayesian analyses support the monophyly of Drimys, Tasmannia, and Zygogynum s.l., but do not support the monophyly of Belliolum, Zygogynum s.s., and Bubbia. Within Drimys, the combined data-set recovers two subclades. Divergence time estimates suggest that the splitting between Drimys and its sister clade (Pseudowintera + Zygogynum s.l.) occurred around the end of the Cretaceous; in contrast, the divergence between the two subclades within Drimys is more recent (15.5-18.5 MY) and coincides in time with the Andean uplift. Estimates suggest that the earliest divergences within Winteraceae could have predated the first events of Gondwana fragmentation. (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters, without the need for expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with Multi-Objective Clustering with automatic K-determination (MOCK), the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
INTRODUCTION: The prevalence of arterial hypertension has been increasing in the country, making it a public health problem because of its magnitude and the difficulty of controlling it. OBJECTIVE: To assess the quality of data on hypertension as a cause of death and to verify the gain in information on hypertension mortality among women aged 10 to 49 years by means of the multiple-causes-of-death analysis methodology. MATERIAL AND METHODS: A database was built with 7,332 deaths that occurred in the first half of 2002, belonging to the "Estudo da Morbi-Mortalidade de Mulheres de 10 a 49 anos" (Study of Morbidity and Mortality of Women Aged 10 to 49 Years). The RAMOS (Reproductive Age Mortality Survey) methodology was applied in all Brazilian state capitals and the Federal District. With the additional information, a new death certificate (DO-NOVA) was filled out. Two data sets were analyzed: DO-ORIGINAL (before the investigation) and DO-NOVA (after the information was retrieved). Comparisons were made by underlying and multiple causes according to data source (DO-O, DO-N). RESULTS AND CONCLUSION: The DO-ORIGINAL showed some quantitative and qualitative flaws. It was concluded that multiple-cause analysis enriches the information obtained from death certificates (DO). Continuous action is needed to improve the completion of death certificates by physicians, and more studies should adopt the multiple-causes methodology.
Abstract:
Large-conductance Ca²⁺-activated K⁺ channels (BK) play a fundamental role in modulating membrane potential in many cell types. The gating of BK channels and its modulation by Ca²⁺ and voltage has been the subject of intensive research over almost three decades, yielding several of the most complicated kinetic mechanisms ever proposed. A large number of open and closed states disposed, respectively, in two planes, named tiers, characterize these mechanisms. Transitions between states in the same plane are cooperative and modulated by Ca²⁺. Transitions across planes are highly concerted and voltage-dependent. Here we reexamine the validity of the two-tiered hypothesis by restricting attention to the modulation by Ca²⁺. Large single channel data sets at five Ca²⁺ concentrations were simultaneously analyzed from a Bayesian perspective by using hidden Markov models and Markov-chain Monte Carlo stochastic integration techniques. Our results support a dramatic reduction in model complexity, favoring a simple mechanism derived from the Monod-Wyman-Changeux allosteric model for homotetramers, able to explain the Ca²⁺ modulation of the gating process. This model differs from the standard Monod-Wyman-Changeux scheme in that one distinguishes when two Ca²⁺ ions are bound to adjacent or diagonal subunits of the tetramer.
Abstract:
BACKGROUND: The findings of prior studies of air pollution effects on adverse birth outcomes are difficult to synthesize because of differences in study design. OBJECTIVES: The International Collaboration on Air Pollution and Pregnancy Outcomes was formed to understand how differences in research methods contribute to variations in findings. We initiated a feasibility study to a) assess the ability of geographically diverse research groups to analyze their data sets using a common protocol and b) perform location-specific analyses of air pollution effects on birth weight using a standardized statistical approach. METHODS: Fourteen research groups from nine countries participated. We developed a protocol to estimate odds ratios (ORs) for the association between particulate matter ≤ 10 µm in aerodynamic diameter (PM10) and low birth weight (LBW) among term births, adjusted first for socioeconomic status (SES) and second for additional location-specific variables. RESULTS: Among locations with data for the PM10 analysis, ORs estimating the relative risk of term LBW associated with a 10-µg/m³ increase in average PM10 concentration during pregnancy, adjusted for SES, ranged from 0.63 [95% confidence interval (CI), 0.30-1.35] for the Netherlands to 1.15 (95% CI, 0.61-2.18) for Vancouver, with six research groups reporting statistically significant adverse associations. We found evidence of statistically significant heterogeneity in estimated effects among locations. CONCLUSIONS: Variability in PM10-LBW relationships among study locations remained despite use of a common statistical approach. A more detailed meta-analysis and use of more complex protocols for future analysis may uncover reasons for heterogeneity across locations. However, our findings confirm the potential for a diverse group of researchers to analyze their data in a standardized way to improve understanding of air pollution effects on birth outcomes.
Abstract:
Background: The rapid progress currently being made in genomic science has created interest in potential clinical applications; however, formal translational research has been limited thus far. Studies of population genetics have demonstrated substantial variation in allele frequencies and haplotype structure at loci of medical relevance, and the genetic background of patient cohorts may often be complex. Methods and Findings: To describe the heterogeneity in an unselected clinical sample, we used the Affymetrix 6.0 gene array chip to genotype self-identified European Americans (N = 326), African Americans (N = 324) and Hispanics (N = 327) from the medical practice of Mount Sinai Medical Center in Manhattan, NY. Additional data from US minority groups and Brazil were used for external comparison. Substantial variation in ancestral origin was observed for both African Americans and Hispanics; data from the latter group overlapped with both Mexican Americans and Brazilians in the external data sets. A pooled analysis of the African Americans and Hispanics from NY demonstrated a broad continuum of ancestral origin, making classification by race/ethnicity uninformative. Selected loci harboring variants associated with medical traits and drug response confirmed substantial within- and between-group heterogeneity. Conclusion: As a consequence of these complementary levels of heterogeneity, group labels offered no guidance at the individual level. These findings demonstrate the complexity involved in clinical translation of the results from genome-wide association studies and suggest that in the genomic era conventional racial/ethnic labels are of little value.
Abstract:
Background: Cryptic species complexes are common among anophelines. Previous phylogenetic analysis based on the complete mtDNA COI gene sequences detected paraphyly in the Neotropical malaria vector Anopheles marajoara. The "Folmer region" detects a single taxon using a 3% divergence threshold. Methods: To test the paraphyletic hypothesis and examine the utility of the Folmer region, genealogical trees based on a concatenated (white + 3' COI sequences) dataset and pairwise differentiation of COI fragments were examined. The population structure and demographic history were based on partial COI sequences for 294 individuals from 14 localities in Amazonian Brazil. In addition, 109 individuals from 12 localities were sequenced for the nDNA white gene, and 57 individuals from 11 localities were sequenced for the ribosomal DNA (rDNA) internal transcribed spacer 2 (ITS2). Results: Distinct A. marajoara lineages were detected by combined genealogical analysis and were also supported among COI haplotypes using a median joining network and AMOVA, with time since divergence during the Pleistocene (< 100,000 ya). COI sequences at the 3' end were more variable, demonstrating significant pairwise differentiation (3.82%) compared to the more moderate 2.92% detected by the Folmer region. Lineage 1 was present in all localities, whereas lineage 2 was restricted mainly to the west. Mismatch distributions for both lineages were bimodal, likely due to multiple colonization events and spatial expansion (~798 - 81,045 ya). There appears to be gene flow within, not between lineages, and a partial barrier was detected near Rio Jari in Amapá state, separating western and eastern populations. In contrast, both nDNA data sets (white gene sequences with or without the retention of the 4th intron, and ITS2 sequences and length) detected a single A. marajoara lineage. Conclusions: Strong support for combined data with significant differentiation detected in the COI and absent in the nDNA suggests that the divergence is recent, and detectable only by the faster evolving mtDNA. A within subgenus threshold of >2% may be more appropriate among sister taxa in cryptic anopheline complexes than the standard 3%. Differences in demographic history and climatic changes may have contributed to mtDNA lineage divergence in A. marajoara.
Abstract:
Aims. We report the discovery of CoRoT-8b, a dense small Saturn-class exoplanet that orbits a K1 dwarf in 6.2 days, and we derive its orbital parameters, mass, and radius. Methods. We analyzed two complementary data sets: the photometric transit curve of CoRoT-8b as measured by CoRoT and the radial velocity curve of CoRoT-8 as measured by the HARPS spectrometer. Results. We find that CoRoT-8b is on a circular orbit with a semi-major axis of 0.063 ± 0.001 AU. It has a radius of 0.57 ± 0.02 R_J, a mass of 0.22 ± 0.03 M_J, and therefore a mean density of 1.6 ± 0.1 g cm⁻³. Conclusions. With 67% of the size of Saturn and 72% of its mass, CoRoT-8b has a density comparable to that of Neptune (1.76 g cm⁻³). We estimate its content in heavy elements to be 47-63 M⊕, and the mass of its hydrogen-helium envelope to be 7-23 M⊕. At 0.063 AU, the thermal loss of hydrogen of CoRoT-8b should be no more than ~0.1% over an assumed integrated lifetime of 3 Ga.
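The step from radius and mass to mean density in the abstract above is the sphere relation, density = M / ((4/3) π R³). The sketch below works it through under stated assumptions: the Jupiter constants are approximate reference values chosen here, not taken from the abstract, and the radius is the volumetric mean radius. The function name `mean_density_g_cm3` is hypothetical.

```python
import math

# Approximate reference values, assumed for this sketch (not from the abstract).
M_JUP_KG = 1.898e27   # mass of Jupiter in kg
R_JUP_M = 6.9911e7    # volumetric mean radius of Jupiter in m

def mean_density_g_cm3(mass_mj, radius_rj):
    """Mean density of a sphere given mass and radius in Jupiter units."""
    mass_kg = mass_mj * M_JUP_KG
    radius_m = radius_rj * R_JUP_M
    volume_m3 = 4.0 / 3.0 * math.pi * radius_m ** 3
    return mass_kg / volume_m3 / 1000.0   # convert kg/m^3 to g/cm^3

# Central values quoted in the abstract: 0.22 Jupiter masses, 0.57 Jupiter radii.
rho = mean_density_g_cm3(0.22, 0.57)
```

With these assumed constants the result lands close to the quoted 1.6 g cm⁻³. Using Jupiter's larger equatorial radius instead would lower the value, so the radius convention matters when comparing against the published figure.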