909 results for exploratory data analysis
Abstract:
In this paper, a set of representative samples of Brazilian commercial gasoline from São Paulo State, selected by hierarchical cluster analysis (HCA), plus six samples obtained directly from refineries, were analysed by a high-sensitivity gas chromatographic (GC) method, ASTM D6733. The levels of saturated hydrocarbons and anhydrous ethanol determined by GC were correlated, through exploratory analysis (HCA and principal component analysis, PCA), with quality as assessed against the specifications of the Brazilian Government Petroleum, Natural Gas and Biofuels Agency (ANP). This correlation showed that the GC method, together with HCA and PCA, could be employed as a screening technique to determine whether Brazilian gasoline complies with the prescribed legal standards.
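The abstract says the representative samples were selected by HCA but does not give the distance, linkage or selection rule. The following is a minimal sketch of agglomerative HCA under stated assumptions: the variables are z-scored, the distance is Euclidean, clusters are merged by average linkage, and the member closest to each cluster centroid is taken as that cluster's representative. All of these choices, the helper names, and the GC values are hypothetical.

```python
import math
from itertools import combinations


def standardize(rows):
    """Z-score each variable (column) so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(samples, n_clusters):
    """Agglomerative HCA: repeatedly merge the two clusters with the smallest
    mean pairwise distance until n_clusters remain. Returns lists of row indices."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(euclidean(samples[i], samples[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b leaves index a valid
        del clusters[b]
    return clusters


def representatives(samples, clusters):
    """Pick, for each cluster, the member closest to the cluster centroid."""
    reps = []
    for members in clusters:
        centroid = [sum(samples[i][k] for i in members) / len(members)
                    for k in range(len(samples[0]))]
        reps.append(min(members, key=lambda i: euclidean(samples[i], centroid)))
    return reps


# Hypothetical GC results per sample: [saturated hydrocarbons %, anhydrous ethanol %]
gc = [[45.0, 25.0], [46.0, 25.5], [44.5, 24.8], [60.0, 15.0], [61.0, 14.0]]
z = standardize(gc)
clusters = average_linkage(z, n_clusters=2)
print(clusters, representatives(z, clusters))
```

The pairwise search is O(n³) overall and only suits small sets. For real work, a library routine such as SciPy's hierarchical clustering would be the usual choice.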
Abstract:
Understanding spatial distributions, and how environmental conditions influence catch per unit effort (CPUE), is important for improving fishing efficiency and for sustainable fisheries management. This study investigated the relationship between CPUE, spatial factors, temperature, and depth using generalized additive models (GAMs). The best model frequently included a combination of factors rather than a single factor, and the parameters that best described CPUE varied by geographic region. The proportion of deviance explained by the best models ranged from a low of 29% (halibut, Charlotte region) to a high of 94% (sablefish, Charlotte region). Depth, latitude, and longitude influenced most species in several regions. On the broad geographic scale, depth was associated with CPUE for every species except dogfish. Latitude and longitude influenced most species, except halibut (Areas 4A/D), sablefish, and cod. Temperature was important for describing the distributions of halibut in Alaska, arrowtooth flounder in British Columbia, dogfish, Alaska skate, and Aleutian skate. The species-habitat relationships revealed in this study can be used to develop improved fishing and management strategies.
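The 29% to 94% figures are "deviance explained", defined as 1 minus the ratio of the fitted model's deviance to the deviance of a null (mean-only) model. The minimal sketch below computes CPUE and that ratio under the assumption of a Gaussian family, where deviance reduces to the residual sum of squares. The abstract does not state the family the authors used, and the fitted values are hypothetical stand-ins for GAM predictions.

```python
def cpue(catch, effort):
    """Catch per unit effort for each set (e.g. kg per hook or per tow-hour)."""
    return [c / e for c, e in zip(catch, effort)]


def gaussian_deviance(y, mu):
    """Deviance of a Gaussian model: the residual sum of squares."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))


def deviance_explained(y, fitted):
    """1 - residual deviance / null deviance, where the null model predicts the mean."""
    mean = sum(y) / len(y)
    null_dev = gaussian_deviance(y, [mean] * len(y))
    return 1.0 - gaussian_deviance(y, fitted) / null_dev


# Hypothetical catches and efforts for four sets
y = cpue(catch=[10.0, 20.0, 30.0, 40.0], effort=[10.0, 10.0, 10.0, 10.0])
print(y, deviance_explained(y, [1.5, 1.5, 3.5, 3.5]))
```

For non-Gaussian families such as Poisson or Tweedie, only `gaussian_deviance` would change. The ratio itself is defined the same way.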
Abstract:
This paper surveys neural network approaches to feature extraction, and compares and contrasts their behaviour as prospective data visualisation tools on a real-world problem. We also introduce and discuss a hybrid approach that allows us to control the degree of discriminatory and topographic information in the extracted feature space.
Abstract:
Exploratory data analysis seeks common patterns in order to gain insight into the structure and distribution of the data. In geochemistry, it is a valuable means of gaining insight into the complex processes that make up a petroleum system. Linear visualisation methods such as principal components analysis, linked plots, or brushing are typically used. These methods cannot be applied directly to data with missing values, and they struggle to capture global non-linear structure in the data, although they can capture it locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) makes it possible to visualise the effects of very many variables on a single plot, which can incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows the non-linear structure in the data to be explored. In this thesis, a novel approach to initialising the GTM with arbitrary projections is developed. This makes it possible to combine the GTM with algorithms such as Isomap and to fit complex non-linear structure such as the Swiss roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation of missing data. Additionally, an extensive benchmark study of the missing-data imputation capabilities of the GTM is performed. Further, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the algorithms are evaluated on real-life datasets from geochemical projects.
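To show why the GTM copes with missing data, here is a minimal sketch of one step of the standard GTM. It computes the posterior responsibilities of the latent nodes for a single data point, the point's position on the plot (posterior mean in latent space), and the conditional-mean imputation of its missing variables. The sketch assumes the plain GTM with isotropic noise of precision β, and already-fitted node centres y_k = W φ(x_k). The EM fitting, the thesis's initialisation scheme and its structured-covariance extension are not shown. Under isotropic noise, a missing variable is marginalised simply by leaving it out of the squared distance. The node positions and centres are hypothetical.

```python
import math


def responsibilities(t, centres, beta):
    """Posterior probability of each GTM node given data point t.
    t may contain None for missing variables; with isotropic Gaussian noise,
    marginalising a missing variable amounts to dropping it from the distance."""
    observed = [d for d, v in enumerate(t) if v is not None]
    log_p = [-0.5 * beta * sum((t[d] - y[d]) ** 2 for d in observed) for y in centres]
    m = max(log_p)                      # subtract the max for numerical stability
    w = [math.exp(lp - m) for lp in log_p]
    s = sum(w)
    return [wi / s for wi in w]


def posterior_mean(r, latent):
    """Position of the data point on the GTM plot: sum_k r_k * x_k."""
    return [sum(rk * x[j] for rk, x in zip(r, latent)) for j in range(len(latent[0]))]


def impute(t, r, centres):
    """Fill each missing variable d with its conditional mean sum_k r_k * y_kd."""
    return [v if v is not None else sum(rk * y[d] for rk, y in zip(r, centres))
            for d, v in enumerate(t)]


# Hypothetical fitted model: two latent nodes in 1-D, mapped into 3-D data space.
latent = [[-1.0], [1.0]]
centres = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
t = [1.0, None, 1.0]                    # second variable missing
r = responsibilities(t, centres, beta=1.0)
print(r, posterior_mean(r, latent), impute(t, r, centres))
# → [0.5, 0.5] [0.0] [1.0, 1.0, 1.0]
```

With a non-isotropic covariance, as in the thesis's extension, dropping the missing coordinates would no longer be sufficient. The conditional distribution of the missing part given the observed part would then have to be computed per node.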
Abstract:
A recent approach to the visualisation and analysis of datasets, particularly applicable to high-dimensional data, is discussed in the context of real applications. A feed-forward neural network is used to effect a topographic, structure-preserving, dimension-reducing transformation of the data, with an additional facility to incorporate different degrees of associated subjective information. The properties of this transformation are illustrated on synthetic and real datasets, including the 1992 UK Research Assessment Exercise for funding in higher education. The method is compared and contrasted with established techniques for feature extraction, and related to topographic mappings, the Sammon projection, and the statistical field of multidimensional scaling.
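Topographic mappings related to the Sammon projection are usually trained to minimise a stress measure. This measure compares target inter-point distances with the distances between the projected points. The minimal sketch below computes Sammon stress for a given 2-D configuration. It assumes that "degrees of subjective information" are incorporated by blending data-space distances with subjective dissimilarities through a weight α. That blending rule, the function names and the dissimilarity values are illustrative assumptions, and the neural network and its training loop are omitted.

```python
import math
from itertools import combinations


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def blended_targets(X, subjective, alpha):
    """Target distance for each pair i < j:
    (1 - alpha) * distance in data space + alpha * subjective dissimilarity."""
    return {(i, j): (1 - alpha) * dist(X[i], X[j]) + alpha * subjective[i][j]
            for i, j in combinations(range(len(X)), 2)}


def sammon_stress(targets, Y):
    """Sammon stress of configuration Y against the target distances:
    sum((d* - d)^2 / d*) / sum(d*). Assumes all targets are positive."""
    total = sum(targets.values())
    return sum((d - dist(Y[i], Y[j])) ** 2 / d for (i, j), d in targets.items()) / total


X = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # hypothetical data
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]                  # candidate 2-D projection
S = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]                     # hypothetical subjective dissimilarities
print(sammon_stress(blended_targets(X, S, 0.0), Y), sammon_stress(blended_targets(X, S, 1.0), Y))
```

With α = 0 the projection reproduces the data distances exactly, so the stress is zero. As α grows, the same configuration is penalised for ignoring the subjective dissimilarities.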
Abstract:
Dissertation presented as a partial requirement for obtaining the degree of Master in Geographic Information Science and Systems.
Abstract:
Little research so far has been devoted to understanding the diffusion of grassroots innovation for sustainability across space. This paper explores and compares the spatial diffusion of two networks of grassroots innovations, the Transition Towns Network (TTN) and Gruppi di Acquisto Solidale (Solidarity Purchasing Groups – GAS), in Great Britain and Italy. Spatio-temporal diffusion data were mined from available datasets, and patterns of diffusion were uncovered through an exploratory data analysis. The analysis shows that GAS and TTN diffusion in Italy and Great Britain is spatially structured, and that the spatial structure has changed over time. TTN has diffused differently in Great Britain and Italy, while GAS and TTN have diffused similarly in central Italy. The uneven diffusion of these grassroots networks on the one hand challenges current narratives on the momentum of grassroots innovations, but on the other highlights important issues in the geography of grassroots innovations for sustainability, such as cross-movement transfers and collaborations, institutional thickness, and interplay of different proximities in grassroots innovation diffusion.
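The abstract reports that diffusion is "spatially structured" but does not name the statistic behind that finding. Global Moran's I is one standard exploratory check for spatial autocorrelation in regional counts. The minimal sketch below uses binary contiguity weights (w_ij = 1 for neighbouring regions). The choice of statistic and weights, the region layout and the counts are all illustrative assumptions, not the paper's method.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary contiguity weights.
    values: one number per region (e.g. initiatives founded per province);
    neighbours: dict mapping region index -> list of neighbouring region indices."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_sum = sum(len(js) for js in neighbours.values())
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    return (n / w_sum) * cross / sum(zi * zi for zi in z)


# Hypothetical: four regions in a row, 0-1-2-3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([5, 5, 0, 0], chain))   # neighbours alike: positive I
print(morans_i([5, 0, 5, 0], chain))   # neighbours differ: negative I
```

Under no spatial autocorrelation, the expected value is -1/(n - 1), here -1/3. Values well above that indicate clustering. Repeating the calculation per year would show how the spatial structure changes over time.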
Abstract:
A combination of deductive reasoning, clustering, and inductive learning is given as an example of a hybrid system for exploratory data analysis. Visualization is replaced by a dialogue with the data.
Abstract:
Regional planners, policy makers and policing agencies all recognize the importance of better understanding the dynamics of crime. Theoretical and application-oriented approaches that provide insights into why and where crimes take place are much sought after. Geographic information systems and spatial analysis techniques, in particular, are proving essential for studying criminal activity. However, the capabilities of these quantitative methods continue to evolve. This paper explores the use of geographic information systems and spatial analysis approaches for examining crime occurrence in Brisbane, Australia. The analysis highlights novel capabilities for the analysis of crime in urban regions.
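The abstract does not say which spatial techniques were applied to the Brisbane data. A common first test of whether point events such as crime locations cluster is the Clark-Evans nearest neighbour index. This statistic is not confirmed by the source. The minimal sketch below ignores edge effects, and the coordinates and study area are hypothetical.

```python
import math


def nearest_neighbour_index(points, area):
    """Clark-Evans ratio R = observed mean nearest-neighbour distance divided by
    the distance expected under complete spatial randomness, 0.5 / sqrt(n / area).
    R < 1 suggests clustering, R near 1 randomness, R > 1 dispersion.
    Edge effects are ignored."""
    n = len(points)
    observed = sum(
        min(math.dist(points[i], points[j]) for j in range(n) if j != i)
        for i in range(n)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


spread = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]        # hypothetical incidents
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
print(nearest_neighbour_index(spread, area=4.0), nearest_neighbour_index(tight, area=100.0))
```

`math.dist` requires Python 3.8 or later. Without an edge correction, R is biased upward for small study areas, so the index is best treated as an exploratory indicator rather than a formal test.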
Abstract:
Purpose/Objective(s): Radiotherapy (RT) with temozolomide (TMZ) is the standard for glioblastoma (GBM). Dose-dense (dd) TMZ causes prolonged MGMT depletion in mononuclear cells and possibly in tumor. The RTOG 0525 trial (ASCO 2011) did not show an advantage from dd TMZ for overall survival (OS) or progression-free survival (PFS). We conducted exploratory, hypothesis-generating subset analyses to detect possible benefit from dd TMZ.
Materials/Methods: Patients were randomized to standard (std; 150-200 mg/m² x 5 d) or dd TMZ (75-100 mg/m² x 21 d) every 4 weeks for 6-12 cycles. Eligibility included age > 18, KPS ≥ 60, and > 1 cm² of tissue for prospective MGMT analysis for stratification. Further analyses were performed for all randomized patients (intent-to-treat, ITT) and for all patients starting protocol therapy (SPT). Subset analyses were performed by RPA class (III, IV, V), KPS (90-100, = 50,\50), resection (partial, total), gender (female, male), and neurologic dysfunction (nf = none, minor, moderate).
Results: No significant difference was seen in median OS (16.6 vs. 14.9 months) or PFS (5.5 vs. 6.7 months, p = 0.06). MGMT methylation was linked to improved OS (21.2 vs. 14 months, p < 0.0001) and PFS (8.7 vs. 5.7 months, p < 0.0001). For the ITT population (n = 833), there was no OS benefit from dd TMZ in any subset. Two subsets showed a PFS benefit for dd TMZ: RPA class III (6.2 vs. 12.6 months, HR 0.69, p = 0.03) and nf = minor (HR 0.77, p = 0.01). For RPA class III, dd TMZ dramatically delayed progression, but after progression dd patients died more quickly than std patients. A similar pattern was observed for nf = minor. For the SPT group (n = 714), there was neither a PFS nor an OS benefit for dd TMZ overall. For RPA class III and nf = minor, there was a PFS benefit for dd TMZ (HR 0.73, p = 0.08; HR 0.77, p = 0.02). For the nf = moderate subset, in both ITT and SPT, the std arm showed superior OS (14.4 vs. 10.9 months) compared with dd, without improved PFS (HR 1.46, p = 0.03; and HR 1.74, p = 0.01). Regarding methylation status within this subset, there were more methylated patients in the std arm of the ITT subset (n = 159; 32 vs. 24%). For the SPT subset (n = 124), methylation status was similar between arms.
Conclusions: This study did not demonstrate improved OS for dd TMZ in any subgroup, but for 2 highly functional subgroups PFS was significantly increased. These data generate the testable hypothesis that intensive treatment may selectively improve disease control in those most likely to tolerate dd therapy. These findings should be interpreted cautiously because of the small sample sizes, multiple observations, and other confounders.
Acknowledgment: This project was supported by RTOG grant U10 CA21661 and CCOP grant U10 CA37422 from the National Cancer Institute (NCI).
Abstract:
In an earlier investigation (Burger et al., 2000), five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied by applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) to extract endmembers and evaluate the spatial and temporal variation of geochemical signals. The marine geologists expected three main factors of sedimentation: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used, with the cosine-theta coefficient as the similarity measure.
During the last decade, considerable progress was made in compositional data analysis, and many case studies were published using new tools for the exploratory analysis of these data. It therefore makes sense to check whether applying suitable data transformations, reducing the D-part simplex to two or three factors, and visually interpreting the factor scores would lead to a revision of earlier results and to answers to open questions. In this paper we follow the lines of a paper by R. Tolosana-Delgado et al. (2005), starting with a problem-oriented interpretation of the biplot scattergram, then extracting compositional factors, ilr-transforming the components, and visualizing the factor scores in a spatial context: the compositional factors are plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process.
Key words: compositional data analysis, biplot, deep sea sediments
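The ilr (isometric log-ratio) transformation maps a D-part composition to D - 1 unconstrained real coordinates. Euclidean methods such as PCA can then be applied to those coordinates. The minimal sketch below uses pivot coordinates, one common choice of orthonormal basis. The paper may use a different basis, for example one built from a sequential binary partition; any such basis gives a rotated but distance-equivalent set of coordinates. The example composition is hypothetical.

```python
import math


def closure(x, total=1.0):
    """Rescale the parts so that they sum to `total` (e.g. 1 or 100%)."""
    s = sum(x)
    return [total * v / s for v in x]


def ilr_pivot(x):
    """Isometric log-ratio coordinates using the pivot basis:
    z_i = sqrt((D - i) / (D - i + 1)) * ln(x_i / g(x_{i+1}, ..., x_D)), i = 1..D-1,
    where g is the geometric mean. Requires strictly positive parts."""
    if any(v <= 0 for v in x):
        raise ValueError("ilr requires strictly positive parts; replace zeros first")
    D = len(x)
    z = []
    for i in range(1, D):                  # 1-based index of the pivot part
        rest = x[i:]                       # parts x_{i+1}, ..., x_D
        log_g = sum(math.log(v) for v in rest) / len(rest)
        z.append(math.sqrt((D - i) / (D - i + 1)) * (math.log(x[i - 1]) - log_g))
    return z


sample = closure([2.0, 4.0, 8.0])          # hypothetical three-part composition
print(ilr_pivot(sample))
```

The coordinates do not depend on the closure constant, so raw concentrations and percentages give the same result. Squared Euclidean distances between ilr coordinates equal Aitchison distances between the compositions.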
Abstract:
Compositional data naturally arises from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis, to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open source package that makes available very powerful statistical facilities at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated; results from different analyses will be compared; and the utility of the various methods will be discussed.
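Log-ratio analysis cannot handle zero parts, which is why zero replacement appears alongside it. The minimal sketch below shows one well-known replacement scheme, multiplicative replacement, followed by the centred log-ratio (clr) transform. The sketch is in Python rather than R, and it is not the library described in the abstract. The replacement value `delta` is a hypothetical choice; in practice it is often set relative to the detection limit. The artefact composition is invented.

```python
import math


def multiplicative_replacement(x, delta, total=1.0):
    """Multiplicative zero replacement for a composition x that sums to `total`:
    zeros become delta, and non-zero parts are shrunk by a common factor so the
    composition still sums to `total` and ratios among non-zero parts are kept."""
    n_zero = sum(1 for v in x if v == 0)
    factor = 1.0 - n_zero * delta / total
    return [delta if v == 0 else v * factor for v in x]


def clr(x):
    """Centred log-ratio transform: ln(x_j) minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [lv - mean for lv in logs]


# Hypothetical oxide proportions for one artefact, with one part below detection.
raw = [0.70, 0.30, 0.0]
fixed = multiplicative_replacement(raw, delta=0.005)
print(fixed, clr(fixed))
```

The multiplicative scheme keeps the ratios among the non-zero parts, so log-ratios among those parts are not distorted. Additive replacement, by contrast, changes them.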
Abstract:
The objective of this research was to use Exploratory Factor Analysis (EFA) to assess the adequacy of an instrument for evaluating fish consumption and the characteristics involved in this process. Data were collected during a campaign to encourage fish consumption in Brazil, with the voluntary participation of members of a university community. An assessment instrument consisting of multiple-choice questions and a five-point Likert scale was designed and used to measure the importance of certain attributes that influence the choice and consumption of fish. The study sample comprised 224 individuals, the majority of whom were women (65.6%). Regarding the frequency of fish consumption, 37.67% of the volunteers interviewed said they consume the product two or three times a month, and 29.6% once a week. EFA was used to group the variables, with factors extracted by principal components and rotated using the Quartimax method. The variables grouped into two main constructs, quality and consumption, with Cronbach's alpha coefficients of 0.75 and 0.69, respectively, indicating good internal consistency.
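Cronbach's alpha measures how consistently a set of items within a construct is answered. It is computed as k / (k - 1) × (1 - sum of item variances / variance of the respondents' total scores), where k is the number of items. The minimal sketch below implements that formula. The Likert answers are hypothetical and much smaller than the 224-respondent sample.

```python
def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, each aligned by respondent.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(sample_variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / sample_variance(totals))


# Hypothetical five-point Likert answers: two items, four respondents.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))   # approximately 0.75
```

Using sample (n - 1) or population (n) variances gives the same alpha, because the divisor cancels in the ratio.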