923 resultados para Exploratory Multivariate Data Analysis techniques
Resumo:
In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used and the cosine-theta coefficient as similarity measure. During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. Therefore it makes sense to check if the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions . In this paper we follow the lines of a paper of R. Tolosana- Delgado et al. (2005) starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transformation of the components and visualization of the factor scores in a spatial context: The compositional factors will be plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process. Kew words: compositional data analysis, biplot, deep sea sediments
Recent developments in genetic data analysis: what can they tell us about human demographic history?
Resumo:
Over the last decade, a number of new methods of population genetic analysis based on likelihood have been introduced. This review describes and explains the general statistical techniques that have recently been used, and discusses the underlying population genetic models. Experimental papers that use these methods to infer human demographic and phylogeographic history are reviewed. It appears that the use of likelihood has hitherto had little impact in the field of human population genetics, which is still primarily driven by more traditional approaches. However, with the current uncertainty about the effects of natural selection, population structure and ascertainment of single-nucleotide polymorphism markers, it is suggested that likelihood-based methods may have a greater impact in the future.
Resumo:
Baking and 2-g mixograph analyses were performed for 55 cultivars (19 spring and 36 winter wheat) from various quality classes from the 2002 harvest in Poland. An instrumented 2-g direct-drive mixograph was used to study the mixing characteristics of the wheat cultivars. A number of parameters were extracted automatically from each mixograph trace and correlated with baking volume and flour quality parameters (protein content and high molecular weight glutenin subunit [HMW-GS] composition by SDS-PAGE) using multiple linear regression statistical analysis. Principal component analysis of the mixograph data discriminated between four flour quality classes, and predictions of baking volume were obtained using several selected mixograph parameters, chosen using a best subsets regression routine, giving R-2 values of 0.862-0.866. In particular, three new spring wheat strains (CHD 502a-c) recently registered in Poland were highly discriminated and predicted to give high baking volume on the basis of two mixograph parameters: peak bandwidth and 10-min bandwidth.
Resumo:
The Earth-directed coronal mass ejection (CME) of 8 April 2010 provided an opportunity for space weather predictions from both established and developmental techniques to be made from near–real time data received from the SOHO and STEREO spacecraft; the STEREO spacecraft provide a unique view of Earth-directed events from outside the Sun-Earth line. Although the near–real time data transmitted by the STEREO Space Weather Beacon are significantly poorer in quality than the subsequently downlinked science data, the use of these data has the advantage that near–real time analysis is possible, allowing actual forecasts to be made. The fact that such forecasts cannot be biased by any prior knowledge of the actual arrival time at Earth provides an opportunity for an unbiased comparison between several established and developmental forecasting techniques. We conclude that for forecasts based on the STEREO coronagraph data, it is important to take account of the subsequent acceleration/deceleration of each CME through interaction with the solar wind, while predictions based on measurements of CMEs made by the STEREO Heliospheric Imagers would benefit from higher temporal and spatial resolution. Space weather forecasting tools must work with near–real time data; such data, when provided by science missions, is usually highly compressed and/or reduced in temporal/spatial resolution and may also have significant gaps in coverage, making such forecasts more challenging.
Resumo:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example medical scientists can use patterns extracted from historic patient data in order to determine if a new patient is likely to respond positively to a particular treatment or not; marketing analysts can use extracted patterns from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Resumo:
n the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used complementary to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, though data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.
Resumo:
This work presents a novel approach in order to increase the recognition power of Multiscale Fractal Dimension (MFD) techniques, when applied to image classification. The proposal uses Functional Data Analysis (FDA) with the aim of enhancing the MFD technique precision achieving a more representative descriptors vector, capable of recognizing and characterizing more precisely objects in an image. FDA is applied to signatures extracted by using the Bouligand-Minkowsky MFD technique in the generation of a descriptors vector from them. For the evaluation of the obtained improvement, an experiment using two datasets of objects was carried out. A dataset was used of characters shapes (26 characters of the Latin alphabet) carrying different levels of controlled noise and a dataset of fish images contours. A comparison with the use of the well-known methods of Fourier and wavelets descriptors was performed with the aim of verifying the performance of FDA method. The descriptor vectors were submitted to Linear Discriminant Analysis (LDA) classification method and we compared the correctness rate in the classification process among the descriptors methods. The results demonstrate that FDA overcomes the literature methods (Fourier and wavelets) in the processing of information extracted from the MFD signature. In this way, the proposed method can be considered as an interesting choice for pattern recognition and image classification using fractal analysis.
Resumo:
This paper presents the groundwater favorability mapping on a fractured terrain in the eastern portion of Sao Paulo State, Brazil. Remote sensing, airborne geophysical data, photogeologic interpretation, geologic and geomorphologic maps and geographic information system (GIS) techniques have been used. The results of cross-tabulation between these maps and well yield data allowed groundwater prospective parameters in a fractured-bedrock aquifer. These prospective parameters are the base for the favorability analysis whose principle is based on the knowledge-driven method. The mutticriteria analysis (weighted linear combination) was carried out to give a groundwater favorabitity map, because the prospective parameters have different weights of importance and different classes of each parameter. The groundwater favorability map was tested by cross-tabulation with new well yield data and spring occurrence. The wells with the highest values of productivity, as well as all the springs occurrence are situated in the excellent and good favorabitity mapped areas. It shows good coherence between the prospective parameters and the well yield and the importance of GIS techniques for definition of target areas for detail study and wells location. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
Housing is an important component of wealth for a typical household in many countries. The objective of this paper is to investigate the effect of real-estate price variation on welfare, trying to close a gap between the welfare literature in Brazil and that in the U.S., the U.K., and other developed countries. Our first motivation relates to the fact that real estate is probably more important here than elsewhere as a proportion of wealth, which potentially makes the impact of a price change bigger here. Our second motivation relates to the fact that real-estate prices boomed in Brazil in the last five years. Prime real estate in Rio de Janeiro and São Paulo have tripled in value in that period, and a smaller but generalized increase has been observed throughout the country. Third, we have also seen a recent consumption boom in Brazil in the last five years. Indeed, the recent rise of some of the poor to middle-income status is well documented not only for Brazil but for other emerging countries as well. Regarding consumption and real-estate prices in Brazil, one cannot imply causality from correlation, but one can do causal inference with an appropriate structural model and proper inference, or with a proper inference in a reduced-form setup. Our last motivation is related to the complete absence of studies of this kind in Brazil, which makes ours a pioneering study. We assemble a panel-data set for the determinants of non-durable consumption growth by Brazilian states, merging the techniques and ideas in Campbell and Cocco (2007) and in Case, Quigley and Shiller (2005). With appropriate controls, and panel-data methods, we investigate whether house-price variation has a positive effect on non-durable consumption. The results show a non-negligible significant impact of the change in the price of real estate on welfare consumption), although smaller then what Campbell and Cocco have found. Our findings support the view that the channel through which house prices affect consumption is a financial one.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Petroleum contamination impact on macrobenthic communities in the northeast portion of Todos os Santos Bay was assessed combining in multivariate analyses, chemical parameters such as aliphatic and polycyclic aromatic hydrocarbon indices and concentration ratios with benthic ecological parameters. Sediment samples were taken in August 2000 with a 0.05 m(2) van Veen grab at 28 sampling locations. The predominance of n-alkanes with more than 24 carbons, together with CPI values close to one, and the fact that most of the stations showed UCM/resolved aliphatic hydrocarbons ratios (UCM:R) higher than two, indicated a high degree of anthropogenic contribution, the presence of terrestrial plant detritus, petroleum products and evidence of chronic oil pollution. The indices used to determine the origin of PAH indicated the occurrence of a petrogenic contribution. A pyrolytic contribution constituted mainly by fossil fuel combustion derived PAH was also observed. The results of the stepwise multiple regression analysis performed with chemical data and benthic ecological descriptors demonstrated that not only total PAH concentrations but also specific concentration ratios or indices such as >= C24:< C24, An/178 and Fl/Fl + Py, are determining the structure of benthic communities within the study area. According to the BIO-ENV results petroleum related variables seemed to have a main influence on macrofauna community structure. The PCA ordination performed with the chemical data resulted in the formation of three groups of stations. The decrease in macrofauna density, number of species and diversity from groups III to I seemed to be related to the occurrence of high aliphatic hydrocarbon and PAH concentrations associated with fine sediments. Our results showed that macrobenthic communities in the northeast portion of Todos os Santos Bay are subjected to the impact of chronic oil pollution as was reflected by the reduction in the number of species and diversity. These results emphasise the importance to combine in multivariate approaches not only total hydrocarbon concentrations but also indices, isomer pair ratios and specific compound concentrations with biological data to improve the assessment of anthropogenic impact on marine ecosystems. (c) 2008 Elsevier Ltd. All rights reserved.
Resumo:
Concentrations of 39 organic compounds were determined in three fractions (head, heart and tail) obtained from the pot still distillation of fermented sugarcane juice. The results were evaluated using analysis of variance (ANOVA), Tukey's test, principal component analysis (PCA), hierarchical cluster analysis (HCA) and linear discriminant analysis (LDA). According to PCA and HCA, the experimental data lead to the formation of three clusters. The head fractions give rise to a more defined group. The heart and tail fractions showed some overlap consistent with its acid composition. The predictive ability of calibration and validation of the model generated by LDA for the three fractions classification were 90.5 and 100%, respectively. This model recognized as the heart twelve of the thirteen commercial cachacas (92.3%) with good sensory characteristics, thus showing potential for guiding the process of cuts.
Resumo:
The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8.
Resumo:
Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
Portable system of energy dispersive X-ray fluorescence was used to determine the elemental composition of 68 pottery fragments from Sambaqui do Bacanga, an archeological site in Sao Luis, Maranhao, Brazil. This site was occupied from 6600 BP until 900 BP. By determining the element chemical composition of those fragments, it was possible to verify the existence of engobe in 43 pottery fragments. Obtained from two-dimensional graphs and hierarchical cluster analysis performed in fragments of stratigraphies from surface and 113-cm level, and 10 to 20, 132 and 144-cm level, it was possible to group these fragments in five distinct groups, according to their stratigraphies. The results of data grouping (two-dimensional graphics) are in agreement with hierarchical cluster analysis by Ward method. Copyright (C) 2011 John Wiley & Sons, Ltd.