970 resultados para Multivariate data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study presents a classification criteria for two-class Cannabis seedlings. As the cultivation of drug type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask laboratories to determine cannabis plant's chemotype from seized material in order to ascertain that the plantation is legal or not. In this study, the classification analysis is based on data obtained from the relative proportion of three major leaf compounds measured by gas-chromatography interfaced with mass spectrometry (GC-MS). The aim is to discriminate between drug type (illegal) and fiber type (legal) cannabis at an early stage of the growth. A Bayesian procedure is proposed: a Bayes factor is computed and classification is performed on the basis of the decision maker specifications (i.e. prior probability distributions on cannabis type and consequences of classification measured by losses). Classification rates are computed with two statistical models and results are compared. Sensitivity analysis is then performed to analyze the robustness of classification criteria.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Synoptic climatology relates the atmospheric circulation with the surface environment. The aim of this study is to examine the variability of the surface meteorological patterns, which are developing under different synoptic scale categories over a suburban area with complex topography. Multivariate Data Analysis techniques were performed to a data set with surface meteorological elements. Three principal components related to the thermodynamic status of the surface environment and the two components of the wind speed were found. The variability of the surface flows was related with atmospheric circulation categories by applying Correspondence Analysis. Similar surface thermodynamic fields develop under cyclonic categories, which are contrasted with the anti-cyclonic category. A strong, steady wind flow characterized by high shear values develops under the cyclonic Closed Low and the anticyclonic H–L categories, in contrast to the variable weak flow under the anticyclonic Open Anticyclone category.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper is reported the use of the chromatographic profiles of volatiles to determine disease markers in plants - in this case, leaves of Eucalyptus globulus contaminated by the necrotroph fungus Teratosphaeria nubilosa. The volatile fraction was isolated by headspace solid phase microextraction (HS-SPME) and analyzed by comprehensive two-dimensional gas chromatography-fast quadrupole mass spectrometry (GC. ×. GC-qMS). For the correlation between the metabolic profile described by the chromatograms and the presence of the infection, unfolded-partial least squares discriminant analysis (U-PLS-DA) with orthogonal signal correction (OSC) were employed. The proposed method was checked to be independent of factors such as the age of the harvested plants. The manipulation of the mathematical model obtained also resulted in graphic representations similar to real chromatograms, which allowed the tentative identification of more than 40 compounds potentially useful as disease biomarkers for this plant/pathogen pair. The proposed methodology can be considered as highly reliable, since the diagnosis is based on the whole chromatographic profile rather than in the detection of a single analyte. © 2013 Elsevier B.V..

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Petroleum contamination impact on macrobenthic communities in the northeast portion of Todos os Santos Bay was assessed combining in multivariate analyses, chemical parameters such as aliphatic and polycyclic aromatic hydrocarbon indices and concentration ratios with benthic ecological parameters. Sediment samples were taken in August 2000 with a 0.05 m(2) van Veen grab at 28 sampling locations. The predominance of n-alkanes with more than 24 carbons, together with CPI values close to one, and the fact that most of the stations showed UCM/resolved aliphatic hydrocarbons ratios (UCM:R) higher than two, indicated a high degree of anthropogenic contribution, the presence of terrestrial plant detritus, petroleum products and evidence of chronic oil pollution. The indices used to determine the origin of PAH indicated the occurrence of a petrogenic contribution. A pyrolytic contribution constituted mainly by fossil fuel combustion derived PAH was also observed. The results of the stepwise multiple regression analysis performed with chemical data and benthic ecological descriptors demonstrated that not only total PAH concentrations but also specific concentration ratios or indices such as >= C24:< C24, An/178 and Fl/Fl + Py, are determining the structure of benthic communities within the study area. According to the BIO-ENV results petroleum related variables seemed to have a main influence on macrofauna community structure. The PCA ordination performed with the chemical data resulted in the formation of three groups of stations. The decrease in macrofauna density, number of species and diversity from groups III to I seemed to be related to the occurrence of high aliphatic hydrocarbon and PAH concentrations associated with fine sediments. Our results showed that macrobenthic communities in the northeast portion of Todos os Santos Bay are subjected to the impact of chronic oil pollution as was reflected by the reduction in the number of species and diversity. These results emphasise the importance to combine in multivariate approaches not only total hydrocarbon concentrations but also indices, isomer pair ratios and specific compound concentrations with biological data to improve the assessment of anthropogenic impact on marine ecosystems. (c) 2008 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data visualization techniques are powerful in the handling and analysis of multivariate systems. One such technique known as parallel coordinates was used to support the diagnosis of an event, detected by a neural network-based monitoring system, in a boiler at a Brazilian Kraft pulp mill. Its attractiveness is the possibility of the visualization of several variables simultaneously. The diagnostic procedure was carried out step-by-step going through exploratory, explanatory, confirmatory, and communicative goals. This tool allowed the visualization of the boiler dynamics in an easier way, compared to commonly used univariate trend plots. In addition it facilitated analysis of other aspects, namely relationships among process variables, distinct modes of operation and discrepant data. The whole analysis revealed firstly that the period involving the detected event was associated with a transition between two distinct normal modes of operation, and secondly the presence of unusual changes in process variables at this time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many multifactorial biologic effects, particularly in the context of complex human diseases, are still poorly understood. At the same time, the systematic acquisition of multivariate data has become increasingly easy. The use of such data to analyze and model complex phenotypes, however, remains a challenge. Here, a new analytic approach is described, termed coreferentiality, together with an appropriate statistical test. Coreferentiality is the indirect relation of two variables of functional interest in respect to whether they parallel each other in their respective relatedness to multivariate reference data, which can be informative for a complex effect or phenotype. It is shown that the power of coreferentiality testing is comparable to multiple regression analysis, sufficient even when reference data are informative only to a relatively small extent of 2.5%, and clearly exceeding the power of simple bivariate correlation testing. Thus, coreferentiality testing uses the increased power of multivariate analysis, however, in order to address a more straightforward interpretable bivariate relatedness. Systematic application of this approach could substantially improve the analysis and modeling of complex phenotypes, particularly in the context of human study where addressing functional hypotheses by direct experimentation is often difficult.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but just few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without touching upon the correlation among them. In order to overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, leading to the thoughtfulness that attributes correlations cannot be put aside. In fact, the clusters' compactness are 7 to 25 times better using our method. Our framework also proves to be an useful tool to assist meteorologists in understanding the climate behavior along a period of time.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we introduce a Bayesian analysis for survival multivariate data in the presence of a covariate vector and censored observations. Different ""frailties"" or latent variables are considered to capture the correlation among the survival times for the same individual. We assume Weibull or generalized Gamma distributions considering right censored lifetime data. We develop the Bayesian analysis using Markov Chain Monte Carlo (MCMC) methods.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Min/max autocorrelation factor analysis (MAFA) and dynamic factor analysis (DFA) are complementary techniques for analysing short (> 15-25 y), non-stationary, multivariate data sets. We illustrate the two techniques using catch rate (cpue) time-series (1982-2001) for 17 species caught during trawl surveys off Mauritania, with the NAO index, an upwelling index, sea surface temperature, and an index of fishing effort as explanatory variables. Both techniques gave coherent results, the most important common trend being a decrease in cpue during the latter half of the time-series, and the next important being an increase during the first half. A DFA model with SST and UPW as explanatory variables and two common trends gave good fits to most of the cpue time-series. (c) 2004 International Council for the Exploration of the Sea. Published by Elsevier Ltd. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon Portugal.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Eight depositional sequences (DS) delimited by regional disconformities had been recognized in the Miocene of Lisbon and Setúbal Peninsula areas. In the case of the western coast of the Setúbal Peninsula, outcrops consisting of Lower Burdigalian to Lower Tortonian sediments were studied. The stratigraphic zonography and the environmental considerations are mainly supported on data concerning to foraminifera, ostracoda, vertebrates and palynomorphs. The first mineralogical and geochemical data determined for Foz da Fonte, Penedo Sul and Penedo Norte sedimentary sequences are presented. These analytical data mainly correspond to the sediments' fine fractions. Mineralogical data are based on X-ray diffraction (XRD), carried out on both the less than 38 nm and 2 nm fractions. Qualitative and semi-quantitative determinations of clay and non-clay minerals were obtained for both fractions. The clay minerals assemblages complete the lithostratigraphic and paleoenvironmental data obtained by stratigraphic and palaeontological studies. Some palaeomagnetic and isotopic data are discussed and correlated with the mineralogical data. Multivariate data analysis (Principal Components Analysis) of the mineralogical data was carried out using both R-mode and Q-mode factor analysis.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks as biological and biomedical discovery, biotechnology and drug development. However, as it happens with other omics data, the analysis of metabolomics datasets provides multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, from the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available in CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.