930 resultados para Compositional data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon Portugal.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon Portugal.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.) a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data concerning the characteristics of forest soils. This is possible to be seen in some of the recent publications in the context of Multivariate Statistics. These new methods require additional care that is not always included or refered in some approaches. In the particular case of geostatistical data applications it is necessary, besides to geo-reference all the data acquisition, to collect the samples in regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. In the case of the great majority of Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) despite the fact they do not require in most cases the assumption of normal distribution, they however need a proper and rigorous strategy for its utilization. In this work, some reflections about these methodologies and, in particular, about the main constraints that often occur during the information collecting process and about the various linking possibilities of these different techniques will be presented. At the end, illustrations of some particular cases of the applications of these statistical methods will also be presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex industrial plants exhibit multiple interactions among smaller parts and with human operators. Failure in one part can propagate across subsystem boundaries causing a serious disaster. This paper analyzes the industrial accident data series in the perspective of dynamical systems. First, we process real world data and show that the statistics of the number of fatalities reveal features that are well described by power law (PL) distributions. For early years, the data reveal double PL behavior, while, for more recent time periods, a single PL fits better into the experimental data. Second, we analyze the entropy of the data series statistics over time. Third, we use the Kullback–Leibler divergence to compare the empirical data and multidimensional scaling (MDS) techniques for data analysis and visualization. Entropy-based analysis is adopted to assess complexity, having the advantage of yielding a single parameter to express relationships between the data. The classical and the generalized (fractional) entropy and Kullback–Leibler divergence are used. The generalized measures allow a clear identification of patterns embedded in the data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks as biological and biomedical discovery, biotechnology and drug development. However, as it happens with other omics data, the analysis of metabolomics datasets provides multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, from the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available in CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Functional Data Analysis (FDA) deals with samples where a whole function is observedfor each individual. A particular case of FDA is when the observed functions are densityfunctions, that are also an example of infinite dimensional compositional data. In thiswork we compare several methods for dimensionality reduction for this particular typeof data: functional principal components analysis (PCA) with or without a previousdata transformation and multidimensional scaling (MDS) for diferent inter-densitiesdistances, one of them taking into account the compositional nature of density functions. The difeerent methods are applied to both artificial and real data (householdsincome distributions)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditionally, compositional data has been identified with closed data, and the simplex has been considered as the natural sample space of this kind of data. In our opinion, the emphasis on the constrained nature ofcompositional data has contributed to mask its real nature. More crucial than the constraining property of compositional data is the scale-invariant property of this kind of data. Indeed, when we are considering only few parts of a full composition we are not working with constrained data but our data are still compositional. We believe that it is necessary to give a more precisedefinition of composition. This is the aim of this oral contribution

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The biplot has proved to be a powerful descriptive and analytical tool in many areasof applications of statistics. For compositional data the necessary theoreticaladaptation has been provided, with illustrative applications, by Aitchison (1990) andAitchison and Greenacre (2002). These papers were restricted to the interpretation ofsimple compositional data sets. In many situations the problem has to be described insome form of conditional modelling. For example, in a clinical trial where interest isin how patients’ steroid metabolite compositions may change as a result of differenttreatment regimes, interest is in relating the compositions after treatment to thecompositions before treatment and the nature of the treatments applied. To study thisthrough a biplot technique requires the development of some form of conditionalcompositional biplot. This is the purpose of this paper. We choose as a motivatingapplication an analysis of the 1992 US President ial Election, where interest may be inhow the three-part composition, the percentage division among the three candidates -Bush, Clinton and Perot - of the presidential vote in each state, depends on the ethniccomposition and on the urban-rural composition of the state. The methodology ofconditional compositional biplots is first developed and a detailed interpretation of the1992 US Presidential Election provided. We use a second application involving theconditional variability of tektite mineral compositions with respect to major oxidecompositions to demonstrate some hazards of simplistic interpretation of biplots.Finally we conjecture on further possible applications of conditional compositionalbiplots

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Starting with logratio biplots for compositional data, which are based on the principle of subcompositional coherence, and then adding weights, as in correspondence analysis, we rediscover Lewi's spectral map and many connections to analyses of two-way tables of non-negative data. Thanks to the weighting, the method also achieves the property of distributional equivalence

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We compare two methods for visualising contingency tables and developa method called the ratio map which combines the good properties of both.The first is a biplot based on the logratio approach to compositional dataanalysis. This approach is founded on the principle of subcompositionalcoherence, which assures that results are invariant to considering subsetsof the composition. The second approach, correspondence analysis, isbased on the chi-square approach to contingency table analysis. Acornerstone of correspondence analysis is the principle of distributionalequivalence, which assures invariance in the results when rows or columnswith identical conditional proportions are merged. Both methods may bedescribed as singular value decompositions of appropriately transformedmatrices. Correspondence analysis includes a weighting of the rows andcolumns proportional to the margins of the table. If this idea of row andcolumn weights is introduced into the logratio biplot, we obtain a methodwhich obeys both principles of subcompositional coherence and distributionalequivalence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Whether for investigative or intelligence aims, crime analysts often face up the necessity to analyse the spatiotemporal distribution of crimes or traces left by suspects. This article presents a visualisation methodology supporting recurrent practical analytical tasks such as the detection of crime series or the analysis of traces left by digital devices like mobile phone or GPS devices. The proposed approach has led to the development of a dedicated tool that has proven its effectiveness in real inquiries and intelligence practices. It supports a more fluent visual analysis of the collected data and may provide critical clues to support police operations as exemplified by the presented case studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Office of Special Investigations at Iowa Department of Transportation (DOT) collects FWD data on regular basis to evaluate pavement structural conditions. The primary objective of this study was to develop a fully-automated software system for rapid processing of the FWD data along with a user manual. The software system automatically reads the FWD raw data collected by the JILS-20 type FWD machine that Iowa DOT owns, processes and analyzes the collected data with the rapid prediction algorithms developed during the phase I study. This system smoothly integrates the FWD data analysis algorithms and the computer program being used to collect the pavement deflection data. This system can be used to assess pavement condition, estimate remaining pavement life, and eventually help assess pavement rehabilitation strategies by the Iowa DOT pavement management team. This report describes the developed software in detail and can also be used as a user-manual for conducting simulation studies and detailed analyses. *********************** Large File ***********************

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present research deals with an important public health threat, which is the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important at first to define the influence of each factor. In particular, it was important to define the influence of geology as being closely associated to indoor radon. This association was indeed observed for the Swiss data but not probed to be the sole determinant for the spatial modeling. The statistical analysis of data, both at univariate and multivariate level, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving windows methods. The use of Quan-tité Morisita Index (QMI) as a procedure to evaluate data clustering in function of the radon level was proposed. The existing methods of declustering were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-to-down resolution approach, from regional to local lev¬els in order to find the appropriate scales for modeling. In this sense, data partition was optimized in order to cope with stationary conditions of geostatistical models. Common methods of spatial modeling such as Κ Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied for a par-ticular dataset. A bottom to top method complexity approach was adopted and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation was tested with the purpose of reducing noise at local scale (the CVMF). At the end of the chapter, a series of test for data consistency and methods robustness were performed. This lead to conclude about the importance of data splitting and the limitation of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multigaussian models and helped take the indoor radon pollution data uncertainty into consideration. The catego-rization transform was presented as a solution for extreme values modeling through clas-sification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. The sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for data classification hardening. Within the classification methods, probabilistic neural networks (PNN) show to be better adapted for modeling of high threshold categorization and for automation. Support vector machines (SVM) on the contrary performed well under balanced category conditions. In general, it was concluded that a particular prediction or estimation method is not better under all conditions of scale and neighborhood definitions. Simulations should be the basis, while other methods can provide complementary information to accomplish an efficient indoor radon decision making.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In general, laboratory activities are costly in terms of time, space, and money. As such, the ability to provide realistically simulated laboratory data that enables students to practice data analysis techniques as a complementary activity would be expected to reduce these costs while opening up very interesting possibilities. In the present work, a novel methodology is presented for design of analytical chemistry instrumental analysis exercises that can be automatically personalized for each student and the results evaluated immediately. The proposed system provides each student with a different set of experimental data generated randomly while satisfying a set of constraints, rather than using data obtained from actual laboratory work. This allows the instructor to provide students with a set of practical problems to complement their regular laboratory work along with the corresponding feedback provided by the system's automatic evaluation process. To this end, the Goodle Grading Management System (GMS), an innovative web-based educational tool for automating the collection and assessment of practical exercises for engineering and scientific courses, was developed. The proposed methodology takes full advantage of the Goodle GMS fusion code architecture. The design of a particular exercise is provided ad hoc by the instructor and requires basic Matlab knowledge. The system has been employed with satisfactory results in several university courses. To demonstrate the automatic evaluation process, three exercises are presented in detail. The first exercise involves a linear regression analysis of data and the calculation of the quality parameters of an instrumental analysis method. The second and third exercises address two different comparison tests, a comparison test of the mean and a t-paired test.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reliability analysis is a well established branch of statistics that deals with the statistical study of different aspects of lifetimes of a system of components. As we pointed out earlier that major part of the theory and applications in connection with reliability analysis were discussed based on the measures in terms of distribution function. In the beginning chapters of the thesis, we have described some attractive features of quantile functions and the relevance of its use in reliability analysis. Motivated by the works of Parzen (1979), Freimer et al. (1988) and Gilchrist (2000), who indicated the scope of quantile functions in reliability analysis and as a follow up of the systematic study in this connection by Nair and Sankaran (2009), in the present work we tried to extend their ideas to develop necessary theoretical framework for lifetime data analysis. In Chapter 1, we have given the relevance and scope of the study and a brief outline of the work we have carried out. Chapter 2 of this thesis is devoted to the presentation of various concepts and their brief reviews, which were useful for the discussions in the subsequent chapters .In the introduction of Chapter 4, we have pointed out the role of ageing concepts in reliability analysis and in identifying life distributions .In Chapter 6, we have studied the first two L-moments of residual life and their relevance in various applications of reliability analysis. We have shown that the first L-moment of residual function is equivalent to the vitality function, which have been widely discussed in the literature .In Chapter 7, we have defined percentile residual life in reversed time (RPRL) and derived its relationship with reversed hazard rate (RHR). We have discussed the characterization problem of RPRL and demonstrated with an example that the RPRL for given does not determine the distribution uniquely