5 resultados para Rounded atelectasis
em Universitat de Girona, Spain
Resumo:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values on compositional data sets is introduced
Resumo:
There is almost not a case in exploration geology, where the studied data doesn’t includes below detection limits and/or zero values, and since most of the geological data responds to lognormal distributions, these “zero data” represent a mathematical challenge for the interpretation. We need to start by recognizing that there are zero values in geology. For example the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot co-exists with nepheline. Another common essential zero is a North azimuth, however we can always change that zero for the value of 360°. These are known as “Essential zeros”, but what can we do with “Rounded zeros” that are the result of below the detection limit of the equipment? Amalgamation, e.g. adding Na2O and K2O, as total alkalis is a solution, but sometimes we need to differentiate between a sodic and a potassic alteration. Pre-classification into groups requires a good knowledge of the distribution of the data and the geochemical characteristics of the groups which is not always available. Considering the zero values equal to the limit of detection of the used equipment will generate spurious distributions, especially in ternary diagrams. Same situation will occur if we replace the zero values by a small amount using non-parametric or parametric techniques (imputation). The method that we are proposing takes into consideration the well known relationships between some elements. For example, in copper porphyry deposits, there is always a good direct correlation between the copper values and the molybdenum ones, but while copper will always be above the limit of detection, many of the molybdenum values will be “rounded zeros”. So, we will take the lower quartile of the real molybdenum values and establish a regression equation with copper, and then we will estimate the “rounded” zero values of molybdenum by their corresponding copper values. The method could be applied to any type of data, provided we establish first their correlation dependency. One of the main advantages of this method is that we do not obtain a fixed value for the “rounded zeros”, but one that depends on the value of the other variable. Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
Resumo:
The log-ratio methodology makes available powerful tools for analyzing compositional data. Nevertheless, the use of this methodology is only possible for those data sets without null values. Consequently, in those data sets where the zeros are present, a previous treatment becomes necessary. Last advances in the treatment of compositional zeros have been centered especially in the zeros of structural nature and in the rounded zeros. These tools do not contemplate the particular case of count compositional data sets with null values. In this work we deal with \count zeros" and we introduce a treatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichlet probability distribution as a prior and we estimate the posterior probabilities. Then we apply a multiplicative modi¯cation for the non-zero values. We present a case study where this new methodology is applied. Key words: count data, multiplicative replacement, composition, log-ratio analysis
Resumo:
The R-package “compositions”is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, containing now some facilities to work with and represent ilr bases built from balances, and an elaborated subsys- tem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” sub- composition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros) focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcompositon where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occur- rence and Monte-Carlo-based tests for their characterization. Furthermore the package provides special plots helping to understand the nature of outliers in the dataset. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
Resumo:
In order to develop applications for z;isual interpretation of medical images, the early detection and evaluation of microcalcifications in digital mammograms is verg important since their presence is often associated with a high incidence of breast cancers. Accurate classification into benign and malignant groups would help improve diagnostic sensitivity as well as reduce the number of unnecessa y biopsies. The challenge here is the selection of the useful features to distinguish benign from malignant micro calcifications. Our purpose in this work is to analyse a microcalcification evaluation method based on a set of shapebased features extracted from the digitised mammography. The segmentation of the microcalcifications is performed using a fixed-tolerance region growing method to extract boundaries of calcifications with manually selected seed pixels. Taking into account that shapes and sizes of clustered microcalcifications have been associated with a high risk of carcinoma based on digerent subjective measures, such as whether or not the calcifications are irregular, linear, vermiform, branched, rounded or ring like, our efforts were addressed to obtain a feature set related to the shape. The identification of the pammeters concerning the malignant character of the microcalcifications was performed on a set of 146 mammograms with their real diagnosis known in advance from biopsies. This allowed identifying the following shape-based parameters as the relevant ones: Number of clusters, Number of holes, Area, Feret elongation, Roughness, and Elongation. Further experiments on a set of 70 new mammogmms showed that the performance of the classification scheme is close to the mean performance of three expert radiologists, which allows to consider the proposed method for assisting the diagnosis and encourages to continue the investigation in the sense of adding new features not only related to the shape