10 results for Classification, Markov chain Monte Carlo, k-nearest neighbours
at Universitat de Girona, Spain
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zero is usually understood as a trace too small to measure, it seems reasonable to replace it by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts, and thus the metric properties, should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is natural in the sense that it recovers the true composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, a substitution method for missing values on compositional data sets is introduced in the same paper.
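A minimal sketch of the multiplicative replacement described above, assuming a single composition x closed to a constant c and small replacement values delta for the rounded zeros: zeros are set to delta and the non-zero parts are rescaled multiplicatively, which preserves the ratios (and hence the covariance structure) among the observed parts. The function name and the example values are illustrative only.

```python
import numpy as np

def multiplicative_replacement(x, delta):
    """Replace rounded zeros in a composition x (summing to c) by small
    values delta and rescale the non-zero parts multiplicatively, so the
    total c and the ratios among non-zero parts are preserved."""
    x = np.asarray(x, dtype=float)
    delta = np.broadcast_to(delta, x.shape).astype(float)
    c = x.sum()                              # closure constant (e.g. 1 or 100)
    zero = (x == 0)
    return np.where(zero, delta, x * (1.0 - delta[zero].sum() / c))

# Example: a 4-part composition closed to 1 with one rounded zero
x = np.array([0.55, 0.30, 0.15, 0.0])
print(multiplicative_replacement(x, delta=0.005))
```

Unlike the additive replacement, the ratio between any two non-zero parts is exactly the same before and after the substitution.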
Abstract:
In networks with small buffers, such as networks based on optical packet switching (OPS), the convolution approach (CA) is presented as one of the most accurate methods for connection admission control. Admission control and resource management have been addressed in other works oriented to bursty traffic and ATM. This paper focuses on heterogeneous traffic in OPS-based networks. For heterogeneous traffic and bufferless networks, the enhanced convolution approach (ECA) is a good solution. However, both methods (CA and ECA) incur a high computational cost when the number of connections is large. Two new mechanisms (UMCA and ISCA), based on the Monte Carlo method, are proposed to overcome this drawback. Simulation results show that our proposals achieve a lower computational cost than the enhanced convolution approach, with a small stochastic error in the probability estimation.
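The UMCA and ISCA mechanisms themselves are not reproduced here; the following is only a minimal sketch of the general idea of replacing the convolution of per-connection rate distributions by crude Monte Carlo sampling of the aggregate load on a bufferless link. The connection mix (peak rates, activity probabilities) and the link capacity are assumed, illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical heterogeneous on-off connections: (peak_rate, prob_active)
connections = [(2.5, 0.2)] * 40 + [(10.0, 0.05)] * 15 + [(1.0, 0.5)] * 60
capacity = 100.0                      # bufferless link capacity
n_samples = 200_000                   # Monte Carlo sample size

peaks = np.array([p for p, _ in connections])
probs = np.array([a for _, a in connections])

# Sample the aggregate load: each connection is active independently
active = (rng.random((n_samples, len(connections))) < probs).astype(float)
load = active @ peaks

# Overflow probability estimate and its stochastic (standard) error
p_overflow = np.mean(load > capacity)
stderr = np.sqrt(p_overflow * (1 - p_overflow) / n_samples)
print(f"P(overflow) ~ {p_overflow:.2e} +/- {stderr:.1e}")
```

The estimate carries a quantifiable stochastic error, but its cost grows with the sample size rather than with the number of convolved connection states.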
Abstract:
Realistic rendering of animations is known to be an expensive processing task when physically based global illumination methods are used to improve illumination detail. This paper presents an acceleration technique to compute animations in radiosity environments. The technique is based on an interpolation approach that exploits temporal coherence in radiosity. A fast global Monte Carlo pre-processing step is introduced into the computation of the whole animated sequence to select important frames. These frames are fully computed and used as a base for interpolating the rest of the sequence. The approach is completely view-independent: once the illumination is computed, it can be visualized by any animated camera. Results show significant speed-ups, indicating that the technique can be an interesting alternative to deterministic methods for computing non-interactive radiosity animations of moderately complex scenarios.
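The Monte Carlo criterion used to select the important frames is not reproduced here; assuming the key frames have already been chosen and fully solved, the sketch below only illustrates the interpolation step, with per-patch radiosity vectors linearly blended between the two bracketing key frames.

```python
import numpy as np

def interpolate_radiosity(key_frames, key_radiosity, frame):
    """Linearly interpolate per-patch radiosity between the two key frames
    that bracket `frame`.  key_frames: sorted frame indices;
    key_radiosity: array of shape (n_keys, n_patches)."""
    key_frames = np.asarray(key_frames)
    key_radiosity = np.asarray(key_radiosity)
    hi = np.searchsorted(key_frames, frame)
    if hi == 0:
        return key_radiosity[0]
    if hi == len(key_frames):
        return key_radiosity[-1]
    lo = hi - 1
    t = (frame - key_frames[lo]) / (key_frames[hi] - key_frames[lo])
    return (1 - t) * key_radiosity[lo] + t * key_radiosity[hi]

# Example: 3 fully computed key frames, 5 patches
keys = [0, 10, 20]
B = np.random.rand(3, 5)              # placeholder radiosity at key frames
print(interpolate_radiosity(keys, B, frame=4))
```

Because the interpolated quantity is patch radiosity rather than an image, the result stays view-independent and can be rendered from any camera path.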
Abstract:
Diffusion tensor magnetic resonance imaging, which measures directional information of water diffusion in the brain, has emerged as a powerful tool for human brain studies. In this paper, we introduce a new Monte Carlo-based fiber tracking approach to estimate brain connectivity. One of the main characteristics of this approach is that all parameters of the algorithm are automatically determined at each point using the entropy of the eigenvalues of the diffusion tensor. Experimental results show the good performance of the proposed approach.
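How the entropy is mapped to the tracking parameters is not reproduced here; the sketch below only shows the entropy measure itself under the usual interpretation: normalise the eigenvalues of the diffusion tensor and compute a normalised Shannon entropy, which is low for strongly anisotropic tensors (a well-defined fibre direction) and close to one for isotropic diffusion.

```python
import numpy as np

def eigenvalue_entropy(tensor):
    """Normalised Shannon entropy of the eigenvalues of a 3x3 diffusion
    tensor: near 0 for strongly anisotropic diffusion, near 1 for isotropic."""
    evals = np.linalg.eigvalsh(tensor)
    evals = np.clip(evals, 1e-12, None)      # guard against numerical zeros
    p = evals / evals.sum()
    return -np.sum(p * np.log(p)) / np.log(len(p))

# Example: anisotropic vs. isotropic tensors (illustrative values)
aniso = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
iso = np.diag([1.0e-3, 1.0e-3, 1.0e-3])
print(eigenvalue_entropy(aniso), eigenvalue_entropy(iso))
```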
Abstract:
The author studies the error and complexity of the discrete random walk Monte Carlo technique for radiosity, using both the shooting and gathering methods. The author shows that the shooting method exhibits a lower complexity than the gathering one and, under some constraints, has linear complexity; this is an improvement over a previous result that pointed to an O(n log n) complexity. The author gives and compares three unbiased estimators for each method, and obtains closed forms and bounds for their variances. The author also bounds the expected value of the mean square error (MSE). Some of the results obtained are also shown.
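The paper's specific estimators and their variance bounds are not reproduced here; as a minimal sketch of the discrete random-walk setting, the code below shows one standard unbiased gathering estimator for the radiosity system B = E + diag(rho) F B, assuming a closed environment (row-stochastic form factors): a path is absorbed by Russian roulette with the reflectance as survival probability, and the sum of emissions visited estimates B_i.

```python
import numpy as np

rng = np.random.default_rng(0)

def gathering_estimate(i, E, rho, F, n_walks=10_000):
    """Unbiased random-walk (gathering) estimate of radiosity B_i for the
    discrete system B = E + diag(rho) F B.  Transitions follow the
    form-factor rows F[k] (assumed to sum to 1, i.e. a closed environment);
    survival at patch k uses Russian roulette with probability rho[k]."""
    n = len(E)
    total = 0.0
    for _ in range(n_walks):
        k, score = i, E[i]
        while rng.random() < rho[k]:          # survive with prob rho[k]
            k = rng.choice(n, p=F[k])         # move with form-factor probs
            score += E[k]                     # gather emission at new patch
        total += score
    return total / n_walks

# Tiny 3-patch example with assumed form factors, emissions, reflectances
F = np.array([[0.0, 0.6, 0.4],
              [0.6, 0.0, 0.4],
              [0.5, 0.5, 0.0]])
E = np.array([1.0, 0.0, 0.0])
rho = np.array([0.5, 0.7, 0.3])
print(gathering_estimate(0, E, rho, F))
```

A shooting walk works in the opposite direction, distributing emitted power along the path; the complexity comparison in the paper concerns how many such walks each method needs for a given accuracy.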
Abstract:
The R package compositions is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement; it now contains facilities to work with and represent ilr bases built from balances, and an elaborated subsystem for dealing with several kinds of irregular data: (rounded or structural) zeros, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a regular subcomposition (where all parts are actually observed and the datum behaves typically) and a problematic subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros), focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant (MCD) approach. The outlier classification is based on four different models of outlier occurrence and Monte-Carlo-based tests for their characterization. Furthermore, the package provides special plots helping to understand the nature of outliers in the data set.
Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
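The package's own classification schemes and projection machinery are not reproduced here; as a minimal sketch of the underlying MCD idea, outside R, the code below flags compositional outliers by computing robust Mahalanobis distances on ilr coordinates with scikit-learn's MinCovDet and a chi-square cutoff. The ilr basis, cutoff level and simulated data are assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def ilr(X):
    """Isometric log-ratio transform of compositions (rows of X, all parts > 0)
    using a standard sequential balance basis."""
    X = np.asarray(X, dtype=float)
    L = np.log(X)
    D = X.shape[1]
    Z = np.empty((X.shape[0], D - 1))
    for j in range(1, D):
        Z[:, j - 1] = np.sqrt(j / (j + 1)) * (L[:, :j].mean(axis=1) - L[:, j])
    return Z

def compositional_outliers(X, alpha=0.975):
    """Flag outliers via robust (MCD) Mahalanobis distances on ilr coordinates."""
    Z = ilr(X)
    mcd = MinCovDet().fit(Z)
    d2 = mcd.mahalanobis(Z)                   # squared robust distances
    return d2 > chi2.ppf(alpha, df=Z.shape[1])

# Example with simulated compositions and one gross outlier
rng = np.random.default_rng(3)
X = rng.dirichlet([10, 5, 3, 2], size=50)
X[0] = [0.01, 0.01, 0.01, 0.97]
print(np.where(compositional_outliers(X))[0])
```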
Abstract:
The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p², 2pq, and q² respectively, where p is the allele frequency of A, and q = 1 - p. Many statistical tests are used to check whether empirical marker data obey the Hardy-Weinberg principle. Among these are the classical chi-square test (with or without continuity correction), the likelihood ratio test, Fisher's exact test, and exact tests in combination with Monte Carlo and Markov chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is, however, ample space for the use of graphics in HWE tests, in particular for the ternary plot. Nowadays, many genetic studies use genetic markers known as Single Nucleotide Polymorphisms (SNPs). SNP data come in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically "close" to the parabola, whereas compositions that differ significantly from HWE are "far". By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in the ternary plot. This way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008). This leads to nice graphical representations in which large numbers of SNPs can be tested for HWE in a single graph. Several examples of graphical tests for HWE (implemented in R software) will be shown, using SNP data from different human populations.
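The graphical acceptance-region construction is not reproduced here; as a minimal sketch of the classical numerical test mentioned above, the code below computes the 1-degree-of-freedom chi-square statistic (without continuity correction) for HWE from genotype counts. The example counts are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def hwe_chisq(n_AA, n_AB, n_BB):
    """Classical chi-square test (1 df, no continuity correction) for
    Hardy-Weinberg equilibrium at a bi-allelic locus."""
    n = n_AA + n_AB + n_BB
    p = (2 * n_AA + n_AB) / (2 * n)          # allele frequency of A
    q = 1 - p
    expected = np.array([n * p**2, 2 * n * p * q, n * q**2])
    observed = np.array([n_AA, n_AB, n_BB], dtype=float)
    stat = np.sum((observed - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=1)         # statistic and p-value

# Example genotype counts for one SNP
print(hwe_chisq(n_AA=298, n_AB=489, n_BB=213))
```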
Abstract:
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
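The feature-extraction pipeline (dense color SIFT, visual vocabulary) is not reproduced here; the sketch below only illustrates the classification pipeline shape, assuming bag-of-visual-words histograms and labels are already available (they are simulated as placeholders), and using scikit-learn's LatentDirichletAllocation as a stand-in for pLSA, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Placeholder bag-of-visual-words histograms (images x vocabulary) and labels
X_bow = rng.integers(0, 20, size=(400, 300))
y = rng.integers(0, 4, size=400)              # e.g. coast/forest/city/river

X_tr, X_te, y_tr, y_te = train_test_split(X_bow, y, random_state=0)

# Latent topic model (LDA as a stand-in for pLSA) -> low-dimensional topic vectors
lda = LatentDirichletAllocation(n_components=25, random_state=0)
Z_tr = lda.fit_transform(X_tr)
Z_te = lda.transform(X_te)

# Discriminative classifier (k-nearest neighbours) on topic distributions
knn = KNeighborsClassifier(n_neighbors=10).fit(Z_tr, y_tr)
print("accuracy on topic vectors:", knn.score(Z_te, y_te))

# Baseline: kNN directly on the raw bag-of-words histograms
knn_bow = KNeighborsClassifier(n_neighbors=10).fit(X_tr, y_tr)
print("accuracy on raw histograms:", knn_bow.score(X_te, y_te))
```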
Abstract:
A statistical method for the classification of sags according to their origin, downstream or upstream from the recording point, is proposed in this work. The goal is to obtain a statistical model, based on the sag waveforms, that characterises one type of sag and discriminates it from the other type. This model is built on the basis of multi-way principal component analysis and is later used to project the available registers into a new space of lower dimension. Thus, a case base of diagnosed sags is built in the projection space. Classification is then done by comparing new sags against those existing in the case base. Similarity is defined in the projection space using a combination of distances to recover the nearest neighbours to the new sag. Finally, the method assigns the origin of the new sag according to the origin of its neighbours.
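The paper's multi-way PCA model and its combined distance measure are not reproduced here; as a minimal sketch under simplifying assumptions, each register (samples x channels) is unfolded into a single row before ordinary PCA, and a plain Euclidean k-nearest-neighbour vote in the projection space replaces the combined-distance similarity. The data are simulated placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)

# Placeholder case base: voltage-sag registers (n_sags x n_samples x n_channels)
# with a known origin label (0 = downstream, 1 = upstream)
n_sags, n_samples, n_channels = 120, 128, 3
registers = rng.normal(size=(n_sags, n_samples, n_channels))
origin = rng.integers(0, 2, size=n_sags)

X = registers.reshape(n_sags, -1)             # unfold the three-way array

# Project the case base onto a lower-dimensional space
pca = PCA(n_components=10).fit(X)
X_proj = pca.transform(X)

# Classify a new sag from the origin of its nearest neighbours in that space
knn = KNeighborsClassifier(n_neighbors=5).fit(X_proj, origin)
new_sag = rng.normal(size=(1, n_samples * n_channels))
print("predicted origin:", knn.predict(pca.transform(new_sag))[0])
```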
Abstract:
A problem in the archaeometric classification of Catalan Renaissance pottery is the fact that the clay supply of the pottery workshops was centrally organized by guilds, and therefore usually all potters of a single production centre produced chemically similar ceramics. However, when the glazes of the ware are analysed, a large number of inclusions is usually found in the glaze, and these reveal technological differences between single workshops. These inclusions were used by the potters to opacify the transparent glaze and to achieve a white background for further decoration. In order to distinguish the different technological preparation procedures of the single workshops, the chemical composition of those inclusions, as well as their size in the two-dimensional cut, is recorded with a scanning electron microscope. Based on the latter, a frequency distribution of the apparent diameters is estimated for each sample and type of inclusion. Following an approach by S.D. Wicksell (1925), it is in principle possible to transform the distributions of the apparent 2D diameters back to those of the true three-dimensional bodies. The applicability of this approach and its practical problems are examined using different ways of kernel density estimation and Monte Carlo tests of the methodology. Finally, it is tested to what extent the obtained frequency distributions can be used to classify the pottery.
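The paper's inversion and kernel-density machinery are not reproduced here; the sketch below only shows the forward direction of Wicksell's relation, which is the basis of a Monte Carlo check: a sphere of diameter D is hit by a random section with probability proportional to D, the cutting-plane offset h is uniform on [0, D/2], and the apparent diameter is 2·sqrt((D/2)² - h²). Simulated apparent diameters can then be compared with observed ones, e.g. with a two-sample Kolmogorov-Smirnov test. The diameter distribution and "observed" data are placeholders.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(11)

def apparent_diameters(true_d, n):
    """Simulate n apparent (2D section) diameters from a sample of true 3D
    sphere diameters, following Wicksell (1925): larger spheres are hit
    proportionally more often, and the plane offset h is uniform on [0, D/2]."""
    true_d = np.asarray(true_d, dtype=float)
    idx = rng.choice(len(true_d), size=n, p=true_d / true_d.sum())
    D = true_d[idx]
    h = rng.uniform(0.0, D / 2.0)
    return 2.0 * np.sqrt((D / 2.0) ** 2 - h ** 2)

# Hypothetical true inclusion diameters and placeholder "observed" 2D data
true_d = rng.lognormal(mean=0.0, sigma=0.4, size=2000)
observed_2d = apparent_diameters(true_d, 500)
simulated_2d = apparent_diameters(true_d, 5000)

# Monte Carlo style check: do observed and simulated 2D diameters agree?
print(ks_2samp(observed_2d, simulated_2d))
```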