966 results for gaussian mixture model
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
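As a rough illustration of the kind of maximum-likelihood estimation involved (a minimal sketch, not the authors' implementation: the extreme-value/Gumbel form of the score distribution, the 99th-percentile trimming, and the SciPy routines are assumptions here), unrelated-sequence scores could be fitted and new scores converted to p-values as follows, in Python:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Scores of (presumed) unrelated sequences from a database search; here simulated.
unrelated_scores = stats.gumbel_r.rvs(loc=20.0, scale=5.0, size=5000, random_state=rng)

# Optionally ignore the highest-scoring hits (small E-values) during estimation,
# mimicking the simple strategy the abstract compares against the mixture model.
threshold = np.quantile(unrelated_scores, 0.99)
fit_scores = unrelated_scores[unrelated_scores <= threshold]

# Maximum-likelihood fit of an extreme value (Gumbel) distribution to the null scores.
loc, scale = stats.gumbel_r.fit(fit_scores)

def p_value(score):
    # P(S >= score) under the fitted null distribution.
    return stats.gumbel_r.sf(score, loc=loc, scale=scale)

print(p_value(45.0))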
Abstract:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, the approach allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments.
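To make the parameter-reduction argument concrete, here is a small counting sketch (the counting conventions and example sizes are mine, not the paper's): each component covariance is restricted to a factor-analytic form, a p x q loading matrix plus a diagonal noise matrix, so its cost grows linearly rather than quadratically in p.

# Per-component covariance parameter counts for three structures (a rough sketch).
def full_cov_params(p):
    return p * (p + 1) // 2           # unrestricted symmetric covariance

def isotropic_params(p):
    return 1                          # single variance, sigma^2 * I

def mfa_params(p, q):
    # Factor-analytic structure: p*q loadings + p diagonal noise variances
    # (ignoring the rotational redundancy in the loadings).
    return p * q + p

p, q = 1000, 5                        # e.g. 1000 genes, 5 latent factors
print(full_cov_params(p), isotropic_params(p), mfa_params(p, q))
# -> 500500 vs 1 vs 6000: the factor-analytic model sits between the isotropic and full models.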
Abstract:
Master's degree in Radiation Applied to Health Technologies. Area of specialization: Magnetic Resonance
Abstract:
Given a set of mixed spectral (multispectral or hyperspectral) vectors, linear spectral mixture analysis, or linear unmixing, aims at estimating the number of reference substances, also called endmembers, their spectral signatures, and their abundance fractions. This paper presents a new method for unsupervised endmember extraction from hyperspectral data, termed vertex component analysis (VCA). The algorithm exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. In a series of experiments using simulated and real data, the VCA algorithm competes with state-of-the-art methods, with a computational complexity between one and two orders of magnitude lower than the best available method.
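A stripped-down sketch of the vertex-picking idea follows (this is not the published VCA algorithm, which also includes an SNR-dependent dimensionality-reduction step; the projection scheme, array shapes, and the synthetic example are simplifications):

import numpy as np

def toy_vertex_picking(R, p, seed=0):
    # R: (L, N) matrix of N pixel spectra with L bands; p: number of endmembers.
    rng = np.random.default_rng(seed)
    L, N = R.shape
    E = np.zeros((L, p))
    E[:, 0] = rng.standard_normal(L)       # arbitrary starting direction
    picked = []
    for i in range(p):
        k = max(i, 1)
        # Direction orthogonal to the subspace spanned by the endmembers found so far.
        P = np.eye(L) - E[:, :k] @ np.linalg.pinv(E[:, :k])
        f = P @ rng.standard_normal(L)
        f /= np.linalg.norm(f)
        v = f @ R                          # project every pixel onto that direction
        j = int(np.argmax(np.abs(v)))      # the extreme projection sits at a simplex vertex
        picked.append(j)
        E[:, i] = R[:, j]
    return E, picked

# Example with synthetic linear mixtures of 3 random "endmembers":
L, N, p = 50, 2000, 3
M = np.abs(np.random.default_rng(1).standard_normal((L, p)))
A = np.random.default_rng(2).dirichlet(np.ones(p), size=N).T   # (p, N) abundance fractions
E, idx = toy_vertex_picking(M @ A, p)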
Abstract:
Proceedings of International Conference, Volume 7830: Image and Signal Processing for Remote Sensing XVI, Lorenzo Bruzzone (ed.), Toulouse, France, 20 September 2010
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover the ground truth. An application to real data from official statistics shows its usefulness.
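The modelling idea can be sketched as follows (a hypothetical parameterisation of the general feature-saliency formulation; the paper's EM variant and its minimum message length criterion are not reproduced): each feature is generated either from a component-specific categorical distribution, with probability equal to its saliency, or from a common distribution shared by all components.

import numpy as np

def saliency_mixture_loglik(X, alpha, theta, q, rho):
    # X:     (n, d) integer-coded categorical data
    # alpha: (K,)   mixture weights
    # theta: (K, d, C) component-specific categorical probabilities
    # q:     (d, C) common distribution used for "irrelevant" features
    # rho:   (d,)   feature saliencies in [0, 1]
    n, d = X.shape
    K = alpha.shape[0]
    cols = np.arange(d)
    ll = 0.0
    for i in range(n):
        per_component = np.empty(K)
        for k in range(K):
            per_feature = rho * theta[k, cols, X[i]] + (1.0 - rho) * q[cols, X[i]]
            per_component[k] = alpha[k] * per_feature.prod()
        ll += np.log(per_component.sum())
    return ll

An EM-style procedure then treats both the component labels and the per-feature relevance indicators as latent, and a message-length penalty can drive the saliencies of irrelevant features toward zero.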
Abstract:
In this study, the inhalation doses and the respective risk are calculated for the population living within a 20 km radius of a coal-fired power plant. The dispersion and deposition of natural radionuclides were simulated by a Gaussian dispersion model estimating the ground-level activity concentration. The annual effective dose and total risk were 0.03205 mSv/y and 1.25 × 10⁻⁸, respectively. The effective dose is lower than the limit established by the ICRP and the risk is lower than the limit proposed by the U.S. EPA, which means that the considered exposure does not pose any risk to public health.
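For reference, a minimal sketch of the type of Gaussian dispersion (plume) calculation involved is given below (generic textbook form only; the study's stack parameters, dispersion coefficients, and dose-conversion step are not reproduced, and the example values are arbitrary):

import numpy as np

def ground_level_concentration(Q, u, sigma_y, sigma_z, y, H):
    # Standard Gaussian plume at a ground-level receptor (z = 0), with ground reflection:
    #   C(x, y, 0) = Q / (pi * u * sigma_y * sigma_z)
    #                * exp(-y^2 / (2 sigma_y^2)) * exp(-H^2 / (2 sigma_z^2))
    # Q: emission rate (Bq/s), u: wind speed (m/s),
    # sigma_y, sigma_z: dispersion coefficients at the downwind distance (m),
    # y: crosswind offset (m), H: effective release height (m).
    return (Q / (np.pi * u * sigma_y * sigma_z)
            * np.exp(-y**2 / (2 * sigma_y**2))
            * np.exp(-H**2 / (2 * sigma_z**2)))

# Illustrative ground-level activity concentration (Bq/m^3) on the plume centreline:
print(ground_level_concentration(Q=1e4, u=4.0, sigma_y=80.0, sigma_z=40.0, y=0.0, H=100.0))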
Abstract:
Dissertation presented as a partial requirement for obtaining the degree of Master in Geographic Information Science and Systems
Abstract:
The parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the abundances' physical constraints are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground, instead of resorting to endmember extraction methods. Since libraries are potentially very large and hyperspectral datasets are of high dimensionality, a parallel implementation in a pixel-by-pixel fashion is derived that properly exploits the graphics processing unit (GPU) architecture at a low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with regard to an optimized serial implementation.
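As a serial, per-pixel reference for what such a GPU implementation parallelises (a sketch only; the actual parallel algorithm, its solver, and the spectral library used are not shown here), abundances can be estimated for each pixel against a library under the non-negativity and sum-to-one constraints, e.g. via an augmented non-negative least-squares solve:

import numpy as np
from scipy.optimize import nnls

def unmix_pixel(library, pixel, delta=1e3):
    # library: (L, m) spectral library (L bands, m materials); pixel: (L,) spectrum.
    # The sum-to-one constraint is enforced approximately by appending a heavily
    # weighted row of ones (a common trick for fully constrained least squares).
    L, m = library.shape
    A = np.vstack([library, delta * np.ones((1, m))])
    b = np.concatenate([pixel, [delta]])
    abundances, _ = nnls(A, b)
    return abundances

def unmix_image(library, image):
    # image: (L, N) matrix of N pixel spectra; the per-pixel loop is the part a GPU
    # implementation would run in parallel, e.g. one thread block per pixel.
    return np.column_stack([unmix_pixel(library, image[:, j]) for j in range(image.shape[1])])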
Abstract:
Hyperspectral unmixing methods aim at the decomposition of a hyperspectral image into a collection of endmember signatures, i.e., the radiance or reflectance of the materials present in the scene, and the corresponding abundance fractions at each pixel in the image. This paper introduces a new unmixing method termed dependent component analysis (DECA). This method is blind and fully automatic, and it overcomes the limitations of unmixing methods based on independent component analysis (ICA) and on geometry-based approaches. DECA is based on the linear mixture model, i.e., each pixel is a linear mixture of the endmember signatures weighted by the corresponding abundance fractions. These abundances are modeled as mixtures of Dirichlet densities, thus enforcing the non-negativity and constant-sum constraints imposed by the acquisition process. The endmember signatures are inferred by a generalized expectation-maximization (GEM) type algorithm. The paper illustrates the effectiveness of DECA on synthetic and real hyperspectral images.
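The generative model behind this formulation can be sketched as synthetic data (illustration only; the GEM inference itself is not reproduced, and all sizes and parameter values below are arbitrary): abundances are drawn from a mixture of Dirichlet densities, which keeps them non-negative and summing to one, and each pixel is the corresponding linear mixture of the endmember signatures plus noise.

import numpy as np

rng = np.random.default_rng(0)
L, p, N, K = 100, 4, 5000, 2          # bands, endmembers, pixels, Dirichlet modes

M = np.abs(rng.standard_normal((L, p)))           # endmember signatures (L x p)
weights = np.array([0.6, 0.4])                    # mixing weights of the Dirichlet modes
concentrations = np.array([[8.0, 2.0, 1.0, 1.0],  # one Dirichlet parameter vector per mode
                           [1.0, 1.0, 5.0, 5.0]])

modes = rng.choice(K, size=N, p=weights)
abundances = np.vstack([rng.dirichlet(concentrations[k]) for k in modes]).T   # (p, N)

noise = 0.01 * rng.standard_normal((L, N))
X = M @ abundances + noise                        # observed hyperspectral pixels (L x N)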
Abstract:
Linear unmixing decomposes a hyperspectral image into a collection of reflectance spectra, called endmember signatures, and a set of corresponding abundance fractions for the respective spatial coverage. This paper introduces vertex component analysis (VCA), an unsupervised algorithm to unmix linear mixtures of hyperspectral data. VCA exploits the fact that endmembers occupy the vertices of a simplex, and assumes the presence of pure pixels in the data. VCA performance is illustrated using simulated and real data. VCA competes with state-of-the-art methods at much lower computational complexity.
Abstract:
Thesis presented as a partial requirement for obtaining the degree of Doctor in Statistics and Information Management from the Instituto Superior de Estatística e Gestão de Informação of the Universidade Nova de Lisboa
Abstract:
Coupled carbon/climate models are predicting changes in Amazon carbon and water cycles for the near future, with the conversion of forest into savanna-like vegetation. However, empirical data to support these models are still scarce for the Amazon. Facing this scenario, we investigated whether conservation status and changes in rainfall regime have influenced the forest-savanna mosaic over 20 years, from 1986 to 2006, in a transitional area in Northern Amazonia. By applying a spectral linear mixture model to a Landsat-5 TM time series, we identified protected savanna enclaves within a strictly protected nature reserve (Maracá Ecological Station - MES) and non-protected forest islands at its outskirts, and compared their areas among 1986, 1994 and 2006. The protected savanna enclaves decreased by 26% over the 20-year period, at an average rate of 0.131 ha year⁻¹, with a greater reduction rate observed during times of higher precipitation, whereas the non-protected forest islands remained stable throughout the study period, balancing the encroachment of forest into savanna during humid periods and savannization during reduced-rainfall periods. Thus, under continued favorable climate conditions, the MES conservation status would continue to favor forest encroachment upon savanna, while the non-protected outskirt areas would remain resilient to disturbance regimes. However, if the increases in the frequency of dry periods predicted by climate models for this region are confirmed, future changes in the extent and direction of forest limits will be affected, disrupting ecological services such as carbon storage and the maintenance of local biodiversity.
Abstract:
Interpretability and power of genome-wide association studies can be increased by imputing unobserved genotypes, using a reference panel of individuals genotyped at higher marker density. For many markers, genotypes cannot be imputed with complete certainty, and the uncertainty needs to be taken into account when testing for association with a given phenotype. In this paper, we compare currently available methods for testing association between uncertain genotypes and quantitative traits. We show that some previously described methods offer poor control of the false-positive rate (FPR), and that satisfactory performance of these methods is obtained only by using ad hoc filtering rules or by using a harsh transformation of the trait under study. We propose new methods that are based on exact maximum likelihood estimation and use a mixture model to accommodate nonnormal trait distributions when necessary. The new methods adequately control the FPR and also have equal or better power compared to all previously described methods. We provide a fast software implementation of all the methods studied here; our new method requires computation time of less than one computer-day for a typical genome-wide scan, with 2.5 M single nucleotide polymorphisms and 5000 individuals.
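To illustrate the kind of likelihood involved (a simplified sketch under a normal trait model; the paper's exact estimator, its nonnormal mixture extension, and its software are not reproduced here), the trait can be modelled as a mixture over the three possible genotypes, weighted by their imputation probabilities:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, y, geno_probs):
    # params: intercept, additive genotype effect, log standard deviation
    # y: (n,) trait values; geno_probs: (n, 3) imputation probabilities for g = 0, 1, 2
    a, b, log_sd = params
    sd = np.exp(log_sd)
    g = np.arange(3)
    # Mixture over the unobserved genotype of each individual.
    dens = geno_probs * norm.pdf(y[:, None], loc=a + b * g, scale=sd)
    return -np.sum(np.log(dens.sum(axis=1)))

# Toy data: 500 individuals with uncertain genotypes and no true effect (b = 0).
rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(3), size=500)
y = rng.normal(size=500)
fit = minimize(neg_loglik, x0=[0.0, 0.0, 0.0], args=(y, probs), method="Nelder-Mead")
print(fit.x)   # maximum-likelihood estimates of intercept, effect, log sd

A likelihood-ratio or Wald test on the effect parameter then gives an association test that accounts for the genotype uncertainty, rather than plugging in a single best-guess genotype.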
Abstract:
In this work we present a method for the image analysis of Magnetic Resonance Imaging (MRI) of fetuses. Our goal is to segment the brain surface from multiple volumes (axial, coronal and sagittal acquisitions) of a fetus. To this end we propose a two-step approach: first, a Finite Gaussian Mixture Model (FGMM) will segment the image into 3 classes: brain, non-brain and mixture voxels. Second, a Markov Random Field scheme will be applied to re-distribute mixture voxels into either brain or non-brain tissue. Our main contributions are an adapted energy computation and an extended neighborhood from multiple volumes in the MRF step. Preliminary results on four fetuses of different gestational ages will be shown.
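A rough sketch of the first step only (using scikit-learn's generic Gaussian mixture on voxel intensities; the paper's adapted energy terms, multi-volume neighborhood, and MRF relabelling step are not shown, and the toy volume stands in for a real acquisition):

import numpy as np
from sklearn.mixture import GaussianMixture

def fgmm_segment(volume, n_classes=3, seed=0):
    # volume: 3-D array of MRI intensities; returns per-voxel class labels
    # (e.g. brain, non-brain, and "mixture" voxels to be relabelled by an MRF).
    intensities = volume.reshape(-1, 1).astype(float)
    gmm = GaussianMixture(n_components=n_classes, covariance_type="full", random_state=seed)
    labels = gmm.fit_predict(intensities)
    return labels.reshape(volume.shape), gmm

# Toy volume standing in for one of the axial/coronal/sagittal acquisitions:
toy = np.random.default_rng(0).normal(size=(32, 32, 16))
labels, model = fgmm_segment(toy)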