939 resultados para Transform statistics
Resumo:
With the widespread proliferation of computers, many human activities entail the use of automatic image analysis. The basic features used for image analysis include color, texture, and shape. In this paper, we propose a new shape description method, called Hough Transform Statistics (HTS), which uses statistics from the Hough space to characterize the shape of objects or regions in digital images. A modified version of this method, called Hough Transform Statistics neighborhood (HTSn), is also presented. Experiments carried out on three popular public image databases showed that the HTS and HTSn descriptors are robust, since they presented precision-recall results much better than several other well-known shape description methods. When compared to Beam Angle Statistics (BAS) method, a shape description method that inspired their development, both the HTS and the HTSn methods presented inferior results regarding the precision-recall criterion, but superior results in the processing time and multiscale separability criteria. The linear complexity of the HTS and the HTSn algorithms, in contrast to BAS, make them more appropriate for shape analysis in high-resolution image retrieval tasks when very large databases are used, which are very common nowadays. (C) 2014 Elsevier Inc. All rights reserved.
Resumo:
In this paper we propose a novel method for shape analysis called HTS (Hough Transform Statistics), which uses statistics from Hough Transform space in order to characterize the shape of objects in digital images. Experimental results showed that the HTS descriptor is robust and presents better accuracy than some traditional shape description methods. Furthermore, HTS algorithm has linear complexity, which is an important requirement for content based image retrieval from large databases. © 2013 IEEE.
Resumo:
Pós-graduação em Ciência da Computação - IBILCE
Resumo:
Diversas atividades nos dias atuais podem ser beneficiadas pela da análise automatizada de imagens como o reconhecimento biométrico de pessoas, a busca de imagens por conteúdo e o diagnóstico médico. Dentre as principais características que podem ser analisadas em uma imagem a fim de obter informações sobre seu conteúdo encontra-se a forma de objetos e regiões da mesma. Neste trabalho propõe-se um novo descritor de formas denominado HTS (Hough Transform Statistics) o qual se baseia no espaço de Hough para representar e reconhecer objetos em imagens por suas formas. Os resultados obtidos sobre algumas bases de imagens públicas mostram que o HTS, além de apresentar altas taxas de acerto, é muito rápido. Discute-se também uma adaptação na etapa de extração de características do descritor a qual fez com que os resultados melhorassem bastante sem deixar o método muito mais lento.
Resumo:
For certain observing types, such as those that are remotely sensed, the observation errors are correlated and these correlations are state- and time-dependent. In this work, we develop a method for diagnosing and incorporating spatially correlated and time-dependent observation error in an ensemble data assimilation system. The method combines an ensemble transform Kalman filter with a method that uses statistical averages of background and analysis innovations to provide an estimate of the observation error covariance matrix. To evaluate the performance of the method, we perform identical twin experiments using the Lorenz ’96 and Kuramoto-Sivashinsky models. Using our approach, a good approximation to the true observation error covariance can be recovered in cases where the initial estimate of the error covariance is incorrect. Spatial observation error covariances where the length scale of the true covariance changes slowly in time can also be captured. We find that using the estimated correlated observation error in the assimilation improves the analysis.
Resumo:
An important topic in genomic sequence analysis is the identification of protein coding regions. In this context, several coding DNA model-independent methods based on the occurrence of specific patterns of nucleotides at coding regions have been proposed. Nonetheless, these methods have not been completely suitable due to their dependence on an empirically predefined window length required for a local analysis of a DNA region. We introduce a method based on a modified Gabor-wavelet transform (MGWT) for the identification of protein coding regions. This novel transform is tuned to analyze periodic signal components and presents the advantage of being independent of the window length. We compared the performance of the MGWT with other methods by using eukaryote data sets. The results show that MGWT outperforms all assessed model-independent methods with respect to identification accuracy. These results indicate that the source of at least part of the identification errors produced by the previous methods is the fixed working scale. The new method not only avoids this source of errors but also makes a tool available for detailed exploration of the nucleotide occurrence.
Resumo:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Centralnotations of statistics, such as Information or Likelihood, can be identified in the algebraical structure of A2(P) and their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform.In this way very elaborated aspects of mathematical statistics can be understoodeasily in the light of a simple vector space structure and of compositional data analysis. E.g. combination of statistical information such as Bayesian updating,combination of likelihood and robust M-estimation functions are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weightedaddition of the corresponding evidence.Likelihood based statistics for general exponential families turns out to have aparticularly easy interpretation in terms of A2(P). Regular exponential families formfinite dimensional linear subspaces of A2(P) and they correspond to finite dimensionalsubspaces formed by their posterior in the dual information space A2(Pprior).The Aitchison norm can identified with mean Fisher information. The closing constant itself is identified with a generalization of the cummulant function and shown to be Kullback Leiblers directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback Leibler information and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P.The discussion of A2(P) valued random variables, such as estimation functionsor likelihoods, give a further interpretation of Fisher information as the expected squared norm of evidence and a scale free understanding of unbiased reasoning
Resumo:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notations of statistics, such as Information or Likelihood, can be identified in the algebraical structure of A2(P) and their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform. In this way very elaborated aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. E.g. combination of statistical information such as Bayesian updating, combination of likelihood and robust M-estimation functions are simple additions/ perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood based statistics for general exponential families turns out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite dimensional linear subspaces of A2(P) and they correspond to finite dimensional subspaces formed by their posterior in the dual information space A2(Pprior). The Aitchison norm can identified with mean Fisher information. The closing constant itself is identified with a generalization of the cummulant function and shown to be Kullback Leiblers directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback Leibler information and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P) valued random variables, such as estimation functions or likelihoods, give a further interpretation of Fisher information as the expected squared norm of evidence and a scale free understanding of unbiased reasoning
Resumo:
A 24-member ensemble of 1-h high-resolution forecasts over the Southern United Kingdom is used to study short-range forecast error statistics. The initial conditions are found from perturbations from an ensemble transform Kalman filter. Forecasts from this system are assumed to lie within the bounds of forecast error of an operational forecast system. Although noisy, this system is capable of producing physically reasonable statistics which are analysed and compared to statistics implied from a variational assimilation system. The variances for temperature errors for instance show structures that reflect convective activity. Some variables, notably potential temperature and specific humidity perturbations, have autocorrelation functions that deviate from 3-D isotropy at the convective-scale (horizontal scales less than 10 km). Other variables, notably the velocity potential for horizontal divergence perturbations, maintain 3-D isotropy at all scales. Geostrophic and hydrostatic balances are studied by examining correlations between terms in the divergence and vertical momentum equations respectively. Both balances are found to decay as the horizontal scale decreases. It is estimated that geostrophic balance becomes less important at scales smaller than 75 km, and hydrostatic balance becomes less important at scales smaller than 35 km, although more work is required to validate these findings. The implications of these results for high-resolution data assimilation are discussed.
Resumo:
Vortex-induced motion (VIM) is a highly nonlinear dynamic phenomenon. Usual spectral analysis methods, using the Fourier transform, rely on the hypotheses of linear and stationary dynamics. A method to treat nonstationary signals that emerge from nonlinear systems is denoted Hilbert-Huang transform (HHT) method. The development of an analysis methodology to study the VIM of a monocolumn production, storage, and offloading system using HHT is presented. The purposes of the present methodology are to improve the statistics analysis of VIM. The results showed to be comparable to results obtained from a traditional analysis (mean of the 10% highest peaks) particularly for the motions in the transverse direction, although the difference between the results from the traditional analysis for the motions in the in-line direction showed a difference of around 25%. The results from the HHT analysis are more reliable than the traditional ones, owing to the larger number of points to calculate the statistics characteristics. These results may be used to design risers and mooring lines, as well as to obtain VIM parameters to calibrate numerical predictions. [DOI: 10.1115/1.4003493]
Resumo:
The Fourier transform-infrared (FT-IR) signature of dry samples of DNA and DNA-polypeptide complexes, as studied by IR microspectroscopy using a diamond attenuated total reflection (ATR) objective, has revealed important discriminatory characteristics relative to the PO2(-) vibrational stretchings. However, DNA IR marks that provide information on the sample's richness in hydrogen bonds have not been resolved in the spectral profiles obtained with this objective. Here we investigated the performance of an all reflecting objective (ARO) for analysis of the FT-IR signal of hydrogen bonds in DNA samples differing in base richness types (salmon testis vs calf thymus). The results obtained using the ARO indicate prominent band peaks at the spectral region representative of the vibration of nitrogenous base hydrogen bonds and of NH and NH2 groups. The band areas at this spectral region differ in agreement with the DNA base richness type when using the ARO. A peak assigned to adenine was more evident in the AT-rich salmon DNA using either the ARO or the ATR objective. It is concluded that, for the discrimination of DNA IR hydrogen bond vibrations associated with varying base type proportions, the use of an ARO is recommended.
Resumo:
Plasma edge turbulence in Tokamak Chauffage Alfven Bresilien (TCABR) [R. M. O. Galvao et al., Plasma Phys. Contr. Fusion 43, 1181 (2001)] is investigated for multifractal properties of the fluctuating floating electrostatic potential measured by Langmuir probes. The multifractality in this signal is characterized by the full multifractal spectra determined by applying the wavelet transform modulus maxima. In this work, the dependence of the multifractal spectrum with the radial position is presented. The multifractality degree inside the plasma increases with the radial position reaching a maximum near the plasma edge and becoming almost constant in the scrape-off layer. Comparisons between these results with those obtained for random test time series with the same Hurst exponents and data length statistically confirm the reported multifractal behavior. Moreover, the persistence of these signals, characterized by their Hurst exponent, present radial profile similar to the deterministic component estimated from analysis based on dynamical recurrences. (C) 2008 American Institute of Physics.
Resumo:
Background: Genome wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical challenges. Imputation algorithms combine directly genotyped markers information with haplotypic structure for the population of interest for the inference of a badly genotyped or missing marker and are considered a near zero cost approach to allow the comparison and combination of data generated in different studies. Several reports stated that imputed markers have an overall acceptable accuracy but no published report has performed a pair wise comparison of imputed and empiric association statistics of a complete set of GWAS markers. Results: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10(-5) for type 2 Diabetes Mellitus and compared them with results obtained based on empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers showing specific MAF (Minor Allele Frequencies) range, located in weak linkage disequilibrium blocks or strongly deviating from local patterns of association are prone to have inflated false positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
Resumo:
The existence of juxtaposed regions of distinct cultures in spite of the fact that people's beliefs have a tendency to become more similar to each other's as the individuals interact repeatedly is a puzzling phenomenon in the social sciences. Here we study an extreme version of the frequency-dependent bias model of social influence in which an individual adopts the opinion shared by the majority of the members of its extended neighborhood, which includes the individual itself. This is a variant of the majority-vote model in which the individual retains its opinion in case there is a tie among the neighbors' opinions. We assume that the individuals are fixed in the sites of a square lattice of linear size L and that they interact with their nearest neighbors only. Within a mean-field framework, we derive the equations of motion for the density of individuals adopting a particular opinion in the single-site and pair approximations. Although the single-site approximation predicts a single opinion domain that takes over the entire lattice, the pair approximation yields a qualitatively correct picture with the coexistence of different opinion domains and a strong dependence on the initial conditions. Extensive Monte Carlo simulations indicate the existence of a rich distribution of opinion domains or clusters, the number of which grows with L(2) whereas the size of the largest cluster grows with ln L(2). The analysis of the sizes of the opinion domains shows that they obey a power-law distribution for not too large sizes but that they are exponentially distributed in the limit of very large clusters. In addition, similarly to other well-known social influence model-Axelrod's model-we found that these opinion domains are unstable to the effect of a thermal-like noise.
Resumo:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.