67 results for Classification, Markov chain Monte Carlo, k-nearest neighbours
Abstract:
The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p^2, 2pq and q^2 respectively, where p is the allele frequency of A, and q = 1 - p. Many statistical tests are used to check whether empirical marker data obey the Hardy-Weinberg principle. Among these are the classical chi-square test (with or without continuity correction), the likelihood ratio test, Fisher's exact test, and exact tests in combination with Monte Carlo and Markov chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is, however, ample space for the use of graphics in HWE tests, in particular for the ternary plot. Nowadays, many genetic studies use genetic markers known as Single Nucleotide Polymorphisms (SNPs). SNP data come in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically "close" to the parabola, whereas compositions that differ significantly from HWE are "far". By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in the ternary plot. This way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008). This leads to graphical representations in which large numbers of SNPs can be tested for HWE in a single graph. Several examples of graphical tests for HWE (implemented in R software) will be shown, using SNP data from different human populations.
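As a minimal sketch of the numerical side of these tests, the following Python computes the classical chi-square statistic without continuity correction from genotype counts, and the genotype frequencies that lie on the HWE parabola. The function names `hwe_chi_square` and `hwe_parabola` are illustrative assumptions and are not taken from the R implementation mentioned in the abstract.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (no continuity correction) for HWE.

    Assumes both alleles are present, so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency of A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def hwe_parabola(p):
    """Genotype frequencies (AA, AB, BB) in exact HWE for allele frequency p."""
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)


if __name__ == "__main__":
    print(hwe_chi_square(30, 40, 30))  # 4.0
    print(hwe_parabola(0.5))           # (0.25, 0.5, 0.25)
```

For the counts (30, 40, 30) the statistic is 4.0, which exceeds the 5% critical value of about 3.84 for one degree of freedom. Points returned by `hwe_parabola` satisfy f_AB^2 = 4 f_AA f_BB, which is the parabola described above.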
Abstract:
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual words representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
Abstract:
The R package compositions is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, now containing some facilities to work with and represent ilr bases built from balances, and an elaborate subsystem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a regular subcomposition (where all parts are actually observed and the datum behaves typically) and a problematic subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros), focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occurrence and Monte Carlo-based tests for their characterization. Furthermore, the package provides special plots that help to understand the nature of outliers in the dataset.
Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
Abstract:
A statistical method for classifying sags by their origin, downstream or upstream from the recording point, is proposed in this work. The goal is to obtain, from the sag waveforms, a statistical model that characterises one type of sag and discriminates it from the other type. This model is built on the basis of multi-way principal component analysis and later used to project the available registers into a new space of lower dimension. Thus, a case base of diagnosed sags is built in the projection space. Classification is then done by comparing new sags against the existing ones in the case base. Similarity is defined in the projection space using a combination of distances to retrieve the nearest neighbours of the new sag. Finally, the method assigns the origin of the new sag according to the origin of its neighbours.
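The final step, assigning an origin from the nearest neighbours in the projection space, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify its combination of distances, so plain Euclidean distance and a majority vote are swapped in here, and the name `classify_sag`, the value of `k`, and the toy case base are hypothetical.

```python
import math
from collections import Counter


def classify_sag(case_base, new_point, k=3):
    """Assign an origin to a projected sag by majority vote of its k nearest cases.

    case_base: list of (projected_vector, origin_label) pairs.
    Distance: Euclidean (an assumption; the source combines several distances).
    """
    ranked = sorted(case_base, key=lambda case: math.dist(case[0], new_point))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy case base of already projected, diagnosed sags (made-up coordinates).
    cases = [
        ((0.0, 0.1), "upstream"),
        ((0.2, -0.1), "upstream"),
        ((-0.1, 0.0), "upstream"),
        ((5.0, 5.1), "downstream"),
        ((4.8, 5.3), "downstream"),
        ((5.2, 4.9), "downstream"),
    ]
    print(classify_sag(cases, (0.2, 0.1)))
```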
Abstract:
In this paper we propose a Pyramidal Classification Algorithm which, together with an appropriate aggregation index, produces an indexed pseudo-hierarchy (in the strict sense) without inversions or crossings. The computer implementation of the algorithm makes it possible to carry out simulation tests by Monte Carlo methods in order to study the efficiency and sensitivity of the pyramidal methods of the Maximum, Minimum and UPGMA. The results shown in this paper may help to choose between the three classification methods proposed, in order to obtain the classification that best fits the original structure of the population, provided we have a priori information concerning this structure.
Abstract:
The magnetic structure of the edge-sharing cuprate compound Li2CuO2 has been investigated with highly correlated ab initio electronic structure calculations. The first- and second-neighbor in-chain magnetic interactions are calculated to be 142 and -22 K, respectively. The ratio between the two parameters is smaller than suggested previously in the literature. The interchain interactions are antiferromagnetic in nature and of the order of a few K only. Monte Carlo simulations using the ab initio parameters to define the spin model Hamiltonian result in a Néel temperature in good agreement with experiment. Spin population analysis situates the magnetic moment on the copper and oxygen ions between the completely localized picture derived from experiment and the more delocalized picture based on local-density calculations.
Abstract:
This paper presents a novel image classification scheme for benthic coral reef images that can be applied to both single image and composite mosaic datasets. The proposed method can be configured to the characteristics (e.g., the size of the dataset, number of classes, resolution of the samples, color information availability, class types, etc.) of individual datasets. The proposed method uses completed local binary pattern (CLBP), grey level co-occurrence matrix (GLCM), Gabor filter response, and opponent angle and hue channel color histograms as feature descriptors. For classification, either k-nearest neighbor (KNN), neural network (NN), support vector machine (SVM) or probability density weighted mean distance (PDWMD) is used. The combination of features and classifiers that attains the best results is presented together with the guidelines for selection. The accuracy and efficiency of our proposed method are compared with other state-of-the-art techniques using three benthic and three texture datasets. The proposed method achieves the highest overall classification accuracy of any of the tested methods and has moderate execution time. Finally, the proposed classification scheme is applied to a large-scale image mosaic of the Red Sea to create a completely classified thematic map of the reef benthos
Abstract:
A problem in the archaeometric classification of Catalan Renaissance pottery is that the clay supply of the pottery workshops was centrally organized by guilds, and therefore all potters of a single production centre usually produced chemically similar ceramics. However, analysis of the glazes of the ware usually reveals a large number of inclusions in the glaze, which show technological differences between single workshops. These inclusions were used by the potters to opacify the transparent glaze and to achieve a white background for further decoration. In order to distinguish the different technological preparation procedures of the single workshops, the chemical composition of those inclusions, as well as their size in the two-dimensional cut, is recorded with a Scanning Electron Microscope. Based on the latter, a frequency distribution of the apparent diameters is estimated for each sample and type of inclusion. Following an approach by S.D. Wicksell (1925), it is possible in principle to transform the distributions of the apparent 2D diameters back to those of the true three-dimensional bodies. The applicability of this approach and its practical problems are examined using different ways of kernel density estimation and Monte Carlo tests of the methodology. Finally, we test to what extent the obtained frequency distributions can be used to classify the pottery.
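To illustrate why apparent 2D diameters understate the true 3D sizes, here is a small Monte Carlo sketch of the forward problem that Wicksell's approach inverts. It rests on simplifying assumptions that are not from the abstract: all inclusions are spheres of one common diameter, and the cutting plane is uniformly positioned among the planes that hit a given sphere. The function name `apparent_diameters` is hypothetical.

```python
import math
import random


def apparent_diameters(true_diameter, n_cuts, rng):
    """Simulate diameters of the circles seen when random planes cut a sphere.

    The distance h from the sphere centre to the plane is drawn uniformly
    on [0, radius]; the visible circle then has diameter 2*sqrt(r^2 - h^2).
    """
    radius = true_diameter / 2.0
    cuts = []
    for _ in range(n_cuts):
        h = rng.uniform(0.0, radius)
        cuts.append(2.0 * math.sqrt(radius * radius - h * h))
    return cuts


if __name__ == "__main__":
    rng = random.Random(0)
    cuts = apparent_diameters(10.0, 100_000, rng)
    mean_apparent = sum(cuts) / len(cuts)
    # Theory for a single sphere size: mean apparent diameter = (pi / 4) * D.
    print(mean_apparent, 10.0 * math.pi / 4.0)
```

Under these assumptions the simulated mean apparent diameter approaches pi/4 times the true diameter, so no observed cut ever exceeds the true size and most fall below it.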
Abstract:
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
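The equivalence stated in the last sentence can be checked on a toy problem. The sketch below assumes binary labels in {-1, +1}, 0-1 loss, an even number of training points, and a small finite class of one-dimensional threshold classifiers. The class, the data, and the function names are hypothetical choices for illustration. Flipping the labels of the first half turns the maximal discrepancy into 1 minus twice the minimal empirical error on the modified sample.

```python
def stump(threshold, sign):
    """Hypothetical 1-D classifier: predicts `sign` if x > threshold, else -sign."""
    return lambda x: sign if x > threshold else -sign


def error(f, xs, ys):
    """Empirical 0-1 error of classifier f on the sample (xs, ys)."""
    return sum(f(x) != y for x, y in zip(xs, ys)) / len(xs)


def maximal_discrepancy(classifiers, xs, ys):
    """Max over the class of (error on first half) - (error on second half)."""
    half = len(xs) // 2
    return max(
        error(f, xs[:half], ys[:half]) - error(f, xs[half:], ys[half:])
        for f in classifiers
    )


def maximal_discrepancy_via_erm(classifiers, xs, ys):
    """Same quantity via empirical risk minimization with first-half labels flipped."""
    half = len(xs) // 2
    flipped = [-y for y in ys[:half]] + list(ys[half:])
    return 1.0 - 2.0 * min(error(f, xs, flipped) for f in classifiers)


if __name__ == "__main__":
    xs = [0.1, 0.4, 0.6, 0.9, 0.2, 0.3, 0.7, 0.8]
    ys = [-1, 1, -1, 1, -1, -1, 1, 1]
    stumps = [stump(t, s) for t in (0.0, 0.25, 0.5, 0.75, 1.0) for s in (-1, 1)]
    print(maximal_discrepancy(stumps, xs, ys))
    print(maximal_discrepancy_via_erm(stumps, xs, ys))
```

Both functions return the same value for any sample of even size, because a misclassification under a flipped label is exactly a correct classification under the original one.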
Abstract:
Domain growth in a system with nonconserved order parameter is studied. We simulate the usual Ising model for binary alloys with concentration 0.5 on a two-dimensional square lattice by Monte Carlo techniques. Measurements of the energy, jump-acceptance ratio, and order parameters are performed. Dynamics based on the diffusion of a single vacancy in the system gives a growth law faster than the usual Allen-Cahn law. Allowing vacancy jumps to next-nearest-neighbor sites is essential to prevent vacancy trapping in the ordered regions. By measuring local order parameters we show that the vacancy prefers to be in the disordered regions (domain boundaries). This naturally concentrates the atomic jumps in the domain boundaries, accelerating the growth compared with the usual exchange mechanism that causes jumps to be homogeneously distributed on the lattice.
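A single-vacancy Monte Carlo move of the kind described can be sketched as below. Everything specific here is an assumption for illustration and not the authors' code: the energy is taken as J times the sum of nearest-neighbour spin products with the vacancy represented by 0, acceptance follows the Metropolis rule with k_B = 1, and the names `site_energy` and `vacancy_step` are hypothetical.

```python
import math
import random

NN = [(1, 0), (-1, 0), (0, 1), (0, -1)]     # nearest-neighbour offsets
NNN = [(1, 1), (1, -1), (-1, 1), (-1, -1)]  # next-nearest-neighbour offsets


def site_energy(lattice, i, j, J=1.0):
    """Energy of the four nearest-neighbour bonds of site (i, j), periodic borders."""
    L = len(lattice)
    s = lattice[i][j]
    return J * s * sum(lattice[(i + di) % L][(j + dj) % L] for di, dj in NN)


def vacancy_step(lattice, vac, T, rng, J=1.0, allow_nnn=True):
    """Try to swap the vacancy with a randomly chosen neighbouring atom.

    Returns the (possibly unchanged) vacancy position and whether the jump
    was accepted, so that a jump-acceptance ratio can be accumulated.
    """
    L = len(lattice)
    di, dj = rng.choice(NN + NNN if allow_nnn else NN)
    i, j = vac
    k, l = (i + di) % L, (j + dj) % L
    before = site_energy(lattice, k, l, J)  # vacancy bonds contribute zero
    lattice[i][j], lattice[k][l] = lattice[k][l], 0
    delta = site_energy(lattice, i, j, J) - before
    if delta <= 0 or rng.random() < math.exp(-delta / T):
        return (k, l), True
    lattice[k][l], lattice[i][j] = lattice[i][j], 0  # rejected: undo the swap
    return vac, False


if __name__ == "__main__":
    rng = random.Random(1)
    L = 8
    lattice = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    lattice[0][0] = 0
    vac, accepted = (0, 0), 0
    for _ in range(10_000):
        vac, ok = vacancy_step(lattice, vac, 1.0, rng)
        accepted += ok
    print("jump-acceptance ratio:", accepted / 10_000)
```

Setting `allow_nnn=False` restricts the vacancy to nearest-neighbour jumps, which is the situation in which the abstract reports vacancy trapping in ordered regions.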
Abstract:
The binding energies of two-dimensional clusters (puddles) of 4He are calculated in the framework of the diffusion Monte Carlo method. The results are well fitted by a mass formula in powers of x = N^(-1/2), where N is the number of particles. The analysis of the mass formula allows for the extraction of the line tension, which turns out to be 0.121 K/Å. Sizes and density profiles of the puddles are also reported.
Abstract:
The energy and structure of dilute hard- and soft-sphere Bose gases are systematically studied in the framework of several many-body approaches, such as the variational correlated theory, the Bogoliubov model, and the uniform limit approximation, valid in the weak-interaction regime. When possible, the results are compared with the exact diffusion Monte Carlo ones. Jastrow-type correlation provides a good description of the systems, both hard- and soft-spheres, if the hypernetted chain energy functional is freely minimized and the resulting Euler equation is solved. The study of the soft-sphere potentials confirms the appearance of a dependence of the energy on the shape of the potential at gas parameter values of x~0.001. For quantities other than the energy, such as the radial distribution functions and the momentum distributions, the dependence appears at any value of x. The occurrence of a maximum in the radial distribution function, in the momentum distribution, and in the excitation spectrum is a natural effect of the correlations when x increases. The asymptotic behaviors of the functions characterizing the structure of the systems are also investigated. The uniform limit approach is very easy to implement and provides a good description of the soft-sphere gas. Its reliability improves when the interaction weakens.
Abstract:
We study the (K-, p) reaction on nuclei with a 1 GeV/c momentum kaon beam, paying special attention to the region of emitted protons having kinetic energy above 600 MeV, which was used to claim a deeply attractive kaon nucleus optical potential. Our model describes the nuclear reaction in the framework of a local density approach and the calculations are performed following two different procedures: one is based on a many-body method using the Lindhard function and the other is based on a Monte Carlo simulation. The simulation method offers flexibility to account for processes other than kaon quasielastic scattering, such as K- absorption by one and two nucleons, producing hyperons, and allows consideration of final-state interactions of the K-, the p, and all other primary and secondary particles on their way out of the nucleus, as well as the weak decay of the produced hyperons into pi N. We find a limited sensitivity of the cross section to the strength of the kaon optical potential. We also show a serious drawback in the experimental setup (the requirement for having, together with the energetic proton, at least one charged particle detected in the decay counter surrounding the target), as we find that the shape of the original cross section is appreciably distorted, to the point of invalidating the claims made in the experimental paper on the strength of the kaon nucleus optical potential.
Abstract:
In this work we compare the results of the Gross-Pitaevskii and modified Gross-Pitaevskii equations with ab initio variational Monte Carlo calculations for Bose-Einstein condensates of atoms in axially symmetric traps. We examine both the ground state and excited states having a vortex line along the z axis at high values of the gas parameter and demonstrate an excellent agreement between the modified Gross-Pitaevskii and ab initio Monte Carlo methods, both for the ground and vortex states.