62 resultados para Exponential Sum
Resumo:
The Hardy-Weinberg law, formulated about 100 years ago, states that under certainassumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur inthe proportions p2, 2pq, and q2 respectively, where p is the allele frequency of A, and q = 1-p.There are many statistical tests being used to check whether empirical marker data obeys theHardy-Weinberg principle. Among these are the classical xi-square test (with or withoutcontinuity correction), the likelihood ratio test, Fisher's Exact test, and exact tests in combinationwith Monte Carlo and Markov Chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE)are numerical in nature, requiring the computation of a test statistic and a p-value.There is however, ample space for the use of graphics in HWE tests, in particular for the ternaryplot. Nowadays, many genetical studies are using genetical markers known as SingleNucleotide Polymorphisms (SNPs). SNP data comes in the form of counts, but from the countsone typically computes genotype frequencies and allele frequencies. These frequencies satisfythe unit-sum constraint, and their analysis therefore falls within the realm of compositional dataanalysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotypefrequencies can be adequately represented in a ternary plot. Compositions that are in exactHWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected ina statistical test are typically “close" to the parabola, whereas compositions that differsignificantly from HWE are “far". By rewriting the statistics used to test for HWE in terms ofheterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted inthe ternary plot. This way, compositions can be tested for HWE purely on the basis of theirposition in the ternary plot (Graffelman & Morales, 2008). This leads to nice graphicalrepresentations where large numbers of SNPs can be tested for HWE in a single graph. Severalexamples of graphical tests for HWE (implemented in R software), will be shown, using SNPdata from different human populations
Resumo:
Planners in public and private institutions would like coherent forecasts of the components of age-specic mortality, such as causes of death. This has been di cult toachieve because the relative values of the forecast components often fail to behave ina way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It hasbeen shown that cause-specic mortality forecasts are pessimistic when compared withall-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approachof using log mortality rates and forecasts the density of deaths in the life table. Sincethese values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbingstate), they are intrinsically relative rather than absolute values across decrements aswell as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison(1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that theunit sum constraint is honoured. The structure of the best-known, single-decrementmortality-rate forecasting model, devised by Lee and Carter (1992), is expressed incompositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortalityby cause of death for Japan
Resumo:
Self-organizing maps (Kohonen 1997) is a type of artificial neural network developedto explore patterns in high-dimensional multivariate data. The conventional versionof the algorithm involves the use of Euclidean metric in the process of adaptation ofthe model vectors, thus rendering in theory a whole methodology incompatible withnon-Euclidean geometries.In this contribution we explore the two main aspects of the problem:1. Whether the conventional approach using Euclidean metric can shed valid resultswith compositional data.2. If a modification of the conventional approach replacing vectorial sum and scalarmultiplication by the canonical operators in the simplex (i.e. perturbation andpowering) can converge to an adequate solution.Preliminary tests showed that both methodologies can be used on compositional data.However, the modified version of the algorithm performs poorer than the conventionalversion, in particular, when the data is pathological. Moreover, the conventional ap-proach converges faster to a solution, when data is \well-behaved".Key words: Self Organizing Map; Artificial Neural networks; Compositional data
Resumo:
In most psychological tests and questionnaires, a test score is obtained bytaking the sum of the item scores. In virtually all cases where the test orquestionnaire contains multidimensional forced-choice items, this traditionalscoring method is also applied. We argue that the summation of scores obtained with multidimensional forced-choice items produces uninterpretabletest scores. Therefore, we propose three alternative scoring methods: a weakand a strict rank preserving scoring method, which both allow an ordinalinterpretation of test scores; and a ratio preserving scoring method, whichallows a proportional interpretation of test scores. Each proposed scoringmethod yields an index for each respondent indicating the degree to whichthe response pattern is inconsistent. Analysis of real data showed that withrespect to rank preservation, the weak and strict rank preserving methodresulted in lower inconsistency indices than the traditional scoring method;with respect to ratio preservation, the ratio preserving scoring method resulted in lower inconsistency indices than the traditional scoring method
Resumo:
The preceding two editions of CoDaWork included talks on the possible considerationof densities as infinite compositions: Egozcue and D´ıaz-Barrero (2003) extended theEuclidean structure of the simplex to a Hilbert space structure of the set of densitieswithin a bounded interval, and van den Boogaart (2005) generalized this to the setof densities bounded by an arbitrary reference density. From the many variations ofthe Hilbert structures available, we work with three cases. For bounded variables, abasis derived from Legendre polynomials is used. For variables with a lower bound, westandardize them with respect to an exponential distribution and express their densitiesas coordinates in a basis derived from Laguerre polynomials. Finally, for unboundedvariables, a normal distribution is used as reference, and coordinates are obtained withrespect to a Hermite-polynomials-based basis.To get the coordinates, several approaches can be considered. A numerical accuracyproblem occurs if one estimates the coordinates directly by using discretized scalarproducts. Thus we propose to use a weighted linear regression approach, where all k-order polynomials are used as predictand variables and weights are proportional to thereference density. Finally, for the case of 2-order Hermite polinomials (normal reference)and 1-order Laguerre polinomials (exponential), one can also derive the coordinatesfrom their relationships to the classical mean and variance.Apart of these theoretical issues, this contribution focuses on the application of thistheory to two main problems in sedimentary geology: the comparison of several grainsize distributions, and the comparison among different rocks of the empirical distribution of a property measured on a batch of individual grains from the same rock orsediment, like their composition
Resumo:
We shall call an n × p data matrix fully-compositional if the rows sum to a constant, and sub-compositional if the variables are a subset of a fully-compositional data set1. Such data occur widely in archaeometry, where it is common to determine the chemical composition of ceramic, glass, metal or other artefacts using techniques such as neutron activation analysis (NAA), inductively coupled plasma spectroscopy (ICPS), X-ray fluorescence analysis (XRF) etc. Interest often centres on whether there are distinct chemical groups within the data and whether, for example, these can be associated with different origins or manufacturing technologies
Resumo:
It is well known that image processing requires a huge amount of computation, mainly at low level processing where the algorithms are dealing with a great number of data-pixel. One of the solutions to estimate motions involves detection of the correspondences between two images. For normalised correlation criteria, previous experiments shown that the result is not altered in presence of nonuniform illumination. Usually, hardware for motion estimation has been limited to simple correlation criteria. The main goal of this paper is to propose a VLSI architecture for motion estimation using a matching criteria more complex than Sum of Absolute Differences (SAD) criteria. Today hardware devices provide many facilities for the integration of more and more complex designs as well as the possibility to easily communicate with general purpose processors
Resumo:
A compositional time series is obtained when a compositional data vector is observed atdifferent points in time. Inherently, then, a compositional time series is a multivariatetime series with important constraints on the variables observed at any instance in time.Although this type of data frequently occurs in situations of real practical interest, atrawl through the statistical literature reveals that research in the field is very much in itsinfancy and that many theoretical and empirical issues still remain to be addressed. Anyappropriate statistical methodology for the analysis of compositional time series musttake into account the constraints which are not allowed for by the usual statisticaltechniques available for analysing multivariate time series. One general approach toanalyzing compositional time series consists in the application of an initial transform tobreak the positive and unit sum constraints, followed by the analysis of the transformedtime series using multivariate ARIMA models. In this paper we discuss the use of theadditive log-ratio, centred log-ratio and isometric log-ratio transforms. We also presentresults from an empirical study designed to explore how the selection of the initialtransform affects subsequent multivariate ARIMA modelling as well as the quality ofthe forecasts
Resumo:
Automotive painting cabins are cleaned with several solvents, being great part of them mixtures of volatile organic compounds (VOCs), where the three xylene isomers are the most important constituents. To evaluate the work-related exposition of the cleaners that use these mixtures of solvents, xylenes have been determined in the working ambient air as well as its metabolite, o-m-p-methyl hippuric acid, has been analysed in urine to establish the dermal and respiratory exposition. This evaluation has been done in order to assess the occupational exposure to VOCs and to know the working conditions of the cleaners, but also to evaluate the effectiveness of personal protective equipment (PPE), the engineering control and the work practices.The xylenes have been chosen as indicators of exposition because they are the main components in the cleaning solvents used, with a level of concentration between 50% and 85%.The Xylenes have an occupational exposure limit (8 h TWA) of 50 ppm (221 mg/m3) and a short-term exposure limit (STEL) of 100 ppm (442 mg/m3). On the other hand, the biological exposure index (BEI) for xylenes is the sum of the total methyl hippuric acids in urine at the end of the work-shift, being the value 1500 mg/g creatinine.
Resumo:
La recent revolució en les tècniques de generació de dades genòmiques ha portat a una situació de creixement exponencial de la quantitat de dades generades i fa més necessari que mai el treball en la optimització de la gestió i maneig d'aquesta informació. En aquest treball s'han atacat tres vessants del problema: la disseminació de la informació, la integració de dades de diverses fonts i finalment la seva visualització. Basant-nos en el Sistema d'Anotacions Distribuides, DAS, hem creat un aplicatiu per a la creació automatitzada de noves fonts de dades en format estandaritzat i accessible programàticament a partir de fitxers de dades simples. Aquest progrtamari, easyDAS, està en funcionament a l'Institut Europeu de Bioinformàtica. Aquest sistema facilita i encoratja la compartició i disseminació de dades genòmiques en formats usables. jsDAS és una llibreria client de DAS que permet incorporar dades DAS en qualsevol aplicatiu web de manera senzilla i ràpida. Aprofitant els avantatges que ofereix DAS és capaç d'integrar dades de múltiples fonts de manera coherent i robusta. GenExp és el prototip de navegador genòmic basat en web altament interactiu i que facilita l'exploració dels genomes en temps real. És capaç d'integrar dades de quansevol font DAS i crear-ne una representació en client usant els últims avenços en tecnologies web.
Resumo:
We study the zero set of random analytic functions generated by a sum of the cardinal sine functions which form an orthogonal basis for the Paley-Wiener space. As a model case, we consider real-valued Gaussian coefficients. It is shown that the asymptotic probability that there is no zero in a bounded interval decays exponentially as a function of the length.
Resumo:
In 2000 the European Statistical Office published the guidelines for developing theHarmonized European Time Use Surveys system. Under such a unified framework,the first Time Use Survey of national scope was conducted in Spain during 2002–03. The aim of these surveys is to understand human behavior and the lifestyle ofpeople. Time allocation data are of compositional nature in origin, that is, they aresubject to non-negativity and constant-sum constraints. Thus, standard multivariatetechniques cannot be directly applied to analyze them. The goal of this work is toidentify homogeneous Spanish Autonomous Communities with regard to the typicalactivity pattern of their respective populations. To this end, fuzzy clustering approachis followed. Rather than the hard partitioning of classical clustering, where objects areallocated to only a single group, fuzzy method identify overlapping groups of objectsby allowing them to belong to more than one group. Concretely, the probabilistic fuzzyc-means algorithm is conveniently adapted to deal with the Spanish Time Use Surveymicrodata. As a result, a map distinguishing Autonomous Communities with similaractivity pattern is drawn.Key words: Time use data, Fuzzy clustering; FCM; simplex space; Aitchison distance
Resumo:
The quantitative estimation of Sea Surface Temperatures from fossils assemblages is afundamental issue in palaeoclimatic and paleooceanographic investigations. TheModern Analogue Technique, a widely adopted method based on direct comparison offossil assemblages with modern coretop samples, was revised with the aim ofconforming it to compositional data analysis. The new CODAMAT method wasdeveloped by adopting the Aitchison metric as distance measure. Modern coretopdatasets are characterised by a large amount of zeros. The zero replacement was carriedout by adopting a Bayesian approach to the zero replacement, based on a posteriorestimation of the parameter of the multinomial distribution. The number of modernanalogues from which reconstructing the SST was determined by means of a multipleapproach by considering the Proxies correlation matrix, Standardized Residual Sum ofSquares and Mean Squared Distance. This new CODAMAT method was applied to theplanktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea.Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix,Standardized Residual Sum of Squares
Resumo:
Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices
Resumo:
A novel test of spatial independence of the distribution of crystals or phases in rocksbased on compositional statistics is introduced. It improves and generalizes the commonjoins-count statistics known from map analysis in geographic information systems.Assigning phases independently to objects in RD is modelled by a single-trial multinomialrandom function Z(x), where the probabilities of phases add to one and areexplicitly modelled as compositions in the K-part simplex SK. Thus, apparent inconsistenciesof the tests based on the conventional joins{count statistics and their possiblycontradictory interpretations are avoided. In practical applications we assume that theprobabilities of phases do not depend on the location but are identical everywhere inthe domain of de nition. Thus, the model involves the sum of r independent identicalmultinomial distributed 1-trial random variables which is an r-trial multinomialdistributed random variable. The probabilities of the distribution of the r counts canbe considered as a composition in the Q-part simplex SQ. They span the so calledHardy-Weinberg manifold H that is proved to be a K-1-affine subspace of SQ. This isa generalisation of the well-known Hardy-Weinberg law of genetics. If the assignmentof phases accounts for some kind of spatial dependence, then the r-trial probabilitiesdo not remain on H. This suggests the use of the Aitchison distance between observedprobabilities to H to test dependence. Moreover, when there is a spatial uctuation ofthe multinomial probabilities, the observed r-trial probabilities move on H. This shiftcan be used as to check for these uctuations. A practical procedure and an algorithmto perform the test have been developed. Some cases applied to simulated and realdata are presented.Key words: Spatial distribution of crystals in rocks, spatial distribution of phases,joins-count statistics, multinomial distribution, Hardy-Weinberg law, Hardy-Weinbergmanifold, Aitchison geometry