926 results for compositional geometry
Abstract:
Continuity of set-valued maps is hereby revisited: after recalling some basic concepts of variational analysis and a short description of the state of the art, we obtain as a by-product two Sard-type results concerning local minima of scalar- and vector-valued functions. Our main result, though, is inscribed in the framework of tame geometry, stating that a closed-valued semialgebraic set-valued map is almost everywhere continuous (in both the topological and the measure-theoretic sense). The result, which depends on stratification techniques, holds true in the more general setting of o-minimal (or tame) set-valued maps. Some applications are briefly discussed at the end.
Abstract:
Knots are usually categorized in terms of topological properties that are invariant under changes in a knot's spatial configuration(1-4). Here we approach knot identification from a different angle, by considering the properties of particular geometrical forms which we define as 'ideal'. For a knot with a given topology and assembled from a tube of uniform diameter, the ideal form is the geometrical configuration having the highest ratio of volume to surface area. Practically, this is equivalent to determining the shortest piece of tube that can be closed to form the knot. Because the notion of an ideal form is independent of absolute spatial scale, the length-to-diameter ratio of a tube providing an ideal representation is constant, irrespective of the tube's actual dimensions. We report the results of computer simulations which show that these ideal representations of knots have surprisingly simple geometrical properties. In particular, there is a simple linear relationship between the length-to-diameter ratio and the crossing number, the number of intersections in a two-dimensional projection of the knot averaged over all directions. We have also found that the average shape of knotted polymeric chains in thermal equilibrium is closely related to the ideal representation of the corresponding knot type. Our observations provide a link between ideal geometrical objects and the behaviour of seemingly disordered systems, and allow the prediction of properties of knotted polymers such as their electrophoretic mobility(5).
Abstract:
We have analyzed the compositional properties of coding (protein-encoding) and non-coding sequences of Plasmodium falciparum, a unicellular parasite characterized by an extremely AT-rich genome. GC% levels, base and dinucleotide frequencies were studied. We found that among the various factors that contribute to the properties of the sequences analyzed, the most relevant are the compositional constraints which operate on the whole genome.
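As a minimal illustration of the quantities this abstract studies (GC%, base and dinucleotide frequencies), here is a short Python sketch; the sequence shown is a toy AT-rich string, not actual P. falciparum data:

```python
from collections import Counter

def composition_stats(seq):
    """GC%, base frequencies, and dinucleotide frequencies of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    base_freq = {b: seq.count(b) / n for b in "ACGT"}
    gc = base_freq["C"] + base_freq["G"]
    # Overlapping dinucleotides: n-1 windows of width 2
    dinucs = Counter(seq[i:i + 2] for i in range(n - 1))
    dinuc_freq = {d: c / (n - 1) for d, c in dinucs.items()}
    return gc, base_freq, dinuc_freq

# Toy AT-rich sequence (illustrative only)
gc, bases, dinucs = composition_stats("ATATTTAAATGCATATTTTA")
```

Comparing observed dinucleotide frequencies with those expected from the base frequencies alone is one way to separate genome-wide compositional constraints from local signals.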
Abstract:
Toro Toro (T) and Yungas (Y) have been described as genetically well differentiated populations of the Lutzomyia longipalpis (Lutz & Neiva, 1912) complex in Bolivia. Here we use geometric morphometrics to compare samples from these populations and new populations (Bolivia and Nicaragua), representing distant geographical origins, qualitative morphological variation ("one-spot" or "two-spots" phenotypes), ecologically distinct traits (peridomestic and silvatic populations), and possibly different epidemiological roles (transmitting or not transmitting Leishmania chagasi). The Nicaragua (N) (Somotillo) sample was "one-spot" phenotype and a possible peridomestic vector. The Bolivian sample of the Y was also "one-spot" phenotype and a demonstrated peridomestic vector of visceral leishmaniasis (VL). The three remaining samples were silvatic, "two-spots" phenotypes. Two of them (Uyuni, U, and T) were collected in the highlands of Bolivia, where VL has never been reported. The last one (Robore, R) came from the lowlands of Bolivia, where human cases of VL are sporadically reported. The decomposition of metric variation into size and shape by geometric morphometric techniques suggests the existence of two groups (N/Y/R and U/T). Several arguments indicate that such a subdivision of Lu. longipalpis could correspond to different evolutionary units.
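Geometric morphometrics decomposes metric variation into size and shape; a standard size measure is centroid size, and dividing landmark coordinates by it leaves a size-free configuration. A minimal sketch (the landmark coordinates are purely illustrative, not sand fly data):

```python
import math

def centroid_size(landmarks):
    """Centroid size: square root of the summed squared distances of the
    landmarks from their centroid. Dividing coordinates by it removes scale."""
    n = len(landmarks)
    cx = sum(x for x, y in landmarks) / n
    cy = sum(y for x, y in landmarks) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))

# Hypothetical wing landmarks (illustrative coordinates)
wing = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
size = centroid_size(wing)
scaled = [(x / size, y / size) for x, y in wing]  # size-free configuration
```

Shape comparisons are then made between such size-free configurations, so that group differences in size and in shape can be analyzed separately.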
Abstract:
The concept of ideal geometric configurations was recently applied to the classification and characterization of various knots. Different knots in their ideal form (i.e., the one requiring the shortest length of a constant-diameter tube to form a given knot) were shown to have an overall compactness proportional to the time-averaged compactness of thermally agitated knotted polymers forming corresponding knots. This was useful for predicting the relative speed of electrophoretic migration of different DNA knots. Here we characterize the ideal geometric configurations of catenanes (called links by mathematicians), i.e., closed curves in space that are topologically linked to each other. We demonstrate that the ideal configurations of different catenanes show interrelations very similar to those observed in the ideal configurations of knots. By analyzing literature data on electrophoretic separations of torus-type DNA catenanes of increasing complexity, we observed that their electrophoretic migration is roughly proportional to the overall compactness of ideal representations of the corresponding catenanes. This correlation does not apply, however, to electrophoretic migration of certain replication intermediates, believed up to now to represent the simplest torus-type catenanes. We propose, therefore, that freshly replicated circular DNA molecules, in addition to forming regular catenanes, may also form hemicatenanes.
Abstract:
In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used and the cosine-theta coefficient as similarity measure.
During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. Therefore it makes sense to check if the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions. In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005), starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transformation of the components and visualization of the factor scores in a spatial context: the compositional factors will be plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process.
Key words: compositional data analysis, biplot, deep sea sediments
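The ilr-transformation mentioned above maps a D-part composition to D-1 unconstrained coordinates, on which standard multivariate methods can then operate. A minimal sketch using pivot (balance) coordinates, one common choice of orthonormal basis:

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx - lx.mean()

def pivot_ilr(x):
    """ilr (pivot) coordinates: D-1 orthonormal log-contrasts of a D-part
    composition, each balancing one part against all remaining ones."""
    lx = np.log(np.asarray(x, dtype=float))
    D = len(lx)
    return np.array([
        np.sqrt((D - j - 1) / (D - j)) * (lx[j] - lx[j + 1:].mean())
        for j in range(D - 1)
    ])

# Illustrative 3-part composition
x = np.array([0.2, 0.3, 0.5])
z = pivot_ilr(x)   # two real coordinates, free of the unit-sum constraint
```

The coordinates are scale-invariant (rescaling all parts leaves them unchanged) and the map is an isometry, so distances computed on ilr coordinates respect the simplex geometry.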
Abstract:
The self-organizing map (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm involves the use of the Euclidean metric in the process of adaptation of the model vectors, thus rendering in theory the whole methodology incompatible with non-Euclidean geometries.
In this contribution we explore the two main aspects of the problem:
1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data.
2. Whether a modification of the conventional approach, replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution.
Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved".
Key words: self-organizing map; artificial neural networks; compositional data
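The canonical simplex operators named in point 2 can be sketched as follows; the `som_update` function is a hypothetical illustration of how perturbation and powering would replace the Euclidean model-vector update, not the authors' implementation:

```python
import numpy as np

def closure(x):
    """Rescale a vector of positive parts to sum to 1 (a point in the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** a)

def som_update(model, sample, rate):
    """Hypothetical simplex-aware SOM step: replaces the Euclidean update
    model += rate * (sample - model) by its perturbation/powering analogue."""
    diff = perturb(sample, 1.0 / np.asarray(model))   # sample "minus" model
    return perturb(model, power(diff, rate))

m = closure([1.0, 2.0, 3.0])
s = closure([3.0, 2.0, 1.0])
m_new = som_update(m, s, 0.5)   # geometric interpolation between m and s
```

With rate 1 the model vector moves exactly onto the sample, and with rate 0 it stays put, mirroring the behaviour of the Euclidean update.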
Abstract:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
Abstract:
Factor analysis, a frequent technique for multivariate data inspection, is widely used also for compositional data analysis. The usual way is to use a centred logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then

y = Λf + e   (1)

with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as

Cov(y) = ΛΛ^T + ψ   (2)

where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimate of Cov(y).
Let the observed clr-transformed data Y be realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well-known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely on a full-rank data matrix Y, which is not the case for clr-transformed data (see, e.g., Aitchison, 1986).
The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr-transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by

C(Y) = V C(Z) V^T

where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation since the links to the original variables are still preserved.
The above procedure will be applied to data from geochemistry. Our special interest is in comparing the results with those of Reimann et al. (2002) for the Kola project data.
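The back-transformation C(Y) = V C(Z) V^T can be sketched as follows; for brevity the sketch uses the plain sample covariance where the abstract calls for a robust estimator such as the MCD, which would slot in at the same place:

```python
import numpy as np

def ilr_basis(D):
    """Orthonormal contrast matrix V (D x (D-1)) linking clr and ilr: Z = Y_clr @ V."""
    V = np.zeros((D, D - 1))
    for j in range(D - 1):
        V[: j + 1, j] = 1.0 / (j + 1)
        V[j + 1, j] = -1.0
        V[:, j] *= np.sqrt((j + 1) / (j + 2))
    return V

def clr_cov_from_ilr(X):
    """Estimate a full-rank covariance in ilr coordinates, then map it back
    to clr space via C(Y) = V C(Z) V^T. (Sample covariance used here as a
    stand-in for a robust estimator.)"""
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    L = np.log(X)
    Y_clr = L - L.mean(axis=1, keepdims=True)   # row-wise clr transform
    V = ilr_basis(D)
    Z = Y_clr @ V                                # full-rank ilr coordinates
    C_Z = np.cov(Z, rowvar=False)
    return V @ C_Z @ V.T
```

The returned clr-space covariance is singular by construction (its rows sum to zero), but its entries refer to the original parts, which is what makes the factor loadings interpretable.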
Abstract:
We take stock of the present position of compositional data analysis, of what has been achieved in the last 20 years, and then make suggestions as to what may be sensible avenues of future research. We take an uncompromisingly applied mathematical view: that the challenge of solving practical problems should motivate our theoretical research, and that any new theory should be thoroughly investigated to see if it may provide answers to previously abandoned practical considerations. Indeed a main theme of this lecture will be to demonstrate this applied mathematical approach by a number of challenging examples.
Abstract:
Traditionally, compositional data have been identified with closed data, and the simplex has been considered as the natural sample space of this kind of data. In our opinion, the emphasis on the constrained nature of compositional data has contributed to masking its real nature. More crucial than the constraining property of compositional data is the scale-invariant property of this kind of data. Indeed, when we are considering only a few parts of a full composition we are not working with constrained data, but our data are still compositional. We believe that it is necessary to give a more precise definition of composition. This is the aim of this oral contribution.
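The scale-invariance argument can be made concrete: rescaling the parts by any positive constant carries exactly the same compositional information, because all part ratios are unchanged, and a subcomposition of a few parts remains compositional even though it no longer sums to any prescribed constant. A small sketch (the numbers are illustrative):

```python
import numpy as np

def closure(x):
    """Closure: project a vector of positive parts onto the unit simplex."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

# The same sample reported in two units: identical composition
raw_mg = np.array([20.0, 30.0, 50.0])   # e.g. amounts in mg (illustrative)
raw_g = raw_mg / 1000.0                 # the same amounts in g
assert np.allclose(closure(raw_mg), closure(raw_g))

# A two-part subcomposition is unconstrained but still compositional:
# the ratio of the retained parts is preserved
sub = raw_mg[:2]
assert np.isclose(sub[0] / sub[1], raw_mg[0] / raw_mg[1])
```

This is why definitions of a composition based on scale invariance, rather than on the unit-sum constraint, extend naturally to subcompositions.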
Abstract:
The biplot has proved to be a powerful descriptive and analytical tool in many areas of application of statistics. For compositional data the necessary theoretical adaptation has been provided, with illustrative applications, by Aitchison (1990) and Aitchison and Greenacre (2002). These papers were restricted to the interpretation of simple compositional data sets. In many situations the problem has to be described in some form of conditional modelling. For example, in a clinical trial where interest is in how patients' steroid metabolite compositions may change as a result of different treatment regimes, interest is in relating the compositions after treatment to the compositions before treatment and the nature of the treatments applied. To study this through a biplot technique requires the development of some form of conditional compositional biplot. This is the purpose of this paper. We choose as a motivating application an analysis of the 1992 US Presidential Election, where interest may be in how the three-part composition (the percentage division of the presidential vote in each state among the three candidates, Bush, Clinton and Perot) depends on the ethnic composition and on the urban-rural composition of the state. The methodology of conditional compositional biplots is first developed and a detailed interpretation of the 1992 US Presidential Election provided. We use a second application, involving the conditional variability of tektite mineral compositions with respect to major oxide compositions, to demonstrate some hazards of simplistic interpretation of biplots. Finally we conjecture on further possible applications of conditional compositional biplots.
Abstract:
We propose to analyze shapes as "compositions" of distances in Aitchison geometry, as an alternate and complementary tool to classical shape analysis, especially when size is non-informative.
Shapes are typically described by the location of user-chosen landmarks. However the shape, considered as invariant under scaling, translation, mirroring and rotation, does not uniquely define the location of landmarks. A simple approach is to use distances between landmarks instead of the locations of the landmarks themselves. Distances are positive numbers defined up to joint scaling, a mathematical structure quite similar to compositions. The shape fixes only ratios of distances. Perturbations correspond to relative changes of the size of subshapes and of aspect ratios. The power transform increases the expression of the shape by increasing distance ratios. In analogy to subcompositional consistency, results should not depend too much on the choice of distances, because different subsets of the pairwise distances of landmarks uniquely define the shape.
Various compositional analysis tools can be applied to sets of distances directly, or after minor modifications concerning the singularity of the covariance matrix, and yield results with direct interpretations in terms of shape changes. The remaining problem is that not all sets of distances correspond to a valid shape. Nevertheless interpolated or predicted shapes can be back-transformed by multidimensional scaling (when all pairwise distances are used) or free geodetic adjustment (when sufficiently many distances are used).
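The back-transformation from a full set of pairwise distances to a landmark configuration, mentioned at the end, can be done with classical multidimensional scaling. A minimal sketch (the triangle is an illustrative "shape", not data from the paper):

```python
import numpy as np

def classical_mds(Dmat, dim=2):
    """Recover a point configuration, up to rotation/translation/mirroring,
    from a full matrix of pairwise distances via classical MDS."""
    Dmat = np.asarray(Dmat, dtype=float)
    n = Dmat.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (Dmat ** 2) @ J             # double-centred Gram matrix
    w, v = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]            # keep the largest eigenvalues
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Hypothetical landmark configuration (illustrative coordinates)
pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
rec = classical_mds(D)   # reproduces the triangle up to a rigid motion
```

Since the recovered configuration is only defined up to a similarity transformation, this is exactly the equivalence class that the shape, and hence the composition of distances, determines.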
Abstract:
The application of compositional data analysis through log-ratio transformations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alternatives (IIA). IIA states that the odds ratio, in this case the ratio of shares, is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model, which does not embody IIA, together with an associated zero replacement procedure, and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that combines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia.
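The IIA property described above is easy to demonstrate for multinomial-logit shares: the ratio of two shares depends only on their own utilities, so dropping a third alternative leaves that ratio unchanged. A small sketch with illustrative utility values:

```python
import numpy as np

def softmax_shares(utilities):
    """Multinomial-logit shares from a vector of utilities."""
    e = np.exp(np.asarray(utilities, dtype=float))
    return e / e.sum()

# IIA: the ratio of any two shares does not depend on which other
# alternatives are in the choice set (utilities here are illustrative)
u_full = [1.0, 0.5, -0.2]   # three alternatives
u_drop = [1.0, 0.5]         # third alternative removed

s_full = softmax_shares(u_full)
s_drop = softmax_shares(u_drop)
assert np.isclose(s_full[0] / s_full[1], s_drop[0] / s_drop[1])
```

A nested logit groups correlated alternatives into nests, so that removing an alternative reallocates its share disproportionately within its own nest, which is precisely the departure from IIA the abstract exploits.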
Abstract:
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household-level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually arise in situations where any non-zero expenditure is not small. While this analysis is based on economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol) or missing because they occur only in random regions which may be missed in a sample (similar to the durables).