69 resultados para methods: data analysis
Resumo:
Dual scaling of a subjects-by-objects table of dominance data (preferences,paired comparisons and successive categories data) has been contrasted with correspondence analysis, as if the two techniques were somehow different. In this note we show that dual scaling of dominance data is equivalent to the correspondence analysis of a table which is doubled with respect to subjects. We also show that the results of both methods can be recovered from a principal components analysis of the undoubled dominance table which is centred with respect to subject means.
Resumo:
Planners in public and private institutions would like coherent forecasts of the components of age-specic mortality, such as causes of death. This has been di cult toachieve because the relative values of the forecast components often fail to behave ina way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It hasbeen shown that cause-specic mortality forecasts are pessimistic when compared withall-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approachof using log mortality rates and forecasts the density of deaths in the life table. Sincethese values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbingstate), they are intrinsically relative rather than absolute values across decrements aswell as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison(1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that theunit sum constraint is honoured. The structure of the best-known, single-decrementmortality-rate forecasting model, devised by Lee and Carter (1992), is expressed incompositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortalityby cause of death for Japan
Resumo:
Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr)transformation to obtain the random vector y of dimension D. The factor model istheny = Λf + e (1)with the factors f of dimension k & D, the error term e, and the loadings matrix Λ.Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysismodel (1) can be written asCov(y) = ΛΛT + ψ (2)where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as theloadings matrix Λ are estimated from an estimation of Cov(y).Given observed clr transformed data Y as realizations of the random vectory. Outliers or deviations from the idealized model assumptions of factor analysiscan severely effect the parameter estimation. As a way out, robust estimation ofthe covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), seePison et al. (2003). Well known robust covariance estimators with good statisticalproperties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), relyon a full-rank data matrix Y which is not the case for clr transformed data (see,e.g., Aitchison, 1986).The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves thissingularity problem. The data matrix Y is transformed to a matrix Z by usingan orthonormal basis of lower dimension. Using the ilr transformed data, a robustcovariance matrix C(Z) can be estimated. The result can be back-transformed tothe clr space byC(Y ) = V C(Z)V Twhere the matrix V with orthonormal columns comes from the relation betweenthe clr and the ilr transformation. Now the parameters in the model (2) can beestimated (Basilevsky, 1994) and the results have a direct interpretation since thelinks to the original variables are still preserved.The above procedure will be applied to data from geochemistry. Our specialinterest is on comparing the results with those of Reimann et al. (2002) for the Kolaproject data
Resumo:
The statistical analysis of compositional data should be treated using logratios of parts,which are difficult to use correctly in standard statistical packages. For this reason afreeware package, named CoDaPack was created. This software implements most of thebasic statistical methods suitable for compositional data.In this paper we describe the new version of the package that now is calledCoDaPack3D. It is developed in Visual Basic for applications (associated with Excel©),Visual Basic and Open GL, and it is oriented towards users with a minimum knowledgeof computers with the aim at being simple and easy to use.This new version includes new graphical output in 2D and 3D. These outputs could bezoomed and, in 3D, rotated. Also a customization menu is included and outputs couldbe saved in jpeg format. Also this new version includes an interactive help and alldialog windows have been improved in order to facilitate its use.To use CoDaPack one has to access Excel© and introduce the data in a standardspreadsheet. These should be organized as a matrix where Excel© rows correspond tothe observations and columns to the parts. The user executes macros that returnnumerical or graphical results. There are two kinds of numerical results: new variablesand descriptive statistics, and both appear on the same sheet. Graphical output appearsin independent windows. In the present version there are 8 menus, with a total of 38submenus which, after some dialogue, directly call the corresponding macro. Thedialogues ask the user to input variables and further parameters needed, as well aswhere to put these results. The web site http://ima.udg.es/CoDaPack contains thisfreeware package and only Microsoft Excel© under Microsoft Windows© is required torun the software.Kew words: Compositional data Analysis, Software
Resumo:
Several eco-toxicological studies have shown that insectivorous mammals, due to theirfeeding habits, easily accumulate high amounts of pollutants in relation to other mammal species. To assess the bio-accumulation levels of toxic metals and their in°uenceon essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P,S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in bones of 105 greaterwhite-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control(Medas Islands) area. Since chemical contents of a bio-indicator are mainly compositional data, conventional statistical analyses currently used in eco-toxicology can givemisleading results. Therefore, to improve the interpretation of the data obtained, weused statistical techniques for compositional data analysis to define groups of metalsand to evaluate the relationships between them, from an inter-population viewpoint.Hypothesis testing on the adequate balance-coordinates allow us to confirm intuitionbased hypothesis and some previous results. The main statistical goal was to test equalmeans of balance-coordinates for the two defined populations. After checking normality,one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances
Resumo:
Pounamu (NZ jade), or nephrite, is a protected mineral in its natural form following thetransfer of ownership back to Ngai Tahu under the Ngai Tahu (Pounamu Vesting) Act 1997.Any theft of nephrite is prosecutable under the Crimes Act 1961. Scientific evidence isessential in cases where origin is disputed. A robust method for discrimination of thismaterial through the use of elemental analysis and compositional data analysis is required.Initial studies have characterised the variability within a given nephrite source. This hasincluded investigation of both in situ outcrops and alluvial material. Methods for thediscrimination of two geographically close nephrite sources are being developed.Key Words: forensic, jade, nephrite, laser ablation, inductively coupled plasma massspectrometry, multivariate analysis, elemental analysis, compositional data analysis
Resumo:
Modern methods of compositional data analysis are not well known in biomedical research.Moreover, there appear to be few mathematical and statistical researchersworking on compositional biomedical problems. Like the earth and environmental sciences,biomedicine has many problems in which the relevant scienti c information isencoded in the relative abundance of key species or categories. I introduce three problemsin cancer research in which analysis of compositions plays an important role. Theproblems involve 1) the classi cation of serum proteomic pro les for early detection oflung cancer, 2) inference of the relative amounts of di erent tissue types in a diagnostictumor biopsy, and 3) the subcellular localization of the BRCA1 protein, and it'srole in breast cancer patient prognosis. For each of these problems I outline a partialsolution. However, none of these problems is \solved". I attempt to identify areas inwhich additional statistical development is needed with the hope of encouraging morecompositional data analysts to become involved in biomedical research
Resumo:
The application of correspondence analysis to square asymmetrictables is often unsuccessful because of the strong role played by thediagonal entries of the matrix, obscuring the data off the diagonal. A simplemodification of the centering of the matrix, coupled with the correspondingchange in row and column masses and row and column metrics, allows the tableto be decomposed into symmetric and skew--symmetric components, which canthen be analyzed separately. The symmetric and skew--symmetric analyses canbe performed using a simple correspondence analysis program if the data areset up in a special block format.
Resumo:
We consider the joint visualization of two matrices which have common rowsand columns, for example multivariate data observed at two time pointsor split accord-ing to a dichotomous variable. Methods of interest includeprincipal components analysis for interval-scaled data, or correspondenceanalysis for frequency data or ratio-scaled variables on commensuratescales. A simple result in matrix algebra shows that by setting up thematrices in a particular block format, matrix sum and difference componentscan be visualized. The case when we have more than two matrices is alsodiscussed and the methodology is applied to data from the InternationalSocial Survey Program.
Resumo:
The generalization of simple (two-variable) correspondence analysis to more than two categorical variables, commonly referred to as multiple correspondence analysis, is neither obvious nor well-defined. We present two alternative ways of generalizing correspondence analysis, one based on the quantification of the variables and intercorrelation relationships, and the other based on the geometric ideas of simple correspondence analysis. We propose a version of multiple correspondence analysis, with adjusted principal inertias, as the method of choice for the geometric definition, since it contains simple correspondence analysis as an exact special case, which is not the situation of the standard generalizations. We also clarify the issue of supplementary point representation and the properties of joint correspondence analysis, a method that visualizes all two-way relationships between the variables. The methodology is illustrated using data on attitudes to science from the International Social Survey Program on Environment in 1993.
Resumo:
The case of two transition tables is considered, that is two squareasymmetric matrices of frequencies where the rows and columns of thematrices are the same objects observed at three different timepoints. Different ways of visualizing the tables, either separatelyor jointly, are examined. We generalize an existing idea where asquare matrix is descomposed into symmetric and skew-symmetric partsto two matrices, leading to a decomposition into four components: (1)average symmetric, (2) average skew-symmetric, (3) symmetricdifference from average, and (4) skew-symmetric difference fromaverage. The method is illustrated with an artificial example and anexample using real data from a study of changing values over threegenerations.
Resumo:
We show the equivalence between the use of correspondence analysis (CA)of concadenated tables and the application of a particular version ofconjoint analysis called categorical conjoint measurement (CCM). Theconnection is established using canonical correlation (CC). The second part introduces the interaction e¤ects in all three variants of theanalysis and shows how to pass between the results of each analysis.
Resumo:
A new analytical method was developed to non-destructively determine pH and degree of polymerisation (DP) of cellulose in fibres in 19th 20th century painting canvases, and to identify the fibre type: cotton, linen, hemp, ramie or jute. The method is based on NIR spectroscopy and multivariate data analysis, while for calibration and validation a reference collection of 199 historical canvas samples was used. The reference collection was analysed destructively using microscopy and chemical analytical methods. Partial least squares regression was used to build quantitative methods to determine pH and DP, and linear discriminant analysis was used to determine the fibre type. To interpret the obtained chemical information, an expert assessment panel developed a categorisation system to discriminate between canvases that may not be fit to withstand excessive mechanical stress, e.g. transportation. The limiting DP for this category was found to be 600. With the new method and categorisation system, canvases of 12 Dalí paintings from the Fundació Gala-Salvador Dalí (Figueres, Spain) were non-destructively analysed for pH, DP and fibre type, and their fitness determined, which informs conservation recommendations. The study demonstrates that collection-wide canvas condition surveys can be performed efficiently and non-destructively, which could significantly improve collection management.
Resumo:
Durante los cuatro años de disfrute de la beca (2006 – 2009) se ha consolidado una base de datos de medidas osteológicas del esqueleto apendicular de numerosas especies del O. Carnivora. Concretamente, se han medido 364 individuos de 126 especies. Los ejemplares pertenecían a las colecciones del Phyletisches Museum (Jena, Alemania), el Museum für Naturkunde (Berlín, Alemania), el Museu de Ciències Naturals de la Ciutadella (Barcelona, España), el Múseum National d'Histoire Naturelle (París, Francia), y el Museo Nacional de Ciencias Naturales (Madrid, España). Asimismo, con estos datos se han estado preparando tres artículos sobre la morfología de ciertos elementos del esqueleto apendicular en carnívoros, dos de los cuales se encuentran actualmente en estado de revisión para su publicación científica. Dos de ellos, "Scapula, habitat and locomotion in Carnivora" y "Size and shape in the carnivore scapula", relacionan la morfología escapular con factores como el tamaño del animal, el tipo de locomoción que presenta y el hábitat en el que se encuentra; el primero mediante metodología multivariante (análisis funcional) y el segundo bajo las nuevas técnicas de morfometría geométrica. El tercer artículo, "Scaling and mechanics in the carnivore calcaneus: A comparison of natural and artificial selection", evalúa el efecto de diferentes tipos de selección, natural frente a artificial, sobre la morfología del calcáneo y su influencia en la biomecánica de este hueso. Finalmente, también se ha desarrollado un estudio experimental sobre la búsqueda de estabilidad durante la locomoción arbórea, cuyos resultados han dado lugar al artículo "The search for stability on narrow supports: An experimental study in cats and dogs", que también se halla bajo revisión actualmente.
Resumo:
In this paper we look at how a web-based social software can be used to make qualitative data analysis of online peer-to-peer learning experiences. Specifically, we propose to use Cohere, a web-based social sense-making tool, to observe, track, annotate and visualize discussion group activities in online courses. We define a specific methodology for data observation and structuring, and present results of the analysis of peer interactions conducted in discussion forum in a real case study of a P2PU course. Finally we discuss how network visualization and analysis can be used to gather a better understanding of the peer-to-peer learning experience. To do so, we provide preliminary insights on the social, dialogical and conceptual connections that have been generated within one online discussion group.