914 results for statistical data analysis


Relevance:

90.00%

Publisher:

Abstract:

In standard multivariate statistical analysis, common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation, and that subcompositions replace the marginal concept of subvectors. To motivate the statistical developments of this paper, we present two challenging compositional problems from food production processes. Against this background the relevance of perturbations and subcompositions can be clearly seen. Moreover, we can identify a number of hypotheses of interest involving the specification of particular perturbations, or differences between perturbations, as well as hypotheses of subcompositional stability. We identify the two problems as the counterparts, in the jargon of standard multivariate analysis, of the analysis of paired-comparison or split-plot experiments and of separate-sample comparative experiments. We then develop appropriate estimation and testing procedures for a complete lattice of relevant compositional hypotheses.
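The perturbation operation is the simplex analogue of translation: component-wise multiplication followed by re-closure to unit sum. A minimal sketch (function names and example values are illustrative, not taken from the paper):

```python
import numpy as np

def closure(x):
    """Rescale a vector of positive parts to sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, y):
    """Simplicial perturbation: component-wise product, then closure."""
    return closure(np.asarray(x) * np.asarray(y))

# Example: perturbing a 3-part composition
x = closure([0.2, 0.3, 0.5])
p = closure([1.0, 2.0, 1.0])   # a perturbation up-weighting the second part
print(perturb(x, p))           # -> [0.1538..., 0.4615..., 0.3846...]
```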

Relevance:

90.00%

Publisher:

Abstract:

In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel density estimation techniques in the context of compositional data analysis. Indeed, they gave two options for the choice of the kernel to be used in the kernel estimator. One of these kernels is based on the use of the alr transformation on the simplex S^D jointly with the normal distribution on R^(D-1). However, these authors themselves recognized that this method has some deficiencies. A method for overcoming these difficulties, based on recent developments in compositional data analysis and multivariate kernel estimation theory and combining the ilr transformation with the use of the normal density with a full bandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares both methods in practice, thus exploring the finite-sample behaviour of both estimators.
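As a rough illustration of the second approach (not the authors' implementation), one can ilr-transform the compositions and fit a multivariate Gaussian KDE in coordinate space; scipy's default Scott's-rule bandwidth, which scales the full sample covariance, stands in here for the full bandwidth matrix selectors studied in the paper:

```python
import numpy as np
from scipy.stats import gaussian_kde

def ilr(X):
    """ilr transform of an (n, D) array of compositions, using pivot
    coordinates z_i = sqrt(i/(i+1)) * log(g(x_1..x_i) / x_{i+1})."""
    X = np.asarray(X, dtype=float)
    n, D = X.shape
    logX = np.log(X)
    Z = np.empty((n, D - 1))
    for i in range(1, D):
        gm = logX[:, :i].mean(axis=1)          # log geometric mean of first i parts
        Z[:, i - 1] = np.sqrt(i / (i + 1)) * (gm - logX[:, i])
    return Z

rng = np.random.default_rng(0)
raw = rng.dirichlet([4.0, 2.0, 3.0], size=500)   # simulated 3-part compositions
kde = gaussian_kde(ilr(raw).T)                   # scipy expects shape (dims, n)
print(kde(ilr(raw[:5]).T))                       # density at the first 5 points
```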

Relevance:

90.00%

Publisher:

Abstract:

Pounamu (NZ jade), or nephrite, is a protected mineral in its natural form following the transfer of ownership back to Ngai Tahu under the Ngai Tahu (Pounamu Vesting) Act 1997. Any theft of nephrite is prosecutable under the Crimes Act 1961. Scientific evidence is essential in cases where origin is disputed, so a robust method for discriminating this material through elemental analysis and compositional data analysis is required. Initial studies have characterised the variability within a given nephrite source, including investigation of both in situ outcrops and alluvial material. Methods for the discrimination of two geographically close nephrite sources are being developed. Key words: forensic, jade, nephrite, laser ablation, inductively coupled plasma mass spectrometry, multivariate analysis, elemental analysis, compositional data analysis.
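The abstract does not specify the statistical workflow; purely as an illustration (not the authors' published method, and with simulated data in place of real trace-element measurements), source discrimination could combine a centred log-ratio (clr) transform with a standard discriminant classifier:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def clr(X):
    """Centred log-ratio transform of an (n, D) array of compositions."""
    logX = np.log(np.asarray(X, dtype=float))
    return logX - logX.mean(axis=1, keepdims=True)

# Hypothetical element compositions from two nephrite sources
rng = np.random.default_rng(1)
source_a = rng.dirichlet([50, 30, 10, 5], size=40)
source_b = rng.dirichlet([45, 35, 12, 4], size=40)
X = clr(np.vstack([source_a, source_b]))
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.score(X, y))   # in-sample discrimination accuracy
```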

Relevance:

90.00%

Publisher:

Abstract:

The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p², 2pq, and q² respectively, where p is the allele frequency of A, and q = 1 − p. Many statistical tests are used to check whether empirical marker data obey the Hardy-Weinberg principle. Among these are the classical chi-square test (with or without continuity correction), the likelihood ratio test, Fisher's exact test, and exact tests in combination with Monte Carlo and Markov chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is, however, ample room for the use of graphics in HWE tests, in particular for the ternary plot. Nowadays, many genetic studies use genetic markers known as Single Nucleotide Polymorphisms (SNPs). SNP data come in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically “close” to the parabola, whereas compositions that differ significantly from HWE are “far”. By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in the ternary plot. This way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008). This leads to appealing graphical representations in which large numbers of SNPs can be tested for HWE in a single graph. Several examples of graphical tests for HWE, implemented in R software, will be shown, using SNP data from different human populations.
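The classical chi-square test mentioned above is easy to state in code; a minimal sketch with hypothetical genotype counts:

```python
import numpy as np
from scipy.stats import chi2

def hwe_chisq(n_AA, n_AB, n_BB):
    """Classical chi-square test for Hardy-Weinberg equilibrium
    at a bi-allelic locus (no continuity correction)."""
    n = n_AA + n_AB + n_BB
    p = (2 * n_AA + n_AB) / (2 * n)          # allele frequency of A
    q = 1 - p
    expected = np.array([p**2, 2 * p * q, q**2]) * n
    observed = np.array([n_AA, n_AB, n_BB])
    stat = ((observed - expected) ** 2 / expected).sum()
    # df = 3 genotype classes - 1 - 1 estimated allele frequency
    return stat, chi2.sf(stat, df=1)

print(hwe_chisq(298, 489, 213))   # (statistic, p-value) for a hypothetical SNP
```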

Relevance:

90.00%

Publisher:

Abstract:

Planners in public and private institutions would like coherent forecasts of the components of age-specific mortality, such as causes of death. This has been difficult to achieve because the relative values of the forecast components often fail to behave in a way that is coherent with historical experience. In addition, when the group forecasts are combined, the result is often incompatible with an all-groups forecast. It has been shown that cause-specific mortality forecasts are pessimistic when compared with all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach of using log mortality rates and forecasts the density of deaths in the life table. Since these values obey a unit-sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing state), they are intrinsically relative rather than absolute values, across decrements as well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison (1986), death densities are transformed into real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the unit-sum constraint is honoured. The structure of the best-known single-decrement mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality by cause of death for Japan.
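As a bare-bones illustration of the compositional device described above (not the paper's implementation, and with random data standing in for a life table), one can transform each year's death density to centred log-ratio (clr) coordinates, extract a Lee-Carter-style rank-1 structure by SVD, and back-transform so that fitted values honour the unit-sum constraint:

```python
import numpy as np

def clr(X):
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def clr_inv(Z):
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

# rows = years, columns = ages; each row is a death density summing to 1
rng = np.random.default_rng(2)
dens = rng.dirichlet(np.linspace(1, 20, 50), size=30)

Z = clr(dens)
alpha = Z.mean(axis=0)                       # age pattern (cf. Lee-Carter a_x)
U, s, Vt = np.linalg.svd(Z - alpha, full_matrices=False)
kappa, beta = U[:, 0] * s[0], Vt[0]          # period index and age loadings

# rank-1 reconstruction, mapped back to densities with unit sum
fitted = clr_inv(alpha + np.outer(kappa, beta))
print(fitted.sum(axis=1)[:3])                # each row sums to 1
```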

Relevance:

90.00%

Publisher:

Abstract:

The theory of compositional data analysis is often focused on the composition only. However, in practical applications we often treat a composition together with covariables on some other scale. This contribution systematically gathers and develops statistical tools for this situation. For instance, for the graphical display of the dependence of a composition on a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will quickly hide important aspects if the composition has many parts or takes extreme values. On the other hand, colored scatterplots of ilr components may not be very instructive for the analyst if the conventional, black-box ilr is used. Thinking in terms of the Euclidean structure of the simplex, we suggest setting up appropriate projections which, on the one hand, show the compositional geometry and, on the other hand, are still comprehensible by a non-expert analyst and readable for all locations and scales of the data. This is done, e.g., by defining special balance displays with carefully-selected axes. Following this idea, we need to systematically ask how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications for their software implementation and for how they might be taught to analysts not already expert in multivariate analysis.
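The balances mentioned above are normalized log-ratios between the geometric means of two groups of parts; a minimal sketch (the part groupings and data are illustrative):

```python
import numpy as np

def balance(X, num, den):
    """Balance between part groups `num` and `den` (index lists):
    sqrt(r*s/(r+s)) * log(gm(num parts) / gm(den parts))."""
    logX = np.log(np.asarray(X, dtype=float))
    r, s = len(num), len(den)
    coef = np.sqrt(r * s / (r + s))
    return coef * (logX[:, num].mean(axis=1) - logX[:, den].mean(axis=1))

rng = np.random.default_rng(3)
comps = rng.dirichlet([5, 3, 2, 4], size=100)
# balance of parts {0, 1} against parts {2, 3}, e.g. as a scatterplot axis
print(balance(comps, [0, 1], [2, 3])[:5])
```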

Relevance:

90.00%

Publisher:

Abstract:

The quantitative estimation of Sea Surface Temperatures (SST) from fossil assemblages is a fundamental issue in palaeoclimatic and palaeoceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as the distance measure. Modern coretop datasets are characterised by a large number of zeros; zero replacement was carried out by adopting a Bayesian approach, based on posterior estimation of the parameters of the multinomial distribution. The number of modern analogues from which to reconstruct the SST was determined by a multiple approach, considering the proxies correlation matrix, the Standardized Residual Sum of Squares and the Mean Squared Distance. The new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Key words: modern analogues, Aitchison distance, proxies correlation matrix, Standardized Residual Sum of Squares.
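The Aitchison metric adopted as the analogue-matching distance is the Euclidean distance between clr-transformed compositions; a minimal sketch with hypothetical assemblage proportions:

```python
import numpy as np

def clr(x):
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance in clr coordinates."""
    return np.linalg.norm(clr(x) - clr(y))

fossil = [0.50, 0.30, 0.15, 0.05]   # hypothetical fossil assemblage proportions
coretop = [0.45, 0.35, 0.12, 0.08]  # hypothetical modern coretop sample
print(aitchison_distance(fossil, coretop))
```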

Relevance:

90.00%

Publisher:

Abstract:

Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA arises when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods of dimensionality reduction for this particular type of data: functional principal component analysis (PCA), with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
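As a rough sketch of one transformation-based route (an assumption about the workflow, with simulated densities rather than the paper's data): discretize each density on a common grid, clr-transform, and apply ordinary PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

def clr(X):
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

# Each row: a density evaluated on a common grid and normalized to sum to 1,
# here simulated Gaussian-shaped income densities with varying location/scale.
grid = np.linspace(0, 10, 60)
rng = np.random.default_rng(4)
mus, sigmas = rng.uniform(3, 7, 100), rng.uniform(0.8, 2.0, 100)
dens = np.exp(-0.5 * ((grid - mus[:, None]) / sigmas[:, None]) ** 2)
dens /= dens.sum(axis=1, keepdims=True)

scores = PCA(n_components=2).fit_transform(clr(dens))
print(scores.shape)   # (100, 2): two functional principal component scores
```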

Relevance:

90.00%

Publisher:

Abstract:

In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies on archaeological materials, will be approached from measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated by using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case in projective geometry, arising when the projective coordinates are in the positive orthant and have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study on the architecture of Early Christian churches dating back to the 5th-7th centuries AD.

Relevance:

90.00%

Publisher:

Abstract:

Recycling has been an important topic over the last decade, owing to the economic, social and technological development it brings with it. The recycling sector has clearly become one with the vision to establish itself as a new industry. For this reason, the aim of this research is to seek new ways of seeing the resources found everywhere in our cities. The thesis offers countless arguments that will help readers become increasingly interested in reusing the materials they encounter every day. In this way, a supply chain can be seen in which the upgraded raw material serves to make other kinds of products and generates a meaningful livelihood for thousands of people who can take advantage of these materials. Our most everyday habits have much to do with the global degradation of the planet. Acts as routine as throwing out rubbish without separating it, buying disposable utensils, or purchasing food packaged in anti-ecological or non-recyclable materials contribute greatly to environmental pollution (Inzillo, 2000).

Relevance:

90.00%

Publisher:

Abstract:

Notes for class discussion and exercise.

Relevance:

90.00%

Publisher:

Abstract:

Slides and handouts for a class introducing some of the concepts associated with the analysis of qualitative data.

Relevance:

90.00%

Publisher:

Abstract:

The fast-food sector is experiencing accelerated growth worldwide due to changes in consumers' daily lives, in which time is the priority, driving an accelerated pace of life (Sirgado & Lamas, 2011). This growth is something this research aims to document through figures that corroborate the expansion of the fast-food sector at the global, Latin American and national levels; this information will be useful for business owners seeking to analyse the landscape in which the sector's brands operate. The sector's expansion has generated an increase in fast-food points of sale, which has been facilitated by the adoption of the franchise model as a corporate growth strategy, given the advantages it offers and the possibility of diversifying the offering and investing in new markets (Portafolio, 2006). Accordingly, we present the fundamental concepts to understand in this type of business, as well as its advantages and the kinds of franchise that exist, for those who wish to implement the model in their company or who are considering acquiring one. To exemplify the above, we examine a national organization in the fast-food sector that has introduced the franchise model, a case that can serve as a reference for other brands in the sector.

Relevance:

90.00%

Publisher:

Abstract:

The purpose of this study is to evaluate the sensitivity, specificity and predictive values of the Anamnestic Questionnaire of Upper Limb and Spine Symptoms (CASMSC) developed by the Ergonomics of Posture and Movement (EPM) research unit. A descriptive, correlational study was conducted through secondary analysis of a 2013 database of food-industry workers (n = 401) to whom the CASMSC had been administered, along with a clinical physiotherapy assessment focused on the same body segments; the latter was used as the gold standard. One-way analysis of variance was applied to test for statistical differences by age, seniority and gender. The sensitivity, specificity and predictive values of the CASMSC are reported with their respective 95% confidence intervals. The prevalence of a positive threshold for suspected musculoskeletal disorder (MSD), for both the upper limb and the spine, was well above the national average for the sector. The sensitivity of the CASMSC for the upper limb ranged from 80% to 94.57%, while for the cervical and lumbar spine it was 36.4% and 43.4%, respectively. For the dorsal region it was almost double that of the other two regions (85.7%). The CASMSC is recommended for its upper-limb section, given its high sensitivity.
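For reference, the sensitivity, specificity and predictive values reported above are simple functions of the 2×2 table of questionnaire result against the gold-standard clinical assessment. A minimal sketch with hypothetical counts (not the study's data):

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from a 2x2 table
    of screening result (rows) vs. gold standard (columns)."""
    return {
        "sensitivity": tp / (tp + fn),   # proportion of true cases detected
        "specificity": tn / (tn + fp),   # proportion of non-cases ruled out
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical upper-limb counts, for illustration only
print(screening_metrics(tp=120, fp=30, fn=10, tn=241))
```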

Relevance:

90.00%

Publisher:

Abstract:

Abstract: This paper studies the mathematics and language results of 32,000 students from Bogotá on the 2008 Saber 11 test. The analysis recognizes that individuals are nested within neighbourhoods and schools, but not all individuals from the same neighbourhood attend the same school, and vice versa. To model this data structure, several econometric models are used, including a cross-classified hierarchical multilevel regression. Our central objective is to identify to what extent, and under what conditions, neighbourhood and school characteristics correlate with the educational outcomes of the target population, and which neighbourhood and school characteristics are most strongly associated with test results. We use data from the Saber 11 test, the C600 school census, the 2005 population census and the Bogotá metropolitan police. Our estimates show that both the neighbourhood and the school are correlated with test results, but the school effect appears to be much stronger than the neighbourhood effect. The school characteristics most associated with test results are teacher education, the length of the school day, tuition fees, and the school's socio-economic context. The neighbourhood characteristics most associated with test results are the presence of university students in the UPZ, a cluster of high education levels, and the neighbourhood crime level, which is negatively correlated. These results hold after controlling for family and individual characteristics.
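A cross-classified multilevel regression of this kind can be sketched in Python with statsmodels, treating school and neighbourhood as crossed random intercepts via variance components; the data here are simulated and the column names hypothetical, not those of the Saber 11 files:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the student-level data
rng = np.random.default_rng(5)
n = 2000
df = pd.DataFrame({
    "school": rng.integers(0, 40, n),
    "neighborhood": rng.integers(0, 60, n),
    "ses": rng.normal(0, 1, n),
})
df["score"] = (2.0 * df["ses"]
               + rng.normal(0, 1.0, 40)[df["school"]]         # school effect
               + rng.normal(0, 0.5, 60)[df["neighborhood"]]   # neighbourhood effect
               + rng.normal(0, 1.0, n))                       # individual noise

# Crossed random intercepts: one grouping column, two variance components
df["one"] = 1
vc = {"school": "0 + C(school)", "neighborhood": "0 + C(neighborhood)"}
model = smf.mixedlm("score ~ ses", df, groups="one", vc_formula=vc, re_formula="0")
print(model.fit().summary())
```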