851 results for Association analysis


Relevance:

30.00%

Abstract:

R (http://www.r-project.org/) is ‘GNU S’, a language and environment for statistical computing and graphics. Many classical and modern statistical techniques have been implemented in this environment, many of them supplied as packages. There are 8 standard packages, and many more are available through the CRAN family of Internet sites (http://cran.r-project.org). We have started to develop a library of R functions to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computation of Aitchison, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, percentile lines, marking and coloring of subsets of the data set and their geometric means, annotation of individual data points, etc.); dealing with zeros and missing values in compositional data sets, with R procedures for simple and multiplicative replacement strategies; and time series analysis of compositional data. We will present the current status of MixeR development and illustrate its use on selected data sets.
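
As a rough illustration of the basic operations listed above, the following Python sketch implements closure, perturbation, power multiplication, the centred log-ratio (clr) transform, the Aitchison distance as the Euclidean distance between clr vectors, and centering by the closed geometric mean. It is not MixeR (an R package) and does not mirror its API; all function names are hypothetical. It assumes strictly positive parts, so zeros must be replaced first.

```python
import numpy as np

def closure(x, total=1.0):
    """Rescale a vector of positive parts so that they sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum()

def perturb(x, y):
    """Perturbation: the compositional analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(alpha, x):
    """Power multiplication: the compositional analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** alpha)

def clr(x):
    """Centred log-ratio transform of a single composition."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Aitchison distance = Euclidean distance between the clr vectors."""
    return float(np.linalg.norm(clr(x) - clr(y)))

def centre(data):
    """Centre a data set: perturb every row by the inverse of the closed geometric mean,
    so that the centred data have the barycentre as their geometric mean."""
    data = np.asarray(data, dtype=float)
    g = closure(np.exp(np.log(data).mean(axis=0)))
    return np.array([perturb(row, 1.0 / g) for row in data])
```

For example, `perturb(closure([1, 2, 3]), closure([3, 2, 1]))` gives the closed composition (0.3, 0.4, 0.3).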

Relevance:

30.00%

Abstract:

The low levels of unemployment recorded in the UK in recent years are widely cited as evidence of the country’s improved economic performance, and the apparent convergence of unemployment rates across the country’s regions is used to suggest that the longstanding divide in living standards between the relatively prosperous ‘south’ and the more depressed ‘north’ has been substantially narrowed. Dissenters from these conclusions have drawn attention to the greatly increased extent of non-employment (around a quarter of the UK’s working-age population are not in employment) and the marked regional dimension in its distribution across the country. Amongst these dissenters it is generally agreed that non-employment is concentrated amongst older males previously employed in the now much smaller ‘heavy’ industries (e.g. coal, steel, shipbuilding). This paper uses the tools of compositional data analysis to provide a much richer picture of non-employment, one which challenges the conventional wisdom about UK labour market performance as well as the dissenters’ view of the nature of the problem. It is shown that, associated with the striking ‘north/south’ divide in non-employment rates, there is a statistically significant relationship between the size of the non-employment rate and the composition of non-employment. Specifically, the share of unemployment in non-employment is negatively correlated with the overall non-employment rate: in regions where the non-employment rate is high, the share of unemployment is relatively low. The unemployment rate is therefore not a very reliable indicator of regional disparities in labour market performance. Even more importantly from a policy viewpoint, a significant positive relationship is found between the size of the non-employment rate and the share of those not employed by reason of sickness or disability, and it seems (contrary to the dissenters) that this connection is just as strong for women as it is for men.

Relevance:

30.00%

Abstract:

There are two principal chemical concepts that are important for studying the natural environment. The first is thermodynamics, which describes whether a system is at equilibrium or can spontaneously change by chemical reactions. The second is how fast chemical reactions take place once they start (kinetics, or rate of chemical change). In this work we examine a natural system in which both thermodynamic and kinetic factors are important in determining the abundance of NH4+, NO2− and NO3− in surface waters. Samples were collected in the Arno Basin (Tuscany, Italy), a system in which natural and anthropogenic effects both contribute to strongly modifying the chemical composition of water. Thermodynamic modelling based on the reduction-oxidation reactions involving the sequence NH4+ → NO2− → NO3− under equilibrium conditions made it possible to determine the Eh redox potential values that characterise the state of each sample and, consequently, of the fluid environment from which it was drawn. Just as pH expresses the concentration of H+ in solution, redox potential expresses the tendency of an environment to receive or supply electrons. In this context, oxic environments, such as river systems, are said to have a high redox potential because O2 is available as an electron acceptor. Principles of thermodynamics and chemical kinetics yield a model that often does not completely describe the reality of natural systems. Chemical reactions may indeed fail to reach equilibrium because the products escape from the reaction site or because the reactions involved in the transformation are very slow, so that non-equilibrium conditions persist for long periods. Moreover, reaction rates can be sensitive to poorly understood catalytic or surface effects, while variables such as concentration (a large number of chemical species can coexist and interact concurrently), temperature and pressure can have large gradients in natural systems.
Taking this into account, data from 91 water samples have been modelled using statistical methodologies for compositional data. Log-contrast analysis yielded statistical parameters that could be correlated with the calculated Eh values. In this way, natural conditions in which chemical equilibrium is hypothesised, as well as underlying fast reactions, are compared with those described by a stochastic approach.
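
To make the comparison concrete, the sketch below pairs the Nernst equation for a half-reaction Ox + m H+ + n e− ⇌ Red with a log-contrast, i.e. a linear combination of log-parts whose coefficients sum to zero. It is an assumption-laden illustration, not the authors' model: the standard potential E0 must come from a thermodynamic database (none is given here), concentrations are taken as activities, and the function names are hypothetical. For the NO3−/NO2− couple (NO3− + 2H+ + 2e− ⇌ NO2− + H2O) use n = 2, m = 2; for NO2−/NH4+ (NO2− + 8H+ + 6e− ⇌ NH4+ + 2H2O) use n = 6, m = 8.

```python
import math
import numpy as np

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def nernst_eh(e0, n_electrons, a_ox, a_red, n_protons, ph, temp_k=298.15):
    """Eh (V) for the half-reaction  Ox + m H+ + n e-  <=>  Red + k H2O.

    Assumes water activity = 1 and that concentrations approximate activities;
    e0 (V) is a user-supplied standard potential.
    """
    slope = R * temp_k * math.log(10) / (n_electrons * F)
    return e0 + slope * (math.log10(a_ox / a_red) - n_protons * ph)

def log_contrast(parts, coeffs):
    """Row-wise sum_i a_i * log(x_i); the coefficients a_i must sum to zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    if not np.isclose(coeffs.sum(), 0.0):
        raise ValueError("log-contrast coefficients must sum to zero")
    return np.log(np.asarray(parts, dtype=float)) @ coeffs
```

With columns ordered (NH4+, NO2−, NO3−), the coefficients (−1, 0, 1) give log(NO3−/NH4+). Because the coefficients sum to zero, the value does not change if a sample is rescaled, and per-sample values could then be set against per-sample Eh, for instance with `np.corrcoef`.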

Relevance:

30.00%

Abstract:

Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data. Many of the issues that are discussed with reference to the statistical analysis of compositional data have a natural counterpart in the construction of a Bayesian statistical model for categorical data. This note builds on the idea of cross-fertilization between the two areas recommended by Aitchison (1986) in his seminal book on compositional data. Particular emphasis is put on the problem of which parameterization to use.
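
A standard example of a compositional random vector in Bayesian categorical analysis is the Dirichlet prior on category probabilities, which is conjugate to the multinomial likelihood. The sketch below (hypothetical counts and function names) shows the conjugate update and then re-expresses posterior draws in additive log-ratio (alr) coordinates, one possible alternative parameterization; it is not taken from the note itself.

```python
import numpy as np

def dirichlet_posterior(alpha_prior, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts)."""
    return np.asarray(alpha_prior, dtype=float) + np.asarray(counts, dtype=float)

def dirichlet_mean(alpha):
    """Mean of a Dirichlet(alpha) vector, itself a composition."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()

# Hypothetical example: uniform Dirichlet(1, 1, 1) prior, 20 observations in 3 categories.
post = dirichlet_posterior([1, 1, 1], [12, 5, 3])
rng = np.random.default_rng(0)
draws = rng.dirichlet(post, size=5000)   # each row is a composition summing to 1

# The same posterior in alr coordinates (last category as reference):
# in this parameterization the simplex constraint disappears.
alr_draws = np.log(draws[:, :-1] / draws[:, -1:])
```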

Relevance:

30.00%

Abstract:

At CoDaWork'03 we presented work on the analysis of archaeological glass compositional data. Such data typically consist of geochemical compositions involving 10-12 variables and approximate fully compositional data if the main component, silica, is included. We suggested that what has been termed 'crude' principal component analysis (PCA) of standardized data often identified interpretable patterns in the data more readily than analyses based on log-ratio transformed data (LRA). The fundamental problem is that, in LRA, minor oxides with high relative variation, which may not be structure-carrying, can dominate an analysis and obscure patterns associated with variables present at higher absolute levels. We investigate this further using subcompositional data relating to archaeological glasses found on Israeli sites. A simple model for glass-making is that it is based on a 'recipe' consisting of two 'ingredients', sand and a source of soda. Our analysis focuses on the subcomposition of components associated with the sand source. A 'crude' PCA of standardized data shows two clear compositional groups that can be interpreted in terms of different recipes being used at different periods, reflected in absolute differences in the composition. LRA can be undertaken either by normalizing the data or by defining a 'residual'. In either case, after some 'tuning', these groups are recovered. The results from the normalized LRA are interpreted differently, as showing that the source of sand used to make the glass differed. These results are complementary: one relates to the recipe used, the other to the composition (and presumed sources) of one of the ingredients. It seems to be axiomatic in some expositions of LRA that statistical analysis of compositional data should focus on relative variation via the use of ratios. Our analysis suggests that absolute differences can also be informative.
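
The contrast between the two approaches can be sketched in a few lines of NumPy. 'Crude' PCA standardizes each raw column to zero mean and unit variance; log-ratio analysis is taken here to mean PCA of clr-transformed rows, which is one common variant and an assumption, since the abstract's 'normalized' and 'residual' variants and the 'tuning' step are not specified. Because each clr row sums to zero, the last principal component of the clr data always carries (numerically) zero variance. Function names are hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centred matrix.

    Returns the first `n_components` score columns and the variance ratios of all components.
    """
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return U[:, :n_components] * s[:n_components], var / var.sum()

def crude_pca(X, n_components=2):
    """'Crude' PCA: z-score each raw column (assumes no constant columns), then PCA."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return pca(Z, n_components)

def clr_pca(X, n_components=2):
    """Log-ratio analysis: clr-transform each row (positive parts only), then PCA without rescaling."""
    L = np.log(np.asarray(X, dtype=float))
    return pca(L - L.mean(axis=1, keepdims=True), n_components)
```

Standardization gives every oxide equal weight in absolute terms, while the clr route weights components by their relative (log-scale) variation, which is why minor oxides with high relative variation can dominate it.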

Relevance:

30.00%

Abstract:

The main instrument used in psychological measurement is the self-report questionnaire. One of its major drawbacks, however, is its susceptibility to response biases. A known strategy to control these biases has been the use of so-called ipsative items. Ipsative items are items that require the respondent to make between-scale comparisons within each item. The selected option determines to which scale the weight of the answer is attributed. Consequently, in questionnaires consisting only of ipsative items, every respondent is allotted the same total score, which each respondent can distribute differently over the scales. This type of response format therefore yields data that can be considered compositional from their inception. Methodologically oriented psychologists have heavily criticized this item format, since the resulting data are also marked by the associated unfavourable statistical properties. Nevertheless, clinicians have kept using these questionnaires to their satisfaction. This investigation therefore aims to evaluate both positions and addresses the similarities and differences between the two data collection methods. The ultimate objective is to formulate a guideline on when to use which type of item format. The comparison is based on data obtained with both an ipsative and a normative version of three psychological questionnaires, which were administered to 502 first-year psychology students according to a balanced within-subjects design. Previous research only compared the direct ipsative scale scores with the derived ipsative scale scores. The use of compositional data analysis techniques also enables one to compare derived normative score ratios with direct normative score ratios. Adding this second comparison not only yields a better-balanced research strategy but, in principle, also allows for parametric testing in the evaluation.
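
The 'unfavourable statistical properties' of ipsative data follow directly from the constant total: for every scale i, the sum over j of cov(X_i, X_j) equals cov(X_i, total) = 0, so the covariance matrix of the raw scale scores is singular. The simulation below uses hypothetical sizes and Dirichlet-generated scores purely to demonstrate this; it does not reproduce the study's data. Additive log-ratio scores (one scale as reference, assuming strictly positive scores) give D − 1 coordinates that are free of the sum constraint, as in compositional data analysis.

```python
import numpy as np

rng = np.random.default_rng(42)
n_resp, n_scales, total = 200, 4, 30   # hypothetical sizes, not the study's design

# Simulated ipsative scores: each respondent distributes `total` points over the scales.
ipsative = total * rng.dirichlet(np.ones(n_scales), size=n_resp)

# Row sums of the covariance matrix vanish because sum_j cov(X_i, X_j) = cov(X_i, total) = 0.
cov = np.cov(ipsative, rowvar=False)
cov_row_sums = cov.sum(axis=1)

# Additive log-ratio scores (last scale as reference): D - 1 unconstrained coordinates.
alr_scores = np.log(ipsative[:, :-1] / ipsative[:, -1:])
```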

Relevance:

30.00%

Abstract:

A recent study reported an association between the brain natriuretic peptide (BNP) promoter T-381C polymorphism (rs198389) and protection against type 2 diabetes (T2D). As replication in several studies is mandatory to confirm genetic results, we analyzed the T-381C polymorphism in seven independent case-control cohorts and in 291 T2D-enriched pedigrees totalling 39 557 subjects of European origin. A meta-analysis of the seven case-control studies (n = 39 040) showed a nominal protective effect [odds ratio (OR) = 0.86 (0.79-0.94), P = 0.0006] of the CC genotype on T2D risk, consistent with the previous study. By combining all available data (n = 49 279), we further confirmed a modest contribution of the BNP T-381C polymorphism to protection against T2D [OR = 0.86 (0.80-0.92), P = 1.4 × 10⁻⁵]. Potential confounders such as gender, age, obesity status or family history were tested in 4335 T2D and 4179 normoglycemic subjects and had no influence on T2D risk. This study provides further evidence of a modest contribution of the BNP T-381C polymorphism to protection against T2D and illustrates the difficulty of unambiguously proving modest-sized associations even with large sample sizes.
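
The reported odds ratios can be checked with the usual normal approximation on the log scale: the standard error of log(OR) is recovered from the width of the 95% interval, and z = log(OR)/SE. The sketch below also includes a generic inverse-variance fixed-effect pooling function; the per-cohort estimates are not given in the abstract, and whether the paper pooled with fixed or random effects is an assumption here. Function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.959964):
    """Standard error of log(OR) implied by a 95% confidence interval on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def fixed_effect(log_ors, ses):
    """Inverse-variance fixed-effect pooling of log odds ratios; returns (pooled, se)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Back-calculation for the reported case-control meta-analysis: OR = 0.86 (0.79-0.94).
se = se_from_ci(0.79, 0.94)
z = math.log(0.86) / se
p = two_sided_p(z)
```

For OR = 0.86 (0.79-0.94) this gives SE ≈ 0.044, z ≈ −3.40 and a two-sided p of roughly 7 × 10⁻⁴, close to the reported P = 0.0006; the small gap is plausibly due to rounding in the published interval.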

Relevance:

30.00%

Abstract:

Most of the economic literature has presented its analysis under the assumption of a homogeneous capital stock. However, capital composition differs across countries. What has been the pattern of capital composition associated with world economies? We make an exploratory statistical analysis based on compositional data transformed by Aitchison log-ratio transformations, and we use tools for visualizing and measuring statistical estimators of association among the components. The goal is to detect distinctive patterns in the composition. Initial findings include the following:
1. Sectoral components behaved in a correlated way, with building industries on one side and, less clearly, equipment industries on the other.
2. Full-sample estimation shows a negative correlation between the durable goods component and the other buildings component, and between the transportation and building industries components.
3. Countries with zeros in some components are mainly low-income countries at the bottom of the income category; they behaved in an extreme way, distorting the main results observed in the full sample.
4. After removing these extreme cases, the conclusions do not seem very sensitive to the presence of other isolated cases.
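
One standard tool for measuring association among parts after log-ratio transformation is Aitchison's variation matrix, whose entry (i, j) is the variance of log(x_i/x_j); a value near zero means the two components move proportionally. The sketch below is an illustration with a hypothetical function name, not the authors' code. It also shows why the zero-containing countries in point 3 need special handling: any zero makes the log-ratios undefined, so such rows must be removed or their zeros replaced before this step.

```python
import numpy as np

def variation_matrix(X):
    """Aitchison's variation matrix: T[i, j] = var(log(x_i / x_j)) over the rows of X.

    X must be strictly positive (rows = countries, columns = capital components).
    """
    L = np.log(np.asarray(X, dtype=float))
    D = L.shape[1]
    T = np.zeros((D, D))
    for i in range(D):
        for j in range(D):
            T[i, j] = np.var(L[:, i] - L[:, j], ddof=1)
    return T
```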

Relevance:

30.00%

Abstract:

Adiponectin has a variety of metabolic effects on obesity, insulin sensitivity, and atherosclerosis. To identify genes influencing variation in plasma adiponectin levels, we performed genome-wide linkage and association scans of adiponectin in two cohorts of subjects recruited in the Genetic Epidemiology of Metabolic Syndrome Study. The genome-wide linkage scan was conducted in families of Turkish and southern European (TSE, n = 789) and northern and western European (NWE, n = 2,280) origin. A whole genome association (WGA) analysis (500K Affymetrix platform) was carried out in a set of unrelated NWE subjects consisting of approximately 1,000 subjects with dyslipidemia and 1,000 overweight subjects with normal lipids. Peak evidence for linkage occurred at chromosome 8p23 in NWE subjects (LOD = 3.10) and at chromosome 3q28 near ADIPOQ, the adiponectin structural gene, in TSE subjects (LOD = 1.70). In the WGA analysis, the single-nucleotide polymorphisms (SNPs) most strongly associated with adiponectin were rs3774261 and rs6773957 (P < 10⁻⁷). These two SNPs were in high linkage disequilibrium (r² = 0.98) and located within ADIPOQ. Interestingly, our fourth strongest region of association (P < 2 × 10⁻⁵) was with an SNP within CDH13, whose protein product is a newly identified receptor for high-molecular-weight species of adiponectin. Through WGA analysis, we confirmed previous studies showing SNPs within ADIPOQ to be strongly associated with variation in adiponectin levels, and further observed these to have the strongest effects on adiponectin levels throughout the genome. We additionally identified a second gene (CDH13) possibly influencing variation in adiponectin levels. The impact of these SNPs on health and disease has yet to be determined.
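
The linkage disequilibrium measure r² quoted above is defined from the haplotype frequency p_AB and the allele frequencies p_A and p_B of two biallelic SNPs: D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The minimal sketch below (hypothetical function name) assumes the haplotype frequencies are already known; in practice they are estimated from unphased genotypes, for example by an EM algorithm, which is not shown.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs from the A-B haplotype frequency and allele frequencies."""
    d = p_ab - p_a * p_b                     # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

For example, when the A and B alleles always occur together (p_AB = p_A = p_B), r² equals 1; when the haplotype frequency equals the product of the allele frequencies, r² is 0.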

Relevance:

30.00%

Abstract:

Usually, psychometricians apply classical factor analysis to evaluate the construct validity of order rank scales. Nevertheless, these scales have particular characteristics that must be taken into account: total scores and rank are highly relevant.

Relevance:

30.00%

Abstract:

Caveolae are involved in the physical compartmentalization of different groups of signaling events. Their main component, CAV1, modulates different pathways in cellular physiology. The emerging evidence pointing to a role of CAV1 in cancer led us to study whether different alleles of this gene are associated with colorectal cancer (CRC). Since one of the best-characterized enzymes regulated by CAV1 is eNOS, we decided to include both genes in this study. We analyzed five SNPs in 360 unrelated CRC patients and 550 controls from the general population; two of these SNPs were located within eNOS and three within CAV1. Although haplotype distribution was not associated with CRC, haplotype TiA (CAV1) was associated with familial forms of CRC (p < 0.05). This was especially evident in CRC antecedents and nuclear forms of CRC. When the CG (eNOS) and TiA (CAV1) haplotypes were taken together, this association increased in significance. Thus, we propose that CAV1, either alone or together with eNOS alleles, might modify CRC heritability.
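
The abstract does not say which test produced p < 0.05. As one common option, assumed here, a Pearson chi-square test on a 2 × 2 table (for example, familial versus sporadic cases by carriers versus non-carriers of a haplotype) can be computed with the standard library alone, using the fact that a chi-square variable with one degree of freedom is the square of a standard normal. The table layout and function name are hypothetical, and haplotype analyses often count chromosomes rather than individuals.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # If X ~ chi-square(1) then X = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p
```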

Relevance:

30.00%

Abstract:

Isotopic data are becoming an important source of information regarding the sources, evolution and mixing processes of water in hydrogeologic systems. However, it is not clear how to treat geochemical and isotopic data together statistically. We propose to introduce the isotopic information as new parts and to apply compositional data analysis to the resulting augmented composition. The results are equivalent to downscaling the classical isotopic delta variables, because these are already relative (as required in the compositional framework) and isotopic variations are almost always very small. This methodology is illustrated and tested with a study of the Llobregat River Basin (Barcelona, NE Spain), where it is shown that, though very small, isotopic variations complement geochemical principal components and help to better identify pollution sources.
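
Why isotopic parts fit naturally into a log-ratio framework can be shown with the delta notation, δ = (R_sample/R_standard − 1) × 1000 ‰, where R is an isotope ratio. The log-ratio of sample to standard is ln(1 + δ/1000), which for the small variations typical of isotopes is very close to δ/1000, i.e. a downscaled delta. The sketch below illustrates that identity with hypothetical function names; it is not the authors' procedure for building the augmented composition.

```python
import math

def delta_permil(r_sample, r_standard):
    """Classical delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0

def log_ratio_from_delta(delta):
    """ln(R_sample / R_standard) expressed through delta; about delta / 1000 when delta is small."""
    return math.log1p(delta / 1000.0)
```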

Relevance:

30.00%

Abstract:

A compositional time series is obtained when a compositional data vector is observed at different points in time. Inherently, then, a compositional time series is a multivariate time series with important constraints on the variables observed at any instant. Although this type of data frequently occurs in situations of real practical interest, a trawl through the statistical literature reveals that research in the field is very much in its infancy and that many theoretical and empirical issues still remain to be addressed. Any appropriate statistical methodology for the analysis of compositional time series must take into account the constraints, which are not allowed for by the usual statistical techniques for analysing multivariate time series. One general approach to analysing compositional time series consists of applying an initial transform to break the positivity and unit-sum constraints, followed by the analysis of the transformed time series using multivariate ARIMA models. In this paper we discuss the use of the additive log-ratio, centred log-ratio and isometric log-ratio transforms. We also present results from an empirical study designed to explore how the selection of the initial transform affects subsequent multivariate ARIMA modelling as well as the quality of the forecasts.
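
The three transforms discussed can be sketched as follows, treating each row as the composition observed at one time point. The ilr version uses one particular orthonormal (Helmert-type) basis, which is an assumption: any orthonormal basis of the clr hyperplane would do, and the ilr coordinates depend on that choice. A multivariate ARIMA model would then be fitted to the transformed series and forecasts mapped back with the inverse transform; that modelling step is not shown, and the function names are hypothetical.

```python
import numpy as np

def alr(x):
    """Additive log-ratio: log of each part relative to the last part."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centred log-ratio: log parts minus their mean log (coordinates sum to zero)."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx - lx.mean(axis=-1, keepdims=True)

def helmert_basis(D):
    """(D-1) x D matrix whose rows are an orthonormal basis of the clr hyperplane."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))   # normalise each row to unit length
    return V

def ilr(x):
    """Isometric log-ratio: clr coordinates expressed in the orthonormal basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ helmert_basis(x.shape[-1]).T

def ilr_inv(z):
    """Map ilr coordinates back to closed compositions."""
    z = np.asarray(z, dtype=float)
    y = np.exp(z @ helmert_basis(z.shape[-1] + 1))
    return y / y.sum(axis=-1, keepdims=True)
```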

Relevance:

30.00%

Abstract:

Colorectal cancer is one of the most prevalent cancers in developed countries. However, the genetic factors influencing its appearance remain far from being fully characterized. Recently, a G>A functional transition located in the 3' untranslated region of the CXCL12 gene (rs1801157) has been found to be under-represented among rectal cancer patients compared with colon cancer patients from a Swedish series. Here we present the results of an independent analysis of CXCL12 rs1801157 in a larger CRC series of Spanish origin, to assess the robustness of this association in a different European population. No significant difference was observed between controls and colon or rectal cancer patients. We were also unable to find a correlation between rs1801157 and different prognostic markers such as metastasis development or disease-free survival time. The epidemiologic data involving CXCL12 rs1801157 in colorectal cancer risk are discussed.

Relevance:

30.00%

Abstract:

A joint distribution of two discrete random variables with finite support can be displayed as a two-way table of probabilities adding to one. Assume that this table has n rows and m columns and that all probabilities are non-null. Such a table can be seen as an element of the simplex of n · m parts. In this context, the marginals are identified as compositional amalgams, and the conditionals (rows or columns) as subcompositions. Also, simplicial perturbation appears as Bayes' theorem. Moreover, the Euclidean elements of the Aitchison geometry of the simplex (subspaces, orthogonal projections, distances) can also be translated into the table of probabilities. Two important questions are addressed: (a) given a table of probabilities, which is the nearest independent table to the initial one? (b) which is the largest orthogonal projection of a row onto a column? Or, equivalently, which is the information in a row explained by a column, thus explaining the interaction? To answer these questions, three orthogonal decompositions are presented: (1) by columns and a row-wise geometric marginal, (2) by rows and a column-wise geometric marginal, and (3) by independent two-way tables and fully dependent tables representing row-column interaction. An important result is that the nearest independent table is the product of the two (row- and column-wise) geometric marginal tables. A corollary is that, in an independent table, the geometric marginals conform with the traditional (arithmetic) marginals. These decompositions can be compared with standard log-linear models.

Key words: balance, compositional data, simplex, Aitchison geometry, composition, orthonormal basis, arithmetic and geometric marginals, amalgam, dependence measure, contingency table
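
The main result can be checked numerically. The sketch below (hypothetical function names) computes the row-wise and column-wise geometric marginals, forms the nearest independent table as the closed outer product of the two, and obtains the interaction table as log P minus its additive row and column parts, which equals the clr of P minus the clr of the nearest independent table. Its Frobenius norm is then the Aitchison distance from P to independence.

```python
import numpy as np

def closure(t):
    """Rescale a positive array so that its entries sum to one."""
    t = np.asarray(t, dtype=float)
    return t / t.sum()

def geometric_marginals(P):
    """Closed row-wise and column-wise geometric marginals of a positive two-way table."""
    L = np.log(np.asarray(P, dtype=float))
    rows = closure(np.exp(L.mean(axis=1)))   # geometric mean across each row
    cols = closure(np.exp(L.mean(axis=0)))   # geometric mean down each column
    return rows, cols

def nearest_independent(P):
    """Nearest independent table in the Aitchison geometry: closed product of geometric marginals."""
    rows, cols = geometric_marginals(P)
    return closure(np.outer(rows, cols))

def interaction(P):
    """log P minus its additive row and column parts (the double-centred log table)."""
    L = np.log(np.asarray(P, dtype=float))
    return L - L.mean(axis=1, keepdims=True) - L.mean(axis=0, keepdims=True) + L.mean()
```

For an independent table the interaction is zero and the closed geometric marginals coincide with the ordinary row and column sums, matching the corollary stated in the abstract.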