933 results for Statistical factora analysis


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

R, from http://www.r-project.org/, is 'GNU S', a language and environment for statistical computing and graphics. Many classical and modern statistical techniques have been implemented in this environment, but many are supplied as packages. There are 8 standard packages and many more are available through the CRAN family of Internet sites, http://cran.r-project.org. We started to develop a library of functions in R to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison's, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, the percentile lines, marking and coloring of subsets of the data set, their geometric means, notation of individual data in the set, ...); dealing with zeros and missing values in compositional data sets with R procedures for simple and multiplicative replacement strategies; and the time series analysis of compositional data. We'll present the current status of MixeR development and illustrate its use on selected data sets.
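
The operations on compositions named in this abstract can be illustrated with a minimal sketch. It is written in Python rather than R, it is not the MixeR package, and every function name below is hypothetical. It assumes strictly positive parts (no zeros) and uses the usual definitions: perturbation is a component-wise product followed by closure, power multiplication is a component-wise power followed by closure, and the Aitchison distance is the Euclidean distance between centred log-ratio vectors.

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def perturb(x, y):
    """Perturbation: component-wise product, then closure."""
    return closure([a * b for a, b in zip(x, y)])

def power(x, alpha):
    """Power multiplication: component-wise power, then closure."""
    return closure([a ** alpha for a in x])

def clr(x):
    """Centred log-ratio: log of each part minus the mean log."""
    logs = [math.log(a) for a in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Illustrative three-part compositions (made-up values).
x = closure([1, 2, 7])
y = closure([3, 3, 4])
shift = closure([5, 1, 1])

print(perturb(x, shift))
print(power(x, 2.0))
print(aitchison_distance(x, y))
```

A useful sanity check on such a sketch is that the Aitchison distance does not change when both compositions are perturbed by the same vector, and that it is zero between a composition and any rescaled copy of itself.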

Relevance:

30.00% 30.00%

Publisher:

Abstract:

There are two principal chemical concepts that are important for studying the natural environment. The first one is thermodynamics, which describes whether a system is at equilibrium or can spontaneously change by chemical reactions. The second main concept is how fast chemical reactions (kinetics or rate of chemical change) take place whenever they start. In this work we examine a natural system in which both thermodynamic and kinetic factors are important in determining the abundance of NH4+, NO2− and NO3− in superficial waters. Samples were collected in the Arno Basin (Tuscany, Italy), a system in which natural and anthropic effects both strongly modify the chemical composition of water. Thermodynamic modelling based on the reduction-oxidation reactions involving the passage NH4+ → NO2− → NO3− in equilibrium conditions has made it possible to determine the Eh redox potential values able to characterise the state of each sample and, consequently, of the fluid environment from which it was drawn. Just as pH expresses the concentration of H+ in solution, redox potential is used to express the tendency of an environment to receive or supply electrons. In this context, oxic environments, such as those of river systems, are said to have a high redox potential because O2 is available as an electron acceptor. Principles of thermodynamics and chemical kinetics make it possible to obtain a model that often does not completely describe the reality of natural systems. Chemical reactions may indeed fail to achieve equilibrium because the products escape from the site of the reaction or because reactions involving the transformation are very slow, so that non-equilibrium conditions exist for long periods. Moreover, reaction rates can be sensitive to poorly understood catalytic effects or to surface effects, while variables such as concentration (a large number of chemical species can coexist and interact concurrently), temperature and pressure can have large gradients in natural systems.
Taking this into account, data from 91 water samples have been modelled using statistical methodologies for compositional data. The application of log-contrast analysis has yielded statistical parameters to be correlated with the calculated Eh values. In this way, natural conditions in which chemical equilibrium is hypothesised, as well as underlying fast reactions, are compared with those described by a stochastic approach.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data. Many of the issues that are discussed with reference to the statistical analysis of compositional data have a natural counterpart in the construction of a Bayesian statistical model for categorical data. This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986) in his seminal book on compositional data. Particular emphasis is put on the problem of what parameterization to use.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

At CoDaWork'03 we presented work on the analysis of archaeological glass compositional data. Such data typically consist of geochemical compositions involving 10-12 variables and approximate completely compositional data if the main component, silica, is included. We suggested that what has been termed 'crude' principal component analysis (PCA) of standardized data often identified interpretable pattern in the data more readily than analyses based on log-ratio transformed data (LRA). The fundamental problem is that, in LRA, minor oxides with high relative variation, that may not be structure carrying, can dominate an analysis and obscure pattern associated with variables present at higher absolute levels. We investigate this further using sub-compositional data relating to archaeological glasses found on Israeli sites. A simple model for glass-making is that it is based on a 'recipe' consisting of two 'ingredients', sand and a source of soda. Our analysis focuses on the sub-composition of components associated with the sand source. A 'crude' PCA of standardized data shows two clear compositional groups that can be interpreted in terms of different recipes being used at different periods, reflected in absolute differences in the composition. LRA analysis can be undertaken either by normalizing the data or defining a 'residual'. In either case, after some 'tuning', these groups are recovered. The results from the normalized LRA are differently interpreted as showing that the source of sand used to make the glass differed. These results are complementary. One relates to the recipe used. The other relates to the composition (and presumed sources) of one of the ingredients. It seems to be axiomatic in some expositions of LRA that statistical analysis of compositional data should focus on relative variation via the use of ratios. Our analysis suggests that absolute differences can also be informative.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The main instrument used in psychological measurement is the self-report questionnaire. One of its major drawbacks, however, is its susceptibility to response biases. A known strategy to control these biases has been the use of so-called ipsative items. Ipsative items are items that require the respondent to make between-scale comparisons within each item. The selected option determines to which scale the weight of the answer is attributed. Consequently, in questionnaires consisting only of ipsative items, every respondent is allotted an equal amount, i.e. the total score, that each can distribute differently over the scales. Therefore this type of response format yields data that can be considered compositional from its inception. Methodologically oriented psychologists have heavily criticized this type of item format, since the resulting data is also marked by the associated unfavourable statistical properties. Nevertheless, clinicians have kept using these questionnaires to their satisfaction. This investigation therefore aims to evaluate both positions and addresses the similarities and differences between the two data collection methods. The ultimate objective is to formulate a guideline on when to use which type of item format. The comparison is based on data obtained with both an ipsative and a normative version of three psychological questionnaires, which were administered to 502 first-year students in psychology according to a balanced within-subjects design. Previous research only compared the direct ipsative scale scores with the derived ipsative scale scores. The use of compositional data analysis techniques also enables one to compare derived normative score ratios with direct normative score ratios. The addition of the second comparison not only offers the advantage of a better-balanced research strategy. In principle it also allows for parametric testing in the evaluation.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Over the last decade, the development of statistical models in support of forensic fingerprint identification has been the subject of increasing research attention, spurred on recently by commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. Such models are increasingly seen as useful tools in support of the fingerprint identification process within or in addition to the ACE-V framework. This paper provides a critical review of recent statistical models from both a practical and theoretical perspective. This includes analysis of models of two different methodologies: Probability of Random Correspondence (PRC) models that focus on calculating probabilities of the occurrence of fingerprint configurations for a given population, and Likelihood Ratio (LR) models which use analysis of corresponding features of fingerprints to derive a likelihood value representing the evidential weighting for a potential source.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A compositional time series is obtained when a compositional data vector is observed at different points in time. Inherently, then, a compositional time series is a multivariate time series with important constraints on the variables observed at any instance in time. Although this type of data frequently occurs in situations of real practical interest, a trawl through the statistical literature reveals that research in the field is very much in its infancy and that many theoretical and empirical issues still remain to be addressed. Any appropriate statistical methodology for the analysis of compositional time series must take into account the constraints which are not allowed for by the usual statistical techniques available for analysing multivariate time series. One general approach to analyzing compositional time series consists in the application of an initial transform to break the positive and unit sum constraints, followed by the analysis of the transformed time series using multivariate ARIMA models. In this paper we discuss the use of the additive log-ratio, centred log-ratio and isometric log-ratio transforms. We also present results from an empirical study designed to explore how the selection of the initial transform affects subsequent multivariate ARIMA modelling as well as the quality of the forecasts.
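
As a rough illustration of the three initial transforms named in this abstract, the following Python sketch applies the additive (alr), centred (clr) and isometric (ilr) log-ratio transforms to a single composition. It is not taken from the paper. The function names are hypothetical, the alr uses the last part as divisor by assumption, and the ilr uses one common choice of orthonormal basis (sequential "pivot" balances); other bases are equally valid. In a time series setting each observed composition would be transformed this way before fitting a multivariate ARIMA model.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    s = sum(parts)
    return [p / s for p in parts]

def alr(x):
    """Additive log-ratio: log of each part over the last part (D-1 values)."""
    return [math.log(a / x[-1]) for a in x[:-1]]

def alr_inverse(z):
    """Map alr coordinates back to a composition on the unit simplex."""
    return closure([math.exp(v) for v in z] + [1.0])

def clr(x):
    """Centred log-ratio: log of each part minus the mean log (D values, sum 0)."""
    logs = [math.log(a) for a in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def ilr(x):
    """Isometric log-ratio in a pivot basis (assumed choice of basis).

    Coordinate i contrasts the mean log of the first i parts with
    the log of part i+1, scaled by sqrt(i / (i + 1)).
    """
    logs = [math.log(a) for a in x]
    coords = []
    for i in range(1, len(x)):
        mean_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords

# One made-up observation of a four-part composition.
x = closure([0.1, 0.2, 0.3, 0.4])
print(alr(x))
print(clr(x))
print(ilr(x))
```

The alr and ilr vectors have one fewer coordinate than the composition and are unconstrained, which is what makes them suitable inputs for standard multivariate models; the clr keeps all parts but its coordinates sum to zero.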

Relevance:

30.00% 30.00%

Publisher:

Abstract:

INTRODUCTION The aim of the present study was to investigate the possible role of CD40 and CD40 ligand (CD40LG) genes in the susceptibility and phenotype expression of systemic sclerosis (SSc). METHODS In total, 2,670 SSc patients and 3,245 healthy individuals from four European populations (Spain, Germany, The Netherlands, and Italy) were included in the study. Five single-nucleotide polymorphisms (SNPs) of CD40 (rs1883832, rs4810485, rs1535045) and CD40LG (rs3092952, rs3092920) were genotyped by using a predesigned TaqMan allele-discrimination assay technology. Meta-analysis was performed to determine whether an association exists between the genetic variants and SSc or its main clinical subtypes. RESULTS No evidence of association between CD40 and CD40LG gene variants and susceptibility to SSc was observed. Similarly, no statistically significant differences were observed when SSc patients were stratified by the clinical subtypes, the serologic features, and pulmonary fibrosis. CONCLUSIONS Our results do not suggest an important role of CD40 and CD40LG gene polymorphisms in the susceptibility to or clinical expression of SSc.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Alcohol and tobacco consumption are well-recognized risk factors for head and neck cancer (HNC). Evidence suggests that genetic predisposition may also play a role. Only a few epidemiologic studies, however, have considered the relation between HNC risk and family history of HNC and other cancers. We pooled individual-level data across 12 case-control studies including 8,967 HNC cases and 13,627 controls. We obtained pooled odds ratios (OR) using fixed and random effect models and adjusting for potential confounding factors. All statistical tests were two-sided. A family history of HNC in first-degree relatives increased the risk of HNC (OR=1.7, 95% confidence interval, CI, 1.2-2.3). The risk was higher when the affected relative was a sibling (OR=2.2, 95% CI 1.6-3.1) rather than a parent (OR=1.5, 95% CI 1.1-1.8) and for more distal HNC anatomic sites (hypopharynx and larynx). The risk was also higher in, or limited to, subjects exposed to tobacco. The OR rose to 7.2 (95% CI 5.5-9.5) among subjects with family history who were alcohol and tobacco users. A weak but significant association (OR=1.1, 95% CI 1.0-1.2) emerged for family history of other tobacco-related neoplasms, particularly with laryngeal cancer (OR=1.3, 95% CI 1.1-1.5). No association was observed for family history of nontobacco-related neoplasms and the risk of HNC (OR=1.0, 95% CI 0.9-1.1). Familial factors play a role in the etiology of HNC. In both subjects with and without family history of HNC, avoidance of tobacco and alcohol exposure may be the best way to avoid HNC.
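
For readers unfamiliar with how an odds ratio and its 95% confidence interval arise from case-control counts, here is a minimal Python sketch. It is not the authors' analysis: the abstract describes adjusted models on pooled individual-level data, whereas this sketch uses the textbook unadjusted 2x2 calculation (Woolf's log-based interval) and a generic inverse-variance fixed-effect pooling of log odds ratios. All counts are invented for illustration and the function names are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    Returns (OR, lower, upper, standard error of log OR).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se),
            se)

def pool_fixed_effect(tables, z=1.96):
    """Inverse-variance fixed-effect pooling of log odds ratios.

    `tables` is a list of (a, b, c, d) tuples, one per study.
    Returns (pooled OR, lower, upper, standard error of pooled log OR).
    """
    weights, weighted_logs = [], []
    for a, b, c, d in tables:
        or_, _, _, se = odds_ratio_ci(a, b, c, d, z)
        w = 1 / se ** 2
        weights.append(w)
        weighted_logs.append(w * math.log(or_))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se),
            pooled_se)

# Made-up counts for two hypothetical studies.
studies = [(20, 80, 10, 90), (35, 165, 25, 175)]
for table in studies:
    print(odds_ratio_ci(*table)[:3])
print(pool_fixed_effect(studies)[:3])
```

A random-effects model would additionally estimate between-study variance and add it to each study's variance before weighting; that step is omitted here.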

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Due to their performance enhancing properties, use of anabolic steroids (e.g. testosterone, nandrolone, etc.) is banned in elite sports. Therefore, doping control laboratories accredited by the World Anti-Doping Agency (WADA) screen among others for these prohibited substances in urine. It is particularly challenging to detect misuse with naturally occurring anabolic steroids such as testosterone (T), which is a popular ergogenic agent in sports and society. To screen for misuse with these compounds, drug testing laboratories monitor the urinary concentrations of endogenous steroid metabolites and their ratios, which constitute the steroid profile and compare them with reference ranges to detect unnaturally high values. However, the interpretation of the steroid profile is difficult due to large inter-individual variances, various confounding factors and different endogenous steroids marketed that influence the steroid profile in various ways. A support vector machine (SVM) algorithm was developed to statistically evaluate urinary steroid profiles composed of an extended range of steroid profile metabolites. This model makes the interpretation of the analytical data in the quest for deviating steroid profiles feasible and shows its versatility towards different kinds of misused endogenous steroids. The SVM model outperforms the current biomarkers with respect to detection sensitivity and accuracy, particularly when it is coupled to individual data as stored in the Athlete Biological Passport.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

BACKGROUND AND HYPOTHESIS Although prodromal angina occurring shortly before an acute myocardial infarction (MI) has protective effects against in-hospital complications, this effect has not been well documented after initial hospitalization, especially in older or diabetic patients. We examined whether angina 1 week before a first MI provides protection in these patients. METHODS A total of 290 consecutive patients, 143 elderly (>64 years of age) and 147 adults (<65 years of age), 68 of whom were diabetic (23.4%) and 222 nondiabetic (76.6%), were examined to assess the effect of preceding angina on long-term prognosis (56 months) after initial hospitalization for a first MI. RESULTS No significant differences were found in long-term complications after initial hospitalization in these adult and elderly patients according to whether or not they had prodromal angina (44.4% with angina vs 45.4% without in adults; 45.5% vs 58% in elderly, P < 0.2). Nor were differences found according to their diabetic status (61.5% with angina vs 72.7% without in diabetics; 37.3% vs 38.3% in nondiabetics; P = 0.4). CONCLUSION The occurrence of angina 1 week before a first MI does not confer long-term protection against cardiovascular complications after initial hospitalization in adult or elderly patients, whether or not they have diabetes.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later NEWCODA in Matlab (Aitchison, 1997). After that, several other authors have published extensions to this methodology: Marín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Previously published scientific papers have reported a negative correlation between drinking water hardness and cardiovascular mortality. Some ecologic and case-control studies suggest the protective effect of calcium and magnesium concentration in drinking water. In this article we present an analysis of this protective relationship in 538 municipalities of Comunidad Valenciana (Spain) from 1991-1998. We used the Spanish version of the Rapid Inquiry Facility (RIF) developed under the European Environment and Health Information System (EUROHEIS) research project. The strategy of analysis used in our study conforms to the exploratory nature of the RIF that is used as a tool to obtain quick and flexible insight into epidemiologic surveillance problems. This article describes the use of the RIF to explore possible associations between disease indicators and environmental factors. We used exposure analysis to assess the effect of both protective factors--calcium and magnesium--on mortality from cerebrovascular (ICD-9 430-438) and ischemic heart (ICD-9 410-414) diseases. This study provides statistical evidence of the relationship between mortality from cardiovascular diseases and hardness of drinking water. This relationship is stronger in cerebrovascular disease than in ischemic heart disease, is more pronounced for women than for men, and is more apparent with magnesium than with calcium concentration levels. Nevertheless, the protective nature of these two factors is not clearly established. Our results suggest the possibility of protectiveness but cannot be claimed as conclusive. The weak effects of these covariates make it difficult to separate them from the influence of socioeconomic and environmental factors. We have also performed disease mapping of standardized mortality ratios to detect clusters of municipalities with high risk. 
Further standardization by levels of calcium and magnesium in drinking water shows changes in the maps when we remove the effect of these covariates.
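
The standardized mortality ratio (SMR) mentioned in this abstract can be sketched as follows. This is the generic indirect-standardization calculation, not the RIF implementation: expected deaths are obtained by applying reference stratum-specific rates to the local population, and the SMR is observed deaths divided by expected deaths. The function names, rates and population figures are all hypothetical.

```python
def expected_deaths(reference_rates, populations):
    """Expected deaths: sum over strata of reference rate times local population."""
    return sum(rate * pop for rate, pop in zip(reference_rates, populations))

def smr(observed, reference_rates, populations):
    """Standardized mortality ratio: observed deaths over expected deaths."""
    return observed / expected_deaths(reference_rates, populations)

# Made-up example: two age strata in one municipality.
reference_rates = [0.001, 0.01]   # deaths per person per period in the reference population
populations = [10000, 2000]       # local population in each stratum
observed = 45                     # deaths actually recorded locally

print(smr(observed, reference_rates, populations))
```

An SMR above 1 indicates more deaths than the reference rates would predict for that population structure, which is the quantity mapped when looking for clusters of high-risk municipalities.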