15 results for multivariate analysis of variance
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
The objective of this research was to analyse the potential of Normalized Difference Vegetation Index (NDVI) maps from satellite images, yield maps and grapevine fertility and load variables to delineate zones with different wine grape properties for selective harvesting. Two vineyard blocks located in NE Spain (Cabernet Sauvignon and Syrah) were analysed. The NDVI was computed from a Quickbird-2 multi-spectral image at veraison (July 2005). Yield data were acquired by means of a yield monitor during September 2005. Other variables, such as the number of buds, number of shoots, number of wine grape clusters and weight of 100 berries, were sampled in a 10 rows × 5 vines pattern and used as input variables, in combination with the NDVI, to define the clusters as an alternative to yield maps. Two days prior to harvesting, grape samples were taken. The analysed variables were probable alcoholic degree, pH of the juice, total acidity, total phenolics, colour, anthocyanins and tannins. The input variables, alone or in combination, were clustered (2 and 3 clusters) by using the ISODATA algorithm, and an analysis of variance and a multiple range test were performed. The results show that the zones derived from the NDVI maps are more effective at differentiating grape maturity and quality variables than the zones derived from the yield maps. The inclusion of other grapevine fertility and load variables did not improve the results.
Abstract:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220 m deep, were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried out by using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses on Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from West to East (Garzanti et al., 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin longitudinally parallel to the SouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenario rapidly changed during the marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al., 2003). PCA and similarity analysis from core samples show that the longitudinal trunk river at this time was shifted southward by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems, and glacial sediments transported by Alpine valley glaciers invaded the alluvial plain.
Key words: Detrital modes; Modern sands; Provenance; Principal Components Analysis; Similarity, Canberra Distance; palaeodrainage
Abstract:
Standard methods for the analysis of linear latent variable models often rely on the assumption that the vector of observed variables is normally distributed. This normality assumption (NA) plays a crucial role in assessing optimality of estimates, in computing standard errors, and in designing an asymptotic chi-square goodness-of-fit test. The asymptotic validity of NA inferences when the data deviates from normality has been called asymptotic robustness. In the present paper we extend previous work on asymptotic robustness to a general context of multi-sample analysis of linear latent variable models, with a latent component of the model allowed to be fixed across (hypothetical) sample replications, and with the asymptotic covariance matrix of the sample moments not necessarily finite. We will show that, under certain conditions, the matrix $\Gamma$ of asymptotic variances of the analyzed sample moments can be substituted by a matrix $\Omega$ that is a function only of the cross-product moments of the observed variables. The main advantage of this is that inferences based on $\Omega$ are readily available in standard software for covariance structure analysis, and do not require computing sample fourth-order moments. An illustration with simulated data in the context of regression with errors in variables will be presented.
Abstract:
Pounamu (NZ jade), or nephrite, is a protected mineral in its natural form following the transfer of ownership back to Ngai Tahu under the Ngai Tahu (Pounamu Vesting) Act 1997. Any theft of nephrite is prosecutable under the Crimes Act 1961. Scientific evidence is essential in cases where origin is disputed. A robust method for discrimination of this material through the use of elemental analysis and compositional data analysis is required. Initial studies have characterised the variability within a given nephrite source. This has included investigation of both in situ outcrops and alluvial material. Methods for the discrimination of two geographically close nephrite sources are being developed.
Key Words: forensic, jade, nephrite, laser ablation, inductively coupled plasma mass spectrometry, multivariate analysis, elemental analysis, compositional data analysis
Abstract:
When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix; the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix.
In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
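The difference between crisp and fuzzy coding into three categories can be illustrated with a minimal sketch. It assumes triangular membership functions anchored at three hinge points; the abstract does not say which membership functions or cut points the paper uses, so the function names, hinges and cut points below are purely illustrative.

```python
def crisp_code(x, cuts):
    """Indicator (dummy) coding into three categories.

    `cuts` = (lower, upper) boundaries; both are illustrative assumptions.
    """
    lower, upper = cuts
    if x < lower:
        return [1, 0, 0]
    if x < upper:
        return [0, 1, 0]
    return [0, 0, 1]


def fuzzy_code(x, hinges):
    """Fuzzy coding with triangular membership functions (an assumption).

    `hinges` = (low, mid, high); memberships are non-negative and sum to 1.
    """
    low, mid, high = hinges
    x = min(max(x, low), high)  # clamp values outside the hinge range
    if x <= mid:
        m = (x - low) / (mid - low)
        return [1.0 - m, m, 0.0]
    m = (x - mid) / (high - mid)
    return [0.0, 1.0 - m, m]


def defuzzify(memberships, hinges):
    """Estimate the original value as the membership-weighted sum of hinges."""
    return sum(m * h for m, h in zip(memberships, hinges))


hinges = (0.0, 10.0, 30.0)           # hypothetical hinge points
memberships = fuzzy_code(15.0, hinges)
print(crisp_code(15.0, (5.0, 20.0)))  # one category gets all the weight
print(memberships)                    # weight shared by adjacent categories
print(defuzzify(memberships, hinges))
```

Under this triangular scheme, defuzzifying the coded values of an observation inside the hinge range returns the original value, which is what makes a percentage-of-variance measure of fit possible after the analysis.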
Abstract:
Structural equation models are widely used in economic, social and behavioral studies to analyze linear interrelationships among variables, some of which may be unobservable or subject to measurement error. Alternative estimation methods that exploit different distributional assumptions are now available. The present paper deals with issues of asymptotic statistical inferences, such as the evaluation of standard errors of estimates and chi-square goodness-of-fit statistics, in the general context of mean and covariance structures. The emphasis is on drawing correct statistical inferences regardless of the distribution of the data and the method of estimation employed. A (distribution-free) consistent estimate of $\Gamma$, the matrix of asymptotic variances of the vector of sample second-order moments, will be used to compute robust standard errors and robust chi-square goodness-of-fit statistics. Simple modifications of the usual estimate of $\Gamma$ will also permit correct inferences in the case of multi-stage complex samples. We will also discuss the conditions under which, regardless of the distribution of the data, one can rely on the usual (non-robust) inferential statistics. Finally, a multivariate regression model with errors-in-variables will be used to illustrate, by means of simulated data, various theoretical aspects of the paper.
Abstract:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
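The preprocessing behind a weighted log-ratio analysis can be sketched as a weighted double-centring of the log-transformed table. The sketch below assumes that the row and column weights are the table's relative margins (masses), which the abstract does not state explicitly; the function name is hypothetical, and the subsequent singular value decomposition that yields the map is omitted.

```python
import math


def weighted_double_center(table):
    """Log-transform a strictly positive table and double-centre it.

    Weights are assumed to be the row and column masses (margins / total).
    Returns the centred matrix together with the row and column masses.
    """
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    r = [sum(row) / total for row in table]
    c = [sum(table[i][j] for i in range(n_rows)) / total for j in range(n_cols)]

    logs = [[math.log(v) for v in row] for row in table]

    # Weighted means of the logs: rows use column masses, columns use row masses.
    row_means = [sum(c[j] * logs[i][j] for j in range(n_cols)) for i in range(n_rows)]
    col_means = [sum(r[i] * logs[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand = sum(r[i] * row_means[i] for i in range(n_rows))

    centered = [
        [logs[i][j] - row_means[i] - col_means[j] + grand for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return centered, r, c


# Made-up frequency table, for illustration only.
Y, r, c = weighted_double_center([[10, 20, 30], [5, 40, 15], [25, 5, 50]])
for row in Y:
    print([round(v, 3) for v in row])
```

After centring, the weighted row and column means of the result are zero, and a table whose entries are exactly a product of a row effect and a column effect is centred to a matrix of zeros, so only the ratio structure remains for the decomposition.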
Abstract:
In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or as a row and column of the Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix. A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
Abstract:
We consider the joint visualization of two matrices which have common rows and columns, for example multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal components analysis for interval-scaled data, or correspondence analysis for frequency data or ratio-scaled variables on commensurate scales. A simple result in matrix algebra shows that by setting up the matrices in a particular block format, matrix sum and difference components can be visualized. The case when we have more than two matrices is also discussed and the methodology is applied to data from the International Social Survey Program.
Abstract:
The work presented evaluates the statistical characteristics of regional bias and expected error in reconstructions of real positron emission tomography (PET) data of human brain fluorodeoxyglucose (FDG) studies carried out by the maximum likelihood estimator (MLE) method with a robust stopping rule, and compares them with the results of filtered backprojection (FBP) reconstructions and with the method of sieves. The task of evaluating radioisotope uptake in regions-of-interest (ROIs) is investigated. An assessment of bias and variance in uptake measurements is carried out with simulated data. Then, by using three different transition matrices with different degrees of accuracy and a components of variance model for statistical analysis, it is shown that the characteristics obtained from real human FDG brain data are consistent with the results of the simulation studies.
Abstract:
A comment on the article “Local sensitivity analysis for compositional data with application to soil texture in hydrologic modelling”, written by L. Loosvelt and co-authors. The present comment is centred on three specific points. The first is related to the fact that the authors avoid the use of ilr-coordinates. The second refers to a generalization of sensitivity analysis when input parameters are compositional. The third tries to show that the role of the Dirichlet distribution in the sensitivity analysis is irrelevant.
Abstract:
Panel data can be arranged into a matrix in two ways, called 'long' and 'wide' formats (LF and WF). The two formats suggest two alternative model approaches for analyzing panel data: (i) univariate regression with varying intercept; and (ii) multivariate regression with latent variables (a particular case of structural equation model, SEM). The present paper compares the two approaches, showing in which circumstances they yield equivalent (in some cases, even numerically equal) results. We show that the univariate approach gives results equivalent to the multivariate approach when restrictions of time invariance (in the paper, the TI assumption) are imposed on the parameters of the multivariate model. It is shown that the restrictions implicit in the univariate approach can be assessed by chi-square difference testing of two nested multivariate models. In addition, common tests encountered in the econometric analysis of panel data, such as the Hausman test, are shown to have an equivalent representation as chi-square difference tests. Commonalities and differences between the univariate and multivariate approaches are illustrated using an empirical panel data set of firms' profitability as well as a simulated panel data set.
Abstract:
The main aim of this study was to replicate and extend previous results on subtypes of adolescents with substance use disorders (SUD), according to their Minnesota Multiphasic Personality Inventory for adolescents (MMPI-A) profiles. Sixty patients with SUD and psychiatric comorbidity (41.7% male, mean age = 15.9 years old) completed the MMPI-A, the Teen Addiction Severity Index (T-ASI) and the Child Behaviour Checklist (CBCL), and were interviewed in order to determine DSM-IV diagnoses and level of substance use. The mean MMPI-A personality profile showed moderate peaks in the Psychopathic Deviate, Depression and Hysteria scales. Hierarchical cluster analysis revealed four profiles (acting-out, 35% of the sample; disorganized-conflictive, 15%; normative-impulsive, 15%; and deceptive-concealed, 35%). External correlates were found between cluster 1, CBCL externalizing symptoms at a clinical level and conduct disorders, and between cluster 2 and mixed CBCL internalized/externalized symptoms at a clinical level. Discriminant analysis showed that the Depression, Psychopathic Deviate and Psychasthenia MMPI-A scales correctly classified 90% of the patients into the clusters obtained.
Abstract:
Background: Polyphenols may lower the risk of cardiovascular disease (CVD) and other chronic diseases due to their antioxidant and anti-inflammatory properties, as well as their beneficial effects on blood pressure, lipids and insulin resistance. However, no previous epidemiological studies have evaluated the relationship between the intake of total polyphenols and polyphenol subclasses and overall mortality. Our aim was to evaluate whether polyphenol intake is associated with all-cause mortality in subjects at high cardiovascular risk. Methods: We used data from the PREDIMED study, a 7,447-participant, parallel-group, randomized, multicenter, controlled five-year feeding trial aimed at assessing the effects of the Mediterranean Diet in primary prevention of cardiovascular disease. Polyphenol intake was calculated by matching food consumption data from repeated food frequency questionnaires (FFQ) with the Phenol-Explorer database on the polyphenol content of each reported food. Hazard ratios (HR) and 95% confidence intervals (CI) between polyphenol intake and mortality were estimated using time-dependent Cox proportional hazard models. Results: Over an average of 4.8 years of follow-up, we observed 327 deaths. After multivariate adjustment, we found a 37% relative reduction in all-cause mortality comparing the highest versus the lowest quintiles of total polyphenol intake (hazard ratio (HR) = 0.63; 95% CI 0.41 to 0.97; P for trend = 0.12). Among the polyphenol subclasses, stilbenes and lignans were significantly associated with reduced all-cause mortality (HR = 0.48; 95% CI 0.25 to 0.91; P for trend = 0.04 and HR = 0.60; 95% CI 0.37 to 0.97; P for trend = 0.03, respectively), with no significant associations apparent in the rest (flavonoids or phenolic acids). Conclusions: Among high-risk subjects, those who reported a high polyphenol intake, especially of stilbenes and lignans, showed a reduced risk of overall mortality compared to those with lower intakes.
These results may be useful to determine optimal polyphenol intake or specific food sources of polyphenols that may reduce the risk of all-cause mortality.
Abstract:
Public opinion surveys have become progressively incorporated into systems of official statistics. Surveys of the economic climate are usually qualitative because they collect opinions of businesspeople and/or experts about the long-term indicators described by a number of variables. In such cases the responses are expressed on an ordinal scale, that is, the respondents verbally report, for example, whether during a given trimester the sales or the new orders have increased, decreased or remained the same as in the previous trimester. These data make it possible to calculate the percentage of respondents in the total population (results are extrapolated) who select each of the three options. Data are often presented in the form of an index calculated as the difference between the percentage of those who claim that a given variable has improved in value and the percentage of those who claim that it has deteriorated.
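The index described above can be sketched in a few lines. This is a minimal, unweighted illustration: the response labels and the function name are assumptions, and the extrapolation to the total population mentioned in the abstract (which would require survey weights) is not modelled.

```python
from collections import Counter

OPTIONS = ("increased", "same", "decreased")  # illustrative response labels


def balance_index(responses):
    """Return (index, percentages) for a list of three-option responses.

    The index is the percentage reporting an increase minus the
    percentage reporting a decrease (unweighted sample percentages).
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    n = len(responses)
    pct = {option: 100.0 * counts[option] / n for option in OPTIONS}
    return pct["increased"] - pct["decreased"], pct


# Made-up sample: 5 report an increase, 3 no change, 2 a decrease.
sample = ["increased"] * 5 + ["same"] * 3 + ["decreased"] * 2
index, percentages = balance_index(sample)
print(percentages)
print(index)
```

The index ranges from -100 (all respondents report a deterioration) to +100 (all report an improvement); respondents who report no change affect it only through the denominator.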