939 resultados para principal components
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data.It contains four classes corresponding to the four different types of compositional andpositive geometry (including the Aitchison geometry). It provides means for computation,plotting and high-level multivariate statistical analysis in all four geometries.These geometries are treated in an fully analogous way, based on the principle of workingin coordinates, and the object-oriented programming paradigm of R. In this way,called functions automatically select the most appropriate type of analysis as a functionof the geometry. The graphical capabilities include ternary diagrams and tetrahedrons,various compositional plots (boxplots, barplots, piecharts) and extensive graphical toolsfor principal components. Afterwards, ortion and proportion lines, straight lines andellipses in all geometries can be added to plots. The package is accompanied by ahands-on-introduction, documentation for every function, demos of the graphical capabilitiesand plenty of usage examples. It allows direct and parallel computation inall four vector spaces and provides the beginner with a copy-and-paste style of dataanalysis, while letting advanced users keep the functionality and customizability theydemand of R, as well as all necessary tools to add own analysis routines. A completeexample is included in the appendix
Resumo:
Isotopic data are currently becoming an important source of information regardingsources, evolution and mixing processes of water in hydrogeologic systems. However, itis not clear how to treat with statistics the geochemical data and the isotopic datatogether. We propose to introduce the isotopic information as new parts, and applycompositional data analysis with the resulting increased composition. Results areequivalent to downscale the classical isotopic delta variables, because they are alreadyrelative (as needed in the compositional framework) and isotopic variations are almostalways very small. This methodology is illustrated and tested with the study of theLlobregat River Basin (Barcelona, NE Spain), where it is shown that, though verysmall, isotopic variations comp lement geochemical principal components, and help inthe better identification of pollution sources
Resumo:
The aim of this study was to present the initial validation of a new questionnaire, the Transition to Retirement Questionnaire (TRQ) and to study its relationship with resistance to change and personality dimensions. Based on Schlossberg's typology of the retired, the TRQ is designed to assess five dimensions related to personal perceptions of transition to retirement, retirement, and personal plans and activities. The sample consisted of 1,054 professionally active or retired adults from the Swiss French-speaking Canton of Vaud. Exploratory principal components and confirmatory factor analyses highlighted a five-factor solution that fit coherently with Schlossberg's typology. Moreover, TRQ dimensions were related to resistance to change tendencies and personality dimensions. The TRQ seems to be an interesting tool for use in research but also for interventions with young retirees or people preparing for retirement.
Resumo:
Previous microarray studies on breast cancer identified multiple tumour classes, of which the most prominent, named luminal and basal, differ in expression of the oestrogen receptor alpha gene (ER). We report here the identification of a group of breast tumours with increased androgen signalling and a 'molecular apocrine' gene expression profile. Tumour samples from 49 patients with large operable or locally advanced breast cancers were tested on Affymetrix U133A gene expression microarrays. Principal components analysis and hierarchical clustering split the tumours into three groups: basal, luminal and a group we call molecular apocrine. All of the molecular apocrine tumours have strong apocrine features on histological examination (P=0.0002). The molecular apocrine group is androgen receptor (AR) positive and contains all of the ER-negative tumours outside the basal group. Kolmogorov-Smirnov testing indicates that oestrogen signalling is most active in the luminal group, and androgen signalling is most active in the molecular apocrine group. ERBB2 amplification is commoner in the molecular apocrine than the other groups. Genes that best split the three groups were identified by Wilcoxon test. Correlation of the average expression profile of these genes in our data with the expression profile of individual tumours in four published breast cancer studies suggest that molecular apocrine tumours represent 8-14% of tumours in these studies. Our data show that it is possible with microarray data to divide mammary tumour cells into three groups based on steroid receptor activity: luminal (ER+ AR+), basal (ER- AR-) and molecular apocrine (ER- AR+).
Resumo:
The fatty acids from cocoa butters of different origins, varieties, and suppliers and a number of cocoa butter equivalents (Illexao 30-61, Illexao 30-71, Illexao 30-96, Choclin, Coberine, Chocosine-Illipe, Chocosine-Shea, Shokao, Akomax, Akonord, and Ertina) were investigated by bulk stable carbon isotope analysis and compound specific isotope analysis. The interpretation is based on principal component analysis combining the fatty acid concentrations and the bulk and molecular isotopic data. The scatterplot of the two first principal components allowed detection of the addition of vegetable fats to cocoa butters. Enrichment in heavy carbon isotope (C-13) of the bulk cocoa butter and of the individual fatty acids is related to mixing with other vegetable fats and possibly to thermally or oxidatively induced degradation during processing (e.g., drying and roasting of the cocoa beans or deodorization of the pressed fat) or storage. The feasibility of the analytical approach for authenticity assessment is discussed.
Resumo:
Objective To analyze the reliability and validity of the psychometric properties of the Brazilian version of the instrument for symptom assessment, titled MD Anderson Symptom Inventory - core. Method A cross-sectional study with 268 cancer patients in outpatient treatment, in the municipality of Ijuí, state of Rio Grande do Sul, Brazil. Results The Cronbach’s alpha for the MDASI general, symptoms and interferences was respectively (0.857), (0.784) and (0.794). The factor analysis showed adequacy of the data (0.792). In total, were identified four factors of the principal components related to the symptoms. Factor I: sleep problems, distress (upset), difficulties in remembering things and sadness. Factor II: dizziness, nausea, lack of appetite and vomiting. Factor III: drowsiness, dry mouth, numbness and tingling. Factor IV: pain, fatigue and shortness of breath. A single factor was revealed in the component of interferences with life (0.780), with prevalence of activity in general (59.7%), work (54.9%) and walking (49.3%). Conclusion The Brazilian version of the MD Anderson Symptom Inventory - core showed adequate psychometric properties in the studied population.
Resumo:
This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables as well as preserving all the good properties of principal axis methods such as principal components and correspondence analysis, based on the singular-value decomposition, including the decomposition of variance into components along principal axes which provide the numerical diagnostics known as contributions. The idea is inspired from the chi-square distance in correspondence analysis which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing a matrix and displaying its rows and columns in biplots.
Resumo:
Com características morfológicas e edafo-climáticas extremamente diversificadas, a ilha de Santo Antão em Cabo Verde apresenta uma reconhecida vulnerabilidade ambiental a par de uma elevada carência de estudos científicos que incidam sobre essa realidade e sirvam de base à uma compreensão integrada dos fenómenos. A cartografia digital e as tecnologias de informação geográfica vêm proporcionando um avanço tecnológico na colecção, armazenamento e processamento de dados espaciais. Várias ferramentas actualmente disponíveis permitem modelar uma multiplicidade de factores, localizar e quantificar os fenómenos bem como e definir os níveis de contribuição de diferentes factores no resultado final. No presente estudo, desenvolvido no âmbito do curso de pós-graduação e mestrado em sistemas de Informação geográfica realizado pela Universidade de Trás-os-Montes e Alto Douro, pretende-se contribuir para a minimização do deficit de informação relativa às características biofísicas da citada ilha, recorrendo-se à aplicação de tecnologias de informação geográfica e detecção remota, associadas à análise estatística multivariada. Nesse âmbito, foram produzidas e analisadas cartas temáticas e desenvolvido um modelo de análise integrada de dados. Com efeito, a multiplicidade de variáveis espaciais produzidas, de entre elas 29 variáveis com variação contínua passíveis de influenciar as características biofísicas da região e, possíveis ocorrências de efeitos mútuos antagónicos ou sinergéticos, condicionam uma relativa complexidade à interpretação a partir dos dados originais. Visando contornar este problema, recorre-se a uma rede de amostragem sistemática, totalizando 921 pontos ou repetições, para extrair os dados correspondentes às 29 variáveis nos pontos de amostragem e, subsequente desenvolvimento de técnicas de análise estatística multivariada, nomeadamente a análise em componentes principais. A aplicação destas técnicas permitiu simplificar e interpretar as variáreis originais, normalizando-as e resumindo a informação contida na diversidade de variáveis originais, correlacionadas entre si, num conjunto de variáveis ortogonais (não correlacionadas), e com níveis de importância decrescente, as componentes principais. Fixou-se como meta a concentração de 75% da variância dos dados originais explicadas pelas primeiras 3 componentes principais e, desenvolveu-se um processo interactivo em diferentes etapas, eliminando sucessivamente as variáveis menos representativas. Na última etapa do processo as 3 primeiras CP resultaram em 74,54% da variância dos dados originais explicadas mas, que vieram a demonstrar na fase posterior, serem insuficientes para retratar a realidade. Optou-se pela inclusão da 4ª CP (CP4), com a qual 84% da referida variância era explicada e, representando oito variáveis biofísicas: a altitude, a densidade hidrográfica, a densidade de fracturação geológica, a precipitação, o índice de vegetação, a temperatura, os recursos hídricos e a distância à rede hidrográfica. A subsequente interpolação da 1ª componente principal (CP1) e, das principais variáveis associadas as componentes CP2, CP3 e CP4 como variáveis auxiliares, recorrendo a técnicas geoestatística em ambiente ArcGIS permitiu a obtenção de uma carta representando 84% da variação das características biofísicas no território. A análise em clusters validada pelo teste “t de Student” permitiu reclassificar o território em 6 unidades biofísicas homogéneas. Conclui-se que, as tecnologias de informação geográfica actualmente disponíveis a par de facilitar análises interactivas e flexíveis, possibilitando que se faça variar temas e critérios, integrar novas informações e introduzir melhorias em modelos construídos com bases em informações disponíveis num determinado contexto, associadas a técnicas de análise estatística multivariada, possibilitam, com base em critérios científicos, desenvolver a análise integrada de múltiplas variáveis biofísicas cuja correlação entre si, torna complexa a compreensão integrada dos fenómenos.
Resumo:
Dual scaling of a subjects-by-objects table of dominance data (preferences,paired comparisons and successive categories data) has been contrasted with correspondence analysis, as if the two techniques were somehow different. In this note we show that dual scaling of dominance data is equivalent to the correspondence analysis of a table which is doubled with respect to subjects. We also show that the results of both methods can be recovered from a principal components analysis of the undoubled dominance table which is centred with respect to subject means.
Resumo:
The Baix Empordà-Selva-Gavarres aquifer system is related to the fault set that created the tectonic basins of Empordà and Selva areas (NE Spain) during the Neogene. In this work, we describe groundwater hydrogeological, hydrochemical and isotopical (3H, δD, δ18O, and the 87Sr/86Sr ratio) characteristics of this system in order to illustrate the relevance of fault zones in groundwater flow-paths and the recharge. In that way, we identify two flow systems, with distinct hydrochemistry and isotopes. A local flow system originates at the Gavarres Range, and it flows towards the basins of the Baix Empordà and Selva, with an approximate residence time of 20 years. Additionally, a regional flow system has only been identified in the Selva basin. This one is related to the main fault zones, as preferential flow paths. Its recharge is located in mountain ranges with higher altitudes, namely the Transversal and Guilleries Ranges, with residence times larger than 50 years. Isotopical data has also shown mixing processes between both flow systems and rainfall recharge while multivariate statistical analysis of principal components has shown the main processes that control hydrochemistry of each flow systems
Resumo:
The spatial variability of soil and plant properties exerts great influence on the yeld of agricultural crops. This study analyzed the spatial variability of the fertility of a Humic Rhodic Hapludox with Arabic coffee, using principal component analysis, cluster analysis and geostatistics in combination. The experiment was carried out in an area under Coffea arabica L., variety Catucai 20/15 - 479. The soil was sampled at a depth 0.20 m, at 50 points of a sampling grid. The following chemical properties were determined: P, K+, Ca2+, Mg2+, Na+, S, Al3+, pH, H + Al, SB, t, T, V, m, OM, Na saturation index (SSI), remaining phosphorus (P-rem), and micronutrients (Zn, Fe, Mn, Cu and B). The data were analyzed with descriptive statistics, followed by principal component and cluster analyses. Geostatistics were used to check and quantify the degree of spatial dependence of properties, represented by principal components. The principal component analysis allowed a dimensional reduction of the problem, providing interpretable components, with little information loss. Despite the characteristic information loss of principal component analysis, the combination of this technique with geostatistical analysis was efficient for the quantification and determination of the structure of spatial dependence of soil fertility. In general, the availability of soil mineral nutrients was low and the levels of acidity and exchangeable Al were high.
Resumo:
Controversies regarding the pathogenesis of cardiovascular diseases in HIV patients Since the introduction of HAART (Highly active anti-retroviral therapy), the incidence of cardiovascular events has risen in patients infected with HIV. This development is mainly due to the increased survival in these patients. Nonetheless, the pathogenic effects of HIV on the principal components of haemostasis (endothelium, platelets and the clotting cascade) are the subject of numerous ongoing research studies, and are becoming an argument for starting HAART or for modifying the components of an established therapy. The aim of this article is to raise clinician awareness regarding the issue of cardiovascular disease in the HIV-infected patient.
Resumo:
Considering that information from soil reflectance spectra is underutilized in soil classification, this paper aimed to evaluate the relationship of soil physical, chemical properties and their spectra, to identify spectral patterns for soil classes, evaluate the use of numerical classification of profiles combined with spectral data for soil classification. We studied 20 soil profiles from the municipality of Piracicaba, State of São Paulo, Brazil, which were morphologically described and classified up to the 3rd category level of the Brazilian Soil Classification System (SiBCS). Subsequently, soil samples were collected from pedogenetic horizons and subjected to soil particle size and chemical analyses. Their Vis-NIR spectra were measured, followed by principal component analysis. Pearson's linear correlation coefficients were determined among the four principal components and the following soil properties: pH, organic matter, P, K, Ca, Mg, Al, CEC, base saturation, and Al saturation. We also carried out interpretation of the first three principal components and their relationships with soil classes defined by SiBCS. In addition, numerical classification of the profiles based on the OSACA algorithm was performed using spectral data as a basis. We determined the Normalized Mutual Information (NMI) and Uncertainty Coefficient (U). These coefficients represent the similarity between the numerical classification and the soil classes from SiBCS. Pearson's correlation coefficients were significant for the principal components when compared to sand, clay, Al content and soil color. Visual analysis of the principal component scores showed differences in the spectral behavior of the soil classes, mainly among Argissolos and the others soils. The NMI and U similarity coefficients showed values of 0.74 and 0.64, respectively, suggesting good similarity between the numerical and SiBCS classes. For example, numerical classification correctly distinguished Argissolos from Latossolos and Nitossolos. However, this mathematical technique was not able to distinguish Latossolos from Nitossolos Vermelho férricos, but the Cambissolos were well differentiated from other soil classes. The numerical technique proved to be effective and applicable to the soil classification process.
Resumo:
Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.
Resumo:
To study the stress-induced effects caused by wounding under a new perspective, a metabolomic strategy based on HPLC-MS has been devised for the model plant Arabidopsis thaliana. To detect induced metabolites and precisely localise these compounds among the numerous constitutive metabolites, HPLC-MS analyses were performed in a two-step strategy. In a first step, rapid direct TOF-MS measurements of the crude leaf extract were performed with a ballistic gradient on a short LC-column. The HPLC-MS data were investigated by multivariate analysis as total mass spectra (TMS). Principal components analysis (PCA) and hierarchical cluster analysis (HCA) on principal coordinates were combined for data treatment. PCA and HCA demonstrated a clear clustering of plant specimens selecting the highest discriminating ions given by the complete data analysis, leading to the specific detection of discrete-induced ions (m/z values). Furthermore, pool constitution with plants of homogeneous behaviour was achieved for confirmatory analysis. In this second step, long high-resolution LC profilings on an UPLC-TOF-MS system were used on pooled samples. This allowed to precisely localise the putative biological marker induced by wounding and by specific extraction of accurate m/z values detected in the screening procedure with the TMS spectra.