924 results for Canonical correlation
Abstract:
Copyright © 2014 by the International Machine Learning Society (IMLS). All rights reserved. Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques can only reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, they are computationally prohibitive at large scale. In a separate strand of recent research, randomized methods have been proposed to construct features that help reveal nonlinear patterns in data. For basic tasks such as regression or classification, random features exhibit little or no loss in performance while achieving drastic savings in computational requirements. In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering and LDA. We demonstrate our algorithms through experiments on real-world data, comparing against the state of the art. A simple R implementation of the presented algorithms is provided.
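The abstract combines random features with classical CCA. The following is a minimal sketch of that idea, assuming random Fourier features for a Gaussian kernel followed by ridge-regularized linear CCA. It is not the authors' R implementation: the function names and the values of `n_features`, `gamma` and `reg` are illustrative assumptions, and the paper's exact algorithm may differ.

```python
import numpy as np


def random_fourier_features(X, n_features, gamma, rng):
    """Map rows of X (n x d) to random Fourier features whose inner products
    approximate the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def linear_cca(X, Y, reg=1e-3):
    """Return the canonical correlations of X and Y (ridge-regularized)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both blocks with their Cholesky factors; the singular values of
    # Lx^{-1} Cxy Ly^{-T} are the canonical correlations (largest first).
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    return np.linalg.svd(M, compute_uv=False)


def randomized_nonlinear_cca(X, Y, n_features=20, gamma=1.0, reg=1e-3, seed=0):
    """Hypothetical helper: random features for each view, then linear CCA."""
    rng = np.random.default_rng(seed)
    Zx = random_fourier_features(X, n_features, gamma, rng)
    Zy = random_fourier_features(Y, n_features, gamma, rng)
    return linear_cca(Zx, Zy, reg)


# Y depends on X only through X**2, so the population linear correlation is zero.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(1000, 1))
Y = X**2 + 0.05 * rng.normal(size=(1000, 1))
linear_top = linear_cca(X, Y)[0]
nonlinear_top = randomized_nonlinear_cca(X, Y)[0]
print(f"linear CCA: {linear_top:.2f}, randomized nonlinear CCA: {nonlinear_top:.2f}")
```

Linear CCA misses the quadratic dependence, while the random-feature version recovers a high top canonical correlation. The cost is dominated by forming the small feature covariance matrices, which is why random features scale better than exact kernel CCA.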
Abstract:
The environmental mechanism behind changes in cyanobacterial species composition in the northeastern part of Lake Dianchi (Macun Bay and Haidong Bay) was studied using canonical correlation analysis (CCA), and bottom-up and top-down control were also discussed in detail. The CCA results suggest that: (1) the abundance and dominance of Microcystis aeruginosa in Macun Bay and Haidong Bay are influenced by total phosphorus (TP), nitrate (NO3⁻-N), nitrite (NO2⁻-N), dissolved oxygen (DO) and water temperature (WT); (2) water temperature is positively correlated with the abundance of M. aeruginosa and negatively correlated with the abundances of Anabaena flos-aquae and Aphanizomenon flos-aquae; and (3) the abundances of both Anabaena flos-aquae and Aphanizomenon flos-aquae are positively correlated with ammonia-N (NH4⁺-N). Furthermore, cyanobacterial species composition showed no significant correlation with light or size-fractionated iron in this study. Grazers, cyanophages and viruses were able to control cyanobacterial blooms and change the composition of cyanobacterial species. Although physical and chemical factors were studied intensively, we are still unable to predict changes in the composition of cyanobacterial blooms, because the plankton system behaves chaotically.
Abstract:
Principal Component and Canonical Correlation Analysis of the Environmental Factors Influencing the Growth of Caragana korshinskii Kom. in Grassland
Abstract:
The relationship between monthly sea-level data measured at stations along the Chinese coast and concurrent large-scale atmospheric forcing in the period 1960-1990 is examined. Sea-level is found to vary quite coherently along the whole coast, despite the geographical extent of the station set. A canonical correlation analysis between sea-level and sea-level pressure (SLP) indicates that a large part of the sea-level variability can be explained by the action of wind stress on the ocean surface. The relationship between sea-level and SLP is analyzed separately for the summer and winter half-years. In winter, one factor affecting sea-level variability at all stations is the SLP contrast between the continent and the Pacific Ocean, and hence the intensity of the winter Monsoon circulation. Another factor that coherently affects all stations is the intensity of the zonal circulation at mid-latitudes. In the summer half-year, by contrast, the influence of SLP on sea-level is spatially less coherent: the stations in the Yellow Sea are affected by a more localized circulation anomaly pattern, whereas the remaining stations are more directly connected to the intensity of the zonal circulation. Based on this analysis, statistical models (different for summer and winter) are formulated to hindcast coastal sea-level anomalies from the large-scale SLP field. These models have been tested by fitting their internal parameters in a test period and reasonably reproducing the sea-level evolution in an independent period. The statistical models are also used to estimate the contribution of changes in the atmospheric circulation to sea-level along the Chinese coast in an altered climate.
For this purpose, the output of a 150-year-long experiment with the coupled ocean-atmosphere model ECHAM1-LSG has been analyzed, in which the atmospheric concentration of greenhouse gases was continuously increased from 1940 until 2090 according to the Scenario A projection of the Intergovernmental Panel on Climate Change. In this experiment, the meridional (zonal) circulation relevant for sea-level tends to become weaker (stronger) in the winter half-year and stronger (weaker) in summer. The estimated contribution of these atmospheric circulation changes to coastal sea-level is of the order of a few centimeters at the end of the integration: in winter it is negative in the Yellow Sea and positive in the China Sea, with opposite signs in the summer half-year.
Abstract:
This paper introduces the application of linear multivariate statistical techniques, including partial least squares (PLS), canonical correlation analysis (CCA) and reduced rank regression (RRR), to Systems Biology. This new approach aims to extract the important proteins embedded in complex signal transduction pathway models. The analysis is performed on a model of intracellular signalling along the Janus-associated kinase/signal transducer and activator of transcription (JAK/STAT) and mitogen-activated protein kinase (MAPK) signal transduction pathways in interleukin-6 (IL6)-stimulated hepatocytes, which produce signal transducer and activator of transcription 3 (STAT3). Using CCA, a region of redundancy within the MAPK pathway that does not affect STAT3 transcription was identified. This is the core finding of the analysis and cannot be obtained by inspecting the model by eye. In addition, RRR was found to isolate terms that do not contribute significantly to changes in protein concentrations, whereas PLS, by virtue of its construction, does not provide such a detailed picture. This analysis has a similar objective to conventional model reduction techniques, with the advantage of preserving the meaning of the states before and after the reduction. A significant model reduction is achieved with a marginal loss in accuracy, yielding a more concise model that retains the main factors influencing STAT3 transcription. The findings offer a deeper understanding of the reaction terms involved, confirm the relevance of several proteins to the production of Acute Phase Proteins, and complement existing findings regarding cross-talk between the two signalling pathways.
Abstract:
Using a unique high-frequency dataset on a comprehensive sample of Greek blue-chip stocks, spanning September 2003 through March 2006, this note assesses the extent and role of commonality in returns, order flows (OFs), and liquidity. It also formally models aggregate equity returns in terms of aggregate equity OF, in an effort to clarify the importance of OF in explaining returns on the Athens Exchange. Aggregate own OF explains almost a quarter of the variation in daily FTSE/ATHEX20 index returns. In a second step, using principal component and canonical correlation analyses, we document substantial common movements in returns, OFs, and liquidity, both market-wide and for individual securities. These results emphasize that asset pricing and liquidity cannot be analyzed in isolation from each other.
Abstract:
Statistics are regularly used in forensic investigations to compare trace evidence or to apply the exclusionary principle (Morgan and Bull, 2007). Trace evidence routinely consists of the results of particle size, chemical or modal analyses, and as such constitutes compositional data. The issue is that compositional data, including percentages, parts per million, etc., carry only relative information. This may be problematic where a comparison of percentages and other constrained (closed) data is deemed a statistically valid and appropriate way to present trace evidence in a court of law. The constant sum problem has been known since the seminal works of Pearson (1896) and Chayes (1960), and log-ratio techniques have been introduced to deal with it (Aitchison, 1986; Pawlowsky-Glahn and Egozcue, 2001; Pawlowsky-Glahn and Buccianti, 2011; Tolosana-Delgado and van den Boogaart, 2013). Nevertheless, the fact that a constant sum destroys the potential independence of variances and covariances required for correlation and regression analysis and for empirical multivariate methods (principal component analysis, cluster analysis, discriminant analysis, canonical correlation) is all too often not acknowledged in the statistical treatment of trace evidence. Yet the need for a robust treatment of forensic trace evidence analyses is obvious. This research examines the issues and potential pitfalls for forensic investigators if the constant sum constraint is ignored in the analysis and presentation of forensic trace evidence. Forensic case studies involving particle size and mineral analyses as trace evidence are used to demonstrate a compositional data approach based on a centred log-ratio (clr) transformation and multivariate statistical analyses.
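The constant-sum problem and the clr transformation can be illustrated in a few lines. This is a minimal sketch in Python (NumPy), not the authors' workflow: the helper names `closure` and `clr` are hypothetical, the data are synthetic, and zero parts (common in real trace evidence) are simply rejected rather than imputed.

```python
import numpy as np


def closure(parts):
    """Rescale each row of positive parts so that it sums to 1."""
    parts = np.asarray(parts, dtype=float)
    return parts / parts.sum(axis=-1, keepdims=True)


def clr(parts):
    """Centred log-ratio: log of each part divided by the row's geometric mean."""
    parts = np.asarray(parts, dtype=float)
    if np.any(parts <= 0):
        raise ValueError("clr needs strictly positive parts; treat zeros first")
    logs = np.log(parts)
    return logs - logs.mean(axis=-1, keepdims=True)


# The same 3-part composition reported as percentages or as proportions has the
# same clr coordinates, because clr only uses ratios between parts.
pct = clr([60.0, 30.0, 10.0])
prop = clr([0.6, 0.3, 0.1])

# Constant-sum artefact: two independent positive quantities become perfectly
# negatively correlated once closed, since part 2 = 1 - part 1.
rng = np.random.default_rng(0)
raw = rng.uniform(1.0, 10.0, size=(500, 2))
closed = closure(raw)
r_raw = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
r_closed = np.corrcoef(closed[:, 0], closed[:, 1])[0, 1]
print(pct, prop, r_raw, r_closed)
```

The clr coordinates of each sample sum to zero, which is the price of removing the arbitrary scale; subsequent multivariate methods are then applied to these coordinates rather than to the raw percentages.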
Abstract:
It is generally accepted among scholars that individual learning and team learning contribute to what we refer to as organizational learning. However, the small number of quantitative and qualitative studies that have investigated their relationship have reported contradictory results. This thesis investigated the relationship between individual learning, team learning, and organizational learning. A survey instrument was used to collect information on individual learning, team learning, and organizational learning. The study sample comprised supervisors from clinical laboratories in teaching hospitals and community hospitals in Ontario. Linear regression was used to investigate the relationship between individual and team learning. The relationships between individual and organizational learning, and between team and organizational learning, were investigated simultaneously with canonical correlation and set correlation. T-tests and multivariate analysis of variance were used to compare the learning scores of respondents employed by laboratories in teaching hospitals with those of respondents employed by community hospitals. The study validated its test results with 1,000 bootstrap replications. The results suggest moderate correlations between individual learning and team learning, whereas the correlations between individual and organizational learning and between team and organizational learning appeared to be weak. The scores at all three learning levels show statistically significant differences between respondents from laboratories in teaching hospitals and respondents from community hospitals.
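The thesis reports 1,000 bootstrap replications without stating exactly which statistics were resampled or how. As a hedged illustration only, here is a percentile bootstrap confidence interval for a Pearson correlation between two sets of survey scores; the percentile method, the function name `bootstrap_corr_ci` and the synthetic data are assumptions, not the study's procedure.

```python
import numpy as np


def bootstrap_corr_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Pearson r with a percentile bootstrap confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample respondents with replacement
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    low, high = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return np.corrcoef(x, y)[0, 1], (low, high)


# Synthetic scores (not the thesis data) with a true correlation of 0.5.
rng = np.random.default_rng(1)
individual = rng.normal(size=200)
team = 0.5 * individual + np.sqrt(0.75) * rng.normal(size=200)
r, (low, high) = bootstrap_corr_ci(individual, team)
print(f"r = {r:.2f}, 95% bootstrap CI = [{low:.2f}, {high:.2f}]")
```

Resampling whole respondents (pairs of scores) rather than each variable separately keeps the dependence between the two scores intact, which is what the interval is meant to describe.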
Abstract:
Temperature and precipitation are major forcing factors influencing grapevine phenology and yield, as well as wine quality. Bioclimatic indices describing the suitability of a particular region for wine production are a commonly used tool for viticultural zoning. For this research, these indices were computed for Europe using the E-OBS gridded daily temperature and precipitation dataset for the period from 1950 to 2009. The results showed strong regional contrasts in the index patterns and reproduced the wide diversity of local conditions that largely explain the quality and diversity of grapevines grown across Europe. Owing to the strong inter-annual variability of the indices, a trend analysis and a principal component analysis were applied together with an assessment of their mean patterns. Significant trends were identified in the Winkler and Huglin indices, particularly for southwestern Europe. Four statistically significant orthogonal modes of variability were isolated for the Huglin index (HI), jointly representing 82% of the total variance in Europe. The leading mode was largely dominant (48% of the variance) and mainly reflected the observed long-term historical changes. The other three modes corresponded to regional dipoles within Europe. Despite the relevance of local and regional climatic characteristics to grapevines, canonical correlation analysis demonstrated that the observed inter-annual variability of the HI was strongly controlled by the large-scale atmospheric circulation during the growing season (April to September).
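Percentages such as "82% of the total variance" for the leading modes come from the singular values of the anomaly matrix in a principal component (EOF) analysis. The sketch below shows only that computation, assuming a matrix of yearly index values with rows as years and columns as grid cells; the data are synthetic, the function name is hypothetical, and significance testing of the modes, area weighting of grid cells and E-OBS data handling are not shown.

```python
import numpy as np


def pca_variance_fractions(field):
    """Fraction of total variance explained by each principal component (EOF mode).

    field: array of shape (n_years, n_gridpoints)."""
    anomalies = field - field.mean(axis=0)  # remove each grid point's time mean
    s = np.linalg.svd(anomalies, compute_uv=False)  # singular values, descending
    variance = s**2
    return variance / variance.sum()


# Synthetic example: 60 "years" (as in 1950-2009) x 50 grid points,
# one dominant spatial pattern plus small noise.
rng = np.random.default_rng(0)
amplitude = rng.normal(size=60)  # time series of the mode
pattern = rng.normal(size=50)  # spatial pattern of the mode
field = np.outer(amplitude, pattern) + 0.1 * rng.normal(size=(60, 50))
fractions = pca_variance_fractions(field)
print(fractions[:4], fractions[:4].sum())
```

Summing the first k entries of `fractions` gives the joint share of the k leading modes, which is the quantity the abstract reports for the four significant HI modes.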
Abstract:
Background: The electroencephalogram (EEG) may be described by a large number of different feature types, and automated feature selection methods are needed to reliably identify features that correlate with continuous independent variables. New method: A method is presented for the automated identification of features that differentiate two or more groups in neurological datasets, based on a spectral decomposition of the feature set. Furthermore, the method is able to identify features that relate to continuous independent variables. Results: The proposed method is first evaluated on synthetic EEG datasets and is observed to reliably identify the correct features. The method is then applied to EEG recorded during a music listening task and is observed to automatically identify neural correlates of music tempo changes similar to those identified in a previous study. Finally, the method is applied to identify neural correlates of music-induced affective states. The identified neural correlates reside primarily over the frontal cortex and are consistent with widely reported neural correlates of emotions. Comparison with existing methods: The proposed method is compared with the state-of-the-art methods of canonical correlation analysis and common spatial patterns in identifying features that differentiate synthetic event-related potentials of different amplitudes, and is observed to perform better as the number of unique groups in the dataset increases. Conclusions: The proposed method is able to identify neural correlates of continuous variables in EEG datasets and is shown to outperform canonical correlation analysis and common spatial patterns.
Abstract:
This paper uses canonical correlation analysis to identify leading and coincident indicators of economic activity in Brazil. In contrast with the traditional literature on the subject, no restrictions are placed on the number of common cycles needed to explain the complete cyclical behavior of the coincident variables. For the Brazilian data, three common cycles are found to exhaust the cyclical pattern of economic activity. Based on the methodology developed here, an alternative chronology of recent Brazilian recessions is also suggested.
Abstract:
We use the information content in the decisions of the NBER Business Cycle Dating Committee to construct coincident and leading indices of economic activity for the United States. We identify the coincident index by assuming that the coincident variables have a common cycle with the unobserved state of the economy, and that the NBER business cycle dates signify the turning points in the unobserved state. This model allows us to estimate our coincident index as a linear combination of the coincident series. We establish that our index performs better than other currently popular coincident indices of economic activity.
Abstract:
We use the information content in the decisions of the NBER Business Cycle Dating Committee to construct coincident and leading indices of economic activity for the United States. We identify the coincident index by assuming that the coincident variables have a common cycle with the unobserved state of the economy, and that the NBER business cycle dates signify the turning points in the unobserved state. This model allows us to estimate our coincident index as a linear combination of the coincident series. We compare the performance of our index with other currently popular coincident indices of economic activity.
Phytoplankton structure in two contrasting cascade reservoirs (Paranapanema River, Southeast Brazil)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)