967 results for PCA and HCA
Abstract:
A high resolution mixed carbonate and siliciclastic sequence from DSDP Site 594 contains a detailed record of climate change in the late Pliocene. The sequence can be accurately dated by the LAD of Nitzschia weaveri, the LAD of Thalassiosira insigna, the LAD of T. vulnifica and the LAD of T. kolbei diatom datums. Carbonate content and δ18O signatures provide added resolution and place the sequence between isotope stages 100 and 92. The sequence contains well-preserved and diverse dinoflagellate cyst floras. Use of principal component analysis (PCA) and canonical correspondence analysis (CCA) identifies changes in the assemblages that principally reflect warming and cooling trends. Species associated with warmer climates include Impagidinium patulum, I. paradoxum and I. sp. cf. paradoxum, while those from cooler climates include Invertecysta tabulata and I. velorum. CCA is shown to be a valuable method of determining the past environmental preferences of extinct species such as I. tabulata.
Abstract:
Molecular interactions between microcrystalline cellulose (MCC) and water were investigated by attenuated total reflection infrared (ATR/IR) spectroscopy. Moisture-content-dependent IR spectra were measured during the drying of wet MCC. In order to distinguish overlapping O–H stretching bands arising from both cellulose and water, principal component analysis (PCA), generalized two-dimensional correlation spectroscopy (2DCOS) and second derivative analysis were applied to the obtained spectra. Four typical drying stages were clearly separated by PCA, and spectral variations in each stage were analyzed by 2DCOS. In the drying time range of 0–41 min, a decrease in the broad band around 3390 cm−1 was observed, indicating that bulk water was evaporated. In the drying time range of 49–195 min, decreases were observed in the bands at 3412, 3344 and 3286 cm−1, assigned to the O6H6···O3′ interchain hydrogen bonds (H-bonds), the O3H3···O5 intrachain H-bonds and the H-bonds in the Iβ phase of MCC, respectively. The result of the second derivative analysis suggests that water molecules mainly interact with the O6H6···O3′ interchain H-bonds. Thus, the H-bonding network in MCC is stabilized by H-bonds between water and the OH groups that form the O6H6···O3′ interchain H-bonds, and the removal of the water molecules induces changes in the H-bonding network in MCC.
Abstract:
In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principal objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. ‘Principal components analysis’ (PCA) and ‘factor analysis’ (FA) are also methods of examining the relationships between different variables, but they differ from multiple regression in that no distinction is made between the dependent and independent variables, all variables being essentially treated the same. Originally, PCA and FA were regarded as distinct methods, but in recent times they have been combined into a single analysis, PCA often being the first stage of an FA. The basic objective of a PCA/FA is to examine the relationships between the variables or the ‘structure’ of the variables and to determine whether these relationships can be explained by a smaller number of ‘factors’. This statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.
Abstract:
Two contrasting multivariate statistical methods, viz., principal components analysis (PCA) and cluster analysis were applied to the study of neuropathological variations between cases of Alzheimer's disease (AD). To compare the two methods, 78 cases of AD were analyzed, each characterised by measurements of 47 neuropathological variables. Both methods of analysis revealed significant variations between AD cases. These variations were related primarily to differences in the distribution and abundance of senile plaques (SP) and neurofibrillary tangles (NFT) in the brain. Cluster analysis classified the majority of AD cases into five groups which could represent subtypes of AD. However, PCA suggested that variation between cases was more continuous with no distinct subtypes. Hence, PCA may be a more appropriate method than cluster analysis in the study of neuropathological variations between AD cases.
Abstract:
The thesis presents new methodology and algorithms that can be used to analyse and measure the hand tremor and fatigue of surgeons while performing surgery. This will assist them in deriving useful information about their fatigue levels, and make them aware of the changes in their tool point accuracies. This thesis proposes that muscular changes of surgeons, which occur through a day of operating, can be monitored using Electromyography (EMG) signals. The multi-channel EMG signals are measured at different muscles in the upper arm of surgeons. The dependence of EMG signals has been examined to test the hypothesis that EMG signals are coupled with and dependent on each other. The results demonstrated that EMG signals collected from different channels while mimicking an operating posture are independent. Consequently, single channel fatigue analysis has been performed. In measuring hand tremor, a new method for determining the maximum tremor amplitude using Principal Component Analysis (PCA) and a new technique to detrend acceleration signals using the Empirical Mode Decomposition algorithm were introduced. This tremor determination method is more representative for surgeons and it is suggested as an alternative fatigue measure. This was combined with the complexity analysis method, and applied to surgically captured data to determine if operating has an effect on a surgeon’s fatigue and tremor levels. It was found that surgical tremor and fatigue develop throughout a day of operating and that this could be determined based solely on their initial values. Finally, several Nonlinear AutoRegressive with eXogenous inputs (NARX) neural networks were evaluated. The results suggest that it is possible to monitor surgeon tremor variations during surgery from their EMG fatigue measurements.
Abstract:
This thesis presents research within empirical financial economics with a focus on liquidity and portfolio optimisation in the stock market. The discussion on liquidity is focused on measurement issues, including TAQ data processing and measurement of systematic liquidity factors. Furthermore, a framework for treatment of the two topics in combination is provided. The liquidity part of the thesis gives a conceptual background to liquidity and discusses several different approaches to liquidity measurement. It contributes to liquidity measurement by providing detailed guidelines on the data processing needed for applying TAQ data to liquidity research. The main focus, however, is the derivation of systematic liquidity factors. The principal component approach to systematic liquidity measurement is refined by the introduction of moving and expanding estimation windows, allowing for time-varying liquidity co-variances between stocks. Under several liability specifications, this improves the ability to explain stock liquidity and returns, as compared to static window PCA and market average approximations of systematic liquidity. The highest ability to explain stock returns is obtained when using inventory cost as a liquidity measure and a moving window PCA as the systematic liquidity derivation technique. Systematic factors of this setting also have a strong ability to explain cross-sectional liquidity variation. Portfolio optimisation in the full-scale optimisation (FSO) framework is tested in two empirical studies. These contribute to the assessment of FSO by expanding the applicability to stock indexes and individual stocks, by considering a wide selection of utility function specifications, and by showing explicitly how the full-scale optimum can be identified using either grid search or the heuristic search algorithm of differential evolution.
The studies show that relative to mean-variance portfolios, FSO performs well in these settings and that the computational expense can be mitigated dramatically by application of differential evolution.
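The moving and expanding estimation windows mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's estimator: the function names, the power-iteration eigen solver, the window length and the toy liquidity panel are all assumptions made for illustration. For each window it reports the share of total variance captured by the first principal component.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def leading_eigen(matrix, iterations=500):
    """Largest eigenvalue and eigenvector by power iteration (illustrative)."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                for i in range(d))
    return value, v


def windowed_pca(rows, window, expanding=False):
    """Share of variance on the first PC for each estimation window.

    expanding=False uses the last `window` observations (moving window);
    expanding=True uses all observations up to the window end.
    """
    shares = []
    for end in range(window, len(rows) + 1):
        sample = rows[:end] if expanding else rows[end - window:end]
        cov = covariance(sample)
        value, _ = leading_eigen(cov)
        total = sum(cov[i][i] for i in range(len(cov)))
        shares.append(value / total if total > 0 else 0.0)
    return shares


# Hypothetical panel: 8 periods x 3 stocks of some liquidity measure.
panel = [
    [0.10, 0.20, 0.15], [0.12, 0.22, 0.18], [0.11, 0.19, 0.16],
    [0.15, 0.25, 0.21], [0.14, 0.27, 0.19], [0.18, 0.30, 0.24],
    [0.17, 0.28, 0.25], [0.20, 0.33, 0.27],
]
moving = windowed_pca(panel, window=5)
expanding = windowed_pca(panel, window=5, expanding=True)
```

A static-window PCA corresponds to a single estimate over the whole sample; the windowed variants let the estimated common factor change over time.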
Abstract:
Biological experiments often produce enormous amounts of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
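As a rough illustration of the two techniques described in this abstract, the following Python sketch finds the first principal component by power iteration and then runs agglomerative hierarchical clustering on a toy data set. It does not reproduce the GOLPE or Sybyl workflow; the function names `first_pc` and `agglomerate`, the toy data and the choice of single linkage are assumptions.

```python
import math


def first_pc(rows, iterations=200):
    """Direction and variance of the first principal component.

    Uses power iteration on the sample covariance matrix (illustrative).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance


def agglomerate(rows, n_clusters):
    """Single-linkage agglomerative clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the two clusters that are currently closest.
        _, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: two well-separated groups of three points each (made up).
data = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
        [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
direction, variance = first_pc(data)
groups = agglomerate(data, 2)
```

Stopping the merging at a chosen number of clusters is one simple way to cut the hierarchy; recording every merge instead would give the full dendrogram.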
Abstract:
The objectives of this research are to analyze and develop a modified Principal Component Analysis (PCA) and to develop a two-dimensional PCA with applications in image processing. PCA is a classical multivariate technique whose mathematical treatment is based purely on the eigensystem of positive-definite symmetric matrices. Its main function is to statistically transform a set of correlated variables into a new set of uncorrelated variables over $\mathbb{R}^{n}$ while retaining most of the variation present in the original variables. The variances of the Principal Components (PCs) obtained from the modified PCA form a correlation matrix of the original variables. The decomposition of this correlation matrix into a diagonal matrix produces an orthonormal basis that can be used to linearly transform the given PCs. It is this linear transformation that reproduces the original variables. The two-dimensional PCA can be devised as two successive one-dimensional PCAs. It can be shown that, for an $m\times n$ matrix, the PCs obtained from the two-dimensional PCA are the singular values of that matrix. In this research, several applications for image analysis based on PCA are developed, i.e., edge detection, feature extraction, and multi-resolution PCA decomposition and reconstruction.
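For orientation, the textbook link between ordinary PCA and the singular value decomposition can be written as below. This is the standard relation for a column-centred $m\times n$ data matrix, not the modified or two-dimensional PCA developed in the thesis; the symbols $X$, $U$, $\Sigma$, $V$, $S$, $\sigma_i$ and $\lambda_i$ are introduced here for illustration only.

```latex
% Thin SVD of the column-centred data matrix X (m rows, n columns)
X = U \Sigma V^{\top}
% Sample covariance matrix and its eigendecomposition
S = \frac{1}{m-1} X^{\top} X = V \, \frac{\Sigma^{2}}{m-1} \, V^{\top}
% Variance of the i-th principal component in terms of the i-th singular value
\lambda_i = \frac{\sigma_i^{2}}{m-1}
```

Under these assumptions, the columns of $V$ are the principal directions and each PC variance is a scaled squared singular value.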
Abstract:
The elemental analysis of soil is useful in forensic and environmental sciences. Methods were developed and optimized for two laser-based multi-element analysis techniques: laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS). This work represents the first use of a 266 nm laser for forensic soil analysis by LIBS. Sample preparation methods were developed and optimized for a variety of sample types, including pellets for large bulk soil specimens (470 mg) and sediment-laden filters (47 mg), and tape-mounting for small transfer evidence specimens (10 mg). Analytical performance for sediment filter pellets and tape-mounted soils was similar to that achieved with bulk pellets. An inter-laboratory comparison exercise was designed to evaluate the performance of the LA-ICP-MS and LIBS methods, as well as for micro X-ray fluorescence (μXRF), across multiple laboratories. Limits of detection (LODs) were 0.01-23 ppm for LA-ICP-MS, 0.25-574 ppm for LIBS, 16-4400 ppm for μXRF, and well below the levels normally seen in soils. Good intra-laboratory precision (≤ 6 % relative standard deviation (RSD) for LA-ICP-MS; ≤ 8 % for μXRF; ≤ 17 % for LIBS) and inter-laboratory precision (≤ 19 % for LA-ICP-MS; ≤ 25 % for μXRF) were achieved for most elements, which is encouraging for a first inter-laboratory exercise. While LIBS generally has higher LODs and RSDs than LA-ICP-MS, both were capable of generating good quality multi-element data sufficient for discrimination purposes. Multivariate methods using principal components analysis (PCA) and linear discriminant analysis (LDA) were developed for discriminations of soils from different sources. Specimens from different sites that were indistinguishable by color alone were discriminated by elemental analysis. Correct classification rates of 94.5 % or better were achieved in a simulated forensic discrimination of three similar sites for both LIBS and LA-ICP-MS. 
Results for tape-mounted specimens were nearly identical to those achieved with pellets. Methods were tested on soils from the USA, Canada and Tanzania. Within-site heterogeneity was site-specific. Elemental differences were greatest for specimens separated by large distances, even within the same lithology. Elemental profiles can be used to discriminate soils from different locations and narrow down locations even when mineralogy is similar.
Abstract:
Image super-resolution is defined as a class of techniques that enhance the spatial resolution of images. Super-resolution methods can be subdivided into single-image and multi-image methods. This thesis focuses on developing algorithms based on mathematical theories for single-image super-resolution problems. Indeed, in order to estimate an output image, we adopt a mixed approach: i.e., we use both a dictionary of patches with sparsity constraints (typical of learning-based methods) and regularization terms (typical of reconstruction-based methods). Although the existing methods already perform well, they do not take into account the geometry of the data to: regularize the solution, cluster data samples (samples are often clustered using algorithms with the Euclidean distance as a dissimilarity metric), learn dictionaries (they are often learned using PCA or K-SVD). Thus, state-of-the-art methods still suffer from shortcomings. In this work, we proposed three new methods to overcome these deficiencies. First, we developed SE-ASDS (a structure tensor based regularization term) in order to improve the sharpness of edges. SE-ASDS achieves much better results than many state-of-the-art algorithms. Then, we proposed AGNN and GOC algorithms for determining a local subset of training samples from which a good local model can be computed for reconstructing a given input test sample, where we take into account the underlying geometry of the data. AGNN and GOC methods outperform spectral clustering, soft clustering, and geodesic distance based subset selection in most settings. Next, we proposed the aSOB strategy, which takes into account the geometry of the data and the dictionary size. The aSOB strategy outperforms both PCA and PGA methods. Finally, we combine all our methods in a unique algorithm, named G2SR. Our proposed G2SR algorithm shows better visual and quantitative results when compared to the results of state-of-the-art methods.
Abstract:
This dissertation focuses on two vital challenges in relation to whale acoustic signals: detection and classification.
In detection, we evaluated the influence of the uncertain ocean environment on the spectrogram-based detector, and derived the likelihood ratio of the proposed Short Time Fourier Transform detector. Experimental results showed that the proposed detector outperforms detectors based on the spectrogram. The proposed detector is more sensitive to environmental changes because it includes phase information.
In classification, our focus is on finding a robust and sparse representation of whale vocalizations. Because whale vocalizations can be modeled as polynomial phase signals, we can represent the whale calls by their polynomial phase coefficients. In this dissertation, we used the Weyl transform to capture chirp rate information, and used a two dimensional feature set to represent whale vocalizations globally. Experimental results showed that our Weyl feature set outperforms chirplet coefficients and MFCC (Mel Frequency Cepstral Coefficients) when applied to our collected data.
Since whale vocalizations can be represented by polynomial phase coefficients, it is plausible that the signals lie on a manifold parameterized by these coefficients. We also studied the intrinsic structure of high dimensional whale data by exploiting its geometry. Experimental results showed that nonlinear mappings such as Laplacian Eigenmap and ISOMAP outperform linear mappings such as PCA and MDS, suggesting that the whale acoustic data is nonlinear.
We also explored deep learning algorithms on whale acoustic data. We built each layer as convolutions with either a PCA filter bank (PCANet) or a DCT filter bank (DCTNet). With the DCT filter bank, each layer has a different time-frequency scale representation, and from this, one can extract different physical information. Experimental results showed that our PCANet and DCTNet achieve a high classification rate on the whale vocalization data set. The word error rate of the DCTNet feature is similar to that of the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal the acoustic content of speech signals.
Abstract:
Extensive use of fossil fuels is leading to increasing CO2 concentrations in the atmosphere and causes changes in the carbonate chemistry of the oceans, which represent a major sink for anthropogenic CO2. As a result, the oceans' surface pH is expected to decrease by ca. 0.4 units by the year 2100, a major change with potentially negative consequences for some marine species. Because of their carbonate skeleton, sea urchins and their larval stages are regarded as likely to be one of the more sensitive taxa. In order to investigate sensitivity of pre-feeding (2 days post-fertilization) and feeding (4 and 7 days post-fertilization) pluteus larvae, we raised Strongylocentrotus purpuratus embryos in control (pH 8.1 and pCO2 41 Pa, i.e. 399 µatm) and CO2 acidified seawater with pH of 7.7 (pCO2 134 Pa, i.e. 1318 µatm) and investigated growth, calcification and survival. At three time points (day 2, day 4 and day 7 post-fertilization), we measured the expression of 26 representative genes important for metabolism, calcification and ion regulation using RT-qPCR. After one week of development, we observed a significant difference in growth. Maximum differences in size were detected at day 4 (ca. 10 % reduction in body length). A comparison of gene expression patterns using PCA and ANOSIM clearly distinguished between the different age groups (Two way ANOSIM: Global R = 1) while acidification effects were less pronounced (Global R = 0.518). Significant differences in gene expression patterns (ANOSIM R = 0.938, SIMPER: 4.3% difference) were also detected at day 4, leading to the hypothesis that differences between CO2 treatments could reflect patterns of expression seen in control experiments of a younger larva and thus a developmental artifact rather than a direct CO2 effect.
We found an up regulation of metabolic genes (between 10 and 20% in ATP-synthase, citrate synthase, pyruvate kinase and thiolase at day 4) and down regulation of calcification related genes (between 23 and 36% in msp130, SM30B, SM50 at day 4). Ion regulation was mainly impacted by up regulation of Na+/K+-ATPase at day 4 (15%) and down regulation of NHE3 at day 4 (45%). We conclude that in studies in which a stressor induces an alteration in the speed of development, it is crucial to employ experimental designs with a high time resolution in order to correct for developmental artifacts. This helps prevent misinterpretation of stressor effects on organism physiology.
Abstract:
The complexity of modern geochemical data sets is increasing in several aspects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc.), hence it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). This exploratory analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a log-ratio transformation) and the way the covariance matrix between components is estimated (classical estimation vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat covered areas are detected as outliers by a robust analysis, and can then be filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).
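One widely used log-ratio transformation is the centred log-ratio (clr). The abstract does not state which log-ratio transformation was applied, so the Python sketch below is only an illustration of the general idea; the function name `clr` and the three-part example values are assumptions.

```python
import math


def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    if any(x <= 0 for x in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Hypothetical three-part subcomposition (made-up values, e.g. in ppm).
sample = [12.0, 80.0, 25.0]
transformed = clr(sample)

# Rescaling every part by the same factor leaves the clr values unchanged,
# which is why the transform suits data that only carry relative information.
rescaled = clr([x * 10 for x in sample])
```

In a log-ratio PCA of this kind, each sample row would be transformed this way before the covariance matrix is estimated.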
Abstract:
Identifying 20th-century periodic coastal surge variation is strategic for 21st-century coastal surge estimates, as surge periodicities may amplify/reduce future MSL enhanced surge forecasts. Extreme coastal surge data from Belfast Harbour (UK) tide gauges are available for 1901–2010 and provide the potential for decadal-plus periodic coastal surge analysis. Annual extreme surge-elevation distributions (sampled every 10 min) are analysed using PCA and cluster analysis to decompose variation within and between years, to assess similarity of years in terms of Surge Climate Types, and to establish significance of any transitions in Type occurrence over time using non-parametric Markov analysis. Annual extreme surge variation is shown to be periodically organised across the 20th century. Extreme surge magnitude and distribution show a number of significant cyclonic induced multi-annual (2, 3, 5 & 6 years) cycles, as well as dominant multi-decadal (15–25 years) cycles of variation superimposed on an 80 year fluctuation in atmospheric–oceanic variation across the North Atlantic (relative to NAO/AMO interaction). The top 30 extreme surge events show some relationship with NAO per se, given that 80% are associated with westerly dominant atmospheric flows (+ NAO), but 20% of the events are associated with blocking air masses (− NAO). Although 20% of the top 30 ranked positive surges occurred within the last twenty years, there is no unequivocal evidence of recent acceleration in extreme surge magnitude beyond the scale of natural periodic variation.
Abstract:
BACKGROUND: Calcium channel blockers (CCBs) may affect prostate cancer (PCa) growth by various mechanisms including those related to androgens. The fusion of the androgen-regulated gene TMPRSS2 and the oncogene ERG (TMPRSS2:ERG or T2E) is common in PCa, and prostate tumors that harbor the gene fusion are believed to represent a distinct disease subtype. We studied the association of CCB use with the risk of PCa, and molecular subtypes of PCa defined by T2E status.
METHODS: Participants were residents of King County, Washington, recruited for population-based case-control studies (1993-1996 or 2002-2005). Tumor T2E status was determined by fluorescence in situ hybridization using tumor tissue specimens from radical prostatectomy. Detailed information on use of CCBs and other variables was obtained through in-person interviews. Binomial and polytomous logistic regression were used to generate odds ratios (ORs) and 95% confidence intervals (CIs).
RESULTS: The study included 1,747 PCa patients and 1,635 age-matched controls. A subset of 563 patients treated with radical prostatectomy had T2E status determined, of which 295 were T2E positive (52%). Use of CCBs (ever vs. never) was not associated with overall PCa risk. However, among European-American men, users had a reduced risk of higher-grade PCa (Gleason scores ≥7: adjusted OR = 0.64; 95% CI: 0.44-0.95). Further, use of CCBs was associated with a reduced risk of T2E positive PCa (adjusted OR = 0.38; 95% CI: 0.19-0.78), but was not associated with T2E negative PCa.
CONCLUSIONS: This study found suggestive evidence that use of CCBs is associated with reduced relative risks for higher Gleason score and T2E positive PCa. Future studies of PCa etiology should consider etiologic heterogeneity as PCa subtypes may develop through different causal pathways.