14 results for NIRS. Bacteria. PCA. SIMCA. PLS-DA
in Cambridge University Engineering Department Publications Database
Abstract:
Gene microarray technology is highly effective in screening for differential gene expression and has hence become a popular tool in the molecular investigation of cancer. When applied to tumours, molecular characteristics may be correlated with clinical features such as response to chemotherapy. Exploitation of the huge amount of data generated by microarrays is difficult, however, and constitutes a major challenge in the advancement of this methodology. Independent component analysis (ICA), a modern statistical method, allows us to better understand data in such complex and noisy measurement environments. The technique has the potential to significantly increase the quality of the resulting data and improve the biological validity of subsequent analysis. We performed microarray experiments on 31 postmenopausal endometrial biopsies, comprising 11 benign and 20 malignant samples. We compared ICA to the established methods of principal component analysis (PCA), Cyber-T, and SAM. We show that ICA generated patterns that clearly characterized the malignant samples studied, in contrast to PCA. Moreover, ICA improved the biological validity of the genes identified as differentially expressed in endometrial carcinoma, compared to those found by Cyber-T and SAM. In particular, several genes involved in lipid metabolism that are differentially expressed in endometrial carcinoma were only found using this method. This report highlights the potential of ICA in the analysis of microarray data.
Abstract:
Thickness of the near-interface regions (NIR) and central bulk ohmic resistivity in lead lanthanum zirconate titanate ferroelectric thin films were investigated. A method to separate the low-resistive near-interface regions (NIRs) from the high-resistive central bulk region (CBR) in ferroelectric thin films was presented. Results showed that the thickness of the NIRs depended on the electrode materials in use and the CBR resistivity depended on the impurity doping levels.
Abstract:
DNA microarrays provide such a huge amount of data that unsupervised methods are required to reduce the dimension of the data set and to extract meaningful biological information. This work shows that Independent Component Analysis (ICA) is a promising approach for the analysis of genome-wide transcriptomic data. The paper first presents an overview of the most popular algorithms for performing ICA. These algorithms are then applied to a microarray breast-cancer data set. Issues concerning the application of ICA and the evaluation of the biological relevance of the results are discussed. This study indicates that ICA significantly outperforms Principal Component Analysis (PCA).
Abstract:
We present in this paper a new multivariate probabilistic approach to Acoustic Pulse Recognition (APR) for tangible interface applications. This model uses Principal Component Analysis (PCA) in a probabilistic framework to classify tapping pulses with a high degree of variability. It was found that this model achieves a higher robustness to pulse variability than simpler template matching methods, specifically when allowed to train on data containing high variability. © 2011 IEEE.
Abstract:
The thermal imaging technique relies on infrared signals to detect the temperature field. Using temperature as a flow tracer, thermography is used to investigate the scalar transport in the shallow-water wake generated by an emergent circular cylinder. Thermal imaging is demonstrated to be a good quantitative flow visualization technique for studying turbulent mixing phenomena in shallow waters. A key advantage of the thermal imaging method over other scalar measurement techniques, such as the Laser Induced Fluorescence (LIF) and Planar Concentration Analysis (PCA) methods, is that it involves a very simple experimental setup. The dispersion characteristics captured with this technique are found to be similar to those reported in past studies using traditional measurement techniques. © 2012 Publishing House for Journal of Hydrodynamics.
Abstract:
With the rapid growth of information and communication technology (ICT) in Korea, there was a need to improve the quality of official ICT statistics. In order to do this, various factors had to be considered, such as the quality of surveying, processing, and output as well as the reputation of the statistical agency. We used PLS estimation to determine how these factors might influence customer satisfaction. Furthermore, through a comparison of associated satisfaction indices, we provided feedback to the responsible statistics agency. It appears that our model can be used as a tool for improving the quality of official ICT statistics. © 2008 Elsevier B.V. All rights reserved.
Abstract:
In this paper we develop a new approach to sparse principal component analysis (sparse PCA). We propose two single-unit and two block optimization formulations of the sparse PCA problem, aimed at extracting a single sparse dominant principal component of a data matrix, or more components at once, respectively. While the initial formulations involve nonconvex functions, and are therefore computationally intractable, we rewrite them into the form of an optimization program involving maximization of a convex function on a compact set. The dimension of the search space is decreased enormously if the data matrix has many more columns (variables) than rows. We then propose and analyze a simple gradient method suited for the task. It appears that our algorithm has the best convergence properties when either the objective function or the feasible set is strongly convex, which is the case with our single-unit formulations and can be enforced in the block case. Finally, we demonstrate numerically on a set of random and gene expression test problems that our approach outperforms existing algorithms both in quality of the obtained solution and in computational speed. © 2010 Michel Journée, Yurii Nesterov, Peter Richtárik and Rodolphe Sepulchre.
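To make the "maximize a convex function on a compact set" idea concrete, the following is a minimal sketch of one possible single-unit scheme with an l1-style penalty. It assumes the objective is the sum over columns a_i of max(|a_i . x| - gamma, 0)^2, maximized over unit vectors x whose length equals the number of rows. The function name `sparse_pc`, the parameter `gamma`, the initialization, and the toy matrix are illustrative assumptions, not the authors' code.

```python
import math


def sparse_pc(A, gamma, iters=200):
    """Sketch of a single-unit sparse PCA iteration (assumed formulation).

    A is a list of p rows, each with n entries (columns = variables).
    The search runs over unit vectors x in R^p, so it stays small when n >> p.
    Returns a unit-norm loading vector z in R^n that contains exact zeros.
    """
    p, n = len(A), len(A[0])
    cols = [[A[r][i] for r in range(p)] for i in range(n)]

    def thresholded_scores(x):
        # Soft-threshold each column score a_i . x by gamma.
        out = []
        for a in cols:
            s = sum(ai * xi for ai, xi in zip(a, x))
            t = abs(s) - gamma
            out.append(math.copysign(t, s) if t > 0 else 0.0)
        return out

    # Assumed initialization: the normalized column of largest norm.
    start = max(cols, key=lambda a: sum(v * v for v in a))
    norm = math.sqrt(sum(v * v for v in start))
    x = [v / norm for v in start]

    for _ in range(iters):
        w = thresholded_scores(x)
        # Gradient direction of the convex objective, then project to the sphere.
        g = [sum(w[i] * cols[i][r] for i in range(n)) for r in range(p)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0.0:  # gamma is so large that every score is thresholded away
            break
        x = [v / norm for v in g]

    z = thresholded_scores(x)
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else z


# Toy data (hypothetical): two dominant, correlated variables and three weak ones.
A = [
    [3.0, 2.9, 0.10, -0.10, 0.05],
    [-3.0, -3.1, 0.05, 0.10, -0.10],
    [1.0, 1.1, -0.10, 0.05, 0.10],
]
z = sparse_pc(A, gamma=1.0)
```

In this toy case the three weak variables have column norms below `gamma`, so their loadings are exactly zero while the two dominant variables keep nonzero weights. Larger `gamma` values yield sparser components.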
Abstract:
In this paper, we discuss methods to refine locally optimal solutions of sparse PCA. Starting from a local solution obtained by existing algorithms, these methods take advantage of convex relaxations of the sparse PCA problem to propose a refined solution that is still locally optimal but with a higher objective value. © 2010 Springer-Verlag Berlin Heidelberg.
Abstract:
We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lends itself to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.
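The connection between random partitions and kernels can be illustrated with a short sketch: the kernel value for two objects is estimated as the fraction of random partitions in which they fall into the same cell. The partitioning scheme used here (pick a few data points at random as centers, assign every point to its nearest center) is a simplified assumption chosen for illustration and is not necessarily the construction used in the paper. The function name and parameters are likewise hypothetical.

```python
import random


def random_partition_kernel(points, n_partitions=200, n_centers=2, seed=0):
    """Estimate k(a, b) as the fraction of random partitions that
    place a and b in the same cell (illustrative scheme, see lead-in)."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_partitions):
        # Assumed partition: nearest of a few randomly chosen data points.
        centers = rng.sample(points, n_centers)
        labels = [
            min(
                range(n_centers),
                key=lambda c: sum((u - v) ** 2 for u, v in zip(p, centers[c])),
            )
            for p in points
        ]
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_partitions for c in row] for row in counts]


# Two well-separated groups of points (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
K = random_partition_kernel(points)
```

Because each partition contributes a block-diagonal 0/1 matrix, the averaged matrix `K` is symmetric with ones on the diagonal, and points in the same group receive higher similarity than points in different groups.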
Abstract:
Copyright © (2014) by the International Machine Learning Society (IMLS). All rights reserved. Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques are only able to reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive at large scale. In a separate strand of recent research, randomized methods have been proposed to construct features that help reveal nonlinear patterns in data. For basic tasks such as regression or classification, random features exhibit little or no loss in performance, while achieving drastic savings in computational requirements. In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering or LDA. We demonstrate our algorithms through experiments on real-world data, on which we compare against the state-of-the-art. A simple R implementation of the presented algorithms is provided.
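One common way to combine random features with PCA is to map the data through random Fourier features, whose inner products approximate a Gaussian kernel, and then run ordinary linear PCA on the mapped data. The sketch below follows that recipe under stated assumptions: the Gaussian kernel with bandwidth `sigma`, the power-iteration solver, and all function names are illustrative choices and do not reproduce the R implementation mentioned in the abstract.

```python
import math
import random


def random_fourier_features(X, n_features=500, sigma=1.0, seed=0):
    """Map rows of X to features z(x) with z(x) . z(y) ~ exp(-|x - y|^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [
        [scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + bk)
         for w, bk in zip(W, b)]
        for x in X
    ]


def top_component(Z, iters=100, seed=0):
    """Leading principal direction of the rows of Z via power iteration."""
    rng = random.Random(seed)
    n, D = len(Z), len(Z[0])
    mean = [sum(row[j] for row in Z) / n for j in range(D)]
    Zc = [[row[j] - mean[j] for j in range(D)] for row in Z]  # center the features
    v = [rng.gauss(0.0, 1.0) for _ in range(D)]
    for _ in range(iters):
        # Apply Zc^T Zc to v without forming the D x D covariance matrix.
        scores = [sum(a * c for a, c in zip(row, v)) for row in Zc]
        v = [sum(scores[i] * Zc[i][j] for i in range(n)) for j in range(D)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    scores = [sum(a * c for a, c in zip(row, v)) for row in Zc]
    return v, scores


# Hypothetical toy data: three points in the plane.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Z = random_fourier_features(X, n_features=2000, sigma=1.0)
v, scores = top_component(Z)
```

The cost of the PCA step grows with the number of random features rather than with the square of the number of samples, which is the source of the computational savings that random-feature methods aim for.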