937 results for principal components analysis (PCA) algorithm
Abstract:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are among the mathematical preliminaries discussed before the Partial Least Squares (PLS) and Principal Component Regression (PCR) models are explained. Both PLS and PCR are applied to real spectral data, and their differences and similarities are discussed in this thesis. The main challenge lies in establishing the optimal number of components to include in either model; this is overcome using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data, with CA used to relate macrophyte species to lakes. The differences between the PLS model for ecological data and PLS for spectral data are noted and explained in this thesis.
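The link between SVD, PCA and PCR summarized above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the thesis's code: `pca_svd` and `pcr_fit` are hypothetical names, and the cumulative explained-variance curve is only one of several possible diagnostics for choosing the number of components. PLS is not shown; it differs in choosing components that maximize covariance with y rather than the variance of X alone.

```python
import numpy as np

def pca_svd(X):
    """PCA via the SVD of the column-centred data matrix X (n samples x p variables)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                    # T = U S, the principal component scores
    loadings = Vt.T                   # columns are the principal directions
    explained = s**2 / np.sum(s**2)   # fraction of total variance per component
    return scores, loadings, explained

def pcr_fit(X, y, k):
    """Principal component regression: least squares of y on the first k scores."""
    scores, loadings, _ = pca_svd(X)
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(scores[:, :k], y - y_mean, rcond=None)
    beta = loadings[:, :k] @ gamma    # coefficients on the original variables
    intercept = y_mean - X.mean(axis=0) @ beta
    return beta, intercept

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.5
_, _, explained = pca_svd(X)
print(np.cumsum(explained))           # cumulative explained variance, one diagnostic for k
beta, intercept = pcr_fit(X, y, k=4)  # with k = p, PCR reduces to ordinary least squares
```

Keeping fewer than p components is where PCR and MLR differ; the sketch uses k = p only so the result can be checked against the known coefficients.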
Abstract:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solving an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model in which ridges of the density estimated from the data are considered relevant features. Finding ridges, which are generalized maxima, requires the development of advanced optimization methods. An efficient and convergent trust-region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is used in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically using Gaussian kernels. This allows ridge-based methods to be applied with only mild assumptions on the underlying structure of the data. The statistical model and the ridge-finding methods are adapted to two different applications. The first is the extraction of curvilinear structures from noisy data mixed with background clutter. The second is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications in which most earlier approaches are inadequate. Examples include the identification of faults from seismic data and of filaments from cosmological data. The applicability of the nonlinear PCA to climate analysis and to the reconstruction of periodic patterns from noisy time series data is also demonstrated. Other contributions of the thesis include the development of an efficient semidefinite optimization method for embedding graphs into Euclidean space.
The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry and engineering. The asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated as the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
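The thesis projects points onto density ridges with a trust-region Newton method. To illustrate what projecting onto a ridge of a Gaussian kernel density means, the sketch below uses a different, simpler and well-known technique, subspace constrained mean shift (SCMS). SCMS moves a point along the mean-shift vector restricted to the Hessian eigenvectors with the smallest eigenvalues, which are the directions across the ridge. The function names, the bandwidth and the test points are assumptions for illustration only.

```python
import numpy as np

def kde_terms(x, data, h):
    """Gaussian kernel weights and (unnormalized) Hessian of the KDE at x."""
    diff = data - x                                   # offsets x_i - x, shape (n, D)
    w = np.exp(-np.sum(diff**2, axis=1) / (2 * h**2))
    H = (diff.T * w) @ diff / h**4 - w.sum() * np.eye(x.size) / h**2
    return w, H

def scms_project(x0, data, h, ridge_dim=1, max_iter=200, tol=1e-10):
    """Project x0 onto a ridge_dim-dimensional ridge of the Gaussian KDE via SCMS."""
    x = np.asarray(x0, dtype=float).copy()
    D = x.size
    for _ in range(max_iter):
        w, H = kde_terms(x, data, h)
        _, eigvecs = np.linalg.eigh(H)                # eigenvalues in ascending order
        V = eigvecs[:, : D - ridge_dim]               # directions across the ridge
        m = (w[:, None] * data).sum(axis=0) / w.sum() - x   # mean-shift vector
        step = V @ (V.T @ m)                          # keep only the across-ridge part
        x += step
        if np.linalg.norm(step) < tol:
            break
    return x

# Points sampled along the line y = 0; the ridge of their density is that line.
xs = np.linspace(-5.0, 5.0, 101)
data = np.column_stack([xs, np.zeros_like(xs)])
z = scms_project([0.3, 0.8], data, h=1.0)
```

The point moves straight down onto the line while its position along the ridge is left essentially unchanged, which is the defining property of a ridge projection.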
Abstract:
Premenstrual syndrome and premenstrual dysphoric disorder (PMDD) seem to form a severity continuum with no clear-cut boundary. However, since the American Psychiatric Association proposed research criteria for PMDD in 1994, there has been no agreement about the constellation of symptoms that constitutes this syndrome. The objective of the present study was to establish the core latent structure of PMDD symptoms in a non-clinical sample. Data on PMDD symptoms were obtained from 632 regularly menstruating college students (mean age 24.4 years, SD 5.9, range 17 to 49). For a first random half (N = 316), we performed principal component analysis (PCA), and for the remaining half (N = 316), we tested three theory-derived competing models of PMDD by confirmatory factor analysis. PCA allowed us to extract two correlated factors: a dysphoric-somatic factor and a behavioral-impairment factor. The two-dimensional latent model derived from PCA showed the best overall fit among the three models tested by confirmatory factor analysis (χ²(53) = 64.39, P = 0.13; goodness-of-fit index = 0.96; adjusted goodness-of-fit index = 0.95; root mean square residual = 0.05; root mean square error of approximation = 0.03, 90% CI 0.00 to 0.05; Akaike's information criterion = -41.61). The items "out of control" and "physical symptoms" loaded conspicuously on the first factor, and "interpersonal impairment" loaded higher on the second factor. The construct validity of PMDD was accounted for by two highly correlated dimensions. These results support focusing on the core psychopathological dimension of PMDD in future studies.
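Two of the reported indices can be recomputed from χ², the degrees of freedom and the sample size, which is a quick consistency check on fit statistics like these. The sketch assumes the common RMSEA formula with N - 1 in the denominator and the χ²-based AIC variant (χ² minus 2df) reported by some SEM programs; other software uses different AIC definitions, and the abstract does not say which program the authors used.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (one common formula, using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def aic_chi2(chi2, df):
    """AIC in the chi-square-based form chi2 - 2*df used by some SEM programs."""
    return chi2 - 2 * df

# Values from the abstract: chi2(53) = 64.39, confirmatory half-sample N = 316.
r = rmsea(64.39, 53, 316)
a = aic_chi2(64.39, 53)
print(round(r, 2), round(a, 2))   # prints 0.03 -41.61
```

Both recomputed values match the reported RMSEA of 0.03 and AIC of -41.61.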
Abstract:
The contents of total phenolic compounds (TPC), total flavonoids (TF), and ascorbic acid (AA) of 18 frozen fruit pulps, and their scavenging capacities against peroxyl radical (ROO•), hydrogen peroxide (H2O2), and hydroxyl radical (•OH), were determined. Principal component analysis (PCA) showed that TPC and AA were positively correlated with the scavenging capacity against ROO•, and that TF was positively correlated with the scavenging capacity against •OH and ROO•. However, the scavenging capacity against H2O2 showed low correlation with TF, TPC, and AA. Hierarchical cluster analysis (HCA) classified the fruit pulps into three groups: the first was formed by the açaí pulp, with a high TF content (134.02 mg CE/100 g pulp) and the highest scavenging capacity against ROO•, •OH, and H2O2; the second was formed by the acerola pulp, with high TPC (658.40 mg GAE/100 g pulp) and AA (506.27 mg/100 g pulp) contents; and the third was formed by the pineapple, cacao, caja, cashew-apple, coconut, cupuaçu, guava, orange, lemon, mango, passion fruit, watermelon, pitanga, tamarind, tangerine, and umbu pulps, which could not be separated considering only the contents of bioactive compounds and the scavenging properties.
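The abstract reads correlations between variables off a PCA. A minimal sketch, assuming PCA on the correlation matrix (autoscaled variables, the usual choice when variables have different units, such as mg/100 g and scavenging capacities): loadings computed as eigenvector times the square root of the eigenvalue equal the correlation of each variable with each component, so variables with similar loadings are positively correlated. The data are synthetic, not the fruit-pulp measurements.

```python
import numpy as np

def pca_correlation_loadings(X):
    """PCA on the correlation matrix; returns loadings (variables x components) and eigenvalues."""
    R = np.corrcoef(X, rowvar=False)                   # same as covariance of autoscaled data
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each loading is the correlation between a variable and a principal component.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return loadings, eigvals

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
# Columns 0 and 1 are strongly correlated; column 2 is independent noise.
X = np.column_stack([a, a + 0.1 * rng.normal(size=1000), rng.normal(size=1000)])
loadings, eigvals = pca_correlation_loadings(X)
```

The two correlated columns load together on the first component, while the independent column loads on it only weakly.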
Abstract:
This study aimed to compare wheat flour quality results obtained with the new Wheat Gluten Quality Analyser (WGQA) with those obtained with the extensigraph and farinograph. Fifty-nine wheat samples were evaluated for protein and gluten contents; the rheological properties of gluten and of wheat flour were assessed using the WGQA and the extensigraph/farinograph methods, respectively, in addition to the baking test. Principal component analysis (PCA) and linear regression were used to evaluate the results. The energy and maximum resistance to extension determined by the extensigraph and by the WGQA showed an acceptable linear correlation, with coefficients ranging from 0.6071 to 0.6511. The PCA results obtained using the WGQA and the other rheological apparatus showed values similar to those expected for wheat flours in the baking test. Although all the equipment used was effective in assessing the behavior of strong and weak flours, the results for medium-strength wheat flour varied. The WGQA required less sample and was faster and easier to use than the other instruments.
Abstract:
Remote sensing techniques involving hyperspectral imagery have applications in a number of sciences that study aspects of the surface of the planet. The analysis of hyperspectral images is complex because of the large amount of information involved and the noise within the data. Investigating images to identify minerals, rocks, vegetation and other materials is an application of hyperspectral remote sensing in the earth sciences. This thesis evaluates the performance of a classification technique and a clustering technique on hyperspectral images for mineral identification. Support Vector Machines (SVM) and Self-Organizing Maps (SOM) are applied as the classification and clustering techniques, respectively. Principal Component Analysis (PCA) is used to prepare the data for analysis. The purpose of using PCA is to reduce the amount of data that needs to be processed by identifying the most important components within the data. A well-studied dataset from Cuprite, Nevada, and a dataset of more complex data from Baffin Island were used to assess the performance of these techniques. The main goal of this research is to evaluate the advantage of training a classifier on a small amount of data compared with an unsupervised method. Another goal is to determine the effect of feature extraction on the accuracy of the clustering and classification methods. This thesis concludes that using PCA increases learning accuracy, especially in classification. SVM classifies the Cuprite data with high precision, and SOM is competitive with SVM on datasets with a high level of noise (such as Baffin Island).
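The preprocessing step described above, using PCA to reduce the number of spectral bands before classification or clustering, can be sketched as follows. The 99% retained-variance threshold, the function name and the synthetic three-endmember cube are assumptions for illustration; the abstract does not state how many components the thesis kept.

```python
import numpy as np

def pca_reduce_cube(cube, var_kept=0.99):
    """Reduce a hyperspectral cube (rows x cols x bands) to its leading principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)     # one spectrum per row
    centred = pixels - pixels.mean(axis=0)
    _, s, Vt = np.linalg.svd(centred, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches var_kept.
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, s.size)
    reduced = centred @ Vt[:k].T                       # project onto the first k components
    return reduced.reshape(rows, cols, k), k

# Synthetic 20 x 20 scene with 50 bands: linear mixtures of 3 endmember spectra plus noise.
rng = np.random.default_rng(1)
abundances = rng.uniform(size=(20 * 20, 3))
endmembers = rng.uniform(size=(3, 50))
cube = (abundances @ endmembers + 1e-3 * rng.normal(size=(400, 50))).reshape(20, 20, 50)
reduced, k = pca_reduce_cube(cube)
```

Because the synthetic scene is generated from three spectra, at most three components are needed, so the 50 bands collapse to a much smaller feature set for the classifier or clustering method.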
Abstract:
Few studies have evaluated the characteristics of parks that may encourage physical activity specifically among young people. This study aims to estimate the reliability of a youth-oriented park observation tool, to identify the conceptual domains of parks captured by this tool using an operationalization of the conceptual model of parks and physical activity, and to identify different types of parks. A total of 576 parks were evaluated using a park assessment tool. The intra-rater and inter-rater reliability of this tool were estimated. An exploratory principal component analysis (PCA) was performed using an orthogonal varimax rotation, and variables were retained if they loaded at ≥0.3 on a component. A cluster analysis (CA) using Ward's method was then carried out using the principal components and a measure of park area. The tool was generally reliable, and the PCA identified ten principal components that explained 60% of the total variance. The CA produced nine clusters that explained 40% of the total variance. PCA and CA methods are therefore feasible with park data. The results were interpreted using the operationalization of the conceptual model.
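The two analysis steps named above, varimax rotation of the components followed by a loading rule of ≥0.3, can be sketched in NumPy. The sketch uses the widely used SVD-based varimax iteration. The loading matrix is a made-up example, built by rotating a known simple structure by 30 degrees so that the rotation has something to undo. Ward clustering of the component scores is not shown.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (gamma = 1) of a p x k loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = loadings @ R
        target = L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0))
        u, s, vh = np.linalg.svd(loadings.T @ target)
        R = u @ vh                      # nearest orthogonal matrix to the criterion gradient
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ R, R

theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.9], [0.0, 0.8], [0.2, 0.1]])
rotated, R = varimax(simple @ rot)
retained = np.any(np.abs(rotated) >= 0.3, axis=1)   # keep variables loading >= 0.3 somewhere
```

The rotation is orthogonal, so each variable's communality (sum of squared loadings) is unchanged. Only how that variance is split across components changes, which makes the ≥0.3 rule easier to apply.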
Abstract:
Learning Disability (LD) is a classification that includes several disorders in which a child has difficulty learning in a typical manner, usually because of one or more unknown factors. LD affects about 15% of children enrolled in schools. Predicting learning disability is difficult because LD must be identified from diverse features or signs. There is no cure for learning disabilities, and they are lifelong. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. The aim of this paper is to develop a new algorithm for imputing missing values and to determine how much the missing-value imputation method and the dimensionality reduction method affect the performance of fuzzy and neuro-fuzzy classifiers, with specific emphasis on predicting learning disabilities in school-age children. In the basic assessment method for predicting LD, checklists are generally used; the data collected in this way depend heavily on the mood of the children and may also contain redundant as well as missing values. Therefore, in this study, we propose a new correlation-based algorithm for imputing the missing values and use Principal Component Analysis (PCA) to reduce the irrelevant attributes. The study found that the preprocessing methods we applied improve the quality of the data and thereby increase the accuracy of the classifiers. The system is implemented in MathWorks MATLAB 7.10. The results of this study show that the developed missing-value imputation method is a valuable contribution to the prediction system and can improve the performance of a classifier.
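The abstract does not specify the correlation-based imputation algorithm. The sketch below is one plausible, hypothetical reading, not the paper's method. Each missing entry is predicted by a simple linear fit on the other attribute that correlates most strongly with its column, using rows where both are observed, with a fallback to the column mean. The function name and the toy checklist matrix are illustrative assumptions.

```python
import numpy as np

def impute_by_correlation(X):
    """Fill NaNs in each column from a linear fit on its most correlated other column.

    Hypothetical illustration only; not the algorithm from the paper.
    """
    X = np.array(X, dtype=float)
    filled = X.copy()
    n, p = X.shape
    for j in range(p):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        best, best_r = None, 0.0
        for k in range(p):                      # find the most correlated predictor column
            if k == j:
                continue
            both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            if both.sum() < 3:
                continue
            r = np.corrcoef(X[both, j], X[both, k])[0, 1]
            if np.isfinite(r) and abs(r) > abs(best_r):
                best, best_r = k, r
        col_mean = np.nanmean(X[:, j])
        if best is None:
            filled[miss, j] = col_mean
            continue
        both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, best])
        slope, intercept = np.polyfit(X[both, best], X[both, j], 1)
        usable = miss & ~np.isnan(X[:, best])   # predictor must itself be observed
        filled[usable, j] = slope * X[usable, best] + intercept
        filled[miss & ~usable, j] = col_mean
    return filled

X = [[1, 2, 5], [2, 4, 1], [3, 6, 7], [4, np.nan, 2], [5, 10, 9]]
filled = impute_by_correlation(X)
```

Here the second column is exactly twice the first, so the missing entry is filled with 8.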
Abstract:
This paper attempts to determine accurately the number of Premature Ventricular Contraction (PVC) cycles in a given electrocardiogram (ECG) using a wavelet constructed from multiple Gaussian functions. It is difficult to assess the ECGs of patients who are continuously monitored over a long period of time, so the proposed classification method will help doctors determine the severity of PVC in a patient. Principal Component Analysis (PCA) and a simple classifier are used in addition to the specially developed wavelet transform. The proposed wavelet is designed from multiple Gaussian functions whose sum resembles a normal ECG. The number of Gaussians used depends on the number of peaks present in a normal ECG. The developed wavelet satisfies all the properties of a traditional continuous wavelet. The new wavelet was optimized using a genetic algorithm (GA). ECG records from the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) database were used for validation. On the 8694 ECG cycles used for evaluation, the classification algorithm achieved an accuracy of 97.77%. To compare the performance of the new wavelet, classification was also performed using standard wavelets such as Morlet, Meyer, bior3.9, db5, db3, sym3 and Haar. The new wavelet outperformed the others.
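The construction idea, a wavelet built from Gaussians whose sum resembles a normal ECG cycle, can be sketched as below. This is a sketch under assumptions. The amplitudes, centres and widths are made-up P, Q, R, S and T values, not the paper's GA-optimized parameters. The wavelet properties are also enforced only in the simplest form: the mean is subtracted over a finite window (zero average), and the result is normalized to unit energy.

```python
import numpy as np

def gaussian_sum_wavelet(t, amplitudes, centres, widths):
    """Sum of Gaussians shaped like one ECG cycle, made zero-mean and unit-energy.

    Parameters are hypothetical; the paper's values and GA optimization are not reproduced.
    """
    t = np.asarray(t, dtype=float)
    psi = np.zeros_like(t)
    for a, c, w in zip(amplitudes, centres, widths):
        psi += a * np.exp(-((t - c) ** 2) / (2 * w**2))
    psi -= psi.mean()                    # zero mean over the window (admissibility)
    dt = t[1] - t[0]
    psi /= np.sqrt(np.sum(psi**2) * dt)  # unit energy: integral of psi^2 equals 1
    return psi

t = np.linspace(-1.0, 1.0, 2001)
# Hypothetical P, Q, R, S, T waves: amplitude, centre (s), width (s).
psi = gaussian_sum_wavelet(t,
                           amplitudes=[0.15, -0.1, 1.0, -0.25, 0.3],
                           centres=[-0.25, -0.05, 0.0, 0.05, 0.3],
                           widths=[0.03, 0.01, 0.015, 0.01, 0.05])
```

The tallest peak stays at the R-wave position, so correlating this template with a recording responds most strongly to QRS-like shapes.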
Abstract:
Information and communication technology (ICT) projects have great potential to revolutionise the information delivery system by bridging the gap between farmers and extension personnel. The aAQUA (Almost All Questions Answered) portal was launched by the Developmental Informatics Laboratory (DIL) at the Indian Institute of Technology (IIT) Mumbai, Maharashtra, India, in 2003 as an information-providing system to deliver technology options and tailored information for the problems and queries raised by Indian dairy farmers. To measure the effectiveness of this service, the attitudinal dimensions of the users of the aAQUA e-Agriservice were investigated using a 22-item scale. A simple random sampling technique was used to select 120 dairy farmers, from whom data were collected and subjected to factor analysis to identify the underlying constructs. From the attitude items, four components were extracted and named the pessimistic, utility, technical and efficacy perspectives; these influenced the development of varying levels of attitudinal inclination towards the e-Agriservice. These components explained 64.40 per cent of the variation in the users' attitudes towards the aAQUA e-Agriservice. This study provides a framework for technically efficient service provision that might help reduce the target population's pessimism about adopting e-Agriservice in their farming systems. The results should also help researchers, academics, ICT-based service providers and policy makers consider these perspectives when planning and implementing ICT projects.
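A figure such as "64.40 per cent of variation" follows the usual component-extraction bookkeeping. With standardized items, total variance equals the number of items, so the share explained by k components is the sum of their eigenvalues divided by the item count. Below is a minimal sketch with a made-up 6-item correlation matrix; the study used 22 items, and its eigenvalues are not given here. The Kaiser rule (eigenvalue greater than one) is shown as a commonly used retention criterion; the abstract does not say which criterion the authors used.

```python
import numpy as np

def variance_explained(R, k):
    """Percent of total standardized variance captured by the k largest components of R."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    return 100.0 * eigvals[:k].sum() / R.shape[0], eigvals

# Hypothetical 6-item scale: two blocks of 3 items, r = 0.6 within a block, 0 between.
block = np.full((3, 3), 0.6) + 0.4 * np.eye(3)
R = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
pct, eigvals = variance_explained(R, k=2)
n_kaiser = int(np.sum(eigvals > 1.0))   # Kaiser rule: keep eigenvalues greater than 1
print(round(pct, 2), n_kaiser)          # prints 73.33 2
```

Each block contributes one eigenvalue of 1 + 2(0.6) = 2.2, so two components explain 4.4 / 6 of the variance.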
Abstract:
This paper presents a new paradigm for signal reconstruction and superresolution, Correlation Kernel Analysis (CKA), that is based on selecting a sparse set of bases from a large dictionary of class-specific basis functions. The basis functions we use are the correlation functions of the class of signals being analyzed. To choose the appropriate features from this large dictionary, we use Support Vector Machine (SVM) regression and compare it with traditional Principal Component Analysis (PCA) for signal reconstruction, superresolution, and compression. The testbed used in this paper is a set of images of pedestrians. This paper also presents results of experiments in which we use a dictionary of multiscale basis functions and then use Basis Pursuit De-Noising to obtain a sparse, multiscale approximation of a signal. From the results, we conclude that 1) when used with a sparse representation technique, the correlation function is an effective kernel for image reconstruction and superresolution; 2) for image compression, PCA and SVM have different tradeoffs, depending on the metric used to evaluate the results; 3) in sparse representation techniques, L_1 is not a good proxy for the true measure of sparsity, L_0; and 4) the L_epsilon norm may be a better error metric for image reconstruction and compression than the L_2 norm, though the exact psychophysical metric should take into account higher-order structure in images.
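Conclusions 3) and 4) concern how sparsity and error are measured. The short example below, with made-up vectors, shows why L_1 can be a poor proxy for L_0: a vector with fewer nonzeros can still have the larger L_1 norm. It also defines the epsilon-insensitive error used in SVM regression, on the assumption that this is what the abstract's L_epsilon norm refers to.

```python
import numpy as np

def l0(v, tol=1e-12):
    """Number of nonzero coefficients: the 'true' sparsity measure."""
    return int(np.sum(np.abs(v) > tol))

def l1(v):
    """Sum of absolute coefficients: the convex surrogate used in Basis Pursuit."""
    return float(np.sum(np.abs(v)))

def eps_insensitive(residual, eps):
    """Epsilon-insensitive error: residuals smaller than eps cost nothing."""
    return float(np.sum(np.maximum(np.abs(residual) - eps, 0.0)))

a = np.array([1.0, 1.0, 0.0, 0.0])   # 2 nonzeros, L1 = 2.0
b = np.array([0.3, 0.3, 0.3, 0.3])   # 4 nonzeros, L1 = 1.2
print(l0(a) < l0(b), l1(a) < l1(b))  # prints True False: the two measures disagree
```

Minimizing L_1 would prefer the dense vector b, even though a is sparser in the L_0 sense.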
Abstract:
Compositional data naturally arise from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis to methods based on log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented in the statistical package R, for the analysis of compositional data. R is an open-source package that provides very powerful statistical facilities at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example, that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have so far been little used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated, results from different analyses will be compared, and the utility of the various methods will be discussed.
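As a minimal sketch of the log-ratio approach mentioned above, here is a centred log-ratio (clr) transform followed by PCA. It is written in Python with NumPy rather than in the R library the paper describes, whose function names are not given in the abstract. It assumes strictly positive parts, so zero replacement would have to be done first, and the oxide compositions are made up.

```python
import numpy as np

def closure(x):
    """Rescale each composition (row) so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centred log-ratio transform: log of each part over the row's geometric mean."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=1, keepdims=True)

def logratio_pca(x):
    """PCA of clr-transformed compositions (requires strictly positive parts)."""
    z = clr(x)
    z -= z.mean(axis=0)
    _, s, Vt = np.linalg.svd(z, full_matrices=False)
    return z @ Vt.T, Vt, s**2 / np.sum(s**2)

# Hypothetical oxide compositions (wt%) of four artefacts: SiO2, Al2O3, CaO, Na2O.
comp = np.array([[70.0, 15.0, 8.0, 7.0],
                 [68.0, 16.0, 9.0, 7.0],
                 [60.0, 10.0, 20.0, 10.0],
                 [59.0, 11.0, 21.0, 9.0]])
scores, components, explained = logratio_pca(comp)
```

clr coordinates sum to zero in every row, so the last component always carries essentially no variance. The transform is also scale-invariant, so results do not depend on whether the parts are reported as wt% or as proportions.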
Abstract:
The use of the perturbation and power transformation operations permits linear processes in the simplex to be investigated as in a vector space. When the geochemical processes under investigation can be constrained by a well-known starting point, the eigenvectors of the covariance matrix of a non-centred principal component analysis allow compositional changes to be modelled relative to a reference point. The results obtained for the chemistry of water collected from the River Arno (central-northern Italy) have opened new perspectives for considering relative changes in the analysed variables and for hypothesising the relative effects of the different physical-chemical processes at work, thus laying the basis for quantitative modelling.
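The operations named above can be sketched as follows. Perturbation and powering act on compositions the way addition and scalar multiplication act on vectors, so a process x(t) = x0 ⊕ (t ⊙ p) is a straight line in clr coordinates. PCA on the clr deviations from a chosen starting composition x0, rather than from the sample mean, then recovers the direction p of that process. The compositions are hypothetical, not River Arno data, and this is not the authors' code.

```python
import numpy as np

def closure(x):
    """Rescale composition(s) so the parts sum to 1 along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Perturbation, the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(alpha, x):
    """Powering, the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** alpha)

def clr(x):
    """Centred log-ratio coordinates along the last axis."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def noncentred_pca(samples, reference):
    """PCA of clr coordinates taken relative to a reference composition, not the sample mean."""
    z = clr(samples) - clr(reference)
    _, s, Vt = np.linalg.svd(z, full_matrices=False)
    return Vt, s

# A linear process in the simplex, x(t) = x0 (+) t (.) p, with hypothetical x0 and p.
x0 = closure([0.5, 0.3, 0.2])
p = closure([2.0, 1.0, 0.5])
path = np.array([perturb(x0, power(t, p)) for t in np.linspace(0.0, 3.0, 7)])
Vt, s = noncentred_pca(path, x0)
```

Because clr(x0 ⊕ t ⊙ p) = clr(x0) + t·clr(p), all deviations from the reference lie along clr(p). A single component therefore describes the whole process, and its direction is the modelled compositional change.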