951 resultados para Principal component analysis (PCA)
Resumo:
Edaphic factors affect the quality of onions (Allium cepa). Two experiments were carried out in the field and glasshouse to investigate the effects of N (field: 0, 120 kg ha(-1); glasshouse: 0, 108 kg ha(-1)), S (field: 0, 20 kg ha(-1); glasshouse: 0, 4.35 kg ha(-1)) and soil type (clay, sandy loam) on onion quality. A conducting polymer sensor electronic nose (E-nose) was used to classify onion headspace volatiles. Relative changes in the E-nose sensor resistance ratio (%dR/R) were reduced following N and S fertilisation. A 2D Principal Component Analysis (PCA) of the E-nose data sets accounted for c. 100% of the variations in onion headspace volatiles in both experiments. For the field experiment, E-nose data set clusters for headspace volatiles for no N-added onions overlapped (D-2 = 1.0) irrespective of S treatment. Headspace volatiles of N-fertilised onions for the glasshouse sandy loam also overlapped (D-2 = 1.1) irrespective of S treatment as compared with distinct separations among clusters for the clay soil. N fertilisation significantly (P < 0.01) reduced onion bulb pyruvic acid concentration (flavour) in both experiments. S fertilisation increased pyruvic acid concentration significantly (P < 0.01) in the glasshouse experiment, especially for the clay soil, but had no effect on pyruvic acid concentration in the field. N and S fertilisation significantly (P < 0.01) increased lachrymatory potency (pungency), but reduced total soluble solids (TSS) content in the field experiment. In the glasshouse experiment, N and S had no effect on TSS. TSS content was increased on the clay by 1.2-fold as compared with the sandy loam. Onion tissue N:water-soluble SO42- ratios of between five and eight were associated with greater %dR/R and pyruvic acid concentration values. N did not affect inner bulb tissue microbial load. In contrast, S fertilisation reduced inner bulb tissue microbial load by 80% in the field experiment and between 27% (sandy loam) and 92% (clay) in the glasshouse experiment. Overall, onion bulb quality discriminated by the E-nose responded to N, S and soil type treatments, and reflected their interactions. However, the conventional analytical and sensory measures of onion quality did not correlate with %dR/R.
Resumo:
Biological wastewater treatment is a complex, multivariate process, in which a number of physical and biological processes occur simultaneously. In this study, principal component analysis (PCA) and parallel factor analysis (PARAFAC) were used to profile and characterise Lagoon 115E, a multistage biological lagoon treatment system at Melbourne Water's Western Treatment Plant (WTP) in Melbourne, Australia. In this study, the objective was to increase our understanding of the multivariate processes taking place in the lagoon. The data used in the study span a 7-year period during which samples were collected as often as weekly from the ponds of Lagoon 115E and subjected to analysis. The resulting database, involving 19 chemical and physical variables, was studied using the multivariate data analysis methods PCA and PARAFAC. With these methods, alterations in the state of the wastewater due to intrinsic and extrinsic factors could be discerned. The methods were effective in illustrating and visually representing the complex purification stages and cyclic changes occurring along the lagoon system. The two methods proved complementary, with each having its own beneficial features. (C) 2003 Elsevier B.V. All rights reserved.
Resumo:
Genotype, sulphur (S) nutrition and soil-type effects on spring onion quality were assessed using a 32-conducting polymer sensor E-nose. Relative changes in sensor resistance ratio (% dR/R) varied among eight spring onion genotypes. The % dR/R was reduced by S application in four of the eight genotypes. For the other four genotypes, S application gave no change in % dR/R in three, and increased % dR/R in the other. E-nose classification of headspace volatiles by a two-dimensional principal component analysis (PCA) plot for spring onion genotypes differed for S fertilisation vs. no S fertilisation. Headspace volatiles data set clusters for cv. 'White Lisbon' grown on clay or on sandy loam overlapped when 2.9 [Mahalanobis distance value (D2) = 1.6], or 5.8-(D2 = 0.3) kg S ha-1 was added. In contrast, clear separation (D2 = 7.5) was recorded for headspace volatile clusters for 0 kg S hd-1 on clay vs. sandy loam. Addition of 5.8 kg S ha-1 increased pyruvic acid content (mmole g-1 fresh weight) by 1.7-fold on average across the eight genotypes. However, increased S from 2.9 to 5.8 kg ha-1 did not significantly (P > 0.05) influence % dR/R, % dry matter (DM) or total soluble solids (TSS) contents, but significantly (P < 0.05) increased pyruvic acid content. TSS was significantly (P < 0.05) reduced by S addition, while % DM was unaffected. In conclusion, the 32-conducting polymer E-nose discerned differences in spring onion quality that were attributable to genotype and to variations in growing conditions as shown by the significant (P < 0.05) interaction effects for % dR/R.
Resumo:
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. © 2006 American Chemical Society.
Resumo:
Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. miniDVMS v1.8 provides a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualisation domain. The advantage of this interface is that the user is directly involved in the data mining process. Principled projection methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), are integrated with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, and user interaction facilities, to provide this integrated visual data mining framework. The software also supports conventional visualisation techniques such as principal component analysis (PCA), Neuroscale, and PhiVis. This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care while creating a new model, and provides information about how to install and use the tool. The user manual does not require the readers to have familiarity with the algorithms it implements. Basic computing skills are enough to operate the software.
Resumo:
A visualization plot of a data set of molecular data is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.
Resumo:
Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the variation in the original version of GTM and GTM-FS worked successfully with data of more than 2000 dimensions and we compare the results with other linear/nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
Resumo:
The thesis presents new methodology and algorithms that can be used to analyse and measure the hand tremor and fatigue of surgeons while performing surgery. This will assist them in deriving useful information about their fatigue levels, and make them aware of the changes in their tool point accuracies. This thesis proposes that muscular changes of surgeons, which occur through a day of operating, can be monitored using Electromyography (EMG) signals. The multi-channel EMG signals are measured at different muscles in the upper arm of surgeons. The dependence of EMG signals has been examined to test the hypothesis that EMG signals are coupled with and dependent on each other. The results demonstrated that EMG signals collected from different channels while mimicking an operating posture are independent. Consequently, single channel fatigue analysis has been performed. In measuring hand tremor, a new method for determining the maximum tremor amplitude using Principal Component Analysis (PCA) and a new technique to detrend acceleration signals using Empirical Mode Decomposition algorithm were introduced. This tremor determination method is more representative for surgeons and it is suggested as an alternative fatigue measure. This was combined with the complexity analysis method, and applied to surgically captured data to determine if operating has an effect on a surgeon’s fatigue and tremor levels. It was found that surgical tremor and fatigue are developed throughout a day of operating and that this could be determined based solely on their initial values. Finally, several Nonlinear AutoRegressive with eXogenous inputs (NARX) neural networks were evaluated. The results suggest that it is possible to monitor surgeon tremor variations during surgery from their EMG fatigue measurements.
Resumo:
Guest editorial Ali Emrouznejad is a Senior Lecturer at the Aston Business School in Birmingham, UK. His areas of research interest include performance measurement and management, efficiency and productivity analysis as well as data mining. He has published widely in various international journals. He is an Associate Editor of IMA Journal of Management Mathematics and Guest Editor to several special issues of journals including Journal of Operational Research Society, Annals of Operations Research, Journal of Medical Systems, and International Journal of Energy Management Sector. He is in the editorial board of several international journals and co-founder of Performance Improvement Management Software. William Ho is a Senior Lecturer at the Aston University Business School. Before joining Aston in 2005, he had worked as a Research Associate in the Department of Industrial and Systems Engineering at the Hong Kong Polytechnic University. His research interests include supply chain management, production and operations management, and operations research. He has published extensively in various international journals like Computers & Operations Research, Engineering Applications of Artificial Intelligence, European Journal of Operational Research, Expert Systems with Applications, International Journal of Production Economics, International Journal of Production Research, Supply Chain Management: An International Journal, and so on. His first authored book was published in 2006. He is an Editorial Board member of the International Journal of Advanced Manufacturing Technology and an Associate Editor of the OR Insight Journal. Currently, he is a Scholar of the Advanced Institute of Management Research. Uses of frontier efficiency methodologies and multi-criteria decision making for performance measurement in the energy sector This special issue aims to focus on holistic, applied research on performance measurement in energy sector management and for publication of relevant applied research to bridge the gap between industry and academia. After a rigorous refereeing process, seven papers were included in this special issue. The volume opens with five data envelopment analysis (DEA)-based papers. Wu et al. apply the DEA-based Malmquist index to evaluate the changes in relative efficiency and the total factor productivity of coal-fired electricity generation of 30 Chinese administrative regions from 1999 to 2007. Factors considered in the model include fuel consumption, labor, capital, sulphur dioxide emissions, and electricity generated. The authors reveal that the east provinces were relatively and technically more efficient, whereas the west provinces had the highest growth rate in the period studied. Ioannis E. Tsolas applies the DEA approach to assess the performance of Greek fossil fuel-fired power stations taking undesirable outputs into consideration, such as carbon dioxide and sulphur dioxide emissions. In addition, the bootstrapping approach is deployed to address the uncertainty surrounding DEA point estimates, and provide bias-corrected estimations and confidence intervals for the point estimates. The author revealed from the sample that the non-lignite-fired stations are on an average more efficient than the lignite-fired stations. Maethee Mekaroonreung and Andrew L. Johnson compare the relative performance of three DEA-based measures, which estimate production frontiers and evaluate the relative efficiency of 113 US petroleum refineries while considering undesirable outputs. Three inputs (capital, energy consumption, and crude oil consumption), two desirable outputs (gasoline and distillate generation), and an undesirable output (toxic release) are considered in the DEA models. The authors discover that refineries in the Rocky Mountain region performed the best, and about 60 percent of oil refineries in the sample could improve their efficiencies further. H. Omrani, A. Azadeh, S. F. Ghaderi, and S. Abdollahzadeh presented an integrated approach, combining DEA, corrected ordinary least squares (COLS), and principal component analysis (PCA) methods, to calculate the relative efficiency scores of 26 Iranian electricity distribution units from 2003 to 2006. Specifically, both DEA and COLS are used to check three internal consistency conditions, whereas PCA is used to verify and validate the final ranking results of either DEA (consistency) or DEA-COLS (non-consistency). Three inputs (network length, transformer capacity, and number of employees) and two outputs (number of customers and total electricity sales) are considered in the model. Virendra Ajodhia applied three DEA-based models to evaluate the relative performance of 20 electricity distribution firms from the UK and the Netherlands. The first model is a traditional DEA model for analyzing cost-only efficiency. The second model includes (inverse) quality by modelling total customer minutes lost as an input data. The third model is based on the idea of using total social costs, including the firm’s private costs and the interruption costs incurred by consumers, as an input. Both energy-delivered and number of consumers are treated as the outputs in the models. After five DEA papers, Stelios Grafakos, Alexandros Flamos, Vlasis Oikonomou, and D. Zevgolis presented a multiple criteria analysis weighting approach to evaluate the energy and climate policy. The proposed approach is akin to the analytic hierarchy process, which consists of pairwise comparisons, consistency verification, and criteria prioritization. In the approach, stakeholders and experts in the energy policy field are incorporated in the evaluation process by providing an interactive mean with verbal, numerical, and visual representation of their preferences. A total of 14 evaluation criteria were considered and classified into four objectives, such as climate change mitigation, energy effectiveness, socioeconomic, and competitiveness and technology. Finally, Borge Hess applied the stochastic frontier analysis approach to analyze the impact of various business strategies, including acquisition, holding structures, and joint ventures, on a firm’s efficiency within a sample of 47 natural gas transmission pipelines in the USA from 1996 to 2005. The author finds that there were no significant changes in the firm’s efficiency by an acquisition, and there is a weak evidence for efficiency improvements caused by the new shareholder. Besides, the author discovers that parent companies appear not to influence a subsidiary’s efficiency positively. In addition, the analysis shows a negative impact of a joint venture on technical efficiency of the pipeline company. To conclude, we are grateful to all the authors for their contribution, and all the reviewers for their constructive comments, which made this special issue possible. We hope that this issue would contribute significantly to performance improvement of the energy sector.
Resumo:
Relationships among quality factors in retailed free-range, corn-fed, organic, and conventional chicken breasts (9) were modeled using chemometric approaches. Use of principal component analysis (PCA) to neutral lipid composition data explained the majority (93%) of variability (variance) in fatty acid contents in 2 significant multivariate factors. PCA explained 88 and 75% variance in 3 factors for, respectively, flame ionization detection (FID) and nitrogen phosphorus (NPD) components in chromatographic flavor data from cooked chicken after simultaneous distillation extraction. Relationships to tissue antioxidant contents were modeled. Partial least square regression (PLS2), interrelating total data matrices, provided no useful models. By using single antioxidants as Y variables in PLS (1), good models (r2 values > 0.9) were obtained for alpha-tocopherol, glutathione, catalase, glutathione peroxidase, and reductase and FID flavor components and among the variables total mono and polyunsaturated fatty acids and subsets of FID, and saturated fatty acid and NPD components. Alpha-tocopherol had a modest (r2 = 0.63) relationship with neutral lipid n-3 fatty acid content. Such factors thus relate to flavor development and quality in chicken breast meat.
Resumo:
Biological experiments often produce enormous amount of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
Resumo:
Principal component analysis (PCA) is well recognized in dimensionality reduction, and kernel PCA (KPCA) has also been proposed in statistical data analysis. However, KPCA fails to detect the nonlinear structure of data well when outliers exist. To reduce this problem, this paper presents a novel algorithm, named iterative robust KPCA (IRKPCA). IRKPCA works well in dealing with outliers, and can be carried out in an iterative manner, which makes it suitable to process incremental input data. As in the traditional robust PCA (RPCA), a binary field is employed for characterizing the outlier process, and the optimization problem is formulated as maximizing marginal distribution of a Gibbs distribution. In this paper, this optimization problem is solved by stochastic gradient descent techniques. In IRKPCA, the outlier process is in a high-dimensional feature space, and therefore kernel trick is used. IRKPCA can be regarded as a kernelized version of RPCA and a robust form of kernel Hebbian algorithm. Experimental results on synthetic data demonstrate the effectiveness of IRKPCA. © 2010 Taylor & Francis.
Resumo:
This dissertation establishes a novel system for human face learning and recognition based on incremental multilinear Principal Component Analysis (PCA). Most of the existing face recognition systems need training data during the learning process. The system as proposed in this dissertation utilizes an unsupervised or weakly supervised learning approach, in which the learning phase requires a minimal amount of training data. It also overcomes the inability of traditional systems to adapt to the testing phase as the decision process for the newly acquired images continues to rely on that same old training data set. Consequently when a new training set is to be used, the traditional approach will require that the entire eigensystem will have to be generated again. However, as a means to speed up this computational process, the proposed method uses the eigensystem generated from the old training set together with the new images to generate more effectively the new eigensystem in a so-called incremental learning process. In the empirical evaluation phase, there are two key factors that are essential in evaluating the performance of the proposed method: (1) recognition accuracy and (2) computational complexity. In order to establish the most suitable algorithm for this research, a comparative analysis of the best performing methods has been carried out first. The results of the comparative analysis advocated for the initial utilization of the multilinear PCA in our research. As for the consideration of the issue of computational complexity for the subspace update procedure, a novel incremental algorithm, which combines the traditional sequential Karhunen-Loeve (SKL) algorithm with the newly developed incremental modified fast PCA algorithm, was established. In order to utilize the multilinear PCA in the incremental process, a new unfolding method was developed to affix the newly added data at the end of the previous data. The results of the incremental process based on these two methods were obtained to bear out these new theoretical improvements. Some object tracking results using video images are also provided as another challenging task to prove the soundness of this incremental multilinear learning method.
Resumo:
A multivariate statistical analysis was applied to a 10 year, multiparameter data set in an effort to describe the spatial dependence and inherent variation of water quality patterns in the mangrove estuaries of Ten Thousand Islands – Whitewater Bay area. Principal component analysis (PCA) of 16 water quality parameters collected monthly resulted in five groupings, which explained 72.5% of the variance of the original variables. The “Organic” component (PCI) was composed of alkaline phosphatase activity, total organic nitrogen, and total organic carbon; the “Dissolved Inorganic N” component (PCII) contained NO 3 − , NO 2 − , and NH 4 + ; the “Phytoplankton” component (PCIII) was made up of total phosphorus, chlorophyll a, and turbidity; dissolved oxygen and temperature were inversely related (PCIV); and salinity and soluble reactive phosphorus made up PCV. A cluster analysis of the mean and SD of PC scores resulted in the spatial aggregation of the 47 fixed stations into six classes having similar water quality, which we defined as: Mangrove Rivers, Whitewater Bay, Gulf Islands, Coot Bay, Blackwater River, and Inland Waterway. Marked differences in physical, chemical, and biological characteristics among classes were illustrated by this technique. Comparison of medians and variability of parameters among classes allowed large scale generalizations as to underlying differences in water quality in these regions. A strong south to north gradient in estuaries from high N - low P to low N - high P was ascribed to marked differences in landuse, freshwater input, geomorphology, and sedimentary geology along this tract. The ecological significance of this gradient discussed along with potential effects of future restoration plans.
Resumo:
This dissertation develops an image processing framework with unique feature extraction and similarity measurements for human face recognition in the thermal mid-wave infrared portion of the electromagnetic spectrum. The goals of this research is to design specialized algorithms that would extract facial vasculature information, create a thermal facial signature and identify the individual. The objective is to use such findings in support of a biometrics system for human identification with a high degree of accuracy and a high degree of reliability. This last assertion is due to the minimal to no risk for potential alteration of the intrinsic physiological characteristics seen through thermal infrared imaging. The proposed thermal facial signature recognition is fully integrated and consolidates the main and critical steps of feature extraction, registration, matching through similarity measures, and validation through testing our algorithm on a database, referred to as C-X1, provided by the Computer Vision Research Laboratory at the University of Notre Dame. Feature extraction was accomplished by first registering the infrared images to a reference image using the functional MRI of the Brain’s (FMRIB’s) Linear Image Registration Tool (FLIRT) modified to suit thermal infrared images. This was followed by segmentation of the facial region using an advanced localized contouring algorithm applied on anisotropically diffused thermal images. Thermal feature extraction from facial images was attained by performing morphological operations such as opening and top-hat segmentation to yield thermal signatures for each subject. Four thermal images taken over a period of six months were used to generate thermal signatures and a thermal template for each subject, the thermal template contains only the most prevalent and consistent features. Finally a similarity measure technique was used to match signatures to templates and the Principal Component Analysis (PCA) was used to validate the results of the matching process. Thirteen subjects were used for testing the developed technique on an in-house thermal imaging system. The matching using an Euclidean-based similarity measure showed 88% accuracy in the case of skeletonized signatures and templates, we obtained 90% accuracy for anisotropically diffused signatures and templates. We also employed the Manhattan-based similarity measure and obtained an accuracy of 90.39% for skeletonized and diffused templates and signatures. It was found that an average 18.9% improvement in the similarity measure was obtained when using diffused templates. The Euclidean- and Manhattan-based similarity measure was also applied to skeletonized signatures and templates of 25 subjects in the C-X1 database. The highly accurate results obtained in the matching process along with the generalized design process clearly demonstrate the ability of the thermal infrared system to be used on other thermal imaging based systems and related databases. A novel user-initialization registration of thermal facial images has been successfully implemented. Furthermore, the novel approach at developing a thermal signature template using four images taken at various times ensured that unforeseen changes in the vasculature did not affect the biometric matching process as it relied on consistent thermal features.