37 results for principal component analysis (PCA)


Relevance: 100.00%

Abstract:

Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. miniDVMS v1.8 provides a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualisation domain. The advantage of this interface is that the user is directly involved in the data mining process. Principled projection methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), are integrated with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, and user interaction facilities, to provide this integrated visual data mining framework. The software also supports conventional visualisation techniques such as principal component analysis (PCA), Neuroscale, and PhiVis. This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care of when creating a new model, and provides information about how to install and use the tool. The user manual does not require readers to be familiar with the algorithms it implements. Basic computing skills are enough to operate the software.

Relevance: 100.00%

Abstract:

A visualization plot of a data set of molecular data is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.

Relevance: 100.00%

Abstract:

Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the proposed variants of the original GTM and GTM-FS worked successfully with data of more than 2000 dimensions, and we compare the results with other linear and nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
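
The abstract does not say which EM steps are moved to log space, so the following is only a generic sketch of the usual motivation: in very high dimensions, per-component likelihoods underflow to zero, whereas normalising them in log space (the log-sum-exp trick) stays stable. The function names and the example values are illustrative assumptions, not taken from the paper.

```python
import math

def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without underflow."""
    m = max(log_values)
    if m == float("-inf"):
        return m
    # Subtracting the maximum keeps every exponent in a safe range.
    return m + math.log(sum(math.exp(v - m) for v in log_values))

def responsibilities(log_likelihoods):
    """Normalise per-component log-likelihoods into posterior weights."""
    total = log_sum_exp(log_likelihoods)
    return [math.exp(v - total) for v in log_likelihoods]

# Hypothetical log-likelihoods of one data point under three components.
# Exponentiating any of them directly gives 0.0 in double precision.
log_liks = [-5000.0, -5001.0, -5010.0]
resp = responsibilities(log_liks)
```

Here `resp` sums to one even though each raw likelihood is numerically zero, which is the property an E-step needs when the data have thousands of dimensions.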

Relevance: 100.00%

Abstract:

The thesis presents new methodology and algorithms that can be used to analyse and measure the hand tremor and fatigue of surgeons while performing surgery. This will assist them in deriving useful information about their fatigue levels, and make them aware of the changes in their tool point accuracies. This thesis proposes that muscular changes of surgeons, which occur through a day of operating, can be monitored using Electromyography (EMG) signals. The multi-channel EMG signals are measured at different muscles in the upper arm of surgeons. The dependence of EMG signals has been examined to test the hypothesis that EMG signals are coupled with and dependent on each other. The results demonstrated that EMG signals collected from different channels while mimicking an operating posture are independent. Consequently, single channel fatigue analysis has been performed. In measuring hand tremor, a new method for determining the maximum tremor amplitude using Principal Component Analysis (PCA) and a new technique to detrend acceleration signals using Empirical Mode Decomposition algorithm were introduced. This tremor determination method is more representative for surgeons and it is suggested as an alternative fatigue measure. This was combined with the complexity analysis method, and applied to surgically captured data to determine if operating has an effect on a surgeon’s fatigue and tremor levels. It was found that surgical tremor and fatigue are developed throughout a day of operating and that this could be determined based solely on their initial values. Finally, several Nonlinear AutoRegressive with eXogenous inputs (NARX) neural networks were evaluated. The results suggest that it is possible to monitor surgeon tremor variations during surgery from their EMG fatigue measurements.

Relevance: 100.00%

Abstract:

Guest editorial. Ali Emrouznejad is a Senior Lecturer at the Aston Business School in Birmingham, UK. His areas of research interest include performance measurement and management, efficiency and productivity analysis as well as data mining. He has published widely in various international journals. He is an Associate Editor of IMA Journal of Management Mathematics and Guest Editor of several special issues of journals including Journal of Operational Research Society, Annals of Operations Research, Journal of Medical Systems, and International Journal of Energy Management Sector. He is on the editorial board of several international journals and is co-founder of Performance Improvement Management Software. William Ho is a Senior Lecturer at the Aston University Business School. Before joining Aston in 2005, he had worked as a Research Associate in the Department of Industrial and Systems Engineering at the Hong Kong Polytechnic University. His research interests include supply chain management, production and operations management, and operations research. He has published extensively in various international journals, including Computers & Operations Research, Engineering Applications of Artificial Intelligence, European Journal of Operational Research, Expert Systems with Applications, International Journal of Production Economics, International Journal of Production Research, and Supply Chain Management: An International Journal. His first authored book was published in 2006. He is an Editorial Board member of the International Journal of Advanced Manufacturing Technology and an Associate Editor of the OR Insight Journal. Currently, he is a Scholar of the Advanced Institute of Management Research.
Uses of frontier efficiency methodologies and multi-criteria decision making for performance measurement in the energy sector. This special issue focuses on holistic, applied research on performance measurement in energy sector management, and publishes relevant applied research to bridge the gap between industry and academia. After a rigorous refereeing process, seven papers were included in this special issue. The volume opens with five data envelopment analysis (DEA)-based papers. Wu et al. apply the DEA-based Malmquist index to evaluate the changes in relative efficiency and the total factor productivity of coal-fired electricity generation of 30 Chinese administrative regions from 1999 to 2007. Factors considered in the model include fuel consumption, labor, capital, sulphur dioxide emissions, and electricity generated. The authors reveal that the eastern provinces were relatively and technically more efficient, whereas the western provinces had the highest growth rate in the period studied. Ioannis E. Tsolas applies the DEA approach to assess the performance of Greek fossil fuel-fired power stations taking undesirable outputs into consideration, such as carbon dioxide and sulphur dioxide emissions. In addition, the bootstrapping approach is deployed to address the uncertainty surrounding DEA point estimates, and provide bias-corrected estimations and confidence intervals for the point estimates. The author finds from the sample that the non-lignite-fired stations are on average more efficient than the lignite-fired stations. Maethee Mekaroonreung and Andrew L. Johnson compare the relative performance of three DEA-based measures, which estimate production frontiers and evaluate the relative efficiency of 113 US petroleum refineries while considering undesirable outputs.
Three inputs (capital, energy consumption, and crude oil consumption), two desirable outputs (gasoline and distillate generation), and an undesirable output (toxic release) are considered in the DEA models. The authors discover that refineries in the Rocky Mountain region performed the best, and about 60 percent of oil refineries in the sample could improve their efficiencies further. H. Omrani, A. Azadeh, S. F. Ghaderi, and S. Abdollahzadeh present an integrated approach, combining DEA, corrected ordinary least squares (COLS), and principal component analysis (PCA) methods, to calculate the relative efficiency scores of 26 Iranian electricity distribution units from 2003 to 2006. Specifically, both DEA and COLS are used to check three internal consistency conditions, whereas PCA is used to verify and validate the final ranking results of either DEA (consistency) or DEA-COLS (non-consistency). Three inputs (network length, transformer capacity, and number of employees) and two outputs (number of customers and total electricity sales) are considered in the model. Virendra Ajodhia applies three DEA-based models to evaluate the relative performance of 20 electricity distribution firms from the UK and the Netherlands. The first model is a traditional DEA model for analyzing cost-only efficiency. The second model includes (inverse) quality by modelling total customer minutes lost as an input. The third model is based on the idea of using total social costs, including the firm’s private costs and the interruption costs incurred by consumers, as an input. Both energy delivered and number of consumers are treated as the outputs in the models. After the five DEA papers, Stelios Grafakos, Alexandros Flamos, Vlasis Oikonomou, and D. Zevgolis present a multiple criteria analysis weighting approach to evaluate energy and climate policy.
The proposed approach is akin to the analytic hierarchy process, which consists of pairwise comparisons, consistency verification, and criteria prioritization. In the approach, stakeholders and experts in the energy policy field are incorporated in the evaluation process by providing an interactive means with verbal, numerical, and visual representation of their preferences. A total of 14 evaluation criteria were considered and classified under four objectives: climate change mitigation, energy effectiveness, socioeconomic, and competitiveness and technology. Finally, Borge Hess applies the stochastic frontier analysis approach to analyze the impact of various business strategies, including acquisition, holding structures, and joint ventures, on a firm’s efficiency within a sample of 47 natural gas transmission pipelines in the USA from 1996 to 2005. The author finds that an acquisition caused no significant change in the firm’s efficiency, and that there is weak evidence for efficiency improvements caused by the new shareholder. In addition, the author finds that parent companies appear not to influence a subsidiary’s efficiency positively, and the analysis shows a negative impact of a joint venture on the technical efficiency of the pipeline company. To conclude, we are grateful to all the authors for their contribution, and all the reviewers for their constructive comments, which made this special issue possible. We hope that this issue will contribute significantly to performance improvement of the energy sector.

Relevance: 100.00%

Abstract:

Relationships among quality factors in retailed free-range, corn-fed, organic, and conventional chicken breasts (9) were modeled using chemometric approaches. Applying principal component analysis (PCA) to neutral lipid composition data explained the majority (93%) of the variance in fatty acid contents in 2 significant multivariate factors. PCA explained 88 and 75% of the variance in 3 factors for, respectively, flame ionization detection (FID) and nitrogen phosphorus (NPD) components in chromatographic flavor data from cooked chicken after simultaneous distillation extraction. Relationships to tissue antioxidant contents were modeled. Partial least squares regression (PLS2), interrelating total data matrices, provided no useful models. By using single antioxidants as Y variables in PLS1, good models (r2 values > 0.9) were obtained for alpha-tocopherol, glutathione, catalase, glutathione peroxidase, and reductase and FID flavor components and among the variables total mono and polyunsaturated fatty acids and subsets of FID, and saturated fatty acid and NPD components. Alpha-tocopherol had a modest (r2 = 0.63) relationship with neutral lipid n-3 fatty acid content. Such factors thus relate to flavor development and quality in chicken breast meat.

Relevance: 100.00%

Abstract:

Biological experiments often produce enormous amounts of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
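
The "start with many clusters and merge similar ones" procedure described in this abstract can be sketched in a few lines. The code below is a minimal illustration using single linkage and Euclidean distance; both choices, the function name `single_linkage`, and the toy points are assumptions made here, and the code is not what Sybyl or any of the named packages implement.

```python
def single_linkage(points, n_clusters):
    """Agglomerative clustering: begin with one cluster per point and
    repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Euclidean distance between points with indices a and b.
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def cluster_dist(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(dist(a, b) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Toy data: three points near the origin and two near (5, 5).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
groups = single_linkage(points, 2)
```

With these toy points the two well-separated groups are recovered as index lists. Other linkage rules (complete, average) only change `cluster_dist`.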

Relevance: 100.00%

Abstract:

Principal component analysis (PCA) is well recognized in dimensionality reduction, and kernel PCA (KPCA) has also been proposed in statistical data analysis. However, KPCA fails to detect the nonlinear structure of data well when outliers exist. To reduce this problem, this paper presents a novel algorithm, named iterative robust KPCA (IRKPCA). IRKPCA works well in dealing with outliers, and can be carried out in an iterative manner, which makes it suitable for processing incremental input data. As in traditional robust PCA (RPCA), a binary field is employed for characterizing the outlier process, and the optimization problem is formulated as maximizing the marginal distribution of a Gibbs distribution. In this paper, this optimization problem is solved by stochastic gradient descent techniques. In IRKPCA, the outlier process is in a high-dimensional feature space, and therefore the kernel trick is used. IRKPCA can be regarded as a kernelized version of RPCA and a robust form of the kernel Hebbian algorithm. Experimental results on synthetic data demonstrate the effectiveness of IRKPCA. © 2010 Taylor & Francis.

Relevance: 100.00%

Abstract:

Two contrasting multivariate statistical methods, viz., principal components analysis (PCA) and cluster analysis were applied to the study of neuropathological variations between cases of Alzheimer's disease (AD). To compare the two methods, 78 cases of AD were analyzed, each characterised by measurements of 47 neuropathological variables. Both methods of analysis revealed significant variations between AD cases. These variations were related primarily to differences in the distribution and abundance of senile plaques (SP) and neurofibrillary tangles (NFT) in the brain. Cluster analysis classified the majority of AD cases into five groups which could represent subtypes of AD. However, PCA suggested that variation between cases was more continuous with no distinct subtypes. Hence, PCA may be a more appropriate method than cluster analysis in the study of neuropathological variations between AD cases.

Relevance: 100.00%

Abstract:

A new principled, domain-independent watermarking framework is presented. The new approach is based on embedding the message in statistically independent sources of the covertext to minimise covertext distortion, maximise the information embedding rate and improve the method's robustness against various attacks. Experiments comparing the performance of the new approach on several standard attacks show the proposed approach to be competitive with other state-of-the-art domain-specific methods.

Relevance: 100.00%

Abstract:

A novel approach to watermarking of audio signals using Independent Component Analysis (ICA) is proposed. It exploits the statistical independence of components obtained by practical ICA algorithms to provide a robust watermarking scheme with high information rate and low distortion. Numerical simulations have been performed on audio signals, showing good robustness of the watermark against common attacks with unnoticeable distortion, even for high information rates. An important aspect of the method is its domain independence: it can be used to hide information in other types of data, with minor technical adaptations.

Relevance: 100.00%

Abstract:

The use of quantitative methods has become increasingly important in the study of neurodegenerative disease. Disorders such as Alzheimer's disease (AD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This article reviews the advantages and limitations of the different methods of quantifying the abundance of pathological lesions in histological sections, including estimates of density, frequency, coverage, and the use of semiquantitative scores. The major sampling methods by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are also described. In addition, the data analysis methods commonly used to analyse quantitative data in neuropathology, including analyses of variance (ANOVA) and principal components analysis (PCA), are discussed. These methods are illustrated with reference to particular problems in the pathological diagnosis of AD and dementia with Lewy bodies (DLB).

Relevance: 100.00%

Abstract:

The objective of this work was to explore the performance of a recently introduced source extraction method, FSS (Functional Source Separation), in recovering induced oscillatory change responses from extra-cephalic magnetoencephalographic (MEG) signals. Unlike algorithms used to solve the inverse problem, FSS does not make any assumption about the underlying biophysical source model; instead, it makes use of task-related features (functional constraints) to estimate the source(s) of interest. FSS was compared with blind source separation (BSS) approaches such as Principal and Independent Component Analysis (PCA and ICA), which are not subject to any explicit forward solution or functional constraint, but require source uncorrelatedness (PCA) or independence (ICA). A visual MEG experiment with signals recorded from six subjects viewing a set of static horizontal black/white square-wave grating patterns at different spatial frequencies was analyzed. The beamforming technique Synthetic Aperture Magnetometry (SAM) was applied to localize task-related sources; the spatial filters obtained were used to automatically select BSS and FSS components in the spatial area of interest. Source spectral properties were investigated by using Morlet-wavelet time-frequency representations, and significant task-induced changes were evaluated by means of a resampling technique; the resulting spectral behaviours in the gamma frequency band of interest (20-70 Hz), as well as the spatial frequency-dependent gamma reactivity, were quantified and compared among methods. Among the tested approaches, only FSS was able to estimate the expected sustained gamma activity enhancement in primary visual cortex, throughout the whole duration of the stimulus presentation for all subjects, and to obtain sources comparable to invasively recorded data.

Relevance: 100.00%

Abstract:

A Principal Components Analysis of neuropathological data from 79 Alzheimer’s disease (AD) cases was performed to determine whether there was evidence for subtypes of the disease. Two principal components were extracted from the data which accounted for 72% and 12% of the total variance respectively. The results suggested that 1) AD was heterogeneous but subtypes could not be clearly defined; 2) the heterogeneity, in part, reflected disease onset; 3) familial cases did not constitute a distinct subtype of AD and 4) there were two forms of late onset AD, one of which was associated with less senile plaque and neurofibrillary tangle development but with a greater degree of brain atherosclerosis.
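
Figures such as "72% and 12% of the total variance" are each component's eigenvalue divided by the sum of all eigenvalues of the covariance matrix. The sketch below shows that arithmetic for the two-variable case, where the eigenvalues have a closed form. The data are invented for illustration and have nothing to do with the neuropathological measurements in the study. The function name is hypothetical.

```python
import math

def explained_variance_2d(xs, ys):
    """Fraction of total variance carried by each of the two principal
    components of a two-variable data set."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    centre = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [centre + radius, centre - radius]
    total = sxx + syy  # trace equals the sum of the eigenvalues
    return [ev / total for ev in eigenvalues]

# Invented, strongly correlated measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]
ratios = explained_variance_2d(xs, ys)
```

For these nearly collinear points the first component carries almost all of the variance. With many variables the same ratios come from a full eigendecomposition of the covariance (or correlation) matrix.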

Relevance: 100.00%

Abstract:

The abundance of senile plaques (SP) and neurofibrillary tangles (NFT) was studied in cortical and subcortical regions from 30 patients with Alzheimer’s disease (AD) expressing different apolipoprotein E (apoE) genotypes. Principal components analysis (PCA) was used to identify the most important neuropathological variations between individual patients and to determine whether these variations were related to apoE genotype. The first two principal components (PC) accounted for 60% and 40% of the total variance of the SP and NFT data respectively. The abundance of SP in the frontal and occipital cortex and NFT in the frontal cortex, amygdala and substantia nigra were positively correlated with the first principal component (PC1). Analysis of the SP data revealed that the apoE score of the patient (the sum of the two alleles) was positively correlated with PC1 while analysis of the NFT data revealed no significant correlations between apoE score and the PC. The data suggest that apoE genotype was more closely related to variations in the distribution and abundance of SP than of NFT. In addition, a more rapid spread of SP into the frontal and occipital cortex may occur in patients with a high apoE score.