910 resultados para principal coordinates analysis
Resumo:
Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. miniDVMS v1.8 provides a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualisation domain. The advantage of this interface is that the user is directly involved in the data mining process. Principled projection methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), are integrated with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, and user interaction facilities, to provide this integrated visual data mining framework. The software also supports conventional visualisation techniques such as principal component analysis (PCA), Neuroscale, and PhiVis. This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care while creating a new model, and provides information about how to install and use the tool. The user manual does not require the readers to have familiarity with the algorithms it implements. Basic computing skills are enough to operate the software.
Resumo:
The use of quantitative methods has become increasingly important in the study of neurodegenerative disease. Disorders such as Alzheimer's disease (AD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This article reviews the advantages and limitations of the different methods of quantifying the abundance of pathological lesions in histological sections, including estimates of density, frequency, coverage, and the use of semiquantitative scores. The major sampling methods by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are also described. In addition, the data analysis methods commonly used to analyse quantitative data in neuropathology, including analyses of variance (ANOVA) and principal components analysis (PCA), are discussed. These methods are illustrated with reference to particular problems in the pathological diagnosis of AD and dementia with Lewy bodies (DLB).
Resumo:
Plasmid constitutions of Aeromonas salmonicida isolates were characterised by flat-bed and pulsed field gel electrophoresis. Resolution of plasmids by pulsed field gel electrophoresis was greater and more consistent than that achieved by flat-bed gel electrophoresis. The number of plasmids separated by pulsed field gel electrophoresis varied between A. salmonicida isolates, with five being the most common number present in the isolates used in this study. Plasmid profiles were diverse and the reproducibility of the distances migrated facilitated the use of principal components analysis for the characterisation of the isolates. Isolates were grouped according to the number of plasmids supported. Further principal components analysis of groups of isolates supporting five and seven plasmids showed a spatial separation of plasmids based upon distance migrated. Principal components analysis of plasmid profiles and antimicrobial minimum inhibitory concentrations could not be correlated suggesting that resistance to antimicrobial agents is not associated with either one plasmid or a particular plasmid constitution.
Resumo:
This book is aimed primarily at microbiologists who are undertaking research and who require a basic knowledge of statistics to analyse their experimental data. Computer software employing a wide range of data analysis methods is widely available to experimental scientists. The availability of this software, however, makes it essential that investigators understand the basic principles of statistics. Statistical analysis of data can be complex with many different methods of approach, each of which applies in a particular experimental circumstance. Hence, it is possible to apply an incorrect statistical method to data and to draw the wrong conclusions from an experiment. The purpose of this book, which has its origin in a series of articles published in the Society for Applied Microbiology journal ‘The Microbiologist’, is an attempt to present the basic logic of statistics as clearly as possible and therefore, to dispel some of the myths that often surround the subject. The 28 ‘Statnotes’ deal with various topics that are likely to be encountered, including the nature of variables, the comparison of means of two or more groups, non-parametric statistics, analysis of variance, correlating variables, and more complex methods such as multiple linear regression and principal components analysis. In each case, the relevant statistical method is illustrated with examples drawn from experiments in microbiological research. The text incorporates a glossary of the most commonly used statistical terms and there are two appendices designed to aid the investigator in the selection of the most appropriate test.
Resumo:
The pattern of correlation between two sets of variables can be tested using canonical variate analysis (CVA). CVA, like principal components analysis (PCA) and factor analysis (FA) (Statnote 27, Hilton & Armstrong, 2011b), is a multivariate analysis Essentially, as in PCA/FA, the objective is to determine whether the correlations between two sets of variables can be explained by a smaller number of ‘axes of correlation’ or ‘canonical roots’.
Resumo:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods can not directly be employed when dealing with missing data and they struggle to capture global non-linear structures in the data, however they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two dimensional principal components plot. The model can deal with uncertainty, missing data and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm resulting in better fit to the data and better imputation capabilities for missing data. Additionally an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further a novel approach, based on missing data, will be introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
Resumo:
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the regional linguistic variation in the U.S. Prior work on regional linguistic variations usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using the principal component analysis (PCA). Finally a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting linguistic regional variations in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
Resumo:
The use of quantitative methods has become increasingly important in the study of neuropathology and especially in neurodegenerative disease. Disorders such as Alzheimer's disease (AD) and the frontotemporal dementias (FTD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This chapter reviews the advantages and limitations of the different methods of quantifying pathological lesions in histological sections including estimates of density, frequency, coverage, and the use of semi-quantitative scores. The sampling strategies by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are described. In addition, data analysis methods commonly used to analysis quantitative data in neuropathology, including analysis of variance (ANOVA), polynomial curve fitting, multiple regression, classification trees, and principal components analysis (PCA), are discussed. These methods are illustrated with reference to quantitative studies of a variety of neurodegenerative disorders.
Resumo:
The elemental analysis of soil is useful in forensic and environmental sciences. Methods were developed and optimized for two laser-based multi-element analysis techniques: laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS). This work represents the first use of a 266 nm laser for forensic soil analysis by LIBS. Sample preparation methods were developed and optimized for a variety of sample types, including pellets for large bulk soil specimens (470 mg) and sediment-laden filters (47 mg), and tape-mounting for small transfer evidence specimens (10 mg). Analytical performance for sediment filter pellets and tape-mounted soils was similar to that achieved with bulk pellets. An inter-laboratory comparison exercise was designed to evaluate the performance of the LA-ICP-MS and LIBS methods, as well as for micro X-ray fluorescence (μXRF), across multiple laboratories. Limits of detection (LODs) were 0.01-23 ppm for LA-ICP-MS, 0.25-574 ppm for LIBS, 16-4400 ppm for μXRF, and well below the levels normally seen in soils. Good intra-laboratory precision (≤ 6 % relative standard deviation (RSD) for LA-ICP-MS; ≤ 8 % for μXRF; ≤ 17 % for LIBS) and inter-laboratory precision (≤ 19 % for LA-ICP-MS; ≤ 25 % for μXRF) were achieved for most elements, which is encouraging for a first inter-laboratory exercise. While LIBS generally has higher LODs and RSDs than LA-ICP-MS, both were capable of generating good quality multi-element data sufficient for discrimination purposes. Multivariate methods using principal components analysis (PCA) and linear discriminant analysis (LDA) were developed for discriminations of soils from different sources. Specimens from different sites that were indistinguishable by color alone were discriminated by elemental analysis. Correct classification rates of 94.5 % or better were achieved in a simulated forensic discrimination of three similar sites for both LIBS and LA-ICP-MS. Results for tape-mounted specimens were nearly identical to those achieved with pellets. Methods were tested on soils from USA, Canada and Tanzania. Within-site heterogeneity was site-specific. Elemental differences were greatest for specimens separated by large distances, even within the same lithology. Elemental profiles can be used to discriminate soils from different locations and narrow down locations even when mineralogy is similar.
Resumo:
This work outlines the theoretical advantages of multivariate methods in biomechanical data, validates the proposed methods and outlines new clinical findings relating to knee osteoarthritis that were made possible by this approach. New techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF) and validated using existing data sets. The new techniques developed, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares –Multiple Linear Regression) and Waveform Similarity (based on NMF) were developed to address the challenging characteristics of biomechanical data, variability and correlation. As a result, these new structure-seeking technique revealed new clinical findings. The first new clinical finding relates to the relationship between pain, radiographic severity and mechanics. Simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the knee adduction moment’s relationship to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second clinical finding was quantifying the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more related to the patient’s unbraced neuromuscular patterns than it was to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.
Resumo:
A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20cm depths on a non-aligned grid at one site per 2 km2. Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis method were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PC’s) implying “significant” structure in the data. Analysis of variance showed that only 10 PC’s were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons and linear discriminant analysis (LDA) undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
Resumo:
We formally compare fundamental factor and latent factor approaches to oil price modelling. Fundamental modelling has a long history in seeking to understand oil price movements, while latent factor modelling has a more recent and limited history, but has gained popularity in other financial markets. The two approaches, though competing, have not formally been compared as to effectiveness. For a range of short- medium- and long-dated WTI oil futures we test a recently proposed five-factor fundamental model and a Principal Component Analysis latent factor model. Our findings demonstrate that there is no discernible difference between the two techniques in a dynamic setting. We conclude that this infers some advantages in adopting the latent factor approach due to the difficulty in determining a well specified fundamental model.
Resumo:
Phenotypic variation in plants can be evaluated by morphological characterization using visual attributes. Fruits have been the major descriptors for identification of different varieties of fruit crops. However, even in their absence, farmers, breeders and interested stakeholders require to distinguish between different mango varieties. This study aimed at determining diversity in mango germplasm from the Upper Athi River (UAR) and providing useful alternative descriptors for the identification of different mango varieties in the absence of fruits. A total of 20 International Plant Genetic Resources Institute (IPGRI) descriptors for mango were selected for use in the visual assessment of 98 mango accessions from 15 sites of the UAR region of eastern Kenya. Purposive sampling was used to identify farmers growing diverse varieties of mangoes. Evaluation of the descriptors was performed on-site and the data collected were then subjected to multivariate analysis including Principal Component Analysis (PCA) and Cluster analysis, one- way analysis of variance (ANOVA) and Chi square tests. Results classified the accessions into two major groups corresponding to indigenous and exotic varieties. The PCA showed the first six principal components accounting for 75.12% of the total variance. A strong and highly significant correlation was observed between the color of fully grown leaves, leaf blade width, leaf blade length and petiole length and also between the leaf attitude, color of young leaf, stem circumference, tree height, leaf margin, growth habit and fragrance. Useful descriptors for morphological evaluation were 14 out of the selected 20; however, ANOVA and Chi square test revealed that diversity in the accessions was majorly as a result of variations in color of young leaves, leaf attitude, leaf texture, growth habit, leaf blade length, leaf blade width and petiole length traits. These results reveal that mango germplasm in the UAR has significant diversity and that other morphological traits apart from fruits can be useful in morphological characterization of mango.
Resumo:
The elemental analysis of soil is useful in forensic and environmental sciences. Methods were developed and optimized for two laser-based multi-element analysis techniques: laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS). This work represents the first use of a 266 nm laser for forensic soil analysis by LIBS. Sample preparation methods were developed and optimized for a variety of sample types, including pellets for large bulk soil specimens (470 mg) and sediment-laden filters (47 mg), and tape-mounting for small transfer evidence specimens (10 mg). Analytical performance for sediment filter pellets and tape-mounted soils was similar to that achieved with bulk pellets. An inter-laboratory comparison exercise was designed to evaluate the performance of the LA-ICP-MS and LIBS methods, as well as for micro X-ray fluorescence (μXRF), across multiple laboratories. Limits of detection (LODs) were 0.01-23 ppm for LA-ICP-MS, 0.25-574 ppm for LIBS, 16-4400 ppm for µXRF, and well below the levels normally seen in soils. Good intra-laboratory precision (≤ 6 % relative standard deviation (RSD) for LA-ICP-MS; ≤ 8 % for µXRF; ≤ 17 % for LIBS) and inter-laboratory precision (≤ 19 % for LA-ICP-MS; ≤ 25 % for µXRF) were achieved for most elements, which is encouraging for a first inter-laboratory exercise. While LIBS generally has higher LODs and RSDs than LA-ICP-MS, both were capable of generating good quality multi-element data sufficient for discrimination purposes. Multivariate methods using principal components analysis (PCA) and linear discriminant analysis (LDA) were developed for discriminations of soils from different sources. Specimens from different sites that were indistinguishable by color alone were discriminated by elemental analysis. Correct classification rates of 94.5 % or better were achieved in a simulated forensic discrimination of three similar sites for both LIBS and LA-ICP-MS. Results for tape-mounted specimens were nearly identical to those achieved with pellets. Methods were tested on soils from USA, Canada and Tanzania. Within-site heterogeneity was site-specific. Elemental differences were greatest for specimens separated by large distances, even within the same lithology. Elemental profiles can be used to discriminate soils from different locations and narrow down locations even when mineralogy is similar.
Resumo:
The comparative study based on spectroscopic analysis of the materials used to produce four sixteenth-century Manueline Charters (the Charters of Alcochete, Terena, Alandroal and Evora) was performed following a systematic analytical approach. SEM–EDS, l-Raman and l-FTIR analysis highlighted interesting features between them, namely the use of different pigments and colourants (such as different green and yellow pigments), the presence of pigments alterations and the use of a non-expected extemporaneous material (with the presence of titanium white in the Charter of Alcochete). Principal component analysis restricted to the C–H absorption region (3000–2840 cm-1) was applied to 36 infrared spectra of blue historical samples from the Charters of Alcochete,Terena, Alandroal and Évora, suggesting the use of a mixture of a triglyceride and polysaccharide as binder.