7 resultados para Process visualisation
em Aston University Research Archive
Resumo:
Projection of a high-dimensional dataset onto a two-dimensional space is a useful tool to visualise structures and relationships in the dataset. However, a single two-dimensional visualisation may not display all the intrinsic structure. Therefore, hierarchical/multi-level visualisation methods have been used to extract more detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model (MLGPLVM). MLGPLVM works by segmenting data (with e.g. K-means, Gaussian mixture model or interactive clustering) in the visualisation space and then fitting a visualisation model to each subset. To measure the quality of multi-level visualisation (with respect to parent and child models), metrics such as trustworthiness, continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood per point are used. We evaluate the MLGPLVM approach on the ‘Oil Flow’ dataset and a dataset of protein electrostatic potentials for the ‘Major Histocompatibility Complex (MHC) class I’ of humans. In both cases, visual observation and the quantitative quality measures have shown better visualisation at lower levels.
Resumo:
In recent years there has been a great effort to combine the technologies and techniques of GIS and process models. This project examines the issues of linking a standard current generation 2½d GIS with several existing model codes. The focus for the project has been the Shropshire Groundwater Scheme, which is being developed to augment flow in the River Severn during drought periods by pumping water from the Shropshire Aquifer. Previous authors have demonstrated that under certain circumstances pumping could reduce the soil moisture available for crops. This project follows earlier work at Aston in which the effects of drawdown were delineated and quantified through the development of a software package that implemented a technique which brought together the significant spatially varying parameters. This technique is repeated here, but using a standard GIS called GRASS. The GIS proved adequate for the task and the added functionality provided by the general purpose GIS - the data capture, manipulation and visualisation facilities - were of great benefit. The bulk of the project is concerned with examining the issues of the linkage of GIS and environmental process models. To this end a groundwater model (Modflow) and a soil moisture model (SWMS2D) were linked to the GIS and a crop model was implemented within the GIS. A loose-linked approach was adopted and secondary and surrogate data were used wherever possible. The implications of which relate to; justification of a loose-linked versus a closely integrated approach; how, technically, to achieve the linkage; how to reconcile the different data models used by the GIS and the process models; control of the movement of data between models of environmental subsystems, to model the total system; the advantages and disadvantages of using a current generation GIS as a medium for linking environmental process models; generation of input data, including the use of geostatistic, stochastic simulation, remote sensing, regression equations and mapped data; issues of accuracy, uncertainty and simply providing adequate data for the complex models; how such a modelling system fits into an organisational framework.
Resumo:
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection etc. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high dimensional biological dataset by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants both for synthetic and electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the project map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way where appropriate noise models are used for each type of data in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM). We also propose to extend GGTM model to estimate feature saliencies while training a visualisation model and this is called GGTM with feature saliency (GGTM-FS). We demonstrate effectiveness of these proposed models both for synthetic and real datasets. We evaluate visualisation quality using quality metrics such as distance distortion measure and rank based measures: trustworthiness, continuity, mean relative rank errors with respect to data space and latent space. In cases where the labels are known we also use quality metrics of KL divergence and nearest neighbour classifications error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models both for synthetic and real biological datasets with a main focus on the MHC class-I dataset.
Resumo:
Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.
Resumo:
The focus of this thesis is the extension of topographic visualisation mappings to allow for the incorporation of uncertainty. Few visualisation algorithms in the literature are capable of mapping uncertain data with fewer able to represent observation uncertainties in visualisations. As such, modifications are made to NeuroScale, Locally Linear Embedding, Isomap and Laplacian Eigenmaps to incorporate uncertainty in the observation and visualisation spaces. The proposed mappings are then called Normally-distributed NeuroScale (N-NS), T-distributed NeuroScale (T-NS), Probabilistic LLE (PLLE), Probabilistic Isomap (PIso) and Probabilistic Weighted Neighbourhood Mapping (PWNM). These algorithms generate a probabilistic visualisation space with each latent visualised point transformed to a multivariate Gaussian or T-distribution, using a feed-forward RBF network. Two types of uncertainty are then characterised dependent on the data and mapping procedure. Data dependent uncertainty is the inherent observation uncertainty. Whereas, mapping uncertainty is defined by the Fisher Information of a visualised distribution. This indicates how well the data has been interpolated, offering a level of ‘surprise’ for each observation. These new probabilistic mappings are tested on three datasets of vectorial observations and three datasets of real world time series observations for anomaly detection. In order to visualise the time series data, a method for analysing observed signals and noise distributions, Residual Modelling, is introduced. The performance of the new algorithms on the tested datasets is compared qualitatively with the latent space generated by the Gaussian Process Latent Variable Model (GPLVM). A quantitative comparison using existing evaluation measures from the literature allows performance of each mapping function to be compared. Finally, the mapping uncertainty measure is combined with NeuroScale to build a deep learning classifier, the Cascading RBF. This new structure is tested on the MNist dataset achieving world record performance whilst avoiding the flaws seen in other Deep Learning Machines.
Resumo:
Popular dimension reduction and visualisation algorithms rely on the assumption that input dissimilarities are typically Euclidean, for instance Metric Multidimensional Scaling, t-distributed Stochastic Neighbour Embedding and the Gaussian Process Latent Variable Model. It is well known that this assumption does not hold for most datasets and often high-dimensional data sits upon a manifold of unknown global geometry. We present a method for improving the manifold charting process, coupled with Elastic MDS, such that we no longer assume that the manifold is Euclidean, or of any particular structure. We draw on the benefits of different dissimilarity measures allowing for the relative responsibilities, under a linear combination, to drive the visualisation process.
Resumo:
Atomisation of an aqueous solution for tablet film coating is a complex process with multiple factors determining droplet formation and properties. The importance of droplet size for an efficient process and a high quality final product has been noted in the literature, with smaller droplets reported to produce smoother, more homogenous coatings whilst simultaneously avoiding the risk of damage through over-wetting of the tablet core. In this work the effect of droplet size on tablet film coat characteristics was investigated using X-ray microcomputed tomography (XμCT) and confocal laser scanning microscopy (CLSM). A quality by design approach utilising design of experiments (DOE) was used to optimise the conditions necessary for production of droplets at a small (20 μm) and large (70 μm) droplet size. Droplet size distribution was measured using real-time laser diffraction and the volume median diameter taken as a response. DOE yielded information on the relationship three critical process parameters: pump rate, atomisation pressure and coating-polymer concentration, had upon droplet size. The model generated was robust, scoring highly for model fit (R2 = 0.977), predictability (Q2 = 0.837), validity and reproducibility. Modelling confirmed that all parameters had either a linear or quadratic effect on droplet size and revealed an interaction between pump rate and atomisation pressure. Fluidised bed coating of tablet cores was performed with either small or large droplets followed by CLSM and XμCT imaging. Addition of commonly used contrast materials to the coating solution improved visualisation of the coating by XμCT, showing the coat as a discrete section of the overall tablet. Imaging provided qualitative and quantitative evidence revealing that smaller droplets formed thinner, more uniform and less porous film coats.