39 results for visualisation of acoustic data
in Aston University Research Archive
Abstract:
Heterogeneous and incomplete datasets are common in many real-world visualisation applications. The Generative Topographic Mapping (GTM), originally developed for complete, continuous data, has a probabilistic formulation that can be extended to model heterogeneous data (i.e. data containing both continuous and discrete values) and missing values. This paper describes and assesses the resulting model on both synthetic and real-world heterogeneous data with missing values.
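To make the modelling idea concrete, the sketch below (illustrative code, not the paper's implementation) shows the central trick: under conditional independence given the latent point, a mixed-type observation's likelihood factorises into Gaussian terms for continuous features and Bernoulli terms for discrete ones, so missing entries are marginalised out simply by omitting their factors. The feature layout and parameter values are invented for the example.

```python
import numpy as np

def log_lik_mixed(x, mu, sigma2, p, cont_idx, disc_idx):
    """Log-likelihood of one observation under a single mixture component.

    x        : 1-D array; np.nan marks a missing entry
    mu       : means of the continuous features for this component
    sigma2   : shared spherical variance of the continuous features
    p        : Bernoulli success probabilities of the binary features
    cont_idx : indices of continuous features in x
    disc_idx : indices of binary features in x
    """
    ll = 0.0
    for j, m in zip(cont_idx, mu):          # Gaussian factors
        if not np.isnan(x[j]):
            ll += -0.5 * np.log(2 * np.pi * sigma2) - (x[j] - m) ** 2 / (2 * sigma2)
    for j, q in zip(disc_idx, p):           # Bernoulli factors
        if not np.isnan(x[j]):
            ll += x[j] * np.log(q) + (1 - x[j]) * np.log(1 - q)
    return ll                               # missing features contribute nothing

# Example: 2 continuous features (first one missing) and 2 binary features
x = np.array([np.nan, 0.3, 1.0, 0.0])
print(log_lik_mixed(x, mu=[0.0, 0.5], sigma2=0.1, p=[0.7, 0.2],
                    cont_idx=[0, 1], disc_idx=[2, 3]))
```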
Abstract:
Most machine-learning algorithms are designed for datasets with features of a single type, whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types using a probabilistic latent variable formalism. This model, called the generalised generative topographic mapping (GGTM), describes the data by type-specific distributions that are conditionally independent given the latent space. It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated on both synthetic and real datasets.
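The feature-saliency mechanism can be illustrated in a toy setting (this follows the generic feature-saliency mixture idea, not the paper's GGTM equations): each feature j is "relevant" with probability rho_j, and if irrelevant it follows a common background density; an EM-style fixed-point update then raises rho_j for features whose structured model explains them better than the background. The two-component "relevant" model below is a crude stand-in for the GTM mixture, and all names and numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
X[:, 0] += 2.0 * (rng.random(N) > 0.5)      # only feature 0 carries structure

rho = np.full(D, 0.5)                        # current saliency estimates

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(20):
    new_rho = np.empty(D)
    for j in range(D):
        xj = X[:, j]
        lo, hi = xj[xj < xj.mean()], xj[xj >= xj.mean()]
        # "relevant" model: crude two-component fit (stand-in for the GTM mixture)
        p_rel = 0.5 * normal_pdf(xj, lo.mean(), lo.var() + 1e-6) \
              + 0.5 * normal_pdf(xj, hi.mean(), hi.var() + 1e-6)
        # "irrelevant" model: single global Gaussian for this feature
        p_irr = normal_pdf(xj, xj.mean(), xj.var())
        # E-step: posterior probability that each value came from the relevant model
        u = rho[j] * p_rel / (rho[j] * p_rel + (1 - rho[j]) * p_irr)
        new_rho[j] = u.mean()                # M-step: update the saliency
    rho = new_rho

print(rho)   # feature 0 should receive the highest saliency
```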
Abstract:
Population measures for genetic programs are defined and analysed in an attempt to better understand the behaviour of genetic programming. Some measures are simple but do not provide sufficient insight. The more meaningful ones are complex and take extra computation time. Here we present a unified view of the computation of population measures through an information hypertree (iTree). The iTree allows for a unified and efficient calculation of population measures via a basic tree traversal.
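The unified-traversal idea can be sketched as follows (an illustration, not the authors' iTree code): merge every program in the population into one counted tree, after which several population measures fall out of a single traversal. Programs are written as nested tuples (symbol, *children); the two measures computed here are examples.

```python
class INode:
    """One node of the merged tree: a (position, symbol) shared across programs."""
    def __init__(self):
        self.count = 0          # how many programs contain this node
        self.children = {}      # (child position, symbol) -> INode

def insert(node, tree, pos=0):
    """Merge one program into the iTree, counting how often each node occurs."""
    sym, *kids = tree if isinstance(tree, tuple) else (tree,)
    child = node.children.setdefault((pos, sym), INode())
    child.count += 1
    for i, kid in enumerate(kids):
        insert(child, kid, i)

def measures(node):
    """One traversal yields (distinct node positions, total nodes in population)."""
    distinct = visits = 0
    for child in node.children.values():
        d, v = measures(child)
        distinct += 1 + d
        visits += child.count + v
    return distinct, visits

population = [
    ("+", ("*", "x", "x"), "y"),
    ("+", ("*", "x", "x"), "1"),
    ("-", "y", "x"),
]
root = INode()
for prog in population:
    insert(root, prog)

distinct, total = measures(root)
print("distinct node positions:", distinct)            # a structural diversity proxy
print("mean program size:", total / len(population))
```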
Abstract:
The evaluation of geospatial data quality and trustworthiness presents a major challenge to geospatial data users when making a dataset selection decision. The research presented here therefore focused on defining and developing a GEO label: a decision support mechanism to assist data users in efficient and effective geospatial dataset selection on the basis of quality, trustworthiness and fitness for use. This thesis thus presents six phases of research and development conducted to: (1) identify the informational aspects upon which users rely when assessing geospatial dataset quality and trustworthiness; (2) elicit initial user views on the GEO label role in supporting dataset comparison and selection; (3) evaluate prototype label visualisations; (4) develop a Web service to support GEO label generation; (5) develop a prototype GEO label-based dataset discovery and intercomparison decision support tool; and (6) evaluate the prototype tool in a controlled human-subject study. The results of the studies revealed, and subsequently confirmed, eight geospatial data informational aspects that were considered important by users when evaluating geospatial dataset quality and trustworthiness, namely: producer information, producer comments, lineage information, compliance with standards, quantitative quality information, user feedback, expert reviews, and citations information. Following an iterative user-centred design (UCD) approach, it was established that the GEO label should visually summarise the availability of, and allow interrogation of, these key informational aspects. A Web service was developed to support generation of dynamic GEO label representations and was integrated into a number of real-world GIS applications. The service was also utilised in the development of the GEO LINC tool, a GEO label-based dataset discovery and intercomparison decision support tool. The results of the final evaluation study indicated that (a) the GEO label effectively communicates the availability of dataset quality and trustworthiness information and (b) GEO LINC successfully facilitates ‘at a glance’ dataset intercomparison and fitness-for-purpose-based dataset selection.
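As a rough illustration of what such a label summarises (this is not the actual GEO label Web service API; the field names and the rendering are invented), the eight informational aspects can be checked against a dataset's metadata and condensed into a compact availability summary:

```python
# Hypothetical sketch: the eight aspects identified in the studies, checked
# against an (invented) metadata dict and rendered as a one-line summary.
ASPECTS = [
    "producer information", "producer comments", "lineage information",
    "compliance with standards", "quantitative quality information",
    "user feedback", "expert reviews", "citations information",
]

def geo_label_summary(metadata: dict) -> str:
    """Render a textual label: '+' if the aspect is available, '-' if not."""
    return " | ".join(
        f"{aspect}: {'+' if metadata.get(aspect) else '-'}" for aspect in ASPECTS
    )

print(geo_label_summary({
    "producer information": "Ordnance Survey",
    "lineage information": "derived from 1:10k survey",
    "user feedback": ["fit for routing"],
}))
```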
Abstract:
We analyse how the Generative Topographic Mapping (GTM) can be modified to cope with missing values in the training data. Our approach is based on an Expectation-Maximisation (EM) method which estimates the parameters of the mixture components and, at the same time, deals with the missing values. We incorporate this algorithm into a hierarchical GTM. We verify the method on a toy data set (using a single GTM) and a realistic data set (using a hierarchical GTM). The results show that our algorithm can help to construct informative visualisation plots, even when some of the training points are corrupted with missing values.
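A sketch of the key E-step detail (assumed details, not the paper's code): with GTM's spherical Gaussian noise model, marginalising the missing dimensions of a data point amounts to evaluating the Gaussian over the observed dimensions only, so responsibilities remain well defined for incomplete points.

```python
import numpy as np

def responsibilities(X, Y, sigma2):
    """R[n, k] = p(latent grid point k | x_n), with np.nan marking missing values.

    X : (N, D) data, NaN where a value is missing
    Y : (K, D) mixture centres, i.e. the GTM's mapped latent grid points
    """
    N, D = X.shape
    K = Y.shape[0]
    logR = np.zeros((N, K))
    for n in range(N):
        obs = ~np.isnan(X[n])                     # observed dimensions only
        diff = Y[:, obs] - X[n, obs]              # (K, d_obs)
        logR[n] = -0.5 * (diff ** 2).sum(axis=1) / sigma2 \
                  - 0.5 * obs.sum() * np.log(2 * np.pi * sigma2)
    logR -= logR.max(axis=1, keepdims=True)       # numerical stabilisation
    R = np.exp(logR)
    return R / R.sum(axis=1, keepdims=True)

# Toy usage: 4 grid-point centres in 3-D, one observation missing a value
Y = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]], float)
X = np.array([[1.1, np.nan, 0.9]])
print(responsibilities(X, Y, sigma2=0.5).round(3))
```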
Abstract:
Exploratory analysis of data in all sciences seeks to find common patterns to gain insights into the structure and distribution of the data. Typically, visualisation methods such as principal component analysis are used, but these methods cannot easily deal with missing data, nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this technical report we discuss a complementary approach based on a non-linear probabilistic model. The generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, which can incorporate far more structure than a two-dimensional principal component plot and, at the same time, deal with missing data. We show that the generative topographic mapping provides an optimal method to explore the data while being able to replace missing values in a dataset, particularly where a large proportion of the data is missing.
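The missing-value replacement mentioned above can be sketched as follows (illustrative, with hand-made responsibilities rather than a fitted model): once the responsibilities of the mixture components for each point are available, a missing entry is filled with its posterior expectation, a responsibility-weighted average of the component centres in that dimension.

```python
import numpy as np

def impute(X, Y, R):
    """Replace NaNs in X with posterior means under the GTM mixture.

    X : (N, D) data with NaNs;  Y : (K, D) centres;  R : (N, K) responsibilities
    """
    X_filled = X.copy()
    post_mean = R @ Y                      # (N, D) expected reconstruction
    miss = np.isnan(X)
    X_filled[miss] = post_mean[miss]
    return X_filled

# Toy usage with responsibilities concentrated on two centres
Y = np.array([[0., 0.], [1., 2.]])
X = np.array([[0.9, np.nan]])
R = np.array([[0.1, 0.9]])
print(impute(X, Y, R))    # missing value -> 0.1*0 + 0.9*2 = 1.8
```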
Abstract:
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting the functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection, among other applications. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high-dimensional biological datasets by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of the expectation-maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants on both synthetic data and an electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high-dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the projection map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way, where appropriate noise models are used for each type of data, in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM). We also propose to extend the GGTM model to estimate feature saliencies while training a visualisation model; this is called GGTM with feature saliency (GGTM-FS). We demonstrate the effectiveness of these proposed models on both synthetic and real datasets. We evaluate visualisation quality using quality metrics such as a distance distortion measure and rank-based measures: trustworthiness, continuity, and mean relative rank errors with respect to data space and latent space. In cases where the labels are known, we also use KL divergence and nearest-neighbour classification error as quality metrics to determine the separation between classes. We demonstrate the efficacy of these proposed models on both synthetic and real biological datasets, with a main focus on the MHC class-I dataset.
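The kind of log-space computation referred to can be illustrated with the standard log-sum-exp trick (the thesis' exact steps may differ): in high dimensions the per-component likelihoods underflow to zero, so EM responsibilities are computed from log-likelihoods instead.

```python
import numpy as np

def responsibilities_logspace(loglik):
    """loglik: (N, K) log p(x_n | component k); returns normalised posteriors."""
    m = loglik.max(axis=1, keepdims=True)            # subtract the max ...
    logZ = m + np.log(np.exp(loglik - m).sum(axis=1, keepdims=True))
    return np.exp(loglik - logZ)                     # ... so exp() stays finite

# With thousands of features, direct exp() of such values underflows to 0.0,
# making naive normalisation 0/0; the log-space version remains well defined.
loglik = np.array([[-15000.0, -15010.0, -15002.0]])
print(responsibilities_logspace(loglik).round(4))    # approx [[0.8808 0. 0.1192]]
```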
Abstract:
A simple technique is presented for improving the robustness of the n-tuple recognition method against inauspicious choices of architectural parameters, guarding against the saturation problem, and improving the utilisation of small data sets. Experiments are reported which confirm that the method significantly improves performance and reduces saturation in character recognition problems.
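For context, a minimal n-tuple (WISARD-style) recogniser is sketched below; the paper's robustness technique itself is not reproduced. Saturation, which the paper guards against, occurs when so many addresses are marked during training that all class scores tie. Class names, tuple sizes and the toy patterns are illustrative.

```python
import random

class NTuple:
    def __init__(self, input_len, n=4, n_tuples=20, classes=("A", "B"), seed=0):
        rng = random.Random(seed)
        # each tuple is a random sample of n input positions
        self.tuples = [rng.sample(range(input_len), n) for _ in range(n_tuples)]
        # one set of "seen" addresses per (class, tuple)
        self.tables = {c: [set() for _ in self.tuples] for c in classes}

    def _addr(self, x, positions):
        return tuple(x[p] for p in positions)   # n sampled bits form an address

    def train(self, x, label):
        for t, pos in enumerate(self.tuples):
            self.tables[label][t].add(self._addr(x, pos))

    def classify(self, x):
        scores = {c: sum(self._addr(x, pos) in tabs[t]
                         for t, pos in enumerate(self.tuples))
                  for c, tabs in self.tables.items()}
        return max(scores, key=scores.get)

# Toy usage on 16-bit binary "images"
clf = NTuple(input_len=16)
clf.train([1] * 8 + [0] * 8, "A")
clf.train([0] * 8 + [1] * 8, "B")
print(clf.classify([1] * 8 + [0] * 8))   # -> "A"
```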
Abstract:
AIMS: To demonstrate the potential use of in vitro poly(lactic-co-glycolic acid) (PLGA) microparticles, in comparison with triamcinolone suspension, to aid visualisation of the vitreous during anterior and posterior vitrectomy. METHODS: PLGA microparticles (diameter 10–60 µm) were fabricated using single and/or double emulsion technique(s) and used untreated or following the surface adsorption of a protein (transglutaminase). Particle size, shape, morphology and surface topography were assessed using scanning electron microscopy (SEM) and compared with a standard triamcinolone suspension. The efficacy of these microparticles in enhancing visualisation of the vitreous, relative to the triamcinolone suspension, was assessed using an in vitro set-up exploiting porcine vitreous. RESULTS: Unmodified PLGA microparticles failed to adhere adequately to porcine vitreous and were readily washed out by irrigation. In contrast, transglutaminase-coated PLGA microparticles demonstrated a significant improvement in adhesiveness and were comparable to a triamcinolone suspension in their ability to enhance the visualisation of the vitreous. This adhesive behaviour also demonstrated selectivity by not binding to the corneal endothelium. CONCLUSION: The use of transglutaminase-modified biodegradable PLGA microparticles represents a novel method of visualising the vitreous and aiding vitrectomy. This method may provide a distinct alternative for the visualisation of the vitreous whilst eliminating the pharmacological effects of triamcinolone acetonide suspension.
Abstract:
An increasing number of neuroimaging studies are concerned with the identification of interactions or statistical dependencies between brain areas. Dependencies between the activities of different brain regions can be quantified with functional connectivity measures such as the cross-correlation coefficient. An important factor limiting the accuracy of such measures is the amount of empirical data available. For event-related protocols, the amount of data also affects the temporal resolution of the analysis. We use analytical expressions to calculate the amount of empirical data needed to establish whether a certain level of dependency is significant when the time series are autocorrelated, as is the case for biological signals. These analytical results are then contrasted with estimates from simulations based on real data recorded with magnetoencephalography during a resting-state paradigm and during the presentation of visual stimuli. Results indicate that, for broadband signals, 50–100 s of data is required to detect a true underlying cross-correlation coefficient of 0.05. This corresponds to a resolution of a few hundred milliseconds for typical event-related recordings. The required time window increases for narrow-band signals as frequency decreases. For instance, approximately 3 times as much data is necessary for signals in the alpha band. Important implications can be derived for the design and interpretation of experiments to characterise weak interactions, which are potentially important for brain processing.
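The question the analytical expressions answer can be reproduced in simulation (illustrative models and numbers, not the paper's MEG data): generate pairs of independent autocorrelated series, estimate the null distribution of the cross-correlation coefficient, and see how long the series must be before the 95% threshold falls below a true coefficient of 0.05. AR(1) processes stand in here for band-limited brain signals.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)

def ar1(n, phi):
    """AR(1) series x[t] = phi*x[t-1] + noise; larger phi -> stronger autocorrelation."""
    return lfilter([1.0], [1.0, -phi], rng.normal(size=n))

def null_threshold(n, phi, trials=300):
    """95th percentile of |r| between two independent AR(1) series of length n."""
    rs = [abs(np.corrcoef(ar1(n, phi), ar1(n, phi))[0, 1]) for _ in range(trials)]
    return float(np.quantile(rs, 0.95))

# The threshold shrinks roughly like 1/sqrt(n), but more slowly for strongly
# autocorrelated signals: autocorrelation reduces the effective sample size,
# so narrow-band (slow) signals need a longer time window.
for phi in (0.5, 0.9):
    for n in (1000, 5000, 20000):
        print(f"phi={phi}  n={n:6d}  threshold={null_threshold(n, phi):.4f}")
```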
Abstract:
In this paper we develop an index and an indicator of productivity change that can be used with negative data. For that purpose the range directional model (RDM), a particular case of the directional distance function, is used for computing efficiency in the presence of negative data. We use RDM efficiency measures to arrive at a Malmquist-type index, which can reflect productivity change, and we use RDM inefficiency measures to arrive at a Luenberger productivity indicator, and we relate the two. The productivity index and indicator are developed relative to a fixed meta-technology and so they are referred to as a meta-Malmquist index and a meta-Luenberger indicator. We also address the fact that VRS technologies are used for computing the productivity index and indicator (a requirement under negative data), which raises issues relating to the interpretability of the index. We illustrate how the meta-Malmquist index can be used not only for comparing the performance of a unit in two time periods, but also for comparing the performance of two different units at the same or different time periods. The proposed approach is then applied to a sample of bank branches where negative data were involved. The paper shows how the approach yields information on performance from a variety of perspectives which management can use.
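A sketch of the RDM computation with scipy (the formulation follows the standard VRS range directional model; the data below are made up): the unit under assessment moves towards the ideal point by a fraction beta of its ranges, and beta is maximised by linear programming. Negative outputs pose no problem because the direction vector is built from ranges, which are always non-negative.

```python
import numpy as np
from scipy.optimize import linprog

def rdm_efficiency(X, Y, o):
    """RDM inefficiency beta for unit o.  X: (m, J) inputs, Y: (s, J) outputs."""
    m, J = X.shape
    Rin = X[:, o] - X.min(axis=1)          # input ranges (always >= 0)
    Rout = Y.max(axis=1) - Y[:, o]         # output ranges (always >= 0)
    # variables: [beta, lambda_1 .. lambda_J]; maximise beta
    c = np.concatenate(([-1.0], np.zeros(J)))
    A_ub = np.vstack([np.column_stack([Rin, X]),       # sum(l*x) + b*Rin <= x_o
                      np.column_stack([Rout, -Y])])    # -sum(l*y) + b*Rout <= -y_o
    b_ub = np.concatenate([X[:, o], -Y[:, o]])
    A_eq = np.concatenate(([0.0], np.ones(J)))[None, :]  # VRS: sum(lambda) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (1 + J))
    return res.x[0]                        # beta; RDM efficiency = 1 - beta

# 1 input, 1 output (which may be negative, e.g. a loss), 4 bank branches
X = np.array([[10., 20., 30., 25.]])
Y = np.array([[5., 8., -2., 6.]])
for o in range(4):
    b = rdm_efficiency(X, Y, o)
    print(f"branch {o}: beta={b:.3f}  efficiency={1 - b:.3f}")
```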
Abstract:
There is a growing demand for data transmission over digital networks involving mobile terminals. An important class of data required for transmission over mobile terminals is image information such as street maps, floor plans and identikit images. This sort of transmission is of particular interest to the service industries, such as the police force, fire brigade and medical services. Such transmissions cannot be applied directly to mobile terminals because of the limited capacity of the mobile channels and the transmission errors caused by multipath (Rayleigh) fading. In this research, the transmission of line diagram images, such as floor plans and street maps, over digital networks involving mobile terminals at transmission rates of 2400 bits/s and 4800 bits/s has been studied. A low bit-rate source encoding technique using geometric codes is found to be suitable to represent line diagram images. In geometric encoding, the amount of data required to represent or store a line diagram image is proportional to the image detail; a simple line diagram image thus requires only a small amount of data. To study the effect of transmission errors due to mobile channels on the transmitted images, error sources (error files), which represent mobile channels under different conditions, have been produced using channel modelling techniques. Satisfactory models of the mobile channel have been obtained when compared with field test measurements. Subjective performance tests have been carried out to evaluate the quality and usefulness of the received line diagram images under various mobile channel conditions, and the effect of mobile transmission errors on the quality of the received images has been determined. To improve the quality of the received images under various mobile channel conditions, forward error correcting (FEC) codes with interleaving and automatic repeat request (ARQ) schemes have been proposed. The performance of the error control codes has been evaluated under various mobile channel conditions. It has been shown that a FEC code with interleaving can be used effectively to improve the quality of the received images under normal and severe mobile channel conditions. Under normal channel conditions, similar results have been obtained using ARQ schemes; however, under severe mobile channel conditions, the FEC code with interleaving shows better performance.
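The benefit of combining FEC with interleaving can be shown with a minimal block interleaver (an illustration; the thesis' actual coding schemes are not reproduced): a burst of channel errors, typical of Rayleigh fading, is spread across many codewords so that each codeword sees few enough errors for the FEC to correct.

```python
def interleave(bits, rows, cols):
    """Write row-by-row, read column-by-column. len(bits) must be rows*cols."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    """Inverse permutation: write column-by-column, read row-by-row."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(12))                 # three 4-symbol codewords: 0-3, 4-7, 8-11
tx = interleave(data, rows=3, cols=4)
tx[0:3] = ["X", "X", "X"]              # a burst hits 3 consecutive symbols
rx = deinterleave(tx, rows=3, cols=4)
print(rx)   # each 4-symbol codeword now contains at most one 'X'
```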