820 resultados para Dimensionality Reduction
Resumo:
As a part of the Managing Uncertainty in Complex Models (MUCM) project, research at Aston University will develop methods for dimensionality reduction of the input and/or output spaces of models, as seen within the emulator framework. Towards this end this report describes a framework for generating toy datasets, whose underlying structure is understood, to facilitate early investigations of dimensionality reduction methods and to gain a deeper understanding of the algorithms employed, both in terms of how effective they are for given types of models / situations, and also their speed in applications and how this scales with various factors. The framework, which allows the evaluation of both screening and projection approaches to dimensionality reduction, is described. We also describe the screening and projection methods currently under consideration and present some preliminary results. The aim of this draft of the report is to solicit feedback from the project team on the dataset generation framework, the methods we propose to use, and suggestions for extensions that should be considered.
Resumo:
This thesis describes the development of a complete data visualisation system for large tabular databases, such as those commonly found in a business environment. A state-of-the-art 'cyberspace cell' data visualisation technique was investigated and a powerful visualisation system using it was implemented. Although allowing databases to be explored and conclusions drawn, it had several drawbacks, the majority of which were due to the three-dimensional nature of the visualisation. A novel two-dimensional generic visualisation system, known as MADEN, was then developed and implemented, based upon a 2-D matrix of 'density plots'. MADEN allows an entire high-dimensional database to be visualised in one window, while permitting close analysis in 'enlargement' windows. Selections of records can be made and examined, and dependencies between fields can be investigated in detail. MADEN was used as a tool for investigating and assessing many data processing algorithms, firstly data-reducing (clustering) methods, then dimensionality-reducing techniques. These included a new 'directed' form of principal components analysis, several novel applications of artificial neural networks, and discriminant analysis techniques which illustrated how groups within a database can be separated. To illustrate the power of the system, MADEN was used to explore customer databases from two financial institutions, resulting in a number of discoveries which would be of interest to a marketing manager. Finally, the database of results from the 1992 UK Research Assessment Exercise was analysed. Using MADEN allowed both universities and disciplines to be graphically compared, and supplied some startling revelations, including empirical evidence of the 'Oxbridge factor'.
Resumo:
Privacy issues and data scarcity in PET field call for efficient methods to expand datasets via synthetic generation of new data that cannot be traced back to real patients and that are also realistic. In this thesis, machine learning techniques were applied to 1001 amyloid-beta PET images, which had undergone a diagnosis of Alzheimer’s disease: the evaluations were 540 positive, 457 negative and 4 unknown. Isomap algorithm was used as a manifold learning method to reduce the dimensions of the PET dataset; a numerical scale-free interpolation method was applied to invert the dimensionality reduction map. The interpolant was tested on the PET images via LOOCV, where the removed images were compared with the reconstructed ones with the mean SSIM index (MSSIM = 0.76 ± 0.06). The effectiveness of this measure is questioned, since it indicated slightly higher performance for a method of comparison using PCA (MSSIM = 0.79 ± 0.06), which gave clearly poor quality reconstructed images with respect to those recovered by the numerical inverse mapping. Ten synthetic PET images were generated and, after having been mixed with ten originals, were sent to a team of clinicians for the visual assessment of their realism; no significant agreements were found either between clinicians and the true image labels or among the clinicians, meaning that original and synthetic images were indistinguishable. The future perspective of this thesis points to the improvement of the amyloid-beta PET research field by increasing available data, overcoming the constraints of data acquisition and privacy issues. Potential improvements can be achieved via refinements of the manifold learning and the inverse mapping stages during the PET image analysis, by exploring different combinations in the choice of algorithm parameters and by applying other non-linear dimensionality reduction algorithms. A final prospect of this work is the search for new methods to assess image reconstruction quality.
Resumo:
In hyperspectral imagery a pixel typically consists mixture of spectral signatures of reference substances, also called endmembers. Linear spectral mixture analysis, or linear unmixing, aims at estimating the number of endmembers, their spectral signatures, and their abundance fractions. This paper proposes a framework for hyperpsectral unmixing. A blind method (SISAL) is used for the estimation of the unknown endmember signature and their abundance fractions. This method solve a non-convex problem by a sequence of augmented Lagrangian optimizations, where the positivity constraints, forcing the spectral vectors to belong to the convex hull of the endmember signatures, are replaced by soft constraints. The proposed framework simultaneously estimates the number of endmembers present in the hyperspectral image by an algorithm based on the minimum description length (MDL) principle. Experimental results on both synthetic and real hyperspectral data demonstrate the effectiveness of the proposed algorithm.
Resumo:
Ce mémoire de maîtrise présente une nouvelle approche non supervisée pour détecter et segmenter les régions urbaines dans les images hyperspectrales. La méthode proposée n ́ecessite trois étapes. Tout d’abord, afin de réduire le coût calculatoire de notre algorithme, une image couleur du contenu spectral est estimée. A cette fin, une étape de réduction de dimensionalité non-linéaire, basée sur deux critères complémentaires mais contradictoires de bonne visualisation; à savoir la précision et le contraste, est réalisée pour l’affichage couleur de chaque image hyperspectrale. Ensuite, pour discriminer les régions urbaines des régions non urbaines, la seconde étape consiste à extraire quelques caractéristiques discriminantes (et complémentaires) sur cette image hyperspectrale couleur. A cette fin, nous avons extrait une série de paramètres discriminants pour décrire les caractéristiques d’une zone urbaine, principalement composée d’objets manufacturés de formes simples g ́eométriques et régulières. Nous avons utilisé des caractéristiques texturales basées sur les niveaux de gris, la magnitude du gradient ou des paramètres issus de la matrice de co-occurrence combinés avec des caractéristiques structurelles basées sur l’orientation locale du gradient de l’image et la détection locale de segments de droites. Afin de réduire encore la complexité de calcul de notre approche et éviter le problème de la ”malédiction de la dimensionnalité” quand on décide de regrouper des données de dimensions élevées, nous avons décidé de classifier individuellement, dans la dernière étape, chaque caractéristique texturale ou structurelle avec une simple procédure de K-moyennes et ensuite de combiner ces segmentations grossières, obtenues à faible coût, avec un modèle efficace de fusion de cartes de segmentations. Les expérimentations données dans ce rapport montrent que cette stratégie est efficace visuellement et se compare favorablement aux autres méthodes de détection et segmentation de zones urbaines à partir d’images hyperspectrales.
Resumo:
This paper is an elaboration of the simplex identification via split augmented Lagrangian (SISAL) algorithm (Bioucas-Dias, 2009) to blindly unmix hyperspectral data. SISAL is a linear hyperspectral unmixing method of the minimum volume class. This method solve a non-convex problem by a sequence of augmented Lagrangian optimizations, where the positivity constraints, forcing the spectral vectors to belong to the convex hull of the endmember signatures, are replaced by soft constraints. With respect to SISAL, we introduce a dimensionality estimation method based on the minimum description length (MDL) principle. The effectiveness of the proposed algorithm is illustrated with simulated and real data.
Resumo:
There is great interindividual variability in the response to GH therapy. Ascertaining genetic factors can improve the accuracy of growth response predictions. Suppressor of cytokine signaling (SOCS)-2 is an intracellular negative regulator of GH receptor (GHR) signaling. The objective of the study was to assess the influence of a SOCS2 polymorphism (rs3782415) and its interactive effect with GHR exon 3 and -202 A/C IGFBP3 (rs2854744) polymorphisms on adult height of patients treated with recombinant human GH (rhGH). Genotypes were correlated with adult height data of 65 Turner syndrome (TS) and 47 GH deficiency (GHD) patients treated with rhGH, by multiple linear regressions. Generalized multifactor dimensionality reduction was used to evaluate gene-gene interactions. Baseline clinical data were indistinguishable among patients with different genotypes. Adult height SD scores of patients with at least one SOCS2 single-nucleotide polymorphism rs3782415-C were 0.7 higher than those homozygous for the T allele (P < .001). SOCS2 (P = .003), GHR-exon 3 (P= .016) and -202 A/C IGFBP3 (P = .013) polymorphisms, together with clinical factors accounted for 58% of the variability in adult height and 82% of the total height SD score gain. Patients harboring any two negative genotypes in these three different loci (homozygosity for SOCS2 T allele; the GHR exon 3 full-length allele and/or the -202C-IGFBP3 allele) were more likely to achieve an adult height at the lower quartile (odds ratio of 13.3; 95% confidence interval of 3.2-54.2, P = .0001). The SOCS2 polymorphism (rs3782415) has an influence on the adult height of children with TS and GHD after long-term rhGH therapy. Polymorphisms located in GHR, IGFBP3, and SOCS2 loci have an influence on the growth outcomes of TS and GHD patients treated with rhGH. The use of these genetic markers could identify among rhGH-treated patients those who are genetically predisposed to have less favorable outcomes.
Resumo:
Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e. g., normal tissues and several types of cancer; or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point the best solution for each application. Results: The intent of this work is to provide an open-source multiplataform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes ( targets or predictors) is also implemented in the system. Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software effectiveness in two distinct types of biological problems. Besides, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.
Resumo:
Signal subspace identification is a crucial first step in many hyperspectral processing algorithms such as target detection, change detection, classification, and unmixing. The identification of this subspace enables a correct dimensionality reduction, yielding gains in algorithm performance and complexity and in data storage. This paper introduces a new minimum mean square error-based approach to infer the signal subspace in hyperspectral imagery. The method, which is termed hyperspectral signal identification by minimum error, is eigen decomposition based, unsupervised, and fully automatic (i.e., it does not depend on any tuning parameters). It first estimates the signal and noise correlation matrices and then selects the subset of eigenvalues that best represents the signal subspace in the least squared error sense. State-of-the-art performance of the proposed method is illustrated by using simulated and real hyperspectral images.
Resumo:
Relatório do Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
This paper proposes an FPGA-based architecture for onboard hyperspectral unmixing. This method based on the Vertex Component Analysis (VCA) has several advantages, namely it is unsupervised, fully automatic, and it works without dimensionality reduction (DR) pre-processing step. The architecture has been designed for a low cost Xilinx Zynq board with a Zynq-7020 SoC FPGA based on the Artix-7 FPGA programmable logic and tested using real hyperspectral datasets. Experimental results indicate that the proposed implementation can achieve real-time processing, while maintaining the methods accuracy, which indicate the potential of the proposed platform to implement high-performance, low cost embedded systems.
Resumo:
Dimensionality reduction plays a crucial role in many hyperspectral data processing and analysis algorithms. This paper proposes a new mean squared error based approach to determine the signal subspace in hyperspectral imagery. The method first estimates the signal and noise correlations matrices, then it selects the subset of eigenvalues that best represents the signal subspace in the least square sense. The effectiveness of the proposed method is illustrated using simulated and real hyperspectral images.
Resumo:
Hyperspectral imaging sensors provide image data containing both spectral and spatial information from the Earth surface. The huge data volumes produced by these sensors put stringent requirements on communications, storage, and processing. This paper presents a method, termed hyperspectral signal subspace identification by minimum error (HySime), that infer the signal subspace and determines its dimensionality without any prior knowledge. The identification of this subspace enables a correct dimensionality reduction yielding gains in algorithm performance and complexity and in data storage. HySime method is unsupervised and fully-automatic, i.e., it does not depend on any tuning parameters. The effectiveness of the proposed method is illustrated using simulated data based on U.S.G.S. laboratory spectra and real hyperspectral data collected by the AVIRIS sensor over Cuprite, Nevada.
Resumo:
Hyperspectral instruments have been incorporated in satellite missions, providing large amounts of data of high spectral resolution of the Earth surface. This data can be used in remote sensing applications that often require a real-time or near-real-time response. To avoid delays between hyperspectral image acquisition and its interpretation, the last usually done on a ground station, onboard systems have emerged to process data, reducing the volume of information to transfer from the satellite to the ground station. For this purpose, compact reconfigurable hardware modules, such as field-programmable gate arrays (FPGAs), are widely used. This paper proposes an FPGA-based architecture for hyperspectral unmixing. This method based on the vertex component analysis (VCA) and it works without a dimensionality reduction preprocessing step. The architecture has been designed for a low-cost Xilinx Zynq board with a Zynq-7020 system-on-chip FPGA-based on the Artix-7 FPGA programmable logic and tested using real hyperspectral data. Experimental results indicate that the proposed implementation can achieve real-time processing, while maintaining the methods accuracy, which indicate the potential of the proposed platform to implement high-performance, low-cost embedded systems, opening perspectives for onboard hyperspectral image processing.