86 results for Spectral Feature Extraction
at Université de Lausanne, Switzerland
Abstract:
This work is concerned with the development and application of novel unsupervised learning methods, with two target applications in mind: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering, which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems.
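As an illustration of the kind of spectral clustering pipeline the thesis builds on, here is a minimal sketch on synthetic two-moons data, assuming scikit-learn. The genetically programmed inter-sample distance described above is replaced by a standard RBF affinity, so this is a generic stand-in rather than the author's method.

```python
# Sketch of spectral clustering with an off-the-shelf RBF affinity;
# the thesis instead optimizes the inter-sample distance by genetic
# programming, which is NOT reproduced here.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Synthetic non-convex clusters where k-means would fail
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

model = SpectralClustering(n_clusters=2, affinity="rbf",
                           gamma=10.0, random_state=0)
labels = model.fit_predict(X)  # one cluster label per sample
```

Replacing `affinity="rbf"` with a precomputed affinity matrix is where an optimized distance measure would plug in.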
Abstract:
This paper presents general problems and approaches for spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most machine learning algorithms are universal and adaptive modelling tools developed to solve the basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of analysing and modelling geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in high-dimensional geo-feature spaces, when the dimension of the space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of the models concerns the consideration of real-space constraints such as geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve the modelling of environmental phenomena by taking geo-manifolds into account. An important part of the study deals with the analysis of relevant variables and model inputs. This problem is approached using different nonlinear feature selection/feature extraction tools.
To demonstrate the application of machine learning algorithms, several interesting case studies are considered: digital soil mapping using SVM; automatic mapping of soil and water system pollution using ANN; natural hazard risk analysis (avalanches, landslides); and assessment of renewable resources (wind fields) with SVM and ANN models. The dimensionality of the spaces considered varies from 2 to more than 30. Figures 1, 2 and 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
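A minimal sketch of the regression/mapping task described above, assuming scikit-learn: an SVM regressor interpolates a smooth environmental variable from scattered 2D coordinates. The data here are entirely synthetic stand-ins, not the case-study datasets of the report.

```python
# Hypothetical example: SVM regression (SVR) for spatial mapping
# from 2D coordinates to a measured environmental value.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(300, 2))              # sample locations
# Smooth synthetic field plus measurement noise
values = np.sin(3 * coords[:, 0]) + np.cos(3 * coords[:, 1]) \
         + rng.normal(0, 0.1, 300)

# Train on 200 points, predict the field at 100 unseen locations
model = SVR(kernel="rbf", C=10.0, gamma=2.0).fit(coords[:200], values[:200])
pred = model.predict(coords[200:])
```

In practice the input space would be extended with terrain and remote-sensing features, which is exactly where the high-dimensional geo-feature spaces mentioned above arise.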
Abstract:
Diagnosis of several neurological disorders is based on the detection of typical pathological patterns in the electroencephalogram (EEG). This is a time-consuming task requiring significant training and experience. Automatic detection of these EEG patterns would greatly assist in quantitative analysis and interpretation. We present a method that allows automatic detection of epileptiform events and discrimination of them from eye blinks, based on features derived using a novel application of independent component analysis. The algorithm was trained and cross-validated using seven EEGs with epileptiform activity. For epileptiform events with compensation for eye blinks, the sensitivity was 65 +/- 22% at a specificity of 86 +/- 7% (mean +/- SD). With feature extraction by PCA or classification of raw data, specificity was reduced to 76% and 74%, respectively, at the same sensitivity. On exactly the same data, the commercially available software Reveal had a maximum sensitivity of 30% and a concurrent specificity of 77%. Our algorithm performed well at detecting epileptiform events in this preliminary test and offers a flexible tool that is intended to be generalized to the simultaneous classification of many waveforms in the EEG.
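The ICA-features-then-classifier idea can be sketched as follows, assuming scikit-learn. The synthetic "EEG" segments and the spike injected into one class are invented stand-ins; the paper's actual feature derivation and classifier are not reproduced here.

```python
# Toy sketch: ICA-derived features feeding a simple classifier,
# on synthetic segments (NOT real EEG or the paper's pipeline).
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(100, 32))   # 100 segments of 32 samples
y = np.arange(100) % 2                 # alternate "normal" / "event"
X[y == 1, 16] += 4.0                   # crude transient in event class

# Unsupervised feature extraction, then supervised discrimination
Z = FastICA(n_components=5, random_state=0).fit_transform(X)
clf = LinearDiscriminantAnalysis().fit(Z[:80], y[:80])
acc = clf.score(Z[80:], y[80:])
```

The interesting point mirrored here is that the classifier never sees the raw 32-sample segments, only the low-dimensional ICA features.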
Abstract:
The analysis of multi-modal and multi-sensor images is nowadays of paramount importance for Earth Observation (EO) applications. There exist a variety of methods that aim at fusing the different sources of information to obtain a compact representation of such datasets. However, for change detection, existing methods are often unable to deal with heterogeneous image sources, and very few consider possible nonlinearities in the data. Additionally, the availability of labeled information is very limited in change detection applications. For these reasons, we present the use of a semi-supervised kernel-based feature extraction technique. It incorporates a manifold regularization that accounts for the geometric distribution and jointly addresses the small-sample problem. An exhaustive example using Landsat 5 data illustrates the potential of the method for multi-sensor change detection.
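As a rough sketch of kernel-based feature extraction for change detection, the snippet below stacks two synthetic acquisitions and projects them with kernel PCA, assuming scikit-learn. The semi-supervised manifold regularization of the actual method is omitted; this is only a generic kernel stand-in on invented data.

```python
# Sketch: kernel feature extraction on stacked bi-temporal pixels
# (KPCA stand-in; the paper's semi-supervised regularizer is omitted).
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_t1 = rng.normal(0, 1, size=(500, 6))                 # 6-band pixels, time 1
X_t2 = X_t1 + rng.normal(0, 0.1, size=(500, 6))        # time 2, mostly stable
X_t2[:50] += 2.0                                       # 50 "changed" pixels

stacked = np.hstack([X_t1, X_t2])                      # joint representation
Z = KernelPCA(n_components=2, kernel="rbf", gamma=0.1).fit_transform(stacked)
```

In the extracted space, changed and unchanged pixels can be separated with very few labels, which is the point of the small-sample argument above.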
Abstract:
Monitoring of posture allocations and activities enables accurate estimation of energy expenditure and may aid in obesity prevention and treatment. At present, accurate devices rely on multiple sensors distributed on the body and thus may be too obtrusive for everyday use. This paper presents a novel wearable sensor, which is capable of very accurate recognition of common postures and activities. The patterns of heel acceleration and plantar pressure uniquely characterize postures and typical activities while requiring minimal preprocessing and no feature extraction. The shoe sensor was tested in nine adults performing sitting and standing postures and while walking, running, ascending and descending stairs, and cycling. Support vector machines (SVMs) were used for classification. A fourfold validation of a six-class subject-independent group model showed 95.2% average accuracy of posture/activity classification on the full sensor set and over 98% on an optimized sensor set. Using a combination of acceleration and pressure also enabled a pronounced reduction of the sampling frequency (from 25 to 1 Hz) without a significant loss of accuracy (98% versus 93%). Subjects had shoe sizes (US) M9.5-11 and W7-9 and body mass indexes from 18.1 to 39.4 kg/m2, suggesting that the device can be used by individuals with varying anthropometric characteristics.
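The six-class SVM with fourfold validation can be sketched as below, assuming scikit-learn. The feature matrix is a synthetic stand-in for the heel-acceleration/plantar-pressure signals; class geometry and accuracies have no relation to the paper's results.

```python
# Sketch: 6-class SVM posture/activity classifier with 4-fold CV
# on invented features (NOT the shoe-sensor data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 60)                    # 6 classes x 60 samples
# 8 synthetic features with class-dependent means
X = rng.normal(0, 1, size=(360, 8)) + 1.2 * y[:, None]

scores = cross_val_score(SVC(kernel="rbf", C=10.0), X, y, cv=4)
```

A subject-independent evaluation, as in the paper, would instead group the folds by subject (e.g., `GroupKFold`) so no person appears in both train and test splits.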
Abstract:
Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this makes it possible to automatically yet accurately distinguish objects at the surface of our planet. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive, but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, therefore hindering the transfer of knowledge from one image to another. The goal of this Thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest to collect labels only for the most useful pixels. This iterative routine is based on a constant evaluation of how pertinent the initial training data, which actually belong to a different image, are to the new image.
Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels into a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are greatly increased. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. We also gauge the portability of classification models through the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows the synthesis of new pixels with more similar characteristics, ultimately facilitating land-cover mapping across images.
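The iterative labeling strategy described first can be sketched as a generic uncertainty-sampling loop, assuming scikit-learn. This is a textbook active-learning stand-in on synthetic pixels, not the thesis's pertinence-evaluation routine.

```python
# Sketch: active sampling of the most informative "pixels" by
# classifier uncertainty (generic stand-in for the thesis method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(400, 4))                    # synthetic pixels
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)          # two land covers

# Small initial training set with both classes represented
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(400) if i not in labeled]

for _ in range(5):                                     # 5 labeling rounds
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    pick = pool[int(np.argmin(np.abs(proba - 0.5)))]   # most uncertain pixel
    labeled.append(pick)
    pool.remove(pick)

acc = clf.score(X, y)                                  # map-wide accuracy
```

The thesis additionally scores how pertinent the *source-image* samples remain for the new image; here only the target-image side of that loop is shown.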
Abstract:
An important aspect of immune monitoring for vaccine development, clinical trials, and research is the detection, measurement, and comparison of antigen-specific T-cells from subject samples under different conditions. Antigen-specific T-cells compose a very small fraction of total T-cells. Developments in cytometry technology over the past five years have enabled the measurement of single cells in a multivariate and high-throughput manner. This growth in both dimensionality and quantity of data continues to pose a challenge for effective identification and visualization of rare cell subsets, such as antigen-specific T-cells. Dimension reduction and feature extraction play a pivotal role in both identifying and visualizing cell populations of interest in large, multi-dimensional cytometry datasets. However, the automated identification and visualization of rare, high-dimensional cell subsets remains challenging. Here we demonstrate how a systematic and integrated approach combining targeted feature extraction with dimension reduction can be used to identify and visualize biological differences in rare, antigen-specific cell populations. By using OpenCyto to perform semi-automated gating and feature extraction of flow cytometry data, followed by dimensionality reduction with t-SNE, we are able to identify polyfunctional subpopulations of antigen-specific T-cells and visualize treatment-specific differences between them.
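The feature-extraction-then-t-SNE step can be sketched as follows, assuming scikit-learn (OpenCyto itself is an R/Bioconductor package, so the gating step is replaced by invented marker features with a rare shifted subset).

```python
# Sketch: t-SNE embedding of synthetic cytometry-like features
# with a rare (5%) subpopulation shifted in a few markers.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(600, 10))   # 600 cells x 10 marker features
X[:30, :3] += 4.0                      # rare antigen-specific-like subset

emb = TSNE(n_components=2, perplexity=30,
           random_state=0).fit_transform(X)
```

In the embedded 2D space the rare subset forms its own island, which is what makes the visualization of treatment-specific differences feasible.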
Abstract:
Understanding the basis on which recruiters form hirability impressions for a job applicant is a key issue in organizational psychology and can be addressed as a social computing problem. We approach the problem from a face-to-face, nonverbal perspective where behavioral feature extraction and inference are automated. This paper presents a computational framework for the automatic prediction of hirability. To this end, we collected an audio-visual dataset of real job interviews where candidates were applying for a marketing job. We automatically extracted audio and visual behavioral cues related to both the applicant and the interviewer. We then evaluated several regression methods for the prediction of hirability scores and showed the feasibility of conducting such a task, with ridge regression explaining 36.2% of the variance. Feature groups were analyzed, and two main groups of behavioral cues were predictive of hirability: applicant audio features and interviewer visual cues, showing the predictive validity of cues related not only to the applicant, but also to the interviewer. As a last step, we analyzed the predictive validity of psychometric questionnaires often used in the personnel selection process, and found that these questionnaires were unable to predict hirability, suggesting that hirability impressions were formed based on the interaction during the interview rather than on questionnaire data.
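The regression step reported above (ridge regression explaining part of the variance in hirability scores) can be sketched as below, assuming scikit-learn. Features, weights, and scores are synthetic placeholders, not the audio-visual cues or ratings from the study.

```python
# Sketch: ridge regression of synthetic behavioral features onto
# invented "hirability" ratings, with variance explained (R^2).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(62, 20))        # 62 interviews x 20 cues
w = rng.normal(0, 1, size=20)              # hidden linear relation
scores = X @ w + rng.normal(0, 2.0, size=62)

model = Ridge(alpha=1.0).fit(X[:40], scores[:40])
r2 = r2_score(scores[40:], model.predict(X[40:]))   # variance explained
```

Grouped analyses like the one in the paper would fit the same model on subsets of columns (e.g., applicant audio cues only) and compare the resulting R^2 values.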
Abstract:
A combined strategy based on the computation of absorption energies, using the ZINDO/S semiempirical method, for a statistically relevant number of thermally sampled configurations extracted from QM/MM trajectories is used to establish a one-to-one correspondence between the structures of the different early intermediates (dark, batho, BSI, lumi) involved in the initial steps of the rhodopsin photoactivation mechanism and their optical spectra. A systematic analysis of the results based on a correlation-based feature selection algorithm shows that the origin of the color shifts among these intermediates can be mainly ascribed to alterations in intrinsic properties of the chromophore structure, which are tuned by several residues located in the protein binding pocket. In addition to the expected electrostatic and dipolar effects caused by the charged residues (Glu113, Glu181) and to strong hydrogen bonding with Glu113, other interactions such as π-stacking with Ala117 and Thr118 backbone atoms, van der Waals contacts with Gly114 and Ala292, and CH/π weak interactions with Tyr268, Ala117, Thr118, and Ser186 side chains are found to make non-negligible contributions to the modulation of the color tuning among the different rhodopsin photointermediates.
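A correlation-based feature selection of the kind mentioned above can be sketched with a simplified merit: relevance to the target minus redundancy with already-selected features. This is a toy greedy variant on synthetic data, not the algorithm or descriptors used in the study.

```python
# Toy greedy CFS-style selection: pick features with high |corr|
# to the target and low |corr| to features already selected.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(200, 6))                # 6 candidate descriptors
y = 2.0 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.5, 200)

def cfs_rank(X, y, k):
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        def merit(j):
            rel = abs(np.corrcoef(X[:, j], y)[0, 1])       # relevance
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected]) if selected else 0.0
            return rel - red                                # penalize redundancy
        best = max(remaining, key=merit)
        selected.append(best)
        remaining.remove(best)
    return selected

picked = cfs_rank(X, y, 2)   # recovers the two informative descriptors
```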
Abstract:
Cardiovascular disease is the leading cause of death worldwide. Within this subset, coronary artery disease (CAD) is the most prevalent. Magnetic resonance angiography (MRA) is an emerging technique that provides a safe, non-invasive way of assessing CAD progression. To generate contrast between tissues, MR images are weighted according to the magnetic properties of those tissues. In cardiac MRI, T2 contrast, which is governed by the rate of transverse signal loss, is often created through the use of a T2-Preparation module. T2-Preparation, or T2-Prep, is a magnetization preparation scheme used to improve blood/myocardium contrast in cardiac MRI. T2-Prep methods generally use a non-selective +90°, 180°, 180°, -90° train of radiofrequency (RF) pulses (or a variant thereof) to tip magnetization into the transverse plane, allow it to evolve, and then restore it to the longitudinal plane. A key feature in this process is the combination of a +90° and -90° RF pulse. By changing either one of these, a mismatch occurs between signal excitation and restoration. This feature can be exploited to provide additional spectral or spatial selectivity. In this work, both of these possibilities are explored. The first - spectral selectivity - has been examined as a method of improving fat saturation in coronary MRA. The second - spatial selectivity - has been examined as a means of reducing imaging time by decreasing the field of view, and as a method of reducing artefacts originating from the tissues surrounding the heart. Two additional applications, parallel imaging and self-navigation, are also presented. This thesis is thus composed of four sections. The first, "A Fat Signal Suppression for Coronary MRA at 3T using a Water-Selective Adiabatic T2-Preparation Technique", was originally published in the journal Magnetic Resonance in Medicine (MRM) with co-authors Ruud B. van Heeswijk and Matthias Stuber.
The second, "Combined T2-Preparation and 2D Pencil Beam Inner Volume Selection", again with co-authors Ruud van Heeswijk and Matthias Stuber, was also published in the journal MRM. The third, "A cylindrical, inner volume selecting 2D-T2-Prep improves GRAPPA-accelerated image quality in MRA of the right coronary artery", written with co-authors Jerome Yerly and Matthias Stuber, has been submitted to the "Journal of Cardiovascular Magnetic Resonance", and the fourth, "Combined respiratory self-navigation and 'pencil-beam' 2D-T2-Prep for free-breathing, whole-heart coronary MRA", with co-authors Jerome Chaptinel, Giulia Ginami, Gabriele Bonanno, Simone Coppo, Ruud van Heeswijk, Davide Piccini, and Matthias Stuber, is undergoing internal review prior to submission to the journal MRM.
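The restoring effect of the +90°, 180°, 180°, -90° pulse train described above can be sketched with rotation matrices acting on the magnetization vector. This is a toy hard-pulse model, ignoring relaxation and off-resonance, not a Bloch simulation of the actual sequence.

```python
# Toy model of the T2-Prep pulse train: +90x tip-down, two 180y
# refocusing pulses, -90x tip-up. Relaxation and off-resonance
# are ignored, so the magnetization returns exactly to +z.
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

M = np.array([0.0, 0.0, 1.0])          # equilibrium magnetization
M = rot_x(np.pi / 2) @ M               # +90°: tip into transverse plane
M = rot_y(np.pi) @ M                   # 180° refocusing pulse
M = rot_y(np.pi) @ M                   # 180° refocusing pulse
M = rot_x(-np.pi / 2) @ M              # -90°: restore to longitudinal axis
```

Making the +90° and -90° pulses differ in frequency or spatial profile breaks this symmetry, which is precisely the excitation/restoration mismatch the thesis exploits for spectral or spatial selectivity.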