50 resultados para Data-driven

em Université de Lausanne, Switzerland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This book combines geostatistics and global mapping systems to present an up-to-the-minute study of environmental data. Featuring numerous case studies, the reference covers model dependent (geostatistics) and data driven (machine learning algorithms) analysis techniques such as risk mapping, conditional stochastic simulations, descriptions of spatial uncertainty and variability, artificial neural networks (ANN) for spatial data, Bayesian maximum entropy (BME), and more.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In recent years there has been an explosive growth in the development of adaptive and data driven methods. One of the efficient and data-driven approaches is based on statistical learning theory (Vapnik 1998). The theory is based on Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we are trying not only to reduce training error ? to fit the available data with a model, but also to reduce the complexity of the model and to reduce generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present SLT is still under intensive development and SVM find new areas of application (www.kernel-machines.org). SVM develop robust and non linear data models with excellent generalisation abilities that is very important both for monitoring and forecasting. SVM are extremely good when input space is high dimensional and training data set i not big enough to develop corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. It opens a way to sampling optimization, estimation of noise in data, quantification of data redundancy etc. Presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The paper presents some contemporary approaches to spatial environmental data analysis. The main topics are concentrated on the decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial predictions, probabilistic and risk mapping, development and application of conditional stochastic simulation models. The innovative part of the paper presents integrated/hybrid model-machine learning (ML) residuals sequential simulations-MLRSS. The models are based on multilayer perceptron and support vector regression ML algorithms used for modeling long-range spatial trends and sequential simulations of the residuals. NIL algorithms deliver non-linear solution for the spatial non-stationary problems, which are difficult for geostatistical approach. Geostatistical tools (variography) are used to characterize performance of ML algorithms, by analyzing quality and quantity of the spatially structured information extracted from data with ML algorithms. Sequential simulations provide efficient assessment of uncertainty and spatial variability. Case study from the Chernobyl fallouts illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of ML data driven and geostatistical model based approaches, can be efficiently used in decision-making process. (C) 2003 Elsevier Ltd. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The neural mechanisms determining the timing of even simple actions, such as when to walk or rest, are largely mysterious. One intriguing, but untested, hypothesis posits a role for ongoing activity fluctuations in neurons of central action selection circuits that drive animal behavior from moment to moment. To examine how fluctuating activity can contribute to action timing, we paired high-resolution measurements of freely walking Drosophila melanogaster with data-driven neural network modeling and dynamical systems analysis. We generated fluctuation-driven network models whose outputs-locomotor bouts-matched those measured from sensory-deprived Drosophila. From these models, we identified those that could also reproduce a second, unrelated dataset: the complex time-course of odor-evoked walking for genetically diverse Drosophila strains. Dynamical models that best reproduced both Drosophila basal and odor-evoked locomotor patterns exhibited specific characteristics. First, ongoing fluctuations were required. In a stochastic resonance-like manner, these fluctuations allowed neural activity to escape stable equilibria and to exceed a threshold for locomotion. Second, odor-induced shifts of equilibria in these models caused a depression in locomotor frequency following olfactory stimulation. Our models predict that activity fluctuations in action selection circuits cause behavioral output to more closely match sensory drive and may therefore enhance navigation in complex sensory environments. Together these data reveal how simple neural dynamics, when coupled with activity fluctuations, can give rise to complex patterns of animal behavior.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The algorithmic approach to data modelling has developed rapidly these last years, in particular methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise in situations at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge on the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

OBJECTIVES: To develop data-driven criteria for clinically inactive disease on and off therapy for juvenile dermatomyositis (JDM). METHODS: The Paediatric Rheumatology International Trials Organisation (PRINTO) database contains 275 patients with active JDM evaluated prospectively up to 24 months. Thirty-eight patients off therapy at 24 months were defined as clinically inactive and included in the reference group. These were compared with a random sample of 76 patients who had active disease at study baseline. Individual measures of muscle strength/endurance, muscle enzymes, physician's and parent's global disease activity/damage evaluations, inactive disease criteria derived from the literature and other ad hoc criteria were evaluated for sensitivity, specificity and Cohen's κ agreement. RESULTS: The individual measures that best characterised inactive disease (sensitivity and specificity >0.8 and Cohen's κ >0.8) were manual muscle testing (MMT) ≥78, physician global assessment of muscle activity=0, physician global assessment of overall disease activity (PhyGloVAS) ≤0.2, Childhood Myositis Assessment Scale (CMAS) ≥48, Disease Activity Score ≤3 and Myositis Disease Activity Assessment Visual Analogue Scale ≤0.2. The best combination of variables to classify a patient as being in a state of inactive disease on or off therapy is at least three of four of the following criteria: creatine kinase ≤150, CMAS ≥48, MMT ≥78 and PhyGloVAS ≤0.2. After 24 months, 30/31 patients (96.8%) were inactive off therapy and 69/145 (47.6%) were inactive on therapy. CONCLUSION: PRINTO established data-driven criteria with clearly evidence-based cut-off values to identify JDM patients with clinically inactive disease. These criteria can be used in clinical trials, in research and in clinical practice.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The investigation of perceptual and cognitive functions with non-invasive brain imaging methods critically depends on the careful selection of stimuli for use in experiments. For example, it must be verified that any observed effects follow from the parameter of interest (e.g. semantic category) rather than other low-level physical features (e.g. luminance, or spectral properties). Otherwise, interpretation of results is confounded. Often, researchers circumvent this issue by including additional control conditions or tasks, both of which are flawed and also prolong experiments. Here, we present some new approaches for controlling classes of stimuli intended for use in cognitive neuroscience, however these methods can be readily extrapolated to other applications and stimulus modalities. Our approach is comprised of two levels. The first level aims at equalizing individual stimuli in terms of their mean luminance. Each data point in the stimulus is adjusted to a standardized value based on a standard value across the stimulus battery. The second level analyzes two populations of stimuli along their spectral properties (i.e. spatial frequency) using a dissimilarity metric that equals the root mean square of the distance between two populations of objects as a function of spatial frequency along x- and y-dimensions of the image. Randomized permutations are used to obtain a minimal value between the populations to minimize, in a completely data-driven manner, the spectral differences between image sets. While another paper in this issue applies these methods in the case of acoustic stimuli (Aeschlimann et al., Brain Topogr 2008), we illustrate this approach here in detail for complex visual stimuli.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Empirical modeling of exposure levels has been popular for identifying exposure determinants in occupational hygiene. Traditional data-driven methods used to choose a model on which to base inferences have typically not accounted for the uncertainty linked to the process of selecting the final model. Several new approaches propose making statistical inferences from a set of plausible models rather than from a single model regarded as 'best'. This paper introduces the multimodel averaging approach described in the monograph by Burnham and Anderson. In their approach, a set of plausible models are defined a priori by taking into account the sample size and previous knowledge of variables influent on exposure levels. The Akaike information criterion is then calculated to evaluate the relative support of the data for each model, expressed as Akaike weight, to be interpreted as the probability of the model being the best approximating model given the model set. The model weights can then be used to rank models, quantify the evidence favoring one over another, perform multimodel prediction, estimate the relative influence of the potential predictors and estimate multimodel-averaged effects of determinants. The whole approach is illustrated with the analysis of a data set of 1500 volatile organic compound exposure levels collected by the Institute for work and health (Lausanne, Switzerland) over 20 years, each concentration having been divided by the relevant Swiss occupational exposure limit and log-transformed before analysis. Multimodel inference represents a promising procedure for modeling exposure levels that incorporates the notion that several models can be supported by the data and permits to evaluate to a certain extent model selection uncertainty, which is seldom mentioned in current practice.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Self-consciousness has mostly been approached by philosophical enquiry and not by empirical neuroscientific study, leading to an overabundance of diverging theories and an absence of data-driven theories. Using robotic technology, we achieved specific bodily conflicts and induced predictable changes in a fundamental aspect of self-consciousness by altering where healthy subjects experienced themselves to be (self-location). Functional magnetic resonance imaging revealed that temporo-parietal junction (TPJ) activity reflected experimental changes in self-location that also depended on the first-person perspective due to visuo-tactile and visuo-vestibular conflicts. Moreover, in a large lesion analysis study of neurological patients with a well-defined state of abnormal self-location, brain damage was also localized at TPJ, providing causal evidence that TPJ encodes self-location. Our findings reveal that multisensory integration at the TPJ reflects one of the most fundamental subjective feelings of humans: the feeling of being an entity localized at a position in space and perceiving the world from this position and perspective.