21 resultados para Semi-supervised learning

em Université de Lausanne, Switzerland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to existing approaches to deep learning whilst yielding competitive error rates compared to those methods, and existing shallow semi-supervised techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. Recent advances in machine learning offer a novel approach to model spatial distribution of petrophysical properties in complex reservoirs alternative to geostatistics. The approach is based of semisupervised learning, which handles both ?labelled? observed data and ?unlabelled? data, which have no measured value but describe prior knowledge and other relevant data in forms of manifolds in the input space where the modelled property is continuous. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. On the other hand, it is able to capture and preserve key spatial dependencies such as connectivity of high permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR as a data driven algorithm is designed to integrate various kind of conditioning information and learn dependences from it. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, stochastic semi-supervised SVR geomodel is integrated into Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate posterior probability distribution. Uncertainty of the model is described by posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, smoothness/variability of spatial property distribution. The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fluvial deposits are a challenge for modelling flow in sub-surface reservoirs. Connectivity and continuity of permeable bodies have a major impact on fluid flow in porous media. Contemporary object-based and multipoint statistics methods face a problem of robust representation of connected structures. An alternative approach to model petrophysical properties is based on machine learning algorithm ? Support Vector Regression (SVR). Semi-supervised SVR is able to establish spatial connectivity taking into account the prior knowledge on natural similarities. SVR as a learning algorithm is robust to noise and captures dependencies from all available data. Semi-supervised SVR applied to a synthetic fluvial reservoir demonstrated robust results, which are well matched to the flow performance

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ultrasound segmentation is a challenging problem due to the inherent speckle and some artifacts like shadows, attenuation and signal dropout. Existing methods need to include strong priors like shape priors or analytical intensity models to succeed in the segmentation. However, such priors tend to limit these methods to a specific target or imaging settings, and they are not always applicable to pathological cases. This work introduces a semi-supervised segmentation framework for ultrasound imaging that alleviates the limitation of fully automatic segmentation, that is, it is applicable to any kind of target and imaging settings. Our methodology uses a graph of image patches to represent the ultrasound image and user-assisted initialization with labels, which acts as soft priors. The segmentation problem is formulated as a continuous minimum cut problem and solved with an efficient optimization algorithm. We validate our segmentation framework on clinical ultrasound imaging (prostate, fetus, and tumors of the liver and eye). We obtain high similarity agreement with the ground truth provided by medical expert delineations in all applications (94% DICE values in average) and the proposed algorithm performs favorably with the literature.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Segmenting ultrasound images is a challenging problemwhere standard unsupervised segmentation methods such asthe well-known Chan-Vese method fail. We propose in thispaper an efficient segmentation method for this class ofimages. Our proposed algorithm is based on asemi-supervised approach (user labels) and the use ofimage patches as data features. We also consider thePearson distance between patches, which has been shown tobe robust w.r.t speckle noise present in ultrasoundimages. Our results on phantom and clinical data show avery high similarity agreement with the ground truthprovided by a medical expert.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A semisupervised support vector machine is presented for the classification of remote sensing images. The method exploits the wealth of unlabeled samples for regularizing the training kernel representation locally by means of cluster kernels. The method learns a suitable kernel directly from the image and thus avoids assuming a priori signal relations by using a predefined kernel structure. Good results are obtained in image classification examples when few labeled samples are available. The method scales almost linearly with the number of unlabeled samples and provides out-of-sample predictions.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. The paper considers a data driven approach in modelling uncertainty in spatial predictions. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The analysis of multi-modal and multi-sensor images is nowadays of paramount importance for Earth Observation (EO) applications. There exist a variety of methods that aim at fusing the different sources of information to obtain a compact representation of such datasets. However, for change detection existing methods are often unable to deal with heterogeneous image sources and very few consider possible nonlinearities in the data. Additionally, the availability of labeled information is very limited in change detection applications. For these reasons, we present the use of a semi-supervised kernel-based feature extraction technique. It incorporates a manifold regularization accounting for the geometric distribution and jointly addressing the small sample problem. An exhaustive example using Landsat 5 data illustrates the potential of the method for multi-sensor change detection.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This article presents an experimental study about the classification ability of several classifiers for multi-classclassification of cannabis seedlings. As the cultivation of drug type cannabis is forbidden in Switzerland lawenforcement authorities regularly ask forensic laboratories to determinate the chemotype of a seized cannabisplant and then to conclude if the plantation is legal or not. This classification is mainly performed when theplant is mature as required by the EU official protocol and then the classification of cannabis seedlings is a timeconsuming and costly procedure. A previous study made by the authors has investigated this problematic [1]and showed that it is possible to differentiate between drug type (illegal) and fibre type (legal) cannabis at anearly stage of growth using gas chromatography interfaced with mass spectrometry (GC-MS) based on therelative proportions of eight major leaf compounds. The aims of the present work are on one hand to continueformer work and to optimize the methodology for the discrimination of drug- and fibre type cannabisdeveloped in the previous study and on the other hand to investigate the possibility to predict illegal cannabisvarieties. Seven classifiers for differentiating between cannabis seedlings are evaluated in this paper, namelyLinear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLS-DA), Nearest NeighbourClassification (NNC), Learning Vector Quantization (LVQ), Radial Basis Function Support Vector Machines(RBF SVMs), Random Forest (RF) and Artificial Neural Networks (ANN). The performance of each method wasassessed using the same analytical dataset that consists of 861 samples split into drug- and fibre type cannabiswith drug type cannabis being made up of 12 varieties (i.e. 12 classes). The results show that linear classifiersare not able to manage the distribution of classes in which some overlap areas exist for both classificationproblems. Unlike linear classifiers, NNC and RBF SVMs best differentiate cannabis samples both for 2-class and12-class classifications with average classification results up to 99% and 98%, respectively. Furthermore, RBFSVMs correctly classified into drug type cannabis the independent validation set, which consists of cannabisplants coming from police seizures. In forensic case work this study shows that the discrimination betweencannabis samples at an early stage of growth is possible with fairly high classification performance fordiscriminating between cannabis chemotypes or between drug type cannabis varieties.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing the good architecture are provided for new and/or unexperienced user.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Learning is predicted to affect manifold ecological and evolutionary processes, but the extent to which animals rely on learning in nature remains poorly known, especially for short-lived non-social invertebrates. This is in particular the case for Drosophila, a favourite laboratory system to study molecular mechanisms of learning. Here we tested whether Drosophila melanogaster use learned information to choose food while free-flying in a large greenhouse emulating the natural environment. In a series of experiments flies were first given an opportunity to learn which of two food odours was associated with good versus unpalatable taste; subsequently, their preference for the two odours was assessed with olfactory traps set up in the greenhouse. Flies that had experienced palatable apple-flavoured food and unpalatable orange-flavoured food were more likely to be attracted to the odour of apple than flies with the opposite experience. This was true both when the flies first learned in the laboratory and were then released and recaptured in the greenhouse, and when the learning occurred under free-flying conditions in the greenhouse. Furthermore, flies retained the memory of their experience while exploring the greenhouse overnight in the absence of focal odours, pointing to the involvement of consolidated memory. These results support the notion that even small, short lived insects which are not central-place foragers make use of learned cues in their natural environments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An active learning method is proposed for the semi-automatic selection of training sets in remote sensing image classification. The method adds iteratively to the current training set the unlabeled pixels for which the prediction of an ensemble of classifiers based on bagged training sets show maximum entropy. This way, the algorithm selects the pixels that are the most uncertain and that will improve the model if added in the training set. The user is asked to label such pixels at each iteration. Experiments using support vector machines (SVM) on an 8 classes QuickBird image show the excellent performances of the methods, that equals accuracies of both a model trained with ten times more pixels and a model whose training set has been built using a state-of-the-art SVM specific active learning method