194 resultados para Semi-supervised classification
em Université de Lausanne, Switzerland
Resumo:
A semisupervised support vector machine is presented for the classification of remote sensing images. The method exploits the wealth of unlabeled samples for regularizing the training kernel representation locally by means of cluster kernels. The method learns a suitable kernel directly from the image and thus avoids assuming a priori signal relations by using a predefined kernel structure. Good results are obtained in image classification examples when few labeled samples are available. The method scales almost linearly with the number of unlabeled samples and provides out-of-sample predictions.
Resumo:
Ultrasound segmentation is a challenging problem due to the inherent speckle and some artifacts like shadows, attenuation and signal dropout. Existing methods need to include strong priors like shape priors or analytical intensity models to succeed in the segmentation. However, such priors tend to limit these methods to a specific target or imaging settings, and they are not always applicable to pathological cases. This work introduces a semi-supervised segmentation framework for ultrasound imaging that alleviates the limitation of fully automatic segmentation, that is, it is applicable to any kind of target and imaging settings. Our methodology uses a graph of image patches to represent the ultrasound image and user-assisted initialization with labels, which acts as soft priors. The segmentation problem is formulated as a continuous minimum cut problem and solved with an efficient optimization algorithm. We validate our segmentation framework on clinical ultrasound imaging (prostate, fetus, and tumors of the liver and eye). We obtain high similarity agreement with the ground truth provided by medical expert delineations in all applications (94% DICE values in average) and the proposed algorithm performs favorably with the literature.
Resumo:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. Recent advances in machine learning offer a novel approach to model spatial distribution of petrophysical properties in complex reservoirs alternative to geostatistics. The approach is based of semisupervised learning, which handles both ?labelled? observed data and ?unlabelled? data, which have no measured value but describe prior knowledge and other relevant data in forms of manifolds in the input space where the modelled property is continuous. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. On the other hand, it is able to capture and preserve key spatial dependencies such as connectivity of high permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR as a data driven algorithm is designed to integrate various kind of conditioning information and learn dependences from it. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, stochastic semi-supervised SVR geomodel is integrated into Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate posterior probability distribution. Uncertainty of the model is described by posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, smoothness/variability of spatial property distribution. The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.
Resumo:
Segmenting ultrasound images is a challenging problemwhere standard unsupervised segmentation methods such asthe well-known Chan-Vese method fail. We propose in thispaper an efficient segmentation method for this class ofimages. Our proposed algorithm is based on asemi-supervised approach (user labels) and the use ofimage patches as data features. We also consider thePearson distance between patches, which has been shown tobe robust w.r.t speckle noise present in ultrasoundimages. Our results on phantom and clinical data show avery high similarity agreement with the ground truthprovided by a medical expert.
Resumo:
Fluvial deposits are a challenge for modelling flow in sub-surface reservoirs. Connectivity and continuity of permeable bodies have a major impact on fluid flow in porous media. Contemporary object-based and multipoint statistics methods face a problem of robust representation of connected structures. An alternative approach to model petrophysical properties is based on machine learning algorithm ? Support Vector Regression (SVR). Semi-supervised SVR is able to establish spatial connectivity taking into account the prior knowledge on natural similarities. SVR as a learning algorithm is robust to noise and captures dependencies from all available data. Semi-supervised SVR applied to a synthetic fluvial reservoir demonstrated robust results, which are well matched to the flow performance
Resumo:
We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to existing approaches to deep learning whilst yielding competitive error rates compared to those methods, and existing shallow semi-supervised techniques.
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
The 2008 Data Fusion Contest organized by the IEEE Geoscience and Remote Sensing Data Fusion Technical Committee deals with the classification of high-resolution hyperspectral data from an urban area. Unlike in the previous issues of the contest, the goal was not only to identify the best algorithm but also to provide a collaborative effort: The decision fusion of the best individual algorithms was aiming at further improving the classification performances, and the best algorithms were ranked according to their relative contribution to the decision fusion. This paper presents the five awarded algorithms and the conclusions of the contest, stressing the importance of decision fusion, dimension reduction, and supervised classification methods, such as neural networks and support vector machines.
Resumo:
The analysis of multi-modal and multi-sensor images is nowadays of paramount importance for Earth Observation (EO) applications. There exist a variety of methods that aim at fusing the different sources of information to obtain a compact representation of such datasets. However, for change detection existing methods are often unable to deal with heterogeneous image sources and very few consider possible nonlinearities in the data. Additionally, the availability of labeled information is very limited in change detection applications. For these reasons, we present the use of a semi-supervised kernel-based feature extraction technique. It incorporates a manifold regularization accounting for the geometric distribution and jointly addressing the small sample problem. An exhaustive example using Landsat 5 data illustrates the potential of the method for multi-sensor change detection.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. The paper considers a data driven approach in modelling uncertainty in spatial predictions. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
Resumo:
Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing the good architecture are provided for new and/or unexperienced user.
Resumo:
The development of targeted treatment strategies adapted to individual patients requires identification of the different tumor classes according to their biology and prognosis. We focus here on the molecular aspects underlying these differences, in terms of sets of genes that control pathogenesis of the different subtypes of astrocytic glioma. By performing cDNA-array analysis of 53 patient biopsies, comprising low-grade astrocytoma, secondary glioblastoma (respective recurrent high-grade tumors), and newly diagnosed primary glioblastoma, we demonstrate that human gliomas can be differentiated according to their gene expression. We found that low-grade astrocytoma have the most specific and similar expression profiles, whereas primary glioblastoma exhibit much larger variation between tumors. Secondary glioblastoma display features of both other groups. We identified several sets of genes with relatively highly correlated expression within groups that: (a). can be associated with specific biological functions; and (b). effectively differentiate tumor class. One prominent gene cluster discriminating primary versus nonprimary glioblastoma comprises mostly genes involved in angiogenesis, including VEGF fms-related tyrosine kinase 1 but also IGFBP2, that has not yet been directly linked to angiogenesis. In situ hybridization demonstrating coexpression of IGFBP2 and VEGF in pseudopalisading cells surrounding tumor necrosis provided further evidence for a possible involvement of IGFBP2 in angiogenesis. The separating groups of genes were found by the unsupervised coupled two-way clustering method, and their classification power was validated by a supervised construction of a nearly perfect glioma classifier.