11 resultados para incremental learning algorithm

em Université de Lausanne, Switzerland


Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The first is represented as a complete tree to be pruned and the second is iteratively provided by the user. The active learning algorithm proposed searches the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to current pruning's uncertainty, sampling is focused on most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried and sampling is focused on division of clusters showing mixed labels. The model is tested on a VHR image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Fluvial deposits are a challenge for modelling flow in sub-surface reservoirs. Connectivity and continuity of permeable bodies have a major impact on fluid flow in porous media. Contemporary object-based and multipoint statistics methods face a problem of robust representation of connected structures. An alternative approach to model petrophysical properties is based on machine learning algorithm ? Support Vector Regression (SVR). Semi-supervised SVR is able to establish spatial connectivity taking into account the prior knowledge on natural similarities. SVR as a learning algorithm is robust to noise and captures dependencies from all available data. Semi-supervised SVR applied to a synthetic fluvial reservoir demonstrated robust results, which are well matched to the flow performance

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Transmission of drug-resistant pathogens presents an almost-universal challenge for fighting infectious diseases. Transmitted drug resistance mutations (TDRM) can persist in the absence of drugs for considerable time. It is generally believed that differential TDRM-persistence is caused, at least partially, by variations in TDRM-fitness-costs. However, in vivo epidemiological evidence for the impact of fitness costs on TDRM-persistence is rare. Here, we studied the persistence of TDRM in HIV-1 using longitudinally-sampled nucleotide sequences from the Swiss-HIV-Cohort-Study (SHCS). All treatment-naïve individuals with TDRM at baseline were included. Persistence of TDRM was quantified via reversion rates (RR) determined with interval-censored survival models. Fitness costs of TDRM were estimated in the genetic background in which they occurred using a previously published and validated machine-learning algorithm (based on in vitro replicative capacities) and were included in the survival models as explanatory variables. In 857 sequential samples from 168 treatment-naïve patients, 17 TDRM were analyzed. RR varied substantially and ranged from 174.0/100-person-years;CI=[51.4, 588.8] (for 184V) to 2.7/100-person-years;[0.7, 10.9] (for 215D). RR increased significantly with fitness cost (increase by 1.6[1.3,2.0] per standard deviation of fitness costs). When subdividing fitness costs into the average fitness cost of a given mutation and the deviation from the average fitness cost of a mutation in a given genetic background, we found that both components were significantly associated with reversion-rates. Our results show that the substantial variations of TDRM persistence in the absence of drugs are associated with fitness-cost differences both among mutations and among different genetic backgrounds for the same mutation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing the good architecture are provided for new and/or unexperienced user.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An active learning method is proposed for the semi-automatic selection of training sets in remote sensing image classification. The method adds iteratively to the current training set the unlabeled pixels for which the prediction of an ensemble of classifiers based on bagged training sets show maximum entropy. This way, the algorithm selects the pixels that are the most uncertain and that will improve the model if added in the training set. The user is asked to label such pixels at each iteration. Experiments using support vector machines (SVM) on an 8 classes QuickBird image show the excellent performances of the methods, that equals accuracies of both a model trained with ten times more pixels and a model whose training set has been built using a state-of-the-art SVM specific active learning method

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose and validate a multivariate classification algorithm for characterizing changes in human intracranial electroencephalographic data (iEEG) after learning motor sequences. The algorithm is based on a Hidden Markov Model (HMM) that captures spatio-temporal properties of the iEEG at the level of single trials. Continuous intracranial iEEG was acquired during two sessions (one before and one after a night of sleep) in two patients with depth electrodes implanted in several brain areas. They performed a visuomotor sequence (serial reaction time task, SRTT) using the fingers of their non-dominant hand. Our results show that the decoding algorithm correctly classified single iEEG trials from the trained sequence as belonging to either the initial training phase (day 1, before sleep) or a later consolidated phase (day 2, after sleep), whereas it failed to do so for trials belonging to a control condition (pseudo-random sequence). Accurate single-trial classification was achieved by taking advantage of the distributed pattern of neural activity. However, across all the contacts the hippocampus contributed most significantly to the classification accuracy for both patients, and one fronto-striatal contact for one patient. Together, these human intracranial findings demonstrate that a multivariate decoding approach can detect learning-related changes at the level of single-trial iEEG. Because it allows an unbiased identification of brain sites contributing to a behavioral effect (or experimental condition) at the level of single subject, this approach could be usefully applied to assess the neural correlates of other complex cognitive functions in patients implanted with multiple electrodes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we study the relevance of multiple kernel learning (MKL) for the automatic selection of time series inputs. Recently, MKL has gained great attention in the machine learning community due to its flexibility in modelling complex patterns and performing feature selection. In general, MKL constructs the kernel as a weighted linear combination of basis kernels, exploiting different sources of information. An efficient algorithm wrapping a Support Vector Regression model for optimizing the MKL weights, named SimpleMKL, is used for the analysis. In this sense, MKL performs feature selection by discarding inputs/kernels with low or null weights. The approach proposed is tested with simulated linear and nonlinear time series (AutoRegressive, Henon and Lorenz series).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The potential of type-2 fuzzy sets for managing high levels of uncertainty in the subjective knowledge of experts or of numerical information has focused on control and pattern classification systems in recent years. One of the main challenges in designing a type-2 fuzzy logic system is how to estimate the parameters of type-2 fuzzy membership function (T2MF) and the Footprint of Uncertainty (FOU) from imperfect and noisy datasets. This paper presents an automatic approach for learning and tuning Gaussian interval type-2 membership functions (IT2MFs) with application to multi-dimensional pattern classification problems. T2MFs and their FOUs are tuned according to the uncertainties in the training dataset by a combination of genetic algorithm (GA) and crossvalidation techniques. In our GA-based approach, the structure of the chromosome has fewer genes than other GA methods and chromosome initialization is more precise. The proposed approach addresses the application of the interval type-2 fuzzy logic system (IT2FLS) for the problem of nodule classification in a lung Computer Aided Detection (CAD) system. The designed IT2FLS is compared with its type-1 fuzzy logic system (T1FLS) counterpart. The results demonstrate that the IT2FLS outperforms the T1FLS by more than 30% in terms of classification accuracy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model simple k-nearest neighbor algorithm is considered. PNN is a neural network reformulation of well known nonparametric principles of probability density modeling using kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that they can be easily used in decision support systems dealing with problems of automatic classification. Support vector machine is an implementation of the principles of statistical learning theory for the classification tasks. Recently they were successfully applied for different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Context: Ovarian tumors (OT) typing is a competency expected from pathologists, with significant clinical implications. OT however come in numerous different types, some rather rare, with the consequence of few opportunities for practice in some departments. Aim: Our aim was to design a tool for pathologists to train in less common OT typing. Method and Results: Representative slides of 20 less common OT were scanned (Nano Zoomer Digital Hamamatsu®) and the diagnostic algorithm proposed by Young and Scully applied to each case (Young RH and Scully RE, Seminars in Diagnostic Pathology 2001, 18: 161-235) to include: recognition of morphological pattern(s); shortlisting of differential diagnosis; proposition of relevant immunohistochemical markers. The next steps of this project will be: evaluation of the tool in several post-graduate training centers in Europe and Québec; improvement of its design based on evaluation results; diffusion to a larger public. Discussion: In clinical medicine, solving many cases is recognized as of utmost importance for a novice to become an expert. This project relies on the virtual slides technology to provide pathologists with a learning tool aimed at increasing their skills in OT typing. After due evaluation, this model might be extended to other uncommon tumors.