781 resultados para Machine Learning Algorithm
Resumo:
Nowadays, the joint exploitation of images acquired daily by remote sensing instruments and of images available from archives allows a detailed monitoring of the transitions occurring at the surface of the Earth. These modifications of the land cover generate spectral discrepancies that can be detected via the analysis of remote sensing images. Independently from the origin of the images and of type of surface change, a correct processing of such data implies the adoption of flexible, robust and possibly nonlinear method, to correctly account for the complex statistical relationships characterizing the pixels of the images. This Thesis deals with the development and the application of advanced statistical methods for multi-temporal optical remote sensing image processing tasks. Three different families of machine learning models have been explored and fundamental solutions for change detection problems are provided. In the first part, change detection with user supervision has been considered. In a first application, a nonlinear classifier has been applied with the intent of precisely delineating flooded regions from a pair of images. In a second case study, the spatial context of each pixel has been injected into another nonlinear classifier to obtain a precise mapping of new urban structures. In both cases, the user provides the classifier with examples of what he believes has changed or not. In the second part, a completely automatic and unsupervised method for precise binary detection of changes has been proposed. The technique allows a very accurate mapping without any user intervention, resulting particularly useful when readiness and reaction times of the system are a crucial constraint. In the third, the problem of statistical distributions shifting between acquisitions is studied. Two approaches to transform the couple of bi-temporal images and reduce their differences unrelated to changes in land cover are studied. The methods align the distributions of the images, so that the pixel-wise comparison could be carried out with higher accuracy. Furthermore, the second method can deal with images from different sensors, no matter the dimensionality of the data nor the spectral information content. This opens the doors to possible solutions for a crucial problem in the field: detecting changes when the images have been acquired by two different sensors.
Resumo:
Student guidance is an always desired characteristic in any educational system, butit represents special difficulty if it has to be deployed in an automated way to fulfilsuch needs in a computer supported educational tool. In this paper we explorepossible avenues relying on machine learning techniques, to be included in a nearfuture -in the form of a tutoring navigational tool- in a teleeducation platform -InterMediActor- currently under development. Since no data from that platform isavailable yet, the preliminary experiments presented in this paper are builtinterpreting every subject in the Telecommunications Degree at Universidad CarlosIII de Madrid as an aggregated macro-competence (following the methodologicalconsiderations in InterMediActor), such that marks achieved by students can beused as data for the models, to be replaced in a near future by real data directlymeasured inside InterMediActor. We evaluate the predictability of students qualifications, and we deploy a preventive early detection system -failure alert-, toidentify those students more prone to fail a certain subject such that correctivemeans can be deployed with sufficient anticipation.
Resumo:
Machine learning and pattern recognition methods have been used to diagnose Alzheimer's disease (AD) and mild cognitive impairment (MCI) from individual MRI scans. Another application of such methods is to predict clinical scores from individual scans. Using relevance vector regression (RVR), we predicted individuals' performances on established tests from their MRI T1 weighted image in two independent data sets. From Mayo Clinic, 73 probable AD patients and 91 cognitively normal (CN) controls completed the Mini-Mental State Examination (MMSE), Dementia Rating Scale (DRS), and Auditory Verbal Learning Test (AVLT) within 3months of their scan. Baseline MRI's from the Alzheimer's disease Neuroimaging Initiative (ADNI) comprised the other data set; 113 AD, 351 MCI, and 122 CN subjects completed the MMSE and Alzheimer's Disease Assessment Scale-Cognitive subtest (ADAS-cog) and 39 AD, 92 MCI, and 32 CN ADNI subjects completed MMSE, ADAS-cog, and AVLT. Predicted and actual clinical scores were highly correlated for the MMSE, DRS, and ADAS-cog tests (P<0.0001). Training with one data set and testing with another demonstrated stability between data sets. DRS, MMSE, and ADAS-Cog correlated better than AVLT with whole brain grey matter changes associated with AD. This result underscores their utility for screening and tracking disease. RVR offers a novel way to measure interactions between structural changes and neuropsychological tests beyond that of univariate methods. In clinical practice, we envision using RVR to aid in diagnosis and predict clinical outcome.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of environmental and pollution continuous information (pollution of soil by radionuclides), mapping with auxiliary information (climatic data from Aral Sea region). The promising developments, such as automatic emergency hot spot detection and monitoring network optimization are discussed as well.
Resumo:
We consider an agent who has to repeatedly make choices in an uncertainand changing environment, who has full information of the past, who discountsfuture payoffs, but who has no prior. We provide a learning algorithm thatperforms almost as well as the best of a given finite number of experts orbenchmark strategies and does so at any point in time, provided the agentis sufficiently patient. The key is to find the appropriate degree of forgettingdistant past. Standard learning algorithms that treat recent and distant pastequally do not have the sequential epsilon optimality property.
Resumo:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. The paper considers a data driven approach in modelling uncertainty in spatial predictions. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
Resumo:
Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.
Resumo:
Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this allows to automatically yet accurately distinguish objects at the surface of our planet. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, hindering therefore the transfer of knowledge from one image to another. The goal of this Thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches issued from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest to collect the labels only in correspondence of the most useful pixels. This iterative routine is based on a constant evaluation of the pertinence to the new image of the initial training data actually belonging to a different image. Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels in a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are highly increased. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. Also, we gauge the portability of classification models through the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows a synthesis of new pixels with more similar characteristics ultimately facilitating the land-cover mapping across images.
Resumo:
1. Identifying the boundary of a species' niche from observational and environmental data is a common problem in ecology and conservation biology and a variety of techniques have been developed or applied to model niches and predict distributions. Here, we examine the performance of some pattern-recognition methods as ecological niche models (ENMs). Particularly, one-class pattern recognition is a flexible and seldom used methodology for modelling ecological niches and distributions from presence-only data. The development of one-class methods that perform comparably to two-class methods (for presence/absence data) would remove modelling decisions about sampling pseudo-absences or background data points when absence points are unavailable. 2. We studied nine methods for one-class classification and seven methods for two-class classification (five common to both), all primarily used in pattern recognition and therefore not common in species distribution and ecological niche modelling, across a set of 106 mountain plant species for which presence-absence data was available. We assessed accuracy using standard metrics and compared trade-offs in omission and commission errors between classification groups as well as effects of prevalence and spatial autocorrelation on accuracy. 3. One-class models fit to presence-only data were comparable to two-class models fit to presence-absence data when performance was evaluated with a measure weighting omission and commission errors equally. One-class models were superior for reducing omission errors (i.e. yielding higher sensitivity), and two-classes models were superior for reducing commission errors (i.e. yielding higher specificity). For these methods, spatial autocorrelation was only influential when prevalence was low. 4. These results differ from previous efforts to evaluate alternative modelling approaches to build ENM and are particularly noteworthy because data are from exhaustively sampled populations minimizing false absence records. Accurate, transferable models of species' ecological niches and distributions are needed to advance ecological research and are crucial for effective environmental planning and conservation; the pattern-recognition approaches studied here show good potential for future modelling studies. This study also provides an introduction to promising methods for ecological modelling inherited from the pattern-recognition discipline.
Resumo:
Soil information is needed for managing the agricultural environment. The aim of this study was to apply artificial neural networks (ANNs) for the prediction of soil classes using orbital remote sensing products, terrain attributes derived from a digital elevation model and local geology information as data sources. This approach to digital soil mapping was evaluated in an area with a high degree of lithologic diversity in the Serra do Mar. The neural network simulator used in this study was JavaNNS and the backpropagation learning algorithm. For soil class prediction, different combinations of the selected discriminant variables were tested: elevation, declivity, aspect, curvature, curvature plan, curvature profile, topographic index, solar radiation, LS topographic factor, local geology information, and clay mineral indices, iron oxides and the normalized difference vegetation index (NDVI) derived from an image of a Landsat-7 Enhanced Thematic Mapper Plus (ETM+) sensor. With the tested sets, best results were obtained when all discriminant variables were associated with geological information (overall accuracy 93.2 - 95.6 %, Kappa index 0.924 - 0.951, for set 13). Excluding the variable profile curvature (set 12), overall accuracy ranged from 93.9 to 95.4 % and the Kappa index from 0.932 to 0.948. The maps based on the neural network classifier were consistent and similar to conventional soil maps drawn for the study area, although with more spatial details. The results show the potential of ANNs for soil class prediction in mountainous areas with lithological diversity.
Resumo:
Résumé Suite aux recentes avancées technologiques, les archives d'images digitales ont connu une croissance qualitative et quantitative sans précédent. Malgré les énormes possibilités qu'elles offrent, ces avancées posent de nouvelles questions quant au traitement des masses de données saisies. Cette question est à la base de cette Thèse: les problèmes de traitement d'information digitale à très haute résolution spatiale et/ou spectrale y sont considérés en recourant à des approches d'apprentissage statistique, les méthodes à noyau. Cette Thèse étudie des problèmes de classification d'images, c'est à dire de catégorisation de pixels en un nombre réduit de classes refletant les propriétés spectrales et contextuelles des objets qu'elles représentent. L'accent est mis sur l'efficience des algorithmes, ainsi que sur leur simplicité, de manière à augmenter leur potentiel d'implementation pour les utilisateurs. De plus, le défi de cette Thèse est de rester proche des problèmes concrets des utilisateurs d'images satellite sans pour autant perdre de vue l'intéret des méthodes proposées pour le milieu du machine learning dont elles sont issues. En ce sens, ce travail joue la carte de la transdisciplinarité en maintenant un lien fort entre les deux sciences dans tous les développements proposés. Quatre modèles sont proposés: le premier répond au problème de la haute dimensionalité et de la redondance des données par un modèle optimisant les performances en classification en s'adaptant aux particularités de l'image. Ceci est rendu possible par un système de ranking des variables (les bandes) qui est optimisé en même temps que le modèle de base: ce faisant, seules les variables importantes pour résoudre le problème sont utilisées par le classifieur. Le manque d'information étiquétée et l'incertitude quant à sa pertinence pour le problème sont à la source des deux modèles suivants, basés respectivement sur l'apprentissage actif et les méthodes semi-supervisées: le premier permet d'améliorer la qualité d'un ensemble d'entraînement par interaction directe entre l'utilisateur et la machine, alors que le deuxième utilise les pixels non étiquetés pour améliorer la description des données disponibles et la robustesse du modèle. Enfin, le dernier modèle proposé considère la question plus théorique de la structure entre les outputs: l'intègration de cette source d'information, jusqu'à présent jamais considérée en télédétection, ouvre des nouveaux défis de recherche. Advanced kernel methods for remote sensing image classification Devis Tuia Institut de Géomatique et d'Analyse du Risque September 2009 Abstract The technical developments in recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to the users. However, even if these advances open more and more possibilities in the use of digital imagery, they also rise several problems of storage and treatment. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The accent is put on algorithmic efficiency and the simplicity of the approaches proposed, to avoid too complex models that would not be used by users. The major challenge of the Thesis is to remain close to concrete remote sensing problems, without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model learning the relevant image features has been proposed to solve the problem of high dimensionality and collinearity of the image features. This model provides automatically an accurate classifier and a ranking of the relevance of the single features. The scarcity and unreliability of labeled. information were the common root of the second and third models proposed: when confronted to such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase robustness and quality of the description of data. Both solutions have been explored resulting into two methodological contributions, based respectively on active learning and semisupervised learning. Finally, the more theoretical issue of structured outputs has been considered in the last model, which, by integrating outputs similarity into a model, opens new challenges and opportunities for remote sensing image processing.
Resumo:
Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and it is the focus of attention for research in diverse fields such as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to increase the natural aspect of interaction and to improve the performance of interactive computer systems. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made by using the full appearance information of the face and whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State of the art machine learning methods are applied to a) derive a facial trait judgment model from training data and b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that a) prediction of perception of facial traits is learnable by both holistic and structural approaches; b) the most reliable prediction of facial trait judgments is obtained by certain type of holistic descriptions of the face appearance; and c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.
Resumo:
The paper presents a novel method for monitoring network optimisation, based on a recent machine learning technique known as support vector machine. It is problem-oriented in the sense that it directly answers the question of whether the advised spatial location is important for the classification model. The method can be used to increase the accuracy of classification models by taking a small number of additional measurements. Traditionally, network optimisation is performed by means of the analysis of the kriging variances. The comparison of the method with the traditional approach is presented on a real case study with climate data.