67 resultados para Distributed algorithms
em Université de Lausanne, Switzerland
Resumo:
The algorithmic approach to data modelling has developed rapidly these last years, in particular methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise in situations at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge on the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
Resumo:
Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing the good architecture are provided for new and/or unexperienced user.
Resumo:
1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications. To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
Resumo:
The paper presents an approach for mapping of precipitation data. The main goal is to perform spatial predictions and simulations of precipitation fields using geostatistical methods (ordinary kriging, kriging with external drift) as well as machine learning algorithms (neural networks). More practically, the objective is to reproduce simultaneously both the spatial patterns and the extreme values. This objective is best reached by models integrating geostatistics and machine learning algorithms. To demonstrate how such models work, two case studies have been considered: first, a 2-day accumulation of heavy precipitation and second, a 6-day accumulation of extreme orographic precipitation. The first example is used to compare the performance of two optimization algorithms (conjugate gradients and Levenberg-Marquardt) of a neural network for the reproduction of extreme values. Hybrid models, which combine geostatistical and machine learning algorithms, are also treated in this context. The second dataset is used to analyze the contribution of radar Doppler imagery when used as external drift or as input in the models (kriging with external drift and neural networks). Model assessment is carried out by comparing independent validation errors as well as analyzing data patterns.
Resumo:
Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (like predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. A wealth of useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
To make a comprehensive evaluation of organ-specific out-of-field doses using Monte Carlo (MC) simulations for different breast cancer irradiation techniques and to compare results with a commercial treatment planning system (TPS). Three breast radiotherapy techniques using 6MV tangential photon beams were compared: (a) 2DRT (open rectangular fields), (b) 3DCRT (conformal wedged fields), and (c) hybrid IMRT (open conformal+modulated fields). Over 35 organs were contoured in a whole-body CT scan and organ-specific dose distributions were determined with MC and the TPS. Large differences in out-of-field doses were observed between MC and TPS calculations, even for organs close to the target volume such as the heart, the lungs and the contralateral breast (up to 70% difference). MC simulations showed that a large fraction of the out-of-field dose comes from the out-of-field head scatter fluence (>40%) which is not adequately modeled by the TPS. Based on MC simulations, the 3DCRT technique using external wedges yielded significantly higher doses (up to a factor 4-5 in the pelvis) than the 2DRT and the hybrid IMRT techniques which yielded similar out-of-field doses. In sharp contrast to popular belief, the IMRT technique investigated here does not increase the out-of-field dose compared to conventional techniques and may offer the most optimal plan. The 3DCRT technique with external wedges yields the largest out-of-field doses. For accurate out-of-field dose assessment, a commercial TPS should not be used, even for organs near the target volume (contralateral breast, lungs, heart).
Resumo:
Résumé Introduction : La chirurgie de la maladie de Hirschsprung est fréquemment compliquée d'une atteinte post-opératoire de la motilité intestinale. Des anomalies du système nerveux entérique (SNE) telles que la dysplasie neuronale intestinale de type B, l'hypoganglionose ou l'aganglionose, présents dans le segment abaissé, peuvent être la cause de certaines de ces complications mais aucune information n'est disponible quant au rôle des cellules interstitielles de Cajal (CIC) sur la motilité intestinale dans la phase post-opératoire. Ces cellules sont considérées avoir un rôle de pacemaker dans le tractus gastro-intestinal. L'objectif de cette étude était de décrire la distribution des CIC dans le segment proximal du côlon réséqué lors de cures chirurgicales de maladie de Hirschsprung et de confronter ces observations à l'évolution clinique post-opératoire. Matériel et Méthodes : L'incidence des complications post-opératoires a été déterminée par une revue rétrospective des dossiers de 48 patients opérés pour maladie de Hirschspung entre 1977 et 1999 et par l'étude histologique et immuno-histochimique des pièces réséquées chez ces patients. Nous avons comparé la distribution des CIC dans le segment proximal du côlon avec celle du côlon sain de 16 enfants contrôles par microscopie optique. L'immunohistochimie au c-Kit a été utilisée pour marquer spécifiquement les CIC sur échantillons paraffinés. Ces résultats ont ensuite été corrélés avec l'étude du SNE de ces mêmes segments, déterminée par immunohistochimie au CD56 et au protein gene product 9.5. Résultats Les complications post-opératoires suivantes furent identifiées : constipation 46%, constipation avec incontinence 15%, entérocolite 8%, décès 4% (probablement sur entérocolite). La distribution des CIC dans les segments proximaux réséqués chez les enfants avec maladie de Hirschsprung était identique à celle observée dans les segments de côlon sain, et ce indépendamment de la distribution normale ou anormale du SNE. Chez les enfants opérés pour maladie de Hirschsprung les segments réséqués présentaient les anomalies d'innervation suivantes : aganglionose 10.4%, hypoganglionose 12.5%, dysplasie neuronale intestinale de type B 6.3%, autres dysganglionoses 14.6%. Aucune relation entre ces anomalies d'innervation et les complications post-opératoires n'a été mise en évidence. Conclusion : La distribution des CIC est normale chez les patient opérés pour maladie de Hirschsprung, et ne contribue donc pas aux atteintes post-opératoires de la motilité intestinale. Cela signifie aussi que le réseau de CIC se développe noinialement dans le côlon humain, même en présence d'une innervation colique anormale ou absente. Abstract: Surgery for Hirschsprung's disease is often complicated by post-operative bowel motility disorders. The impact of intestinal neural histology on the surgical outcome has been previously studied, but no information is available concerning the influence of the distribution of interstitial cells of Cajal (ICC) on these complications. These cells are considered to be pacemakers in the gastrointestinal tract. The aim of this study was to assess the distribution of ICC in the proximal segment of resected bowel in Hirschsprung's disease and confront these results with the clinical outcome. Using immunohistochemistry for light microscopy, we compared the pattern of distribution of ICC in the proximal segment of resected bowel in Hirschsprung's disease with that in normal colon. We correlated these results with the corresponding neural intestinal histology determined by CD56 and the protein gene product 9.5 immunohistochemistry. The distribution of ICC in the proximal segment of resected bowel is identical to that of normal colon, regardless of normal or abnormal colon innervation. ICC distribution does not seem to contribute to post-operative bowel motility disorders in patients operated for Hirschsprung's disease.