958 results for Probabilistic generalization


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Summary: This thesis is devoted to the analysis, modelling and visualisation of spatially referenced environmental data using machine learning algorithms. Machine learning can be broadly regarded as a subfield of artificial intelligence concerned in particular with the development of techniques and algorithms that allow a machine to learn from data. In this thesis, machine learning algorithms are adapted for application to environmental data and spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modelling tools. They can solve classification, regression and probability density modelling problems in high-dimensional spaces composed of spatially referenced informative variables ("geo-features") in addition to geographical coordinates. Moreover, they are ideal for implementation as decision-support tools for environmental questions ranging from pattern recognition to modelling and prediction, including automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographical coordinates, but they are indispensable for high-dimensional data that include geo-features. The most important and most popular machine learning algorithms are presented theoretically and implemented as software for the environmental sciences.
The main algorithms described are the multilayer perceptron (MLP), the best-known algorithm in artificial intelligence; general regression neural networks (GRNN); probabilistic neural networks (PNN); self-organising maps (SOM); Gaussian mixture models (GMM); radial basis function networks (RBF); and mixture density networks (MDN). This range of algorithms covers varied tasks such as classification, regression and probability density estimation. Exploratory data analysis (EDA) is the first step of any data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, using experimental variography, and according to the principles of machine learning. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations and makes it possible to detect the presence of spatial patterns that can be described by a statistic. The machine learning approach to ESDA is presented through the application of the k-nearest neighbours method, which is very simple and has excellent interpretation and visualisation properties. An important part of the thesis deals with topical subjects such as the automatic mapping of spatial data. The general regression neural network is proposed to solve this task efficiently.
The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, on which the GRNN significantly outperforms all other methods, particularly in emergency situations. The thesis consists of four chapters: theory, applications, software tools and guided examples. An important part of the work is a collection of software: Machine Learning Office. This collection has been developed over the last 15 years and has been used for teaching numerous courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a wide spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support, and natural hazard assessment (landslides, avalanches). Complementary tools for exploratory data analysis and visualisation have also been developed, with care taken to create a user-friendly, easy-to-use interface.
Machine Learning for geospatial data: algorithms, software tools and case studies
Abstract: The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence. It is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning?
In a few words, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining, including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks, and mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, such as experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps to understand the presence of spatial patterns, at least those described by two-point statistics.
A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a current hot topic, namely the automatic mapping of geospatial data. General regression neural networks (GRNN) are proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed over the last 15 years and were used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydro-geological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
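
To make the GRNN idea concrete, here is a minimal sketch of a general regression neural network in its commonly described form: a Gaussian-kernel weighted average of the training values (the Nadaraya-Watson estimator). It is not the Machine Learning Office implementation. The function name `grnn_predict`, the single isotropic bandwidth `sigma`, and the sample data are assumptions made for illustration.

```python
import math


def grnn_predict(train_xy, train_z, query, sigma):
    """Predict the value at `query` as a Gaussian-kernel weighted mean.

    train_xy: list of coordinate tuples (hypothetical sample locations)
    train_z:  list of measured values at those locations
    sigma:    kernel bandwidth, the only free parameter in this sketch
    """
    weights = []
    for point in train_xy:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed to zero: fall back to the plain mean.
        return sum(train_z) / len(train_z)
    return sum(w * z for w, z in zip(weights, train_z)) / total


# Hypothetical data: two measurements along a line.
locations = [(0.0, 0.0), (1.0, 0.0)]
values = [1.0, 3.0]
midpoint_estimate = grnn_predict(locations, values, (0.5, 0.0), sigma=0.5)
```

In this form the bandwidth controls the smoothness of the map: a small `sigma` reproduces the nearest measurement, a large one tends toward the global mean. In practice it would typically be tuned, for example by cross-validation.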

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This work focuses on predicting the two main nitrogenous variables that describe water quality in the effluent of a wastewater treatment plant. We have developed two kinds of neural network architectures: one with a single output and, on the other hand, one with the usual five effluent variables that define water quality: suspended solids, biochemical organic matter, chemical organic matter, total nitrogen and total Kjeldahl nitrogen. Two learning techniques, based on a classical adaptive gradient and on a Kalman filter, have been implemented. To try to improve generalization and performance, we selected variables by means of genetic algorithms and fuzzy systems. The training, testing and validation sets show that the final networks learn the available simulated data well enough, especially for total nitrogen.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A comment on the article “Local sensitivity analysis for compositional data with application to soil texture in hydrologic modelling” written by L. Loosvelt and co-authors. The comment centers on three specific points. The first is that the authors avoid the use of ilr-coordinates. The second concerns a generalization of sensitivity analysis to the case where the input parameters are compositional. The third tries to show that the role of the Dirichlet distribution in the sensitivity analysis is irrelevant.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

MOTIVATION: The analysis of molecular coevolution provides information on the potential functional and structural implication of positions along DNA sequences, and several methods are available to identify coevolving positions using probabilistic or combinatorial approaches. The specific nucleotide or amino acid profile associated with the coevolution process is, however, not estimated, but only known profiles, such as the Watson-Crick constraint, are usually considered a priori in current measures of coevolution. RESULTS: Here, we propose a new probabilistic model, Coev, to identify coevolving positions and their associated profile in DNA sequences while incorporating the underlying phylogenetic relationships. The process of coevolution is modeled by a 16 × 16 instantaneous rate matrix that includes rates of transition as well as a profile of coevolution. We used simulated, empirical and illustrative data to evaluate our model and to compare it with a model of 'independent' evolution using the Akaike Information Criterion. We showed that the Coev model is able to discriminate between coevolving and non-coevolving positions and provides better specificity and sensitivity than other available approaches. We further demonstrate that the identification of the profile of coevolution can shed new light on the process of dependent substitution during lineage evolution.
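
The comparison by Akaike Information Criterion mentioned in this abstract can be illustrated with a short sketch. AIC is defined as 2k - 2 ln(L), where k is the number of free parameters and L the maximized likelihood; the model with the lower AIC is preferred. The log-likelihoods and parameter counts below are invented for illustration and are not taken from the Coev study, and the names `aic` and `prefer_model` are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def prefer_model(models):
    """Return the name of the model with the lowest AIC.

    models maps a name to a (log_likelihood, n_params) pair.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical fits for one pair of alignment positions.
candidates = {
    "dependent": (-1200.0, 6),
    "independent": (-1215.0, 4),
}
best = prefer_model(candidates)
```

Here the dependent model is preferred because its gain in log-likelihood outweighs the penalty for its two extra parameters.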

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Recent single-cell studies in monkeys (Romo et al., 2004) show that the activity of neurons in the ventral premotor cortex covaries with the animal's decisions in a perceptual comparison task regarding the frequency of vibrotactile events. The firing rate response of these neurons was dependent only on the frequency differences between the two applied vibrations, the sign of that difference being the determining factor for correct task performance. We present a biophysically realistic neurodynamical model that can account for the most relevant characteristics of this decision-making-related neural activity. One of the nontrivial predictions of this model is that Weber's law will underlie the perceptual discrimination behavior. We confirmed this prediction in behavioral tests of vibrotactile discrimination in humans and propose a computational explanation of perceptual discrimination that accounts naturally for the emergence of Weber's law. We conclude that the neurodynamical mechanisms and computational principles underlying the decision-making processes in this perceptual discrimination task are consistent with a fluctuation-driven scenario in a multistable regime.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

OBJECTIVES: Clinical staging is widespread in medicine - it informs prognosis, clinical course, and treatment, and assists individualized care. Staging places an individual on a probabilistic continuum of increasing potential disease severity, ranging from clinically at-risk or latency stage through first threshold episode of illness or recurrence, and, finally, to late or end-stage disease. The aim of the present paper was to examine and update the evidence regarding staging in bipolar disorder, and how this might inform targeted and individualized intervention approaches. METHODS: We provide a narrative review of the relevant information. RESULTS: In bipolar disorder, the validity of staging is informed by a range of findings that accompany illness progression, including neuroimaging data suggesting incremental volume loss, cognitive changes, and a declining likelihood of response to pharmacological and psychosocial treatments. Staging informs the adoption of a number of approaches, including the active promotion of both indicated prevention for at-risk individuals and early intervention strategies for newly diagnosed individuals, and the tailored implementation of treatments according to the stage of illness. CONCLUSIONS: The nature of bipolar disorder implies the presence of an active process of neuroprogression that is considered to be at least partly mediated by inflammation, oxidative stress, apoptosis, and changes in neurogenesis. It further supports the concept of neuroprotection, in that a diversity of agents have putative effects against these molecular targets. Clinically, staging suggests that the at-risk state or first episode is a period that requires particularly active and broad-based treatment, consistent with the hope that the temporal trajectory of the illness can be altered. Prompt treatment may be potentially neuroprotective and attenuate the neurostructural and neurocognitive changes that emerge with chronicity. 
Staging highlights the need for interventions at a service delivery level and implementing treatments at the earliest stage of illness possible.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We present a seabed profile estimation and following method for close proximity inspection of 3D underwater structures using autonomous underwater vehicles (AUVs). The presented method is used to determine a path allowing the AUV to pass its sensors over all points of the target structure, which is known as coverage path planning. Our profile following method goes beyond traditional seabed following at a safe altitude and exploits hovering capabilities of recent AUV developments. A range sonar is used to incrementally construct a local probabilistic map representation of the environment and estimates of the local profile are obtained via linear regression. Two behavior-based controllers use these estimates to perform horizontal and vertical profile following. We build upon these tools to address coverage path planning for 3D underwater structures using a (potentially inaccurate) prior map and following cross-section profiles of the target structure. The feasibility of the proposed method is demonstrated using the GIRONA 500 AUV both in simulation using synthetic and real-world bathymetric data and in pool trials
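
The step in which local profile estimates are obtained via linear regression can be sketched as an ordinary least-squares line fit. In this sketch the fit runs directly on a handful of hypothetical sonar hits given as (along-track distance, depth) pairs, whereas the abstract describes estimates derived from a local probabilistic map. The function name `fit_profile` and the sample points are assumptions for illustration.

```python
def fit_profile(points):
    """Least-squares line z = slope * x + intercept through (x, z) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("all points share the same x; slope is undefined")
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    slope = sxz / sxx
    return slope, mean_z - slope * mean_x


# Hypothetical sonar hits on a uniformly sloping seabed.
hits = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
slope, intercept = fit_profile(hits)
```

A profile-following controller could then use the slope to set the vehicle's pitch or heading and the intercept to hold a stand-off distance. That mapping is a design choice not specified in the abstract.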

Relevance:

10.00% 10.00%

Publisher:

Abstract:

To compare the cost and effectiveness of the levonorgestrel-releasing intrauterine system (LNG-IUS) versus combined oral contraception (COC) and progestogens (PROG) in first-line treatment of dysfunctional uterine bleeding (DUB) in Spain. STUDY DESIGN: A cost-effectiveness and cost-utility analysis of LNG-IUS, COC and PROG was carried out using a Markov model based on clinical data from the literature and expert opinion. The population studied were women with a previous diagnosis of idiopathic heavy menstrual bleeding. The analysis was performed from the National Health System perspective, discounting both costs and future effects at 3%. In addition, a sensitivity analysis (univariate and probabilistic) was conducted. RESULTS: The results show that the greater efficacy of LNG-IUS translates into a gain of 1.92 and 3.89 symptom-free months (SFM) after six months of treatment versus COC and PROG, respectively (which represents an increase of 33% and 60% of symptom-free time). Regarding costs, LNG-IUS produces savings of 174.2-309.95 and 230.54-577.61 versus COC and PROG, respectively, after 6 months-5 years. Apart from cost savings and gains in SFM, quality-adjusted life months (QALM) are also favourable to LNG-IUS in all scenarios, with a range of gains between 1 and 2 QALM compared to COC and PROG. CONCLUSIONS: The results indicate that first-line use of the LNG-IUS is the dominant therapeutic option (less costly and more effective) in comparison with first-line use of COC or PROG for the treatment of DUB in Spain. LNG-IUS as first line is also the option that provides greatest health-related quality of life to patients.
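
Discounting costs and future effects at 3% means dividing each future amount by a growth factor before summing. The sketch below assumes yearly amounts and annual compounding, which the abstract does not state, and the amounts are invented; `present_value` is a hypothetical helper name.

```python
def present_value(amounts, annual_rate=0.03):
    """Discount yearly amounts (year 0 first) to their present value."""
    return sum(
        amount / (1.0 + annual_rate) ** year
        for year, amount in enumerate(amounts)
    )


# Hypothetical cost stream: 100 now and 100 one year from now.
total_cost = present_value([100.0, 100.0])
```

The same function applies to effects such as symptom-free months, which is how costs and outcomes can be discounted at a common rate.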

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We study steady states in d-dimensional lattice systems that evolve in time by a probabilistic majority rule, which corresponds to the zero-temperature limit of a system with conflicting dynamics. The rule satisfies detailed balance for d=1 but not for d>1. We find numerically nonequilibrium critical points of the Ising class for d=2 and 3.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The paper presents some contemporary approaches to spatial environmental data analysis. The main topics concern the decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial predictions, probabilistic and risk mapping, and the development and application of conditional stochastic simulation models. The innovative part of the paper presents an integrated/hybrid model: machine learning (ML) residuals sequential simulations (MLRSS). The models are based on multilayer perceptron and support vector regression ML algorithms used for modeling long-range spatial trends, combined with sequential simulations of the residuals. ML algorithms deliver non-linear solutions for spatially non-stationary problems, which are difficult for the geostatistical approach. Geostatistical tools (variography) are used to characterize the performance of ML algorithms by analyzing the quality and quantity of the spatially structured information extracted from data with ML algorithms. Sequential simulations provide an efficient assessment of uncertainty and spatial variability. A case study of the Chernobyl fallout illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of ML data-driven and geostatistical model-based approaches, can be used efficiently in the decision-making process. (C) 2003 Elsevier Ltd. All rights reserved.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents a probabilistic approach to modeling the problem of power supply voltage fluctuations. Error probability calculations are shown for some 90-nm technology digital circuits. The analysis considered here gives the timing violation error probability as a new design quality factor, in contrast to conventional techniques that assume the circuit is perfect. The evaluation of the error bound can be useful for new design paradigms in which retry and self-recovering techniques are applied to the design of high-performance processors. The method described here makes it possible to evaluate the performance of these techniques by calculating the expected error probability in terms of power supply distribution quality.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Abstract: One of the most important issues in molecular biology is to understand the regulatory mechanisms that control gene expression. Gene expression is often regulated by proteins called transcription factors, which bind to short (5 to 20 base pairs), degenerate segments of DNA. Experimental efforts towards understanding the sequence specificity of transcription factors are laborious and expensive, but can be substantially accelerated with the use of computational predictions. This thesis describes the use of algorithms and resources for transcription factor binding site analysis in addressing quantitative modelling, where probabilistic models are built to represent the binding properties of a transcription factor and can be used to find new functional binding sites in genomes. Initially, an open-access database (HTPSELEX) was created, holding high-quality binding sequences for two eukaryotic families of transcription factors, namely CTF/NF1 and LEF1/TCF. The binding sequences were elucidated using a recently described experimental procedure called HTP-SELEX, which allows the generation of a large number (> 1000) of binding sites using mass sequencing technology. For each HTP-SELEX experiment we also provide accurate primary experimental information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, and assembled clone sequences of binding sequences. The database also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The database is available at http://wwwisrec.isb-sib.ch/htpselex/ and ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex. The Expectation-Maximisation (EM) algorithm is one of the most frequently used methods for estimating probabilistic models that represent the sequence specificity of transcription factors.
We present computer simulations to estimate the precision of EM-estimated models as a function of data set parameters (such as the length of the initial sequences, the number of initial sequences, and the percentage of non-binding sequences). We observed a remarkable robustness of the EM algorithm with regard to the length of the training sequences and the degree of contamination. The HTPSELEX database and the benchmarked results of the EM algorithm formed part of the foundation for the subsequent project, in which a statistical framework called a hidden Markov model was developed to represent the sequence specificity of the transcription factors CTF/NF1 and LEF1/TCF using the HTP-SELEX experiment data. The hidden Markov model framework is capable of both predicting and classifying CTF/NF1 and LEF1/TCF binding sites. A covariance analysis of the binding sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism. We next tested the LEF1/TCF model by computing binding scores for a set of LEF1/TCF binding sequences for which relative affinities were determined experimentally using non-linear regression. The predicted and experimentally determined binding affinities were in good correlation.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The problem of robust beamformer design for mobile communications applications in the presence of moving co-channel sources is addressed. A generalization of the optimum beamformer based on a statistical model accounting for source movement is proposed. The new method is easily implemented and is shown to offer dramatic improvements over conventional optimum beamforming for moving sources under a variety of operating conditions.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This Master's thesis examines probabilistic safety analysis of fire risks at nuclear power plants. The aim was to develop a fire analysis method for the Olkiluoto 1 and 2 plant units. The thesis presents the main features of fire analysis, two different methods for estimating fire frequencies, and methods for assessing fire spread. Of the fire frequency estimation methods, the focus is on Berry's method and the NUREG/CR-6850 fire frequency calculation method. For the assessment of fire spread, the basics of three different fluid-dynamics calculation tools are presented, along with the use of the Probabilistic Fire Simulator (PFS) program, which estimates fire spread probabilities. During the work, fire frequencies were calculated for different types of rooms with both estimation methods. The fire frequencies from Berry's method were mostly lower than those calculated with the NUREG/CR-6850 method. The fire spread study examined a fire in the relay room of a nuclear power plant. The spread probabilities calculated with PFS were compared with the qualitative coverage factors used in TVO's fire analysis. The probability of fire spreading between different subsystems was found to depend strongly on the damage temperatures assumed in the analysis. Using the methods studied, a description of a fire analysis method was developed in the thesis. In this method description, the fire risks of rooms are first mapped using Berry's method. This yields an estimated fire frequency for every room in the plant, as well as the core damage frequency of the fire initiating event classes. Next, a screening procedure is carried out, in which a more detailed fire frequency calculation is performed for rooms that meet the selected criteria. The detailed calculation is based, in accordance with the NUREG/CR-6850 method, on the realistic ignition sources in the rooms. For the most critical rooms, numerical simulation is intended to be used for assessing fire spread.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The main goal of this thesis was to implement a localization system that uses sonars and WLAN intensity maps to localize an indoor mobile robot. A probabilistic localization method, Monte Carlo Localization, is used, and the theory behind probabilistic localization is explained. Two main problems in mobile robotics, path tracking and global localization, are addressed. The implemented system achieves acceptable performance in path tracking. Global localization using WLAN received signal strength information is shown to provide some good results, which can be used to localize the robot accurately, but also some bad results, which are of no use for localizing the robot to the correct place. The main goal of resolving ambiguity in an office-like environment is achieved in many test cases.
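
Monte Carlo Localization represents the robot's belief about its pose as a set of particles and repeats three steps: move each particle according to the motion command, weight it by how well it explains the sensor reading, and resample in proportion to the weights. The sketch below is a deliberately simplified one-dimensional version and not the system described in the thesis. It assumes a corridor with a single wall at a known position, a range sensor with Gaussian noise, and the hypothetical function name `mcl_step`; the noise parameters are chosen for illustration.

```python
import math
import random


def mcl_step(particles, control, measurement, wall=10.0,
             motion_sigma=0.1, sensor_sigma=0.2, rng=random):
    """One Monte Carlo Localization update in a 1D corridor.

    particles:   list of candidate robot positions
    control:     commanded displacement since the last step
    measurement: measured distance to the wall located at `wall`
    """
    # 1. Motion update: apply the command plus random motion noise.
    moved = [p + control + rng.gauss(0.0, motion_sigma) for p in particles]
    # 2. Measurement update: Gaussian likelihood of the range reading.
    weights = [
        math.exp(-((measurement - (wall - p)) ** 2) / (2.0 * sensor_sigma ** 2))
        for p in moved
    ]
    if sum(weights) == 0.0:
        # No particle explains the reading: keep them all with equal weight.
        weights = [1.0] * len(moved)
    # 3. Resampling: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


# Global localization: start from particles spread over the whole corridor.
rng = random.Random(0)
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
true_x = 2.0
for _ in range(10):
    true_x += 0.5
    reading = 10.0 - true_x  # noise-free reading, for the demonstration only
    particles = mcl_step(particles, 0.5, reading, rng=rng)
estimate = sum(particles) / len(particles)
```

Starting from a uniform particle set corresponds to the global localization problem, while starting from particles clustered around a known pose corresponds to position tracking. With an ambiguous sensor, such as signal strength in a symmetric office, several particle clusters can survive for many steps, which is the ambiguity the abstract refers to.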