843 results for Artificial intelligence algorithms


Relevance: 100.00%

Abstract:

Study and implementation of recommendation, search, ranking, and learning algorithms.

Relevance: 100.00%

Abstract:

Machine Learning for geospatial data: algorithms, software tools and case studies

The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can solve classification, regression and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical coordinates and additional relevant spatially referenced variables ("geo-features"). They are well suited to serve as predictive engines in decision support systems, for purposes of environmental data mining ranging from pattern recognition to modeling and prediction, including automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces.

The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to their software implementation. The main algorithms and models considered are the multilayer perceptron (MLP, a workhorse of machine learning), general regression neural networks (GRNN), probabilistic neural networks (PNN), self-organising (Kohonen) maps (SOM), Gaussian mixture models (GMM), radial basis function networks (RBF) and mixture density networks (MDN). This set of models covers machine learning tasks such as classification, regression and density estimation.

Exploratory data analysis (EDA) is the initial and very important part of any data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, experimental variography, and from a machine learning perspective. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations and helps detect spatial patterns that can be described by two-point statistics. The machine learning approach to ESDA is presented through the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties.

An important part of the thesis deals with a topical problem: the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model for this task. The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where it significantly outperformed all other approaches, especially under emergency conditions.

The thesis consists of four chapters: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office, developed over the last 15 years and used both in many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and in fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals; classification of soil types and hydrogeological units; decision-oriented mapping with uncertainties; and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools for exploratory data analysis and visualisation were developed as well, with attention to a user-friendly, easy-to-use interface.
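The GRNN central to the automatic-mapping part is, in essence, a Nadaraya-Watson kernel regression estimator. As a rough illustration (not the thesis software; `sigma`, the single smoothing bandwidth, is an assumption one would tune by cross-validation), a minimal NumPy sketch:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """GRNN prediction: kernel-weighted average of the training targets."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Gaussian kernel weights controlled by the smoothing parameter sigma.
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    # Nadaraya-Watson estimate at each query point.
    return (w @ y_train) / w.sum(axis=1)

# Toy usage: interpolate a value at a new coordinate from three samples.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(grnn_predict(X, y, np.array([[0.5, 0.5]]), sigma=0.7))
```

The single bandwidth is what makes the GRNN attractive for automatic mapping: model selection reduces to tuning one parameter.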

Relevance: 100.00%

Abstract:

This thesis develops a model for the topological structure of situations. In this model, the topological structure of space is altered by the presence or absence of boundaries, such as those at the edges of objects. This allows the intuitive meaning of topological concepts such as region connectivity, function continuity, and preservation of topological structure to be modeled using the standard mathematical definitions. The thesis shows that these concepts are important in a wide range of artificial intelligence problems, including low-level vision, high-level vision, natural language semantics, and high-level reasoning.

Relevance: 100.00%

Abstract:

In this paper we present a novel approach to multispectral image contextual classification that combines iterative combinatorial optimization algorithms. The pixel-wise decision rule is defined using a Bayesian approach that combines two MRF models: a Gaussian Markov Random Field (GMRF) for the observations (likelihood) and a Potts model for the a priori knowledge, which regularizes the solution in the presence of noisy data. The classification problem is thus stated in a Maximum a Posteriori (MAP) framework. To approximate the MAP solution we apply several combinatorial optimization methods with multiple simultaneous initializations, making the solution less sensitive to the initial conditions and reducing both computational cost and time in comparison with Simulated Annealing, which is often unfeasible in many real image processing applications. The Markov Random Field model parameters are estimated by a Maximum Pseudo-Likelihood (MPL) approach, avoiding manual adjustment of the regularization parameters. Asymptotic evaluations assess the accuracy of the proposed parameter estimation procedure. To test and evaluate the proposed classification method, we adopt metrics for quantitative performance assessment (Cohen's Kappa coefficient), allowing a robust and accurate statistical analysis. The results clearly show that combining sub-optimal contextual algorithms significantly improves the classification performance, indicating the effectiveness of the proposed methodology.
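As a rough illustration of this kind of MAP labeling (not the authors' implementation: the GMRF likelihood is simplified here to an independent per-pixel Gaussian per class, and only one combinatorial optimizer, Iterated Conditional Modes, is shown), a Potts-regularized sketch:

```python
import numpy as np

def icm_potts(obs, means, variances, beta=1.0, n_iter=5):
    """MAP labeling by ICM: Gaussian data term plus a Potts smoothness prior."""
    k = len(means)
    # Negative log-likelihood of each pixel under each class Gaussian,
    # shape (k, H, W) -- a per-pixel stand-in for the paper's GMRF term.
    nll = np.stack([0.5 * np.log(2 * np.pi * v) + (obs - m) ** 2 / (2 * v)
                    for m, v in zip(means, variances)])
    labels = nll.argmin(axis=0)                 # maximum-likelihood start
    H, W = obs.shape
    for _ in range(n_iter):                     # ICM sweeps
        for i in range(H):
            for j in range(W):
                nbrs = [labels[a, b]
                        for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                        if 0 <= a < H and 0 <= b < W]
                # Local posterior energy: data term + beta per disagreeing neighbor.
                energy = [nll[c, i, j] + beta * sum(n != c for n in nbrs)
                          for c in range(k)]
                labels[i, j] = int(np.argmin(energy))
    return labels

# Toy usage: two classes on a noisy synthetic image.
rng = np.random.default_rng(0)
img = np.zeros((32, 32)); img[:, 16:] = 1.0
noisy = img + rng.normal(0, 0.6, img.shape)
seg = icm_potts(noisy, means=[0.0, 1.0], variances=[0.36, 0.36], beta=1.5)
```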

Relevance: 100.00%

Abstract:

This paper presents vectorized methods for the construction and descent of quadtrees that can be easily adapted to message-passing parallel computing. A time-complexity analysis of the approach is also discussed. The proposed method of tree construction requires a hash table to index the nodes of a linear quadtree in breadth-first order. The hash is performed in two steps: an internal hash to index child nodes and an external hash to index nodes at the same level (depth). The quadtree descent is performed by treating each level as a vector segment of the linear quadtree, so that nodes of the same level can be processed concurrently.
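A minimal sketch of one plausible reading of this two-step scheme (an assumption for illustration; the paper's exact hash functions may differ): nodes live in a hash table keyed by (level, position within level), the "internal hash" places quadrant q of node (level, pos) at 4*pos + q, and the "external hash" exposes each level as one vector segment:

```python
# Linear quadtree as a hash table keyed by (level, position-within-level),
# i.e. nodes indexed in breadth-first order. Values are node payloads.
tree = {(0, 0): "root", (1, 0): "NW", (1, 3): "SE", (2, 12): "SE.NW"}

def children(level, pos):
    # Internal hash: quadrant q of node (level, pos) maps to 4*pos + q.
    return [(level + 1, 4 * pos + q) for q in range(4)]

def descend(tree, max_level):
    # External hash: each level is one vector segment, so all nodes of a
    # level can be processed concurrently (or scattered across processes).
    frontier = [(0, 0)] if (0, 0) in tree else []
    for _ in range(max_level + 1):
        yield frontier
        frontier = [c for node in frontier for c in children(*node)
                    if c in tree]

for level_nodes in descend(tree, 2):
    print(level_nodes)   # [(0, 0)] / [(1, 0), (1, 3)] / [(2, 12)]
```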

Relevance: 100.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance: 100.00%

Abstract:

We have developed an algorithm that uses a Design of Experiments technique to reduce the search space in global optimization problems. Our approach, called the Domain Optimization Algorithm, can efficiently eliminate search-space regions with a low probability of containing the global optimum. It is based on eliminating non-promising search-space regions, which are identified using simple (linear) models fitted to the data. We then run a global optimization algorithm with its population initialized inside the promising region. The proposed approach, with this heuristic criterion for population initialization, has shown promising results on hard benchmark functions.
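A minimal sketch of the idea as described (an interpretation, not the authors' exact procedure): sample the current box, fit a linear model, keep the samples the model rates most promising, and use their bounding box to initialize a global optimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

def promising_region(f, lo, hi, n_samples=200, keep=0.25):
    """Shrink the search box to the region a linear model rates promising."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    X = rng.uniform(lo, hi, size=(n_samples, lo.size))
    y = np.array([f(x) for x in X])
    # Fit a least-squares plane y ~ a.x + b to the sampled points.
    A = np.hstack([X, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Keep the fraction of samples predicted to be lowest (minimization).
    best = X[np.argsort(A @ coef)[: int(keep * n_samples)]]
    return best.min(axis=0), best.max(axis=0)

# A global optimizer (e.g. an EA) would then draw its initial population
# uniformly from the reduced box returned here.
lo, hi = promising_region(lambda x: (x**2).sum(), [-5, -5], [5, 5])
```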

Relevance: 100.00%

Abstract:

The design of a network is a solution to several engineering and science problems. Several network design problems are known to be NP-hard, and population-based metaheuristics like evolutionary algorithms (EAs) have been widely investigated for such problems. These optimization methods simultaneously generate a large number of potential solutions to explore the search space in breadth and, consequently, to avoid local optima. Obtaining a potential solution usually involves the construction and maintenance of several spanning trees or, more generally, spanning forests. To explore the search space efficiently, special data structures have been developed to provide operations that manipulate a set of spanning trees (the population). For a tree with n nodes, the most efficient data structures available in the literature require O(n) time to generate a new spanning tree that modifies an existing one and to store the new solution. We propose a new data structure, called the node-depth-degree representation (NDDR), and demonstrate that with this encoding, generating a new spanning forest requires O(√n) average time. Experiments with an EA based on the NDDR applied to large-scale instances of the degree-constrained minimum spanning tree problem show that the implementation adds only small constants and lower-order terms to the theoretical bound.
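A minimal sketch of the node-depth idea underlying this family of encodings (the degree bookkeeping that yields the O(√n) average bound is omitted here): a tree stored in preorder as (node, depth) pairs makes every subtree a contiguous slice, so pruning and reattaching a subtree is a splice rather than a rebuild:

```python
# Forest encoded as a preorder list of (node, depth) pairs.
tree = [("a", 0), ("b", 1), ("c", 2), ("d", 1), ("e", 2), ("f", 2)]

def subtree_slice(tree, root):
    """Return the contiguous slice holding `root` and its descendants."""
    start = next(i for i, (n, _) in enumerate(tree) if n == root)
    depth = tree[start][1]
    end = start + 1
    while end < len(tree) and tree[end][1] > depth:
        end += 1
    return tree[start:end]

# Moving this slice under a new parent (with depths shifted) produces a
# modified spanning tree without rebuilding the whole structure.
print(subtree_slice(tree, "d"))   # [('d', 1), ('e', 2), ('f', 2)]
```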

Relevance: 100.00%

Abstract:

There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Such methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we augment two FCM-based algorithms used to cluster distributed data by deriving constructive ways of determining their essential parameters (including the number of clusters) and by forming a set of systematically structured guidelines, such as the selection of the specific algorithm depending on the nature of the data environment and the assumptions made about the number of clusters. A thorough complexity analysis, covering space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
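For reference, the single-site FCM updates that such distributed variants build on (a plain NumPy sketch; the collaborative/parallel versions additionally exchange prototypes or partition information between sites):

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain Fuzzy C-Means: alternate membership and prototype updates."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), n_clusters, replace=False)]  # initial prototypes
    for _ in range(n_iter):
        # Distances from every point to every prototype, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        # Prototype update: mean of the data weighted by u^m.
        Um = U ** m
        C = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return C, U
```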

Relevance: 100.00%

Abstract:

This paper presents a survey of evolutionary algorithms designed for decision-tree induction. Most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any single evolutionary approach. Second, it provides a taxonomy that covers both works that evolve whole decision trees and works that design decision-tree components by means of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of the paper, we address some important issues and open questions that can be the subject of future research.
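A toy sketch of the "evolve whole trees" family the survey covers (no particular surveyed system; the fitness, operators and representation are illustrative choices): trees as nested tuples, training accuracy as fitness, and threshold/leaf mutation in a (μ+λ)-style loop:

```python
import numpy as np

rng = np.random.default_rng(1)

# A tree is a leaf (class label) or a tuple (feature, threshold, left, right).
def random_tree(X, y, depth=3):
    if depth == 0:
        return int(rng.choice(np.unique(y)))          # random leaf label
    f = int(rng.integers(X.shape[1]))
    t = float(rng.uniform(X[:, f].min(), X[:, f].max()))
    return (f, t, random_tree(X, y, depth - 1), random_tree(X, y, depth - 1))

def mutate(tree, X, y, rate=0.3):
    if not isinstance(tree, tuple):                   # maybe relabel a leaf
        return int(rng.choice(np.unique(y))) if rng.random() < rate else tree
    f, t, left, right = tree
    if rng.random() < rate:                           # jitter this threshold
        t += float(rng.normal(0, 0.1 * (X[:, f].std() + 1e-9)))
    return (f, t, mutate(left, X, y, rate), mutate(right, X, y, rate))

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fitness(tree, X, y):
    return float(np.mean(np.array([predict(tree, x) for x in X]) == y))

def evolve(X, y, pop=40, gens=50):
    P = [random_tree(X, y) for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=lambda t: fitness(t, X, y), reverse=True)
        P = P[: pop // 2] + [mutate(t, X, y) for t in P[: pop // 2]]
    return max(P, key=lambda t: fitness(t, X, y))
```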

Relevance: 100.00%

Abstract:

Heuristic optimization algorithms are of great importance for solving various real-world problems, with applications ranging from cost reduction to artificial intelligence and medicine. Here the term cost refers, for instance, to the value of a function of several independent variables. When dealing with engineering problems, we often want to minimize the value of such a function in order to achieve an optimum, or to maximize another parameter that increases as the cost decreases. Heuristic cost-reduction algorithms work by finding the values of the independent variables for which the value of the function (the "cost") is minimal. There is an abundance of heuristic cost-reduction algorithms to choose from. We start with a discussion of various optimization algorithms such as memetic algorithms, force-directed placement, and evolution-based algorithms. Following this initial discussion, we take up three algorithms and implement them in MATLAB. The focus of this report is to provide detailed information on the working of three different heuristic optimization algorithms, and to conclude with a comparative study of their performance when implemented in MATLAB. The three algorithms considered are the non-adaptive simulated annealing algorithm, the adaptive simulated annealing algorithm, and the random-restart hill climbing algorithm. The algorithms are heuristic in nature: the solutions they reach may not be the best of all possible solutions, but they provide a reasonably good solution quickly, without taking an indefinite amount of time.
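The report's implementations are in MATLAB; the following compact Python sketch of two of the three methods is for orientation only (the adaptive annealing variant is only indicated in a comment):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulated_annealing(f, x0, T0=1.0, cooling=0.995, n_iter=5000, step=0.5):
    """Non-adaptive SA: fixed geometric cooling. An adaptive variant would
    instead tune `step` or T from the observed acceptance rate."""
    x = np.asarray(x0, dtype=float)
    fx, T = f(x), T0
    for _ in range(n_iter):
        cand = x + rng.normal(0.0, step, size=x.shape)
        fc = f(cand)
        # Always accept downhill moves; accept uphill with Boltzmann probability.
        if fc < fx or rng.random() < np.exp(-(fc - fx) / T):
            x, fx = cand, fc
        T *= cooling
    return x, fx

def random_restart_hill_climbing(f, sample, n_restarts=20, n_iter=500, step=0.1):
    """Greedy local search repeated from random starting points."""
    best_x, best_f = None, np.inf
    for _ in range(n_restarts):
        x = np.asarray(sample(), dtype=float)
        fx = f(x)
        for _ in range(n_iter):
            cand = x + rng.normal(0.0, step, size=x.shape)
            fc = f(cand)
            if fc < fx:                      # downhill moves only
                x, fx = cand, fc
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Toy usage on a multimodal test function (Rastrigin).
rastrigin = lambda x: 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))
print(simulated_annealing(rastrigin, [3.0, -2.0]))
print(random_restart_hill_climbing(rastrigin, lambda: rng.uniform(-5, 5, 2)))
```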

Relevance: 100.00%

Abstract:

Diabetes is nowadays one of the most common diseases, across all populations and age groups. Different artificial intelligence techniques have been applied to the diabetes problem. This research proposes artificial metaplasticity on a multilayer perceptron (AMMLP) as a prediction model for diabetes. The Pima Indians diabetes dataset was used to test the proposed AMMLP model. The results obtained by the AMMLP were compared with those of other recently proposed algorithms applied to the same database. The best result obtained so far with the AMMLP algorithm is 89.93% accuracy.
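For context only, a plain-MLP baseline on the same task is sketched below; this is not the AMMLP model, and the file name and column layout of the Pima dataset are assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical local copy: 8 feature columns plus a binary "Outcome" column.
df = pd.read_csv("pima-indians-diabetes.csv")
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize inputs, then a small single-hidden-layer perceptron.
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                  random_state=0))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```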

Relevance: 100.00%

Abstract:

With a Bonner sphere spectrometer, the neutron spectrum is obtained through an unfolding procedure. Monte Carlo methods, regularization, parametrization, least squares, and maximum entropy are some of the techniques utilized for unfolding. In the last decade, methods based on artificial intelligence technology have also been used. Approaches based on genetic algorithms and artificial neural networks have been developed in order to overcome the drawbacks of previous techniques. Nevertheless, despite the advantages of artificial neural networks, they still have some drawbacks, mainly in the design process of the network, e.g. the optimal selection of the architectural and learning ANN parameters. In recent years, hybrid technologies combining artificial neural networks and genetic algorithms have therefore been utilized. In this work, several ANN topologies were trained and tested using both conventionally designed and genetically evolved artificial neural networks, with the aim of unfolding neutron spectra from the count rates of a Bonner sphere spectrometer. A comparative study of both procedures is presented.
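A hedged sketch of the hybrid idea (not the authors' code): a tiny genetic algorithm evolving two ANN design parameters, hidden-layer size and learning rate, scored by cross-validated error. Here X (the sphere count rates) and Y (the spectrum bins) stand in for the Bonner-sphere dataset:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

def fitness(genome, X, Y):
    n_hidden, lr = genome
    net = MLPRegressor(hidden_layer_sizes=(int(n_hidden),),
                       learning_rate_init=lr, max_iter=500, random_state=0)
    # Less negative cross-validated MSE means a fitter genome.
    return cross_val_score(net, X, Y, cv=3,
                           scoring="neg_mean_squared_error").mean()

def evolve(X, Y, pop=10, gens=5):
    # Genome: (number of hidden units, learning rate).
    P = [(int(rng.integers(4, 64)), float(10 ** rng.uniform(-4, -1)))
         for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=lambda g: fitness(g, X, Y), reverse=True)
        parents = P[: pop // 2]
        # Refill the population with mutated copies of the survivors.
        P = parents + [(max(2, int(h + rng.normal(0, 4))),
                        lr * float(10 ** rng.normal(0, 0.2)))
                       for h, lr in parents]
    return max(P, key=lambda g: fitness(g, X, Y))
```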

Relevance: 100.00%

Abstract:

Breast cancer is one of the leading causes of mortality among women worldwide, and its early detection remains key to improving prognosis and survival. Currently, the most reliable and practical method for early detection of breast cancer is mammography. The presence of microcalcifications is considered a very important indicator of malignant types of breast cancer, and their detection and classification are important for preventing and treating the disease. However, detecting and classifying microcalcifications remains a hard task because mammograms show poor contrast between microcalcifications and the surrounding tissue. Factors such as visualization conditions, tiredness or insufficient experience of the specialist increase the risk of missing lesions that are present. To reduce this risk, it is important to have alternatives, such as a second opinion from another specialist or a double reading by the same one; the first option raises the cost, and both lengthen the diagnosis time. This is a strong motivation for the development of decision support and assistance systems. This thesis proposes, develops and justifies a system able to detect microcalcifications in regions of interest extracted from digitized mammograms, contributing to the early detection of breast cancer. The system is based on digital image processing, pattern recognition and artificial intelligence techniques. Its development rests on the following elements: 1. To train and test the proposed system, a database of images is created, consisting of regions of interest extracted from digitized mammograms. 2. The top-hat transform, an image processing technique based on mathematical morphology operations, is applied to improve the contrast between the microcalcifications and the tissue present in the image. 3. A novel algorithm called sub-segmentation is proposed, based on pattern recognition techniques and applying a non-supervised clustering algorithm, the Possibilistic Fuzzy c-Means (PFCM). The aim is to find the regions corresponding to microcalcifications and distinguish them from healthy tissue. To show the advantages and disadvantages of the proposed algorithm, it is compared with two algorithms of the same type: k-means and the Fuzzy c-Means (FCM). It is also worth highlighting that this work is the first to use sub-segmentation to detect regions belonging to microcalcifications in mammography images. 4. Finally, a classifier based on an artificial neural network, specifically a multilayer perceptron (MLP), is used to discriminate, in a binary way, the patterns built from the gray-level intensities of the original image, distinguishing between microcalcification and healthy tissue.
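A minimal sketch of the contrast-enhancement step described above (the structuring-element size is an assumption to be tuned to the expected lesion scale in pixels):

```python
from scipy import ndimage

def enhance_microcalcifications(roi, selem_size=15):
    """White top-hat on a 2-D grayscale region of interest: subtracts a
    morphological opening, removing slowly varying background so small
    bright spots (candidate microcalcifications) stand out."""
    return ndimage.white_tophat(roi, size=selem_size)

# The enhanced image would then feed the PFCM sub-segmentation and the
# MLP classification stages described in the abstract.
```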