969 results for K-nearest neighbour
Abstract:
Terrestrial laser scanning (TLS) is one of the most promising surveying techniques for rock-slope characterization and monitoring. Landslide and rockfall movements can be detected by comparing sequential scans. One of the most pressing challenges in natural hazards is the combined temporal and spatial prediction of rockfall. An outdoor experiment was performed to ascertain whether the TLS instrumental error is small enough to enable detection of precursory displacements of millimetric magnitude. The experiment consisted of a known displacement of three objects relative to a stable surface. Results show that millimetric changes cannot be detected by analysing the unprocessed datasets. Displacement measurements are improved considerably by applying Nearest Neighbour (NN) averaging, which reduces the error (1σ) by up to a factor of 6. This technique was applied to displacements prior to the April 2007 rockfall event at Castellfollit de la Roca, Spain. The maximum precursory displacement measured was 45 mm, approximately 2.5 times the standard deviation of the model comparison, which hampers the distinction between actual displacement and instrumental error when using conventional methodologies. Encouragingly, the precursory displacement was clearly detected by applying the NN averaging method. These results show that millimetric displacements prior to failure can be detected using TLS.
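As a rough illustration of why neighbour averaging can pull a small displacement out of instrumental noise, the sketch below averages per-point scan differences over each point's k nearest neighbours on a synthetic patch. The function name `nn_average`, the grid, the noise level and k are all assumptions made for this example; it is not the authors' processing chain. For independent noise, averaging k values lowers the standard deviation by roughly the square root of k.

```python
import math
import random

def nn_average(points, values, k):
    """Replace each value by the mean over its k nearest points (itself included)."""
    averaged = []
    for px, py in points:
        # Brute-force neighbour search; adequate for a small illustrative patch.
        order = sorted(
            range(len(points)),
            key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2,
        )
        averaged.append(sum(values[j] for j in order[:k]) / k)
    return averaged

def std(xs):
    """Population standard deviation."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

rng = random.Random(0)
# Synthetic 20 x 20 patch displaced by 3 mm and observed with 7 mm (1 sigma)
# Gaussian noise. Both numbers are assumed values, not taken from the study.
points = [(i, j) for i in range(20) for j in range(20)]
raw = [3.0 + rng.gauss(0.0, 7.0) for _ in points]
smoothed = nn_average(points, raw, k=25)
print(f"raw std: {std(raw):.2f} mm, averaged std: {std(smoothed):.2f} mm")
```

The averaged differences scatter far less than the raw ones while their mean stays near the imposed shift, which is the effect the abstract relies on.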
Abstract:
Avalanche forecasting is a complex process involving the assimilation of multiple data sources to make predictions over varying spatial and temporal resolutions. Numerically assisted forecasting often uses nearest neighbour methods (NN), which are known to have limitations when dealing with high dimensional data. We apply Support Vector Machines to a dataset from Lochaber, Scotland to assess their applicability in avalanche forecasting. Support Vector Machines (SVMs) belong to a family of theoretically based techniques from machine learning and are designed to deal with high dimensional data. Initial experiments showed that SVMs gave results which were comparable with NN for categorical and probabilistic forecasts. Experiments utilising the ability of SVMs to deal with high dimensionality in producing a spatial forecast show promise, but require further work.
Abstract:
Image registration has been proposed as an automatic method for recovering cardiac displacement fields from Tagged Magnetic Resonance Imaging (tMRI) sequences. Initially performed as a set of pairwise registrations, these techniques have evolved to use 3D+t deformation models, which require metrics of joint image alignment (JA). However, only linear combinations of cost functions defined with respect to the first frame have been used. In this paper, we apply k-Nearest Neighbors Graph (kNNG) estimators of the α-entropy (Hα) to measure the joint similarity between frames and to combine the information provided by different cardiac views in a unified metric. Experiments performed on six subjects showed significantly higher accuracy (p < 0.05) than a standard pairwise alignment (PA) approach in terms of mean positional error and variance with respect to manually placed landmarks. The developed method was used to study strains in patients with myocardial infarction, showing consistency between strain, infarction location, and coronary occlusion. This paper also presents an interesting clinical application of graph-based metric estimators, showing their value for solving practical problems found in medical imaging.
Abstract:
Machine Learning for geospatial data: algorithms, software tools and case studies
The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence that is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions to classification, regression and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to implementation as predictive engines in decision support systems for environmental data mining, including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with geostatistical models in low-dimensional geographical spaces, and they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation.
The main algorithms and models considered are the following: multilayer perceptron (MLP, a workhorse of machine learning), general regression neural networks (GRNN), probabilistic neural networks (PNN), self-organising (Kohonen) maps (SOM), Gaussian mixture models (GMM), radial basis function networks (RBF) and mixture density networks (MDN). This set of models covers machine learning tasks such as classification, regression and density estimation. Exploratory data analysis (EDA) is the initial and a very important part of data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, namely experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps to detect the presence of spatial patterns, at least those described by two-point statistics. A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a topical issue, namely the automatic mapping of geospatial data. General regression neural networks (GRNN) are proposed as an efficient model for this task.
The performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters: theory, applications, software tools and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. These tools were developed over the last 15 years and have been used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for fundamental and applied research projects. The case studies cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydrogeological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools for exploratory data analysis and visualisation were developed as well. The software is user-friendly and easy to use.
Abstract:
The paper deals with the development and application of a methodology for automatic mapping of pollution/contamination data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve this problem. The automatic tuning of isotropic and anisotropic GRNN models using a cross-validation procedure is presented. Results are compared with the k-nearest-neighbours interpolation algorithm using an independent validation data set. The quality of mapping is controlled by analysing the raw data and the residuals using variography. Maps of probabilities of exceeding a given decision level and "thick" isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. The real case study is based on mapping of radioactively contaminated territories.
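The tuning step can be sketched as follows, assuming the standard view of a GRNN as a Gaussian-kernel weighted mean of the training values, with a single (isotropic) kernel width chosen by leave-one-out cross-validation. The function names, the candidate widths and the synthetic surface are assumptions for illustration only; an anisotropic variant would use one width per coordinate.

```python
import math

def grnn_predict(train_xy, train_z, query, sigma):
    """Isotropic GRNN: Gaussian-kernel weighted mean of the training values."""
    num = den = 0.0
    for (x, y), z in zip(train_xy, train_z):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * z
        den += w
    # Fall back to the global mean if every weight underflows to zero.
    return num / den if den > 0.0 else sum(train_z) / len(train_z)

def loo_mse(train_xy, train_z, sigma):
    """Leave-one-out mean squared error for one kernel width."""
    err = 0.0
    for i in range(len(train_xy)):
        rest_xy = train_xy[:i] + train_xy[i + 1:]
        rest_z = train_z[:i] + train_z[i + 1:]
        err += (grnn_predict(rest_xy, rest_z, train_xy[i], sigma) - train_z[i]) ** 2
    return err / len(train_xy)

def tune_sigma(train_xy, train_z, candidates):
    """Pick the candidate width with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mse(train_xy, train_z, s))

# Synthetic smooth surface sampled on an 8 x 8 grid (illustrative data only).
xy = [(float(i), float(j)) for i in range(8) for j in range(8)]
z = [math.sin(x) + 0.5 * y for x, y in xy]
candidates = [0.1, 0.3, 0.5, 1.0, 2.0, 5.0]
best = tune_sigma(xy, z, candidates)
print("selected sigma:", best)
```

Because the only free parameter is the kernel width, the search is a one-dimensional scan, which is what makes this kind of model convenient for automatic mapping.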
Abstract:
This paper presents a novel image classification scheme for benthic coral reef images that can be applied to both single image and composite mosaic datasets. The proposed method can be configured to the characteristics (e.g., the size of the dataset, number of classes, resolution of the samples, color information availability, class types, etc.) of individual datasets. The proposed method uses completed local binary pattern (CLBP), grey level co-occurrence matrix (GLCM), Gabor filter response, and opponent angle and hue channel color histograms as feature descriptors. For classification, either k-nearest neighbor (KNN), neural network (NN), support vector machine (SVM) or probability density weighted mean distance (PDWMD) is used. The combination of features and classifiers that attains the best results is presented together with the guidelines for selection. The accuracy and efficiency of our proposed method are compared with other state-of-the-art techniques using three benthic and three texture datasets. The proposed method achieves the highest overall classification accuracy of any of the tested methods and has moderate execution time. Finally, the proposed classification scheme is applied to a large-scale image mosaic of the Red Sea to create a completely classified thematic map of the reef benthos
Abstract:
Although the manufacturing process of ceramic tiles is fully automated, the final stage, quality inspection and classification, is usually done manually. Automatic quality inspection in tile manufacturing can be justified on economic and safety grounds. The purpose of this work is to describe a research project on classifying ceramic tiles using various colour features. An essential part of the study examined the difference between RGB and spectral images. The theoretical part reviews earlier research on the topic and gives background on machine vision, pattern recognition, classifiers and colour theory. The material for the practical part consisted of 25 ceramic tiles from five different classes. Classification used the k-nearest neighbour (k-NN) classifier and the self-organizing map (SOM). The results were also compared with classification done by humans. Neural computing was found to be an important tool in spectral analysis. The results obtained with the SOM and spectral features were promising, and only the chromaticity-based RGB features performed better in classification.
Abstract:
Changes in the angle of illumination incident upon a 3D surface texture can significantly alter its appearance, implying variations in the image texture. These texture variations displace class members in the feature space, increasing the failure rates of texture classifiers. To avoid this problem, this paper presents a model-based texture recognition system that classifies textures seen from different distances and under different illumination directions. The system relies on a surface model obtained by means of 4-source colour photometric stereo, which is used to generate 2D image textures under different illumination directions. The recognition system combines co-occurrence matrices for feature extraction with a Nearest Neighbour classifier. Moreover, the recognition step also provides an estimate of the approximate illumination direction used to capture the test image.
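The feature-extraction and classification stage alone can be sketched as below: a grey-level co-occurrence matrix for one pixel offset, two classic statistics derived from it (contrast and energy), and a nearest-neighbour decision. The toy images, the choice of statistics and all function names are assumptions for illustration; the photometric-stereo surface model described in the abstract is not reproduced here.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def glcm_features(image, levels):
    """Contrast and energy of the horizontal co-occurrence matrix."""
    p = glcm(image, levels)
    idx = range(levels)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx)
    energy = sum(p[i][j] ** 2 for i in idx for j in idx)
    return (contrast, energy)

def nearest_neighbour(train, query):
    """train: list of (feature_vector, label); return the closest label."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

# Two toy binary textures (made up for this example).
striped = [[0, 1, 0, 1] for _ in range(4)]
flat = [[0, 0, 0, 0] for _ in range(4)]
train = [(glcm_features(striped, 2), "striped"), (glcm_features(flat, 2), "flat")]

# A shifted version of the striped texture should land next to "striped".
query = [[1, 0, 1, 0] for _ in range(4)]
print(nearest_neighbour(train, glcm_features(query, 2)))
```

In a system like the one described, the training set would hold feature vectors for each texture rendered under several illumination directions, so the nearest training vector would carry both a class and a direction label.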
Abstract:
The feasibility of using augmented block designs and spatial analysis methods for early stage selection in eucalyptus breeding programs was tested. A total of 113 half-sib progenies of Eucalyptus urophylla and eight clones were evaluated in an 11 x 11 triple lattice experiment at two locations: Posto da Mata (Bahia, Brazil) and São Mateus (Minas Gerais, Brazil). Four checks were randomly allocated within each block. Plots consisted of 15 m long rows containing 6 plants spaced 3 m apart. The girth at breast height (cm/plant) was evaluated at 19 and 26 months of age. Variance analyses were performed according to the following methods: lattice design, randomized complete block design, augmented block design, Papadakis method, moving means method, and check plots. Comparisons among different methods were based on the magnitude of experimental errors and precision of the estimates of genetic and phenotypic parameters. General results indicated that augmented block design is useful to evaluate progenies and clones in early selection in eucalyptus breeding programs using moderate and low selection intensities. However, this design is not suitable for estimating genetic and phenotypic parameters due to its low precision. Check plots, nearest neighbour, Papadakis (1937), and moving means methods were efficient in removing the heterogeneity within blocks. These efficiencies were compared to that in lattice analysis for estimation of genetic and phenotypic parameters.
Abstract:
This paper assesses the effectiveness of ASTER imagery in supporting the mapping of Pittosporum undulatum, an invasive woody species, in Pico da Vara Natural Reserve (S. Miguel Island, Archipelago of the Azores, Portugal). The assessment was done by applying K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Maximum Likelihood (MLC) pixel-based supervised classifications to 4 different geographic and remote sensing datasets, made up of the Visible, Near-Infrared (VNIR) and Short Wave Infrared (SWIR) bands of the ASTER sensor and of digital cartography related to orography (altitude and "distance to water streams"), on which the spatial distribution of Pittosporum undulatum directly depends. Overall, most of the classifications performed showed strong agreement and high accuracy. At the level of the target species, the two highest classification accuracies were obtained when applying MLC and KNN to the VNIR bands coupled with auxiliary geographic information. Results improved significantly when ecology and occurrence information for the species (altitude and distance to water streams) was included in the classification scheme. These results show that ASTER VNIR spectral bands, when coupled with relevant ancillary GIS data, can constitute an effective and low-cost approach for the evaluation and continuous assessment of Pittosporum undulatum woodland propagation and distribution within Protected Areas of the Azores Islands.
Abstract:
In this thesis, a classification problem in predicting the creditworthiness of a customer is tackled. This is done by proposing a reliable classification procedure on a given data set. The aim of this thesis is to design a model that gives the best classification accuracy to effectively predict bankruptcy. FRPCA techniques proposed by Yang and Wang have been preferred since they are tolerant to certain types of noise in the data. These include FRPCA1, FRPCA2 and FRPCA3, from which the best method is chosen. Two different approaches are used at the classification stage: a similarity classifier and an FKNN classifier. The algorithms are tested with the Australian credit card screening data set. The results indicate a mean classification accuracy of 83.22% using FRPCA1 with the similarity classifier. The FKNN approach yields a mean classification accuracy of 85.93% when used with FRPCA2, making it the better method for suitable choices of the number of nearest neighbors and the fuzziness parameter. Details on the calibration of the fuzziness parameter and other parameters associated with the similarity classifier are discussed.
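To show where the two FKNN parameters mentioned above enter, here is a minimal sketch that assumes the common Keller-style fuzzy k-NN with crisp training labels: each of the k nearest neighbours votes for its class with weight 1 / d^(2/(m-1)), where m > 1 is the fuzziness parameter. The thesis may use a different variant, and the data, labels and function name are invented for the example.

```python
import math

def fknn_memberships(train_x, train_y, query, k=3, m=2.0, eps=1e-12):
    """Fuzzy k-NN class memberships for one query, assuming crisp training labels."""
    # Keep the k training samples closest to the query.
    nearest = sorted(zip((math.dist(x, query) for x in train_x), train_y))[:k]
    exponent = 2.0 / (m - 1.0)  # m > 1 controls how fast weights decay with distance
    weights = [(1.0 / max(d, eps) ** exponent, label) for d, label in nearest]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, label in weights:
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships

# Hypothetical two-feature samples, not taken from the credit data set.
train_x = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
train_y = ["reject", "reject", "reject", "accept", "accept", "accept"]
memberships = fknn_memberships(train_x, train_y, (0.8, 0.9), k=5, m=2.0)
print(memberships)
```

Unlike a plain majority vote, the output is a graded membership per class, so a borderline applicant can be flagged rather than forced into one class.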
Abstract:
This Master's thesis studies the problem of modeling the depth of anaesthesia. It analyses EEG signals using nonlinear dynamics theory and then classifies the values obtained. The main stages of the study are the following: data preprocessing; calculation of optimal embedding parameters for phase space reconstruction; obtaining reconstructed phase portraits of each EEG signal; formation of a feature set to characterise the phase portraits obtained; and classification of four different anaesthesia levels based on the previously estimated features. Classification was performed with linear and quadratic discriminant analysis, the k-nearest neighbours method and online clustering. In addition, the work provides an overview of existing approaches to anaesthesia depth monitoring, a description of the basic concepts of nonlinear dynamics theory used in the thesis, and a comparative analysis of several different classification methods.
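The phase space reconstruction stage can be illustrated with time-delay embedding, in which each reconstructed point collects `dim` samples spaced `delay` steps apart. The sketch below takes the embedding dimension and delay as given and does not show how the thesis selects their optimal values; the function name and the sine test signal are assumptions.

```python
import math

def delay_embed(signal, dim, delay):
    """Return the delay vectors (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    n_vectors = len(signal) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal too short for this dim and delay")
    return [
        tuple(signal[t + i * delay] for i in range(dim))
        for t in range(n_vectors)
    ]

# A sine wave stands in for an EEG trace; its 2-D portrait traces a closed loop.
signal = [math.sin(0.1 * t) for t in range(200)]
portrait = delay_embed(signal, dim=2, delay=16)
print(len(portrait), portrait[0])
```

Features such as the spread or shape of the resulting point cloud could then be fed to the classifiers listed in the abstract.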
Abstract:
This bachelor's thesis was carried out as part of the PulpVision research project, which aims to develop image-based computation and classification methods for pulp quality control in papermaking. Earlier in the project, a method was developed for finding curvilinear structures in images, and it was used to detect fibres in images. That method served as the starting point for this thesis. The aim was to study whether features computed from different fibre images can be used to identify the species of the fibres in an image. The fibre images contained fibres from four tree species and one plant: acacia, birch, pine, eucalyptus and wheat. For each species, 100 fibre images were selected and divided into two groups, the first used as a training set and the second as a test set. Using the training set, descriptive features were computed for each fibre species and then used to identify the species in the test images. The images were produced by CEMIS-Oulu (Center for Measurement and Information Systems), a unit of the University of Oulu specialising in measurement technology. For each training image, the means and standard deviations of three features (length, width and curvature) were computed, along with the number of fibres found in the image. Recognition accuracy was tested on the test images with different combinations of these features, using the k-nearest neighbour method and a naive Bayes classifier. The tests gave promising results: for example, using the mean length and mean width, an accuracy of up to about 98% was achieved with both algorithms. Mean fibre length appeared to be the most descriptive feature for recognition. Accuracy did not vary much between the two algorithms. Based on the test results, identifying fibre images is feasible.
According to the tests, only two features need to be computed from the fibre images to identify the fibres accurately. The classification algorithms used were very simple, but they performed well in the tests.
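A two-feature k-nearest neighbour classifier of the kind described is short enough to sketch in full. The feature values below are hypothetical placeholders, not measurements from the thesis, and `knn_predict` is a name chosen for this example; in practice the two features would usually be rescaled so that neither dominates the distance.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# (mean fibre length, mean fibre width) per image; made-up numbers.
train = [
    ((2.5, 30.0), "pine"), ((2.7, 32.0), "pine"), ((2.4, 29.0), "pine"),
    ((1.0, 20.0), "birch"), ((1.1, 21.0), "birch"), ((0.9, 19.0), "birch"),
]
print(knn_predict(train, (2.4, 31.0)))
```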
Abstract:
Expressions for the anharmonic Helmholtz free energy contributions up to o( f ), valid for all temperatures, have been obtained using perturbation theory for a crystal in which every atom is on a site of inversion symmetry. Numerical calculations have been carried out in the high temperature limit and in the non-leading term approximation for a monatomic face-centred cubic crystal with nearest neighbour central-force interactions. The numbers obtained were seen to vary by as much as 47% from those obtained in the leading term approximation, indicating that the latter approximation is not in general very good. The convergence to oct) of the perturbation series in the high temperature limit appears satisfactory.