974 results for k-nearest neighbours
Abstract:
Although the manufacturing process of ceramic tiles is fully automated, the final stage, quality inspection and grading, is usually done by hand. Automatic quality inspection in tile manufacturing can be justified on economic and safety grounds. The purpose of this work is to describe a research project on classifying ceramic tiles using different colour features. An essential part of the study was the difference between RGB and spectral images. The theoretical part of the work reviews previous research on the topic and provides background on machine vision, pattern recognition, classifiers and colour theory. The material for the practical part consisted of 25 ceramic tiles from five different classes. Classification was carried out with a k-nearest neighbour (k-NN) classifier and a self-organising map (SOM). The results were also compared with classification done by humans. Neural computation was found to be an important tool in spectral analysis. The results obtained with the SOM and spectral features were promising; only the chromatised RGB features gave better classification results.
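A minimal k-NN classifier of the kind used above might look as follows. This is a sketch and not the thesis's implementation. It assumes that each tile is already described by a fixed-length colour feature vector, and it uses Euclidean distance with a plain majority vote. The three-component vectors and class names in the usage lines are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training vectors (Euclidean distance)."""
    # Indices of the training samples, nearest first.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # most_common orders equal counts by first occurrence, so a tie goes to
    # whichever tied class has the nearer neighbour.
    return votes.most_common(1)[0][0]


# Hypothetical mean-colour features for two tile classes.
train_X = [(200, 30, 30), (190, 40, 35), (30, 30, 200), (40, 35, 190)]
train_y = ["red", "red", "blue", "blue"]
print(knn_predict(train_X, train_y, (180, 50, 40)))  # red
```

For the query, the two "red" samples are the nearest and one "blue" sample is third, so the vote is 2 to 1.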
Abstract:
Dirt counting and dirt particle characterisation of pulp samples are an important part of quality control in pulp and paper production. There is also a critical need for an automatic image analysis system that can characterise dirt particles across various pulp samples. However, existing image analysis systems use a single threshold to segment the dirt particles in different pulp samples, which limits their precision. An automatic image analysis system that overcomes this deficiency would therefore be very useful. In this study, an improved Niblack thresholding method is proposed, which defines the threshold based on the number of segmented particles. Kittler thresholding is also used. Both thresholding methods determine the dirt count of the different pulp samples accurately when compared with visual inspection and the Digital Optical Measuring and Analysis System (DOMAS). In addition, the minimum resolution needed for acquiring a scanner image is determined. Among the dirt particle features considered, curl shows a sufficient difference to discriminate between bark and fibre bundles in different pulp samples. Three classifiers, k-Nearest Neighbour, Linear Discriminant Analysis and Multi-layer Perceptron, are used to categorise the dirt particles. Linear Discriminant Analysis and Multi-layer Perceptron are the most accurate in classifying the dirt particles segmented by Kittler thresholding with morphological processing. The results show that the dirt particles are successfully categorised into bark and fibre bundles.
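For reference, the classic Niblack rule that the proposed method builds on sets a local threshold T = m + k * s. Here m and s are the mean and standard deviation of the grey levels in a window around each pixel. The sketch below implements only that textbook rule, on a greyscale image given as a 2D list. The window size, k = -0.2 and the assumption that dirt is darker than the pulp background are illustrative choices. The paper's selection of the threshold from the number of segmented particles is not reproduced.

```python
import math


def niblack_threshold(img, window=15, k=-0.2):
    """Return a mask that is True where a pixel is darker than its local Niblack threshold.

    img: 2D list of grey levels. T(x, y) = m(x, y) + k * s(x, y), computed over a
    square window clipped at the image border.
    """
    h, w = len(img), len(img[0])
    r = window // 2
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            # Dark particles on a light background fall below the local threshold.
            mask[y][x] = img[y][x] < m + k * s
    return mask


# Hypothetical 5x5 patch of bright pulp with one dark dirt pixel in the centre.
img = [[200] * 5 for _ in range(5)]
img[2][2] = 20
mask = niblack_threshold(img, window=5)
```

Only the centre pixel is marked. Every window contains the dark pixel, which pulls the local mean below the bright background value.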
Abstract:
This paper assesses the effectiveness of ASTER imagery in supporting the mapping of Pittosporum undulatum, an invasive woody species, in the Pico da Vara Natural Reserve (S. Miguel Island, Archipelago of the Azores, Portugal). The assessment applied K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Maximum Likelihood (MLC) pixel-based supervised classifications to four different geographic and remote sensing datasets. These datasets were built from the Visible and Near-Infrared (VNIR) and Short-Wave Infrared (SWIR) bands of the ASTER sensor and from digital cartography related to orography (altitude and "distance to water streams"), on which the spatial distribution of Pittosporum undulatum directly depends. Overall, most of the classifications showed strong agreement and high accuracy. At the level of the target species, the two highest classification accuracies were obtained by applying MLC and KNN to the VNIR bands combined with auxiliary geographic information. Results improved significantly when information on the species' ecology and occurrence (altitude and distance to water streams) was included in the classification scheme. These results show that ASTER VNIR spectral bands, when coupled with relevant ancillary GIS data, can be an effective and low-cost approach for evaluating and continuously monitoring the propagation and distribution of Pittosporum undulatum woodland within Protected Areas of the Azores Islands.
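KNN compares pixels by distance. If reflectance bands are stacked with ancillary layers such as altitude in metres, the layer with the largest numeric range dominates unless every layer is rescaled. The following sketch builds per-pixel feature vectors from co-registered rasters (2D lists of equal size) and min-max scales each layer to [0, 1]. Min-max scaling is our assumption, since the abstract does not say how the datasets were normalised. The tiny rasters in the usage lines are hypothetical.

```python
def stack_and_scale(layers):
    """Turn co-registered raster layers into per-pixel feature vectors scaled to [0, 1].

    layers: list of equal-sized 2D lists (e.g. VNIR bands, altitude, distance to streams).
    Returns one feature vector per pixel, in row-major order.
    """
    scaled = []
    for layer in layers:
        flat = [v for row in layer for v in row]
        lo, hi = min(flat), max(flat)
        span = hi - lo if hi > lo else 1.0  # a constant layer maps to all zeros
        scaled.append([(v - lo) / span for v in flat])
    # Transpose: one vector per pixel, one component per layer.
    return [list(pixel) for pixel in zip(*scaled)]


band = [[0.1, 0.2], [0.3, 0.5]]        # hypothetical reflectance values
altitude = [[100, 300], [500, 900]]    # hypothetical altitudes in metres
features = stack_and_scale([band, altitude])
```

The resulting vectors can be passed to any distance-based classifier without altitude swamping the spectral bands.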
Abstract:
This thesis tackles the classification problem of predicting the creditworthiness of a customer by proposing a reliable classification procedure for a given data set. The aim of the thesis is to design a model that gives the best classification accuracy for effectively predicting bankruptcy. The FRPCA techniques proposed by Yang and Wang were chosen because they are tolerant to certain types of noise in the data. These include FRPCA1, FRPCA2 and FRPCA3, from which the best method is selected. Two different approaches are used at the classification stage: a similarity classifier and an FKNN classifier. The algorithms are tested on the Australian credit card screening data set. The results show a mean classification accuracy of 83.22% using FRPCA1 with the similarity classifier. The FKNN approach yields a mean classification accuracy of 85.93% when used with FRPCA2. This makes it the better method, given suitable choices of the number of nearest neighbors and the fuzziness parameter. Details on calibrating the fuzziness parameter and the other parameters of the similarity classifier are discussed.
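In the usual fuzzy k-NN formulation, each of the k nearest neighbours contributes to a class membership with weight 1 / d^(2/(m-1)). Here d is the neighbour's distance and m > 1 is the fuzziness parameter. The sketch below assumes crisp training labels (the method can also use fuzzy training memberships) and Euclidean distance. It is not the thesis's code, and the two-feature data in the usage lines are made up.

```python
import math


def fknn_memberships(train_X, train_y, x, k=3, m=2.0):
    """Fuzzy k-NN class memberships for query x, assuming crisp training labels.

    Each of the k nearest neighbours votes with weight 1 / d^(2/(m-1)),
    where m > 1 is the fuzziness parameter. Memberships sum to 1.
    """
    eps = 1e-12  # avoids division by zero when x coincides with a training point
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    weights = [(label, 1.0 / (math.dist(xi, x) + eps) ** (2.0 / (m - 1.0)))
               for xi, label in neighbours]
    total = sum(w for _, w in weights)
    memberships = {}
    for label, w in weights:
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships


def fknn_predict(train_X, train_y, x, k=3, m=2.0):
    """Assign the class with the highest fuzzy membership."""
    u = fknn_memberships(train_X, train_y, x, k, m)
    return max(u, key=u.get)


# Hypothetical two-feature applicants.
train_X = [(0, 0), (0, 1), (5, 5), (5, 6)]
train_y = ["good", "good", "bad", "bad"]
print(fknn_predict(train_X, train_y, (0, 0.4)))  # good
```

Larger m flattens the distance weighting, and m close to 1 makes the nearest neighbour dominate. This is the trade-off the thesis calibrates together with k.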
Abstract:
This bachelor's thesis was carried out as part of the PulpVision research project, whose goal is to develop image-based counting and classification methods for pulp quality monitoring in papermaking. Earlier in the project, a method for detecting curved structures in images had been developed and used to find fibres in images, and this method served as the starting point for the thesis. The aim of the work was to investigate whether the species of the fibres in an image can be identified using features computed from different fibre images. The fibre images contained fibres from four tree species and one plant: acacia, birch, pine, eucalyptus and wheat. For each species, 100 fibre images were selected and split into two groups, the first used as a training set and the second as a test set. The training set was used to compute descriptive features for each fibre species, and these features were used to identify the fibre species in the test images. The images were produced by CEMIS-Oulu (Center for Measurement and Information Systems), a unit of the University of Oulu that focuses on measurement technology. For each training image, the means and standard deviations of three features were computed: length, width and curvature. In addition, the number of fibres found in the image was counted. Different combinations of these features were used to test recognition accuracy on the test images with the k-nearest neighbour method and a naive Bayes classifier. The tests gave promising results: for example, using the mean length and mean width, both algorithms reached an accuracy of about 98%. Mean fibre length appeared to be the feature that best describes the fibre images. There was little difference in accuracy between the algorithms. Based on the test results, it can be concluded that recognising fibre species from fibre images is possible.
According to the tests, only two features need to be computed from the fibre images to identify the fibres accurately. The classification algorithms used were very simple, but they performed well in the tests.
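A Gaussian naive Bayes classifier over per-image features such as mean fibre length and mean fibre width can be sketched as follows. The Gaussian likelihood for each feature is an assumption, because the thesis does not state which naive Bayes variant was used. The feature values in the usage lines are invented for illustration.

```python
import math
from collections import defaultdict


def fit_gnb(X, y):
    """Estimate the class prior and per-feature mean and variance for Gaussian naive Bayes."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        # Small floor keeps a constant feature from producing zero variance.
        variances = [max(sum((v - mu) ** 2 for v in c) / n, 1e-9)
                     for c, mu in zip(cols, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gnb(model, x):
    """Pick the class maximising log prior + sum of per-feature Gaussian log-likelihoods."""
    def score(prior, means, variances):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, mu, var in zip(x, means, variances))
    return max(model, key=lambda c: score(*model[c]))


# Hypothetical (mean length, mean width) per image, in millimetres.
X = [(1.0, 0.020), (1.1, 0.022), (2.8, 0.035), (3.0, 0.033)]
y = ["birch", "birch", "pine", "pine"]
model = fit_gnb(X, y)
print(predict_gnb(model, (1.05, 0.021)))  # birch
```

The "naive" assumption is that length and width are treated as independent given the species. This keeps the model to a mean and a variance per feature and class.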
Abstract:
The simulations were implemented in Java.
Abstract:
This thesis addresses one of the emerging topics in sonar signal processing, namely the implementation of a target classifier for noise sources in the ocean, since operator-assisted classification is tedious, laborious and time-consuming. In the work reported in this thesis, various judiciously chosen components of the feature vector are used to realise the newly proposed Hierarchical Target Trimming Model. The performance of the proposed classifier has been compared with that of Euclidean distance and Fuzzy K-Nearest Neighbour Model classifiers, and it is found to have better success rates. Procedures for generating the Target Feature Record, or feature vector, from spectral, cepstral and bispectral features are also suggested. The feature vector generated from the noise data waveform is compared with the feature vectors available in the knowledge base, and the best-matching pattern is identified for the purpose of target classification. To improve the success rate of the feature-vector-based classifier, the proposed system has been augmented with an HMM-based classifier. In situations where the decisions of the two classifiers disagree, a contention-resolving mechanism built around the DUET algorithm is suggested.
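Of the feature families listed, the cepstral one is easy to illustrate. The sketch below computes the real cepstrum c = IDFT(log|DFT(x)|) of one frame with a naive O(N²) DFT, using only the standard library. It follows the textbook definition, which is not necessarily the exact cepstral feature used in the thesis. The frame length and the number of coefficients kept are left to the caller.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def real_cepstrum(x, n_coeffs=None):
    """Real cepstrum of a frame: inverse DFT of the log magnitude spectrum."""
    n = len(x)
    # Small offset avoids log(0) for spectral nulls.
    log_mag = [math.log(abs(X) + 1e-12) for X in dft(x)]
    # Inverse DFT; the input is real and even, so only the real part is kept.
    c = [sum(log_mag[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
         for t in range(n)]
    return c[:n_coeffs] if n_coeffs is not None else c


# A frame containing an impulse followed by an echo of half amplitude.
frame = [1.0, 0.5] + [0.0] * 62
coeffs = real_cepstrum(frame, n_coeffs=4)
```

For this echo, the first cepstral coefficient after c[0] is close to 0.25, which is a / 2 for an echo gain a = 0.5. Echo-like structure in a noise waveform therefore shows up as peaks at the corresponding quefrencies.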
Abstract:
This paper presents a Robust Content Based Video Retrieval (CBVR) system that retrieves similar videos based on a local feature descriptor called SURF (Speeded Up Robust Features). The high dimensionality of SURF-like feature descriptors leads to large storage requirements when video information is indexed. To reduce the dimensionality of the SURF descriptor, the system employs a stochastic dimensionality reduction method and thereby produces model data for the videos. At retrieval time, the model data of the test clip is matched to similar videos using a minimum distance classifier. The performance of the system is evaluated using two different minimum distance classifiers during the retrieval stage. Experimental analyses show that the system achieves a retrieval performance of 78%. The performance efficiency of the low-dimensional SURF descriptor is also analysed.
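The abstract does not name the stochastic dimensionality reduction method. A Gaussian random projection is one common choice and is assumed here. The sketch projects 64-dimensional descriptors (the length of a standard SURF descriptor) to a lower dimension. It then summarises a clip as the mean of its projected descriptors, a hypothetical stand-in for the paper's model data. Finally, it retrieves the video whose summary is nearest in Euclidean distance. The function names and toy descriptors are illustrative only.

```python
import math
import random


def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian random projection matrix, scaled by 1/sqrt(d_out) (assumed method)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(d_out) for _ in range(d_in)]
            for _ in range(d_out)]


def project(R, v):
    """Multiply the projection matrix R by descriptor v."""
    return [sum(r_i * v_i for r_i, v_i in zip(row, v)) for row in R]


def clip_model(R, descriptors):
    """Hypothetical clip summary: mean of the projected descriptors."""
    projected = [project(R, d) for d in descriptors]
    return [sum(col) / len(projected) for col in zip(*projected)]


def nearest_video(models, query_model):
    """Minimum (Euclidean) distance classifier over stored video models."""
    return min(models, key=lambda name: math.dist(models[name], query_model))


R = random_projection_matrix(64, 8)
videos = {"A": clip_model(R, [[1.0] * 64, [1.1] * 64]),
          "B": clip_model(R, [[-1.0] * 64, [-0.9] * 64])}
query = clip_model(R, [[0.95] * 64])
print(nearest_video(videos, query))  # A
```

Each stored summary has 8 components instead of 64, which is where the storage saving comes from. Random projections approximately preserve pairwise distances, so minimum-distance matching remains meaningful.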
Abstract:
We describe a system that learns from examples to recognize people in images taken indoors. Images of people are represented by color-based and shape-based features. Recognition is carried out through combinations of Support Vector Machine classifiers (SVMs). Different types of SVM-based multiclass strategies are explored and compared with k-Nearest Neighbors classifiers (kNNs). The system works in real time and achieves high people-recognition rates over the course of one day.
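One standard SVM-based multiclass strategy is one-vs-one voting. A binary classifier is trained for every pair of classes, and the class that wins the most pairwise contests is chosen. In the sketch below the pairwise decision function is a parameter. The nearest-centroid rule and the colour-feature centroids in the usage lines are hypothetical stand-ins for trained pairwise SVMs. The abstract does not say which strategies the paper actually compared.

```python
import math
from collections import Counter
from itertools import combinations


def one_vs_one_predict(classes, binary_decide, x):
    """One-vs-one multiclass prediction by majority voting.

    binary_decide(a, b, x) must return a or b; in the paper's setting this would be
    a trained SVM for the pair (a, b). Ties go to the class listed first in `classes`.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_decide(a, b, x)] += 1
    return max(classes, key=lambda c: votes[c])


# Hypothetical colour-feature centroids standing in for pairwise SVMs.
centroids = {"alice": (0.9, 0.1), "bob": (0.1, 0.9), "carol": (0.5, 0.5)}


def centroid_decide(a, b, x):
    """Stand-in binary classifier: pick whichever class centroid is closer."""
    return a if math.dist(x, centroids[a]) <= math.dist(x, centroids[b]) else b


print(one_vs_one_predict(["alice", "bob", "carol"], centroid_decide, (0.8, 0.2)))  # alice
```

With C classes, one-vs-one needs C(C-1)/2 binary classifiers. The common alternative, one-vs-all, trains C classifiers and picks the most confident one.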
Abstract:
A novel approach to multiclass tumor classification using Artificial Neural Networks (ANNs) was introduced in a recent paper [Khan2001]. The method successfully classified and diagnosed small, round blue cell tumors (SRBCTs) of childhood into four distinct categories: neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS). It used cDNA gene expression profiles of samples that included both tumor biopsy material and cell lines. We report that, using an approach similar to the one reported by Yeang et al. [Yeang2001], i.e. multiclass classification by combining the outputs of binary classifiers, we achieved equal accuracy with far fewer features. We report the performance of 3 binary classifiers (k-nearest neighbors (kNN), weighted voting (WV) and support vector machines (SVM)) with 3 feature selection techniques (Golub's Signal to Noise (SN) ratios [Golub99], Fisher scores (FSc) and Mukherjee's SVM feature selection (SVMFS) [Sayan98]).
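Golub's signal-to-noise score ranks a gene by (mu1 - mu2) / (sigma1 + sigma2). This is the difference between the class means divided by the sum of the class standard deviations. The sketch below ranks genes one class against the rest, which matches the binary decomposition described above. The use of population standard deviations, the small epsilon and the toy expression matrix are our assumptions.

```python
import statistics


def signal_to_noise(values, labels, positive, eps=1e-12):
    """Golub-style signal-to-noise score of one gene for `positive` versus all other classes."""
    pos = [v for v, lab in zip(values, labels) if lab == positive]
    neg = [v for v, lab in zip(values, labels) if lab != positive]
    spread = statistics.pstdev(pos) + statistics.pstdev(neg) + eps  # eps avoids 0/0
    return (statistics.mean(pos) - statistics.mean(neg)) / spread


def top_genes(expr, labels, positive, n):
    """Rank genes (rows of expr) by |SN| and return the indices of the n best."""
    scores = [signal_to_noise(row, labels, positive) for row in expr]
    return sorted(range(len(expr)), key=lambda g: abs(scores[g]), reverse=True)[:n]


# Hypothetical expression matrix: 3 genes x 4 samples.
labels = ["EWS", "EWS", "NB", "NB"]
expr = [[5, 6, 1, 2],
        [3, 3.5, 3, 3.2],
        [0, 1, 9, 8]]
print(top_genes(expr, labels, "EWS", 2))  # [2, 0]
```

Gene 2 is strongly down-regulated in EWS and gene 0 is up-regulated, so both outrank the nearly flat gene 1. Taking the absolute value keeps markers in either direction.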
Abstract:
In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases. In high-dimensional spaces, computing distances between points (e.g., Euclidean distance) accounts for a dominant portion of the overall query response time in in-memory processing. To reduce distance computation, we first propose a structure (BID) that uses BIt-Difference to answer approximate KNN queries. BID uses one bit to represent each feature of a point's feature vector, and the number of differing bits is used to prune distant points. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, a cluster-adapted bitcoder and dimensional weights; the enhanced structure is called BID⁺. Extensive experiments show that the proposed method yields significant performance advantages over existing index structures on both real-life and synthetic high-dimensional datasets.
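The filter-and-refine idea behind BID can be illustrated with a small sketch. Every point is reduced to one bit per dimension. The number of differing bits, computed as the popcount of an XOR, then selects a short candidate list cheaply. Exact Euclidean distances are computed only for that list. The encoding rule used here (a bit is set when a value exceeds that dimension's median) and the fixed candidate count are assumptions for illustration. The paper's bit coder and the clustering and dimensional weights of BID⁺ are not reproduced.

```python
import math
import statistics


def build_codes(points):
    """Encode each point as an int with one bit per dimension (1 if above that dimension's median)."""
    medians = [statistics.median(col) for col in zip(*points)]
    codes = []
    for p in points:
        code = 0
        for d, (v, med) in enumerate(zip(p, medians)):
            if v > med:
                code |= 1 << d
        codes.append(code)
    return medians, codes


def approx_knn(points, medians, codes, q, k=2, candidates=4):
    """Filter by bit difference, then rank the survivors by exact Euclidean distance."""
    q_code = sum(1 << d for d, (v, med) in enumerate(zip(q, medians)) if v > med)
    # Cheap step: number of differing bits = popcount of the XOR of the codes.
    by_bits = sorted(range(len(points)), key=lambda i: bin(codes[i] ^ q_code).count("1"))
    shortlist = by_bits[:candidates]
    # Expensive step: exact distances only for the shortlist.
    return sorted(shortlist, key=lambda i: math.dist(points[i], q))[:k]


points = [(0, 0), (0.1, 0.1), (5, 5), (5.1, 4.9), (0, 5), (5, 0)]
medians, codes = build_codes(points)
print(approx_knn(points, medians, codes, (0.05, 0.0)))  # [0, 1]
```

The result is approximate because a true neighbour can be pruned if it falls on the other side of a median in several dimensions. Raising `candidates` trades speed for recall.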
Abstract:
We investigate whether dimensionality reduction using a latent generative model is beneficial for weakly supervised scene classification. Specifically, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach first discovers latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here applied to a bag-of-visual-words representation of each image. It then trains a multiway classifier on the topic distribution vector of each image. We compare this approach with representing each image directly by a bag-of-visual-words vector and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors. We then investigate classification performance under changes in the size of the visual vocabulary, the number of latent topics learned and the type of discriminative classifier used (k-nearest neighbor or SVM). In all cases, using the authors' own data sets and testing protocols, we achieve better classification performance than recent publications that have used a bag-of-visual-words representation. We also investigate the gain from adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
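The bag-of-visual-words representation that both pipelines start from can be sketched as follows. Each local descriptor (for example, a dense colour SIFT vector) is assigned to its nearest visual word. The word counts then form the image's vector, which is what pLSA models as document-word counts. The vocabulary is assumed to be given, typically learned with k-means. The two-dimensional toy descriptors stand in for real SIFT-based descriptors.

```python
import math


def bovw_counts(descriptors, vocabulary):
    """Quantise each local descriptor to its nearest visual word and count word occurrences."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda w: math.dist(vocabulary[w], d))
        counts[word] += 1
    return counts


vocabulary = [(0, 0), (10, 10)]                 # hypothetical 2-word vocabulary
descriptors = [(1, 1), (0, 2), (9, 9)]          # hypothetical local descriptors of one image
print(bovw_counts(descriptors, vocabulary))     # [2, 1]
```

The count vector can be fed to a k-NN or SVM classifier directly, which is the baseline above. Alternatively, pLSA can first compress it into a topic distribution.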
Abstract:
This work proposes a statistical method for classifying sags according to whether they originate downstream or upstream of the recording point. The goal is to obtain a statistical model from the sag waveforms that characterises one type of sag and discriminates it from the other type. The model is built using multi-way principal component analysis and is then used to project the available registers into a new, lower-dimensional space. A case base of diagnosed sags is thus built in the projection space. Classification is done by comparing new sags against those already in the case base. Similarity is defined in the projection space using a combination of distances to retrieve the nearest neighbours of the new sag. Finally, the method assigns the origin of the new sag according to the origins of its neighbours.
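Multi-way PCA on sag recordings typically starts by unfolding the three-way array (sags × time samples × variables) into a matrix with one row per sag. Each column is then mean-centred and ordinary PCA is applied. The sketch below follows these steps but keeps only the first principal component, found by power iteration. This is a simplification: the abstract does not give the number of retained components, the scaling used or the combination of distances used to retrieve neighbours.

```python
import math


def unfold(batches):
    """Batch-wise unfolding: each sag (time samples x variables) becomes one row."""
    return [[v for sample in batch for v in sample] for batch in batches]


def fit_first_pc(rows, iters=200):
    """Return column means and the first principal direction of the unfolded data."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    X = [[v - m for v, m in zip(r, means)] for r in rows]
    v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        # Power iteration step: w = X^T X v (the covariance matrix up to a constant factor).
        xv = [sum(a * b for a, b in zip(r, v)) for r in X]
        w = [sum(X[i][j] * xv[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # data has no variance along v
            break
        v = [c / norm for c in w]
    return means, v


def project(row, means, direction):
    """Score of one unfolded sag on the principal direction."""
    return sum((a - m) * b for a, m, b in zip(row, means, direction))


# Hypothetical case base: 3 sags, 2 time samples, 1 variable each.
batches = [[[1.0], [1.0]], [[2.0], [2.0]], [[3.0], [3.0]]]
rows = unfold(batches)
means, pc = fit_first_pc(rows)
scores = [project(r, means, pc) for r in rows]
```

A new sag is unfolded in the same way and projected with `project`. Its nearest stored scores are then the neighbours whose origins are used to label it.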
Abstract:
We agree with Duckrow and Albano [Phys. Rev. E 67, 063901 (2003)] and Quian Quiroga et al. [Phys. Rev. E 67, 063902 (2003)] that mutual information (MI) is a useful measure of dependence for electroencephalogram (EEG) data. However, we show that the improvement seen in the performance of MI in extracting dependence trends from EEG depends more on the type of MI estimator than on any embedding technique used. In an independent study searching for an optimal MI estimator, in particular for EEG applications, we examined the performance of a number of MI estimators. We used the data set from the original study by Quian Quiroga et al., which investigated the performance of different dependence measures on real data [Phys. Rev. E 65, 041903 (2002)]. We show that, for EEG applications, the best performance among the investigated estimators is achieved by the k-nearest neighbors estimator. This supports the conjecture by Quian Quiroga et al. in Phys. Rev. E 67, 063902 (2003) that the nearest neighbor estimator is the most precise method for estimating MI.
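The k-nearest-neighbour MI estimator is commonly taken to be the one by Kraskov, Stögbauer and Grassberger. Assuming that is the estimator meant here, the following is a brute-force sketch of its first variant for two scalar signals. For each sample, it finds the distance eps to the k-th neighbour in the joint space under the max-norm. It then counts the points closer than eps in each marginal and computes I(X;Y) = psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)>, where <...> is the average over samples. The method is intended for continuous data without ties and takes O(N²) time. This is not the code used in the study.

```python
import math


def digamma_int(n):
    """Digamma for positive integers: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, n))


def ksg_mutual_information(x, y, k=3):
    """Kraskov-Stoegbauer-Grassberger (variant 1) estimate of I(X;Y) in nats, O(N^2)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Max-norm distances in the joint (x, y) space to every other sample.
        d = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i)
        eps = d[k - 1]  # distance to the k-th nearest neighbour
        # Marginal counts of points strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


x = [float(i) for i in range(50)]
print(ksg_mutual_information(x, x))  # large: y is a copy of x
```

For EEG, x and y would be samples from two channels, possibly after the embedding step discussed above.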