971 results for K-Nearest Neighbors
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa for the degree of Master in Computer Engineering (Engenharia Informática)
Abstract:
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
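The abstract does not name the specific discretization or selection methods, so the following is only a minimal sketch of the general idea using stand-in techniques: equal-width binning for the unsupervised discretization, and ranking features by the entropy of their discretized values for the unsupervised selection. The function names, the bin count, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def equal_width_discretize(column, n_bins=4):
    """Map each value to a bin index in [0, n_bins - 1] (stand-in technique)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; put everything in bin 0.
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def entropy(codes):
    """Shannon entropy (in bits) of a list of discrete codes."""
    n = len(codes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(codes).values())


def discretize_and_select(rows, n_bins=4, n_keep=2):
    """Discretize every feature, then keep the n_keep highest-entropy ones.

    No class labels are used at any point, so both steps are unsupervised.
    Returns the kept feature indices and the reduced, discretized dataset.
    """
    columns = list(zip(*rows))
    coded = [equal_width_discretize(list(col), n_bins) for col in columns]
    ranked = sorted(range(len(coded)), key=lambda j: entropy(coded[j]), reverse=True)
    kept = sorted(ranked[:n_keep])
    reduced = [[coded[j][i] for j in kept] for i in range(len(rows))]
    return kept, reduced


# Hypothetical toy data: feature 1 is constant and should be discarded.
rows = [
    [0.0, 5.0, 1.0],
    [1.0, 5.0, 1.1],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 1.0],
]
kept, reduced = discretize_and_select(rows, n_bins=4, n_keep=2)
print(kept)
```

Storing small integer bin indices instead of floating-point values is one way such a pipeline can reduce memory use, and dropping low-dispersion features shrinks the input that a downstream classifier has to handle.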
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa for the degree of Master in Computer Engineering (Engenharia de Informática)
Abstract:
Work presented as part of the Master's in Computer Engineering (Mestrado em Engenharia Informática), as a partial requirement for the degree of Master in Computer Engineering
Abstract:
The development of technologies associated with Remote Sensing and Geographic Information Systems is increasingly on the agenda. Thanks to this development of methods for accelerating the production of geographic information, the geometric, spectral, and radiometric resolution of images keeps increasing and, at the same time, new applications are emerging that aim to facilitate image processing and analysis through improved information extraction algorithms. The high-resolution images from the WorldView 2 satellite and the recent Envi 5.0 software, both used in this study, are a result of this trend. The main objective of this work is to develop a land use mapping project for the city of Maputo, based on the processing and exploration of a high-resolution image. It compares the strengths and limitations of the results extracted through “pixel-by-pixel” classification, using the Maximum Likelihood algorithm, with the strengths and possible limitations of object-oriented classification, using the K Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms, when extracting the same number and type of land cover/land use classes. In the “pixel-by-pixel” classification, with the Maximum Likelihood classification algorithm, two types of sample were tested: a first consisting of 20 land cover/land use classes, and a second consisting of 18 classes. After the experimentation phase, the results obtained with the first sample fell short of expectations, since many classification errors were observed. The second sample, formulated on the basis of these classification errors and with the aim of minimizing them, produced a result close to the expectations initially envisaged, in which the classes of interest coincide with the geographic reality of the city of Maputo.
In the object-oriented classification, four methodological steps were used: assigning the value 5 for segmentation and 90 for segment merging; selecting 15 examples from the generated segments for each class of interest; using differently distributed bands to compute the spectral and texture attributes; and using the shape attributes Elongation and Form Factor together with the KNN and SVM algorithms. Comparing the images resulting from the two approaches showed that the map produced by the “pixel-by-pixel” classification has a higher level of detail than the maps resulting from the object-oriented classification. This difference in level of detail is explained by the minimum processing unit of each classifier: in the first approach the minimum unit is the pixel, which yields greater detail, whereas the second approach uses a set of pixels (an object) as its minimum unit, which leads to generalization. Overall, the extracted shapes of the elements and the distribution of the classes of interest correspond to the geographic reality, and the results are good compared with what is usual in semi-automatic processing.
Abstract:
Integrated master's dissertation in Information Systems Engineering and Management (Engenharia e Gestão de Sistemas de Informação)
Abstract:
Land cover classification is a key research field in remote sensing and land change science as thematic maps derived from remotely sensed data have become the basis for analyzing many socio-ecological issues. However, land cover classification remains a difficult task and it is especially challenging in heterogeneous tropical landscapes where nonetheless such maps are of great importance. The present study aims to establish an efficient classification approach to accurately map all broad land cover classes in a large, heterogeneous tropical area of Bolivia, as a basis for further studies (e.g., land cover-land use change). Specifically, we compare the performance of parametric (maximum likelihood), non-parametric (k-nearest neighbour and four different support vector machines - SVM), and hybrid classifiers, using both hard and soft (fuzzy) accuracy assessments. In addition, we test whether the inclusion of a textural index (homogeneity) in the classifications improves their performance. We classified Landsat imagery for two dates corresponding to dry and wet seasons and found that non-parametric, and particularly SVM classifiers, outperformed both parametric and hybrid classifiers. We also found that the use of the homogeneity index along with reflectance bands significantly increased the overall accuracy of all the classifications, but particularly of SVM algorithms. We observed that improvements in producer’s and user’s accuracies through the inclusion of the homogeneity index were different depending on land cover classes. Early-growth/degraded forests, pastures, grasslands and savanna were the classes most improved, especially with the SVM radial basis function and SVM sigmoid classifiers, though with both classifiers all land cover classes were mapped with producer’s and user’s accuracies of around 90%.
Our approach seems very well suited to accurately map land cover in tropical regions, thus having the potential to contribute to conservation initiatives, climate change mitigation schemes such as REDD+, and rural development policies.
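As a brief illustration of the accuracy measures mentioned in this abstract, the sketch below computes overall, producer's, and user's accuracy from a confusion matrix. It assumes the convention that rows are reference (ground-truth) classes and columns are mapped classes; some texts use the transpose. The function name and the two-class counts are hypothetical and are not taken from the study.

```python
def accuracy_report(confusion):
    """Compute hard accuracy measures from a square confusion matrix.

    Assumed convention: confusion[i][j] is the number of samples whose
    reference class is i and whose mapped (predicted) class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    row_totals = [sum(confusion[i]) for i in range(n)]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Producer's accuracy: share of reference samples of a class mapped correctly.
    producers = [
        confusion[i][i] / row_totals[i] if row_totals[i] else None
        for i in range(n)
    ]
    # User's accuracy: share of samples mapped to a class that truly belong to it.
    users = [
        confusion[j][j] / col_totals[j] if col_totals[j] else None
        for j in range(n)
    ]
    return {"overall": correct / total, "producers": producers, "users": users}


# Hypothetical counts for two classes (say, forest and pasture).
confusion = [
    [45, 5],   # reference forest: 45 mapped as forest, 5 as pasture
    [10, 40],  # reference pasture: 10 mapped as forest, 40 as pasture
]
report = accuracy_report(confusion)
print(report)
```

Producer's accuracy corresponds to what machine learning texts call recall, and user's accuracy to precision, which is why the two can improve by different amounts for the same class.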
Abstract:
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual words representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
Abstract:
This project presents Doppelganger, an application for mobile devices. Given a photograph, it detects the face and shows the celebrity in our database who most resembles the person in the photograph. The implementation uses computer vision and machine learning algorithms such as PCA and K-Nearest Neighbor, relying on free libraries such as OpenCV.
Abstract:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). A simple k-nearest neighbor algorithm is considered as a benchmark model. PNN is a neural network reformulation of well-known nonparametric principles of probability density modeling using a kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNNs is that they can be easily used in decision support systems dealing with problems of automatic classification. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. SVMs have recently been applied successfully to various environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper, both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
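To make the two classifiers described in this abstract concrete, here is a minimal sketch of a k-nearest neighbor benchmark next to a PNN-style classifier. It assumes an isotropic Gaussian kernel with a single width `sigma` and, by default, priors equal to the empirical class frequencies; the function names and the toy coordinates are hypothetical and are not the paper's implementation.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Benchmark: majority vote among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: sq_dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def pnn_posteriors(train_x, train_y, query, sigma=1.0, priors=None):
    """PNN-style posterior class probabilities at the query point.

    Each class density is a Gaussian kernel density estimate; the shared
    normalization constant cancels, so it is omitted. The prediction under
    the maximum a posteriori rule is the class with the largest posterior.
    """
    counts = Counter(train_y)
    if priors is None:
        # Assumption: use empirical class frequencies as prior information.
        priors = {c: n / len(train_y) for c, n in counts.items()}
    kernel_sums = {c: 0.0 for c in counts}
    for x, y in zip(train_x, train_y):
        kernel_sums[y] += math.exp(-sq_dist(x, query) / (2 * sigma ** 2))
    scores = {c: priors[c] * kernel_sums[c] / counts[c] for c in counts}
    total = sum(scores.values())
    if total == 0.0:
        # Every kernel underflowed (query far from all data): fall back to priors.
        return dict(priors)
    return {c: s / total for c, s in scores.items()}


# Hypothetical 2-D spatial samples with two classes.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

query = (0.5, 0.5)
label = knn_predict(train_x, train_y, query, k=3)
posteriors = pnn_posteriors(train_x, train_y, query, sigma=1.0)
print(label, posteriors)
```

Unlike the k-NN vote, the PNN output is a probability per class, which is what allows the quantification of accuracy and the integration of prior information (through the `priors` argument) mentioned in the abstract.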
Abstract:
A model for the study of hysteresis and avalanches in a first-order phase transition from a single variant phase to a multivariant phase is presented. The model is based on a modification of the random-field Potts model with metastable dynamics by adding a dipolar interaction term truncated at nearest neighbors. We focus our study on hysteresis loop properties, on the three-dimensional microstructure formation, and on avalanche statistics.
Abstract:
The paper deals with the development and application of a generic methodology for automatic processing (mapping and classification) of environmental data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve the problem of spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classifications. The automatic tuning of isotropic and anisotropic GRNN/PNN models using a cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using an independent validation data set. Real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.
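The tuning procedure described in this abstract can be sketched as follows, under stated assumptions: the GRNN is written as a Gaussian kernel-weighted average with one isotropic width `sigma`, the width is chosen from a small candidate grid by leave-one-out cross-validation, and the k-NN interpolator is the plain mean of the k nearest values. Function names, the candidate grid, and the synthetic data are hypothetical; an anisotropic model would use one width per coordinate instead.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def grnn_predict(xs, ys, query, sigma):
    """GRNN estimate: kernel-weighted average of the training values."""
    weights = [math.exp(-sq_dist(x, query) / (2 * sigma ** 2)) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed: fall back to the global mean.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


def loo_mse(xs, ys, sigma):
    """Leave-one-out mean squared error of the GRNN for a given sigma."""
    err = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        err += (grnn_predict(rest_x, rest_y, xs[i], sigma) - ys[i]) ** 2
    return err / len(xs)


def tune_sigma(xs, ys, candidates):
    """Pick the candidate width with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mse(xs, ys, s))


def knn_interpolate(xs, ys, query, k=3):
    """Baseline: mean of the values at the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda p: sq_dist(p[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical 1-D transect: value grows linearly with position.
xs = [(float(i),) for i in range(10)]
ys = [2.0 * i for i in range(10)]

candidates = [0.3, 1.0, 3.0]
best_sigma = tune_sigma(xs, ys, candidates)
grnn_value = grnn_predict(xs, ys, (4.5,), best_sigma)
knn_value = knn_interpolate(xs, ys, (4.5,), k=2)
print(best_sigma, grnn_value, knn_value)
```

Because the only free parameter is the kernel width, the cross-validation loop is what makes the model "automatic" in the sense of the abstract; a fair comparison against k-NN would then be made on data held out from both the fitting and the tuning.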
Abstract:
Distortions in a family of conjugated polymers are studied using two complementary approaches: within a many-body valence bond approach using a transfer-matrix technique to treat the Heisenberg model of the systems, and also in terms of the tight-binding band-theoretic model with interactions limited to nearest neighbors. The computations indicate that both methods predict the presence or absence of the same distortions in most of the polymers studied.
Abstract:
Over the past three decades, pedotransfer functions (PTFs) have been widely used by soil scientists to estimate soil properties in temperate regions in response to the lack of soil data for these regions. Several authors indicated that little effort has been dedicated to the prediction of soil properties in the humid tropics, where the need for soil property information is of even greater priority. The aim of this paper is to provide an up-to-date repository of past and recently published articles as well as papers from proceedings of events dealing with water-retention PTFs for soils of the humid tropics. Of the 35 publications found in the literature on PTFs for prediction of water retention of soils of the humid tropics, 91 % of the PTFs are based on an empirical approach, and only 9 % are based on a semi-physical approach. Of the empirical PTFs, 97 % are continuous, and 3 % (one) is a class PTF; of the empirical PTFs, 97 % are based on multiple linear and nth-order polynomial regression techniques, and 3 % (one) is based on the k-Nearest Neighbor approach; 84 % of the continuous PTFs are point-based, and 16 % are parameter-based; 97 % of the continuous PTFs are equation-based PTFs, and 3 % (one) is based on pattern recognition. Additionally, it was found that 26 % of the tropical water-retention PTFs were developed for soils in Brazil, 26 % for soils in India, 11 % for soils in other countries in America, and 11 % for soils in other countries in Africa.
Abstract:
Random scale-free networks have the peculiar property of being prone to the spreading of infections. Here we provide an exact result for the susceptible-infected-susceptible model, showing that a scale-free degree distribution with diverging second moment is a sufficient condition for a null epidemic threshold in unstructured networks with either assortative or disassortative mixing. Degree correlations are therefore irrelevant for the epidemic spreading picture in these scale-free networks. The present result is related to the divergence of the average nearest neighbors degree, enforced by the degree detailed balance condition.