91 results for kNN


Relevance:

20.00% 20.00%

Publisher:

Abstract:

中国计算机学会

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases. In high-dimensional spaces, the cost of computing the distance (e.g., Euclidean distance) between two points accounts for a dominant portion of the overall query response time in in-memory processing. To reduce the number of distance computations, we first propose a structure (BID) that uses BIt-Difference to answer approximate KNN queries. BID employs one bit to represent each feature vector of a point, and the number of bit differences is used to prune the more distant points. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, a cluster-adapted bitcoder and dimensional weights; the enhanced structure is named BID⁺. Extensive experiments show that the proposed method yields significant performance advantages over existing index structures on both real-life and synthetic high-dimensional datasets.
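As a rough illustration of the bit-difference idea in this abstract, the Python sketch below encodes a point with one bit per dimension and uses the number of differing bits to discard candidates before any exact Euclidean distance is computed. This is not the paper's BID or BID⁺ structure: the one-bit-per-dimension reading, the per-dimension mean used as threshold, the function names and the `candidate_factor` parameter are all assumptions made for illustration.

```python
import math

def bit_encode(vec, thresholds):
    """Pack one bit per dimension: 1 if the value exceeds that dimension's threshold."""
    code = 0
    for value, threshold in zip(vec, thresholds):
        code = (code << 1) | (1 if value > threshold else 0)
    return code

def bit_difference(a, b):
    """Number of differing bits (Hamming distance) between two codes."""
    return bin(a ^ b).count("1")

def approx_knn(data, query, k, candidate_factor=3):
    """Approximate kNN: filter by bit difference, then rank survivors exactly.

    candidate_factor (hypothetical knob) controls how many points survive
    the cheap bit-difference filter before exact distances are computed.
    """
    dims = len(query)
    # Assumed threshold: the mean of each dimension over the dataset.
    thresholds = [sum(p[d] for p in data) / len(data) for d in range(dims)]
    codes = [bit_encode(p, thresholds) for p in data]
    query_code = bit_encode(query, thresholds)
    # Cheap step: order all points by bit difference to the query.
    ranked = sorted(range(len(data)),
                    key=lambda i: bit_difference(codes[i], query_code))
    candidates = ranked[:candidate_factor * k]
    # Expensive step: exact distances only for the surviving candidates.
    candidates.sort(key=lambda i: math.dist(data[i], query))
    return candidates[:k]

data = [(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.5), (9, 9)]
print(approx_knn(data, (0.1, 0.1), k=2))
```

A smaller `candidate_factor` computes fewer exact distances but can miss true neighbors, which is the accuracy-for-speed trade-off that approximate KNN indexes make.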

Relevance:

20.00% 20.00%

Publisher:

Abstract:

k-nearest neighbors (kNN) is a popular method for function approximation and classification. One drawback of this method is that the nearest neighbors can all be located on one side of the query point x. An alternative, the natural neighbors method, is expensive for more than three variables. In this paper we propose using the discrete Choquet integral to combine the values of the nearest neighbors so that redundant information is canceled out. We design a fuzzy measure based on the location of the nearest neighbors, which favors neighbors located all around x.
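For readers unfamiliar with the discrete Choquet integral mentioned in this abstract, the following Python sketch shows the standard definition for a caller-supplied fuzzy measure. The location-based measure the authors design is not given in the abstract, so it is not reproduced here; the function names and the example measures are assumptions chosen only to show how the aggregation behaves.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of `values` with respect to a fuzzy measure.

    `measure` maps a frozenset of indices to a number in [0, 1]; it is
    assumed to be monotone, with measure(empty set) = 0 and
    measure(all indices) = 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = 0.0
    previous = 0.0
    for pos, idx in enumerate(order):
        # Indices whose value is at least values[idx].
        coalition = frozenset(order[pos:])
        total += (values[idx] - previous) * measure(coalition)
        previous = values[idx]
    return total

def additive_measure(weights):
    """Additive measure: the weight of a set is the sum of its members' weights."""
    return lambda coalition: sum(weights[i] for i in coalition)

neighbor_values = [1.0, 2.0, 3.0]
# With an additive measure the integral reduces to a weighted mean.
print(choquet_integral(neighbor_values, additive_measure([0.5, 0.3, 0.2])))
# A measure that is 1 only on the full set yields the minimum.
print(choquet_integral(neighbor_values, lambda c: 1.0 if len(c) == 3 else 0.0))
# A measure that is 1 on every non-empty set yields the maximum.
print(choquet_integral(neighbor_values, lambda c: 1.0 if c else 0.0))
```

Non-additive measures sit between these extremes: assigning a group of neighbors less weight than the sum of their individual weights is one way to discount redundant neighbors.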

Relevance:

20.00% 20.00%

Publisher:

Abstract:

With the arrival of the big data era, Internet traffic is growing exponentially. A wide variety of applications have arisen on the Internet, and traffic classification has been introduced to help manage them for security monitoring and quality-of-service purposes. A large number of Machine Learning (ML) algorithms have been applied to traffic classification. A significant challenge to classification performance comes from the imbalanced distribution of data in traffic classification systems. In this paper, we propose an Optimised Distance-based Nearest Neighbor (ODNN) approach, which is capable of improving classification performance on imbalanced traffic data. We analyze the proposed ODNN approach and its performance benefits from both theoretical and empirical perspectives. A large number of experiments were carried out on a real-world traffic dataset. The results show that performance on “small classes” can be improved significantly even with only a small amount of training data, while performance on “large classes” remains stable.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Privacy preservation is an essential aspect of modern recommender systems. However, traditional approaches can hardly provide a rigorous and provable privacy guarantee for recommender systems, especially for those based on collaborative filtering (CF) methods. Recent research has revealed that, by observing the public output of CF, an adversary can infer the historical ratings of a particular user; this is known as the KNN attack and is considered a serious privacy violation for recommender systems. This paper addresses the privacy issue in CF by proposing a Private Neighbor Collaborative Filtering (PriCF) algorithm, which is built on the notion of differential privacy. PriCF contains an essential privacy operation, Private Neighbor Selection, in which Laplace noise is added to hide the identity of neighbors and the ratings of each neighbor. To retain utility, a Recommendation-Aware Sensitivity and a re-designed truncated similarity are introduced to enhance the performance of recommendations. A theoretical analysis shows that the proposed algorithm can resist the KNN attack while retaining the accuracy of recommendations. Experimental results on two real datasets show that the proposed PriCF algorithm retains most of the utility with a fixed privacy budget.
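The Laplace-noise step named in this abstract can be pictured with the generic Laplace mechanism, sketched below in Python: noise with scale sensitivity/epsilon is added to each similarity score before the top-k neighbors are picked. This is not PriCF's Private Neighbor Selection, and it does not implement the Recommendation-Aware Sensitivity or the truncated similarity; the function names, the `sensitivity` and `epsilon` parameters, and the example scores are assumptions for illustration, and no privacy guarantee is claimed for this simplified selection.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_similarities(similarities, sensitivity, epsilon, rng=random):
    """Perturb each similarity score with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return {user: score + laplace_noise(scale, rng)
            for user, score in similarities.items()}

def private_top_k(similarities, k, sensitivity, epsilon, rng=random):
    """Select the k neighbors with the highest noisy similarity."""
    noisy = noisy_similarities(similarities, sensitivity, epsilon, rng)
    return sorted(noisy, key=noisy.get, reverse=True)[:k]

# Hypothetical similarity scores between a target user and three others.
scores = {"u1": 0.9, "u2": 0.5, "u3": 0.1}
print(private_top_k(scores, k=2, sensitivity=1.0, epsilon=0.5,
                    rng=random.Random(42)))
```

A smaller `epsilon` means a larger noise scale, so the selected neighbors drift further from the true nearest ones; this is the privacy-utility tension the abstract refers to.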

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Obtaining single-phase materials with a simultaneous and coupled ferroelectric and (anti-)ferromagnetic response is problematic because of intrinsic physical, structural and electronic limitations. A more realistic alternative, and one that offers somewhat greater flexibility when designing future multiferroic devices, is to prepare composite materials in which magnetoelectric coupling can be achieved by exploiting interfacial effects between dissimilar phases. Such is the case of composites based on BaTiO3 (ferroelectric phase) and NiFe2O4 (magnetic phase), which have already begun to be prepared, mainly by highly energetic deposition techniques. For practical applications, however, it would be desirable to prepare these materials by more sustainable and less costly methods. Accordingly, this work presents a preliminary study of the microstructural evolution of NiFe2O4-BaTiO3-based materials prepared by a soft solution-processing technique, namely hydrothermal synthesis. Specifically, we analyze the influence that various parameters characteristic of hydrothermal processing may have on the generation and distribution of phases and interfaces during the subsequent thermal consolidation of these composites.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Prototype Selection (PS) algorithms allow faster Nearest Neighbor classification by keeping only the most profitable prototypes of the training set. In turn, these schemes typically lower classification accuracy. In this work, a new strategy for multi-label classification tasks is proposed to counter this accuracy drop without using the whole training set. Given a new instance, the PS algorithm is used as a fast recommender system that retrieves the most likely classes. The actual classification is then performed considering only the prototypes from the initial training set that belong to the suggested classes. Results show that this strategy provides a large set of trade-off solutions that fill the gap between PS-based classification efficiency and conventional kNN accuracy. Furthermore, this scheme is not only able, at best, to reach the performance of conventional kNN with barely a third of the distances computed, but it also outperforms the latter in noisy scenarios, proving to be a much more robust approach.
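The two-stage strategy described in this abstract can be sketched as follows in Python, under stated assumptions: the reduced set is taken as already produced by some PS algorithm (none is implemented here), classes are proposed by walking the reduced prototypes from nearest to farthest, and the final decision is a majority vote among the k nearest training points of the proposed classes. The function names and the `n_candidates` parameter are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter

def nearest_labels(points, labels, query, k):
    """Labels of the k points closest to `query` (Euclidean distance)."""
    ranked = sorted(range(len(points)),
                    key=lambda i: math.dist(points[i], query))
    return [labels[i] for i in ranked[:k]]

def two_stage_classify(train_x, train_y, reduced_x, reduced_y,
                       query, k=1, n_candidates=2):
    """Stage 1 proposes classes from the reduced set; stage 2 runs kNN on the
    original training points that belong to the proposed classes only."""
    # Stage 1: collect the first n_candidates distinct classes met when
    # walking the reduced prototypes from nearest to farthest.
    suggested = []
    for label in nearest_labels(reduced_x, reduced_y, query, len(reduced_x)):
        if label not in suggested:
            suggested.append(label)
        if len(suggested) == n_candidates:
            break
    # Stage 2: kNN restricted to training points of the suggested classes.
    subset = [(x, y) for x, y in zip(train_x, train_y) if y in suggested]
    xs = [x for x, _ in subset]
    ys = [y for _, y in subset]
    votes = Counter(nearest_labels(xs, ys, query, k))
    return votes.most_common(1)[0][0]

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5),
           (10, 0), (10, 1), (11, 0)]
train_y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
# Assumed output of a PS algorithm: one prototype kept per class.
reduced_x = [(0, 0), (5, 5), (10, 0)]
reduced_y = ["a", "b", "c"]
print(two_stage_classify(train_x, train_y, reduced_x, reduced_y, (1, 1), k=3))
```

In this toy run, stage 2 computes distances to six of the nine training points rather than all of them; raising `n_candidates` moves the result toward conventional kNN at higher cost.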

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In the current Information Age, data production and processing demands are ever increasing. This has motivated the appearance of large-scale distributed information. This phenomenon also applies to Pattern Recognition, so that classic and common algorithms, such as the k-Nearest Neighbour, cannot be used. To improve the efficiency of this classifier, Prototype Selection (PS) strategies can be used. Nevertheless, current PS algorithms were not designed to deal with distributed data, and their performance under these conditions is therefore unknown. This work presents an experimental study on a simulated framework in which PS strategies can be compared under classical conditions as well as those expected in distributed scenarios. Our results show that performance generally degrades as conditions approach more realistic scenarios. However, our experiments also show that some methods achieve performance fairly similar to that of the non-distributed scenario. Thus, although there is a clear need to develop specific PS methodologies and algorithms for these situations, the methods that proved more robust under such conditions may be good candidates from which to start.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper describes a novel framework for facial expression recognition from still images that selects, optimizes and fuses ‘salient’ Gabor feature layers to recognize six universal facial expressions using the K-nearest neighbor classifier. Recognition comparisons with the all-layer approach on the JAFFE and Cohn-Kanade (CK) databases confirm that using ‘salient’ Gabor feature layers with optimized sizes achieves better recognition performance and dramatically reduces computation time. Moreover, comparisons with state-of-the-art results demonstrate the effectiveness of our approach.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult, and the rapidly increasing volume of online documents creates a bottleneck in finding meaningful textual patterns. To deal with these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm based on the study of the ontological structure is also introduced to reduce dimensionality. The proposed approach was evaluated by comparison with typical machine learning methods, including SVMs, Rocchio and kNN, with promising results.