938 resultados para K-Nearest Neighbor
Resumo:
To enhance the performance of the k-nearest neighbors approach in forecasting short-term traffic volume, this paper proposed and tested a two-step approach with the ability of forecasting multiple steps. In selecting k-nearest neighbors, a time constraint window is introduced, and then local minima of the distances between the state vectors are ranked to avoid overlappings among candidates. Moreover, to control extreme values’ undesirable impact, a novel algorithm with attractive analytical features is developed based on the principle component. The enhanced KNN method has been evaluated using the field data, and our comparison analysis shows that it outperformed the competing algorithms in most cases.
Resumo:
Feature selection and feature weighting are useful techniques for improving the classification accuracy of K-nearest-neighbor (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to the ability of the feature to distinguish pattern classes. In this paper, a novel hybrid approach is proposed for simultaneous feature selection and feature weighting of K-NN rule based on Tabu Search (TS) heuristic. The proposed TS heuristic in combination with K-NN classifier is compared with several classifiers on various available data sets. The results have indicated a significant improvement in the performance in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. Experiments performed revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
Resumo:
Racing algorithms have recently been proposed as a general-purpose method for performing model selection in machine teaming algorithms. In this paper, we present an empirical study of the Hoeffding racing algorithm for selecting the k parameter in a simple k-nearest neighbor classifier. Fifteen widely-used classification datasets from UCI are used and experiments conducted across different confidence levels for racing. The results reveal a significant amount of sensitivity of the k-nn classifier to its model parameter value. The Hoeffding racing algorithm also varies widely in its performance, in terms of the computational savings gained over an exhaustive evaluation. While in some cases the savings gained are quite small, the racing algorithm proved to be highly robust to the possibility of erroneously eliminating the optimal models. All results were strongly dependent on the datasets used.
Resumo:
This paper introduces an algorithm that uses boosting to learn a distance measure for multiclass k-nearest neighbor classification. Given a family of distance measures as input, AdaBoost is used to learn a weighted distance measure, that is a linear combination of the input measures. The proposed method can be seen both as a novel way to learn a distance measure from data, and as a novel way to apply boosting to multiclass recognition problems, that does not require output codes. In our approach, multiclass recognition of objects is reduced into a single binary recognition task, defined on triples of objects. Preliminary experiments with eight UCI datasets yield no clear winner among our method, boosting using output codes, and k-nn classification using an unoptimized distance measure. Our algorithm did achieve lower error rates in some of the datasets, which indicates that, in some domains, it may lead to better results than existing methods.
Resumo:
Large probabilistic graphs arise in various domains spanning from social networks to biological and communication networks. An important query in these graphs is the k nearest-neighbor query, which involves finding and reporting the k closest nodes to a specific node. This query assumes the existence of a measure of the "proximity" or the "distance" between any two nodes in the graph. To that end, we propose various novel distance functions that extend well known notions of classical graph theory, such as shortest paths and random walks. We argue that many meaningful distance functions are computationally intractable to compute exactly. Thus, in order to process nearest-neighbor queries, we resort to Monte Carlo sampling and exploit novel graph-transformation ideas and pruning opportunities. In our extensive experimental analysis, we explore the trade-offs of our approximation algorithms and demonstrate that they scale well on real-world probabilistic graphs with tens of millions of edges.
Resumo:
Intrinsic and extrinsic speaker normalization methods are systematically compared using a neural network (fuzzy ARTMAP) and L1 and L2 K-Nearest Neighbor (K-NN) categorizers trained and tested on disjoint sets of speakers of the Peterson-Barney vowel database. Intrinsic methods include one nonscaled, four psychophysical scales (bark, bark with endcorrection, mel, ERB), and three log scales, each tested on four combinations of F0 , F1, F2, F3. Extrinsic methods include four speaker adaptation schemes, each combined with the 32 intrinsic methods: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). ARTMAP and KNN show similar trends, with K-NN performing better, but requiring about ten times as much memory. The optimal intrinsic normalization method is bark scale, or bark with endcorrection, using the differences between all frequencies (Diff All). The order of performance for the extrinsic methods is LT, CSi, LS, and CS, with fuzzy ARTMAP performing best using bark scale with Diff All; and K-NN choosing psychophysical measures for all except CSi.
Resumo:
The main focus of this thesis is to evaluate and compare Hyperbalilearning algorithm (HBL) to other learning algorithms. In this work HBL is compared to feed forward artificial neural networks using back propagation learning, K-nearest neighbor and 103 algorithms. In order to evaluate the similarity of these algorithms, we carried out three experiments using nine benchmark data sets from UCI machine learning repository. The first experiment compares HBL to other algorithms when sample size of dataset is changing. The second experiment compares HBL to other algorithms when dimensionality of data changes. The last experiment compares HBL to other algorithms according to the level of agreement to data target values. Our observations in general showed, considering classification accuracy as a measure, HBL is performing as good as most ANn variants. Additionally, we also deduced that HBL.:s classification accuracy outperforms 103's and K-nearest neighbour's for the selected data sets.
Resumo:
A two-stage iterative algorithm for selecting a subset of a training set of samples for use in a condensed nearest neighbor (CNN) decision rule is introduced. The proposed method uses the concept of mutual nearest neighborhood for selecting samples close to the decision line. The efficacy of the algorithm is brought out by means of an example.
Resumo:
Let n points be placed independently in d-dimensional space according to the density f(x) = A(d)e(-lambda parallel to x parallel to alpha), lambda, alpha > 0, x is an element of R-d, d >= 2. Let d(n) be the longest edge length of the nearest-neighbor graph on these points. We show that (lambda(-1) log n)(1-1/alpha) d(n) - b(n) converges weakly to the Gumbel distribution, where b(n) similar to ((d - 1)/lambda alpha) log log n. We also prove the following strong law for the normalized nearest-neighbor distance (d) over tilde (n) = (lambda(-1) log n)(1-1/alpha) d(n)/log log n: (d - 1)/alpha lambda <= lim inf(n ->infinity) (d) over tilde (n) <= lim sup(n ->infinity) (d) over tilde (n) <= d/alpha lambda almost surely. Thus, the exponential rate of decay alpha = 1 is critical, in the sense that, for alpha > 1, d(n) -> 0, whereas, for alpha <= 1, d(n) -> infinity almost surely as n -> infinity.
Resumo:
We consider a discrete agent-based model on a one-dimensional lattice, where each agent occupies L sites and attempts movements over a distance of d lattice sites. Agents obey a strict simple exclusion rule. A discrete-time master equation is derived using a mean-field approximation and careful probability arguments. In the continuum limit, nonlinear diffusion equations that describe the average agent occupancy are obtained. Averaged discrete simulation data are generated and shown to compare very well with the solution to the derived nonlinear diffusion equations. This framework allows us to approach a lattice-free result using all the advantages of lattice methods. Since different cell types have different shapes and speeds of movement, this work offers insight into population-level behavior of collective cellular motion.
Resumo:
Recommender systems aggregate individual user ratings into predictions of products or services that might interest visitors. The quality of this aggregation process crucially affects the user experience and hence the effectiveness of recommenders in e-commerce. We present a characterization of nearest-neighbor collaborative filtering that allows us to disaggregate global recommender performance measures into contributions made by each individual rating. In particular, we formulate three roles-scouts, promoters, and connectors-that capture how users receive recommendations, how items get recommended, and how ratings of these two types are themselves connected, respectively. These roles find direct uses in improving recommendations for users, in better targeting of items and, most importantly, in helping monitor the health of the system as a whole. For instance, they can be used to track the evolution of neighborhoods, to identify rating subspaces that do not contribute ( or contribute negatively) to system performance, to enumerate users who are in danger of leaving, and to assess the susceptibility of the system to attacks such as shilling. We argue that the three rating roles presented here provide broad primitives to manage a recommender system and its community.