971 results for K-nearest neighbors method


Relevance:

100.00% 100.00%

Publisher:

Abstract:

To enhance the performance of the k-nearest neighbors approach in forecasting short-term traffic volume, this paper proposes and tests a two-step approach capable of multi-step forecasting. In selecting the k nearest neighbors, a time constraint window is introduced, and the local minima of the distances between the state vectors are then ranked to avoid overlaps among candidates. Moreover, to control the undesirable impact of extreme values, a novel algorithm with attractive analytical features is developed based on the principal component. The enhanced KNN method was evaluated using field data, and our comparative analysis shows that it outperformed the competing algorithms in most cases.
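The neighbor-selection idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the Euclidean distance, the way the time window is expressed (`period` and `window`), and the plain averaging of successors are all assumptions made for illustration, and the principal-component step for extreme values is left out.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_forecast(series, lag, k, horizon=1, period=None, window=0):
    """Forecast `horizon` steps by averaging what followed the k past state
    vectors most similar to the latest one (hypothetical helper)."""
    n = len(series)
    q_start = n - lag
    query = series[q_start:]
    # A candidate window starting at i must be followed by `horizon` known values.
    last_start = n - lag - horizon
    dists = [euclid(series[i:i + lag], query) for i in range(last_start + 1)]

    candidates = []
    for i, d in enumerate(dists):
        # Assumed time constraint: keep only windows whose position within the
        # cycle (e.g. time of day) is within `window` steps of the query's.
        if period is not None:
            gap = abs(i - q_start) % period
            if min(gap, period - gap) > window:
                continue
        # Keep only local minima of the distance curve, so that several
        # overlapping windows around the same match are not all selected.
        left = dists[i - 1] if i > 0 else math.inf
        right = dists[i + 1] if i + 1 < len(dists) else math.inf
        if d <= left and d <= right:
            candidates.append((d, i))
    if not candidates:
        raise ValueError("no candidate neighbors; widen the window or add history")

    chosen = sorted(candidates)[:k]  # rank the local minima by distance
    return [sum(series[i + lag + h] for _, i in chosen) / len(chosen)
            for h in range(horizon)]
```

With traffic counts sampled every 5 minutes, for example, `period=288` and `window=6` would restrict candidates to windows within half an hour of the same time of day; those values are illustrative only.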

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The correct classification of sugar according to its physico-chemical characteristics directly influences the value of the product and its acceptance by the market. This study shows that using an electronic tongue system along with established supervised learning techniques leads to the correct classification of sugar samples according to their qualities. In this paper, we offer two new real, public and non-encoded sugar datasets whose attributes were automatically collected using an electronic tongue, with and without pH control. Moreover, we compare the performance achieved by several established machine learning methods. Our experiments were carefully designed to ensure statistically sound results, and they indicate that the k-nearest neighbors method outperforms the other evaluated classifiers and, hence, can be used as a good baseline for further comparison. © 2012 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A novel near-infrared spectroscopy (NIRS) method has been researched and developed for the simultaneous analysis of the chemical components and associated properties of mint (Mentha haplocalyx Briq.) tea samples. The common analytes were: total polysaccharide content, total flavonoid content, total phenolic content, and total antioxidant activity. To resolve the NIRS data matrix for such analyses, least squares support vector machines was found to be the best chemometrics method for prediction, although it was closely followed by the radial basis function/partial least squares model. Interestingly, the commonly used partial least squares was unsatisfactory in this case. Additionally, principal component analysis and hierarchical cluster analysis were able to distinguish the mint samples according to their four geographical provinces of origin, and this was further facilitated by the use of the chemometrics classification methods: K-nearest neighbors, linear discriminant analysis, and partial least squares discriminant analysis. In general, given the potential savings in sampling and analysis time, as well as in the costs of the special analytical reagents required for the standard individual methods, NIRS offered a very attractive alternative for the simultaneous analysis of mint samples.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper introduces a novel method for gene selection based on a modification of the analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal-to-noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance and ReliefF. Four benchmark microarray datasets (diffuse large B-cell lymphoma, leukemia cancer, prostate and colon) are used for the experiments. As the number of samples in microarray datasets is limited, the leave-one-out cross-validation strategy is applied rather than the traditional cross-validation. Experimental results demonstrate the significant dominance of the proposed MAHP over the competing methods in terms of both accuracy and stability. With the benefit of inexpensive computational cost, MAHP is useful for cancer diagnosis using DNA gene expression profiles in real clinical practice.
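The leave-one-out strategy mentioned above can be sketched for a k-nearest neighbors classifier as follows. This is a generic illustration, not the paper's pipeline: the function names, the Euclidean distance, and the majority vote are assumptions, and the gene selection step is omitted.

```python
import math
from collections import Counter

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: euclid(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(X, y, k):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)
```

With only a few dozen samples, every sample serves as a test case exactly once, at the cost of one model fit per sample, which is cheap for k-nearest neighbors.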

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There is now an emerging need for an efficient modeling strategy to develop a new generation of monitoring systems. One method of approaching the modeling of complex processes is to obtain a global model. It should be able to capture the basic or general behavior of the system, by means of a linear or quadratic regression, and then superimpose a local model on it that can capture the localized nonlinearities of the system. In this paper, a novel method based on a hybrid incremental modeling approach is designed and applied for tool wear detection in turning processes. It involves a two-step iterative process that combines a global model with a local model to take advantage of their underlying, complementary capacities. Thus, the first step constructs a global model using a least squares regression. A local model using the fuzzy k-nearest-neighbors smoothing algorithm is obtained in the second step. A comparative study then demonstrates that the hybrid incremental model provides better error-based performance indices for detecting tool wear than a transductive neurofuzzy model and an inductive neurofuzzy model.
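The global-plus-local idea can be sketched in one dimension as below. This is a single-pass simplification under stated assumptions: the paper describes an iterative process and a fuzzy k-nearest-neighbors smoother, whereas this sketch uses one ordinary least squares line and a plain, unweighted k-nearest-neighbors average of its residuals. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hybrid_predict(xs, ys, x_new, k):
    """Global linear trend plus a local correction from nearby residuals."""
    slope, intercept = fit_line(xs, ys)
    # Step 1: the global model captures the general behavior.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Step 2: the local model averages the residuals of the k nearest inputs.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_new))[:k]
    local = sum(residuals[i] for i in nearest) / len(nearest)
    return slope * x_new + intercept + local
```

Where the data are close to linear, the residual correction is near zero and the global model dominates; the local term only matters where the data depart from the trend.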

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Racing algorithms have recently been proposed as a general-purpose method for performing model selection in machine learning algorithms. In this paper, we present an empirical study of the Hoeffding racing algorithm for selecting the k parameter in a simple k-nearest neighbor classifier. Fifteen widely used classification datasets from UCI are used, and experiments are conducted across different confidence levels for racing. The results reveal that the k-nn classifier is highly sensitive to its model parameter value. The Hoeffding racing algorithm also varies widely in its performance, in terms of the computational savings gained over an exhaustive evaluation. While in some cases the savings gained are quite small, the racing algorithm proved to be highly robust to the possibility of erroneously eliminating the optimal models. All results were strongly dependent on the datasets used.
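A minimal sketch of Hoeffding racing over candidate values of k is given below. It assumes a 0/1 loss evaluated on leave-one-out test points and the standard Hoeffding half-width sqrt(ln(2/delta) / (2n)) for a loss bounded in [0, 1]. The function names and the `delta` default are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hoeffding_race(X, y, ks, delta=0.05):
    """Race candidate k values; drop a k once it is confidently worse."""
    alive = set(ks)
    errors = {k: 0 for k in ks}
    n = 0
    for i in range(len(X)):
        n += 1
        # Leave point i out and rank the rest once; all candidates share it.
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: euclid(X[j], X[i]))
        for k in alive:
            votes = Counter(y[j] for j in others[:k])
            if votes.most_common(1)[0][0] != y[i]:
                errors[k] += 1
        # Hoeffding half-width for the mean of n losses bounded in [0, 1].
        eps = math.sqrt(math.log(2 / delta) / (2 * n))
        best_upper = min(errors[k] / n + eps for k in alive)
        # Eliminate models whose lower bound exceeds the best upper bound.
        alive = {k for k in alive if errors[k] / n - eps <= best_upper}
    return alive, {k: errors[k] / n for k in alive}
```

Eliminated candidates stop consuming evaluations, which is where any savings over exhaustive evaluation come from; a smaller `delta` widens the bounds and makes elimination more conservative.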

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The method of case-based reasoning for solving problems of real-time diagnostics and forecasting in intelligent decision support systems (IDSS) is considered. Special attention is given to the case library structure for real-time IDSS (RT IDSS) and to an algorithm of the k-nearest neighbors type. This work was supported by RFBR.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands on and threats to forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly and, as a consequence, spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting-edge remotely sensed and geospatial datasets is essential to sustainable forest management. We evaluated a novel Random Forest based k Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory collected by different sampling methods to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of the US Forest Service was integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for a part of the Lake States and species-specific site index maps for the entire Lake States. Targeting small-area application of state-of-the-art remote sensing, LiDAR (light detection and ranging) data was integrated with the field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive a standing volume map in a cost-effective way.
The outputs of the RF-kNN imputation were compared with independent validation datasets and extant map products based on different sampling and modeling strategies. The RF-kNN modeling approach was found to be very effective, especially for large-area estimation, and produced results statistically equivalent to the field observations or the estimates derived from secondary data sources. The models are useful to resource managers for operational and strategic purposes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The molecular and metal profile fingerprints were obtained from a complex substance, Atractylis chinensis DC, a traditional Chinese medicine (TCM), using high performance liquid chromatography (HPLC) and inductively coupled plasma atomic emission spectroscopy (ICP-AES). This substance was used in this work as an example of a complex biological material, which has found application as a TCM. Such TCM samples are traditionally processed by the Bran, Cut, Fried and Swill methods, and were collected from five provinces in China. The data matrices obtained from the two types of analysis produced two principal component biplots, which showed that the HPLC fingerprint data were discriminated on the basis of the methods for processing the raw TCM, while the metal analysis data grouped according to geographical origin. When the two data matrices were combined into one two-way matrix, the resulting biplot showed a clear separation on the basis of the HPLC fingerprints. Importantly, within each grouping the objects separated according to their geographical origin, and they ranked in approximately the same order in each group. This result suggested that, by using such an approach, it is possible to derive an improved characterisation of complex TCM materials on the basis of the two kinds of analytical data. In addition, two supervised pattern recognition methods, the K-nearest neighbors (KNN) method and linear discriminant analysis (LDA), were successfully applied to the individual data matrices, thus supporting the PCA approach.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Flos Chrysanthemum is a generic name for a particular group of edible plants, which also have medicinal properties. There are, in fact, twenty to thirty different cultivars, which are commonly used in beverages and for medicinal purposes. In this work, four Flos Chrysanthemum cultivars, Hangju, Taiju, Gongju, and Boju, were collected, and chromatographic fingerprints were used to distinguish and assess these cultivars for quality control purposes. Chromatographic fingerprints contain chemical information but also often have baseline drifts and peak shifts, which complicate data processing; adaptive iteratively reweighted penalized least squares and correlation optimized warping were applied to correct the fingerprint peaks. The adjusted data were submitted to unsupervised and supervised pattern recognition methods. Principal component analysis was used to qualitatively differentiate the Flos Chrysanthemum cultivars. Partial least squares, continuum power regression, and K-nearest neighbors were used to predict the unknown samples. Finally, the elliptic joint confidence region method was used to evaluate the prediction ability of these models. The partial least squares and continuum power regression methods were shown to best represent the experimental results.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this article we introduce and evaluate testing procedures for specifying the number k of nearest neighbours in the weights matrix of spatial econometric models. The spatial J-test is used for specification search. Two testing procedures are suggested: an increasing neighbours testing procedure and a decreasing neighbours testing procedure. Simulations show that the increasing neighbours testing procedure can be used in large samples to determine k. The decreasing neighbours testing procedure is found to have low power, and is not recommended for use in practice. An empirical example involving house price data is provided to show how to use the testing procedures with real data.
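For readers unfamiliar with the object being specified, a k-nearest-neighbour spatial weights matrix can be built as in the sketch below. The row standardization and the Euclidean distance are common conventions assumed here, not details taken from the article, and the function names are hypothetical. The spatial J-test itself is not shown.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_weights(coords, k, row_standardize=True):
    """Return an n x n matrix W where W[i][j] > 0 iff j is one of the
    k nearest neighbours of i (an observation is never its own neighbour)."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: euclid(coords[i], coords[j]))
        for j in order[:k]:
            # Row-standardized weights make every row sum to one.
            W[i][j] = 1.0 / k if row_standardize else 1.0
    return W
```

Such a matrix is generally asymmetric: j can be among the nearest neighbours of i without i being among those of j. Changing k changes which entries are non-zero, which is why the choice of k is a specification question.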