47 results for Fuzzy K Nearest Neighbor

in Deakin Research Online - Australia


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system including two key modules, i.e., offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method, combining the current data observed in OPP and the classification results obtained from large-scale historical data in ODT, to generate traffic flow prediction in real time. The empirical study on real-world traffic flow big data using the leave-one-out cross validation method shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, which is improved by 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Accurate prediction of the roll separating force is critical to assuring the quality of the final product in steel manufacturing. This paper presents an ensemble model that addresses these concerns. A stacked generalisation approach to ensemble modeling is used with two sets of the ensemble model members, the first set being learnt from the current input-output data of the hot rolling finishing mill, while another uses the available information on the previous coil in addition to the current information. Both sets of ensemble members include linear regression, multilayer perceptron, and k-nearest neighbor algorithms. A competitive selection model (multilayer perceptron) is then used to select the output from one of the ensemble members to be the final output of the ensemble model. The ensemble model created by such a stacked generalization is able to achieve extremely high accuracy in predicting the roll separation force with the average relative accuracy being within 1% of the actual measured roll force.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents an empirical study of multi-label classification methods, and gives suggestions for multi-label classification that are effective for automatic image annotation applications. The study shows that the triple random ensemble multi-label classification algorithm (TREMLC) outperforms its counterparts, especially on the scene image dataset. Multi-label k-nearest neighbor (ML-kNN) and binary relevance (BR) learning algorithms perform well on the Corel image dataset. Based on the overall evaluation results, examples are given to show label prediction performance for the algorithms using selected image examples. This provides an indication of the suitability of different multi-label classification methods for automatic image annotation under different problem settings.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a comparative evaluation of popular multi-label classification (MLC) methods on several multi-label problems from different domains. The methods include multi-label k-nearest neighbor, binary relevance, label power set, random k-label set ensemble learning, calibrated label ranking, hierarchy of multi-label classifiers and triple random ensemble multi-label classification algorithms. These multi-label learning algorithms are evaluated using several widely used MLC evaluation metrics. The evaluation results show that for each multi-label classification problem a particular MLC method can be recommended. The multi-label evaluation datasets used in this study are related to scene images, multimedia video frames, diagnostic medical reports, email messages, emotional music data, biological genes and multi-structural protein categorization.
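
The abstract above does not name the evaluation metrics it uses. As a minimal sketch, assuming Hamming loss and subset accuracy are among the metrics of interest (both are common in multi-label evaluation), they could be computed as follows. The function names and the toy label matrices are illustrative only.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


# Toy data (hypothetical): 2 samples, 3 binary labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]

loss = hamming_loss(y_true, y_pred)      # one wrong slot out of six
exact = subset_accuracy(y_true, y_pred)  # one of two samples fully correct
```

Hamming loss rewards partially correct label sets, while subset accuracy counts only exact matches, so the two can rank methods differently.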

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve the performance is through exploiting knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm that aims to improve the performance through jointly modeling multiple related tasks. Although there exist numerous classification or regression models in the machine learning literature, most of the MTL models are built around ridge or logistic regression. There exist some limited works that propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner’s choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Accurate and timely traffic flow prediction is crucial to proactive traffic management and control in data-driven intelligent transportation systems (D2ITS), which has attracted great research interest in the last few years. In this paper, we propose a Spatial-Temporal Weighted K-Nearest Neighbor model, named STW-KNN, in a general MapReduce framework of distributed modeling on a Hadoop platform, to enhance the accuracy and efficiency of short-term traffic flow forecasting. More specifically, STW-KNN considers the spatial-temporal correlation and weight of traffic flow with trend adjustment features, to optimize the search mechanisms containing state vector, proximity measure, prediction function, and K selection. Furthermore, STW-KNN is implemented on a widely adopted Hadoop distributed computing platform with the MapReduce parallel processing paradigm, for parallel prediction of traffic flow in real time. Finally, with extensive experiments on real-world big taxi trajectory data, STW-KNN is compared with the state-of-the-art prediction models including conventional K-Nearest Neighbor (KNN), Artificial Neural Networks (ANNs), Naïve Bayes (NB), Random Forest (RF), and C4.5. The results demonstrate that the proposed model is superior to existing models on accuracy by decreasing the mean absolute percentage error (MAPE) value more than 11.9% only in time domain and even achieves 89.71% accuracy improvement with the MAPEs of between 4% and 6.% in both space and time domains, and also significantly improves the efficiency and scalability of short-term traffic flow forecasting over existing approaches.
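
The abstract above does not specify the state vector, proximity measure or trend adjustment of STW-KNN, so the following is not that model. It is only a minimal sketch of the general idea of weighting neighbors in KNN regression, assuming Euclidean distance and inverse-distance weights. The function names and the toy data are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def weighted_knn_predict(train_x, train_y, query, k=3, eps=1e-9):
    """Predict a value as the inverse-distance-weighted mean of the k nearest targets."""
    # Pair each training target with its distance to the query, nearest first.
    neighbors = sorted(
        ((euclidean(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    # eps avoids division by zero when the query coincides with a training point.
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


# Toy data (hypothetical): one feature, e.g. a recent flow reading.
train_x = [(0.0,), (1.0,), (10.0,)]
train_y = [10.0, 20.0, 100.0]

# The two nearest points are equally far from 0.5, so they get equal weight.
prediction = weighted_knn_predict(train_x, train_y, (0.5,), k=2)
```

Closer neighbors dominate the estimate, which is the behaviour a weighted scheme adds over a plain average of the k nearest targets.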

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In mobile cloud computing, a fundamental application is to outsource the mobile data to external cloud servers for scalable data storage. The outsourced data, however, need to be encrypted due to the privacy and confidentiality concerns of their owner. This makes accurate search over the encrypted mobile cloud data particularly difficult. To tackle this issue, in this paper, we develop a searchable encryption scheme for multi-keyword ranked search over the stored data. Specifically, by considering the large number of outsourced documents (data) in the cloud, we utilize the relevance score and k-nearest neighbor techniques to develop an efficient multi-keyword search scheme that can return the ranked search results based on the accuracy. Within this framework, we leverage an efficient index to further improve the search efficiency, and adopt the blind storage system to conceal the access pattern of the search user. Security analysis demonstrates that our scheme can achieve confidentiality of documents and index, trapdoor privacy, trapdoor unlinkability, and concealment of the access pattern of the search user. Finally, using extensive simulations, we show that our proposal can achieve much improved efficiency in terms of search functionality and search time compared with the existing proposals.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

OBJECTIVE: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data.

METHODS: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features.

RESULTS: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014.

CONCLUSIONS: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments.
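
The abstract above says only that nearest neighbor forecasting used the median of similar discharges in the past. A minimal sketch of one way to read that, assuming similarity is the Euclidean distance between fixed-length windows of daily discharge counts, is shown below. The window length, the value of k and the toy series are hypothetical.

```python
import math
from statistics import median


def knn_median_forecast(series, window=3, k=3):
    """Forecast the next value as the median of what followed the k most similar past windows."""
    if len(series) < window + 1:
        raise ValueError("series is too short for the chosen window")
    latest = series[-window:]
    candidates = []
    # Only consider past windows that are followed by a known next-day value.
    for start in range(len(series) - window):
        past = series[start:start + window]
        following = series[start + window]
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(past, latest)))
        candidates.append((distance, following))
    candidates.sort(key=lambda pair: pair[0])
    return median(value for _, value in candidates[:k])


# Toy daily discharge counts (hypothetical); the pattern 5, 6, 7 recurs.
history = [5, 6, 7, 10, 5, 6, 7, 12, 5, 6, 7, 11, 5, 6, 7]
forecast = knn_median_forecast(history, window=3, k=3)
```

Using the median instead of the mean makes the forecast less sensitive to a single unusual day among the matched neighbors.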

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The nonlinear, noisy and outlier characteristics of electroencephalography (EEG) signals inspire the employment of fuzzy logic due to its power to handle uncertainty. This paper introduces an approach to classify motor imagery EEG signals using an interval type-2 fuzzy logic system (IT2FLS) in combination with wavelet transformation. Wavelet coefficients are ranked based on the statistics of the receiver operating characteristic curve criterion. The most informative coefficients serve as inputs to the IT2FLS for the classification task. Two benchmark datasets, named Ia and Ib, downloaded from the brain-computer interface (BCI) competition II, are employed for the experiments. Classification performance is evaluated using accuracy, sensitivity, specificity and F-measure. Widely-used classifiers, including feedforward neural network, support vector machine, k-nearest neighbours, AdaBoost and adaptive neuro-fuzzy inference system, are also implemented for comparisons. The wavelet-IT2FLS method considerably dominates the comparable classifiers on both datasets, and outperforms the best performance on the Ia and Ib datasets reported in the BCI competition II by 1.40% and 2.27% respectively. The proposed approach yields high accuracy at low computational cost and can be applied to a real-time BCI system for motor imagery data analysis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper introduces an approach to classify EEG signals using wavelet transform and a fuzzy standard additive model (FSAM) with tabu search learning mechanism. Wavelet coefficients are ranked based on statistics of the Wilcoxon test. The most informative coefficients are assembled to form a feature set that serves as inputs to the tabu-FSAM. Two benchmark datasets, named Ia and Ib, downloaded from the brain-computer interface (BCI) competition II are employed for the experiments. Classification performance is evaluated using accuracy, mutual information, Gini coefficient and F-measure. Widely-used classifiers, including feedforward neural network, support vector machine, k-nearest neighbours, ensemble learning Adaboost and adaptive neuro-fuzzy inference system, are also implemented for comparisons. The proposed tabu-FSAM method considerably dominates the competitive classifiers, and outperforms the best performance on the Ia and Ib datasets reported in the BCI competition II.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An approach to EEG signal classification for brain-computer interface (BCI) application using fuzzy standard additive model is introduced in this paper. The Wilcoxon test is employed to rank wavelet coefficients. Top ranking wavelets are used to form a feature set that serves as inputs to the fuzzy classifiers. Experiments are carried out using two benchmark datasets, Ia and Ib, downloaded from the BCI competition II. Prevalent classifiers including feedforward neural network, support vector machine, k-nearest neighbours, ensemble learning Adaboost and adaptive neuro-fuzzy inference system are also implemented for comparisons. Experimental results show the dominance of the proposed method against competing approaches.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We discuss the problem of texture recognition based on the grey level co-occurrence matrix (GLCM). We performed a number of numerical experiments to establish whether the accuracy of classification is optimal when GLCM entries are aggregated into standard metrics like contrast, dissimilarity, homogeneity, entropy, etc., and compared these metrics to several alternative aggregation methods. We conclude that k nearest neighbors classification based on raw GLCM entries typically works better than classification based on the standard metrics for noiseless data, that metrics based on principal component analysis improve classification, and that a simple change from the arithmetic to quadratic mean in calculating the standard metrics also improves classification.
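
To make the two kinds of features in the abstract above concrete, here is a minimal sketch that builds a normalized, non-symmetric GLCM for one pixel offset and derives both the contrast metric and the raw entry vector from it. The offset, the toy image and the function names are assumptions for illustration. The abstract does not describe the authors' exact GLCM settings.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Normalized co-occurrence matrix for pixel pairs separated by (dx, dy)."""
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Standard contrast metric: sum of p(i, j) * (i - j)^2."""
    return sum(v * (i - j) ** 2 for i, row in enumerate(p) for j, v in enumerate(row))


def raw_features(p):
    """Flatten the GLCM so its entries can be used directly as a feature vector."""
    return [v for row in p for v in row]


# Toy 3x3 image with two grey levels (hypothetical).
image = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
p = glcm(image)               # horizontal neighbor pairs
c = contrast(p)               # single aggregated number
features = raw_features(p)    # levels * levels numbers
```

The aggregated metric compresses the matrix into one value, whereas the raw vector keeps every entry, which is the trade-off the abstract compares.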

Relevance:

100.00% 100.00%

Publisher:

Abstract:

k-nearest neighbors (kNN) is a popular method for function approximation and classification. One drawback of this method is that the nearest neighbors can be all located on one side of the point in question x. An alternative natural neighbors method is expensive for more than three variables. In this paper we propose the use of the discrete Choquet integral for combining the values of the nearest neighbors so that redundant information is canceled out. We design a fuzzy measure based on location of the nearest neighbors, which favors neighbors located all around x.
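
The discrete Choquet integral mentioned above can be sketched as follows. The fuzzy measures used here are simple placeholders chosen to show the limiting cases (mean, maximum, minimum). They are not the location-based measure proposed in the paper, which the abstract does not define. The function name and toy values are illustrative, and the measure is assumed to return 1 for the full index set.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of values with respect to a fuzzy measure.

    measure maps a frozenset of indices to a number in [0, 1] and is
    assumed to be monotone with measure(all indices) == 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, previous = 0.0, 0.0
    for position, index in enumerate(order):
        # Indices whose value is at least the current one.
        upper_set = frozenset(order[position:])
        total += (values[index] - previous) * measure(upper_set)
        previous = values[index]
    return total


# Toy neighbor values (hypothetical).
neighbor_values = [2.0, 4.0, 9.0]
n = len(neighbor_values)

# Additive uniform measure: reduces to the arithmetic mean.
as_mean = choquet_integral(neighbor_values, lambda s: len(s) / n)
# Measure that is 1 for any nonempty set: reduces to the maximum.
as_max = choquet_integral(neighbor_values, lambda s: 1.0 if s else 0.0)
# Measure that is 1 only for the full set: reduces to the minimum.
as_min = choquet_integral(neighbor_values, lambda s: 1.0 if len(s) == n else 0.0)
```

A non-additive measure lets a group of neighbors count for less than the sum of its members, which is how redundant neighbors clustered on one side of the query point could be discounted.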

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Succinonitrile (N≡C—CH2—CH2—C≡N) is a good ionic conductor, when doped with an ionic compound, at room temperature, where it is in its plastic crystalline phase (Long et al. Solid State Ionics 2003, 161, 105; Alarco et al. Nat. Mater. 2004, 3, 476). We report on the relaxational dynamics of the plastic phase near the two first-order phase transitions and on the effect of dissolving a salt in the plastic matrix by quasi-elastic neutron scattering. At 240 K, the three observed relaxations are localized and we can describe their dynamics (τ ≈ 1.7, 17, and 140 ps) to a certain extent from a model using a single molecule that was proposed by Bée et al. allowing for all conformations in its unit cell (space group IM3M). The extent of the localized motion as observed is however larger than that predicted by the model and suggests that the isomerization of succinonitrile is correlated with a jump to the nearest neighbor site in the unit cell. The salt containing system is known to be a good ionic conductor, and our results show that the effect of the ions on the succinonitrile matrix is homogeneous. Because the isomerizations and rotations are governed by intermolecular interactions, the dissolved ions have an effect over an extended range. Due to the addition of the salt, the dynamics of one of the components (τ ≈ 17 ps) shows more diffusive character at 300 K. The calculated upper limit of the corresponding diffusion constant of succinonitrile in the electrolyte is a factor 30 higher than what is reported for the ions. Our results suggest that the succinonitrile diffusion is caused by nearest neighbor jumps that are localized on the observed length and time scales.