959 results for Fuzzy K Nearest Neighbor


Relevance:

100.00%

Publisher:

Abstract:

Data mining can be used in the healthcare industry to “mine” clinical data and discover hidden information for intelligent and effective decision making. The discovery of hidden patterns and relationships often goes unexploited, and advanced data mining techniques can serve as a remedy in this scenario. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are first imported into Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers were applied and the efficiency of the output evaluated: Decision Tree, Naïve Bayes, and the K-Nearest Neighbour algorithm. The results show that each technique has its unique strength in realizing the objectives of the defined mining goals. The efficiency of Decision Tree and KNN was almost the same, but Naïve Bayes showed a comparative edge over the others. Further, sensitivity and specificity tests are used as statistical measures to examine the performance of binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models; it consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
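Sensitivity and specificity as defined above fall directly out of a binary confusion matrix; a minimal sketch in Python (the function and data are illustrative, not taken from the thesis):

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (recall) and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # proportion of actual positives correctly identified
    specificity = tn / (tn + fp)  # proportion of actual negatives correctly identified
    return sensitivity, specificity

# Example: 4 actual positives (3 detected), 4 actual negatives (2 detected)
sens, spec = sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0],
                                     [1, 1, 1, 0, 0, 0, 1, 1])
# sens = 0.75, spec = 0.5
```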

Relevance:

100.00%

Publisher:

Abstract:

Localization of RFIDs in an indoor environment entails determining both the position and the orientation of the user. This paper develops an estimator that uses RSSI measurements to predict the position and orientation of a transmitter in an indoor environment. The best estimator tried was a k-nearest neighbours model, which gave an accuracy of approximately 83% for position prediction and 93% for orientation prediction. It was also found that RSSI values change throughout the day, meaning that an adaptive estimator is necessary for localization.
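A k-nearest-neighbours predictor over RSSI fingerprints of the kind described can be sketched in a few lines; this is an illustrative majority-vote classifier with made-up fingerprint data, not the paper's actual estimator:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (rssi_vector, label) pairs; query: an RSSI vector.
    Return the majority label among the k nearest training fingerprints."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy fingerprint database: RSSI readings from two access points -> zone label
fingerprints = [((-40, -70), "zone_A"), ((-42, -68), "zone_A"),
                ((-75, -35), "zone_B"), ((-70, -38), "zone_B")]
print(knn_predict(fingerprints, (-41, -69), k=3))  # prints "zone_A"
```

An adaptive variant, as the paper's findings suggest, would refresh `fingerprints` as RSSI values drift over the day.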

Relevance:

100.00%

Publisher:

Abstract:

In this paper, a new variant of Bagging named DepenBag is proposed. The algorithm first obtains bootstrap samples. It then employs a causal discoverer to induce from each sample a dependency model expressed as a Directed Acyclic Graph (DAG). Attributes without connections to the class attribute in all the DAGs are then removed. Finally, a component learner is trained on each of the resulting samples to constitute the ensemble. An empirical study shows that DepenBag is effective in building ensembles of nearest neighbor classifiers.
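The first step of DepenBag, drawing bootstrap samples, is standard bagging; a minimal sketch of that step alone (the causal-discovery and DAG-based attribute-pruning steps are omitted, and the names here are illustrative):

```python
import random

def bootstrap_samples(data, n_samples):
    """Draw n_samples bootstrap replicates, each the same size as data,
    by sampling with replacement."""
    return [random.choices(data, k=len(data)) for _ in range(n_samples)]

random.seed(0)
data = [(x, x % 2) for x in range(10)]       # toy (attribute, class) pairs
replicates = bootstrap_samples(data, n_samples=5)
# Each replicate would then be pruned via its DAG and used to train one
# component nearest-neighbour learner of the ensemble.
```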

Relevance:

100.00%

Publisher:

Abstract:

Spam, or unwanted email, is one of the major issues of Internet security, and correctly separating legitimate user emails from spam is an important research problem for anti-spam researchers. In this paper we present an effective and efficient spam classification technique that uses a clustering approach to categorize the features. In our clustering technique we use the VAT (Visual Assessment of cluster Tendency) approach in our training model to categorize the extracted features and then pass the information to the classification engine. We used the WEKA (www.cs.waikato.ac.nz/ml/weka/) interface to classify the data with different classification algorithms, including tree-based classifiers, nearest neighbor algorithms, statistical algorithms, and AdaBoost. Our empirical results show that we can achieve a detection rate of over 97%.

Relevance:

100.00%

Publisher:

Abstract:

Learning a robust projection with a small number of training samples is still a challenging problem in face recognition, especially when the unseen faces exhibit extreme variation in pose, illumination, and facial expression. To address this problem, we propose a framework formulated under statistical learning theory that facilitates robust learning of a discriminative projection. Dimensionality reduction using the projection matrix is combined with a linear classifier in the regularized framework of lasso regression. The projection matrix, in conjunction with the classifier parameters, is then found by solving an optimization problem over the Stiefel manifold. Experimental results on standard face databases suggest that the proposed method outperforms some recent regularized techniques when the number of training samples is small.

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we investigate the use of a wavelet transform-based analysis of the audio tracks accompanying videos for the problem of automatic program genre detection. We compare the classification performance of wavelet-based audio features to that of conventional features derived from Fourier and time-domain analysis for the task of discriminating TV programs such as news, commercials, music shows, concerts, motor racing games, and animated cartoons. Three different classifiers, namely decision trees, SVMs, and k-nearest neighbours, are studied to analyse the reliability of our wavelet-feature-based approach. Further, we investigate the question of an appropriate duration of the audio clip to be analyzed for this automatic genre determination. Our experimental results show that features derived from the wavelet transform of the audio signal can separate the six video genres studied very well. It is also found that there is no significant difference in performance across varying audio clip durations for any of the classifiers.

Relevance:

100.00%

Publisher:

Abstract:

Protein mass spectrometry (MS) pattern recognition has recently emerged as a new method for cancer diagnosis. Unfortunately, classification performance may degrade owing to the enormously high dimensionality of the data. This paper investigates the use of Random Projection for dimensionality reduction of protein MS data. The effectiveness of Random Projection (RP) is analyzed and compared against Principal Component Analysis (PCA) using three classification algorithms, namely Support Vector Machine, feed-forward neural networks, and K-Nearest Neighbour. Three real-world cancer data sets are employed to evaluate the performance of RP and PCA. In these investigations, the RP method demonstrated better, or at least comparable, classification performance to PCA when the dimensionality of the projection matrix is sufficiently large. This paper also explores the use of RP as a pre-processing step prior to PCA. The results show that, without sacrificing classification accuracy, performing RP prior to PCA significantly improves the computational time.
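Random Projection as described here reduces dimensionality by multiplying the data with a random Gaussian matrix; a minimal pure-Python sketch under the usual RP recipe (real MS spectra have thousands of features, and the dimensions below are illustrative):

```python
import random

def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian random matrix, entries scaled by 1/sqrt(d_out) so that
    pairwise distances are approximately preserved after projection."""
    rng = random.Random(seed)
    scale = d_out ** -0.5
    return [[rng.gauss(0, 1) * scale for _ in range(d_out)] for _ in range(d_in)]

def project(x, R):
    """Project a d_in-dimensional vector x down to d_out dimensions."""
    return [sum(x[i] * R[i][j] for i in range(len(x))) for j in range(len(R[0]))]

R = random_projection_matrix(d_in=100, d_out=10)
x = [float(i) for i in range(100)]
z = project(x, R)   # 10-dimensional representation fed to the SVM/NN/KNN classifier
```

Using RP before PCA, as the paper suggests, would simply mean running PCA on vectors like `z` rather than on the raw spectra.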

Relevance:

100.00%

Publisher:

Abstract:

Traffic classification has wide applications in network management, from security monitoring to quality-of-service measurements. Recent research tends to apply machine learning techniques to flow-statistical-feature-based classification methods. The nearest neighbor (NN)-based method has exhibited superior classification performance. It also has several important advantages, such as requiring no training procedure, running no risk of overfitting parameters, and naturally being able to handle a huge number of classes. However, the performance of the NN classifier can be severely affected if the size of the training data is small. In this paper, we propose a novel nonparametric approach for traffic classification, which can improve classification performance effectively by incorporating correlated information into the classification process. We analyze the new classification approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic data sets to validate the proposed approach. The results show that traffic classification performance can be improved significantly even under the extremely difficult circumstance of very few training samples.

Relevance:

100.00%

Publisher:

Abstract:

In the context of collaborative filtering, the well-known data sparsity issue makes two like-minded users share little similarity, and consequently renders the k-nearest-neighbour rule inapplicable. In this paper, we address the data sparsity problem in neighbourhood-based CF methods by proposing an Adaptive-Maximum imputation method (AdaM). The basic idea is to identify an imputation area that maximizes the imputation benefit for recommendation purposes while minimizing the imputation error introduced. To achieve the maximum imputation benefit, the imputation area is determined from both the user and the item perspectives; to minimize the imputation error, at least one real rating is preserved for each item in the identified imputation area. A theoretical analysis is provided to prove that the proposed imputation method outperforms conventional neighbourhood-based CF methods through more accurate neighbour identification. Experimental results on benchmark datasets show that the proposed method significantly outperforms other related state-of-the-art imputation-based methods in terms of accuracy.
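The constraint that every imputed item keeps at least one real rating can be illustrated with a simple item-mean imputation sketch; this stands in for AdaM's far more involved area selection and is purely illustrative:

```python
def impute_item_means(matrix):
    """matrix: list of user rows; None marks a missing rating.
    Fill each missing entry with its item (column) mean. Columns with no real
    rating are left untouched, so every imputed item keeps >= 1 real rating."""
    n_items = len(matrix[0])
    filled = [row[:] for row in matrix]
    for j in range(n_items):
        real = [row[j] for row in matrix if row[j] is not None]
        if not real:
            continue  # no real rating to anchor the imputation on
        mean = sum(real) / len(real)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    return filled

ratings = [[5, None, 1],
           [3, 4, None],
           [None, 2, 1]]
dense = impute_item_means(ratings)
# The densified matrix makes k-nearest-neighbour similarity computable
# between users who originally shared few co-rated items.
```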

Relevance:

100.00%

Publisher:

Abstract:

Computational efficiency, and hence the scale, of agent-based swarm simulations is bound by the nearest neighbour computation for each agent. This article proposes the use of GPU texture memory to implement lookup tables for a spatial-partitioning-based k-Nearest Neighbours algorithm. These improvements allow simulation of swarms of 2^20 agents at higher rates than the current best alternative algorithms. This approach is incorporated into an existing framework for simulating steering behaviours, allowing for a complete implementation of massive agent swarm simulations, with per-agent behaviour preferences, on a Graphics Processing Unit. These simulations have enabled an investigation of the emergent dynamics that occur when massive swarms interact with a choke point in their environment. Various modes of sustained dynamics with temporal and spatial coherence are identified when a critical mass of agents is simulated, and some elementary properties are presented. The algorithms presented in this article enable researchers and content designers in games and movies to implement truly massive agent swarms in real time and thus provide a basis for further identification and analysis of the emergent dynamics in these swarms. This will improve not only the scale of swarms used in commercial games and movies but also the reliability of swarm behaviour with respect to content design goals.
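Spatial-partitioning k-NN of the kind accelerated here bins agents into grid cells so each agent examines only nearby cells instead of the whole swarm; a minimal CPU sketch of the lookup structure (the paper's contribution is the GPU texture-memory implementation, which this does not attempt to reproduce):

```python
from collections import defaultdict

CELL = 1.0  # cell width, chosen to match the neighbourhood radius

def build_grid(agents):
    """Hash each agent position (x, y) into its integer grid cell."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(agents):
        grid[(int(x // CELL), int(y // CELL))].append(idx)
    return grid

def neighbours(agents, grid, idx):
    """Candidate neighbours: agents in the 3x3 block of cells around agent idx."""
    cx, cy = int(agents[idx][0] // CELL), int(agents[idx][1] // CELL)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(j for j in grid.get((cx + dx, cy + dy), []) if j != idx)
    return out

agents = [(0.2, 0.3), (0.8, 0.9), (5.0, 5.0)]
grid = build_grid(agents)
near = neighbours(agents, grid, 0)   # finds agent 1, skips the distant agent 2
```

On a GPU the same idea maps the grid to a texture so the 3x3 cell lookups become cached texture fetches.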

Relevance:

100.00%

Publisher:

Abstract:

Texture classification is one of the most important tasks in the computer vision field and has been extensively investigated over the last several decades. Previous texture classification methods mainly used template-matching-based methods such as Support Vector Machine and k-Nearest-Neighbour for classification. Given enough training images, state-of-the-art texture classification methods can achieve very high classification accuracies on some benchmark databases. However, when the number of training images is limited, which usually happens in real-world applications because of the high cost of obtaining labelled data, the classification accuracies of those state-of-the-art methods deteriorate due to overfitting. In this paper we aim to develop a novel framework that can correctly classify textural images with only a small number of training images. By taking into account the repetition and sparsity properties of textures, we propose a sparse-representation-based multi-manifold analysis framework for texture classification from few training images. A set of new training samples is generated from each training image by a scale and spatial pyramid, and the training samples belonging to each class are then modelled by a manifold based on sparse representation. We learn a dictionary of sparse representation and a projection matrix for each class and classify the test images based on the projected reconstruction errors. The framework provides a more compact model than template-matching-based texture classification methods and mitigates the overfitting effect. Experimental results show that the proposed method achieves reasonably high generalization capability even with as few as 3 training images, and significantly outperforms state-of-the-art texture classification approaches on three benchmark datasets. © 2014 Elsevier B.V. All rights reserved.

Relevance:

100.00%

Publisher:

Abstract:

With the arrival of the big data era, Internet traffic is growing exponentially. A wide variety of applications arise on the Internet, and traffic classification is introduced to help people manage these massive applications for security monitoring and quality-of-service purposes. A large number of Machine Learning (ML) algorithms have been introduced to deal with traffic classification. A significant challenge to classification performance comes from the imbalanced distribution of data in traffic classification systems. In this paper, we propose an Optimised Distance-based Nearest Neighbour (ODNN) classifier, which has the capability of improving the classification performance on imbalanced traffic data. We analyze the proposed ODNN approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments were run on a real-world traffic dataset. The results show that the performance on “small classes” can be improved significantly, even with only a small amount of training data, while the performance on “large classes” remains stable.

Relevance:

100.00%

Publisher:

Abstract:

This paper introduces a novel method for gene selection based on a modification of the analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal-to-noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers, including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance, and ReliefF. Four benchmark microarray datasets, diffuse large B-cell lymphoma, leukemia, prostate, and colon cancer, are utilized in the experiments. As the number of samples in microarray datasets is limited, the leave-one-out cross validation strategy is applied rather than traditional cross validation. Experimental results demonstrate the significant dominance of the proposed MAHP over the competing methods in terms of both accuracy and stability. With the benefit of an inexpensive computational cost, MAHP is useful for cancer diagnosis from DNA gene expression profiles in real clinical practice.
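Leave-one-out cross validation, used above because microarray sample counts are small, trains on all samples but one and tests on the held-out sample, repeating over every sample; a minimal sketch with a 1-nearest-neighbour classifier standing in for the paper's classifiers (data and names are illustrative):

```python
import math

def nn_classify(train, query):
    """1-nearest-neighbour by Euclidean distance; train is [(vector, label)]."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

def loocv_accuracy(data):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += nn_classify(rest, x) == y
    return hits / len(data)

samples = [((0.0, 0.0), "low"), ((0.1, 0.2), "low"),
           ((5.0, 5.0), "high"), ((5.2, 4.9), "high")]
acc = loocv_accuracy(samples)   # each held-out sample's nearest neighbour shares its label
```

With n samples this trains n models, which is affordable exactly when n is small, as in these microarray datasets.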

Relevance:

100.00%

Publisher:

Abstract:

OBJECTIVES: Equity and fairness at work are associated with a range of organizational and health outcomes. Past research suggests that workers with disabilities experience inequity in the workplace. It is difficult to conclude whether the presence of disability is the reason for perceived unfair treatment, due to the possible confounding of effect estimates by other demographic or socioeconomic factors. METHODS: The data source was the Household, Income and Labour Dynamics in Australia (HILDA) survey (2001-2012). Propensity for disability was calculated from logistic models including gender, age, education, country of birth, and father's occupational skill level as predictors. We then used nearest neighbor (on propensity score) matched analysis to match workers with disabilities to workers without disability. RESULTS: The results suggest that disability is independently associated with lower fairness of pay after controlling for confounding factors in the propensity score matched analysis, although the effect is small, at less than half a standard deviation. Similar results were apparent in standard multivariable regression models and alternative propensity score analyses (stratification, covariate adjustment using the propensity score, and inverse probability of treatment weighting). CONCLUSIONS: Whilst neither multivariable regression nor propensity scores adjust for unmeasured confounding, and there remains the potential for other biases, similar results from the two methodological approaches to confounder adjustment provide some confidence of an independent association of disability with perceived unfairness of pay. On this basis, we suggest that the disparity in perceived fairness of pay between people with and without disabilities may be explained by worse treatment of people with disabilities in the workplace.
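Nearest-neighbour matching on the propensity score, as used in the study above, pairs each treated unit (here, a worker with disability) with the control whose estimated score is closest; a minimal sketch assuming the propensity scores have already been estimated by a logistic model (unit IDs and scores are made up):

```python
def nearest_neighbour_match(treated, controls):
    """treated/controls: lists of (unit_id, propensity_score) pairs.
    Match each treated unit to the control with the closest score
    (matching with replacement, for simplicity)."""
    pairs = []
    for t_id, t_score in treated:
        c_id, _ = min(controls, key=lambda c: abs(c[1] - t_score))
        pairs.append((t_id, c_id))
    return pairs

treated = [("T1", 0.62), ("T2", 0.35)]
controls = [("C1", 0.60), ("C2", 0.30), ("C3", 0.90)]
print(nearest_neighbour_match(treated, controls))  # [('T1', 'C1'), ('T2', 'C2')]
```

Outcome differences (here, perceived fairness of pay) are then compared within the matched pairs rather than across the raw groups.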

Relevance:

100.00%

Publisher:

Abstract:

The aim of this study is to evaluate the following counterfactual: how would the school performance of working children have turned out had they actually been prohibited from working? Since it is not possible to observe working children in a non-work situation, the strategy adopted was to construct a control group through propensity score matching (nearest-neighbor matching). The data used in this study were obtained from a sample of the PME (Pesquisa Mensal de Emprego, the Monthly Employment Survey) covering six metropolitan regions of Brazil, consisting of children aged 10 to 14 followed for two consecutive years over the period 1984 to 1997. The results point to a negative impact of work on the school performance of working children, although with varying intensity depending on the performance indicator used. The estimates for the probability of grade promotion and for school progress suggest a negative effect of work, though much smaller than is generally observed. This is not the case for the probability of dropout: work explains almost the entire observed difference in dropout probability between working children and the others. The results therefore suggest that, if the legislation prohibiting child labor were rigorously enforced, working children would on average show better school performance.