64 results for kNN

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A novel approach to multiclass tumor classification using Artificial Neural Networks (ANNs) was introduced in a recent paper [Khan2001]. The method successfully classified and diagnosed small, round blue cell tumors (SRBCTs) of childhood into four distinct categories: neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS), using cDNA gene expression profiles of samples that included both tumor biopsy material and cell lines. We report that, using an approach similar to the one reported by Yeang et al. [Yeang2001], i.e. multiclass classification by combining the outputs of binary classifiers, we achieved equal accuracy with far fewer features. We report the performance of three binary classifiers (k-nearest neighbors (kNN), weighted voting (WV), and support vector machines (SVM)) with three feature selection techniques (Golub's signal-to-noise (SN) ratios [Golub99], Fisher scores (FSc) and Mukherjee's SVM feature selection (SVMFS) [Sayan98]).
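The idea of combining binary classifiers after a signal-to-noise feature ranking can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the binary outputs are combined, so the one-vs-rest scheme with a "strongest vote wins" rule, the use of the population standard deviation, and the toy expression values (`TOY_X`, `TOY_Y`) are all hypothetical choices, not the authors' method.

```python
import math
from statistics import mean, pstdev


def signal_to_noise(values_pos, values_neg):
    """Signal-to-noise score of one feature: (mean+ - mean-) / (sd+ + sd-)."""
    denom = pstdev(values_pos) + pstdev(values_neg)
    return (mean(values_pos) - mean(values_neg)) / denom if denom else 0.0


def top_features(X, y, positive, n_keep):
    """Indices of the n_keep features with the largest |S2N| for `positive` vs rest."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == positive]
        neg = [row[j] for row, label in zip(X, y) if label != positive]
        scores.append((abs(signal_to_noise(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def knn_vote(X, y, query, positive, features, k=3):
    """Fraction of the k nearest neighbours (on the selected features) in `positive`."""
    def dist(row):
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in features))

    nearest = sorted(zip(X, y), key=lambda pair: dist(pair[0]))[:k]
    return sum(label == positive for _, label in nearest) / k


def one_vs_rest_predict(X, y, query, n_keep=2, k=3):
    """Run one binary kNN per class and return the class with the strongest vote."""
    classes = sorted(set(y))
    votes = {
        c: knn_vote(X, y, query, c, top_features(X, y, c, n_keep), k)
        for c in classes
    }
    return max(classes, key=lambda c: votes[c])


# Hypothetical toy "expression profiles": each class is high on one feature.
TOY_X = [
    [5.0, 1.0, 1.1, 0.9], [5.2, 0.8, 1.0, 1.1], [4.8, 1.1, 0.9, 1.0],  # EWS
    [1.0, 5.1, 1.0, 1.0], [0.9, 4.9, 1.2, 0.8], [1.1, 5.0, 0.9, 1.1],  # NB
    [1.0, 1.0, 5.0, 1.1], [1.1, 0.9, 5.2, 0.9], [0.9, 1.1, 4.9, 1.0],  # NHL
    [1.0, 1.1, 1.0, 5.0], [0.8, 1.0, 1.1, 5.1], [1.1, 0.9, 0.9, 4.8],  # RMS
]
TOY_Y = ["EWS"] * 3 + ["NB"] * 3 + ["NHL"] * 3 + ["RMS"] * 3
```

Calling `one_vs_rest_predict(TOY_X, TOY_Y, [1.0, 4.7, 1.1, 1.0])` selects features separately for each binary problem, which is one way a multiclass classifier can get by with few features per class.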

Abstract:

Advances in hardware technologies allow data to be captured and processed in real time, and the resulting high-throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) develops data mining algorithms that allow us to analyse these continuous streams of data in real time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high-velocity data streams, where data instances arrive at a fast rate. This is problematic if an application requires little or no delay between changes in the patterns of the stream and the absorption of these patterns by the classifier. The scalability problems that traditional data mining algorithms for static (non-streaming) datasets face on Big Data have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
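To illustrate why KNN lends itself to both adaptation and parallelisation, here is a minimal sketch. It assumes a fixed-size sliding window (old instances are forgotten as new ones arrive) and a partition-then-merge search in which each partition finds its own k nearest candidates independently. The class name `SlidingWindowKNN` and its parameters are hypothetical; the abstract does not describe the paper's actual design, and the partitions are processed sequentially here rather than on separate workers.

```python
import heapq
from collections import Counter, deque


class SlidingWindowKNN:
    """kNN over the most recent instances of a stream (illustrative sketch)."""

    def __init__(self, k=3, window_size=100, n_partitions=4):
        self.k = k
        self.n_partitions = n_partitions
        # deque(maxlen=...) silently drops the oldest instance when full,
        # which is what lets the model follow changes in the stream.
        self.window = deque(maxlen=window_size)

    def learn(self, x, label):
        self.window.append((tuple(x), label))

    def _local_nearest(self, part, x):
        # Each partition only needs its own k best candidates; this step is
        # independent per partition and could run on a separate worker.
        scored = (
            (sum((a - b) ** 2 for a, b in zip(point, x)), label)
            for point, label in part
        )
        return heapq.nsmallest(self.k, scored, key=lambda s: s[0])

    def predict(self, x):
        if not self.window:
            return None
        items = list(self.window)
        parts = [items[i::self.n_partitions] for i in range(self.n_partitions)]
        candidates = [c for part in parts for c in self._local_nearest(part, x)]
        # The global k nearest are always among the merged local candidates.
        nearest = heapq.nsmallest(self.k, candidates, key=lambda s: s[0])
        return Counter(label for _, label in nearest).most_common(1)[0][0]
```

With a small window, the classifier's answer for the same query changes once newer instances with different labels have pushed the old ones out, which is the delay between a pattern change and its absorption that the abstract refers to.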

Abstract:

An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges on data mining algorithms, such as concept drift, the need to analyse the data on the fly because data streams are unbounded, and the need for scalable algorithms because of potentially high data throughput. Real-time classification algorithms that are fast and adaptive to concept drift exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary built from Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.
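A statistical summary of this kind can be sketched as follows. This is not the MC-NN algorithm, whose details the abstract does not give; it is a generic micro-cluster illustration that assumes each cluster stores only an instance count and a per-feature linear sum, keeps one cluster per class, and classifies by nearest centroid. The names `MicroCluster` and `MicroClusterClassifier` are hypothetical.

```python
class MicroCluster:
    """Summary of the instances seen for one label: a count and a linear sum."""

    def __init__(self, label, dim):
        self.label = label
        self.n = 0
        self.linear_sum = [0.0] * dim

    def absorb(self, x):
        # Updating the summary is O(dim) and never stores the raw instance.
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]

    def centroid(self):
        return [s / self.n for s in self.linear_sum]


class MicroClusterClassifier:
    """Nearest-centroid classification over micro-cluster summaries (sketch)."""

    def __init__(self):
        self.clusters = {}

    def learn(self, x, label):
        if label not in self.clusters:
            self.clusters[label] = MicroCluster(label, len(x))
        self.clusters[label].absorb(x)

    def predict(self, x):
        # Raises ValueError if nothing has been learned yet.
        def sq_dist(cluster):
            return sum((a - b) ** 2 for a, b in zip(cluster.centroid(), x))

        return min(self.clusters.values(), key=sq_dist).label
```

Because a query is compared against a handful of centroids instead of every stored instance, prediction cost depends on the number of clusters rather than the length of the stream.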

Abstract:

Arylpiperazine compounds are promising 5-HT1A receptor ligands that can contribute to accelerating the onset of the therapeutic effect of selective serotonin reuptake inhibitors. In the present work, the chemometric methods HCA, PCA, KNN, SIMCA and PLS were employed to obtain SAR and QSAR models relating the structures of arylpiperazine compounds to their 5-HT1A receptor affinities. A training set of 52 compounds was used to construct the models, and the best ones were obtained with nine topological descriptors. The classification and regression models were externally validated by means of predictions for a test set of 14 compounds and showed good quality, as verified by the correctness of the classifications in the case of the pattern recognition studies, and by the high correlation coefficients (q² = 0.76, r² = 0.83) and small prediction errors for the PLS regression. Since the results are in good agreement with previous SAR studies, we suggest that these findings can help in the search for 5-HT1A receptor ligands that are able to improve antidepressant treatment. (c) 2007 Elsevier Masson SAS. All rights reserved.

Abstract:

The main purpose of this thesis project is to predict symptom severity and cause from test-battery data of Parkinson's disease patients using data mining. The data were collected from a test battery on a hand-held computer. We use the Chi-Square method to check which variables are important and which are not. We then normalize the data, apply different data mining techniques to it and check which technique gives good results. The thesis is implemented in WEKA. The methods used are Naïve Bayes, CART and KNN. We use Bland-Altman plots and Spearman's correlation to check the final results and the predictions. The Bland-Altman plot shows what percentage of the predictions fall within our confidence level, and Spearman's correlation tells us that the relationship is strong. On the basis of the results and analysis, all three methods give nearly the same results. However, CART (J48 Decision Tree) gives good results, with under-predicted and over-predicted values lying between -2 and +2. The correlation between the actual and predicted values is 0.794 for CART. Cause gives a better classification percentage than disability because it uses two classes.
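The two agreement checks mentioned above can be sketched in a few lines. This is an illustration only: it assumes the conventional definitions (Spearman's correlation as the Pearson correlation of average ranks, Bland-Altman limits of agreement as the mean difference plus or minus 1.96 sample standard deviations), which the thesis abstract does not spell out, and the function names are hypothetical.

```python
from statistics import mean, stdev


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(actual, predicted):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(actual), ranks(predicted))


def bland_altman(actual, predicted):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [p - a for a, p in zip(actual, predicted)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

Differences that fall between the two limits returned by `bland_altman` are the ones usually read as being in agreement, which corresponds to the under- and over-prediction band discussed in the abstract.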

Abstract:

Objective: To define and evaluate a Computer-Vision (CV) method for scoring Paced Finger-Tapping (PFT) in Parkinson's disease (PD) using quantitative motion analysis of the index fingers, and to compare the obtained scores with the UPDRS (Unified Parkinson's Disease Rating Scale) finger-taps (FT). Background: The naked-eye evaluation of PFT in clinical practice gives only a coarse resolution for determining PD status. Besides, sensor mechanisms for PFT evaluation may cause patients discomfort. In order to avoid the cost and effort of applying wearable sensors, a CV system for non-invasive PFT evaluation is introduced. Methods: A database of 221 PFT videos from 6 PD patients was processed. The subjects were instructed to position their hands above their shoulders beside the face and tap the index finger against the thumb consistently and at speed. They were facing towards a pivoted camera during recording. The videos were rated by two clinicians between symptom levels 0-to-3 using UPDRS-FT. The CV method incorporates a motion analyzer and a face detector. The method detects the face of the test subject in each video frame. The frame is split into two images at the center of the face rectangle. Two regions of interest are located in each image to detect index-finger motion of the left and right hands respectively. Tracking the opening and closing phases of the dominant-hand index finger produces a tapping time series. This time series is normalized by the face height. The normalization calibrates the amplitude of the tapping signal, which is affected by the varying distance between camera and subject (the farther the camera, the smaller the amplitude). A total of 15 features were classified using a K-nearest neighbor (KNN) classifier to characterize the symptom levels in UPDRS-FT. The target ratings provided by the raters were averaged. Results: A 10-fold cross validation in KNN classified 221 videos between 3 symptom levels with 75% accuracy. An area under the receiver operating characteristic curve of 82.6% supports the feasibility of the obtained features to replicate clinical assessments. Conclusions: The system is able to track index-finger motion to estimate tapping symptoms in PD. It has certain advantages compared to other technologies (e.g. magnetic sensors, accelerometers etc.) for PFT evaluation to improve and automate the ratings.
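The face-height normalization and the k-fold evaluation described in the Methods and Results can be sketched as below. This is a simplified illustration, not the study's pipeline: the interleaved fold assignment, the choice of k = 3 neighbours, squared Euclidean distance, and the one-dimensional toy samples in `TOY_SAMPLES` are all assumptions made for the example.

```python
from collections import Counter


def normalize_by_face_height(amplitudes, face_height):
    """Express tapping amplitudes as a fraction of the detected face height."""
    return [a / face_height for a in amplitudes]


def knn_predict(train, query, k=3):
    """Majority label among the k training samples closest to `query`."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(samples, n_folds=10, k=3):
    """Hold out every n_folds-th sample in turn and score kNN on the held-out part."""
    correct = 0
    for fold in range(n_folds):
        test = samples[fold::n_folds]
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        correct += sum(knn_predict(train, x, k) == label for x, label in test)
    return correct / len(samples)


# Hypothetical toy data: two well-separated classes with one feature each.
TOY_SAMPLES = [([float(i % 2) * 10 + (i // 2) * 0.1], i % 2) for i in range(20)]
```

Every sample is used for testing exactly once and never appears in the training set of its own fold, which is what makes the reported accuracy an estimate of performance on unseen videos.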

Abstract:

Data mining can be used in the healthcare industry to "mine" clinical data and discover hidden information for intelligent and effective decision making. Hidden patterns and relationships often go undiscovered, yet advanced data mining techniques can help remedy this. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests and external symptoms used to predict chronic renal disease. Data from the database are first imported into Weka (3.6), and the Chi-Square method is used for feature selection. After the data are normalized, three classifiers are applied and the efficiency of their output is evaluated: Decision Tree, Naïve Bayes and the K-Nearest Neighbour algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. The efficiency of Decision Tree and KNN was almost the same, but Naïve Bayes showed a comparative edge over the others. Further, sensitivity and specificity tests are used as statistical measures to examine the performance of a binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives which are correctly identified, while specificity measures the proportion of negatives which are correctly identified. The CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
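The two measures defined above can be computed directly from the counts of a binary confusion matrix. A minimal sketch follows; the function name and the convention that the positive class is labelled `1` are assumptions for the example.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for a binary classification."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # actual positive, correctly identified
            else:
                fn += 1  # actual positive, missed
        else:
            if p == positive:
                fp += 1  # actual negative, wrongly flagged
            else:
                tn += 1  # actual negative, correctly identified
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

For example, with four actual positives of which three are detected and four actual negatives of which three are correctly rejected, both measures come out at 0.75.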

Abstract:

This paper proposes a method based on the theory of reflected electromagnetic waves to evaluate the behavior of these waves and the level of attenuation caused by bone tissue. To this end, two microstrip antennas with a resonance frequency of 2.44 GHz were built. The problem is relevant because osteometabolic diseases affect a large portion of the population, both men and women. With this method, the signal is classified into two groups: tissue with normal bone mass and tissue with low bone mass. For this, feature extraction (Wavelet Transform) and pattern recognition (KNN and ANN) techniques were used. The tests were performed on bovine bone and on tissue treated with chemicals; the methodology and results are described in the work.

Abstract:

The efficacy of fluorescence spectroscopy to detect squamous cell carcinoma is evaluated in an animal model following laser excitation at 442 and 532 nm. Lesions are chemically induced with a topical DMBA application on the left lateral tongue of Golden Syrian hamsters. The animals are investigated every 2 weeks after the 4th week of induction until a total of 26 weeks. The right lateral tongue of each animal is considered as a control site (normal contralateral tissue) and the induced lesions are analyzed as a set of points covering the entire clinically detectable area. Based on fluorescence spectral differences, four indices are determined by intraspectral analysis to discriminate normal and carcinoma tissues. The spectral data are also analyzed using multivariate data analysis, and the results are compared with histology as the diagnostic gold standard. The best result is achieved for blue excitation using the KNN (K-nearest neighbor, an interspectral analysis) algorithm, with a sensitivity of 95.7% and a specificity of 91.6%. These high indices indicate that fluorescence spectroscopy may constitute a fast, noninvasive auxiliary tool for the diagnosis of cancer within the oral cavity. (C) 2008 Society of Photo-Optical Instrumentation Engineers.

Abstract:

A study using two classification methods (SDA and SIMCA) was carried out in this work with the aim of investigating the relationship between the structure of flavonoid compounds and their free-radical-scavenging ability. We report the use of these chemometric methods to select the most relevant variables (steric, electronic, and topological) responsible for this ability. The results obtained with the SDA and SIMCA methods agree perfectly with our previous model, in which we used other chemometric methods (PCA, HCA and KNN), and are also corroborated by experimental results from the literature. This is a strong indication of how reliable the selection of variables is.

Abstract:

A set of 25 quinone compounds with anti-trypanocidal activity was studied using the density functional theory (DFT) method in order to calculate atomic and molecular properties to be correlated with the biological activity. The chemometric methods principal component analysis (PCA), hierarchical cluster analysis (HCA), stepwise discriminant analysis (SDA), Kth nearest neighbor (KNN) and soft independent modeling of class analogy (SIMCA) were used to obtain possible relationships between the calculated descriptors and the biological activity studied and to predict the anti-trypanocidal activity of new quinone compounds from a prediction set. Four descriptors were responsible for the separation between the active and inactive compounds: T-5 (torsion angle), QTS1 (sum of absolute values of the atomic charges), VOLS2 (volume of the substituent at region B) and HOMO-1 (energy of the molecular orbital below HOMO). These descriptors give information on the kind of interaction that occurs between the compounds and the biological receptor. The prediction study was done with a set of three new compounds using the PCA, HCA, SDA, KNN and SIMCA methods, and two of them were predicted to be active against Trypanosoma cruzi. (c) 2005 Elsevier SAS. All rights reserved.

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Abstract:

Chemometric (statistical) methods are employed to classify a set of neolignan derivatives with biological activity against Paracoccidioides brasiliensis. The AM1 (Austin Model 1) method was used to calculate a set of molecular descriptors (properties) for the compounds under study. The descriptors were then analyzed using the following pattern recognition methods: Principal Component Analysis (PCA), Hierarchical Cluster Analysis (HCA) and the K-nearest neighbors (KNN) method. The PCA and HCA methods proved quite efficient at classifying the studied compounds into two groups (active and inactive). Three molecular descriptors were responsible for the separation between active and inactive compounds: the energy of the highest occupied molecular orbital (EHOMO), the bond order between atoms C1'-R7 (L14) and the bond order between atoms C5'-R6 (L22). Because the variables responsible for the separation between active and inactive compounds are electronic descriptors, we conclude that electronic effects may play an important role in the interaction between the biological receptor and neolignan derivatives with activity against Paracoccidioides brasiliensis.

Abstract:

A set of eighteen neolignan compounds with antischistosomal activity was studied with the PM3 semi-empirical method and other theoretical methods in order to evaluate selected molecular properties (variables or descriptors) and correlate them with the biological activity. Exploratory data analysis (principal component analysis, PCA, and hierarchical cluster analysis, HCA), discriminant analysis (DA) and the KNN method were used to obtain possible correlations between the calculated descriptors and the biological activity in question and to predict the antischistosomal activity of some test molecules. The molecular descriptors responsible for the separation between active and inactive compounds were hydration energy (HE), molecular refractivity (MR) and the charge on atom C19 (Q19). These descriptors provide information about the type of interaction that may occur between the compounds and their respective biological receptor. After the model for active and inactive compounds was built, the PCA, HCA, DA and KNN methods were employed in a prediction study. Ten new compounds were studied and only five of them were classified as active against schistosomiasis.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)