959 results for Fuzzy K Nearest Neighbor


Abstract:

Accurate prediction of the roll separating force is critical to assuring the quality of the final product in steel manufacturing. This paper presents an ensemble model that addresses these concerns. A stacked generalization approach to ensemble modeling is used with two sets of ensemble members: the first set is learnt from the current input-output data of the hot rolling finishing mill, while the second also uses the available information on the previous coil. Both sets of ensemble members include linear regression, multilayer perceptron, and k-nearest neighbor algorithms. A competitive selection model (a multilayer perceptron) is then used to select the output of one ensemble member as the final output of the ensemble model. The ensemble model created by this stacked generalization predicts the roll separating force with extremely high accuracy, with predictions on average within 1% of the actual measured roll force.
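
The selection step described above can be illustrated with a minimal sketch. It is not the paper's model: the abstract names a multilayer perceptron as the selector, whereas this sketch swaps in a 1-nearest-neighbor lookup to stay dependency-free, and the two members, the training data and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_selector(members: List[Model], X, y) -> List[Tuple[Sequence[float], int]]:
    """For each training input, store the index of the member with the smallest error."""
    table = []
    for x, target in zip(X, y):
        errors = [abs(m(x) - target) for m in members]
        table.append((x, errors.index(min(errors))))
    return table


def select_and_predict(members: List[Model], table, x) -> Tuple[float, int]:
    """Use the member that was best at the nearest training input (1-NN selector)."""
    _, idx = min(table, key=lambda row: sqdist(row[0], x))
    return members[idx](x), idx


# Hypothetical members: a linear model and a constant model.
members = [lambda x: 2.0 * x[0], lambda x: 10.0]
X_train = [[1.0], [2.0], [8.0], [9.0]]
y_train = [2.0, 4.0, 10.0, 10.0]

table = fit_selector(members, X_train, y_train)
low = select_and_predict(members, table, [1.5])   # linear member is selected
high = select_and_predict(members, table, [8.6])  # constant member is selected
```

The selector outputs a member index rather than a blended prediction, which matches the "select the output from one of the ensemble members" wording; a weighted combination would be a different design.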

Abstract:

This paper presents an empirical study of multi-label classification methods and gives suggestions for multi-label classification that are effective for automatic image annotation applications. The study shows that the triple random ensemble multi-label classification algorithm (TREMLC) outperforms its counterparts, especially on the scene image dataset. The multi-label k-nearest neighbor (ML-kNN) and binary relevance (BR) learning algorithms perform well on the Corel image dataset. Based on the overall evaluation results, examples are given to show the label prediction performance of the algorithms on selected images. This provides an indication of the suitability of different multi-label classification methods for automatic image annotation under different problem settings.

Abstract:

This paper presents a comparative evaluation of popular multi-label classification (MLC) methods on several multi-label problems from different domains. The methods include multi-label k-nearest neighbor, binary relevance, label power set, random k-label set ensemble learning, calibrated label ranking, hierarchy of multi-label classifiers and triple random ensemble multi-label classification algorithms. These multi-label learning algorithms are evaluated using several widely used MLC evaluation metrics. The evaluation results show that a particular MLC method can be recommended for each multi-label classification problem. The multi-label evaluation datasets used in this study relate to scene images, multimedia video frames, diagnostic medical reports, email messages, emotional music data, biological genes and multi-structural protein categorization.
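
As a rough illustration of one of the listed methods, the sketch below implements binary relevance with a k-nearest neighbor base learner: each label is treated as an independent yes/no problem. This is not ML-kNN, which additionally uses label statistics estimated from the neighborhoods, and the toy data and label meanings are hypothetical.

```python
from typing import List, Sequence


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_vote(X: List[Sequence[float]], labels: List[int], x, k: int) -> int:
    """Return 1 if a strict majority of the k nearest training points is positive."""
    order = sorted(range(len(X)), key=lambda i: sqdist(X[i], x))[:k]
    return 1 if 2 * sum(labels[i] for i in order) > k else 0


def binary_relevance_predict(X, Y, x, k: int = 3) -> List[int]:
    """Run one independent kNN vote per label column of Y."""
    n_labels = len(Y[0])
    return [knn_vote(X, [row[j] for row in Y], x, k) for j in range(n_labels)]


# Hypothetical image features with two labels per image, e.g. [beach, sky].
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
Y = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]

near_origin = binary_relevance_predict(X, Y, [0.2, 0.2])
far_corner = binary_relevance_predict(X, Y, [5.0, 5.2])
```

Because each label is predicted separately, binary relevance ignores correlations between labels, which is the limitation that several of the other listed methods try to address.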

Abstract:

Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such paradigm: it aims to improve performance by jointly modeling multiple related tasks. Although numerous classification and regression models exist in the machine learning literature, most MTL models are built around ridge or logistic regression. A limited number of works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner’s choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads, with high probability, to an under-estimation of the relatedness between any two tasks. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts.
We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines.

Abstract:

Classifying proteins into structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand for automated techniques for the structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential for dealing with this problem. In this work, the following ML techniques are used to recognize protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. To improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work has imbalanced classes, artificial class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to minimize this problem. To evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured as the mean classification error rate on independent test sets. These means are compared pairwise with a hypothesis test to evaluate whether the difference between them is statistically significant.
With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, a performance superior or similar to that achieved by the individual classifiers, especially Boosting with Decision Tree and StackingC with Linear Regression as meta classifier. The Voting method, despite its simplicity, proved adequate for solving the problem presented in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique proved to be more appropriate
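
One of the class-balancing techniques named above, Tomek Links, can be sketched briefly. A Tomek link is a pair of points from different classes that are each other's nearest neighbor; the undersampling variant shown here removes the majority-class member of every link. The data and class names are hypothetical, and the sketch makes no claim about the exact configuration used in the work.

```python
from typing import List, Sequence, Tuple


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_index(points: List[Sequence[float]], i: int) -> int:
    """Index of the point closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: sqdist(points[i], points[j]))


def tomek_links(points, labels) -> List[Tuple[int, int]]:
    """Pairs (i, j), i < j, of mutual nearest neighbors with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_index(points, i)
        if i < j and labels[i] != labels[j] and nearest_index(points, j) == i:
            links.append((i, j))
    return links


def undersample(points, labels, majority):
    """Drop the majority-class member of each Tomek link."""
    drop = {idx for pair in tomek_links(points, labels)
            for idx in pair if labels[idx] == majority}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Hypothetical one-dimensional data: three majority points, two minority points.
points = [[0.0], [1.0], [2.0], [2.4], [5.0]]
labels = ["maj", "maj", "maj", "min", "min"]

links = tomek_links(points, labels)
clean_points, clean_labels = undersample(points, labels, majority="maj")
```

Only borderline majority points are removed, so the global class ratio changes little; this is consistent with the abstract's observation that balancing helped the minority class more than the global error.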

Abstract:

The objective of research in artificial intelligence is to enable computers to perform functions that humans carry out using knowledge and reasoning. This work was developed in the area of machine learning, the branch of artificial intelligence concerned with the design and development of algorithms and techniques that allow computational learning. The objective of this work is to analyze a feature selection method for ensemble systems. The proposed method belongs to the filter approach to feature selection: it uses the variance and Spearman correlation to rank the features, and reward and punishment strategies to measure the importance of each feature for identifying the classes. For each ensemble, several different configurations were used, which varied from hybrid (homogeneous) to non-hybrid (heterogeneous) ensemble structures. They were submitted to five combining methods (voting, sum, weighted sum, multilayer perceptron and naïve Bayes), which were applied to six distinct databases (real and artificial). The classifiers applied during the experiments were k-nearest neighbor, multilayer perceptron, naïve Bayes and decision tree. Finally, the performance of the ensembles was analyzed comparatively, using no feature selection method, the original filter-approach feature selection method, and the proposed method. For this comparison, a statistical test was applied, which demonstrated that there was a significant improvement in the precision of the ensembles

Abstract:

The efficacy of fluorescence spectroscopy to detect squamous cell carcinoma is evaluated in an animal model following laser excitation at 442 and 532 nm. Lesions are chemically induced with a topical DMBA application at the left lateral tongue of Golden Syrian hamsters. The animals are investigated every 2 weeks after the 4th week of induction until a total of 26 weeks. The right lateral tongue of each animal is considered as a control site (normal contralateral tissue) and the induced lesions are analyzed as a set of points covering the entire clinically detectable area. Based on fluorescence spectral differences, four indices are determined to discriminate normal and carcinoma tissues, based on intraspectral analysis. The spectral data are also analyzed using multivariate data analysis and the results are compared with histology as the diagnostic gold standard. The best result achieved is for blue excitation using the KNN (K-nearest neighbor, an interspectral analysis) algorithm with a sensitivity of 95.7% and a specificity of 91.6%. These high indices indicate that fluorescence spectroscopy may constitute a fast noninvasive auxiliary tool for the diagnosis of cancer within the oral cavity. (C) 2008 Society of Photo-Optical Instrumentation Engineers.
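
For readers unfamiliar with the two figures of merit quoted above, the following sketch shows how sensitivity and specificity are computed from classifier output against a gold standard. The labels and the toy predictions are hypothetical and are not the study's data.

```python
from typing import List, Tuple


def sensitivity_specificity(truth: List[str], predicted: List[str],
                            positive: str = "carcinoma") -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical gold-standard (histology) labels and classifier output.
truth = ["carcinoma"] * 4 + ["normal"] * 5
predicted = ["carcinoma", "carcinoma", "carcinoma", "normal",
             "normal", "normal", "normal", "normal", "carcinoma"]

sens, spec = sensitivity_specificity(truth, predicted)
```

Sensitivity measures how many true lesions are caught, specificity how many healthy sites are correctly cleared; a screening aid generally needs both to be high.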

Abstract:

Chemometric (statistical) methods are employed to classify a set of neolignan-derived compounds with biological activity against Paracoccidioides brasiliensis. The AM1 (Austin Model 1) method was used to calculate a set of molecular descriptors (properties) for the compounds under study. The descriptors were then analyzed using the following pattern recognition methods: Principal Component Analysis (PCA), Hierarchical Cluster Analysis (HCA) and the K-nearest neighbors (KNN) method. PCA and HCA proved quite efficient for classifying the studied compounds into two groups (active and inactive). Three molecular descriptors were responsible for the separation between active and inactive compounds: the energy of the highest occupied molecular orbital (EHOMO), the bond order between atoms C1'-R7 (L14) and the bond order between atoms C5'-R6 (L22). Since the variables responsible for the separation between active and inactive compounds are electronic descriptors, it is concluded that electronic effects may play an important role in the interaction between the biological receptor and neolignan-derived compounds with activity against Paracoccidioides brasiliensis.

Abstract:

Soil organic matter (SOM) constitutes an important reservoir of terrestrial carbon and can be considered an alternative for atmospheric carbon storage, contributing to global warming mitigation. Soil management can favor atmospheric carbon incorporation into SOM or its release from SOM to the atmosphere. Thus, the evaluation of the humification degree (HD), which is an indication of the recalcitrance of SOM, can provide an estimate of the carbon sequestration capacity of soils under various managements. The HD of SOM can be estimated using various analytical techniques, including fluorescence spectroscopy. In the present work, the potential of laser-induced breakdown spectroscopy (LIBS) to estimate the HD of SOM was evaluated for the first time. Intensities of emission lines of Al, Mg and Ca from LIBS spectra that correlated with fluorescence emissions determined by the laser-induced fluorescence spectroscopy (LIFS) reference technique were used to obtain a multivariate calibration model based on the k-nearest neighbor (k-NN) method. The values predicted by the proposed model (A-LIBS) showed a strong correlation with LIFS results, with a Pearson's coefficient of 0.87. The HD of SOM obtained after normalizing A-LIBS by the total carbon in the sample showed a strong correlation with that determined by LIFS (0.94), thus suggesting the great potential of LIBS for this novel application. (C) 2014 Elsevier B.V. All rights reserved.
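
The two ingredients mentioned above, a k-NN calibration and a Pearson correlation against a reference technique, can be sketched as follows. This is an illustration under assumed conventions (unweighted mean of the k nearest reference values, Euclidean distance), not the paper's calibration; the numbers are hypothetical.

```python
import math
from typing import List, Sequence


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_regress(X: List[Sequence[float]], y: List[float], x, k: int) -> float:
    """Predict the mean reference value of the k nearest calibration samples."""
    order = sorted(range(len(X)), key=lambda i: sqdist(X[i], x))[:k]
    return sum(y[i] for i in order) / k


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical calibration set: one spectral feature, one reference value each.
X_cal = [[0.0], [1.0], [2.0], [10.0]]
y_cal = [0.0, 1.0, 2.0, 10.0]

prediction = knn_regress(X_cal, y_cal, [1.1], k=2)  # mean of the values at 1.0 and 2.0
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])       # perfectly linear toy data
```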

Abstract:

Model-based calibration of steady-state engine operation is commonly performed with highly parameterized empirical models that are accurate but not very robust, particularly when predicting highly nonlinear responses such as diesel smoke emissions. To address this problem, and to boost the accuracy of more robust non-parametric methods to the same level, GT-Power was used to transform the empirical model input space into multiple input spaces that simplified the input-output relationship and improved the accuracy and robustness of smoke predictions made by three commonly used empirical modeling methods: Multivariate Regression, Neural Networks and the k-Nearest Neighbor method. The availability of multiple input spaces allowed the development of two committee techniques: a "Simple Committee" technique that used averaged predictions from a set of 10 pre-selected input spaces chosen using the training data, and a "Minimum Variance Committee" technique in which the input spaces for each prediction were chosen on the basis of disagreement between the three modeling methods. The latter technique equalized the performance of the three modeling methods. The successively increasing improvements resulting from the use of a single best transformed input space (Best Combination Technique), the Simple Committee Technique and the Minimum Variance Committee Technique were verified with hypothesis testing. The transformed input spaces were also shown to improve outlier detection and to improve k-Nearest Neighbor performance when predicting dynamic emissions with steady-state training data. An unexpected finding was that the benefits of input space transformation were unaffected by changes in the hardware or the calibration of the underlying GT-Power model.
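
The committee idea can be made concrete with a small sketch. The abstract states only that input spaces were chosen per prediction on the basis of disagreement between the three modeling methods; the specifics below (population variance as the disagreement measure, keeping the `n_keep` lowest-variance spaces, averaging their predictions) are assumptions made for illustration, and the space names and values are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def minimum_variance_committee(preds_by_space: Dict[str, List[float]],
                               n_keep: int) -> Tuple[float, List[str]]:
    """Average predictions from the input spaces where the methods disagree least.

    preds_by_space maps an input-space name to the predictions that the
    modeling methods produced in that space for a single operating point.
    """
    ranked = sorted(preds_by_space, key=lambda s: pvariance(preds_by_space[s]))
    chosen = ranked[:n_keep]
    return mean(mean(preds_by_space[s]) for s in chosen), chosen


# Hypothetical smoke predictions from three methods in three input spaces.
preds = {
    "space_a": [1.0, 1.1, 0.9],   # small disagreement
    "space_b": [1.0, 2.0, 3.0],   # large disagreement
    "space_c": [1.2, 1.2, 1.2],   # no disagreement
}

estimate, chosen = minimum_variance_committee(preds, n_keep=2)
```

Under these assumptions a "Simple Committee" would instead average over a fixed, pre-selected list of spaces for every prediction, with no per-point selection.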

Abstract:

We present observations of total cloud cover and cloud type classification results from a sky camera network comprising four stations in Switzerland. In a comprehensive intercomparison study, records of total cloud cover from the sky camera, long-wave radiation observations, Meteosat, ceilometer, and visual observations were compared. Total cloud cover from the sky camera was in 65–85% of cases within ±1 okta with respect to the other methods. The sky camera overestimates cloudiness with respect to the other automatic techniques on average by up to 1.1 ± 2.8 oktas but underestimates it by 0.8 ± 1.9 oktas compared to the human observer. However, the bias depends on the cloudiness and therefore needs to be considered when records from various observational techniques are being homogenized. Cloud type classification was conducted using the k-Nearest Neighbor classifier in combination with a set of color and textural features. In addition, a radiative feature was introduced which improved the discrimination by up to 10%. The performance of the algorithm mainly depends on the atmospheric conditions, site-specific characteristics, the randomness of the selected images, and possible visual misclassifications: The mean success rate was 80–90% when the image only contained a single cloud class but dropped to 50–70% if the test images were completely randomly selected and multiple cloud classes occurred in the images.

Abstract:

This dissertation develops and tests a comparative effectiveness methodology utilizing a novel approach to the application of Data Envelopment Analysis (DEA) in health studies. The concept of performance tiers (PerT) is introduced as terminology to express a relative risk class for individuals within a peer group, and the PerT calculation is implemented with operations research (DEA) and spatial algorithms. The analysis results in the discrimination of the individual data observations into a relative risk classification by the DEA-PerT methodology. The performance of two distance measures, kNN (k-nearest neighbor) and Mahalanobis, was subsequently tested to classify new entrants into the appropriate tier. The methods were applied to subject data for the 14-year-old cohort in the Project HeartBeat! study. The concepts presented herein represent a paradigm shift in the potential for public health applications to identify and respond to individual health status. The resultant classification scheme provides descriptive, and potentially prescriptive, guidance to assess and implement treatments and strategies to improve the delivery and performance of health systems.
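
The two ways of assigning a new entrant to a tier can be contrasted in a short sketch. It assumes two-dimensional observations, a per-tier sample covariance for the Mahalanobis distance, and a majority vote for kNN; the tier names and data are hypothetical and the dissertation's actual procedure may differ.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mean2(rows: List[Sequence[float]]) -> Tuple[float, float]:
    """Component-wise mean of two-dimensional observations."""
    n = len(rows)
    return sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n


def mahalanobis_sq(x: Sequence[float], rows: List[Sequence[float]]) -> float:
    """Squared Mahalanobis distance from x to the tier described by rows."""
    mx, my = mean2(rows)
    n = len(rows) - 1
    sxx = sum((r[0] - mx) ** 2 for r in rows) / n
    syy = sum((r[1] - my) ** 2 for r in rows) / n
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / n
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate tier)
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form with the closed-form inverse of the 2x2 covariance matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def tier_by_mahalanobis(x, tiers: Dict[str, List[Sequence[float]]]) -> str:
    """Assign x to the tier with the smallest Mahalanobis distance."""
    return min(tiers, key=lambda name: mahalanobis_sq(x, tiers[name]))


def tier_by_knn(x, tiers: Dict[str, List[Sequence[float]]], k: int) -> str:
    """Assign x to the most common tier among its k nearest observations."""
    labelled = [(row, name) for name, rows in tiers.items() for row in rows]
    nearest = sorted(
        labelled,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )[:k]
    return Counter(name for _, name in nearest).most_common(1)[0][0]


# Hypothetical tiers with two measurements per individual.
tiers = {
    "tier_1": [[1, 1], [2, 1], [1, 2], [2, 2]],
    "tier_2": [[6, 6], [7, 6], [6, 7], [7, 7]],
}
new_entrant = [2.5, 2.0]

by_mahalanobis = tier_by_mahalanobis(new_entrant, tiers)
by_knn = tier_by_knn(new_entrant, tiers, k=3)
```

The Mahalanobis rule summarizes each tier by its mean and covariance, while kNN looks only at individual nearby observations, so the two can disagree when tiers are elongated or overlap.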

Abstract:

Tumor necrosis factor (TNF)-Receptor Associated Factors (TRAFs) are a family of signal transducer proteins. TRAF6 is a unique member of this family in that it is involved not only in the TNF superfamily but also in the toll-like receptor (TLR)/IL-1R (TIR) superfamily. The formation of the complex between Receptor Activator of Nuclear Factor κ B (RANK) and its ligand (RANKL) results in the recruitment of TRAF6, which activates the NF-κB, JNK and MAP kinase pathways. TRAF6 is critical in signaling that leads to the release of various growth factors in bone, and it promotes osteoclastogenesis. TRAF6 has also been implicated as an oncogene in lung cancer and as a target in multiple myeloma. In the hope of developing small-molecule inhibitors of the TRAF6-RANK interaction, multiple steps were carried out. Hot spot residues in the protein-protein interaction of TRAF6 and RANK were predicted computationally. Three methods were used: Robetta, KFC2, and HotPoint, each of which uses a different methodology to determine whether a residue is a hot spot. These hot spot predictions were the basis for resolving the binding site for in silico high-throughput screening using GOLD and the MyriaScreen database of drug/lead-like compounds. Computationally intensive molecular dynamics simulations highlighted the binding mechanism and the structural changes in TRAF6 upon hit binding. Compounds identified as hits were verified using a GST pull-down assay, comparing inhibition to that of a RANK decoy peptide. Since many drugs fail due to lack of efficacy or to toxicity, predictive models were developed for evaluating the LD50 and bioavailability of our TRAF6 hits; these models can also be used for other drugs and small-molecule therapeutics.
Datasets of compounds and their corresponding bioavailability and LD50 values were curated, QSAR models were built from molecular descriptors of these compounds using the k-nearest neighbor (k-NN) method, and the quality of these models was assessed by cross-validation.
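
A minimal sketch of cross-validating a k-NN regression model is given below. It assumes leave-one-out cross-validation and reports the cross-validated coefficient q² = 1 − PRESS / SS_tot, a statistic commonly used for QSAR models; the abstract does not say which scheme or statistic was used, and the descriptor values here are hypothetical.

```python
from typing import List, Sequence


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_regress(X: List[Sequence[float]], y: List[float], x, k: int) -> float:
    """Predict the mean activity of the k nearest training compounds."""
    order = sorted(range(len(X)), key=lambda i: sqdist(X[i], x))[:k]
    return sum(y[i] for i in order) / k


def loo_q2(X: List[Sequence[float]], y: List[float], k: int) -> float:
    """Leave-one-out q2: each compound is predicted from all the others."""
    press = 0.0
    for i in range(len(X)):
        X_rest = X[:i] + X[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        press += (y[i] - knn_regress(X_rest, y_rest, X[i], k)) ** 2
    y_mean = sum(y) / len(y)
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss_tot


# Hypothetical one-descriptor dataset with a linear activity trend.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]

q2 = loo_q2(X, y, k=2)
```

In this toy set the interior points are predicted exactly and the two end points are not, which shows how k-NN models extrapolate poorly at the edge of the descriptor space.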

Abstract:

This paper discusses a novel hybrid approach for text categorization that combines a machine learning algorithm, which provides a base model trained with a labeled corpus, with a rule-based expert system, which is used to improve the results provided by the previous classifier, by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for those noisy or conflicting categories that have not been successfully trained. We also describe an implementation based on k-Nearest Neighbor and a simple rule language to express lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus for comparison to other approaches, and categorization using IPTC metadata, EUROVOC thesaurus and others. Results show that this approach achieves a precision that is comparable to top ranked methods, with the added value that it does not require a demanding human expert workload to train
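
The hybrid scheme can be sketched in a few lines. Everything specific here is an assumption made for illustration: Jaccard similarity over word sets for the k-nearest-neighbor step, a strict-majority vote, and a rule format in which a negative term removes a category and a positive term adds it. The paper's actual rule language (including its "relevant" and multiword terms) is not reproduced, and the training texts are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_categories(train: List[Tuple[str, str]], text: str, k: int) -> Set[str]:
    """Categories held by a strict majority of the k most similar documents."""
    words = tokenize(text)
    ranked = sorted(train, key=lambda doc: jaccard(tokenize(doc[0]), words),
                    reverse=True)[:k]
    counts = Counter(category for _, category in ranked)
    return {category for category, n in counts.items() if 2 * n > k}


def apply_rules(categories: Set[str], text: str,
                rules: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Drop categories hit by a negative term, add those hit by a positive term."""
    words = tokenize(text)
    result = set(categories)
    for category, rule in rules.items():
        if words & set(rule.get("negative", [])):
            result.discard(category)   # filter a false positive
        elif words & set(rule.get("positive", [])):
            result.add(category)       # recover a false negative
    return result


# Hypothetical labeled corpus and hand-written rules.
train = [
    ("stocks fall on wall street", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins the final match", "sport"),
    ("striker scores in cup match", "sport"),
]
rules = {
    "finance": {"positive": ["shares"], "negative": []},
    "sport": {"positive": [], "negative": ["shares"]},
}
text = "shares fall after cup match"

base = knn_categories(train, text, k=3)
final = apply_rules(base, text, rules)
```

Here the base model alone labels the text as sport; the rules correct it to finance without retraining, which is the fine-tuning advantage the abstract describes.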