813 results for Support vector machines
Abstract:
Noise is one of the main factors degrading the quality of original multichannel remote sensing data, and its presence influences classification efficiency, object detection, etc. Thus, pre-filtering is often used to remove noise and improve performance on the final tasks of multichannel remote sensing. Recent studies indicate that the classical model of additive noise is not adequate for images formed by modern multichannel sensors operating in visible and infrared bands. However, this fact is often ignored by researchers designing noise removal methods and algorithms. Because of this, we focus on the classification of multichannel remote sensing images in the case of signal-dependent noise present in component images. Three approaches to filtering multichannel images for the considered noise model are analysed, all based on the discrete cosine transform (DCT) in blocks. The study is carried out not only in terms of conventional efficiency metrics used in filtering (MSE) but also in terms of multichannel data classification accuracy (probability of correct classification, confusion matrix). The proposed classification system combines a pre-processing stage, where a DCT-based filter processes the blocks of the multichannel remote sensing image, and a classification stage. Two modern classifiers are employed: a radial basis function neural network and support vector machines. Simulations are carried out for a three-channel image from the Landsat TM sensor. Different cases of learning are considered: using noise-free samples of the test multichannel image, the noisy multichannel image and the pre-filtered one. It is shown that using the pre-filtered image for training produces better classification than learning from the noisy image. It is demonstrated that the best results for both groups of quantitative criteria are provided if the proposed 3D discrete cosine transform filter equipped with a variance-stabilizing transform is applied.
The classification results obtained for data pre-filtered in different ways are in agreement for both considered classifiers. Comparison of classifier performance is carried out as well. The radial basis neural network classifier is less sensitive to noise in original images, but after pre-filtering the performance of both classifiers is approximately the same.
Abstract:
A study was performed to determine if targeted metabolic profiling of cattle sera could be used to establish a predictive tool for identifying hormone misuse in cattle. Metabolites were assayed in heifers (n = 5) treated with nortestosterone decanoate (0.85 mg/kg body weight), untreated heifers (n = 5), steers (n = 5) treated with oestradiol benzoate (0.15 mg/kg body weight) and untreated steers (n = 5). Treatments were administered on days 0, 14, and 28 throughout a 42-day study period. Two support vector machines (SVMs) were trained, respectively, from heifer and steer data to identify hormone-treated animals. The performance of both SVM classifiers was evaluated by the sensitivity and specificity of treatment prediction. The SVM trained on steer data achieved 97.33% sensitivity and 93.85% specificity, while the one trained on heifer data achieved 94.67% sensitivity and 87.69% specificity. Solutions of the SVM classifiers were further exploited to determine the days on which the classification accuracy of the SVM was most reliable. For heifers and steers, days 17-35 were determined to be the most selective. In summary, bioinformatics applied to targeted metabolic profiles generated from standard clinical chemistry analyses has yielded an accurate, inexpensive, high-throughput test for predicting steroid abuse in cattle.
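The two evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `sensitivity_specificity` and the label vectors are hypothetical, and the data are made up purely to exercise the formulas.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of treated animals flagged as treated.
    specificity = TN / (TN + FP): fraction of untreated animals left unflagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels (1 = treated, 0 = untreated); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both numbers matters here because a screening test for hormone misuse can trade one for the other by shifting its decision threshold.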
Abstract:
This paper introduces an automated computer-assisted system for the diagnosis of cervical intraepithelial neoplasia (CIN) using ultra-large cervical histological digital slides. The system contains two parts: the segmentation of squamous epithelium and the diagnosis of CIN. For the segmentation, to reduce processing time, a multiresolution method is developed. The squamous epithelium layer is first segmented at a low (2X) resolution. The boundaries are further fine-tuned at a higher (20X) resolution. The block-based segmentation method uses robust texture feature vectors in combination with support vector machines (SVMs) to perform classification. Medical rules are finally applied. In testing, segmentation using 31 digital slides achieves 94.25% accuracy. For the diagnosis of CIN, changes in nuclei structure and morphology along lines perpendicular to the main axis of the squamous epithelium are quantified and classified. Using a multi-category SVM, perpendicular lines are classified into Normal, CIN I, CIN II, and CIN III. The robustness of the system in terms of regional diagnosis is measured against pathologists' diagnoses, and inter-observer variability between two pathologists is considered. Initial results suggest that the system has potential as a tool both to assist in pathologists' diagnoses and in training.
Abstract:
This paper presents a feature selection method for data classification, which combines a model-based variable selection technique and a fast two-stage subset selection algorithm. The relationship between a specified (and complete) set of candidate features and the class label is modelled using a non-linear full regression model which is linear-in-the-parameters. The performance of a sub-model, measured by the sum of squared errors (SSE), is used to score the informativeness of the subset of features involved in the sub-model. The two-stage subset selection algorithm approaches a solution sub-model with the SSE being locally minimized. The features involved in the solution sub-model are selected as inputs to support vector machines (SVMs) for classification. The memory requirement of this algorithm is independent of the number of training patterns. This property makes the method suitable for applications running on mobile devices, where physical RAM is very limited. An application was developed for activity recognition, which implements the proposed feature selection algorithm and an SVM training procedure. Experiments are carried out with the application running on a PDA for human activity recognition using accelerometer data. A comparison with an information-gain-based feature selection method demonstrates the effectiveness and efficiency of the proposed algorithm.
Abstract:
To improve the performance of classification using Support Vector Machines (SVMs) while reducing the model selection time, this paper introduces Differential Evolution, a heuristic method for model selection in two-class SVMs with an RBF kernel. The model selection method and related tuning algorithm are both presented. Experimental results from application to a selection of benchmark datasets for SVMs show that this method can produce an optimized classification in less time and with higher accuracy than a classical grid search. Comparison with a Particle Swarm Optimization (PSO) based alternative is also included.
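To make the idea of Differential Evolution as a model-selection heuristic concrete, here is a minimal sketch of the common DE/rand/1/bin scheme searching over two RBF-SVM hyperparameters in log space. Everything in it is an assumption for illustration: the abstract does not give the paper's DE variant, settings, or search ranges, and `stand_in_cv_error` is a synthetic placeholder for the cross-validation error an actual SVM would produce.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def stand_in_cv_error(params):
    # Hypothetical placeholder for SVM cross-validation error as a function
    # of (log2 C, log2 gamma); its minimum is placed at (2, -3) arbitrarily.
    log_c, log_gamma = params
    return (log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2


# Assumed search ranges for log2 C and log2 gamma.
best_params, best_error = differential_evolution(
    stand_in_cv_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)]
)
```

In a real setting the objective would train and cross-validate an SVM for each candidate pair, so the number of objective evaluations (population size times generations) is what determines the time saved relative to a grid search.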
Abstract:
Background
G protein-coupled receptors (GPCRs) constitute one of the largest groupings of eukaryotic proteins, and represent a particularly lucrative set of pharmaceutical targets. They play an important role in eukaryotic signal transduction and physiology, mediating cellular responses to a diverse range of extracellular stimuli. The phylum Platyhelminthes is of considerable medical and biological importance, housing major pathogens as well as established model organisms. The recent availability of genomic data for the human blood fluke Schistosoma mansoni and the model planarian Schmidtea mediterranea paves the way for the first comprehensive effort to identify and analyze GPCRs in this important phylum.
Results
Application of a novel transmembrane-oriented approach to receptor mining led to the discovery of 117 S. mansoni GPCRs, representing all of the major families; 105 Rhodopsin, 2 Glutamate, 3 Adhesion, 2 Secretin and 5 Frizzled. Similarly, 418 Rhodopsin, 9 Glutamate, 21 Adhesion, 1 Secretin and 11 Frizzled S. mediterranea receptors were identified. Among these, we report the identification of novel receptor groupings, including a large and highly-diverged Platyhelminth-specific Rhodopsin subfamily, a planarian-specific Adhesion-like family, and atypical Glutamate-like receptors. Phylogenetic analysis was carried out following extensive gene curation. Support vector machines (SVMs) were trained and used for ligand-based classification of full-length Rhodopsin GPCRs, complementing phylogenetic and homology-based classification.
Conclusions
Genome-wide investigation of GPCRs in two platyhelminth genomes reveals an extensive and complex receptor signaling repertoire with many unique features. This work provides important sequence and functional leads for understanding basic flatworm receptor biology, and sheds light on a lucrative set of anthelmintic drug targets.
Abstract:
The concentration of organic acids in anaerobic digesters is one of the most critical parameters for monitoring and advanced control of anaerobic digestion processes. Thus, a reliable online measurement system is absolutely necessary. A novel approach to obtaining these measurements indirectly and online using UV/vis spectroscopic probes, in conjunction with powerful pattern recognition methods, is presented in this paper. A UV/vis spectroscopic probe from S::CAN is used in combination with a custom-built dilution system to monitor the absorption of fully fermented sludge over a spectrum from 200 to 750 nm. Advanced pattern recognition methods are then used to map the non-linear relationship between measured absorption spectra and laboratory measurements of organic acid concentrations. Linear discriminant analysis, generalized discriminant analysis (GerDA), support vector machines (SVM), relevance vector machines, random forest and neural networks are investigated for this purpose and their performance compared. To validate the approach, online measurements have been taken at a full-scale 1.3-MW industrial biogas plant. Results show that whereas some of the methods considered do not yield satisfactory results, accurate prediction of organic acid concentration ranges can be obtained with both GerDA and SVM-based classifiers, with classification rates in excess of 87% achieved on test data.
Abstract:
The problem of learning from imbalanced data is of critical importance in a large number of application domains and can be a bottleneck in the performance of various conventional learning methods that assume the data distribution to be balanced. The class imbalance problem corresponds to the situation where one class massively outnumbers the other. The imbalance between the majority and minority classes leads machine learning methods to be biased and to produce unreliable outcomes if the imbalanced data is used directly. There has been increasing interest in this research area and a number of algorithms have been developed. However, independent evaluation of the algorithms is limited. This paper aims at evaluating the performance of five representative data sampling methods that deal with class imbalance problems, namely SMOTE, ADASYN, BorderlineSMOTE, SMOTETomek and RUSBoost. A comparative study is conducted and the performance of each method is critically analysed in terms of assessment metrics. © 2013 Springer-Verlag.
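As an illustration of the first sampling method listed, the core interpolation step of SMOTE can be sketched in a few lines. This is a simplified, hypothetical implementation (the function name `smote`, the parameter defaults and the sample points are assumptions), not the code evaluated in the paper: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Made-up 2-D minority-class points, purely for illustration.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing samples exactly.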
Abstract:
Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of approximately 32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 x 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
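The operating point quoted in this abstract (a missed detection rate at a fixed false positive rate) can be read off classifier scores as in the sketch below. The function name, the scores and the 10 per cent tolerance used in the example are hypothetical; the abstract does not describe how the authors computed their figures.

```python
def missed_detection_rate_at_fpr(real_scores, bogus_scores, max_fpr):
    """Find the lowest threshold whose false positive rate is <= max_fpr.

    A candidate is classified as real when its score >= threshold.
    Returns (threshold, false_positive_rate, missed_detection_rate),
    or None if no observed score meets the false positive constraint.
    """
    for threshold in sorted(set(real_scores) | set(bogus_scores)):
        # False positives: bogus detections accepted as real.
        fpr = sum(s >= threshold for s in bogus_scores) / len(bogus_scores)
        if fpr <= max_fpr:
            # Missed detections: real transients rejected as bogus.
            mdr = sum(s < threshold for s in real_scores) / len(real_scores)
            return threshold, fpr, mdr
    return None


# Made-up classifier scores, purely for illustration.
real_scores = [0.9, 0.8, 0.95, 0.4, 0.7, 0.85, 0.6, 0.99, 0.75, 0.3]
bogus_scores = [0.1, 0.2, 0.05, 0.5, 0.15, 0.3, 0.25, 0.65, 0.35, 0.45]
operating_point = missed_detection_rate_at_fpr(
    real_scores, bogus_scores, max_fpr=0.1
)
```

Scanning thresholds in ascending order returns the most permissive threshold that still satisfies the false positive budget, which is the one with the lowest missed detection rate.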
Abstract:
This paper proposes an efficient learning mechanism to build fuzzy rule-based systems through the construction of sparse least-squares support vector machines (LS-SVMs). In addition to the significantly reduced computational complexity in model training, the resultant LS-SVM-based fuzzy system is sparser while offering satisfactory generalization capability over unseen data. It is well known that LS-SVMs have a computational advantage over conventional SVMs in the model training process; however, the model sparseness is lost, which is the main drawback of LS-SVMs. This is an open problem for LS-SVMs. To tackle the nonsparseness issue, a new regression alternative to the Lagrangian solution for the LS-SVM is first presented. A novel efficient learning mechanism is then proposed in this paper to extract a sparse set of support vectors for generating fuzzy IF-THEN rules. This novel mechanism works in a stepwise subset selection manner, including a forward expansion phase and a backward exclusion phase in each selection step. The implementation of the algorithm is computationally very efficient due to the introduction of a few key techniques that avoid matrix inverse operations and so accelerate the training process. The computational efficiency is also confirmed by detailed computational complexity analysis. As a result, the proposed approach not only achieves sparseness of the resultant LS-SVM-based fuzzy systems but also significantly reduces the computational effort in model training. Three experimental examples are presented to demonstrate the effectiveness and efficiency of the proposed learning mechanism and the sparseness of the obtained LS-SVM-based fuzzy systems, in comparison with other SVM-based learning techniques.
Abstract:
This experimental study focuses on a detection system at the seismic station level that should have a role similar to that of detection algorithms based on the STA/LTA ratio. We tested two types of neural network, Multi-Layer Perceptrons and Support Vector Machines, trained in supervised mode. The universe of data consisted of 2903 patterns extracted from records of the PVAQ station, of the seismographic network of the Institute of Meteorology of Portugal. The spectral characteristics of the records and their variation in time were reflected in the input patterns, consisting of a set of values of power spectral density at selected frequencies, extracted from a spectrogram calculated over a segment of record of pre-determined duration. The universe of data was divided, with about 60% for training and the remainder reserved for testing and validation. To ensure that all patterns in the universe of data were within the range of variation of the training set, we used an algorithm to separate the universe of data by hyper-convex polyhedrons, determining in this manner a set of patterns that must be part of the training set. Additionally, an active learning strategy was conducted, by iteratively incorporating poorly classified cases in the training set. The best results, in terms of sensitivity and selectivity over the whole data, ranged between 98% and 100%. These results compare very favorably with the 50% obtained by the existing detection system.
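For readers unfamiliar with the baseline mentioned in this abstract, the STA/LTA ratio compares the average signal energy in a short trailing window with that in a long trailing window and triggers when the ratio exceeds a threshold. The sketch below is a plain, unoptimised version under assumed settings; the window lengths, threshold and synthetic signal are hypothetical and are not taken from the station's actual detection system.

```python
def sta_lta(signal, n_sta, n_lta):
    """Return the STA/LTA energy ratio for each sample (requires n_sta <= n_lta).

    Samples with fewer than n_lta points of history get a ratio of 0.0.
    """
    energy = [x * x for x in signal]
    ratios = []
    for i in range(len(energy)):
        if i + 1 < n_lta:
            ratios.append(0.0)  # not enough history for the long window
            continue
        sta = sum(energy[i + 1 - n_sta:i + 1]) / n_sta  # short-term average
        lta = sum(energy[i + 1 - n_lta:i + 1]) / n_lta  # long-term average
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


# Synthetic trace: low-amplitude noise with a short high-amplitude burst.
signal = [0.1, -0.1] * 10 + [2.0, -2.0, 2.0, -2.0] + [0.1, -0.1] * 5
ratios = sta_lta(signal, n_sta=2, n_lta=10)
threshold = 3.0  # assumed trigger level
triggers = [i for i, r in enumerate(ratios) if r > threshold]
```

The ratio falls back below the threshold while the burst is still in progress because the long window fills with burst energy, which is one reason such detectors are sensitive to the choice of window lengths.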
Abstract:
This study describes the on-line operation of a seismic detection system that acts at the level of a seismic station, providing a role similar to that of STA/LTA ratio-based detection algorithms. The intelligent detector is a Support Vector Machine (SVM), trained with data consisting of 2903 patterns extracted from records of the PVAQ station, one of the stations of the seismographic network of the Institute of Meteorology of Portugal (IM). The spectral characteristics of the records and their variation in time were reflected in the SVM input patterns, as a set of values of power spectral density at selected frequencies. To ensure that all patterns of the sample data were within the range of variation of the training set, we used an algorithm to separate the universe of data by hyper-convex polyhedrons, determining in this manner a set of patterns that must be part of the training set. Additionally, an active learning strategy was conducted, by iteratively incorporating poorly classified cases in the training set. After having been trained, the proposed system was tested in continuous operation on unseen (out-of-sample) data, and the SVM detector obtained 97.7% sensitivity and 98.7% selectivity. The same type of ANN achieved 88.4% sensitivity and 99.4% selectivity when applied to data from a different seismic station of IM. © 2013 Springer-Verlag Berlin Heidelberg.
Abstract:
Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2014
Abstract:
Project work submitted for the degree of Master in Informatics and Computer Engineering
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa for the degree of Master in Informatics Engineering