54 resultados para SVMs
Resumo:
Impressive claims have been made for the performance of the SNoW algorithm on face detection tasks by Yang et. al. [7]. In particular, by looking at both their results and those of Heisele et. al. [3], one could infer that the SNoW system performed substantially better than an SVM-based system, even when the SVM used a polynomial kernel and the SNoW system used a particularly simplistic 'primitive' linear representation. We evaluated the two approaches in a controlled experiment, looking directly at performance on a simple, fixed-sized test set, isolating out 'infrastructure' issues related to detecting faces at various scales in large images. We found that SNoW performed about as well as linear SVMs, and substantially worse than polynomial SVMs.
Resumo:
In this paper, we describe one of the approaches of the participation of Universidade de Évora. Our approach is similar to usual methods where text is preprocessed, features are extracted, and then used in SVMs with cross validation. The main difference is that features used come from averages of word embeddings, specifically word2vec vectors. Using PAN 2016 dataset, we were able to achieve 44.8% and 68.2% for English age and gender classification respectively. We were also able to achieve 51.3% and 67.1% accuracy for Spanish age and gender classification. Finally, we report 71.9% accuracy for Dutch age classification.
Resumo:
This paper describes various experiments done to investigate author profiling of tweets in 4 different languages – English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on 5 different person- ality dimensions – extroversion, stability, agreeableness, open- ness, and conscientiousness. Different sets of features were tested – bag-of-words, word ngrams, POS ngrams, and average of word embeddings. SVM was used as the classifier. Tfidf worked best for most English tasks while for most of the tasks from the other languages, the combination of the best features worked better.
Resumo:
Objective: We carry out a systematic assessment on a suite of kernel-based learning machines while coping with the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely. Gaussian and exponential radial basis functions) were considered as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations whereby one can visually inspect their levels of sensitiveness to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results evidence that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions to be taken, albeit the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
A classificação automática de sons urbanos é importante para o monitoramento ambiental. Este trabalho apresenta uma nova metodologia para classificar sons urbanos, que se baseia na descoberta de padrões frequentes (motifs) nos sinais sonoros e utiliza-los como atributos para a classificação. Para extrair os motifs é utilizado um método de descoberta multi-resolução baseada em SAX. Para a classificação são usadas árvores de decisão e SVMs. Esta nova metodologia é comparada com outra bastante utilizada baseada em MFCC. Para a realização de experiências foi utilizado o dataset UrbanSound disponível publicamente. Realizadas as experiências, foi possível concluir que os atributos motif são melhores que os MFCC a discriminar sons com timbres semelhantes e que os melhores resultados são conseguidos com ambos os tipos de atributos combinados. Neste trabalho foi também desenvolvida uma aplicação móvel para Android que permite utilizar os métodos de classificação desenvolvidos num contexto de vida real e expandir o dataset.
Resumo:
Programa Doutoral em Engenharia Eletrónica e de Computadores
Resumo:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide the clinicians in the Alzheimer's Disease (AD) diagnosis. However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) Systems. METHODS It is proposed a novel combination of feature extraction techniques to improve the diagnosis of AD. Firstly, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to be located within a predefined brain activation mask. In order to address the small sample-size problem, the dimension of the feature space was further reduced by: Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the two latter also analysed with a LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted in order to evaluate the proposed LMNN-based feature extraction algorithms and its benefits as: i) linear transformation of the PLS or PCA reduced data, ii) feature reduction technique, and iii) classifier (with Euclidean, Mahalanobis or Energy-based methodology). The system was evaluated by means of k-fold cross-validation yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when a NMSE-PLS-LMNN feature extraction method was used in combination with a SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be a valid solution for the presented problem. One of the advances is the robustness of the LMNN algorithm that not only provides higher separation rate between the classes but it also makes (in combination with NMSE and PLS) this rate variation more stable. In addition, their generalization ability is another advance since several experiments were performed on two image modalities (SPECT and PET).
Resumo:
Raman spectroscopy has become an attractive tool for the analysis of pharmaceutical solid dosage forms. In the present study it is used to ensure the identity of tablets. The two main applications of this method are release of final products in quality control and detection of counterfeits. Twenty-five product families of tablets have been included in the spectral library and a non-linear classification method, the Support Vector Machines (SVMs), has been employed. Two calibrations have been developed in cascade: the first one identifies the product family while the second one specifies the formulation. A product family comprises different formulations that have the same active pharmaceutical ingredient (API) but in a different amount. Once the tablets have been classified by the SVM model, API peaks detection and correlation are applied in order to have a specific method for the identification and allow in the future to discriminate counterfeits from genuine products. This calibration strategy enables the identification of 25 product families without error and in the absence of prior information about the sample. Raman spectroscopy coupled with chemometrics is therefore a fast and accurate tool for the identification of pharmaceutical tablets.
Resumo:
Cannabis cultivation in order to produce drugs is forbidden in Switzerland. Thus, law enforcement authorities regularly ask forensic laboratories to determinate cannabis plant's chemotype from seized material in order to ascertain that the plantation is legal or not. As required by the EU official analysis protocol the THC rate of cannabis is measured from the flowers at maturity. When laboratories are confronted to seedlings, they have to lead the plant to maturity, meaning a time consuming and costly procedure. This study investigated the discrimination of fibre type from drug type Cannabis seedlings by analysing the compounds found in their leaves and using chemometrics tools. 11 legal varieties allowed by the Swiss Federal Office for Agriculture and 13 illegal ones were greenhouse grown and analysed using a gas chromatograph interfaced with a mass spectrometer. Compounds that show high discrimination capabilities in the seedlings have been identified and a support vector machines (SVMs) analysis was used to classify the cannabis samples. The overall set of samples shows a classification rate above 99% with false positive rates less than 2%. This model allows then discrimination between fibre and drug type Cannabis at an early stage of growth. Therefore it is not necessary to wait plants' maturity to quantify their amount of THC in order to determine their chemotype. This procedure could be used for the control of legal (fibre type) and illegal (drug type) Cannabis production.
Resumo:
Monitoring of posture allocations and activities enables accurate estimation of energy expenditure and may aid in obesity prevention and treatment. At present, accurate devices rely on multiple sensors distributed on the body and thus may be too obtrusive for everyday use. This paper presents a novel wearable sensor, which is capable of very accurate recognition of common postures and activities. The patterns of heel acceleration and plantar pressure uniquely characterize postures and typical activities while requiring minimal preprocessing and no feature extraction. The shoe sensor was tested in nine adults performing sitting and standing postures and while walking, running, stair ascent/descent and cycling. Support vector machines (SVMs) were used for classification. A fourfold validation of a six-class subject-independent group model showed 95.2% average accuracy of posture/activity classification on full sensor set and over 98% on optimized sensor set. Using a combination of acceleration/pressure also enabled a pronounced reduction of the sampling frequency (25 to 1 Hz) without significant loss of accuracy (98% versus 93%). Subjects had shoe sizes (US) M9.5-11 and W7-9 and body mass index from 18.1 to 39.4 kg/m2 and thus suggesting that the device can be used by individuals with varying anthropometric characteristics.
Resumo:
BACKGROUND AND PURPOSE: MCI was recently subdivided into sd-aMCI, sd-fMCI, and md-aMCI. The current investigation aimed to discriminate between MCI subtypes by using DTI. MATERIALS AND METHODS: Sixty-six prospective participants were included: 18 with sd-aMCI, 13 with sd-fMCI, and 35 with md-aMCI. Statistics included group comparisons using TBSS and individual classification using SVMs. RESULTS: The group-level analysis revealed a decrease in FA in md-aMCI versus sd-aMCI in an extensive bilateral, right-dominant network, and a more pronounced reduction of FA in md-aMCI compared with sd-fMCI in right inferior fronto-occipital fasciculus and inferior longitudinal fasciculus. The comparison between sd-fMCI and sd-aMCI, as well as the analysis of the other diffusion parameters, yielded no significant group differences. The individual-level SVM analysis provided discrimination between the MCI subtypes with accuracies around 97%. The major limitation is the relatively small number of cases of MCI. CONCLUSIONS: Our data show that, at the group level, the md-aMCI subgroup has the most pronounced damage in white matter integrity. Individually, SVM analysis of white matter FA provided highly accurate classification of MCI subtypes.
Resumo:
In this paper, we propose two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification. Based on predefined heuristics, the classifier ranks the unlabeled pixels and automatically chooses those that are considered the most valuable for its improvement. Once the pixels have been selected, the analyst labels them manually and the process is iterated. Starting with a small and nonoptimal training set, the model itself builds the optimal set of samples which minimizes the classification error. We have applied the proposed algorithms to a variety of remote sensing data, including very high resolution and hyperspectral images, using support vector machines. Experimental results confirm the consistency of the methods. The required number of training samples can be reduced to 10% using the methods proposed, reaching the same level of accuracy as larger data sets. A comparison with a state-of-the-art active learning method, margin sampling, is provided, highlighting advantages of the methods proposed. The effect of spatial resolution and separability of the classes on the quality of the selection of pixels is also discussed.
Resumo:
The development of statistical models for forensic fingerprint identification purposes has been the subject of increasing research attention in recent years. This can be partly seen as a response to a number of commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. In addition, key forensic identification bodies such as ENFSI [1] and IAI [2] have recently endorsed and acknowledged the potential benefits of using statistical models as an important tool in support of the fingerprint identification process within the ACE-V framework. In this paper, we introduce a new Likelihood Ratio (LR) model based on Support Vector Machines (SVMs) trained with features discovered via morphometric and spatial analyses of corresponding minutiae configurations for both match and close non-match populations often found in AFIS candidate lists. Computed LR values are derived from a probabilistic framework based on SVMs that discover the intrinsic spatial differences of match and close non-match populations. Lastly, experimentation performed on a set of over 120,000 publicly available fingerprint images (mostly sourced from the National Institute of Standards and Technology (NIST) datasets) and a distortion set of approximately 40,000 images, is presented, illustrating that the proposed LR model is reliably guiding towards the right proposition in the identification assessment of match and close non-match populations. Results further indicate that the proposed model is a promising tool for fingerprint practitioners to use for analysing the spatial consistency of corresponding minutiae configurations.
Resumo:
This article presents an experimental study about the classification ability of several classifiers for multi-classclassification of cannabis seedlings. As the cultivation of drug type cannabis is forbidden in Switzerland lawenforcement authorities regularly ask forensic laboratories to determinate the chemotype of a seized cannabisplant and then to conclude if the plantation is legal or not. This classification is mainly performed when theplant is mature as required by the EU official protocol and then the classification of cannabis seedlings is a timeconsuming and costly procedure. A previous study made by the authors has investigated this problematic [1]and showed that it is possible to differentiate between drug type (illegal) and fibre type (legal) cannabis at anearly stage of growth using gas chromatography interfaced with mass spectrometry (GC-MS) based on therelative proportions of eight major leaf compounds. The aims of the present work are on one hand to continueformer work and to optimize the methodology for the discrimination of drug- and fibre type cannabisdeveloped in the previous study and on the other hand to investigate the possibility to predict illegal cannabisvarieties. Seven classifiers for differentiating between cannabis seedlings are evaluated in this paper, namelyLinear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLS-DA), Nearest NeighbourClassification (NNC), Learning Vector Quantization (LVQ), Radial Basis Function Support Vector Machines(RBF SVMs), Random Forest (RF) and Artificial Neural Networks (ANN). The performance of each method wasassessed using the same analytical dataset that consists of 861 samples split into drug- and fibre type cannabiswith drug type cannabis being made up of 12 varieties (i.e. 12 classes). The results show that linear classifiersare not able to manage the distribution of classes in which some overlap areas exist for both classificationproblems. Unlike linear classifiers, NNC and RBF SVMs best differentiate cannabis samples both for 2-class and12-class classifications with average classification results up to 99% and 98%, respectively. Furthermore, RBFSVMs correctly classified into drug type cannabis the independent validation set, which consists of cannabisplants coming from police seizures. In forensic case work this study shows that the discrimination betweencannabis samples at an early stage of growth is possible with fairly high classification performance fordiscriminating between cannabis chemotypes or between drug type cannabis varieties.
Resumo:
Avalanche forecasting is a complex process involving the assimilation of multiple data sources to make predictions over varying spatial and temporal resolutions. Numerically assisted forecasting often uses nearest neighbour methods (NN), which are known to have limitations when dealing with high dimensional data. We apply Support Vector Machines to a dataset from Lochaber, Scotland to assess their applicability in avalanche forecasting. Support Vector Machines (SVMs) belong to a family of theoretically based techniques from machine learning and are designed to deal with high dimensional data. Initial experiments showed that SVMs gave results which were comparable with NN for categorical and probabilistic forecasts. Experiments utilising the ability of SVMs to deal with high dimensionality in producing a spatial forecast show promise, but require further work.