887 results for Support Vector Machines
Diagnostic improvement of diffuse liver diseases using artificial intelligence techniques
Abstract:
Automatic diagnostic discrimination is an application of artificial intelligence techniques that can help solve clinical cases based on imaging. Diffuse liver diseases are highly prevalent in the population and follow an insidious course, especially early in their progression. Early and effective diagnosis is necessary because many of these diseases progress to cirrhosis and liver cancer. The usual technique of choice for accurate diagnosis is liver biopsy, an invasive procedure that is not free of contraindications. This project proposes an alternative method, based on liver ultrasonography, that is non-invasive and free of contraindications. The images are digitized and then analyzed using statistical and texture-analysis techniques. The results are validated against the pathology report. Finally, we apply artificial intelligence techniques such as Fuzzy k-Means and Support Vector Machines and compare their significance with that of the statistical analysis and the clinician's report. The results show that this technique is significantly valid and a promising non-invasive alternative for diagnosing chronic liver disease with diffuse involvement. Artificial intelligence classification techniques significantly improve diagnostic discrimination compared with the other statistical methods.
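The abstract names Fuzzy k-Means (also known as fuzzy c-means) among the classifiers but gives no implementation details. As a minimal sketch, assuming a single scalar texture feature per image, two clusters and the common fuzzifier m = 2, the standard alternating membership and centre updates can be written as follows; the feature values are hypothetical, not data from the study.

```python
def memberships(xs, centres, m=2.0, eps=1e-12):
    """Fuzzy membership u[i][j] of sample i in cluster j (each row sums to 1)."""
    c = len(centres)
    u = []
    for x in xs:
        # eps avoids division by zero when a sample coincides with a centre
        d = [abs(x - cj) + eps for cj in centres]
        u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                  for j in range(c)])
    return u


def fuzzy_c_means(xs, c=2, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on scalar features (assumes c >= 2).

    Returns (centres, memberships)."""
    lo, hi = min(xs), max(xs)
    # Deterministic initialisation: centres spread evenly over the data range
    centres = [lo + (hi - lo) * j / (c - 1) for j in range(c)]
    for _ in range(iters):
        u = memberships(xs, centres, m)
        # Centre update: mean of the samples weighted by u^m
        new = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            new.append(sum(wi * x for wi, x in zip(w, xs)) / sum(w))
        converged = max(abs(a - b) for a, b in zip(new, centres)) < tol
        centres = new
        if converged:
            break
    return centres, memberships(xs, centres, m)


# Hypothetical texture-feature values for eight ultrasound images
features = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
centres, u = fuzzy_c_means(features)
print([round(cj, 2) for cj in centres])
```

Unlike hard k-means, every image keeps a graded membership in each cluster, which can be read as a soft degree of resemblance to each diagnostic group.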
Abstract:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in the diagnosis of Alzheimer's Disease (AD). However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) systems. METHODS A novel combination of feature extraction techniques is proposed to improve the diagnosis of AD. First, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to a predefined brain activation mask. To address the small sample-size problem, the dimension of the feature space was further reduced by Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the latter two also analysed with an LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) a linear transformation of the PLS- or PCA-reduced data, ii) a feature reduction technique, and iii) a classifier (with Euclidean, Mahalanobis or Energy-based methodology). The system was evaluated by k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when an NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be valid solutions for the presented problem.
One of the advances is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also (in combination with NMSE and PLS) makes this rate more stable. In addition, their generalization ability is another advance, since several experiments were performed on two image modalities (SPECT and PET).
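The three figures of merit reported above follow the usual confusion-matrix definitions, treating AD as the positive class: accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP). The sketch below computes them from hypothetical label vectors; it does not reproduce how the study aggregated results across its k folds, which the abstract does not describe.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a binary diagnosis task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # fraction of AD cases detected
    specificity = tn / (tn + fp)   # fraction of controls correctly cleared
    return accuracy, sensitivity, specificity


# Hypothetical labels: 1 = AD, 0 = normal control
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(diagnostic_metrics(y_true, y_pred))  # -> (0.8, 0.75, 0.8333333333333334)
```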
Abstract:
Raman spectroscopy has become an attractive tool for the analysis of pharmaceutical solid dosage forms. In the present study it is used to verify the identity of tablets. The two main applications of this method are the release of final products in quality control and the detection of counterfeits. Twenty-five product families of tablets were included in the spectral library, and a non-linear classification method, Support Vector Machines (SVMs), was employed. Two calibrations were developed in cascade: the first identifies the product family, while the second specifies the formulation. A product family comprises different formulations that contain the same active pharmaceutical ingredient (API) in different amounts. Once the tablets have been classified by the SVM model, API peak detection and correlation are applied to obtain a specific identification method and, in the future, to allow counterfeits to be discriminated from genuine products. This calibration strategy enables the identification of all 25 product families without error and without prior information about the sample. Raman spectroscopy coupled with chemometrics is therefore a fast and accurate tool for the identification of pharmaceutical tablets.
Abstract:
The 2008 Data Fusion Contest, organized by the IEEE Geoscience and Remote Sensing Data Fusion Technical Committee, deals with the classification of high-resolution hyperspectral data from an urban area. Unlike previous editions of the contest, the goal was not only to identify the best algorithm but also to foster a collaborative effort: the decision fusion of the best individual algorithms aimed at further improving classification performance, and the best algorithms were ranked according to their relative contribution to the decision fusion. This paper presents the five awarded algorithms and the conclusions of the contest, stressing the importance of decision fusion, dimension reduction, and supervised classification methods such as neural networks and support vector machines.
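Decision fusion can take several forms, and the abstract does not state which rule the contest used. Purely as an illustrative assumption, the sketch below fuses per-pixel label maps from several classifiers with a plain majority vote, breaking ties in favour of the earliest-listed classifier (a hypothetical rule); the class names are invented.

```python
from collections import Counter


def majority_vote(label_maps):
    """Fuse per-pixel labels from several classifiers by majority vote.

    label_maps: one list of labels per classifier, all of the same length.
    Ties go to the label voted by the earliest-listed classifier among
    the tied labels (a hypothetical tie-breaking rule).
    """
    fused = []
    for votes in zip(*label_maps):
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(next(v for v in votes if counts[v] == top))
    return fused


# Hypothetical outputs of three classifiers on three pixels
maps = [
    ["road", "grass", "roof"],
    ["road", "tree", "roof"],
    ["water", "tree", "roof"],
]
print(majority_vote(maps))  # -> ['road', 'tree', 'roof']
```

Ranking each algorithm by its contribution to the fused map, as the contest did, would then amount to measuring how accuracy changes when that algorithm's map is removed from the vote; that step is not shown here.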
Abstract:
Cannabis cultivation for drug production is forbidden in Switzerland. Law enforcement authorities therefore regularly ask forensic laboratories to determine the chemotype of cannabis plants from seized material in order to ascertain whether a plantation is legal. As required by the official EU analysis protocol, the THC content of cannabis is measured in the flowers at maturity. When laboratories are confronted with seedlings, they have to grow the plants to maturity, which is a time-consuming and costly procedure. This study investigated the discrimination of fibre-type from drug-type Cannabis seedlings by analysing the compounds found in their leaves with chemometric tools. Eleven legal varieties allowed by the Swiss Federal Office for Agriculture and 13 illegal ones were grown in a greenhouse and analysed using a gas chromatograph coupled to a mass spectrometer. Compounds showing high discrimination capability in the seedlings were identified, and a support vector machine (SVM) analysis was used to classify the cannabis samples. The overall set of samples shows a classification rate above 99% with false positive rates below 2%. This model therefore allows discrimination between fibre-type and drug-type Cannabis at an early stage of growth, so it is not necessary to wait for the plants to reach maturity and quantify their THC content in order to determine their chemotype. This procedure could be used to control legal (fibre-type) and illegal (drug-type) Cannabis production.
Abstract:
Monitoring of posture allocations and activities enables accurate estimation of energy expenditure and may aid in obesity prevention and treatment. At present, accurate devices rely on multiple sensors distributed on the body and thus may be too obtrusive for everyday use. This paper presents a novel wearable sensor capable of very accurate recognition of common postures and activities. The patterns of heel acceleration and plantar pressure uniquely characterize postures and typical activities while requiring minimal preprocessing and no feature extraction. The shoe sensor was tested on nine adults while sitting, standing, walking, running, ascending/descending stairs and cycling. Support vector machines (SVMs) were used for classification. Fourfold validation of a six-class, subject-independent group model showed 95.2% average accuracy of posture/activity classification on the full sensor set and over 98% on an optimized sensor set. Using a combination of acceleration and pressure also enabled a pronounced reduction of the sampling frequency (from 25 to 1 Hz) without significant loss of accuracy (98% versus 93%). Subjects had shoe sizes (US) M9.5-11 and W7-9 and body mass indices from 18.1 to 39.4 kg/m², suggesting that the device can be used by individuals with varying anthropometric characteristics.
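A subject-independent validation requires that no subject contributes data to both the training and the test side of a split. A minimal sketch of fourfold splitting by subject follows; the round-robin assignment of subjects to folds is an assumption (the abstract does not say how the nine subjects were grouped), and the subject IDs are hypothetical.

```python
def subject_folds(subject_ids, k=4):
    """Yield (train_idx, test_idx) pairs in which no subject appears on both sides.

    subject_ids[i] is the subject who produced sample i. Subjects are dealt
    to folds round-robin after sorting (an assumed allocation scheme).
    """
    subjects = sorted(set(subject_ids))
    folds = [subjects[f::k] for f in range(k)]
    for test_subjects in folds:
        held_out = set(test_subjects)
        train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
        yield train_idx, test_idx


# Hypothetical: nine subjects, three windows of sensor data each
ids = [f"S{n}" for n in range(1, 10) for _ in range(3)]
for train_idx, test_idx in subject_folds(ids):
    print(sorted({ids[i] for i in test_idx}))
```

Splitting by sample instead of by subject would let windows from the same person land on both sides, which typically inflates the accuracy estimate for a group model.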
Abstract:
Recently, kernel-based machine learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods to process large datasets from environmental monitoring networks. Several typical problems in the environmental sciences and the solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (soil pollution by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot spot detection and monitoring network optimization, are also discussed.
Abstract:
To be diagnostically useful, structural MRI must reliably distinguish Alzheimer's disease (AD) from normal aging in individual scans. Recent advances in statistical learning theory have led to the application of support vector machines to MRI for the detection of a variety of disease states. The aims of this study were to assess how successfully support vector machines assigned individual diagnoses and to determine whether datasets combined from multiple scanners and different centres could be used to obtain effective classification of scans. We used linear support vector machines to classify the grey matter segment of T1-weighted MR scans from pathologically proven AD patients and cognitively normal elderly individuals, obtained from two centres with different scanning equipment. Because the clinical diagnosis of mild AD is difficult, we also tested the ability of support vector machines to differentiate control scans from those of patients without post-mortem confirmation. Finally, we sought to use these methods to differentiate scans of patients with AD from those of patients with frontotemporal lobar degeneration. Up to 96% of pathologically verified AD patients were correctly classified using whole-brain images. Data from different centres were successfully combined, achieving results comparable to those of the separate analyses. Importantly, data from one centre could be used to train a support vector machine to accurately differentiate AD and normal-ageing scans obtained from another centre with different subjects and different scanner equipment. Patients with mild, clinically probable AD and age/sex-matched controls were correctly separated in 89% of cases, which is compatible with published diagnosis rates in the best clinical centres. This method correctly assigned 89% of patients with a post-mortem confirmed diagnosis of either AD or frontotemporal lobar degeneration to their respective group.
Our study leads to three conclusions: Firstly, support vector machines successfully separate patients with AD from healthy aging subjects. Secondly, they perform well in the differential diagnosis of two different forms of dementia. Thirdly, the method is robust and can be generalized across different centres. This suggests an important role for computer based diagnostic image analysis for clinical practice.
Abstract:
The main objective of the project is to create a smartphone application that attempts to predict the volatility not attributable to the market, so that the user can build optimal portfolios using artificial intelligence techniques such as Support Vector Machines (SVM). Once this volatility has been predicted, an optimal portfolio will be built with the appropriate weight for each security, in order to obtain an investment with the lowest possible risk.
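The abstract does not specify how the optimal weights are computed. One standard construction, assumed here purely for illustration, is the minimum-variance portfolio: for two assets with volatilities σ1, σ2 and covariance σ12, the weight of the first asset is w1 = (σ2² - σ12) / (σ1² + σ2² - 2σ12) and w2 = 1 - w1. The inputs below are hypothetical stand-ins for the predicted volatilities, and short selling is not restricted (weights may be negative).

```python
def min_variance_weights(sigma1, sigma2, rho):
    """Weights of the two-asset minimum-variance portfolio (weights sum to 1)."""
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    cov = rho * sigma1 * sigma2
    denom = var1 + var2 - 2 * cov
    if denom == 0:
        raise ValueError("perfectly correlated assets with equal volatility")
    w1 = (var2 - cov) / denom
    return w1, 1.0 - w1


def portfolio_variance(w1, w2, sigma1, sigma2, rho):
    """Variance of a two-asset portfolio with weights w1, w2."""
    cov = rho * sigma1 * sigma2
    return w1 ** 2 * sigma1 ** 2 + w2 ** 2 * sigma2 ** 2 + 2 * w1 * w2 * cov


# Hypothetical predicted (non-market) volatilities and correlation
w1, w2 = min_variance_weights(0.20, 0.30, 0.0)
print(round(w1, 4), round(w2, 4))  # -> 0.6923 0.3077
```

With more than two securities the same idea generalises to weights proportional to the inverse covariance matrix applied to a vector of ones, which needs a linear-algebra library.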
Abstract:
In recent years there has been explosive growth in the development of adaptive and data-driven methods. One efficient data-driven approach is based on statistical learning theory (SLT) (Vapnik 1998). The theory is based on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM, we try not only to reduce the training error (to fit the available data with a model) but also to reduce the complexity of the model and the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the SRM inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present SLT is still under intensive development, and SVM are finding new areas of application (www.kernel-machines.org). SVM develop robust, nonlinear data models with excellent generalisation abilities, which is very important for both monitoring and forecasting. SVM are extremely good when the input space is high-dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only the support vectors to derive decision boundaries. This opens a way to sampling optimisation, estimation of noise in data, quantification of data redundancy, etc. A presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).
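The statement that SVM use only the support vectors can be illustrated with a linear SVM trained in the primal. The sketch below uses a Pegasos-style stochastic subgradient method (a substitution chosen for brevity; the text does not prescribe a solver) on a hypothetical, linearly separable 2-D toy set, then lists the training points lying on or inside the margin, y·f(x) ≤ 1 (plus a small tolerance). A constant feature is appended so the bias is learned as a weight, which also regularises it (a simplification).

```python
import random


def train_linear_svm(X, y, lam=0.1, iters=20000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)); a constant 1 is
    appended to each input, so the bias is a (regularised) extra weight.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(Xa))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]      # regulariser shrink
        if margin < 1.0:                               # hinge subgradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def decision(w, x):
    """Linear decision value f(x) = w.x + b (bias stored as the last weight)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def support_vectors(w, X, y, tol=0.1):
    """Indices of training points on or inside the margin: y * f(x) <= 1 + tol."""
    return [i for i, (x, yi) in enumerate(zip(X, y)) if yi * decision(w, x) <= 1.0 + tol]


# Hypothetical, linearly separable toy data
X = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(support_vectors(w, X, y))  # -> [0, 3]
```

Only the points (2, 2) and (-2, -2) end up on the margin; the other four lie strictly outside it, so removing them would leave the separating boundary essentially unchanged.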
Abstract:
This work proposes an original contribution to the understanding of fishermen's spatial behavior, based on the behavioral ecology and movement ecology paradigms. Through the analysis of Vessel Monitoring System (VMS) data, we characterized the spatial behavior of Peruvian anchovy fishermen at different scales: (1) the behavioral modes within fishing trips (i.e., searching, fishing and cruising); (2) the behavioral patterns among fishing trips; (3) the behavioral patterns by fishing season conditioned by ecosystem scenarios; and (4) the computation of maps of an anchovy presence proxy from the spatial patterns of behavioral mode positions. At the first scale considered, we compared several Markovian (hidden Markov and semi-Markov models) and discriminative models (random forests, support vector machines and artificial neural networks) for inferring the behavioral modes associated with VMS tracks. The models were trained in a supervised setting and validated using tracks for which behavioral modes were known (from on-board observer records). Hidden semi-Markov models performed best and were retained for inferring the behavioral modes on the entire VMS dataset. At the second scale considered, each fishing trip was characterized by several features, including the time spent within each behavioral mode. Using a clustering analysis, fishing trip patterns were classified into groups associated with management zones, fleet segments and skippers' personalities. At the third scale considered, we analyzed how ecological conditions shaped fishermen's behavior. By means of co-inertia analyses, we found significant associations between fishermen, anchovy and environmental spatial dynamics, and fishermen's behavioral responses were characterized according to contrasting environmental scenarios. At the fourth scale considered, we investigated whether the spatial behavior of fishermen reflected, to some extent, the spatial distribution of anchovy.
Finally, this work provides a wider view of fishermen's behavior: fishermen are not only economic agents but also foragers, constrained by ecosystem variability. To conclude, we discuss how these findings may be important for fisheries management, collective behavior analyses and end-to-end models.
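Inferring behavioral modes at the first scale amounts to decoding a hidden state sequence from a track. The study retained hidden semi-Markov models, whose explicit state-duration distributions are not modelled here; as a simplified sketch, the Viterbi algorithm for an ordinary hidden Markov model decodes searching/fishing/cruising from hypothetical speed classes. All probabilities are invented for illustration and would in practice be estimated from the observer-labelled tracks.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path of an HMM, computed in log space."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for being in state s at time t
            prev = max(states, key=lambda r: V[t - 1][r] + math.log(trans_p[r][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical model: sticky modes, speed classes as observations
states = ("cruising", "searching", "fishing")
start_p = {"cruising": 0.5, "searching": 0.3, "fishing": 0.2}
trans_p = {r: {s: (0.6 if r == s else 0.2) for s in states} for r in states}
emit_p = {
    "cruising": {"fast": 0.80, "medium": 0.15, "slow": 0.05},
    "searching": {"fast": 0.10, "medium": 0.80, "slow": 0.10},
    "fishing": {"fast": 0.05, "medium": 0.15, "slow": 0.80},
}
track = ["fast", "fast", "medium", "medium", "slow", "slow", "fast"]
print(viterbi(track, states, start_p, trans_p, emit_p))
# -> ['cruising', 'cruising', 'searching', 'searching', 'fishing', 'fishing', 'cruising']
```

Working in log space avoids numerical underflow on long VMS tracks, where the product of many probabilities would otherwise round to zero.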
Abstract:
In this paper, we propose two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification. Based on predefined heuristics, the classifier ranks the unlabeled pixels and automatically chooses those that are considered the most valuable for its improvement. Once the pixels have been selected, the analyst labels them manually and the process is iterated. Starting with a small and nonoptimal training set, the model itself builds the optimal set of samples which minimizes the classification error. We have applied the proposed algorithms to a variety of remote sensing data, including very high resolution and hyperspectral images, using support vector machines. Experimental results confirm the consistency of the methods. The required number of training samples can be reduced to 10% using the methods proposed, reaching the same level of accuracy as larger data sets. A comparison with a state-of-the-art active learning method, margin sampling, is provided, highlighting advantages of the methods proposed. The effect of spatial resolution and separability of the classes on the quality of the selection of pixels is also discussed.
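The proposed heuristics are not detailed in the abstract, but the margin-sampling baseline it compares against is simple: query the unlabelled samples closest to the SVM decision boundary, i.e. those with the smallest |f(x)|. A minimal binary-case sketch, assuming the decision values f(x) of the unlabelled pixels have already been computed by a trained SVM (the values below are hypothetical):

```python
def margin_sampling(decision_values, n_query):
    """Indices of the n_query unlabelled samples with the smallest |f(x)|,
    i.e. those lying closest to the SVM decision boundary (binary case)."""
    ranked = sorted(range(len(decision_values)), key=lambda i: abs(decision_values[i]))
    return ranked[:n_query]


# Hypothetical SVM decision values f(x) for five unlabelled pixels
f_values = [2.3, -0.1, 0.7, -1.5, 0.05]
print(margin_sampling(f_values, 2))  # -> [4, 1]
```

In the iterative loop described above, the analyst would label the returned pixels, they would be added to the training set, and the SVM would be retrained before the next query.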
Abstract:
The development of statistical models for forensic fingerprint identification has received increasing research attention in recent years. This can partly be seen as a response to a number of commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. In addition, key forensic identification bodies such as ENFSI [1] and IAI [2] have recently endorsed and acknowledged the potential benefits of statistical models as an important tool supporting the fingerprint identification process within the ACE-V framework. In this paper, we introduce a new Likelihood Ratio (LR) model based on Support Vector Machines (SVMs) trained with features discovered via morphometric and spatial analyses of corresponding minutiae configurations for both match and close non-match populations often found in AFIS candidate lists. Computed LR values are derived from a probabilistic framework based on SVMs that discover the intrinsic spatial differences between match and close non-match populations. Lastly, experiments performed on a set of over 120,000 publicly available fingerprint images (mostly sourced from the National Institute of Standards and Technology (NIST) datasets) and a distortion set of approximately 40,000 images are presented, illustrating that the proposed LR model reliably points towards the correct proposition in the identification assessment of match and close non-match populations. Results further indicate that the proposed model is a promising tool for fingerprint practitioners to analyse the spatial consistency of corresponding minutiae configurations.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Abstract:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference from previous studies, which have mostly focused on predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To obtain realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to that obtained when applying the classifier to independent data.
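Nested cross-validation keeps every tuning decision inside the outer training fold, so the outer test fold is never used for model selection. The sketch below tunes k for a k-nearest-neighbour classifier in the inner loop; feature selection is omitted, folds are contiguous and unshuffled, and the toy data are hypothetical. It guards only against tuning leakage: if batches are confounded with the groups, the resulting estimate can still be biased, which is the problem the study investigates.

```python
def kfold(n, k):
    """Contiguous, unshuffled folds of range(n) (shuffle the data first in practice)."""
    return [list(range(f * n // k, (f + 1) * n // k)) for f in range(k)]


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = [yi for _, yi in nearest]
    return max(set(votes), key=votes.count)


def accuracy(X, y, fit_idx, eval_idx, k):
    """Accuracy on eval_idx of a kNN model fitted on fit_idx."""
    fit_X, fit_y = [X[i] for i in fit_idx], [y[i] for i in fit_idx]
    hits = sum(knn_predict(fit_X, fit_y, X[i], k) == y[i] for i in eval_idx)
    return hits / len(eval_idx)


def nested_cv(X, y, ks=(1, 3, 5), outer=5, inner=4):
    """Outer folds estimate performance; inner folds (outer-train data only) pick k."""
    outer_scores = []
    for test_idx in kfold(len(X), outer):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]

        def inner_score(k):
            scores = []
            for val_pos in kfold(len(train_idx), inner):
                val_set = set(val_pos)
                fit = [train_idx[p] for p in range(len(train_idx)) if p not in val_set]
                val = [train_idx[p] for p in val_pos]
                scores.append(accuracy(X, y, fit, val, k))
            return sum(scores) / len(scores)

        best_k = max(ks, key=inner_score)
        outer_scores.append(accuracy(X, y, train_idx, test_idx, best_k))
    return sum(outer_scores) / len(outer_scores)


# Hypothetical, well-separated 1-D data with alternating classes
X = [((i % 2) * 5.0 + 0.01 * i,) for i in range(20)]
y = [i % 2 for i in range(20)]
print(nested_cv(X, y))  # -> 1.0
```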
Abstract:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). A simple k-nearest neighbor algorithm is considered as a benchmark model. PNN is a neural network reformulation of the well-known nonparametric principles of probability density modeling, using a kernel density estimator together with Bayes-optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that it can easily be used in decision support systems dealing with automatic classification problems. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. SVMs have recently been applied successfully to various environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper, both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
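The PNN described above is, in essence, a Parzen-window (Gaussian kernel) density estimate per class combined with a maximum a posteriori decision. A minimal sketch, assuming isotropic Gaussian kernels with one shared width sigma and class-frequency priors unless priors are supplied; the coordinates and class labels are hypothetical.

```python
import math


def pnn_classify(train_X, train_y, x, sigma=1.0, priors=None):
    """Probabilistic neural network (Parzen-window) classification of point x.

    Returns (predicted_label, posterior_dict). Class-conditional densities use
    isotropic Gaussian kernels of width sigma; the kernel normalising constant
    is omitted because it is identical for every class.
    """
    scores = {}
    for c in sorted(set(train_y)):
        members = [xi for xi, yi in zip(train_X, train_y) if yi == c]
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x)) / (2.0 * sigma ** 2))
            for xi in members
        ) / len(members)
        prior = priors[c] if priors else len(members) / len(train_y)
        scores[c] = prior * density
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all kernel densities underflowed; try a larger sigma")
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical sampling locations (x, y) with two soil classes
coords = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
label, post = pnn_classify(coords, labels, (0.5, 0.5))
print(label, round(post["A"], 3))  # -> A 1.0
```

Because the output is a full posterior rather than a single label, it directly supports the accuracy quantification and prior integration mentioned above; sigma plays the role of the smoothing parameter that would normally be tuned by cross-validation.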