13 resultados para infrared spectroscopy,chemometrics,least squares support vector machines

em Universidade Federal do Rio Grande do Norte(UFRN)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The pattern classification is one of the machine learning subareas that has the most outstanding. Among the various approaches to solve pattern classification problems, the Support Vector Machines (SVM) receive great emphasis, due to its ease of use and good generalization performance. The Least Squares formulation of SVM (LS-SVM) finds the solution by solving a set of linear equations instead of quadratic programming implemented in SVM. The LS-SVMs provide some free parameters that have to be correctly chosen to achieve satisfactory results in a given task. Despite the LS-SVMs having high performance, lots of tools have been developed to improve them, mainly the development of new classifying methods and the employment of ensembles, in other words, a combination of several classifiers. In this work, our proposal is to use an ensemble and a Genetic Algorithm (GA), search algorithm based on the evolution of species, to enhance the LSSVM classification. In the construction of this ensemble, we use a random selection of attributes of the original problem, which it splits the original problem into smaller ones where each classifier will act. So, we apply a genetic algorithm to find effective values of the LS-SVM parameters and also to find a weight vector, measuring the importance of each machine in the final classification. Finally, the final classification is obtained by a linear combination of the decision values of the LS-SVMs with the weight vector. We used several classification problems, taken as benchmarks to evaluate the performance of the algorithm and compared the results with other classifiers

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of the maps obtained from remote sensing orbital images submitted to digital processing became fundamental to optimize conservation and monitoring actions of the coral reefs. However, the accuracy reached in the mapping of submerged areas is limited by variation of the water column that degrades the signal received by the orbital sensor and introduces errors in the final result of the classification. The limited capacity of the traditional methods based on conventional statistical techniques to solve the problems related to the inter-classes took the search of alternative strategies in the area of the Computational Intelligence. In this work an ensemble classifiers was built based on the combination of Support Vector Machines and Minimum Distance Classifier with the objective of classifying remotely sensed images of coral reefs ecosystem. The system is composed by three stages, through which the progressive refinement of the classification process happens. The patterns that received an ambiguous classification in a certain stage of the process were revalued in the subsequent stage. The prediction non ambiguous for all the data happened through the reduction or elimination of the false positive. The images were classified into five bottom-types: deep water; under-water corals; inter-tidal corals; algal and sandy bottom. The highest overall accuracy (89%) was obtained from SVM with polynomial kernel. The accuracy of the classified image was compared through the use of error matrix to the results obtained by the application of other classification methods based on a single classifier (neural network and the k-means algorithm). In the final, the comparison of results achieved demonstrated the potential of the ensemble classifiers as a tool of classification of images from submerged areas subject to the noise caused by atmospheric effects and the water column

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The automatic speech recognition by machine has been the target of researchers in the past five decades. In this period have been numerous advances, such as in the field of recognition of isolated words (commands), which has very high rates of recognition, currently. However, we are still far from developing a system that could have a performance similar to the human being (automatic continuous speech recognition). One of the great challenges of searches for continuous speech recognition is the large amount of pattern. The modern languages such as English, French, Spanish and Portuguese have approximately 500,000 words or patterns to be identified. The purpose of this study is to use smaller units than the word such as phonemes, syllables and difones units as the basis for the speech recognition, aiming to recognize any words without necessarily using them. The main goal is to reduce the restriction imposed by the excessive amount of patterns. In order to validate this proposal, the system was tested in the isolated word recognition in dependent-case. The phonemes characteristics of the Brazil s Portuguese language were used to developed the hierarchy decision system. These decisions are made through the use of neural networks SVM (Support Vector Machines). The main speech features used were obtained from the Wavelet Packet Transform. The descriptors MFCC (Mel-Frequency Cepstral Coefficient) are also used in this work. It was concluded that the method proposed in this work, showed good results in the steps of recognition of vowels, consonants (syllables) and words when compared with other existing methods in literature

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The human voice is an important communication tool and any disorder of the voice can have profound implications for social and professional life of an individual. Techniques of digital signal processing have been used by acoustic analysis of vocal disorders caused by pathologies in the larynx, due to its simplicity and noninvasive nature. This work deals with the acoustic analysis of voice signals affected by pathologies in the larynx, specifically, edema, and nodules on the vocal folds. The purpose of this work is to develop a classification system of voices to help pre-diagnosis of pathologies in the larynx, as well as monitoring pharmacological treatments and after surgery. Linear Prediction Coefficients (LPC), Mel Frequency cepstral coefficients (MFCC) and the coefficients obtained through the Wavelet Packet Transform (WPT) are applied to extract relevant characteristics of the voice signal. For the classification task is used the Support Vector Machine (SVM), which aims to build optimal hyperplanes that maximize the margin of separation between the classes involved. The hyperplane generated is determined by the support vectors, which are subsets of points in these classes. According to the database used in this work, the results showed a good performance, with a hit rate of 98.46% for classification of normal and pathological voices in general, and 98.75% in the classification of diseases together: edema and nodules

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Support Vector Machines (SVM) has attracted increasing attention in machine learning area, particularly on classification and patterns recognition. However, in some cases it is not easy to determinate accurately the class which given pattern belongs. This thesis involves the construction of a intervalar pattern classifier using SVM in association with intervalar theory, in order to model the separation of a pattern set between distinct classes with precision, aiming to obtain an optimized separation capable to treat imprecisions contained in the initial data and generated during the computational processing. The SVM is a linear machine. In order to allow it to solve real-world problems (usually nonlinear problems), it is necessary to treat the pattern set, know as input set, transforming from nonlinear nature to linear problem. The kernel machines are responsible to do this mapping. To create the intervalar extension of SVM, both for linear and nonlinear problems, it was necessary define intervalar kernel and the Mercer s theorem (which caracterize a kernel function) to intervalar function

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences on world wide database. Gene expression on prokaryotes initiates when the RNA-polymerase enzyme interacts with DNA regions called promoters. In these regions are located the main regulatory elements of the transcription process. Despite the improvement of in vitro techniques for molecular biology analysis, characterizing and identifying a great number of promoters on a genome is a complex task. Nevertheless, the main drawback is the absence of a large set of promoters to identify conserved patterns among the species. Hence, a in silico method to predict them on any species is a challenge. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques such as Na¨ýve Bayes, Decision Trees, Support Vector Machines and Neural Networks, Voted Perceptron, PART, k-NN and and ensemble approaches (Bagging and Boosting) to the task of predicting Bacillus subtilis. In order to do so, we first built two data set of promoter and nonpromoter sequences for B. subtilis and a hybrid one. In order to evaluate of ML methods a cross-validation procedure is applied. Good results were obtained with methods of ML like SVM and Naïve Bayes using B. subtilis. However, we have not reached good results on hybrid database

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reinforcement learning is a machine learning technique that, although finding a large number of applications, maybe is yet to reach its full potential. One of the inadequately tested possibilities is the use of reinforcement learning in combination with other methods for the solution of pattern classification problems. It is well documented in the literature the problems that support vector machine ensembles face in terms of generalization capacity. Algorithms such as Adaboost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The presented algorithm combines Adaboost algorithm with a layer of reinforcement learning to adjust committee parameters in order to avoid that imbalances on the committee components affect the generalization performance of the final hypothesis. Comparisons were made with ensembles using and not using the reinforcement learning layer, testing benchmark data sets widely known in area of pattern classification

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work is combined with the potential of the technique of near infrared spectroscopy - NIR and chemometrics order to determine the content of diclofenac tablets, without destruction of the sample, to which was used as the reference method, ultraviolet spectroscopy, which is one of the official methods. In the construction of multivariate calibration models has been studied several types of pre-processing of NIR spectral data, such as scatter correction, first derivative. The regression method used in the construction of calibration models is the PLS (partial least squares) using NIR spectroscopic data of a set of 90 tablets were divided into two sets (calibration and prediction). 54 were used in the calibration samples and the prediction was used 36, since the calibration method used was crossvalidation method (full cross-validation) that eliminates the need for a validation set. The evaluation of the models was done by observing the values of correlation coefficient R 2 and RMSEC mean square error (calibration error) and RMSEP (forecast error). As the forecast values estimated for the remaining 36 samples, which the results were consistent with the values obtained by UV spectroscopy

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work we used chemometric tools to classify and quantify the protein content in samples of milk powder. We applied the NIR diffuse reflectance spectroscopy combined with multivariate techniques. First, we carried out an exploratory method of samples by principal component analysis (PCA), then the classification of independent modeling of class analogy (SIMCA). Thus it became possible to classify the samples that were grouped by similarities in their composition. Finally, the techniques of partial least squares regression (PLS) and principal components regression (PCR) allowed the quantification of protein content in samples of milk powder, compared with the Kjeldahl reference method. A total of 53 samples of milk powder sold in the metropolitan areas of Natal, Salvador and Rio de Janeiro were acquired for analysis, in which after pre-treatment data, there were four models, which were employed for classification and quantification of samples. The methods employed after being assessed and validated showed good performance, good accuracy and reliability of the results, showing that the NIR technique can be a non invasive technique, since it produces no waste and saves time in analyzing the samples

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work calibration models were constructed to determine the content of total lipids and moisture in powdered milk samples. For this, used the near-infrared spectroscopy by diffuse reflectance, combined with multivariate calibration. Initially, the spectral data were submitted to correction of multiplicative light scattering (MSC) and Savitzsky-Golay smoothing. Then, the samples were divided into subgroups by application of hierarchical clustering analysis of the classes (HCA) and Ward Linkage criterion. Thus, it became possible to build regression models by partial least squares (PLS) that allowed the calibration and prediction of the content total lipid and moisture, based on the values obtained by the reference methods of Soxhlet and 105 ° C, respectively . Therefore, conclude that the NIR had a good performance for the quantification of samples of powdered milk, mainly by minimizing the analysis time, not destruction of the samples and not waste. Prediction models for determination of total lipids correlated (R) of 0.9955, RMSEP of 0.8952, therefore the average error between the Soxhlet and NIR was ± 0.70%, while the model prediction to content moisture correlated (R) of 0.9184, RMSEP, 0.3778 and error of ± 0.76%

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work, the quantitative analysis of glucose, triglycerides and cholesterol (total and HDL) in both rat and human blood plasma was performed without any kind of pretreatment of samples, by using near infrared spectroscopy (NIR) combined with multivariate methods. For this purpose, different techniques and algorithms used to pre-process data, to select variables and to build multivariate regression models were compared between each other, such as partial least squares regression (PLS), non linear regression by artificial neural networks, interval partial least squares regression (iPLS), genetic algorithm (GA), successive projections algorithm (SPA), amongst others. Related to the determinations of rat blood plasma samples, the variables selection algorithms showed satisfactory results both for the correlation coefficients (R²) and for the values of root mean square error of prediction (RMSEP) for the three analytes, especially for triglycerides and cholesterol-HDL. The RMSEP values for glucose, triglycerides and cholesterol-HDL obtained through the best PLS model were 6.08, 16.07 e 2.03 mg dL-1, respectively. In the other case, for the determinations in human blood plasma, the predictions obtained by the PLS models provided unsatisfactory results with non linear tendency and presence of bias. Then, the ANN regression was applied as an alternative to PLS, considering its ability of modeling data from non linear systems. The root mean square error of monitoring (RMSEM) for glucose, triglycerides and total cholesterol, for the best ANN models, were 13.20, 10.31 e 12.35 mg dL-1, respectively. Statistical tests (F and t) suggest that NIR spectroscopy combined with multivariate regression methods (PLS and ANN) are capable to quantify the analytes (glucose, triglycerides and cholesterol) even when they are present in highly complex biological fluids, such as blood plasma

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this study was to evaluate the potential of near-infrared reflectance spectroscopy (NIRS) as a rapid and non-destructive method to determine the soluble solid content (SSC), pH and titratable acidity of intact plums. Samples of plum with a total solids content ranging from 5.7 to 15%, pH from 2.72 to 3.84 and titratable acidity from 0.88 a 3.6% were collected from supermarkets in Natal-Brazil, and NIR spectra were acquired in the 714 2500 nm range. A comparison of several multivariate calibration techniques with respect to several pre-processing data and variable selection algorithms, such as interval Partial Least Squares (iPLS), genetic algorithm (GA), successive projections algorithm (SPA) and ordered predictors selection (OPS), was performed. Validation models for SSC, pH and titratable acidity had a coefficient of correlation (R) of 0.95 0.90 and 0.80, as well as a root mean square error of prediction (RMSEP) of 0.45ºBrix, 0.07 and 0.40%, respectively. From these results, it can be concluded that NIR spectroscopy can be used as a non-destructive alternative for measuring the SSC, pH and titratable acidity in plums

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aiming to consumer s safety the presence of pathogenic contaminants in foods must be monitored because they are responsible for foodborne outbreaks that depending on the level of contamination can ultimately cause the death of those who consume them. In industry is necessary that this identification be fast and profitable. This study shows the utility and application of near-infrared (NIR) transflectance spectroscopy as an alternative method for the identification and classification of Escherichia coli and Salmonella Enteritidis in commercial fruit pulp (pineapple). Principal Component Analysis (PCA), Independent Modeling of Class Analogy (SIMCA) and Discriminant Analysis Partial Least Squares (PLS-DA) were used in the analysis. It was not possible to obtain total separation between samples using PCA and SIMCA. The PLS-DA showed good performance in prediction capacity reaching 87.5% for E. coli and 88.3% for S. Enteritides, respectively. The best models were obtained for the PLS-DA with second derivative spectra treated with a sensitivity and specificity of 0.87 and 0.83, respectively. These results suggest that the NIR spectroscopy and PLS-DA can be used to discriminate and detect bacteria in the fruit pulp