15 resultados para least square-support vector machine

em Reposit


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Actualmente tem-se observado um aumento do volume de sinais de fala em diversas aplicações, que reforçam a necessidade de um processamento automático dos ficheiros. No campo do processamento automático destacam-se as aplicações de “diarização de orador”, que permitem catalogar os ficheiros de fala com a identidade de oradores e limites temporais de fala de cada um, através de um processo de segmentação e agrupamento. No contexto de agrupamento, este trabalho visa dar continuidade ao trabalho intitulado “Detecção do Orador”, com o desenvolvimento de um algoritmo de “agrupamento multi-orador” capaz de identificar e agrupar correctamente os oradores, sem conhecimento prévio do número ou da identidade dos oradores presentes no ficheiro de fala. O sistema utiliza os coeficientes “Mel Line Spectrum Frequencies” (MLSF) como característica acústica de fala, uma segmentação de fala baseada na energia e uma estrutura do tipo “Universal Background Model - Gaussian Mixture Model” (UBM-GMM) adaptado com o classificador “Support Vector Machine” (SVM). No trabalho foram analisadas três métricas de discriminação dos modelos SVM e a avaliação dos resultados foi feita através da taxa de erro “Speaker Error Rate” (SER), que quantifica percentualmente o número de segmentos “fala” mal classificados. O algoritmo implementado foi ajustado às características da língua portuguesa através de um corpus com 14 ficheiros de treino e 30 ficheiros de teste. Os ficheiros de treino dos modelos e classificação final, enquanto os ficheiros de foram utilizados para avaliar o desempenho do algoritmo. A interacção com o algoritmo foi dinamizada com a criação de uma interface gráfica que permite receber o ficheiro de teste, processá-lo, listar os resultados ou gerar um vídeo para o utilizador confrontar o sinal de fala com os resultados de classificação.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Chronic liver disease (CLD) is most of the time an asymptomatic, progressive, and ultimately potentially fatal disease. In this study, an automatic hierarchical procedure to stage CLD using ultrasound images, laboratory tests, and clinical records are described. The first stage of the proposed method, called clinical based classifier (CBC), discriminates healthy from pathologic conditions. When nonhealthy conditions are detected, the method refines the results in three exclusive pathologies in a hierarchical basis: 1) chronic hepatitis; 2) compensated cirrhosis; and 3) decompensated cirrhosis. The features used as well as the classifiers (Bayes, Parzen, support vector machine, and k-nearest neighbor) are optimally selected for each stage. A large multimodal feature database was specifically built for this study containing 30 chronic hepatitis cases, 34 compensated cirrhosis cases, and 36 decompensated cirrhosis cases, all validated after histopathologic analysis by liver biopsy. The CBC classification scheme outperformed the nonhierachical one against all scheme, achieving an overall accuracy of 98.67% for the normal detector, 87.45% for the chronic hepatitis detector, and 95.71% for the cirrhosis detector.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Chronic Liver Disease is a progressive, most of the time asymptomatic, and potentially fatal disease. In this paper, a semi-automatic procedure to stage this disease is proposed based on ultrasound liver images, clinical and laboratorial data. In the core of the algorithm two classifiers are used: a k nearest neighbor and a Support Vector Machine, with different kernels. The classifiers were trained with the proposed multi-modal feature set and the results obtained were compared with the laboratorial and clinical feature set. The results showed that using ultrasound based features, in association with laboratorial and clinical features, improve the classification accuracy. The support vector machine, polynomial kernel, outperformed the others classifiers in every class studied. For the Normal class we achieved 100% accuracy, for the chronic hepatitis with cirrhosis 73.08%, for compensated cirrhosis 59.26% and for decompensated cirrhosis 91.67%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work the identification and diagnosis of various stages of chronic liver disease is addressed. The classification results of a support vector machine, a decision tree and a k-nearest neighbor classifier are compared. Ultrasound image intensity and textural features are jointly used with clinical and laboratorial data in the staging process. The classifiers training is performed by using a population of 97 patients at six different stages of chronic liver disease and a leave-one-out cross-validation strategy. The best results are obtained using the support vector machine with a radial-basis kernel, with 73.20% of overall accuracy. The good performance of the method is a promising indicator that it can be used, in a non invasive way, to provide reliable information about the chronic liver disease staging.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work liver contour is semi-automatically segmented and quantified in order to help the identification and diagnosis of diffuse liver disease. The features extracted from the liver contour are jointly used with clinical and laboratorial data in the staging process. The classification results of a support vector machine, a Bayesian and a k-nearest neighbor classifier are compared. A population of 88 patients at five different stages of diffuse liver disease and a leave-one-out cross-validation strategy are used in the classification process. The best results are obtained using the k-nearest neighbor classifier, with an overall accuracy of 80.68%. The good performance of the proposed method shows a reliable indicator that can improve the information in the staging of diffuse liver disease.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hyperspectral remote sensing exploits the electromagnetic scattering patterns of the different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixing of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. Linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (or intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in the last years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the correspondent abundance fractions. Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24,25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known and, then, hyperspectral unmixing falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which is not the case of hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. Minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures. The MVT type approaches are complex from the computational point of view. Usually, these algorithms find in the first place the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of O(nbd=2cþ1), where bxc is the highest integer lower or equal than x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log( ) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and the N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (large number of random vectors) [35, 42,43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative account records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data. ORA SIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptative learner, demixer, knowledge base or spectral library, and spatial postrocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given thresh old. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalizati on. The selected vectors are then projected onto this subspace and a simplex is found by an MV T pro cess. ORA SIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating sign al and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selectin g the set of signal eigenvalue s that best represents the data, in the least-square sense [48,49 ], we note, however, that VCA works with projected and with unprojected data. The extraction of the end members exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. As PPI and N-FIND R algorithms, VCA also assumes the presence of pure pixels in the data. The algorithm iteratively projects data on to a direction orthogonal to the subspace spanned by the endmembers already determined. The new end member signature corresponds to the extreme of the projection. The algorithm iterates until all end members are exhausted. VCA performs much better than PPI and better than or comparable to N-FI NDR; yet it has a computational complexity between on e and two orders of magnitude lower than N-FINDR. The chapter is structure d as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Informática e de Computadores

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Chpater in Book Proceedings with Peer Review Second Iberian Conference, IbPRIA 2005, Estoril, Portugal, June 7-9, 2005, Proceedings, Part II

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming at obtaining better performance than the individual modalities. However, when combining these modalities, the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, in order to extract useful information from this data, one has to resort to feature selection (FS) techniques to lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques for silent speech data, in a dataset with 4 non-invasive and promising modalities, namely: video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised (mutual information and Fisher's ratio) and two unsupervised (meanmedian and arithmetic mean geometric mean) FS filters. The evaluation was made by assessing the classification accuracy (word recognition error) of three well-known classifiers (knearest neighbors, support vector machines, and dynamic time warping). The key results of this study show that both unsupervised and supervised FS techniques improve on the classification accuracy on both individual and combined modalities. For instance, on the video component, we attain relative performance gains of 36.2% in error rates. FS is also useful as pre-processing for feature fusion. Copyright © 2014 ISCA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Discrete data representations are necessary, or at least convenient, in many machine learning problems. While feature selection (FS) techniques aim at finding relevant subsets of features, the goal of feature discretization (FD) is to find concise (quantized) data representations, adequate for the learning task at hand. In this paper, we propose two incremental methods for FD. The first method belongs to the filter family, in which the quality of the discretization is assessed by a (supervised or unsupervised) relevance criterion. The second method is a wrapper, where discretized features are assessed using a classifier. Both methods can be coupled with any static (unsupervised or supervised) discretization procedure and can be used to perform FS as pre-processing or post-processing stages. The proposed methods attain efficient representations suitable for binary and multi-class problems with different types of data, being competitive with existing methods. Moreover, using well-known FS methods with the features discretized by our techniques leads to better accuracy than with the features discretized by other methods or with the original features. (C) 2013 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Human mesenchymal stem/stromal cells (MSCs) have received considerable attention in the field of cell-based therapies due to their high differentiation potential and ability to modulate immune responses. However, since these cells can only be isolated in very low quantities, successful realization of these therapies requires MSCs ex-vivo expansion to achieve relevant cell doses. The metabolic activity is one of the parameters often monitored during MSCs cultivation by using expensive multi-analytical methods, some of them time-consuming. The present work evaluates the use of mid-infrared (MIR) spectroscopy, through rapid and economic high-throughput analyses associated to multivariate data analysis, to monitor three different MSCs cultivation runs conducted in spinner flasks, under xeno-free culture conditions, which differ in the type of microcarriers used and the culture feeding strategy applied. After evaluating diverse spectral preprocessing techniques, the optimized partial least square (PLS) regression models based on the MIR spectra to estimate the glucose, lactate and ammonia concentrations yielded high coefficients of determination (R2 ≥ 0.98, ≥0.98, and ≥0.94, respectively) and low prediction errors (RMSECV ≤ 4.7%, ≤4.4% and ≤5.7%, respectively). Besides PLS models valid for specific expansion protocols, a robust model simultaneously valid for the three processes was also built for predicting glucose, lactate and ammonia, yielding a R2 of 0.95, 0.97 and 0.86, and a RMSECV of 0.33, 0.57, and 0.09 mM, respectively. Therefore, MIR spectroscopy combined with multivariate data analysis represents a promising tool for both optimization and control of MSCs expansion processes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Infrared spectroscopy, either in the near and mid (NIR/MIR) region of the spectra, has gained great acceptance in the industry for bioprocess monitoring according to Process Analytical Technology, due to its rapid, economic, high sensitivity mode of application and versatility. Due to the relevance of cyprosin (mostly for dairy industry), and as NIR and MIR spectroscopy presents specific characteristics that ultimately may complement each other, in the present work these techniques were compared to monitor and characterize by in situ and by at-line high-throughput analysis, respectively, recombinant cyprosin production by Saccharomyces cerevisiae. Partial least-square regression models, relating NIR and MIR-spectral features with biomass, cyprosin activity, specific activity, glucose, galactose, ethanol and acetate concentration were developed, all presenting, in general, high regression coefficients and low prediction errors. In the case of biomass and glucose slight better models were achieved by in situ NIR spectroscopic analysis, while for cyprosin activity and specific activity slight better models were achieved by at-line MIR spectroscopic analysis. Therefore both techniques enabled to monitor the highly dynamic cyprosin production bioprocess, promoting by this way more efficient platforms for the bioprocess optimization and control.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimensionality reduction plays a crucial role in many hyperspectral data processing and analysis algorithms. This paper proposes a new mean squared error based approach to determine the signal subspace in hyperspectral imagery. The method first estimates the signal and noise correlations matrices, then it selects the subset of eigenvalues that best represents the signal subspace in the least square sense. The effectiveness of the proposed method is illustrated using simulated and real hyperspectral images.