28 results for multiclass classification problems
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve classification accuracy and reduce memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
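To make the pipeline concrete, the sketch below chains an unsupervised, equal-frequency discretizer with a filter-type feature selector on a synthetic high-dimensional dataset; it uses scikit-learn stand-ins (KBinsDiscretizer, mutual-information ranking) rather than the discretization and selection criteria proposed in the paper.

    # Illustrative sketch only, not the authors' algorithms: unsupervised
    # discretization followed by filter-type feature selection.
    from sklearn.datasets import make_classification
    from sklearn.preprocessing import KBinsDiscretizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Many features, few instances: the "curse of dimensionality" setting.
    X, y = make_classification(n_samples=100, n_features=2000,
                               n_informative=50, random_state=0)

    pipe = make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),  # unsupervised discretization
        SelectKBest(mutual_info_classif, k=100),                            # keep the 100 most relevant features
        MultinomialNB(),
    )
    print(cross_val_score(pipe, X, y, cv=5).mean())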
Abstract:
In the last decade, local image features have been widely used in robot visual localization. In order to assess image similarity, a strategy exploiting these features compares raw descriptors extracted from the current image with those in the models of places. This paper addresses the ensuing step in this process, where a combining function must be used to aggregate results and assign each place a score. Casting the problem in the multiple classifier systems framework, in this paper we compare several candidate combiners with respect to their performance in the visual localization task. For this evaluation, we selected the most popular methods in the class of non-trained combiners, namely the sum rule and product rule. A deeper insight into the potential of these combiners is provided through a discriminativity analysis involving the algebraic rules and two extensions of these methods: the threshold and the weighted modifications. In addition, a voting method, previously used in robot visual localization, is assessed. Furthermore, we address the process of constructing a model of the environment by describing how the model granularity impacts upon performance. All combiners are tested on a visual localization task, carried out on a public dataset. It is experimentally demonstrated that the sum rule extensions globally achieve the best performance, confirming the general agreement on the robustness of this rule in other classification problems. The voting method, whilst competitive with the product rule in its standard form, is shown to be outperformed by its modified versions.
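As an illustration of the non-trained combiners compared above, the following sketch aggregates per-place scores from several hypothetical matchers with the sum rule, the product rule, and majority voting; the scores and shapes are invented for the example.

    # Minimal sketch of non-trained combiners over per-place scores.
    import numpy as np

    # scores[k, j] = posterior-like score of classifier k for place j (rows sum to 1)
    scores = np.array([[0.60, 0.30, 0.10],
                       [0.50, 0.40, 0.10],
                       [0.20, 0.70, 0.10]])

    sum_rule = scores.sum(axis=0)          # add scores across classifiers
    product_rule = scores.prod(axis=0)     # multiply scores across classifiers
    votes = np.bincount(scores.argmax(axis=1), minlength=scores.shape[1])  # majority voting

    print("sum rule picks place", sum_rule.argmax())
    print("product rule picks place", product_rule.argmax())
    print("voting picks place", votes.argmax())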
Abstract:
Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approximate, and sometimes even outperform, previous state-of-the-art techniques, while being much simpler, in the sense that they do not require any text pre-processing or feature engineering.
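A minimal sketch of a dissimilarity-based representation is given below, using the compression-based normalized compression distance (NCD) as one example of an information-theoretic dissimilarity; the texts, prototypes, and the specific measures used in the paper are not reproduced here.

    # Sketch: map texts to vectors of NCD dissimilarities to a few prototypes,
    # then apply a classical classifier in that space. Texts are toy examples.
    import zlib
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def c(s: bytes) -> int:
        return len(zlib.compress(s, 9))

    def ncd(x: str, y: str) -> float:
        cx, cy = c(x.encode()), c(y.encode())
        return (c((x + y).encode()) - min(cx, cy)) / max(cx, cy)

    train_texts = ["great movie, loved it", "terrible plot, awful acting", "wonderful and moving"]
    train_labels = [1, 0, 1]

    def embed(texts):
        # dissimilarity-based representation: one coordinate per prototype text
        return np.array([[ncd(t, p) for p in train_texts] for t in texts])

    clf = KNeighborsClassifier(n_neighbors=1).fit(embed(train_texts), train_labels)
    print(clf.predict(embed(["awful, a complete waste of time"])))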
Abstract:
Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near-field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. Several types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.
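Purely as an illustration of the feature extraction and selection step, the sketch below computes simple spectral band energies from randomly generated stand-in rolling-noise segments and ranks them with a mutual-information criterion; the actual Close-Proximity features and the selection method used in the study differ.

    # Illustrative sketch only: band energies as acoustic features, then a
    # filter-type relevance ranking. All data here is synthetic stand-in noise.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    fs = 8000                                   # assumed sampling rate (placeholder)

    def band_energies(signal, n_bands=16):
        spectrum = np.abs(np.fft.rfft(signal)) ** 2
        return np.array([b.sum() for b in np.array_split(spectrum, n_bands)])

    rng = np.random.default_rng(0)
    segments = rng.normal(size=(40, fs))        # stand-in for recorded 1 s segments
    labels = rng.integers(0, 3, size=40)        # stand-in pavement types

    X = np.vstack([band_energies(s) for s in segments])
    ranking = mutual_info_classif(X, labels)
    print("most informative bands:", np.argsort(ranking)[::-1][:5])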
Abstract:
This thesis aims at the design and evaluation of a wireless, real-time system for counting and classifying motor vehicles. It is also intended as an alternative to current equipment, which is highly intrusive on roadways. The thesis includes a study of the wireless communications suitable for a network of road sensor devices, a study of the use of the magnetic field as the physical means of detecting and counting vehicles, and a study of the energy autonomy of the devices embedded in the road, relying on, among other sources, solar energy. The project carried out within this thesis incorporates, among other elements, the real-time digitization of the magnetic signature left in the Earth's magnetic field by a passing vehicle, its transmission to a server via radio and a WAN (Wide Area Network), and the development of software based on the ZigBee protocol stack. Applications were developed for the sensor device, the coordinator, the control panel, and the interface library of a future application server. The software developed for the sensor device incorporates detection and digitization cycles, with low-power sleep pauses, and activation of the radio communications only during the transmission phase, thus ensuring an energy-saving strategy. The results obtained confirm the viability of this technology for vehicle detection and counting, as well as for signature capture using magnetoresistive sensors. They also made it possible to verify the range of the wireless communications with the sensor device embedded in the asphalt, and to confirm the model for sizing the solar panel surface as well as the energy consumption model of the sensor device.
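A conceptual sketch of the detection cycle described above is given below: a vehicle is detected as a perturbation of the Earth's-field baseline read by a magnetoresistive sensor, the samples of its magnetic signature are buffered, and the radio is assumed active only during transmission. The functions read_magnetometer and send_over_radio are hypothetical placeholders, not the thesis firmware.

    # Conceptual sketch (not the thesis firmware) of a low-power detection cycle.
    # read_magnetometer() and send_over_radio() are hypothetical placeholders.
    import time

    BASELINE_ALPHA = 0.01      # slow adaptation of the quiescent Earth-field reading
    THRESHOLD = 40             # counts above baseline taken to indicate a vehicle

    def detect_loop(read_magnetometer, send_over_radio):
        baseline = read_magnetometer()
        signature = []
        while True:
            sample = read_magnetometer()
            if abs(sample - baseline) > THRESHOLD:
                signature.append(sample)              # digitize the magnetic signature
            else:
                if signature:                         # vehicle has passed
                    send_over_radio(signature)        # radio active only while sending
                    signature = []
                baseline += BASELINE_ALPHA * (sample - baseline)
                time.sleep(0.05)                      # low-power pause between cycles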
Abstract:
Thirty years ago, G.N. de Oliveira proposed the following completion problems: describe the possible characteristic polynomials of $[C_{ij}]$, $i,j \in \{1, 2\}$, where $C_{11}$ and $C_{22}$ are square submatrices, when some of the blocks $C_{ij}$ are fixed and the others vary. Several of these problems remain unsolved. This paper gives the solution, over the field of real numbers, of Oliveira's problem where the blocks $C_{11}$ and $C_{22}$ are fixed and the others vary.
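For concreteness, the block structure behind this completion problem can be written as follows (notation illustrative; $p$ and $q$ denote the orders of the prescribed diagonal blocks):

    \[
    C \;=\;
    \begin{pmatrix}
    C_{11} & C_{12}\\
    C_{21} & C_{22}
    \end{pmatrix},
    \qquad
    C_{11}\in\mathbb{R}^{p\times p},\quad C_{22}\in\mathbb{R}^{q\times q},
    \]
    % The question is which monic polynomials f of degree p + q can arise as
    % det(xI - C) when C_{11}, C_{22} are fixed and C_{12}, C_{21} vary.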
Abstract:
This paper presents an integrated system for vehicle classification. The system aims to classify vehicles using different approaches: 1) based on the height of the first axle and the number of axles; 2) based on volumetric measurements; and 3) based on features extracted from the captured image of the vehicle. The system uses a laser sensor for measurements and a set of image analysis algorithms to compute visual features. By combining different classification methods, it is shown that the system improves its accuracy and robustness, enabling its use in more difficult environments while satisfying the requirements established by the Portuguese motorway contractor BRISA.
Abstract:
In this paper we make an exhaustive study of the fourth order linear operator $u^{(4)} + Mu$ coupled with the clamped beam conditions $u(0) = u(1) = u'(0) = u'(1) = 0$. We obtain the exact values of the real parameter $M$ for which this operator satisfies an anti-maximum principle. Such a property is equivalent to the fact that the related Green's function is nonnegative on $[0, 1] \times [0, 1]$. When $M < 0$ we obtain the best estimate by means of spectral theory, and for $M > 0$ we attain the optimal value by studying the oscillation properties of the solutions of the homogeneous equation $u^{(4)} + Mu = 0$. By using the method of lower and upper solutions we deduce the existence of solutions for nonlinear problems coupled with these boundary conditions.
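Restating the problem from the abstract in display form (with $\sigma$ denoting a generic right-hand side):

    \[
    u^{(4)}(t) + M\,u(t) = \sigma(t), \qquad t \in [0,1],
    \qquad u(0) = u(1) = u'(0) = u'(1) = 0,
    \]
    % The anti-maximum principle holds exactly when the associated Green's
    % function satisfies G(t,s) >= 0 for all (t,s) in [0,1] x [0,1].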
Abstract:
In music genre classification, most approaches rely on statistical characteristics of low-level features computed on short audio frames. In these methods, it is implicitly considered that frames carry equally relevant information loads and that either individual frames, or distributions thereof, somehow capture the specificities of each genre. In this paper we study the representation space defined by short-term audio features with respect to class boundaries, and compare different processing techniques to partition this space. These partitions are evaluated in terms of accuracy on two genre classification tasks, with several types of classifiers. Experiments show that a randomized and unsupervised partition of the space, used in conjunction with a Markov model classifier, leads to accuracies comparable to the state of the art. We also show that unsupervised partitions of the space tend to create fewer hubs.
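The sketch below illustrates the idea with stand-in data: the short-term feature space is partitioned without supervision (here by k-means), each track becomes a sequence of partition indices, and a first-order Markov chain is fitted per genre; this is not the paper's exact pipeline or feature set.

    # Sketch: unsupervised partition of the frame-feature space + per-genre
    # Markov chains over partition-index sequences. Data is synthetic stand-in.
    import numpy as np
    from sklearn.cluster import KMeans

    def fit_markov(symbol_seqs, n_states):
        counts = np.ones((n_states, n_states))            # Laplace smoothing
        for seq in symbol_seqs:
            for a, b in zip(seq[:-1], seq[1:]):
                counts[a, b] += 1
        return counts / counts.sum(axis=1, keepdims=True)

    def log_likelihood(seq, transitions):
        return sum(np.log(transitions[a, b]) for a, b in zip(seq[:-1], seq[1:]))

    rng = np.random.default_rng(0)
    frames = rng.normal(size=(500, 13))                   # stand-in short-term features
    quantizer = KMeans(n_clusters=8, n_init=10, random_state=0).fit(frames)

    # Two stand-in tracks per "genre", each a sequence of feature frames.
    tracks = {g: [rng.normal(loc=g, size=(200, 13)) for _ in range(2)] for g in (0, 1)}
    models = {g: fit_markov([quantizer.predict(t) for t in ts], 8) for g, ts in tracks.items()}

    test = quantizer.predict(rng.normal(loc=1, size=(200, 13)))
    print(max(models, key=lambda g: log_likelihood(test, models[g])))   # predicted genre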
Abstract:
Background: Poor nutritional status and worse health-related quality of life (QoL) have been reported in haemodialysis (HD) patients. The utilization of generic and disease specific QoL questionnaires in the same population may provide a better understanding of the significance of nutrition in QoL dimensions. Objective: To assess nutritional status by easy to use parameters and to evaluate the potential relationship with QoL measured by generic and disease specific questionnaires. Methods: Nutritional status was assessed by subjective global assessment adapted to renal patients (SGA), body mass index (BMI), nutritional intake and appetite. QoL was assessed by the generic EuroQoL and disease specific Kidney Disease Quality of Life-Short Form (KDQoL-SF) questionnaires. Results: The study comprised 130 patients of both genders, mean age 62.7 ± 14.7 years. The prevalence of undernutrition ranged from 3.1% by BMI ≤ 18.5 kg/m2 to 75.4% for patients below energy and protein intake recommendations. With the exception of the BMI classification, undernourished patients had worse scores in nearly all QoL dimensions (EuroQoL and KDQoL-SF), a pattern which was largely maintained when adjusted for demographics and disease-related variables. Overweight/obese patients (BMI ≥ 25) also had worse scores in some QoL dimensions, but after adjustment the pattern was maintained only in the symptoms and problems dimension of KDQoL-SF (p = 0.011). Conclusion: Our study reveals that even in mildly undernourished HD patients, nutritional status has a significant impact on several QoL dimensions. The questionnaires used provided different, almost complementary perspectives, yet for daily practice EuroQoL is simpler. Assuring a good nutritional status may positively influence QoL.
Abstract:
This paper presents a proposal for an automatic vehicle detection and classification (AVDC) system. The proposed AVDC system should classify vehicles according to the Portuguese legislation (vehicle height over the first axle and number of axles), and should also support profile-based classification. The AVDC system should also fulfill the needs of the Portuguese motorway operator, Brisa. For the profile-based classification we propose the use of Eigenprofiles, a technique based on Principal Components Analysis. The system should also support multi-lane free flow for future integration in this kind of environment.
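As a hedged illustration of profile-based classification with Eigenprofiles, the sketch below applies PCA to stand-in laser height profiles and classifies the projected profiles with a nearest-neighbor rule; dimensions, data, and the classifier are placeholders, not the proposed system.

    # Sketch of "Eigenprofiles": PCA over vehicle profiles, by analogy with
    # eigenfaces, followed by a simple classifier in the reduced space.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    profiles = rng.normal(size=(120, 256))   # each row: a stand-in laser height profile
    classes = rng.integers(0, 4, size=120)   # stand-in vehicle classes

    clf = make_pipeline(PCA(n_components=10),            # project onto the eigenprofiles
                        KNeighborsClassifier(n_neighbors=3))
    clf.fit(profiles, classes)
    print(clf.predict(profiles[:5]))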
Abstract:
Chronic liver disease (CLD) is most of the time an asymptomatic, progressive, and ultimately potentially fatal disease. In this study, an automatic hierarchical procedure to stage CLD using ultrasound images, laboratory tests, and clinical records is described. The first stage of the proposed method, called the clinical based classifier (CBC), discriminates healthy from pathologic conditions. When nonhealthy conditions are detected, the method refines the results into three exclusive pathologies on a hierarchical basis: 1) chronic hepatitis; 2) compensated cirrhosis; and 3) decompensated cirrhosis. The features used, as well as the classifiers (Bayes, Parzen, support vector machine, and k-nearest neighbor), are optimally selected for each stage. A large multimodal feature database was specifically built for this study, containing 30 chronic hepatitis cases, 34 compensated cirrhosis cases, and 36 decompensated cirrhosis cases, all validated after histopathologic analysis by liver biopsy. The CBC classification scheme outperformed the nonhierarchical one-against-all scheme, achieving an overall accuracy of 98.67% for the normal detector, 87.45% for the chronic hepatitis detector, and 95.71% for the cirrhosis detector.
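The hierarchical idea can be sketched as follows: a first detector separates healthy from pathologic cases, and only pathologic cases reach a second, multi-class classifier. Data, features, and classifier choices below are placeholders rather than the optimally selected ones reported in the study.

    # Sketch of a two-stage hierarchical classifier with placeholder data.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(130, 20))                 # stand-in multimodal features
    y = rng.integers(0, 4, size=130)               # 0=healthy, 1=hepatitis, 2=comp., 3=decomp. cirrhosis

    healthy_detector = SVC().fit(X, (y == 0).astype(int))          # stage 1: healthy vs pathologic
    pathology_classifier = KNeighborsClassifier().fit(X[y != 0], y[y != 0])  # stage 2: 3-way

    def stage(x):
        x = x.reshape(1, -1)
        if healthy_detector.predict(x)[0] == 1:
            return 0                                # classified as healthy
        return int(pathology_classifier.predict(x)[0])

    print(stage(X[0]))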
Abstract:
PURPOSE: Fatty liver disease (FLD) is an increasingly prevalent disease that can be reversed if detected early. Ultrasound is the safest and most ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret liver ultrasound images, a lack of such expertise results in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy, added to the completely automated classification procedure, makes the authors' proposed technique highly suitable for clinical deployment and usage.
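As an illustration of two of the feature families mentioned (simple texture statistics and wavelet-transform sub-band energies; the higher order spectra features are omitted), the sketch below extracts features from stand-in ultrasound patches and trains a decision tree; parameters and data are placeholders, not the Symtosis technique itself.

    # Sketch: texture statistics + 2-D wavelet sub-band energies, decision tree.
    # All patches and labels below are synthetic stand-ins.
    import numpy as np
    import pywt
    from sklearn.tree import DecisionTreeClassifier

    def features(patch):
        coeffs = pywt.wavedec2(patch, "db4", level=2)          # 2-D wavelet decomposition
        energies = [np.mean(np.square(c)) for band in coeffs[1:] for c in band]
        texture = [patch.mean(), patch.std(), np.mean(np.abs(np.diff(patch, axis=1)))]
        return np.array(texture + energies)

    rng = np.random.default_rng(0)
    patches = rng.random(size=(100, 64, 64))                   # stand-in liver ROIs
    labels = rng.integers(0, 2, size=100)                      # 0 = normal, 1 = FLD

    X = np.vstack([features(p) for p in patches])
    clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
    print(clf.score(X, labels))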
Abstract:
Chronic liver disease is a progressive, most of the time asymptomatic, and potentially fatal disease. In this paper, a semi-automatic procedure to stage this disease is proposed, based on ultrasound liver images and clinical and laboratory data. At the core of the algorithm two classifiers are used: a k-nearest neighbor and a support vector machine, with different kernels. The classifiers were trained with the proposed multi-modal feature set and the results obtained were compared with those for the laboratory and clinical feature set alone. The results showed that using ultrasound-based features, in association with laboratory and clinical features, improves the classification accuracy. The support vector machine with polynomial kernel outperformed the other classifiers in every class studied. For the normal class we achieved 100% accuracy, for chronic hepatitis with cirrhosis 73.08%, for compensated cirrhosis 59.26%, and for decompensated cirrhosis 91.67%.