61 results for multiclass classification problems

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance: 100.00%

Abstract:

Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
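As a rough illustration of the decomposition idea described in this abstract, the sketch below uses the one-vs-one scheme: one binary classifier per pair of classes, with the binary outputs combined by majority vote. The nearest-centroid learner, the function names, and the toy data are all assumptions made for the example; the surveyed strategies can wrap any binary learner.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary(X, y, pos, neg):
    """Hypothetical binary learner: stores one centroid per class."""
    c_pos = centroid([x for x, label in zip(X, y) if label == pos])
    c_neg = centroid([x for x, label in zip(X, y) if label == neg])
    # The returned function answers the binary question "pos or neg?".
    return lambda x: pos if sq_dist(x, c_pos) <= sq_dist(x, c_neg) else neg


def train_one_vs_one(X, y):
    """Train one binary classifier per unordered pair of classes."""
    classes = sorted(set(y))
    return [train_binary(X, y, a, b) for a, b in combinations(classes, 2)]


def predict_one_vs_one(classifiers, x):
    """Combine the binary outputs by majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data (made up): three well-separated classes in two dimensions.
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8], [0.0, 5.0], [0.2, 5.1]]
y = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_one(X, y)
print(len(models))                             # 3 pairwise classifiers
print(predict_one_vs_one(models, [4.9, 5.2]))  # b
```

With k classes this scheme trains k(k-1)/2 binary classifiers; a one-vs-rest decomposition would train k instead, at the cost of more imbalanced subproblems.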

Relevance: 100.00%

Abstract:

Several popular Machine Learning techniques were originally designed for the solution of two-class problems. However, many classification problems have more than two classes. One approach to dealing with multiclass problems using binary classifiers is to decompose the multiclass problem into multiple binary sub-problems arranged in a binary tree. This approach requires a binary partition of the classes at each node of the tree, which defines the tree structure. This paper presents two algorithms that determine the tree structure using information collected from the dataset at hand. This approach allows the tree structure to be determined automatically for any multiclass dataset.

Relevance: 100.00%

Abstract:

Various popular machine learning techniques, like support vector machines, are originally conceived for the solution of two-class (binary) classification problems. However, a large number of real problems present more than two classes. A common approach to generalize binary learning techniques to solve problems with more than two classes, also known as multiclass classification problems, consists of hierarchically decomposing the multiclass problem into multiple binary sub-problems, whose outputs are combined to define the predicted class. This strategy results in a tree of binary classifiers, where each internal node corresponds to a binary classifier distinguishing two groups of classes and the leaf nodes correspond to the problem classes. This paper investigates how measures of the separability between classes can be employed in the construction of binary-tree-based multiclass classifiers, adapting the decompositions performed to each particular multiclass problem. (C) 2010 Elsevier B.V. All rights reserved.
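The hierarchical decomposition described above can be sketched as follows. This is only an illustration under stated assumptions: the separability measure is taken to be the squared distance between class centroids, the two most distant classes seed the two groups at each node, and each internal node decides by nearest group centroid. The abstract does not say which separability measures or binary learners the paper uses, so none of these choices should be read as the authors' method.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_classes(centroids):
    """Partition class labels into two groups around the two most distant centroids."""
    labels = sorted(centroids)
    # Seed the groups with the pair of classes that is easiest to separate.
    seed_a, seed_b = max(
        ((a, b) for i, a in enumerate(labels) for b in labels[i + 1:]),
        key=lambda pair: sq_dist(centroids[pair[0]], centroids[pair[1]]),
    )
    left, right = [seed_a], [seed_b]
    for label in labels:
        if label in (seed_a, seed_b):
            continue
        near_a = sq_dist(centroids[label], centroids[seed_a])
        near_b = sq_dist(centroids[label], centroids[seed_b])
        (left if near_a <= near_b else right).append(label)
    return left, right


def build_tree(X, y, labels=None):
    """Return a leaf (a class label) or an internal node (a dict)."""
    labels = sorted(set(y)) if labels is None else labels
    if len(labels) == 1:
        return labels[0]
    centroids = {c: centroid([x for x, t in zip(X, y) if t == c]) for c in labels}
    left, right = split_classes(centroids)
    return {
        # Each internal node is a binary classifier between two groups of classes.
        "left_centroid": centroid([x for x, t in zip(X, y) if t in left]),
        "right_centroid": centroid([x for x, t in zip(X, y) if t in right]),
        "left": build_tree(X, y, left),
        "right": build_tree(X, y, right),
    }


def predict(tree, x):
    """Walk from the root to a leaf, taking one binary decision per node."""
    while isinstance(tree, dict):
        go_left = sq_dist(x, tree["left_centroid"]) <= sq_dist(x, tree["right_centroid"])
        tree = tree["left"] if go_left else tree["right"]
    return tree


# Toy data (made up): classes a and b lie close together, as do c and d.
X = [[0.0, 0.0], [0.2, 0.2], [2.0, 0.0], [2.2, 0.2],
     [10.0, 10.0], [10.2, 10.2], [12.0, 10.0], [12.2, 10.2]]
y = ["a", "a", "b", "b", "c", "c", "d", "d"]
tree = build_tree(X, y)
print(predict(tree, [2.1, 0.3]))   # b
print(predict(tree, [10.0, 9.9]))  # c
```

On this toy data the root separates {a, b} from {c, d}, so the easiest distinction is made first and the harder ones are deferred to the lower nodes.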

Relevance: 100.00%

Abstract:

Support vector machines (SVMs) were originally formulated for the solution of binary classification problems. In multiclass problems, a decomposition approach is often employed, in which the multiclass problem is divided into multiple binary subproblems, whose results are combined. Generally, the performance of SVM classifiers is affected by the selection of values for their parameters. This paper investigates the use of genetic algorithms (GAs) to tune the parameters of the binary SVMs in common multiclass decompositions. The developed GA may search for a set of parameter values common to all binary classifiers or for differentiated values for each binary classifier. (C) 2008 Elsevier B.V. All rights reserved.

Relevance: 90.00%

Abstract:

Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes how effectively the investigated techniques remove noisy data, measured by the accuracy that different Machine Learning classifiers obtain on the pre-processed data.
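The abstract does not name the distance-based techniques it evaluates. As one well-known example of the family, the sketch below follows the edited-nearest-neighbour idea: an instance is flagged as potentially noisy when its label disagrees with the majority label of its k nearest neighbours. The function name, the value of k, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flag_noisy(X, y, k=3):
    """Return indices whose label disagrees with the majority of their k nearest neighbours."""
    noisy = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Rank every other instance by distance to instance i and keep the k closest.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sq_dist(xi, X[j]),
        )[:k]
        majority = Counter(y[j] for j in neighbours).most_common(1)[0][0]
        if majority != yi:
            noisy.append(i)
    return noisy


# Toy one-feature data (made up): index 4 sits inside the "h" cluster but is labelled "t".
X = [[0.0], [0.1], [0.2], [0.3], [0.15], [5.0], [5.1], [5.2], [5.3]]
y = ["h", "h", "h", "h", "t", "t", "t", "t", "t"]
print(flag_noisy(X, y, k=3))  # [4]
```

The flagged instances would then be removed before training, and the effect judged by the accuracy of classifiers trained on the filtered set, as the abstract describes.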

Relevance: 80.00%

Abstract:

This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.

Relevance: 30.00%

Abstract:

We investigate the performance of a variant of Axelrod's model for dissemination of culture, the Adaptive Culture Heuristic (ACH), on solving an NP-Complete optimization problem, namely, the classification of binary input patterns of size $F$ by a Boolean Binary Perceptron. In this heuristic, $N$ agents, characterized by binary strings of length $F$ which represent possible solutions to the optimization problem, are fixed at the sites of a square lattice and interact with their nearest neighbors only. The interactions are such that the agents' strings (or cultures) become more similar to the low-cost strings of their neighbors, resulting in the dissemination of these strings across the lattice. Eventually the dynamics freezes into a homogeneous absorbing configuration in which all agents exhibit identical solutions to the optimization problem. We find through extensive simulations that the probability of finding the optimal solution is a function of the reduced variable $F/N^{1/4}$, so that the number of agents must increase with the fourth power of the problem size, $N \propto F^{4}$, to guarantee a fixed probability of success. In this case, we find that the relaxation time to reach an absorbing configuration scales with $F^{6}$, which can be interpreted as the overall computational cost of the ACH to find an optimal set of weights for a Boolean binary perceptron, given a fixed probability of success.

Relevance: 20.00%

Abstract:

Saving our science from ourselves: the plight of biological classification. Biological classification (nomenclature, taxonomy, and systematics) is being sold short. The desire for new technologies and for faster and cheaper taxonomic descriptions, identifications, and revisions is symptomatic of a lack of appreciation and understanding of classification. The problem of gadget-driven science, a lack of best practice, and the inability to accept classification as a descriptive and empirical science are discussed. The worst-case scenario is a future in which classifications are purely artificial and uninformative.

Relevance: 20.00%

Abstract:

PURPOSE: The main goal of this study was to develop and compare two different techniques for classification of specific types of corneal shapes when Zernike coefficients are used as inputs. A feed-forward artificial Neural Network (NN) and discriminant analysis (DA) techniques were used. METHODS: The inputs both for the NN and DA were the first 15 standard Zernike coefficients for 80 previously classified corneal elevation data files from an Eyesys System 2000 Videokeratograph (VK), installed at the Departamento de Oftalmologia of the Escola Paulista de Medicina, São Paulo. The NN had 5 output neurons which were associated with 5 typical corneal shapes: keratoconus, with-the-rule astigmatism, against-the-rule astigmatism, "regular" or "normal" shape and post-PRK. RESULTS: The NN and DA responses were statistically analyzed in terms of precision ([true positive+true negative]/total number of cases). Mean overall results for all cases for the NN and DA techniques were, respectively, 94% and 84.8%. CONCLUSION: Although we used a relatively small database, results obtained in the present study indicate that Zernike polynomials as descriptors of corneal shape may be a reliable parameter as input data for diagnostic automation of VK maps, using either NN or DA.
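The measure reported in the RESULTS above is the ratio (true positives + true negatives) / total number of cases. That ratio is more commonly called accuracy, whereas "precision" usually denotes TP / (TP + FP). A minimal sketch of the computation follows; the function name and the counts are hypothetical and are not data from the study.

```python
def overall_accuracy(true_positive, true_negative, total_cases):
    """(TP + TN) / total, the ratio the abstract reports as 'precision'."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return (true_positive + true_negative) / total_cases


# Hypothetical counts for one corneal-shape class, chosen only to illustrate the formula.
print(overall_accuracy(true_positive=14, true_negative=61, total_cases=80))  # 0.9375
```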

Relevance: 20.00%

Abstract:

PURPOSE: To compare parents' reports of youth problems (PRYP) with adolescent problems self-reports (APSR) before and after behavioral treatment of nocturnal enuresis (NE) based on the use of a urine alarm. MATERIALS AND METHODS: Adolescents (N = 19) with mono-symptomatic (primary or secondary) nocturnal enuresis received group treatment for 40 weeks. The discharge criterion was established as 8 weeks of consecutive dry nights. PRYP and APSR were scored by the Child Behavior Checklist (CBCL) and Youth Self-Report (YSR). RESULTS: Pre-treatment data: 1) Higher number of clinical cases based on parent report than on self-report for Internalizing Problems (IP) (13/19 vs. 4/19), Externalizing Problems (EP) (7/19 vs. 5/19) and Total Problems (TP) (11/19 vs. 5/19); 2) Mean PRYP scores for IP (60.8) and TP (61) were within the deviant range (T score ≥ 60), while mean PRYP scores for EP (57.4) and mean APSR scores (IP = 52.4, EP = 49.5, TP = 52.4) were within the normal range. The difference between PRYP and APSR scores was significant. Post-treatment data: 1) Discharge for the majority of the participants (16/19); 2) Reduction in the number of clinical cases on parental evaluation: 9/19 adolescents remained within the clinical range for IP, 2/19 for EP, and 7/19 for TP; 3) All post-treatment mean scores were within the normal range; the difference between pre- and post-evaluation scores was significant for PRYP. CONCLUSIONS: The behavioral treatment based on the use of a urine alarm is effective for adolescents with mono-symptomatic (primary and secondary) nocturnal enuresis. The study favors the hypothesis that enuresis is a cause, not a consequence, of other behavioral problems.

Relevance: 20.00%

Abstract:

We present a molecular phylogenetic analysis of caenophidian (advanced) snakes using sequences from two mitochondrial genes (12S and 16S rRNA) and one nuclear (c-mos) gene (1681 total base pairs), and with 131 terminal taxa sampled from throughout all major caenophidian lineages but focussing on Neotropical xenodontines. Direct optimization parsimony analysis resulted in a well-resolved phylogenetic tree, which corroborates some clades identified in previous analyses and suggests new hypotheses for the composition and relationships of others. The major salient points of our analysis are: (1) placement of Acrochordus, Xenodermatids, and Pareatids as successive outgroups to all remaining caenophidians (including viperids, elapids, atractaspidids, and all other "colubrid" groups); (2) within the latter group, viperids and homalopsids are successive sister clades to all remaining snakes; (3) the following monophyletic clades within crown group caenophidians: Afro-Asian psammophiids (including Mimophis from Madagascar), Elapidae (including hydrophiines but excluding Homoroselaps), Pseudoxyrhophiinae, Colubrinae, Natricinae, Dipsadinae, and Xenodontinae. Homoroselaps is associated with atractaspidids. Our analysis suggests some taxonomic changes within xenodontines, including new taxonomy for Alsophis elegans, Liophis amarali, and further taxonomic changes within Xenodontini and the West Indian radiation of xenodontines. Based on our molecular analysis, we present a revised classification for caenophidians and provide morphological diagnoses for many of the included clades; we also highlight groups where much more work is needed. We name as new two higher taxonomic clades within Caenophidia, one new subfamily within Dipsadidae, and, within Xenodontinae, five new tribes, six new genera and two resurrected genera. We synonymize Xenoxybelis and Pseudablabes with Philodryas; Erythrolamprus with Liophis; and Lystrophis and Waglerophis with Xenodon.

Relevance: 20.00%

Abstract:

OBJECTIVES: This study examines the distribution of deaths from ill-defined causes in Brazil in 2003 and identifies, among them, the proportion of unattended deaths. METHODS: The data came from the Sistema de Informações Sobre Mortalidade (Mortality Information System), coordinated by the Ministry of Health. Ill-defined causes of death comprised those included in "Chapter XVIII - Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified" of the International Statistical Classification of Diseases and Related Health Problems, tenth revision, the chapter in which category R98 identified "unattended death". RESULTS: In Brazil, in 2003, the underlying cause of 13.3% of deaths was identified as ill-defined, with the highest proportions occurring in the Northeast and North regions. Of all ill-defined causes in the country, 53.3% corresponded to unattended deaths, a proportion that exceeded 70% in the states of Maranhão, Piauí, Rio Grande do Norte, Pernambuco, Bahia, Paraíba and Alagoas. CONCLUSION: Given the decentralized structure for recording deaths in the country, responsibility for improving the quality of mortality statistics lies chiefly with the municipalities and, after them, with the states.

Relevance: 20.00%

Abstract:

The World Health Organization currently has two reference classifications for describing health states: the International Statistical Classification of Diseases and Related Health Problems, which corresponds to the tenth revision of the International Classification of Diseases (ICD-10), and the International Classification of Functioning, Disability and Health (ICF). The use of the ICF has been awaited with great expectation by organizations of people with disabilities and related institutions. The lack of a clear definition of "impairment" or "disability" has been identified as an obstacle to health promotion for people with disabilities. It is important that these definitions, especially in legislation and regulation, be consistent and grounded in a coherent model of the process that gives rise to situations of disability. This article aims to present elements of the ICD-10 and the ICF and the role they play in defining impairment and disability. The components of the ICF can contribute to different fields of application with regard to understanding the definitions of impairment or disability on the basis of the concept of functioning and of contextual factors.

Relevance: 20.00%

Abstract:

This paper describes a new food classification which assigns foodstuffs according to the extent and purpose of the industrial processing applied to them. Three main groups are defined: unprocessed or minimally processed foods (group 1), processed culinary and food industry ingredients (group 2), and ultra-processed food products (group 3). The use of this classification is illustrated by applying it to data collected in the Brazilian Household Budget Survey, which was conducted in 2002/2003 through a probabilistic sample of 48,470 Brazilian households. The average daily food availability was 1,792 kcal/person, of which 42.5% came from group 1 (mostly rice and beans and meat and milk), 37.5% from group 2 (mostly vegetable oils, sugar, and flours), and 20% from group 3 (mostly breads, biscuits, sweets, soft drinks, and sausages). The share of group 3 foods increased with income, and represented almost one third of all calories in higher income households. The impact of the replacement of group 1 foods and group 2 ingredients by group 3 products on the overall quality of the diet, eating patterns and health is discussed.
