15 results for Multi-label classification
at Université de Lausanne, Switzerland
Abstract:
This work studies the multi-label classification of turns in Simple English Wikipedia talk pages into dialog acts. The dataset was created and multi-labeled by Ferschke et al. (2012). The first part analyses dependencies between labels, in order to examine the coherence of the annotations and to choose a classification method. A multi-label classification is then carried out after transforming the problem by binary relevance. Regarding features, whereas Ferschke et al. (2012) use features such as uni-, bi-, and trigrams, the time distance between turns, or the indentation level of the turn, other features are considered here: lemmas, part-of-speech tags, and the meaning of verbs (according to WordNet). The dataset authors applied approaches such as Naive Bayes or Support Vector Machines. The present paper proposes, as an alternative, to use and extend linear discriminant analysis with Schoenberg transformations which, like kernel methods, transform the original Euclidean distances into other Euclidean distances in a high-dimensional space.
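The binary relevance step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `binary_relevance` and the dialog-act labels are hypothetical and are not taken from the dataset.

```python
from typing import Dict, List, Set


def binary_relevance(label_sets: List[Set[str]]) -> Dict[str, List[int]]:
    """Turn one multi-label target into one 0/1 target per label."""
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


# Hypothetical dialog-act labels for four turns (illustrative only).
turns = [{"question"}, {"answer", "criticism"}, {"criticism"}, set()]
targets = binary_relevance(turns)
for label, column in targets.items():
    print(label, column)
```

Each resulting column can then be handed to any binary classifier, one classifier per label. Dependencies between labels are ignored by this transformation, which is presumably why the abstract analyses them first.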
Abstract:
When dealing with multi-angular image sequences, reflectance changes naturally arise, due either to illumination and acquisition geometry or to interactions with the atmosphere. These phenomena interplay with the scene and modify the measured radiance: for example, depending on the acquisition angle, tall objects may be seen from the top or from the side, and different light scatterings may affect the surfaces. The resulting shifts in the acquired radiance make multi-angular classification harder and might lead to catastrophic results, since surfaces with the same reflectance return significantly different signals. In this paper, rather than performing atmospheric or bidirectional reflectance distribution function (BRDF) correction, a non-linear manifold learning approach is used to align data structures. This method maximizes the similarity between the different acquisitions by deforming their manifold, thus enhancing the transferability of classification models among the images of the sequence.
Abstract:
This letter presents advanced classification methods for very high resolution images. Efficient multisource information, both spectral and spatial, is exploited through the use of composite kernels in support vector machines. Weighted summations of kernels accounting for separate sources of spectral and spatial information are analyzed and compared to classical approaches such as pure spectral classification or stacked approaches using all the features in a single vector. Model selection problems are addressed, as well as the importance of the different kernels in the weighted summation.
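The weighted summation of kernels described in this abstract can be illustrated with a short sketch. The RBF base kernel, the weight `mu`, the `gamma` values, and the feature vectors are all assumptions made for the example; the letter itself does not specify them here.

```python
import math
from typing import Sequence


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def composite_kernel(spec_a, spat_a, spec_b, spat_b,
                     mu=0.5, gamma_spec=1.0, gamma_spat=1.0) -> float:
    """Weighted sum of a spectral and a spatial kernel, with 0 <= mu <= 1."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return (mu * rbf(spec_a, spec_b, gamma_spec)
            + (1.0 - mu) * rbf(spat_a, spat_b, gamma_spat))


# Hypothetical pixels: spectral bands plus spatial (e.g. neighborhood mean) features.
pix1_spec, pix1_spat = [0.20, 0.35, 0.50], [0.22, 0.30]
pix2_spec, pix2_spat = [0.25, 0.30, 0.55], [0.60, 0.10]
print(composite_kernel(pix1_spec, pix1_spat, pix2_spec, pix2_spat, mu=0.7))
```

Because a non-negative weighted sum of positive semi-definite kernels is itself positive semi-definite, the result is a valid kernel, and `mu` indicates the relative importance given to each information source.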
Abstract:
In recent years, multi-atlas fusion methods have gained significant attention in medical image segmentation. In this paper, we propose a general Markov Random Field (MRF) based framework that can perform edge-preserving smoothing of the labels at the time of fusing the labels themselves. More specifically, we formulate the label fusion problem with MRF-based neighborhood priors as an energy minimization problem containing a unary data term and a pairwise smoothness term. We show how existing fusion methods such as majority voting, global weighted voting, and local weighted voting can be reframed to profit from the proposed framework, generating segmentations that are both more accurate and more contiguous by getting rid of holes and islands. The proposed framework is evaluated for segmenting lymph nodes in 3D head and neck CT images. A comparison of various fusion algorithms is also presented.
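The idea of adding a pairwise smoothness term to majority voting can be sketched on a toy example. This is only an illustration under strong assumptions: a 1D chain of five voxels, a plain Potts penalty (not the edge-preserving prior of the paper), a unary cost equal to the fraction of disagreeing atlases, and brute-force minimization. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def majority_vote(atlas_labels):
    """Fuse labels voxel by voxel; ties go to the label seen first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*atlas_labels)]


def energy(labeling, atlas_labels, beta):
    """Unary term (atlas disagreement) plus beta times a Potts smoothness term."""
    n_atlases = len(atlas_labels)
    unary = sum(
        1.0 - sum(atlas[i] == lab for atlas in atlas_labels) / n_atlases
        for i, lab in enumerate(labeling)
    )
    pairwise = sum(a != b for a, b in zip(labeling, labeling[1:]))
    return unary + beta * pairwise


def mrf_fusion(atlas_labels, beta, label_set=(0, 1)):
    """Exhaustive energy minimization; only feasible for tiny toy problems."""
    n_voxels = len(atlas_labels[0])
    best = min(product(label_set, repeat=n_voxels),
               key=lambda lab: energy(lab, atlas_labels, beta))
    return list(best)


# Three hypothetical atlases; two of them contain an isolated "island" voxel.
atlases = [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
print(majority_vote(atlases))        # keeps the island
print(mrf_fusion(atlases, beta=0.5)) # smoothness term removes it
```

With `beta = 0`, the minimizer coincides with majority voting; increasing `beta` trades fidelity to the votes for spatial contiguity. Real 3D volumes require an efficient solver rather than exhaustive search.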
Abstract:
OBJECTIVE: Mild neurocognitive disorders (MND) affect a subset of HIV+ patients under effective combination antiretroviral therapy (cART). In this study, we used an innovative multi-contrast magnetic resonance imaging (MRI) approach at high field to assess the presence of micro-structural brain alterations in MND+ patients. METHODS: We enrolled 17 MND+ and 19 MND- patients with undetectable HIV-1 RNA and 19 healthy controls (HC). MRI acquisitions at 3T included: MP2RAGE for T1 relaxation times, Magnetization Transfer (MT), T2* and Susceptibility Weighted Imaging (SWI) to probe micro-structural integrity and iron deposition in the brain. Statistical analysis used permutation-based tests and correction for family-wise error rate. Multiple regression analysis was performed between MRI data and (i) neuropsychological results and (ii) HIV infection characteristics. A linear discriminant analysis (LDA) based on MRI data was performed between MND+ and MND- patients and cross-validated with a leave-one-out test. RESULTS: Our data revealed loss of structural integrity and micro-oedema in MND+ compared to HC in the global white and cortical gray matter, as well as in the thalamus and basal ganglia. Multiple regression analysis showed a significant influence of sub-cortical nuclei alterations on the executive index of MND+ patients (p = 0.04 and R² = 95.2). The LDA distinguished MND+ and MND- patients with a classification quality of 73% after cross-validation. CONCLUSION: Our study shows micro-structural brain tissue alterations in MND+ patients under effective therapy and suggests that multi-contrast MRI at high field is a powerful approach to discriminate between HIV+ patients on cART with and without mild neurocognitive deficits.
Abstract:
In recent years, type-2 fuzzy sets have attracted attention in control and pattern classification systems for their potential to manage high levels of uncertainty in the subjective knowledge of experts or in numerical information. One of the main challenges in designing a type-2 fuzzy logic system is how to estimate the parameters of the type-2 fuzzy membership function (T2MF) and the Footprint of Uncertainty (FOU) from imperfect and noisy datasets. This paper presents an automatic approach for learning and tuning Gaussian interval type-2 membership functions (IT2MFs) with application to multi-dimensional pattern classification problems. T2MFs and their FOUs are tuned according to the uncertainties in the training dataset by a combination of genetic algorithm (GA) and cross-validation techniques. In our GA-based approach, the structure of the chromosome has fewer genes than other GA methods and chromosome initialization is more precise. The proposed approach addresses the application of the interval type-2 fuzzy logic system (IT2FLS) to the problem of nodule classification in a lung Computer Aided Detection (CAD) system. The designed IT2FLS is compared with its type-1 fuzzy logic system (T1FLS) counterpart. The results demonstrate that the IT2FLS outperforms the T1FLS by more than 30% in terms of classification accuracy.
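A Gaussian interval type-2 membership function and its footprint of uncertainty can be sketched as below. The sketch assumes one common parameterization, a fixed mean with an uncertain standard deviation in `[sigma_low, sigma_high]`; the abstract does not state which parameterization the paper tunes, and the numeric values are hypothetical.

```python
import math


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Ordinary (type-1) Gaussian membership grade."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def it2_gaussian(x: float, mean: float, sigma_low: float, sigma_high: float):
    """Return the (lower, upper) membership grades of an interval type-2 set.

    The footprint of uncertainty is the band between the two curves.
    """
    if not 0.0 < sigma_low <= sigma_high:
        raise ValueError("expected 0 < sigma_low <= sigma_high")
    lower = gaussian(x, mean, sigma_low)   # narrower curve lies below
    upper = gaussian(x, mean, sigma_high)  # wider curve lies above
    return lower, upper


# Hypothetical feature values and parameters, for illustration only.
for x in (1.0, 2.0, 3.5):
    lo, up = it2_gaussian(x, mean=2.0, sigma_low=0.5, sigma_high=1.0)
    print(f"x={x}: lower={lo:.3f}, upper={up:.3f}")
```

In a tuning scheme like the one described, the mean and the two standard deviations would be the quantities encoded in the chromosome and scored by cross-validation; a type-1 system is recovered when `sigma_low == sigma_high`.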
Abstract:
The taxonomy of Bambusoideae is in a state of flux and phylogenetic studies are required to help resolve systematic issues. Over 60 taxa, representing all subtribes of Bambuseae and related non-bambusoid grasses were sampled. A combined analysis of five plastid DNA regions, trnL intron, trnL-F intergenic spacer, atpB-rbcL intergenic spacer, rps16 intron, and matK, was used to study the phylogenetic relationships among the bamboos in general and the woody bamboos in particular. Within the BEP clade (Bambusoideae s.s., Ehrhartoideae, Pooideae), Pooideae were resolved as sister to Bambusoideae s.s. Tribe Bambuseae, the woody bamboos, as currently recognized were not monophyletic because Olyreae, the herbaceous bamboos, were sister to tropical Bambuseae. Temperate Bambuseae were sister to the group consisting of tropical Bambuseae and Olyreae. Thus, the temperate Bambuseae would be better treated as their own tribe Arundinarieae than as a subgroup of Bambuseae. Within the tropical Bambuseae, neotropical Bambuseae were sister to the palaeotropical and Austral Bambuseae. In addition, Melocanninae were found to be sister to the remaining palaeotropical and Austral Bambuseae. We discuss phylogenetic and morphological patterns of diversification and interpret them in a biogeographic context.
Abstract:
BACKGROUND: The majority of Haemosporida species infect birds or reptiles, but many important genera, including Plasmodium, infect mammals. Dipteran vectors shared by avian, reptilian and mammalian Haemosporida, suggest multiple invasions of Mammalia during haemosporidian evolution; yet, phylogenetic analyses have detected only a single invasion event. Until now, several important mammal-infecting genera have been absent in these analyses. This study focuses on the evolutionary origin of Polychromophilus, a unique malaria genus that only infects bats (Microchiroptera) and is transmitted by bat flies (Nycteribiidae). METHODS: Two species of Polychromophilus were obtained from wild bats caught in Switzerland. These were molecularly characterized using four genes (asl, clpc, coI, cytb) from the three different genomes (nucleus, apicoplast, mitochondrion). These data were then combined with data of 60 taxa of Haemosporida available in GenBank. Bayesian inference, maximum likelihood and a range of rooting methods were used to test specific hypotheses concerning the phylogenetic relationships between Polychromophilus and the other haemosporidian genera. RESULTS: The Polychromophilus melanipherus and Polychromophilus murinus samples show genetically distinct patterns and group according to species. The Bayesian tree topology suggests that the monophyletic clade of Polychromophilus falls within the avian/saurian clade of Plasmodium and directed hypothesis testing confirms the Plasmodium origin. CONCLUSION: Polychromophilus' ancestor was most likely a bird- or reptile-infecting Plasmodium before it switched to bats. The invasion of mammals as hosts has, therefore, not been a unique event in the evolutionary history of Haemosporida, despite the suspected costs of adapting to a new host. This was, moreover, accompanied by a switch in dipteran host.
Abstract:
SUMMARY: A top scoring pair (TSP) classifier consists of a pair of variables whose relative ordering can be used for accurately predicting the class label of a sample. This classification rule has the advantage of being easily interpretable and more robust against technical variations in data, such as those due to different microarray platforms. Here we describe a parallel implementation of this classifier which significantly reduces the training time, and a number of extensions, including a multi-class approach, which has the potential of improving the classification performance. AVAILABILITY AND IMPLEMENTATION: Full C++ source code and R package Rgtsp are freely available from http://lausanne.isb-sib.ch/~vpopovic/research/. The implementation relies on existing OpenMP libraries.
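The basic two-class TSP rule can be sketched in a few lines. This is a serial, pure-Python illustration of the general idea (pick the pair whose ordering frequency differs most between classes), not the parallel C++ implementation or the Rgtsp API; the function names and expression values are hypothetical.

```python
from itertools import combinations


def train_tsp(samples, labels):
    """Return (i, j, class0_when_less) for the best-scoring feature pair.

    The score of a pair is |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.
    """
    n_features = len(samples[0])
    best = None
    for i, j in combinations(range(n_features), 2):
        probs = []
        for c in (0, 1):
            rows = [s for s, y in zip(samples, labels) if y == c]
            probs.append(sum(s[i] < s[j] for s in rows) / len(rows))
        score = abs(probs[0] - probs[1])
        if best is None or score > best[0]:
            best = (score, i, j, probs[0] > probs[1])
    return best[1:]


def predict_tsp(model, sample):
    """Predict 0 or 1 from the relative ordering of the selected pair."""
    i, j, class0_when_less = model
    return 0 if (sample[i] < sample[j]) == class0_when_less else 1


# Hypothetical expression values for three genes and six samples.
X = [[1.0, 2.0, 5.0], [0.5, 3.0, 4.0], [1.2, 2.5, 0.1],
     [3.0, 1.0, 2.0], [4.0, 2.0, 6.0], [2.5, 0.5, 1.0]]
y = [0, 0, 0, 1, 1, 1]
model = train_tsp(X, y)
print(model, predict_tsp(model, [0.2, 0.9, 3.0]))
```

Because the rule depends only on the ordering of two values within a sample, it is unchanged by any monotone rescaling of that sample, which is consistent with the robustness to platform effects noted in the abstract. The pair search is quadratic in the number of features, which is what makes a parallel implementation attractive.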
Abstract:
In this study we propose an evaluation of the angular effects altering the spectral response of the land-cover over multi-angle remote sensing image acquisitions. The shift in the statistical distribution of the pixels observed in an in-track sequence of WorldView-2 images is analyzed by means of a kernel-based measure of distance between probability distributions. Afterwards, the portability of supervised classifiers across the sequence is investigated by looking at the evolution of the classification accuracy with respect to the changing observation angle. In this context, the efficiency of various physically and statistically based preprocessing methods in obtaining angle-invariant data spaces is compared and possible synergies are discussed.
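The abstract does not name the kernel-based distance it uses. One widely used measure of this kind is the maximum mean discrepancy (MMD), and the sketch below assumes it purely for illustration, with an RBF kernel, a hypothetical `gamma`, and made-up pixel values.

```python
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two pixel vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Hypothetical two-band pixel samples from two acquisition angles.
nadir = [[0.10, 0.20], [0.12, 0.22], [0.11, 0.19]]
off_nadir = [[0.30, 0.45], [0.33, 0.41], [0.29, 0.44]]
print(mmd2(nadir, nadir, gamma=10.0))      # identical samples
print(mmd2(nadir, off_nadir, gamma=10.0))  # shifted distribution
```

Computing such a distance between the first image and each subsequent image of the sequence gives one possible way to track how the pixel distribution drifts with the observation angle.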
Abstract:
Focusing first on formalism and methods, this thesis is built on three formalized concepts: a contingency table, a matrix of Euclidean dissimilarities, and an exchange matrix. From these, several data analysis or machine learning methods are expressed and developed: correspondence analysis (CA), viewed as a special case of multidimensional scaling; supervised or unsupervised classification combined with Schoenberg transformations; and autocorrelation and cross-autocorrelation indices, adapted to multivariate analyses and able to accommodate various families of neighborhoods. These methods then lead to the exploratory analysis of various textual and musical data. For textual data, the focus is on the unsupervised classification of uttered clauses into discourse types, based on the part-of-speech (POS) categories they contain. Although the statistical link between POS categories and discourse types is confirmed, the classification results obtained with the K-means method combined with a Schoenberg transformation, as well as with a fuzzy variant of the K-means algorithm, are more difficult to interpret. The supervised multi-label classification of turns into dialog acts is also addressed, again based on the POS categories they contain, but also on lemmas and the meaning of verbs. The results obtained through discriminant analysis combined with a Schoenberg transformation are promising. Finally, textual autocorrelation is examined from the angle of similarities between the various positions of a text, viewed as a sequence of units. In particular, the alternation of word lengths in a text is observed for neighborhoods of variable span. Similarities are also studied according to the presence or absence of certain parts of speech, as are the semantic similarities between the various positions of a text. Regarding musical data, a representation of a musical score as a contingency table is proposed. CA and the autocorrelation index are first used to uncover the structures present in each score. The same type of approach is then applied to the different voices of a score, using multiple correspondence analysis, in a fuzzy variant, and the cross-autocorrelation index. Whether for the complete score or for the individual voices it contains, repeated structures are indeed detected, provided they are not transposed. Finally, twenty scores by four different composers, each represented by a contingency table, are grouped automatically by means of an index measuring the similarity between two configurations. The results make it possible to group most of the works successfully by composer.
Abstract:
In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The first is represented as a complete tree to be pruned and the second is iteratively provided by the user. The active learning algorithm proposed searches the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to current pruning's uncertainty, sampling is focused on most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried and sampling is focused on division of clusters showing mixed labels. The model is tested on a VHR image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.
Abstract:
OBJECTIVES: The aim of this study was to investigate pathological mechanisms underlying brain tissue alterations in mild cognitive impairment (MCI) using multi-contrast 3 T magnetic resonance imaging (MRI). METHODS: Forty-two MCI patients and 77 healthy controls (HC) underwent T1/T2* relaxometry as well as Magnetization Transfer (MT) MRI. Between-groups comparisons in MRI metrics were performed using permutation-based tests. Using MRI data, a generalized linear model (GLM) was computed to predict clinical performance and a support-vector machine (SVM) classification was used to classify MCI and HC subjects. RESULTS: Multi-parametric MRI data showed microstructural brain alterations in MCI patients vs HC that might be interpreted as: (i) a broad loss of myelin/cellular proteins and tissue microstructure in the hippocampus (p ≤ 0.01) and global white matter (p < 0.05); and (ii) iron accumulation in the pallidus nucleus (p ≤ 0.05). MRI metrics accurately predicted memory and executive performances in patients (p ≤ 0.005). SVM classification reached an accuracy of 75% to separate MCI and HC, and performed best using both volumes and T1/T2*/MT metrics. CONCLUSION: Multi-contrast MRI appears to be a promising approach to infer pathophysiological mechanisms leading to brain tissue alterations in MCI. Likewise, parametric MRI data provide powerful correlates of cognitive deficits and improve automatic disease classification based on morphometric features.
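A permutation-based between-group comparison of the kind mentioned in the methods can be sketched as follows. The test statistic (absolute difference of means), the number of permutations, the seed, and the metric values are all assumptions for illustration; the study's actual statistic and any multiple-comparison correction are not reproduced here.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign subjects to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)


# Hypothetical values of one MRI metric in one region (arbitrary units).
patients = [1.32, 1.35, 1.31, 1.38, 1.36]
controls = [1.21, 1.24, 1.19, 1.25, 1.22]
print(permutation_test(patients, controls))
```

The appeal of this approach for small imaging cohorts is that it makes no normality assumption: the null distribution is built from the data by shuffling group membership.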
Abstract:
OBJECTIVES: Specifically, we aim to demonstrate that the results of our earlier safety data hold true in this much larger multi-national and multi-ethnical population. BACKGROUND: We sought to re-evaluate the frequency, manifestations, and severity of acute adverse reactions associated with administration of several gadolinium-based contrast agents during routine CMR on a European level. METHODS: Multi-centre, multi-national, and multi-ethnical registry with consecutive enrolment of patients in 57 European centres. RESULTS: During the current observation, 37,788 doses of gadolinium-based contrast agent were administered to 37,788 patients. The mean dose was 24.7 ml (range 5-80 ml), which is equivalent to 0.123 mmol/kg (range 0.01-0.3 mmol/kg). Forty-five acute adverse reactions due to contrast administration occurred (0.12%). Most reactions were classified as mild (43 of 45) according to the American College of Radiology definition. The most frequent complaints following contrast administration were rashes and hives (15 of 45), followed by nausea (10 of 45) and flushes (10 of 45). The event rate ranged from 0.05% (linear non-ionic agent gadodiamide) to 0.42% (linear ionic agent gadobenate dimeglumine). Interestingly, we also found different event rates between the three main indications for CMR, ranging from 0.05% (risk stratification in suspected CAD) to 0.22% (viability in known CAD). CONCLUSIONS: The current data indicate that the results of the earlier safety data hold true in this much larger multi-national and multi-ethnical population. Thus, the "off-label" use of gadolinium-based contrast in cardiovascular MR should be regarded as safe concerning the frequency, manifestation and severity of acute events.