68 results for Document classification, Naive Bayes classifier, Verb-object pairs

at Université de Lausanne, Switzerland


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work studies the supervised multi-label classification of turns in Simple English Wikipedia talk pages into dialog acts. The dataset was created and multi-labeled by Ferschke et al. (2012). The first part analyses dependences between labels, in order to examine the coherence of the annotation and to determine a classification method. A multi-label classification is then computed after transforming the problem by binary relevance. Regarding features, whereas Ferschke et al. (2012) use features such as uni-, bi-, and trigrams, the time distance between turns, or the indentation level of the turn, other features are considered here: lemmas, part-of-speech tags, and the meaning of verbs (according to WordNet). The dataset authors applied approaches such as Naive Bayes or Support Vector Machines for the classification. The present paper proposes, as an alternative, to use linear discriminant analysis and extend it to Schoenberg transformations which, following the example of kernel methods, transform the original Euclidean distances into other Euclidean distances in a space of high dimensionality.
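As a rough illustration of the binary relevance transformation mentioned in this abstract, the sketch below turns one multi-label problem into one independent 0/1 target per label. The label names and the three example turns are hypothetical and are not taken from the Ferschke et al. dataset; the classifier that would then be trained on each target is left out.

```python
def binary_relevance_targets(label_sets, all_labels):
    """Build one binary target vector per label.

    label_sets: list of sets, the labels assigned to each turn.
    all_labels: list of every label in the annotation scheme.
    Returns a dict mapping each label to a list of 0/1 values,
    one value per turn, so each label becomes its own binary problem.
    """
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


# Hypothetical dialog-act labels, for illustration only.
turns = [{"question", "criticism"}, {"information"}, {"question"}]
labels = ["criticism", "information", "question"]
targets = binary_relevance_targets(turns, labels)
```

Each entry of `targets` can then be handed to any binary classifier, which is what makes the transformation independent of the learning method.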

Relevance:

100.00% 100.00%

Publisher:

Abstract:

BACKGROUND: Major factors influencing the phenotypic diversity of a lineage can be recognized by characterizing the extent and mode of trait evolution between related species. Here, we compared the evolutionary dynamics of traits associated with floral morphology and climatic preferences in a clade composed of the genera Codonanthopsis, Codonanthe and Nematanthus (Gesneriaceae). To test the mode and specific components that lead to phenotypic diversity in this group, we performed a Bayesian phylogenetic analysis of combined nuclear and plastid DNA sequences and modeled the evolution of quantitative traits related to flower shape and size and to climatic preferences. We propose an alternative approach to display graphically the complex dynamics of trait evolution along a phylogenetic tree using a wide range of evolutionary scenarios. RESULTS: Our results demonstrated heterogeneous trait evolution. Floral shapes displaced into separate regimes selected by the different pollinator types (hummingbirds versus insects), while floral size underwent a clade-specific evolution. Rates of evolution were higher for the clade that is hummingbird pollinated and experienced flower resupination, compared with species pollinated by bees, suggesting a relevant role of plant-pollinator interactions in lowland rainforest. The evolution of temperature preferences is best explained by a model with distinct selective regimes between the Brazilian Atlantic Forest and the other biomes, whereas differentiation along the precipitation axis was characterized by higher rates, compared with temperature, and no regime or clade-specific patterns. CONCLUSIONS: Our study shows different selective regimes and clade-specific patterns in the evolution of morphological and climatic components during the diversification of Neotropical species. 
Our new graphical visualization tool allows the representation of trait trajectories under parameter-rich models, thus contributing to a better understanding of complex evolutionary dynamics.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Many classifiers achieve high levels of accuracy but have limited applicability in real-world situations because they do not lead to a greater understanding of, or insight into, the way features influence the classification. In areas such as health informatics, a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on Cystic Fibrosis and Diabetes datasets, with positive results.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

SUMMARY: A top scoring pair (TSP) classifier consists of a pair of variables whose relative ordering can be used to accurately predict the class label of a sample. This classification rule has the advantage of being easily interpretable and more robust against technical variations in data, such as those due to different microarray platforms. Here we describe a parallel implementation of this classifier, which significantly reduces the training time, and a number of extensions, including a multi-class approach, which has the potential to improve classification performance. AVAILABILITY AND IMPLEMENTATION: Full C++ source code and R package Rgtsp are freely available from http://lausanne.isb-sib.ch/~vpopovic/research/. The implementation relies on existing OpenMP libraries.
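To make the pair-ordering rule concrete, here is a minimal two-class sketch of the basic TSP idea: score every pair of variables by how differently the ordering x[i] < x[j] occurs in the two classes, and keep the best pair. It is a plain serial illustration on made-up data, not the parallel C++/Rgtsp implementation described above, and the function names are hypothetical.

```python
from itertools import combinations


def train_tsp(samples, labels):
    """Return (i, j, class_if_less) for the best-scoring pair.

    samples: list of equal-length numeric feature lists.
    labels: list of class labels, 0 or 1, one per sample.
    The score of a pair is |P(x[i] < x[j] | class 0) - P(x[i] < x[j] | class 1)|.
    """
    class0 = [x for x, y in zip(samples, labels) if y == 0]
    class1 = [x for x, y in zip(samples, labels) if y == 1]
    best = None
    for i, j in combinations(range(len(samples[0])), 2):
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            # class_if_less is the class in which x[i] < x[j] is more frequent.
            best = (score, i, j, 1 if p1 > p0 else 0)
    return best[1:]


def predict_tsp(model, x):
    """Predict the class of one sample from the ordering of the chosen pair."""
    i, j, class_if_less = model
    return class_if_less if x[i] < x[j] else 1 - class_if_less


# Made-up expression-like values: three variables, two samples per class.
samples = [[1.0, 5.0, 3.0], [2.0, 6.0, 2.5], [5.0, 1.0, 3.0], [6.0, 2.0, 2.8]]
labels = [0, 0, 1, 1]
model = train_tsp(samples, labels)
```

Because the rule depends only on which of two values is larger, it is unchanged by any monotone rescaling of a sample, which is one way to read the robustness claim in the abstract.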

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Many species contain genetic lineages that are phylogenetically intermixed with those of other species. In the Sorex araneus group, previous results based on mtDNA and Y chromosome sequence data showed an incongruent position of Sorex granarius within this group. In this study, we explored the relationship between species within the S. araneus group, aiming to resolve the particular position of S. granarius. In this context, we sequenced a total of 2447 base pairs (bp) of X-linked and nuclear genes from 47 individuals of the S. araneus group. The same taxa were also analyzed within a Bayesian framework with nine autosomal microsatellites. These analyses revealed that all markers apart from mtDNA showed similar patterns, suggesting that the problematic position of S. granarius is best explained by an incongruent behavior by mtDNA. Given their close phylogenetic relationship and their close geographic distribution, the most likely explanation for this pattern is past mtDNA introgression from S. araneus race Carlit to S. granarius.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In this article, I address epistemological questions regarding the status of linguistic rules and the pervasive, though seldom discussed, tension that arises between theory-driven object perception by linguists on the one hand and ordinary speakers' possible intuitive knowledge on the other. Several issues are discussed using examples from French verb morphology, based on the 6500 verbs from Le Petit Robert dictionary (2013).

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The assessment of difficult tracheal intubation is an important research topic in anesthesia, as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose, we propose a method based on active appearance models (AAM) and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential, as it drastically improves classification performance; classification itself is performed using SVM with an RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out cross-validation test, providing a key element of an automatic difficult intubation assessment system.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The evolution of key innovations, novel traits that promote diversification, is often seen as a major driver of the unequal distribution of species richness within the tree of life. In this study, we aim to determine the factors underlying the extraordinary radiation of the subfamily Bromelioideae, one of the most diverse clades in the neotropical plant family Bromeliaceae. Based on an extended molecular phylogenetic data set, we examine the effect of two putative key innovations, that is, Crassulacean acid metabolism (CAM) and the water-impounding tank, on speciation and extinction rates. To this end, we develop a novel Bayesian implementation of the phylogenetic comparative method, binary state speciation and extinction, which enables hypothesis testing by Bayes factors and accommodates the uncertainty in model selection by Bayesian model averaging. Both CAM and tank habit were found to correlate with increased net diversification, thus fulfilling the criteria for key innovations. Our analyses further revealed that CAM photosynthesis is correlated with a twofold increase in speciation rate, whereas the evolution of the tank primarily affected extinction rates, which were found to be five times lower in tank-forming lineages than in tank-less clades. These differences are discussed in the light of biogeography, ecology, and past climate change.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The crocidurine shrews include the most speciose genus of mammals, Crocidura. The origin and evolution of their radiation is, however, poorly understood because of a very scant fossil record and a rather conservative external morphology between species. Here, we use an alignment of 3560 base pairs of mitochondrial and nuclear DNA to generate a phylogenetic hypothesis for the evolution of Old World shrews of the subfamily Crocidurinae. These molecular data confirm the monophyly of the speciose African and Eurasian Crocidura, which also includes the fossorial, monotypic genus Diplomesodon. The phylogenetic reconstructions give further credit to a paraphyletic position of Suncus shrews, which are placed into at least two independent clades (one in Africa and sister to Sylvisorex and one in Eurasia), at the base of the Crocidura radiation. Therefore, we recommend restricting the genus Suncus to the Palaearctic and Oriental taxa and considering all the African Suncus as Sylvisorex. Using molecular dating and biogeographic reconstruction analyses, we suggest a Palaearctic-Oriental origin for Crocidura dating back to the Upper Miocene (6.8 million years ago) and several subsequent colonisations of the Afrotropical region by independent lineages of Crocidura.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Past multisensory experiences can influence current unisensory processing and memory performance. Repeated images are better discriminated if initially presented as auditory-visual pairs, rather than only visually. An experience's context thus plays a role in how well repetitions of certain aspects are later recognized. Here, we investigated factors during the initial multisensory experience that are essential for generating improved memory performance. Subjects discriminated repeated versus initial image presentations intermixed within a continuous recognition task. Half of initial presentations were multisensory, and all repetitions were only visual. Experiment 1 examined whether purely episodic multisensory information suffices for enhancing later discrimination performance by pairing visual objects with either tones or vibrations. We could therefore also assess whether effects can be elicited with different sensory pairings. Experiment 2 examined semantic context by manipulating the congruence between auditory and visual object stimuli within blocks of trials. Relative to images only encountered visually, accuracy in discriminating image repetitions was significantly impaired by auditory-visual, yet unaffected by somatosensory-visual multisensory memory traces. By contrast, this accuracy was selectively enhanced for visual stimuli with semantically congruent multisensory pasts and unchanged for those with semantically incongruent multisensory pasts. The collective results reveal opposing effects of purely episodic versus semantic information from auditory-visual multisensory events. Nonetheless, both types of multisensory memory traces are accessible for processing incoming stimuli and indeed result in distinct visual object processing, leading to either impaired or enhanced performance relative to unisensory memory traces. We discuss these results as supporting a model of object-based multisensory interactions.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This study presents a classification criterion for assigning Cannabis seedlings to one of two classes. As the cultivation of drug type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask laboratories to determine the chemotype of cannabis plants from seized material in order to ascertain whether the plantation is legal. In this study, the classification analysis is based on data obtained from the relative proportion of three major leaf compounds measured by gas chromatography interfaced with mass spectrometry (GC-MS). The aim is to discriminate between drug type (illegal) and fiber type (legal) cannabis at an early stage of growth. A Bayesian procedure is proposed: a Bayes factor is computed and classification is performed on the basis of the decision maker's specifications (i.e. prior probability distributions on cannabis type and consequences of classification measured by losses). Classification rates are computed with two statistical models and the results are compared. A sensitivity analysis is then performed to assess the robustness of the classification criterion.
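The decision step described here can be sketched with the standard Bayesian decision rule for two classes: posterior odds equal the Bayes factor times the prior odds, and the plant is classified as drug type when those odds exceed the ratio of the two misclassification losses. This is only an assumed reading of the procedure; the function name, parameter names, and all numbers below are hypothetical, and the statistical models used to obtain the Bayes factor are not reproduced.

```python
def classify_seedling(bayes_factor, prior_drug, loss_missed_drug, loss_false_drug):
    """Choose the class with the smaller expected loss.

    bayes_factor: p(data | drug type) / p(data | fiber type).
    prior_drug: prior probability that the seedling is drug type (0 < p < 1).
    loss_missed_drug: loss when a drug type plant is classified as fiber type.
    loss_false_drug: loss when a fiber type plant is classified as drug type.
    """
    prior_odds = prior_drug / (1.0 - prior_drug)
    posterior_odds = bayes_factor * prior_odds
    # Deciding "drug" has the lower expected loss when
    # P(drug | data) * loss_missed_drug > P(fiber | data) * loss_false_drug.
    threshold = loss_false_drug / loss_missed_drug
    return "drug" if posterior_odds > threshold else "fiber"


# Hypothetical values: the same evidence leads to different decisions
# once a false "drug" call is judged ten times more costly.
equal_losses = classify_seedling(4.0, 0.5, 1.0, 1.0)
cautious = classify_seedling(4.0, 0.5, 1.0, 10.0)
```

Varying the prior and the losses in this way is also a simple form of the sensitivity analysis the abstract mentions.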

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Thesis summary. The evolution of policing systems gives a predominant place to information and intelligence. This transformation requires developing and maintaining a permanent set of crime analysis processes, in particular to deal with repetitive or serious events. In an organization with limited resources, the time devoted to collecting, coding, and integrating data reduces the time available for analyzing and disseminating intelligence. The collection and integration phases nevertheless remain indispensable, since analysis is not possible on large volumes of data with no structure. Until now, these analysis problems have been addressed by essentially specialized approaches (hot-spot computation, data mining, ...) or by approaches driven by a single axis (for example, the behavioral sciences). This research takes a different angle: an interdisciplinary approach was adopted. The continual increase in the quantity of data to be analyzed tends to reduce the capacity to analyze the available information. A good partitioning (classification) of the problems encountered makes it possible to confine analyses to relevant data. These classes are essential for structuring the memory of the analysis system. Police crime statistics should already have answered these questions of how to partition delinquency (legal classification). This decomposition was compared with the needs of a permanent crime monitoring system. The research confirms that our efforts to understand the nature and distribution of crime run up against an obstacle, namely that the legal definition of forms of crime is not suited to their analysis and study. For nearly twenty years, the police forces of French-speaking Switzerland have used and developed a classification system based on police experience (partitioning by phenomenon).
This research proposes to interpret this system within the framework of situational approaches (theoretical approach) and to confront it with the available "statistical" data in order to verify its capacity to distinguish forms of crime. The research is limited to residential burglaries, a frequent repetitive offence. Opportunity theory holds that at least the following three factors must come together in time and space: a potential offender, an attractive target, and the absence of a guardian capable of preventing or stopping the act. Thus, the offence is possible only in certain circumstances, that is, in a very specific context. Identifying these contexts makes it possible to categorize crime. Each case is unique, but a group of cases shows similarities. For example, certain conditions combined with certain environments attract certain types of burglars. Two hypotheses were tested. The first is that residential burglaries are not uniformly distributed across the classes formed by "situational parameters"; the second is that niches appear when the different parameters are cross-referenced and that they correspond to the classification put in place by the Vaud judicial coordination and the CICOP. The Vaud database of burglaries recorded by the police between 1997 and 2006 was used (25,369 cases). Specific situations were identified, and they correspond to the empirically defined classes. In a second phase, the link between a specific situation and an offender's activity within that same situation was verified. The observations made in this research indicate that burglars are active in niches. Several serial offenders committed offences outside their niche, but the number of these offences is small compared with the number of cases committed within the niche.
A classification system that corresponds to criminal realities makes it possible to decompose events and to set up an "intelligent" alert and monitoring system. A new series within a phenomenon will be detected through an increase in the number of cases of that phenomenon, particularly in a given region and period. The same series, mixed in among all offences, would not necessarily be detectable, particularly if it moves. Finally, cooperation between the operational criminal intelligence structures in French-speaking Switzerland was improved by the development of a common information platform, into which the classification system has been fully integrated.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Lipids available in fingermark residue represent important targets for enhancement and dating techniques. While it is well known that lipid composition varies among fingermarks of the same donor (intra-variability) and between fingermarks of different donors (inter-variability), the extent of this variability remains uncharacterised. Thus, this work aimed to study, qualitatively and quantitatively, the initial lipid composition of the fingermark residue of 25 different donors. Among the 104 detected lipids, 43 were reported for the first time in the literature. Furthermore, palmitic acid, squalene, cholesterol, myristyl myristate and myristyl myristoleate were quantified and their correlation within fingermark residue was highlighted. Ten compounds were then selected and further studied as potential targets for dating or enhancement techniques. It was shown that their relative standard deviation was significantly lower for the intra-variability than for the inter-variability. Moreover, the use of data pretreatments could significantly reduce this variability. Based on these observations, an objective donor classification model was proposed. Hierarchical cluster analysis was conducted on the pre-treated data and the fingermarks of the 25 donors were classified into two main groups, corresponding to "poor" and "rich" lipid donors. The robustness of this classification was tested using fingermark replicates of selected donors. 86% of these replicates were correctly classified, showing the potential of such a donor classification model for research purposes in order to select representative donors based on compounds of interest.
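The comparison of intra- and inter-variability rests on the relative standard deviation (RSD), the sample standard deviation expressed as a percentage of the mean. A minimal sketch follows; the measurement values are invented for illustration and are not data from the study.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Invented amounts of one compound, in arbitrary units.
same_donor = [9.0, 10.0, 11.0]        # replicates from a single donor
different_donors = [4.0, 10.0, 16.0]  # one fingermark from each of three donors

intra_rsd = relative_standard_deviation(same_donor)
inter_rsd = relative_standard_deviation(different_donors)
```

In this toy case the intra-donor RSD is smaller than the inter-donor RSD, which is the pattern the abstract reports for the ten selected compounds.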