222 results for Classification Tree Pruning


Relevance: 20.00%

Abstract:

BACKGROUND: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate for assessing the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. RESULTS: We have developed a web-based platform that gives user-friendly access to two phylogenetic-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model to allow for the generalization of the alphabet used in the Markov model, which can now analyse both nucleotide and amino acid data sets. The simulation of coevolving positions is novel and builds upon the developments of the Coev model. It allows users to simulate pairs of dependent nucleotide or amino acid positions. CONCLUSIONS: The main focus of our paper is the new simulation method we present for coevolving positions. The implementation of this method is embedded within the web platform Coev-web, which is freely accessible at http://coev.vital-it.ch/ and has been tested in most modern web browsers.

Relevance: 20.00%

Abstract:

Over the past few decades, age estimation of living persons has represented a challenging task for many forensic services worldwide. In general, the process of age estimation includes observing the degree of maturity reached by some physical attributes, such as dentition or several ossification centers. The estimated chronological age, or the probability that an individual belongs to a meaningful class of ages, is then obtained from the observed degree of maturity by means of various statistical methods. Among these methods, those developed in a Bayesian framework allow users to deal coherently with the uncertainty associated with age estimation and to assess, in a transparent and logical way, the probability that an examined individual is younger or older than a given age threshold. Recently, a Bayesian network for age estimation has been presented in the scientific literature; this kind of probabilistic graphical tool may facilitate the use of the probabilistic approach. Probabilities of interest in the network are assigned by means of transition analysis, a parametric statistical model that links chronological age and the degree of maturity through specific regression models, such as logit or probit models. Since different regression models can be employed in transition analysis, the aim of this paper is to study the influence of the model on the classification of individuals. The analysis was performed using a dataset on the ossification status of the medial clavicular epiphysis, and the results indicate that the classification of individuals does not depend on the choice of regression model.
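The reasoning described in this abstract can be illustrated with a minimal sketch: a logit transition model gives the probability of each maturity stage at a given age, and Bayes' theorem turns an observed stage into the probability of being at or above an age threshold. This is not the authors' Bayesian network. The function names, the transition ages, the slope, the age grid and the uniform prior are all hypothetical values chosen for illustration only.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def stage_probabilities(age, transition_ages, slope):
    """P(stage = k | age) under a logit transition model.

    transition_ages must be increasing; transition_ages[k] is the
    (hypothetical) age at which half of individuals have passed
    from stage k to stage k + 1.
    """
    # P(stage > k | age) for each transition k
    exceed = [logistic(slope * (age - t)) for t in transition_ages]
    bounds = [1.0] + exceed + [0.0]
    # Stage probabilities are differences of consecutive exceedance probabilities
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


def prob_at_least(threshold, observed_stage, ages, prior, transition_ages, slope):
    """Posterior P(age >= threshold | observed stage) on a discrete age grid."""
    joint = [
        p * stage_probabilities(a, transition_ages, slope)[observed_stage]
        for a, p in zip(ages, prior)
    ]
    total = sum(joint)
    return sum(j for a, j in zip(ages, joint) if a >= threshold) / total


# Hypothetical setup: ages 10 to 30, uniform prior, five stages (0 to 4)
ages = list(range(10, 31))
prior = [1.0 / len(ages)] * len(ages)
transition_ages = [16.0, 19.0, 22.0, 25.0]  # assumed, not estimated from data
slope = 0.8  # assumed, not estimated from data

for stage in range(5):
    p = prob_at_least(18, stage, ages, prior, transition_ages, slope)
    print(f"stage {stage}: P(age >= 18 | stage) = {p:.3f}")
```

Replacing `logistic` with a normal cumulative distribution function would give the probit variant, which is the kind of model comparison the abstract refers to.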

Relevance: 20.00%

Abstract:

In the past few decades, the rise of criminal, civil and asylum cases involving young people lacking valid identification documents has increased the demand for age estimation. The chronological age, or the probability that an individual is older or younger than a given age threshold, is generally estimated by statistical methods based on observations of specific physical attributes. Among these statistical methods, those developed in the Bayesian framework allow users to provide coherent and transparent assignments which fulfill forensic and medico-legal purposes. The application of the Bayesian approach is facilitated by probabilistic graphical tools, such as Bayesian networks. The aim of this work is to test the performance of the Bayesian network for age estimation recently presented in the scientific literature in classifying individuals as older or younger than 18 years of age. For these exploratory analyses, a sample on the ossification status of the medial clavicular epiphysis available in the scientific literature was used. The classification results are promising: in the criminal context, the Bayesian network achieved, on average, a rate of correct classifications of approximately 97%, whilst in the civil context the rate is, on average, close to 88%. These results encourage further development and testing of the method in order to support its practical application in casework.

Relevance: 20.00%

Abstract:

This work studies the multi-label classification of turns in Simple English Wikipedia talk pages into dialog acts. The dataset was created and multi-labeled by Ferschke et al. (2012). The first part analyses dependencies between labels, in order to examine the coherence of the annotations and to choose a classification method. A multi-label classification is then computed after transforming the problem by binary relevance. Regarding features, whereas Ferschke et al. (2012) use features such as uni-, bi- and trigrams, the time distance between turns or the indentation level of the turn, other features are considered here: lemmas, part-of-speech tags and the meaning of verbs (according to WordNet). The dataset authors applied approaches such as Naive Bayes or Support Vector Machines. The present paper proposes, as an alternative, to use and extend linear discriminant analysis with Schoenberg transformations which, like kernel methods, transform the original Euclidean distances into other Euclidean distances in a space of high dimensionality.
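The binary relevance step mentioned in this abstract can be sketched as follows: a multi-label problem is recoded as one independent yes/no target per label, so that any binary classifier can be trained on each target. This is a minimal illustration, not the paper's implementation. The function name and the dialog-act label names in the example are hypothetical and are not taken from the dataset.

```python
def binary_relevance(label_sets):
    """Recode multi-label annotations as one binary target vector per label.

    label_sets: list of sets, one set of labels per turn.
    Returns a dict mapping each label to a list of 0/1 indicators,
    one indicator per turn, in the original order.
    """
    labels = sorted(set().union(*label_sets))
    return {
        label: [int(label in turn_labels) for turn_labels in label_sets]
        for label in labels
    }


# Hypothetical annotations for four turns (label names are illustrative only)
turns = [
    {"criticism", "information_seeking"},
    {"information_providing"},
    {"criticism"},
    set(),  # a turn may carry no label at all
]

for label, target in binary_relevance(turns).items():
    print(label, target)
```

Each resulting vector is an ordinary binary classification target. The label dependencies analysed in the first part of the paper matter precisely because this recoding treats the labels as independent.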

Relevance: 20.00%

Abstract:

Heterozygosity-fitness correlations (HFCs) have been used to understand the complex interactions between inbreeding, genetic diversity and evolution. Although frequently reported for decades, evidence for HFCs was often based on underpowered studies or inappropriate methods, and hence their underlying mechanisms are still under debate. Here, we used 6100 genome-wide single nucleotide polymorphisms (SNPs) to test for general and local effect HFCs in maritime pine (Pinus pinaster Ait.), an iconic Mediterranean forest tree. Survival was used as a fitness proxy, and HFCs were assessed at a four-site common garden under contrasting environmental conditions (total of 16 288 trees). We found no significant correlations between genome-wide heterozygosity and fitness at any location, despite variation in inbreeding explaining a substantial proportion of the total variance for survival. However, four SNPs (including two non-synonymous mutations) were involved in significant associations with survival, in particular in the common gardens with higher environmental stress, as shown by a novel heterozygosity-fitness association test at the species-wide level. Fitness effects of SNPs involved in significant HFCs were stable across maritime pine gene pools naturally growing in distinct environments. These results led us to dismiss the general effect hypothesis and suggested a significant role of heterozygosity in specific candidate genes for increasing fitness in maritime pine. Our study highlights the importance of considering the species' evolutionary and demographic history, as well as different spatial scales and testing environments, when assessing and interpreting HFCs.

Relevance: 20.00%

Abstract:

Identifying homology between sex chromosomes of different species is essential to understanding the evolution of sex determination. Here, we show that the identity of a homomorphic sex chromosome pair can be established using a linkage map, without information on offspring sex. By comparing sex-specific maps of the European tree frog Hyla arborea, we find that the sex chromosome (linkage group 1) shows a threefold difference in marker number between the male and female maps. In contrast, the number of markers on each autosome is similar between the two maps. We also find strongly conserved synteny between H. arborea and Xenopus tropicalis across 200 million years of evolution, suggesting that the rate of chromosomal rearrangement in anurans is low. Finally, we show that recombination in males is greatly reduced at the centers of large chromosomes, consistent with previous cytogenetic findings. Our research shows the importance of high-density linkage maps for studies of recombination, chromosomal rearrangement and the genetic architecture of ecologically or economically important traits.

Relevance: 20.00%

Abstract:

In this article we present the history, the main methodological principles, and the scientific and media reception of the Research Domain Criteria (RDoC) project, launched in 2009 in the United States by the National Institute of Mental Health (NIMH). The RDoC project, which is devoted to research, stands in opposition to the Diagnostic and Statistical Manual of Mental Disorders (DSM) by emphasizing the dimensions of normal brain functioning, at the intersection of genetic research, cognitive neuroscience and the behavioral sciences. The project is a bet on the future, and its success depends on American researchers adopting the new frame of reference it proposes, a framework that still largely remains to be built.

Relevance: 20.00%

Abstract:

The World Health Organization (WHO) plans to submit the 11th revision of the International Classification of Diseases (ICD) to the World Health Assembly in 2018. The WHO is working toward a revised classification system that has an enhanced ability to capture health concepts in a manner that reflects current scientific evidence and that is compatible with contemporary information systems. In this paper, we present recommendations made to the WHO by the ICD revision's Quality and Safety Topic Advisory Group (Q&S TAG) for a new conceptual approach to capturing healthcare-related harms and injuries in ICD-coded data. The Q&S TAG has grouped causes of healthcare-related harm and injuries into four categories that relate to the source of the event: (a) medications and substances, (b) procedures, (c) devices and (d) other aspects of care. Under the proposed multiple coding approach, one of these sources of harm must be coded as part of a cluster of three codes to depict, respectively, a healthcare activity as a 'source' of harm, a 'mode or mechanism' of harm and a consequence of the event summarized by these codes (i.e. injury or harm). Use of this framework depends on the implementation of a new and potentially powerful code-clustering mechanism in ICD-11. This new framework for coding healthcare-related harm has great potential to improve the clinical detail of adverse event descriptions, and the overall quality of coded health data.

Relevance: 20.00%

Abstract:

In order to broaden our knowledge and understanding of the decision steps in the criminal investigation process, we started by evaluating the decision to analyse a trace and the factors involved in this decision step. This decision step is embedded in the complete criminal investigation process, which involves multiple decision and triaging steps. Considering robbery cases occurring in a geographic region during a 2-year period, we studied the factors influencing the decision to submit biological traces, directly sampled at the scene of the robbery or on collected objects, for analysis. The factors were categorised into five knowledge dimensions (strategic, immediate, physical, criminal and utility), and a decision tree analysis was carried out. Factors in each category played a role in the decision to analyse a biological trace. Interestingly, factors involving information available prior to the analysis are of importance, such as the fact that a positive result (a profile suitable for comparison) is already available in the case, or that a suspect has been identified through traditional police work before analysis. One factor that was taken into account, but was not significant, is the matrix of the trace. Hence, the decision to analyse a trace is not influenced by this variable. The decision to analyse a trace first is very complex and many of the tested variables were taken into account. The decisions are often made on a case-by-case basis.