977 results for Classification tree
Abstract:
For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs—array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
Abstract:
Understanding niche evolution, dynamics, and the response of species to climate change requires knowledge of the determinants of the environmental niche and species range limits. Mean values of climatic variables are often used in such analyses. In contrast, the increasing frequency of climate extremes suggests the importance of understanding their additional influence on range limits. Here, we assess how measures representing climate extremes (i.e., interannual variability in climate parameters) explain and predict spatial patterns of 11 tree species in Switzerland. We find clear, although comparatively small, improvement (+20% in adjusted D², +8% and +3% in cross-validated True Skill Statistic and area under the receiver operating characteristic curve values) in models that use measures of extremes in addition to means. The primary effect of including information on climate extremes is a correction of local overprediction and underprediction. Our results demonstrate that measures of climate extremes are important for understanding the climatic limits of tree species and assessing species niche characteristics. The inclusion of climate variability will likely improve models of species range limits under future conditions, where changes in mean climate and increased variability are expected.
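For readers unfamiliar with the True Skill Statistic reported above, the following is a minimal sketch of how it is commonly computed from binary presence/absence predictions (sensitivity + specificity - 1). The function name and the toy data are illustrative assumptions and are not taken from the study.

```python
def true_skill_statistic(observed, predicted):
    """Return TSS = sensitivity + specificity - 1 for binary (0/1) data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # presences predicted present
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # presences missed
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # absences predicted absent
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # absences predicted present
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Hypothetical presence/absence records for one species at ten sites.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
tss = true_skill_statistic(observed, predicted)
print(round(tss, 3))
```

TSS ranges from -1 to +1, with 0 indicating performance no better than random.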
Abstract:
Introduction: Measures of the degree of lumbar spinal stenosis (LSS), such as the antero-posterior diameter of the canal and the dural sac cross-sectional area, vary and do not correlate with symptoms or results of surgery. We created a grading system, comprising seven categories, based on the morphology of the dural sac and its contents as seen on T2 axial images. The categories take into account the ratio of rootlet/CSF content. Grade A indicates no significant compression; grade D is equivalent to a total myelographic block. We compared this classification with commonly used criteria of severity of stenosis. Methods: Fifty T2 axial MRI images taken at disc level from 27 symptomatic LSS patients undergoing decompressive surgery were classified twice by two radiologists and three spinal surgeons working at different institutions and countries. Dural sac cross-sectional surface area and AP diameter of the canal were measured both at disc and pedicle level from DICOM images using OsiriX software. Intra- and inter-observer reliability were assessed using Cohen's and Fleiss' kappa statistics and the t test. Results: For the morphological grading, the average intra- and inter-observer kappas were 0.76 and 0.69+, respectively, for physicians working in the study originating country. Combining all observers, the kappa values were 0.57 ± 0.19 and 0.44 ± 0.19, respectively. AP diameter and dural sac cross-sectional area measurements showed no statistically significant differences between observers. No correlation between morphological grading and AP diameter or dural sac cross-sectional area was observed in 13 (26%) and 8 cases (16%), respectively. Discussion: The proposed morphological grading relies on the identification of the dural sac and CSF, which are better seen on full MRI series. This was not available to the external observers, which might explain the lower overall kappa values.
Since no specific measurement tools are needed, the grading suits everyday clinical practice and favours communication of the degree of stenosis between practising physicians. The absence of a strict correlation with the dural sac surface suggests that measuring the surface alone might be insufficient in defining LSS, as it is essentially a mismatch between the spinal canal and its contents. This grading is now adopted in our unit, and further studies concentrating on the relation between morphology, clinical symptoms and surgical results are underway.
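As an illustration of the reliability statistic named in the Methods, here is a minimal sketch of Cohen's kappa for two sets of categorical gradings. It assumes the standard definition (observed agreement corrected for chance agreement). The grade labels and ratings below are hypothetical and do not reproduce the study's data or its seven-category scheme.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rating set's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sets use a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical gradings of ten images: first and second reading by one observer.
first_reading = ["A", "A", "B", "B", "C", "C", "D", "D", "B", "C"]
second_reading = ["A", "B", "B", "B", "C", "D", "D", "D", "B", "C"]
kappa = cohens_kappa(first_reading, second_reading)
print(round(kappa, 3))
```

Comparing one observer's two readings gives an intra-observer kappa; comparing two different observers gives an inter-observer kappa. Fleiss' kappa, which handles more than two raters at once, is not shown here.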
Abstract:
Automatic classification of makams from symbolic data is a rarely studied topic. In this paper, a review of an n-gram based approach is first presented using various representations of the symbolic data. While a high degree of precision can be obtained, confusion happens mainly for makams that use (almost) the same scale and pitch hierarchy but differ in overall melodic progression, seyir. To further improve the system, n-gram based classification is first tested for various sections of the piece to take into account a feature of the seyir: that melodic progression starts in a certain region of the scale. In a second test, a hierarchical classification structure is designed which uses n-grams and seyir features at different levels to further improve the system.
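To make the idea of n-gram based classification of symbolic sequences concrete, the sketch below trains per-class bigram counts and assigns a new sequence to the class with the highest smoothed log-likelihood. This is a generic illustration under stated assumptions, not the system described in the paper: the class labels (makam_X, makam_Y), the letter-based note symbols, and the add-one smoothing are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all contiguous n-grams of a symbol sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def train(pieces, n=2):
    """pieces: iterable of (label, symbol_sequence). Returns n-gram counts per label."""
    counts = defaultdict(Counter)
    for label, seq in pieces:
        counts[label].update(ngrams(seq, n))
    return counts


def classify(counts, seq, n=2):
    """Pick the label whose add-one-smoothed n-gram model scores seq highest."""
    vocab = {g for c in counts.values() for g in c}
    grams = ngrams(seq, n)
    best_label, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        # Add-one smoothing, with one extra slot reserved for unseen n-grams.
        score = sum(math.log((c[g] + 1) / (total + len(vocab) + 1)) for g in grams)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: both classes share one scale but move in opposite directions.
training = [
    ("makam_X", list("CDEFG")),
    ("makam_X", list("CDEFGA")),
    ("makam_Y", list("GFEDC")),
    ("makam_Y", list("AGFEDC")),
]
model = train(training)
print(classify(model, list("DEFG")), classify(model, list("FEDC")))
```

Because bigrams capture local melodic direction, the two toy classes are separable even though they share the same set of pitches, which loosely mirrors why progression-related features can help when scales coincide.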
Abstract:
Introduction: Responses to external stimuli are typically investigated by averaging peri-stimulus electroencephalography (EEG) epochs in order to derive event-related potentials (ERPs) across the electrode montage, under the assumption that signals that are related to the external stimulus are fixed in time across trials. We demonstrate the applicability of a single-trial model based on patterns of scalp topographies (De Lucia et al, 2007) that can be used for ERP analysis at the single-subject level. The model is able to classify new trials (or groups of trials) with minimal a priori hypotheses, using information derived from a training dataset. The features used for the classification (the topography of responses and their latency) can be neurophysiologically interpreted, because a difference in scalp topography indicates a different configuration of brain generators. An above-chance classification accuracy on test datasets implicitly demonstrates the suitability of this model for EEG data. Methods: The data analyzed in this study were acquired from two separate visual evoked potential (VEP) experiments. The first entailed passive presentation of checkerboard stimuli to each of the four visual quadrants (hereafter, "Checkerboard Experiment") (Plomp et al, submitted). The second entailed active discrimination of novel versus repeated line drawings of common objects (hereafter, "Priming Experiment") (Murray et al, 2004). Four subjects per experiment were analyzed, using approx. 200 trials per experimental condition. These trials were randomly split into training (90%) and testing (10%) datasets in 10 independent shuffles. In order to perform the ERP analysis, we estimated the statistical distribution of voltage topographies by a Mixture of Gaussians (MofGs), which reduces our original dataset to a small number of representative voltage topographies.
We then evaluated statistically the degree of presence of these template maps across trials and whether and when this was different across experimental conditions. Based on these differences, single trials or sets of a few single trials were classified as belonging to one or the other experimental condition. Classification performance was assessed using the Receiver Operating Characteristic (ROC) curve. Results: For the Checkerboard Experiment, contrasts entailed left vs. right visual field presentations for upper and lower quadrants, separately. The average posterior probabilities, indicating the presence of the computed template maps in time and across trials, revealed significant differences starting at ~60-70 ms post-stimulus. The average ROC curve area across all four subjects was 0.80 and 0.85 for upper and lower quadrants, respectively, and was in all cases significantly higher than chance (unpaired t-test, p<0.0001). In the Priming Experiment, we contrasted initial versus repeated presentations of visual object stimuli. Their posterior probabilities revealed significant differences, which started at 250 ms post-stimulus onset. The classification accuracy rates with single-trial test data were at chance level. We therefore considered sub-averages based on five single trials. We found that for three out of four subjects, classification rates were significantly above chance level (unpaired t-test, p<0.0001). Conclusions: The main advantage of the present approach is that it is based on topographic features that are readily interpretable along neurophysiologic lines. As these maps were previously normalized by the overall strength of the field potential on the scalp, a change in their presence across trials and between conditions necessarily reflects a change in the underlying generator configurations. The temporal periods of statistical difference between conditions were estimated for each training dataset for ten shuffles of the data.
Across the ten shuffles and in both experiments, we observed a high level of consistency in the temporal periods over which the two conditions differed. With this method we are able to analyze ERPs at the single-subject level, providing a novel tool to compare normal electrophysiological responses versus single cases that cannot be considered part of any cohort of subjects. This aspect promises to have a strong impact on both basic and clinical research.
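The evaluation protocol described in this abstract (repeated random 90%/10% train/test splits, with performance summarized by the area under the ROC curve) can be sketched generically as follows. This is only an illustration of the splitting and scoring steps, not the authors' topographic model; the function names, the fixed random seed, and the toy scores are assumptions.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shuffle_splits(n_trials, n_shuffles=10, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for independent random shuffles."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_test = round(n_trials * test_fraction)
    for _ in range(n_shuffles):
        idx = list(range(n_trials))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# 200 trials per condition, as in the abstract; the splitting details are assumed.
splits = list(shuffle_splits(200))
print(len(splits), len(splits[0][0]), len(splits[0][1]))

# Hypothetical classifier scores for four test trials (1 = condition A, 0 = condition B).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to chance-level discrimination, so values such as the 0.80 and 0.85 reported above indicate that the two conditions can be separated well above chance.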
Abstract:
A new radiolarian order - Archaeospicularia - is proposed for some Lower Paleozoic radiolarians previously considered to belong to Spumellaria and to Collodaria. It is characterized by a globular shell made of several spicules which can be free, interlocked, or fused to form a latticed wall. The present paper gives the definition of this order and proposes a first classification. It is supposed that the Archaeospicularia represents the oldest radiolarian group and that in the Lower Paleozoic it gave rise to the orders Entactinaria, Albaillellaria, and probably Spumellaria by the reduction of the number of initial spicules. The origin of this order and its relationships with other groups of organisms with siliceous skeletons are also briefly discussed. (C) 2000 Academie des sciences / Editions scientifiques et medicales Elsevier SAS.
Abstract:
Quantifying the impacts of inbreeding and genetic drift on fitness traits in fragmented populations is becoming a major goal in conservation biology. Such impacts occur at different levels and involve different sets of loci. Genetic drift randomly fixes slightly deleterious alleles, leading to different fixation loads among populations. By contrast, inbreeding depression arises from highly deleterious alleles segregating within a population and creates variation among individuals. A popular approach is to measure correlations between molecular variation and phenotypic performance. This approach has been mainly used at the individual level to detect inbreeding depression within populations, and sometimes at the population level, but without consideration of the genetic processes measured. For the first time, we used in this study a molecular approach considering both the interpopulation and intrapopulation levels to discriminate the relative importance of inbreeding depression vs. fixation load in isolated and non-fragmented populations of European tree frog (Hyla arborea), complemented with interpopulational crosses. We demonstrated that the positive correlations observed between genetic heterozygosity and larval performance on merged data were mainly caused by co-variations in genetic diversity and fixation load among populations rather than by inbreeding depression and segregating deleterious alleles within populations. Such a method is highly relevant from a conservation perspective because, depending on how populations lose fitness (inbreeding vs. fixation load), specific management actions may be designed to improve the persistence of populations.
Abstract:
Subjective language detection is one of the most important challenges in Sentiment Analysis. Because of their weight and frequency in opinionated texts, adjectives are considered a key piece in the opinion extraction process. These subjective units are more and more frequently collected in polarity lexicons in which they appear annotated with their prior polarity. However, at the moment, no polarity lexicon takes into account prior polarity variations across domains. This paper shows that a majority of adjectives change their prior polarity value depending on the domain. We propose a distinction between domain-dependent and domain-independent adjectives. Moreover, our analysis led us to propose a further classification related to subjectivity degree: constant, mixed and highly subjective adjectives. Following this classification, polarity values will be a better support for Sentiment Analysis.
Abstract:
The work we present here addresses cue-based noun classification in English and Spanish. Its main objective is to automatically acquire lexical semantic information by classifying nouns into previously known noun lexical classes. This is achieved by using particular aspects of linguistic contexts as cues that identify a specific lexical class. Here we concentrate on the task of identifying such cues and the theoretical background that allows for an assessment of the complexity of the task. The results show that, despite the a priori complexity of the task, cue-based classification is a useful tool in the automatic acquisition of lexical semantic classes.
Abstract:
Population viability analyses (PVA) are increasingly used in metapopulation conservation plans. Two major types of models are commonly used to assess vulnerability and to rank management options: population-based stochastic simulation models (PSM such as RAMAS or VORTEX) and stochastic patch occupancy models (SPOM). While the first set of models relies on explicit intrapatch dynamics and interpatch dispersal to predict population levels in space and time, the latter is based on spatially explicit metapopulation theory where the probability of patch occupation is predicted given the patch area and isolation (patch topology). We applied both approaches to a European tree frog (Hyla arborea) metapopulation in western Switzerland in order to evaluate the concordances of both models and their applications to conservation. Although some quantitative discrepancies appeared in terms of network occupancy and equilibrium population size, the two approaches were largely concordant regarding the ranking of patch values and sensitivities to parameters, which is encouraging given the differences in the underlying paradigms and input data.