970 resultados para hierarchical classification structures
Resumo:
For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs—array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
Resumo:
Introduction: Measures of the degree of lumbar spinal stenosis (LSS) such as antero-posterior diameter of the canal, and dural sac cross sectional area vary, and do not correlate with symptoms or results of surgery. We created a grading system, comprised of seven categories, based on the morphology of the dural sac and its contents as seen on T2 axial images. The categories take into account the ratio of rootlet/ CSF content. Grade A indicates no significant compression, grade D is equivalent to a total myelograhic block. We compared this classification with commonly used criteria of severity of stenosis. Methods: Fifty T2 axial MRI images taken at disc level from 27 symptomatic LSS patients undergoing decompressive surgery were classified twice by two radiologists and three spinal surgeons working at different institutions and countries. Dural sac cross-sectional surface area and AP diameter of the canal were measured both at disc and pedicle level from DICOM images using OsiriX software. Intraand inter-observer reliability were assessed using Cohen's, Fleiss' kappa statistics, and t test. Results: For the morphological grading the average intra-and inter observer kappas were 0.76 and 0.69+, respectively, for physicians working in the study originating country. Combining all observers the kappa values were 0.57 ± 0.19. and 0.44 ± 0.19, respectively. AP diameter and dural sac cross-sectional area measurements showed no statistically significant differences between observers. No correlation between morphological grading and AP diameter or dural sac crosssectional areawas observed in 13 (26%) and 8 cases (16%), respectively. Discussion: The proposed morphological grading relies on the identification of the dural sac and CSF better seen on full MRI series. This was not available to the external observers, which might explain the lower overall kappa values. Since no specific measurement tools are needed the grading suits everyday clinical practice and favours communication of degree of stenosis between practising physicians. The absence of a strict correlation with the dural sac surface suggests that measuring the surface alone might be insufficient in defining LSS as it is essentially a mismatch between the spinal canal and its contents. This grading is now adopted in our unit and further studies concentrating on relation between morphology, clinical symptoms and surgical results are underway.
Resumo:
Background: Recent advances on high-throughput technologies have produced a vast amount of protein sequences, while the number of high-resolution structures has seen a limited increase. This has impelled the production of many strategies to built protein structures from its sequence, generating a considerable amount of alternative models. The selection of the closest model to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials and combination of both.Results: Here, we present and demonstrate a theory to split the knowledge-based potentials in scoring terms biologically meaningful and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Besides, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors.Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-art methods and have been successful to identify wrongly modeled regions of many near-native conformations.
Resumo:
Previous microarray studies on breast cancer identified multiple tumour classes, of which the most prominent, named luminal and basal, differ in expression of the oestrogen receptor alpha gene (ER). We report here the identification of a group of breast tumours with increased androgen signalling and a 'molecular apocrine' gene expression profile. Tumour samples from 49 patients with large operable or locally advanced breast cancers were tested on Affymetrix U133A gene expression microarrays. Principal components analysis and hierarchical clustering split the tumours into three groups: basal, luminal and a group we call molecular apocrine. All of the molecular apocrine tumours have strong apocrine features on histological examination (P=0.0002). The molecular apocrine group is androgen receptor (AR) positive and contains all of the ER-negative tumours outside the basal group. Kolmogorov-Smirnov testing indicates that oestrogen signalling is most active in the luminal group, and androgen signalling is most active in the molecular apocrine group. ERBB2 amplification is commoner in the molecular apocrine than the other groups. Genes that best split the three groups were identified by Wilcoxon test. Correlation of the average expression profile of these genes in our data with the expression profile of individual tumours in four published breast cancer studies suggest that molecular apocrine tumours represent 8-14% of tumours in these studies. Our data show that it is possible with microarray data to divide mammary tumour cells into three groups based on steroid receptor activity: luminal (ER+ AR+), basal (ER- AR-) and molecular apocrine (ER- AR+).
Resumo:
Error-correcting codes and matroids have been widely used in the study of ordinary secret sharing schemes. In this paper, the connections between codes, matroids, and a special class of secret sharing schemes, namely, multiplicative linear secret sharing schemes (LSSSs), are studied. Such schemes are known to enable multiparty computation protocols secure against general (nonthreshold) adversaries.Two open problems related to the complexity of multiplicative LSSSs are considered in this paper. The first one deals with strongly multiplicative LSSSs. As opposed to the case of multiplicative LSSSs, it is not known whether there is an efficient method to transform an LSSS into a strongly multiplicative LSSS for the same access structure with a polynomial increase of the complexity. A property of strongly multiplicative LSSSs that could be useful in solving this problem is proved. Namely, using a suitable generalization of the well-known Berlekamp–Welch decoder, it is shown that all strongly multiplicative LSSSs enable efficient reconstruction of a shared secret in the presence of malicious faults. The second one is to characterize the access structures of ideal multiplicative LSSSs. Specifically, the considered open problem is to determine whether all self-dual vector space access structures are in this situation. By the aforementioned connection, this in fact constitutes an open problem about matroid theory, since it can be restated in terms of representability of identically self-dual matroids by self-dual codes. A new concept is introduced, the flat-partition, that provides a useful classification of identically self-dual matroids. Uniform identically self-dual matroids, which are known to be representable by self-dual codes, form one of the classes. It is proved that this property also holds for the family of matroids that, in a natural way, is the next class in the above classification: the identically self-dual bipartite matroids.
Resumo:
Introduction: Responses to external stimuli are typically investigated by averaging peri-stimulus electroencephalography (EEG) epochs in order to derive event-related potentials (ERPs) across the electrode montage, under the assumption that signals that are related to the external stimulus are fixed in time across trials. We demonstrate the applicability of a single-trial model based on patterns of scalp topographies (De Lucia et al, 2007) that can be used for ERP analysis at the single-subject level. The model is able to classify new trials (or groups of trials) with minimal a priori hypotheses, using information derived from a training dataset. The features used for the classification (the topography of responses and their latency) can be neurophysiologically interpreted, because a difference in scalp topography indicates a different configuration of brain generators. An above chance classification accuracy on test datasets implicitly demonstrates the suitability of this model for EEG data. Methods: The data analyzed in this study were acquired from two separate visual evoked potential (VEP) experiments. The first entailed passive presentation of checkerboard stimuli to each of the four visual quadrants (hereafter, "Checkerboard Experiment") (Plomp et al, submitted). The second entailed active discrimination of novel versus repeated line drawings of common objects (hereafter, "Priming Experiment") (Murray et al, 2004). Four subjects per experiment were analyzed, using approx. 200 trials per experimental condition. These trials were randomly separated in training (90%) and testing (10%) datasets in 10 independent shuffles. In order to perform the ERP analysis we estimated the statistical distribution of voltage topographies by a Mixture of Gaussians (MofGs), which reduces our original dataset to a small number of representative voltage topographies. We then evaluated statistically the degree of presence of these template maps across trials and whether and when this was different across experimental conditions. Based on these differences, single-trials or sets of a few single-trials were classified as belonging to one or the other experimental condition. Classification performance was assessed using the Receiver Operating Characteristic (ROC) curve. Results: For the Checkerboard Experiment contrasts entailed left vs. right visual field presentations for upper and lower quadrants, separately. The average posterior probabilities, indicating the presence of the computed template maps in time and across trials revealed significant differences starting at ~60-70 ms post-stimulus. The average ROC curve area across all four subjects was 0.80 and 0.85 for upper and lower quadrants, respectively and was in all cases significantly higher than chance (unpaired t-test, p<0.0001). In the Priming Experiment, we contrasted initial versus repeated presentations of visual object stimuli. Their posterior probabilities revealed significant differences, which started at 250ms post-stimulus onset. The classification accuracy rates with single-trial test data were at chance level. We therefore considered sub-averages based on five single trials. We found that for three out of four subjects' classification rates were significantly above chance level (unpaired t-test, p<0.0001). Conclusions: The main advantage of the present approach is that it is based on topographic features that are readily interpretable along neurophysiologic lines. As these maps were previously normalized by the overall strength of the field potential on the scalp, a change in their presence across trials and between conditions forcibly reflects a change in the underlying generator configurations. The temporal periods of statistical difference between conditions were estimated for each training dataset for ten shuffles of the data. Across the ten shuffles and in both experiments, we observed a high level of consistency in the temporal periods over which the two conditions differed. With this method we are able to analyze ERPs at the single-subject level providing a novel tool to compare normal electrophysiological responses versus single cases that cannot be considered part of any cohort of subjects. This aspect promises to have a strong impact on both basic and clinical research.
Resumo:
A new radiolarian order - Archaeospicularia - is proposed for some Lower Paleozoic radiolarians previously considered to belong to Spumellaria and to Collodaria. It is characterized by a globular shell made of several spicules which can be free, interlocked, or fused to formed a latticed wall. The present paper gives the definition of this order and proposes a first classification. It is supposed that the Archaeospicularia represents the oldest radiolarian group and that in the Lower Paleozoic it gave rise to the orders Entactinaria, Albaillellaria, and probably Spumellaria by the reduction of the number of initial spicules. The origin of this order and its relationships with other groups of organisms with siliceous skeletons are also briefly discussed. (C) 2000 Academie des sciences / Editions scientifiques et medicales Elsevier SAS.
Resumo:
Subjective language detection is one of the most important challenges in Sentiment Analysis. Because of the weight and frequency in opinionated texts, adjectives are considered a key piece in the opinion extraction process. These subjective units are more and more frequently collected in polarity lexicons in which they appear annotated with their prior polarity. However, at the moment, any polarity lexicon takes into account prior polarity variations across domains. This paper proves that a majority of adjectives change their prior polarity value depending on the domain. We propose a distinction between domain dependent and romain independent adjectives. Moreover, our analysis led us to propose a further classification related to subjectivity degree: constant, mixed and highly subjective adjectives. Following this classification, polarity values will be a better support for Sentiment Analysis.
Resumo:
The work we present here addresses cue-based noun classification in English and Spanish. Its main objective is to automatically acquire lexical semantic information by classifying nouns into previously known noun lexical classes. This is achieved by using particular aspects of linguistic contexts as cues that identify a specific lexical class. Here we concentrate on the task of identifying such cues and the theoretical background that allows for an assessment of the complexity of the task. The results show that, despite of the a-priori complexity of the task, cue-based classification is a useful tool in the automatic acquisition of lexical semantic classes.
Resumo:
The transpressional boundary between the Australian and Pacific plates in the central South Island of New Zealand comprises the Alpine Fault and a broad region of distributed strain concentrated in the Southern Alps but encompassing regions further to the east, including the northwest Canterbury Plains. Low to moderate levels of seismicity (e. g., 2 > M 5 events since 1974 and 2 > M 4.0 in 2009) and Holocene sediments offset or disrupted along rare exposed active fault segments are evidence for ongoing tectonism in the northwest plains, the surface topography of which is remarkably flat and even. Because the geology underlying the late Quaternary alluvial fan deposits that carpet most of the plains is not established, the detailed tectonic evolution of this region and the potential for larger earthquakes is only poorly understood. To address these issues, we have processed and interpreted high-resolution (2.5 m subsurface sampling interval) seismic data acquired along lines strategically located relative to extensive rock exposures to the north, west, and southwest and rare exposures to the east. Geological information provided by these rock exposures offer important constraints on the interpretation of the seismic data. The processed seismic reflection sections image a variably thick layer of generally undisturbed younger (i.e., < 24 ka) Quaternary alluvial sediments unconformably overlying an older (> 59 ka) Quaternary sedimentary sequence that shows evidence of moderate faulting and folding during and subsequent to deposition. These Quaternary units are in unconformable contact with Late Cretaceous-Tertiary interbedded sedimentary and volcanic rocks that are highly faulted, folded, and tilted. The lowest imaged unit is largely reflection-free Permian Triassic basement rocks. Quaternary-age deformation has affected all the rocks underlying the younger alluvial sediments, and there is evidence for ongoing deformation. Eight primary and numerous secondary faults as well as a major anticlinal fold are revealed on the seismic sections. Folded sedimentary and volcanic units are observed in the hanging walls and footwalls of most faults. Five of the primary faults represent plausible extensions of mapped faults, three of which are active. The major anticlinal fold is the probable continuation of known active structure. A magnitude 7.1 earthquake occurred on 4 September 2010 near the southeastern edge of our study area. This predominantly right-lateral strike-slip event and numerous aftershocks (ten with magnitudes >= 5 within one week of the main event) highlight the primary message of our paper: that the generally flat and topographically featureless Canterbury Plains is underlain by a network of active faults that have the potential to generate significant earthquakes.
Resumo:
Aquest treball és una revisió d'alguns sistemes de Traducció Automàtica que segueixen l'estratègia de Transfer i fan servir estructures de trets com a eina de representació. El treball s'integra dins el projecte MLAP-9315, projecte que investiga la reutilització de les especificacions lingüístiques del projecte EUROTRA per estàndards industrials.