37 resultados para Functional classification
Resumo:
For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs—array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
Resumo:
Genetic and functional data indicate that variation in the expression of the neurotrophin-3 receptor gene (NTRK3) may have an impact on neuronal plasticity, suggesting a role for NTRK3 in the pathophysiology of anxiety disorders. MicroRNA (miRNA) posttranscriptional gene regulators act by base-pairing to specific sequence sites, usually at the 3'UTR of the target mRNA. Variants at these sites might result in gene expression changes contributing to disease susceptibility. We investigated genetic variation in two different isoforms of NTRK3 as candidate susceptibility factors for anxiety by resequencing their 3'UTRs in patients with panic disorder (PD), obsessive-compulsive disorder (OCD), and in controls. We have found the C allele of rs28521337, located in a functional target site for miR-485-3p in the truncated isoform of NTRK3, to be significantly associated with the hoarding phenotype of OCD. We have also identified two new rare variants in the 3'UTR of NTRK3, ss102661458 and ss102661460, each present only in one chromosome of a patient with PD. The ss102661458 variant is located in a functional target site for miR-765, and the ss102661460 in functional target sites for two miRNAs, miR-509 and miR-128, the latter being a brain-enriched miRNA involved in neuronal differentiation and synaptic processing. Interestingly, these two variants significantly alter the miRNA-mediated regulation of NTRK3, resulting in recovery of gene expression. These data implicate miRNAs as key posttranscriptional regulators of NTRK3 and provide a framework for allele-specific miRNA regulation of NTRK3 in anxiety disorders.
Resumo:
Background: Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect of non-synonymous SNPs in the context of biological networks such as cell signalling pathways. UniProt provides curated information about the functional and phenotypic effects of sequence variation, including SNPs, as well as on mutations of protein sequences. However, no strategy has been developed to integrate this information with biological networks, with the ultimate goal of studying the impact of the functional effect of SNPs in the structure and dynamics of biological networks. Results: First, we identified the different challenges posed by the integration of the phenotypic effect of sequence variants and mutations with biological networks. Second, we developed a strategy for the combination of data extracted from public resources, such as UniProt, NCBI dbSNP, Reactome and BioModels. We generated attribute files containing phenotypic and genotypic annotations to the nodes of biological networks, which can be imported into network visualization tools such as Cytoscape. These resources allow the mapping and visualization of mutations and natural variations of human proteins and their phenotypic effect on biological networks (e.g. signalling pathways, protein-protein interaction networks, dynamic models). Finally, an example on the use of the sequence variation data in the dynamics of a network model is presented. Conclusion: In this paper we present a general strategy for the integration of pathway and sequence variation data for visualization, analysis and modelling purposes, including the study of the functional impact of protein sequence variations on the dynamics of signalling pathways. This is of particular interest when the SNP or mutation is known to be associated to disease. We expect that this approach will help in the study of the functional impact of disease-associated SNPs on the behaviour of cell signalling pathways, which ultimately will lead to a better understanding of the mechanisms underlying complex diseases.
Resumo:
Automatic classification of makams from symbolic data is a rarely studied topic. In this paper, first a review of an n-gram based approach is presented using various representations of the symbolic data. While a high degree of precision can be obtained, confusion happens mainly for makams using (almost) the same scale and pitch hierarchy but differ in overall melodic progression, seyir. To further improve the system, first n-gram based classification is tested for various sections of the piece to take into account a feature of the seyir that melodic progression starts in a certain region of the scale. In a second test, a hierarchical classification structure is designed which uses n-grams and seyir features in different levels to further improve the system.
Resumo:
Subjective language detection is one of the most important challenges in Sentiment Analysis. Because of the weight and frequency in opinionated texts, adjectives are considered a key piece in the opinion extraction process. These subjective units are more and more frequently collected in polarity lexicons in which they appear annotated with their prior polarity. However, at the moment, any polarity lexicon takes into account prior polarity variations across domains. This paper proves that a majority of adjectives change their prior polarity value depending on the domain. We propose a distinction between domain dependent and romain independent adjectives. Moreover, our analysis led us to propose a further classification related to subjectivity degree: constant, mixed and highly subjective adjectives. Following this classification, polarity values will be a better support for Sentiment Analysis.
Resumo:
The work we present here addresses cue-based noun classification in English and Spanish. Its main objective is to automatically acquire lexical semantic information by classifying nouns into previously known noun lexical classes. This is achieved by using particular aspects of linguistic contexts as cues that identify a specific lexical class. Here we concentrate on the task of identifying such cues and the theoretical background that allows for an assessment of the complexity of the task. The results show that, despite of the a-priori complexity of the task, cue-based classification is a useful tool in the automatic acquisition of lexical semantic classes.
Resumo:
In this paper we propose a Pyramidal Classification Algorithm,which together with an appropriate aggregation index producesan indexed pseudo-hierarchy (in the strict sense) withoutinversions nor crossings. The computer implementation of thealgorithm makes it possible to carry out some simulation testsby Monte Carlo methods in order to study the efficiency andsensitivity of the pyramidal methods of the Maximum, Minimumand UPGMA. The results shown in this paper may help to choosebetween the three classification methods proposed, in order toobtain the classification that best fits the original structureof the population, provided we have an a priori informationconcerning this structure.