852 results for Domain-specific language
Abstract:
Reorganizing a dataset so that its hidden structure can be observed is useful in any data analysis task. For example, detecting a regularity in a dataset helps us to interpret the data, compress the data, and explain the processes behind the data. We study datasets that come in the form of binary matrices (tables with 0s and 1s). Our goal is to develop automatic methods that bring out certain patterns by permuting the rows and columns. We concentrate on the following patterns in binary matrices: consecutive-ones (C1P), simultaneous consecutive-ones (SC1P), nestedness, k-nestedness, and bandedness. These patterns reflect specific types of interplay and variation between the rows and columns, such as continuity and hierarchies. Furthermore, their combinatorial properties are interlinked, which helps us to develop the theory of binary matrices and efficient algorithms. Indeed, we can detect all these patterns in a binary matrix efficiently, that is, in polynomial time in the size of the matrix. Since real-world datasets often contain noise and errors, we rarely witness perfect patterns. Therefore we also need to assess how far an input matrix is from a pattern: we count the number of flips (from 0s to 1s or vice versa) needed to bring out the perfect pattern in the matrix. Unfortunately, for most patterns it is an NP-complete problem to find the minimum distance to a matrix that has the perfect pattern, which means that the existence of a polynomial-time algorithm is unlikely. To find patterns in datasets with noise, we need methods that are noise-tolerant and work in practical time with large datasets. The theory of binary matrices gives rise to robust heuristics that have good performance with synthetic data and discover easily interpretable structures in real-world datasets: dialectal variation in the spoken Finnish language, division of European locations by the hierarchies found in mammal occurrences, and co-occurring groups in network data.
In addition to determining the distance from a dataset to a pattern, we need to determine whether the pattern is significant or merely the result of random chance. To this end, we use significance testing: we deem a dataset significant if it appears exceptional when compared to datasets generated from a certain null hypothesis. After detecting a significant pattern in a dataset, it is up to domain experts to interpret the results in the terms of the application.
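As an illustration of why a perfect pattern can be detected in polynomial time, the following minimal sketch checks one of the patterns named above, nestedness. It assumes the common definition that a binary matrix is nested when, for any two rows, the set of columns holding 1s in one row is a subset of the other's. The function names `is_nested` and `nesting_order` are hypothetical and are not taken from the thesis.

```python
def is_nested(matrix):
    """Return True if the rows' sets of 1-columns form a chain under inclusion."""
    # Represent every row by the set of column indices that contain a 1.
    row_sets = [frozenset(j for j, v in enumerate(row) if v) for row in matrix]
    # Sort from the fullest row to the emptiest one.
    row_sets.sort(key=len, reverse=True)
    # Inclusion is transitive, so checking neighbouring rows is sufficient.
    return all(small <= big for big, small in zip(row_sets, row_sets[1:]))


def nesting_order(matrix):
    """Return (row_order, col_order) sorted by decreasing number of 1s.

    For a nested matrix, applying both permutations yields a staircase shape.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    row_order = sorted(range(n_rows), key=lambda i: -sum(matrix[i]))
    col_order = sorted(
        range(n_cols), key=lambda j: -sum(matrix[i][j] for i in range(n_rows))
    )
    return row_order, col_order


example = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
rows, cols = nesting_order(example)
permuted = [[example[i][j] for j in cols] for i in rows]
```

Sorting dominates the running time, so the check is polynomial in the size of the matrix. It says nothing about the noisy case, where the abstract notes that finding the minimum number of flips is NP-complete for most patterns.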
Abstract:
All protein-encoding genes in eukaryotes are transcribed into messenger RNA (mRNA) by RNA Polymerase II (RNAP II), whose activity therefore needs to be tightly controlled. An important and only partially understood level of regulation is the multiple phosphorylations of RNAP II large subunit C-terminal domain (CTD). Sequential phosphorylations regulate transcription initiation and elongation, and recruit factors involved in co-transcriptional processing of mRNA. Based largely on studies in yeast models and in vitro, the kinase activity responsible for the phosphorylation of the serine-5 (Ser5) residues of RNAP II CTD has been attributed to the Mat1/Cdk7/CycH trimer as part of Transcription Factor IIH (TFIIH). However, due to the lack of good mammalian genetic models, studies on the roles of RNAP II Ser5 phosphorylation and the TFIIH kinase in transcription have yielded ambiguous results, and the in vivo kinase of Ser5 has remained elusive. The primary objective of this study was to elucidate the role of mammalian TFIIH, and specifically the Mat1 subunit, in CTD phosphorylation and general RNAP II-mediated transcription. The approach utilized the Cre-LoxP system to conditionally delete murine Mat1 in cardiomyocytes and hepatocytes in vivo and in cell culture models. The results identify the TFIIH kinase as the major mammalian Ser5 kinase and demonstrate its requirement for general transcription, as shown by nascent mRNA labeling. A role for Mat1 in regulating general mRNA turnover was also identified, providing a possible rationale for earlier negative findings. A secondary objective was to identify potential gene- and tissue-specific roles of Mat1 and the TFIIH kinase through the use of tissue-specific Mat1 deletion. Mat1 was found to be required for the transcriptional function of PGC-1 in cardiomyocytes.
Transcriptional activation of lipogenic SREBP1 target genes following Mat1 deletion in hepatocytes revealed a repressive role for Mat1, apparently mediated via the co-repressor DMAP1 and the DNA methyltransferase Dnmt1. Finally, Mat1 and Cdk7 were also identified as negative regulators of adipocyte differentiation through the inhibitory phosphorylation of Peroxisome proliferator-activated receptor (PPAR) γ. Together, these results demonstrate gene- and tissue-specific roles for the Mat1 subunit of TFIIH and open up new therapeutic possibilities in the treatment of diseases such as type II diabetes, hepatosteatosis and obesity.
Abstract:
In this dissertation I study language complexity from a typological perspective. Since the structuralist era, it has been assumed that local complexity differences in languages are balanced out in cross-linguistic comparisons and that complexity is not affected by the geopolitical or sociocultural aspects of the speech community. However, these assumptions have seldom been studied systematically from a typological point of view. My objective is to define complexity so that it is possible to compare it across languages and to approach its variation with the methods of quantitative typology. My main empirical research questions are: i) does language complexity vary in any systematic way in local domains, and ii) can language complexity be affected by the geographical or social environment? These questions are studied in three articles, whose findings are summarized in the introduction to the dissertation. In order to enable cross-language comparison, I measure complexity as the description length of the regularities in an entity; I separate it from difficulty, focus on local instead of global complexity, and break it up into different types. This approach helps avoid the problems that plagued earlier metrics of language complexity. My approach to grammar is functional-typological in nature, and the theoretical framework is basic linguistic theory. I delimit the empirical research functionally to the marking of core arguments (the basic participants in the sentence). I assess the distributions of complexity in this domain with multifactorial statistical methods and use different sampling strategies, implementing, for instance, the Greenbergian view of universals as diachronic laws of type preference. My data come from large and balanced samples (up to approximately 850 languages), drawn mainly from reference grammars. 
The results suggest that various significant trends occur in the marking of core arguments in regard to complexity and that complexity in this domain correlates with population size. These results provide evidence that linguistic patterns interact among themselves in terms of complexity, that language structure adapts to the social environment, and that there may be cognitive mechanisms that limit complexity locally. My approach to complexity and language universals can therefore be successfully applied to empirical data and may serve as a model for further research in these areas.
Abstract:
Biological membranes are tightly linked to the evolution of life, because they provide a way to concentrate molecules into partially closed compartments. The dynamic shaping of cellular membranes is essential for many physiological processes, including cell morphogenesis, motility, cytokinesis, endocytosis, and secretion. It is therefore essential to understand the structure of the membrane and recognize the players that directly sculpt the membrane and enable it to adopt different shapes. The actin cytoskeleton provides the force to push the eukaryotic plasma membrane in order to form different protrusions and/or invaginations. It has now become evident that actin directly co-operates with many membrane sculptors, including BAR domain proteins, in these important events. However, the molecular mechanisms behind BAR domain function and the differences between the members of this large protein family remain largely unresolved. In this thesis, the structure and functions of the I-BAR domain family members IRSp53 and MIM were thoroughly analyzed. By using several methods such as electron microscopy and systematic mutagenesis, we showed that these I-BAR domain proteins bind to PI(4,5)P2-rich membranes, generate negative membrane curvature and are involved in the formation of plasma membrane protrusions in cells, e.g., filopodia. Importantly, we characterized a novel member of the BAR-domain superfamily which we named Pinkbar. We revealed that Pinkbar is specifically expressed in kidney and epithelial cells, and it localizes to Rab13-positive vesicles in intestinal epithelial cells. Remarkably, we learned that the I-BAR domain of Pinkbar does not generate membrane curvature but instead stabilizes planar membranes. Based on structural, mutagenesis and biochemical work we present a model for the mechanism of the novel membrane deforming activity of Pinkbar.
Collectively, this work describes the mechanism by which I-BAR domain proteins deform membranes and provides new information about the biological roles of these proteins. Intriguingly, this work also gives evidence that significant functional plasticity exists within the I-BAR domain family. I-BAR proteins can either generate negative membrane curvature or stabilize planar membrane sheets, depending on the specific structural properties of their I-BAR domains. The results presented in this thesis expand our knowledge of membrane sculpting mechanisms and show for the first time how flat membranes can be generated in cells.
Abstract:
This paper introduces the META-NORD project, which develops the Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, the overall approach and the specific focus lines on wordnets, terminology resources and treebanks are described. Moreover, results achieved in the first five months of the project, i.e. language whitepapers, metadata specification and IPR, are presented.
Abstract:
The activity of many proteins orchestrating different biological processes is regulated by allostery, where ligand binding at one site alters the function of another site. Allosteric changes can be brought about by either a change in the dynamics of a protein, or alteration in its mean structure. We have investigated the mechanisms of allostery induced by chemically distinct ligands in the cGMP-binding, cGMP-specific phosphodiesterase, PDE5. PDE5 is the target for catalytic site inhibitors, such as sildenafil, that are used for the treatment of erectile dysfunction and pulmonary hypertension. PDE5 is a multidomain protein and contains two N-terminal cGMP-specific phosphodiesterase, bacterial adenylyl cyclase, FhlA transcriptional regulator (GAF) domains, and a C-terminal catalytic domain. Cyclic GMP binding to the GAFa domain and sildenafil binding to the catalytic domain result in conformational changes, which to date have been studied either with individual domains or with purified enzyme. Employing intramolecular bioluminescence resonance energy transfer, which can monitor conformational changes both in vitro and in intact cells, we show that binding of cGMP and sildenafil to PDE5 results in distinct conformations of the protein. Metal ions bound to the catalytic site also allosterically modulated cGMP- and sildenafil-induced conformational changes. The sildenafil-induced conformational change was temperature-sensitive, whereas the cGMP-induced conformational change was independent of temperature. This indicates that different allosteric ligands can regulate the conformation of a multidomain protein by distinct mechanisms. Importantly, this novel PDE5 sensor has general physiological and clinical relevance because it allows the identification of regulators that can modulate PDE5 conformation in vivo.
Abstract:
Soluble chromatin was prepared from rat testes after a brief micrococcal nuclease digestion. After adsorption onto hydroxylapatite at low ionic strength, the histone H1 subtypes were eluted with a shallow salt gradient of 0.3 M NaCl to 0.7 M NaCl. Histone H1t was eluted at 0.4 M NaCl, while histones H1a and H1c were eluted at 0.43 M NaCl and 0.45 M NaCl, respectively. The extreme divergence of the amino acid sequence of the C-terminal half of histone H1t, the major DNA binding domain of histone H1, from that of the somatic consensus sequence may contribute to the weaker interaction of histone H1t with the rat testis chromatin. Further, histone H1t was not phosphorylated in vivo, in contrast to histones H1a and H1c, as is evident from the observation that histone H1t lacks the SPKK motif recognized by the CDC2 kinase or the RR/KXS motif recognized by protein kinase A.
Abstract:
We present an improved language modeling technique for a Lempel-Ziv-Welch (LZW) based language identification (LID) scheme. The previous approach to LID prepares the language pattern table using the LZW algorithm. Because of the sequential nature of the LZW algorithm, several language-specific patterns were missing from the pattern table. To overcome this, we build a universal pattern table, which contains all patterns of different lengths. For each language, its corresponding language-specific pattern table is constructed by retaining the patterns of the universal table whose frequency of appearance in the training data is above the threshold. This approach reduces the classification score (Compression Ratio [LZW-CR] or the weighted discriminant score [LZW-WDS]) for non-native languages and increases the LID performance considerably.
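The table construction described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: patterns are taken to be character substrings, `max_len` is a hypothetical cap on pattern length, and the function names are invented for this example. The LZW-CR and LZW-WDS scores are not reproduced here.

```python
from collections import Counter


def universal_patterns(text, max_len):
    """Count every substring of length 1..max_len (the 'universal' table)."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def language_table(training_text, max_len=3, threshold=1):
    """Keep only the patterns whose frequency in the training data is above the threshold."""
    counts = universal_patterns(training_text, max_len)
    return {pattern for pattern, count in counts.items() if count > threshold}


# Toy training text: "ba" occurs only once, so it is filtered out.
table = language_table("abab", max_len=2, threshold=1)
```

Unlike a dictionary built by a single sequential LZW pass, this enumeration considers every position in the training text for every pattern length, which is the property the abstract relies on to avoid missing patterns.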
Abstract:
Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between the domain-domain interactions and the stability of domains in isolation. We provide quantitative treatment and proof for prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. Stability of such folds to exist independently is optimized by evolution. Specific residue mutations in the sites equivalent to inter-domain interface enhance the overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability leading to disease manifestation. Our analysis provides safe guidelines for mutagenesis which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR.
Abstract:
Overexpression of Notch receptors and ligands has been associated with various cancers and developmental disorders, making Notch a potential therapeutic target. Here, we report characterization of Notch1 monoclonal antibodies (mAb) with therapeutic potential. The mAbs generated against epidermal growth factor (EGF) repeats 11 to 15 inhibited binding of Jagged1 and Delta-like4 and, consequently, signaling in a dose-dependent manner, the antibodies against EGF repeats 11 to 12 being more effective than those against repeats 13 to 15. These data emphasize the role of EGF repeats 11 to 12 in ligand binding. One of the mAbs, 602.101, which specifically recognizes Notch1, inhibited ligand-dependent expression of downstream target genes of Notch such as HES-1, HES-5, and HEY-L in the breast cancer cell line MDA-MB-231. The mAb also decreased cell proliferation and induced apoptotic cell death. Furthermore, exposure to this antibody reduced the CD44(Hi)/CD24(Low) subpopulation in MDA-MB-231 cells, suggesting a decrease in the cancer stem-like cell subpopulation. This was confirmed by showing that exposure to the antibody decreased the primary, secondary, and tertiary mammosphere formation efficiency of the cells. Interestingly, the effect of the antibody on the putative stem-like cells appeared to be irreversible, because the mammosphere-forming efficiency could not be salvaged even after antibody removal during the secondary sphere formation. The antibody also modulated expression of genes associated with stemness and epithelial-mesenchymal transition. Thus, targeting individual Notch receptors by specific mAbs is a potential therapeutic strategy to reduce the potential breast cancer stem-like cell subpopulation. Mol Cancer Ther; 11(1); 77-86. © 2011 AACR.
Abstract:
Current scientific research is characterized by increasing specialization, accumulating knowledge at a high speed due to parallel advances in a multitude of sub-disciplines. Recent estimates suggest that human knowledge doubles every two to three years – and with the advances in information and communication technologies, this wide body of scientific knowledge is available to anyone, anywhere, anytime. This may also be referred to as ambient intelligence – an environment characterized by plentiful and available knowledge. The bottleneck in utilizing this knowledge for specific applications is not accessing but assimilating the information and transforming it to suit the needs of a specific application. The increasingly specialized areas of scientific research often have the common goal of converting data into insight, allowing the identification of solutions to scientific problems. Due to this common goal, there are strong parallels between different areas of application that can be exploited and used to cross-fertilize different disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science applications, in visual processing and in biomedicine. Each sub-discipline has found its own specialized methodologies that make these statistical methods successful for the given application. The unification of specialized areas is possible because many different problems can share strong analogies, making the theories developed for one problem applicable to other areas of research. It is the goal of this paper to demonstrate the utility of merging two disparate areas of application to advance scientific research. The merging process requires cross-disciplinary collaboration to allow maximal exploitation of advances in one sub-discipline for that of another. We will demonstrate this general concept with the specific example of merging language technologies and computational biology.
Abstract:
The exoloops of glycoprotein hormone receptors (GpHRs) transduce the signal generated by the ligand-ectodomain interactions to the transmembrane helices through direct hormonal contact and/or by modulating the interdomain interactions between the hinge region (HinR) and the transmembrane domain (TMD). The ligand-induced conformational alterations in the HinRs and the interhelical loops of luteinizing hormone receptor/follicle stimulating hormone receptor/thyroid stimulating hormone receptor were mapped using exoloop-specific antibodies generated against a mini-TMD protein. This protein was designed to mimic the native exoloop conformations and was created by joining the thyroid stimulating hormone receptor exoloops, constrained through helical tethers and library-derived linkers. The antibody against the mini-TMD specifically recognized all three GpHRs and inhibited the basal and hormone-stimulated cAMP production without affecting hormone binding. Interestingly, binding of the antibody to all three receptors was abolished by prior incubation of the receptors with the respective hormones, suggesting that the exoloops are buried in the hormone-receptor complexes. The antibody also suppressed the high basal activities of gain-of-function mutations in the HinRs, exoloops, and TMDs such as those involved in precocious puberty and thyroid toxic adenomas. Using the antibody and point/deletion/chimeric receptor mutants, we demonstrate that changes in the HinR-exoloop interactions play an important role in receptor activation. Computational analysis suggests that the mini-TMD antibodies act by conformationally locking the transmembrane helices by means of restraining the exoloops and the juxta-membrane regions. Using GpHRs as a model, we describe a novel computational approach of generating soluble TMD mimics that can be used to explain the role of exoloops during receptor activation and their interplay with TMDs.
Abstract:
The Notch signalling pathway is implicated in a wide variety of cellular processes throughout metazoan development. Although the downstream mechanism of Notch signalling has been extensively studied, the details of its ligand-mediated receptor activation are not clearly understood. Although the role of Notch ELRs [EGF (epidermal growth factor)-like repeats] 11-12 in ligand binding is known, recent studies have suggested interactions within different ELRs of the Notch receptor whose significance remains to be understood. Here, we report critical inter-domain interactions between human Notch1 ELRs 21-30 and the ELRs 11-15 that are modulated by calcium. Surface plasmon resonance analysis revealed that the interaction between ELRs 21-30 and ELRs 11-15 is ~10-fold stronger than that between ELRs 11-15 and the ligands. Although there was no interaction between Notch1 ELRs 21-30 and the ligands in vitro, addition of pre-clustered Jagged1Fc resulted in the dissociation of the preformed complex between ELRs 21-30 and 11-15, suggesting that inter-domain interactions compete for ligand binding. Furthermore, the antibodies against ELRs 21-30 inhibited ligand binding to the full-length Notch1 and subsequent receptor activation, with the antibodies against ELRs 25-26 being the most effective. These results suggest that the ELRs 25-26 represent a cryptic ligand-binding site which becomes exposed only in the presence of the ligand. Thus, using specific antibodies against various domains of the Notch1 receptor, we demonstrate that, although ELRs 11-12 are the principal ligand-binding site, the ELRs 25-26 serve as a secondary binding site and play an important role in receptor activation.
Abstract:
Guanylyl cyclase C (GC-C) is a multidomain, membrane-associated receptor guanylyl cyclase. GC-C is primarily expressed in the gastrointestinal tract, where it mediates fluid-ion homeostasis, intestinal inflammation, and cell proliferation in a cGMP-dependent manner, following activation by its ligands guanylin, uroguanylin, or the heat-stable enterotoxin peptide (ST). GC-C is also expressed in neurons, where it plays a role in satiation and attention deficiency/hyperactive behavior. GC-C is glycosylated in the extracellular domain, and differentially glycosylated forms that are resident in the endoplasmic reticulum (130 kDa) and the plasma membrane (145 kDa) bind the ST peptide with equal affinity. When glycosylation of human GC-C was prevented, either by pharmacological intervention or by mutation of all 10 predicted glycosylation sites, ST binding and surface localization were abolished. Systematic mutagenesis of each of the 10 sites of glycosylation in GC-C, either singly or in combination, identified two sites that were critical for ligand binding and two that regulated ST-mediated activation. We also show that GC-C is the first identified receptor client of the lectin chaperone vesicular integral membrane protein, VIP36. Interaction with VIP36 is dependent on glycosylation at the same sites that allow GC-C to fold and bind ligand. Because glycosylation of proteins is altered in many diseases and in a tissue-dependent manner, the activity and/or glycan-mediated interactions of GC-C may have a crucial role to play in its functions in different cell types.
Abstract:
Genomic data of several organisms have revealed the presence of a vast repertoire of multi-domain proteins. The role played by individual domains in a multi-domain protein has a profound influence on the overall function of the protein. In the present analysis an attempt has been made to better understand the tethering preferences of domain families that occur in multi-domain proteins. The analysis has been carried out on an exhaustive dataset of 2 961 898 sequences of proteins from 930 organisms, where 741 274 proteins are comprised of at least two domain families. For every domain family, the number of other domain families with which it co-occurs within a protein in this dataset has been enumerated and is referred to as the tethering number of the domain family. It was found that, in the general dataset, the AAA ATPase family and the family of Ser/Thr kinases have the highest tethering numbers of 450 and 444 respectively. Further analysis reveals significant correlation between the number of members in a family and its tethering number. Positive correlation was also observed between the extent of sequence and functional diversity within a family and the tethering numbers of domain families. Domain families that are present ubiquitously in diverse organisms tend to have large tethering numbers, while organism/kingdom-specific families have low tethering numbers. Thus, the analysis uncovers how domain families recombine and evolve to give rise to multi-domain proteins.
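The tethering number defined in this abstract can be computed with a short sketch like the one below. It assumes each protein is given as a list of domain-family labels. The function name `tethering_numbers`, the protein identifiers and the toy family names are illustrative only and do not come from the study's dataset.

```python
from collections import defaultdict


def tethering_numbers(proteins):
    """Map each domain family to the number of distinct other families it co-occurs with.

    `proteins` maps a protein identifier to the list of domain families it contains.
    """
    partners = defaultdict(set)
    for families in proteins.values():
        distinct = set(families)  # repeated domains of one family count once
        for family in distinct:
            # Every other family in the same protein is a tethering partner.
            partners[family] |= distinct - {family}
    return {family: len(others) for family, others in partners.items()}


# Toy input (hypothetical proteins, not real annotations).
toy_proteins = {
    "P1": ["AAA", "Kinase"],
    "P2": ["AAA", "SH3"],
    "P3": ["SH3"],
    "P4": ["AAA", "AAA"],
}
numbers = tethering_numbers(toy_proteins)
```

A family that only ever appears alone, or only alongside copies of itself, receives a tethering number of 0 under this definition.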