952 results for RNA Secondary Structures
Abstract:
First, we modeled the structure of an RNA family with a graph grammar in order to identify the sequences that belong to it. Several other modeling methods have been developed, such as stochastic context-free grammars, covariance models, secondary structure profiles, and constraint networks. These methods rely on the classical secondary structure, whereas our graph grammars rely on nucleotide cyclic motifs. To illustrate our model, we used loop E of the ribosome, which contains the sarcin-ricin motif, a motif that has been studied extensively since its discovery by X-ray crystallography in the early 1990s. We built a graph grammar for the structure of the sarcin-ricin motif and derived all the sequences that can fold into it. The biological relevance of these sequences was confirmed by comparison with the sequences of an alignment of more than 800 bacterial ribosomal sequences. This comparison suggested alternative alignments for a few of the sequences, which we supported with secondary and tertiary structure predictions. Nucleotide cyclic motifs were observed by members of our laboratory in RNAs whose tertiary structure has been solved experimentally. A study of the sequences and tertiary structures of each cycle making up the sarcin-ricin structure revealed that the sequence space depends heavily on interactions among all nucleotides that are close in three-dimensional space, that is, not only between two adjacent base pairs. The number of sequences generated by the graph grammar is smaller than the number generated by methods based on the classical secondary structure.
This suggests the importance of context in the sequence-structure relationship, hence the use of a context-sensitive graph grammar, which is more expressive than context-free grammars. The graph grammars we developed take only the tertiary structure into account and neglect interactions of specific chemical groups with extramolecular elements such as other macromolecules or ligands. Second, to account for these interactions, we developed a model that considers the position of chemical groups on the surface of tertiary structures. The hypothesis is that chemical groups at positions that are conserved in predetermined active sequences, and displaced in sequences that are inactive for a given function, are more likely to be involved in interactions with factors. Continuing with the loop E example, we searched for the groups in this loop that could be involved in interactions with elongation factors. Once the groups are identified, three-dimensional modeling can be used to predict the sequences that position these groups correctly in their tertiary structures. A few models exist to address this problem, such as molecular descriptors, nucleotide adjacency matrices, and thermodynamics-based models. However, all of these models use an oversimplified representation of RNA structure, which limits their applicability. We applied our model to the tertiary structures of a set of variants of one instance of the sarcin-ricin sequence from a bacterial ribosome. Wool's team at the University of Chicago had already studied this instance experimentally by testing the viability of 12 variants; they found 4 viable and 8 lethal variants.
We used this set of 12 sequences to train our model and determined a set of properties essential to their biological function. For each variant in the training set, we built tertiary structure models. We then measured the partial charges of the atoms exposed on the surface and encoded this information in vectors. We used principal component analysis to transform the vectors into a set of uncorrelated variables, called principal components. Using the weighted Euclidean distance and the nearest-neighbor algorithm, we applied leave-one-out cross-validation to choose the best parameters for predicting the activity of a new sequence by matching it to these principal components. Finally, we confirmed the predictive power of the model with a new set of 8 variants whose viability was verified experimentally in our laboratory. In conclusion, graph grammars make it possible to model the sequence-structure relationship of an RNA structural element such as loop E, which contains the sarcin-ricin motif of the ribosome. Applications range from correcting and assisting sequence alignment to designing sequences with a predetermined structure. We also developed a model to account for specific interactions tied to a given biological function, namely interactions with surrounding factors. Our model is based on conserved exposure of the chemical groups involved in these interactions. This model allowed us to predict the biological activity of a set of variants of ribosomal loop E, which binds elongation factors.
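The parameter-selection step described above (weighted Euclidean distance, nearest neighbor, leave-one-out cross-validation) can be illustrated with a minimal sketch. It assumes the feature vectors have already been projected onto principal components; the function names, the toy vectors and the candidate weights are all hypothetical and are not taken from the thesis.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nearest_label(query, samples, labels, weights):
    """Label of the training sample closest to `query`."""
    best = min(range(len(samples)),
               key=lambda i: weighted_distance(query, samples[i], weights))
    return labels[best]

def loocv_accuracy(samples, labels, weights):
    """Leave each sample out in turn and predict it from the remaining ones."""
    correct = 0
    for i in range(len(samples)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_label(samples[i], rest, rest_labels, weights) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy, made-up data: two principal-component scores per variant.
samples = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
labels = ["viable", "viable", "lethal", "lethal"]

# Hypothetical candidate weightings; keep the one with the best LOOCV accuracy.
candidates = [[0.0, 0.0], [1.0, 1.0]]
best_weights = max(candidates, key=lambda w: loocv_accuracy(samples, labels, w))
```

With only 12 training sequences, leave-one-out is a natural choice because every variant is used for both training and validation.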
Abstract:
Non-coding RNAs (ncRNAs) are RNA transcripts that are not translated into proteins yet have key and varied functions in the cell, such as gene regulation, transcription, and translation. The many categories of ncRNA discovered so far include well-known RNAs such as ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), snoRNAs, and microRNAs (miRNAs). The functions of ncRNAs are closely tied to their structures, hence the importance of developing structure prediction tools and methods for finding new ncRNAs. Technological advances have given researchers abundant information on RNA sequences. This information is available in databases such as Rfam, which provides alignments and structural information for many ncRNA families. In this work, we retrieved the sequences of all secondary structure elements annotated in Rfam, such as hairpin loops, internal loops, and bulges, across all ncRNA families. A local database, RNAstem, was created to facilitate handling and compiling the data on secondary structure motifs. We analyzed all terminal and internal loops as well as bulges, and computed an abundance score that allowed us to study the frequency of these motifs. While minimizing the bias caused by the overrepresentation of certain RNA classes such as ribosomal RNA, the analysis of the scores allowed us to characterize rare motifs for each RNA category and to confirm common motifs such as GNRA and UNCG loops. We identified abundant motifs that had not been studied before, such as the UUUU tetraloop.
By analyzing the nucleotide content of these motifs, we observed that these single-stranded regions contain many more A and U nucleotides. Finally, we explored the possibility of using these scores to design a filter that would speed up the search for new non-coding RNAs. We developed a scoring system, RNAscore, which evaluates an RNA based on its motif content, and we tested its applicability with different types of controls.
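As a rough illustration of how terminal-loop motifs can be collected and counted, the sketch below extracts hairpin loop sequences from a sequence paired with a dot-bracket structure string and tabulates their relative frequencies. The dot-bracket input, the function names and the use of plain relative frequency are assumptions made for the example; the abstract does not define the abundance score or the RNAstem schema.

```python
from collections import Counter

def hairpin_loops(sequence, structure):
    """Return the unpaired sequences enclosed directly by one base pair."""
    loops = []
    last_open = None  # index of the most recent "(" with no ")" seen since
    for i, ch in enumerate(structure):
        if ch == "(":
            last_open = i
        elif ch == ")":
            if last_open is not None:
                loops.append(sequence[last_open + 1:i])
                last_open = None
    return loops

def relative_frequencies(loops):
    """Fraction of all observed loops accounted for by each loop sequence."""
    counts = Counter(loops)
    total = sum(counts.values())
    return {loop: n / total for loop, n in counts.items()}

# Toy, made-up hairpins (not Rfam entries).
examples = [
    ("GGGCGAAAGCCC", "((((....))))"),
    ("GGACUUUUGUCC", "((((....))))"),
    ("GCGCGAAAGCGC", "((((....))))"),
]
all_loops = [loop for seq, ss in examples for loop in hairpin_loops(seq, ss)]
freqs = relative_frequencies(all_loops)
```

A real analysis would also need the correction for overrepresented RNA classes mentioned in the abstract, which this sketch leaves out.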
Abstract:
If secondary structure predictions are to be incorporated into fold recognition methods, the effect of specific types of errors in predicted secondary structures on the sensitivity of fold recognition should be assessed. Here, we present a systematic comparison of different secondary structure prediction methods by measuring the frequencies of specific types of error. We evaluate the effect of specific types of error on secondary structure element alignment (SSEA), a baseline fold recognition method. The results of this evaluation indicate that missing whole helix or strand elements, or predicting the wrong type of element, is more detrimental than predicting the wrong lengths of elements or overpredicting helix or strand. We also suggest that SSEA scoring is an effective method for assessing the accuracy of secondary structure prediction and may also provide a more appropriate assessment of the “usefulness” and quality of predicted secondary structure, if secondary structure alignments are to be used in fold recognition.
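To make the distinction between error types concrete, the following sketch collapses a per-residue secondary structure string into elements and compares the ordered helix/strand elements of a prediction with those of the observed structure. The three-state alphabet (H, E, C), the function names and the toy strings are assumptions for illustration; this is not the SSEA scoring scheme itself.

```python
from itertools import groupby

def to_elements(ss):
    """Collapse a per-residue string (H, E, C) into (type, length) elements."""
    return [(state, len(list(run))) for state, run in groupby(ss)]

def core_types(ss):
    """Ordered helix/strand element types, ignoring coil."""
    return [state for state, _ in to_elements(ss) if state in "HE"]

# Toy, made-up strings: the prediction shortens the helix and misses the strand.
observed = "CCHHHHHCCEEEC"
predicted = "CCHHHCCCCCCCC"

missing_element = core_types(observed) != core_types(predicted)
```

Here the shortened helix is a length error, while the absent strand changes the element sequence, which is the kind of error the abstract reports as more damaging.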
Abstract:
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully continuous domains can be delineated in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken to measure the usefulness of these prediction methods for fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top-scoring predictions. We have implemented a new continuous domain identification method that aligns the predicted secondary structures of target sequences against the observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.
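The ±20-residue criterion can be read as a simple check on predicted boundary positions. The sketch below is one plausible reading, assuming boundaries are given as residue indices and paired in sorted order; the function name and the pairing rule are hypothetical and not taken from the paper.

```python
def boundaries_match(predicted, observed, tolerance=20):
    """True if the domain count agrees and every boundary is within tolerance."""
    if len(predicted) != len(observed):
        return False  # wrong number of domains
    return all(abs(p - o) <= tolerance
               for p, o in zip(sorted(predicted), sorted(observed)))
```

For example, `boundaries_match([110], [125])` accepts a boundary that is 15 residues off, while a prediction with an extra boundary is rejected regardless of position.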
Abstract:
Shrimp farming is one of the activities that contribute most to the growth of global aquaculture. However, this industry has suffered significant economic losses due to the onset of viral diseases such as infectious myonecrosis (IMN). IMN is already widespread throughout Northeastern Brazil and affects other countries such as Indonesia, Thailand and China. The main symptom of the disease is myonecrosis, which consists of necrosis of the striated muscles of the abdomen and cephalothorax of shrimp. IMN is caused by infectious myonecrosis virus (IMNV), a non-enveloped virus with protrusions along its capsid. The viral genome consists of a single molecule of double-stranded RNA and has two open reading frames (ORFs). ORF1 encodes the major capsid protein (MCP) and a potential RNA-binding protein (RBP). ORF2 encodes a probable RNA-dependent RNA polymerase (RdRp), which places IMNV in the family Totiviridae. The objective of this research was therefore to study the complete IMNV genome and its encoded proteins in order to develop a system that differentiates virus isolates based on the presence of polymorphisms. The phylogenetic relationships among some totiviruses were investigated and showed that IMNV forms a new group within the family Totiviridae. Two new genomes were sequenced, analyzed and compared with two other genomes already deposited in GenBank. The new genomes were more similar to each other than those already described. Conserved and variable regions of the genome were identified through similarity graphs and alignments using the four IMNV sequences. This analysis allowed polymorphic sites to be mapped and revealed that the most variable region of the genome is in the first half of ORF1, which coincides with the regions that possibly encode the viral protrusion, while the most stable regions of the genome were found in conserved domains of proteins that interact with RNA.
Moreover, secondary structures were predicted for all proteins using various software tools, and protein structural models were calculated using threading and ab initio modeling approaches. From these analyses it was possible to observe that the IMNV proteins have motifs and shapes similar to those of proteins from other totiviruses, and possible new protein functions have been proposed. The study of the genome and proteins was essential for the development of a PCR-based detection system able to discriminate the four IMNV isolates based on the presence of polymorphic sites.
Abstract:
Small nuclear RNAs (snRNAs) are important factors in the functioning of eukaryotic cells. They form several small complexes with proteins; these ribonucleoprotein particles (U snRNPs) have an essential role in pre-mRNA processing, particularly in splicing, which is catalyzed by spliceosomes, large RNA-protein complexes composed of various snRNPs. Although they are well defined in mammals, snRNPs are still not fully characterized in certain trypanosomatids such as Trypanosoma cruzi. For this reason we subjected snRNAs (U2, U4, U5, and U6) from T. cruzi epimastigotes to molecular characterization by polymerase chain reaction (PCR) and reverse transcription-PCR. The amplified sequences were cloned, sequenced, and compared with those of other trypanosomatids. Among these snRNAs, U5 was the least conserved and U6 the most conserved. Their respective secondary structures were predicted and compared with known T. brucei structures. In addition, the copy number of each snRNA in the T. cruzi genome was characterized by Southern blotting.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Multifunctional Structures (MFS) represent one of the most promising disruptive technologies in the space industry. The possibility of merging spacecraft primary and secondary structures with attitude control, power management and onboard computing functions is expected to save mass, volume and integration effort. Additionally, this will bring the modular construction of spacecraft to a whole new level by making the development and integration of spacecraft modules, or building blocks, leaner, reducing lead times from commissioning to launch from the current 3-6 years down to the order of 10 months, as foreseen by the latest Operationally Responsive Space (ORS) initiatives. Several basic functionalities have been integrated and tested in specimens of various kinds over the last two decades. However, a more integrated, system-level approach had yet to be developed. The activity reported in this thesis focused on the system-level approach to multifunctional structures for spacecraft, specifically in the context of nano- and micro-satellites. This thesis documents the work undertaken in the context of the MFS program promoted by the European Space Agency under the Technology Readiness Program (TRP): a feasibility study, including specimen manufacturing and testing. The work sequence covered a state-of-the-art review, with particular attention to the traditional modular architectures implemented in the ALMASat-1 and ALMASat-EO satellites, and requirements definition, followed by the development of a modular multi-purpose nano-spacecraft concept, and finally by the design, integration and testing of integrated MFS specimens. The approach for integrating several critical functionalities into nano-spacecraft modules was validated, and the overall performance of the system was verified through relevant functional and environmental testing at the University of Bologna and University of Southampton laboratories.
Abstract:
Cancer is a multifactorial disease characterized by a very complex etiology. Given this complexity, a promising therapeutic strategy is the “Multi-Target-Directed Ligand” (MTDL) approach, which assumes that a single molecule can hit several targets responsible for the pathology. Several agents acting on DNA are used clinically, but their severe side effects limit their therapeutic application. G-quadruplex structures are DNA secondary structures located in key regions of the human genome; targeting quadruplex structures could yield an anticancer therapy with fewer side effects. In recent years it has been shown that epigenetic modulation can control the expression of human genes and plays a crucial role in carcinogenesis; in particular, abnormal expression of histone deacetylase (HDAC) enzymes is related to tumor onset and progression. This thesis deals with the design and synthesis of new naphthalene diimide (NDI) derivatives endowed with anticancer activity that interact with DNA together with other targets implicated in cancer development, such as HDACs. NDI-polyamine and NDI-polyamine-hydroxamic acid conjugates were designed as potential MTDLs, that is, molecules able to interact simultaneously with different targets involved in this pathology, specifically G-quadruplex structures and HDACs, and to exploit the polyamine transport system to enter cancer cells selectively. Macrocyclic NDIs were designed to improve the quadruplex-targeting profile of the disubstituted NDIs. These compounds induced a high and selective stabilization of the quadruplex structures and showed cytotoxic activity in the micromolar range. Finally, trisubstituted NDIs were developed as G-quadruplex binders potentially effective against pancreatic adenocarcinoma.
In conclusion, these studies may represent a promising starting point for the development of new molecules for the treatment of cancer, and they underline the versatility of the NDI scaffold.
Abstract:
Like all eukaryotes, higher plants possess a microtubular cytoskeleton. Some functions of this cytoskeleton are relatively strongly conserved, whereas others appear to be highly plant-specific. This applies in particular to characteristic microtubular networks that play important roles in the formation and reinforcement of cell walls. How the assembly of these networks is controlled has so far remained relatively unclear. Typical microtubule-organizing centers (MTOCs), in particular centrosomes or spindle pole bodies, have not been observed in higher plants. From fungal and animal organisms it is known that gamma-tubulin (gTUB), together with its associated proteins, plays a key role in microtubule nucleation in the MTOCs. This member of the tubulin superfamily is also found in plants, but its exact function there is still unknown. At the beginning of this work, structural models of plant gTUB from Nicotiana tabacum were produced by in silico calculations, because the structure, which could contribute to an understanding of plant growth regulation, has not yet been determined. On the basis of the bioinformatic data, a gTUB deletion mutant required for further studies was developed. X-ray diffraction studies and analyses of gTUB interaction partners required relatively large amounts of protein. Expression of the full-length gTUB sequence in soluble and active form was therefore an essential intermediate step. The Escherichia coli T7/lacO expression system, despite promising successes in the past, did not yield soluble recombinant gTUB. Relatively high expression rates were achieved, but the recombinant gTUB was present quantitatively as inclusion bodies.
Varying the expression parameters, as well as extensive attempts to express gTUB in soluble form in E. coli using a wide range of constructs and tags that potentially increase solubility, were unsuccessful. Denaturation of the inclusion bodies followed by refolding was ruled out because of the more complex chaperones presumably required for tubulin folding and because of thermodynamic considerations. The more highly evolved chaperone complement was a main reason for using the eukaryotic yeast expression systems K. lactis and the S. cerevisiae strain FGY217 for gTUB expression. After selection, however, the only transgenic yeast cells that could be documented had demonstrably integrated the gTUB expression cassette at the intended target position in their genome but showed no detectable expression. The most likely explanation is that an elevated intracellular gTUB titer interfered with the growth and division of these eukaryotic cells and that the recombinant gTUB CDS from N. tabacum was switched off through feedback mechanisms. Attempting transient gTUB overexpression in differentiated leaf tissue of higher plants was a logical consequence of the preceding results and yielded soluble gTUB, although not in the quantities required for protein crystallization. Efforts to stably transfect A. thaliana or BY-2 cell cultures with a gTUB CDS yielded no transgenic organisms, which suggested strong interference by the recombinant gTUB CDS in the cells. In contrast, transfection experiments with constructs carrying only GFP produced a large number of transgenic organisms that also showed relatively strong expression rates.
The amounts of protein obtained by transient gTUB overexpression in N. benthamiana leaf tissue, in co-expression with the post-transcriptional gene silencing (PTGS) suppressor protein p19, were sufficient for a pull-down and a mass spectrometric analysis of the interaction partners and yielded results. A conclusive evaluation of the mass spectrometric data set will, however, only be possible once the tobacco proteome has been completely sequenced. Extending the existing plant reference databases with the tobacco proteome known so far multiplied the number of gTUB interaction partners identified in this study. Interactions with the TCP1 chaperone support the hypothesis that chaperones are required for folding plant gTUB. Observed gTUB degradation patterns, together with interactions with the 26S proteasome, point to counter-regulation at the protein level when the gTUB titer is elevated. Because leaf tissue itself retains only very low and inhomogeneous division activity, this regulation is highly interesting. Co-expression of the PTGS suppressor protein p19 also showed that gTUB expression is regulated at the RNA level.
Abstract:
The Reoviridae virus family is a group of economically and pathologically important viruses that have single-, double-, or triple-shelled protein layers enclosing a segmented double-stranded RNA genome. Each virus particle in this family carries its own viral RNA-dependent RNA polymerase and the enzymatic activities necessary for mature RNA synthesis. Based on the structure of their innermost cores, the Reoviridae can be divided into two major groups. One group has a smooth-surfaced inner core surrounded by complete outer shells of one or two protein layers. The other group has an inner core decorated with turrets on the five-fold vertices and either completely lacks outer protein layers or has incomplete ones. This structural difference is one of the determinants of their biological differences during infection. Cytoplasmic polyhedrosis virus (CPV) is a single-shelled, turreted virus and the structurally simplest member of the Reoviridae. It causes specific chronic infections in insect gut epithelial cells. Due to its wide range of insect hosts, CPV has been engineered as a potential insecticide for use in fruit and vegetable farming. Its unique structural simplicity, unparalleled capsid stability and ease of purification make CPV an ideal model system for studying the structural basis of dsRNA virus assembly at the highest possible resolution by electron cryomicroscopy (cryoEM) and three-dimensional (3D) reconstruction. In this thesis work, I determined the first 3D structure of CPV capsids using 100 kV cryoEM. At an effective resolution of 17 Å, the full capsid reveals a 600-Å diameter, T = 1 icosahedral shell decorated with A and B spikes at the 5-fold vertices. The internal space of the empty CPV is unoccupied except for 12 mushroom-shaped densities that are attributed to the transcriptional enzyme complexes. The inside of the full capsid is packed with icosahedrally ordered viral genomic RNA.
The interactions of viral RNA with the transcriptional enzyme complexes and other capsid proteins suggest a mechanism for RNA transcription and subsequent release. Second, the interactions between the turret proteins (TPs) and the major capsid shell proteins (CSPs) have been identified through 3D structural comparisons of the intact CPV capsids with spikeless CPV capsids, which were generated by chemical treatments. The differential effects of these chemical treatments also indicated that CPV has significantly stronger structural integrity than other dsRNA viruses, such as the orthoreovirus subcores, which are normally enclosed within outer protein shells. Finally, we have reconstructed the intact CPV to an unprecedented 8 Å resolution from several thousand 400 kV cryoEM images. The 8 Å structure reveals interactions among the 120 molecules each of the capsid shell protein (CSP) and the large protrusion protein (LPP), and 60 molecules of the turret protein (TP). A total of 1980 α-helices and 720 β-sheets have been identified in these capsid proteins. The CSP structure is largely conserved, with the majority of the secondary structures homologous to those observed in the x-ray structures of corresponding proteins of other reoviruses, such as orthoreovirus and bluetongue virus. The three domains of TP are well positioned to play multifunctional roles during viral transcription. The completely non-equivalent interactions between LPP and CSP and those between the anchoring domain of TP and CSP account for the unparalleled stability of this structurally simplest member of the Reoviridae.
Abstract:
The structure of a 29-nucleotide RNA containing the sarcin/ricin loop (SRL) of rat 28 S rRNA has been determined at 2.1 Å resolution. Recognition of the SRL by elongation factors and by the ribotoxins, sarcin and ricin, requires a nearly universal dodecamer sequence that folds into a G-bulged cross-strand A stack and a GAGA tetraloop. The juxtaposition of these two motifs forms a distorted hairpin structure that allows direct recognition of bases in both grooves as well as recognition of nonhelical backbone geometry and two 5′-unstacked purines. Comparisons with other RNA crystal structures establish the cross-strand A stack and the GNRA tetraloop as defined and modular RNA structural elements. The conserved region at the top is connected to the base of the domain by a region presumed to be flexible because of the sparsity of stabilizing contacts. Although the conformation of the SRL RNA previously determined by NMR spectroscopy is similar to the structure determined by x-ray crystallography, significant differences are observed in the “flexible” region and to a lesser extent in the G-bulged cross-strand A stack.
Abstract:
The objectives of this and the following paper are to identify commonalities and disparities of the extended environment of mononuclear metal sites centering on Cu, Fe, Mn, and Zn. The extended environment of a metal site within a protein embodies at least three layers: the metal core, the ligand group, and the second shell, which is defined here to consist of all residues less than 3.5 Å from some ligand of the metal core. The ligands and second-shell residues can be characterized in terms of polarity, hydrophobicity, secondary structures, solvent accessibility, hydrogen-bonding interactions, and membership in statistically significant residue clusters of different kinds. Findings include the following: (i) Both histidine ligands of type I copper ions exclusively attach the Nδ1 nitrogen of the histidine imidazole ring to the metal, whereas histidine ligands for all mononuclear iron ions and nearly all type II copper ions are ligated via the Nε2 nitrogen. By contrast, multinuclear copper centers are coordinated predominantly by histidine Nε2, whereas diiron histidine contacts are predominantly Nδ1. Explanations in terms of steric differences between Nδ1 and Nε2 are considered. (ii) Except for blue copper (type I), the second-shell composition favors polar residues. (iii) For blue copper, the second shell generally contains multiple methionine residues, which are elements of a statistically significant histidine–cysteine–methionine cluster. Almost half of the second shell of blue copper consists of solvent-accessible residues, putatively facilitating electron transfer. (iv) Mononuclear copper atoms are never found with acidic carboxylate ligands, whereas single Mn2+ ion ligands are predominantly acidic and the second shell tends to be mostly buried. (v) The extended environment of mononuclear Fe sites often is associated with histidine–tyrosine or histidine–acidic clusters.
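The second-shell definition lends itself to a short distance calculation. The sketch below assumes that "distance" means the minimum atom-to-atom distance between a residue and any ligand residue, which the abstract does not specify; the residue identifiers, coordinates and function name are made up for illustration.

```python
import math

def second_shell(residues, ligand_ids, cutoff=3.5):
    """Residues with any atom closer than `cutoff` (in Å) to any ligand atom.

    `residues` maps a residue id to a list of (x, y, z) atom coordinates.
    """
    ligand_atoms = [atom for res_id in ligand_ids for atom in residues[res_id]]
    shell = set()
    for res_id, atoms in residues.items():
        if res_id in ligand_ids:
            continue  # ligands form their own layer
        if any(math.dist(a, b) < cutoff for a in atoms for b in ligand_atoms):
            shell.add(res_id)
    return shell

# Toy, made-up coordinates (not from any deposited structure).
toy = {
    "HIS46": [(0.0, 0.0, 0.0)],
    "MET121": [(3.0, 0.0, 0.0)],
    "LEU10": [(10.0, 0.0, 0.0)],
}
shell = second_shell(toy, {"HIS46"})
```

In this toy case only the residue 3.0 Å from the ligand falls inside the 3.5 Å cutoff.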
Abstract:
Signal recognition particle (SRP) is a stable cytoplasmic ribonucleoprotein complex that serves to translocate secretory proteins across membranes during translation. The SRP Database (SRPDB) provides compilations of SRP components, ordered alphabetically and phylogenetically. Alignments emphasize phylogenetically-supported base pairs in SRP RNA and conserved residues in the proteins. Data are provided in various formats including a column arrangement for improved access and simplified computational usability. Included are motifs for identification of new sequences, SRP RNA secondary structure diagrams, 3-D models and links to high-resolution structures. This release includes 11 new SRP RNA sequences (total of 129), two protein SRP9 sequences (total of seven), two protein SRP14 sequences (total of 10), two protein SRP19 sequences (total of 16), 10 new SRP54 (ffh) sequences (total of 66), two protein SRP68 sequences (total of seven) and two protein SRP72 sequences (total of nine). Seven sequences of the SRP receptor α-subunit and its FtsY homolog (total of 51) are new. Also considered are β-subunit of SRP receptor, Flhf, Hbsu, CaM kinase II and cpSRP43. Access to SRPDB is at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html and the European mirror http://www.medkem.gu.se/dbs/SRPDB/SRPDB.html
Abstract:
Rational engineering of enzymes involves introducing key amino acids, guided by knowledge of protein structure, to effect a desirable change in function. To date, all successful attempts to change specificity have been limited to substituting individual amino acids within a protein fold. However, the infant field of protein engineering will only reach maturity when changes in function can be generated by rationally engineering secondary structures. Guided by x-ray crystal structures and molecular modeling, site-directed mutagenesis has been used to systematically invert the coenzyme specificity of Thermus thermophilus isopropylmalate dehydrogenase from a 100-fold preference for NAD to a 1000-fold preference for NADP. The engineered mutant, which is twice as active as wild type, contains four amino acid substitutions and an alpha-helix and loop that replace the original beta-turn. These results demonstrate that rational engineering of secondary structures to produce enzymes with novel properties is feasible.