968 results for Protein Sequence Analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimethyl sulfide dehydrogenase from the purple phototrophic bacterium Rhodovulum sulfidophilum catalyzes the oxidation of dimethyl sulfide to dimethyl sulfoxide. Recent DNA sequence analysis of the ddh operon, encoding dimethyl sulfide dehydrogenase (ddhABC), and biochemical analysis (1) have revealed that it is a member of the DMSO reductase family of molybdenum enzymes and is closely related to respiratory nitrate reductase (NarGHI). Variable temperature X-band EPR spectra (120-122 K) of purified heterotrimeric dimethyl sulfide dehydrogenase showed resonances arising from multiple redox centers, Mo(V), [3Fe-4S]+, [4Fe-4S]+, and a b-type heme. A pH-dependent EPR study of the Mo(V) center in ¹H₂O and ²H₂O revealed the presence of three Mo(V) species in equilibrium, Mo(V)-OH2, Mo(V)-anion, and Mo(V)-OH. Above pH 8.2 the dominant species was Mo(V)-OH. The maximum specific activity occurred at pH 9.27. Comparison of the rhombicity and anisotropy parameters for the Mo(V) species in DMS dehydrogenase with other molybdenum enzymes of the DMSO reductase family showed that it was most similar to the low-pH nitrite spectrum of Escherichia coli nitrate reductase (NarGHI), consistent with previous sequence analysis of DdhA and NarG. A sequence comparison of DdhB and NarH has predicted the presence of four [Fe-S] clusters in DdhB. A [3Fe-4S]+ cluster was identified in dimethyl sulfide dehydrogenase whose properties resembled those of center 2 of NarH. A [4Fe-4S]+ cluster was also identified with unusual spin Hamiltonian parameters, suggesting that one of the iron atoms may have a fifth non-sulfur ligand. The g matrix for this cluster is very similar to that found for the minor conformation of center 1 in NarH [Guigliarelli, B., Asso, M., More, C., Augier, V., Blasco, F., Pommier, J., Giordano, G., and Bertrand, P. (1992) Eur. J. Biochem. 307, 63-68]. Analysis of a ddhC mutant showed that this gene encodes the b-type cytochrome in dimethyl sulfide dehydrogenase. 
Magnetic circular dichroism studies revealed that the axial ligands to the iron in this cytochrome are a histidine and a methionine, consistent with predictions from protein sequence analysis. Redox potentiometry showed that the b-type cytochrome has a high midpoint redox potential (E° = +315 mV, pH 8).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs). Results: We identified 137 polymorphic variants in 115 different amino acid tandem repeats. Of these, 77 contained amino acid substitutions and 60 contained gaps (expansions or contractions of the repeat unit). The analysis showed that at least about 21% of the repeats might be polymorphic in humans. We compared the mutations found in different types of amino acid repeats and in adjacent regions. Overall, repeats showed a five-fold increase in the number of gap mutations compared to adjacent regions, reflecting the action of slippage within the repetitive structures. Gap and substitution mutations were very differently distributed between different amino acid repeat types. Among repeats containing gap variants we identified several disease and candidate disease genes. Conclusion: This is the first report at a genome-wide scale of the types of mutations occurring in the amino acid repeat component of the human proteome. We show that the mutational dynamics of different amino acid repeat types are very diverse. We provide a list of loci with highly variable repeat structures, some of which may be involved in disease.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Polymorphonuclear neutrophils (PMNs) are a key weapon in the defense against various pathogens, notably bacteria, fungi, tumor cells and virus-infected cells. However, neutrophils are implicated in certain pathologies linked to chronic inflammation, notably rheumatoid arthritis. The persistent inflammatory response generated by the activation and survival of neutrophils destroys the surrounding tissues through the uncontrolled secretion of their cytotoxic products. Although chronic neutrophil activation is harmful in several pathologies, it could prove a useful tool in cases of neutropenia, as often occurs in patients who have received chemotherapy. This project follows on from the doctoral work of Lagraoui (1999). It aims to identify the synovial fluid factor(s) that increase neutrophil survival, as well as the mechanism of action involved in this process. Like the semi-pure factor isolated by Lagraoui (1999), concentrated conditioned medium (MCC) increases PMN survival by 75% (39% ± 9.5 vs 68% ± 2.5, p<0.01). Following sequencing of the MCC in parallel with the active semi-pure factor, two proteins were identified in both the MCC and the semi-pure factor: albumin and fetuin. Our project therefore aims to compare the effects of albumin and fetuin with those of GM-CSF, with a view to an alternative therapy to GM-CSF as a chemotherapy adjuvant. The presence of albumin, fetuin or GM-CSF in PMNs incubated for 24 hours with Mutamycin® reduces the number of apoptotic cells compared with Mutamycin® alone (Ctrl: 43% ± 10; A: 74% ± 3; F: 82% ± 6 and GM: 74% ± 7; p<0.01). The effect of albumin depends on the PI3 kinase pathway and also on the ERK kinase pathway, whereas that of fetuin depends on PI3 kinase. 
Like EPO, albumin and fetuin support the differentiation of HSCs into BFU-E-type erythroid precursors. In a murine model of chemoprotection, albumin increases the leukocyte concentration, relative to the control group, in the spleen (66 ± 8 × 10⁶ cells/ml vs 81 ± 16 × 10⁶ cells/ml) and in the blood (3.6 ± 0.4 × 10⁶ cells/ml vs 5.7 ± 2.3 × 10⁶ cells/ml). Thus, in vitro, albumin and fetuin are comparable to GM-CSF in functionality and mechanisms of action. However, given their lack of specificity, the therapeutic application of albumin and fetuin as chemotherapy adjuvants holds little promise. On the other hand, degenerative diseases and ischemic events could prove to be good therapeutic targets, mainly for albumin.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The resurgence of the enteric pathogen Vibrio cholerae, the causative organism of epidemic cholera, remains a major health problem in many developing countries like India. The southern Indian state of Kerala is endemic for cholera. The outbreaks of cholera follow a seasonal pattern in regions of endemicity. Marine aquaculture settings and mangrove environments of Kerala serve as reservoirs for V. cholerae. The non-O1/non-O139 environmental isolates of V. cholerae with an incomplete ‘virulence cassette’ must be handled with caution, as they constitute a major reservoir of diverse virulence genes in the marine environment and play a crucial role in pathogenicity and horizontal gene transfer. The genes coding for cholera toxin are borne on, and can be infectiously transmitted by, CTXΦ, a filamentous lysogenic vibriophage. Temperate phages can provide crucial virulence and fitness factors affecting cell metabolism, bacterial adhesion, colonization, immunity, antibiotic resistance and serum resistance. The present study was an attempt to screen marine environments such as aquafarms and mangroves in the coastal areas of Alappuzha and Cochin, Kerala, for the presence of lysogenic V. cholerae, and to study their pathogenicity and gene transfer potential. Phenotypic and molecular methods were used for identification of isolates as V. cholerae. The thirty-one isolates which were Gram negative, oxidase positive, fermentative, with or without gas production on MOF media and which showed yellow coloured colonies on TCBS (Thiosulfate Citrate Bile salt Sucrose) agar were segregated as vibrios. Twenty-two environmental V. cholerae strains of both O1 and non-O1/non-O139 serogroups showed the presence of lysogenic phages on induction with mitomycin C. They produced characteristic turbid plaques in double agar overlay assay using the indicator strain V. cholerae El Tor MAK 757. 
PCR-based molecular typing with primers targeting specific conserved sequences in the bacterial genome demonstrated genetic diversity among these lysogen-containing non-O1 V. cholerae. Polymerase chain reaction was also employed as a rapid screening method to verify the presence of 9 virulence genes, namely ctxA, ctxB, ace, hlyA, toxR, zot, tcpA, ninT and nanH, using gene-specific primers. The presence of tcpA gene in ALPVC3 was alarming, as it indicates the possibility of an epidemic by accepting the cholera. Differential induction studies used ΦALPVC3, ΦALPVC11, ΦALPVC12 and ΦEKM14, underlining the possibility of prophage induction in natural ecosystems due to abiotic factors like antibiotics, pollutants, temperature and UV. The efficiency of induction of prophages varied considerably in response to the different induction agents. The growth curve of the lysogenic V. cholerae used in the study varied drastically in the presence of strong prophage inducers like antibiotics and UV. Bacterial cell lysis was directly proportional to the increase in phage number due to induction. Morphological characterization of vibriophages by Transmission Electron Microscopy revealed hexagonal heads for all four phages. Vibriophage ΦALPVC3 exhibited isometric and contractile tails characteristic of family Myoviridae, while phages ΦALPVC11 and ΦALPVC12 demonstrated the typical hexagonal head and non-contractile tail of family Siphoviridae. ΦEKM14, the podophage, was distinguished by a short non-contractile tail and icosahedral head. This work demonstrated that environmental parameters can influence the viability and cell adsorption rates of V. cholerae phages. Adsorption studies showed 100% adsorption of ΦALPVC3, ΦALPVC11, ΦALPVC12 and ΦEKM14 after 25, 30, 40 and 35 minutes respectively. Exposure to high temperatures ranging from 50 °C to 100 °C drastically reduced phage viability. 
The optimum concentration of NaCl required for survival of vibriophages except ΦEKM14 was 0.5 M and that for ΦEKM14 was 1 M NaCl. Survival of phage particles was maximum at pH 7-8. V. cholerae is assumed to have existed long before its human host, and so the pathogenic clones may have evolved from aquatic forms which later colonized the human intestine by progressive acquisition of genes. This is supported by the fact that the vast majority of V. cholerae strains are still part of the natural aquatic environment. CTXΦ has played a critical role in the evolution of the pathogenicity of V. cholerae as it can transmit the ctxAB gene. The unusual transformation of V. cholerae strains associated with epidemics and the emergence of V. cholerae O139 demonstrate the evolutionary success of the organism in attaining greater fitness. Genetic changes in pathogenic V. cholerae constitute a natural process for developing immunity within an endemically infected population. The alternative hosts and lysogenic environmental V. cholerae strains may potentially act as cofactors in promoting cholera phage “blooms” within aquatic environments, thereby influencing transmission of phage-sensitive, pathogenic V. cholerae strains by aquatic vehicles. Differential induction of the phages is a clear indication of the impact of environmental pollution and global changes on phage induction. The development of molecular biology techniques offered an accessible gateway for investigating the molecular events leading to genetic diversity in the marine environment. Using nucleic acids as targets, fingerprinting methods like ERIC PCR and BOX PCR revealed that the marine environment harbours a potentially pathogenic group of bacteria with genetic diversity. The distribution of virulence-associated genes in the environmental isolates of V. cholerae provides tangible material for further investigation. 
Nucleotide and protein sequence analysis, along with protein structure prediction, aids in better understanding of the variation in alleles of the same gene in different ecological niches and its impact on the protein structure for attaining greater fitness of pathogens. The evidence of the co-evolution of virulence genes in toxigenic V. cholerae O1 from different lineages of environmental non-O1 strains is alarming. Transduction studies would indicate that the phenomenon of acquisition of these virulence genes by lateral gene transfer, although rare, is not quite uncommon amongst non-O1/non-O139 V. cholerae and that it has a key role in diversification. All these considerations justify the need for an integrated approach towards the development of an effective surveillance system to monitor evolution of V. cholerae strains with epidemic potential. Results presented in this study, if considered together with the mechanism proposed above, would strongly suggest that the bacteriophage also intervenes as a variable in shaping the cholera bacterium, one which cannot be ignored and which hints at imminent future epidemics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Envenomation by arachnids of the genus Loxosceles leads to local dermonecrosis and serious systemic toxicity mainly induced by sphingomyelinases D (SMase D). These enzymes catalyze the hydrolysis of sphingomyelin resulting in the formation of ceramide-phosphate and choline as well as the cleavage of lysophosphatidyl choline generating the lipid mediator lysophosphatidic acid. We have previously cloned and expressed two functional SMase D isoforms, named P1 and P2, from Loxosceles intermedia venom, and comparative protein sequence analysis revealed that they are highly homologous to SMase I from Loxosceles laeta, which folds to form an (α/β)8 barrel. In order to further characterize these proteins, pH dependence kinetic experiments and chemical modification of the two active SMase D isoforms were performed. We show here that the amino acids involved in catalysis and in the metal ion binding sites are strictly conserved in the SMase D isoforms from L. intermedia. However, the kinetic studies indicate that SMase P1 hydrolyzes sphingomyelin less efficiently than P2, which can be attributed to a substitution at position 203 (Pro-Leu) and local amino acid substitutions in the hydrophobic channel that could play a role in substrate recognition and binding. (c) 2005 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In order to identify genes expressed in the pistil that may have a role in the reproduction process, we have established an expressed sequence tags project to randomly sequence clones from a Nicotiana tabacum stigma/style cDNA library. A cDNA clone (MTL-8) showing high sequence similarity to genes encoding glycine-rich RNA-binding proteins was chosen for further characterization. Based on the extensive identity of MTL-8 to the RGP-1a sequence of N. sylvestris, a primer was defined to extend the 5′ sequence of MTL-8 by RT-PCR from stigma/style RNAs. The amplification product was sequenced and it was confirmed that MTL-8 corresponds to an mRNA encoding a glycine-rich RNA-binding protein. Two transcripts of different sizes and expression patterns were identified when the MTL-8 cDNA insert was used as a probe in RNA blots. The largest is 1,100 nucleotides (nt) long and markedly predominant in ovaries. The smaller transcript, with 600 nt, is ubiquitous to the vegetative and reproductive organs analyzed (roots, stems, leaves, sepals, petals, stamens, stigmas/styles and ovaries). Plants submitted to stress (wounding, virus infection and ethylene treatment) presented an increased level of the 600-nt transcript in leaves, especially after tobacco necrosis virus infection. In contrast, the level of the 1,100-nt transcript seems to be unaffected by the stress conditions tested. Results of Southern blot experiments have suggested that MTL-8 is present in one or two copies in the tobacco genome. Our results suggest that the shorter transcript is related to stress while the larger one is a flower predominant and nonstress-inducible messenger.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The presenilins are the catalytic component of the γ-secretase protease complex, involved in the regulated intramembrane proteolysis of numerous type-1 transmembrane proteins, including Amyloid precursor protein (APP) and Notch. In addition to their role in the γ-secretase complex, the presenilins are involved in a number of γ-secretase independent functions such as calcium homeostasis, apoptosis, inflammation and protein trafficking. Presenilin function is known to be regulated through posttranslational modifications like endoproteolysis, phosphorylation and ubiquitination. Using a bioinformatics and protein sequence analysis approach, this lab has identified a putative ubiquitin binding CUE domain in the presenilins. The aim of this project was to characterise the function of the presenilin CUE domains. Firstly, the presenilins are shown to contain a functional ubiquitin-binding CUE domain that preferentially binds to K63-linked polyubiquitin chains. The PS1 CUE domain is shown to be dispensable for PS1 endoproteolysis and γ-secretase mediated cleavage of APP, Notch and IL-1R1. This suggests the PS1 CUE domain is involved in a γ-secretase independent PS1 function. Our hypothesis is that the PS1 CUE domain is involved in regulating PS1’s intermolecular protein-protein interactions or intramolecular PS1:PS1 interactions. Here the PS1 CUE domain is shown to be dispensable for the interaction of PS1 and the K63-linked polyubiquitinated PS1 interacting proteins P75NTR, IL-1R1, TRAF6, TRAF2 and RIP1. To further investigate PS1 CUE domain function, a mass spectrometry proteomics based approach is used to identify PS1 CUE domain interacting proteins. This proteomics approach demonstrated that the PS1 CUE domain is not required for PS1 dimerization. Instead, a number of proteins that interact with the PS1 CUE domain are identified, as well as proteins whose interaction with PS1 is downregulated by the presence of the PS1 CUE domain. 
Bioinformatic analysis of these proteins suggests possible roles for the PS1 CUE domain in regulating cell signalling, ubiquitination or cellular trafficking.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
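The contact number definition in the abstract above (the number of C-beta atoms of other residues inside a sphere centred on a residue's own C-beta atom) can be illustrated with a short Python sketch. This is only an illustration of the definition, not the authors' prediction method: the function name `contact_numbers`, the toy coordinates and the sphere radius are all assumptions, since the abstract does not state the radius used.

```python
import math

def contact_numbers(cb_coords, radius):
    """For each residue, count the C-beta atoms of OTHER residues
    that lie within `radius` of its own C-beta atom.

    cb_coords: list of (x, y, z) tuples, one C-beta position per residue.
    radius:    sphere radius, in the same units as the coordinates
               (hypothetical value; the abstract does not give one).
    """
    counts = []
    for i, a in enumerate(cb_coords):
        n = 0
        for j, b in enumerate(cb_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts

# Toy coordinates on a line (made up for illustration only).
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(contact_numbers(toy, radius=6.0))  # [1, 2, 1, 0]
```

Observed contact numbers computed this way from a solved structure would be the target values that a sequence-based regressor, such as the support vector regression mentioned in the abstract, tries to predict.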

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As a consequence of selective pressure exerted by the immune response during hepatitis C virus (HCV) infection, a high rate of nucleotide mutations in the viral genome is observed which leads to the emergence of viral escape mutants. The aim of this study was to evaluate the evolution of the amino acid (aa) sequence of the HCV nonstructural protein 3 (NS3) in viral isolates after liver transplantation. Six patients with HCV-induced liver disease undergoing liver transplantation (LT) were followed up for sequence analysis. Hepatitis C recurrence was observed in all patients after LT. The rate of synonymous (dS) nucleotide substitutions was much higher than that of nonsynonymous (dN) ones in the NS3 encoding region. The high values of the dS/dN ratios suggest no sustained adaptive evolution selection pressure and, therefore, absence of specific NS3 viral populations. Clinical genotype assignments were supported by phylogenetic analysis. Serial samples from each patient showed lower mean nucleotide genetic distance when compared with samples of the same HCV genotype and subtype. The NS3 samples studied had an N-terminal aa sequence with several differences as compared with reference ones, mainly in genotype 1b-infected patients. After LT, as compared with the sequences before, a few reverted aa substitutions and several established aa substitutions were observed at the N-terminal of NS3. Sites described to be involved in important functions of NS3, notably those of the catalytic triad and zinc binding, remained unaltered in terms of aa sequence. Rare or frequent aa substitutions occurred indiscriminately in different positions. Several cytotoxic T lymphocyte epitopes described for HCV were present in our 1b samples. Nevertheless, the deduced secondary structure of the NS3 protease showed a few alterations in samples from genotype 3a patients, but none were seen in 1b cases. 
Our data, obtained from patients under important selective pressure during LT, show that the NS3 protease remains well conserved, mainly in HCV 3a patients. It reinforces its potential use as an antigenic candidate for further studies aiming at the development of a protective immune response.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Epstein-Barr virus (EBV)-encoded oncogene latent membrane protein (LMP) 1, which is consistently expressed in multiple EBV-associated malignancies, has been proposed as a potential target antigen for any future vaccine designed to control these malignancies. However, the high degree of genetic variation in the LMP1 sequence has been considered a major impediment for its use as a potential immunotherapeutic target for the treatment of EBV-associated malignancies. In the present study, we have employed a highly efficient strategy, based on ex vivo functional assays, to conduct an extensive sequence-wide analysis of LMP1-specific T-cell responses in a large panel of healthy virus carriers of diverse ethnic origin and nasopharyngeal carcinoma patients. By comparing the frequencies of T cells specific for overlapping peptides spanning LMP1, we mapped a number of novel HLA class I- and class II-restricted LMP1 T-cell epitopes, including an epitope with dual HLA class I restriction. More importantly, extensive sequence analysis of LMP1 revealed that the majority of the T-cell epitopes were highly conserved in EBV isolates from Caucasian, Papua New Guinean, African, and Southeast Asian populations, while unique geographically constrained genetic variation was observed within one HLA A2 supertype-restricted epitope. These findings indicate that conserved LMP1 epitopes should be considered in designing epitope-based immunotherapeutic strategies against EBV-associated malignancies in different ethnic populations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Xylella fastidiosa is a fastidious, xylem-limited bacterium that causes a range of economically important plant diseases. Here we report the complete genome sequence of X. fastidiosa clone 9a5c, which causes citrus variegated chlorosis - a serious disease of orange trees. The genome comprises a 52.7% GC-rich 2,679,305-base-pair (bp) circular chromosome and two plasmids of 51,158 bp and 1,285 bp. We can assign putative functions to 47% of the 2,904 predicted coding regions. Efficient metabolic functions are predicted, with sugars as the principal energy and carbon source, supporting existence in the nutrient-poor xylem sap. The mechanisms associated with pathogenicity and virulence involve toxins, antibiotics and ion sequestration systems, as well as bacterium-bacterium and bacterium-host interactions mediated by a range of proteins. Orthologues of some of these proteins have only been identified in animal and human pathogens; their presence in X. fastidiosa indicates that the molecular basis for bacterial pathogenicity is both conserved and independent of host. At least 83 genes are bacteriophage-derived and include virulence-associated genes from other bacteria, providing direct evidence of phage-mediated horizontal gene transfer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Intron splicing is one of the most important steps involved in the maturation process of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the levels of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites have not been examined as thoroughly. In this study we investigated identity patterns between the last 15 nucleotides of the exonic sequence preceding the 5' splice site and the intronic sequence preceding the 3' splice site in a set of human protein-coding genes that do not exhibit intron retention. We found that almost 60% of consecutive exons and introns in human protein-coding genes share at least two identical nucleotides at their 3' ends and, on average, the sequence identity length is 2.47 nucleotides. Based on our findings we conclude that the 3' ends of exons and introns tend to have longer identical sequences within a gene than when taken from different genes. Our results hold even if the pairs are non-consecutive in the transcription order. (C) 2012 Elsevier Ltd. All rights reserved.
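One possible reading of the "sequence identity length" in the abstract above is the number of consecutive identical nucleotides shared by an exon and an intron, counted backwards from their 3' ends within the 15-nucleotide window. The Python sketch below implements that reading; the function name and the example sequences are hypothetical, and the authors' exact scoring may differ.

```python
def shared_3prime_identity(exon, intron, window=15):
    """Length of the identical run shared by the 3' ends of an exon
    and an intron, looking at most `window` nucleotides back.

    Assumption: identity is counted as a contiguous match starting at
    the last nucleotide of each sequence and moving towards the 5' end.
    """
    a = exon[-window:].upper()
    b = intron[-window:].upper()
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

# Made-up sequences: both end in "AG", so the shared length is 2.
print(shared_3prime_identity("ACGTTGCAAG", "TTTCTCTTCCAG"))  # 2
```

Averaging this value over all exon-intron pairs of a gene set would give a statistic comparable in spirit to the 2.47-nucleotide mean reported in the abstract.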

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it became urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures came out when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to which extent the characteristic path length and clustering coefficient of the protein contacts network are values that reveal characteristic features of protein contact maps. 
Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. 
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which have been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but do not necessarily share the same function, and to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation by taking advantage of a validated transfer-through-inheritance procedure for the molecular functions and the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicitly addresses the problem of multi-domain protein annotation and allows a fine-grained division of the whole set of proteomes used, which ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins, when present, can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences, considering information available in the present databases of molecular functions and structures.
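The clustering step described above (grouping sequences whose pairwise alignments satisfy constraints on both sequence identity and alignment coverage) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the abstract gives neither the threshold values nor the clustering algorithm, so the thresholds below are placeholders and single-linkage grouping with a union-find structure is a stand-in chosen for brevity. The function name and the hit tuples are hypothetical and do not correspond to any real BLAST output parser.

```python
def cluster_hits(ids, hits, min_identity, min_coverage):
    """Group sequences into clusters by single linkage.

    ids:  iterable of sequence identifiers.
    hits: iterable of (query, subject, percent_identity, percent_coverage)
          tuples, e.g. parsed from pairwise alignment results.
    Two sequences are linked only if a hit between them passes BOTH
    thresholds (placeholder values; not taken from the source).
    """
    parent = {s: s for s in ids}

    def find(x):
        # Find the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for q, s, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(q)] = find(s)

    clusters = {}
    for s in parent:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())

# Made-up hits: C-D fails the coverage constraint, so D stays alone.
example_hits = [("A", "B", 55.0, 95.0), ("B", "C", 45.0, 92.0), ("C", "D", 80.0, 50.0)]
print(cluster_hits(["A", "B", "C", "D"], example_hits, min_identity=40.0, min_coverage=90.0))
# [['A', 'B', 'C'], ['D']]
```

Requiring high coverage in addition to identity is what keeps a short single-domain sequence from being merged with a long multi-domain protein that merely contains a similar domain, which is the homogeneity-in-length property the abstract emphasises.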