917 results for Peptide secondary structure
Abstract:
Brazil holds a privileged position in ethanol production. For historical and geographical reasons, the country accounts for more than 30% of world ethanol production, with a national output of more than 28 billion liters in 2014. To maximize the yield of this process, the technology associated with second-generation (lignocellulosic) ethanol is under development. The main challenges of this technology are improving the efficiency of substrate-to-product conversion and achieving large-scale production from low-cost substrates. To improve the efficiency of the conversion process, auxiliary proteins (expansins) were studied that, together with cellulases, enhance the depolymerization of lignocellulosic biomass into fermentable sugars. In addition, carbohydrate-active enzymes (CAZymes) of thermophilic origin from the organism Thermogemmatispora sp. T81 were characterized, because these proteins can maintain their activity and structural conformation at high temperatures for prolonged periods. Based on bioinformatic analyses, the genes encoding expansins from Xanthomonas campestris, Bacillus licheniformis and Trichoderma reesei were cloned and expressed in E. coli, and their gene products (the expansins) had their synergism indices (from joint action with commercial cocktails) and catalytic activity determined. Additionally, using structural alignments, a hydrolytic mechanism was proposed for them. For the bacterium Thermogemmatispora sp. T81, genomic and proteomic analyses were carried out to select enzymes overexpressed in cellulosic medium. Their genes were cloned heterologously in E. coli and the expression products were characterized biochemically (chromatography, activity assays and hydrolysis profile) and structurally (SAXS and circular dichroism).
The synergism indices determined were 2.47, 1.96 and 2.44 for the expansins from Xanthomonas campestris, Bacillus licheniformis and Trichoderma reesei, respectively. From the structural alignments, the Asp/Glu dyad was proposed as the catalytic site in expansins. The proteomic analyses enabled the selection of four cloning targets, chosen for their high expression levels when the bacterium was grown in cellulosic medium. These proteins were characterized for activity and showed a common profile: an optimum temperature of 70 to 75 °C, an optimum pH of 5, and preferential hydrolysis of hemicellulosic substrates (xylan). The secondary-structure percentages of the proteins under study, measured by circular dichroism, agreed with theoretical predictions. The initial objectives of this project were thus met with the determination of the degree of synergism of the expansins studied and the proposal of a hydrolysis mechanism for them, considering that for more than 20 years the activity of these proteins had been defined exclusively as accessory. In addition, this study contributes the identification and selection of genes for thermophilic CAZymes with biotechnological applications owing to their thermostable properties.
Abstract:
The albA gene from Klebsiella oxytoca encodes a protein that binds albicidin phytotoxins and antibiotics with high affinity. Previously, it has been shown that shifting pH from 6 to 4 reduces binding activity of AlbA by about 30%, indicating that histidine residues might be involved in substrate binding. In this study, molecular analysis of the albA coding region revealed sequence discrepancies with the albA sequence reported previously, which were probably due to sequencing errors. The albA gene was subsequently cloned from K. oxytoca ATCC 13182(T) to establish the revised sequence. Biochemical and molecular approaches were used to determine the functional role of four histidine residues (His(78), His(125), His(141) and His(189)) in the corrected sequence for AlbA. Treatment of AlbA with diethyl pyrocarbonate (DEPC), a histidine-specific alkylating reagent, reduced binding activity by about 95%. DEPC treatment increased absorbance at 240-244 nm by an amount indicating conversion to N-carbethoxyhistidine of a single histidine residue per AlbA molecule. Pretreatment with albicidin protected AlbA against modification by DEPC, with a 1:1 molar ratio of albicidin to the protected histidine residues. Based on protein secondary structure and amino acid surface probability indices, it is predicted that His125 might be the residue required for albicidin binding. Mutation of His125 to either alanine or leucine resulted in about 32% loss of binding activity, and deletion of His125 totally abolished binding activity. Mutation of His125 to arginine and tyrosine had no effect. These results indicate that His125 plays a key role either in an electrostatic interaction between AlbA and albicidin or in the conformational dynamics of the albicidin-binding site.
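The stoichiometry argument in this abstract (absorbance rise near 240 nm corresponding to one modified histidine per AlbA molecule) follows from the Beer-Lambert law. The sketch below is only an illustration: the abstract does not give the absorption coefficient, so the value of 3200 M^-1 cm^-1 is the commonly cited literature figure for N-carbethoxyhistidine and is an assumption here, as are the function name and the example concentrations.

```python
# Assumed molar absorption change for N-carbethoxyhistidine formation near 240 nm.
# 3200 M^-1 cm^-1 is a commonly cited literature value, not taken from the abstract.
DELTA_EPSILON_240 = 3200.0  # M^-1 cm^-1


def modified_histidines_per_protein(delta_a240, protein_molar, path_cm=1.0):
    """Estimate modified His residues per protein molecule from the absorbance rise.

    delta_a240    -- increase in absorbance after DEPC treatment
    protein_molar -- protein concentration in mol/L
    path_cm       -- cuvette path length in cm
    """
    if protein_molar <= 0 or path_cm <= 0:
        raise ValueError("protein concentration and path length must be positive")
    # Beer-Lambert: concentration = absorbance / (epsilon * path length)
    modified_his_molar = delta_a240 / (DELTA_EPSILON_240 * path_cm)
    return modified_his_molar / protein_molar


# Hypothetical measurement: 10 uM protein, absorbance rise of 0.032 in a 1 cm cuvette.
ratio = modified_histidines_per_protein(0.032, 10e-6)
print(round(ratio, 2))  # 1.0 for these hypothetical inputs
```

A ratio close to 1 would be consistent with a single modified histidine per molecule, which is the conclusion the abstract reports.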
Abstract:
Limited but significant sequence similarity has been observed between an uncharacterized human protein, SIN1, and the S. pombe SIN1, Dictyostelium RIP3 and S. cerevisiae AVO1 proteins. The human Sin1 gene has been automatically predicted (MAPKAP1; GenBank accession number NM_024117); however, this sequence appears to be incomplete. In this study, we have cloned and characterized the full-length human Sin1 mRNA and identified a highly conserved domain that defines the family of SIN1 orthologues, members of which are widely distributed in the fungal and metazoan kingdoms. We demonstrate that Sin1 transcripts can use alternative polyadenylation signals and describe a number of Sin1 splice variants that potentially encode functionally different isoforms. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
The DNA microarray is a powerful tool for measuring the levels of a mixed population of nucleic acids simultaneously, and it has had great impact on many areas of life sciences research. To distinguish nucleic acids of very similar composition by hybridization, it is necessary to design microarray probes with high specificity and sensitivity. Highly specific probes are those with unique DNA sequences, whereas highly sensitive probes are those with a melting temperature within a desired range and no secondary structure. Selecting these probes from a set of functional DNA sequences (exons) constitutes a computationally expensive discrete non-linear search problem. We delegate the search task to a simple yet effective Evolution Strategy algorithm. The computational efficiency is also greatly improved by making use of an available bioinformatics tool.
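The search described in this abstract can be pictured with a small sketch. It is not the authors' algorithm: it is a minimal (1+1) Evolution Strategy over probe start positions, and every scoring choice is an assumption made for illustration. The melting temperature uses a basic GC-content approximation, hairpin potential is approximated by the longest self-complementary stem, and specificity is a plain exact-match check against the other sequences. All function names and weights are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def melting_temp(seq):
    """Basic GC-content Tm approximation for oligos longer than about 13 nt."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def longest_self_stem(seq, min_loop=3):
    """Longest k-mer whose reverse complement occurs at least min_loop bases
    downstream; a crude proxy for hairpin-forming secondary structure."""
    best = 0
    n = len(seq)
    for i in range(n):
        for k in range(best + 1, n - i + 1):
            rc = seq[i:i + k].translate(COMPLEMENT)[::-1]
            if seq.find(rc, i + k + min_loop) == -1:
                break  # longer stems starting at i cannot match either
            best = k
    return best


def probe_cost(probe, others, tm_range=(60.0, 70.0)):
    """Lower is better: Tm outside the range, self-structure, and cross-hits are penalized."""
    tm = melting_temp(probe)
    tm_penalty = max(0.0, tm_range[0] - tm, tm - tm_range[1])
    cross_hits = sum(probe in other for other in others)
    return tm_penalty + 2.0 * longest_self_stem(probe) + 100.0 * cross_hits


def select_probe(target, others, probe_len=20, generations=200, seed=0):
    """(1+1)-ES: mutate the start position with a Gaussian step, keep the child if not worse."""
    rng = random.Random(seed)
    max_start = len(target) - probe_len
    start = rng.randint(0, max_start)
    sigma = max(1.0, max_start / 4)
    best_cost = probe_cost(target[start:start + probe_len], others)
    for _ in range(generations):
        child = int(round(start + rng.gauss(0, sigma)))
        child = min(max(child, 0), max_start)
        cost = probe_cost(target[child:child + probe_len], others)
        if cost <= best_cost:
            start, best_cost = child, cost
            sigma *= 1.2  # widen the search after a success
        else:
            sigma = max(1.0, sigma * 0.9)  # narrow it after a failure
    return target[start:start + probe_len], best_cost


# Hypothetical exon and one hypothetical off-target sequence.
exon = "ATGGCTAGCTAGGCTTACGGATCCGATTACGCGTTAGGCCATACGATCGGCTA"
probe, cost = select_probe(exon, ["TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA"])
print(probe, round(cost, 2))
```

A real probe designer would replace the exact-match check with a similarity search and the Tm formula with a nearest-neighbor thermodynamic model; the sketch only shows how the three criteria combine into one cost that an Evolution Strategy can minimize.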
Abstract:
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two windows of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. (C) 2004 Wiley-Liss, Inc.
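The two quantities this abstract relies on, the 25 window inputs and the accuracy over the best L/2 or L/10 predictions, can be sketched as follows. This is only an illustration under stated assumptions: the correlation matrix is taken as given, out-of-range window positions are padded with zero (the abstract does not say how sequence ends are handled), and the function names are hypothetical.

```python
def window_features(corr, i, j, half=2, pad=0.0):
    """Return the 25 correlation values between all residue pairs drawn from
    the 5-residue windows centred on i and j. Positions falling off the ends
    of the sequence are filled with `pad` (an assumption of this sketch)."""
    n = len(corr)
    feats = []
    for a in range(i - half, i + half + 1):
        for b in range(j - half, j + half + 1):
            inside = 0 <= a < n and 0 <= b < n
            feats.append(corr[a][b] if inside else pad)
    return feats


def top_fraction_accuracy(scores, contacts, seq_len, divisor):
    """Fraction of true contacts among the best seq_len // divisor predictions.

    scores   -- dict mapping (i, j) residue pairs to predicted contact scores
    contacts -- set of (i, j) pairs that are true contacts
    """
    k = max(1, seq_len // divisor)
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(pair in contacts for pair in ranked) / len(ranked)


# Toy example with a hypothetical 6-residue protein.
corr = [[0.1 * abs(a - b) for b in range(6)] for a in range(6)]
print(len(window_features(corr, 2, 3)))  # 25 inputs per residue pair

scores = {(0, 3): 0.9, (1, 4): 0.8, (2, 5): 0.4, (0, 5): 0.1}
contacts = {(0, 3), (2, 5)}
print(top_fraction_accuracy(scores, contacts, seq_len=6, divisor=2))
```

With divisor 2 the three highest-scoring pairs are evaluated, and with divisor 10 only the single best pair, which is why the stricter cutoff typically yields the higher accuracy figure.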
Abstract:
Cystic fibrosis is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which encodes a chloride channel present in many cells. In cardiomyocytes, we report that multiple exon 1 usage and alternative splicing produce four CFTR transcripts, with different 5'-untranslated regions, CFTRTRAD-139, CFTR-1C/-1A, CFTR-1C, and CFTR-1B. CFTR transcripts containing the novel upstream exons (exons -1C, -1B, and -1A) represent more than 90% of cardiac expressed CFTR mRNA. Regulation of cardiac CFTR expression, in response to developmental and pathological stimuli, is exclusively due to the modulation of CFTR-1C and CFTR-1C/-1A expression. Upstream open reading frames have been identified in the 5'-untranslated regions of all CFTR transcripts that, in conjunction with adjacent stem-loop structures, modulate the efficiency of translation initiation at the AUG codon of the main CFTR coding region in CFTRTRAD-139 and CFTR-1C/-1A transcripts. Exon(-1A), only present in CFTR-1C/-1A transcripts, encodes an AUG codon that is in-frame with the main CFTR open reading frame, the efficient translation of which produces a novel CFTR protein isoform with a curtailed amino terminus. As the expression of this CFTR transcript parallels the spatial and temporal distribution of the cAMP-activated whole-cell current density in normal and diseased hearts, we suggest that CFTR-1C/-1A provides the molecular basis for the cardiac cAMP-activated chloride channel. Our findings provide further insight into the complex nature of in vivo CFTR expression, to which multiple mRNA transcripts, protein isoforms, and post-transcriptional regulatory mechanisms are now added.
Abstract:
Eukaryotic gene expression, reflected in the amount of steady-state mRNA, is regulated at the post-transcriptional level. The 5'-untranslated regions (5'-UTRs) of some transcripts contain cis-acting elements, including upstream open reading frames (uORFs), that have been identified as being fundamental in modulating translation efficiency and mRNA stability. Previously, we demonstrated that uORFs present in the 5'-UTR of cystic fibrosis transmembrane conductance regulator (CFTR) transcripts expressed in the heart were able to modulate translation efficiency of the main CFTR ORF. Here, we show that the same 5'-UTR elements are associated with the differential stability of the 5'-UTR compared to the main coding region of CFTR transcripts. Furthermore, these post-transcriptional mechanisms are important factors governing regulated CFTR expression in the heart, in response to developmental and pathophysiological stimuli. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
The PotE protein is a putrescine-ornithine antiporter found in many gram-negative bacteria. It is a member of the APA family of transporters and has 12 predicted alpha-helical transmembrane spanning segments (TMS). While the substrate binding site has previously been mapped to a region near the surface of the cytoplasmic lipid layer, no structural feature within the periplasmic domains of PotE has been shown to be important for function. We examined the role of the only large outer loop, situated between transmembrane spanning segments 7 and 8, in putrescine uptake. Deletion of the highly conserved amino acids in the region closest to transmembrane spanning segment 7 produced a protein with little activity. Glycine-scanning mutagenesis of this region showed that Val(249) and Leu(254) were required for optimal transporter function. The V249G mutant transported putrescine at a lower maximal rate compared to wild-type (WT) but with the same substrate binding affinity. In contrast, the L254G mutant had a higher substrate affinity. A series of Val(249) mutants indicated that the hydrophobicity of this residue, which is located at or near the membrane surface, is important for PotE function. Secondary structure predictions of the large outer loop indicated the presence of a hydrophobic alpha-helix in the centre with a hydrophobic region at each end, suggesting that the loop was not entirely exposed to the aqueous periplasmic space. The study shows that loop 7-8 is important for PotE function, possibly by forming a re-entrant loop in the channel of the transporter. (C) 2003 Elsevier Ltd. All rights reserved.
Abstract:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
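The contact number defined in this abstract (the number of C-beta atoms from other residues inside a sphere centred on a residue's own C-beta atom) is simple to compute from coordinates. The sketch below is a minimal illustration; the abstract does not state the sphere radius, so the 10 Å default is an assumption, and the function name is hypothetical.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Count, for each residue, the C-beta atoms of other residues lying
    within `radius` of its own C-beta atom.

    cb_coords -- list of (x, y, z) C-beta coordinates, one per residue
    radius    -- sphere radius in the same units (10 is an assumed default)
    """
    n = len(cb_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(cb_coords[i], cb_coords[j]) <= radius:
                # A contact is symmetric, so it counts for both residues.
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy coordinates: three residues in a row and one far away.
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (30, 0, 0)]
print(contact_numbers(coords, radius=5.0))  # [1, 2, 1, 0]
```

Buried residues end up with high counts and exposed residues with low counts, which is why the quantity serves as a measure of exposure. The observed values computed this way are what a sequence-based regression model would be trained to predict.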
Abstract:
To better understand the evolution of mitochondrial (mt) genomes in the Acari (mites and ticks), we sequenced the mt genome of the chigger mite, Leptotrombidium pallidum (Arthropoda: Acari: Acariformes). This genome is highly rearranged relative to that of the hypothetical ancestor of the arthropods and the other species of Acari studied. The mt genome of L. pallidum has two genes for large subunit rRNA, a pseudogene for small subunit rRNA, and four nearly identical large noncoding regions. Nineteen of the 22 tRNAs encoded by this genome apparently lack either a T-arm or a D-arm. Further, the mt genome of L. pallidum has two distantly separated sections with identical sequences but opposite orientations of transcription. This arrangement cannot be accounted for by homologous recombination or by previously known mechanisms of mt gene rearrangement. The most plausible explanation for the origin of this arrangement is illegitimate inter-mtDNA recombination, which has not been reported previously in animals. In light of the evidence from previous experiments on recombination in nuclear and mt genomes of animals, we propose a model of illegitimate inter-mtDNA recombination to account for the novel gene content and gene arrangement in the mt genome of L. pallidum.
Abstract:
In humans, a polymorphic gene encodes the drug-metabolizing enzyme NAT1 (arylamine N-acetyltransferase Type 1), which is widely expressed throughout the body. While the protein-coding region of NAT1 is contained within a single exon, examination of the human EST (expressed sequence tag) database at the NCBI revealed the presence of nine separate exons, eight of which were located in the 5' non-coding region of NAT1. Differential splicing produced at least eight unique mRNA isoforms that could be grouped according to the location of the first exon, which suggested that NAT1 expression occurs from three alternative promoters. Using RT (reverse transcriptase)-PCR, we identified one major transcript in various epithelial cells derived from different tissues. In contrast, multiple transcripts were observed in blood-derived cell lines (CEM, THP-1 and Jurkat), with a novel variant, not identified in the EST database, found in CEM cells only. The major splice variant increased gene expression 9-11-fold in a luciferase reporter assay, while the other isoforms gave expression similar to or slightly greater than the control. We examined the upstream region of the most active splice variant in a promoter-reporter assay, and isolated a 257 bp sequence that produced maximal promoter activity. This sequence lacked a TATA box, but contained a consensus Sp1 site and a CAAT box, as well as several other putative transcription-factor-binding sites. Cell-specific expression of the different NAT1 transcripts may contribute to the variation in NAT1 activity in vivo.
Abstract:
Motivation: Targeting peptides direct nascent proteins to their specific subcellular compartment. Knowledge of targeting signals enables informed drug design and reliable annotation of gene products. However, due to the low similarity of such sequences and the dynamical nature of the sorting process, the computational prediction of subcellular localization of proteins is challenging. Results: We contrast the use of feed-forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model. The models are evaluated in terms of performance at the residue level and at the sequence level, and we demonstrate that recurrent networks improve the overall prediction performance. Compared to the original results reported for TargetP, an ensemble of the tested models increases the accuracy by 6 and 5% on non-plant and plant data, respectively.
Abstract:
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed-forward networks, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
Abstract:
Recently, we identified a large number of ultraconserved (uc) sequences in noncoding regions of human, mouse, and rat genomes that appear to be essential for vertebrate and amniote ontogeny. Here, we used similar methods to identify ultraconserved genomic regions between the insect species Drosophila melanogaster and Drosophila pseudoobscura, as well as the more distantly related Anopheles gambiae. As with vertebrates, ultraconserved sequences in insects appear to occur primarily in intergenic and intronic sequences, and at intron-exon junctions. The sequences are significantly associated with genes encoding developmental regulators and transcription factors, but are less frequent and are smaller in size than in vertebrates. The longest identical, nongapped orthologous match between the three genomes was found within the homothorax (hth) gene. This sequence spans an internal exon-intron junction, with the majority located within the intron, and is predicted to form a highly stable stem-loop RNA structure. Real-time quantitative PCR analysis of different hth splice isoforms and Northern blotting showed that the conserved element is associated with a high incidence of intron retention in hth pre-mRNA, suggesting that the conserved intronic element is critically important in the post-transcriptional regulation of hth expression in Diptera.
Abstract:
Insoluble expression of heterologous proteins in Escherichia coli is a major bottleneck of many structural genomics and high-throughput protein biochemistry projects. Many of these proteins may be amenable to refolding, but their identification is hampered by a lack of high-throughput methods. We have developed a matrix-assisted refolding approach in which correctly folded proteins are distinguished from misfolded proteins by their elution from affinity resin under nondenaturing conditions. Misfolded proteins remain adhered to the resin, presumably via hydrophobic interactions. The assay can be applied to insoluble proteins on an individual basis but is particularly well suited for high-throughput applications because it is rapid, automatable and has no rigorous sample preparation requirements. The efficacy of the screen is demonstrated on small-scale expression samples for 15 proteins. Refolding is then validated by large-scale expressions using SEC and circular dichroism.