6 resultados para structure determination
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The study of protein fold is a central problem in life science, leading in the last years to several attempts for improving our knowledge of the protein structures. In this thesis this challenging problem is tackled by means of molecular dynamics, chirality and NMR studies. In the last decades, many algorithms were designed for the protein secondary structure assignment, which reveals the local protein shape adopted by segments of amino acids. In this regard, the use of local chirality for the protein secondary structure assignment was demonstreted, trying to correlate as well the propensity of a given amino acid for a particular secondary structure. The protein fold can be studied also by Nuclear Magnetic Resonance (NMR) investigations, finding the average structure adopted from a protein. In this context, the effect of Residual Dipolar Couplings (RDCs) in the structure refinement was shown, revealing a strong improvement of structure resolution. A wide extent of this thesis is devoted to the study of avian prion protein. Prion protein is the main responsible of a vast class of neurodegenerative diseases, known as Bovine Spongiform Encephalopathy (BSE), present in mammals, but not in avian species and it is caused from the conversion of cellular prion protein to the pathogenic misfolded isoform, accumulating in the brain in form of amiloyd plaques. In particular, the N-terminal region, namely the initial part of the protein, is quite different between mammal and avian species but both of them contain multimeric sequences called Repeats, octameric in mammals and hexameric in avians. However, such repeat regions show differences in the contained amino acids, in particular only avian hexarepeats contain tyrosine residues. The chirality analysis of avian prion protein configurations obtained from molecular dynamics reveals a high stiffness of the avian protein, which tends to preserve its regular secondary structure. This is due to the presence of prolines, histidines and especially tyrosines, which form a hydrogen bond network in the hexarepeat region, only possible in the avian protein, and thus probably hampering the aggregation.
Resumo:
Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.
Resumo:
The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it became urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures came out when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to which extent the characteristic path length and clustering coefficient of the protein contacts network are values that reveal characteristic features of protein contact maps. Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. A commonly adopted procedure for annotating protein sequences relies on the "inheritance through homology" based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which had been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but that do not necessarily share the same function, to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validate a system that contributes to sequence annotation by taking advantage of a validated transfer through inheritance procedure of the molecular functions and of the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicity answers to the problem of multi-domain proteins annotation and allows a fine grain division of the whole set of proteomes used, that ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins when present can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences considering information available in the present data bases of molecular functions and structures.
Resumo:
"Bioactive compounds" are extranutritional constituents that typically occur in small quantities in food. They are being intensively studied to evaluate their effects on health. Bioactive compounds include both water soluble compounds, such as phenolics, and lipidic substances such as n-3 fatty acids, tocopherols and sterols. Phenolic compounds, tocopherols and sterols are present in all plants and have been studied extensively in cereals, nuts and oil. n-3 fatty acids are present in fish and all around the vegetable kingdom. The aim of the present work was the determination of bioactive and potentially toxic compounds in cereal based foods and nuts. The first section of this study was focused on the determination of bioactive compounds in cereals. Because of that the different forms of phytosterols were investigated in hexaploid and tetraploid wheats. Hexaploid cultivars were the best source of esterified sterols (40.7% and 37.3% of total sterols for Triticum aestivum and Triticum spelta, respectively). Significant amounts of free sterols (65.5% and 60.7% of total sterols for Triticum durum and Triticum dicoccon, respectively) were found in the tetraploid cultivars. Then, free and bound phenolic compounds were identified in barley flours. HPLCESI/ MSD analysis in negative and positive ion mode established that barley free flavan-3- ols and proanthocyanidins were four dimers and four trimers having (epi)catechin and/or (epi)gallocatechin (C and/or GC) subunits. Hydroxycinnamic acids and their derivatives were the main bound phenols in barley flours. The results obtained demonstrated that barley flours were rich in phenolic compounds that showed high antioxidant activity. The study also examined the relationships between phenolic compounds and lipid oxidation of bakery. To this purpose, the investigated barley flours were used in the bakery production. The formulated oven products presented an interesting content of phenolic compounds, but they were not able to contain the lipid oxidation. Furthermore, the influence of conventional packaging on lipid oxidation of pasta was evaluated in n-3 enriched spaghetti and egg spaghetti. The results proved that conventional packaging was not appropriated to preserve pasta from lipid oxidation; in fact, pasta that was exposed to light showed a high content of potentially toxic compounds derived from lipid oxidation (such as peroxide, oxidized fatty acids and COPs). In the second section, the content of sterols, phenolic compounds, n-3 fatty acids and tocopherols in walnuts were reported. Rapid analytical techniques were used to analyze the lipid fraction and to characterize phenolic compounds in walnuts. Total lipid chromatogram was used for the simultaneous determination of the profile of sterols and tocopherols. Linoleic and linolenic acids were the most representative n-6 and n-3 essential dietary fatty acids present in these nuts. Walnuts contained substantial amounts of γ- and δ-tocopherol, which explained their antioxidant properties. Sitosterol, Δ5-avenasterol and campesterol were the major free sterols found. Capillary electrophoresis coupled to DAD and microTOF was utilized to determine phenolic content of walnut. A new compound in walnut ((2E,4E)- 8-hydroxy-2,7-dimethyl-2,4-decadiene-1,10-dioic acid 6-O-β-D-glucopiranosyl ester, [M−H]− 403.161m/z) with a structure similar to glansreginins was also identified. Phenolic compounds corresponded to 14–28% of total polar compounds quantified. Aglycone and glycosylated ellagic acid represented the principal components and account for 64–75% of total phenols in walnuts. However, the sum of glansreginins A, B and ((2E,4E)-8-hydroxy- 2,7-dimethyl-2,4-decadiene-1,10-dioic acid 6-O-β-D-glucopiranosyl ester was in the range of 72–86% of total quantified compounds.
Resumo:
From the late 1980s, the automation of sequencing techniques and the computer spread gave rise to a flourishing number of new molecular structures and sequences and to proliferation of new databases in which to store them. Here are presented three computational approaches able to analyse the massive amount of publicly avalilable data in order to answer to important biological questions. The first strategy studies the incorrect assignment of the first AUG codon in a messenger RNA (mRNA), due to the incomplete determination of its 5' end sequence. An extension of the mRNA 5' coding region was identified in 477 in human loci, out of all human known mRNAs analysed, using an automated expressed sequence tag (EST)-based approach. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2 and the consequences for the functional studies are discussed. The second approach analyses the codon bias, the phenomenon in which distinct synonymous codons are used with different frequencies, and, following integration with a gene expression profile, estimates the total number of codons present across all the expressed mRNAs (named here "codonome value") in a given biological condition. Systematic analyses across different pathological and normal human tissues and multiple species shows a surprisingly tight correlation between the codon bias and the codonome bias. The third approach is useful to studies the expression of human autism spectrum disorder (ASD) implicated genes. ASD implicated genes sharing microRNA response elements (MREs) for the same microRNA are co-expressed in brain samples from healthy and ASD affected individuals. The different expression of a recently identified long non coding RNA which have four MREs for the same microRNA could disrupt the equilibrium in this network, but further analyses and experiments are needed.
Resumo:
The Large Magellanic Cloud (LMC) is widely considered as the first step of the cosmological distance ladder, since it contains many different distance indicators. An accurate determination of the distance to the LMC allows one to calibrate these distance indicators that are then used to measure the distance to far objects. The main goal of this thesis is to study the distance and structure of the LMC, as traced by different distance indicators. For these purposes three types of distance indicators were chosen: Classical Cepheids,``hot'' eclipsing binaries and RR Lyrae stars. These objects belong to different stellar populations tracing, in turn, different sub-structures of the LMC. The RR Lyrae stars (age >10 Gyr) are distributed smoothly and likely trace the halo of the LMC. Classical Cepheids are young objects (age 50-200 Myr), mainly located in the bar and spiral arm of the galaxy, while ``hot'' eclipsing binaries mainly trace the star forming regions of the LMC. Furthermore, we have chosen these distance indicators for our study, since the calibration of their zero-points is based on fundamental geometric methods. The ESA cornerstone mission Gaia, launched on 19 December 2013, will measure trigonometric parallaxes for one billion stars with an accuracy of 20 micro-arcsec at V=15 mag, and 200 micro-arcsec at V=20 mag, thus will allow us to calibrate the zero-points of Classical Cepheids, eclipsing binaries and RR Lyrae stars with an unprecedented precision.