11 resultados para PROTEIN PRECIPITATION METHODS
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it became urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures came out when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to which extent the characteristic path length and clustering coefficient of the protein contacts network are values that reveal characteristic features of protein contact maps. Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. A commonly adopted procedure for annotating protein sequences relies on the "inheritance through homology" based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which had been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but that do not necessarily share the same function, to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validate a system that contributes to sequence annotation by taking advantage of a validated transfer through inheritance procedure of the molecular functions and of the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicity answers to the problem of multi-domain proteins annotation and allows a fine grain division of the whole set of proteomes used, that ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins when present can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences considering information available in the present data bases of molecular functions and structures.
Resumo:
Motivation An actual issue of great interest, both under a theoretical and an applicative perspective, is the analysis of biological sequences for disclosing the information that they encode. The development of new technologies for genome sequencing in the last years, opened new fundamental problems since huge amounts of biological data still deserve an interpretation. Indeed, the sequencing is only the first step of the genome annotation process that consists in the assignment of biological information to each sequence. Hence given the large amount of available data, in silico methods became useful and necessary in order to extract relevant information from sequences. The availability of data from Genome Projects gave rise to new strategies for tackling the basic problems of computational biology such as the determination of the tridimensional structures of proteins, their biological function and their reciprocal interactions. Results The aim of this work has been the implementation of predictive methods that allow the extraction of information on the properties of genomes and proteins starting from the nucleotide and aminoacidic sequences, by taking advantage of the information provided by the comparison of the genome sequences from different species. In the first part of the work a comprehensive large scale genome comparison of 599 organisms is described. 2,6 million of sequences coming from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover the adopted similarity threshold produced clusters that are homogeneous on the structural point of view and that can be used for structural annotation of uncharacterized sequences. The second part of the work focuses on the characterization of thermostable proteins and on the development of tools able to predict the thermostability of a protein starting from its sequence. By means of Principal Component Analysis the codon composition of a non redundant database comprising 116 prokaryotic genomes has been analyzed and it has been showed that a cross genomic approach can allow the extraction of common determinants of thermostability at the genome level, leading to an overall accuracy in discriminating thermophilic coding sequences equal to 95%. This result outperform those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in the field of protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts in technological applications. A Support Vector Machine based method has been trained to predict if a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.
Resumo:
A systematic characterization of the composition and structure of the bacterial cell-surface proteome and its complexes can provide an invaluable tool for its comprehensive understanding. The knowledge of protein complexes composition and structure could offer new, more effective targets for a more specific and consequently effective immune response against a complex instead of a single protein. Large-scale protein-protein interaction screens are the first step towards the identification of complexes and their attribution to specific pathways. Currently, several methods exist for identifying protein interactions and protein microarrays provide the most appealing alternative to existing techniques for a high throughput screening of protein-protein interactions in vitro under reasonably straightforward conditions. In this study approximately 100 proteins of Group A Streptococcus (GAS) predicted to be secreted or surface exposed by genomic and proteomic approaches were purified in a His-tagged form and used to generate protein microarrays on nitrocellulose-coated slides. To identify protein-protein interactions each purified protein was then labeled with biotin, hybridized to the microarray and interactions were detected with Cy3-labelled streptavidin. Only reciprocal interactions, i. e. binding of the same two interactors irrespective of which of the two partners is in solid-phase or in solution, were taken as bona fide protein-protein interactions. Using this approach, we have identified 20 interactors of one of the potent toxins secreted by GAS and known as superantigens. Several of these interactors belong to the molecular chaperone or protein folding catalyst families and presumably are involved in the secretion and folding of the superantigen. In addition, a very interesting interaction was found between the superantigen and the substrate binding subunit of a well characterized ABC transporter. This finding opens a new perspective on the current understanding of how superantigens are modified by the bacterial cell in order to become major players in causing disease.
Resumo:
We present a study of the metal sites of different proteins through X-ray Absorption Fine Structure (XAFS) spectroscopy. First of all, the capabilities of XAFS analysis have been improved by ab initio simulation of the near-edge region of the spectra, and an original analysis method has been proposed. The method subsequently served ad a tool to treat diverse biophysical problems, like the inhibition of proton-translocating proteins by metal ions and the matrix effect exerted on photosynthetic proteins (the bacterial Reaction Center, RC) by strongly dehydrate sugar matrices. A time-resolved study of Fe site of RC with μs resolution has been as well attempted. Finally, a further step aimed to improve the reliability of XAFS analysis has been performed by calculating the dynamical parameters of the metal binding cluster by means of DFT methods, and the theoretical result obtained for MbCO has been successfully compared with experimental data.
Resumo:
The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of a bigger and yet unsolved problem of protein structure prediction. Improvement in the prediction of disulfide bonding states of cysteine residues will help in putting a constraint in the three dimensional (3D) space of the respective protein structure, and thus will eventually help in the prediction of 3D structure of proteins. Results of this work will have direct implications in site-directed mutational studies of proteins, proteins engineering and the problem of protein folding. We have used a combination of Artificial Neural Network (ANN) and Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN) as a machine learning technique to develop our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs and considering Eukaryotes and Prokaryotes separately we have reached to a remarkable accuracy of 94% on cysteine basis for both Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on protein basis for Eukaryotic dataset and Prokaryotic dataset respectively. These accuracies are best so far ever reached by any existing prediction methods, and thus our prediction method has outperformed all the previously developed approaches and therefore is more reliable. Most interesting part of this thesis work is the differences in the prediction performances of Eukaryotes and Prokaryotes at the basic level of input coding when ‘profile’ information was given as input to our prediction method. And one of the reasons for this we discover is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, where as Prokaryotic bonded examples lack it.
Resumo:
Nano(bio)science and nano(bio)technology play a growing and tremendous interest both on academic and industrial aspects. They are undergoing rapid developments on many fronts such as genomics, proteomics, system biology, and medical applications. However, the lack of characterization tools for nano(bio)systems is currently considered as a major limiting factor to the final establishment of nano(bio)technologies. Flow Field-Flow Fractionation (FlFFF) is a separation technique that is definitely emerging in the bioanalytical field, and the number of applications on nano(bio)analytes such as high molar-mass proteins and protein complexes, sub-cellular units, viruses, and functionalized nanoparticles is constantly increasing. This can be ascribed to the intrinsic advantages of FlFFF for the separation of nano(bio)analytes. FlFFF is ideally suited to separate particles over a broad size range (1 nm-1 μm) according to their hydrodynamic radius (rh). The fractionation is carried out in an empty channel by a flow stream of a mobile phase of any composition. For these reasons, fractionation is developed without surface interaction of the analyte with packing or gel media, and there is no stationary phase able to induce mechanical or shear stress on nanosized analytes, which are for these reasons kept in their native state. Characterization of nano(bio)analytes is made possible after fractionation by interfacing the FlFFF system with detection techniques for morphological, optical or mass characterization. For instance, FlFFF coupling with multi-angle light scattering (MALS) detection allows for absolute molecular weight and size determination, and mass spectrometry has made FlFFF enter the field of proteomics. Potentialities of FlFFF couplings with multi-detection systems are discussed in the first section of this dissertation. The second and the third sections are dedicated to new methods that have been developed for the analysis and characterization of different samples of interest in the fields of diagnostics, pharmaceutics, and nanomedicine. The second section focuses on biological samples such as protein complexes and protein aggregates. In particular it focuses on FlFFF methods developed to give new insights into: a) chemical composition and morphological features of blood serum lipoprotein classes, b) time-dependent aggregation pattern of the amyloid protein Aβ1-42, and c) aggregation state of antibody therapeutics in their formulation buffers. The third section is dedicated to the analysis and characterization of structured nanoparticles designed for nanomedicine applications. The discussed results indicate that FlFFF with on-line MALS and fluorescence detection (FD) may become the unparallel methodology for the analysis and characterization of new, structured, fluorescent nanomaterials.
Resumo:
The subject of this Ph.D. research thesis is the development and application of multiplexed analytical methods based on bioluminescent whole-cell biosensors. One of the main goals of analytical chemistry is multianalyte testing in which two or more analytes are measured simultaneously in a single assay. The advantages of multianalyte testing are work simplification, high throughput, and reduction in the overall cost per test. The availability of multiplexed portable analytical systems is of particular interest for on-field analysis of clinical, environmental or food samples as well as for the drug discovery process. To allow highly sensitive and selective analysis, these devices should combine biospecific molecular recognition with ultrasensitive detection systems. To address the current need for rapid, highly sensitive and inexpensive devices for obtaining more data from each sample,genetically engineered whole-cell biosensors as biospecific recognition element were combined with ultrasensitive bioluminescence detection techniques. Genetically engineered cell-based sensing systems were obtained by introducing into bacterial, yeast or mammalian cells a vector expressing a reporter protein whose expression is controlled by regulatory proteins and promoter sequences. The regulatory protein is able to recognize the presence of the analyte (e.g., compounds with hormone-like activity, heavy metals…) and to consequently activate the expression of the reporter protein that can be readily measured and directly related to the analyte bioavailable concentration in the sample. Bioluminescence represents the ideal detection principle for miniaturized analytical devices and multiplexed assays thanks to high detectability in small sample volumes allowing an accurate signal localization and quantification. In the first chapter of this dissertation is discussed the obtainment of improved bioluminescent proteins emitting at different wavelenghts, in term of increased thermostability, enhanced emission decay kinetic and spectral resolution. The second chapter is mainly focused on the use of these proteins in the development of whole-cell based assay with improved analytical performance. In particular since the main drawback of whole-cell biosensors is the high variability of their analyte specific response mainly caused by variations in cell viability due to aspecific effects of the sample’s matrix, an additional bioluminescent reporter has been introduced to correct the analytical response thus increasing the robustness of the bioassays. The feasibility of using a combination of two or more bioluminescent proteins for obtaining biosensors with internal signal correction or for the simultaneous detection of multiple analytes has been demonstrated by developing a dual reporter yeast based biosensor for androgenic activity measurement and a triple reporter mammalian cell-based biosensor for the simultaneous monitoring of two CYP450 enzymes activation, involved in cholesterol degradation, with the use of two spectrally resolved intracellular luciferases and a secreted luciferase as a control for cells viability. In the third chapter is presented the development of a portable multianalyte detection system. In order to develop a portable system that can be used also outside the laboratory environment even by non skilled personnel, cells have been immobilized into a new biocompatible and transparent polymeric matrix within a modified clear bottom black 384 -well microtiter plate to obtain a bioluminescent cell array. The cell array was placed in contact with a portable charge-coupled device (CCD) light sensor able to localize and quantify the luminescent signal produced by different bioluminescent whole-cell biosensors. This multiplexed biosensing platform containing whole-cell biosensors was successfully used to measure the overall toxicity of a given sample as well as to obtain dose response curves for heavy metals and to detect hormonal activity in clinical samples (PCT/IB2010/050625: “Portable device based on immobilized cells for the detection of analytes.” Michelini E, Roda A, Dolci LS, Mezzanotte L, Cevenini L , 2010). At the end of the dissertation some future development steps are also discussed in order to develop a point of care (POCT) device that combine portability, minimum sample pre-treatment and highly sensitive multiplexed assays in a short assay time. In this POCT perspective, field-flow fractionation (FFF) techniques, in particular gravitational variant (GrFFF) that exploit the earth gravitational field to structure the separation, have been investigated for cells fractionation, characterization and isolation. Thanks to the simplicity of its equipment, amenable to miniaturization, the GrFFF techniques appears to be particularly suited for its implementation in POCT devices and may be used as pre-analytical integrated module to be applied directly to drive target analytes of raw samples to the modules where biospecifc recognition reactions based on ultrasensitive bioluminescence detection occurs, providing an increase in overall analytical output.
Resumo:
Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.
Resumo:
Bioinformatics, in the last few decades, has played a fundamental role to give sense to the huge amount of data produced. Obtained the complete sequence of a genome, the major problem of knowing as much as possible of its coding regions, is crucial. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As it has been recently pointed out by the Critical Assessment of Function Annotations (CAFA), most accurate methods are those based on the transfer-by-homology approach and the most incisive contribution is given by cross-genome comparisons. In the present thesis it is described a non-hierarchical sequence clustering method for protein automatic large-scale annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 millions protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three dimensional structure (when a template is available). This is possible by the way of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate including the prediction of Magnesium binding sites in human proteins, the ABC transporters superfamily classification and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, as a web server for the functional and structural protein sequence annotation, BAR+ is freely available at http://bar.biocomp.unibo.it/bar2.0.
Resumo:
Recent advances in the fast growing area of therapeutic/diagnostic proteins and antibodies - novel and highly specific drugs - as well as the progress in the field of functional proteomics regarding the correlation between the aggregation of damaged proteins and (immuno) senescence or aging-related pathologies, underline the need for adequate analytical methods for the detection, separation, characterization and quantification of protein aggregates, regardless of the their origin or formation mechanism. Hollow fiber flow field-flow fractionation (HF5), the miniaturized version of FlowFFF and integral part of the Eclipse DUALTEC FFF separation system, was the focus of this research; this flow-based separation technique proved to be uniquely suited for the hydrodynamic size-based separation of proteins and protein aggregates in a very broad size and molecular weight (MW) range, often present at trace levels. HF5 has shown to be (a) highly selective in terms of protein diffusion coefficients, (b) versatile in terms of bio-compatible carrier solution choice, (c) able to preserve the biophysical properties/molecular conformation of the proteins/protein aggregates and (d) able to discriminate between different types of protein aggregates. Thanks to the miniaturization advantages and the online coupling with highly sensitive detection techniques (UV/Vis, intrinsic fluorescence and multi-angle light scattering), HF5 had very low detection/quantification limits for protein aggregates. Compared to size-exclusion chromatography (SEC), HF5 demonstrated superior selectivity and potential as orthogonal analytical method in the extended characterization assays, often required by therapeutic protein formulations. In addition, the developed HF5 methods have proven to be rapid, highly selective, sensitive and repeatable. HF5 was ideally suitable as first dimension of separation of aging-related protein aggregates from whole cell lysates (proteome pre-fractionation method) and, by HF5-(UV)-MALS online coupling, important biophysical information on the fractionated proteins and protein aggregates was gathered: size (rms radius and hydrodynamic radius), absolute MW and conformation.
Resumo:
Nowadays, in developed countries, the excessive food intake, in conjunction with a decreased physical activity, has led to an increase in lifestyle-related diseases, such as obesity, cardiovascular diseases, type -2 diabetes, a range of cancer types and arthritis. The socio-economic importance of such lifestyle-related diseases has encouraged countries to increase their efforts in research, and many projects have been initiated recently in research that focuses on the relationship between food and health. Thanks to these efforts and to the growing availability of technologies, the food companies are beginning to develop healthier food. The necessity of rapid and affordable methods, helping the food industries in the ingredient selection has stimulated the development of in vitro systems that simulate the physiological functions to which the food components are submitted when administrated in vivo. One of the most promising tool now available appears the in vitro digestion, which aims at predicting, in a comparative way among analogue food products, the bioaccessibility of the nutrients of interest.. The adoption of the foodomics approach has been chosen in this work to evaluate the modifications occurring during the in vitro digestion of selected protein-rich food products. The measure of the proteins breakdown was performed via NMR spectroscopy, the only techniques capable of observing, directly in the simulated gastric and duodenal fluids, the soluble oligo- and polypeptides released during the in vitro digestion process. The overall approach pioneered along this PhD work, has been discussed and promoted in a large scientific community, with specialists networked under the INFOGEST COST Action, which recently released a harmonized protocol for the in vitro digestion. NMR spectroscopy, when used in tandem with the in vitro digestion, generates a new concept, which provides an additional attribute to describe the food quality: the comparative digestibility, which measures the improvement of the nutrients bioaccessibility.