7 results for local sequence alignment problem
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
Construction of multiple sequence alignments is a fundamental task in bioinformatics. Multiple sequence alignments are used as a prerequisite in many bioinformatics methods, and consequently the quality of such methods can be critically dependent on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments. Therefore, there is a need for an objective approach for evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of the parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered: those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores correlated highly with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. The comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased, while the same level of accuracy was maintained.
To conclude, we have shown that conservation can be successfully used in the statistical model for alignment quality assessment and in the estimation of emission probabilities in the profile hidden Markov models.
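As a rough illustration of what a positional conservation score can look like, the sketch below scores each alignment column as one minus its normalized Shannon entropy. This is a commonly used measure chosen here as an assumption; the abstract does not state which statistical measures the thesis actually developed, and the function names, the 20-letter alphabet size and the gap symbol are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20, gap="-"):
    """Return 1 - normalized Shannon entropy of one column (gaps ignored).

    Assumption: entropy-based scoring stands in for the thesis's own measures.
    A fully conserved column scores 1.0; an all-gap column scores 0.0.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)


def alignment_conservation(alignment):
    """Score every column of an alignment given as equal-length strings."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) iterates over the alignment column by column.
    return [column_conservation(col) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "A-F"]  # hypothetical toy data
    print(alignment_conservation(toy_alignment))
```

Ignoring gaps is one possible design choice; a score that penalizes gapped columns would be equally defensible and would change the ranking of positions.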
Abstract:
It has long been known that amino acids are the building blocks for proteins and govern their folding into specific three-dimensional structures. However, the details of this process are still unknown and represent one of the main problems in structural bioinformatics, which is a highly active research area with the focus on the prediction of three-dimensional structure and its relationship to protein function. The protein structure prediction procedure encompasses several different steps from searches and analyses of sequences and structures, through sequence alignment to the creation of the structural model. Careful evaluation and analysis ultimately result in a hypothetical structure, which can be used to study biological phenomena in, for example, research at the molecular level, biotechnology and especially in drug discovery and development. In this thesis, the structures of five proteins were modeled with template-based methods, which use proteins with known structures (templates) to model related or structurally similar proteins. The resulting models were an important asset for the interpretation and explanation of biological phenomena, such as amino acids and interaction networks that are essential for the function and/or ligand specificity of the studied proteins. The five proteins represent different case studies with their own challenges, such as varying template availability, which resulted in a different structure prediction process for each. This thesis presents the techniques and considerations that should be taken into account in the modeling procedure to overcome limitations and produce a hypothetical and reliable three-dimensional structure.
As each project shows, the reliability is highly dependent on the extensive incorporation of experimental data or known literature and, although experimental verification of in silico results is always desirable to increase the reliability, the presented projects show that experimental studies can also benefit greatly from structural models. With the help of in silico studies, the experiments can be targeted and precisely designed, thereby saving both money and time. As the programs used in structural bioinformatics are constantly improved and the range of templates increases through structural genomics efforts, the mutual benefits between in silico and experimental studies become even more prominent. Hence, reliable models for protein three-dimensional structures achieved through careful planning and thoughtful execution are, and will continue to be, valuable and indispensable sources of structural information to be combined with functional data.
Abstract:
Avidins (Avds) are homotetrameric or homodimeric glycoproteins with typically less than 130 amino acid residues per monomer. They form a highly stable, non-covalent complex with biotin (vitamin H) with Kd = 10⁻¹⁵ M (for chicken Avd). The best-studied Avds are the chicken Avd from Gallus gallus and streptavidin from Streptomyces avidinii, although other Avd studies have also included Avds from various origins, e.g., from frogs, fishes, mushrooms and from many different bacteria. Several engineered Avds have been reported as well, e.g., dual-chain Avds (dcAvds) and single-chain Avds (scAvds), circular permutants with up to four simultaneously modifiable ligand-binding sites. These engineered Avds along with the many native Avds have the potential to be used in various nanobiotechnological applications. In this study, we made a structure-based alignment representing all currently available sequences of Avds and studied the evolutionary relationship of Avds using phylogenetic analysis. First, we created an initial multiple sequence alignment of Avds using 42 closely related sequences, guided by the known Avd crystal structures. Next, we searched for non-redundant Avd sequences from various online databases, including National Centre for Biotechnology Information and the Universal Protein Resource; the identified sequences were added to the initial alignment to expand it to a final alignment of 242 Avd sequences. The MEGA software package was used to create distance matrices and a phylogenetic tree.
Bootstrap reproducibility of the tree was poor at multiple nodes and may reflect several possible issues with the data: the sequence length compared is relatively short and, whereas some positions are highly conserved and functional, others can vary without impinging on the structure or the function, so there are few informative sites; it may be that periods of rapid duplication have led to paralogs and that the differences among them are within the error limit of the data; and there may be other yet unknown reasons. Principal component analysis applied to alternative distance data did segregate the major groups, and its success is likely due to the multivariate consideration of all the information. Furthermore, based on our extensive alignment and phylogenetic analysis, we expressed two novel Avds, lacavidin from Latrodectus hesperus, a western black widow spider, and hoefavidin from Hoeflea phototrophica, an aerobic marine bacterium, the ultimate aim being to determine their X-ray structures. These Avds were selected because of their unique sequences: lacavidin has an N-terminal Avd-like domain but a long C-terminal overhang, whereas hoefavidin was thought to be a dimeric Avd. Both these Avds could be used as novel scaffolds in biotechnological applications.
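To make the notion of a distance matrix computed from an alignment concrete, the following minimal sketch computes pairwise p-distances (the proportion of differing sites) with pairwise deletion of gapped positions. This is an assumption for illustration only: the abstract does not say which distance model was selected in MEGA, and the function names and toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites among positions ungapped in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: skip sites with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no shared ungapped sites to compare")
    return differing / compared


def distance_matrix(alignment):
    """Symmetric matrix of pairwise p-distances for a list of aligned strings."""
    n = len(alignment)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(alignment[i], alignment[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "A-GF"]  # hypothetical toy data
    for row in distance_matrix(toy_alignment):
        print(row)
```

With short sequences such as these, a single substitution shifts the distance substantially, which is one way to see why few informative sites can make tree nodes hard to reproduce under bootstrapping.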
Abstract:
Glutathione transferases (GSTs) are a diverse family of enzymes that catalyze the glutathione-dependent detoxification of toxic compounds. GSTs are responsible for the conjugation of the tripeptide glutathione (GSH) to a wide range of electrophilic substrates. These include industrial pollutants, drugs, genotoxic carcinogen metabolites, antibiotics, insecticides and herbicides. In light of applications in biomedicine and biotechnology as cellular detoxification agents, detailed structural and functional studies of GSTs are required. Plant tau class GSTs play crucial catalytic and non-catalytic roles in the cellular xenobiotic detoxification process in agronomically important crops. The abundance of GSTs in Glycine max and their ability to provide resistance to abiotic and biotic stresses, such as herbicide tolerance, are of great interest in agriculture because they provide effective and suitable tools for selective weed control. Structural and catalytic studies on tau class GST isoenzymes from Glycine max (GmGSTU10-10, GmGSTU chimeric clone 14 (Sh14), and GmGSTU2-2) were performed. Crystal structures of GmGSTU10-10 in complex with glutathione sulfenic acid (GSOH) and Sh14 in complex with S-(p-nitrobenzyl)-glutathione (Nb-GSH) were determined by molecular replacement at 1.6 Å and 1.75 Å, respectively. Major structural variations that affect substrate recognition and catalytic mechanism were revealed in the upper part of helix H4 and helix H9 of GmGSTU10-10. Structural analysis of Sh14 showed that the Trp114Cys point mutation is responsible for the enhanced catalytic activity of the enzyme. Furthermore, two salt bridges that trigger an allosteric effect between the H-sites were identified at the dimer interface between Glu66 and Lys104. The 3D structure of GmGSTU2-2 was predicted using homology modeling.
Structural and phylogenetic analysis suggested that GmGSTU2-2 shares residues that are crucial for the catalytic activity of other tau class GSTs: Phe10, Trp11, Ser13, Arg20, Tyr30, Leu37, Lys40, Lys53, Ile54, Glu66 and Ser67. This indicates that the catalytic and ligand binding sites in GmGSTU2-2 are well conserved. Nevertheless, a significant variation was observed at the ligandin binding site. Tyr32 is replaced by Ser32 in GmGSTU2-2, and this may affect the ligand recognition and binding properties of GmGSTU2-2. Moreover, docking studies revealed important amino acid residues in the hydrophobic binding site that can affect the substrate specificity of the enzyme. Phe10, Pro12, Phe15, Leu37, Phe107, Trp114, Trp163, Phe208, Ile212, and Phe216 could form the hydrophobic ligand binding site and bind fluorodifen. Additionally, side chains of Arg111 and Lys215 could stabilize the binding through hydrogen bonds with the –NO2 groups of fluorodifen. The GST gene family from the pathogenic soil bacterium Agrobacterium tumefaciens C58 was characterized, and eight GST-like proteins in A. tumefaciens (AtuGSTs) were identified. Phylogenetic analysis revealed that four members of AtuGSTs belong to a previously recognized bacterial beta GST class and one member to the theta class. Nevertheless, three AtuGSTs do not belong to any previously known GST classes. The 3D structures of AtuGSTs were predicted using homology modeling. Comparative structural and sequence analysis of the AtuGSTs showed local sequence and structural characteristics between different GST isoenzymes and classes. Interactions at the G-site are conserved; however, significant variations were seen at the active site and the H5b helix at the C-terminal domain. H5b contributes to the formation of the hydrophobic ligand binding site and is responsible for recognition of the electrophilic moiety of the xenobiotic. It is noted that the position of H5b varies among models, thus providing different specificities.
Moreover, AtuGSTs appear to form functional dimers through diverse modes. AtuGST1, AtuGST3, AtuGST4 and AtuGST8 use hydrophobic ‘lock–and–key’-like motifs whereas the dimer interface of AtuGST2, AtuGST5, AtuGST6 and AtuGST7 is dominated by polar interactions. These results suggested that AtuGSTs could be involved in a broad range of biological functions including stress tolerance and detoxification of toxic compounds.
Abstract:
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both Hough transform and Gabor filtering. A modified Hough transform technique is also presented where the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform employing parallel processing and local information are introduced.
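For readers unfamiliar with the Hough transform mentioned above, the sketch below shows the standard line-detection form, in which each edge point votes for every (theta, rho) line passing through it. It is only the textbook version, not the modified or parallel variants the thesis proposes, and the function names, angular resolution and bin width are hypothetical choices.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes for lines x*cos(theta) + y*sin(theta) = rho.

    Returns a Counter keyed by (theta index, rho bin). The angular
    resolution and rho bin width are illustrative assumptions.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes


def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote count) for the accumulator cell with most votes."""
    votes = hough_lines(points, n_theta, rho_step)
    (t, rho_bin), count = votes.most_common(1)[0]
    return math.pi * t / n_theta, rho_bin * rho_step, count


if __name__ == "__main__":
    # Hypothetical edge points lying on the horizontal line y = 5.
    edge_points = [(x, 5) for x in range(0, 100, 5)]
    print(strongest_line(edge_points))
```

Every point visits every angle, so the cost grows with the number of points times the angular resolution. That cost is what motivates the kind of reduction the abstract attributes to parallel processing and local information.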
Abstract:
The usage of digital content, such as video clips and images, has increased dramatically during the last decade. Local image features have been applied increasingly in various image and video retrieval applications. This thesis evaluates local features and applies them to image and video processing tasks. The results of the study show that 1) the performance of different local feature detector and descriptor methods varies significantly in object class matching, 2) local features can be applied in image alignment with results superior to the state of the art, 3) the local feature based shot boundary detection method produces promising results, and 4) the local feature based hierarchical video summarization method points to a promising new research direction. In conclusion, this thesis presents local features as a powerful tool in many applications, and the immediate future work should concentrate on improving the quality of the local features.
Abstract:
Apoptotic beta cell death is a major underlying cause of type I diabetes and, to a lesser extent, of type II diabetes. Recently, MST1 kinase was identified as a key apoptotic agent in the diabetic condition. In this study, I have examined MST1 and the closely related kinases MST2, MST3 and MST4, aiming to tackle diabetes by exploring ways to selectively block MST1 kinase activity. The first investigation was directed towards evaluating the possibility of selectively blocking the ATP binding site of MST1 kinase, which is essential for the activity of the enzyme. Structure and sequence analyses of this site, however, revealed near-absolute conservation among the MSTs and very few changes relative to other kinases. The observed residue variations also displayed similar physicochemical properties, making selective inhibition of the enzyme difficult. Second, possibilities for allosteric inhibition of the enzyme were evaluated. Analysis of the recognized allosteric site posed the same problem, as the MSTs share almost all of the same residues. The third analysis was made on the SARAH domain, which is required for the dimerization and activation of MST1 and MST2 kinases. MST3 and MST4 lack this domain, hence selectivity against these two kinases can be achieved. Other proteins with SARAH domains, such as the RASSF proteins, were also examined. Their interactions with the MST1 SARAH domain were evaluated in order to mimic their binding pattern and design a peptide inhibitor that interferes with MST1 SARAH dimerization. In molecular simulations, the RASSF5 SARAH domain was shown to interact strongly with the MST1 SARAH domain and possibly to prevent MST1 SARAH dimerization. Based on this, it was suggested that the peptidic inhibitor be based on the sequence of the RASSF5 SARAH domain. Since the MST2 kinase also interacts with the RASSF5 SARAH domain, absolute selectivity might not be achieved.