971 results for SEQUENCE ALIGNMENT
Abstract:
Background: The genome of a wide variety of prokaryotes contains the luxS gene homologue, which encodes the protein S-ribosylhomocysteine lyase (LuxS). This protein is responsible for the production of the quorum sensing molecule AI-2 and has been implicated in a variety of functions such as flagellar motility, metabolic regulation, toxin production and even pathogenicity. The LuxS structures determined from a few species show high structural similarity. In this study, we have modelled the structures from several other species and have investigated their dimer interfaces. We have attempted to correlate the interface features of LuxS with the phenotypic nature of the organisms. Results: Protein structure networks (PSNs) are constructed and graph theoretical analysis is performed on the structures obtained from X-ray crystallography and on the modelled ones. The interfaces, which are known to contain the active site, are characterized from the PSNs of these homodimeric proteins. The key features presented by the protein interfaces are investigated for the classification of the proteins in relation to their function. From our analysis, structural interface motifs are identified for each class in our dataset, which showed distinctly different patterns at the interface of LuxS for the probiotics and some extremophiles. Our analysis also reveals potential sites of mutation and geometric patterns at the interface that were not evident from conventional sequence alignment studies. Conclusion: The structure network approach employed in this study for the analysis of dimeric interfaces in LuxS has brought out certain structural details at the side-chain interaction level, which were elusive from the conventional structure comparison methods. The results from this study provide a better understanding of the relation between the luxS gene and its functional role in the prokaryotes. 
This study also makes it possible to explore the potential direction towards the design of inhibitors of LuxS and thus towards a wide range of antimicrobials.
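As a rough illustration of the network idea in the abstract above, the sketch below builds a residue contact graph for a dimer and lists the edges that cross the chain boundary. It is a simplified stand-in, not the authors' PSN construction: the abstract describes networks at the side-chain interaction level, whereas this sketch assumes one representative coordinate per residue and a hypothetical distance cutoff of 6.5 Å. The function names and the toy coordinates are illustrative only.

```python
from itertools import combinations
from math import dist


def build_contact_network(residues, cutoff=6.5):
    """Return edges (i, j) between residues whose representative points
    lie within `cutoff` of each other.

    residues: list of (chain_id, residue_number, (x, y, z)) tuples.
    The cutoff value is an assumption made for this sketch.
    """
    edges = set()
    for i, j in combinations(range(len(residues)), 2):
        if dist(residues[i][2], residues[j][2]) <= cutoff:
            edges.add((i, j))
    return edges


def interface_edges(residues, edges):
    """Keep only edges that connect residues from different chains."""
    return [
        (residues[i][:2], residues[j][:2])
        for i, j in sorted(edges)
        if residues[i][0] != residues[j][0]
    ]


# Toy homodimer with invented coordinates (not real LuxS data).
toy_dimer = [
    ("A", 10, (0.0, 0.0, 0.0)),
    ("A", 11, (3.8, 0.0, 0.0)),
    ("B", 10, (0.0, 5.0, 0.0)),
    ("B", 11, (20.0, 20.0, 20.0)),
]
toy_edges = build_contact_network(toy_dimer)
toy_interface = interface_edges(toy_dimer, toy_edges)
```

Interface residues found this way could then be compared across species, for example by counting how many cross-chain edges each residue takes part in.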
Abstract:
Previous microarray analyses identified 22 microRNAs (miRNAs) differentially expressed in paired ectopic and eutopic endometrium of women with and without endometriosis. To investigate further the role of these miRNAs in women with endometriosis, we conducted an association study aiming to explore the relationship between endometriosis risk and single-nucleotide polymorphisms (SNPs) in miRNA target sites for these differentially expressed miRNAs. A panel of 102 SNPs in the predicted miRNA binding sites was evaluated for an endometriosis association study and an ingenuity pathway analysis was performed. Fourteen rare variants were identified in this study. We found that SNP rs14647 in the Wolf-Hirschhorn syndrome candidate gene 1 (WHSC1) 3'UTR (untranslated region) was associated with endometriosis-related infertility, with an odds ratio of 12.2 (95% confidence interval = 2.4-60.7, P = 9.03 × 10^-5). SNP haplotype AGG in the solute carrier family 22, member 23 (SLC22A23) 3'UTR was associated with endometriosis-related infertility and more severe disease. With the individual genotyping data, ingenuity pathways analysis identified the tumour necrosis factor and cyclin-dependent kinase inhibitor as major factors in the molecular pathways. Significant associations between WHSC1 alleles and endometriosis-related infertility, and between SLC22A23 haplotypes and severe disease stage, were identified. These findings may help focus future research on subphenotypes of this disease. Replication studies in independent large sample sets are needed to confirm and characterize the involvement of the gene variation in the pathogenesis of endometriosis.
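The odds ratio and confidence interval reported above are standard summary statistics for a 2x2 association table. The sketch below shows one common way to compute them, the log-odds (Woolf) method. The abstract does not state which method the study used, and the counts in the example are invented for illustration; they are not the study's data.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval (Woolf method).

    a: cases carrying the allele      b: cases without the allele
    c: controls carrying the allele   d: controls without the allele
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example_or, example_low, example_high = odds_ratio_ci(10, 20, 5, 40)
```

With small counts the interval becomes very wide, which is consistent with the broad interval (2.4-60.7) quoted in the abstract.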
Abstract:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering is an important research problem because, as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in the vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand-crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns. 
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
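The abstract above says the similarity metric is based on edit distance. A minimal sketch of the classic Levenshtein distance follows, written so that it works on token lists as well as on strings. The thesis may use a weighted or otherwise modified variant; the unit costs and the example token sequences here are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists).

    Insertions, deletions and substitutions each cost 1; this unit-cost
    scheme is an assumption, not necessarily the thesis's metric.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Two hypothetical answer contexts, tokenised into words and placeholders.
context_1 = ["QWORD", "was", "born", "in", "ANSWER"]
context_2 = ["QWORD", "born", "in", "ANSWER", "."]
context_distance = edit_distance(context_1, context_2)
```

Pairwise distances of this kind can feed a hierarchical clustering step, which is the other technique the abstract names.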
Abstract:
Interferon-induced transmembrane protein 5, or bone-restricted ifitm-like gene (Bril), was first identified as a bone gene in 2008, although no in vivo role was identified at that time. A role in human bone has now been demonstrated, with a number of recent studies identifying a single point mutation in Bril as the causative mutation in osteogenesis imperfecta type V (OI type V). Such a discovery suggests a key role for Bril in skeletal regulation, and the completely novel nature of the gene raises the possibility of a new regulatory pathway in bone. Furthermore, the phenotype of OI type V has unique and quite divergent features compared with other forms of OI involving defects in collagen biology. Currently it appears that the underlying genetic defect in OI type V may be unrelated to collagen regulation, which also raises interesting questions about the classification of this form of OI. This review will discuss current knowledge of OI type V, the function of Bril, and the implications of this recent discovery.
Abstract:
Mycobacterium leprae recA harbors an in-frame insertion sequence that encodes an intein homing endonuclease (PI-MleI). Most inteins (intein endonucleases) possess two conserved LAGLIDADG (DOD) motifs at their active center. A common feature of LAGLIDADG-type homing endonucleases is that they recognize and cleave the same or very similar DNA sequences. However, PI-MleI is distinct from other members of the family of LAGLIDADG-type HEases in its modular structure, with functionally separable domains for DNA binding and cleavage, each with distinct sequence preferences. Sequence alignment analyses of PI-MleI revealed three putative LAGLIDADG motifs; however, the bioinformatics data conflict with regard to their identity and specific location within the intein polypeptide. To resolve this conflict and to determine the active-site residues essential for DNA target site recognition and double-stranded DNA cleavage, we performed site-directed mutagenesis of presumptive catalytic residues in the LAGLIDADG motifs. Analysis of target DNA recognition and kinetic parameters of the wild-type PI-MleI and its variants disclosed that the two amino acid residues, Asp(122) (in Block C) and Asp(193) (in functional Block E), are crucial to the double-stranded DNA endonuclease activity, whereas Asp(218) (in pseudo-Block E) is not. However, despite the reduced catalytic activity, the PI-MleI variants, like the wild-type PI-MleI, generated a footprint of the same length around the insertion site. The D122T variant showed significantly reduced catalytic activity, and the D122A and D193A mutations, although they did not affect DNA-binding affinity, abolished the double-stranded DNA cleavage activity. On the other hand, the D122C variant showed approximately twofold higher double-stranded DNA cleavage activity compared with the wild-type PI-MleI. 
These results provide compelling evidence that Asp(122) and Asp(193) in DOD motif I and II, respectively, are bona fide active-site residues essential for DNA cleavage activity. The implications of these results are discussed in this report.
Abstract:
Background: An overwhelming majority of the Serine/Threonine protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases. This is because the Hanks and Hunter classification scheme was developed from eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. A large dataset of prokaryotic Serine/Threonine protein kinases recognized from genomes of prokaryotes has been used to develop a classification framework for prokaryotic Ser/Thr protein kinases. Methodology/Principal Findings: We have used traditional sequence alignment and phylogenetic approaches and clustered the prokaryotic kinases, which represent 72 subfamilies with at least 4 members in each. Such a clustering enables classification of prokaryotic Ser/Thr kinases and it can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases. After a series of searches in a comprehensive sequence database we recognized that 38 subfamilies of prokaryotic protein kinases are associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla proteobacteria, cyanobacteria and actinobacteria, respectively. Similarly, subfamilies which are specific to an order, sub-order, class, family and genus have also been identified. In addition to these, we also identify organism-diverse subfamilies. Members of these clusters are from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. Conclusion/Significance: Interestingly, the occurrence of several taxonomic level specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur diversely in several eukaryotes. 
Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization which indicates a degree of complexity and protein-protein interactions in the signaling pathways in these microbes.
Abstract:
Pulicat Lake sediments are often severely polluted with the toxic heavy metal mercury. Several mercury-resistant strains of Bacillus species were isolated from the sediments and all the isolates exhibited broad spectrum resistance (resistance to both organic and inorganic mercuric compounds). A plasmid curing assay showed that all the isolated Bacillus strains carry chromosomally borne mercury resistance. Polymerase chain reaction and Southern hybridization analyses using merA and merB3 gene primers/probes showed that five of the isolated Bacillus strains carry sequences similar to known merA and merB3 genes. Results of multiple sequence alignment revealed 99% similarity with merA and merB3 of TnMERI1 (class II transposons). Other mercury-resistant Bacillus species lacking homology to these genes were not able to volatilize mercuric chloride, indicating the presence of other modes of resistance to mercuric compounds.
Abstract:
Background: HU, a small, basic, histone-like protein, is a major component of the bacterial nucleoid. E. coli has two subunits of HU coded by the hupA and hupB genes, whereas Mycobacterium tuberculosis (Mtb) has only one subunit of HU, coded by ORF Rv2986c (hupB gene). One noticeable feature of Mtb HupB, based on sequence alignment of HU orthologs from different bacteria, was that HupB(Mtb) bears a highly basic extension at its C-terminal end, and this prompted an examination of its role in Mtb HupB function. Methodology/Principal Findings: With this objective, two clones of Mtb HupB were generated: one expressing the full-length HupB protein (HupB(Mtb)) and another which expresses only the N-terminal region (first 95 amino acids) of hupB (HupB(MtbN)). Gel retardation assays revealed that HupB(MtbN) is almost like E. coli HU (heat stable nucleoid protein) in terms of its DNA binding, with a binding constant (Kd) for linear dsDNA greater than 1000 nM, a value comparable to that obtained for the HU alpha alpha and HU alpha beta forms. However, the CTR (C-terminal region) of HupB(Mtb) imparts greater specificity in DNA binding. The HupB(Mtb) protein binds more strongly to supercoiled plasmid DNA than to linear DNA, and this binding is very stable, as it provides DNase I protection even up to 5 minutes. Similar results were obtained when the abilities of both proteins to mediate protection against DNA strand cleavage by hydroxyl radicals generated by the Fenton reaction were compared. It was also observed that both proteins have a DNA binding preference for A:T-rich DNA, which may occur at the regulatory regions of ORFs and the oriC region of Mtb. Conclusions/Significance: These data thus indicate that HupB(Mtb) may participate in chromosome organization in vivo; it may also play a passive, possibly architectural, role.
Abstract:
The Basic Local Alignment Search Tool (BLAST) is one of the most widely used sequence alignment programs with which similarity searches, for both protein and nucleic acid sequences, can be performed against large databases at high speed. A large number of tools exist for processing BLAST output, but none of them provide three-dimensional structure visualization. This shortcoming has been addressed in the proposed tool BLAST Server for Structural Biologists (BSSB), which maps a BLAST output onto the three-dimensional structure of the subject protein. The three-dimensional structure of the subject protein is represented using a three-color coding scheme (identical: red; similar: yellow; and mismatch: white) based on the pairwise alignment obtained. Thus, the user will be able to visualize a possible three-dimensional structure for the query protein sequence. This information can be used to gain a deeper insight into the sequence-structure correlation. Furthermore, the additional structure-level information enables the user to make coherent and logical decisions regarding the type of input model structure or fragment that can be used for molecular replacement calculations. This tool is freely available to all users at http://bioserver1.physics.iisc.ernet.in/bssb/.
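The three-color scheme described above (identical: red; similar: yellow; mismatch: white) can be sketched as a mapping from a pairwise alignment onto subject residue positions. The sketch below is not the BSSB implementation. It assumes the alignment is supplied as three equal-length strings in the style of BLAST's text output, where the middle line carries the residue letter for identities, `+` for similar residues and a space otherwise. The function name, the handling of gaps and the toy alignment are assumptions.

```python
def color_subject_residues(query_aln, midline, subject_aln, subject_start=1):
    """Map each aligned subject residue number to a display color.

    query_aln, midline, subject_aln: equal-length alignment strings.
    subject_start: residue number of the first aligned subject residue.
    Columns where the subject has a gap are skipped, since there is no
    residue in the structure to color. Columns where the query has a gap
    are treated as mismatches (an assumption of this sketch).
    """
    colors = {}
    position = subject_start
    for q, m, s in zip(query_aln, midline, subject_aln):
        if s == "-":
            continue
        if q == "-" or m == " ":
            colors[position] = "white"   # mismatch or unaligned
        elif m == "+":
            colors[position] = "yellow"  # similar residue
        else:
            colors[position] = "red"     # identical residue
        position += 1
    return colors


# Toy alignment (invented, six columns).
toy_colors = color_subject_residues("MKV-LA", "MK+ L ", "MKIGLS")
```

The resulting dictionary could be passed to any molecular viewer that accepts per-residue colors.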
Abstract:
The three-dimensional structure of a protein provides major insights into its function. Protein structure comparison has implications in functional and evolutionary studies. A structural alphabet (SA) is a library of local protein structure prototypes that can abstract every part of protein main chain conformation. Protein Blocks (PBs) is a widely used SA, composed of 16 prototypes, each representing a pentapeptide backbone conformation defined in terms of dihedral angles. Through this description, the 3D structural information can be translated into a 1D sequence of PBs. In a previous study, we used this approach to compare protein structures encoded in terms of PBs. A classical sequence alignment procedure based on dynamic programming was used, with a dedicated PB Substitution Matrix (SM). The PB-based pairwise structural alignment method gave an excellent performance when compared to other established methods for mining. In this study, we have (i) refined the SMs and (ii) improved the Protein Block Alignment methodology (named iPBA). The SM was normalized with regard to sequence and structural similarity. Alignment of protein structures often involves similar structural regions separated by dissimilar stretches. A dynamic programming algorithm that weighs these local similar stretches has been designed. Amino acid substitution scores were also coupled linearly with the PB substitutions. With iPBA, (i) the mining efficiency rate improves by 6.8% and (ii) more than 82% of the alignments have a better quality. A higher efficiency in aligning multi-domain proteins could also be demonstrated. The quality of alignment is better than DALI and MUSTANG in 81.3% of the cases. Thus our study has resulted in an impressive improvement in the quality of protein structural alignment. (C) 2011 Elsevier Masson SAS. All rights reserved.
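To make the dynamic programming step above concrete, the sketch below scores a local alignment of two proteins that are each described by a PB string and an amino acid string, with the amino acid score added linearly to the PB score. It is a plain Smith-Waterman recurrence, not the iPBA algorithm: the real method uses a dedicated PB substitution matrix and a scheme that weighs locally similar stretches, neither of which is specified in the abstract. The match/mismatch values, the `weight` parameter and the gap penalty are all placeholders.

```python
def local_alignment_score(pb_a, aa_a, pb_b, aa_b, weight=0.5, gap=-2.0):
    """Best local alignment score between two PB-encoded proteins.

    pb_a, pb_b: Protein Block strings (letters a-p), one per protein.
    aa_a, aa_b: amino acid strings of the same lengths as pb_a and pb_b.
    The column score is pb_score + weight * aa_score (linear coupling).
    All scoring values are toy placeholders, not published matrices.
    """
    def pb_score(x, y):
        return 2.0 if x == y else -1.0

    def aa_score(x, y):
        return 1.0 if x == y else -1.0

    best = 0.0
    prev = [0.0] * (len(pb_b) + 1)
    for i in range(1, len(pb_a) + 1):
        curr = [0.0]
        for j in range(1, len(pb_b) + 1):
            s = (pb_score(pb_a[i - 1], pb_b[j - 1])
                 + weight * aa_score(aa_a[i - 1], aa_b[j - 1]))
            cell = max(
                0.0,              # start a new local alignment
                prev[j - 1] + s,  # align the two positions
                prev[j] + gap,    # gap in the second protein
                curr[j - 1] + gap,  # gap in the first protein
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


identical_score = local_alignment_score("dfkl", "ACDE", "dfkl", "ACDE")
unrelated_score = local_alignment_score("aaaa", "AAAA", "bbbb", "CCCC")
```

Replacing the two toy scoring functions with lookups into real PB and amino acid substitution matrices would bring the sketch closer to the approach the abstract describes.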
Abstract:
The role of FIC (Filamentation induced by cAMP) domain containing proteins in the regulation of many vital pathways, mostly through the transfer of NMPs from NTPs to specific target proteins (NMPylation), in microorganisms, higher eukaryotes, and plants is emerging. The identity and function of the FIC domain containing protein of the human pathogen Mycobacterium tuberculosis remain unknown. In this regard, the M. tuberculosis fic gene (Mtfic) was cloned, overexpressed, and purified to homogeneity for its biochemical characterisation. It has the characteristic FIC motif, HPFREGNGRSTR (HPFxxGNGRxxR), spanning residues 144 to 155. Neither the His-tagged nor the GST-tagged MtFic protein, overexpressed in Escherichia coli, nor expression of Mtfic in Mycobacterium smegmatis, yielded the protein in the soluble fraction. However, the maltose binding protein (MBP) tagged MtFic (MBP-MtFic) could be obtained partly in the soluble fraction. The cloned, overexpressed, and purified recombinant MBP-MtFic showed conversion of ATP, GTP, CTP, and UTP into AMP, GMP, CMP, and UMP, respectively. Sequence alignment with several FIC motif containing proteins, complemented with homology modeling on the FIC motif containing protein VbhT of Bartonella schoenbuchensis as the template, showed conservation and interaction of residues constituting the FIC domain. Site-specific mutagenesis of His144, Glu148, or Asn150 of the FIC motif, or of the Arg87 residue that constitutes the FIC domain, or complete deletion of the FIC motif, abolished the NTP to NMP conversion activity. The design of an NMP formation assay using the recombinant, soluble MtFic would enable identification of its target substrate for NMPylation. (C) 2012 Elsevier Inc. All rights reserved.
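The consensus HPFxxGNGRxxR quoted above translates directly into a regular expression, with each `x` matching any residue. The sketch below scans a protein sequence for that pattern and reports 1-based positions. The test sequence is a made-up string that merely embeds the motif from the abstract; it is not the MtFic sequence, and the function name is illustrative.

```python
import re

# HPFxxGNGRxxR, where x is any amino acid.
FIC_MOTIF = re.compile(r"HPF..GNGR..R")


def find_fic_motif(sequence):
    """Return (start, end, matched_text) for each non-overlapping motif hit.

    Positions are 1-based and inclusive, the usual convention for
    residue numbering.
    """
    return [
        (m.start() + 1, m.end(), m.group())
        for m in FIC_MOTIF.finditer(sequence)
    ]


# Hypothetical sequence: two filler residues, the motif, two more residues.
toy_hits = find_fic_motif("MAHPFREGNGRSTRLL")
```

In the real protein, the abstract places the motif at residues 144 to 155, so a hit on the full sequence would be expected to report that span.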
Abstract:
Background: Interactions of non-structural protein 5A (NS5A) of Hepatitis C virus (HCV) with human kinases, namely casein kinase 1 alpha (ck1 alpha) and protein kinase R (PKR), have different functional implications, such as regulation of viral replication and evasion of the interferon-induced immune response, respectively. Understanding the structural and molecular basis of interactions of the viral protein with two different human kinases can be useful in developing strategies for treatment against HCV. Results: Serine 232 of NS5A is known to be phosphorylated by human ck1 alpha. A structural model of an NS5A peptide containing the phosphoacceptor residue Serine 232 bound to ck1 alpha has been generated using the known 3-D structures of kinase-peptide complexes. The substrate-interacting residues in ck1 alpha have been identified from the model and these are found to be well conserved in the ck1 family. The ck1 alpha-substrate peptide complex has also been used to understand the structural basis of association between ck1 alpha and its other viral stress induced substrate, the tumour suppressor p53 transactivation domain, which has a crystal structure available. Interaction of NS5A with another human kinase, PKR, is primarily genotype specific. NS5A from genotype 1b has been shown to interact with and inhibit PKR, whereas NS5A from genotypes 2a/3a is unable to bind and inhibit PKR efficiently. This is one of the main reasons for the varied response to interferon therapy in HCV patients across different genotypes. Using the PKR crystal structure, sequence alignment and evolutionary trace analysis, some of the critical residues responsible for the interaction of NS5A 1b with PKR have been identified. Conclusions: The substrate-interacting residues in ck1 alpha have been identified using the structural model of the kinase-substrate peptide complex. The PKR-interacting NS5A 1b residues have also been predicted using the PKR crystal structure and NS5A sequence analysis along with known experimental results. 
The functional significance and nature of the interaction of the interferon sensitivity determining region and variable region 3 of NS5A in different genotypes with PKR, which were shown experimentally, are also supported by the findings of evolutionary trace analysis. Designing inhibitors to prevent this interaction could enable HCV genotype 1 infected patients to respond well to interferon therapy.
Abstract:
Thiolases are enzymes involved in lipid metabolism. Thiolases remove the acetyl-CoA moiety from 3-ketoacyl-CoAs in the degradative reaction. They can also catalyze the reverse Claisen condensation reaction, which is the first step of biosynthetic processes such as the biosynthesis of sterols and ketone bodies. In humans, six distinct thiolases have been identified. Each of these thiolases differs from the others with respect to sequence, oligomeric state, substrate specificity and subcellular localization. Four sequence fingerprints, identifying catalytic loops of thiolases, have been described. In this study, genome searches of two mycobacterial species (Mycobacterium tuberculosis and Mycobacterium smegmatis) were carried out, using the six human thiolase sequences as queries. Eight and thirteen different thiolase sequences were identified in M. tuberculosis and M. smegmatis, respectively. In addition, thiolase-like proteins (one encoded in the Mtb and two in the Msm genome) were found. The purpose of this study is to classify these mostly uncharacterized thiolases and thiolase-like proteins. Several other sequences obtained by searches of genome databases of bacteria, mammals and the parasitic protist family of the Trypanosomatidae were included in the analysis. Thiolase-like proteins were also found in the trypanosomatid genomes, but not in those of mammals. In order to study the phylogenetic relationships at a high confidence level, additional thiolase sequences were included such that a total of 130 thiolases and thiolase-like protein sequences were used for the multiple sequence alignment. The resulting phylogenetic tree identifies 12 classes of sequences, each possessing a characteristic set of sequence fingerprints for the catalytic loops. From this analysis it is now possible to assign the mycobacterial thiolases to corresponding homologues in other kingdoms of life. 
The results of this bioinformatics analysis also show interesting differences between the distributions of M. tuberculosis and M. smegmatis thiolases over the 12 different classes. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
Amino acid substitution matrices play an essential role in protein sequence alignment, a fundamental task in bioinformatics. The most widely used matrices, such as PAM matrices derived from homologous sequences and BLOSUM matrices derived from aligned segments of PROSITE, did not integrate conformation information in their construction. There are a few structure-based matrices, which are derived from limited structure alignment data. Using the PDB_SELECT and DSSP databases, we create a database of sequence-conformation blocks which explicitly represent the sequence-structure relationship. Members of a block are identical in conformation and highly similar in sequence. From this block database, we derive a conformation-specific amino acid substitution matrix, CBSM60. The matrix shows improved performance in conformational segment search and homolog detection.
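Deriving a substitution matrix from blocks, as described above, usually comes down to comparing how often two residues are observed in the same aligned column with how often they would co-occur by chance. The sketch below follows the generic BLOSUM-style log-odds recipe in half-bit units. It is not the CBSM60 procedure: the abstract does not give the clustering, weighting or conformation handling, so none of that is modelled, and the toy block is invented.

```python
from collections import Counter
from itertools import combinations
from math import log2


def log_odds_matrix(blocks):
    """BLOSUM-style log-odds scores from ungapped aligned blocks.

    blocks: list of blocks, each a list of equal-length sequences.
    Returns {(x, y): score} for every residue pair observed in a column,
    with x <= y alphabetically. Scores are 2 * log2(observed / expected).
    Sequence weighting and clustering are deliberately omitted.
    """
    pair_counts = Counter()
    for block in blocks:
        for column in zip(*block):
            for x, y in combinations(column, 2):
                pair_counts[tuple(sorted((x, y)))] += 1

    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue implied by the pair frequencies.
    background = Counter()
    for (x, y), freq in observed.items():
        if x == y:
            background[x] += freq
        else:
            background[x] += freq / 2
            background[y] += freq / 2

    scores = {}
    for (x, y), freq in observed.items():
        expected = background[x] * background[y]
        if x != y:
            expected *= 2  # unordered pair can arise in two orders
        scores[(x, y)] = 2 * log2(freq / expected)
    return scores


# One invented block of three two-residue segments.
toy_scores = log_odds_matrix([["AL", "AL", "AI"]])
```

In a conformation-specific setting, one matrix of this kind could be computed per conformational class, but that extension is an assumption about how the idea might be applied rather than a description of the published method.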