831 results for Klebsiella pneumoniae genome sequence
Abstract:
Motivation A current issue of great interest, from both a theoretical and an applied perspective, is the analysis of biological sequences to uncover the information they encode. The development of new genome sequencing technologies in recent years has opened new fundamental problems, since huge amounts of biological data still await interpretation. Indeed, sequencing is only the first step of the genome annotation process, which consists in assigning biological information to each sequence. Hence, given the large amount of available data, in silico methods have become useful and necessary for extracting relevant information from sequences. The availability of data from genome projects has given rise to new strategies for tackling the basic problems of computational biology, such as determining the three-dimensional structures of proteins, their biological functions and their reciprocal interactions. Results The aim of this work has been the implementation of predictive methods that allow the extraction of information on the properties of genomes and proteins starting from nucleotide and amino acid sequences, by taking advantage of the information provided by comparing genome sequences from different species. In the first part of the work, a comprehensive large-scale genome comparison of 599 organisms is described. 2.6 million sequences from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover, the adopted similarity threshold produced clusters that are homogeneous from the structural point of view and that can be used for structural annotation of uncharacterized sequences.
The second part of the work focuses on the characterization of thermostable proteins and on the development of tools able to predict the thermostability of a protein starting from its sequence. By means of Principal Component Analysis, the codon composition of a non-redundant database comprising 116 prokaryotic genomes has been analyzed, and it has been shown that a cross-genomic approach can allow the extraction of common determinants of thermostability at the genome level, leading to an overall accuracy in discriminating thermophilic coding sequences equal to 95%. This result outperforms those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in the field of protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts in technological applications. A Support Vector Machine based method has been trained to predict whether a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.
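As a rough illustration of the input to the codon-composition analysis described above, the following minimal sketch computes the relative frequency of each of the 64 codons in a coding sequence. The function name, the toy sequence and the choice to skip ambiguous codons are assumptions made for this example; the abstract does not describe the thesis's actual preprocessing, and the Principal Component Analysis step itself is not shown.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence yields a vector of equal length.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Codons containing characters other than A, C, G, T are ignored
    (an assumption of this sketch, not a detail taken from the abstract).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("no unambiguous codons found")
    return {c: counts[c] / total for c in CODONS}


# Hypothetical toy sequence: ATG GCT GCT AAA TAA
toy_cds = "ATGGCTGCTAAATAA"
freqs = codon_frequencies(toy_cds)
```

Stacking one such 64-dimensional vector per genome (or per coding sequence) would give the kind of matrix on which a principal component analysis can be run.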
Abstract:
This PhD thesis is the result of my research activity over the last three years. My main research interest was centered on the evolution of the mitochondrial genome (mtDNA), and on its usefulness as a phylogeographic and phylogenetic marker at different taxonomic levels in different taxa of Metazoa. From a methodological standpoint, my main effort was dedicated to the sequencing of complete mitochondrial genomes, and the approach to whole-genome sequencing was based on the application of Long-PCR and shotgun sequencing. Moreover, this research project is part of a larger sequencing project of mtDNAs in many different metazoan taxa, and I mostly dedicated myself to sequencing and analyzing mtDNAs in selected taxa of bivalves and hexapods (Insecta). Sequences of bivalve mtDNAs are particularly limited, and my study contributed to extending the sampling. Moreover, I used the bivalve Musculista senhousia as a model taxon to investigate the molecular mechanisms and the evolutionary significance of its aberrant mode of mitochondrial inheritance (Doubly Uniparental Inheritance, see below). In insects, I focused my attention on the genus Bacillus (Insecta Phasmida). A detailed phylogenetic analysis was performed in order to assess phylogenetic relationships within the genus, and to investigate the placement of Phasmida in the phylogenetic tree of Insecta. The main goal of this part of my study was to add to the taxonomic coverage of sequenced mtDNAs in basal insects, which were only partially analyzed.
Abstract:
Analysis of publicly available genomes of Streptococcus pneumoniae has led to the identification of a new genomic element resembling gram-positive pilus islets (PIs). Here, we demonstrate that this genomic region, herein referred to as PI-2 (containing the genes pitA, sipA, pitB, srtG1, and srtG2), codes for a novel functional pilus in pneumococcus. Therefore, there are two pilus islets identified so far in this pathogen (PI-1 and PI-2). Polymerization of the PI-2 pilus requires the backbone protein PitB as well as the sortase SrtG1 and the signal peptidase-like protein SipA. PI-2 is associated with serotypes 1, 2, 7F, 19A, and 19F, considered to be emerging in both industrialized and developing countries. Interestingly, strains belonging to clonal complex 271 (CC271) contain both PI-1 and PI-2, as revealed by genome analyses. In these strains both pili are surface exposed and independently assembled. Furthermore, in vitro experiments provide evidence that the pilus encoded by PI-2 of S. pneumoniae is involved in adherence. Thus, pneumococci encode at least two types of pili that may play a role in the initial host cell contact in the respiratory tract. In addition, the pilus proteins are potential antigens for inclusion in a new generation of pneumococcal vaccines. Adherence by pili could represent an important factor in bacterial community formation, since it has been demonstrated that bacterial community formation plays an important role in pneumococcal otitis media. In vitro quantification of bacterial community formation by S. pneumoniae was performed in order to investigate the possible role of pneumococcal pili in forming communities. Using different growth media, we were not able to see a clear association between pili and community formation. However, our findings revealed that strains belonging to MLST clonal complex CC15 efficiently form bacterial communities in vitro in a glucose-dependent manner.
We compared the genomes of forty-four pneumococcal isolates and discovered four open reading frames specifically associated with CC15. These four genes are annotated as members of an operon responsible for the biosynthesis of a putative lantibiotic peptide, described to be involved in bacterial community formation. Our experiments show that deletion of the lantibiotic operon affects glucose-mediated community formation in the CC15 strain INV200. Moreover, since glucose consumption during bacterial growth produces an acidic environment, we tested bacterial community formation at different pH values and showed that deletion of the lantibiotic operon affected pH-mediated community formation in the CC15 strain INV200. In conclusion, these data demonstrate that the putative lantibiotic operon is associated with in vitro bacterial community formation in pneumococcal CC15 strains.
Abstract:
Streptococcus pneumoniae is an important life-threatening human pathogen and the causative agent of invasive diseases such as otitis media, pneumonia, sepsis and meningitis, but it is also a common inhabitant of the respiratory tract of children and healthy adults. Like most streptococci, S. pneumoniae decorates its surface with adhesive pili, composed of covalently linked subunits and involved in attachment to epithelial cells and in virulence. The pneumococcal pili are encoded by two genomic regions, pilus islet 1 (PI-1) and pilus islet 2 (PI-2), which are present in about 30% and 16% of the pneumococcal strains, respectively. PI-1 exists in three clonally related variants, whereas PI-2 is highly conserved. The presence of the islets does not correlate with the serotype of the strains, but with the genotype (as determined by Multi Locus Sequence Typing). The prevalence of PI-1 and PI-2 positive strains is similar in isolates from invasive disease and carriage. To better dissect a possible association between PI presence and disease, we evaluated the distribution of the two PIs in a panel of 113 acute otitis media (AOM) clinical isolates from Israel. PI-1 was present in 30.1% (N=34) of the isolates tested, and PI-2 in 7% (N=8). We found that 50% of the PI-1 positive isolates belonged to the international clones Spain9V-3 (ST156) and Taiwan19F-14 (ST236), and that PI-2 was not present in the absence of PI-1. In conclusion, there was no correlation between PI presence and AOM, and, in general, the observed differences in PI prevalence are strictly dependent upon regional differences in the distribution of the clones. Finally, in the AOM collection the prevalence of PI-1 was higher among antibiotic-resistant isolates, confirming previous indications obtained by the in silico analysis of the MLST database collection.
Since the pilus-1 subunits were shown to confer protection in mouse models of infection in both active and passive immunization studies, and were regarded as potential candidates for a new generation of protein-based vaccines, the functional characterization was mainly focused on S. pneumoniae pilus-1 components. The pneumococcal pilus-1 is composed of three subunits, RrgA, RrgB and RrgC, each stabilized by intra-molecular isopeptide bonds and covalently polymerized by means of inter-molecular isopeptide bonds to form an extended fibre. The pilus shaft is a multimeric structure mainly composed of the RrgB backbone subunit. The minor ancillary proteins are located at the tip and at the base of the pilus, where they have been proposed to act as the major adhesin (RrgA) and as the pilus anchor (RrgC), respectively. RrgA is protective in in vivo mouse models, and exists in two variants (clades I and II). Mapping of the sequence variability onto the RrgA structure predicted from X-ray data showed that the diversity was restricted to the “head” of the protein, which contains the putative binding domains, whereas the elongated “stalk” was mostly conserved. To investigate whether this variability could influence the adhesive capacity of RrgA and to map the regions important for binding, two full-length protein variants and three recombinant RrgA portions were tested for adhesion to lung epithelial cells and to purified extracellular matrix (ECM) components. The two RrgA variants displayed similar binding abilities, whereas none of the recombinant fragments adhered at levels comparable to those of the full-length protein, suggesting that proper folding and structural arrangement are crucial to retain protein functionality. Furthermore, the two RrgA variants were shown to be cross-reactive in vitro and cross-protective in vivo in a murine model of passive immunization.
Taken together, these data indicate that the region implicated in adhesion and the functional epitopes responsible for the protective ability of RrgA may be conserved and that the considerable level of variation found within the “head” domain of RrgA may have been generated by immunologic pressure without impairing the functional integrity of the pilus.
Abstract:
The objective of this work is to characterize the genome of chromosome 1 of A. thaliana, a small flowering plant used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. To accomplish this task, I transformed nucleotide sequences into binary sequences based on the definition of three different dichotomic classes. The descriptive analysis of binary strings indicates the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy, i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharya Hellinger Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences, whose existence is a clue of an underlying error detection and correction mechanism. Further studies are certainly needed in order to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and, therefore, contribute to unveiling the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach presented for understanding the management of genetic information.
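The binarization and dependence-measurement steps can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the abstract does not define its three dichotomic classes, so a simple purine/pyrimidine mapping is used here as a hypothetical stand-in, and the function names are invented for the example. The mutual information is estimated between a binary sequence and a lagged copy of itself, using plug-in (empirical) probabilities.

```python
import math
from collections import Counter

# ASSUMPTION: purine (A, G) -> 1, pyrimidine (C, T) -> 0.
# This is a stand-in; the dichotomic classes of the original work are not given in the abstract.
PURINE_MAP = {"A": 1, "G": 1, "C": 0, "T": 0}


def to_binary(seq):
    """Map a nucleotide string to a list of bits, skipping ambiguous symbols."""
    return [PURINE_MAP[b] for b in seq.upper() if b in PURINE_MAP]


def lagged_mutual_information(bits, lag=1):
    """Plug-in estimate (in bits) of the mutual information between x[t] and x[t + lag]."""
    pairs = list(zip(bits, bits[lag:]))
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence too short for the requested lag")
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return mi


# A strictly alternating sequence is fully predictable one step ahead,
# while a constant sequence carries no information at all.
mi_alternating = lagged_mutual_information(to_binary("AC" * 50), lag=1)
mi_constant = lagged_mutual_information(to_binary("A" * 50), lag=1)
```

A value near zero indicates no detectable dependence at that lag, whereas larger values indicate that neighbouring positions are informative about each other. Plug-in estimates are biased upward on short sequences, so a real analysis would need a significance assessment such as the chi-squared test mentioned in the abstract.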
Abstract:
In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the major problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. The present thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is possible by way of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.
Abstract:
The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation makes it possible to obtain deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim is statistical testing, and for modeling these data the Negative Binomial distribution is considered the most adequate one, especially because it allows for "over dispersion". However, the estimation of the dispersion parameter is a very delicate issue because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it is the best one in reaching the nominal value for the type I error while keeping high power. The method is finally illustrated on prostate cancer RNA-seq data.
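To make the notion of over-dispersion concrete, the following minimal sketch evaluates the Negative Binomial probability mass function in the mean/dispersion parametrization widely used for RNA-seq counts, where the variance is mu + phi * mu^2 and therefore exceeds the Poisson variance mu whenever phi > 0. The abstract does not state which parametrization the thesis uses, so this choice, the function name and the numeric values are assumptions for illustration only.

```python
import math


def nb_pmf(k, mu, phi):
    """Negative Binomial P(K = k) with mean mu and dispersion phi (variance mu + phi * mu**2)."""
    r = 1.0 / phi          # "size" parameter
    p = r / (r + mu)       # success probability
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


# Hypothetical gene with mean count 10 and dispersion 0.5.
mu, phi = 10.0, 0.5

# Numerically recover the first two moments by summing over a wide support.
support = range(0, 2000)
probs = [nb_pmf(k, mu, phi) for k in support]
mean = sum(k * q for k, q in zip(support, probs))
variance = sum((k - mean) ** 2 * q for k, q in zip(support, probs))
```

With these values the variance comes out at 60, six times the Poisson variance of 10, which shows how strongly even a moderate dispersion inflates the spread of the counts and why its estimate matters for the subsequent tests.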
Abstract:
The potential for mitochondrial (mt) DNA mutation accumulation during antiretroviral therapy (ART), and preferential accumulation in patients with lipoatrophy compared with control participants, remains controversial. We sequenced the entire mitochondrial genome, both before ART and after ART exposure, in 29 human immunodeficiency virus (HIV)-infected Swiss HIV Cohort Study participants initiating a first-line thymidine analogue-containing ART regimen. No accumulation of mtDNA mutations or deletions was detected in 13 participants who developed lipoatrophy or in 16 control participants after significant and comparable ART exposure (median duration, 3.3 and 3.7 years, respectively). In HIV-infected persons, the development of lipoatrophy is unlikely to be associated with accumulation of mtDNA mutations detectable in peripheral blood.
Abstract:
Cytochrome P450 enzymes (CYP450s) represent a superfamily of haem-thiolate proteins. CYP450s are most abundant in the liver, a major site of drug metabolism, and play key roles in the metabolism of a variety of substrates, including drugs and environmental contaminants. Interaction of two or more different drugs with the same enzyme can account for adverse effects and failure of therapy. Human CYP3A4 metabolizes about 50% of all known drugs, but little is known about the orthologous CYP450s in horses. We report here the genomic organization of the equine CYP3A gene cluster as well as a comparative analysis with the human CYP3A gene cluster. The equine CYP450 genes of the 3A family are located on ECA 13 between 6.97-7.53 Mb, in a region syntenic to HSA 7 99.05-99.35 Mb. Seven potential, closely linked equine CYP3A genes were found, in contrast to only four genes in the human genome. RNA was isolated from an equine liver sample, and the approximately 1.5-kb coding sequence of six CYP3A genes could be amplified by RT-PCR. Sequencing of the RT-PCR products revealed numerous hitherto unknown single nucleotide polymorphisms (SNPs) in these six CYP3A genes, and one 6-bp deletion compared to the reference sequence (EquCab2.0). The presence of the variants was confirmed in a sample of genomic DNA from the same horse. In conclusion, orthologous genes for the CYP3A family exist in horses, but their number differs from those of the human CYP3A gene family. CYP450 genes of the same family show high homology within and between mammalian species, but can be highly polymorphic.
Abstract:
A porcine BAC clone harboring the tightly linked IFNAR1 and IFNGR2 genes was identified by comparative analysis of the publicly available porcine BAC end sequences. The complete 168,835 bp insert sequence of this clone was determined. Sequence comparisons of the genomic sequence with EST sequences from public databases were performed and allowed a detailed annotation of the IFNAR1 and IFNGR2 genes. The analyzed genes showed a genomic organization conserved with that of their known mammalian orthologs; however, the sequence conservation of these genes across species was relatively low. In addition to the IFNAR1 and IFNGR2 genes, which were completely sequenced, the analyzed BAC clone also contained parts of an orphan gene encoding a putative transmembrane protein (TMEM50B). In contrast to the IFNAR1 and IFNGR2 genes, the sequence conservation of the TMEM50B gene across different mammalian species was extremely high.
Abstract:
Defensins are a family of evolutionarily ancient antimicrobial peptides consisting of three sub-families: alpha-, beta- and theta-defensins. This investigation focused on the genomic characterization of equine beta-defensins and on the potential clustering of beta-defensin genes in the equine genome. Six genomic BAC clones were isolated from the CHORI-241 library and one of these was mapped by FISH to ECA 27q17. This location was confirmed by RH-mapping. The contiguous 212 kb sequence of this clone was determined. Sequence analysis identified ten pseudogenes and nine genes, six of which were highly homologous to human beta-defensin DEFB4. Clustering of the beta-defensin genes was confirmed and the order of the genes on the analyzed BAC was related to the corresponding defensin cluster on HSA 8. Knowledge of the sequence and the genomic structure of the equine beta-defensin genes will improve the classification of different paralogous defensin genes and is a prerequisite for subsequent functional studies. Additionally, the first alpha-defensin-like sequence outside the groups of primates, lagomorphs and rodents (glires) was identified.
Abstract:
The gene for agouti signaling protein (ASIP) is centrally involved in the expression of coat color traits in animals. The Mangalitza pig breed is characterized by a black-and-tan phenotype with black dorsal pigmentation and yellow or white ventral pigmentation. We investigated a Mangalitza x Piétrain cross and observed a coat color segregation pattern in the F2 generation that can be explained by virtue of two alleles at the MC1R locus and two alleles at the ASIP locus. Complete linkage of the black-and-tan phenotype to microsatellite alleles at the ASIP locus on SSC 17q21 was observed. Corroborated by the knowledge of similar mouse coat color mutants, it seems therefore conceivable that the black-and-tan pigmentation of Mangalitza pigs is caused by an ASIP allele a(t), which is recessive to the wild-type allele A. Toward positional cloning of the a(t) mutation, a 200-kb genomic BAC/PAC contig of this chromosomal region has been constructed and subsequently sequenced. Full-length ASIP cDNAs obtained by RACE differed in their 5' untranslated regions, whereas they shared a common open reading frame. Comparative sequencing of all ASIP exons and ASIP cDNAs between Mangalitza and Piétrain pigs did not reveal any differences associated with the coat color phenotype. Relative qRT-PCR analyses showed different dorsoventral skin expression intensities of the five ASIP transcripts in black-and-tan Mangalitza. The a(t) mutation is therefore probably a regulatory ASIP mutation that alters its dorsoventral expression pattern.
Abstract:
Restriction fragment length polymorphism (RFLP) analysis is an economic and fast technique for molecular typing but has the drawback of difficulties in accurately sizing DNA fragments and comparing banding patterns on agarose gels. We aimed to improve RFLP for typing of the important human pathogen Streptococcus pneumoniae and to compare the results with the commonly used typing techniques of pulsed-field gel electrophoresis and multilocus sequence typing. We designed primers to amplify a noncoding region adjacent to the pneumolysin gene. The PCR product was digested separately with six restriction endonucleases, and the DNA fragments were analyzed using an Agilent 2100 bioanalyzer for accurate sizing. The combined RFLP results for all enzymes allowed us to assign each of the 47 clinical isolates of S. pneumoniae tested to one of 33 RFLP types. RFLP analyzed using the bioanalyzer allowed discrimination between strains similar to that obtained by the more commonly used techniques of pulsed-field gel electrophoresis, which discriminated between 34 types, and multilocus sequence typing, which discriminated between 35 types, but more quickly and with less expense. RFLP of a noncoding region using the Agilent 2100 bioanalyzer could be a useful addition to the molecular typing techniques in current use for S. pneumoniae, especially as a first screen of a local population.
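The typing logic described above, digesting one amplicon separately with several enzymes and combining the fragment-size patterns into a type, can be mimicked in silico. The sketch below is illustrative only: the abstract does not name the six endonucleases used, so EcoRI (G^AATTC) and HindIII (A^AGCTT) serve as example enzymes, the amplicon is an invented toy sequence, and the function names are hypothetical.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths (largest first) after cutting a linear sequence.

    `site` is the recognition sequence and `cut_offset` the position of the cut
    within it, counted from the start of the site on the top strand.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def rflp_type(seq, enzymes):
    """Combine the per-enzyme fragment patterns into one hashable profile."""
    return tuple(
        (name, tuple(digest(seq, site, offset)))
        for name, (site, offset) in sorted(enzymes.items())
    )


# Example enzymes only; not the enzymes used in the study.
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}

# Invented 23-bp toy amplicon containing one site for each example enzyme.
amplicon = "AAAA" + "GAATTC" + "CCCCC" + "AAGCTT" + "GG"

eco_fragments = digest(amplicon, *ENZYMES["EcoRI"])
hind_fragments = digest(amplicon, *ENZYMES["HindIII"])
profile = rflp_type(amplicon, ENZYMES)
```

Two isolates would share an RFLP type in this scheme only if their profiles are identical for every enzyme. Measured fragment sizes carry sizing error, so a comparison of real data would need a tolerance rather than exact equality.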
Abstract:
The epidemiology, phylogeny, and biology of nonencapsulated Streptococcus pneumoniae are largely unknown. Increased colonization capacity and transformability are, however, intriguing features of these pneumococci and play an important role. Twenty-seven nonencapsulated pneumococci were identified in a nationwide collection of 1,980 nasopharyngeal samples and 215 blood samples obtained between 1998 and 2002. On the basis of multilocus sequence typing and capsule region analysis we divided the nonencapsulated pneumococci into two groups. Group I was closely related to encapsulated strains. Group II had a clonal population structure, including two geographically widespread clones able to cause epidemic conjunctivitis and invasive diseases. Group II strains also carried a 1,959-bp homologue of aliB (aliB-like ORF 2) in the capsule region, which was highly homologous to a sequence in the capsule region of Streptococcus mitis. In addition, strains of the two major clones in group II had an additional sequence, aliB-like ORF 1 (1,968 to 2,004 bp), upstream of aliB-like ORF 2. Expression of aliB-like ORF 1 was detected by reverse transcription-PCR, and the corresponding RNA was visualized by Northern blotting. A gene fragment homologous to capN of serotypes 33 and 37 suggests that group II strains were derived from encapsulated pneumococci some time ago. Therefore, loss of capsule expression in vivo was found to be associated with the importation of one or two aliB homologues in some nonencapsulated pneumococci.
Abstract:
BACKGROUND: The mollicute Mycoplasma conjunctivae is the etiological agent leading to infectious keratoconjunctivitis (IKC) in domestic sheep and wild caprinae. Although this pathogen is relatively benign for domestic animals treated by antibiotics, it can lead wild animals to blindness and death. This is a major cause of death in the protected species in the Alps (e.g., Capra ibex, Rupicapra rupicapra). METHODS: The genome was sequenced using a combined technique of GS-FLX (454) and Sanger sequencing, and annotated by an automatic pipeline that we designed using several tools interconnected via PERL scripts. The resulting annotations are stored in a MySQL database. RESULTS: The annotated sequence is deposited in the EMBL database (FM864216) and uploaded into the mollicutes database MolliGen http://cbi.labri.fr/outils/molligen/ allowing for comparative genomics. CONCLUSION: We show that our automatic pipeline allows for annotating a complete mycoplasma genome and present several examples of analysis in search for biological targets (e.g., pathogenic proteins).