3 results for NUCLEOTIDE-SEQUENCES
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome yields only raw nucleotide sequences. This is only the first step of the genome annotation process, which assigns biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time-consuming at this scale. Thus, in silico methods are needed. The aim of this work was to implement predictive computational methods that allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine-learning-based method for predicting the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo and was developed in 2006. Its main distinguishing feature is that it is independent of biases present in the training dataset, which cause all the other predictors developed so far to over-predict the most represented classes. This important result was achieved through a modification that I made to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the state-of-the-art methods then available for this prediction task. 
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by combining, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. Starting from the raw amino acid sequence, the method efficiently predicts both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed specific amino acid abundances to be defined for the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available: BaCelLo at http://gpcr.biocomp.unibo.it/bacello, eSLDB at http://gpcr.biocomp.unibo.it/esldb, and GPIPE at http://gpcr.biocomp.unibo.it/gpipe
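The abstract does not say how the Balanced SVM removes the training-set bias. As a rough illustration of the general idea only, the sketch below computes class weights inversely proportional to class frequency, a common way of keeping a classifier from favouring the most represented class. This is an assumed stand-in, not the author's modification; the function name and the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    Uses n_samples / (n_classes * class_count), so every class carries the
    same total weight regardless of how many examples it has.
    NOTE: a generic balancing heuristic, not the Balanced SVM of the thesis.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, made-up training labels with a strong class imbalance.
labels = ["nucleus"] * 6 + ["cytoplasm"] * 3 + ["mitochondrion"] * 1
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: weight {weight:.3f}")
```

With these weights each class contributes the same total weight to a weighted training loss, so the rarest class is no longer drowned out by the most common one.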
Abstract:
From September 2005 to December 2006, in order to define the prevalence of Helicobacter pullorum in broiler chickens, laying hens and turkeys, a total of 365 caecum contents from animals reared on 76 different farms were collected at the slaughterhouse. The caecum content of an ostrich was also sampled. In addition, with the aim of investigating the occurrence of H. pullorum in humans, 151 faecal samples were collected at the Sant’Orsola-Malpighi University Hospital of Bologna from patients suffering from gastroenteritis. A modified Steele–McDermott membrane filter method was used. Gram-negative curved rod bacteria were preliminarily identified as H. pullorum by a PCR assay based on 16S rRNA, then subjected to a PCR-RFLP assay to distinguish between H. pullorum and H. canadensis. One isolate from each farm was randomly selected for phenotypic characterization by biochemical methods and 1D SDS-PAGE analysis of whole-cell protein profiles. Minimum Inhibitory Concentrations (MICs) for seven different antibiotics were also determined by the agar dilution method. Moreover, to examine the intraspecific genomic variability, two strains isolated from 17 different farms were genotyped by Pulsed-Field Gel Electrophoresis (PFGE). In order to assess the molecular basis of fluoroquinolone resistance in H. pullorum, gyrA of H. pullorum CIP 104787T was sequenced, and the nucleotide sequences of the Quinolone Resistance Determining Region (QRDR) of a total of 18 poultry isolates, with different MIC values for ciprofloxacin and nalidixic acid, were compared. According to the PCR and PCR-RFLP results, 306 out of 366 animals examined were positive for H. pullorum (83.6%) and 96.1% of farms were infected. All positive samples showed a high number of colonies (>50) phenotypically consistent with H. pullorum on the first isolation media, which suggests that this microorganism, when present, colonizes the poultry caecum at a high load. No human sample tested positive for H. pullorum. 
The 1D SDS-PAGE whole protein profile analysis showed high similarity among the 74 isolates tested and with the type strain H. pullorum CIP 104787T. Regarding the MIC values, a monomodal distribution was found for ampicillin, chloramphenicol, gentamicin and nalidixic acid, whereas a bimodal trend was observed for erythromycin, ciprofloxacin and tetracycline (indicating acquired resistance to these antibiotics). Applying the breakpoints indicated by the CLSI, we may assume that all the H. pullorum isolates tested are sensitive only to gentamicin. The intraspecific genomic variability observed in this study confirms that this species does not have a clonal population structure, as mentioned by other authors. The 2490 bp gyrA gene of H. pullorum CIP 104787T, with an Open Reading Frame (ORF) encoding a polypeptide of 829 amino acids, was sequenced and characterized for the first time. All ciprofloxacin-resistant poultry isolates showed an ACA→ATA (Thr→Ile) substitution at codon 84 of gyrA, corresponding to gyrA codons 86, 87 and 83 of Campylobacter jejuni, H. pylori and Escherichia coli, respectively. This substitution was functionally confirmed to be associated with the ciprofloxacin-resistant phenotype of poultry isolates. This is the first report of the isolation of H. pullorum from turkey and ostrich, indicating that poultry species are the reservoir of this potentially zoonotic microorganism. Further studies must be carried out in order to understand the potential role of H. pullorum as a food-borne human pathogen.
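The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the codon table holds just the two codons named in the abstract (standard genetic code), and it assumes that the 2490 bp length includes the stop codon, which is what makes 2490 bp agree with 829 amino acids.

```python
# Only the two codons named in the abstract (standard genetic code).
CODON_TO_AMINO_ACID = {"ACA": "Thr", "ATA": "Ile"}


def orf_protein_length(orf_length_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_bp // 3 - 1  # the final codon is the stop codon


def percentage(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


def point_mutations(codon_a, codon_b):
    """List (1-based position, old base, new base) where two codons differ."""
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(codon_a, codon_b))
        if a != b
    ]


animals_examined = 365 + 1  # 365 poultry caecum contents plus one ostrich
print(percentage(306, animals_examined))  # prevalence among animals examined
print(orf_protein_length(2490))           # length of the encoded polypeptide
print(point_mutations("ACA", "ATA"))      # single-base change behind Thr -> Ile
print(CODON_TO_AMINO_ACID["ACA"], "->", CODON_TO_AMINO_ACID["ATA"])
```

Under these assumptions the checks reproduce the 83.6% prevalence and the 829-residue polypeptide, and show that the Thr→Ile substitution comes from a single C→T change at the second position of the codon.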
Abstract:
The objective of this work is to characterize chromosome 1 of A. thaliana, a small flowering plant used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. To this end, I transformed nucleotide sequences into binary sequences based on the definition of three different dichotomic classes. The descriptive analysis of the binary strings indicates the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful for recognizing them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy, i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharyya-Hellinger-Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences, whose existence is a clue to an underlying error detection and correction mechanism. Further studies are certainly needed to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and, therefore, contribute to unveiling the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach presented for understanding the management of genetic information.
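The pipeline described above (nucleotides to binary strings, then a dependence measure) can be sketched as follows. The abstract does not define the three dichotomic classes, so this example substitutes a simple purine/pyrimidine encoding as a stand-in, and it measures dependence as the mutual information between a binary string and a lagged copy of itself. Both choices, and all function names, are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter

PURINES = {"A", "G"}


def to_binary(sequence):
    """Encode purines (A, G) as 1 and pyrimidines (C, T) as 0.

    Stand-in for a dichotomic class; other symbols (e.g. N) are skipped.
    """
    return [1 if base in PURINES else 0 for base in sequence.upper() if base in "ACGT"]


def mutual_information(x, y):
    """Mutual information in bits between two equal-length symbol sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def lagged_mi(bits, lag):
    """Mutual information between a binary string and itself shifted by `lag`."""
    if lag < 1 or lag >= len(bits):
        raise ValueError("lag must be between 1 and len(bits) - 1")
    return mutual_information(bits[:-lag], bits[lag:])


# Made-up toy sequence: strictly alternating purine/pyrimidine.
bits = to_binary("AC" * 50)
for lag in (1, 2, 3):
    print(lag, round(lagged_mi(bits, lag), 3))
```

In the toy sequence each symbol fully determines its neighbour, so the lagged mutual information is close to 1 bit; a sequence with no short-range dependence would give values near 0.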