4 results for recurrent sequence
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Over the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution comes from cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles, which can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. BAR+ is currently freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.
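The abstract does not spell out the clustering rule, so the following is only an illustrative sketch of one common way to obtain non-hierarchical clusters from all-against-all alignment results: link two sequences when their alignment passes a stringent threshold, then take the connected components of the resulting graph. The function name `cluster_by_similarity`, the tuple layout of `pairs`, and the threshold values are hypothetical and are not taken from BAR+.

```python
def cluster_by_similarity(n_items, pairs, min_identity=0.4, min_coverage=0.9):
    """Group items into connected components of a thresholded similarity graph.

    pairs: iterable of (i, j, identity, coverage) tuples, one per alignment.
    The default thresholds are illustrative assumptions only.
    """
    parent = list(range(n_items))  # union-find forest, one tree per cluster

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, identity, coverage in pairs:
        # Only alignments that pass both thresholds link two sequences.
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n_items):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy input: five sequences and four pairwise alignments.
example_pairs = [
    (0, 1, 0.60, 0.95),  # passes both thresholds
    (1, 2, 0.45, 0.92),  # passes both thresholds
    (2, 3, 0.80, 0.50),  # fails on coverage
    (3, 4, 0.30, 0.99),  # fails on identity
]
example_clusters = cluster_by_similarity(5, example_pairs)
print(example_clusters)
```

In this toy input, sequences 0, 1 and 2 end up in one cluster, while 3 and 4 remain singletons because their alignments fail one of the two thresholds.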
Abstract:
Clostridium difficile is an obligate anaerobic, Gram-positive, endospore-forming bacterium. Although it is an opportunistic pathogen, it is one of the important causes of healthcare-associated infections. While the toxins TcdA and TcdB are the main virulence factors of C. difficile, the factors or processes involved in gut colonization during infection remain unclear. The biofilm-forming ability of bacterial pathogens has been associated with increased antibiotic resistance and chronic recurrent infections. Little is known about biofilm formation by anaerobic gut species. Biofilm formation could play a role in the virulence and persistence of C. difficile, as seen for other intestinal pathogens. We demonstrate that C. difficile clinical strains, 630 and the outbreak isolate R20291, form structured biofilms in vitro. The biofilm matrix is made of proteins, DNA and polysaccharide. Strain R20291 accumulates substantially more biofilm than strain 630. Employing isogenic mutants, we show that the virulence-associated protein Cwp84, flagella, the putative quorum-sensing regulator LuxS, and Spo0A are required for maximal biofilm formation by C. difficile. Moreover, we demonstrate that bacteria in C. difficile biofilms are more resistant to high concentrations of vancomycin, a drug commonly used for the treatment of C. difficile infection (CDI), and that inhibitory and sub-inhibitory concentrations of the same antibiotic induce biofilm formation. Surprisingly, clinical C. difficile strains from the same outbreak, but of different origin, show differences in biofilm formation. Genome sequence analysis of these strains revealed a single nucleotide polymorphism (SNP) in the anti-σ factor RsbW, which regulates the stress-induced alternative sigma factor B (σB). We further demonstrate that RsbW, a negative regulator of σB, has a role in biofilm formation and sporulation of C. difficile. Our data suggest that biofilm formation by C. difficile is a complex multifactorial process and may be a crucial mechanism for clostridial persistence in the host.
Abstract:
The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation yields deeper information at lower cost and in less time, and produces data in the form of discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). From a statistical point of view, the final aim is hypothesis testing, and the Negative Binomial distribution is considered the most adequate model for these data, especially because it allows for overdispersion. However, estimating the dispersion parameter is a very delicate issue, because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the detection of differentially expressed genes relative to common procedures: it comes closest to the nominal type I error rate while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
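To make the notion of overdispersion concrete, here is a minimal sketch that is not taken from the thesis. It assumes the common mean-dispersion parameterization of the Negative Binomial, in which a count with mean mu and dispersion phi has variance mu + phi * mu^2, so the variance exceeds the Poisson value mu whenever phi > 0. The function names and the values mu = 10, phi = 0.5 are illustrative assumptions.

```python
import math


def nb_log_pmf(k, mu, phi):
    """Log-probability of count k under a Negative Binomial with mean mu
    and dispersion phi (assumed parameterization: var = mu + phi * mu**2)."""
    r = 1.0 / phi          # "size" parameter
    p = r / (r + mu)       # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_moments(mu, phi, k_max=5000):
    """Mean and variance obtained by summing the pmf over 0..k_max-1."""
    probs = [math.exp(nb_log_pmf(k, mu, phi)) for k in range(k_max)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


mu, phi = 10.0, 0.5            # illustrative values only
mean, var = nb_moments(mu, phi)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
print(f"Poisson variance would be {mu}, NB formula gives {mu + phi * mu ** 2}")
```

With these values the numerically computed variance matches mu + phi * mu^2 = 60, six times the variance a Poisson model with the same mean would allow. This is why the dispersion parameter matters so much for the downstream tests.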