931 results for Bioinformatics
Abstract:
Background: HCV is prevalent throughout the world and is a major cause of chronic liver disease. There is no effective vaccine, and the most common therapy, based on peginterferon, has a success rate of ~50%. The mechanisms underlying viral resistance have not been elucidated, but it has been suggested that both host and virus contribute to therapy outcome. The non-structural 5A (NS5A) protein, a critical virus component, is involved in cellular and viral processes. Methods: The present study analyzed structural and functional features of 345 sequences of HCV-NS5A from genotypes 1 or 3, using in silico tools. Results: Residue-type composition and secondary structure differed between the genotypes. In addition, secondary-structure variance was statistically different for each response group in genotype 3. A motif search indicated conserved glycosylation, phosphorylation, and myristoylation sites that could be important for structural stabilization and function. Furthermore, a highly conserved integrin ligation site was identified that could be linked to nuclear forms of NS5A. ProtFun indicated that NS5A has diverse enzymatic and nonenzymatic activities and participates in a wide range of cell functions, with statistical differences between genotypes. Conclusion: This study presents new insights into HCV-NS5A. It is the first study that, using bioinformatics tools, suggests differences between genotypes and in response to therapy that can be related to NS5A protein features. It therefore emphasizes the importance of using bioinformatics tools in viral studies. The data acquired here will help clarify the structure and function of this protein and aid the development of antiviral agents.
Abstract:
Background: The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data are available, biologists face the task of extracting (new) knowledge about the underlying biological phenomenon. To do so, they usually run a number of analysis activities on the available gene expression dataset rather than a single one. Integrating heterogeneous tools and data sources into a single analysis environment is a challenging and error-prone task. Semantic integration assigns unambiguous meanings to data shared among different applications in an integrated environment, so that data can be exchanged in a semantically consistent and meaningful way. This work aims to develop an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results: We studied the challenges involved in integrating computer systems and the role software connectors play in this task. We also studied a number of gene expression technologies, analysis tools, and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. We then defined a number of activities and associated guidelines that prescribe how connectors should be developed. Finally, we applied the proposed methodology to construct three integration scenarios involving different tools for the analysis of different types of gene expression data.
Conclusions: The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. It can be used to develop connectors that support both simple and nontrivial processing requirements, thus ensuring accurate data exchange and correct interpretation of the exchanged data.
Abstract:
The control of gene expression by miRNAs has been widely investigated in different species and cell types. Because it follows a probabilistic rather than a deterministic regime, the action of these short nucleotide sequences on specific genes depends on their intracellular concentration, which in turn reflects the balance between biosynthesis and degradation. Recent studies have described the involvement of XRN2, an exoribonuclease, in miRNA degradation and of PAPD4, an atypical poly(A) polymerase, in miRNA stability. Here we examined the expression of XRN2 and PAPD4 in developing and adult rat hippocampi. Combining bioinformatics and real-time PCR, we demonstrated that XRN2 and PAPD4 expression is regulated by the uncorrelated action of transcription factors, resulting in distinct gene expression profiles during development. Analyses of nuclei position and nestin labeling revealed that both proteins accumulate progressively during neuronal differentiation, are weakly expressed in immature neurons, and are absent from glial and endothelial cells. Despite differences in subcellular localization, both genes were concurrently identified within identical neuronal subpopulations, including specific inhibitory interneurons. We are thus faced with an unusual circumstance in biology: an almost complete overlap in the expression of functionally opposed genes, which reinforces the idea that their antagonistic actions on miRNAs “make sense” if both are present simultaneously in the same cells. Considering that the transcriptome in the nervous system is finely tuned to physiological processes, it is remarkable that miRNA stability-related genes were concurrently identified in neurons that play essential roles in cognitive functions such as memory and learning. In summary, this study reveals a possible new mechanism for the control of miRNA expression.
Abstract:
Although it is well known that the thyroid hormone (T3) is an important positive regulator of cardiac function in the short term and that it also promotes deleterious effects in the long term, the molecular mechanisms behind these effects are not yet well understood. Because most alterations in cardiac function are associated with changes in the sarcomeric machinery, the present work was undertaken to find novel sarcomeric hot spots driven by T3 in the heart. A microarray analysis indicated that the M-band is a major hot spot and that the structural sarcomeric gene coding for the M-protein is severely down-regulated by T3. Real-time quantitative PCR-based measurements confirmed that T3 (1, 5, 50, and 100 physiological doses for 2 days) sharply decreased M-protein gene and protein expression in vivo in a dose-dependent manner. Furthermore, M-protein gene expression was elevated 3.4-fold in hypothyroid rats. Accordingly, T3 was able to rapidly and strongly reduce M-protein gene expression in neonatal cardiomyocytes. Deletions in the M-protein promoter and a bioinformatics approach suggested a T3-responsive region, which was confirmed by chromatin immunoprecipitation assay. Functional assays in cultured neonatal cardiomyocytes revealed that depletion of M-protein (by small interfering RNA) causes a severe decrease in the speed of contraction. Interestingly, mRNA and protein levels of other M-band components, myomesin and embryonic-heart myomesin, were not altered by T3. We conclude that M-protein expression is strongly and rapidly repressed by T3 in cardiomyocytes, which represents an important aspect of the basis of the T3-dependent deleterious sarcomeric effects in the heart.
Abstract:
BACKGROUND: In the alpha subclass of proteobacteria, iron homeostasis is controlled by diverse iron-responsive regulators. Caulobacter crescentus, an important freshwater α-proteobacterium, uses the ferric uptake repressor (Fur) for this purpose. However, the impact of iron availability on the C. crescentus transcriptome and an overall perspective of the regulatory networks involved remain unknown. RESULTS: In this work we report the identification of iron-responsive and Fur-regulated genes in C. crescentus using microarray-based global transcriptional analyses. We identified 42 genes that were strongly upregulated both by mutation of fur and under iron limitation. Among them are genes involved in iron uptake (four TonB-dependent receptor gene clusters, and feoAB) and riboflavin biosynthesis, as well as genes encoding hypothetical proteins. Most of these genes are associated with predicted Fur binding sites, implicating them as direct targets of Fur-mediated repression. These data were validated by β-galactosidase and EMSA assays for two operons encoding putative transporters. The role of Fur as a positive regulator is also evident, given that 27 genes were downregulated both by mutation of fur and under low-iron conditions. As expected, this group includes many genes involved in energy metabolism, mostly encoding iron-using enzymes. Surprisingly, it also includes TonB-dependent receptor genes and the genes fixK, fixT, and ftrB, which encode an oxygen signaling network required for growth during hypoxia. Bioinformatics analyses suggest that positive regulation by Fur is mainly indirect. In addition to the Fur modulon, iron limitation altered the expression of 113 more genes, including induction of genes involved in Fe-S cluster assembly, oxidative stress, and the heat shock response, as well as repression of genes implicated in amino acid metabolism, chemotaxis, and motility. CONCLUSIONS: Using a global transcriptional approach, we determined the C. crescentus iron stimulon. Many, but not all, of the iron-responsive genes were directly or indirectly controlled by Fur. The iron limitation stimulon overlaps with other regulatory systems, such as the RpoH and FixK regulons. Altogether, our results show that adaptation of C. crescentus to iron limitation involves not only increasing the transcription of iron-acquisition systems and decreasing the production of iron-using proteins, but also novel genes and regulatory mechanisms.
Abstract:
In the post-genomic era, with its massive production of biological data, understanding the factors that affect protein stability is one of the most important and challenging tasks for elucidating the role of mutations in human disease. The problem lies at the basis of what is referred to as molecular medicine, whose underlying idea is that pathologies can be described at the molecular level. To this end, scientific efforts focus on characterizing mutations that hamper protein function and thereby affect the biological processes underlying cell physiology. New techniques have been developed to catalogue single nucleotide polymorphisms (SNPs) across all human chromosomes, and as a result the information in dedicated databases is increasing exponentially. Mutations found at the DNA level, when they occur in transcribed regions, may lead to mutated proteins, and this can be a serious medical problem that largely affects the phenotype. Bioinformatics tools are urgently needed to cope with the flood of genomic data stored in databases and to analyze the role of SNPs at the protein level. Several experimental and theoretical observations suggest that protein stability in the solvent-protein space is responsible for correct protein function. Mutations found to be disease related during DNA analysis are therefore often assumed to perturb protein stability as well. So far, however, no extensive analysis at the proteome level has investigated whether this is the case. Computational methods have also been developed to infer whether a mutation is disease related and, independently, whether it affects protein stability. Whether the perturbation of protein stability is related to what is routinely referred to as a disease therefore remains an open question.
In this work we have tried, for the first time, to explore the relation between mutations at the protein level and their relevance to disease through a large-scale computational study of data from different databases. To this end, in the first part of the thesis we derived two probabilistic indices for each mutation type (for 141 of the 150 possible SNPs): the perturbing index (Pp), which indicates the probability that a given mutation affects protein stability, considering all the available in vitro thermodynamic data, and the disease index (Pd), which indicates the probability that a mutation is disease related, given all the mutations that have been clinically associated so far. We find, with robust statistics, that the two indices correlate, with the exception of all mutations that are related to somatic cancer. In this way each of the 150 mutations can be coded by two values that allow a direct comparison with database information. Furthermore, we implemented a computational method that, starting from the protein structure, predicts the effect of a mutation on protein stability, and we find that it outperforms a set of other predictors that perform the same task. The predictor is based on support vector machines and takes protein tertiary structures as input. We show that the predicted data correlate well with the data from the databases. Our efforts therefore add to the SNP annotation process and, more importantly, establish the relationship between protein stability perturbation and the human variome leading to the diseasome.
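The figure of 150 possible mutation types in the abstract above can be illustrated with a short sketch. It assumes that "possible SNPs" means ordered amino acid substitutions reachable by changing a single nucleotide under the standard genetic code; that reading, and the function name `single_nucleotide_substitutions`, are assumptions made for illustration and are not taken from the thesis.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_substitutions(table=CODON_TABLE):
    """Return the set of ordered (wild_type, mutant) amino acid pairs that
    can arise from exactly one nucleotide change in some codon.
    Synonymous changes and changes to or from stop codons are ignored."""
    substitutions = set()
    for codon, wild_type in table.items():
        if wild_type == "*":
            continue
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant_codon = codon[:position] + base + codon[position + 1:]
                mutant = table[mutant_codon]
                if mutant != "*" and mutant != wild_type:
                    substitutions.add((wild_type, mutant))
    return substitutions


if __name__ == "__main__":
    print(len(single_nucleotide_substitutions()))
```

Under this reading the enumeration yields 150 ordered substitutions, which matches the number quoted in the abstract; a pair such as phenylalanine to tryptophan is excluded because it needs two nucleotide changes.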
Abstract:
In Group B Streptococcus (GBS), three structurally distinct types of pili have been discovered as potential virulence factors and vaccine candidates. The pilus-forming proteins are assembled into high-molecular-weight polymers via a transpeptidation mechanism mediated by specific class C sortases. Using a multidisciplinary approach that included bioinformatics, structural and biochemical studies, and in vivo mutagenesis, we performed a broad characterization of GBS sortase C. The high-resolution X-ray structure of the enzyme revealed that the active site, located in the β-barrel core of the enzyme, is made up of the catalytic triad His157-Cys219-Arg228 and is covered by a loop known as the “lid”. We show that the catalytic triad and the predicted N- and C-terminal transmembrane regions are required for enzyme activity. Interestingly, in vivo complementation mutagenesis studies showed that deletion of the entire lid loop, or mutations in specific key lid residues, had no effect on the catalytic activity of the enzyme. In addition, kinetic characterization of recombinant enzymes indicates that the lid mutants can still recognize and cleave the substrate-mimicking peptide at least as well as the wild-type protein.
Abstract:
In the past, regulation of gene expression in the liver (and in other tissues) has been demonstrated for a number of individual genetic factors and promoter elements. The availability of the complete human genome, together with its expression data in large microarray and SAGE databases, makes it possible to study such regulatory mechanisms on a large, genome-wide scale. This work asks whether there are higher-level factors that specifically promote or inhibit expression in the liver, or whether each gene is regulated by an independent combination of factors whose sum makes expression of that individual gene strongest in the liver. Should higher-level factors that stimulate expression in the liver be found, they would be of interest for the development of new treatment concepts for liver diseases. To investigate this question, expression data for a total of 15,472 genes in each of 12 tissues were extracted from an Affymetrix microarray dataset. In a second step, the promoter sequences of the corresponding genes, defined as a 1000 bp region upstream of the transcription start site, were additionally stored in the same database. The promoter sequences were analyzed with the PromotorScan algorithm. In this way, transcription factor binding sites were identified on 7042 of the promoters, for a total of 241,984 transcription factor binding sites. On the basis of the microarray expression data, the available genes and promoters were divided into two groups: genes whose expression was clearly highest in the liver, and genes that were most highly expressed in other tissues. Each potentially binding transcription factor was examined for differential occurrence in these two groups.
This was done on the assumption that higher-level factors stimulating expression in the liver might be found considerably more often, in relative terms, in the group of genes most highly expressed in the liver. However, no such increased frequency could be demonstrated for any factor. Transcription factor binding sites are typically between 5 and 15 bp long. To rule out that previously unknown transcription factor binding sites had been missed by the PromotorScan algorithm, the frequencies of all possible 8 bp (4^8) and 10 bp (4^10) nucleotide combinations in these promoters were examined. No biologically relevant differences were found between the two groups. The role of TATA boxes was examined in the same way. TATA boxes play an important role in transcription initiation by mediating binding of the initial transcription complex. A total of 1033 TATA boxes were likewise predicted using PromotorScan. Of these, 57 were on promoters of genes overexpressed in the liver and 976 on promoters of genes overexpressed in other tissues. Comparison of the two groups revealed no significant difference in the frequency of TATA boxes. Next, the role of CpG islands in potentially differential regulation was examined. A total of 8742 CpG islands were identified within a region of up to 5 kb upstream of the transcription start site, 364 of them on promoters of genes most highly expressed in the liver and 8378 on promoters of genes most highly expressed in other tissues. No significant differences in the distribution of CpG islands across the promoters of these two gene groups could be demonstrated.
Finally, the RNA and protein sequences of the transcriptome and proteome were analyzed with respect to their composition of individual nucleotides and amino acids, respectively. Here too, no significant differences in distribution were found between the two gene groups. Taken together, the results show that regulation of individual genes in liver tissue is essentially gene-specific. Within the bioinformatic analyses presented, no higher-level genetic “liver factors” that specifically stimulate gene expression in the liver were found. New therapeutic approaches aimed at regulating gene expression in the liver will therefore continue to focus on influencing individual genes.
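The exhaustive comparison of 8 bp and 10 bp nucleotide combinations between the two promoter groups can be sketched roughly as follows. This is a minimal illustration, not the method used in the thesis: the function names, the pseudocount smoothing, and the use of a simple frequency ratio as the comparison statistic are all assumptions.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer over A/C/G/T in a list of sequences.
    Windows containing other characters (for example N) are skipped."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kmer_frequency_ratios(liver_promoters, other_promoters, k=8, pseudocount=1.0):
    """For each observed k-mer, return the ratio of its smoothed relative
    frequency in liver-group promoters to that in the other group.
    The pseudocount (an assumption) avoids division by zero; it is spread
    over all 4**k possible k-mers."""
    liver = count_kmers(liver_promoters, k)
    other = count_kmers(other_promoters, k)
    n_possible = 4 ** k
    liver_total = sum(liver.values()) + pseudocount * n_possible
    other_total = sum(other.values()) + pseudocount * n_possible
    ratios = {}
    for kmer in set(liver) | set(other):
        liver_freq = (liver[kmer] + pseudocount) / liver_total
        other_freq = (other[kmer] + pseudocount) / other_total
        ratios[kmer] = liver_freq / other_freq
    return ratios
```

A ratio well above 1 would flag a k-mer as enriched in the liver group. In practice a statistical test with multiple-testing correction would be needed before calling any difference significant; the abstract reports that no biologically relevant differences were found.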
Abstract:
Antibiotic resistance is a public health problem, and addressing it requires a surveillance system based on the collection and analysis of epidemiological laboratory data. The doctoral project consisted of developing a web application, usable at hospital level, for managing such antibiotic susceptibility data from clinical isolates. A web platform was built on top of a relational database to obtain a dynamic application that could be updated easily by entering new data, without manually modifying the HTML pages that make up the application. The open-source MySQL database was used because it offers numerous advantages: it is extremely stable, performs well, is supported by a large online community, and is free. The dynamic content of the web application must be generated by a scripting language that automates the insertion, modification, deletion, and display of large amounts of data. PHP was chosen, an open-source language developed specifically for building dynamic web pages and fully usable with the MySQL database. The database architecture was defined by creating the tables that hold the data and the relationships among them: patient records, data on samples, isolated microorganisms, and antibiograms with the interpretive categories for each antibiotic. Once the database tables and relationships were defined, the code for the main functions was written: manual entry of antibiograms; import of multiple antibiograms from files exported by automated instruments; modification and deletion of antibiograms previously entered in the system; and analysis of the data in the database, with trends in the prevalence of microbial species and in their drug resistance, accompanied by charts.
Development included continuous testing of the functions as they were implemented, using real clinical data; dedicated validation checks and a simple, clean graphical interface were also introduced.
Abstract:
My PhD project focused on Atlantic bluefin tuna, Thunnus thynnus, a fishery resource that has been overexploited in recent decades. Better stock management requires improved scientific knowledge of this species and novel tools to avoid the collapse of this important commercial resource. To this end, we used new high-throughput sequencing technologies, namely Next Generation Sequencing (NGS), and markers linked to expressed genes, namely SNPs (Single Nucleotide Polymorphisms). We applied a combined approach: transcriptomic resources were used to build cDNA libraries from mRNA isolated from muscle, and genomic resources were used to create a reference backbone for this species, which lacks a reference genome. All cDNA reads obtained from mRNA were mapped against this reference backbone and, using several bioinformatics tools and different stringent parameters, we obtained a set of contigs in which to detect SNPs. Once a final panel of 384 SNPs had been developed according to the selection criteria, it was genotyped in 960 individuals of Atlantic bluefin tuna covering all size/age classes, from larvae to adults, collected across the entire range of the species. The resulting data were analyzed to evaluate the genetic diversity and population structure of Thunnus thynnus. We detected a low but significant signal of genetic differentiation among spawning samples, which may suggest the presence of three genetically separate reproduction areas. The adult samples, by contrast, were genetically undifferentiated from one another and from the spawning populations, indicating a panmictic population of adult bluefin tuna in the Mediterranean Sea, without distinct metapopulations.
Abstract:
In the course of evolution, oxygen-metabolizing organisms must have developed a series of adaptations in order to survive in the cytotoxic, oxidative environment of the Earth's oxygen-containing atmosphere. The comparative analyses of mitochondrially encoded and nuclear-encoded proteomes from several hundred species carried out in this work show that the evolution of an alternative genetic code in mitochondria was a modern adaptation of this kind. Many aerobic animals and fungi deviate from the standard genetic code by decoding the codon AUA as methionine. This work shows that these species thereby achieve a massive accumulation of the very easily oxidized amino acid methionine in their respiratory chain complexes, which are generally a preferred target of reactive oxygen species. This finding can be explained consistently only by assuming an antioxidant effect of this amino acid, as first postulated in 1996 by R. Levine on the basis of oxidation measurements in model proteins. In the present work, this hypothesis is now confirmed directly in living cells using novel model substances. The bioinformatic analyses and cell-biological experiments performed demonstrate that collective protein changes can be the driving force behind the evolution of deviant genetic codes.
The significance of oxidative stress was additionally studied in the context of acute oxidative damage in the individual organism. Since oxidative stress appears to be prominently involved in the pathogenesis of age-associated neurodegenerative diseases such as Alzheimer's disease, the effects of environmentally induced oxidative stress on the histopathological course were investigated in vivo in a transgenic model of Alzheimer's disease. Transgenic mice of the APP23 model were subjected, in feeding experiments, to a lifelong deficiency of the antioxidants selenium or vitamin E. While selenoprotein expression was reduced in a tissue-specific manner by the selenium-deficient diet, there were no signs of accelerated appearance of pathological markers such as amyloid plaques or neurodegeneration. Rather, an unexpected trend toward a lower plaque burden was observed in vitamin E-deficient Alzheimer mice. Although these data must be interpreted with caution because of the small number of animals per group, a lack of essential antioxidant nutrients does not appear to adversely affect progression in an established Alzheimer model.
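The effect of reassigning AUA on methionine content can be illustrated with a small sketch. It assumes an in-frame coding sequence and compares only two decoding rules: the standard code, where AUA encodes isoleucine and only AUG encodes methionine, and a code in which AUA is also read as methionine. The function name `methionine_codon_counts` is hypothetical, and this is not the analysis pipeline of the thesis.

```python
def methionine_codon_counts(coding_sequence):
    """Count codons decoded as methionine in an in-frame coding sequence
    (DNA or RNA) under two decoding rules.

    standard_code: only AUG is methionine (AUA is isoleucine).
    aua_as_met:    AUG and AUA are both methionine.
    Trailing bases that do not fill a complete codon are ignored."""
    rna = coding_sequence.upper().replace("T", "U")
    usable_length = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable_length, 3)]
    aug = codons.count("AUG")
    aua = codons.count("AUA")
    return {"standard_code": aug, "aua_as_met": aug + aua}
```

Applied to respiratory chain genes, the difference between the two counts shows how many additional methionine residues a protein gains purely from the reassignment of AUA, which is the quantity the abstract describes as accumulating.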
Abstract:
In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution comes from cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles, which can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+-based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment BAR+ placed among the ten most accurate methods. BAR+ is currently freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.
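One common way to turn all-against-all alignment results into non-hierarchical clusters is to link every pair of sequences that passes a stringent filter and take the connected components of the resulting graph. The sketch below illustrates that idea only. The abstract does not state the actual BAR+ metric, so the identity and coverage thresholds are placeholders, and the function name and input format are hypothetical.

```python
def cluster_sequences(sequence_ids, pairwise_hits, min_identity=40.0, min_coverage=90.0):
    """Group sequences into clusters as connected components.

    pairwise_hits is an iterable of (id_a, id_b, percent_identity,
    percent_coverage) tuples from an all-against-all alignment.
    The default thresholds are illustrative placeholders only."""
    parent = {seq_id: seq_id for seq_id in sequence_ids}

    def find(node):
        # Follow parent links to the representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for id_a, id_b, identity, coverage in pairwise_hits:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(id_a), find(id_b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for seq_id in sequence_ids:
        clusters.setdefault(find(seq_id), set()).add(seq_id)
    # Largest clusters first; ties broken by smallest member id.
    return sorted(clusters.values(), key=lambda c: (-len(c), min(c)))
```

For example, with hits `("p1", "p2", 55, 95)`, `("p2", "p3", 45, 92)`, and `("p3", "p4", 80, 50)`, the first two pass the filter and the third fails on coverage, so `p1`, `p2`, and `p3` form one cluster and `p4` remains a singleton. Annotation transfer within a cluster would then be a separate, statistically validated step, as the abstract describes.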
Abstract:
Parasites of the Apicomplexa include protozoa pathogenic to humans as well as to animals. Examples of important human- and animal-pathogenic parasites are Plasmodium falciparum and Eimeria tenella. E. tenella causes coccidiosis in chickens, an intestinal disease responsible for estimated worldwide losses of up to US$3 billion. Prophylactic vaccination against this disease is usually economically inefficient, and treatment with coccidiostats is hampered by the frequent development of resistance to known active substances. This situation calls for the development of new, low-cost alternatives. Suitable target proteins for the development of novel drugs to treat coccidiosis are the cyclin-dependent kinases (CDKs), which include the CDK-related kinase 2 (EtCRK2) from E. tenella. These proteins play a key role in regulating the cell cycle. Chemical validation with the CDK inhibitor flavopiridol showed that loss of CDK function in E. tenella inhibits multiplication of the parasite in cell culture. E. tenella CDKs are therefore suitable target proteins for developing a chemotherapy for coccidiosis. In-depth bioinformatic analyses were used to identify CDK proteins in the parasite E. tenella. The E. tenella genome is available in draft form [ftp://ftp.sanger.ac.uk]. However, at the time of this work many sequences of the genome had not yet been annotated. Homologous CDK proteins of E. tenella were identified and analyzed by comparing sequence information with other apicomplexan organisms. Through these analyses, three further, previously unannotated CDKs were identified in E. tenella (EtCRK1, EtCRK3, and EtMRK) in addition to the already known EtCRK2.
In addition, the corresponding cyclins, the activators of the CDKs, were analyzed with respect to function and structure, and a database search was carried out for previously undescribed cyclins in E. tenella. These searches yielded four new potential cyclins for E. tenella, of which EtCYC3a was confirmed as an activator of EtCRK2 by María L. Suárez Fernández (Intervet Innovation GmbH, Schwabenheim). Sequence comparisons suggest that EtCYC1 and EtCYC3b are also able to activate EtCRK2. Furthermore, EtCYC4 is presumed to act as an activator of EtCRK1. A further focus of this work was the search for and optimization of new inhibitors of CDKs from E. tenella. Inhibitors of EtCRK2 had already been found in previous work [BEYER, 2007]. In this work, additional EtCRK2 inhibitors were identified by means of substructure and similarity searches. Four of these structural classes meet the criteria for a lead structure. One of these lead structures belongs to the structural class of benzimidazole carbonitriles and has not previously been described as an inhibitor of other kinases. This newly identified lead structure was optimized further in silico. Binding energies of representatives of this structural class were calculated in order to predict a probable binding mode. For further in silico optimization, a virtual combinatorial compound library of this class was created. Suitable compounds for chemical synthesis were selected by molecular docking using homology models of EtCRK2. In addition, an in silico screening for potential inhibitors of PfMRK and EtMRK was carried out. This yielded further interesting virtual hit structures from a database of commercially available compounds.
This virtual screening identified seven compounds each as virtual hits for PfMRK and for EtMRK. The accumulation of structural classes with known CDK activity indicates that CDK inhibitors were enriched during the virtual screening. These results give hope for the further development of new drugs against coccidiosis and malaria.
Abstract:
In this thesis we address a collection of network design problems that are strongly motivated by applications in telecommunications, logistics, and bioinformatics. In most cases we justify the need to take into account uncertainty in some of the problem parameters, and different robust optimization models are used to hedge against it. Mixed integer linear programming formulations, along with sophisticated algorithmic frameworks, are designed, implemented, and rigorously assessed for the majority of the studied problems. The results yield the following observations: (i) relevant real problems can be effectively represented as (discrete) optimization problems within the framework of network design; (ii) uncertainty can be appropriately incorporated into the decision process if a suitable robust optimization model is considered; (iii) optimal, or nearly optimal, solutions can be obtained for large instances if a tailored algorithm that exploits the structure of the problem is designed; (iv) a systematic and rigorous experimental analysis makes it possible to understand both the characteristics of the obtained (robust) solutions and the behavior of the proposed algorithm.
Abstract:
Technological progress in molecular biology confronts the scientific community with the need to interpret the enormous number of biological sequences, whether proteins or nucleic acids, that are steadily accumulating in databases. In this context bioinformatics plays a role of primary importance. A new level of investigative possibilities has been introduced by Next Generation Sequencing (NGS) technologies, which make it possible to obtain entire genomes or transcriptomes quickly and at low cost. Among the most relevant applications of NGS are undoubtedly those in oncology, which involve the genomic characterization of tumor tissues and the development of new diagnostic and therapeutic approaches for the treatment of cancer. NGS analysis makes it possible to identify the complete set of variations present in the tumor genome, such as single-nucleotide variants, chromosomal rearrangements, insertions, and deletions. It should be stressed, however, that variations found in genes must ultimately be considered in terms of their effects at the protein level, since proteins are the most direct cause of the altered phenotypes observed in the tumor cell. Bioinformatics expertise is therefore needed both in the analysis of the data produced by NGS and in the subsequent stages, where it is necessary to annotate the genes contained in the sequenced genome and the protein structures expressed from it or, as in the case of mutational studies, to assess the effect of the genomic variation.
This is the context of the work presented here: on the one hand, the development of computational methods for annotating protein sequences; on the other, the set-up of a pipeline for analyzing data produced with NGS technologies in oncological applications, with the ultimate aim of identifying and characterizing tumor genetic mutations at the protein level.