738 results for Annotation de génomes


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Self-incompatibility (SI) systems have evolved in many flowering plants to prevent self-fertilization and thus promote outbreeding. Pear and apple, like many species of the Rosaceae, exhibit RNase-mediated gametophytic self-incompatibility (GSI), a widespread system also found in the Solanaceae and Plantaginaceae. Pear orchards must therefore contain at least two different cultivars that pollinate each other; to guarantee efficient cross-pollination, the cultivars should have overlapping flowering periods and must be genetically compatible. This compatibility is determined by the S-locus, which contains at least two genes encoding a female (pistil) and a male (pollen) determinant. The female determinant in the Rosaceae, Solanaceae and Plantaginaceae system is a stylar glycoprotein with ribonuclease activity (S-RNase) that acts as a specific cytotoxin in incompatible pollen tubes, degrading cellular RNAs. Since its identification, the S-RNase gene has been intensively studied, and the sequences of a large number of alleles are available in online databases. In contrast, the male determinant has only recently been identified as a pollen-expressed protein containing an F-box motif, called S-Locus F-box (abbreviated SLF or SFB). Since F-box proteins are best known for their participation in the SCF (Skp1 - Cullin - F-box) E3 ubiquitin ligase complex, which is involved in protein degradation through the 26S proteasome pathway, the male determinant is thought to mediate the ubiquitination of the S-RNases, targeting them for degradation in compatible pollen tubes. Attempts to clone SLF/SFB genes in the Pyrinae produced no results until very recently; in apple, the use of genomic libraries allowed the detection of two F-box genes linked to each S haplotype, called SFBB (S-locus F-Box Brothers). In Japanese pear, three SFBB genes linked to each haplotype were cloned from pollen cDNA.
The SFBB genes exhibit S haplotype-specific sequence divergence and pollen-specific expression; the interpretation of their multiplicity is unclear: it has been hypothesized that all of them participate in the S-specific interaction with the RNase, but it is also possible that only one of them is involved in this function. Moreover, even though the S-locus male and female determinants are solely responsible for the specificity of pollen-pistil recognition, many other factors are thought to play a role in GSI; these are not linked to the S locus and act in an S-haplotype-independent manner. They may regulate the expression of the S determinants (group 1 factors), modulate their activity (group 2), or act downstream in carrying out the acceptance or rejection of the pollen tube (group 3). This study aimed to elucidate the molecular mechanism of GSI in European pear (Pyrus communis) and in the other Pyrinae; it was divided into two parts, the first focusing on the characterization of the male determinants and the second on factors external to the S locus. The search for S-locus F-box genes was aimed primarily at identifying such genes in European pear, for which sequence data are not yet available; it also made it possible to investigate the structure of the S locus in the Pyrinae. The analysis was carried out on a pool of varieties of the three species Pyrus communis (European pear), Pyrus pyrifolia (Japanese pear) and Malus × domestica (apple); varieties carrying S haplotypes whose RNases are highly similar were chosen in order to check whether the same level of similarity is also maintained between the male determinants. A total of 82 sequences were obtained, 47 of which represent the first S-locus F-box genes sequenced from European pear.
The sequence data strongly support the hypothesis that the S-locus structure is conserved among the three species, and presumably among all the Pyrinae; at least five genes have homologs in the analysed S haplotypes, but the number of F-box genes surrounding the S-RNase could be even greater. The high level of sequence divergence, together with the similarity between alleles linked to highly conserved RNases, suggests a shared ancestral polymorphism for the F-box genes as well. The F-box genes identified in European pear were mapped on a segregating population of 91 individuals from the cross 'Abbé Fétel' × 'Max Red Bartlett'. All the genes were placed on linkage group 17, where the S locus has been placed in both pear and apple maps, and were strongly associated with the S-RNase gene. Linkage with the RNase was perfect for some of the F-box genes, while for others very rare single recombination events were identified. The second part of this study focused on the search for other genes involved in the SI response in pear; it aimed, on the one hand, to identify genes differentially expressed in compatible and incompatible crosses and, on the other, to clone and characterize the transglutaminase (TGase) gene, whose role may be crucial in pollen rejection. To identify differentially expressed genes, controlled pollinations were carried out in four combinations (self-pollination, incompatible, half-compatible and fully compatible cross-pollination), and expression profiles were compared by cDNA-AFLP. Twenty-eight fragments displaying an expression pattern related to compatibility or incompatibility were identified, cloned and sequenced; sequence analysis allowed a putative annotation to be assigned to some of them. The identified genes are involved in very different cellular processes or in defense mechanisms, suggesting a very complex change in gene expression following pollen-pistil recognition.
The pool of genes identified with this technique offers a good basis for further study toward a better understanding of how the SI response is carried out. Among the factors involved in the SI response, an important role may also be played by transglutaminase (TGase), an enzyme involved both in post-translational protein modification and in protein cross-linking. The TGase activity detected in pear styles was significantly higher after incompatible pollination than after compatible pollination, suggesting a role for this enzyme in the abnormal cytoskeletal reorganization observed during the pollen rejection reaction. The aim of this part of the work was thus to identify and clone the pear TGase gene; fragments of this gene were amplified by PCR using primers designed from the alignment between the Arabidopsis TGase gene sequence and several apple EST fragments; the full-length coding sequence of the pear TGase gene was then cloned from cDNA, providing a valuable tool for further study of the in vitro and in vivo action of this enzyme.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The construction and use of multimedia corpora has long been advocated in the literature as one of the expected future application fields of Corpus Linguistics. This research project represents a pioneering experience aimed at applying a data-driven methodology to the study of audiovisual translation (AVT), similarly to what has been done in the last few decades in the macro-field of Translation Studies. This research was based on the experience of Forlixt 1, the Forlì Corpus of Screen Translation, developed at the University of Bologna’s Department of Interdisciplinary Studies in Translation, Languages and Culture. In order to quantify the strategies of linguistic transfer of an AV product, we need to take into consideration not only the linguistic aspect of such a product but all the meaning-making resources deployed in the filmic text. Given that one major benefit of Forlixt 1 is the combination of audiovisual and textual data, this corpus allows the user to access primary data for scientific investigation, and thus no longer rely on pre-processed material such as traditional annotated transcriptions. Based on this rationale, the first chapter of the thesis sets out to illustrate the state of the art of research in the disciplinary fields involved. The primary objective was to underline the main repercussions on multimedia texts resulting from the interaction of a double support, audio and video, and, accordingly, on the procedures, means, and methods adopted in their translation. By drawing on previous research in semiotics and film studies, the relevant codes at work in the visual and acoustic channels were outlined. Subsequently, we concentrated on the analysis of the verbal component and on the peculiar characteristics of filmic orality as opposed to spontaneous dialogic production. In the second part, an overview of the main AVT modalities was presented (dubbing, voice-over, interlinguistic and intra-linguistic subtitling, audio-description, etc.)
in order to define the different technologies, processes and professional qualifications that this umbrella term presently includes. The second chapter focuses diachronically on the contribution of various theories (i.e. Descriptive Translation Studies, Polysystem Theory) to the application of Corpus Linguistics methods and tools to the field of Translation Studies. In particular, we discussed how the use of corpora can help reduce the gap between qualitative and quantitative approaches. Subsequently, we reviewed the tools traditionally employed by Corpus Linguistics in the construction of traditional “written language” corpora, to assess whether and how they can be adapted to meet the needs of multimedia corpora. In particular, we reviewed existing speech and spoken corpora, as well as multimedia corpora specifically designed to investigate translation. The third chapter reviews the main development steps of Forlixt 1 from a technical point of view (IT design principles, data query functions) and a methodological one, laying down extensive scientific foundations for the annotation methods adopted, which presently encompass categories of a pragmatic, sociolinguistic, linguacultural and semiotic nature. Finally, we described the main query tools (free search, guided search, advanced search and combined search) and the main intended uses of the database from a pedagogical perspective. The fourth chapter lists the specific compilation criteria adopted, as well as statistics on the two sub-corpora, presenting data broken down by language pair (French-Italian and German-Italian) and genre (film comedies, television soap operas and crime series). Next, we concentrated on discussing the results obtained from the analysis of summary tables reporting the frequency of the categories applied to the French-Italian sub-corpus.
The detailed observation of the distribution of categories identified in the original and dubbed corpus allowed us to empirically confirm some of the theories put forward in the literature and notably concerning the nature of the filmic text, the dubbing process and Italian dubbed language’s features. This was possible by looking into some of the most problematic aspects, like the rendering of socio-linguistic variation. The corpus equally allowed us to consider so far neglected aspects, such as pragmatic, prosodic, kinetic, facial, and semiotic elements, and their combination. At the end of this first exploration, some specific observations concerning possible macrotranslation trends were made for each type of sub-genre considered (cinematic and TV genre). On the grounds of this first quantitative investigation, the fifth chapter intended to further examine data, by applying ad hoc models of analysis. Given the virtually infinite number of combinations of categories adopted, and of the latter with searchable textual units, three possible qualitative and quantitative methods were designed, each of which was to concentrate on a particular translation dimension of the filmic text. The first one was the cultural dimension, which specifically focused on the rendering of selected cultural references and on the investigation of recurrent translation choices and strategies justified on the basis of the occurrence of specific clusters of categories. The second analysis was conducted on the linguistic dimension by exploring the occurrence of phrasal verbs in the Italian dubbed corpus and by ascertaining the influence on the adoption of related translation strategies of possible semiotic traits, such as gestures and facial expressions. Finally, the main aim of the third study was to verify whether, under which circumstances, and through which modality, graphic and iconic elements were translated into Italian from an original corpus of both German and French films. 
After a review of the main translation techniques at work, an exhaustive account of possible causes for their non-translation was also provided. By way of conclusion, the discussion of the results obtained from the distribution of annotation categories in the French-Italian corpus, together with the application of specific models of analysis, allowed us to underline possible advantages and drawbacks of adopting a corpus-based approach to AVT studies. Even though possible updates and improvements were proposed to help solve some of the problems identified, it is argued that the added value of Forlixt 1 lies ultimately in having created a valuable instrument that makes it possible to carry out empirically sound contrastive studies, which may be usefully replicated on different language pairs and several types of multimedia texts. Furthermore, multimedia corpora can also play a crucial role in L2 and translation teaching, two disciplines in which their use still lacks systematic investigation.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the post-genomic era, with the massive production of biological data, understanding the factors that affect protein stability is one of the most important and challenging tasks in clarifying the role of mutations in human disease. The problem lies at the basis of what is referred to as molecular medicine, whose underlying idea is that pathologies can be described at the molecular level. To this end, scientific efforts focus on characterising mutations that hamper protein function and thereby affect the biological processes underlying cell physiology. New techniques have been developed to catalogue single nucleotide polymorphisms (SNPs) across all human chromosomes, and as a result the information in dedicated databases is increasing exponentially. Mutations found at the DNA level, when they occur in transcribed regions, may lead to mutated proteins, and this can be a serious medical problem that strongly affects the phenotype. Bioinformatics tools are urgently needed to cope with the flood of genomic data stored in databases and to analyse the role of SNPs at the protein level. Several experimental and theoretical observations suggest that protein stability in the solvent-protein space is responsible for correct protein function. Mutations found to be disease related during DNA analysis are therefore often assumed to perturb protein stability as well. So far, however, no extensive analysis at the proteome level has investigated whether this is the case. Computational methods have also been developed to infer whether a mutation is disease related and, independently, whether it affects protein stability. Whether the perturbation of protein stability is related to what is routinely referred to as a disease therefore remains an open question.
In this work we have tried, for the first time, to explore the relation between mutations at the protein level and their relevance to disease with a large-scale computational study of data from different databases. To this end, in the first part of the thesis we derived two probabilistic indices for each mutation type (for 141 out of 150 possible SNPs): the perturbing index (Pp), which indicates the probability that a given mutation affects protein stability, considering all the available in vitro thermodynamic data, and the disease index (Pd), which indicates the probability that a mutation is disease related, given all the mutations that have been clinically associated so far. We find, with robust statistics, that the two indices correlate, with the exception of all mutations that are somatic cancer related. In this way each of the 150 mutations can be coded by two values that allow a direct comparison with database information. Furthermore, we implemented a computational method that, starting from the protein structure, predicts the effect of a mutation on protein stability, and we find that it outperforms a set of other predictors performing the same task. The predictor is based on support vector machines and takes protein tertiary structures as input. We show that the predicted data correlate well with the data from the databases. All our efforts therefore contribute to the SNP annotation process and, more importantly, establish the relationship between protein stability perturbation and the human variome leading to the diseasome.
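The abstract does not say how Pp and Pd are estimated. As a rough illustration only, the sketch below assumes each index is a simple relative frequency per substitution type and compares the two with a Pearson correlation. The function names, the boolean flags and the toy records are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from math import sqrt


def mutation_index(records):
    """Fraction of flagged records per (wild_type, mutant) substitution type.

    records: iterable of (wild_type, mutant, flag), where flag is True when the
    mutation perturbs stability (for Pp) or is disease related (for Pd).
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for wild_type, mutant, flag in records:
        total[(wild_type, mutant)] += 1
        flagged[(wild_type, mutant)] += int(flag)
    return {key: flagged[key] / total[key] for key in total}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records, not data from the thesis.
stability = [("A", "V", False), ("A", "V", False), ("G", "R", True),
             ("G", "R", True), ("L", "P", True), ("L", "P", False)]
disease = [("A", "V", False), ("A", "V", True), ("G", "R", True),
           ("G", "R", True), ("L", "P", True), ("L", "P", True)]

pp_index = mutation_index(stability)  # assumed estimate of Pp
pd_index = mutation_index(disease)    # assumed estimate of Pd
shared = sorted(set(pp_index) & set(pd_index))
r = pearson([pp_index[k] for k in shared], [pd_index[k] for k in shared])
print(pp_index, pd_index, round(r, 3))
```

With real data, the stability flag would need a threshold on the measured change in folding free energy, which the abstract does not specify.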

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Here I focus on three main topics that best cover the projects I worked on during my three-year PhD, which I spent in different research laboratories addressing, both computationally and in practice, important problems in modern molecular genomics. The first topic is the use of a livestock species (the pig) as a model of obesity, a complex human dysfunction. My efforts here concern the detection and annotation of single nucleotide polymorphisms (SNPs). I developed a pipeline for mining human and porcine sequences. Starting from a set of human genes related to obesity, the platform returns a list of annotated porcine SNPs extracted from a new set of potential obesity genes. Of these SNPs, 565 were analyzed on an Illumina chip to test their involvement in obesity in a population of more than 500 pigs. Results will be discussed. The computational analyses and the experiments were done in collaboration with the Biocomputing group and Dr. Luca Fontanesi, respectively, under the direction of Prof. Rita Casadio at the University of Bologna, Italy. The second topic concerns the development of a methodology, based on Factor Analysis, to mine information simultaneously from different levels of biological organization. With specific test cases we develop models of the complexity of the mRNA-miRNA molecular interaction in brain tumors, measured indirectly by microarray and quantitative PCR. This work was done under the supervision of Prof. Christine Nardini at the CAS-MPG Partner Institute for Computational Biology in Shanghai, China (co-founded by the Max Planck Society and the Chinese Academy of Sciences). The third topic concerns the development of a new method to replace the variety of PCR technologies routinely adopted to characterize the unknown flanking DNA regions of a viral integration locus in the human genome after clinical gene therapy.
This new method is based entirely on next generation sequencing; it reduces the time required to detect insertion sites and decreases the complexity of the procedure. This work was done in collaboration with the group of Dr. Manfred Schmidt at the Nationales Centrum für Tumorerkrankungen (Heidelberg, Germany), supervised by Dr. Annette Deichmann and Dr. Ali Nowrouzi. Furthermore, I add as an Appendix the description of an R package for gene network reconstruction that I helped to develop for scientific usage (http://www.bioconductor.org/help/bioc-views/release/bioc/html/BUS.html).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Grape berry is considered a non-climacteric fruit, but there is some evidence that ethylene plays a role in the control of berry ripening. This PhD thesis aimed to give insights into the role of ethylene and ethylene-related genes in the regulation of grape berry ripening. During this study, a small increase in ethylene concentration one week before véraison was measured in Vitis vinifera L. ‘Pinot Noir’ grapes, confirming previous findings in ‘Cabernet Sauvignon’. In addition, ethylene-related genes were identified in the grapevine genome sequence. As in other species, ethylene biosynthesis and receptor genes are present in grapevine as multi-gene families, and their expression appeared to be specific to tissue or developmental stage. All the other elements of the ethylene signal transduction cascade were also identified in the grape genome. Among them were ethylene response factors (ERFs), which modulate the transcription of many effector genes in response to ethylene. In this study seven grapevine ERFs were characterized, and they showed expression profiles specific to tissue and berry developmental stage. Two sequences, VvERF045 and VvERF063, appeared likely to be involved in the control of berry ripening on the basis of their expression profiles and sequence annotation. VvERF045 was induced before véraison and was specific to the ripe berry; by sequence similarity it was likely a transcription activator. VvERF063 displayed high sequence similarity to repressors of transcription, and its expression, very high in green berries, was lowest at véraison and during ripening. To functionally characterize VvERF045 and VvERF063, a stable transformation strategy was chosen. Both sequences were cloned into vectors for over-expression and silencing and transferred into grape by Agrobacterium-mediated or biolistic-mediated gene transfer. In vitro, transgenic plants over-expressing VvERF045 displayed an epinastic phenotype whose extent was correlated with the transgene expression level.
Four pathogen stress response genes were significantly induced in the transgenic plants, suggesting a putative function of VvERF045 in biotic stress defense during berry ripening. Further molecular analysis of the transgenic plants will help to identify the actual VvERF045 target genes and, together with the phenotypic characterization of the adult transgenic plants, will make it possible to define the role of VvERF045 in berry ripening in detail.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

It is currently widely accepted that understanding complex cell functions depends on an integrated network-theoretical approach and not on an isolated view of the different molecular agents. The aim of this thesis was to examine topological properties that mirror known biological aspects by describing the human protein network with methods from graph and network theory. The presented network is a partial human interactome of 9222 proteins and 36324 interactions, consisting of single interactions reliably extracted from peer-reviewed scientific publications. In general, one can focus on intra- or intermodular characteristics, where a functional module is defined as "a discrete entity whose function is separable from those of other modules". The presented human network is found to be scale-free and hierarchically organised, as previously shown for yeast networks. The interactome also contains proteins with high betweenness and low connectivity, which are biologically analysed and interpreted here, for the first time, as shuttling proteins between organelles (e.g. ER to Golgi, internal ER protein translocation, peroxisomal import, nuclear pore import/export). To optimise the search for proteins that connect modules, a new method is developed here based on proteins located between highly clustered regions rather than on highly connected regions. As a proof of principle, the Mediator complex, the prime example of a connector complex, is ranked first. Focusing on intramodular aspects, the measurement of k-clique communities discriminates overlapping modules very well. Twenty of the largest identified modules are analysed in detail and assigned to known biological structures (e.g. the proteasome and the NFκB and TGF-β complexes). Additionally, two large and highly interconnected modules for signal transducer and transcription factor proteins are revealed, separated by known shuttling proteins.
These proteins also yield the highest number of redundant shortcuts (obtained by calculating the skeleton), exhibit the highest numbers of interactions and might constitute highly interconnected but spatially separated rich-clubs either for signal transduction or for transcription factors. This design principle allows manifold regulatory events for signal transduction and enables a high diversity of transcription events in the nucleus with a limited set of proteins. Altogether, biological aspects are mirrored by purely topological features, leading to a new view and to new methods that assist the annotation of proteins to biological functions, structures and subcellular localisations. As the human protein network is one of the most complex networks known, these results will be fruitful for other fields of network theory and will help in understanding complex network functions in general.
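The idea of proteins with high betweenness but low connectivity can be illustrated on a toy graph. The sketch below computes unweighted shortest-path betweenness with Brandes' algorithm in plain Python. It is only an illustration: the abstract does not state which implementation was used, and the node names and the tiny graph are hypothetical, not taken from the interactome.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness for an undirected, unweighted graph.

    adj maps each node to the set of its neighbours (Brandes' algorithm).
    """
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                              # nodes in order of discovery
        preds = {v: [] for v in adj}            # predecessors on shortest paths
        n_paths = dict.fromkeys(adj, 0)
        n_paths[source] = 1
        dist = dict.fromkeys(adj, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:                            # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(adj, 0.0)
        for w in reversed(order):               # accumulate from the farthest nodes
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical toy network: two triangles joined through a single node "m".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "m"),
         ("m", "x"), ("x", "y"), ("x", "z"), ("y", "z")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
for node in sorted(adj, key=lambda n: -bc[n]):
    print(node, "degree", len(adj[node]), "betweenness", bc[node])
```

In this toy graph the connecting node has the lowest degree among the central nodes but the highest betweenness, which is the pattern the abstract associates with shuttling proteins.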

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Technological progress in molecular biology confronts the scientific community with the need to interpret the enormous number of biological sequences, whether proteins or nucleic acids, that are steadily accumulating in databases. In this context bioinformatics plays a role of primary importance. A new level of insight has been introduced by Next Generation Sequencing (NGS) technologies, which make it possible to obtain entire genomes or transcriptomes quickly and at low cost. Among the most relevant applications of NGS are undoubtedly those in oncology, which involve the genomic characterisation of tumour tissues and the development of new diagnostic and therapeutic approaches for the treatment of cancer. NGS analysis makes it possible to identify the complete set of variations present in the tumour genome, such as single nucleotide variants, chromosomal rearrangements, insertions and deletions. It should be stressed, however, that the variations found in genes must ultimately be examined in terms of their effects at the protein level, since proteins are most directly responsible for the altered phenotypes observed in the tumour cell. Bioinformatics expertise is therefore required both in the analysis of the data produced by NGS and in the subsequent stages, where the genes contained in the sequenced genome and the protein structures expressed from it must be annotated or, as in mutational studies, the effect of the genomic variation must be evaluated.
The work presented here belongs to this context: on the one hand, the development of computational methodologies for the annotation of protein sequences and, on the other, the set-up of a pipeline for analysing data produced with NGS technologies in oncological applications, with the ultimate aim of identifying and characterising tumour genetic mutations at the protein level.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Diseases of the skeletal system, such as osteoporosis or osteoarthritis, are, alongside cardiovascular diseases and tumours, among the most common human diseases. A better understanding of the formation and maintenance of bone and cartilage tissue is therefore of particular importance. Many previous approaches to identifying the relevant genes, their products and their interactions are based on the study of pathological situations. As a result, the function of many genes has been described only in the context of disease. Studies aimed at gene activity during the normal development of bone- and cartilage-forming tissues, by contrast, have been carried out far less often. One of the most interesting tissues in terms of developmental physiology is the epiphyseal plate of the long bones. In this so-called growth plate, particularly in foetal tissue, very high activity can be expected of those genes involved in bone and cartilage formation. In the present work, RNA was therefore isolated from the epiphyseal plate of calf bones and a cDNA library was constructed. About 4000 clones from this library were sequenced in a classical EST project. The analysis yielded an expression profile comprising approximately 900 genes and identified many transcripts for regulatory and structural components of bone and cartilage development. In addition to the typical genes for components of bone development, components of embryonic developmental processes are also clearly represented. The former include primarily the collagens, above all collagen II alpha 1, by far the most highly expressed gene in the foetal growth plate. After the ribosomal proteins, the collagens form the second-largest gene group in the expression profile, accounting for about 10% of all evaluable sequences.
Proteoglycans and other weakly expressed regulatory elements, such as transcription factors, were found in the EST project only in very low copy numbers because of the low coverage. However, the EST analysis brought to light several interesting, previously unknown transcripts, which were examined in more detail. These include transcripts that could be assigned to LOC618319. In addition to the three exon regions described so far, a further exon was identified in the 3' UTR. In the deduced protein, which is at least 121 amino acids long, a signal peptide and a transmembrane domain were detected. Together with a possible glycosylation, this places the gene product in the group of proteoglycans. Its structure deviates slightly from the typical structures of bone- and cartilage-specific proteoglycans, and a possible function of this gene product in the interaction with integrins and in cell-cell interaction, but also in signal transduction, is conceivable. EST sequencing of about 4000 cDNA clones can, however, generally cover only a fraction of the possible transcripts of the tissue under study. The new "Next Generation Sequencing" technologies offer entirely new possibilities for sequencing and analysing complete transcriptomes at very high coverage. To support the EST data and to broaden the data basis considerably, the transcriptome of the bovine foetal growth plate was sequenced with both Roche 454/FLX and Illumina Solexa technology. In evaluating the approximately 40000 454 sequences and 75 million Illumina sequences, procedures were established for the general handling, quality control, clustering, annotation and quantitative evaluation of large volumes of sequence data. A comparison of the high-throughput BLAST analyses in the classical read-count approach with the EST expression profile showed good agreement.
Deviations between the individual methods could not be explained methodologically in every case. In some cases, correlations between transcript length and read distribution are apparent. Although even simple methods such as normalisation to RPKM ("reads per kilobase of transcript per million mappable reads") improve interpretability, systematic measurement errors arising from the type of sequencing could not always be eliminated. Appropriate normalisation of the data is therefore particularly important when comparing datasets generated in different ways. The results from the various analyses discussed here show that the new sequencing technologies are a good complement to, and a potential replacement for, established methods of gene expression analysis.
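The RPKM normalisation mentioned above can be written out as a small function. This is a minimal sketch of the standard definition (reads divided by transcript length in kilobases and by mapped reads in millions); the function name and the example numbers are hypothetical and do not come from the thesis.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


# Hypothetical example: the same raw read count on a short and a long transcript.
short_transcript = rpkm(read_count=1_000, transcript_length_bp=1_000,
                        total_mapped_reads=1_000_000)
long_transcript = rpkm(read_count=1_000, transcript_length_bp=4_000,
                       total_mapped_reads=1_000_000)
print(short_transcript, long_transcript)
```

The same raw count gives a four times lower value on the four times longer transcript, which is how the measure corrects for the length dependence of read counts noted in the abstract.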

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The identification and genetic diversity of phytoplasmas infecting tropical plant species, selected among those most agronomically relevant in South-east Asia and Latin America, were studied. The correlation between the evolutionary divergence of relevant phytoplasma strains and their geographic distribution was established by comparing homologous genes of phytoplasma strains detected in the same or related plant species in other geographical areas worldwide. Molecular diversity was studied in the genes encoding ribosomal proteins, groEL, tuf and amp, in addition to the phytoplasma 16S rRNA gene. Selected samples infected by phytoplasmas belonging to diverse ribosomal groups were also studied by in silico RFLP followed by phylogenetic analyses. Moreover, a partial genome annotation of a ‘Ca. P. brasiliense’ strain was carried out with a view to future application in epidemiological studies. Phytoplasma presence was evaluated in cassava showing frog skin disease (CFSD) in Costa Rica and Paraguay and witches’ broom disease (CWB) in Vietnam and Thailand. In both cases, the diseases were associated with phytoplasmas related to the aster yellows, apple proliferation and “stolbur” groups, whereas phytoplasmas related to the X-disease group were found only in CFSD, and phytoplasmas related to the hibiscus witches’ broom, elm yellows and clover proliferation groups only in CWB. Variability was found among strains belonging to the same ribosomal group but having different geographic origins and associated with different diseases. Additionally, a dodder transmission assay was carried out to elucidate the role of phytoplasmas in CWB disease; it resulted in typical phytoplasma symptoms in periwinkle plants, associated with the presence of aster yellows-related strains. Lethal wilt disease, a severe disease of oil palm in Colombia that is spreading throughout South America, was also studied. Phytoplasmas were detected in symptomatic oil palm and identified as ‘Ca. P. asteris’, ribosomal subgroup 16SrI-B, and were distinguished from other aster yellows phytoplasmas used as reference strains, in particular from an aster yellows strain infecting corn in the same country.
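
The idea behind an in silico RFLP analysis can be sketched as follows: a sequence is searched for restriction enzyme recognition sites and the resulting fragment lengths are reported. This is a minimal illustration only. The toy sequence and the function name are hypothetical, the sequence is treated as linear, and the abstract does not state which enzymes or software were used. HaeIII (GG^CC) is chosen here only as a well-known blunt cutter with a palindromic site.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence.

    site       -- recognition sequence (palindromic, so one strand suffices)
    cut_offset -- position of the cut within the site (2 for GG^CC)
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        # Continue one base further so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    bounds = [0] + cut_positions + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:]) if end > begin]


# Hypothetical 20-bp toy sequence with two HaeIII sites.
toy_sequence = "AAAAGGCCTTTTTTGGCCAA"
fragments = digest(toy_sequence, "GGCC", 2)
print(sorted(fragments, reverse=True))
```

Comparing the sorted fragment lists of two strains gives the virtual banding pattern on which an RFLP-based grouping relies.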

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Biological data are inherently interconnected: protein sequences are connected to their annotations, the annotations are structured into ontologies, and so on. While protein-protein interactions are already represented by graphs, this work presents how a graph structure can be used to enrich the annotation of protein sequences by means of algorithms that analyze the graph topology. It also describes a novel solution that restricts the data generation needed for building such a graph by using constraints on the data and dynamic programming. In the ideal case, the proposed algorithm improves the generation time by a factor of 5. The graph representation is then exploited to build a comprehensive database using the emerging technology of graph databases. While graph databases are widely used for other kinds of data, from Twitter tweets to recommendation systems, their application to bioinformatics is new. A graph database is proposed, with a structure that can be easily expanded and queried.
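
As a rough illustration of how graph topology can suggest annotations, the sketch below proposes terms for unannotated proteins when enough of their graph neighbours share a term. This neighbour-voting scheme is an assumption chosen for simplicity and is not necessarily the algorithm used in the thesis. The protein identifiers, edges and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def suggest_annotations(edges, annotations, min_support=2):
    """Suggest terms for unannotated nodes from their direct neighbours.

    edges       -- iterable of (node_a, node_b) pairs, undirected
    annotations -- dict mapping node -> set of annotation terms
    min_support -- minimum number of neighbours that must carry a term
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    suggestions = {}
    for node, adjacent in neighbours.items():
        if annotations.get(node):
            continue  # node already has annotations, nothing to suggest
        counts = Counter(
            term for n in adjacent for term in annotations.get(n, ())
        )
        picked = sorted(t for t, c in counts.items() if c >= min_support)
        if picked:
            suggestions[node] = picked
    return suggestions


# Hypothetical graph: P1 is unannotated and linked to three proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4")]
annotations = {
    "P2": {"GO:0003824"},
    "P3": {"GO:0003824", "GO:0005515"},
    "P4": set(),
}
print(suggest_annotations(edges, annotations))
```

In a graph database the same question would be expressed as a query over node neighbourhoods rather than as an in-memory loop.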

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In the present work, acetic acid-, propionic acid- and butyric acid-forming bacteria were isolated from one thermophilic and three mesophilic biogas plants as well as from two high-pressure biogas laboratory fermenters. The fermenters were fed with the renewable resource maize silage, in some cases together with cattle or pig manure and other solid input materials. For the isolation of acid-forming bacteria, a mineral salt medium was used to which Na-DL-lactate, succinate, ethanol, glycerol, glucose or an amino acid mixture (alanine, serine, threonine, glutamic acid, methionine and cysteine) was added as carbon source. These are substrates that can arise during hydrolysis or primary fermentation in anaerobic degradation. The isolates obtained were able to form acetic acid, propionic acid or butyric acid from these substrates. In total, 49 isolates belonging to the phyla Firmicutes, Tenericutes or Thermotogae were obtained from the sampled plants. Using 16S rDNA sequences, most isolates were identified as Clostridium sporosphaeroides, Defluviitoga tunisiensis and Dendrosporobacter sp. The formation of acetic acid, propionic acid or butyric acid was detected in cultures of isolates identified as the following species: Bacillus thermoamylovorans, Clostridium aminovalericum, Clostridium cochlearium/Clostridium tetani, Clostridium sporosphaeroides, Dendrosporobacter sp., Proteiniborus sp., Selenomonas bovis and Tepidanaerobacter sp. Two isolates related to Thermoanaerobacterium thermosaccharolyticum were able to form butyric acid and lactic acid. Acetic acid formation was detected in cultures of Defluviitoga tunisiensis. A comparison of the 16S rDNA sequences with databases and the results of PCR amplifications with isolate-specific primer pairs additionally indicated that some isolates could represent new species (e.g. strain Tepidanaerobacter sp. AS34, strain Proteiniborus sp. ASG1.4, strain Dendrosporobacter sp. LG2.4, strain Desulfotomaculum sp. EG2.4, strain Gallicola sp. SG1.4B and strain Acholeplasma sp. ASSH51). By developing isolate-specific primer pairs, derived from 16S rDNA sequences of the isolates or of reference strains, the isolates could be detected in biogas plants and quantified by qPCR (mainly in the range of 1,000 to 100,000,000 copies of 16S rDNA per g of biogas plant sample). Furthermore, the isolates were characterized by physiological experiments and their role in the anaerobic degradation chain was discussed. The species Defluviitoga tunisiensis appears to be of great importance in biogas plants. It was the species most frequently isolated in this work and was also detected at high abundance in the sampled biogas plants with the primer pair developed here (10,000 to 100,000,000 copies of 16S rDNA per g of biogas plant sample). Manual annotation of the complete genome and the substrate utilization experiments showed that Defluviitoga tunisiensis has a very broad substrate spectrum in carbohydrate utilization and may therefore play an important role in the utilization of biomass in biogas plants. The results of this work thus provide new insights into the second stage of anaerobic degradation in biogas plants, acidogenesis.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In combination with bottom-up approaches, modern ESI-LC-MS/MS techniques allow the qualitative and quantitative characterization of several thousand proteins in a single experiment. Data-independent acquisition methods such as MSE and the IMS variants HDMSE and UDMSE are particularly suitable for label-free protein quantification. Because of their high complexity, the data acquired in this way place special demands on the analysis software. Quantitative analysis of MSE/HDMSE/UDMSE data has so far been restricted to a few commercial solutions. In the present work, a strategy and a set of new methods for the cross-run quantitative analysis of label-free MSE/HDMSE/UDMSE data were developed and implemented as the software ISOQuant. For the first steps of data analysis (feature detection, peptide and protein identification), the commercial software PLGS is used. Subsequently, the independent PLGS results of all runs of an experiment are merged in a relational database and reprocessed with dedicated algorithms (retention time alignment, feature clustering, multidimensional normalization of intensities, multi-stage data filtering, protein inference, redistribution of the intensities of shared peptides, protein quantification). This post-processing significantly increases the reproducibility of the qualitative and quantitative results. To evaluate the performance of the quantitative data analysis and to compare it with other solutions, a set of precisely defined hybrid proteome samples was developed. The samples were acquired with the MSE and UDMSE methods, analyzed with Progenesis QIP, synapter and ISOQuant, and the results were compared. In contrast to synapter and Progenesis QIP, ISOQuant achieved both a high reproducibility of protein identification and a high precision and accuracy of protein quantification. In conclusion, the presented algorithms and the analysis workflow enable reliable and reproducible quantitative data analyses. With the ISOQuant software, a simple and efficient tool for routine high-throughput analyses of label-free MSE/HDMSE/UDMSE data was developed. With the hybrid proteome samples and the evaluation metrics, a comprehensive system for evaluating quantitative acquisition and data analysis systems was presented.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Enterococcus hirae ATCC 9790 is a Gram-positive lactic acid bacterium that has been used in basic research for over 4 decades. Here we report the sequence and annotation of the 2.8-Mb genome of E. hirae and its endemic 29-kb plasmid pTG9790.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

With the advent of high-throughput sequencing (HTS), the emerging science of metagenomics is transforming our understanding of the relationships of microbial communities with their environments. While metagenomics aims to catalogue the genes present in a sample, metatranscriptomics, by assessing which genes are actively expressed, can provide a mechanistic understanding of community inter-relationships. To achieve these goals, several challenges need to be addressed, from sample preparation to sequence processing, statistical analysis and functional annotation. Here we use an inbred non-obese diabetic (NOD) mouse model, in which germ-free animals were colonized with a defined mixture of eight commensal bacteria, to explore methods of RNA extraction and to develop a pipeline for the generation and analysis of metatranscriptomic data. Applying the Illumina HTS platform, we sequenced 12 NOD cecal samples prepared using multiple RNA-extraction protocols. The absence of a complete set of reference genomes necessitated a peptide-based search strategy. Up to 16% of sequence reads could be matched to a known bacterial gene. Phylogenetic analysis of the mapped ORFs revealed a distribution consistent with ribosomal RNA, the majority from Bacteroides or Clostridium species. To place these HTS data within a systems context, we mapped the relative abundance of corresponding Escherichia coli homologs onto metabolic and protein-protein interaction networks. These maps identified bacterial processes with components that were well represented in the datasets. In summary, this study highlights the potential of exploiting the economy of HTS platforms for metatranscriptomics.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A porcine BAC clone harboring the tightly linked IFNAR1 and IFNGR2 genes was identified by comparative analysis of the publicly available porcine BAC end sequences. The complete 168,835 bp insert sequence of this clone was determined. Comparisons of the genomic sequence with EST sequences from public databases allowed a detailed annotation of the IFNAR1 and IFNGR2 genes. The genomic organization of the analyzed genes was conserved with respect to their known mammalian orthologs; however, the sequence conservation of these genes across species was relatively low. In addition to the IFNAR1 and IFNGR2 genes, which were completely sequenced, the analyzed BAC clone also contained parts of an orphan gene encoding a putative transmembrane protein (TMEM50B). In contrast to the IFNAR1 and IFNGR2 genes, the sequence conservation of the TMEM50B gene across different mammalian species was extremely high.