149 results for Phylogenetic
Abstract:
Bananas are one of the world's most important food crops, providing sustenance and income for millions of people in developing countries and supporting large export industries. Viruses are considered major constraints to banana production, germplasm multiplication and exchange, and to genetic improvement of banana through traditional breeding. In Africa, the two most important virus diseases are bunchy top, caused by Banana bunchy top virus (BBTV), and banana streak disease, caused by Banana streak virus (BSV). BBTV is a serious production constraint in a number of countries within/bordering East Africa, such as Burundi, Democratic Republic of Congo, Malawi, Mozambique, Rwanda and Zambia, but is not present in Kenya, Tanzania and Uganda. Additionally, epidemics of banana streak disease are occurring in Kenya and Uganda. The rapidly growing tissue culture (TC) industry within East Africa, aiming to provide planting material to banana farmers, has stimulated discussion about the need for virus indexing to certify planting material as virus-free. Diagnostic methods for BBTV and BSV have been reported and, for BBTV, PCR-based assays are reliable and relatively straightforward. However, for BSV, high levels of serological and genetic variability and the presence of endogenous virus sequences within the banana genome complicate diagnosis. Uganda has been shown to contain the greatest diversity in BSV isolates found anywhere in the world. A broad-spectrum diagnostic test for BSV detection, which can discriminate between endogenous and episomal BSV sequences, is a priority. This PhD project aimed to establish diagnostic methods for banana viruses, with a particular focus on the development of novel methods for BSV detection, and to use these diagnostic methods for the detection and characterisation of banana viruses in East Africa. A novel rolling-circle amplification (RCA) method was developed for the detection of BSV.
Using samples of Banana streak MY virus (BSMYV) and Banana streak OL virus (BSOLV) from Australia, this method was shown to distinguish between endogenous and episomal BSV sequences in banana plants. The RCA assay was used to screen a collection of 56 banana samples from south-west Uganda for BSV. RCA detected at least five distinct BSV isolates in these samples, including BSOLV and Banana streak GF virus (BSGFV) as well as three BSV isolates (Banana streak Uganda-I, -L and -M virus) for which only partial sequences had been previously reported. These latter three BSV had only been detected using immuno-capture (IC)-PCR and thus were possible endogenous sequences. In addition to its ability to detect BSV, the RCA protocol was also demonstrated to detect other viruses within the family Caulimoviridae, including Sugar cane bacilliform virus and Cauliflower mosaic virus. Using the novel RCA method, three distinct BSV isolates from each of Kenya and Uganda were identified and characterised. The complete genomes of these isolates were sequenced and annotated. All six isolates were shown to have a characteristic badnavirus genome organisation with three open reading frames (ORFs) and the large polyprotein encoded by ORF 3 was shown to contain conserved amino acid motifs for movement, aspartic protease, reverse transcriptase and ribonuclease H activities. In addition, several sequences important for expression and replication of the virus genome were identified, including the conserved tRNAmet primer binding site present in the intergenic region of all badnaviruses. Based on the International Committee on Taxonomy of Viruses (ICTV) guidelines for species demarcation in the genus Badnavirus, these six isolates were proposed as distinct species, and named Banana streak UA virus (BSUAV), Banana streak UI virus (BSUIV), Banana streak UL virus (BSULV), Banana streak UM virus (BSUMV), Banana streak CA virus (BSCAV) and Banana streak IM virus (BSIMV).
Using PCR with species-specific primers designed to each isolate, a genotypically diverse collection of 12 virus-free banana cultivars was tested for the presence of endogenous sequences. For five of the BSV no amplification was observed in any cultivar tested, while for BSIMV, four positive samples were identified in cultivars with a B-genome component. During field visits to Kenya, Tanzania and Uganda, 143 samples were collected and assayed for BSV. PCR using nine sets of species-specific primers, and RCA, were compared for BSV detection. For five BSV species with no known endogenous counterpart (namely BSCAV, BSUAV, BSUIV, BSULV and BSUMV), PCR was used to detect 30 infections from the 143 samples. Using RCA, 96.4% of these samples were considered positive, with one additional sample detected using RCA which was not positive using PCR. For these five BSV, PCR and RCA were both useful for identifying infected samples, irrespective of the host cultivar genotype (Musa A- or B-genome components). For four additional BSV with known endogenous counterparts in the M. balbisiana genome (BSOLV, BSGFV, BSMYV and BSIMV), PCR was shown to detect 75 infections from the 143 samples. In 30 samples from cultivars with an A-only genome component there was 96.3% agreement between PCR positive samples and detection using RCA, again demonstrating that either PCR or RCA is a suitable method for detection. However, in 45 samples from cultivars with some B-genome component, the level of agreement between PCR positive samples and RCA positive samples was 70.5%. This suggests that, in cultivars with some B-genome component, many infections were detected using PCR which were the result of amplification of endogenous sequences. In these latter cases, RCA or another method which discriminates between endogenous and episomal sequences, such as immuno-capture PCR, is needed to diagnose episomal BSV infection.
Field visits were made to Malawi and Rwanda to collect local isolates of BBTV for validation of a PCR-based diagnostic assay. The presence of BBTV in samples of bananas with bunchy top disease was confirmed in 28 out of 39 samples from Malawi and all nine samples collected in Rwanda, using PCR and RCA. For three isolates, one from Malawi and two from Rwanda, the complete nucleotide sequences were determined and shown to have a similar genome organisation to previously published BBTV isolates. The two isolates from Rwanda had at least 98.1% nucleotide sequence identity between each of the six DNA components, while the similarity between isolates from Rwanda and Malawi was between 96.2% and 99.4% depending on the DNA component. At the amino acid level, similarities in the putative proteins encoded by DNA-R, -S, -M, -C and -N were found to range from 98.8% to 100%. In a phylogenetic analysis, the three East African isolates clustered together within the South Pacific subgroup of BBTV isolates. Nucleotide sequence comparison to isolates of BBTV from outside Africa identified India as the possible origin of East African isolates of BBTV.
Abstract:
To date, a molecular phylogenetic approach has not been used to investigate the evolutionary structure of Trogoderma and closely related genera. Using two mitochondrial genes, Cytochrome Oxidase I and Cytochrome B, and the nuclear gene, 18S, the reported polyphyletic positioning of Trogoderma was examined. Paraphyly in Trogoderma was observed, with one Australian Trogoderma species reconciled as sister to all Dermestidae and the Anthrenocerus genus deeply nested within the Australian Trogoderma clade. In addition, time to most recent common ancestor for a number of Dermestidae was calculated. Based on these estimations, the Dermestidae origin exceeded 175 million years, placing the origins of this family in Pangaea.
Abstract:
Research over the last two decades has significantly increased our understanding of the evolutionary position of the insects among other arthropods, and the relationships among the insect orders. Many of these insights have been established through increasingly sophisticated analyses of DNA sequence data from a limited number of genes. Recent results have established the relationships of the Holometabola, but relationships among the hemimetabolous orders have been more difficult to elucidate. A strong consensus on the relationships among the Palaeoptera (Ephemeroptera and Odonata) and their relationship to the Neoptera has not emerged, with all three possible resolutions supported by different data sets. While polyneopteran relationships generally have resisted significant resolution, it is now clear that termites, Isoptera, are nested within the cockroaches, Blattodea. The newly discovered order Mantophasmatodea is difficult to place, with the balance of studies favouring Grylloblattodea as sister-group. While some studies have found the paraneopteran orders (Hemiptera, Thysanoptera, Phthiraptera and Psocoptera) monophyletic, evidence suggests that parasitic lice (Phthiraptera) have evolved from groups within the book and bark lice (Psocoptera), and may represent parallel evolutions of parasitism within two major louse groups. Within Holometabola, it is now clear that Hymenoptera are the sister to the other orders, which in turn are divided into two clades, the Neuropteroidea (Coleoptera, Neuroptera and relatives) and the Mecopterida (Trichoptera, Lepidoptera, Diptera and their relatives). The enigmatic order Strepsiptera, the twisted wing insects, have now been placed firmly near Coleoptera, rejecting their close relationship to Diptera that was proposed some 15 years ago primarily based on ribosomal DNA data.
Phylogenomic-scale analyses are just beginning to be focused on the relationships of the insect orders, and this is where we expect to see resolution of palaeopteran and polyneopteran relationships. Future research will benefit from greater coordination between intra and inter-ordinal analyses. This will maximise the opportunities for appropriate outgroup choice at the intraordinal level and provide the background knowledge for the interordinal analyses to span the maximum phylogenetic scope within groups.
Novel molecular markers of Chlamydia pecorum genetic diversity in the koala (Phascolarctos cinereus)
Abstract:
Background Chlamydia pecorum is an obligate intracellular bacterium and the causative agent of reproductive and ocular disease in several animal hosts including koalas, sheep, cattle and goats. C. pecorum strains detected in koalas are genetically diverse, raising interesting questions about the origin and transmission of this species within koala hosts. While the ompA gene remains the most widely-used target in C. pecorum typing studies, it is generally recognised that surface protein encoding genes are not suited for phylogenetic analysis and it is becoming increasingly apparent that the ompA gene locus is not congruent with the phylogeny of the C. pecorum genome. Using the recently sequenced C. pecorum genome sequence (E58), we analysed 10 genes, including ompA, to evaluate the use of ompA as a molecular marker in the study of koala C. pecorum genetic diversity. Results Three genes (incA, ORF663, tarP) were found to contain sufficient nucleotide diversity and discriminatory power for detailed analysis and were used, with ompA, to genotype 24 C. pecorum PCR-positive koala samples from four populations. The most robust representation of the phylogeny of these samples was achieved through concatenation of all four gene sequences, enabling the recreation of a "true" phylogenetic signal. OmpA and incA were of limited value as fine-detailed genetic markers as they were unable to confer accurate phylogenetic distinctions between samples. On the other hand, the tarP and ORF663 genes were identified as useful "neutral" and "contingency" markers respectively, to represent the broad evolutionary history and intra-species genetic diversity of koala C. pecorum. Furthermore, the concatenation of ompA, incA and ORF663 sequences highlighted the monophyletic nature of koala C. pecorum infections by demonstrating a single evolutionary trajectory for koala hosts that is distinct from that seen in non-koala hosts. 
Conclusions While the continued use of ompA as a fine-detailed molecular marker for epidemiological analysis appears justified, the tarP and ORF663 genes also appear to be valuable markers of phylogenetic or biogeographic divisions at the C. pecorum intra-species level. This research has significant implications for future typing studies to understand the phylogeny, genetic diversity, and epidemiology of C. pecorum infections in the koala and other animal species.
Abstract:
Background The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in multiple lineages in favour of multiple, minicircular chromosomes with fewer than 37 genes on each chromosome. Results Minicircular mt genomes are found in six of the ten louse species examined to date and three types of minicircles were identified: heteroplasmic minicircles which coexist with full sized mt genomes (type 1); multigene chromosomes with short, simple control regions, from which we infer that the genome consists of several such chromosomes (type 2); and multiple, single to three gene chromosomes with large, complex control regions (type 3). Mapping minicircle types onto a phylogenetic tree of lice fails to show a pattern of their occurrence consistent with an evolutionary series of minicircle types. Analysis of the nuclear-encoded, mitochondrially targeted genes inferred from the body louse, Pediculus, suggests that the loss of mitochondrial single-stranded binding protein (mtSSB) may be responsible for the presence of minicircles, at least in species with the most derived type 3 minicircles (Pediculus, Damalinia). Conclusions Minicircular mt genomes are common in lice and appear to have arisen multiple times within the group. Life history adaptive explanations which attribute minicircular mt genomes in lice to the adoption of blood-feeding in the Anoplura are not supported by this expanded data set as minicircles are found in multiple non-blood feeding louse groups but are not found in the blood-feeding genus Heterodoxus.
In contrast, a mechanistic explanation based on the loss of mtSSB suggests that minicircles may be selectively favoured due to the incapacity of the mt replisome to synthesize long replicative products without mtSSB, and thus that the loss of this gene led to the formation of minicircles in lice.
Abstract:
With well over 700 species, the Tribe Dacini is one of the most species-rich clades within the dipteran family Tephritidae, the true fruit flies. Nearly all Dacini belong to one of two very large genera, Dacus Fabricius and Bactrocera Macquart. The distribution of the genera overlap in or around the Indian subcontinent, but the greatest diversity of Dacus is in Africa and the greatest diversity of Bactrocera is in south-east Asia and the Pacific. The monophyly of these two genera has not been rigorously established, with previous phylogenies only including a small number of species and always heavily biased to one genus over the other. Moreover, the subgeneric taxonomy within both genera is complex and the monophyly of many subgenera has not been explicitly tested. Previous hypotheses about the biogeography of the Dacini based on morphological reviews and current distributions of taxa have invoked an out-of-India hypothesis; however this has not been tested in a phylogenetic framework. We attempted to resolve these issues with a dated, molecular phylogeny of 125 Dacini species generated using 16S, COI, COII and white eye genes. The phylogeny shows that Bactrocera is not monophyletic, but rather consists of two major clades: Bactrocera s.s. and the ‘Zeugodacus group of subgenera’ (a recognised, but informal taxonomic grouping of 15 Bactrocera subgenera). This ‘Zeugodacus’ clade is the sister group to Dacus, not Bactrocera and, based on current distributions, split from Dacus before that genus moved into Africa. We recommend that taxonomic consideration be given to raising Zeugodacus to genus level. Supportive of predictions following from the out-of-India hypothesis, the first common ancestor of the Dacini arose in the mid-Cretaceous approximately 80 mya. Major divergence events occurred during the Indian rafting period and diversification of Bactrocera apparently did not begin until after India docked with Eurasia (50–35 mya). 
In contrast, diversification in Dacus, at approximately 65 mya, apparently began much earlier than predicted by the out-of-India hypothesis, suggesting that, if the Dacini arose on the Indian plate, then ancestral Dacus may have left the plate in the mid to late Cretaceous via the well documented India–Madagascar–Africa migration route. We conclude that the phylogeny does not disprove the predictions of an out-of-India hypothesis for the Dacini, although modification of the original hypothesis is required.
Abstract:
Despite their ecological significance as decomposers and their evolutionary significance as the most speciose eusocial insect group outside the Hymenoptera, termite (Blattodea: Termitoidae or Isoptera) evolutionary relationships have yet to be well resolved. Previous morphological and molecular analyses strongly conflict at the family level and are marked by poor support for backbone nodes. A mitochondrial (mt) genome phylogeny of termites was produced to test relationships between the recognised termite families, improve nodal support and test the phylogenetic utility of rare genomic changes found in the termite mt genome. Complete mt genomes were sequenced for 7 of the 9 extant termite families with additional representatives of each of the two most speciose families Rhinotermitidae (3 of 7 subfamilies) and Termitidae (3 of 8 subfamilies). The mt genome of the well supported sister group of termites, the subsocial cockroach Cryptocercus, was also sequenced. A highly supported tree of termite relationships was produced by all analytical methods and data treatment approaches; however, the relationship of the termites + Cryptocercus clade to other cockroach lineages was highly affected by the strong nucleotide compositional bias found in termites relative to other dictyopterans. The phylogeny supports previously proposed suprafamilial termite lineages, the Euisoptera and Neoisoptera, a later derived Kalotermitidae as sister group of the Neoisoptera and a monophyletic clade of dampwood (Stolotermitidae, Archotermopsidae) and harvester termites (Hodotermitidae). In contrast to previous termite phylogenetic studies, nodal supports were very high for family-level relationships within termites. Two rare genomic changes in the mt genome control region were found to be molecular synapomorphies for major clades.
An elongated stem-loop structure defined the clade Polyphagidae + (Cryptocercus + termites), and a further series of compensatory base changes in this stem loop is synapomorphic for the Neoisoptera. The complicated repeat structures first identified in Reticulitermes, composed of short (A-type) and long (B-type) repeats, define the clade Heterotermitinae + Termitidae, while the secondary loss of A-type repeats is synapomorphic for the non-macrotermitine Termitidae.
Abstract:
We took a comparative approach utilizing clines to investigate the extent to which natural selection may have shaped population divergence in cuticular hydrocarbons (CHCs) that are also under sexual selection in Drosophila. We detected the presence of CHC clines along a latitudinal gradient on the east coast of Australia in two fly species with independent phylogenetic and population histories, suggesting adaptation to shared abiotic factors. For both species, significant associations were detected between clinal variation in CHCs and temperature variation along the gradient, suggesting temperature maxima as a candidate abiotic factor shaping CHC variation among populations. However, rainfall and humidity correlated with CHC variation to differing extents in the two species, suggesting that response to these abiotic factors may vary in a species-specific manner. Our results suggest that natural selection, in addition to sexual selection, plays a significant role in structuring among-population variation in sexually selected traits in Drosophila.
Abstract:
The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup.
Abstract:
Members of the Calliphoridae (blowflies) are significant for medical and veterinary management, due to the ability of some species to consume living flesh as larvae, and for forensic investigations due to the ability of others to develop in corpses. Due to the difficulty of accurately identifying larval blowflies to species there is a need for DNA-based diagnostics for this family; however, the widely used DNA-barcoding marker, cox1, has been shown to fail for several groups within this family. Additionally, many phylogenetic relationships within the Calliphoridae are still unresolved, particularly deeper level relationships. Sequencing whole mt genomes has been demonstrated both as an effective method for identifying the most informative diagnostic markers and for resolving phylogenetic relationships. Twenty-seven complete or nearly complete mt genomes were sequenced representing 13 species, seven genera and four calliphorid subfamilies and a member of the related family Tachinidae. PCR and sequencing primers developed for sequencing one calliphorid species could be reused to sequence related species within the same superfamily with success rates ranging from 61% to 100%, demonstrating the speed and efficiency with which an mt genome dataset can be assembled. Comparison of molecular divergences for each of the 13 protein-coding genes and 2 ribosomal RNA genes, at a range of taxonomic scales, identified novel targets for developing as diagnostic markers which were 117–200% more variable than the markers which have been used previously in calliphorids. Phylogenetic analysis of whole mt genome sequences resulted in much stronger support for family and subfamily-level relationships. The Calliphoridae are polyphyletic, with the Polleninae more closely related to the Tachinidae, and the Sarcophagidae are the sister group of the remaining calliphorids.
Within the Calliphoridae, there was strong support for the monophyly of the Chrysomyinae and Luciliinae and for the sister-grouping of Luciliinae with Calliphorinae. Relationships within Chrysomya were not well resolved. Whole mt genome data supported the previously demonstrated paraphyly of Lucilia cuprina with respect to L. sericata and allowed us to conclude that it is due to hybrid introgression prior to the last common ancestor of modern sericata populations, rather than due to recent hybridisation, nuclear pseudogenes or incomplete lineage sorting.
Abstract:
Psittacine beak and feather disease (PBFD), caused by Beak and feather disease virus (BFDV), is the most significant infectious disease in psittacines. PBFD is thought to have originated in Australia but is now found worldwide; in Africa, it threatens the survival of the indigenous endangered Cape parrot and the vulnerable black-cheeked lovebird. We investigated the genetic diversity of putative BFDVs from southern Africa. Feathers and heparinized blood samples were collected from 27 birds representing 9 psittacine species, all showing clinical signs of PBFD. DNA extracted from these samples was used for PCR amplification of the putative BFDV coat protein (CP) gene. The nucleotide sequences of the CP genes of 19 unique BFDV isolates were determined and compared with the 24 previously described sequences of BFDV isolates from Australasia and America. Phylogenetic analysis revealed eight BFDV lineages, with the southern African isolates representing at least three distinctly unique genotypes; 10 complete genome sequences were determined, representing at least one of every distinct lineage. The nucleotide diversity of the southern African isolates was calculated to be 6.4% and is comparable to that found in Australia and New Zealand. BFDVs in southern Africa have, however, diverged substantially from viruses found in other parts of the world, as the average distance between the southern African isolates and BFDV isolates from Australia ranged from 8.3 to 10.8%. In addition to point mutations, recombination was found to contribute substantially to the level of genetic variation among BFDVs, with evidence of recombination in all but one of the genomes analyzed.
Abstract:
An open question amongst papillomavirus taxonomists is whether recombination has featured in the evolutionary history of these viruses. Since the onset of the global AIDS epidemic, the question is somewhat less academic, because immune-compromised human immunodeficiency virus patients are often co-infected with extraordinarily diverse mixtures of human papillomavirus (HPV) types. It is expected that these conditions may facilitate the emergence of HPV recombinants, some of which might have novel pathogenic properties. Here, a range of rigorous analyses is applied to full-genome sequences of papillomaviruses to provide convincing statistical and phylogenetic evidence that evolutionarily relevant papillomavirus recombination can occur. © 2006 SGM.
Abstract:
Psittacine beak and feather disease (PBFD) has a broad host range and is widespread in wild and captive psittacine populations in Asia, Africa, the Americas, Europe and Australasia. Beak and feather disease circovirus (BFDV) is the causative agent. BFDV has an ~2 kb single-stranded circular DNA genome encoding just two proteins (Rep and CP). In this study we provide support for demarcation of BFDV strains by phylogenetic analysis of 65 complete genomes from databases and 22 new BFDV sequences isolated from infected psittacines in South Africa. We propose 94% genome-wide sequence identity as a strain demarcation threshold, with isolates sharing > 94% identity belonging to the same strain, and strain subtypes sharing > 98% identity. Currently, BFDV diversity falls within 14 strains, with five highly divergent isolates from budgerigars probably representing a new species of circovirus with three strains (budgerigar circovirus; BCV-A, -B and -C). The geographical distribution of BFDV and BCV strains is strongly linked to the international trade in exotic birds; strains with more than one host are generally located in the same geographical area. Lastly, we examined BFDV and BCV sequences for evidence of recombination, and determined that recombination had occurred in most BFDV and BCV strains. We established that there were two globally significant recombination hotspots in the viral genome: the first is along the entire intergenic region and the second is in the C-terminal portion of the CP ORF. The implications of our results for the taxonomy and classification of circoviruses are discussed. © 2011 SGM.
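The demarcation rule proposed above (> 94% identity for the same strain, > 98% for the same subtype) can be illustrated with a minimal Python sketch. The abstract does not say how genome-wide identity was computed, so the gap handling here (columns with a gap in either sequence are skipped) and the function names `percent_identity` and `classify_pair` are assumptions made purely for illustration; the input sequences are assumed to be already aligned.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: inputs are pre-aligned to equal length; gap columns are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def classify_pair(identity: float) -> str:
    """Apply the thresholds quoted in the abstract (> 94% strain, > 98% subtype)."""
    if identity > 98.0:
        return "same strain, same subtype"
    if identity > 94.0:
        return "same strain, different subtype"
    return "different strain"


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    ident = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(ident, classify_pair(ident))
```

In practice the identity values would come from a full-genome alignment tool; the sketch only shows how the two thresholds partition pairs of isolates.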
Abstract:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. 
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology correlates positively with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may only be present for promoters where the corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel.
Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately'-conserved transcription factor binding sites as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as 'regulatory trees', inspired by the phylogenetic tree concept.
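For readers unfamiliar with the spectrum kernel named above, the following Python sketch shows its standard definition: the kernel value for two sequences is the inner product of their k-mer count vectors. This is a generic illustration, not the thesis's implementation; the function names, the default `k=3`, and the toy sequences are assumptions, and the SVM training step is not shown.

```python
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (the 'k-spectrum')."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of s and t.

    Assumption: k = 3 is an arbitrary illustrative default.
    """
    counts_s = spectrum(s, k)
    counts_t = spectrum(t, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * counts_t[kmer] for kmer, n in counts_s.items())


if __name__ == "__main__":
    # Toy DNA fragments, invented for illustration only.
    print(spectrum_kernel("TGTGATCTAGATCACA", "TGTGACGTAGATCACA", k=3))
```

A matrix of such kernel values between all training sequences could then be passed to any SVM implementation that accepts a precomputed kernel.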
Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes, the 'core-regulatory-set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied; this core set potentially identifies basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study.
We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
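As an illustration of the spectrum kernel mentioned above, the following is a minimal sketch of its textbook definition: each sequence is mapped to its vector of k-mer counts, and the kernel value is the inner product of the two vectors. It is not the thesis implementation; the function names, the default k and the toy sequences are assumptions made for this example only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for k-mers it has not seen, so only shared
    # k-mers contribute to the sum.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def normalised_spectrum_kernel(a, b, k=3):
    """Spectrum kernel scaled so that identical sequences score 1.0."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy sequences (hypothetical, not real binding sites).
    site_a = "TGTGATCTAGATCACA"
    site_b = "TGTGACGTAGATCACT"
    print(spectrum_kernel(site_a, site_b, k=3))
    print(normalised_spectrum_kernel(site_a, site_b, k=3))
```

A matrix of such kernel values over a set of candidate sites could then be supplied to an SVM that accepts a precomputed kernel; how the thesis combined the kernel with the additional position features is not specified here.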
Abstract:
Sequencing of mba gene fragments of reference strains of Ureaplasma urealyticum serovars 1, 3, 6, 14, in addition to 33 clinical U. urealyticum isolates is reported. A phylogenetic tree deduced from an alignment of these sequences clearly demonstrates two major clusters (confidence limit 100%), which equate to the parvo and T960 biovars, and five types which we have designated mba 1, 3, 6, 8 and X. These relationships are supported by bootstrap analysis. Polymorphisms within the mba fragment of types mba 1, 3, and 6 were used to define nine subtypes (mba 1a, 1b, 3a, 3b, 3c, 3d, 3e, 6a, and 6b), thus facilitating high-resolution typing of U. urealyticum. Inclusion of the reference strains for serovars 1, 3, 6, and 8 in the mba typing scheme showed that the results of this analysis are broadly consistent with currently accepted serotyping. In addition, a ure gene fragment from nine of the clinical isolates was amplified and sequenced. Comparisons of the sequences clearly distinguished the two biovars of U. urealyticum; however, this fragment was invariant within the parvo biovar. This study has shown that the sequence of the mba gene can reveal the fine details of the relationships between U. urealyticum isolates and also supports the significant evolutionary gap between the two biovars.