959 results for Phylogenetic
Abstract:
Psittacine beak and feather disease (PBFD) has a broad host range and is widespread in wild and captive psittacine populations in Asia, Africa, the Americas, Europe and Australasia. Beak and feather disease circovirus (BFDV) is the causative agent. BFDV has an ~2 kb single-stranded circular DNA genome encoding just two proteins (Rep and CP). In this study we provide support for demarcation of BFDV strains by phylogenetic analysis of 65 complete genomes from databases and 22 new BFDV sequences isolated from infected psittacines in South Africa. We propose 94% genome-wide sequence identity as a strain demarcation threshold, with isolates sharing >94% identity belonging to the same strain, and strain subtypes sharing >98% identity. Currently, BFDV diversity falls within 14 strains, with five highly divergent isolates from budgerigars probably representing a new species of circovirus with three strains (budgerigar circovirus; BCV-A, -B and -C). The geographical distribution of BFDV and BCV strains is strongly linked to the international trade in exotic birds; strains with more than one host are generally located in the same geographical area. Lastly, we examined BFDV and BCV sequences for evidence of recombination, and determined that recombination had occurred in most BFDV and BCV strains. We established that there were two globally significant recombination hotspots in the viral genome: the first is along the entire intergenic region and the second is in the C-terminal portion of the CP ORF. The implications of our results for the taxonomy and classification of circoviruses are discussed. © 2011 SGM.
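The 94% and 98% thresholds proposed in the abstract above can be illustrated with a minimal Python sketch. It assumes the two genomes have already been aligned to equal length and that alignment columns containing a gap are ignored; the abstract does not say how gaps were treated, so that choice, and the function names `percent_identity` and `classify_pair`, are illustrative assumptions rather than the authors' method.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable alignment columns")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def classify_pair(identity: float,
                  strain_cutoff: float = 94.0,
                  subtype_cutoff: float = 98.0) -> str:
    """Apply the proposed thresholds: >98% same subtype, >94% same strain."""
    if identity > subtype_cutoff:
        return "same subtype"
    if identity > strain_cutoff:
        return "same strain"
    return "different strains"
```

For example, `classify_pair(percent_identity(seq1, seq2))` labels a pair of aligned genomes; both comparisons are strict, following the ">" in the abstract.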
Abstract:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. 
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may only be present for promoters where the corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel.
Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites, as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as 'regulatory trees', inspired by the phylogenetic tree concept.
Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as being analogous to 'hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the 'core-regulatory-set', and interactions found only in a subset of genomes explored as the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa, respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study.
We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
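The spectrum kernel mentioned in the thesis abstract above compares two sequences by the k-mers they share. The following Python sketch shows the general idea only: the kernel value is the dot product of the two k-mer count vectors. The choice of k = 3, the normalisation step, and the function names are assumptions made for illustration; the abstract does not give the settings used in the thesis.

```python
from collections import Counter
from math import sqrt


def spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    """Dot product of the k-mer count vectors of s and t."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    return sum(cs[kmer] * ct[kmer] for kmer in cs.keys() & ct.keys())


def normalized_spectrum_kernel(s: str, t: str, k: int = 3) -> float:
    """Kernel scaled to [0, 1] so that sequence length matters less."""
    denom = sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0
```

A matrix of such kernel values between candidate binding sites could then be passed to an SVM implementation that accepts precomputed kernels.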
Abstract:
Sequencing of mba gene fragments of reference strains of Ureaplasma urealyticum serovars 1, 3, 6, 14, in addition to 33 clinical U. urealyticum isolates is reported. A phylogenetic tree deduced from an alignment of these sequences clearly demonstrates two major clusters (confidence limit 100%), which equate to the parvo and T960 biovars, and five types which we have designated mba 1, 3, 6, 8 and X. These relationships are supported by bootstrap analysis. Polymorphisms within the mba fragment of types mba 1, 3, and 6 were used to define nine subtypes (mba 1a, 1b, 3a, 3b, 3c, 3d, 3e, 6a, and 6b) thus facilitating high resolution typing of U. urealyticum. Inclusion of the reference strains for serovars 1, 3, 6, and 8 in the mba typing scheme showed that the results of this analysis are broadly consistent with currently accepted serotyping. In addition a ure gene fragment from nine of the clinical isolates was amplified and sequenced. Comparisons of the sequences clearly distinguished the two biovars of U. urealyticum; however this fragment was invariant within the parvo biovar. This study has shown that the sequence of the mba can reveal the fine details of the relationships between U. urealyticum isolates and also supports the significant evolutionary gap between the two biovars.
Abstract:
Carrion-breeding Sarcophagidae (Diptera) can be used to estimate the post-mortem interval (PMI) in forensic cases. Difficulties with accurate morphological identifications at any life stage and a lack of documented thermobiological profiles have limited the current usefulness of these flies. The molecular-based approach of DNA barcoding, which utilises a 648-bp fragment of the mitochondrial cytochrome oxidase subunit I gene, was previously evaluated in a pilot study for the discrimination between 16 Australian sarcophagids. The current study comprehensively evaluated DNA barcoding on a larger taxon set of 588 adult Australian sarcophagids. A total of 39 of the 84 known Australian species were represented by 580 specimens, which includes 92% of potentially forensically important species. A further eight specimens could not be reliably identified, but were included as six unidentifiable taxa. A neighbour-joining phylogenetic tree was generated and nucleotide sequence divergences were calculated using the Kimura-two-parameter distance model. All species except Sarcophaga (Fergusonimyia) bancroftorum, known for high morphological variability, were resolved as reciprocally monophyletic (99.2% of cases), with most having bootstrap support of 100. Excluding S. bancroftorum, the mean intraspecific and interspecific variation ranged from 0.00-1.12% and 2.81-11.23%, respectively, allowing for species discrimination. DNA barcoding was therefore validated as a suitable method for the molecular identification of the Australian Sarcophagidae, which will aid in the implementation of this fauna in forensic entomology.
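The Kimura two-parameter (K2P) distance named in the abstract above corrects the observed proportion of differences for multiple substitutions, treating transitions and transversions separately: d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences. The Python sketch below is a minimal illustration for two aligned sequences; skipping columns with gaps or ambiguous bases is an assumption, not a detail taken from the study.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: columns where either base is not A, C, G or T are skipped.
    Raises ValueError from math.log if the sequences are too divergent
    for the correction to be defined.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))
```

Multiplying the result by 100 gives a percentage divergence comparable in form to the intraspecific and interspecific ranges quoted above.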
Abstract:
Approximately 2500 fly species comprise the Sarcophagidae family worldwide. The complete mitochondrial genome of the carrion-breeding, forensically important Sarcophaga impatiens Walker (Diptera: Sarcophagidae) from Australia was sequenced. The 15,169 bp circular genome contains the 37 genes found in a typical Metazoan genome: 13 protein-coding genes, 2 ribosomal RNA genes and 22 transfer RNA genes. It also contains one non-coding A+T-rich region. The arrangement of the genes was the same as that found in the ancestral insect. All the protein initiation codons are ATN, except for cox1 that begins with TCG (encoding S). The 22 tRNA anticodons of S. impatiens are consistent with those observed in Drosophila yakuba, and all form the typical cloverleaf structure, except for tRNA-Ser(AGN) that lacks the DHU arm. The mitochondrial genome of Sarcophaga presented will be valuable for resolving phylogenetic relationships within the family Sarcophagidae and the order Diptera, and could be used to identify favourable genetic markers for species identifications for forensic purposes.
Abstract:
The three genera of smut fungi, Ustilago, Sporisorium and Macalpinomyces, form a complex that has eluded resolution by morphology (Langdon & Fullerton 1975, Vánky 1991, Piepenbring et al. 1998) and molecular phylogenetic analysis (Stoll et al. 2003, 2005). Two suggestions to reconcile the taxonomy of the complex have been proposed. The first was to break up the current taxa into several smaller genera and subgenera, and the second to unify the three genera into a single genus, Ustilago (Vánky 2002, Piepenbring 2004). The former solution is dependent on finding morphological synapomorphies that can delimit the genera, and the latter solution dismisses the wide morphological diversity within the group (McTaggart et al. 2012b). Synapomorphic morphological characters and host plant classification delimited clades in the Ustilago-Sporisorium-Macalpinomyces complex (McTaggart et al. 2012a). The current study defines these synapomorphic characters and proposes a new classification for many species currently placed in Ustilago, Sporisorium and Macalpinomyces. This approach preserves the well-known genera Ustilago, Sporisorium and Macalpinomyces, and enables the classification to reflect morphological diversity in the complex.
Abstract:
The marsupial genus Macropus includes three subgenera, the familiar large grazing kangaroos and wallaroos of M. (Macropus) and M. (Osphranter), as well as the smaller mixed grazing/browsing wallabies of M. (Notamacropus). A recent study of five concatenated nuclear genes recommended subsuming the predominantly browsing Wallabia bicolor (swamp wallaby) into Macropus. To further examine this proposal we sequenced partial mitochondrial genomes for kangaroos and wallabies. These sequences strongly favour the morphological placement of W. bicolor as sister to Macropus, although they place M. irma (black-gloved wallaby) within M. (Osphranter) rather than, as expected, within M. (Notamacropus). Species tree estimation from separately analysed mitochondrial and nuclear genes favours retaining Macropus and Wallabia as separate genera. A simulation study finds that incomplete lineage sorting among nuclear genes is a plausible explanation for incongruence with the mitochondrial placement of W. bicolor, while mitochondrial introgression from a wallaroo into M. irma is the deepest such event identified in marsupials. Similar coalescent simulations for interpreting gene tree conflicts will increase in both relevance and statistical power as species-level phylogenetics enters the genomic age. Ecological considerations, in turn, hint at a role for selection in accelerating the fixation of introgressed or incompletely sorted loci. More generally, the inclusion of the mitochondrial sequences substantially enhanced phylogenetic resolution. However, we caution that the evolutionary dynamics that enhance mitochondria as speciation indicators in the presence of incomplete lineage sorting may also render them especially susceptible to introgression.
Abstract:
Phylogenetic inference from sequences can be misled by both sampling (stochastic) error and systematic error (nonhistorical signals where reality differs from our simplified models). A recent study of eight yeast species using 106 concatenated genes from complete genomes showed that even small internal edges of a tree received 100% bootstrap support. This effective negation of stochastic error from large data sets is important, but longer sequences exacerbate the potential for biases (systematic error) to be positively misleading. Indeed, when we analyzed the same data set using minimum evolution optimality criteria, an alternative tree received 100% bootstrap support. We identified a compositional bias as responsible for this inconsistency and showed that it is reduced effectively by coding the nucleotides as purines and pyrimidines (RY-coding), reinforcing the original tree. Thus, a comprehensive exploration of potential systematic biases is still required, even though genome-scale data sets greatly reduce sampling error.
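RY-coding, as described in the abstract above, replaces each nucleotide with its chemical class, so that purines (A, G) become R and pyrimidines (C, T) become Y. Transitions then become invisible and only transversions remain as differences, which is how the recoding reduces compositional bias. A minimal Python sketch follows; leaving gaps and ambiguity codes unchanged, and mapping U like T, are assumptions made here for illustration.

```python
# Purines (A, G) -> R; pyrimidines (C, T, and U for RNA) -> Y.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(seq: str) -> str:
    """Recode a nucleotide sequence into the two-state R/Y alphabet.

    Characters other than A, C, G, T, U (gaps, N, etc.) pass through unchanged.
    """
    return seq.upper().translate(RY_TABLE)
```

Two sequences that differ only by transitions, such as AACCGGTT and GGTTAACC, map to the same recoded string.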
Four new avian mitochondrial genomes help get to basic evolutionary questions in the late Cretaceous
Abstract:
Good phylogenetic trees are required to test hypotheses about evolutionary processes. We report four new avian mitochondrial genomes, which together with an improved method of phylogenetic analysis for vertebrate mt genomes give results for three questions in avian evolution. The new mt genomes are: magpie goose (Anseranas semipalmata); an owl (morepork, Ninox novaeseelandiae); a basal passerine (rifleman, or New Zealand wren, Acanthisitta chloris); and a parrot (kakapo or owl-parrot, Strigops habroptilus). The magpie goose provides an important new calibration point for avian evolution because the well-studied Presbyornis fossils are on the lineage to ducks and geese, after the separation of the magpie goose. We find, as with other animal mitochondrial genomes, that RY-coding is helpful in adjusting for biases between pyrimidines and between purines. When RY-coding is used at third positions of the codon, the root occurs between paleognath and neognath birds (as expected from morphological and nuclear data). In addition, passerines form a relatively old group in Neoaves, and many modern avian lineages diverged during the Cretaceous. Although many aspects of the avian tree are stable, additional taxon sampling is required.
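Applying RY-coding only at third codon positions, as this abstract describes, keeps full nucleotide information at first and second positions while collapsing the fast-evolving third position to purine (R) or pyrimidine (Y). The Python sketch below assumes a protein-coding sequence whose reading frame starts at a known offset; the function name and the `frame_offset` parameter are hypothetical and not taken from the paper.

```python
_RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_code_third_positions(cds: str, frame_offset: int = 0) -> str:
    """RY-code only the third base of each codon.

    frame_offset is the index of the first base of the first complete codon
    (an assumption of this sketch). Other positions are left as nucleotides.
    """
    out = []
    for i, base in enumerate(cds.upper()):
        is_third = i >= frame_offset and (i - frame_offset) % 3 == 2
        out.append(_RY.get(base, base) if is_third else base)
    return "".join(out)
```

For instance, the codons ATG GCC TAA become ATR GCY TAR under this recoding.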
Abstract:
The generation of a correlation matrix from a large set of long gene sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. The generation is not only computationally intensive but also requires significant memory resources as, typically, few gene sequences can be simultaneously stored in primary memory. The standard practice in such computation is to use frequent input/output (I/O) operations. Therefore, minimizing the number of these operations will yield much faster run-times. This paper develops an approach for the faster and scalable computing of large-size correlation matrices through the full use of available memory and a reduced number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different problems with different correlation matrix sizes. The significant performance improvement of the approach over the existing approaches is demonstrated through benchmark examples.
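The idea of trading memory for fewer I/O operations can be illustrated with a tiled computation: hold a block of vectors in memory, compare it against every other block, and only then move on. The Python sketch below is not the algorithm from the paper, whose details the abstract does not give. It assumes each sequence has already been converted to a numeric vector, that a caller-supplied `load(i)` function (hypothetical) reads vector i from slow storage, and that `block` vectors fit in memory per tile side. It returns the matrix together with the number of loads so that different block sizes can be compared.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def blocked_correlation(
    n: int,
    load: Callable[[int], Sequence[float]],
    block: int,
) -> Tuple[List[List[float]], int]:
    """Fill an n x n correlation matrix tile by tile, counting load() calls."""
    corr = [[0.0] * n for _ in range(n)]
    loads = 0
    for i0 in range(0, n, block):
        # Row block stays in memory while all column blocks stream past it.
        rows: Dict[int, Sequence[float]] = {
            i: load(i) for i in range(i0, min(i0 + block, n))
        }
        loads += len(rows)
        for j0 in range(i0, n, block):
            if j0 == i0:
                cols = rows  # diagonal tile: reuse what is already loaded
            else:
                cols = {j: load(j) for j in range(j0, min(j0 + block, n))}
                loads += len(cols)
            for i, x in rows.items():
                for j, y in cols.items():
                    if j >= i:  # the matrix is symmetric, so fill both halves
                        corr[i][j] = corr[j][i] = pearson(x, y)
    return corr, loads
```

With four vectors, a block size of 1 needs 10 loads, a block size of 2 needs 6, and a block size of 4 needs only 4, while the resulting matrix is the same in each case.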
Abstract:
BACKGROUND: Infection by dengue virus (DENV) is a major public health concern in hundreds of tropical and subtropical countries. French Polynesia (FP) regularly experiences epidemics that initiate, or are consecutive to, DENV circulation in other South Pacific Island Countries (SPICs). In January 2009, after a decade of serotype 1 (DENV-1) circulation, the first cases of DENV-4 infection were reported in FP. Two months later a new epidemic emerged, occurring about 20 years after the previous circulation of DENV-4 in FP. In this study, we investigated the epidemiological and molecular characteristics of the introduction, spread and genetic microevolution of DENV-4 in FP. METHODOLOGY/PRINCIPAL FINDINGS: Epidemiological data suggested that recent transmission of DENV-4 in FP started in the Leeward Islands and that this serotype quickly displaced DENV-1 throughout FP. Phylogenetic analyses of the nucleotide sequences of the envelope (E) gene of 64 DENV-4 strains collected in FP in the 1980s and in 2009-2010, and some additional strains from other SPICs, showed that DENV-4 strains from the SPICs were distributed into genotypes IIa and IIb. Recent FP strains were distributed into two clusters, each comprising viruses from other but distinct SPICs, suggesting that emergence of DENV-4 in FP in 2009 resulted from multiple introductions. In addition, we observed that almost all strains collected in the SPICs in the 1980s exhibit an amino acid (aa) substitution V287I within domain I of the E protein, and all recent South Pacific strains exhibit a T365I substitution within domain III. CONCLUSIONS/SIGNIFICANCE: This study confirmed the cyclic re-emergence and displacement of DENV serotypes in FP. In addition, our results showed that specific aa substitutions on the E protein were present in all DENV-4 strains circulating in SPICs.
These substitutions, probably acquired and subsequently conserved, could reflect a founder effect associated with epidemiological, geographical, eco-biological and social specificities in SPICs.
Abstract:
Intra-host sequence data from RNA viruses have revealed the ubiquity of defective viruses in natural viral populations, sometimes at surprisingly high frequency. Although defective viruses have long been known to laboratory virologists, their relevance in clinical and epidemiological settings has not been established. The discovery of long-term transmission of a defective lineage of dengue virus type 1 (DENV-1) in Myanmar, first seen in 2001, raised important questions about the emergence of transmissible defective viruses and their role in viral epidemiology. By combining phylogenetic analyses and dynamical modelling, we investigate how evolutionary and ecological processes at the intra-host and inter-host scales shaped the emergence and spread of the defective DENV-1 lineage. We show that this lineage of defective viruses emerged between June 1998 and February 2001, and that the defective virus was transmitted primarily through co-transmission with the functional virus to uninfected individuals. We provide evidence that, surprisingly, this co-transmission route has a higher transmission potential than transmission of functional dengue viruses alone. Consequently, we predict that the defective lineage should increase overall incidence of dengue infection, which could account for the historically high dengue incidence reported in Myanmar in 2001-2002. Our results show the unappreciated potential for defective viruses to impact the epidemiology of human pathogens, possibly by modifying the virulence-transmissibility trade-off, or to emerge as circulating infections in their own right. They also demonstrate that interactions between viral variants, such as complementation, can open new pathways to viral emergence.
Abstract:
Between 50 and 100 million people are infected with dengue viruses each year and more than 100,000 of these die. Dr Choudhury has demonstrated that populations of dengue viruses in individual patients are genetically and functionally very diverse and that this diversity changes significantly at the time of major outbreaks of disease. The results of his studies may inform strategies which will make dengue vaccines far more effective.
Abstract:
The feral pig, Sus scrofa, is a widespread and abundant invasive species in Australia. Feral pigs pose a significant threat to the environment, agricultural industry, and human health, and in far north Queensland they endanger World Heritage values of the Wet Tropics. Historical records document the first introduction of domestic pigs into Australia via European settlers in 1788 and subsequent introductions from Asia from 1827 onwards. Since this time, domestic pigs have been accidentally and deliberately released into the wild and significant feral pig populations have become established, resulting in the declaration of this species as a class 2 pest in Queensland. The overall objective of this study was to assess the population genetic structure of feral pigs in far north Queensland, in particular to enable delineation of demographically independent management units. The identification of ecologically meaningful management units using molecular techniques can assist in targeting feral pig control to bring about effective long-term management. Molecular genetic analysis was undertaken on 434 feral pigs from 35 localities between Tully and Innisfail. Seven polymorphic and unlinked microsatellite loci were screened, and fixation indices (FST and analogues) and Bayesian clustering methods were used to identify population structure and management units in the study area. The hyper-variable mitochondrial control region (D-loop) of 35 feral pigs was also sequenced to identify pig ancestry. Three management units were identified in the study at a scale of 25 to 35 km. Even with the strong pattern of genetic structure identified in the study area, some evidence of long distance dispersal and/or translocation was found, as a small number of individuals exhibited ancestry from a management unit other than the one in which they were sampled.
Overall, gene flow in the study area was found to be influenced by environmental features such as topography and land use, but no distinct or obvious natural or anthropogenic geographic barriers were identified. Furthermore, strong evidence was found for non-random mating between pigs of European and Asian breeds indicating that feral pig ancestry influences their population genetic structure. Phylogenetic analysis revealed two distinct mitochondrial DNA clades, representing Asian domestic pig breeds and European breeds. A significant finding was that pigs of Asian origin living in Innisfail and south Tully were not mating randomly with European breed pigs populating the nearby Mission Beach area. Feral pig control should be implemented in each of the management units identified in this study. The control should be coordinated across properties within each management unit to prevent re-colonisation from adjacent localities. The adjacent rainforest and National Park Estates, as well as the rainforest-crop boundary should be included in a simultaneous control operation for greater success.
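The fixation index FST used in the study above measures how much of the total genetic diversity lies between subpopulations. A common textbook form is FST = (HT - HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. The Python sketch below computes this for a single locus from allele frequencies, assuming equally weighted subpopulations and no sample-size correction; the estimators actually used in the study are not specified in the abstract and are likely more refined.

```python
from typing import Dict, List


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(subpops: List[Dict[str, float]]) -> float:
    """FST = (HT - HS) / HT for one locus.

    Each element of subpops maps allele name -> frequency in that subpopulation.
    Assumption: subpopulations are weighted equally.
    """
    k = len(subpops)
    alleles = set().union(*subpops)
    mean_freqs = {a: sum(p.get(a, 0.0) for p in subpops) / k for a in alleles}
    h_t = expected_heterozygosity(mean_freqs)
    h_s = sum(expected_heterozygosity(p) for p in subpops) / k
    if h_t == 0:
        return 0.0  # a single allele everywhere: no diversity to partition
    return (h_t - h_s) / h_t
```

A value near 0 indicates that subpopulations share similar allele frequencies, whereas a value near 1 indicates that they are fixed for different alleles; in practice the statistic would be averaged over loci such as the seven microsatellites screened here.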
Abstract:
The sheep (Ovis aries) is commonly used as a large animal model in skeletal research. Although the sheep genome has been sequenced, there are still only a limited number of annotated mRNA sequences in public databases. A complementary DNA (cDNA) library was constructed to provide a generic resource for further exploration of genes that are actively expressed in bone cells in sheep. It was anticipated that the cDNA library would provide molecular tools for further research into the process of fracture repair and bone homeostasis, and add to the existing body of knowledge. One of the hallmarks of cDNA libraries has been the identification of novel genes, and in this library the full open reading frame of the gene C12orf29 was cloned and characterised. This gene codes for a protein of unknown function with a molecular weight of 37 kDa. A literature search showed that no previous studies had been conducted into the biological role of C12orf29, except for some bioinformatics studies that suggested a possible link with cancer. Phylogenetic analyses revealed that C12orf29 had an ancient pedigree, with a homologous gene found in some bacterial taxa. This implied that the gene was present in the last common eukaryotic ancestor, thought to have existed more than 2 billion years ago. This notion was further supported by the fact that the gene is found in taxa belonging to the two major eukaryotic branches, bikonts and unikonts. In the bikont supergroup a C12orf29-like gene was found in the single-celled protist Naegleria gruberi, whereas in the unikont supergroup, encompassing the metazoa, the gene is universal to all chordate and, therefore, vertebrate species. It appears to have been lost in the majority of cnidarian and protostome taxa; however, C12orf29-like genes have been found in the cnidarian freshwater hydra and the protostome Pacific oyster.
The experimental data indicate that C12orf29 has a structural role in skeletal development and tissue homeostasis, whereas in silico analysis of the human C12orf29 promoter region suggests that its expression is potentially under the control of the NOTCH, WNT and TGF-β developmental pathways, as well as SOX9 and BAPX1; pathways that are all heavily involved in skeletogenesis. Taken together, this investigation provides strong evidence that C12orf29 has a very important role in the chordate body plan, in early skeletal development and cartilage homeostasis, and also suggests a possible link with spina bifida in humans.