953 results for Bayesian phylogenetic analysis
Abstract:
Phylogenetic inference consists of searching for the evolutionary tree that best explains the genealogical relationships among a set of species. Phylogenetic analysis has many applications in areas such as biology, ecology, and paleontology. Several criteria have been defined for inferring phylogenies, among them maximum parsimony and maximum likelihood. The first seeks the phylogenetic tree that minimizes the number of evolutionary steps needed to describe the evolutionary history of the species, while the second seeks the tree with the highest probability of producing the observed data under an evolutionary model. The search for a phylogenetic tree can be formulated as a multi-objective optimization problem that aims to find trees satisfying both the parsimony and likelihood criteria simultaneously, as far as possible. Because these criteria differ, there is no single optimal solution (a single tree) but rather a set of compromise solutions, called the Pareto-optimal solutions. Evolutionary algorithms are now being used successfully to find these solutions. These algorithms are a family of non-exact techniques inspired by the process of natural selection, and they usually find high-quality solutions to complex optimization problems. They work by manipulating a set of trial solutions (trees, in the case of phylogeny) with operators: some exchange information between solutions, simulating DNA crossover, and others apply random modifications, simulating mutation. The result is an approximation to the Pareto-optimal set, which can be plotted so that the domain expert (the biologist, in the case of inference) can choose the compromise solution of greatest interest.
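The notion of a set of compromise solutions can be illustrated with a minimal sketch. It assumes each candidate tree has already been scored on two objectives that are both minimized, a parsimony step count and a negative log-likelihood. The scores and the function name `pareto_front` are hypothetical illustrations, not part of MO-Phylogenetics.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (parsimony steps, negative log-likelihood), both minimized


def pareto_front(points: List[Score]) -> List[Score]:
    """Return the points that no other point dominates."""
    front = []
    for p in points:
        # q dominates p if q is no worse on both objectives and strictly better on one.
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical scores for five candidate trees.
trees = [(120, 5400.0), (118, 5425.5), (125, 5390.2), (121, 5410.0), (118, 5430.0)]
front = pareto_front(trees)
print(front)
```

With these made-up scores, three trees remain on the front. Each trades a few parsimony steps for likelihood or the reverse, and choosing among them is the expert decision the abstract describes.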
For multi-objective optimization applied to phylogenetic inference, there is an open-source software tool called MO-Phylogenetics, designed to solve inference problems with both classic and state-of-the-art evolutionary algorithms. REFERENCES [1] C.A. Coello Coello, G.B. Lamont, D.A. van Veldhuizen. Evolutionary algorithms for solving multi-objective problems. Springer. August 2007. [2] C. Zambrano-Vega, A.J. Nebro, J.F. Aldana-Montes. MO-Phylogenetics: a phylogenetic inference software tool with multi-objective evolutionary metaheuristics. Methods in Ecology and Evolution. In press. February 2016.
Dinoflagellate Genomic Organization and Phylogenetic Marker Discovery Utilizing Deep Sequencing Data
Abstract:
Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short read sequencing methods, and develops new techniques that utilize deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles, and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates, using this set of markers, produced a phylogeny with good support of all branches. The Suessiales were found to be sister to the Peridinales. The Prorocentrales formed a monophyletic group with the Dinophysiales that was sister to the Gonyaulacales.
The Gymnodinales was found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.
Abstract:
In this work we compare Grapholita molesta Busck (Lepidoptera: Tortricidae) populations originating from Brazil, Chile, Spain, Italy and Greece using power spectral density and phylogenetic analysis to detect any similarities between the population macro-level and the molecular micro-level. Log-transformed population data were normalized, and AR(p) models were developed to generate population time series of equal length for each case. The time-frequency/scale properties of the population data were further analyzed using wavelet analysis to detect any frequency changes in population dynamics and to cluster the populations. Based on the power spectrum of each population time series and the hierarchical clustering schemes, populations originating from South America (Brazil and Chile) exhibit similar rhythmic properties, and both are more closely related to populations originating from Greece. Populations from Spain, and especially Italy, are more distant in terms of periodic changes in their population dynamics. Moreover, members of the same cluster share similar spectral information and are therefore presumed to participate in the same temporally regulated population process. In contrast, the phylogenetic approach revealed a less structured pattern that bears indications of panmixia, as the two clusters contain individuals from both Europe and South America. This preliminary outcome will be further assessed by incorporating more individuals and probably by employing a second molecular marker.
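As a rough illustration of what a power spectrum of a population time series captures, the sketch below computes a plain periodogram of a log-transformed, mean-centred series using only the standard library. This is a simplification swapped in for illustration: the abstract describes AR(p)-based series and wavelet analysis, which are not reproduced here, and the synthetic monthly counts are invented.

```python
import cmath
import math


def periodogram(x):
    """Return (frequencies, power) for a mean-centred series via a naive DFT."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centred)
        )
        freqs.append(k / n)  # cycles per sampling interval
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


# Synthetic (hypothetical) monthly trap counts with a 12-month cycle, log-transformed.
counts = [50 + 40 * math.sin(2 * math.pi * t / 12) for t in range(120)]
series = [math.log1p(c) for c in counts]

freqs, power = periodogram(series)
peak_index = max(range(len(power)), key=power.__getitem__)
peak_frequency = freqs[peak_index]
print(peak_frequency)  # dominant frequency in cycles per month
```

Populations whose spectra peak at similar frequencies would be grouped together by a spectral clustering of the kind the abstract mentions. The distance measure and clustering scheme themselves are not specified in the source and are left out here.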
Abstract:
Few studies have examined interactive effects between air pollution and temperature on health outcomes. This study examines whether temperature modified the effects of ozone on cardiovascular mortality in 95 large US cities. A nonparametric and a parametric regression model were used separately to explore the interactive effects of temperature and ozone on cardiovascular mortality between May and October, 1987-2000. A Bayesian meta-analysis was used to pool estimates. Both models illustrate that temperature enhanced the ozone effects on mortality in the northern region, but not obviously in the southern region. A 10-ppb increment in ozone was associated with 0.41% (95% posterior interval (PI): -0.19%, 0.93%), 0.27% (95% PI: -0.44%, 0.87%) and 1.68% (95% PI: 0.07%, 3.26%) increases in daily cardiovascular mortality at low, moderate and high levels of temperature, respectively. We concluded that temperature modified the effects of ozone, particularly in the northern region.
Abstract:
Zoonotic infections are a growing threat to global health. Chlamydia pneumoniae is a major human pathogen that is widespread in human populations, causing acute respiratory disease, and has been associated with chronic disease. C. pneumoniae was first identified solely in human populations; however, its host range now includes other mammals, marsupials, amphibians, and reptiles. Australian koalas (Phascolarctos cinereus) are widely infected with two species of Chlamydia, C. pecorum and C. pneumoniae. Transmission of C. pneumoniae between animals and humans has not been reported; however, two other chlamydial species, C. psittaci and C. abortus, are known zoonotic pathogens. We have sequenced the 1,241,024-bp chromosome and a 7.5-kb cryptic chlamydial plasmid of the koala strain of C. pneumoniae (LPCoLN) using the whole-genome shotgun method. Comparative genomic analysis, including pseudogene and single-nucleotide polymorphism (SNP) distribution, and phylogenetic analysis of conserved genes and SNPs against the human isolates of C. pneumoniae show that the LPCoLN isolate is basal to human isolates. Thus, we propose based on compelling genomic and phylogenetic evidence that humans were originally infected zoonotically by an animal isolate(s) of C. pneumoniae which adapted to humans primarily through the processes of gene decay and plasmid loss, to the point where the animal reservoir is no longer required for transmission.
Abstract:
Dasheen mosaic potyvirus (DsMV) is an important virus affecting taro. The virus has been found wherever taro is grown and infects both the edible and ornamental aroids, causing yield losses of up to 60%. The presence of DsMV, and other viruses, prevents the international movement of taro germplasm between countries. This has a significant negative impact on taro production in many countries due to the inability to access improved taro lines produced in breeding programs. To overcome this problem, sensitive and reliable virus diagnostic tests need to be developed to enable the indexing of taro germplasm. The aim of this study was to generate an antiserum against a recombinant DsMV coat protein (CP) and to develop a serological-based diagnostic test that would detect Pacific Island isolates of the virus. The CP-coding region of 16 DsMV isolates from Papua New Guinea, Samoa, Solomon Islands, French Polynesia, New Caledonia and Vietnam were amplified, cloned and sequenced. The size of the CP-coding region ranged from 939 to 1038 nucleotides and encoded putative proteins ranged from 313 to 346 amino acids, with the molecular mass ranging from 34 to 38 kDa. Analysis of the amino acid sequences revealed the presence of several amino acid motifs typically found in potyviruses, including DAG, WCIE/DN, RQ and AFDF. When the amino acid sequences were compared with each other and the DsMV sequences on the database, the maximum variability was 21.9%. When the core region of the CP was analysed, the maximum variability dropped to 6% indicating most variability was present in the N terminus. Within seven PNG isolates of DsMV, the maximum variability was 16.9% and 3.9% over the entire CP-coding region and core region, respectively. The sequence of PNG isolate P1 was most similar to all other sequences. Phylogenetic analysis indicated that almost all isolates grouped according to their provenance.
Further, the seven PNG isolates were grouped according to the region within PNG from which they were obtained. Due to the extensive variability over the entire CP-coding region, the core region of the CP of PNG isolate P1 was cloned into a protein expression vector and expressed as a recombinant protein. The protein was purified by chromatography and SDS-PAGE and used as an antigen to generate antiserum in a rabbit. In western blots, the antiserum reacted with bands of approximately 45-47 kDa in extracts from purified DsMV and from known DsMV-infected plants from PNG; no bands were observed using healthy plant extracts. The antiserum was subsequently incorporated into an indirect ELISA. This procedure was found to be very sensitive and detected DsMV in sap diluted at least 1:1,000. Using both western blot and ELISA formats, the antiserum was able to detect a wide range of DsMV isolates including those from Australia, New Zealand, Fiji, French Polynesia, New Caledonia, Papua New Guinea, Samoa, Solomon Islands and Vanuatu. These plants were verified to be infected with DsMV by RT-PCR. In specificity tests, the antiserum was also found to react with sap from plants infected with SCMV, PRSV-P, PRSV-W, but not with PVY or CMV-infected plants.
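The "maximum variability" figures reported for the coat protein sequences can be read as the largest pairwise percentage difference among aligned sequences. The sketch below shows one way to compute such a figure, assuming pre-aligned sequences of equal length and skipping gapped positions. The toy sequences and function names are hypothetical, and the abstract does not state exactly how variability was calculated.

```python
from itertools import combinations


def percent_difference(a: str, b: str) -> float:
    """Percentage of differing residues between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * differences / compared


def max_variability(sequences) -> float:
    """Largest pairwise percent difference in a set of aligned sequences."""
    return max(percent_difference(a, b) for a, b in combinations(sequences, 2))


# Hypothetical aligned fragments, not real DsMV coat protein sequences.
aligned = ["MDAGKTWCIE", "MDAGRTWCID", "MDAG-TWCIE"]
max_var = max_variability(aligned)
print(max_var)
```

Restricting the input to a conserved core region, as the study did, would simply mean slicing each aligned sequence to the same column range before calling `max_variability`.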
Abstract:
Lateral gene transfer (LGT) from prokaryotes to microbial eukaryotes is usually detected by chance through genome-sequencing projects. Here, we explore a different, hypothesis-driven approach. We show that the fitness advantage associated with the transferred gene, typically invoked only in retrospect, can be used to design a functional screen capable of identifying postulated LGT cases. We hypothesized that beta-glucuronidase (gus) genes may be prone to LGT from bacteria to fungi (thought to lack gus) because this would enable fungi to utilize glucuronides in vertebrate urine as a carbon source. Using an enrichment procedure based on a glucose-releasing glucuronide analog (cellobiouronic acid), we isolated two gus(+) ascomycete fungi from soils (Penicillium canescens and Scopulariopsis sp.). A phylogenetic analysis suggested that their gus genes, as well as the gus genes identified in genomic sequences of the ascomycetes Aspergillus nidulans and Gibberella zeae, had been introgressed laterally from high-GC gram(+) bacteria. Two such bacteria (Arthrobacter spp.), isolated together with the gus(+) fungi, appeared to be the descendants of a bacterial donor organism from which gus had been transferred to fungi. This scenario was independently supported by similar substrate affinities of the encoded beta-glucuronidases, the absence of introns from fungal gus genes, and the similarity between the signal peptide-encoding 5' extensions of some fungal gus genes and the Arthrobacter sequences upstream of gus. Differences in the sequences of the fungal 5' extensions suggested at least two separate introgression events after the divergence of the two main Euascomycete classes. We suggest that deposition of glucuronides on soils as a result of the colonization of land by vertebrates may have favored LGT of gus from bacteria to fungi in soils.
Abstract:
Background The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Because of the problems caused by alignment uncertainty, existing tools for phylogenetic analysis based on multiple alignment cannot be applied directly to whole-genome comparison and phylogenomic studies of viruses. There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has been applied successfully to the phylogenetic analysis of bacteria and chloroplast genomes. Results In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets that differ greatly in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights into the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses, which have a much smaller genome size.
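To give a feel for what "alignment-free" means, the sketch below compares sequences by their k-mer frequency profiles and a cosine distance. This is a generic composition-based measure chosen for illustration. It is not the DL method or the CV Tree method named in the abstract, whose details are not given here, and the function names and toy sequences are assumptions.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> dict:
    """Relative frequency of every overlapping k-mer in the sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def cosine_distance(p: dict, q: dict) -> float:
    """1 minus the cosine similarity of two k-mer frequency profiles."""
    dot = sum(value * q.get(kmer, 0.0) for kmer, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical toy protein fragments; real analyses would use whole proteomes.
a = kmer_profile("MKTAYIAKQRMKTAYIAKQR", 3)
b = kmer_profile("MKTAYIAKQRMKTAYLAKQR", 3)
c = kmer_profile("GGSGGSGGSGGSGGSGGSGG", 3)

d_ab = cosine_distance(a, b)
d_ac = cosine_distance(a, c)
print(d_ab, d_ac)
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method such as neighbor joining. No multiple alignment is needed at any step, which is the property that makes this family of methods attractive for highly divergent genomes.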
Abstract:
Light plays a unique role for plants as it is both a source of energy for growth and a signal for development. Light captured by the pigments in the light harvesting complexes is used to drive the synthesis of the chemical energy required for carbon assimilation. The light perceived by photoreceptors activates effectors, such as transcription factors (TFs), which modulate the expression of light-responsive genes. Recently, it has been speculated that increasing the photosynthetic rate could further improve the yield potential of three carbon (C3) crops such as wheat. However, little is currently known about the transcriptional regulation of photosynthesis genes, particularly in crop species. Nuclear factor Y (NF-Y) TF is a functionally diverse regulator of growth and development in the model plant species, with demonstrated roles in embryo development, stress response, flowering time and chloroplast biogenesis. Furthermore, a light-responsive NF-Y binding site (CCAAT-box) is present in the promoter of a spinach photosynthesis gene. As photosynthesis genes are co-regulated by light and co-regulated genes typically have similar regulatory elements in their promoters, it seems likely that other photosynthesis genes would also have light-responsive CCAAT-boxes. This provided the impetus to investigate the NF-Y TF in bread wheat. This thesis is focussed on wheat NF-Y members that have roles in light-mediated gene regulation with an emphasis on their involvement in the regulation of photosynthesis genes. NF-Y is a heterotrimeric complex, comprised of the three subunits NF-YA, NF-YB and NF-YC. Unlike the mammalian and yeast counterparts, each of the three subunits is encoded by multiple genes in Arabidopsis. The initial step taken in this study was the identification of the wheat NF-Y family (Chapter 3). A search of the current wheat nucleotide sequence databases identified 37 NF-Y genes (10 NF-YA, 11 NF-YB, 14 NF-YC & 2 Dr1). 
Phylogenetic analysis revealed that each of the three wheat NF-Y (TaNF-Y) subunit families could be divided into 4-5 clades based on their conserved core regions. Outside of the core regions, eleven motifs were identified to be conserved between Arabidopsis, rice and wheat NF-Y subunit members. The expression profiles of TaNF-Y genes were constructed using quantitative real-time polymerase chain reaction (RT-PCR). Some TaNF-Y subunit members had little variation in their transcript levels among the organs, while others displayed organ-predominant expression profiles, including those expressed mainly in the photosynthetic organs. To investigate their potential role in light-mediated gene regulation, the light responsiveness of the TaNF-Y genes was examined (Chapters 4 and 5). Two TaNF-YB and five TaNF-YC members were markedly upregulated by light in both the wheat leaves and seedling shoots. To identify the potential target genes of the light-upregulated NF-Y subunit members, a gene expression correlation analysis was conducted using publicly available Affymetrix Wheat Genome Array datasets. This analysis revealed that the transcript expression levels of TaNF-YB3 and TaNF-YC11 were significantly correlated with those of photosynthesis genes. These correlated expression profiles were also observed in the quantitative RT-PCR dataset from wheat plants grown under light and dark conditions. Sequence analysis of the promoters of these wheat photosynthesis genes revealed that they were enriched with potential NF-Y binding sites (CCAAT-box). The potential role of TaNF-YB3 in the regulation of photosynthetic genes was further investigated using a transgenic approach (Chapter 5).
Transgenic wheat lines constitutively expressing TaNF-YB3 were found to have significantly increased expression levels of photosynthesis genes, including those encoding light harvesting chlorophyll a/b-binding proteins, photosystem I reaction centre subunits, a chloroplast ATP synthase subunit and glutamyl-tRNA reductase (GluTR). GluTR is a rate-limiting enzyme in the chlorophyll biosynthesis pathway. In association with the increased expression of the photosynthesis genes, the transgenic lines had a higher leaf chlorophyll content, an increased photosynthetic rate and a more rapid early growth rate compared to the wild-type wheat. In addition to its role in the regulation of photosynthesis genes, TaNF-YB3 overexpression lines flower on average 2 days earlier than the wild-type (Chapter 6). Quantitative RT-PCR analysis showed that there was a 13-fold increase in the expression level of the floral integrator, TaFT. The transcript levels of other downstream genes (TaFT2 and TaVRN1) were also increased in the transgenic lines. Furthermore, the transcript levels of TaNF-YB3 were significantly correlated with those of constans (CO), constans-like (COL) and timing of chlorophyll a/b-binding (CAB) expression 1 [TOC1; (CCT)] domain-containing proteins known to be involved in the regulation of flowering time. To summarise the key findings of this study, 37 NF-Y genes were identified in the crop species wheat. An in-depth analysis of TaNF-Y gene expression profiles revealed that the potential role of some light-upregulated members was in the regulation of photosynthetic genes. The involvement of TaNF-YB3 in the regulation of photosynthesis genes was supported by data obtained from transgenic wheat lines with increased constitutive expression of TaNF-YB3. The overexpression of TaNF-YB3 in the transgenic lines revealed this NF-YB member is also involved in the fine-tuning of flowering time.
These data suggest that the NF-Y TF plays an important role in light-mediated gene regulation in wheat.
Abstract:
Vitamin A deficiency (VAD) is a serious problem in developing countries, affecting approximately 127 million children of preschool age and 7.2 million pregnant women each year. However, this deficiency is readily treated and prevented through adequate nutrition. This can potentially be achieved through genetically engineered biofortification of staple food crops to enhance provitamin A (pVA) carotenoid content. Bananas are the fourth most important food crop with an annual production of 100 million tonnes and are widely consumed in areas affected by VAD. However, the fruit pVA content of most widely consumed banana cultivars is low (~ 0.2 to 0.5 µg/g dry weight). This includes cultivars such as the East African highland banana (EAHB), the staple crop in countries such as Uganda, where annual banana consumption is approximately 250 kg per person. This fact, in addition to the agronomic properties of staple banana cultivars such as vegetative reproduction and continuous cropping, make bananas an ideal target for pVA enhancement through genetic engineering. Interestingly, there are banana varieties known with high fruit pVA content (up to 27.8 µg/g dry weight), although they are not widely consumed due to factors such as cultural preference and availability. The genes involved in carotenoid accumulation during banana fruit ripening have not been well studied and an understanding of the molecular basis for the differential capacity of bananas to accumulate carotenoids may impact on the effective production of genetically engineered high pVA bananas. The production of phytoene by the enzyme phytoene synthase (PSY) has been shown to be an important rate limiting determinant of pVA accumulation in crop systems such as maize and rice. Manipulation of this gene in rice has been used successfully to produce Golden Rice, which exhibits higher seed endosperm pVA levels than wild type plants.
Therefore, it was hypothesised that differences between high and low pVA accumulating bananas could be due either to differences in PSY enzyme activity or factors regulating the expression of the psy gene. Therefore, the aim of this thesis was to investigate the role of PSY in accumulation of pVA in banana fruit of representative high (Asupina) and low (Cavendish) pVA banana cultivars by comparing the nucleic acid and encoded amino acid sequences of the banana psy genes, in vivo enzyme activity of PSY in rice callus and expression of PSY through analysis of promoter activity and mRNA levels. Initially, partial sequences of the psy coding region from five banana cultivars were obtained using reverse transcriptase (RT)-PCR with degenerate primers designed to conserved amino acids in the coding region of available psy sequences from other plants. Based on phylogenetic analysis and comparison to maize psy sequences, it was found that in banana, psy occurs as a gene family of at least three members (psy1, psy2a and psy2b). Subsequent analysis of the complete coding regions of these genes from Asupina and Cavendish suggested that they were all capable of producing functional proteins due to high conservation in the catalytic domain. However, inability to obtain the complete mRNA sequences of Cavendish psy2a, and isolation of two non-functional Cavendish psy2a coding region variants, suggested that psy2a expression may be impaired in Cavendish. Sequence analysis indicated that these Cavendish psy2a coding region variants may have resulted from alternate splicing. Evidence of alternate splicing was also observed in one Asupina psy1 coding region variant, which was predicted to produce a functional PSY1 isoform. The complete mRNA sequence of the psy2b coding regions could not be isolated from either cultivar. Interestingly, psy1 was cloned predominantly from leaf while psy2 was obtained preferentially from fruit, suggesting some level of tissue-specific expression. 
The Asupina and Cavendish psy1 and psy2a coding regions were subsequently expressed in rice callus and the activity of the enzymes compared in vivo through visual observation and quantitative measurement of carotenoid accumulation. The maize B73 psy1 coding region was included as a positive control. After several weeks on selection, regenerating calli showed a range of colours from white to dark orange representing various levels of carotenoid accumulation. These results confirmed that the banana psy coding regions were all capable of producing functional enzymes. No statistically significant differences in levels of activity were observed between banana PSYs, suggesting that differences in PSY activity were not responsible for differences in the fruit pVA content of Asupina and Cavendish. The psy1 and psy2a promoter sequences were isolated from Asupina and Cavendish gDNA using a PCR-based genome walking strategy. Interestingly, three Cavendish psy2a promoter clones of different sizes, representing possible allelic variants, were identified while only single promoter sequences were obtained for the other Asupina and Cavendish psy genes. Bioinformatic analysis of these sequences identified motifs that were previously characterised in the Arabidopsis psy promoter. Notably, an ATCTA motif associated with basal expression in Arabidopsis was identified in all promoters with the exception of two of the Cavendish psy2a promoter clones (Cpsy2apr2 and Cpsy2apr3). G1 and G2 motifs, linked to light-regulated responses in Arabidopsis, appeared to be differentially distributed between psy1 and psy2a promoters. In the untranscribed regulatory regions, the G1 motifs were found only in psy1 promoters, while the G2 motifs were found only in psy2a. Interestingly, both ATCTA and G2 motifs were identified in the 5’ UTRs of Asupina and Cavendish psy1. 
Consistent with other monocot promoters, introns were present in the Asupina and Cavendish psy1 5’ UTRs, while none were observed in the psy2a 5’ UTRs. Promoters were cloned into expression constructs, driving the β-glucuronidase (GUS) reporter gene. Transient expression of the Asupina and Cavendish psy1 and psy2a promoters in both Cavendish embryogenic cells and Cavendish fruit demonstrated that all promoters were active, except Cpsy2apr2 and Cpsy2apr3. The functional Cavendish psy2a promoter (Cpsy2apr1) appeared to have activity similar to the Asupina psy2a promoter. The activities of the Asupina and Cavendish psy1 promoters were similar to each other, and comparable to those of the functional psy2a promoters. Semi-quantitative PCR analysis of Asupina and Cavendish psy1 and psy2a transcripts showed that psy2a levels were high in green fruit and decreased during ripening, reinforcing the hypothesis that fruit pVA levels were largely dependent on levels of psy2a expression. Additionally, semi-quantitative PCR using intron-spanning primers indicated that high levels of unprocessed psy2a and psy2b mRNA were present in the ripe fruit of Cavendish but not in Asupina. This raised the possibility that differences in intron processing may influence pVA accumulation in Asupina and Cavendish. In this study the role of PSY in banana pVA accumulation was analysed at a number of different levels. Both mRNA accumulation and promoter activity of psy genes studied were very similar between Asupina and Cavendish. However, in several experiments there was evidence of cryptic or alternate splicing that differed in Cavendish compared to Asupina, although these differences were not conclusively linked to the differences in fruit pVA accumulation between Asupina and Cavendish. Therefore, other carotenoid biosynthetic genes or regulatory mechanisms may be involved in determining pVA levels in these cultivars.
This study has contributed to an increased understanding of the role of PSY in the production of pVA carotenoids in banana fruit, corroborating the importance of this enzyme in regulating carotenoid production. Ultimately, this work may serve to inform future research into pVA accumulation in important crop varieties such as the EAHB and the discovery of avenues to improve such crops through genetic modification.
Abstract:
In phylogenetics, the unrooted model of phylogeny and the strict molecular clock model are two extremes of a continuum. Despite their dominance in phylogenetic inference, it is evident that both are biologically unrealistic and that the real evolutionary process lies between these two extremes. Fortunately, intermediate models employing relaxed molecular clocks have been described. These models open the gate to a new field of “relaxed phylogenetics.” Here we introduce a new approach to performing relaxed phylogenetic analysis. We describe how it can be used to estimate phylogenies and divergence times in the face of uncertainty in evolutionary rates and calibration times. Our approach also provides a means for measuring the clocklikeness of datasets and comparing this measure between different genes and phylogenies. We find no significant rate autocorrelation among branches in three large datasets, suggesting that autocorrelated models are not necessarily suitable for these data. In addition, we place these datasets on the continuum of clocklikeness between a strict molecular clock and the alternative unrooted extreme. Finally, we present analyses of 102 bacterial, 106 yeast, 61 plant, 99 metazoan, and 500 primate alignments. From these we conclude that our method is phylogenetically more accurate and precise than the traditional unrooted model while adding the ability to infer a timescale to evolution.
Abstract:
Bananas are one of the world's most important food crops, providing sustenance and income for millions of people in developing countries and supporting large export industries. Viruses are considered major constraints to banana production, germplasm multiplication and exchange, and to genetic improvement of banana through traditional breeding. In Africa, the two most important virus diseases are bunchy top, caused by Banana bunchy top virus (BBTV), and banana streak disease, caused by Banana streak virus (BSV). BBTV is a serious production constraint in a number of countries within/bordering East Africa, such as Burundi, Democratic Republic of Congo, Malawi, Mozambique, Rwanda and Zambia, but is not present in Kenya, Tanzania and Uganda. Additionally, epidemics of banana streak disease are occurring in Kenya and Uganda. The rapidly growing tissue culture (TC) industry within East Africa, aiming to provide planting material to banana farmers, has stimulated discussion about the need for virus indexing to certify planting material as virus-free. Diagnostic methods for BBTV and BSV have been reported and, for BBTV, PCR-based assays are reliable and relatively straightforward. However for BSV, high levels of serological and genetic variability and the presence of endogenous virus sequences within the banana genome complicate diagnosis. Uganda has been shown to contain the greatest diversity in BSV isolates found anywhere in the world. A broad-spectrum diagnostic test for BSV detection, which can discriminate between endogenous and episomal BSV sequences, is a priority. This PhD project aimed to establish diagnostic methods for banana viruses, with a particular focus on the development of novel methods for BSV detection, and to use these diagnostic methods for the detection and characterisation of banana viruses in East Africa. A novel rolling-circle amplification (RCA) method was developed for the detection of BSV. 
Using samples of Banana streak MY virus (BSMYV) and Banana streak OL virus (BSOLV) from Australia, this method was shown to distinguish between endogenous and episomal BSV sequences in banana plants. The RCA assay was used to screen a collection of 56 banana samples from south-west Uganda for BSV. RCA detected at least five distinct BSV isolates in these samples, including BSOLV and Banana streak GF virus (BSGFV) as well as three BSV isolates (Banana streak Uganda-I, -L and -M virus) for which only partial sequences had been previously reported. These latter three BSV had previously been detected only using immuno-capture (IC)-PCR and thus were possibly endogenous sequences. In addition to its ability to detect BSV, the RCA protocol was also demonstrated to detect other viruses within the family Caulimoviridae, including Sugarcane bacilliform virus and Cauliflower mosaic virus. Using the novel RCA method, three distinct BSV isolates from each of Kenya and Uganda were identified and characterised. The complete genomes of these isolates were sequenced and annotated. All six isolates were shown to have a characteristic badnavirus genome organisation with three open reading frames (ORFs), and the large polyprotein encoded by ORF 3 was shown to contain conserved amino acid motifs for movement, aspartic protease, reverse transcriptase and ribonuclease H activities. In addition, several sequences important for expression and replication of the virus genome were identified, including the conserved tRNAmet primer binding site present in the intergenic region of all badnaviruses. Based on the International Committee on Taxonomy of Viruses (ICTV) guidelines for species demarcation in the genus Badnavirus, these six isolates were proposed as distinct species, and named Banana streak UA virus (BSUAV), Banana streak UI virus (BSUIV), Banana streak UL virus (BSULV), Banana streak UM virus (BSUMV), Banana streak CA virus (BSCAV) and Banana streak IM virus (BSIMV).
Using PCR with species-specific primers designed for each isolate, a genotypically diverse collection of 12 virus-free banana cultivars was tested for the presence of endogenous sequences. For five of the BSV no amplification was observed in any cultivar tested, while for BSIMV, four positive samples were identified in cultivars with a B-genome component. During field visits to Kenya, Tanzania and Uganda, 143 samples were collected and assayed for BSV. PCR using nine sets of species-specific primers, and RCA, were compared for BSV detection. For five BSV species with no known endogenous counterpart (namely BSCAV, BSUAV, BSUIV, BSULV and BSUMV), PCR was used to detect 30 infections from the 143 samples. Using RCA, 96.4% of these samples were considered positive, with one additional sample detected using RCA which was not positive using PCR. For these five BSV, PCR and RCA were both useful for identifying infected samples, irrespective of the host cultivar genotype (Musa A- or B-genome components). For four additional BSV with known endogenous counterparts in the M. balbisiana genome (BSOLV, BSGFV, BSMYV and BSIMV), PCR was shown to detect 75 infections from the 143 samples. In 30 samples from cultivars with an A-only genome component there was 96.3% agreement between PCR-positive samples and detection using RCA, again demonstrating that either PCR or RCA is a suitable method for detection. However, in 45 samples from cultivars with some B-genome component, the level of agreement between PCR-positive samples and RCA-positive samples was 70.5%. This suggests that, in cultivars with some B-genome component, many of the infections detected using PCR were the result of amplification of endogenous sequences. In these latter cases, RCA or another method which discriminates between endogenous and episomal sequences, such as immuno-capture PCR, is needed to diagnose episomal BSV infection.
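The agreement figures above can be read as the share of PCR-positive samples that RCA also scores as positive. The sketch below assumes that definition. The function name and sample identifiers are hypothetical, and the numbers are not the study's data.

```python
def percent_agreement(pcr_positive, rca_positive):
    """Percentage of PCR-positive samples that are also RCA-positive."""
    if not pcr_positive:
        raise ValueError("no PCR-positive samples to compare")
    confirmed = pcr_positive & rca_positive  # samples positive by both assays
    return 100.0 * len(confirmed) / len(pcr_positive)


# Hypothetical sample identifiers, for illustration only.
pcr_hits = {"S01", "S02", "S03", "S04"}
rca_hits = {"S01", "S02", "S03", "S09"}  # S09 is RCA-positive but PCR-negative

print(percent_agreement(pcr_hits, rca_hits))  # 75.0
```

Under this definition, a sample detected only by RCA (S09 here) does not change the percentage, which matches the way the abstract reports such a sample separately.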
Field visits were made to Malawi and Rwanda to collect local isolates of BBTV for validation of a PCR-based diagnostic assay. The presence of BBTV in samples of bananas with bunchy top disease was confirmed in 28 out of 39 samples from Malawi and all nine samples collected in Rwanda, using PCR and RCA. For three isolates, one from Malawi and two from Rwanda, the complete nucleotide sequences were determined and shown to have a similar genome organisation to previously published BBTV isolates. The two isolates from Rwanda had at least 98.1% nucleotide sequence identity for each of the six DNA components, while the similarity between isolates from Rwanda and Malawi was between 96.2% and 99.4% depending on the DNA component. At the amino acid level, similarities in the putative proteins encoded by DNA-R, -S, -M, -C and -N were found to range from 98.8% to 100%. In a phylogenetic analysis, the three East African isolates clustered together within the South Pacific subgroup of BBTV isolates. Nucleotide sequence comparison to isolates of BBTV from outside Africa identified India as the possible origin of East African isolates of BBTV.
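Percent nucleotide identity of the kind reported above is usually computed from a pairwise alignment. The following is a minimal sketch that assumes the two sequences are already aligned to equal length and that columns containing a gap are skipped. Both the gap treatment and the toy sequences are assumptions, not the thesis's stated procedure.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not real BBTV sequence.
print(percent_identity("ATGCCGTA", "ATGCTGTA"))  # 87.5
```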
Novel molecular markers of Chlamydia pecorum genetic diversity in the koala (Phascolarctos cinereus)
Abstract:
Background: Chlamydia pecorum is an obligate intracellular bacterium and the causative agent of reproductive and ocular disease in several animal hosts including koalas, sheep, cattle and goats. C. pecorum strains detected in koalas are genetically diverse, raising interesting questions about the origin and transmission of this species within koala hosts. While the ompA gene remains the most widely used target in C. pecorum typing studies, it is generally recognised that surface protein encoding genes are not suited for phylogenetic analysis and it is becoming increasingly apparent that the ompA gene locus is not congruent with the phylogeny of the C. pecorum genome. Using the recently sequenced C. pecorum genome (E58), we analysed 10 genes, including ompA, to evaluate the use of ompA as a molecular marker in the study of koala C. pecorum genetic diversity. Results: Three genes (incA, ORF663, tarP) were found to contain sufficient nucleotide diversity and discriminatory power for detailed analysis and were used, with ompA, to genotype 24 C. pecorum PCR-positive koala samples from four populations. The most robust representation of the phylogeny of these samples was achieved through concatenation of all four gene sequences, enabling the recreation of a "true" phylogenetic signal. OmpA and incA were of limited value as fine-detailed genetic markers as they were unable to confer accurate phylogenetic distinctions between samples. On the other hand, the tarP and ORF663 genes were identified as useful "neutral" and "contingency" markers respectively, to represent the broad evolutionary history and intra-species genetic diversity of koala C. pecorum. Furthermore, the concatenation of ompA, incA and ORF663 sequences highlighted the monophyletic nature of koala C. pecorum infections by demonstrating a single evolutionary trajectory for koala hosts that is distinct from that seen in non-koala hosts.
Conclusions: While the continued use of ompA as a fine-detailed molecular marker for epidemiological analysis appears justified, the tarP and ORF663 genes also appear to be valuable markers of phylogenetic or biogeographic divisions at the C. pecorum intra-species level. This research has significant implications for future typing studies to understand the phylogeny, genetic diversity, and epidemiology of C. pecorum infections in the koala and other animal species.
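Concatenating per-gene alignments, as described in the Results, amounts to joining each sample's aligned sequences in a fixed gene order. The sketch below assumes every gene alignment covers the same samples. The gene names come from the abstract, while the function name, sample names and sequences are hypothetical.

```python
def concatenate_genes(alignments, gene_order):
    """Join per-gene aligned sequences into one sequence per sample.

    alignments maps gene name -> {sample name -> aligned sequence}.
    """
    samples = set(alignments[gene_order[0]])
    for gene in gene_order:
        if set(alignments[gene]) != samples:
            raise ValueError(f"{gene}: sample set differs from {gene_order[0]}")
        if len({len(seq) for seq in alignments[gene].values()}) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
    return {
        sample: "".join(alignments[gene][sample] for gene in gene_order)
        for sample in sorted(samples)
    }


# Toy alignments with made-up sequences and sample names.
toy_alignments = {
    "ompA": {"koala_1": "ATGA", "koala_2": "ATGC"},
    "incA": {"koala_1": "CC-T", "koala_2": "CCGT"},
    "ORF663": {"koala_1": "GGA", "koala_2": "GGT"},
    "tarP": {"koala_1": "TTAA", "koala_2": "TTAA"},
}
supermatrix = concatenate_genes(toy_alignments, ["ompA", "incA", "ORF663", "tarP"])
print(supermatrix["koala_1"])  # ATGACC-TGGATTAA
```

The concatenated sequences can then be passed to any tree-building program in place of a single-gene alignment.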
Abstract:
Members of the Calliphoridae (blowflies) are significant for medical and veterinary management, due to the ability of some species to consume living flesh as larvae, and for forensic investigations, due to the ability of others to develop in corpses. Because larval blowflies are difficult to identify accurately to species, there is a need for DNA-based diagnostics for this family; however, the widely used DNA-barcoding marker, cox1, has been shown to fail for several groups within this family. Additionally, many phylogenetic relationships within the Calliphoridae are still unresolved, particularly deeper-level relationships. Sequencing whole mitochondrial (mt) genomes has been demonstrated as an effective method both for identifying the most informative diagnostic markers and for resolving phylogenetic relationships. Twenty-seven complete or nearly complete mt genomes were sequenced, representing 13 species, seven genera and four calliphorid subfamilies, plus a member of the related family Tachinidae. PCR and sequencing primers developed for sequencing one calliphorid species could be reused to sequence related species within the same superfamily with success rates ranging from 61% to 100%, demonstrating the speed and efficiency with which an mt genome dataset can be assembled. Comparison of molecular divergences for each of the 13 protein-coding genes and 2 ribosomal RNA genes at a range of taxonomic scales identified novel targets for development as diagnostic markers, which were 117–200% more variable than the markers used previously in calliphorids. Phylogenetic analysis of whole mt genome sequences resulted in much stronger support for family and subfamily-level relationships. The Calliphoridae are polyphyletic, with the Polleninae more closely related to the Tachinidae, and the Sarcophagidae are the sister group of the remaining calliphorids.
Within the Calliphoridae, there was strong support for the monophyly of the Chrysomyinae and Luciliinae and for the sister-grouping of Luciliinae with Calliphorinae. Relationships within Chrysomya were not well resolved. Whole mt genome data supported the previously demonstrated paraphyly of Lucilia cuprina with respect to L. sericata and allowed us to conclude that it is due to hybrid introgression prior to the last common ancestor of modern sericata populations, rather than to recent hybridisation, nuclear pseudogenes or incomplete lineage sorting.