928 results for Repetitive DNA sequences
Abstract:
In recent years there has been much progress in our understanding of the phylogeny and evolution of ticks, in particular the hard ticks (Ixodidae). Indeed, a consensus about the phylogeny of the hard ticks has emerged which is quite different to the working hypothesis of 10 years ago. So that the classification reflects our knowledge of ticks, several changes to the nomenclature of ticks are imminent or have been made. One subfamily, the Hyalomminae, should be sunk, while another, the Bothriocrotoninae, has been created (Klompen, Dobson & Barker, 2002). Bothriocrotoninae, and its sole genus Bothriocroton, have been created to house an early-diverging ('basal') lineage of endemic Australian ticks that used to be in the genus Aponomma. The remaining species of the genus Aponomma have been moved to the genus Amblyomma. Thus, the name Aponomma is no longer a valid genus name. The genus Rhipicephalus is paraphyletic with respect to the genus Boophilus. Thus, the genus Boophilus has become a subgenus of the genus Rhipicephalus (Murrell & Barker, 2003). Knowledge of the phylogenetic relationships of ticks has also provided new insights into the evolution of ornateness and of their life cycles, and has allowed the historical zoogeography of ticks to be studied. Finally, we present a list of the 899 valid genus and species names of ticks as of February 2004.
Abstract:
The molecular clock does not tick at a uniform rate in all taxa but may be influenced by species characteristics. Eusocial species (those with reproductive division of labor) have been predicted to have faster rates of molecular evolution than their nonsocial relatives because of greatly reduced effective population size: if most individuals in a population are nonreproductive and only one or a few queens produce all the offspring, then eusocial animals could have much lower effective population sizes than their solitary relatives, which should increase the rate of substitution of nearly neutral mutations. An earlier study reported faster rates in eusocial honeybees and vespid wasps but failed to correct for phylogenetic nonindependence or to distinguish between potential causes of rate variation. Because sociality has evolved independently in many different lineages, it is possible to conduct a more wide-ranging study to test the generality of the relationship. We have conducted a comparative analysis of 25 phylogenetically independent pairs of social lineages and their nonsocial relatives, including bees, wasps, ants, termites, shrimps, and mole rats, using a range of available DNA sequences (mitochondrial and nuclear DNA coding for proteins and RNAs, and nontranslated sequences). By including a wide range of social taxa, we were able to test whether there is a general influence of sociality on rates of molecular evolution and to test specific predictions of the hypothesis: (1) that social species have faster rates because they have reduced effective population sizes; (2) that mitochondrial genes would show a greater effect of sociality than nuclear genes; and (3) that rates of molecular evolution should be correlated with the degree of sociality. We find no consistent pattern in rates of molecular evolution between social and nonsocial lineages and no evidence that mitochondrial genes show faster rates in social taxa.
However, we show that the most highly eusocial Hymenoptera do have faster rates than their nonsocial relatives. We also find that social parasites (that utilize the workers from related species to produce their own offspring) have faster rates than their social relatives, which is consistent with an effect of lower effective population size on rate of molecular evolution. Our results illustrate the importance of allowing for phylogenetic nonindependence when conducting investigations of determinants of variation in rate of molecular evolution.
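The reduced-N_e argument above rests on standard nearly neutral theory. As a hedged illustration (background population genetics, not the comparative method used in this study), Kimura's fixation probability shows why a slightly deleterious mutation is substituted faster in a smaller population. The sketch assumes a diploid Wright–Fisher population whose census size equals N_e and additive selection; the function name `relative_substitution_rate` is hypothetical.

```python
import math


def relative_substitution_rate(n_e: int, s: float) -> float:
    """Substitution rate of new mutations relative to the mutation rate (k / mu).

    Assumes a diploid Wright-Fisher population whose census size equals its
    effective size n_e, and an additive selection coefficient s. Kimura's
    fixation probability for a new mutant (initial frequency p = 1 / (2N)) is
        u = (1 - exp(-4 N s p)) / (1 - exp(-4 N s)),
    and the substitution rate is k = 2N * mu * u, so k / mu = 2N * u.
    """
    if s == 0.0:
        return 1.0  # neutral limit: u = 1 / (2N), so k equals mu
    try:
        # 1 - exp(-a) == -expm1(-a); the two minus signs cancel in the ratio,
        # and expm1 stays accurate when its argument is tiny.
        u = math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)
    except OverflowError:
        # Deleterious mutation in a very large population: the fixation
        # probability is effectively zero.
        return 0.0
    return 2.0 * n_e * u


# The same slightly deleterious mutation (s = -1e-4) in a small and a large population
for n_e in (1_000, 100_000):
    print(n_e, relative_substitution_rate(n_e, -1e-4))
```

With s = -1e-4, the smaller population substitutes such mutations at roughly 0.8 times the neutral rate, whereas the larger one essentially never fixes them; this is the direction of effect the hypothesis predicts for lineages with few reproductive individuals.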
Abstract:
Table beet production in the Lockyer Valley of south-eastern Queensland is known to be adversely affected by soilborne root disease caused by infection with Pythium spp. However, little is known about the species or genotypes that cause pre- and post-emergence damping-off. Based on RFLP analysis (with HhaI, HinfI and MboI) of PCR-amplified ITS region DNA from soil and diseased plant samples, most of the 130 Pythium isolates could be grouped into three genotypes, designated LVP A, LVP B and LVP C. These groups comprised 43, 41 and 7% of all isolates, respectively. DNA sequence analysis of the ITS region indicated that LVP A was a strain of Pythium aphanidermatum, with greater than 99% similarity to the corresponding P. aphanidermatum sequences in publicly accessible databases. The DNA sequences of LVP B and LVP C were most closely related to P. ultimum and P. dissotocum, respectively. Other distinct isolates with unique RFLP patterns were obtained at lower frequencies; these showed high similarity (> 97%) to P. heterothallicum, P. periplocum and genotypes of P. ultimum other than LVP B. Inoculation trials on 1- and 4-week-old beet seedlings indicated that a higher proportion of LVP A isolates than of LVP B isolates caused disease. Isolates of the LVP A, LVP B and LVP C genotypes were highly sensitive to the fungicide Ridomil MZ, which suppressed radial growth on V8 agar approximately 4- to 30-fold at 5 µg/mL metalaxyl and 40 µg/mL mancozeb, concentrations far lower than the recommended field application rate.
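RFLP genotyping compares the fragment-length patterns that restriction enzymes produce from the same amplified region. The sketch below performs an in silico complete digest of a sequence with the three enzymes named above (HhaI, GCG^C; HinfI, G^ANTC; MboI, ^GATC). It is an illustration under simplifying assumptions (linear DNA, complete digestion, top-strand cut positions only), not the gel-based procedure used in the study; the function names and the example sequence are made up.

```python
import re

# Recognition site (as a regex) and top-strand cut offset within the site.
ENZYMES = {
    "HhaI": ("GCGC", 3),          # GCG^C
    "HinfI": ("GA[ACGT]TC", 1),   # G^ANTC, N = any base
    "MboI": ("GATC", 0),          # ^GATC
}


def digest(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths from a complete single-enzyme digest of a linear sequence."""
    pattern, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead lets overlapping recognition sites all match.
    cuts = {m.start() + offset for m in re.finditer(f"(?={pattern})", seq)}
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def rflp_profile(seq: str) -> dict[str, list[int]]:
    """Fragment sizes per enzyme, largest first: the pattern compared between isolates."""
    return {name: sorted(digest(seq, name), reverse=True) for name in ENZYMES}


print(rflp_profile("AAGCGCAAGATCAAGAATCAA"))
```

Isolates whose profiles match for all three enzymes would fall into the same group, in the spirit of the LVP A, LVP B and LVP C genotypes; real gels add sizing error, so observed patterns are usually matched with a tolerance rather than exactly.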
Abstract:
Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, delta and epsilon, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a metachain to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely. 
[Bayesian phylogenetic inference; heating parameter; Markov chain Monte Carlo; replicated chains.]
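The metachain idea can be sketched by treating each sampled tree as a set of bipartitions (splits), discarding burn-in from each replicated run, and pooling the remainder before counting split frequencies. The encoding (a split stored as the frozenset of taxa on the side that excludes a fixed reference taxon), the burn-in fraction and the function names are assumptions. The between-run spread statistic is a simplified cousin of MrBayes's average standard deviation of split frequencies; it is not the delta or epsilon statistic introduced in this paper.

```python
from collections import Counter
from statistics import pstdev

# A split is the frozenset of taxa on the side that excludes a fixed reference
# taxon; a sampled tree is the set of its non-trivial splits.


def split_frequencies(trees, burnin=0.25):
    """Fraction of post-burn-in trees that contain each split."""
    kept = trees[int(len(trees) * burnin):]
    if not kept:
        return {}
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def metachain_frequencies(runs, burnin=0.25):
    """Pool the post-burn-in samples of several short runs into one metachain."""
    pooled = [tree for run in runs for tree in run[int(len(run) * burnin):]]
    return split_frequencies(pooled, burnin=0.0)


def mean_split_sd(runs, burnin=0.25):
    """Mean (population) standard deviation of split frequencies across runs."""
    per_run = [split_frequencies(run, burnin) for run in runs]
    splits = set().union(*per_run)
    sds = [pstdev([freqs.get(s, 0.0) for freqs in per_run]) for s in splits]
    return sum(sds) / len(sds) if sds else 0.0


# Toy example: taxa A-E, splits written as the side that excludes E.
AB, ABC, ABD = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
run1 = [{AB, ABC}] * 4
run2 = [{AB, ABC}, {AB, ABC}, {AB, ABD}, {AB, ABD}]
print(metachain_frequencies([run1, run2], burnin=0.0))
print(mean_split_sd([run1, run2], burnin=0.0))
```

Because every run contributes the same number of post-burn-in samples here, pooling weights runs equally; with unequal run lengths one might instead average the per-run frequencies.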
Abstract:
Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.
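To make Bayesian segmentation concrete, here is a deliberately simplified version: the sequence is reduced to a binary GC/AT string, each segment's GC probability gets a uniform Beta(1, 1) prior (so it integrates out in closed form), and the model compares "no change-point" against "exactly one change-point", each with prior probability 1/2. This is not the algorithm described in the paper, which estimates the number and locations of multiple change-points; the priors and function names here are assumptions.

```python
from math import exp, lgamma, log


def log_marginal(k: int, n: int) -> float:
    """log P(segment) for n bases, k of them G/C, with a uniform Beta(1, 1)
    prior on the segment's GC probability: log[k! (n - k)! / (n + 1)!]."""
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)


def logsumexp(values):
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def change_point_posterior(seq: str):
    """Return (P(one change-point | data), {t: P(change-point at t | one exists)}),
    where a change-point at t splits the sequence into seq[:t] and seq[t:]."""
    x = [1 if base in "GCgc" else 0 for base in seq]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two bases")
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    log_m0 = log_marginal(total, n)  # evidence for a single uniform segment
    # Evidence for each change-point position t = 1 .. n - 1 (uniform prior over t)
    log_t = [log_marginal(prefix[t], t) + log_marginal(total - prefix[t], n - t)
             for t in range(1, n)]
    z = logsumexp(log_t)
    log_m1 = z - log(n - 1)
    p_change = 1.0 / (1.0 + exp(log_m0 - log_m1))
    post_t = {t: exp(lt - z) for t, lt in zip(range(1, n), log_t)}
    return p_change, post_t


p, post = change_point_posterior("A" * 50 + "G" * 50)
print(round(p, 4), max(post, key=post.get))
```

The closed-form marginal likelihoods are what make the Bayesian comparison cheap; scaling this to many change-points along a chromosome is where sampling methods such as those in the paper become necessary.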
Abstract:
In late summer 1999, an outbreak of human encephalitis occurred in the northeastern United States, concurrent with extensive mortality in crows (Corvus species) and the deaths of several exotic birds at a zoological park in the same area. Complete genome sequencing of a flavivirus isolated from the brain of a dead Chilean flamingo (Phoenicopterus chilensis), together with partial sequence analysis of envelope glycoprotein (E-glycoprotein) genes amplified from several other sources, including mosquitoes and two fatal human cases, revealed that West Nile (WN) virus was circulating in natural transmission cycles and was responsible for the human disease. Antigenic mapping with E-glycoprotein-specific monoclonal antibodies and E-glycoprotein phylogenetic analysis confirmed these viruses as WN. This North American WN virus was most closely related to a WN virus isolated from a dead goose in Israel in 1998.
Abstract:
We have successfully linked protein library screening directly with the identification of active proteins, without the need for individual purification, display technologies or physical linkage between the protein and its encoding sequence. By using 'MAX' randomization we have rapidly constructed 60 overlapping gene libraries that encode zinc finger proteins, randomized variously at the three principal DNA-contacting residues. Expression and screening of the libraries against five possible target DNA sequences generated data points covering a potential 40,000 individual interactions. Comparative analysis of the resulting data enabled direct identification of active proteins. Accuracy of this library analysis methodology was confirmed by both in vitro and in vivo analyses of identified proteins to yield novel zinc finger proteins that bind to their target sequences with high affinity, as indicated by low nanomolar apparent dissociation constants.
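One way to read the figure of 40,000 potential interactions, offered here as an assumption rather than the authors' stated derivation, is that three randomized positions covering all 20 amino acids give 20^3 protein variants, each screened against five targets. The sketch also contrasts MAX randomization (one codon per amino acid, no stop codons) with a conventional NNK degenerate codon (32 codons, one of them the TAG stop), which inflates the number of DNA variants needed to cover the same protein space. All constant names are illustrative.

```python
AMINO_ACIDS = 20            # MAX randomization: one codon per amino acid
RANDOMIZED_POSITIONS = 3    # the three principal DNA-contacting residues
TARGET_SITES = 5            # target DNA sequences screened

protein_variants = AMINO_ACIDS ** RANDOMIZED_POSITIONS
interactions = protein_variants * TARGET_SITES

# Conventional NNK codon (N = A/C/G/T, K = G/T): 32 codons, one of them a stop (TAG)
NNK_CODONS = 32
nnk_dna_variants = NNK_CODONS ** RANDOMIZED_POSITIONS
nnk_stop_free = (NNK_CODONS - 1) ** RANDOMIZED_POSITIONS

print(protein_variants, interactions)   # 8000 40000
print(nnk_dna_variants, nnk_stop_free)  # 32768 29791
```

Under MAX randomization every DNA variant encodes a distinct, stop-free protein, so the library's DNA diversity equals its protein diversity; with NNK, about 4 times as many DNA sequences map onto the same 8,000 proteins, and some contain a stop codon.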
Abstract:
Formal grammars can be used to describe complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms (e.g. plant development) and the morphology of a variety of organisms. We believe that parallel grammars can also be used to model genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene, and detecting them in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes promoter recognition a complex problem. We recast promoter recognition as the induction of context-free stochastic L-grammar rules, which are then used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and vertebrate promoter datasets using a genetic programming technique, and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L-grammar rules are analyzed and compared with natural promoter sequences.
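The defining feature of an L-grammar is parallel rewriting: at each step every symbol is replaced simultaneously, rather than one nonterminal at a time as in a Chomsky grammar. The sketch below shows a stochastic, context-free L-system in which each symbol has weighted alternative productions. The rule sets are hypothetical and are not among the grammars induced in the paper; the genetic programming search and SVM fitness evaluation are not shown.

```python
import random

# symbol -> list of (weight, successor); symbols without rules are copied unchanged
Rules = dict[str, list[tuple[float, str]]]


def rewrite(axiom: str, rules: Rules, steps: int, rng: random.Random) -> str:
    word = axiom
    for _ in range(steps):
        next_word = []
        for symbol in word:  # every symbol of the current word is rewritten in this step
            productions = rules.get(symbol)
            if productions is None:
                next_word.append(symbol)  # identity production
                continue
            weights = [w for w, _ in productions]
            successors = [succ for _, succ in productions]
            next_word.append(rng.choices(successors, weights=weights, k=1)[0])
        word = "".join(next_word)
    return word


# Deterministic special case: Lindenmayer's algae system (A -> AB, B -> A)
ALGAE: Rules = {"A": [(1.0, "AB")], "B": [(1.0, "A")]}
print(rewrite("A", ALGAE, 5, random.Random(0)))  # ABAABABAABAAB

# Hypothetical stochastic rules over a nucleotide alphabet plus a growth symbol "N"
DNA_RULES: Rules = {"N": [(0.5, "TAN"), (0.3, "GCN"), (0.2, "A")]}
print(rewrite("N", DNA_RULES, 6, random.Random(42)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when generated sequences are compared against natural promoters.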
Abstract:
An algorithm for detecting fractal-like structures in DNA sequences has been developed and implemented. Fractality is interpreted as self-similarity based on the property of symmetry or complementary symmetry. Local fractals are of interest because of their ability to accumulate multiple palindromic hairpin structures with potential regulatory functions. Actual instances of fractality have been identified in various genomes, from viruses to humans. The possibility of using fractal-like structures as markers that distinguish closely related classes of sequences is considered.
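The palindromic hairpin units that such local fractals accumulate are inverted repeats: a stem, a short loop, and the reverse complement of the stem. A minimal way to enumerate them is to try each loop placement and extend the stem outward while the flanking bases pair. This is only an illustrative building block under assumed parameters (minimum stem length, loop-length range, Watson–Crick pairs only, hypothetical function name); it is not the fractal-detection algorithm itself.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_hairpins(seq: str, min_stem: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (stem_start, stem_length, loop_length) for each loop placement whose
    flanking bases form a perfect Watson-Crick stem of at least min_stem pairs.

    The stem is extended maximally outward from each candidate loop, so one
    hairpin can be reported more than once with different loop lengths.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            right = loop_start + loop_len  # first base of the 3' stem arm
            stem = 0
            while (
                loop_start - 1 - stem >= 0
                and right + stem < n
                and COMPLEMENT.get(seq[loop_start - 1 - stem]) == seq[right + stem]
            ):
                stem += 1
            if stem >= min_stem:
                hits.append((loop_start - stem, stem, loop_len))
    return hits


# 5' arm GCAT, loop GAA, 3' arm ATGC (the reverse complement of GCAT)
print(find_hairpins("TTGCATGAAATGCTT"))  # [(2, 4, 3)]
```

Detecting self-similarity would then mean looking for hairpins nested inside or tiled along larger symmetric regions; how the original algorithm scores that is not specified here.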
Abstract:
In the first part of this study, human immunodeficiency virus type 1 (HIV-1) proviral DNA sequences derived from 201 clones of the C2-V3 env region and the first exon of the tat gene were obtained from six HIV-1-infected heterosexual couples. These molecular data were used to confirm the epidemiological relationships, and the ability of the molecular data to support such conclusions was tested with multiple phylogenetic analyses. The tat region was much more useful for establishing epidemiological relationships than the commonly used C2-V3 region.
Subsequently, using nucleotide sequences from the first exon of the tat gene, we tested the hypothesis that a Florida dentist (a common source) infected five of his patients in the course of dental procedures, against the null hypothesis that the dentist and each individual in the dental group acquired the virus independently within the local community. Multiple phylogenetic analyses demonstrated that the sequences of the five patients were significantly more closely related to each other than to sequences of the controls. Our results using tat sequences, combined with envelope sequence data, strongly support a common phylogenetic and epidemiological relationship among these five patients.
A third study examines the role of genomic variation in drug resistance. HIV-1 reverse transcriptase (RT) mutations were detected in DNA from peripheral blood mononuclear cells of 11 of 12 HIV-infected children after 11–20 months of zidovudine monotherapy. The codon 41/215 mutant combination was associated with a general decline in health status. Patients developing the codon 70 mutation tended to have a better health status.
Abstract:
Phylogenetic analyses were performed on six genera and 46 species of the Neotropical palm tribe Geonomeae, based on two low-copy nuclear DNA sequences from the genes encoding phosphoribulokinase and RNA polymerase II. The basal node of the tribe was polytomous. Pholidostachys formed a monophyletic group. The currently accepted genera Calyptronoma and Calyptrogyne formed a well-supported clade, with Calyptronoma resolved as paraphyletic to Calyptrogyne. Geonoma formed a strongly supported monophyletic group consisting of two main clades.
The genetic distinctness of Geonoma macrostachys varieties was evaluated at local and regional scales using inter-simple sequence repeat (ISSR) markers. Clustering, ordination, and AMOVA suggested a lack of genetic distinctness between varieties at the regional level. A hierarchical AMOVA revealed that most of the genetic diversity lies among the four localities sampled. Significant genetic differentiation between sympatric varieties occurred in one locality only. The current taxonomy of G. macrostachys, which recognizes only one species, was therefore supported.
The habitat preferences of sympatric G. macrostachys varieties with respect to edaphic, topographic, and light factors were studied in three Peruvian lowland forests. The two varieties were mostly encountered in different physiographically defined habitats, with variety acaulis occurring more often in floodplain forest and variety macrostachys in tierra firme. Comparison-of-means tests revealed that nine to eleven of the 16 environmental variables differed significantly between varieties. Edaphic factors, mainly soil texture and K content, distinguished the habitats occupied by the two varieties better than light conditions did at all three study sites. It is concluded that habitat differentiation plays a role in the coexistence of these closely related taxa.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple data sources is used appropriately and effectively, knowledge discovery can be achieved better than is possible from a single source.
Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.
The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. Two machine learning tools, support vector machines (SVM) and k-nearest neighbors (KNN), were used to successfully distinguish between several soil samples.
The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.
The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM, focuses on gene regulation in Plasmodium falciparum, contains diverse information, and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.
The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
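The abstract does not give enough detail to reproduce the GCC algorithm, so as a stand-in the sketch below implements COP-KMeans (Wagstaff et al., 2001), a well-known K-means variant in which pairwise must-link and cannot-link constraints restrict cluster assignment. It illustrates the general idea of feeding partial knowledge into K-means as constraints; it is not GCC, and the function names and parameters are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _partner(i, pair):
    """The other point in a constraint pair, or None if i is not in the pair."""
    a, b = pair
    return b if a == i else a if b == i else None


def violates(i, cluster, assigned, must_link, cannot_link):
    """True if putting point i in `cluster` breaks a constraint with an already-assigned point."""
    for pair in must_link:
        j = _partner(i, pair)
        if j is not None and j in assigned and assigned[j] != cluster:
            return True
    for pair in cannot_link:
        j = _partner(i, pair)
        if j is not None and assigned.get(j) == cluster:
            return True
    return False


def cop_kmeans(points, k, must_link=(), cannot_link=(), max_iter=100, seed=0):
    """Return a cluster label per point, or None if some point cannot be placed."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = {}
    for _ in range(max_iter):
        new = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first that breaks no constraint.
            for c in sorted(range(k), key=lambda idx: sq_dist(p, centers[idx])):
                if not violates(i, c, new, must_link, cannot_link):
                    new[i] = c
                    break
            else:
                return None  # COP-KMeans gives up when no feasible cluster exists
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for c in range(k):
            members = [points[i] for i, lab in labels.items() if lab == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return [labels[i] for i in range(len(points))]


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(cop_kmeans(pts, 2))
print(cop_kmeans(pts, 2, cannot_link=[(0, 1)]))
```

COP-KMeans treats constraints as hard and can fail outright; soft-constraint variants that penalize rather than forbid violations are a common alternative when extracted constraints are noisy.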
Abstract:
The community structure of sediment bacteria in the Everglades freshwater marsh, fringing mangrove forest, and Florida Bay seagrass meadows was described based on polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) patterns of 16S rRNA gene fragments and on sequencing of DGGE bands. The DGGE patterns were correlated with environmental variables by means of canonical correspondence analysis. There was no significant trend in the Shannon–Wiener index among the sediment samples along the salinity gradient. However, cluster analysis based on DGGE patterns revealed that bacterial community structure differed among sites. Not only were these salinity/vegetation regions distinct, but the sediment bacterial communities also differed consistently along the gradient from freshwater marsh, mangrove forest, and eastern-central Florida Bay to western Florida Bay. Actinobacteria- and Bacteroidetes/Chlorobi-like DNA sequences were amplified at all sampling sites. More Chloroflexi and members of candidate division WS3 were found at freshwater marsh and mangrove forest sites than at seagrass sites. The appearance of candidate division OP8-like DNA sequences at mangrove sites distinguished these communities from those of the freshwater marsh. The seagrass sites were characterized by reduced presence of bands belonging to Chloroflexi and increased presence of bands related to Cyanobacteria, γ-Proteobacteria, Spirochetes, and Planctomycetes. These included sulfate-reducing bacteria, which are prevalent in marine environments. Clearly, bacterial communities in the sediment differed along the gradient, and these differences can be explained mainly by differences in salinity and total phosphorus.
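The Shannon–Wiener index is H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. In DGGE studies it is often computed by treating each band as a taxon and its relative intensity as abundance; the sketch below follows that convention as an assumption (the exact procedure used in this study is not stated here), uses the natural logarithm, and the band intensities are made up. Pielou's evenness J' = H' / ln S is included as a common companion measure.

```python
import math


def shannon_wiener(intensities):
    """H' = -sum(p_i * ln p_i) over bands with non-zero intensity."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("need at least one band with positive intensity")
    proportions = [x / total for x in intensities if x > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(intensities):
    """J' = H' / ln(S), where S is the number of detected bands."""
    s = sum(1 for x in intensities if x > 0)
    return shannon_wiener(intensities) / math.log(s) if s > 1 else 0.0


lane = [120.0, 80.0, 40.0, 40.0, 20.0]  # hypothetical band intensities for one lane
print(round(shannon_wiener(lane), 3), round(pielou_evenness(lane), 3))
```

Because H' depends only on band proportions, two lanes can have similar H' while sharing few bands, which is consistent with finding no trend in the index even though cluster analysis separates the sites.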