43 results for DNA sequences
in Queensland University of Technology - ePrints Archive
Abstract:
Australasian marsupials include three major radiations, the insectivorous/carnivorous Dasyuromorphia, the omnivorous bandicoots (Peramelemorphia), and the largely herbivorous diprotodontians. Morphologists have generally considered the bandicoots and diprotodontians to be closely related, most prominently because they are both syndactylous (with the 2nd and 3rd pedal digits being fused). Molecular studies have been unable to confirm or reject this Syndactyla hypothesis. Here we present new mitochondrial (mt) genomes from a spiny bandicoot (Echymipera rufescens) and two dasyurids, a fat-tailed dunnart (Sminthopsis crassicaudata) and a northern quoll (Dasyurus hallucatus). By comparing trees derived from pairwise base-frequency differences between taxa with standard (absolute, uncorrected) distance trees, we infer that composition bias among mt protein-coding and RNA sequences is sufficient to mislead tree reconstruction. This can explain incongruence between trees obtained from mt and nuclear data sets. However, after excluding major sources of compositional heterogeneity, both the “reduced-bias” mt and nuclear data sets clearly favor a bandicoot plus dasyuromorphian association, as well as a grouping of kangaroos and possums (Phalangeriformes) among diprotodontians. Notably, alternatives to these groupings could only be confidently rejected by combining the mt and nuclear data. Elsewhere on the tree, Dromiciops appears to be sister to the monophyletic Australasian marsupials, whereas the placement of the marsupial mole (Notoryctes) remains problematic. More generally, we contend that it is desirable to combine mt genome and nuclear sequences for inferring vertebrate phylogeny, but as separately modeled process partitions. This strategy depends on detecting and excluding (or accounting for) major sources of nonhistorical signal, such as from compositional nonstationarity.
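The comparison described above (distances based on base-frequency differences versus uncorrected distances) can be illustrated with a minimal sketch. The abstract does not state the exact composition measure used, so the Euclidean distance between base-frequency vectors below is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def base_frequencies(seq):
    """Relative frequencies of A, C, G, T; other symbols (gaps, N) are ignored."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return [counts[b] / total for b in BASES]

def composition_distance(seq_a, seq_b):
    """Euclidean distance between base-frequency vectors (assumed measure)."""
    fa, fb = base_frequencies(seq_a), base_frequencies(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))

def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in BASES and y in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)
```

If a tree built from `composition_distance` alone groups the same taxa as a tree built from `p_distance`, that agreement hints that composition, rather than shared history, may be driving the grouping.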
Abstract:
We used in vivo (biological), in silico (computational structure prediction), and in vitro (model sequence folding) analyses of single-stranded DNA sequences to show that nucleic acid folding conservation is the selective principle behind a high-frequency single-nucleotide reversion observed in a three-nucleotide mutated motif of the Maize streak virus replication associated protein (Rep) gene. In silico and in vitro studies showed that the three-nucleotide mutation adversely affected Rep nucleic acid folding, and that the single-nucleotide reversion [C(601)A] restored wild-type-like folding. In vivo support came from infecting maize with mutant viruses: those with Rep genes containing nucleotide changes predicted to restore a wild-type-like fold [A(601)/G(601)] preferentially accumulated over those predicted to fold differently [C(601)/T(601)], which frequently reverted to A(601) and displaced the original population. We propose that the selection of native nucleic acid folding is an epigenetic effect, which might have broad implications in the evolution of plants and their viruses.
Abstract:
Over the last few years, investigations of human epigenetic profiles have identified the key elements of change as histone modifications, stable and heritable DNA methylation, and chromatin remodeling. These factors determine gene expression levels and characterise conditions leading to disease. In order to extract information embedded in long DNA sequences, data mining and pattern recognition tools are widely used, but efforts to date have been limited with respect to analyzing epigenetic changes and their role as catalysts in disease onset. Useful insight, however, can be gained by investigating the associated dinucleotide distributions. The focus of this paper is to explore specific dinucleotide frequencies across defined regions within the human genome, and to identify new patterns between epigenetic mechanisms and DNA content. Signal processing methods, including Fourier and wavelet transformations, are employed and principal results are reported.
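As a rough illustration of the kind of analysis this abstract describes, the sketch below counts dinucleotide frequencies and computes a discrete Fourier power spectrum of a dinucleotide indicator signal. It is not the authors' pipeline: the function names are hypothetical, the plain O(n²) transform is chosen only to keep the example dependency-free, and the wavelet step is omitted.

```python
import cmath
from collections import Counter

def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide made of A/C/G/T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if set(seq[i:i + 2]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}

def indicator_signal(seq, dinucleotide):
    """1.0 where the dinucleotide starts at a position, 0.0 elsewhere."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] == dinucleotide else 0.0
            for i in range(len(seq) - 1)]

def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform (direct O(n^2) sum)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum
```

A peak at frequency index `k` in the spectrum of, say, the CG indicator suggests that CG tends to recur with period `n / k` within the region examined.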
Abstract:
In the century since the description of the orthoclad genus Paratrichocladius Santos-Abreu (Diptera: Chironomidae), separation in any life stage from the cosmopolitan, diverse Cricotopus Wulp has been problematic. Molecular analysis reveals the presence of two species in Australia that conform in morphology to Paratrichocladius and which form a well-supported clade including Paratrichocladius micans (Kieffer) from Africa and a distinct southern African larva. This clade clusters with taxa allied with Cricotopus albitibia (Walker), in turn nested within all other sampled Australian Cricotopus. Relevant nodes strongly support Cricotopus as nonmonophyletic without inclusion of Paratrichocladius. We synonymize Paratrichocladius with Cricotopus syn.n, treating Paratrichocladius as a subgenus. Cricotopus (Paratrichocladius) australiensis Cranston sp.n. is described for Trichocladius pluriserialis Freeman from Australia, which is not the same species under that name in New Zealand. Cricotopus (Paratrichocladius) bifenestrus Cranston sp.n. from Australia is described, also in all life stages. The many new combinations, listed in an Appendix, include three replacement names for new secondary homonyms, namely: Cricotopus (Paratrichocladius) sinobicinctus Cranston & Krosch nom.n. for Paratrichocladius bicinctus Fu, Sæther & Wang, Cricotopus draysoni Cranston & Krosch nom.n. for Cricotopus brevicornis Drayson, Krosch & Cranston, and Cricotopus (Paratrichocladius) sikhotealinus Makarchenko & Makarchenko nom.n. for Cricotopus orientalis Kieffer. We conclude with comments on wider issues in the taxonomy of Paratrichocladius, especially concerning New Zealand species.
Abstract:
Studies continue to report ancient DNA sequences and viable microbial cells that are many millions of years old. In this paper we evaluate some of the most extravagant claims of geologically ancient DNA. We conclude that although exciting, the reports suffer from inadequate experimental setup and insufficient authentication of results. Consequently, it remains doubtful whether amplifiable DNA sequences and viable bacteria can survive over geological timescales. To enhance the credibility of future studies and assist in discarding false-positive results, we propose a rigorous set of authentication criteria for work with geologically ancient DNA.
Abstract:
The effect of two different DNA minor groove binding molecules, Hoechst 33258 and distamycin A, on the binding kinetics of NF-κB p50 to three different specific DNA sequences was studied at various salt concentrations. Distamycin A was shown to significantly increase the dissociation rate constant of p50 from the sequences PRDII (5′-GGGAAATTCC-3′) and Ig-κB (5′-GGGACTTTCC-3′) but had a negligible effect on the dissociation from the palindromic target-κB binding site (5′-GGGAATTCCC-3′). By comparison, the effect of Hoechst 33258 on binding of p50 to each sequence was found to be minimal. The dissociation rates for the protein–DNA complexes increased at higher potassium chloride concentrations for the PRDII and Ig-κB binding motifs and this effect was magnified by distamycin A. In contrast, p50 bound to the palindromic target-κB site with a much higher intrinsic affinity and exhibited a significantly reduced salt dependence of binding over the ionic strength range studied, retaining a KD of less than 10 pM at 150 mM KCl. Our results demonstrate that the DNA binding kinetics of p50 and their salt dependence are strongly sequence-dependent and, in addition, that the binding of p50 to DNA can be influenced by the addition of minor groove-binding drugs in a sequence-dependent manner.
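For readers unfamiliar with the quantities mentioned here, the following sketch relates a dissociation rate constant to complex half-life and to the equilibrium dissociation constant under a simple 1:1 binding model. The model and the function names are assumptions for illustration; the abstract reports no rate constants, so any numbers passed in are hypothetical.

```python
import math

def fraction_bound(k_off, t):
    """Fraction of complex remaining after time t for first-order dissociation.

    k_off is in 1/s and t in s; rebinding is assumed to be negligible.
    """
    return math.exp(-k_off * t)

def half_life(k_off):
    """Time (s) for half of the complex to dissociate."""
    return math.log(2) / k_off

def equilibrium_dissociation_constant(k_off, k_on):
    """KD (M) from k_off (1/s) and k_on (1/(M*s)) for 1:1 binding."""
    return k_off / k_on
```

Under this model, an increase in `k_off` at a fixed `k_on` (the kind of effect reported for distamycin A on PRDII and Ig-κB) raises KD proportionally, that is, it lowers affinity.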
Abstract:
Understanding the complexities that are involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are also a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein coding genetic paradigm. Latent Variable Models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft thought useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for the association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation to the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function such as GC content and type of mutation into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
Abstract:
The DNA of three biological variants, G1, Ic and G2, which originated from the same greenhouse isolate of rice tungro bacilliform virus (RTBV) at the International Rice Research Institute (IRRI), was cloned and sequenced. Comparison of the sequences revealed small differences in genome sizes. The variants were between 95 and 99% identical at the nucleotide and amino acid levels. Alignment of the three genome sequences with those of three published RTBV sequences (Phi-1, Phi-2 and Phi-3) revealed numerous nucleotide substitutions and some insertions and deletions. The published RTBV sequences originated from the same greenhouse isolate at IRRI 20, 11 and 9 years ago. All open reading frames (ORFs) and known functional domains were conserved across the six variants. The cysteine-rich region of ORF3 showed the greatest variation. When the six DNA sequences from IRRI were compared with that of an isolate from Malaysia (Serdang), similar changes were observed in the cysteine-rich region in addition to other nucleotide substitutions and deletions across the genome. The aligned nucleotide sequences of the IRRI variants and Serdang were used to analyse phylogenetic relationships by the bootstrapped parsimony, distance and maximum-likelihood methods. The isolates clustered in three groups: Serdang alone; Ic and G1; and Phi-1, Phi-2, Phi-3 and G2. The distribution of phylogenetically informative residues in the IRRI sequences shared with the Serdang sequence and the differing tree topologies for segments of the genome suggested that recombination, as well as substitutions and insertions or deletions, has played a role in the evolution of RTBV variants. The significance and implications of these evolutionary forces are discussed in comparison with badnaviruses and caulimoviruses.
Abstract:
Bioinformatics involves analyses of biological data such as DNA sequences, microarrays and protein-protein interaction (PPI) networks. Its two main objectives are the identification of genes or proteins and the prediction of their functions. Biological data often contain uncertain and imprecise information. Fuzzy theory provides useful tools to deal with this type of information, hence has played an important role in analyses of biological data. In this thesis, we aim to develop some new fuzzy techniques and apply them on DNA microarrays and PPI networks. We will focus on three problems: (1) clustering of microarrays; (2) identification of disease-associated genes in microarrays; and (3) identification of protein complexes in PPI networks. The first part of the thesis aims to detect, by the fuzzy C-means (FCM) method, clustering structures in DNA microarrays corrupted by noise. Because of the presence of noise, some clustering structures found in random data may not have any biological significance. In this part, we propose to combine the FCM with the empirical mode decomposition (EMD) for clustering microarray data. The purpose of EMD is to reduce, preferably to remove, the effect of noise, resulting in what is known as denoised data. We call this method the fuzzy C-means method with empirical mode decomposition (FCM-EMD). We applied this method on yeast and serum microarrays, and the silhouette values are used for assessment of the quality of clustering. The results indicate that the clustering structures of denoised data are more reasonable, implying that genes have tighter association with their clusters. Furthermore we found that the estimation of the fuzzy parameter m, which is a difficult step, can be avoided to some extent by analysing denoised microarray data. The second part aims to identify disease-associated genes from DNA microarray data which are generated under different conditions, e.g., patients and normal people. 
We developed a type-2 fuzzy membership (FM) function for identification of disease-associated genes. This approach is applied to diabetes and lung cancer data, and a comparison with the original FM test was carried out. Among the ten best-ranked genes of diabetes identified by the type-2 FM test, seven genes have been confirmed as diabetes-associated genes according to gene description information in Gene Bank and the published literature. An additional gene is further identified. Among the ten best-ranked genes identified in lung cancer data, seven are confirmed to be associated with lung cancer or its treatment. The type-2 FM-d values are significantly different, which makes the identifications more convincing than the original FM test. The third part of the thesis aims to identify protein complexes in large interaction networks. Identification of protein complexes is crucial to understand the principles of cellular organisation and to predict protein functions. In this part, we proposed a novel method which combines the fuzzy clustering method and interaction probability to identify the overlapping and non-overlapping community structures in PPI networks, then to detect protein complexes in these sub-networks. Our method is based on both the fuzzy relation model and the graph model. We applied the method on several PPI networks and compared it with a popular protein complex identification method, the clique percolation method. For the same data, we detected more protein complexes. We also applied our method on two social networks. The results showed our method works well for detecting sub-networks and gives a reasonable understanding of these communities.
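The fuzzy C-means (FCM) step that the first part of this thesis builds on can be sketched as follows. This is the standard textbook FCM update with fuzzy parameter m, not the thesis's FCM-EMD method (the empirical mode decomposition denoising step is not shown), and the function names and fixed iteration count are assumptions for illustration.

```python
import math

def update_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Point sits exactly on a centre: share full membership among
            # the coincident centres to avoid dividing by zero.
            zeros = [d == 0.0 for d in dists]
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships

def update_centers(points, memberships, n_clusters, m=2.0):
    """Each centre is the mean of all points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers

def fuzzy_c_means(points, initial_centers, m=2.0, n_iter=50):
    """Alternate the two updates for a fixed number of iterations."""
    centers = list(initial_centers)
    for _ in range(n_iter):
        memberships = update_memberships(points, centers, m)
        centers = update_centers(points, memberships, len(centers), m)
    return centers, update_memberships(points, centers, m)
```

Each row of the returned membership matrix sums to 1, so a gene's expression profile can belong partly to several clusters; larger values of `m` make the memberships softer.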
Abstract:
Living mammals can be divided into three subclasses (monotremes, marsupials and placentals) and within these, about 27 orders. Final resolution of the relationships between the orders is only now being achieved with the increased availability of deoxyribonucleic acid (DNA) sequences. Highlights include the deep division of placental mammals into African (Afrotheria), South American (Xenarthra) and northern hemisphere (Boreoeutheria) super-orders, and the finding that the once considered primitive ‘Insectivora’ and ‘Edentata’ clades, in fact, have members distributed widely among these super-orders. Another surprise finding from DNA studies has been that whale origins lie among the even-toed ungulates (Artiodactyla). Our order, Primates, is most closely related to the flying lemurs and next, the tree shrews. With the mammal phylogeny becoming well resolved, it is increasingly being used as a framework for inferring evolutionary and ecological processes, such as adaptive radiation.
Abstract:
Background: Evolutionary biologists are often misled by convergence of morphology and this has been common in the study of bird evolution. However, molecular data sets have their own problems, and phylogenies based on short DNA sequences have the potential to mislead us too. The relationships among clades and the timing of the evolution of modern birds (Neoaves) have not yet been well resolved. Evidence of convergence of morphology remains controversial. With six new bird mitochondrial genomes (hummingbird, swift, kagu, rail, flamingo and grebe) we test the proposed Metaves/Coronaves division within Neoaves and the parallel radiations in this primary avian clade. Results: Our mitochondrial trees did not return the Metaves clade that had been proposed based on one nuclear intron sequence. We suggest that the high number of indels within the seventh intron of the β-fibrinogen gene at this phylogenetic level, which left a dataset with not a single site across the alignment shared by all taxa, resulted in artifacts during analysis. With respect to the overall avian tree, we find the flamingo and grebe are sister taxa and basal to the shorebirds (Charadriiformes). Using a novel site-stripping technique for noise-reduction we found this relationship to be stable. The hummingbird/swift clade is outside the large and very diverse group of raptors, shore and sea birds. Unexpectedly the kagu is not closely related to the rail in our analysis, but because neither the kagu nor the rail has close affinity to any taxa within this dataset of 41 birds, their placement is not yet resolved. Conclusion: Our phylogenetic hypothesis based on 41 avian mitochondrial genomes (13,229 bp) rejects monophyly of seven Metaves species and we therefore conclude that the members of Metaves do not share a common evolutionary history within the Neoaves.
Abstract:
Global aquaculture has expanded rapidly to address the increasing demand for aquatic protein and an uncertain future for wild fisheries. To date, however, most farmed aquatic stocks are essentially wild and little is known about their genomes or the genes that affect important economic traits in culture. Biologists have recognized that recent technological advances including next generation sequencing (NGS) have opened up the possibility of generating genome wide sequence data sets rapidly from non-model organisms at a reasonable cost. In an era when virtually any study organism can 'go genomic', understanding gene function and genetic effects on expressed quantitative trait locus phenotypes will be fundamental to future knowledge development. Many factors can influence the individual growth rate in target species but of particular importance in agriculture and aquaculture will be the identification and characterization of the specific gene loci that contribute important phenotypic variation to growth because the information can be applied to speed up genetic improvement programmes and to increase productivity via marker-assisted selection (MAS). While currently there is only limited genomic information available for any crustacean species, a number of putative candidate genes have been identified or implicated in growth and muscle development in some species. In an effort to stimulate increased research on the identification of growth-related genes in crustacean species, here we review the available information on: (i) associations between genes and growth reported in crustaceans, (ii) growth-related genes involved with moulting, (iii) muscle development and degradation genes involved in moulting, and (iv) correlations between DNA sequences that have confirmed growth trait effects in farmed animal species used in terrestrial agriculture and related sequences in crustacean species.
The information in concert can provide a foundation for increasing the rate at which knowledge about key genes affecting growth traits in crustacean species is gained.
Abstract:
In the recent decision Association for Molecular Pathology v. Myriad Genetics, the US Supreme Court held that naturally occurring sequences from human genomic DNA are not patentable subject matter. Only certain complementary DNAs (cDNA), modified sequences and methods to use sequences are potentially patentable. It is likely that this distinction will hold for all DNA sequences, whether animal, plant or microbial. However, it is not clear whether this means that other naturally occurring informational molecules, such as polypeptides (proteins) or polysaccharides, will also be excluded from patents. The decision underscores a pressing need for precise analysis of patents that disclose and reference genetic sequences, especially in the claims. Similarly, data sets, standards compliance and analytical tools must be improved (in particular, data sets and analytical tools must be made openly accessible) in order to provide a basis for effective decision making and policy setting to support biological innovation. Here, we present a web-based platform that allows such data aggregation, analysis and visualization in an open, shareable facility. To demonstrate the potential for the extension of this platform to global patent jurisdictions, we discuss the results of a global survey of patent offices that shows that much progress is still needed in making these data freely available for aggregation in the first place.
Abstract:
Transposable elements, which are DNA sequences that can move between different sites in genomes, comprise approximately 40% of the genome of mammals and are emerging as important contributors to biological diversity. Here we report a transcription unit lying within intron 1 of the murine Magi1 (membrane associated guanylate kinase inverted 1) gene that codes for a cell-cell junction scaffolding protein. The transcription unit, termed Magi1OS (Magi1 Opposite Strand), originates from a region with tandem B1 short interspersed nuclear elements (SINEs) and is an antisense gene to Magi1. Magi1OS transcription initiates in a proximal B1 element that shows only 4% divergence from the consensus sequence, indicating that it has been recently inserted into the mouse genome and could be replication competent. Moreover, a chimaeric transcript may result from intra-chromosomal interaction and trans-splicing of the Magi1 antisense transcript (Magi1OS) and Ghrl, which codes for the multifunctional peptide hormone ghrelin. These two genes are 20 megabases apart on chromosome 6 and are transcribed in opposite directions. We propose that the Magi1OS locus may serve as a useful model system to study exaptation and retrotransposition of B1 SINEs, as well as to examine the mechanisms of intra-chromosomal trans-splicing.