139 results for Nucleotide-sequence Analysis
Abstract:
Molecular phylogenetic studies of homologous sequences of nucleotides often assume that the underlying evolutionary process was globally stationary, reversible, and homogeneous (SRH), and that a model of evolution with one or more site-specific and time-reversible rate matrices (e.g., the GTR rate matrix) is enough to accurately model the evolution of data over the whole tree. However, an increasing body of data suggests that evolution under these conditions is an exception, rather than the norm. To address this issue, several non-SRH models of molecular evolution have been proposed, but they either ignore heterogeneity in the substitution process across sites (HAS) or assume it can be modeled accurately using the Γ distribution. As an alternative to these models of evolution, we introduce a family of mixture models that approximate HAS without the assumption of an underlying predefined statistical distribution. This family of mixture models is combined with non-SRH models of evolution that account for heterogeneity in the substitution process across lineages (HAL). We also present two algorithms for searching model space and identifying an optimal model of evolution that is less likely to over- or underparameterize the data. The performance of the two new algorithms was evaluated using alignments of nucleotides with 10 000 sites simulated under complex non-SRH conditions on a 25-tipped tree. The algorithms were found to be very successful, identifying the correct HAL model with a 75% success rate (the average success rate for assigning rate matrices to the tree's 48 edges was 99.25%) and, for the correct HAL model, identifying the correct HAS model with a 98% success rate. Finally, parameter estimates obtained under the correct HAL-HAS model were found to be accurate and precise.
The merits of our new algorithms were illustrated with an analysis of 42 337 second codon sites extracted from a concatenation of 106 alignments of orthologous genes encoded by the nuclear genomes of Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. castellii, S. kluyveri, S. bayanus, and Candida albicans. Our results show that second codon sites in the ancestral genome of these species contained 49.1% invariable sites, 39.6% variable sites belonging to one rate category (V1), and 11.3% variable sites belonging to a second rate category (V2). The ancestral nucleotide content was found to differ markedly across these three sets of sites, and the evolutionary processes operating at the variable sites were found to be non-SRH and best modeled by a combination of eight edge-specific rate matrices (four for V1 and four for V2). The number of substitutions per site at the variable sites also differed markedly, with sites belonging to V1 evolving slower than those belonging to V2 along the lineages separating the seven species of Saccharomyces. Finally, sites belonging to V1 appeared to have ceased evolving along the lineages separating S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, implying that they might have become so selectively constrained that they could be considered invariable sites in these species.
Abstract:
The complete nucleotide sequence of rice tungro spherical virus (RTSV) strain Vt6, originally from Mindanao, the Philippines, which shows higher virulence to resistant rice cultivars, was determined and compared with the published sequence for the Philippine-type strain A (RTSV-A-Shen). RTSV-A was reported to be unable to infect the resistant rice cultivar TKM 6 (10). RTSV-Vt6 and RTSV-A-Shen share 90% and 95% homology at the nucleotide and amino-acid levels, respectively. The N-terminal leader sequence of RTSV-Vt6 contained a 39-amino-acid region (positions 65 to 103) that was totally different from that of RTSV-A-Shen; the difference resulted from frame shifting caused by nucleotide insertions and deletions. To confirm the amino-acid sequence differences of the leader polypeptide, the same region was cloned and sequenced using a newly obtained variant of RTSV-type 6, which had been collected in the field at IRRI, and seven field isolates from Mindanao, the Philippines. Since all the sequences of the target region are identical to that of the Vt6 leader polypeptide, the sequence difference in the leader region does not seem to correlate with the virulence of Vt6.
Abstract:
Ratites are large, flightless birds and include the ostrich, rheas, kiwi, emu, and cassowaries, along with extinct members, such as moa and elephant birds. Previous phylogenetic analyses of complete mitochondrial genome sequences have reinforced the traditional belief that ratites are monophyletic and tinamous are their sister group. However, in these studies ratite monophyly was enforced in the analyses that modeled rate heterogeneity among variable sites. Relaxing this topological constraint results in strong support for the tinamous (which fly) nesting within ratites. Furthermore, upon reducing base compositional bias and partitioning models of sequence evolution among protein codon positions and RNA structures, the tinamou–moa clade grouped with kiwi, emu, and cassowaries to the exclusion of the successively more divergent rheas and ostrich. These relationships are consistent with recent results from a large nuclear data set, whereas our strongly supported finding of a tinamou–moa grouping further resolves palaeognath phylogeny. We infer flight to have been lost among ratites multiple times in temporally close association with the Cretaceous–Tertiary extinction event. This circumvents requirements for transient microcontinents and island chains to explain discordance between ratite phylogeny and patterns of continental breakup. Ostriches may have dispersed to Africa from Eurasia, putting in question the status of ratites as an iconic Gondwanan relict taxon. [Base composition; flightless; Gondwana; mitochondrial genome; Palaeognathae; phylogeny; ratites.]
Abstract:
Background: Dopamine D2 receptor (DRD2) is thought to be critical in regulating the dopaminergic pathway in the brain, which is known to be important in the aetiology of schizophrenia. It is therefore not surprising that most antipsychotic medication acts on the dopamine D2 receptor. DRD2 is widely expressed in the brain, its levels are reduced in the brains of schizophrenia patients, and DRD2 polymorphisms have been associated with reduced brain expression. We have previously identified a genetic variant in DRD2, rs6277, as strongly implicated in schizophrenia susceptibility. Methods: To identify new associations in the DRD2 gene with disease status and clinical severity, we genotyped seven single nucleotide polymorphisms (SNPs) in DRD2 using a multiplex mass spectrometry method. SNPs were chosen using a haplotype block-based gene-tagging approach so that the entire DRD2 gene was represented. Results: One polymorphism, rs2734839, was found to be significantly associated with schizophrenia as well as with late age of onset. Individuals carrying the genetic variation were more than twice as likely to have schizophrenia compared to controls. Conclusions: Our results suggest that DRD2 genetic variation is a good indicator of schizophrenia risk and may also be used as a predictor of age of onset.
Abstract:
Originally developed in bioinformatics, sequence analysis is increasingly used in the social sciences for the study of life-course processes. The methodology generally employed consists of computing dissimilarities between the trajectories and, if typologies are sought, clustering the trajectories according to those dissimilarities. The choice of an appropriate dissimilarity measure is a major issue when dealing with sequence analysis for life sequences. Several dissimilarities are available in the literature, but none of them has become the undisputed choice. In this paper, instead of deciding upon one dissimilarity measure, we propose to use an optimal convex combination of different dissimilarities. The optimality is automatically determined by the clustering procedure and is defined with respect to the within-class variance.
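As a rough illustration of what a convex combination of dissimilarities looks like, the following Python sketch combines two pairwise dissimilarity matrices with non-negative weights that sum to one. The two measures (Hamming and a longest-common-prefix distance), the max-normalisation step, the fixed weights, and the toy trajectories are all assumptions made for this example; the abstract does not say which measures were combined, and in the paper the weights are chosen by the clustering procedure rather than fixed by hand.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lcp_distance(a, b):
    """Prefix-based distance: len(a) + len(b) - 2 * (longest common prefix)."""
    lcp = 0
    for x, y in zip(a, b):
        if x != y:
            break
        lcp += 1
    return len(a) + len(b) - 2 * lcp


def pairwise(seqs, dist):
    """Full pairwise dissimilarity matrix as a list of lists."""
    n = len(seqs)
    return [[dist(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


def normalise(matrix):
    """Scale a matrix by its largest entry so measures are comparable (an assumption)."""
    peak = max(max(row) for row in matrix)
    return [[v / peak if peak else 0.0 for v in row] for row in matrix]


def convex_combination(matrices, weights):
    """Weighted sum of matrices; weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical life-course trajectories (S = school, E = employed, U = unemployed).
trajectories = ["SSEEU", "SSEEE", "EEEUU"]
d_hamming = normalise(pairwise(trajectories, hamming))
d_prefix = normalise(pairwise(trajectories, lcp_distance))
combined = convex_combination([d_hamming, d_prefix], [0.5, 0.5])
```

The combined matrix can then be passed to any clustering routine that accepts precomputed dissimilarities; searching over the weights to minimise a within-class criterion is the part this sketch leaves out.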
Abstract:
The complete nucleotide sequence of Subterranean clover mottle virus (SCMoV) genomic RNA has been determined. The SCMoV genome is 4,258 nucleotides in length. It shares the highest nucleotide and amino acid sequence identity with the genome of Lucerne transient streak virus (LTSV). SCMoV RNA encodes four overlapping open reading frames and has a genome organisation similar to that of Cocksfoot mottle virus (CfMV). ORF1 and ORF4 are predicted to encode single proteins. ORF2 is predicted to encode two proteins that are derived from a -1 translational frameshift between two overlapping reading frames (ORF2a and ORF2b). A search of amino acid databases did not find a significant match for ORF1 and the function of this protein remains unclear. ORF2a contains a motif typical of chymotrypsin-like serine proteases and ORF2b has motifs characteristically present in positive-stranded RNA-dependent RNA polymerases. ORF4 is likely to be expressed from a subgenomic RNA and encodes the viral coat protein. The ORF2a/ORF2b overlapping gene expression strategy used by SCMoV and CfMV is similar to that of the poleroviruses and differs from that of other published sobemoviruses. These results suggest that the sobemoviruses could now be divided into two distinct subgroups: those that express the RNA-dependent RNA polymerase from a single, in-frame polyprotein, and those that express it via a -1 translational frameshifting mechanism.
Abstract:
GPV is a Chinese serotype isolate of barley yellow dwarf virus (BYDV) that does not react with antisera to MAV, PAV, SGV, RPV and RMV. The sequence of the coat protein (CP) gene of the GPV isolate of BYDV was determined and its amino acid sequence was deduced. The coding region for the putative GPV CP is 603 nucleotides long and encodes a protein of Mr 22 218 (22 kDa). Like MAV, PAV and RPV, GPV contains a second ORF within the coat protein coding region. This protein of Mr 17 024 (17 kDa) is thought to correspond to the viral genome-linked protein (VPg). The CP coding region of the GPV isolate was compared with those of other BYDV isolates. The nucleotide and amino acid sequences of GPV are more similar to those of RPV than to those of PAV and MAV. The GPV CP sequence shared 83.7% nucleotide similarity and 77.5% deduced amino acid similarity with RPV, whereas the corresponding values for PAV and MAV were 56.9% and 53.2% (nucleotide) and 44.1% and 43.8% (amino acid), respectively. Two primers were designed from the BYDV-GPV CP sequence, and the CP cDNA was produced by RT-PCR. Full-length CP cDNA was inserted into plasmids to construct expression plasmids named pPPI1, pPPI2 and pPPI5, which are based on different promoters. The recombinant plasmids were verified with an α-32P-dATP-labelled CP probe and an α-32P-ATP-labelled GPV RNA probe, and by sequencing, to confirm that they carried the genuine GPV CP gene cDNA.
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
Abstract:
Chlamydia pecorum is a significant pathogen of domestic livestock and wildlife. We have developed a C. pecorum-specific multilocus sequence analysis (MLSA) scheme to examine the genetic diversity of and relationships between Australian sheep, cattle, and koala isolates. An MLSA of seven concatenated housekeeping gene fragments was performed using 35 isolates, including 18 livestock isolates (11 Australian sheep, one Australian cow, and six U.S. livestock isolates) and 17 Australian koala isolates. Phylogenetic analyses showed that the koala isolates formed a distinct clade, with limited clustering with C. pecorum isolates from Australian sheep. We identified 11 MLSA sequence types (STs) among Australian C. pecorum isolates, 10 of them novel, with koala and sheep sharing at least one identical ST (designated ST2013Aa). ST23, previously identified in global C. pecorum livestock isolates, was observed here in a subset of Australian bovine and sheep isolates. Most notably, ST23 was found in association with multiple disease states and hosts, providing insights into the transmission of this pathogen between livestock hosts. The complexity of the epidemiology of this disease was further highlighted by the observation that at least two examples of sheep were infected with different C. pecorum STs in the eyes and gastrointestinal tract. We have demonstrated the feasibility of our MLSA scheme for understanding the host relationship that exists between Australian C. pecorum strains and provide the first molecular epidemiological data on infections in Australian livestock hosts.
Abstract:
The native Asian oyster, Crassostrea ariakensis, is one of the most common and important Crassostrea species that occur naturally along the coast of East Asia. Molecular species diagnosis is a prerequisite for population genetic analysis of wild oyster populations because oyster species cannot be discriminated reliably using external morphological characters alone due to character ambiguity. To date there have been few phylogeographic studies of natural edible oyster populations in East Asia; this is particularly true of C. ariakensis, the common species in Korea. We therefore assessed the levels and patterns of molecular genetic variation in East Asian wild populations of C. ariakensis from Korea, Japan, and China using DNA sequence analysis of five concatenated mtDNA regions, namely 16S rRNA, cytochrome oxidase I, cytochrome oxidase II, cytochrome oxidase III, and cytochrome b. Two divergent C. ariakensis clades were identified, one in southern China and one at the remaining sites in the northern region. In addition, hierarchical AMOVA and pairwise ΦST analyses showed that genetic diversity was discontinuous among wild populations of C. ariakensis in East Asia. Biogeographical and historical sea level changes are discussed as potential factors that may have influenced the genetic heterogeneity of wild C. ariakensis stocks across this region.
Abstract:
Banana leaf streak disease, caused by several species of Banana streak virus (BSV), is widespread in East Africa. We surveyed for this disease in Uganda and Kenya, and used rolling-circle amplification (RCA) to detect the presence of BSV in banana. Six distinct badnavirus sequences, three from Uganda and three from Kenya, were amplified for which only partial sequences were previously available. The complete genomes were sequenced and characterised. The size and organisation of all six sequences were characteristic of other badnaviruses, including conserved functional domains present in the putative polyprotein encoded by open reading frame (ORF) 3. Based on nucleotide sequence analysis within the reverse transcriptase/ribonuclease H-coding region of ORF 3, we propose that these sequences be recognised as six new species and be designated as Banana streak UA virus, Banana streak UI virus, Banana streak UL virus, Banana streak UM virus, Banana streak CA virus and Banana streak IM virus. Using PCR and species-specific primers to test for the presence of integrated sequences, we demonstrated that sequences with high similarity to BSIMV only were present in several banana cultivars which had tested negative for episomal BSV sequences.
Abstract:
We investigated the role of two genes, ANKH and TNAP, in patients with cuff tear arthropathy. These genes encode proteins which regulate the extracellular concentration of inorganic pyrophosphate, fluctuations of which can lead to calcium crystal formation. Variants were detected by direct sequencing of DNA and their frequencies compared with healthy controls. The effect of variants on protein function was further studied by in vitro approaches. Variant genotypes were observed more frequently in the cases when compared with controls in ANKH (45% and 20%) and TNAP (32% and 9%). Variants in ANKH altered inorganic pyrophosphate (PPi) concentrations in transfected human chondrocytes. There was a higher mean serum concentration of TNAP detected in female patients compared with normal ranges. Cuff tear arthropathy is associated with variants in ANKH and TNAP that alter extracellular inorganic pyrophosphate concentrations causing calcium crystal deposition. This supports a theory that genetic variants predispose patients to primary crystal deposition which when combined with a massive rotator cuff tear leads to the development of arthritis.
Abstract:
With an increased emphasis on genotyping of single nucleotide polymorphisms (SNPs) in disease association studies, the genotyping platform of choice is constantly evolving. In addition, the development of more specific SNP assays and appropriate genotype validation applications is becoming increasingly critical to elucidate ambiguous genotypes. In this study, we have used SNP specific Locked Nucleic Acid (LNA) hybridization probes on a real-time PCR platform to genotype an association cohort and propose three criteria to address ambiguous genotypes. Based on the kinetic properties of PCR amplification, the three criteria address PCR amplification efficiency, the net fluorescent difference between maximal and minimal fluorescent signals and the beginning of the exponential growth phase of the reaction. Initially observed SNP allelic discrimination curves were confirmed by DNA sequencing (n = 50) and application of our three genotype criteria corroborated both sequencing and observed real-time PCR results. In addition, the tested Caucasian association cohort was in Hardy-Weinberg equilibrium and observed allele frequencies were very similar to two independently tested Caucasian association cohorts for the same tested SNP. We present here a novel approach to effectively determine ambiguous genotypes generated from a real-time PCR platform. Application of our three novel criteria provides an easy to use semi-automated genotype confirmation protocol.
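The Hardy-Weinberg check mentioned in this abstract is commonly done with a one-degree-of-freedom chi-square test on genotype counts. The sketch below shows that standard calculation for a biallelic SNP; the function name and the genotype counts are hypothetical, and the abstract does not state which test the authors actually applied.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed counts of the three genotypes of a
    biallelic SNP. Returns (frequency of allele A, expected counts, chi2).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # allele A frequency from allele counts
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expectation to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_5_PERCENT_1_DF = 3.841

# Hypothetical genotype counts, for illustration only.
p, expected, chi2 = hardy_weinberg_chi_square(50, 40, 10)
in_equilibrium = chi2 < CRITICAL_5_PERCENT_1_DF
print(f"p = {p:.3f}, chi2 = {chi2:.3f}, consistent with HWE: {in_equilibrium}")
```

A statistic below the critical value means the data give no evidence against equilibrium at the 5% level; for small cohorts or rare alleles an exact test is usually preferred over this approximation.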