87 results for Genome sequence analysis
in University of Queensland eSpace - Australia
Abstract:
Despite many successes of conventional DNA sequencing methods, some DNAs remain difficult or impossible to sequence. Unsequenceable regions occur in the genomes of many biologically important organisms, including the human genome. Such regions range in length from tens to millions of bases, and may contain valuable information such as the sequences of important genes. The authors have recently developed a technique that renders a wide range of problematic DNAs amenable to sequencing. The technique is known as sequence analysis via mutagenesis (SAM). This paper presents a number of algorithms for analysing and interpreting data generated by this technique.
Abstract:
There have been no reports of DNA sequences of hepatitis B virus (HBV) strains from Australian Aborigines, although the hepatitis B surface antigen (HBsAg) was discovered among them. To investigate the characteristics of DNA sequences of HBV strains from Australian Aborigines, the complete nucleotide sequences of HBV strains were determined and subjected to molecular evolutionary analysis. Serum samples positive for HBsAg were collected from five Australian Aborigines. Phylogenetic analysis of the five complete nucleotide sequences compared with DNA sequences of 54 global HBV isolates from international databases revealed that three of the five were classified into genotype D and were most closely related in terms of evolutionary distance to a strain isolated from a healthy blood donor in Papua New Guinea. Two of the five were classified into a novel variant genotype C, which has not been reported previously, and were closely related to a strain isolated from Polynesians, particularly in the X and Core genes. These two strains of variant genotype C differed from known genotype C strains by 5.9-7.4% over the complete nucleotide sequence and 4.0-5.6% in the small-S gene, and had residues Arg(122), Thr(127) and Lys(160) characteristic of serotype ayw3, which have not been reported previously in genotype C. In conclusion, this is the first report of the characteristics of complete nucleotide sequences of HBV from Australian Aborigines. These results contribute to the investigation of the worldwide spread of HBV, the relationship between serotype and genotype and the ancient common origin of Australian Aborigines.
Abstract:
The SOX family of transcription factors is found throughout the animal kingdom and is important in a variety of developmental contexts. Genome analysis has identified 20 Sox genes in human and mouse, which can be subdivided into 8 groups, based on sequence comparison and intron-exon structure. Most of the SOX groups identified in mammals are represented by a single SOX sequence in invertebrate model organisms, suggesting that a duplication and divergence mechanism has operated during vertebrate evolution. We have now analysed the Sox gene complement in the pufferfish, Fugu rubripes, in order to shed further light on the diversity and origins of the Sox gene family. Major differences were found between the Sox family in Fugu and those in humans and mice. In particular, Fugu does not have orthologues of Sry, Sox15 and Sox30, which appear to be specific to mammals, while Sox19, found in Fugu and zebrafish but absent in mammals, seems to be specific to fishes. Six mammalian Sox genes are represented by two copies each in Fugu, indicating a large-scale gene duplication in the fish lineage. These findings point to recent Sox gene loss, duplication and divergence occurring during the evolution of tetrapod and teleost lineages, and provide further evidence for large-scale segmental or a whole-genome duplication occurring early in the radiation of teleosts. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
We completed the genome sequence of Lettuce necrotic yellows virus (LNYV) by determining the nucleotide sequences of the 4a (putative phosphoprotein), 4b, M (matrix protein), G (glycoprotein) and L (polymerase) genes. The genome consists of 12,807 nucleotides and encodes six genes in the order 3' leader-N-4a(P)-4b-M-G-L-5' trailer. Sequences were derived from clones of a cDNA library from LNYV genomic RNA and from fragments amplified using reverse transcription-polymerase chain reaction. The 4a protein has a low isoelectric point characteristic of rhabdovirus phosphoproteins. The 4b protein has significant sequence similarities with the movement proteins of capillo- and trichoviruses and may be involved in cell-to-cell movement. The putative G protein sequence contains a predicted 25-amino-acid signal peptide and endopeptidase cleavage site, three predicted glycosylation sites and a putative transmembrane domain. The deduced L protein sequence shows similarities with the L proteins of other plant rhabdoviruses and contains polymerase module motifs characteristic of RNA-dependent RNA polymerases of negative-strand RNA viruses. Phylogenetic analysis of this motif among rhabdoviruses placed LNYV in a group with other sequenced cytorhabdoviruses, most closely related to Strawberry crinkle virus. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
Mammalian promoters can be separated into two classes: conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
Abstract:
Liver samples from rabbits killed by RHDV, collected from five States in Australia in 1996 and 1997, were analysed by RT-PCR. A 398 bp fragment of the capsid protein (VP60) gene was amplified by PCR and directly sequenced. The alignment of the nucleotide and amino acid sequences and their comparison with the original strain of the virus released in Australia indicated that genetic changes after two years had been small, with 98.2% to 100% identity. The constructed phylogenetic tree suggests slight differences in nucleotide substitutions in various States, but there is no clear evidence of clustering of sequences according to their geographic origin. In practical terms, sequencing of viral RNA provides a means of testing the efficacy of further releases and subsequent spread of the virus if such a strategy is employed as a means of enhancing RHD as a biological control of the wild rabbit in Australia.
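The identity figures quoted above are simple ratios over aligned positions. The sketch below is not taken from the paper; it assumes two sequences that have already been aligned to the same length, and the function name `percent_identity` and the example fragments are hypothetical. On a 398 bp fragment, 7 mismatches would give 391/398, or about 98.2% identity.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-base fragments for illustration only, not data from the study.
reference = "ATGGCTAGCA"
isolate = "ATGGCTAGTA"
print(f"{percent_identity(reference, isolate):.1f}% identity")
```

Gapped alignments would need a rule for how gap columns are counted; this sketch treats a gap character like any other symbol.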
Abstract:
Endoparasitoid wasps produce maternal protein secretions, which are transported into the body of insect hosts at oviposition to regulate host physiology for successful development of their offspring. Venturia canescens calyx fluid contains so-called virus-like particles (VLPs) that are essential for immune evasion of the developing parasitoid inside the host. VLPs consist of four major proteins. In this paper, we describe the isolation and molecular cloning of a gene (vlp2) that is a constituent of VLPs and discuss its possible role in VLP structure and function.
Abstract:
Aim: The aim of this study was to characterize the bacterial community adhering to the mucosa of the terminal ileum, and proximal and distal colon of the human digestive tract. Methods and Results: Pinch samples of the terminal ileum, proximal and distal colon were taken from a healthy 35-year-old, and a 68-year-old subject with mild diverticulosis. The 16S rDNA genes were amplified using a low number of PCR cycles, cloned, and sequenced. In total, 361 sequences were obtained comprising 70 operational taxonomic units (OTU), with a calculated coverage of 82.6%. Twenty-three per cent of OTU were common to the terminal ileum, proximal colon and distal colon, but 14% of OTU were only found in the terminal ileum, and 43% were only associated with the proximal or distal colon. The most frequently represented clones were from the Clostridium group XIVa (24.7%), and the Bacteroidetes (Cytophaga-Flavobacteria-Bacteroides) cluster (27.7%). Conclusion: Comparison of 16S rDNA clone libraries of the hindgut across mammalian species confirms that the distribution of phylogenetic groups is similar irrespective of the host species. Lesser site-related differences within groups or clusters of organisms are probable. Significance and Impact: This study provides further evidence of the distribution of the bacteria on the mucosal surfaces of the human hindgut. Data contribute to the benchmarking of the microbial composition of the human digestive tract.
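The abstract does not say how the coverage figure was calculated. One estimator commonly applied to clone libraries is Good's coverage, 1 - (number of OTU seen exactly once) / (total number of sequences). The sketch below assumes that estimator; the function name `goods_coverage` and the example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def goods_coverage(otu_labels) -> float:
    """Good's coverage in percent: 100 * (1 - singletons / total sequences)."""
    counts = Counter(otu_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sequence is required")
    # A singleton is an OTU represented by exactly one sequence.
    singletons = sum(1 for n in counts.values() if n == 1)
    return 100.0 * (1.0 - singletons / total)


# Hypothetical library: 10 clones assigned to 4 OTU, two of them singletons.
library = ["otu1"] * 5 + ["otu2"] * 3 + ["otu3", "otu4"]
print(f"coverage: {goods_coverage(library):.1f}%")
```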
Abstract:
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed-forward networks, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
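To make the idea of a recurrent network concrete, the sketch below shows a single-unit Elman-style recurrence, in which the hidden state at each position depends on the current input and on the previous state. It does not reproduce the architectures or experiments of the paper; the function name `elman_final_state` and the weights are hypothetical and chosen only for illustration.

```python
import math


def elman_final_state(sequence, w_in: float, w_rec: float, h0: float = 0.0) -> float:
    """Run a one-unit recurrence h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = h0
    for x in sequence:
        # The previous state h feeds back in, so earlier inputs shape later states.
        h = math.tanh(w_in * x + w_rec * h)
    return h


# The same two symbols in a different order leave the unit in different states.
state_ab = elman_final_state([1.0, 0.0], w_in=1.0, w_rec=0.5)
state_ba = elman_final_state([0.0, 1.0], w_in=1.0, w_rec=0.5)
print(state_ab != state_ba)
```

Because the state is carried along the sequence, the unit can handle inputs of any length with the same two weights, which is one sense in which such models are suited to sequence data.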
Abstract:
The nucleotide sequence for pituitary prolactin cDNA from the marsupial bandicoot (Isoodon macrourus) was determined by reverse transcription-polymerase chain reaction and 5'/3' rapid amplification of cDNA ends. The deduced amino acid sequence showed high sequence identity with brushtail possum prolactin (95%) and all of the expected structural features of a quadruped prolactin. A prolactin gene tree was constructed and rates of evolution calculated for bandicoot, possum, opossum and several mammalian and non-mammalian prolactins. Bootstrap analysis provided strong support for marsupials as a sister group with eutherian mammals and weak support for opossum and bandicoot as an independent grouping from the brushtail possum. The rates of molecular evolution for marsupial prolactins were comparable to the slow rate seen in the majority of quadruped prolactins that have been sequenced. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
Objective: The description and evaluation of the performance of a new real-time seizure detection algorithm in the newborn infant. Methods: The algorithm includes parallel fragmentation of EEG signal into waves; wave-feature extraction and averaging; elementary, preliminary and final detection. The algorithm detects EEG waves with heightened regularity, using wave intervals, amplitudes and shapes. The performance of the algorithm was assessed with the use of event-based and liberal and conservative time-based approaches and compared with the performance of Gotman's and Liu's algorithms. Results: The algorithm was assessed on multi-channel EEG records of 55 neonates including 17 with seizures. The algorithm showed sensitivities of 83-95% with positive predictive values (PPV) of 48-77%. There were 2.0 false positive detections per hour. In comparison, Gotman's algorithm (with 30 s gap-closing procedure) displayed sensitivities of 45-88% and PPV of 29-56%, with 7.4 false positives per hour, and Liu's algorithm displayed sensitivities of 96-99% and PPV of 10-25%, with 15.7 false positives per hour. Conclusions: The wave-sequence analysis based algorithm displayed higher sensitivity, higher PPV and a substantially lower level of false positives than two previously published algorithms. Significance: The proposed algorithm provides a basis for major improvements in neonatal seizure detection and monitoring. Published by Elsevier Ireland Ltd. on behalf of International Federation of Clinical Neurophysiology.
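The three performance measures reported above follow standard definitions: sensitivity is TP / (TP + FN), PPV is TP / (TP + FP), and the false positive rate is FP divided by recording hours. The sketch below only illustrates that arithmetic; it is not the authors' evaluation code, the function name `detection_metrics` is hypothetical, and the counts are invented rather than taken from the study.

```python
def detection_metrics(true_pos: int, false_pos: int, false_neg: int, hours: float):
    """Return (sensitivity %, PPV %, false positives per hour) from event counts."""
    if true_pos + false_neg == 0:
        raise ValueError("no true events: sensitivity is undefined")
    if true_pos + false_pos == 0:
        raise ValueError("no detections: PPV is undefined")
    if hours <= 0:
        raise ValueError("recording duration must be positive")
    sensitivity = 100.0 * true_pos / (true_pos + false_neg)
    ppv = 100.0 * true_pos / (true_pos + false_pos)
    fp_per_hour = false_pos / hours
    return sensitivity, ppv, fp_per_hour


# Invented counts for illustration: 40 detected seizures, 20 false alarms,
# 10 missed seizures, over 10 hours of recording.
sens, ppv, fp_rate = detection_metrics(40, 20, 10, 10.0)
print(f"sensitivity {sens:.1f}%, PPV {ppv:.1f}%, {fp_rate:.1f} false positives/hour")
```

Event-based and time-based assessments differ in what is counted as a TP, FP or FN (whole events versus time epochs), so the same formulas can give different figures for the same recording.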