933 results for GENOMIC SEQUENCES
Abstract:
A new algorithm, PfAGSS, for predicting 3' splice sites in Plasmodium falciparum genomic sequences is described. Application of this program to the published P. falciparum chromosome 2 and 3 data suggests that existing programs result in a high error rate in assigning 3' intron boundaries. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
Abstract:
Background: The increasing number of genomic sequences of bacteria makes it possible to select unique SNPs of a particular strain/species at the whole genome level and thus design specific primers based on the SNPs. The high similarity of genomic sequences among phylogenetically-related bacteria requires the identification of the few loci in the genome that can serve as unique markers for strain differentiation. PrimerSNP attempts to identify reliable strain-specific markers, on which specific primers are designed for pathogen detection purposes. Results: PrimerSNP is an online tool to design primers based on strain-specific SNPs for multiple strains/species of microorganisms at the whole genome level. The allele-specific primers could distinguish query sequences of one strain from other homologous sequences by a standard PCR reaction. Additionally, PrimerSNP provides a feature for designing common primers that can amplify all the homologous sequences of multiple strains/species of microorganisms. PrimerSNP is freely available at http://cropdisease.ars.usda.gov/~primer. Conclusion: PrimerSNP is a high-throughput specific primer generation tool for the differentiation of phylogenetically-related strains/species. Experimental validation showed that this software had a successful prediction rate of 80.4-100% for strain-specific primer design.
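The idea of selecting strain-specific SNPs from aligned homologous sequences can be illustrated with a minimal Python sketch. This is not PrimerSNP's implementation; the function name `strain_specific_snps`, the toy alignment, and the rule "the target base differs from every other strain at that column" are assumptions made purely for illustration.

```python
from typing import Dict, List


def strain_specific_snps(alignment: Dict[str, str], target: str) -> List[int]:
    """Return 0-based alignment columns where `target` carries a base
    that no other aligned sequence shares (hypothetical selection rule)."""
    target_seq = alignment[target]
    others = [seq for name, seq in alignment.items() if name != target]
    if any(len(seq) != len(target_seq) for seq in others):
        raise ValueError("all sequences must be aligned to the same length")

    positions = []
    for i, base in enumerate(target_seq):
        if base not in "ACGT":
            continue  # skip gaps and ambiguous bases in the target
        column = [seq[i] for seq in others]
        # Keep the column only if every other strain has a clear, different base.
        if all(b in "ACGT" and b != base for b in column):
            positions.append(i)
    return positions


# Toy alignment of three hypothetical strains (illustrative data only).
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACCTAC",
    "strain_C": "ACGAACCTAC",
}

print(strain_specific_snps(toy_alignment, "strain_A"))
print(strain_specific_snps(toy_alignment, "strain_C"))
```

An allele-specific primer would then typically be placed so that its 3' end sits on one of the reported columns; primer design constraints such as melting temperature are outside the scope of this sketch.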
Abstract:
We describe a transgenic mouse line carrying the cre transgene under the control of the adenovirus EIIa promoter that targets expression of the Cre recombinase to the early mouse embryo. To assess the ability of this recombinase to excise loxP-flanked DNA sequences at early stages of development, we bred EIIa-cre transgenic mice to two different mouse lines carrying loxP-flanked target sequences: (i) a strain with a single gene-targeted neomycin resistance gene flanked by loxP sites and (ii) a transgenic line carrying multiple transgene copies with internal loxP sites. Mating either of these loxP-carrying mouse lines to EIIa-cre mice resulted in first generation progeny in which the loxP-flanked sequences had been efficiently deleted from all tissues tested, including the germ cells. Interbreeding of these first generation progeny resulted in efficient germ-line transmission of the deletion to subsequent generations. These results demonstrate a method by which loxP-flanked DNA sequences can be efficiently deleted in the early mouse embryo. Potential applications of this approach are discussed, including reduction of multicopy transgene loci to produce single-copy transgenic lines and introduction of a variety of subtle mutations into the line.
Abstract:
During infections, Giardia lamblia undergoes a continuous change of its major surface antigens, the variant-specific surface proteins (VSPs). Many studies on antigenic variation have been performed using G. lamblia clone GS/M-83-H7, which expresses surface antigen VSP H7. The present study was focused on the identification and characterization of vsp gene sequences within the genome of the clonal G. lamblia GS/M-83-H7 line. For this purpose, we applied a PCR which specifically amplified truncated sequences from the 3'-terminal region of the vsp genes. Upon cloning, most of the vsp gene amplification products were shown to be approximately identical in size and thus could not be distinguished from each other by conventional gel electrophoresis. In order to pre-estimate the sequence complexity within the large panel of vsp clones isolated, we elaborated a novel concept which facilitated our large-scale genetic screening approach: PCR products from cloned DNA molecules were generated and then subjected to a DNA melting profile assay based on the use of the LightCycler Instrument. This high-throughput assay system proved to be well suited to monitor sequence differences between the amplification products from closely related vsp genes and thus could be used for the primary, sequence-related discrimination of the corresponding clones. After testing 50 candidates, vsp clones could be divided into five groups, each characterized by an individual DNA melting profile of the corresponding amplification products. Sequence analysis of some of these 50 candidates confirmed data from the aforementioned assay in that clones were demonstrated to be identical within, but different between, the distinct groups. The nucleotide and deduced amino acid sequences of five representative vsp clones showed high similarities both among each other and also with the corresponding gene segment of the variant-specific surface antigen (VSP H7) expressed by the original GS/M-83-H7 variant type. 
Furthermore, three of the genomic vsp sequences turned out to be identical to vsp sequences that represented previously characterized transcription products from in vivo- or in vitro-switched GS/M-83-H7 trophozoites. In conclusion, the DNA melting profile assay seems to be a versatile tool for the PCR-based genotyping of moderately or highly diversified sequence orthologues.
Abstract:
Bio-systems are inherently complex information processing systems. Furthermore, the physiological complexities of biological systems limit the formation of hypotheses about their behavior and the ability to test those hypotheses. More importantly, the identification and classification of mutations in patients are central topics in today's cancer research. Next generation sequencing (NGS) technologies can provide genome-wide coverage at single nucleotide resolution and at reasonable speed and cost. The unprecedented molecular characterization provided by NGS offers the potential for an individualized approach to treatment. These advances in cancer genomics have enabled scientists to interrogate cancer-specific genomic variants and compare them with the normal variants in the same patient. Analysis of these data provides a catalog of somatic variants, present in the tumor genome but not in the normal tissue DNA. In this dissertation, we present a new computational framework for the problem of predicting the number of mutations on a chromosome for a given patient, which is a fundamental problem in clinical and research fields. We begin this dissertation with the development of a framework system that is capable of utilizing published data from a longitudinal study of patients with acute myeloid leukemia (AML), whose DNA from both normal and malignant tissues was subjected to NGS analysis at various points in time. By processing the sequencing data at the time of cancer diagnosis using the components of our framework, we tested it by predicting the genomic regions to be mutated at the time of relapse and, later, by comparing our results with the actual regions that showed mutations (discovered at relapse time). We demonstrate that this coupling of the algorithm pipeline can drastically improve the predictive power of the search for a reliable molecular signature. 
Arguably, the most important result of our research is its superior performance compared with other methods such as Radial Basis Function Network, Sequential Minimal Optimization, and Gaussian Process. In the final part of this dissertation, we present a detailed significance, stability and statistical analysis of our model. A performance comparison of the results is also presented. This work lays a good foundation for future research on other types of cancer.
Abstract:
Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs, and the overall TALE diversity in Xanthomonas spp., are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present 'AnnoTALE', a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities. © 2016, Nature Publishing Group. All rights reserved.
Abstract:
The genomic sequences of the Envelope-Non-Structural protein 1 junction region (E/NS1) of 84 DEN-1 and 22 DEN-2 isolates from Brazil were determined. Most of these strains were isolated in the period from 1995 to 2001 in endemic regions and regions of recent dengue transmission in São Paulo State. Sequence data for DEN-1 and DEN-2 utilized in phylogenetic and split decomposition analyses also include sequences deposited in GenBank from different regions of Brazil and of the world. Phylogenetic analyses were done using both maximum likelihood and Bayesian approaches. Results for both DEN-1 and DEN-2 data are ambiguous, and support for most tree bipartitions is generally poor, suggesting that the E/NS1 region does not contain enough information for recovering phylogenetic relationships among the DEN-1 and DEN-2 sequences used in this study. The network graph generated in the split decomposition analysis of DEN-1 does not show evidence of grouping sequences according to country, region and clades. While the network for DEN-2 also shows ambiguities among DEN-2 sequences, it suggests that Brazilian sequences may belong to distinct subtypes of genotype III.
Abstract:
The representational difference analysis (RDA) and other subtraction techniques are used to enrich sample-specific sequences by elimination of ubiquitous sequences existing in both the sample of interest (tester) and the subtraction partner (driver). While applying the RDA to genomic DNA of cutaneous lymphoma cells in order to identify tumor relevant alterations, we predominantly isolated repetitive sequences and artificial repeat-mediated fusion products of otherwise independent PCR fragments (PCR hybrids). Since these products severely interfered with the isolation of tester-specific fragments, we developed a considerably more robust and efficient approach, termed ligation-mediated subtraction (Limes). In first applications of Limes, genomic sequences and/or transcripts of genes involved in the regulation of transcription, such as transforming growth factor β stimulated clone 22 related gene (TSC-22R), cell death and cytokine production (caspase-1) or antigen presentation (HLA class II sequences), were found to be completely absent in a cutaneous lymphoma line. On the assumption that mutations in tumor-relevant genes can affect their transcription pattern, a protocol was developed and successfully applied that allows the identification of such sequences. Due to these results, Limes may substitute/supplement other subtraction/comparison techniques such as RDA or DNA microarray techniques in a variety of different research fields.
Abstract:
The seeds of Theobroma cacao (cacao) are the source of cocoa, the raw material for the multi-billion dollar chocolate industry. Cacao's two most important traits are its unique seed storage triglyceride (cocoa butter) and the flavor of its fermented beans (chocolate). The genome of T. cacao is being sequenced, and to expand the utility of the genome sequence to the improvement of cacao, we are evaluating Theobroma grandiflorum, the closest economically important species of Theobroma, for its potential use in a comparative genomic study. T. grandiflorum differs from cacao in important agronomic traits such as flavor of the fermented beans, disease resistance to witches' broom and abscission of mature fruits. By comparing genomic sequences and analyzing viable inter-specific hybrids, we hope to identify the key genes that regulate cacao's most important traits. We have investigated the utility in T. grandiflorum of three types of markers (microsatellite markers, single-strand conformational polymorphism markers and single nucleotide polymorphism (SNP) markers) developed in cacao. Through sequencing of amplicons of 12 diverse individuals of both cacao and T. grandiflorum, we have identified new intra- and inter-specific SNPs. Two markers which had no overlap of alleles between the species were used to genotype putative inter-specific hybrid seedlings. Sequence conservation was significant and species-specific differences numerous enough to suggest that comparative genomics of T. grandiflorum and T. cacao will be useful in elucidating the genetic differences that lead to a variety of important agronomic trait differences.
Abstract:
Invasive cervical cancer (ICC) is the third most frequent cancer among women worldwide and is associated with persistent infection by carcinogenic human papillomaviruses (HPVs). The combination of large populations of viral progeny and decades of sustained infection may allow for the generation of intra-patient diversity, in spite of the assumedly low mutation rates of PVs. While the natural history of chronic HPV infections has been comprehensively described, within-host viral diversity remains largely unexplored. In this study we have applied next generation sequencing to the analysis of intra-host genetic diversity in ten ICC cases and one condyloma case associated with single HPV16 infection. We retrieved near full-length genomic sequences from all cases. All samples analyzed contained polymorphic sites, ranging from 3 to 125 polymorphic positions per genome, and the median probability that a viral genome picked at random was identical to the consensus sequence in the lesion was only 40%. We have also identified two independent putative duplication events in two samples, spanning the L2 and the L1 gene, respectively. Finally, we have identified with good support a chimera of human and viral DNA. We propose that viral diversity generated during chronic HPV infection may be fueled by innate and adaptive immune pressures. Further research will be needed to understand the dynamics of viral DNA variability, differentially in benign and malignant lesions, as well as in tissues with differential intensity of immune surveillance. Finally, the impact of intralesion viral diversity on the long-term oncogenic potential may deserve closer attention.
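The abstract does not state how the probability of a random genome matching the consensus was computed. One plausible way to obtain such a figure, sketched below in Python, is to multiply the major-allele frequencies across polymorphic sites, which assumes that sites vary independently. The function name and the example frequencies are hypothetical and are not taken from the study.

```python
from math import prod
from typing import Iterable


def prob_identical_to_consensus(major_allele_freqs: Iterable[float]) -> float:
    """Probability that a randomly drawn genome carries the consensus
    (major) allele at every polymorphic site, ASSUMING sites are independent."""
    freqs = list(major_allele_freqs)
    if any(not 0.0 < f <= 1.0 for f in freqs):
        raise ValueError("frequencies must lie in (0, 1]")
    # An empty list means no polymorphic sites: every genome equals the consensus.
    return float(prod(freqs))


# Hypothetical lesion with three polymorphic sites (made-up frequencies).
example_freqs = [0.9, 0.8, 0.5]
print(round(prob_identical_to_consensus(example_freqs), 4))
```

With many low-frequency variants the product shrinks quickly, which is one way a lesion with dozens of polymorphic positions can end up with well under half of its genomes matching the consensus.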
Abstract:
One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set are of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200 kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. 
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
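The construction of a semiartificial test sequence, in which single-gene sequences are joined by randomly generated intergenic regions, can be sketched roughly as follows. This is only an illustration: the function name, the fixed spacer length, and the uniform base composition of the spacers are assumptions, since the abstract does not specify how the intergenic regions were generated.

```python
import random
from typing import List, Tuple


def build_semiartificial_sequence(
    genes: List[str], spacer_length: int, seed: int = 0
) -> Tuple[str, List[Tuple[int, int]]]:
    """Join gene sequences with random intergenic spacers.

    Returns the assembled sequence and the 0-based, half-open (start, end)
    coordinates of each embedded gene, so predictions can be scored later.
    """
    rng = random.Random(seed)  # seeded for reproducible test sets

    def spacer() -> str:
        # ASSUMPTION: uniform A/C/G/T composition for intergenic sequence.
        return "".join(rng.choice("ACGT") for _ in range(spacer_length))

    parts = [spacer()]
    coords = []
    offset = spacer_length
    for gene in genes:
        coords.append((offset, offset + len(gene)))
        parts.append(gene)
        parts.append(spacer())
        offset += len(gene) + spacer_length
    return "".join(parts), coords


# Two tiny made-up "genes" separated by 5-base random spacers.
sequence, gene_coords = build_semiartificial_sequence(
    ["ATGAAATAA", "ATGCCCTGA"], spacer_length=5
)
print(len(sequence), gene_coords)
```

Keeping the true gene coordinates alongside the assembled sequence is what allows the sensitivity and specificity of a gene predictor to be measured against a known answer.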