928 resultados para Complete Genome
Resumo:
The current drug options for the treatment of chronic Chagas disease have not been sufficient and high hopes have been placed on the use of genomic data from the human parasite Trypanosoma cruzi to identify new drug targets and develop appropriate treatments for both acute and chronic Chagas disease. However, the lack of a complete assembly of the genomic sequence and the presence of many predicted proteins with unknown or unsure functions has hampered our complete view of the parasite's metabolic pathways. Moreover, pinpointing new drug targets has proven to be more complex than anticipated and has revealed large holes in our understanding of metabolic pathways and their integrated regulation, not only for this parasite, but for many other similar pathogens. Using an in silicocomparative study on pathway annotation and searching for analogous and specific enzymes, we have been able to predict a considerable number of additional enzymatic functions in T. cruzi. Here we focus on the energetic pathways, such as glycolysis, the pentose phosphate shunt, the Krebs cycle and lipid metabolism. We point out many enzymes that are analogous to those of the human host, which could be potential new therapeutic targets.
Resumo:
BACKGROUND: The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that ParaHox genes do not differ in number in extant teleost fishes despite their additional genome duplication from the genomic situation in mammals, but they are distributed over twice as many paralogous regions in fish genomes. RESULTS: We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as it had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seems to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motives of the genes belonging to the ParaHox paralogons. CONCLUSION: There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined. The high degree of evolutionary conservation of this gene cluster's architecture in particular - but possibly clusters of genes more generally - might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters.
Resumo:
Dengue virulence and fitness are important factors that determine disease outcome. However, dengue virus (DENV) molecular biology and pathogenesis are not completely elucidated. New insights on those mechanisms have been facilitated by the development of reverse genetic systems in the past decades. Unfortunately, instability of flavivirus genomes cloned in Escherichia coli has been a major problem in these systems. Here, we describe the development of a complete reverse genetics system, based on the construction of an infectious clone and replicon for a low passage DENV-3 genotype III of a clinical isolate. Both constructs were assembled into a newly designed yeast- E. coli shuttle vector by homologous recombination technique and propagated in yeast to prevent any possible genome instability in E. coli . RNA transcripts derived from the infectious clone are infectious upon transfection into BHK-21 cells even after repeated passages of the plasmid in yeast. Transcript-derived DENV-3 exhibited growth kinetics, focus formation size comparable to original DENV-3 in mosquito C6/36 cell culture. In vitro characterisation of DENV-3 replicon confirmed its identity and ability to replicate transiently in BHK-21 cells. The reverse genetics system reported here is a valuable tool that will facilitate further molecular studies in DENV replication, virus attenuation and pathogenesis.
Resumo:
BACKGROUND: The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries 1 to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. These GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference. RESULTS: GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated to GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire. It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs) to cover any remaining untagged genes. These latter 1,533 features constitute the CATMAv4 addition. CONCLUSION: To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database http://www.catma.org.
Resumo:
Klebsiella pneumoniae U25 is a multidrug resistant strain isolated from a tertiary care hospital in Chennai, India. Here, we report the complete annotated genome sequence of strain U25 obtained using PacBio RSII. This is the first report of the whole genome of K. pneumoniaespecies from Chennai. It consists of a single circular chromosome of size 5,491,870-bp and two plasmids of size 211,813 and 172,619-bp. The genes associated with multidrug resistance were identified. The chromosome of U25 was found to have eight antibiotic resistant genes [blaOXA-1,blaSHV-28, aac(6’)1b-cr,catB3, oqxAB, dfrA1]. The plasmid pMGRU25-001 was found to have only one resistant gene (catA1) while plasmid pMGRU25-002 had 20 resistant genes [strAB, aadA1,aac(6’)-Ib, aac(3)-IId,sul1,2, blaTEM-1A,1B,blaOXA-9, blaCTX-M-15,blaSHV-11, cmlA1, erm(B),mph(A)]. A mutation in the porin OmpK36 was identified which is likely to be associated with the intermediate resistance to carbapenems in the absence of carbapenemase genes. U25 is one of the few K. pneumoniaestrains to harbour clustered regularly interspaced short palindromic repeats (CRISPR) systems. Two CRISPR arrays corresponding to Cas3 family helicase were identified in the genome. When compared to K. pneumoniaeNTUHK2044, a transposase gene InsH of IS5-13 was found inserted.
Resumo:
Background: We present the results of EGASP, a community experiment to assess the state-ofthe-art in genome annotation within the ENCODE regions, which span 1% of the human genomesequence. The experiment had two major goals: the assessment of the accuracy of computationalmethods to predict protein coding genes; and the overall assessment of the completeness of thecurrent human genome annotations as represented in the ENCODE regions. For thecomputational prediction assessment, eighteen groups contributed gene predictions. Weevaluated these submissions against each other based on a ‘reference set’ of annotationsgenerated as part of the GENCODE project. These annotations were not available to theprediction groups prior to the submission deadline, so that their predictions were blind and anexternal advisory committee could perform a fair assessment.Results: The best methods had at least one gene transcript correctly predicted for close to 70%of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into accountalternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotidelevel, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programsrelying on mRNA and protein sequences were the most accurate in reproducing the manuallycurated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could beverified.Conclusions: This is the first such experiment in human DNA, and we have followed thestandards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe theresults presented here contribute to the value of ongoing large-scale annotation projects and shouldguide further experimental methods when being scaled up to the entire human genome sequence.
Resumo:
We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
Resumo:
Mammary tumors of a newly isolated strain of Chinese wild mouse (JYG mouse) harbor exogenous mouse mammary tumor virus (MMTV). The complete nucleotide sequence of exogenous JYG-MMTV was determined on the proviral 5' long terminal repeat (LTR)(partial)-gag-pol-env-3' LTR (partial) fragment cloned into a plasmid vector and the cDNA sequence from JYG-MMTV producing cells. Similarly to the other MMTV species the LTR of JYG-MMTV contains an open reading frame (ORF). The amino acid sequence of the JYG-MMTV ORF resembles that of SW-MMTV (92% identity) and endogenous Mtv-7 (93% identity) especially at the C-terminal region. Thus, a functional similarity in T-cell receptor V beta recognition as a superantigen is implicated among these MMTV species. Analysis of the viral gag nucleotide sequence revealed that this gene is not disrupted by the bacterial insertion sequence IS1 or IS2, which have been reported to be present in the majority of the plasmids containing the gag region. Comparison of amino acid sequences of JYG-MMTV with those of BR6-MMTV showed that over 96% of the amino acids of gag, pol, protease and env products are identical. These results suggest the intact nature of the nucleotide sequence of the near full-length MMTV genome cloned in the plasmid.
Resumo:
BACKGROUND: Complete mitochondrial genome sequences have become important tools for the study of genome architecture, phylogeny, and molecular evolution. Despite the rapid increase in available mitogenomes, the taxonomic sampling often poorly reflects phylogenetic diversity and is often also biased to represent deeper (family-level) evolutionary relationships. RESULTS: We present the first fully sequenced ant (Hymenoptera: Formicidae) mitochondrial genomes. We sampled four mitogenomes from three species of fire ants, genus Solenopsis, which represent various evolutionary depths. Overall, ant mitogenomes appear to be typical of hymenopteran mitogenomes, displaying a general A+T-bias. The Solenopsis mitogenomes are slightly more compact than other hymentoperan mitogenomes (~15.5 kb), retaining all protein coding genes, ribosomal, and transfer RNAs. We also present evidence of recombination between the mitogenomes of the two conspecific Solenopsis mitogenomes. Finally, we discuss potential ways to improve the estimation of phylogenies using complete mitochondrial genome sequences. CONCLUSIONS: The ant mitogenome presents an important addition to the continued efforts in studying hymenopteran mitogenome architecture, evolution, and phylogenetics. We provide further evidence that the sampling across many taxonomic levels (including conspecifics and congeners) is useful and important to gain detailed insights into mitogenome evolution. We also discuss ways that may help improve the use of mitogenomes in phylogenetic analyses by accounting for non-stationary and non-homogeneous evolution among branches.
Resumo:
The Complete Arabidopsis Transcriptome Micro Array (CATMA) database contains gene sequence tag (GST) and gene model sequences for over 70% of the predicted genes in the Arabidopsis thaliana genome as well as primer sequences for GST amplification and a wide range of supplementary information. All CATMA GST sequences are specific to the gene for which they were designed, and all gene models were predicted from a complete reannotation of the genome using uniform parameters. The database is searchable by sequence name, sequence homology or direct SQL query, and is available through the CATMA website at http://www.catma.org/.
Resumo:
Determination of the precise composition and variation of microbiota in cystic fibrosis lungs is crucial since chronic inflammation due to microorganisms leads to lung damage and ultimately, death. However, this constitutes a major technical challenge. Culturing of microorganisms does not provide a complete representation of a microbiota, even when using culturomics (high-throughput culture). So far, only PCR-based metagenomics have been investigated. However, these methods are biased towards certain microbial groups, and suffer from uncertain quantification of the different microbial domains. We have explored whole genome sequencing (WGS) using the Illumina high-throughput technology applied directly to DNA extracted from sputa obtained from two cystic fibrosis patients. To detect all microorganism groups, we used four procedures for DNA extraction, each with a different lysis protocol. We avoided biases due to whole DNA amplification thanks to the high efficiency of current Illumina technology. Phylogenomic classification of the reads by three different methods produced similar results. Our results suggest that WGS provides, in a single analysis, a better qualitative and quantitative assessment of microbiota compositions than cultures and PCRs. WGS identified a high quantity of Haemophilus spp. (patient 1) or Staphylococcus spp. plus Streptococcus spp. (patient 2) together with low amounts of anaerobic (Veillonella, Prevotella, Fusobacterium) and aerobic bacteria (Gemella, Moraxella, Granulicatella). WGS suggested that fungal members represented very low proportions of the microbiota, which were detected by cultures and PCRs because of their selectivity. The future increase of reads' sizes and decrease in cost should ensure the usefulness of WGS for the characterisation of microbiota.
Resumo:
The IncP alpha promiscuous plasmid (R18, R68, RK2, RP1 and RP4) comprises 60,099 bp of nucleotide sequence, encoding at least 74 genes. About 40 kb of the genome, designated the IncP core and including all essential replication and transfer functions, can be aligned with equivalent sequences in the IncP beta plasmid R751. The compiled IncP alpha sequence revealed several previously unidentified reading frames that are potential genes. IncP alpha plasmids carry genetic information very efficiently: the coding sequences of the genes are closely packed but rarely overlap, and occupy almost 86% of the genome's nucleotide sequence. All of the 74 genes should be expressed, although there is as yet experimental evidence for expression of only 60 of them. Six examples of tandem-in-frame initiation sites specifying two gene products each are known. Two overlapping gene arrangements occupy different reading frames of the same region. Intergenic regions include most of the 25 promoters; transcripts are usually polycistronic. Translation of most of the open reading frames seems to be initiated independently, each from its own ribosomal binding and initiation site, although, a few cases of coupled translation have been reported. The most frequently used initiation codon is AUG but translation for a few open reading frames begins at GUG or UUG. The most common stop-codon is UGA followed by UAA and then UAG. Regulatory circuits are complex and largely dependent on two components of the central control operon. KorA and KorB are transcriptional repressors controlling at least seven operons. KorA and KorB act synergistically in several cases by recognizing and binding to conserved nucleotide sequences. Twelve KorB binding sites were found around the IncP alpha sequence and these are conserved in R751 (IncP beta) with respect to both sequence and location. Replication of IncP alpha plasmids requires oriV and the plasmid-encoded initiator protein TrfA in combination with the host-encoded replication machinery. Conjugative plasmid transfer depends on two separate regions occupying about half of the genome. The primary segregational stability system designated Par/Mrs consists of a putative site-specific recombinase, a possible partitioning apparatus and a post-segregational lethality mechanism, all encoded in two divergent operons. Proteins related to the products of F sop and P1 par partitioning genes are separately encoded in the central control operon.
Resumo:
Genome-wide association studies (GWAS) are conducted with the promise to discover novel genetic variants associated with diverse traits. For most traits, associated markers individually explain just a modest fraction of the phenotypic variation, but their number can well be in the hundreds. We developed a maximum likelihood method that allows us to infer the distribution of associated variants even when many of them were missed by chance. Compared to previous approaches, the novelty of our method is that it (a) does not require having an independent (unbiased) estimate of the effect sizes; (b) makes use of the complete distribution of P-values while allowing for the false discovery rate; (c) takes into account allelic heterogeneity and the SNP pruning strategy. We applied our method to the latest GWAS meta-analysis results of the GIANT consortium. It revealed that while the explained variance of genome-wide (GW) significant SNPs is around 1% for waist-hip ratio (WHR), the observed P-values provide evidence for the existence of variants explaining 10% (CI=[8.5-11.5%]) of the phenotypic variance in total. Similarly, the total explained variance likely to exist for height is estimated to be 29% (CI=[28-30%]), three times higher than what the observed GW significant SNPs give rise to. This methodology also enables us to predict the benefit of future GWA studies that aim to reveal more associated genetic markers via increased sample size.
Resumo:
The numerous yeast genome sequences presently available provide a rich source of information for functional as well as evolutionary genomics but unequally cover the large phylogenetic diversity of extant yeasts. We present here the complete sequence of the nuclear genome of the haploid-type strain of Kuraishia capsulata (CBS1993(T)), a nitrate-assimilating Saccharomycetales of uncertain taxonomy, isolated from tunnels of insect larvae underneath coniferous barks and characterized by its copious production of extracellular polysaccharides. The sequence is composed of seven scaffolds, one per chromosome, totaling 11.4 Mb and containing 6,029 protein-coding genes, ~13.5% of which being interrupted by introns. This GC-rich yeast genome (45.7%) appears phylogenetically related with the few other nitrate-assimilating yeasts sequenced so far, Ogataea polymorpha, O. parapolymorpha, and Dekkera bruxellensis, with which it shares a very reduced number of tRNA genes, a novel tRNA sparing strategy, and a common nitrate assimilation cluster, three specific features to this group of yeasts. Centromeres were recognized in GC-poor troughs of each scaffold. The strain bears MAT alpha genes at a single MAT locus and presents a significant degree of conservation with Saccharomyces cerevisiae genes, suggesting that it can perform sexual cycles in nature, although genes involved in meiosis were not all recognized. The complete absence of conservation of synteny between K. capsulata and any other yeast genome described so far, including the three other nitrate-assimilating species, validates the interest of this species for long-range evolutionary genomic studies among Saccharomycotina yeasts.
Resumo:
The past decade has seen the emergence of next-generation sequencing (NGS) technologies, which have revolutionized the field of human molecular genetics. With NGS, significant portions of the human genome can now be assessed by direct sequence analysis, highlighting normal and pathological variants of our DNA. Recent advances have also allowed the sequencing of complete genomes, by a method referred to as whole genome sequencing (WGS). In this work, we review the use of WGS in medical genetics, with specific emphasis on the benefits and the disadvantages of this technique for detecting genomic alterations leading to Mendelian human diseases and to cancer.