963 results for Prokaryotic Genomes
Abstract:
Polyomavirus JC (JCV) is ubiquitous in humans and causes a chronic demyelinating disease of the central nervous system, progressive multifocal leukoencephalopathy, which is common in AIDS. JCV is excreted in the urine of 30-70% of adults worldwide. Based on sequence analysis of complete JCV genomes or fragments thereof, JCV can be classified into geographically derived genotypes. Types 1 and 2 are of European and Asian origin, respectively, while Types 3 and 6 are African in origin. Type 4, a possible recombinant of European and African genotypes (1 and 3), is common in the USA. To delineate the JCV genotypes in an aboriginal African population, random urine samples were collected from the Biaka Pygmies and Bantu from the Central African Republic. There were 43 males and 25 females aged 4-55 years, with an average age of 26 years. After PCR amplification of JCV in urine, products were directly cycle sequenced. Five of 23 Pygmy adults (22%) and four of 20 Bantu adults (20%) were positive for JC viruria. DNA sequence analysis revealed JCV Type 3 (two), Type 6 (two) and one Type 1 variant in Biaka Pygmies. All the Bantu strains were Type 6. Type 3 and 6 strains of JCV are the predominant strains in central Africa. The presence of multiple subtypes of JCV in Biaka Pygmies may be a result of extensive interactions of Pygmies with their African tribal neighbors during their itinerant movements in the equatorial forest.
Abstract:
Human and chimpanzee genomes are 98.8% identical within comparable sequences. However, they differ structurally in nine pericentric inversions, one fusion that originated human chromosome 2, and the content and localization of heterochromatin and lineage-specific segmental duplications. The possible functional consequences of these cytogenetic and structural differences are not fully understood, and their possible involvement in speciation remains unclear. We show that subtelomeric regions (regions that have a species-specific organization, are more divergent in sequence, and are enriched in genes and recombination hotspots) are significantly enriched for species-specific histone modifications that decorate transcription start sites in different tissues in both human and chimpanzee. The human lineage-specific chromosome 2 fusion point and ancestral centromere locus, as well as the chromosome 1 and 18 pericentric inversion breakpoints, showed enrichment of human-specific H3K4me3 peaks in the prefrontal cortex. Our results reveal an association between plastic regions and potential novel regulatory elements.
Abstract:
Loss-of-function variants in innate immunity genes are associated with Mendelian disorders in the form of primary immunodeficiencies. Recent resequencing projects report that stop-gains and frameshifts are collectively prevalent in humans and could be responsible for some of the inter-individual variability in innate immune response. Current computational approaches evaluating loss-of-function in genes carrying these variants rely on gene-level characteristics such as evolutionary conservation and functional redundancy across the genome. However, innate immunity genes represent a particular case because they are more likely to be under positive selection and duplicated. To create a ranking of severity that would be applicable to innate immunity genes we evaluated 17,764 stop-gain and 13,915 frameshift variants from the NHLBI Exome Sequencing Project and 1,000 Genomes Project. Sequence-based features such as loss of functional domains, isoform-specific truncation and nonsense-mediated decay were found to correlate with variant allele frequency and validated with gene expression data. We integrated these features in a Bayesian classification scheme and benchmarked its use in predicting pathogenic variants against Online Mendelian Inheritance in Man (OMIM) disease stop-gains and frameshifts. The classification scheme was applied in the assessment of 335 stop-gains and 236 frameshifts affecting 227 interferon-stimulated genes. The sequence-based score ranks variants in innate immunity genes according to their potential to cause disease, and complements existing gene-based pathogenicity scores. Specifically, the sequence-based score improves measurement of functional gene impairment, discriminates across different variants in a given gene and appears particularly useful for analysis of less conserved genes.
Abstract:
BACKGROUND: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. RESULTS: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints. Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages. CONCLUSION: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements. Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes.
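The definition of persistence length given in this abstract can be illustrated with a small sketch. It is not the authors' estimator: the sliding-window scheme, the function name `persistence_length`, and the `window` and `min_identity` parameters are all assumptions chosen for illustration. It reports the longest stretch of a pairwise alignment over which every window keeps at least a given identity.

```python
def persistence_length(a: str, b: str, window: int = 10,
                       min_identity: float = 0.8) -> int:
    """Longest aligned stretch whose every sliding window meets min_identity.

    Hypothetical illustration only: `a` and `b` are two rows of a pairwise
    alignment (equal length, '-' for gaps). Gap columns count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    n = len(a)
    if n < window:
        return 0
    matches = [x == y and x != "-" for x, y in zip(a, b)]
    count = sum(matches[:window])  # matches inside the current window
    best = 0
    run_start = None  # first column of the current run of passing windows
    for start in range(n - window + 1):
        if start > 0:
            # slide the window one column to the right
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            if run_start is None:
                run_start = start
            best = max(best, start + window - run_start)
        else:
            run_start = None
    return best


# Eight identical columns followed by four mismatches.
print(persistence_length("AAAAAAAATTTT", "AAAAAAAACCCC",
                         window=4, min_identity=1.0))
```

A smaller window makes the measure more sensitive to isolated mismatches, so any comparison between lineages would need the same window and threshold on both sides.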
Abstract:
With the advent of high-performance computing (HPC), it is now possible to achieve orders-of-magnitude gains in performance and computational efficiency over conventional computer architectures. This thesis explores the potential of using high-performance computing to accelerate whole genome alignment (WGA). A parallel technique is applied to an algorithm for whole genome alignment; the technique is explained, and experiments were carried out to test it. The technique is based on fair usage of the available resources to execute genome alignment, and on how this can be applied in HPC clusters. This work is a first approximation to whole genome alignment, and it shows the advantages of parallelism as well as some of the drawbacks of our technique. It describes the resource limitations of current WGA applications when dealing with large quantities of sequences, and proposes a parallel heuristic to distribute the load and to ensure that alignment quality is maintained.
Abstract:
Since the start of the Human Genome Project and its success in 2001, the genomes of a multitude of species have been sequenced. Improvements in sequencing technologies have generated data volumes that grow exponentially. The project "Bioinformatic analyses on Hadoop technology" covers the parallel computation of biological data such as DNA sequences. The study has been shaped by the nature of the problem to be solved: the alignment of genetic sequences with the MapReduce paradigm.
Abstract:
The search for similarities between regions of different genomes provides a great deal of information about the relationships between the species those genomes belong to. It is very useful for studying the conservation of genes from one species to another, how the properties of one gene are assigned to another gene, or how variations arise in different genomes during the evolution of these species. The aim of this project is to create a tool for finding common ancestors of different species, based on comparing the conservation between regions of their genomes. To make the comparison between genomes more effective, an important part of the project will be devoted to creating a new unit of comparison. These new units will be superstructures based on grouping the existing MUMs for the same comparison, which we will call superMUMs. The final application will be available on the server: http://revolutionresearch.uab.es
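The abstract does not say how MUMs (maximal unique matches) are grouped into superMUMs, so the following is only a hypothetical sketch of one plausible rule: consecutive MUMs are merged into the same group when the gap to the previous MUM is small and non-negative in both genomes. The `Mum` type, the `group_mums` function, and the `max_gap` parameter are assumptions introduced for illustration.

```python
from typing import List, NamedTuple


class Mum(NamedTuple):
    """A maximal unique match: start in genome A, start in genome B, length."""
    start_a: int
    start_b: int
    length: int


def group_mums(mums: List[Mum], max_gap: int = 50) -> List[List[Mum]]:
    """Group collinear, nearby MUMs (hypothetical superMUM rule)."""
    groups: List[List[Mum]] = []
    for m in sorted(mums):  # sort by position in genome A, then genome B
        if groups:
            last = groups[-1][-1]
            gap_a = m.start_a - (last.start_a + last.length)
            gap_b = m.start_b - (last.start_b + last.length)
            # Extend the current group only if the MUM follows closely
            # and in the same order in both genomes.
            if 0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap:
                groups[-1].append(m)
                continue
        groups.append([m])
    return groups


mums = [Mum(0, 0, 20), Mum(30, 32, 15), Mum(500, 40, 10)]
for group in group_mums(mums):
    print(group)
```

This greedy pass only compares each MUM with the last member of the most recent group; a chaining algorithm that considers all earlier groups would handle interleaved matches better, at higher cost.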
Abstract:
To estimate the minimal gene set required to sustain bacterial life in nutritious conditions, we carried out a systematic inactivation of Bacillus subtilis genes. Among approximately 4,100 genes of the organism, only 192 were shown to be indispensable by this or previous work. Another 79 genes were predicted to be essential. The vast majority of essential genes were categorized in relatively few domains of cell metabolism, with about half involved in information processing, one-fifth involved in the synthesis of cell envelope and the determination of cell shape and division, and one-tenth related to cell energetics. Only 4% of essential genes encode unknown functions. Most essential genes are present throughout a wide range of Bacteria, and almost 70% can also be found in Archaea and Eucarya. However, essential genes related to cell envelope, shape, division, and respiration tend to be lost from bacteria with small genomes. Unexpectedly, most genes involved in the Embden-Meyerhof-Parnas pathway are essential. Identification of unknown and unexpected essential genes opens research avenues to better understanding of processes that sustain bacterial life.
Abstract:
Genomic sequence analysis tools allow biologists to identify and understand key regions implicated in genetic diseases. There is currently a need to provide the scientific community with efficient analysis tools. This project characterizes and analyzes the performance of algorithms used to compare complete genomic sequences, run on MultiCore and ManyCore architectures. Based on this analysis, the suitability of these architectures for solving the problem of comparing genomic sequences is evaluated. Finally, a series of modifications to the implementations of these algorithms is proposed with the aim of improving performance.
Abstract:
Sequence alignment applications are an important tool for the scientific community. These bioinformatics applications are used in many different fields, such as medicine, biology, pharmacology, and genetics. Today, sequence alignment algorithms have high complexity, and every day they have to handle a larger volume of data. For this reason, alternatives must be sought so that these applications can cope with the day-by-day growth of sequence databases. This project studies and investigates improvements to this type of application, such as the use of parallel systems, which can improve performance considerably.
Abstract:
The parasite-host-environment system is dynamic, with several points of equilibrium. This makes it difficult to trace the thresholds between benefit and damage, and therefore the definitions of commensalism, mutualism, and symbiosis become worthless. The same concept of parasitism may thus encompass commensalism, mutualism, and symbiosis. Parasitism is essential for life. Life emerged as a consequence of parasitism at the molecular level, and intracellular parasitism created evolutionary events that allowed species to diversify. An ecological and evolutionary approach to the study of parasitism is presented here. Studies of the origin and evolution of parasitism have gained new perspectives with the development of molecular paleoparasitology, by which ancient parasite and host genomes can be recovered from vanished populations. Molecular paleoparasitology points to host-parasite co-evolutionary mechanisms that are traceable through retrospective genome studies.
Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.
Abstract:
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
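GC3, the quantity this abstract analyzes, is the fraction of G or C bases at third-codon positions. A minimal sketch of the calculation follows; the function name `gc3` and the handling of ambiguous bases and trailing partial codons are assumptions, and the authors' actual pipeline is not described here.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third-codon positions of an in-frame CDS.

    Assumptions for this sketch: the sequence starts in frame, a trailing
    partial codon is ignored, and third positions that are not A, C, G or T
    (for example N) are skipped.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    third = seq[2:usable:3]           # every third base, starting at index 2
    called = [base for base in third if base in "ACGT"]
    if not called:
        raise ValueError("no unambiguous third-codon positions")
    return sum(base in "GC" for base in called) / len(called)


# Codons ATG GCC AAA TAA have third positions G, C, A, A.
print(gc3("ATGGCCAAATAA"))  # 0.5
```

Because only one position per codon is used and codon boundaries are fixed by the reading frame, the measure is unaffected by alignment gaps between codons, which is the property the abstract highlights.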
Abstract:
During my PhD, I used model species of vertebrates, such as mouse and zebrafish, to study factors affecting the evolution of genes and their expression. More precisely, I have shown that anatomy and development are key factors to take into account, influencing the rate of gene sequence evolution, the impact of mutations (i.e. is the deletion of a gene lethal?), and the propensity of a gene to duplicate. Where and when genes are expressed imposes constraints, or on the contrary leaves them some opportunity to evolve. We analyzed these patterns in relation to classical models of morphological evolution in vertebrates, which were previously thought to directly reflect constraints on the genomes. We showed that the patterns of evolution at these two levels of organization do not translate smoothly: there is no direct link between the conservation of genotype and phenotypes such as morphology. This work was made possible by the development of bioinformatics tools. Notably, I worked on the development of the database Bgee, which aims at comparing gene expression between different species in an automated and large-scale way. This involves the formalization of anatomy, development, and concepts related to homology, through the use of ontologies. A coherent integration of heterogeneous expression data (microarray, expressed sequence tags, in situ hybridizations) is also required. This database is regularly updated and freely available. It should help extend the possibilities for comparison of gene expression between species in evo-devo and genomics studies.
Abstract:
We report a nested reverse transcription-polymerase chain reaction (RT-PCR) assay for hantavirus using primers selected to match high homology regions of hantavirus genomes detected from the whole blood of hantavirus cardiopulmonary syndrome (HCPS) patients from Brazil, also including the N gene nucleotide sequence of Araraquara virus. Hantavirus genomes were detected in eight out of nine blood samples from the HCPS patients by RT-PCR (88.9% positivity) and in all 9 blood samples (100% positivity) by nested-PCR. The eight amplicons obtained by RT-PCR (P1, P3-P9), including one obtained by nested-PCR (P-2) and not obtained by RT-PCR, were sequenced and showed high homology (94.8% to 99.1%) with the N gene of Araraquara hantavirus. Although the serologic method ELISA is the most appropriate test for HCPS diagnosis, the use of nested RT-PCR for hantavirus in Brazil would contribute to the diagnosis of acute hantavirus disease detecting viral genomes in patient specimens as well as initial genomic characterization of circulating hantaviruses.
Abstract:
BACKGROUND: After age, sex is the most important risk factor for coronary artery disease (CAD). The mechanism through which women are protected from CAD is still largely unknown, but the observed sex difference suggests the involvement of the reproductive steroid hormone signaling system. Genetic association studies of the gene encoding estrogen receptor α (ESR1) have shown conflicting results, although only a limited range of variation in the gene has been investigated. METHODS AND RESULTS: We exploited information made available by advanced new methods and resources in complex disease genetics to revisit the question of ESR1's role in risk of CAD. We performed a meta-analysis of 14 genome-wide association studies (CARDIoGRAM discovery analysis, N≈87,000) to search for population-wide and sex-specific associations between CAD risk and common genetic variants throughout the coding, noncoding, and flanking regions of ESR1. In addition to samples from the MIGen (N≈6000), WTCCC (N≈7400), and Framingham (N≈3700) studies, we extended this search to a larger number of common and uncommon variants by imputation into a panel of haplotypes constructed using data from the 1000 Genomes Project. Despite the widespread expression of ERα in vascular tissues, we found no evidence for involvement of common or low-frequency genetic variation throughout the ESR1 gene in modifying risk of CAD, either in the general population or as a function of sex. CONCLUSIONS: We suggest that future research on the genetic basis of sex-related differences in CAD risk should initially prioritize other genes in the reproductive steroid hormone biosynthesis system.