957 results for WHOLE-GENOME AMPLIFICATION
Abstract:
Background: The ratio of the rates of non-synonymous and synonymous substitution (d(N)/d(S)) is commonly used to estimate selection in coding sequences. It is often suggested that, all else being equal, d(N)/d(S) should be lower in populations with large effective size (Ne) due to increased efficacy of purifying selection. As Ne is difficult to measure directly, life history traits such as body mass, which is typically negatively associated with population size, have commonly been used as proxies in empirical tests of this hypothesis. However, evidence of whether the expected positive correlation between body mass and d(N)/d(S) is consistently observed is conflicting. Results: Employing whole genome sequence data from 48 avian species, we assess the relationship between rates of molecular evolution and life history in birds. We find a negative correlation between d(N)/d(S) and body mass, contrary to the nearly neutral expectation. This raises the question of whether the correlation might be a method artefact. We therefore consider in turn non-stationary base composition, divergence time and saturation as possible explanations, but find no clear patterns. However, in striking contrast to d(N)/d(S), the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass. Conclusions: Our results in principle accord with the notion that non-synonymous substitutions causing radical amino acid changes are more efficiently removed by selection in large populations, consistent with nearly neutral theory. These findings have implications for the use of d(N)/d(S) and suggest that caution is warranted when drawing conclusions about lineage-specific modes of protein evolution using this metric.
Abstract:
During the genomic era, a large number of whole-genome sequences accumulated, revealing many hypothetical proteins of unknown function. Functional genomics, the research domain that assigns a function to a given gene product, developed rapidly in response. Functional genomics of intracellular pathogenic bacteria has specific peculiarities owing to the fastidious growth of most of these intracellular micro-organisms, their close interaction with the host cell, the risk of contaminating experiments with host cell proteins and, for some strict intracellular bacteria such as Chlamydia, the absence of a simple genetic system to manipulate the bacterial genome. To identify virulence factors of intracellular pathogenic bacteria, functional genomics often relies on bioinformatic analyses that draw comparisons with model organisms such as Escherichia coli and Bacillus subtilis. The use of heterologous expression is another common approach. Given the intracellular lifestyle and the many effectors that intracellular bacteria use to corrupt host cell functions, functional genomics also often targets the identification of new effectors such as those of the T4SS of Brucella and Legionella.
Abstract:
French summary - Architectural features of bacterial genomes and their applications. Bacteria generally possess a single circular chromosome. At each generation, this chromosome is replicated bidirectionally by two enzymatic replication complexes that move in opposite directions from the origin of replication to the terminus, located on the opposite side. This mode of replication governs the architecture of the chromosome, notably the orientation of genes relative to replication, and is largely responsible for the pressures that cause the nucleotide composition of the genome to vary, beyond the constraints linked to the structure and function of the proteins encoded on the chromosome. The aim of this thesis is to help quantify the effects of replication on chromosome architecture, with particular attention to the ribosomal RNA genes, which are crucial for the bacterium. In addition, this architecture is species-specific and thus gives genes a "genomic identity". It is shown here that "naive" markers of this identity can be used to detect pathogenicity islands, which concentrate a large number of the bacterium's virulence factors, notably in the genome of Staphylococcus aureus. These pathogenicity islands are mobile and can pass from one bacterium to another, but for some time they retain the genomic identity of their previous host, which makes it possible to recognize them in their new host. These simple, fast and reliable methods will be of the utmost importance once whole-genome sequencing is rapid and available at very low cost. It will then be possible to analyse instantly the pathogenicity and antibiotic-resistance determinants of pathogens. Summary: The bacterial genome is a highly organized structure, which may be referred to as the genome architecture, and is mainly directed by DNA replication.
This thesis provides significant insights into the forces that shape bacterial chromosomes, which differ in each genome and help confer on it an identity. First, it shows the importance of replication in directing the orientation of prokaryotic ribosomal RNAs, and how it shapes their nucleotide composition in a taxon-specific manner. Second, it highlights the pressure acting on the orientation of genes in general, a majority of which are transcribed in the same direction as replication. Consequently, apparent intra-arm genome rearrangements, involving an exchange of the leading/lagging strands and shown to reduce growth rate, are very likely artifacts due to incorrect contig assembly. Third, it shows that this genomic identity can be used to detect foreign parts in genomes, by establishing this identity for a given host and identifying the regions that deviate from it. This property is notably illustrated with Staphylococcus aureus: known pathogenicity islands and phages, and putative ancient pathogenicity islands concentrating many known pathogenicity-related genes are highlighted; the analysis also detects, incidentally, proteins responsible for the adhesion of S. aureus to the host's cells. In conclusion, the study of the nucleotide composition of bacterial genomes provides the opportunity to better understand the genome-level pressures that shape DNA sequences, and to identify genes and regions potentially related to pathogenicity with fast, simple and reliable methods. This will be of crucial importance once whole-genome sequencing becomes a rapid, inexpensive and routine tool.
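The idea of flagging regions whose composition deviates from the host's genomic identity can be sketched as follows. This is only an illustration and not the method used in the thesis: it assumes GC content as the compositional marker, fixed non-overlapping windows and a z-score cutoff, all of which are choices made here. The function names `gc_content` and `atypical_windows` are hypothetical.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=1000, z_cutoff=2.0):
    """Flag windows whose GC content deviates from the genome-wide profile.

    Returns a list of (start, gc, z) tuples for non-overlapping windows whose
    GC content lies more than z_cutoff standard deviations from the mean of
    all windows. GC content is an assumed stand-in for a compositional marker.
    """
    starts = range(0, len(genome) - window + 1, window)
    values = [gc_content(genome[s:s + window]) for s in starts]
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    flagged = []
    for start, value in zip(starts, values):
        z = (value - mu) / sigma
        if abs(z) > z_cutoff:
            flagged.append((start, value, z))
    return flagged
```

A real analysis would likely use richer markers (for example oligonucleotide frequencies) and overlapping windows; the sketch only shows the "establish the host profile, then look for deviations" logic.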
Abstract:
Reproductive and worker division of labour (DOL) is a hallmark of social insect societies. Despite a long-standing interest in worker DOL, the molecular mechanisms regulating this process have only been investigated in detail in honey bees, and little is known about the regulatory mechanisms operating in other social insects. In the fire ant Solenopsis invicta, one of the most studied ant species, workers are permanently sterile and the tasks performed are modulated by the worker's internal state (age and size) and the outside environment (social environment), which potentially includes the effect of the queen presence through chemical communication via pheromones. However, the molecular mechanisms underpinning these processes are unknown. Using a whole-genome microarray platform, we characterized the molecular basis for worker DOL and we explored how a drastic change in the social environment (i.e. the sudden loss of the queen) affects global gene expression patterns of worker ants. We identified numerous genes differentially expressed between foraging and nonforaging workers in queenright colonies. With a few exceptions, these genes appear to be distinct from those involved in DOL in bees and wasps. Interestingly, after the queen was removed, foraging workers were no longer distinct from nonforaging workers at the transcriptomic level. Furthermore, few expression differences were detected between queenright and queenless workers when we did not consider the task performed. Thus, the social condition of the colony (queenless vs. queenright) appears to impact the molecular pathways underlying worker task performance, providing strong evidence for social regulation of DOL in S. invicta.
Abstract:
ABSTRACT: BACKGROUND: Plants are sessile and therefore have to perceive and adjust to changes in their environment. The presence of neighbours leads to a competitive situation in which resources and space are limited. Complex adaptive responses to such a situation are poorly understood at the molecular level. RESULTS: Using microarrays, we analysed whole-genome expression changes in Arabidopsis thaliana plants subjected to intraspecific competition. The leaf and root transcriptome was strongly altered by competition. Differentially expressed genes were enriched in genes involved in nutrient deficiency (mainly N, P, K), perception of light quality, and responses to abiotic and biotic stresses. Interestingly, performance of the generalist insect Spodoptera littoralis on densely grown plants was significantly reduced, suggesting that plants under competition display enhanced resistance to herbivory. CONCLUSIONS: This study provides a comprehensive list of genes whose expression is affected by intraspecific competition in Arabidopsis. The outcome is a unique response that involves genes related to light, nutrient deficiency, abiotic stress, and defence responses.
Abstract:
Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels.
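The core mechanism of approximate Bayesian computation mentioned above can be sketched with a basic rejection sampler. This is a toy illustration under stated assumptions, not the polyploid model or the pipeline used in the study: the simulator, the prior, the single summary statistic and all function names (`abc_rejection`, `simulate_mean`, `uniform_prior`) are hypothetical.

```python
import random
from statistics import mean


def abc_rejection(observed, simulate, sample_prior, tolerance, n_draws, rng):
    """Basic ABC rejection sampler.

    Repeatedly draw a parameter from the prior, simulate a summary statistic
    under that parameter, and keep the parameter when the simulated statistic
    falls within `tolerance` of the observed one. The kept values form an
    approximate sample from the posterior.
    """
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


def simulate_mean(theta, rng, n=20):
    """Toy simulator (assumption): mean of n normal draws centred on theta."""
    return mean(rng.gauss(theta, 1.0) for _ in range(n))


def uniform_prior(rng):
    """Toy prior (assumption): uniform on [0, 10]."""
    return rng.uniform(0.0, 10.0)
```

Calling `abc_rejection(5.0, simulate_mean, uniform_prior, 0.5, 5000, random.Random(1))` returns the accepted parameter values, whose spread gives a rough picture of the posterior. In a demographic application, the simulator would be a coalescent or similar model and the summary statistics would be computed from sequence data.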
Abstract:
1 Abstract Sleep is a vital necessity, yet its basic physiological function is still unknown, despite numerous studies both in healthy humans and animal models. The study of patients with sleep disorders may help uncover major biological pathways in sleep regulation and thus shed light on the actual function of sleep. Narcolepsy is a well-defined but rare sleep disorder characterized by excessive daytime sleepiness and cataplexy, thought to be caused by a combination of genetic and environmental factors. The aim of this work was to identify genes or genetic variants that contribute to the pathogenesis of sporadic and familial narcolepsy. Sporadic narcolepsy is the disorder with the strongest human leukocyte antigen (HLA) association ever reported. Since the associated HLA-DRB1*1501-DQB1*0602 haplotype is common in the general population (15-25%), it has been suggested that it is necessary but not sufficient for developing narcolepsy. To further define the genetic basis of narcolepsy risk, we performed a genome-wide association study (GWAS) in 562 European individuals with narcolepsy (cases) and 702 ethnically matched controls, with independent replication in 370 cases and 495 controls, all heterozygous for DRB1*1501-DQB1*0602. We found association with a protective variant near HLA-DQA2. Further analysis revealed that the identified SNP is strongly linked to DRB1*03-DQB1*02 and DRB1*1301-DQB1*0603. Cases almost never carried a trans DRB1*1301-DQB1*0603 haplotype. This unexpected protective HLA haplotype suggests a causal involvement of the HLA region in narcolepsy susceptibility. Familial cases of narcolepsy account for 10% of all narcolepsy cases. However, due to the low number of affected family members, narcolepsy families are usually not eligible for genetic linkage studies. We identified and characterized a large Spanish family with 11 affected family members, the largest narcolepsy family ever reported.
We ran a genetic linkage analysis using DNA of 11 affected and 15 unaffected family members and thereby identified a candidate region on chromosome 6 encompassing 163 kb with a maximum multipoint LOD score of 5.02. The coding sequences of 4 genes within this haplotype block as well as 2 neighboring genes were screened for pathogenic mutations in 2 affected family members and 1 healthy family member. So far no pathogenic mutation has been identified. Further in-depth sequencing of our candidate region as well as whole-exome sequencing are under way to identify the pathogenic mutation(s) in this family and will further improve our understanding of the genetic basis of narcolepsy. 2 French summary Sleep is a vital process whose physiological function is still unknown, despite numerous studies in healthy human subjects and in animal models. The study of patients with sleep disorders may reveal biological pathways that play a major role in sleep regulation. One of these disorders, narcolepsy, is a rare but well-defined disease characterized by excessive daytime sleepiness accompanied by cataplexy. Current knowledge suggests that it is caused by a combination of genetic and environmental factors. The aim of the present work was to identify the gene(s) or polymorphisms that constitute risk factors in the sporadic and familial forms of narcolepsy. Sporadic narcolepsy is the disease with the strongest association with the human major histocompatibility complex (HLA) ever reported. The frequency of the associated haplotype HLA-DRB1*1501-DQB1*0602 in the general population (15-25%) suggests that it is necessary, but not sufficient, for the disease to develop. We sought to extend the search for genetic factors that increase the risk of narcolepsy.
To this end, we undertook a genome-wide association study (GWAS) in 562 European narcoleptic subjects (cases) and 702 control individuals of the same ethnic origin, and found an association with a protective variant near the HLA-DQA2 gene. This result was independently replicated in 370 cases and 495 controls, all heterozygous at the DRB1*1501-DQB1*0602 locus. Finer analysis shows that the identified polymorphism is strongly linked to the DRB1*03-DQB1*02 and DRB1*1301-DQB1*0603 alleles. We note that only one case carried a DRB1*1301-DQB1*0603 haplotype in trans. The discovery of this protective HLA allele suggests that the HLA region plays a causal role in susceptibility to narcolepsy. Ten percent of narcolepsy cases are familial. However, the small number of affected members makes these families ineligible for genetic linkage studies. We identified and characterized a large Spanish family in which 11 members are affected, the largest narcolepsy family reported to date. Using DNA from 11 affected and 15 unaffected members, we identified by linkage analysis a 163-kilobase (kb) candidate region on chromosome 6, corresponding to a multipoint LOD score of 5.02. We searched, without success, for pathogenic mutations in the coding sequence of two genes located within this segment, as well as 4 adjacent genes. Deeper sequencing of the region and sequencing of the exons of the whole genome are under way and should prove more fruitful, revealing the pathogenic mutation(s) in this family, which would contribute to a better understanding of the genetic causes of narcolepsy.
3 Summary for a general audience Sleep is a vital necessity whose exact physiological role remains unknown despite numerous studies in healthy human subjects and in animal models. This is why sleep disorders interest researchers: elucidating the mechanisms responsible may lead to a better understanding of how normal sleep works. Narcolepsy is a sleep disorder characterized by excessive daytime sleepiness. Affected people may fall asleep involuntarily at any time of day, and also suffer from loss of muscle tone (cataplexy) during strong emotions, for example a fit of laughter. Narcolepsy is a rare disease, occurring in 1 person in 2000. Current knowledge suggests that it is caused by a combination of genetic and environmental factors. We set out to identify the genetic factors that influence the onset of the disease, first in its sporadic form, then in a family with many affected members. By comparing the genetic variation of nearly 1000 European narcoleptic subjects with that of 1200 healthy individuals, we found in 30% of the latter a protective variant that reduces the risk of developing the disease 50-fold, making it the most powerful protective genetic factor described to date. We then studied a large Spanish family of about thirty members, 11 of whom have narcolepsy. Again, we compared the genetic variation of affected members with that of healthy members. We were thus able to identify a region of the genome where the gene(s) involved in the disease in this family would lie, but have not yet found the exact variant(s). Further study should make it possible to identify it (them) and thus help elucidate the mechanisms leading to the development of narcolepsy.
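For readers unfamiliar with the LOD score reported in the linkage analysis above, the quantity can be illustrated with the simplest textbook case. The sketch below computes a two-point LOD score for fully informative, phase-known meioses; it is a deliberate simplification and not the multipoint pedigree calculation behind the reported value of 5.02. The function name `two_point_lod` is hypothetical.

```python
import math


def two_point_lod(n_meioses, n_recombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**k * (1 - theta)**(n - k),
    n is the number of meioses and k the number of recombinants.
    Simplifying assumption: no pedigree structure, no missing data.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n, k = n_meioses, n_recombinants
    if not 0 <= k <= n:
        raise ValueError("n_recombinants must lie between 0 and n_meioses")
    likelihood = theta ** k * (1.0 - theta) ** (n - k)
    if likelihood == 0.0:
        # A recombinant was observed but theta = 0 forbids recombination.
        return float("-inf")
    return math.log10(likelihood) - n * math.log10(0.5)
```

Under these assumptions, ten meioses with no recombinants give a LOD score of about 3.01 at theta = 0, which corresponds to the conventional threshold of 3 for declaring linkage.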
Abstract:
Reference collections of multiple Drosophila lines with accumulating collections of "omics" data have proven especially valuable for the study of population genetics and complex trait genetics. Here we present a description of a resource collection of 84 strains of Drosophila melanogaster whose genome sequences were obtained after 12 generations of full-sib inbreeding. The initial rationale for this resource was to foster development of a systems biology platform for modeling metabolic regulation by the use of natural polymorphisms as perturbations. As reference lines, they are amenable to repeated phenotypic measurements, and already a large collection of metabolic traits have been assayed. Another key feature of these strains is their widespread geographic origin, coming from Beijing, Ithaca, Netherlands, Tasmania, and Zimbabwe. After obtaining 12.5× coverage of paired-end Illumina sequence reads, SNP and indel calls were made with the GATK platform. Thorough quality control was enabled by deep sequencing one line to >100×, and single-nucleotide polymorphisms and indels were validated using ddRAD-sequencing as an orthogonal platform. In addition, a series of preliminary population genetic tests were performed with these single-nucleotide polymorphism data for assessment of data quality. We found 83 segregating inversions among the lines, and as expected these were especially abundant in the African sample. We anticipate that this will make a useful addition to the set of reference D. melanogaster strains, thanks to its geographic structuring and unusually high level of genetic diversity.
Abstract:
There is a widespread agreement from patient and professional organisations alike that the safety of stem cell therapeutics is of paramount importance, particularly for ex vivo autologous gene therapy. Yet current technology makes it difficult to thoroughly evaluate the behaviour of genetically corrected stem cells before they are transplanted. To address this, we have developed a strategy that permits transplantation of a clonal population of genetically corrected autologous stem cells that meet stringent selection criteria and the principle of precaution. As a proof of concept, we have stably transduced epidermal stem cells (holoclones) obtained from a patient suffering from recessive dystrophic epidermolysis bullosa. Holoclones were infected with self-inactivating retroviruses bearing a COL7A1 cDNA and cloned before the progeny of individual stem cells were characterised using a number of criteria. Clonal analysis revealed a great deal of heterogeneity among transduced stem cells in their capacity to produce functional type VII collagen (COLVII). Selected transduced stem cells transplanted onto immunodeficient mice regenerated a non-blistering epidermis for months and produced a functional COLVII. Safety was assessed by determining the sites of proviral integration, rearrangements and hit genes and by whole-genome sequencing. The progeny of the selected stem cells also had a diploid karyotype, was not tumorigenic and did not disseminate after long-term transplantation onto immunodeficient mice. In conclusion, a clonal strategy is a powerful and efficient means of by-passing the heterogeneity of a transduced stem cell population. It guarantees a safe and homogenous medicinal product, fulfilling the principle of precaution and the requirements of regulatory affairs. Furthermore, a clonal strategy makes it possible to envision exciting gene-editing technologies like zinc finger nucleases, TALENs and homologous recombination for next-generation gene therapy.
Abstract:
Background: The main goal of the present study was to analyse the genetic architecture of mRNA expression in muscle, a tissue of utmost economic importance for pig breeders. Previous studies have used F2 crosses to detect porcine expression QTL (eQTL), so the data they contributed mostly represent the between-breed component of eQTL variation. Here, we have analysed eQTL segregation in an outbred Duroc population using two groups of animals with divergent fatness profiles. This approach is particularly suitable for analysing the within-breed component of eQTL variation, with a special emphasis on loci involved in lipid metabolism. Methodology/Principal Findings: GeneChip Porcine Genome arrays (Affymetrix) were used to determine the mRNA expression levels of gluteus medius samples from 105 Duroc barrows. A whole-genome eQTL scan was carried out with a panel of 116 microsatellites. The results allowed us to detect 613 genome-wide significant eQTL unevenly distributed across the pig genome. A clear predominance of trans- over cis-eQTL was observed. Moreover, 11 trans-regulatory hotspots affecting the expression levels of four to 16 genes were identified. A Gene Ontology study showed that regulatory polymorphisms affected the expression of muscle development and lipid metabolism genes. A number of positional concordances between eQTL and lipid trait QTL were also found, whereas limited evidence of a linear relationship between muscle fat deposition and mRNA levels of eQTL-regulated genes was obtained. Conclusions/Significance: Our data provide substantial evidence that there is a remarkable amount of within-breed genetic variation affecting muscle mRNA expression. Most of this variation acts in trans and influences biological processes related to muscle development, lipid deposition and energy balance.
Identifying the underlying causal mutations and ascertaining their effects on phenotypes would provide fundamental insight into how complex traits are built at the molecular level.
Abstract:
Phage therapy has been proven to be more effective, in some cases, than conventional antibiotics, especially regarding multidrug-resistant biofilm infections. The objective here was to isolate an anti-Enterococcus faecalis bacteriophage and to evaluate its efficacy against planktonic and biofilm cultures. E. faecalis is an important pathogen found in many infections, including endocarditis and persistent infections associated with root canal treatment failure. The difficulty in E. faecalis treatment has been attributed to the lack of anti-infective strategies to eradicate its biofilm and to the frequent emergence of multidrug-resistant strains. To this end, an anti-E. faecalis and E. faecium phage, termed EFDG1, was isolated from sewage effluents. The phage was visualized by electron microscopy. EFDG1 coding sequences and phylogeny were determined by whole genome sequencing (GenBank accession number KP339049), revealing it belongs to the Spounavirinae subfamily of the Myoviridae phages, which includes promising candidates for therapy against Gram-positive pathogens. This analysis also showed that the EFDG1 genome does not contain apparent harmful genes. EFDG1 antibacterial efficacy was evaluated in vitro against planktonic and biofilm cultures, showing effective lytic activity against various E. faecalis and E. faecium isolates, regardless of their antibiotic resistance profile. In addition, EFDG1 efficiently prevented ex vivo E. faecalis root canal infection. These findings suggest that phage therapy using EFDG1 might be efficacious to prevent E. faecalis infection after root canal treatment.
Abstract:
VariScan is a software package for the analysis of DNA sequence polymorphisms at the whole-genome scale. Among other features, the software (1) can conduct many population genetic analyses; (2) incorporates a multiresolution wavelet transform-based method that captures relevant information from DNA polymorphism data; and (3) facilitates the visualization of the results in the most commonly used genome browsers.
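As an example of the kind of population genetic statistic such a package computes, the sketch below calculates nucleotide diversity (the average number of pairwise differences per site) from a set of aligned sequences. It is an illustration only and does not reproduce VariScan's code or interface; the choice of statistic and the function name `nucleotide_diversity` are assumptions made here, and gaps or missing data are not handled.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site (pi).

    `sequences` is a list of equal-length aligned strings. Every position is
    compared as-is, so gaps and ambiguity codes count as ordinary characters
    (a simplification).
    """
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_differences / (len(pairs) * length)
```

For whole-genome data the same statistic would typically be computed in sliding windows along each chromosome, which is where a multiresolution method becomes useful for summarizing the resulting profile.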
Abstract:
OBJECTIVES: Leri's pleonosteosis (LP) is an autosomal dominant rheumatic condition characterised by flexion contractures of the interphalangeal joints, limited motion of multiple joints, and short broad metacarpals, metatarsals and phalanges. Scleroderma-like skin thickening can be seen in some individuals with LP. We undertook a study to characterise the phenotype of LP and identify its genetic basis. METHODS AND RESULTS: Whole-genome single-nucleotide polymorphism genotyping in two families with LP defined microduplications of chromosome 8q22.1 as the cause of this condition. Expression analysis of dermal fibroblasts from affected individuals showed overexpression of two genes, GDF6 and SDC2, within the duplicated region, leading to dysregulation of genes that encode proteins of the extracellular matrix and downstream players in the transforming growth factor (TGF)-β pathway. Western blot analysis revealed markedly decreased inhibitory SMAD6 levels in patients with LP. Furthermore, in a cohort of 330 systemic sclerosis cases, we show that the minor allele of a missense SDC2 variant, p.Ser71Thr, could confer protection against disease (p<1×10(-5)). CONCLUSIONS: Our work identifies the genetic cause of LP in these two families, demonstrates the phenotypic range of the condition, implicates dysregulation of extracellular matrix homoeostasis genes in its pathogenesis, and highlights the link between TGF-β/SMAD signalling, growth/differentiation factor 6 and syndecan-2. We propose that LP is an additional member of the growing 'TGF-β-pathies' group of musculoskeletal disorders, which includes Myhre syndrome, acromicric dysplasia, geleophysic dysplasias, Weill-Marchesani syndromes and stiff skin syndrome. Identification of a systemic sclerosis-protective SDC2 variant lays the foundation for exploration of the role of syndecan-2 in systemic sclerosis in the future.
Abstract:
Reliable molecular typing methods are necessary to investigate the epidemiology of bacterial pathogens. Reference methods such as multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE) are costly and time consuming. Here, we compared our newly developed double-locus sequence typing (DLST) method for Pseudomonas aeruginosa to MLST and PFGE on a collection of 281 isolates. DLST was as discriminatory as MLST and was able to recognize "high-risk" epidemic clones. Both methods were highly congruent. Not surprisingly, a higher discriminatory power was observed with PFGE. In conclusion, being a simple method (single-strand sequencing of only 2 loci), DLST is valuable as a first-line typing tool for epidemiological investigations of P. aeruginosa. Coupled to a more discriminant method like PFGE or whole genome sequencing, it might represent an efficient typing strategy to investigate or prevent outbreaks.
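The abstract compares the discriminatory power of typing methods without stating how it was quantified. A common choice for typing schemes is Simpson's index of diversity, and the sketch below assumes that measure purely for illustration. The function name `discriminatory_index` is hypothetical.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Simpson's index of diversity for a typing method.

    `type_labels` holds one type assignment per isolate. The index is the
    probability that two isolates drawn at random without replacement are
    assigned to different types:
    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).
    """
    n_isolates = len(type_labels)
    if n_isolates < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_isolates * (n_isolates - 1))
```

Applying the function to the type assignments produced by two methods on the same isolate collection yields two values that can be compared directly; a value closer to 1 indicates a more discriminatory method.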
Abstract:
Helicobacter pylori is an important human pathogen associated with serious gastric diseases. Owing to its medical importance and close relationship with its human host, understanding genomic patterns of global and local adaptation in H. pylori may be of particular significance for both clinical and evolutionary studies. Here we present the first such whole genome analysis of 60 globally distributed strains, from which we inferred worldwide population structure and demographic history and shed light on interesting global and local events of positive selection, with particular emphasis on the evolution of San-associated lineages. Our results indicate a more ancient origin for the association of humans and H. pylori than previously thought. We identify several important perspectives for future clinical research on candidate selected regions that include both previously characterized genes (e.g., transcription elongation factor NusA and tumor necrosis factor alpha-inducing protein Tipα) and hitherto unknown functional genes.