22 results for DNA Barcoding
in Helda - Digital Repository of University of Helsinki
Abstract:
Mutation and recombination are the fundamental processes leading to genetic variation in natural populations. This variation forms the raw material for evolution through natural selection and drift. Therefore, studying mutation rates may reveal information about evolutionary histories as well as phylogenetic interrelationships of organisms. In this thesis two molecular tools, DNA barcoding and the molecular clock, were examined. In the first part, the efficiency of mutations to delineate closely related species was tested and the implications for conservation practices were assessed. The second part investigated the proposition that a constant mutation rate exists within invertebrates, in the form of a metabolic-rate dependent molecular clock, which can be applied to accurately date speciation events. DNA barcoding aspires to be an efficient technique to not only distinguish between species but also reveal population-level variation, relying solely on mutations found on a short stretch of a single gene. In this thesis barcoding was applied to discriminate between Hylochares populations from Russian Karelia and new Hylochares findings from the greater Helsinki region in Finland. Although barcoding failed to delineate the two reproductively isolated groups, their distinct morphological features and differing life-history traits led to their classification as two closely related but separate species. The lack of genetic differentiation appears to be due to a recent divergence event not yet reflected in the beetles' molecular make-up. Thus, the Russian Hylochares was described as a new species. The Finnish species, previously considered locally extinct, was recognized as endangered. Even if, due to their identical genetic make-up, the populations had been regarded as conspecific, conservation strategies based on prior knowledge from Russia would not have guaranteed the survival of the Finnish beetle. 
Therefore, new conservation actions based on detailed studies of the biology and life-history of the Finnish Hylochares were conducted to protect this endemic rarity in Finland. The idea behind the strict molecular clock is that mutation rates are constant over evolutionary time and may thus be used to infer species divergence dates. However, one of the most recent theories argues that a strict clock does not tick per unit of time but that it has a constant substitution rate per unit of mass-specific metabolic energy. Therefore, according to this hypothesis, molecular clocks have to be recalibrated taking body size and temperature into account. This thesis tested the temperature effect on mutation rates in equally sized invertebrates. For the first dataset (family Eucnemidae, Coleoptera) the phylogenetic interrelationships and evolutionary history of the genus Arrhipis had to be inferred before the influence of temperature on substitution rates could be studied. Further, a second, larger invertebrate dataset (family Syrphidae, Diptera) was employed. Several methodological approaches, a number of genes and multiple molecular clock models revealed that there was no consistent relationship between temperature and mutation rate for the taxa under study. Thus, the body size effect, observed in vertebrates but controversial for invertebrates, rather than temperature may be the underlying driving force behind the metabolic-rate dependent molecular clock. Therefore, the metabolic-rate dependent molecular clock does not hold for the invertebrate groups studied here. This thesis emphasizes that molecular techniques relying on mutation rates have to be applied with caution. Whereas they may work satisfactorily under certain conditions for specific taxa, they may fail for others. The molecular clock as well as DNA barcoding should incorporate all the information and data available to obtain comprehensive estimations of the existing biodiversity and its evolutionary history.
Abstract:
Understanding the overwhelming diversity of life calls for complex organisational schemes. The field of systematics may thus be seen as the cornerstone of evolutionary biology. In the last few decades, systematics has been rejuvenated through the introduction of molecular methods such as DNA barcoding and multi-gene phylogenetic approaches. These methods may shed new light on established taxonomic ideas and problems. For example, the classification of ants has aroused much debate due to reinterpretation of morphological characters or contradictions between molecular data and morphology. Only in the last few years has a consensus been reached regarding the phylogeny of ant subfamilies. However, the situation remains deplorable for lower taxonomic ranks such as subfamilies, tribes and genera. This thesis describes the systematics and evolution of the Holarctic ant genus Myrmica and the tribe to which it belongs, Myrmicini. Using barcoding, molecular-phylogenetic data and divergence time estimations, it addresses questions regarding the taxonomy, morphology and biogeography of this group. Furthermore, the interrelationships between socially parasitic Myrmica species and their hosts (other species in the genus) were inferred. The phylogeny suggests that social parasitism evolved several times in Myrmica. Finally, this thesis investigated whether coevolution shaped the phylogeny of socially parasitic Maculinea butterflies that live inside Myrmica colonies. No evidence was found for coevolution.
Abstract:
Herbivorous insects, their host plants and natural enemies form the largest and most species-rich communities on earth. But what forces structure such communities? Do they represent random collections of species, or are they assembled by given rules? To address these questions, food webs offer excellent tools. As a result of their versatile information content, such webs have become the focus of intensive research over the last few decades. In this thesis, I study herbivore-parasitoid food webs from a new perspective: I construct multiple, quantitative food webs in a spatially explicit setting, at two different scales. Focusing on food webs consisting of specialist herbivores and their natural enemies on the pedunculate oak, Quercus robur, I examine consistency in food web structure across space and time, and how landscape context affects this structure. As an important methodological development, I use DNA barcoding to resolve potential cryptic species in the food webs, and to examine their effect on food web structure. I find that DNA barcoding changes our perception of species identity for as many as a third of the individuals, by reducing misidentifications and by resolving several cryptic species. In terms of the variation detected in food web structure, I find surprising consistency in both space and time. From a spatial perspective, landscape context leaves no detectable imprint on food web structure, while species richness declines significantly with decreasing connectivity. From a temporal perspective, food web structure remains predictable from year to year, despite considerable species turnover in local communities. The rate of such turnover varies between guilds and species within guilds. The factors best explaining these observations are abundant and common species, which have a quantitatively dominant imprint on overall structure, and suffer the lowest turnover. 
By contrast, rare species with little impact on food web structure exhibit the highest turnover rates. These patterns reveal important limitations of modern metrics of quantitative food web structure. While they accurately describe the overall topology of the web and its most significant interactions, they are disproportionately affected by species with given traits, and insensitive to the specific identity of species. As rare species have been shown to be important for food web stability, metrics depicting quantitative food web structure should then not be used as the sole descriptors of communities in a changing world. To detect and resolve the versatile imprint of global environmental change, one should rather use these metrics as one tool among several.
Abstract:
Cassava brown streak disease (CBSD) was described for the first time in Tanganyika (now Tanzania) about seven decades ago. It was endemic in the lowland areas of East Africa and inland parts of Malawi and caused by Cassava brown streak virus (CBSV; genus Ipomovirus; Potyviridae). However, in the 1990s CBSD was observed in high-altitude areas in Uganda. The causes of the spread to new locations were not known. The present work was thus initiated to generate information on genetic variability, clarify the taxonomy of the virus or viruses associated with CBSD in Eastern Africa, and understand the evolutionary forces acting on their genes. It also sought to develop a molecular-based diagnostic tool for detection of CBSD-associated virus isolates. Comparison of the CP-encoding sequences of CBSD-associated virus isolates collected from Uganda and north-western Tanzania in 2007 and the partial sequences available in GenBank revealed the occurrence of two genetically distinct groups of isolates. Two isolates were selected to represent the two groups. The complete genomes of isolates MLB3 (TZ:Mlb3:07) and Kor6 (TZ:Kor6:08) obtained from North-Western (Kagera) and North-Eastern (Tanga) Tanzania, respectively, were sequenced. The genomes were 9069 and 8995 nucleotides (nt), respectively. They translated into polyproteins that were predicted to yield ten mature proteins after cleavage. Nine proteins were typical of the family Potyviridae, namely P1, P3, 6K1, CI, 6K2, VPg, NIa-Pro, NIb and CP, but the viruses did not contain HC-Pro. Interestingly, genomes of both isolates contained a Maf/HAM1-like sequence (HAM1h; 678 nucleotides, 25 kDa) recombined between the NIb and CP domains in the 3’-proximal part of the genomes. HAM1h was also identified in Euphorbia ringspot virus (EuRSV) whose sequence was in GenBank. The HAM1 gene is widespread in both prokaryotes and eukaryotes. 
In yeast (Saccharomyces cerevisiae) it is known to be a nucleoside triphosphate (NTP) pyrophosphatase. Novel information was obtained on the structural variation at the N-termini of polyproteins of viruses in the genus Ipomovirus. Cucumber vein yellowing virus (CVYV) and Squash vein yellowing virus (SqVYV) contain a duplicated P1 (P1a and P1b) but lack the HC-Pro. On the other hand, Sweet potato mild mottle virus (SPMMV) has a single but large P1 and has HC-Pro. Both virus isolates (TZ:Mlb3:07 and TZ:Kor6:08) characterized in this study contained a single P1 and lacked the HC-Pro, which indicates unique evolution in the family Potyviridae. Comparison of 12 complete genomes of CBSD-associated viruses, which included the two genomes characterized in this study, revealed nucleotide (nt) identities of 69.0–70.3% and amino acid (aa) identities of 73.6–74.4% at polyprotein level. Comparison was also made among 68 complete CP sequences, which indicated 69.0–70.3% and 73.6–74.4% identity at nt and aa levels, respectively. The genetic variation was large enough for demarcation of CBSD-associated virus isolates into two distinct species. The name CBSV was retained for isolates that were related to CBSV isolates available in the database, whereas the new virus described for the first time in this study was named Ugandan cassava brown streak virus (UCBSV) by the International Committee on Taxonomy of Viruses (ICTV). The isolates TZ:Mlb3:07 and TZ:Kor6:08 belong to UCBSV and CBSV, respectively. The isolates of CBSV and UCBSV were 79.3–95.5% and 86.3–99.3% identical at nt level, respectively, suggesting more variation amongst CBSV isolates. The main sources of variation in plant viruses are mutations and recombination. Signals for recombination events were detected in 50% of isolates of each virus. Recombination events were detected in coding and non-coding (3’-UTR) sequences except in the 5’-UTR and P3. There was no evidence for recombination between isolates of CBSV and UCBSV. 
The non-synonymous (dN) to synonymous (dS) nucleotide substitution ratio (ω) for the HAM1h and CP domains of both viruses was ≤ 0.184, suggesting that most sites of these proteins were evolving under strong purifying selection. However, there were individual amino acid sites that were subject to adaptive evolution. For instance, adaptive evolution was detected in the HAM1h of UCBSV (n=15), where 12 aa sites were under positive selection (p<0.05), but not in CBSV (n=12). The CP of CBSV (n=23) contained 12 aa sites (p<0.01) while only 5 aa sites in the CP gene of UCBSV were predicted to be subject to positive selection pressure (p<0.01). The advantages offered by the aa sites under positive selection could not be established, but the occurrence of such sites in the terminal ends of UCBSV-HAM1h, for example, was interpreted as a requirement for proteolysis during polyprotein processing. Two different primer pairs that simultaneously detect UCBSV and CBSV isolates were developed in this study. They were used successfully to study the distribution of CBSV, UCBSV and their mixed infections in Tanzania and Uganda. It was established that the two viruses co-infect cassava and that incidences of co-infection could be as high as 50% around Lake Victoria on the Tanzanian side. Furthermore, it was revealed for the first time that both UCBSV and CBSV were widely distributed in Eastern Africa. The primer pair was also used to confirm infection with CBSV in a close relative of cassava, Manihot glaziovii (Müller Arg.). DNA barcoding of M. glaziovii was done by sequencing the matK gene. Two out of seven M. glaziovii from the coastal areas of Korogwe and Kibaha in north-eastern Tanzania were shown to be infected by CBSV but not UCBSV isolates. Detection in M. glaziovii has implications for the control and management of CBSD, as it is likely to serve as a virus reservoir. This study has contributed to the understanding of the evolution of CBSV and UCBSV, which cause the CBSD epidemic in Eastern Africa. 
The detection tools developed in this work will be useful in plant breeding, verification of the phytosanitary status of materials in regional and international movement of germplasm, and in all diagnostic activities related to management of CBSD. Whereas there are still many issues to be resolved such as the function and biological significance of HAM1h and its origin, this work has laid a foundation upon which the studies on these aspects can be based.
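The nucleotide and amino acid identity percentages quoted in this abstract are pairwise comparisons of aligned sequences. As a minimal sketch of that kind of calculation (not the software or settings used in the thesis, which the abstract does not name), the hypothetical helper below computes percent identity over two pre-aligned sequences. Skipping gapped columns is an assumption made here for illustration; alignment tools differ in how they treat gaps.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (an assumption of
    this sketch; other conventions count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences, invented for illustration only (not virus data).
toy_identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```

With real data, the two strings would be rows taken from a multiple sequence alignment of the CP or polyprotein regions.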
Abstract:
Species identification forms the basis for understanding the diversity of the living world, but it is also a prerequisite for understanding many evolutionary patterns and processes. The most promising approach for correctly delimiting and identifying species is to integrate many types of information in the same study. Our aim was to test how cuticular hydrocarbons, traditional morphometrics, genetic polymorphisms in nuclear markers (allozymes and DNA microsatellites) and DNA barcoding (partial mitochondrial COI gene) perform in delimiting species. As an example, we used two closely related Formica ants, F. fusca and F. lemani, sampled from a sympatric population in the northern part of their distribution. Morphological characters vary and overlap in different parts of their distribution areas, but cuticular hydrocarbons include a strong taxonomic signal and our aim is to test the degree to which morphological and genetic data correspond to the chemical data. In the morphological analysis, species were best separated by the combined number of hairs on pronotum and mesonotum, but individual workers overlapped in hair numbers, as previously noted by several authors. Nests of the two species were separated but not clustered according to species in a Principal Component Analysis made on nuclear genetic data. However, model-based Bayesian clustering resulted in perfect separation of the species and gave no indication of hybridization. Furthermore, F. lemani and F. fusca did not share any mitochondrial haplotypes, and the species were perfectly separated in a phylogenetic tree. We conclude that F. fusca and F. lemani are valid species that can be separated in our study area relatively well with all methods employed. However, the unusually small genetic differentiation in nuclear markers (FST = 0.12) shows that they are closely related, and occasional hybridization between F. fusca and F. lemani cannot be ruled out.
Abstract:
Prostate cancer is the most common noncutaneous malignancy and the second leading cause of cancer mortality in men. In 2004, 5237 new cases were diagnosed and altogether 25 664 men suffered from prostate cancer in Finland (Suomen Syöpärekisteri). Although extensively investigated, we still have a very rudimentary understanding of the molecular mechanisms leading to the frequent transformation of the prostate epithelium. Prostate cancer is characterized by several unique features including the multifocal origin of tumors and extreme resistance to chemotherapy, and new treatment options are therefore urgently needed. The integrity of genomic DNA is constantly challenged by genotoxic insults. Cellular responses to DNA damage involve elegant checkpoint cascades enforcing cell cycle arrest, thus facilitating damage repair, apoptosis or cellular senescence. Cellular DNA damage triggers the activation of tumor suppressor protein p53 and Wee1 kinase which act as executors of the cellular checkpoint responses. These are essential for genomic integrity, and are activated in early stages of tumorigenesis in order to function as barriers against tumor formation. Our work establishes that the primary human prostatic epithelial cells and prostatic epithelium have unexpectedly indulgent checkpoint surveillance. This is evidenced by the absence of inhibitory Tyr15 phosphorylation on Cdk2, lack of p53 response, radioresistant DNA synthesis, lack of G1/S and G2/M phase arrest, and presence of persistent gammaH2AX damage foci. We ascribe the absence of inhibitory Tyr15 phosphorylation to low levels of Wee1A, a tyrosine kinase and negative regulator of cell cycle progression. Ectopic Wee1A kinase restored Cdk2-Tyr15 phosphorylation and efficiently rescued the ionizing radiation-induced checkpoints in the human prostatic epithelial cells. 
As variability in the DNA damage responses has been shown to underlie susceptibility to cancer, our results imply that a suboptimal checkpoint arrest may greatly increase the accumulation of genetic lesions in the prostate epithelia. We also show that small molecules can restore p53 function in prostatic epithelial cells and may serve as a paradigm for the development of future therapeutic agents for the treatment of prostate cancer. We hypothesize that the prostate has evolved to activate the damage surveillance pathways and molecules involved in these pathways only in response to certain stresses in extreme circumstances. In doing so, this organ inadvertently made itself vulnerable to genotoxic stress, which may have implications in malignant transformation. Recognition of the limited activity of p53 and Wee1 in the prostate could drive mechanism-based discovery of preventative and therapeutic agents.
Abstract:
Megasphaera cerevisiae, Pectinatus cerevisiiphilus, Pectinatus frisingensis, Selenomonas lacticifex, Zymophilus paucivorans and Zymophilus raffinosivorans are strictly anaerobic Gram-stain-negative bacteria that are able to spoil beer by producing off-flavours and turbidity. They have only been isolated from the beer production chain. The species are phylogenetically affiliated to the Sporomusa sub-branch in the class "Clostridia". Routine cultivation methods for detection of strictly anaerobic bacteria in breweries are time-consuming and do not allow species identification. The main aim of this study was to utilise DNA-based techniques in order to improve detection and identification of the Sporomusa sub-branch beer-spoilage bacteria and to increase understanding of their biodiversity, evolution and natural sources. Practical PCR-based assays were developed for monitoring of M. cerevisiae, Pectinatus species and the group of Sporomusa sub-branch beer spoilers throughout the beer production process. The developed assays reliably differentiated the target bacteria from other brewery-related microbes. The contaminant detection in process samples (10–1,000 cfu/ml) could be accomplished in 2–8 h. Low levels of viable cells in finished beer (≤10 cfu/100 ml) were usually detected after 1–3 d culture enrichment. Time saving compared to cultivation methods was up to 6 d. Based on a polyphasic approach, this study revealed the existence of three new anaerobic spoilage species in the beer production chain, i.e. Megasphaera paucivorans, Megasphaera sueciensis and Pectinatus haikarae. The description of these species enabled establishment of phenotypic and DNA-based methods for their detection and identification. The 16S rRNA gene based phylogenetic analysis of the Sporomusa sub-branch showed that the genus Selenomonas originates from several ancestors and will require reclassification. Moreover, Z. paucivorans and Z. 
raffinosivorans were found to be in fact members of the genus Propionispira. This relationship implies that they were carried to breweries along with plant material. The brewery-related Megasphaera species formed a distinct sub-group that did not include any sequences from other sources, suggesting that M. cerevisiae, M. paucivorans and M. sueciensis may be uniquely adapted to the brewery ecosystem. M. cerevisiae was also shown to exhibit remarkable resistance against many brewery-related stress conditions. This may partly explain why it is a brewery contaminant. This study showed that DNA-based techniques provide useful tools for obtaining more rapid and specific information about the presence and identity of the strictly anaerobic spoilage bacteria in the beer production chain than is possible using cultivation methods. This should ensure financial benefits to the industry and better product quality to customers. In addition, DNA-based analyses provided new insight into the biodiversity as well as natural sources and relations of the Sporomusa sub-branch bacteria. The data can be exploited for taxonomic classification of these bacteria and for surveillance and control of contaminations.
Abstract:
This thesis consists of two parts. In the first part we performed a single-molecule force-extension measurement with 10 kb long DNA molecules from phage λ to validate the calibration and single-molecule capability of our optical tweezers instrument. Fitting the worm-like chain interpolation formula to the data revealed that ca. 71% of the DNA tethers featured a contour length within ±15% of the expected value (3.38 µm). Only 25% of the DNA tethers found had a persistence length between 30 and 60 nm. The correct value should be within 40 to 60 nm. In the second part we designed and built a precise temperature controller to remove thermal fluctuations that cause drifting of the optical trap. The controller uses feed-forward and PID (proportional-integral-derivative) feedback to achieve 1.58 mK precision and 0.3 K absolute accuracy. During a 5 min test run it reduced drifting of the trap from 1.4 nm/min in open-loop to 0.6 nm/min in closed-loop.
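The worm-like chain interpolation formula mentioned in this abstract is commonly written in the Marko-Siggia form, F = (k_B T / P) [ 1 / (4 (1 - x/L)^2) - 1/4 + x/L ], where x is the extension, L the contour length and P the persistence length. The sketch below assumes that form, a temperature of 298 K and a rise of 0.338 nm per base pair (the value implied by 3.38 µm for 10 kb); the function names are hypothetical, and the thesis's own fitting procedure is not described in the abstract.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K


def wlc_force(extension_m: float, contour_length_m: float,
              persistence_length_m: float, temperature_k: float = 298.0) -> float:
    """Force (N) from the Marko-Siggia worm-like chain interpolation formula."""
    x = extension_m / contour_length_m  # relative extension, must be in [0, 1)
    if not 0.0 <= x < 1.0:
        raise ValueError("extension must be in [0, contour length)")
    thermal = K_B * temperature_k / persistence_length_m
    return thermal * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


def contour_length_um(n_base_pairs: int, rise_nm_per_bp: float = 0.338) -> float:
    """Expected contour length in micrometres (rise per bp is an assumption)."""
    return n_base_pairs * rise_nm_per_bp / 1000.0


def within_tolerance(value: float, expected: float, tolerance: float = 0.15) -> bool:
    """True if value lies within +/- tolerance (as a fraction) of expected."""
    return abs(value - expected) <= tolerance * expected


expected_um = contour_length_um(10_000)  # about 3.38 µm for a 10 kb tether
# Force at half the contour length, with an assumed 50 nm persistence length.
force_half = wlc_force(0.5 * 3.38e-6, 3.38e-6, 50e-9)
```

In a fit, `contour_length_m` and `persistence_length_m` would be the free parameters, and a check like `within_tolerance` would apply the ±15% criterion to each fitted tether.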
Abstract:
This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved with the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA on the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of the cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between the DNA sequences with evolutionarily conserved cis-regulatory elements and the DNA sequences that have evolved neutrally. In further inquiry, the set of highest scoring putative cis-regulatory elements were found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating the neutral DNA evolution. 
The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight into the transcription level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species including mouse. The data from these genome-wide analyses are stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.
Abstract:
The object of this study is the tailless, internal membrane-containing bacteriophage PRD1. It has a dsDNA genome with covalently bound terminal proteins required for replication. The uniqueness of the structure makes this phage a desirable object of research. PRD1 has been studied for some 30 years, during which time a lot of information has accumulated on its structure and life-cycle. The two least characterised steps of the PRD1 life-cycle, genome packaging and virus release, are investigated here. PRD1 shares the main principles of virion assembly (DNA packaging in particular) and host cell lysis with other dsDNA bacteriophages. However, this phage has some fascinating individual peculiarities, such as DNA packaging into a membrane vesicle inside the capsid, absence of apparent portal protein, holin inhibitor and procapsid expansion. In the course of this study we have identified the components of the DNA packaging vertex of the capsid, and determined the function of protein P6 in packaging. We managed to purify the procapsids for an in vitro packaging system, optimise the reaction and significantly increase its efficiency. We developed a new method to determine DNA translocation and were able to quantify the efficiency and the rate of packaging. A model for PRD1 DNA packaging was also proposed. Another part of this study covers the lysis of the host cell. Like other dsDNA bacteriophages, PRD1 has been proposed to utilise a two-component lysis system. The existence of this lysis system in PRD1 has been proven by experiments using recombinant proteins and the multi-step nature of the lysis process has been established.
Abstract:
Extraintestinal pathogenic Escherichia coli (ExPEC) represent a diverse group of strains of E. coli, which infect extraintestinal sites, such as the urinary tract, the bloodstream, the meninges, the peritoneal cavity, and the lungs. Urinary tract infections (UTIs) caused by uropathogenic E. coli (UPEC), the major subgroup of ExPEC, are among the most prevalent microbial diseases worldwide and a substantial burden for public health care systems. UTIs are responsible for serious morbidity and mortality in the elderly, in young children, and in immune-compromised and hospitalized patients. ExPEC strains are different, both from genetic and clinical perspectives, from commensal E. coli strains belonging to the normal intestinal flora and from intestinal pathogenic E. coli strains causing diarrhea. ExPEC strains are characterized by a broad range of alternate virulence factors, such as adhesins, toxins, and iron accumulation systems. Unlike diarrheagenic E. coli, whose distinctive virulence determinants evoke characteristic diarrheagenic symptoms and signs, ExPEC strains are exceedingly heterogeneous and are known to possess no specific virulence factors or set of factors that are obligatory for the infection of a certain extraintestinal site (e.g. the urinary tract). The ExPEC genomes are highly diverse mosaic structures in permanent flux. These strains have obtained a significant amount of DNA (predictably up to 25% of the genomes) through acquisition of foreign DNA from diverse related or non-related donor species by lateral transfer of mobile genetic elements, including pathogenicity islands (PAIs), plasmids, phages, transposons, and insertion elements. The ability of ExPEC strains to cause disease is mainly derived from this horizontally acquired gene pool; the exogenous DNA facilitates rapid adaptation of the pathogen to changing conditions and hence the extent of the spectrum of sites that can be infected. 
However, neither the amount of unique DNA in different ExPEC strains (or UPEC strains) nor the mechanisms lying behind the observed genomic mobility are known. Due to this extreme heterogeneity of the UPEC and ExPEC populations in general, the routine surveillance of ExPEC is exceedingly difficult. In this project, we presented a novel virulence gene algorithm (VGA) for the estimation of the extraintestinal virulence potential (VP, pathogenicity risk) of clinically relevant ExPECs and fecal E. coli isolates. The VGA was based on a DNA microarray specific for the ExPEC phenotype (ExPEC pathoarray). This array contained 77 DNA probes homologous with known (e.g. adhesion factors, iron accumulation systems, and toxins) and putative (e.g. genes predictably involved in adhesion, iron uptake, or in metabolic functions) ExPEC virulence determinants. In total, 25 of DNA probes homologous with known virulence factors and 36 of DNA probes representing putative extraintestinal virulence determinants were found at significantly higher frequency in virulent ExPEC isolates than in commensal E. coli strains. We showed that the ExPEC pathoarray and the VGA could be readily used for the differentiation of highly virulent ExPECs both from less virulent ExPEC clones and from commensal E. coli strains as well. Implementing the VGA in a group of unknown ExPECs (n=53) and fecal E. coli isolates (n=37), 83% of strains were correctly identified as extraintestinal virulent or commensal E. coli. Conversely, 15% of clinical ExPECs and 19% of fecal E. coli strains failed to raster into their respective pathogenic and non-pathogenic groups. Clinical data and virulence gene profiles of these strains warranted the estimated VPs; UPEC strains with atypically low risk-ratios were largely isolated from patients with certain medical history, including diabetes mellitus or catheterization, or from elderly patients. In addition, fecal E. 
coli strains with VPs characteristic of ExPEC were shown to represent the diagnostically important fraction of resident strains of the gut flora with a high potential of causing extraintestinal infections. Interestingly, a large fraction of DNA probes associated with the ExPEC phenotype corresponded to novel DNA sequences without any known function in UTIs and thus represented new genetic markers for extraintestinal virulence. These DNA probes included unknown DNA sequences originating from the genomic subtractions of four clinical ExPEC isolates as well as from five novel cosmid sequences identified in the UPEC strains HE300 and JS299. The characterized cosmid sequences (pJS332, pJS448, pJS666, pJS700, and pJS706) revealed complex modular DNA structures with known and unknown DNA fragments arranged in a puzzle-like manner and integrated into the common E. coli genomic backbone. Furthermore, cosmid pJS332 of the UPEC strain HE300, which carried a chromosomal virulence gene cluster (iroBCDEN) encoding the salmochelin siderophore system, was shown to be part of a transmissible plasmid of Salmonella enterica. Taken together, the results of this project pointed towards the assumptions that (i) homologous recombination, even within coding genes, contributes to the observed mosaicism of ExPEC genomes and (ii) besides en bloc transfer of large DNA regions (e.g. chromosomal PAIs), rearrangements of small DNA modules also provide a means of genomic plasticity. The data presented in this project supplemented previous whole genome sequencing projects of E. coli and indicated that each E. coli genome displays a unique assemblage of individual mosaic structures, which enable these strains to successfully colonize and infect different anatomical sites.
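The classification figures reported for the VGA in this abstract (83% correct overall; 15% of 53 ExPECs and 19% of 37 fecal isolates misassigned) can be cross-checked with simple arithmetic. The sketch below is only a consistency check: the abstract gives percentages rather than counts, so the whole-number counts are an assumption obtained by rounding, and the helper name is hypothetical.

```python
def implied_count(percent: float, group_size: int) -> int:
    """Nearest whole number of isolates implied by a reported percentage."""
    return round(percent / 100.0 * group_size)


n_expec, n_fecal = 53, 37                          # group sizes from the abstract
misassigned_expec = implied_count(15, n_expec)     # assumed count, by rounding
misassigned_fecal = implied_count(19, n_fecal)     # assumed count, by rounding

total = n_expec + n_fecal
correct = total - misassigned_expec - misassigned_fecal
overall_percent_correct = 100.0 * correct / total  # compare with the reported 83%
```

Under these rounding assumptions the overall figure comes out at roughly 83%, in line with the value stated in the abstract.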