24 resultados para Genomic data integration
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
In recent years, the use of Reverse Engineering systems has got a considerable interest for a wide number of applications. Therefore, many research activities are focused on accuracy and precision of the acquired data and post processing phase improvements. In this context, this PhD Thesis deals with the definition of two novel methods for data post processing and data fusion between physical and geometrical information. In particular a technique has been defined for error definition in 3D points’ coordinates acquired by an optical triangulation laser scanner, with the aim to identify adequate correction arrays to apply under different acquisition parameters and operative conditions. Systematic error in data acquired is thus compensated, in order to increase accuracy value. Moreover, the definition of a 3D thermogram is examined. Object geometrical information and its thermal properties, coming from a thermographic inspection, are combined in order to have a temperature value for each recognizable point. Data acquired by an optical triangulation laser scanner are also used to normalize temperature values and make thermal data independent from thermal-camera point of view.
Resumo:
In recent years, IoT technology has radically transformed many crucial industrial and service sectors such as healthcare. The multi-facets heterogeneity of the devices and the collected information provides important opportunities to develop innovative systems and services. However, the ubiquitous presence of data silos and the poor semantic interoperability in the IoT landscape constitute a significant obstacle in the pursuit of this goal. Moreover, achieving actionable knowledge from the collected data requires IoT information sources to be analysed using appropriate artificial intelligence techniques such as automated reasoning. In this thesis work, Semantic Web technologies have been investigated as an approach to address both the data integration and reasoning aspect in modern IoT systems. In particular, the contributions presented in this thesis are the following: (1) the IoT Fitness Ontology, an OWL ontology that has been developed in order to overcome the issue of data silos and enable semantic interoperability in the IoT fitness domain; (2) a Linked Open Data web portal for collecting and sharing IoT health datasets with the research community; (3) a novel methodology for embedding knowledge in rule-defined IoT smart home scenarios; and (4) a knowledge-based IoT home automation system that supports a seamless integration of heterogeneous devices and data sources.
Resumo:
In medicine, innovation depends on a better knowledge of the human body mechanism, which represents a complex system of multi-scale constituents. Unraveling the complexity underneath diseases proves to be challenging. A deep understanding of the inner workings comes with dealing with many heterogeneous information. Exploring the molecular status and the organization of genes, proteins, metabolites provides insights on what is driving a disease, from aggressiveness to curability. Molecular constituents, however, are only the building blocks of the human body and cannot currently tell the whole story of diseases. This is why nowadays attention is growing towards the contemporary exploitation of multi-scale information. Holistic methods are then drawing interest to address the problem of integrating heterogeneous data. The heterogeneity may derive from the diversity across data types and from the diversity within diseases. Here, four studies conducted data integration using customly designed workflows that implement novel methods and views to tackle the heterogeneous characterization of diseases. The first study devoted to determine shared gene regulatory signatures for onco-hematology and it showed partial co-regulation across blood-related diseases. The second study focused on Acute Myeloid Leukemia and refined the unsupervised integration of genomic alterations, which turned out to better resemble clinical practice. In the third study, network integration for artherosclerosis demonstrated, as a proof of concept, the impact of network intelligibility when it comes to model heterogeneous data, which showed to accelerate the identification of new potential pharmaceutical targets. Lastly, the fourth study introduced a new method to integrate multiple data types in a unique latent heterogeneous-representation that facilitated the selection of important data types to predict the tumour stage of invasive ductal carcinoma. The results of these four studies laid the groundwork to ease the detection of new biomarkers ultimately beneficial to medical practice and to the ever-growing field of Personalized Medicine.
Resumo:
Several countries have acquired, over the past decades, large amounts of area covering Airborne Electromagnetic data. Contribution of airborne geophysics has dramatically increased for both groundwater resource mapping and management proving how those systems are appropriate for large-scale and efficient groundwater surveying. We start with processing and inversion of two AEM dataset from two different systems collected over the Spiritwood Valley Aquifer area, Manitoba, Canada respectively, the AeroTEM III (commissioned by the Geological Survey of Canada in 2010) and the “Full waveform VTEM” dataset, collected and tested over the same survey area, during the fall 2011. We demonstrate that in the presence of multiple datasets, either AEM and ground data, due processing, inversion, post-processing, data integration and data calibration is the proper approach capable of providing reliable and consistent resistivity models. Our approach can be of interest to many end users, ranging from Geological Surveys, Universities to Private Companies, which are often proprietary of large geophysical databases to be interpreted for geological and\or hydrogeological purposes. In this study we deeply investigate the role of integration of several complimentary types of geophysical data collected over the same survey area. We show that data integration can improve inversions, reduce ambiguity and deliver high resolution results. We further attempt to use the final, most reliable output resistivity models as a solid basis for building a knowledge-driven 3D geological voxel-based model. A voxel approach allows a quantitative understanding of the hydrogeological setting of the area, and it can be further used to estimate the aquifers volumes (i.e. potential amount of groundwater resources) as well as hydrogeological flow model prediction. In addition, we investigated the impact of an AEM dataset towards hydrogeological mapping and 3D hydrogeological modeling, comparing it to having only a ground based TEM dataset and\or to having only boreholes data.
Resumo:
In the post genomic era with the massive production of biological data the understanding of factors affecting protein stability is one of the most important and challenging tasks for highlighting the role of mutations in relation to human maladies. The problem is at the basis of what is referred to as molecular medicine with the underlying idea that pathologies can be detailed at a molecular level. To this purpose scientific efforts focus on characterising mutations that hamper protein functions and by these affect biological processes at the basis of cell physiology. New techniques have been developed with the aim of detailing single nucleotide polymorphisms (SNPs) at large in all the human chromosomes and by this information in specific databases are exponentially increasing. Eventually mutations that can be found at the DNA level, when occurring in transcribed regions may then lead to mutated proteins and this can be a serious medical problem, largely affecting the phenotype. Bioinformatics tools are urgently needed to cope with the flood of genomic data stored in database and in order to analyse the role of SNPs at the protein level. In principle several experimental and theoretical observations are suggesting that protein stability in the solvent-protein space is responsible of the correct protein functioning. Then mutations that are found disease related during DNA analysis are often assumed to perturb protein stability as well. However so far no extensive analysis at the proteome level has investigated whether this is the case. Also computationally methods have been developed to infer whether a mutation is disease related and independently whether it affects protein stability. Therefore whether the perturbation of protein stability is related to what it is routinely referred to as a disease is still a big question mark. In this work we have tried for the first time to explore the relation among mutations at the protein level and their relevance to diseases with a large-scale computational study of the data from different databases. To this aim in the first part of the thesis for each mutation type we have derived two probabilistic indices (for 141 out of 150 possible SNPs): the perturbing index (Pp), which indicates the probability that a given mutation effects protein stability considering all the “in vitro” thermodynamic data available and the disease index (Pd), which indicates the probability of a mutation to be disease related, given all the mutations that have been clinically associated so far. We find with a robust statistics that the two indexes correlate with the exception of all the mutations that are somatic cancer related. By this each mutation of the 150 can be coded by two values that allow a direct comparison with data base information. Furthermore we also implement computational methods that starting from the protein structure is suited to predict the effect of a mutation on protein stability and find that overpasses a set of other predictors performing the same task. The predictor is based on support vector machines and takes as input protein tertiary structures. We show that the predicted data well correlate with the data from the databases. All our efforts therefore add to the SNP annotation process and more importantly found the relationship among protein stability perturbation and the human variome leading to the diseasome.
Resumo:
Aim of the present study was to develop a statistical approach to define the best cut-off Copy number alterations (CNAs) calling from genomic data provided by high throughput experiments, able to predict a specific clinical end-point (early relapse, 18 months) in the context of Multiple Myeloma (MM). 743 newly diagnosed MM patients with SNPs array-derived genomic and clinical data were included in the study. CNAs were called both by a conventional (classic, CL) and an outcome-oriented (OO) method, and Progression Free Survival (PFS) hazard ratios of CNAs called by the two approaches were compared. The OO approach successfully identified patients at higher risk of relapse and the univariate survival analysis showed stronger prognostic effects for OO-defined high-risk alterations, as compared to that defined by CL approach, statistically significant for 12 CNAs. Overall, 155/743 patients relapsed within 18 months from the therapy start. A small number of OO-defined CNAs were significantly recurrent in early-relapsed patients (ER-CNAs) - amp1q, amp2p, del2p, del12p, del17p, del19p -. Two groups of patients were identified either carrying or not ≥1 ER-CNAs (249 vs. 494, respectively), the first one with significantly shorter PFS and overall survivals (OS) (PFS HR 2.15, p<0001; OS HR 2.37, p<0.0001). The risk of relapse defined by the presence of ≥1 ER-CNAs was independent from those conferred both by R-IIS 3 (HR=1.51; p=0.01) and by low quality (< stable disease) clinical response (HR=2.59 p=0.004). Notably, the type of induction therapy was not descriptive, suggesting that ER is strongly related to patients’ baseline genomic architecture. In conclusion, the OO- approach employed allowed to define CNAs-specific dynamic clonality cut-offs, improving the CNAs calls’ accuracy to identify MM patients with the highest probability to ER. As being outcome-dependent, the OO-approach is dynamic and might be adjusted according to the selected outcome variable of interest.
Resumo:
Pig meat and carcass quality is a complex concept determined by environmental and genetic factors concurring to the phenotypic variation in qualitative characteristics of meat (fat content, tenderness, juiciness, flavor,etc). This thesis shows the results of different investigations to study and to analyze pig meat and carcass quality focusing mainly on genomic; moreover proteomic approach has been also used. The aim was to analyze data from association studies between genes considered as candidate and meat and carcass quality in different pig breeds. The approach was used to detect new SNP in genes functionally associated to the studied traits and to confirm as candidate other genes already known. Five polymorphisms (one new SNP in Calponin 1 gene and four additional polymorphism already known in other genes) were considered on chromosome 2 (SSC2). Calponin 1 (CNN1) was associated to the studied traits and furthermore the results reported confirmed the data already known for Lactate dehydrogenase A (LDHA), Low density lipoprotein receptor (LDLR), Myogenic differentiation 1 (MYOD1) e Ubiquitin-like 5 (UBL5), in Italian Large White pigs. Using an in silico search it was possible to detect on SSC2 a new SNP of Deoxyhypusine synthase (DHPS) gene partially overlapping with WD repeat domain 83 (WDR83) gene and significant for the meat pH variation in Italian Large White (ILW) pigs. Perilipin 1 (PLIN1) mapping on chromosome 7 and Perilipin 2 (PLIN2) mapping on chromosome 1 were studied and the results obtained in Duroc breed have shown significant associations with carcass traits. Moreover a study of protein composition of porcine LD muscle, indicated an effect of temperature treatment of carcass, on proteins of the sarcoplasmic fraction and in particular on PGM1 phosphorylation. Future studies on pig meat quality should be based on the integration of different experimental approaches (genomics, proteomics, transcriptomics, etc).
Resumo:
Recently, a rising interest in political and economic integration/disintegration issues has been developed in the political economy field. This growing strand of literature partly draws on traditional issues of fiscal federalism and optimum public good provision and focuses on a trade-off between the benefits of centralization, arising from economies of scale or externalities, and the costs of harmonizing policies as a consequence of the increased heterogeneity of individual preferences in an international union or in a country composed of at least two regions. This thesis stems from this strand of literature and aims to shed some light on two highly relevant aspects of the political economy of European integration. The first concerns the role of public opinion in the integration process; more precisely, how economic benefits and costs of integration shape citizens' support for European Union (EU) membership. The second is the allocation of policy competences among different levels of government: European, national and regional. Chapter 1 introduces the topics developed in this thesis by reviewing the main recent theoretical developments in the political economy analysis of integration processes. It is structured as follows. First, it briefly surveys a few relevant articles on economic theories of integration and disintegration processes (Alesina and Spolaore 1997, Bolton and Roland 1997, Alesina et al. 2000, Casella and Feinstein 2002) and discusses their relevance for the study of the impact of economic benefits and costs on public opinion attitude towards the EU. Subsequently, it explores the links existing between such political economy literature and theories of fiscal federalism, especially with regard to normative considerations concerning the optimal allocation of competences in a union. Chapter 2 firstly proposes a model of citizens’ support for membership of international unions, with explicit reference to the EU; subsequently it tests the model on a panel of EU countries. What are the factors that influence public opinion support for the European Union (EU)? In international relations theory, the idea that citizens' support for the EU depends on material benefits deriving from integration, i.e. whether European integration makes individuals economically better off (utilitarian support), has been common since the 1970s, but has never been the subject of a formal treatment (Hix 2005). A small number of studies in the 1990s have investigated econometrically the link between national economic performance and mass support for European integration (Eichenberg and Dalton 1993; Anderson and Kalthenthaler 1996), but only making informal assumptions. The main aim of Chapter 2 is thus to propose and test our model with a view to providing a more complete and theoretically grounded picture of public support for the EU. Following theories of utilitarian support, we assume that citizens are in favour of membership if they receive economic benefits from it. To develop this idea, we propose a simple political economic model drawing on the recent economic literature on integration and disintegration processes. The basic element is the existence of a trade-off between the benefits of centralisation and the costs of harmonising policies in presence of heterogeneous preferences among countries. The approach we follow is that of the recent literature on the political economy of international unions and the unification or break-up of nations (Bolton and Roland 1997, Alesina and Wacziarg 1999, Alesina et al. 2001, 2005a, to mention only the relevant). The general perspective is that unification provides returns to scale in the provision of public goods, but reduces each member state’s ability to determine its most favoured bundle of public goods. In the simple model presented in Chapter 2, support for membership of the union is increasing in the union’s average income and in the loss of efficiency stemming from being outside the union, and decreasing in a country’s average income, while increasing heterogeneity of preferences among countries points to a reduced scope of the union. Afterwards we empirically test the model with data on the EU; more precisely, we perform an econometric analysis employing a panel of member countries over time. The second part of Chapter 2 thus tries to answer the following question: does public opinion support for the EU really depend on economic factors? The findings are broadly consistent with our theoretical expectations: the conditions of the national economy, differences in income among member states and heterogeneity of preferences shape citizens’ attitude towards their country’s membership of the EU. Consequently, this analysis offers some interesting policy implications for the present debate about ratification of the European Constitution and, more generally, about how the EU could act in order to gain more support from the European public. Citizens in many member states are called to express their opinion in national referenda, which may well end up in rejection of the Constitution, as recently happened in France and the Netherlands, triggering a European-wide political crisis. These events show that nowadays understanding public attitude towards the EU is not only of academic interest, but has a strong relevance for policy-making too. Chapter 3 empirically investigates the link between European integration and regional autonomy in Italy. Over the last few decades, the double tendency towards supranationalism and regional autonomy, which has characterised some European States, has taken a very interesting form in this country, because Italy, besides being one of the founding members of the EU, also implemented a process of decentralisation during the 1970s, further strengthened by a constitutional reform in 2001. Moreover, the issue of the allocation of competences among the EU, the Member States and the regions is now especially topical. The process leading to the drafting of European Constitution (even if then it has not come into force) has attracted much attention from a constitutional political economy perspective both on a normative and positive point of view (Breuss and Eller 2004, Mueller 2005). The Italian parliament has recently passed a new thorough constitutional reform, still to be approved by citizens in a referendum, which includes, among other things, the so called “devolution”, i.e. granting the regions exclusive competence in public health care, education and local police. Following and extending the methodology proposed in a recent influential article by Alesina et al. (2005b), which only concentrated on the EU activity (treaties, legislation, and European Court of Justice’s rulings), we develop a set of quantitative indicators measuring the intensity of the legislative activity of the Italian State, the EU and the Italian regions from 1973 to 2005 in a large number of policy categories. By doing so, we seek to answer the following broad questions. Are European and regional legislations substitutes for state laws? To what extent are the competences attributed by the European treaties or the Italian Constitution actually exerted in the various policy areas? Is their exertion consistent with the normative recommendations from the economic literature about their optimum allocation among different levels of government? The main results show that, first, there seems to be a certain substitutability between EU and national legislations (even if not a very strong one), but not between regional and national ones. Second, the EU concentrates its legislative activity mainly in international trade and agriculture, whilst social policy is where the regions and the State (which is also the main actor in foreign policy) are more active. Third, at least two levels of government (in some cases all of them) are significantly involved in the legislative activity in many sectors, even where the rationale for that is, at best, very questionable, indicating that they actually share a larger number of policy tasks than that suggested by the economic theory. It appears therefore that an excessive number of competences are actually shared among different levels of government. From an economic perspective, it may well be recommended that some competences be shared, but only when the balance between scale or spillover effects and heterogeneity of preferences suggests so. When, on the contrary, too many levels of government are involved in a certain policy area, the distinction between their different responsibilities easily becomes unnecessarily blurred. This may not only leads to a slower and inefficient policy-making process, but also risks to make it too complicate to understand for citizens, who, on the contrary, should be able to know who is really responsible for a certain policy when they vote in national,local or European elections or in referenda on national or European constitutional issues.
Resumo:
Nandrolone and other anabolic androgenic steroids (AAS) at elevated concentration can alter the expression and function of neurotransmitter systems and contribute to neuronal cell death. This effect can explain the behavioural changes, drug dependence and neuro degeneration observed in steroid abuser. Nandrolone treatment (10-8M–10-5M) caused a time- and concentration-dependent downregulation of mu opioid receptor (MOPr) transcripts in SH-SY5Y human neuroblastoma cells. This effect was prevented by the androgen receptor (AR) antagonist hydroxyflutamide. Receptor binding assays confirmed a decrease in MOPr of approximately 40% in nandrolonetreated cells. Treatment with actinomycin D (10-5M), a transcription inhibitor, revealed that nandrolone may regulate MOPr mRNA stability. In SH-SY5Y cells transfected with a human MOPr luciferase promoter/reporter construct, nandrolone did not alter the rate of gene transcription. These results suggest that nandrolone may regulate MOPr expression through post-transcriptional mechanisms requiring the AR. Cito-toxicity assays demonstrated a time- and concentration dependent decrease of cells viability in SH-SY5Y cells exposed to steroids (10-6M–10-4M). This toxic effects is independent of activation of AR and sigma-2 receptor. An increased of caspase-3 activity was observed in cells treated with Nandrolone 10-6M for 48h. Collectively, these data support the existence of two cellular mechanisms that might explain the neurological syndromes observed in steroids abuser.
Resumo:
This study provides a comprehensive genetic overview on the endangered Italian wolf population. In particular, it focuses on two research lines. On one hand, we focalised on melanism in wolf in order to isolate a mutation related with black coat colour in canids. With several reported black individuals (an exception at European level), the Italian wolf population constituted a challenging research field posing many unanswered questions. As found in North American wolf, we reported that melanism in the Italian population is caused by a different melanocortin pathway component, the K locus, in which a beta-defensin protein acts as an alternative ligand for the Mc1r. This research project was conducted in collaboration with Prof. Gregory Barsh, Department of Genetics and Paediatrics, Stanford University. On the other hand, we performed analysis on a high number of SNPs thanks to a customized Canine microarray useful to integrate or substitute the STR markers for genotyping individuals and detecting wolf-dog hybrids. Thanks to DNA microchip technology, we obtained an impressive amount of genetic data which provides a solid base for future functional genomic studies. This study was undertaken in collaboration with Prof. Robert K. Wayne, Department of Ecology and Evolutionary Biology, University of California, Los Angeles (UCLA).
Resumo:
We present a non linear technique to invert strong motion records with the aim of obtaining the final slip and rupture velocity distributions on the fault plane. In this thesis, the ground motion simulation is obtained evaluating the representation integral in the frequency. The Green’s tractions are computed using the discrete wave-number integration technique that provides the full wave-field in a 1D layered propagation medium. The representation integral is computed through a finite elements technique, based on a Delaunay’s triangulation on the fault plane. The rupture velocity is defined on a coarser regular grid and rupture times are computed by integration of the eikonal equation. For the inversion, the slip distribution is parameterized by 2D overlapping Gaussian functions, which can easily relate the spectrum of the possible solutions with the minimum resolvable wavelength, related to source-station distribution and data processing. The inverse problem is solved by a two-step procedure aimed at separating the computation of the rupture velocity from the evaluation of the slip distribution, the latter being a linear problem, when the rupture velocity is fixed. The non-linear step is solved by optimization of an L2 misfit function between synthetic and real seismograms, and solution is searched by the use of the Neighbourhood Algorithm. The conjugate gradient method is used to solve the linear step instead. The developed methodology has been applied to the M7.2, Iwate Nairiku Miyagi, Japan, earthquake. The estimated magnitude seismic moment is 2.6326 dyne∙cm that corresponds to a moment magnitude MW 6.9 while the mean the rupture velocity is 2.0 km/s. A large slip patch extends from the hypocenter to the southern shallow part of the fault plane. A second relatively large slip patch is found in the northern shallow part. Finally, we gave a quantitative estimation of errors associates with the parameters.
Resumo:
Here I will focus on three main topics that best address and include the projects I have been working in during my three year PhD period that I have spent in different research laboratories addressing both computationally and practically important problems all related to modern molecular genomics. The first topic is the use of livestock species (pigs) as a model of obesity, a complex human dysfunction. My efforts here concern the detection and annotation of Single Nucleotide Polymorphisms. I developed a pipeline for mining human and porcine sequences. Starting from a set of human genes related with obesity the platform returns a list of annotated porcine SNPs extracted from a new set of potential obesity-genes. 565 of these SNPs were analyzed on an Illumina chip to test the involvement in obesity on a population composed by more than 500 pigs. Results will be discussed. All the computational analysis and experiments were done in collaboration with the Biocomputing group and Dr.Luca Fontanesi, respectively, under the direction of prof. Rita Casadio at the Bologna University, Italy. The second topic concerns developing a methodology, based on Factor Analysis, to simultaneously mine information from different levels of biological organization. With specific test cases we develop models of the complexity of the mRNA-miRNA molecular interaction in brain tumors measured indirectly by microarray and quantitative PCR. This work was done under the supervision of Prof. Christine Nardini, at the “CAS-MPG Partner Institute for Computational Biology” of Shangai, China (co-founded by the Max Planck Society and the Chinese Academy of Sciences jointly) The third topic concerns the development of a new method to overcome the variety of PCR technologies routinely adopted to characterize unknown flanking DNA regions of a viral integration locus of the human genome after clinical gene therapy. This new method is entirely based on next generation sequencing and it reduces the time required to detect insertion sites, decreasing the complexity of the procedure. This work was done in collaboration with the group of Dr. Manfred Schmidt at the Nationales Centrum für Tumorerkrankungen (Heidelberg, Germany) supervised by Dr. Annette Deichmann and Dr. Ali Nowrouzi. Furthermore I add as an Appendix the description of a R package for gene network reconstruction that I helped to develop for scientific usage (http://www.bioconductor.org/help/bioc-views/release/bioc/html/BUS.html).
Resumo:
MFA and LCA methodologies were applied to analyse the anthropogenic aluminium cycle in Italy with focus on historical evolution of stocks and flows of the metal, embodied GHG emissions, and potentials from recycling to provide key features to Italy for prioritizing industrial policy toward low-carbon technologies and materials. Historical trend series were collected from 1947 to 2009 and balanced with data from production, manufacturing and waste management of aluminium-containing products, using a ‘top-down’ approach to quantify the contemporary in-use stock of the metal, and helping to identify ‘applications where aluminium is not yet being recycled to its full potential and to identify present and future recycling flows’. The MFA results were used as a basis for the LCA aimed at evaluating the carbon footprint evolution, from primary and electrical energy, the smelting process and the transportation, embodied in the Italian aluminium. A discussion about how the main factors, according to the Kaya Identity equation, they did influence the Italian GHG emissions pattern over time, and which are the levers to mitigate it, it has been also reported. The contemporary anthropogenic reservoirs of aluminium was estimated at about 320 kg per capita, mainly embedded within the transportation and building and construction sectors. Cumulative in-use stock represents approximately 11 years of supply at current usage rates (about 20 Mt versus 1.7 Mt/year), and it would imply a potential of about 160 Mt of CO2eq emissions savings. A discussion of criticality related to aluminium waste recovery from the transportation and the containers and packaging sectors was also included in the study, providing an example for how MFA and LCA may support decision-making at sectorial or regional level. The research constitutes the first attempt of an integrated approach between MFA and LCA applied to the aluminium cycle in Italy.
Resumo:
Enterobacteriaceae genomes evolve through mutations, rearrangements and horizontal gene transfer (HGT). The latter evolutionary pathway works through the acquisition DNA (GEI) modules of foreign origin that enhances fitness of the host to a given environment. The genome of E. coli IHE3034, a strain isolated from a case of neonatal meningitis, has recently been sequenced and its subsequent sequence analysis has predicted 18 possible GEIs, of which: 8 have not been previously described, 5 fully meet the pathogenic island definition and at least 10 that seem to be of prophagic origin. In order to study the GEI distribution of our reference strain, we screened for the presence 18 GEIs a panel of 132 strains, representative of E. coli diversity. Also, using an inverse nested PCR approach we identified 9 GEI that can form an extrachromosomal circular intermediate (CI) and their respective attachment sites (att). Further, we set up a qPCR approach that allowed us to determine the excision rates of 5 genomic islands in different growth conditions. Four islands, specific for strains appertaining to the sequence type complex 95 (STC95), have been deleted in order to assess their function in a Dictyostelium discoideum grazing assays. Overall, the distribution data presented here indicate that 16 IHE3034 GEIs are more associated to the STC95 strains. Also the functional and genetic characterization has uncovered that GEI 13, 17 and 19 are involved in the resistance to phagocitation by Dictyostelium d thus suggesting a possible role in the adaptation of the pathogen during certain stages of infection.