55 resultados para Y-specific sequence


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Intelligent agents are an advanced technology utilized in Web Intelligence. When searching information from a distributed Web environment, information is retrieved by multi-agents on the client site and fused on the broker site. The current information fusion techniques rely on cooperation of agents to provide statistics. Such techniques are computationally expensive and unrealistic in the real world. In this paper, we introduce a model that uses a world ontology constructed from the Dewey Decimal Classification to acquire user profiles. By search using specific and exhaustive user profiles, information fusion techniques no longer rely on the statistics provided by agents. The model has been successfully evaluated using the large INEX data set simulating the distributed Web environment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Light plays a unique role for plants as it is both a source of energy for growth and a signal for development. Light captured by the pigments in the light harvesting complexes is used to drive the synthesis of the chemical energy required for carbon assimilation. The light perceived by photoreceptors activates effectors, such as transcription factors (TFs), which modulate the expression of light-responsive genes. Recently, it has been speculated that increasing the photosynthetic rate could further improve the yield potential of three carbon (C3) crops such as wheat. However, little is currently known about the transcriptional regulation of photosynthesis genes, particularly in crop species. Nuclear factor Y (NF-Y) TF is a functionally diverse regulator of growth and development in the model plant species, with demonstrated roles in embryo development, stress response, flowering time and chloroplast biogenesis. Furthermore, a light-responsive NF-Y binding site (CCAAT-box) is present in the promoter of a spinach photosynthesis gene. As photosynthesis genes are co-regulated by light and co-regulated genes typically have similar regulatory elements in their promoters, it seems likely that other photosynthesis genes would also have light-responsive CCAAT-boxes. This provided the impetus to investigate the NF-Y TF in bread wheat. This thesis is focussed on wheat NF-Y members that have roles in light-mediated gene regulation with an emphasis on their involvement in the regulation of photosynthesis genes. NF-Y is a heterotrimeric complex, comprised of the three subunits NF-YA, NF-YB and NF-YC. Unlike the mammalian and yeast counterparts, each of the three subunits is encoded by multiple genes in Arabidopsis. The initial step taken in this study was the identification of the wheat NF-Y family (Chapter 3). A search of the current wheat nucleotide sequence databases identified 37 NF-Y genes (10 NF-YA, 11 NF-YB, 14 NF-YC & 2 Dr1). Phylogenetic analysis revealed that each of the three wheat NF-Y (TaNF-Y) subunit families could be divided into 4-5 clades based on their conserved core regions. Outside of the core regions, eleven motifs were identified to be conserved between Arabidopsis, rice and wheat NF-Y subunit members. The expression profiles of TaNF-Y genes were constructed using quantitative real-time polymerase chain reaction (RT-PCR). Some TaNF-Y subunit members had little variation in their transcript levels among the organs, while others displayed organ-predominant expression profiles, including those expressed mainly in the photosynthetic organs. To investigate their potential role in light-mediated gene regulation, the light responsiveness of the TaNF-Y genes were examined (Chapters 4 and 5). Two TaNF-YB and five TaNF-YC members were markedly upregulated by light in both the wheat leaves and seedling shoots. To identify the potential target genes of the light-upregulated NF-Y subunit members, a gene expression correlation analysis was conducted using publically available Affymetrix Wheat Genome Array datasets. This analysis revealed that the transcript expression levels of TaNF-YB3 and TaNF-YC11 were significantly correlated with those of photosynthesis genes. These correlated express profiles were also observed in the quantitative RT-PCR dataset from wheat plants grown under light and dark conditions. Sequence analysis of the promoters of these wheat photosynthesis genes revealed that they were enriched with potential NF-Y binding sites (CCAAT-box). The potential role of TaNF-YB3 in the regulation of photosynthetic genes was further investigated using a transgenic approach (Chapter 5). Transgenic wheat lines constitutively expressing TaNF-YB3 were found to have significantly increased expression levels of photosynthesis genes, including those encoding light harvesting chlorophyll a/b-binding proteins, photosystem I reaction centre subunits, a chloroplast ATP synthase subunit and glutamyl-tRNA reductase (GluTR). GluTR is a rate-limiting enzyme in the chlorophyll biosynthesis pathway. In association with the increased expression of the photosynthesis genes, the transgenic lines had a higher leaf chlorophyll content, increased photosynthetic rate and had a more rapid early growth rate compared to the wild-type wheat. In addition to its role in the regulation of photosynthesis genes, TaNF-YB3 overexpression lines flower on average 2-days earlier than the wild-type (Chapter 6). Quantitative RT-PCR analysis showed that there was a 13-fold increase in the expression level of the floral integrator, TaFT. The transcript levels of other downstream genes (TaFT2 and TaVRN1) were also increased in the transgenic lines. Furthermore, the transcript levels of TaNF-YB3 were significantly correlated with those of constans (CO), constans-like (COL) and timing of chlorophyll a/b-binding (CAB) expression 1 [TOC1; (CCT)] domain-containing proteins known to be involved in the regulation of flowering time. To summarise the key findings of this study, 37 NF-Y genes were identified in the crop species wheat. An in depth analysis of TaNF-Y gene expression profiles revealed that the potential role of some light-upregulated members was in the regulation of photosynthetic genes. The involvement of TaNF-YB3 in the regulation of photosynthesis genes was supported by data obtained from transgenic wheat lines with increased constitutive expression of TaNF-YB3. The overexpression of TaNF-YB3 in the transgenic lines revealed this NF-YB member is also involved in the fine-tuning of flowering time. These data suggest that the NF-Y TF plays an important role in light-mediated gene regulation in wheat.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The major limitation of current typing methods for Streptococcus pyogenes, such as emm sequence typing and T typing, is that these are based on regions subject to considerable selective pressure. Multilocus sequence typing (MLST) is a better indicator of the genetic backbone of a strain but is not widely used due to high costs. The objective of this study was to develop a robust and cost-effective alternative to S. pyogenes MLST. A 10-member single nucleotide polymorphism (SNP) set that provides a Simpson’s Index of Diversity (D) of 0.99 with respect to the S. pyogenes MLST database was derived. A typing format involving high-resolution melting (HRM) analysis of small fragments nucleated by each of the resolution-optimized SNPs was developed. The fragments were 59–119 bp in size and, based on differences in G+C content, were predicted to generate three to six resolvable HRM curves. The combination of curves across each of the 10 fragments can be used to generate a melt type (MelT) for each sequence type (ST). The 525 STs currently in the S. pyogenes MLST database are predicted to resolve into 298 distinct MelTs and the method is calculated to provide a D of 0.996 against the MLST database. The MelTs are concordant with the S. pyogenes population structure. To validate the method we examined clinical isolates of S. pyogenes of 70 STs. Curves were generated as predicted by G+C content discriminating the 70 STs into 65 distinct MelTs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Immunotherapy is a promising new treatment for patients with advanced prostate and ovarian cancer, but its application is limited by the lack of suitable target antigens that are recognized by CD8+ cytotoxic T lymphocytes (CTL). Human kallikrein 4 (KLK4) is a member of the kallikrein family of serine proteases that is significantly overexpressed in malignant versus healthy prostate and ovarian tissue, making it an attractive target for immunotherapy. We identified a naturally processed, HLA-A*0201-restricted peptide epitope within the signal sequence region of KLK4 that induced CTL responses in vitro in most healthy donors and prostate cancer patients tested. These CTL lysed HLA-A*0201+ KLK4 + cell lines and KLK4 mRNA-transfected monocyte-derived dendritic cells. CTL specific for the HLA-A*0201-restricted KLK4 peptide were more readily expanded to a higher frequency in vitro compared to the known HLA-A*0201-restricted epitopes from prostate cancer antigens; prostate-specific antigen (PSA), prostate-specific membrane antigen (PSMA) and prostatic acid phosphatase (PAP). These data demonstrate that KLK4 is an immunogenic molecule capable of inducing CTL responses and identify it as an attractive target for prostate and ovarian cancer immunotherapy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E.coli ��70 promoters, found support at the 0.1 significance level for our hypothesis | that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to �70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E.coli transcription, we discovered a number of potentially useful features { some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption, where promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E.coli �70 promoters returned a p-value of 0.072, which at 0.1 significance level suggested support for our (alternative) hypothesis; albeit this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data will become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in `moderately'-conserved transcription factor binding sites as represented by our E.coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific difierences, especially between pathogenic and non-pathogenic strains. Such difierences were made clear through interactive visualisations using the TRNDifi software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as `regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogentic trees convey information regarding changes in gene repertoire, which we might regard being analogous to `hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to `software'. In this context, we explored the `pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks, is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the `core-regulatory-set', and interactions found only in a subset of genomes explored as the `sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species level difierences are seen at the sub-regulatory-set level; for example the known virulence factors, YbtA and PchR were found in Y.pestis and P.aerguinosa respectively, but were not present in both E.coli and B.subtilis. Such factors and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogenic specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The European Early Lung Cancer (EUELC) project aims to determine if specific genetic alterations occurring in lung carcinogenesis are detectable in the respiratory epithelium. In order to pursue this objective, nonsmall cell lung cancer (NSCLC) patients with a very high risk of developing progressive lung cancer were recruited from 12 centres in eight European countries: France, Germany, southern Ireland, Italy, the Netherlands, Poland, Spain and the UK. In addition, NSCLC patients were followed up every 6 months for 36 months. A European Bronchial Tissue Bank was set up at the University of Liverpool (Liverpool, UK) to optimise the use of biological specimens. The molecular - pathological investigations were subdivided into specific work packages that were delivered by EUELC Partners. The work packages encompassed mutational analysis, genetic instability, methylation profiling, expression profiling utilising immunohistochemistry and chip-based technologies, as well as in-depth analysis of FHIT and RARβ genes, the telomerase catalytic subunit hTERT and genotyping of susceptibility genes in specific pathways. The EUELC project engendered a tremendous collaborative effort, and it enabled the EUELC Partners to establish protocols for assessing molecular biomarkers in early lung cancer with the view to using such biomarkers for early diagnosis and as intermediate end-points in future chemopreventive programmes. Copyright©ERS Journals Ltd 2009.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

High-resolution, high-contrast, three-dimensional images of live cell and tissue architecture can be obtained using second harmonic generation (SHG), which comprises non-absorptive frequency changes in an excitation laser line. SHG does not require any exogenous antibody or fluorophore labeling, and can generate images from unstained sections of several key endogenous biomolecules, in a wide variety of species and from different types of processed tissue. Here, we examined normal control human skin sections and human burn scar tissues using SHG on a multi-photon microscope (MPM). Examination and comparison of normal human skin and burn scar tissue demonstrated a clear arrangement of fibers in the dermis, similar to dermal collagen fiber signals. Fluorescence-staining confirmed the MPM-SHG collagen colocalization with antibody staining for dermal collagen type-I but not fibronectin or elastin. Furthermore, we were able to detect collagen MPM-SHG signal in human frozen sections as well as in unstained paraffin embedded tissue sections that were then compared with hematoxylin and eosin staining in the identical sections. This same approach was also successful in localizing collagen in porcine and ovine skin samples, and may be particularly important when species-specific antibodies may not be available. Collectively, our results demonstrate that MPM SHG-detection is a useful tool for high resolution examination of collagen architecture in both normal and wounded human, porcine and ovine dermal tissue.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Chlamydia pecorum is a significant pathogen of domestic livestock and wildlife. We have developed a C. pecorum-specific multilocus sequence analysis (MLSA) scheme to examine the genetic diversity of and relationships between Australian sheep, cattle, and koala isolates. An MLSA of seven concatenated housekeeping gene fragments was performed using 35 isolates, including 18 livestock isolates (11 Australian sheep, one Australian cow, and six U.S. livestock isolates) and 17 Australian koala isolates. Phylogenetic analyses showed that the koala isolates formed a distinct clade, with limited clustering with C. pecorum isolates from Australian sheep. We identified 11 MLSA sequence types (STs) among Australian C. pecorum isolates, 10 of them novel, with koala and sheep sharing at least one identical ST (designated ST2013Aa). ST23, previously identified in global C. pecorum livestock isolates, was observed here in a subset of Australian bovine and sheep isolates. Most notably, ST23 was found in association with multiple disease states and hosts, providing insights into the transmission of this pathogen between livestock hosts. The complexity of the epidemiology of this disease was further highlighted by the observation that at least two examples of sheep were infected with different C. pecorum STs in the eyes and gastrointestinal tract. We have demonstrated the feasibility of our MLSA scheme for understanding the host relationship that exists between Australian C. pecorum strains and provide the first molecular epidemiological data on infections in Australian livestock hosts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The effect of two different DNA minor groove binding molecules, Hoechst 33258 and distamycin A, on the binding kinetics of NF-κB p50 to three different specific DNA sequences was studied at various salt concentrations. Distamycin A was shown to significantly increase the dissociation rate constant of p50 from the sequences PRDII (5′-GGGAAATTCC-3′) and Ig-κ B (5′-GGGACTTTCC-3′) but had a negligible effect on the dissociation from the palindromic target-κB binding site (5′-GGGAATTCCC-3′). By comparison, the effect of Hoechst 33258 on binding of p50 to each sequence was found to be minimal. The dissociation rates for the protein–DNA complexes increased at higher potassium chloride concentrations for the PRDII and Ig-κB binding motifs and this effect was magnified by distamycin A. In contrast, p50 bound to the palindromic target-κB site with a much higher intrinsic affinity and exhibited a significantly reduced salt dependence of binding over the ionic strength range studied, retaining a KD of less than 10 pM at 150 mM KCl. Our results demonstrate that the DNA binding kinetics of p50 and their salt dependence is strongly sequence-dependent and, in addition, that the binding of p50 to DNA can be influenced by the addition of minor groove-binding drugs in a sequence-dependent manner.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background The majority of introns in gene transcripts are found within the coding sequences (CDSs). A small but significant fraction of introns are also found to reside within the untranslated regions (5′UTRs and 3′UTRs) of expressed sequences. Alignment of the whole genome and expressed sequence tags (ESTs) of the model plant Arabidopsis thaliana has identified introns residing in both coding and non-coding regions of the genome. Results A bioinformatic analysis revealed some interesting observations: (1) the density of introns in 5′UTRs is similar to that in CDSs but much higher than that in 3′UTRs; (2) the 5′UTR introns are preferentially located close to the initiating ATG codon; (3) introns in the 5′UTRs are, on average, longer than introns in the CDSs and 3′UTRs; and (4) 5′UTR introns have a different nucleotide composition to that of CDs and 3′UTR introns. Furthermore, we show that the 5′UTR intron of the A. thaliana EFIα-A3 gene affects the gene expression and the size of the 5′UTR intron influences the level of gene expression. Conclusion Introns within the 5′UTR show specific features that distinguish them from introns that reside within the coding sequence and the 3′UTR. In the EFIα-A3 gene, the presence of a long intron in the 5′UTR is sufficient to enhance gene expression in plants in a size dependent manner.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrates the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We report a high-quality draft genome sequence of the domesticated apple (Malus × domestica). We show that a relatively recent (>50 million years ago) genome-wide duplication (GWD) has resulted in the transition from nine ancestral chromosomes to 17 chromosomes in the Pyreae. Traces of older GWDs partly support the monophyly of the ancestral paleohexaploidy of eudicots. Phylogenetic reconstruction of Pyreae and the genus Malus, relative to major Rosaceae taxa, identified the progenitor of the cultivated apple as M. sieversii. Expansion of gene families reported to be involved in fruit development may explain formation of the pome, a Pyreae-specific false fruit that develops by proliferation of the basal part of the sepals, the receptacle. In apple, a subclade of MADS-box genes, normally involved in flower and fruit development, is expanded to include 15 members, as are other gene families involved in Rosaceae-specific metabolism, such as transport and assimilation of sorbitol.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Over the past decade the mitochondrial (mt) genome has become the most widely used genomic resource available for systematic entomology. While the availability of other types of ‘–omics’ data – in particular transcriptomes – is increasing rapidly, mt genomes are still vastly cheaper to sequence and are far less demanding of high quality templates. Furthermore, almost all other ‘–omics’ approaches also sequence the mt genome, and so it can form a bridge between legacy and contemporary datasets. Mitochondrial genomes have now been sequenced for all insect orders, and in many instances representatives of each major lineage within orders (suborders, series or superfamilies depending on the group). They have also been applied to systematic questions at all taxonomic scales from resolving interordinal relationships (e.g. Cameron et al., 2009; Wan et al., 2012; Wang et al., 2012), through many intraordinal (e.g. Dowton et al., 2009; Timmermans et al., 2010; Zhao et al. 2013a) and family-level studies (e.g. Nelson et al., 2012; Zhao et al., 2013b) to population/biogeographic studies (e.g. Ma et al., 2012). Methodological issues around the use of mt genomes in insect phylogenetic analyses and the empirical results found to date have recently been reviewed by Cameron (2014), yet the technical aspects of sequencing and annotating mt genomes were not covered. Most papers which generate new mt genome report their methods in a simplified form which can be difficult to replicate without specific knowledge of the field. Published studies utilize a sufficiently wide range of approaches, usually without justification for the one chosen, that confusion about commonly used jargon such as ‘long PCR’ and ‘primer walking’ could be a serious barrier to entry. Furthermore, sequenced mt genomes have been annotated (gene locations defined) to wildly varying standards and improving data quality through consistent annotation procedures will benefit all downstream users of these datasets. The aims of this review are therefore to: 1. Describe in detail the various sequencing methods used on insect mt genomes; 2. Explore the strengths/weakness of different approaches; 3. Outline the procedures and software used for insect mt genome annotation, and; 4. Highlight quality control steps used for new annotations, and to improve the re-annotation of previously sequenced mt genomes used in systematic or comparative research.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Escherichia coli ST131 is now recognised as a leading contributor to urinary tract and bloodstream infections in both community and clinical settings. Here we present the complete, annotated genome of E. coli EC958, which was isolated from the urine of a patient presenting with a urinary tract infection in the Northwest region of England and represents the most well characterised ST131 strain. Sequencing was carried out using the Pacific Biosciences platform, which provided sufficient depth and read-length to produce a complete genome without the need for other technologies. The discovery of spurious contigs within the assembly that correspond to site-specific inversions in the tail fibre regions of prophages demonstrates the potential for this technology to reveal dynamic evolutionary mechanisms. E. coli EC958 belongs to the major subgroup of ST131 strains that produce the CTX-M-15 extended spectrum β-lactamase, are fluoroquinolone resistant and encode the fimH30 type 1 fimbrial adhesin. This subgroup includes the Indian strain NA114 and the North American strain JJ1886. A comparison of the genomes of EC958, JJ1886 and NA114 revealed that differences in the arrangement of genomic islands, prophages and other repetitive elements in the NA114 genome are not biologically relevant and are due to misassembly. The availability of a high quality uropathogenic E. coli ST131 genome provides a reference for understanding this multidrug resistant pathogen and will facilitate novel functional, comparative and clinical studies of the E. coli ST131 clonal lineage.