916 results for Genome-specific Sequence
Abstract:
With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.
Abstract:
Despite the development of novel typing methods based on whole genome sequencing, most laboratories still rely on classical molecular methods for outbreak investigation or surveillance. Reference methods for Clostridium difficile include ribotyping and pulsed-field gel electrophoresis, which are band-comparing methods that are often difficult to establish and that require reference strain collections. Here, we present the double locus sequence typing (DLST) scheme as a tool to analyse C. difficile isolates. Using a collection of clinical C. difficile isolates recovered during a 1-year period, we evaluated the performance of DLST and compared the results to multilocus sequence typing (MLST), a sequence-based method that has been used to study the structure of bacterial populations and highlight major clones. DLST had a higher discriminatory power than MLST (Simpson's index of diversity of 0.979 versus 0.965) and successfully identified all isolates of the study (100% typeability). Previous studies showed that the discriminatory power of ribotyping was comparable to that of MLST; thus, DLST might be more discriminatory than ribotyping. DLST is easy to establish and provides several advantages, including absence of DNA extraction [polymerase chain reaction (PCR) is performed on colonies], no specific instrumentation, low cost and unambiguous definition of types. Moreover, the implementation of a DLST typing scheme on an Internet database, such as that previously done for Staphylococcus aureus and Pseudomonas aeruginosa (http://www.dlst.org), will allow users to easily obtain the DLST type by directly submitting sequencing files and will avoid problems associated with multiple databases.
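The discriminatory power quoted above can be illustrated with a short calculation. The sketch below assumes the Hunter-Gaston form of Simpson's index of diversity, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), which is widely used for typing methods; the abstract does not state which form was applied. The function name and the type labels are hypothetical and are not data from the study.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Hunter-Gaston form: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    type_labels holds one typing result per isolate (e.g. a DLST type).
    D is the probability that two isolates drawn without replacement
    belong to different types.
    """
    counts = Counter(type_labels)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical type assignments for seven isolates (illustration only).
example_types = ["A", "A", "B", "C", "C", "C", "D"]
diversity = simpsons_index_of_diversity(example_types)
print(round(diversity, 4))
```

A value close to 1 means that two randomly chosen isolates are almost always assigned different types, which is why a higher index is read as higher discriminatory power.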
Abstract:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In order to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
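The idea of mapping reads by approximate string matching, and of dividing the reads among workers, can be sketched as follows. This is not BWA's algorithm (BWA relies on a Burrows-Wheeler transform index) and it does not use MPI. It is a naive scan that allows a fixed number of substitutions, plus a round-robin split of the read list standing in for distribution across processes. All names and sequences are hypothetical.

```python
def hamming_within(a, b, max_mismatches):
    """Return True if equal-length strings a and b differ at <= max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def map_read(reference, read, max_mismatches=1):
    """Return every 0-based start where read matches reference with few substitutions."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        if hamming_within(window, read, max_mismatches):
            hits.append(start)
    return hits


def split_reads(reads, n_workers):
    """Round-robin partition of the reads, one chunk per (hypothetical) worker."""
    return [reads[i::n_workers] for i in range(n_workers)]


# Toy reference and reads (illustration only).
reference = "ACGTACGTTAGC"
reads = ["ACGT", "TTAG", "ACGA", "GGGG"]

chunks = split_reads(reads, 2)
# Each chunk could be processed independently; here they are processed in turn.
results = {read: map_read(reference, read) for chunk in chunks for read in chunk}
print(results)
```

Because each read is mapped independently of the others, the chunks need no communication until results are gathered, which is what makes read mapping a natural fit for cluster-level parallelism.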
Abstract:
Genome sequence varies in numerous ways among individuals although the gross architecture is fixed for all humans. Retrotransposons create one of the most abundant classes of structural variants in the human genome and are divided into many families, with certain members in some families, e.g., L1, Alu, SVA, and HERV-K, remaining active for transposition. Along with other types of genomic variants, retrotransposon-derived variants contribute to the whole spectrum of genome variants in humans. With the advancement of sequencing techniques, many human genomes are being sequenced at the individual level, fueling comparative research on these variants among individuals. In this thesis, the evolution and functional impact of structural variations are examined, focusing primarily on retrotransposons in the context of human evolution. The thesis comprises three studies, presented in three data chapters. First, the recent evolution of all human-specific AluYb members, representing the second most active subfamily of Alus, was tracked to identify their source/master copy using a novel approach. All human-specific AluYb elements from the reference genome were extracted and aligned with one another to construct clusters of similar copies, and each cluster was analyzed to infer the evolutionary relationship between its members. The approach resulted in the identification of one major driver copy of all human-specific Yb8 and the source copy of the Yb9 lineage. Three new subfamilies within the AluYb family (Yb8a1, Yb10 and Yb11) were also identified, with Yb11 being the youngest and most polymorphic. Second, an attempt to establish a relation between transposable elements (TEs) and tandem repeats (TRs) was made at a genome-wide scale for the first time. Upon sequence comparison, positional cross-checking and other relevant analyses, it was observed that over 20% of all TRs are derived from TEs.
This result established the first connection between these two types of repetitive elements, and extends our appreciation of the impact of TEs on genomes. Furthermore, only 6% of these TE-derived TRs follow the already postulated initiation and expansion mechanisms, suggesting that the others are likely to follow a yet-unidentified mechanism. Third, by combining multiple computational approaches involving all types of genetic variations published so far, including transposable elements, the first whole genome sequence of the most recent common ancestor of all modern human populations, which diverged into different populations around 125,000-100,000 years ago, was constructed. The study shows that the current reference genome sequence is 8.89 million base pairs larger than our common ancestor’s genome, a difference contributed by a whole spectrum of genetic mechanisms. The use of this ancestral reference genome to facilitate the analysis of personal genomes was demonstrated using an example genome and more insightful recent evolutionary analyses involving the Neanderthal genome. The three data chapters presented in this thesis conclude that tandem repeats and transposable elements are not two entirely distinct, isolated classes of elements, as over 20% of TRs are actually derived from TEs. Certain subfamilies of TEs themselves are still evolving with the generation of newer subfamilies. The evolutionary analyses of all TEs along with other genomic variants helped to construct the genome sequence of the most recent common ancestor of all modern human populations, which provides a better alternative to the human reference genome and can be a useful resource for the study of personal genomics, population genetics, and human and primate evolution.
Abstract:
The complete genome of an Erwinia amylovora bacteriophage, vB_EamM_Ea35-70 (Ea35-70), is 271,084 bp, encodes 318 putative proteins, and contains one tRNA. Comparative analysis with other Myoviridae genomes suggests that Ea35-70 is related to the Phikzlikevirus genus within the family Myoviridae, since 26% of Ea35-70 proteins share homology to proteins in Pseudomonas phage φKZ.
Abstract:
The centromere is the chromosomal region where the kinetochore assembles during mitosis. Unlike certain gene features, the centromeric sequence is neither conserved between species nor sufficient for centromere function. It is therefore well accepted in the literature that the centromere is regulated epigenetically by a histone H3 variant, CENP-A. KNL-2, also known as M18BP1, and its partners Mis18α and Mis18β are proteins essential for the incorporation of newly synthesized CENP-A at centromeres. Experimental evidence shows that KNL-2, which has a DNA-binding domain called Myb, is the most upstream protein for CENP-A incorporation at centromeres in G1 phase. However, its function in the process of CENP-A incorporation at centromeres is not well understood, and not all of its binding partners are known. New binding partners of KNL-2 were identified by immunoprecipitation experiments followed by mass spectrometry analysis. A role in the incorporation of newly synthesized CENP-A at centromeres was attributed to MgcRacGAP, one of the 60 proteins identified by the assay. MgcRacGAP, together with the protein ECT-2 (GEF) and the small GTPase Cdc42, was shown to be required for the stability of CENP-A incorporated at centromeres. These observations led to the identification of a third step at the molecular level in the incorporation of newly synthesized CENP-A in G1 phase: the stabilization of newly incorporated CENP-A at centromeres. This step is important for maintaining centromere identity at each cell division. To characterize the function of KNL-2 during the incorporation of newly synthesized CENP-A at centromeres, a high-resolution microscopy technique coupled with image quantification was used.
The results show that KNL-2 recruitment to the centromere is rapid, occurring about 5 minutes after mitotic exit. In addition, the structure of the Myb domain of KNL-2 from the nematode C. elegans was solved by NMR and shows a helix-turn-helix motif, a structure known for DNA-binding domains of the Myb family. Moreover, the human (HsMyb) and C. elegans (CeMyb) Myb domains bind DNA in vitro, but no sequence is specifically recognized by these domains. However, it was possible to show that both domains preferentially bind CENP-A-YFP chromatin compared with H2B-GFP chromatin in a modified SIMPull assay under the TIRF microscope. Thus, the Myb domain of KNL-2 is sufficient to specifically recognize centromeric chromatin. Finally, the element recognized by the Myb domains in vitro has potentially been identified. Indeed, the HsMyb and CeMyb domains were shown to bind single-stranded DNA in vitro. Furthermore, the HsMyb and CeMyb domains do not colocalize with CENP-A when expressed in HeLa cells, but rather with PML nuclear bodies, nuclear structures composed of RNA. Therefore, by potentially binding centromeric transcripts, the Myb domains of KNL-2 could specify the incorporation of newly synthesized CENP-A exclusively at centromeric regions.
A genetic linkage map of microsatellite, gene-specific and morphological markers in diploid Fragaria
Abstract:
Diploid Fragaria provide a potential model for genomic studies in the Rosaceae. To develop a genetic linkage map of diploid Fragaria, we scored 78 markers (68 microsatellites, one sequence-characterised amplified region, six gene-specific markers and three morphological traits) in an interspecific F2 population of 94 plants generated from a cross of F. vesca f. semperflorens × F. nubicola. Co-segregation analysis arranged 76 markers into seven discrete linkage groups covering 448 cM, with linkage group sizes ranging from 100.3 cM to 22.9 cM. Marker coverage was generally good; however, some clustering of markers was observed on six of the seven linkage groups. Segregation distortion was observed at a high proportion of loci (54%), which could reflect the interspecific nature of the progeny and, in some cases, the self-incompatibility of F. nubicola. Such distortion may also account for some of the marker clustering observed in the map. One of the morphological markers, pale-green leaf (pg), has not previously been mapped in Fragaria and was located to the mid-point of linkage group VI. The transferable nature of the markers used in this study means that the map will be ideal for use as a framework for additional marker incorporation aimed at enhancing and resolving map coverage of the diploid Fragaria genome. The map also provides a sound basis for linkage map transfer to the cultivated octoploid strawberry.
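Segregation distortion at a locus is commonly assessed with a chi-square goodness-of-fit test against the expected Mendelian ratio. The sketch below assumes a codominant marker in an F2 population with an expected 1:2:1 genotype ratio and a 5% significance threshold; the abstract does not state which test or threshold was used. The genotype counts are hypothetical, chosen only so that they sum to the 94 plants mentioned above.

```python
def chi_square_f2(observed_counts, expected_ratio=(1, 2, 1)):
    """Chi-square goodness-of-fit statistic for genotype counts at one locus.

    observed_counts: counts of the genotype classes, in the same order as
    expected_ratio (for a codominant F2 marker: AA, AB, BB).
    """
    total = sum(observed_counts)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))


# Critical value of the chi-square distribution, 2 degrees of freedom, alpha = 0.05.
CRITICAL_DF2_ALPHA_05 = 5.991

# Hypothetical genotype counts for one marker in 94 F2 plants (illustration only).
hypothetical_counts = (35, 45, 14)
stat = chi_square_f2(hypothetical_counts)
distorted = stat > CRITICAL_DF2_ALPHA_05
print(round(stat, 3), distorted)
```

Dominant markers would instead be tested against a 3:1 ratio with one degree of freedom, which the function supports by passing `expected_ratio=(3, 1)` and comparing against the matching critical value.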
Abstract:
A recently emerging bleeding canker disease, caused by Pseudomonas syringae pathovar aesculi (Pae), is threatening European horse chestnut in northwest Europe. Very little is known about the origin and biology of this new disease. We used the nucleotide sequences of seven commonly used marker genes to investigate the phylogeny of three strains isolated recently from bleeding stem cankers on European horse chestnut in Britain (E-Pae). On the basis of these sequences alone, the E-Pae strains were identical to the Pae type-strain (I-Pae), isolated from leaf spots on Indian horse chestnut in India in 1969. The phylogenetic analyses also showed that Pae belongs to a distinct clade of P. syringae pathovars adapted to woody hosts. We generated genome-wide Illumina sequence data from the three E-Pae strains and one strain of I-Pae. Comparative genomic analyses revealed pathovar-specific genomic regions in Pae potentially implicated in virulence on a tree host, including genes for the catabolism of plant-derived aromatic compounds and enterobactin synthesis. Several gene clusters displayed intra-pathovar variation, including those encoding type IV secretion, a novel fatty acid biosynthesis pathway and a sucrose uptake pathway. Rates of single nucleotide polymorphisms in the four Pae genomes indicate that the three E-Pae strains diverged from each other much more recently than they diverged from I-Pae. The very low genetic diversity among the three geographically distinct E-Pae strains suggests that they originate from a single, recent introduction into Britain, thus highlighting the serious environmental risks posed by the spread of an exotic plant pathogenic bacterium to a new geographic location. The genomic regions in Pae that are absent from other P. syringae pathovars that infect herbaceous hosts may represent candidate genetic adaptations to infection of the woody parts of the tree.
Abstract:
An apple rootstock progeny raised from the cross between the very dwarfing ‘M.27’ and the more vigorous ‘M.116’ (‘M.M.106’ × ‘M.27’) was used for the construction of a linkage map comprising a total of 324 loci: 252 previously mapped SSRs, 71 newly characterised or previously unmapped SSR loci (including 36 amplified by 33 out of the 35 novel markers reported here), and the self-incompatibility locus. The map spanned the 17 linkage groups (LG) expected for apple covering a genetic distance of 1,229.5 cM, an estimated 91% of the Malus genome. Linkage groups were well populated and, although marker density ranged from 2.3 to 6.2 cM/SSR, just 15 gaps of more than 15 cM were observed. Moreover, only 17.5% of markers displayed segregation distortion and, unsurprisingly in a semi-compatible backcross, distortion was particularly pronounced surrounding the self-incompatibility locus (S) at the bottom of LG17. DNA sequences of 273 SSR markers and the S locus, representing a total of 314 loci in this investigation, were used to anchor to the ‘Golden Delicious’ genome sequence. More than 260 of these loci were located on the expected pseudo-chromosome on the ‘Golden Delicious’ genome or on its homeologous pseudo-chromosome. In total, 282.4 Mbp of sequence from 142 genome sequence scaffolds of the Malus genome were anchored to the ‘M.27’ × ‘M.116’ map, providing an interface between the marker data and the underlying genome sequence. This will be exploited for the identification of genes responsible for traits of agronomic importance such as dwarfing and water use efficiency.
Abstract:
Pseudomonas corrugata was first described as the causal agent of a tomato disease called 'pith necrosis' yet it is considered as a biological resource in various fields such as biocontrol of plant diseases and production of industrially promising microbial biopolymers (mcl-PHA). Here we report the first draft genome sequence of this species.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)