8 results for Complete genome sequencing

in Helda - Digital Repository of University of Helsinki


Relevance:

100.00%

Publisher:

Abstract:

Bipolar disorder (BP) is a complex psychiatric disorder characterized by episodes of mania and depression. BP affects approximately 1% of the world’s population and shows no difference in lifetime prevalence between males and females. BP arises from complex interactions among genetic, developmental and environmental factors, and it is likely that several predisposing genes are involved in BP. The genetic background of BP is still poorly understood, although intensive and long-lasting research has identified several chromosomal regions and genes involved in susceptibility to BP. This thesis work aims to identify the genetic variants that influence bipolar disorder in the Finnish population by candidate gene and genome-wide linkage analyses in families with many BP cases. In addition to diagnosis-based phenotypes, neuropsychological traits that can be seen as potential endophenotypes or intermediate traits for BP were analyzed. In the first part of the thesis, we examined the role of the allelic variants of the TSNAX/DISC1 gene cluster to psychotic and bipolar spectrum disorders and found association of distinct allelic haplotypes with these two groups of disorders. The haplotype at the 5’ end of the Disrupted-in-Schizophrenia-1 gene (DISC1) was over-transmitted to males with psychotic disorder (p = 0.008; for an extended haplotype p = 0.0007 with both genders), whereas haplotypes at the 3’ end of DISC1 associated with bipolar spectrum disorder (p = 0.0002; for an extended haplotype p = 0.0001). The variants of these haplotypes also showed association with different cognitive traits. The haplotypes at the 5’ end associated with perseverations and auditory attention, while the variants at the 3’ end associated with several cognitive traits including verbal fluency and psychomotor processing speed. 
Second, in our complete set of BP families with 723 individuals we studied six functional candidate genes from three distinct signalling systems: serotonin-related genes (SLC6A4 and TPH2), BDNF-related genes (BDNF, CREB1 and NTRK2) and one gene related to the inflammation and cytokine system (P2RX7). We replicated association of the functional variant Val66Met of BDNF with BP and better performance in retention. The variants at the 5’ end of SLC6A4 also showed some evidence of association among males (p = 0.004), but the widely studied functional variants did not yield any significant results. A protective four-variant haplotype on P2RX7 showed evidence of association with BP and executive functions: semantic and phonemic fluency (p = 0.006 and p = 0.0003, respectively). Third, we analyzed 23 bipolar families originating from the North-Eastern region of Finland. A genome-wide scan was performed using the 6K single nucleotide polymorphism (SNP) array. We identified susceptibility loci on chromosome 7q31 with a LOD score of 3.20 and on chromosome 9p13.1 with a LOD score of 4.02. We followed up both linkage findings in the complete set of 179 Finnish bipolar families. The finding on chromosome 9p13 was supported (maximum LOD score of 3.02), but the susceptibility gene itself remains unidentified. In the fourth part of the thesis, we tested the role of the allelic variants that have been associated with bipolar disorder in recent genome-wide association studies (GWAS). We could confirm findings for the DFNB31, SORCS2, SLC39A3, and DGKH genes. The best signal in this study came from DFNB31, which remained significant after correction for multiple testing. Two variants of SORCS2 were allelic replications and presented the same signal as the haplotype analysis. However, no association was detected with the PALB2 gene, which was the most significantly associated region in the previous GWAS. 
Our results indicate that BP is heterogeneous and its genetic background may accordingly vary in different populations. In order to fully understand the allelic heterogeneity that underlies common diseases such as BP, complete genome sequencing for many individuals with and without the disease is required. Identification of the specific risk variants will help us better understand the pathophysiology underlying BP and will lead to the development of treatments with specific biochemical targets. In addition, it will further facilitate the identification of environmental factors that alter risk, which will potentially provide improved occupational, social and psychological advice for individuals with high risk of BP.

Relevance:

90.00%

Publisher:

Abstract:

Growth is a fundamental aspect of life cycle of all organisms. Body size varies highly in most animal groups, such as mammals. Moreover, growth of a multicellular organism is not uniform enlargement of size, but different body parts and organs grow to their characteristic sizes at different times. Currently very little is known about the molecular mechanisms governing this organ-specific growth. The genome sequencing projects have provided complete genomic DNA sequences of several species over the past decade. The amount of genomic sequence information, including sequence variants within species, is constantly increasing. Based on the universal genetic code, we can make sense of this sequence information as far as it codes proteins. However, less is known about the molecular mechanisms that control expression of genes, and about the variations in gene expression that underlie many pathological states in humans. This is caused in part by lack of information about the second genetic code that consists of the binding specificities of transcription factors and the combinatorial code by which transcription factor binding sites are assembled to form tissue-specific and/or ligand-regulated enhancer elements. This thesis presents a high-throughput assay for identification of transcription factor binding specificities, which were then used to measure the DNA binding profiles of transcription factors involved in growth control. We developed ‘enhancer element locator’, a computational tool, which can be used to predict functional enhancer elements. A genome-wide prediction of human and mouse enhancer elements generated a large database of enhancer elements. This database can be used to identify target genes of signaling pathways, and to predict activated transcription factors based on changes in gene expression. 
Predictions validated in transgenic mouse embryos revealed the presence of multiple tissue-specific enhancers in the mouse c-Myc and N-Myc genes, which has implications for organ-specific growth control and the tumor type specificity of oncogenes. Furthermore, we were able to map a single-nucleotide variant that confers susceptibility to colorectal cancer to an enhancer element, and we propose a mechanism by which this SNP might be involved in the development of colorectal cancer.

Relevance:

80.00%

Publisher:

Abstract:

The first glycyl radical in an enzyme was described 20 years ago and since then the family of glycyl radical enzymes (GREs) has expanded to include enzymes catalysing five chemically distinct reactions. The type enzymes of the family, anaerobic ribonucleotide reductase (RNRIII) and pyruvate formate lyase (PFL), had been studied long before it was known that they are GREs. Spectroscopic measurements on the radical and an observation that exposure to oxygen irreversibly inactivates the enzymes by cleavage of the protein proved that the radical is located on a particular glycine residue, close to the C-terminus of the protein. Both anaerobic RNRIII and PFL are important for many anaerobic and facultatively anaerobic bacteria, as RNRIII is responsible for the synthesis of DNA precursors and PFL catalyses a key metabolic reaction in glycolysis. The crystal structures of both were solved in 1999 and they revealed that, although the enzymes do not share significant sequence identity, they share a similar structure: the radical site and residues necessary for catalysis are buried inside a ten-stranded α/β-barrel. GREs are synthesised in an inactive form and are post-translationally activated by an activating enzyme which uses S-adenosyl methionine and an iron-sulphur cluster to generate the radical. One of the goals of this thesis work was to crystallise the activating enzyme of PFL. This task is challenging as, like GREs, the activating component is inactivated by oxygen. The experiments were therefore carried out in an oxygen-free atmosphere. This is the first report of a crystalline GRE activating enzyme. Recently several new GREs have been characterised, all sharing sequence similarity to PFL but not to RNRIII. Also, the genome sequencing projects have identified many PFL-like GREs of unknown function, usually annotated as PFLs. 
In the present thesis I describe the grouping of these PFL family enzymes based on the sequence similarity and analyse the conservation patterns when compared to the structure of E. coli PFL. Based on this information an activation route is proposed. I also report a crystal structure of one of the PFL-like enzymes with unknown function, PFL2 from Archaeoglobus fulgidus. As A. fulgidus is a hyperthermophilic organism, possible mechanisms stabilising the structure are discussed. The organisation of an active site of PFL2 suggests that the enzyme may be a dehydratase. Keywords: glycyl radical, enzyme, pyruvate formate lyase, x-ray crystallography, bioinformatics

Relevance:

80.00%

Publisher:

Abstract:

Extraintestinal pathogenic Escherichia coli (ExPEC) represent a diverse group of strains of E. coli, which infect extraintestinal sites, such as the urinary tract, the bloodstream, the meninges, the peritoneal cavity, and the lungs. Urinary tract infections (UTIs) caused by uropathogenic E. coli (UPEC), the major subgroup of ExPEC, are among the most prevalent microbial diseases worldwide and a substantial burden for public health care systems. UTIs are responsible for serious morbidity and mortality in the elderly, in young children, and in immune-compromised and hospitalized patients. ExPEC strains are different, both from genetic and clinical perspectives, from commensal E. coli strains belonging to the normal intestinal flora and from intestinal pathogenic E. coli strains causing diarrhea. ExPEC strains are characterized by a broad range of alternative virulence factors, such as adhesins, toxins, and iron accumulation systems. Unlike diarrheagenic E. coli, whose distinctive virulence determinants evoke characteristic diarrheagenic symptoms and signs, ExPEC strains are exceedingly heterogeneous and are not known to possess any specific virulence factor, or set of factors, that is obligatory for the infection of a certain extraintestinal site (e.g. the urinary tract). The ExPEC genomes are highly diverse mosaic structures in permanent flux. These strains have obtained a significant amount of DNA (predicted to be up to 25% of the genome) through acquisition of foreign DNA from diverse related or non-related donor species by lateral transfer of mobile genetic elements, including pathogenicity islands (PAIs), plasmids, phages, transposons, and insertion elements. The ability of ExPEC strains to cause disease is mainly derived from this horizontally acquired gene pool; the exogenous DNA facilitates rapid adaptation of the pathogen to changing conditions and hence extends the spectrum of sites that can be infected. 
However, neither the amount of unique DNA in different ExPEC strains (or UPEC strains) nor the mechanisms lying behind the observed genomic mobility are known. Due to this extreme heterogeneity of the UPEC and ExPEC populations in general, the routine surveillance of ExPEC is exceedingly difficult. In this project, we presented a novel virulence gene algorithm (VGA) for the estimation of the extraintestinal virulence potential (VP, pathogenicity risk) of clinically relevant ExPECs and fecal E. coli isolates. The VGA was based on a DNA microarray specific for the ExPEC phenotype (ExPEC pathoarray). This array contained 77 DNA probes homologous with known (e.g. adhesion factors, iron accumulation systems, and toxins) and putative (e.g. genes predicted to be involved in adhesion, iron uptake, or in metabolic functions) ExPEC virulence determinants. In total, 25 DNA probes homologous with known virulence factors and 36 DNA probes representing putative extraintestinal virulence determinants were found at significantly higher frequency in virulent ExPEC isolates than in commensal E. coli strains. We showed that the ExPEC pathoarray and the VGA could be readily used for the differentiation of highly virulent ExPECs both from less virulent ExPEC clones and from commensal E. coli strains. When the VGA was applied to a group of unknown ExPECs (n=53) and fecal E. coli isolates (n=37), 83% of strains were correctly identified as extraintestinal virulent or commensal E. coli. Conversely, 15% of clinical ExPECs and 19% of fecal E. coli strains failed to cluster into their respective pathogenic and non-pathogenic groups. Clinical data and virulence gene profiles of these strains supported the estimated VPs; UPEC strains with atypically low risk-ratios were largely isolated from patients with a particular medical history, including diabetes mellitus or catheterization, or from elderly patients. In addition, fecal E. coli strains with VPs characteristic for ExPEC were shown to represent the diagnostically important fraction of resident strains of the gut flora with a high potential of causing extraintestinal infections. Interestingly, a large fraction of DNA probes associated with the ExPEC phenotype corresponded to novel DNA sequences without any known function in UTIs and thus represented new genetic markers for extraintestinal virulence. These DNA probes included unknown DNA sequences originating from the genomic subtractions of four clinical ExPEC isolates as well as from five novel cosmid sequences identified in the UPEC strains HE300 and JS299. The characterized cosmid sequences (pJS332, pJS448, pJS666, pJS700, and pJS706) revealed complex modular DNA structures with known and unknown DNA fragments arranged in a puzzle-like manner and integrated into the common E. coli genomic backbone. Furthermore, cosmid pJS332 of the UPEC strain HE300, which carried a chromosomal virulence gene cluster (iroBCDEN) encoding the salmochelin siderophore system, was shown to be part of a transmissible plasmid of Salmonella enterica. Taken together, the results of this project pointed towards the assumptions that (i) homologous recombination, even within coding genes, contributes to the observed mosaicism of ExPEC genomes and (ii) besides en bloc transfer of large DNA regions (e.g. chromosomal PAIs), rearrangements of small DNA modules also provide a means of genomic plasticity. The data presented in this project supplemented previous whole genome sequencing projects of E. coli and indicated that each E. coli genome displays a unique assemblage of individual mosaic structures, which enable these strains to successfully colonize and infect different anatomical sites.
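
As a quick consistency check on the classification figures reported in the abstract above, the following Python sketch recovers whole-strain counts from the rounded percentages and recomputes the overall accuracy. The misclassified counts (8 and 7) are not stated in the abstract; they are inferred here from the rounded percentages and should be read as an assumption, and the function name is hypothetical.

```python
def misclassified_count(group_size: int, reported_fraction: float) -> int:
    """Nearest whole number of strains matching a rounded percentage (inferred, not reported)."""
    return round(group_size * reported_fraction)

# Group sizes and percentages as given in the abstract.
expec_n, fecal_n = 53, 37
expec_missed = misclassified_count(expec_n, 0.15)  # inferred count for clinical ExPECs
fecal_missed = misclassified_count(fecal_n, 0.19)  # inferred count for fecal isolates

total = expec_n + fecal_n
correct = total - expec_missed - fecal_missed
overall_accuracy = correct / total

print(f"inferred misclassified: {expec_missed} ExPEC, {fecal_missed} fecal")
print(f"overall correctly identified: {overall_accuracy:.1%}")
```

Under these inferred counts the overall figure rounds to the 83% quoted in the abstract, so the three percentages are mutually consistent.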

Relevance:

80.00%

Publisher:

Abstract:

The first part of this work investigates the molecular epidemiology of a human enterovirus (HEV), echovirus 30 (E-30). This project is part of a series of studies performed in our research team analyzing the molecular epidemiology of HEV-B viruses. A total of 129 virus strains had been isolated in different parts of Europe. The sequence analysis was performed in three different genomic regions: 420 nucleotides (nt) in the VP4/VP2 capsid protein coding region, the entire VP1 capsid protein coding gene of 876 nt, and 150 nt in the VP1/2A junction region. The analysis revealed a succession of dominant sublineages within a major genotype. The temporally earlier genotypes had been replaced by a genetically homogenous lineage that has been circulating in Europe since the late 1970s. The same genotype was found by other research groups in North America and Australia. Globally, other cocirculating genetic lineages also exist. The prevalence of a dominant genotype makes E-30 different from other previously studied HEVs, such as polioviruses and coxsackieviruses B4 and B5, for which several coexisting genetic lineages have been reported. The second part of this work deals with molecular epidemiology of human rhinoviruses (HRVs). A total of 61 field isolates were studied in the 420-nt stretch in the capsid coding region of VP4/VP2. The isolates were collected from children under two years of age in Tampere, Finland. Sequences from the clinical isolates clustered in the two previously known phylogenetic clades. Seasonal clustering was found. Also, several distinct serotype-like clusters were found to co-circulate during the same epidemic season. Reappearance of a cluster after disappearing for a season was observed. The molecular epidemiology of the analyzed strains turned out to be complex, and we decided to continue our studies of HRV. Only five previously published complete genome sequences of HRV prototype strains were available for analysis. 
Therefore, all designated HRV prototype strains (n=102) were sequenced in the VP4/VP2 region, and the possibility of genetic typing of HRV was evaluated. Seventy-six of the 102 prototype strains clustered in HRV genetic group A (HRV-A) and 25 in group B (HRV-B). Serotype 87 clustered separately from other HRVs with HEV species D. The field strains of HRV represented as many as 19 different genotypes, as judged with an approximate demarcation of a 20% nt difference in the VP4/VP2 region. The interserotypic differences of HRV were generally similar to those reported between different HEV serotypes (i.e. about 20%), but smaller differences, less than 10%, were also observed. Because some HRV serotypes are genetically so closely related, we suggest that the genetic typing be performed using the criterion "the closest prototype strain". This study is the first systematic genetic characterization of all known HRV prototype strains, providing a further taxonomic proposal for classification of HRV. We proposed to divide the genus Human rhinoviruses into HRV-A and HRV-B. The final part of the work comprises a phylogenetic analysis of a subset (48) of HRV prototype strains and field isolates (12) in the nonstructural part of the genome coding for the RNA-dependent RNA polymerase (3D). The proposed division of the HRV strains in the species HRV-A and HRV-B was also supported by 3D region. HRV-B clustered closer to HEV species B, C, and also to polioviruses than to HRV-A. Intraspecies variation within both HRV-A and HRV-B was greater in the 3D coding region than in the VP4/VP2 coding region, in contrast to HEV. Moreover, the diversity of HRV in 3D exceeded that of HEV. One group of HRV-A, designated HRV-A', formed a separate cluster outside other HRV-A in the 3D region. It formed a cluster also in the capsid region, but located within HRV-A. This may reflect a different evolutionary history of distinct genomic regions among HRV-A. 
Furthermore, the tree topology within HRV-A in the 3D region differed from that in the VP4/VP2, suggesting possible recombination events in the evolution of the strains. No conflicting phylogenies were observed in any of the 12 field isolates. Possible recombination was further studied using the Similarity and Bootscanning analyses of the complete genome sequences of HRV available in public databases. Evidence for recombination among HRV-A was found, as HRV2 and HRV39 showed higher similarity in the nonstructural part of the genome. Whether HRV2 and HRV39 strains - and perhaps also some other HRV-A strains not yet completely sequenced - are recombinants remains to be determined.
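
The "closest prototype strain" typing criterion and the approximate 20% nucleotide-difference demarcation mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length and uses the simple proportion of differing sites (p-distance); the abstract does not say which distance measure was used, so that choice, the function names, and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, Tuple

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def closest_prototype(query: str, prototypes: Dict[str, str],
                      threshold: float = 0.20) -> Tuple[str, float, bool]:
    """Return (closest prototype name, its distance, whether it falls under the threshold)."""
    name, dist = min(
        ((n, p_distance(query, seq)) for n, seq in prototypes.items()),
        key=lambda item: item[1],
    )
    return name, dist, dist < threshold

# Toy 10-nt "alignments"; real typing would use the 420-nt VP4/VP2 region.
toy_prototypes = {"proto_1": "ACGTACGTAC", "proto_2": "TTTTACGTAC"}
name, dist, within = closest_prototype("ACGTACGTAA", toy_prototypes)
print(name, round(dist, 2), within)
```

Assigning the nearest prototype rather than applying the threshold alone reflects the abstract's point that some serotypes differ by less than 10%, so a fixed cutoff cannot separate them.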

Relevance:

40.00%

Publisher:

Abstract:

The growing interest in higher-throughput sequencing over the last decade has led to the development of new sequencing applications. This thesis concentrates on optimizing DNA library preparation for the Illumina Genome Analyzer II sequencer. The library preparation steps that were optimized include fragmentation, PCR purification and quantification. DNA fragmentation was performed with focused sonication at different concentrations and for different durations. Two column-based PCR purification methods, a gel matrix method and a magnetic bead-based method were compared. Quantitative PCR and gel electrophoresis in a chip were compared for DNA quantification. The magnetic bead purification was found to be the most efficient and flexible purification method. The fragmentation protocol was changed to produce longer fragments to be compatible with longer sequencing reads. Quantitative PCR correlates better with the cluster number and should thus be considered the default quantification method for sequencing. As a result of this study, more data have been acquired from sequencing at lower cost, and troubleshooting has become easier as qualification steps have been added to the protocol. New sequencing instruments and applications will create a demand for further optimization in the future.

Relevance:

20.00%

Publisher:

Abstract:

Microarrays have a wide range of applications in the biomedical field. From the beginning, arrays have mostly been utilized in cancer research, including classification of tumors into different subgroups and identification of clinical associations. In the microarray format, a collection of small features, such as different oligonucleotides, is attached to a solid support. The advantage of microarray technology is the ability to simultaneously measure changes in the levels of multiple biomolecules. Because many diseases, including cancer, are complex, involving an interplay between various genes and environmental factors, the detection of only a single marker molecule is usually insufficient for determining disease status. Thus, a technique that simultaneously collects information on multiple molecules allows better insights into a complex disease. Since microarrays can be custom-manufactured or obtained from a number of commercial providers, understanding data quality and comparability between different platforms is important to enable the use of the technology to areas beyond basic research. When standardized, integrated array data could ultimately help to offer a complete profile of the disease, illuminating mechanisms and genes behind disorders as well as facilitating disease diagnostics. In the first part of this work, we aimed to elucidate the comparability of gene expression measurements from different oligonucleotide and cDNA microarray platforms. We compared three different gene expression microarrays; one was a commercial oligonucleotide microarray and the others commercial and custom-made cDNA microarrays. The filtered gene expression data from the commercial platforms correlated better across experiments (r=0.78-0.86) than the expression data between the custom-made and either of the two commercial platforms (r=0.62-0.76). 
Although the results from different platforms correlated reasonably well, combining and comparing the measurements was not straightforward. The clone errors on the custom-made array and annotation and technical differences between the platforms introduced variability in the data. In conclusion, the different gene expression microarray platforms provided results sufficiently concordant for the research setting, but the variability represents a challenge for developing diagnostic applications for the microarrays. In the second part of the work, we performed an integrated high-resolution microarray analysis of gene copy number and expression in 38 laryngeal and oral tongue squamous cell carcinoma cell lines and primary tumors. Our aim was to pinpoint genes for which expression was impacted by changes in copy number. The data revealed that amplifications in particular had a clear impact on gene expression. Across the genome, 14-32% of genes in the highly amplified regions (copy number ratio >2.5) had associated overexpression. The impact of decreased copy number on gene underexpression was less clear. Using statistical analysis across the samples, we systematically identified hundreds of genes for which an increased copy number was associated with increased expression. For example, our data implied that FADD and PPFIA1 were frequently overexpressed at the 11q13 amplicon in head and neck squamous cell carcinoma (HNSCC). The 11q13 amplicon, including known oncogenes such as CCND1 and CTTN, is well characterized in different types of cancer, but the roles of FADD and PPFIA1 remain obscure. Taken together, the integrated microarray analysis revealed a number of known as well as novel target genes in altered regions in HNSCC. The identified genes provide a basis for functional validation and may eventually lead to the identification of novel candidates for targeted therapy in HNSCC.
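
The kind of summary quoted in the abstract above (the share of genes in highly amplified regions that are also overexpressed) can be sketched in a few lines of Python. The copy number ratio cutoff of 2.5 comes from the abstract; the expression cutoff of 2.0, the function name, and the toy values are assumptions for illustration only, since the abstract does not state how overexpression was defined.

```python
from typing import List, Optional, Tuple

Gene = Tuple[str, float, float]  # (name, copy number ratio, expression ratio)

def fraction_overexpressed(genes: List[Gene],
                           copy_cutoff: float = 2.5,
                           expr_cutoff: float = 2.0) -> Optional[float]:
    """Share of highly amplified genes whose expression ratio exceeds expr_cutoff.

    Returns None when no gene passes the copy number cutoff.
    """
    amplified = [g for g in genes if g[1] > copy_cutoff]
    if not amplified:
        return None
    overexpressed = sum(1 for g in amplified if g[2] > expr_cutoff)
    return overexpressed / len(amplified)

# Hypothetical measurements for one sample; not data from the study.
toy_sample = [
    ("gene_a", 3.0, 2.5),
    ("gene_b", 2.8, 1.1),
    ("gene_c", 2.6, 0.9),
    ("gene_d", 4.0, 1.5),
    ("gene_e", 1.2, 3.0),  # overexpressed but not amplified, so excluded
]
print(fraction_overexpressed(toy_sample))
```

Computing this per sample would give a range of values across samples, which is one plausible reading of the 14-32% range reported in the abstract.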

Relevance:

20.00%

Publisher:

Abstract:

The neuronal ceroid lipofuscinoses (NCLs) are a group of mostly autosomal recessively inherited neurodegenerative disorders. The aim of this thesis was to characterize the molecular genetic bases of previously genetically undetermined NCL forms. Congenital NCL is the most aggressive form of NCL. Previously, a mutation in the cathepsin D (CTSD) gene was shown to cause congenital NCL in sheep. Based on the close resemblance of the phenotypes between congenital NCLs in sheep and humans, CTSD was considered a potential candidate gene in humans as well. When screened for mutations by sequencing, a homozygous nucleotide duplication creating a premature stop codon was identified in CTSD in one family with congenital NCL. While the overexpressed truncated mutant protein was stable, although inactive, in vitro, the absence of CTSD staining in brain tissue samples of patients indicated degradation of the mutant CTSD in vivo. A lack of CTSD staining was also detected in another, unrelated family with congenital NCL. These results imply that CTSD deficiency underlies congenital NCL. While Turkish variant late-infantile NCL (vLINCL) was initially considered a distinct genetic entity (CLN7), mutations in the CLN8 gene were later reported to account for the disease in a subset of Turkish patients with vLINCL. To further dissect the genetic basis of the disease, all known NCL genes were screened for homozygosity by haplotype analysis of microsatellite markers and/or sequenced in 13 mainly consanguineous Turkish vLINCL families. Two novel, family-specific homozygous mutations were identified in the CLN6 gene. In the remaining families, all known NCL loci were excluded. To identify novel gene(s) underlying vLINCL, a genome-wide single nucleotide polymorphism scan, homozygosity mapping, and positional candidate gene sequencing were performed in ten of these families. 
On chromosome 4q28.1-q28.2, a novel major facilitator superfamily domain containing 8 (MFSD8) gene with six family-specific homozygous mutations in vLINCL patients was identified. MFSD8 transcript was shown to be ubiquitously expressed with a complex pattern of alternative splicing. Our results suggest that MFSD8 is a novel lysosomal integral membrane protein which, as a member of the major facilitator superfamily, is predicted to function as a transporter. Identification of MFSD8 emphasizes the genetic heterogeneity of Turkish vLINCL. In families where no MFSD8 mutations were detected, additional NCL-causing genes remain to be identified. The identification of CTSD and MFSD8 increases the number of known human NCL-causing genes to eight, and is an important step towards the complete understanding of the genetic spectrum underlying NCLs. In addition, it is a starting point for dissecting the molecular mechanisms behind the associated NCLs and contributes to the challenging task of understanding the molecular pathology underlying the group of NCL disorders.