8 results for REARRANGEMENTS
in Helda - Digital Repository of University of Helsinki
Abstract:
Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions account for a considerable proportion of the genetic variation between human individuals. In a number of cases, they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) are another large part of the genetic variance between individuals. They are also typically abundant, and measuring them is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations, whereas the deletion-detection algorithm uses the EM algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidates for previously unknown inversions and deletions and also correctly detecting known rearrangements of these kinds.
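The EM-based estimation of haplotype frequencies mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's deletion-detection algorithm: it covers only the textbook case of two biallelic SNPs without a deletion haplotype, and the function name, the 3x3 genotype-count layout and the parameters max_iter and tol are assumptions made for this illustration. The only ambiguous genotype is the double heterozygote, whose phase is resolved probabilistically in the E-step.

```python
def em_haplotype_frequencies(n, max_iter=200, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotype counts.

    n[a][b] is the number of individuals carrying `a` copies of the
    alternate allele at SNP A and `b` copies at SNP B (a, b in 0..2).
    Returns a dict mapping haplotypes (alleleA, alleleB) to frequencies.
    Illustrative sketch only: no deletion haplotype is modelled.
    """
    total = 2 * sum(sum(row) for row in n)  # number of chromosomes
    if total == 0:
        raise ValueError("no genotypes given")

    # Haplotype counts implied by every genotype except the
    # double heterozygote (1, 1), whose phase is ambiguous.
    base = {
        (0, 0): 2 * n[0][0] + n[0][1] + n[1][0],
        (0, 1): 2 * n[0][2] + n[0][1] + n[1][2],
        (1, 0): 2 * n[2][0] + n[1][0] + n[2][1],
        (1, 1): 2 * n[2][2] + n[2][1] + n[1][2],
    }
    double_het = n[1][1]

    freq = {h: 0.25 for h in base}  # uniform starting point
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries the
        # haplotype pair 00/11 rather than 01/10.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = {
            (0, 0): base[(0, 0)] + double_het * w,
            (1, 1): base[(1, 1)] + double_het * w,
            (0, 1): base[(0, 1)] + double_het * (1 - w),
            (1, 0): base[(1, 0)] + double_het * (1 - w),
        }
        new = {h: c / total for h, c in counts.items()}
        change = max(abs(new[h] - freq[h]) for h in new)
        freq = new
        if change < tol:
            break
    return freq
```

With counts in which the two SNPs are perfectly correlated, for example em_haplotype_frequencies([[4, 0, 0], [0, 2, 0], [0, 0, 4]]), the estimate should concentrate on the two haplotypes (0, 0) and (1, 1). A deletion-aware variant would add a further haplotype class and compare the fits with and without it, as the abstract describes.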
Abstract:
Germline mutations in fumarate hydratase (FH) cause hereditary leiomyomatosis and renal cell cancer (HLRCC). FH is a nuclear-encoded enzyme which functions in the Krebs tricarboxylic acid cycle, and homozygous mutations in FH lead to severe developmental defects. Both uterine and cutaneous leiomyomas are components of the HLRCC phenotype. Most of these tumours show loss of the wild-type allele and, also, the mutations reduce FH enzyme activity, which indicates that FH is a tumour suppressor gene. The renal cell cancers associated with HLRCC are of rare papillary type 2 histology. Other Krebs cycle genes implicated in neoplasia are three of the four genes encoding the subunits of succinate dehydrogenase (SDH); mutations in SDHB, SDHC, and SDHD predispose to paraganglioma and phaeochromocytoma. Although uterine leiomyomas (or fibroids) are very common, with estimates of the proportion of affected women ranging from 25% to 77%, not much is known about their genetic background. Cytogenetic studies have revealed that rearrangements involving chromosomes 6, 7, 12 and 14 are most commonly seen in fibroids. Deletions on the long arm of chromosome 7 have been reported to be involved in about 17 to 34% of leiomyomas, and the small commonly deleted region on 7q22 suggests that there might be an underlying tumour suppressor gene in that region. The purpose of this study was to investigate the genetic mechanisms behind the development of tumours associated with HLRCC, both renal cell cancer and uterine fibroids. Firstly, a database search at the Finnish cancer registry was conducted in order to identify new families with early-onset RCC and to test if the family history was compatible with HLRCC. Secondly, sporadic uterine fibroids were tested for deletions on 7q in order to define the minimal deleted 7q-region, followed by mutation analysis of the candidate genes. 
Thirdly, oligonucleotide chips were utilised to study the global gene expression profiles of uterine fibroids in order to test whether 7q-deletions and FH mutations significantly affected fibroid biology. In the screen for early-onset RCC, 214 families were identified. Subsequently, the pedigrees were constructed and clinical data obtained. One of the index cases (RCC at the age of 28) had a mother who had been diagnosed with a heart tumour, which on further investigation turned out to be a paraganglioma. This led to an alternative hypothesis that SDH, instead of FH, could be involved. SDHA, SDHB, SDHC and SDHD were sequenced from these individuals; a germline SDHB R27X mutation was detected with loss of the wild-type allele in both tumours. These results suggest that germline mutations in the SDHB gene predispose to early-onset RCC, establishing a novel form of hereditary RCC. This has immediate clinical implications in the surveillance of patients suffering from early-onset RCC and phaeochromocytoma/paraganglioma. For the studies on sporadic uterine fibroids, a set of 166 fibroids from 51 individuals was collected. The 7q LOH mapping defined a commonly deleted region of about 3.2 megabases in 11 of the 166 tumours. The deletion was consistent with previously reported allelotyping studies of leiomyomas and it therefore suggested the presence of a tumour suppressor gene in the deleted region. Furthermore, the high-resolution aCGH-chip analysis refined the deleted region to only 2.79 Mb. When combined with previous data, the commonly deleted region was only 2.3 Mb. The mutation screening of the known genes within the commonly deleted region did not reveal pathogenic mutations, however. The expression microarray analysis revealed that FH-deficient fibroids, both sporadic and familial, had a distinct gene expression profile, as they formed their own group in the unsupervised clustering. 
On the other hand, the presence or absence of 7q-deletions did not significantly alter the global gene expression pattern of fibroids, suggesting that these two groups do not have different biological backgrounds. Multiple differentially expressed genes were identified between FH wild-type and FH-mutant fibroids, and the most significant increase was seen in the expression of carbohydrate metabolism-related and hypoxia inducible factor (HIF) target genes.
Abstract:
Hereditary nonpolyposis colorectal cancer (HNPCC) and familial adenomatous polyposis (FAP) are characterized by a high risk and early onset of colorectal cancer (CRC). HNPCC is due to a germline mutation in one of the following mismatch repair (MMR) genes: MLH1, MSH2, MSH6 and PMS2. A majority of FAP and attenuated FAP (AFAP) cases are due to germline mutations of APC, causing the development of multiple colorectal polyps. To date, over 450 MMR gene mutations and over 800 APC mutations have been identified. Most of these mutations lead to a truncated protein, easily detected by conventional mutation detection methods. However, in about 30% of HNPCC and FAP families, and about 90% of AFAP families, mutations remain unknown. We aimed to clarify the genetic basis and genotype-phenotype correlation of mutation-negative HNPCC and FAP/AFAP families by advanced mutation detection methods designed to detect large genomic rearrangements, mRNA and protein expression alterations, promoter mutations, phenotype-linked haplotypes, and tumoral loss of heterozygosity. We also aimed to estimate the frequency of HNPCC in Uruguayan CRC patients. Our expression-based analysis of mutation-negative HNPCC divided these families into two categories: 1) 42% of families linked to the MMR genes with a phenotype resembling that of mutation-positive families, and 2) 58% of families likely to be associated with other susceptibility genes. Unbalanced mRNA expression of MLH1 was observed in two families. Further studies revealed that an MLH1 nonsense mutation, R100X, was associated with aberrant splicing of exons not related to the mutation, and that an MLH1 deletion (AGAA) at nucleotide 210 was associated with multiple exon skipping, without an overall increase in the frequency of splice events. APC mutation-negative FAP/AFAP families were divided into four groups according to the genetic basis of their predisposition. 
Four (14%) families displayed a constitutional deletion of APC with profuse polyposis, early age of onset and frequent extracolonic manifestations. Aberrant mRNA expression of one allele was observed in seven (24%) families with later onset and less frequent extracolonic manifestations. In 15 (52%) families the involvement of APC could neither be confirmed nor excluded. In three (10%) of the families a germline mutation was detected in genes other than APC: AXIN2 in one family, and MYH in two families. The families with undefined genetic basis and especially those with AXIN2 or MYH mutations frequently displayed AFAP or atypical polyposis. Of the Uruguayan CRC patients, 2.6% (12/461) fulfilled the diagnostic criteria for HNPCC and 5.6% (26/461) were associated with increased risk of cancer. Unexpectedly low frequency of molecularly defined HNPCC cases may suggest a different genetic profile in the Uruguayan population and the involvement of novel susceptibility genes. Accurate genetic and clinical characterization of families with hereditary colorectal cancers, and the definition of the genetic basis of "mutation negative" families in particular, facilitate proper clinical management of such families.
Abstract:
Background: The Ewing sarcoma family of tumors (ESFT) are rare but highly malignant neoplasms that occur mainly in bone but also in soft tissue. ESFT affects patients typically in their second decade of life, whereby children and adolescents bear the heaviest incidence burden. Despite recent advances in the clinical management of ESFT patients, their prognosis and survival are still disappointingly poor, especially in cases with metastasis. No targeted therapy for ESFT patients is currently available. Moreover, based merely on current clinical and biological characteristics, accurate classification of ESFT patients often fails at the time of diagnosis. Therefore, there is a constant need for novel molecular biomarkers to be applied in tandem with conventional parameters to further improve ESFT risk stratification and treatment selection, and ultimately to develop novel targeted therapies. In this context, a greater understanding of the genetics and immune characteristics of ESFT is needed. Aims: This study sought to provide novel insights into gene copy number changes and gene expression in ESFT and, further, to elucidate the role of inflammation in ESFT. For this purpose, microarrays were used to provide gene-level information on a genome-wide scale. In addition, this study focused on screening of 9p21.3 deletion sizes and frequencies in ESFT and in another pediatric cancer, acute lymphocytic leukemia (ALL), in order to define more exact criteria for high-risk patient selection and to provide data for developing a more reliable diagnostic method to detect CDKN2A deletions. Results: In study I, 20 novel ESFT-associated suppressor genes and oncogenes were pinpointed using combined array CGH and expression analysis. In addition, interesting chromosomal rearrangements were identified: (1) Duplication of derivative chromosome der(22)(11;22) was detected in three ESFT patients. 
This duplication included the EWSR1-FLI1 fusion gene leading to increase in its copy number; (2) Cryptic amplifications on chromosomes 20 and 22 were detected, suggesting a novel translocation between chromosomes 20 and 22, which most probably produces a fusion between EWSR1 and NFATC2. In study II, bioinformatic analysis of ESFT expression profiles showed that inflammatory gene activation is detectable in ESFT patient samples and that the activation is characterized by macrophage gene expression. Most interestingly, ESFT patient samples were shown to express certain inflammatory genes that were prognostically significant. High local expression of C5 and JAK1 at the tumor site was shown to associate with favorable clinical outcome, whereas high local expression of IL8 was shown to be detrimental. Studies III and IV showed that the smallest overlapping region of deletion in 9p21.3 includes CDKN2A in all cases and that the length of this region is 12.2 kb in both Ewing sarcoma and ALL. Furthermore, our results showed that the most widely used commercial CDKN2A FISH probe creates false negative results in the narrowest microdeletion cases (<190 kb). Therefore, more accurate methods should be developed for the detection of deletions in the CDKN2A locus. Conclusions: This study provides novel insights into the genetic changes involved in the biology of ESFT, in the interaction between ESFT cells and immune system, and in the inactivation of CDKN2A. Novel ESFT biomarker genes identified in this study serve as a useful resource for future studies and in developing novel therapeutic strategies to improve the survival of patients with ESFT.
Abstract:
The dimeric phenolic compounds lignans and dilignols form in the so-called oxidative coupling reaction of phenols. Enzymes such as peroxidases and laccases catalyze the reaction, using hydrogen peroxide or oxygen, respectively, as the oxidant and generating phenoxy radicals which couple together according to certain rules. In this thesis, the effects of the structures of the starting materials, monolignols, and the effects of reaction conditions such as pH and solvent system on this coupling mechanism and on its regio- and stereoselectivity have been studied. After the primary coupling of two phenoxy radicals, a very reactive quinone methide intermediate is formed. This intermediate reacts quickly with a suitable nucleophile which can be, for example, an intramolecular hydroxyl group or another nucleophile such as water, methanol, or a phenolic compound in the reaction system. This reaction is catalyzed by acids. After the nucleophilic addition to the quinone methide, other hydrolytic reactions, rearrangements, and elimination reactions occur, leading finally to stable dimeric structures called lignans or dilignols. Similar reactions occur also in the so-called lignification process when monolignol (or dilignol) reacts with the growing lignin polymer. New kinds of structures have been observed in this thesis. The dimeric compounds with a so-called spirodienone structure have been observed to form both in the dehydrodimerization of methyl sinapate and in the beta-1-type cross-coupling reaction of two different monolignols. This beta-1-type dilignol with a spirodienone structure was the first synthesized and published dilignol model compound, and at present, it has been observed to exist as a fundamental construction unit in lignins. The enantioselectivity of the oxidative coupling reaction was also studied for obtaining enantiopure lignans and dilignols. 
A rather good enantioselectivity was obtained in the oxidative coupling reaction of two monolignols with chiral auxiliary substituents using peroxidase/H2O2 as an oxidation system. This observation was published as one of the first enantioselective oxidative coupling reactions of phenols. Pure enantiomers of lignans were also obtained by using chiral cryogenic chromatography as a chiral resolution technique. This technique was shown to be an alternative route to prepare enantiopure lignans or lignin model compounds on a preparative scale.
Abstract:
The analysis of sequential data is required in many diverse areas such as telecommunications, stock market analysis, and bioinformatics. A basic problem related to the analysis of sequential data is the sequence segmentation problem. A sequence segmentation is a partition of the sequence into a number of non-overlapping segments that cover all data points, such that each segment is as homogeneous as possible. This problem can be solved optimally using a standard dynamic programming algorithm. In the first part of the thesis, we present a new approximation algorithm for the sequence segmentation problem. This algorithm has a smaller running time than the optimal dynamic programming algorithm, while having a bounded approximation ratio. The basic idea is to divide the input sequence into subsequences, solve the problem optimally in each subsequence, and then appropriately combine the solutions to the subproblems into one final solution. In the second part of the thesis, we study alternative segmentation models that are devised to better fit the data. More specifically, we focus on clustered segmentations and segmentations with rearrangements. While in the standard segmentation of a multidimensional sequence all dimensions share the same segment boundaries, in a clustered segmentation the multidimensional sequence is segmented in such a way that dimensions are allowed to form clusters. Each cluster of dimensions is then segmented separately. We formally define the problem of clustered segmentations and we experimentally show that segmenting sequences using this segmentation model leads to solutions with smaller error for the same model cost. Segmentation with rearrangements is a novel variation of the segmentation problem: in addition to partitioning the sequence we also seek to apply a limited amount of reordering, so that the overall representation error is minimized. 
We formulate the problem of segmentation with rearrangements and we show that it is an NP-hard problem to solve or even to approximate. We devise effective algorithms for the proposed problem, combining ideas from dynamic programming and outlier detection algorithms in sequences. In the final part of the thesis, we discuss the problem of aggregating results of segmentation algorithms on the same set of data points. In this case, we are interested in producing a partitioning of the data that agrees as much as possible with the input partitions. We show that this problem can be solved optimally in polynomial time using dynamic programming. Furthermore, we show that not all data points are candidates for segment boundaries in the optimal solution.
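The standard dynamic programming algorithm for optimal sequence segmentation referred to in the abstract above can be sketched as follows. The abstract does not state the error measure, so this sketch assumes a one-dimensional sequence, a fixed number of segments k, and the sum of squared deviations from each segment's mean as the error; the function name and return format are illustrative choices, not taken from the thesis. The running time is O(n^2 k), which is the cost the thesis's approximation algorithm aims to reduce.

```python
def optimal_segmentation(seq, k):
    """Split `seq` into k contiguous segments minimising the total
    squared deviation of points from their segment mean.

    Returns (error, segments), where segments is a list of half-open
    index ranges (start, end). Illustrative O(n^2 * k) sketch.
    """
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(seq)")

    # Prefix sums let us compute the error of any segment in O(1).
    s = [0.0] * (n + 1)
    sq = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        s[i + 1] = s[i] + x
        sq[i + 1] = sq[i] + x * x

    def sse(i, j):
        """Squared error of seq[i:j] around its mean (requires j > i)."""
        total = s[j] - s[i]
        return (sq[j] - sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for the first j points using m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # last segment is seq[i:j]
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover the segment boundaries.
    segments = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return cost[k][n], segments
```

For a piecewise-constant input such as [1, 1, 1, 5, 5, 5, 9, 9] with k = 3, the segments returned should coincide with the three constant runs. The divide-and-combine idea described in the abstract would call a routine of this kind on each subsequence and then again on the combined candidate boundaries.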
Abstract:
Extraintestinal pathogenic Escherichia coli (ExPEC) represent a diverse group of strains of E. coli, which infect extraintestinal sites, such as the urinary tract, the bloodstream, the meninges, the peritoneal cavity, and the lungs. Urinary tract infections (UTIs) caused by uropathogenic E. coli (UPEC), the major subgroup of ExPEC, are among the most prevalent microbial diseases worldwide and a substantial burden for public health care systems. UTIs are responsible for serious morbidity and mortality in the elderly, in young children, and in immune-compromised and hospitalized patients. ExPEC strains are different, both from genetic and clinical perspectives, from commensal E. coli strains belonging to the normal intestinal flora and from intestinal pathogenic E. coli strains causing diarrhea. ExPEC strains are characterized by a broad range of alternate virulence factors, such as adhesins, toxins, and iron accumulation systems. Unlike diarrheagenic E. coli, whose distinctive virulence determinants evoke characteristic diarrheagenic symptoms and signs, ExPEC strains are exceedingly heterogeneous and are known to possess no specific virulence factor or set of factors that is obligatory for the infection of a certain extraintestinal site (e.g. the urinary tract). The ExPEC genomes are highly diverse mosaic structures in permanent flux. These strains have obtained a significant amount of DNA (predictably up to 25% of the genomes) through acquisition of foreign DNA from diverse related or non-related donor species by lateral transfer of mobile genetic elements, including pathogenicity islands (PAIs), plasmids, phages, transposons, and insertion elements. The ability of ExPEC strains to cause disease is mainly derived from this horizontally acquired gene pool; the extragenous DNA facilitates rapid adaptation of the pathogen to changing conditions and hence the extent of the spectrum of sites that can be infected. 
However, neither the amount of unique DNA in different ExPEC strains (or UPEC strains) nor the mechanisms lying behind the observed genomic mobility are known. Due to this extreme heterogeneity of the UPEC and ExPEC populations in general, the routine surveillance of ExPEC is exceedingly difficult. In this project, we presented a novel virulence gene algorithm (VGA) for the estimation of the extraintestinal virulence potential (VP, pathogenicity risk) of clinically relevant ExPECs and fecal E. coli isolates. The VGA was based on a DNA microarray specific for the ExPEC phenotype (ExPEC pathoarray). This array contained 77 DNA probes homologous with known (e.g. adhesion factors, iron accumulation systems, and toxins) and putative (e.g. genes predictably involved in adhesion, iron uptake, or in metabolic functions) ExPEC virulence determinants. In total, 25 of the DNA probes homologous with known virulence factors and 36 of the DNA probes representing putative extraintestinal virulence determinants were found at significantly higher frequency in virulent ExPEC isolates than in commensal E. coli strains. We showed that the ExPEC pathoarray and the VGA could be readily used for the differentiation of highly virulent ExPECs both from less virulent ExPEC clones and from commensal E. coli strains. Implementing the VGA in a group of unknown ExPECs (n=53) and fecal E. coli isolates (n=37), 83% of strains were correctly identified as extraintestinal virulent or commensal E. coli. Conversely, 15% of clinical ExPECs and 19% of fecal E. coli strains failed to sort into their respective pathogenic and non-pathogenic groups. Clinical data and virulence gene profiles of these strains warranted the estimated VPs; UPEC strains with atypically low risk-ratios were largely isolated from patients with a particular medical history, including diabetes mellitus or catheterization, or from elderly patients. In addition, fecal E. 
coli strains with VPs characteristic for ExPEC were shown to represent the diagnostically important fraction of resident strains of the gut flora with a high potential of causing extraintestinal infections. Interestingly, a large fraction of DNA probes associated with the ExPEC phenotype corresponded to novel DNA sequences without any known function in UTIs and thus represented new genetic markers for the extraintestinal virulence. These DNA probes included unknown DNA sequences originating from the genomic subtractions of four clinical ExPEC isolates as well as from five novel cosmid sequences identified in the UPEC strains HE300 and JS299. The characterized cosmid sequences (pJS332, pJS448, pJS666, pJS700, and pJS706) revealed complex modular DNA structures with known and unknown DNA fragments arranged in a puzzle-like manner and integrated into the common E. coli genomic backbone. Furthermore, cosmid pJS332 of the UPEC strain HE300, which carried a chromosomal virulence gene cluster (iroBCDEN) encoding the salmochelin siderophore system, was shown to be part of a transmissible plasmid of Salmonella enterica. Taken together, the results of this project pointed towards the assumptions that (i) homologous recombination, even within coding genes, contributes to the observed mosaicism of ExPEC genomes and (ii) besides en bloc transfer of large DNA regions (e.g. chromosomal PAIs), rearrangements of small DNA modules also provide a means of genomic plasticity. The data presented in this project supplemented previous whole genome sequencing projects of E. coli and indicated that each E. coli genome displays a unique assemblage of individual mosaic structures, which enable these strains to successfully colonize and infect different anatomical sites.
Abstract:
Microbial degradation pathways play a key role in the detoxification and the mineralization of polyaromatic hydrocarbons (PAHs), which are widespread pollutants in soil and constituents of petroleum hydrocarbons. In microbiology the aromatic degradation pathways are traditionally studied from single bacterial strains with the capacity to degrade a certain pollutant. In soil the degradation of aromatics is performed by a diverse community of micro-organisms. The aim of this thesis was to study biodegradation on different levels, starting from a versatile aromatic degrader, Sphingobium sp. HV3, and its megaplasmid, extending to revealing the diversity of key catabolic enzymes in the environment and finally studying birch rhizoremediation in PAH-polluted soil. To understand biodegradation of aromatics on the bacterial species level, the aromatic degradation capacity of Sphingobium sp. HV3 and the role of the plasmid pSKY4 were studied. Toluene, m-xylene, biphenyl, fluorene, and phenanthrene were detected as carbon and energy sources of the HV3 strain. Tn5 transposon mutagenesis linked the degradation capacity of toluene, m-xylene, biphenyl and naphthalene to the pSKY4 plasmid, and qPCR expression analysis showed that plasmid extradiol dioxygenase genes (bphC and xylE) are induced by phenanthrene, m-xylene and biphenyl, whereas the 2,4-dichlorophenoxyacetic acid herbicide induced the chlorocatechol 1,2-dioxygenase gene (tfdC) from the ortho-pathway. A method to study upper meta-pathway extradiol dioxygenase gene diversity in soil was developed. The extradiol dioxygenases catalyse cleavage of the aromatic ring between a hydroxylated carbon and an adjacent non-hydroxylated carbon (meta-cleavage). A high diversity of extradiol dioxygenases was detected from polluted soils. The detected extradiol dioxygenases showed sequence similarity to known catabolic genes of Alpha-, Beta-, and Gammaproteobacteria. 
Five groups of extradiol dioxygenases contained sequences with no close homologues in the database, representing novel genes. In a rhizoremediation experiment with birch (Betula pendula), treatment-specific changes of extradiol dioxygenase communities were shown. PAH pollution changed the bulk soil extradiol dioxygenase community structure, and the birch rhizosphere contained a more diverse extradiol dioxygenase community than the bulk soil, showing a rhizosphere effect. The degradation of pyrene in soil was enhanced with birch seedlings compared to soil without birch. The complete 280,923 bp nucleotide sequence of the pSKY4 plasmid was determined. The open reading frames of pSKY4 were divided into putative conjugative transfer, aromatic degradation, replication/maintenance and transposition/integration function-encoding proteins. Aromatic degradation ORFs shared high similarity to corresponding genes in pNL1, a plasmid from the deep subsurface strain Novosphingobium aromaticivorans F199. The plasmid backbones were considerably more divergent with lower similarity, which suggests that the aromatic pathway has functioned as a plasmid-independent mobile genetic element. The functional diversity of microbial communities in soil is still largely unknown. Several novel clusters of extradiol dioxygenases representing catabolic bacteria, whose function, biodegradation pathways and phylogenetic position are not known, were amplified with a single primer pair from polluted soils. These extradiol dioxygenase communities were shown to change upon PAH pollution, which indicates that their hosts function in PAH biodegradation in soil. Although the degradation pathways of specific bacterial species are substantially better depicted than pathways in situ, the evolution of degradation pathways for the xenobiotic compounds is largely unknown. The pSKY4 plasmid contains aromatic degradation genes in a putative mobile genetic element, causing flexibility/instability of the pathway. 
The localisation of the aromatic biodegradation pathway in mobile genetic elements suggests that gene transfer and rearrangements are a competitive advantage for Sphingomonas bacteria in the environment.