14 resultados para high-throughput
em Helda - Digital Repository of University of Helsinki
Resumo:
This thesis presents a highly sensitive genome wide search method for recessive mutations. The method is suitable for distantly related samples that are divided into phenotype positives and negatives. High throughput genotype arrays are used to identify and compare homozygous regions between the cohorts. The method is demonstrated by comparing colorectal cancer patients against unaffected references. The objective is to find homozygous regions and alleles that are more common in cancer patients. We have designed and implemented software tools to automate the data analysis from genotypes to lists of candidate genes and to their properties. The programs have been designed in respect to a pipeline architecture that allows their integration to other programs such as biological databases and copy number analysis tools. The integration of the tools is crucial as the genome wide analysis of the cohort differences produces many candidate regions not related to the studied phenotype. CohortComparator is a genotype comparison tool that detects homozygous regions and compares their loci and allele constitutions between two sets of samples. The data is visualised in chromosome specific graphs illustrating the homozygous regions and alleles of each sample. The genomic regions that may harbour recessive mutations are emphasised with different colours and a scoring scheme is given for these regions. The detection of homozygous regions, cohort comparisons and result annotations are all subjected to presumptions many of which have been parameterized in our programs. The effect of these parameters and the suitable scope of the methods have been evaluated. Samples with different resolutions can be balanced with the genotype estimates of their haplotypes and they can be used within the same study.
Resumo:
Miniaturization of analytical instrumentation is attracting growing interest in response to the explosive demand for rapid, yet sensitive analytical methods and low-cost, highly automated instruments for pharmaceutical and bioanalyses and environmental monitoring. Microfabrication technology in particular, has enabled fabrication of low-cost microdevices with a high degree of integrated functions, such as sample preparation, chemical reaction, separation, and detection, on a single microchip. These miniaturized total chemical analysis systems (microTAS or lab-on-a-chip) can also be arrayed for parallel analyses in order to accelerate the sample throughput. Other motivations include reduced sample consumption and waste production as well as increased speed of analysis. One of the most promising hyphenated techniques in analytical chemistry is the combination of a microfluidic separation chip and mass spectrometer (MS). In this work, the emerging polymer microfabrication techniques, ultraviolet lithography in particular, were exploited to develop a capillary electrophoresis (CE) separation chip which incorporates a monolithically integrated electrospray ionization (ESI) emitter for efficient coupling with MS. An epoxy photoresist SU-8 was adopted as structural material and characterized with respect to its physicochemical properties relevant to chip-based CE and ESI/MS, namely surface charge, surface interactions, heat transfer, and solvent compatibility. As a result, SU-8 was found to be a favorable material to substitute for the more commonly used glass and silicon in microfluidic applications. In addition, an infrared (IR) thermography was introduced as direct, non-intrusive method to examine the heat transfer and thermal gradients during microchip-CE. The IR data was validated through numerical modeling. The analytical performance of SU-8-based microchips was established for qualitative and quantitative CE-ESI/MS analysis of small drug compounds, peptides, and proteins. The CE separation efficiency was found to be similar to that of commercial glass microchips and conventional CE systems. Typical analysis times were only 30-90 s per sample indicating feasibility for high-throughput analysis. Moreover, a mass detection limit at the low-attomole level, as low as 10E+5 molecules, was achieved utilizing MS detection. The SU-8 microchips developed in this work could also be mass produced at low cost and with nearly identical performance from chip to chip. Until this work, the attempts to combine CE separation with ESI in a chip-based system, amenable to batch fabrication and capable of high, reproducible analytical performance, have not been successful. Thus, the CE-ESI chip developed in this work is a substantial step toward lab-on-a-chip technology.
Resumo:
The basic goal of a proteomic microchip is to achieve efficient and sensitive high throughput protein analyses, automatically carrying out several measurements in parallel. A protein microchip would either detect a single protein or a large set of proteins for diagnostic purposes, basic proteome or functional analysis. Such analyses would include e.g. interactomics, general protein expression studies, detecting structural alterations or secondary modifications. Visualization of the results may occur by simple immunoreactions, general or specific labelling, or mass spectrometry. For this purpose we have manufactured chip-based proteome analysis devices that utilize the classical polymer gel electrophoresis technology to run one and two-dimensional gel electrophoresis separations of proteins in just a smaller size. In total, we manufactured three functional prototypes of which one performed a miniaturized one-dimensional gel electrophoresis (1-DE) separation, the second and third preformed two-dimensional gel electrophoresis (2-DE) separations. These microchips were successfully used to separate and characterize a set of predefined standard proteins, cell and tissue samples. Also, the miniaturized 2-DE (ComPress-2DE) chip presents a novel way of combining the 1st and 2nd dimensional separations, thus avoiding manual handling of the gels, eliminate cross-contamination, and make analyses faster and repeatability better. They all showed the advantages of miniaturization over the commercial devices; such as fast analysis, low sample- and reagent consumption, high sensitivity, high repeatability and inexpensive performance. All these instruments have the potential to be fully automated due to their easy-to-use set-up.
Resumo:
Chromosomal alterations in leukemia have been shown to have prognostic and predictive significance and are also important minimal residual disease (MRD) markers in the follow-up of leukemia patients. Although specific oncogenes and tumor suppressors have been discovered in some of the chromosomal alterations, the role and target genes of many alterations in leukemia remain unknown. In addition, a number of leukemia patients have a normal karyotype by standard cytogenetics, but have variability in clinical course and are often molecularly heterogeneous. Cytogenetic methods traditionally used in leukemia analysis and diagnostics; G-banding, various fluorescence in situ hybridization (FISH) techniques, and chromosomal comparative genomic hybridization (cCGH), have enormously increased knowledge about the leukemia genome, but have limitations in resolution or in genomic coverage. In the last decade, the development of microarray comparative genomic hybridization (array-CGH, aCGH) for DNA copy number analysis and the SNP microarray (SNP-array) method for simultaneous copy number and loss of heterozygosity (LOH) analysis has enabled investigation of chromosomal and gene alterations genome-wide with high resolution and high throughput. In these studies, genetic alterations were analyzed in acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL). The aim was to screen and characterize genomic alterations that could play role in leukemia pathogenesis by using aCGH and SNP-arrays. One of the most important goals was to screen cryptic alterations in karyotypically normal leukemia patients. In addition, chromosomal changes were evaluated to narrow the target regions, to find new markers, and to obtain tumor suppressor and oncogene candidates. The work presented here shows the capability of aCGH to detect submicroscopic copy number alterations in leukemia, with information about breakpoints and genes involved in the alterations, and that genome-wide microarray analyses with aCGH and SNP-array are advantageous methods in the research and diagnosis of leukemia. The most important findings were the cryptic changes detected with aCGH in karyotypically normal AML and CLL, characterization of amplified genes in 11q marker chromosomes, detection of deletion-based mechanisms of MLL-ARHGEF12 fusion gene formation, and detection of LOH without copy number alteration in karyotypically normal AML. These alterations harbor candidate oncogenes and tumor suppressors for further studies.
Resumo:
The commodity plastics that are used in our everyday lives are based on polyolefin resins and they find wide variety of applications in several areas. Most of the production is carried out in catalyzed low pressure processes. As a consequence polymerization of ethene and α-olefins has been one of the focus areas for catalyst research both in industry and academia. Enormous amount of effort have been dedicated to fine tune the processes and to obtain better control of the polymerization and to produce tailored polymer structures The literature review of the thesis concentrates on the use of Group IV metal complexes as catalysts for polymerization of ethene and branched α-olefins. More precisely the review is focused on the use of complexes bearing [O,O] and [O,N] type ligands which have gained considerable interest. Effects of the ligand framework as well as mechanical and fluxional behaviour of the complexes are discussed. The experimental part consists mainly of development of new Group IV metal complexes bearing [O,O] and [O,N] ligands and their use as catalysts precursors in ethene polymerization. Part of the experimental work deals with usage of high-throughput techniques in tailoring properties of new polymer materials which are synthesized using Group IV complexes as catalysts. It is known that the by changing the steric and electronic properties of the ligand framework it is possible to fine tune the catalyst and to gain control over the polymerization reaction. This is why in this thesis the complex structures were designed so that the ligand frameworks could be fairly easily modified. All together 14 complexes were synthesised and used as catalysts in ethene polymerizations. It was found that the ligand framework did have an impact within the studied catalyst families. The activities of the catalysts were affected by the changes in complex structure and also effects on the produced polymers were observed: molecular weights and molecular weight distributions were depended on the used catalyst structure. Some catalysts also produced bi- or multi-modal polymers. During last decade high-throughput techniques developed in pharmaceutical industries have been adopted into polyolefin research in order to speed-up and optimize the catalyst candidates. These methods can now be regarded as established method suitable for both academia and industry alike. These high-throughput techniques were used in tailoring poly(4-methyl-1-pentene) polymers which were synthesized using Group IV metal complexes as catalysts. This work done in this thesis represents the first successful example where the high-throughput synthesis techniques are combined with high-throughput mechanical testing techniques to speed-up the discovery process for new polymer materials.
Resumo:
The purpose of this study is to describe the development of application of mass spectrometry for the structural analyses of non-coding ribonucleic acids during past decade. Mass spectrometric methods are compared of traditional gel electrophoretic methods, the characteristics of performance of mass spectrometric, analyses are studied and the future trends of mass spectrometry of ribonucleic acids are discussed. Non-coding ribonucleic acids are short polymeric biomolecules which are not translated to proteins, but which may affect the gene expression in all organisms. Regulatory ribonucleic acids act through transient interactions with key molecules in signal transduction pathways. Interactions are mediated through specific secondary and tertiary structures. Posttranscriptional modifications in the structures of molecules may introduce new properties to the organism, such as adaptation to environmental changes or development of resistance to antibiotics. In the scope of this study, the structural studies include i) determination of the sequence of nucleobases in the polymer chain, ii) characterisation and localisation of posttranscriptional modifications in nucleobases and in the backbone structure, iii) identification of ribonucleic acid-binding molecules and iv) probing of higher order structures in the ribonucleic acid molecule. Bacteria, archaea, viruses and HeLa cancer cells have been used as target organisms. Synthesised ribonucleic acids consisting of structural regions of interest have been frequently used. Electrospray ionisation (ESI) and matrix-assisted laser desorption ionisation (MALDI) have been used for ionisation of ribonucleic analytes. Ammonium acetate and 2-propanol are common solvents for ESI. Trihydroxyacetophenone is the optimal MALDI matrix for ionisation of ribonucleic acids and peptides. Ammonium salts are used in ESI buffers and MALDI matrices as additives to remove cation adducts. Reverse phase high performance liquid chromatography has been used for desalting and fractionation of analytes either off-line of on-line, coupled with ESI source. Triethylamine and triethylammonium bicarbonate are used as ion pair reagents almost exclusively. Fourier transform ion cyclotron resonance analyser using ESI coupled with liquid chromatography is the platform of choice for all forms of structural analyses. Time-of-flight (TOF) analyser using MALDI may offer sensitive, easy-to-use and economical solution for simple sequencing of longer oligonucleotides and analyses of analyte mixtures without prior fractionation. Special analysis software is used for computer-aided interpretation of mass spectra. With mass spectrometry, sequences of 20-30 nucleotides of length may be determined unambiguously. Sequencing may be applied to quality control of short synthetic oligomers for analytical purposes. Sequencing in conjunction with other structural studies enables accurate localisation and characterisation of posttranscriptional modifications and identification of nucleobases and amino acids at the sites of interaction. High throughput screening methods for RNA-binding ligands have been developed. Probing of the higher order structures has provided supportive data for computer-generated three dimensional models of viral pseudoknots. In conclusion. mass spectrometric methods are well suited for structural analyses of small species of ribonucleic acids, such as short non-coding ribonucleic acids in the molecular size region of 20-30 nucleotides. Structural information not attainable with other methods of analyses, such as nuclear magnetic resonance and X-ray crystallography, may be obtained with the use of mass spectrometry. Sequencing may be applied to quality control of short synthetic oligomers for analytical purposes. Ligand screening may be used in the search of possible new therapeutic agents. Demanding assay design and challenging interpretation of data requires multidisclipinary knowledge. The implement of mass spectrometry to structural studies of ribonucleic acids is probably most efficiently conducted in specialist groups consisting of researchers from various fields of science.
Resumo:
Microarrays are high throughput biological assays that allow the screening of thousands of genes for their expression. The main idea behind microarrays is to compute for each gene a unique signal that is directly proportional to the quantity of mRNA that was hybridized on the chip. A large number of steps and errors associated with each step make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions. This thesis focuses on developing methods for improving gene signal and further utilizing this improved signal for higher level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment leads to signal with low noise, as the effect of unwanted variations is minimized and the precision of the estimates of the parameters of interest are maximized. Second, a system for improving the gene signal by using three scans at varying scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied on these three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of genes. Third, a novel image segmentation approach that segregates the fluorescent signal from the undesired noise is developed using an additional dye, SYBR green RNA II. This technique helped in identifying signal only with respect to the hybridized DNA, and signal corresponding to dust, scratch, spilling of dye, and other noises, are avoided. Fourth, an integrated statistical model is developed, where signal correction, systematic array effects, dye effects, and differential expression, are modelled jointly as opposed to a sequential application of several methods of analysis. The methods described in here have been tested only for cDNA microarrays, but can also, with some modifications, be applied to other high-throughput technologies. Keywords: High-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS.
Resumo:
This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
Resumo:
This thesis studies human gene expression space using high throughput gene expression data from DNA microarrays. In molecular biology, high throughput techniques allow numerical measurements of expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting the previously unusable and missing data and by improving the access to its data. It also contributed to creation of several new tools for microarray data manipulation and establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human readable free text microarray data annotations into categorised format. The data comparability and minimisation of the systematic measurement errors that are characteristic to each lab- oratory in this large cross-laboratories integrated dataset, was ensured by computation of a range of microarray data quality metrics and exclusion of incomparable data. The structure of a global map of human gene expression was then explored by principal component analysis and hierarchical clustering using heuristics and help from another purpose built sample ontology. A preface and motivation to the construction and analysis of a global map of human gene expression is given by analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporate indirect comparison of statistical methods for finding differentially expressed genes and point to the need to study gene expression on a global level.
Resumo:
Growth is a fundamental aspect of life cycle of all organisms. Body size varies highly in most animal groups, such as mammals. Moreover, growth of a multicellular organism is not uniform enlargement of size, but different body parts and organs grow to their characteristic sizes at different times. Currently very little is known about the molecular mechanisms governing this organ-specific growth. The genome sequencing projects have provided complete genomic DNA sequences of several species over the past decade. The amount of genomic sequence information, including sequence variants within species, is constantly increasing. Based on the universal genetic code, we can make sense of this sequence information as far as it codes proteins. However, less is known about the molecular mechanisms that control expression of genes, and about the variations in gene expression that underlie many pathological states in humans. This is caused in part by lack of information about the second genetic code that consists of the binding specificities of transcription factors and the combinatorial code by which transcription factor binding sites are assembled to form tissue-specific and/or ligand-regulated enhancer elements. This thesis presents a high-throughput assay for identification of transcription factor binding specificities, which were then used to measure the DNA binding profiles of transcription factors involved in growth control. We developed ‘enhancer element locator’, a computational tool, which can be used to predict functional enhancer elements. A genome-wide prediction of human and mouse enhancer elements generated a large database of enhancer elements. This database can be used to identify target genes of signaling pathways, and to predict activated transcription factors based on changes in gene expression. Predictions validated in transgenic mouse embryos revealed the presence of multiple tissue-specific enhancers in mouse c- and N-Myc genes, which has implications to organ specific growth control and tumor type specificity of oncogenes. Furthermore, we were able to locate a variation in a single nucleotide, which carries a susceptibility to colorectal cancer, to an enhancer element and propose a mechanism by which this SNP might be involved in generation of colorectal cancer.
Resumo:
The time of the large sequencing projects has enabled unprecedented possibilities of investigating more complex aspects of living organisms. Among the high-throughput technologies based on the genomic sequences, the DNA microarrays are widely used for many purposes, including the measurement of the relative quantity of the messenger RNAs. However, the reliability of microarrays has been strongly doubted as robust analysis of the complex microarray output data has been developed only after the technology had already been spread in the community. An objective of this study consisted of increasing the performance of microarrays, and was measured by the successful validation of the results by independent techniques. To this end, emphasis has been given to the possibility of selecting candidate genes with remarkable biological significance within specific experimental design. Along with literature evidence, the re-annotation of the probes and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, the analysis of microarrays aims at selecting genes whose expression is significantly different in different conditions followed by grouping them in functional categories, enabling a biological interpretation of the results. Another approach investigates the global differences in the expression of functionally related groups of genes. Here, this technique has been effective in discovering patterns related to temporal changes during infection of human cells. Another aspect explored in this thesis is related to the possibility of combining independent gene expression data for creating a catalog of genes that are selectively expressed in healthy human tissues. Not all the genes present in human cells are active; some involved in basic activities (named housekeeping genes) are expressed ubiquitously. Other genes (named tissue-selective genes) provide more specific functions and they are expressed preferably in certain cell types or tissues. Defining the tissue-selective genes is also important as these genes can cause disease with phenotype in the tissues where they are expressed. The hypothesis that gene expression could be used as a measure of the relatedness of the tissues has been also proved. Microarray experiments provide long lists of candidate genes that are often difficult to interpret and prioritize. Extending the power of microarray results is possible by inferring the relationships of genes under certain conditions. Gene transcription is constantly regulated by the coordinated binding of proteins, named transcription factors, to specific portions of the its promoter sequence. In this study, the analysis of promoters from groups of candidate genes has been utilized for predicting gene networks and highlighting modules of transcription factors playing a central role in the regulation of their transcription. Specific modules have been found regulating the expression of genes selectively expressed in the hippocampus, an area of the brain having a central role in the Major Depression Disorder. Similarly, gene networks derived from microarray results have elucidated aspects of the development of the mesencephalon, another region of the brain involved in Parkinson Disease.
Resumo:
Lead contamination in the environment is of particular concern, as it is a known toxin. Until recently, however, much less attention has been given to the local contamination caused by activities at shooting ranges compared to large-scale industrial contamination. In Finland, more than 500 tons of Pb is produced each year for shotgun ammunition. The contaminant threatens various organisms, ground water and the health of human populations. However, the forest at shooting ranges usually shows no visible sign of stress compared to nearby clean environments. The aboveground biota normally reflects the belowground ecosystem. Thus, the soil microbial communities appear to bear strong resistance to contamination, despite the influence of lead. The studies forming this thesis investigated a shooting range site at Hälvälä in Southern Finland, which is heavily contaminated by lead pellets. Previously it was experimentally shown that the growth of grasses and degradation of litter are retarded. Measurements of acute toxicity of the contaminated soil or soil extracts gave conflicting results, as enchytraeid worms used as toxicity reporters were strongly affected, while reporter bacteria showed no or very minor decreases in viability. Measurements using sensitive inducible luminescent reporter bacteria suggested that the bioavailability of lead in the soil is indeed low, and this notion was supported by the very low water extractability of the lead. Nevertheless, the frequency of lead-resistant cultivable bacteria was elevated based on the isolation of cultivable strains. The bacterial and fungal diversity in heavily lead contaminated shooting sectors were compared with those of pristine sections of the shooting range area. The bacterial 16S rRNA gene and fungal ITS rRNA gene were amplified, cloned and sequenced using total DNA extracted from the soil humus layer as the template. Altogether, 917 sequenced bacterial clones and 649 sequenced fungal clones revealed a high soil microbial diversity. No effect of lead contamination was found on bacterial richness or diversity, while fungal richness and diversity significantly differed between lead contaminated and clean control areas. However, even in the case of fungi, genera that were deemed sensitive were not totally absent from the contaminated area: only their relative frequency was significantly reduced. Some operational taxonomic units (OTUs) assigned to Basidiomycota were clearly affected, and were much rarer in the lead contaminated areas. The studies of this thesis surveyed EcM sporocarps, analyzed morphotyped EcM root tips by direct sequencing, and 454-pyrosequenced fungal communities in in-growth bags. A total of 32 EcM fungi that formed conspicuous sporocarps, 27 EcM fungal OTUs from 294 root tips, and 116 EcM fungal OTUs from a total of 8 194 ITS2 454 sequences were recorded. The ordination analyses by non-parametric multidimensional scaling (NMS) indicated that Pb enrichment induced a shift in the EcM community composition. This was visible as indicative trends in the sporocarp and root tip datasets, but explicitly clear in the communities observed in the in-growth bags. The compositional shift in the EcM community was mainly attributable to an increase in the frequencies of OTUs assigned to the genus Thelephora, and to a decrease in the OTUs assigned to Pseudotomentella, Suillus and Tylospora in Pb-contaminated areas when compared to the control. The enrichment of Thelephora in contaminated areas was also observed when examining the total fungal communities in soil using DNA cloning and sequencing technology. While the compositional shifts are clear, their functional consequences for the dominant trees or soil ecosystem remain undetermined. The results indicate that at the Hälvälä shooting range, lead influences the fungal communities but not the bacterial communities. The forest ecosystem shows apparent functional redundancy, since no significant effects were seen on forest trees. Recently, by means of 454 pyrosequencing , the amount of sequences in a single analysis run can be up to one million. It has been applied in microbial ecology studies to characterize microbial communities. The handling of sequence data with traditional programs is becoming difficult and exceedingly time consuming, and novel tools are needed to handle the vast amounts of data being generated. The field of microbial ecology has recently benefited from the availability of a number of tools for describing and comparing microbial communities using robust statistical methods. However, although these programs provide methods for rapid calculation, it has become necessary to make them more amenable to larger datasets and numbers of samples from pyrosequencing. As part of this thesis, a new program was developed, MuSSA (Multi-Sample Sequence Analyser), to handle sequence data from novel high-throughput sequencing approaches in microbial community analyses. The greatest advantage of the program is that large volumes of sequence data can be manipulated, and general OTU series with a frequency value can be calculated among a large number of samples.
Resumo:
Microbes in natural and artificial environments as well as in the human body are a key part of the functional properties of these complex systems. The presence or absence of certain microbial taxa is a correlate of functional status like risk of disease or course of metabolic processes of a microbial community. As microbes are highly diverse and mostly notcultivable, molecular markers like gene sequences are a potential basis for detection and identification of key types. The goal of this thesis was to study molecular methods for identification of microbial DNA in order to develop a tool for analysis of environmental and clinical DNA samples. Particular emphasis was placed on specificity of detection which is a major challenge when analyzing complex microbial communities. The approach taken in this study was the application and optimization of enzymatic ligation of DNA probes coupled with microarray read-out for high-throughput microbial profiling. The results show that fungal phylotypes and human papillomavirus genotypes could be accurately identified from pools of PCR amplicons generated from purified sample DNA. Approximately 1 ng/μl of sample DNA was needed for representative PCR amplification as measured by comparisons between clone sequencing and microarray. A minimum of 0,25 amol/μl of PCR amplicons was detectable from amongst 5 ng/μl of background DNA, suggesting that the detection limit of the test comprising of ligation reaction followed by microarray read-out was approximately 0,04%. Detection from sample DNA directly was shown to be feasible with probes forming a circular molecule upon ligation followed by PCR amplification of the probe. In this approach, the minimum detectable relative amount of target genome was found to be 1% of all genomes in the sample as estimated from 454 deep sequencing results. Signal-to-noise of contact printed microarrays could be improved by using an internal microarray hybridization control oligonucleotide probe together with a computational algorithm. The algorithm was based on identification of a bias in the microarray data and correction of the bias as shown by simulated and real data. The results further suggest semiquantitative detection to be possible by ligation detection, allowing estimation of target abundance in a sample. However, in practise, comprehensive sequence information of full length rRNA genes is needed to support probe design with complex samples. This study shows that DNA microarray has the potential for an accurate microbial diagnostic platform to take advantage of increasing sequence data and to replace traditional, less efficient methods that still dominate routine testing in laboratories. The data suggests that ligation reaction based microarray assay can be optimized to a degree that allows good signal-tonoise and semiquantitative detection.