37 results for Bayesian techniques
in Helda - Digital Repository of University of Helsinki
Abstract:
Advances in analysis techniques have led to a rapid accumulation of biological data in databases. Such data are often in the form of sequences of observations, for example DNA sequences and the amino acid sequences of proteins. The scale and quality of the data promise to answer various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify regions of an amino acid sequence that are important for the function of the corresponding protein, or investigate how characteristics at the level of the DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with understanding the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main focus is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we use a partition model, which describes the structure of the data by dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters.
The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
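To illustrate what integrating out the parameters of a partition model can look like, here is a minimal Python sketch. It is not the thesis's model: it assumes aligned sequences of equal length, independent columns, and a symmetric Dirichlet prior on each cluster's per-column symbol frequencies. The function names, the `alpha` hyperparameter, and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import lgamma


def log_marginal_cluster(seqs, alphabet="ACGT", alpha=1.0):
    """Log marginal likelihood of one cluster of aligned sequences.

    Assumption: each column has its own multinomial parameter with a
    symmetric Dirichlet(alpha) prior, integrated out in closed form.
    """
    k = len(alphabet)
    n = len(seqs)
    total = 0.0
    for column in zip(*seqs):
        counts = Counter(column)
        # Dirichlet-multinomial normalising terms for this column.
        total += lgamma(k * alpha) - lgamma(k * alpha + n)
        total += sum(lgamma(alpha + counts[a]) - lgamma(alpha) for a in alphabet)
    return total


def log_marginal_partition(partition, **kwargs):
    """Clusters are independent given the partition, so log scores add."""
    return sum(log_marginal_cluster(cluster, **kwargs) for cluster in partition)


# Toy data (hypothetical): two visibly different groups of sequences.
seqs_a = ["ACGT", "ACGT", "ACGA"]
seqs_b = ["TTCA", "TTCA"]
split = log_marginal_partition([seqs_a, seqs_b])
merged = log_marginal_partition([seqs_a + seqs_b])
print(split, merged)
```

Because the score of a partition is available in closed form, a stochastic search can compare candidate partitions (for example by moving one item between clusters) without sampling any model parameters.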
Abstract:
Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity.
A common feature of T-RFLP and FAME data is zero-inflation, meaning that zero values are observed much more frequently than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provide two strategies for modeling zero-inflation in the clustering framework, which were validated on both synthetic and complex empirical data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large-scale data were adopted to meet the increasing computational need. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, provides an initial effort toward discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
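The notion of zero-inflation relative to a Poisson distribution can be made concrete with the textbook zero-inflated Poisson mixture. The abstract does not say which two strategies the thesis uses, so the sketch below is only an illustration of the general idea; the function name and the parameters `lam` and `pi` are assumptions.

```python
from math import exp, factorial


def zip_pmf(k, lam, pi):
    """P(X = k) under a zero-inflated Poisson.

    With probability pi the observation is a structural zero;
    otherwise it is drawn from an ordinary Poisson(lam).
    """
    poisson = exp(-lam) * lam ** k / factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson


# Probability of observing a zero with and without inflation (illustrative values).
plain_zero = zip_pmf(0, lam=3.0, pi=0.0)
inflated_zero = zip_pmf(0, lam=3.0, pi=0.4)
print(plain_zero, inflated_zero)
```

Setting `pi = 0` recovers the plain Poisson, so the extra mass at zero is controlled by a single mixing weight.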
Abstract:
This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important because the direct measurement of haplotypes is difficult, whereas genotypes are easier to quantify. Haplotypes are key players when studying, for example, the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem, referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability.
BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences arise because the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score, and from that a p-value is computed. This p-value is the probability that two sequences drawn from the null model have a best local alignment score at least as high as the observed one. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike previous methods, the presented framework is not affected by so-called edge effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
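The definition of the p-value given above can be illustrated with a small Python sketch. Note that this is the sampling-based approach the thesis framework is designed to avoid, shown here only to make the definition concrete. The scoring scheme (match, mismatch, linear gap penalty), the uniform i.i.d. null model, and all function names are assumptions.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            s = max(
                0,                                               # start a new alignment
                prev[j - 1] + (match if x == y else mismatch),   # align x with y
                prev[j] + gap,                                   # gap in b
                cur[j - 1] + gap,                                # gap in a
            )
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


def empirical_p_value(a, b, trials=200, alphabet="ACGT", seed=0):
    """Monte Carlo estimate of P(null score >= observed score).

    Null model (assumed): both sequences are i.i.d. uniform over the
    alphabet, with the same lengths as the observed sequences.
    """
    rng = random.Random(seed)
    observed = local_score(a, b)
    hits = 0
    for _ in range(trials):
        ra = "".join(rng.choice(alphabet) for _ in a)
        rb = "".join(rng.choice(alphabet) for _ in b)
        if local_score(ra, rb) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


print(empirical_p_value("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGT"))
```

The smallest p-value such an estimate can report is limited by the number of trials, which is one reason an analytic upper bound is attractive for very significant scores.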
Abstract:
Standards have been set to regulate the microbial and preservative contents of foods to ensure that they are safe for the consumer. In the case of a food-related disease outbreak, it is crucial to be able to detect and identify the cause of the disease quickly and accurately. In addition, for everyday control of food microbial and preservative contents, the detection methods must be easy to perform on numerous food samples. In the present study, quicker alternative methods were studied for the identification of bacteria by DNA fingerprinting. A flow cytometry method was developed as an alternative to pulsed-field gel electrophoresis, the gold standard method. DNA fragment sizing by an ultrasensitive flow cytometer was able to discriminate species and strains in a reproducible manner comparable to pulsed-field gel electrophoresis. This new method was hundreds of times faster and 200,000 times more sensitive. Additionally, another DNA fingerprinting identification method was developed based on single-enzyme amplified fragment length polymorphism (SE-AFLP). This method allowed the differentiation of genera, species, and strains of pathogenic bacteria of Bacilli, Staphylococci, Yersinia, and Escherichia coli. The fingerprinting patterns obtained by SE-AFLP were simpler and easier to analyze than those obtained by traditional amplified fragment length polymorphism with double enzyme digestion. Nisin (E234) is added as a preservative to different types of foods, especially dairy products, around the world. Various detection methods exist for nisin, but they lack sensitivity, speed or specificity. In the present study, a sensitive nisin-induced green fluorescent protein (GFPuv) bioassay was developed using the Lactococcus lactis two-component signal system NisRK and the nisin-inducible nisA promoter. The bioassay was extremely sensitive, with a detection limit of 10 pg/ml in culture supernatant.
In addition, it was suitable for quantification in various food matrices, such as milk, salad dressings, processed cheese, liquid eggs, and canned tomatoes. Wine has good antimicrobial properties due to its alcohol concentration, low pH, and organic content, and is therefore often assumed to be microbially safe to consume. Another aim of this thesis was to study the microbiota of wines returned by customers complaining of food-poisoning symptoms. By partial 16S rRNA gene sequence analysis, ribotyping, and a boar spermatozoa motility assay, it was found that one of the wines contained Bacillus simplex BAC91, which produced a heat-stable substance toxic to the mitochondria of sperm cells. The antibacterial activity of wine was tested on the vegetative cells and spores of B. simplex BAC91, B. cereus type strain ATCC 14579 and cereulide-producing B. cereus F4810/72. Although the vegetative cells and spores of B. simplex BAC91 were sensitive to the antimicrobial effects of wine, the spores of B. cereus strains ATCC 14579 and F4810/72 stayed viable for at least 4 months. According to these results, Bacillus spp., and more specifically their spores, can pose a risk to the wine consumer.
Abstract:
Determination of testosterone and related compounds in body fluids is of utmost importance in doping control and the diagnosis of many diseases. Capillary electromigration techniques are a relatively new approach for steroid research. Owing to their electrical neutrality, however, separation of steroids by capillary electromigration techniques requires the use of charged electrolyte additives that interact with the steroids either specifically or non-specifically. The analysis of testosterone and related steroids by non-specific micellar electrokinetic chromatography (MEKC) was investigated in this study. The partial filling (PF) technique was employed, being suitable for detection by both ultraviolet spectrophotometry (UV) and electrospray ionization mass spectrometry (ESI-MS). Efficient, quantitative PF-MEKC UV methods for steroid standards were developed through the use of optimized pseudostationary phases comprising surfactants and cyclodextrins. PF-MEKC UV proved to be a more sensitive, efficient and repeatable method for the steroids than PF-MEKC ESI-MS. It was discovered that in PF-MEKC analyses of electrically neutral steroids, ESI-MS interfacing sets significant limitations not only on the chemistry affecting the ionization and detection processes, but also on the separation. The new PF-MEKC UV method was successfully employed in the determination of testosterone in male urine samples after microscale immunoaffinity solid-phase extraction (IA-SPE). The IA-SPE method, relying on specific interactions between testosterone and a recombinant anti-testosterone Fab fragment, is the first such method described for testosterone. Finally, new data for interactions between steroids and human and bovine serum albumins were obtained through the use of affinity capillary electrophoresis. A new algorithm for the calculation of association constants between proteins and neutral ligands is introduced.
Abstract:
Miniaturized mass spectrometric ionization techniques for environmental analysis and bioanalysis
Novel miniaturized mass spectrometric ionization techniques based on atmospheric pressure chemical ionization (APCI) and atmospheric pressure photoionization (APPI) were studied and evaluated in the analysis of environmental samples and biosamples. The three analytical systems investigated here were gas chromatography-microchip atmospheric pressure chemical ionization-mass spectrometry (GC-µAPCI-MS) and gas chromatography-microchip atmospheric pressure photoionization-mass spectrometry (GC-µAPPI-MS), where sample pretreatment and chromatographic separation precede ionization, and desorption atmospheric pressure photoionization-mass spectrometry (DAPPI-MS), where the samples are analyzed either as such or after minimal pretreatment. The gas chromatography-microchip atmospheric pressure ionization-mass spectrometry (GC-µAPI-MS) instrumentations were used in the analysis of polychlorinated biphenyls (PCBs) in negative ion mode and 2-quinolinone-derived selective androgen receptor modulators (SARMs) in positive ion mode. The analytical characteristics (i.e., limits of detection, linear ranges, and repeatabilities) of the methods were evaluated with PCB standards and SARMs in urine. All methods showed good analytical characteristics and potential for quantitative environmental analysis or bioanalysis. Desorption and ionization mechanisms in DAPPI were studied. Desorption was found to be a thermal process, with the efficiency strongly depending on thermal conductivity of the sampling surface. Probably the size and polarity of the analyte also play a role. In positive ion mode, the ionization is dependent on the ionization energy and proton affinity of the analyte and the spray solvent, while in negative ion mode the ionization mechanism is determined by the electron affinity and gas-phase acidity of the analyte and the spray solvent.
DAPPI-MS was tested in the fast screening analysis of environmental, food, and forensic samples, and the results demonstrated the feasibility of DAPPI-MS for rapid screening analysis of authentic samples.
Abstract:
Radioactive particles from three locations were investigated for elemental composition, oxidation states of matrix elements, and origin. Instrumental techniques applied to the task were scanning electron microscopy, X-ray and gamma-ray spectrometry, secondary ion mass spectrometry, and synchrotron radiation based microanalytical techniques comprising X-ray fluorescence spectrometry, X-ray fluorescence tomography, and X-ray absorption near-edge structure spectroscopy. Uranium-containing low activity particles collected from Irish Sea sediments were characterized in terms of composition and distribution of matrix elements and the oxidation states of uranium. Indications of the origin were obtained from the intensity ratios and the presence of thorium, uranium, and plutonium. Uranium in the particles was found to exist mostly as U(IV). Studies on plutonium particles from Runit Island (Marshall Islands) soil indicated that the samples were weapon fuel fragments originating from two separate detonations: a safety test and a low-yield test. The plutonium in the particles was found to be of similar age. The distribution and oxidation states of uranium and plutonium in the matrix of weapon fuel particles from Thule (Greenland) sediments were investigated. The variations in intensity ratios observed with different techniques indicated more than one origin. Uranium in particle matrixes was mostly U(IV), but plutonium existed in some particles mainly as Pu(IV), and in others mainly as oxidized Pu(VI). The results demonstrated that the various techniques were effectively applied in the characterization of environmental radioactive particles. An on-line method was developed for separating americium from environmental samples. The procedure utilizes extraction chromatography to separate americium from light lanthanides, and cation exchange to concentrate americium before the final separation in an ion chromatography column. 
The separated radiochemically pure americium fraction is measured by alpha spectrometry. The method was tested with certified sediment and soil samples and found to be applicable for the analysis of environmental samples containing a wide range of Am-241 activity. Proceeding from the on-line method developed for americium, a method was also developed for separating plutonium and americium. Plutonium is reduced to Pu(III), and separated together with Am(III) throughout the procedure. Pu(III) and Am(III) are eluted from the ion chromatography column as anionic dipicolinate and oxalate complexes, respectively, and measured by alpha spectrometry.
Application of Modern NMR Spectroscopic Techniques to Structural Studies of Wood and Pulp Components
Development of Sample Pretreatment and Liquid Chromatographic Techniques for Antioxidative Compounds
Abstract:
In this study, novel methodologies for the determination of antioxidative compounds in herbs and beverages were developed. Antioxidants are compounds that can reduce, delay or inhibit oxidative events. They are a part of the human defense system and are obtained through the diet. Antioxidants are naturally present in several types of foods, e.g. in fruits, beverages, vegetables and herbs. Antioxidants can also be added to foods during manufacturing to suppress lipid oxidation and formation of free radicals under conditions of cooking or storage and to reduce the concentration of free radicals in vivo after food ingestion. There is growing interest in natural antioxidants, and effective compounds have already been identified from antioxidant classes such as carotenoids, essential oils, flavonoids and phenolic acids. The wide variety of sample matrices and analytes presents quite a challenge for the development of analytical techniques. Growing demands have been placed on sample pretreatment. In this study, three novel extraction techniques, namely supercritical fluid extraction (SFE), pressurised hot water extraction (PHWE) and dynamic sonication-assisted extraction (DSAE) were studied. SFE was used for the extraction of lycopene from tomato skins and PHWE was used in the extraction of phenolic compounds from sage. DSAE was applied to the extraction of phenolic acids from Lamiaceae herbs. In the development of extraction methodologies, the main parameters of the extraction were studied and the recoveries were compared to those achieved by conventional extraction techniques. In addition, the stability of lycopene was also followed under different storage conditions. For the separation of the antioxidative compounds in the extracts, liquid chromatographic methods (LC) were utilised. 
Two novel LC techniques, namely ultra performance liquid chromatography (UPLC) and comprehensive two-dimensional liquid chromatography (LCxLC), were studied and compared with conventional high performance liquid chromatography (HPLC) for the separation of antioxidants in beverages and Lamiaceae herbs. In LCxLC, the selection of LC mode, column dimensions and flow rates was studied and optimised to obtain efficient separation of the target compounds. In addition, the separation powers of HPLC, UPLC, HPLCxHPLC and HPLCxUPLC were compared. To exploit the benefits of an integrated system, in which sample preparation and final separation are performed in a closed unit, dynamic sonication-assisted extraction was coupled on-line to a liquid chromatograph via a solid-phase trap. The increased sensitivity was utilised in the extraction of phenolic acids from Lamiaceae herbs. The results were compared to those achieved by the LCxLC system.
Abstract:
In this thesis the use of the Bayesian approach to statistical inference in fisheries stock assessment is studied. The work was conducted in collaboration with the Finnish Game and Fisheries Research Institute, using the problem of monitoring and predicting the juvenile salmon population in the River Tornionjoki as an example application. The River Tornionjoki is the largest salmon river flowing into the Baltic Sea. This thesis tackles the issues of model formulation and model checking as well as computational problems related to Bayesian modelling in the context of fisheries stock assessment. Each article of the thesis provides a novel method either for extracting information from data obtained via a particular type of sampling system or for integrating the information about the fish stock from multiple sources in terms of a population dynamics model. Mark-recapture and removal sampling schemes and a random catch sampling method are covered for the estimation of the population size. In addition, a method for estimating the stock composition of a salmon catch based on DNA samples is presented. For most of the articles, Markov chain Monte Carlo (MCMC) simulation has been used as a tool to approximate the posterior distribution. Problems arising from the sampling method are also briefly discussed and potential solutions for these problems are proposed. Special emphasis in the discussion is given to the philosophical foundation of the Bayesian approach in the context of fisheries stock assessment. It is argued that the role of subjective prior knowledge needed in practically all parts of a Bayesian model should be recognized and consequently fully utilised in the process of model formulation.
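As a rough illustration of Bayesian population-size estimation from mark-recapture data, the sketch below computes a posterior over the population size on a grid. It is not the thesis's model: it assumes a single recapture occasion, a hypergeometric likelihood, and a uniform prior up to a hypothetical bound `n_max`, and it evaluates the posterior exactly instead of using MCMC. The numbers are invented.

```python
from math import comb


def posterior_population_size(marked, caught, recaptured, n_max):
    """Posterior over population size N for a two-sample mark-recapture study.

    Assumptions: `marked` animals were tagged and released, a second sample
    of `caught` animals contained `recaptured` tagged ones, the likelihood is
    hypergeometric, and the prior on N is uniform up to n_max.
    """
    n_min = marked + caught - recaptured  # smallest N consistent with the data
    sizes = range(n_min, n_max + 1)
    weights = [
        comb(marked, recaptured) * comb(n - marked, caught - recaptured) / comb(n, caught)
        for n in sizes
    ]
    total = sum(weights)
    return {n: w / total for n, w in zip(sizes, weights)}


# Invented example: 100 tagged fish, 80 caught later, 21 of them tagged.
posterior = posterior_population_size(marked=100, caught=80, recaptured=21, n_max=2000)
mode = max(posterior, key=posterior.get)
mean = sum(n * p for n, p in posterior.items())
print(mode, mean)
```

With a flat prior the posterior mode coincides with the maximum-likelihood estimate, while an informative prior on N would shift it, which is where the subjective prior knowledge discussed in the abstract would enter.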
Abstract:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. The field of genetics is currently undergoing rapid development because of recent advances in the technologies by which molecular data can be obtained from living organisms. To extract the most information from such data, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo (MCMC) algorithm. The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes and at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where a quantitative phenotype has also been measured from the contemporary individuals.
In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.