33 results for incorporate probabilistic techniques
in Helda - Digital Repository of the University of Helsinki
Abstract:
What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so that we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d'être of the visual system. Following the hypothesis that optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis, we construct a variety of statistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex, such as simple and complex cells. We show that by representing images with phase-invariant, complex cell-like units, a better statistical description of the visual environment is obtained than with linear simple cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the natural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally, we show how models like those presented here can be extended to capture whole visual scenes rather than just small image patches. By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.
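As a rough, hedged illustration of the first modeling step above, the sketch below runs independent component analysis on image patches; on real natural images the learned basis functions tend to be localized, oriented filters resembling simple-cell receptive fields, and pooling their squared outputs gives phase-invariant, complex cell-like responses. The random images and all parameter choices are stand-ins, not the thesis code.

```python
# An illustrative sketch (not the thesis code): independent component
# analysis on image patches. On real natural images the learned basis
# functions are localized, oriented edge filters, much like V1
# simple-cell receptive fields. The random "images" below are stand-ins.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
images = rng.standard_normal((10, 256, 256))  # replace with real photographs

def sample_patches(imgs, n_patches=5000, size=16):
    """Draw random size x size patches and flatten them into vectors."""
    patches = []
    for _ in range(n_patches):
        img = imgs[rng.integers(len(imgs))]
        y, x = rng.integers(img.shape[0] - size), rng.integers(img.shape[1] - size)
        patches.append(img[y:y + size, x:x + size].ravel())
    return np.asarray(patches)

X = sample_patches(images)
X -= X.mean(axis=0)  # remove the mean (DC) component

# FastICA whitens internally; mixing_ holds one basis function per column.
ica = FastICA(n_components=64, whiten="unit-variance", random_state=0)
sources = ica.fit_transform(X)  # simple-cell-like linear responses
basis = ica.mixing_             # 256-dim basis functions, reshape to 16x16

# A phase-invariant, complex cell-like response could then be formed by
# pooling squared outputs of related linear units, e.g. s_i**2 + s_j**2.
```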
Abstract:
This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important because the direct measurement of haplotypes is difficult, whereas genotypes are easier to quantify. Haplotypes are the key players when studying, for example, the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem, referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has performance comparable to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the sequences being related or just due to chance. Similarity of sequences is measured by their best local alignment score, and from that a p-value is computed. This p-value is the probability of picking two sequences from the null model that have an equally good or better best local alignment score. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike previous methods, the presented framework is not affected by so-called edge effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
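To make the significance problem concrete, the sketch below scores the best local alignment with Smith-Waterman dynamic programming and estimates a p-value empirically by scoring shuffled sequences from a null model. This is precisely the kind of sampling-based baseline the thesis framework avoids by computing a tight upper bound analytically; the scoring scheme and sequences are arbitrary stand-ins.

```python
# A minimal sketch of the significance question (not the thesis
# framework): a Smith-Waterman local alignment score and a Monte Carlo
# p-value from shuffled null sequences.
import random

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score via dynamic programming (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def empirical_pvalue(a, b, n_shuffles=200, seed=0):
    """Estimate P(null score >= observed) by shuffling one sequence."""
    rng = random.Random(seed)
    observed = smith_waterman(a, b)
    hits = 0
    for _ in range(n_shuffles):
        null_b = list(b)
        rng.shuffle(null_b)
        hits += smith_waterman(a, "".join(null_b)) >= observed
    return (hits + 1) / (n_shuffles + 1)  # conservative estimator

print(empirical_pvalue("ACGTACGTGG", "ACGTTGGACG"))
```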
Abstract:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation of previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, ones concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages of increasing complexity, proceeding from univariate via bivariate to multivariate techniques. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation for the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected THINK lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. In terms of the overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context by applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan's (2007) and others' (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
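For readers unfamiliar with the central method, the sketch below fits a polytomous (multinomial) logistic regression over a four-way verb choice; the contextual features and the random data are hypothetical placeholders, not the thesis corpus.

```python
# A hypothetical sketch of polytomous logistic regression for a four-way
# lexeme choice; features and data are placeholders, not the thesis corpus.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
verbs = np.array(["ajatella", "miettiä", "pohtia", "harkita"])

# Toy design matrix: binary contextual features per sentence, e.g.
# "agent is human", "complement is a clause", "verb chain is negated".
X = rng.integers(0, 2, size=(400, 6))
y = rng.choice(verbs, size=400)

# With more than two classes, LogisticRegression fits a multinomial
# (softmax) model, i.e. polytomous logistic regression.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Each sentence gets a probability distribution over the four verbs;
# a choice is "categorical" only when one outcome dominates.
for probs in model.predict_proba(X[:3]):
    print(dict(zip(model.classes_, probs.round(2))))
```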
Abstract:
Standards have been put in place to regulate microbial and preservative contents to ensure that foods are safe for the consumer. In the case of a food-related disease outbreak, it is crucial to be able to detect and identify the cause of the disease quickly and accurately. In addition, for everyday control of food microbial and preservative contents, the detection methods must be easy to perform for numerous food samples. In the present study, quicker alternative methods were studied for the identification of bacteria by DNA fingerprinting. A flow cytometry method was developed as an alternative to pulsed-field gel electrophoresis, the gold standard method. DNA fragment sizing by an ultrasensitive flow cytometer was able to discriminate species and strains in a reproducible manner comparable to pulsed-field gel electrophoresis. This new method was hundreds of times faster and 200,000 times more sensitive. Additionally, another DNA fingerprinting identification method was developed based on single-enzyme amplified fragment length polymorphism (SE-AFLP). This method allowed the differentiation of genera, species, and strains of pathogenic bacilli, staphylococci, Yersinia, and Escherichia coli. The fingerprinting patterns obtained by SE-AFLP were simpler and easier to analyze than those obtained by traditional amplified fragment length polymorphism with double enzyme digestion. Nisin (E234) is added as a preservative to different types of foods, especially dairy products, around the world. Various detection methods exist for nisin, but they lack sensitivity, speed or specificity. In the present study, a sensitive nisin-induced green fluorescent protein (GFPuv) bioassay was developed using the Lactococcus lactis two-component signal system NisRK and the nisin-inducible nisA promoter. The bioassay was extremely sensitive, with a detection limit of 10 pg/ml in culture supernatant. In addition, it was compatible with quantification from various food matrices, such as milk, salad dressings, processed cheese, liquid eggs, and canned tomatoes. Wine has good antimicrobial properties due to its alcohol concentration, low pH, and organic content, and is therefore often assumed to be microbially safe to consume. Another aim of this thesis was to study the microbiota of wines returned by customers complaining of food-poisoning symptoms. By partial 16S rRNA gene sequence analysis, ribotyping, and a boar spermatozoa motility assay, one of the wines was found to contain Bacillus simplex BAC91, which produced a heat-stable substance toxic to the mitochondria of sperm cells. The antibacterial activity of wine was tested on the vegetative cells and spores of B. simplex BAC91, B. cereus type strain ATCC 14579 and the cereulide-producing B. cereus F4810/72. Although the vegetative cells and spores of B. simplex BAC91 were sensitive to the antimicrobial effects of wine, the spores of B. cereus strains ATCC 14579 and F4810/72 stayed viable for at least 4 months. According to these results, Bacillus spp., and more specifically their spores, can pose a risk to the wine consumer.
Abstract:
Determination of testosterone and related compounds in body fluids is of utmost importance in doping control and the diagnosis of many diseases. Capillary electromigration techniques are a relatively new approach to steroid research. Owing to the electrical neutrality of steroids, however, their separation by capillary electromigration techniques requires the use of charged electrolyte additives that interact with the steroids either specifically or non-specifically. The analysis of testosterone and related steroids by non-specific micellar electrokinetic chromatography (MEKC) was investigated in this study. The partial filling (PF) technique was employed, as it is suitable for detection by both ultraviolet spectrophotometry (UV) and electrospray ionization mass spectrometry (ESI-MS). Efficient, quantitative PF-MEKC UV methods for steroid standards were developed through the use of optimized pseudostationary phases comprising surfactants and cyclodextrins. PF-MEKC UV proved to be a more sensitive, efficient and repeatable method for the steroids than PF-MEKC ESI-MS. It was discovered that in PF-MEKC analyses of electrically neutral steroids, ESI-MS interfacing sets significant limitations not only on the chemistry affecting the ionization and detection processes, but also on the separation. The new PF-MEKC UV method was successfully employed in the determination of testosterone in male urine samples after microscale immunoaffinity solid-phase extraction (IA-SPE). The IA-SPE method, relying on specific interactions between testosterone and a recombinant anti-testosterone Fab fragment, is the first such method described for testosterone. Finally, new data on interactions between steroids and human and bovine serum albumins were obtained through the use of affinity capillary electrophoresis. A new algorithm for the calculation of association constants between proteins and neutral ligands is introduced.
Miniaturized Mass Spectrometric Ionization Techniques for Environmental Analysis and Bioanalysis
Abstract:
Novel miniaturized mass spectrometric ionization techniques based on atmospheric pressure chemical ionization (APCI) and atmospheric pressure photoionization (APPI) were studied and evaluated in the analysis of environmental samples and biosamples. The three analytical systems investigated here were gas chromatography-microchip atmospheric pressure chemical ionization-mass spectrometry (GC-µAPCI-MS) and gas chromatography-microchip atmospheric pressure photoionization-mass spectrometry (GC-µAPPI-MS), where sample pretreatment and chromatographic separation precede ionization, and desorption atmospheric pressure photoionization-mass spectrometry (DAPPI-MS), where the samples are analyzed either as such or after minimal pretreatment. The gas chromatography-microchip atmospheric pressure ionization-mass spectrometry (GC-µAPI-MS) instrumentation was used in the analysis of polychlorinated biphenyls (PCBs) in negative ion mode and 2-quinolinone-derived selective androgen receptor modulators (SARMs) in positive ion mode. The analytical characteristics (i.e., limits of detection, linear ranges, and repeatabilities) of the methods were evaluated with PCB standards and SARMs in urine. All methods showed good analytical characteristics and potential for quantitative environmental analysis or bioanalysis. Desorption and ionization mechanisms in DAPPI were studied. Desorption was found to be a thermal process, with its efficiency strongly depending on the thermal conductivity of the sampling surface; the size and polarity of the analyte probably also play a role. In positive ion mode, the ionization depends on the ionization energy and proton affinity of the analyte and the spray solvent, while in negative ion mode the ionization mechanism is determined by the electron affinity and gas-phase acidity of the analyte and the spray solvent. DAPPI-MS was tested in the fast screening of environmental, food, and forensic samples, and the results demonstrated the feasibility of DAPPI-MS for rapid screening analysis of authentic samples.
Abstract:
Radioactive particles from three locations were investigated for elemental composition, oxidation states of matrix elements, and origin. The instrumental techniques applied to the task were scanning electron microscopy, X-ray and gamma-ray spectrometry, secondary ion mass spectrometry, and synchrotron radiation-based microanalytical techniques comprising X-ray fluorescence spectrometry, X-ray fluorescence tomography, and X-ray absorption near-edge structure spectroscopy. Uranium-containing low-activity particles collected from Irish Sea sediments were characterized in terms of the composition and distribution of matrix elements and the oxidation states of uranium. Indications of the origin were obtained from the intensity ratios and the presence of thorium, uranium, and plutonium. Uranium in the particles was found to exist mostly as U(IV). Studies on plutonium particles from Runit Island (Marshall Islands) soil indicated that the samples were weapon fuel fragments originating from two separate detonations: a safety test and a low-yield test. The plutonium in the particles was found to be of similar age. The distribution and oxidation states of uranium and plutonium in the matrix of weapon fuel particles from Thule (Greenland) sediments were investigated. The variations in intensity ratios observed with different techniques indicated more than one origin. Uranium in the particle matrices was mostly U(IV), but plutonium existed in some particles mainly as Pu(IV), and in others mainly as oxidized Pu(VI). The results demonstrated that the various techniques were effectively applied in the characterization of environmental radioactive particles. An on-line method was developed for separating americium from environmental samples. The procedure utilizes extraction chromatography to separate americium from light lanthanides, and cation exchange to concentrate americium before the final separation on an ion chromatography column. The separated, radiochemically pure americium fraction is measured by alpha spectrometry. The method was tested with certified sediment and soil samples and found to be applicable to the analysis of environmental samples containing a wide range of Am-241 activity. Proceeding from the on-line method developed for americium, a method was also developed for separating plutonium and americium. Plutonium is reduced to Pu(III) and separated together with Am(III) throughout the procedure. Pu(III) and Am(III) are eluted from the ion chromatography column as anionic dipicolinate and oxalate complexes, respectively, and measured by alpha spectrometry.
Application of Modern NMR Spectroscopic Techniques to Structural Studies of Wood and Pulp Components
Development of Sample Pretreatment and Liquid Chromatographic Techniques for Antioxidative Compounds
Abstract:
In this study, novel methodologies for the determination of antioxidative compounds in herbs and beverages were developed. Antioxidants are compounds that can reduce, delay or inhibit oxidative events. They are a part of the human defense system and are obtained through the diet. Antioxidants are naturally present in several types of foods, e.g. in fruits, beverages, vegetables and herbs. Antioxidants can also be added to foods during manufacturing to suppress lipid oxidation and the formation of free radicals under conditions of cooking or storage, and to reduce the concentration of free radicals in vivo after food ingestion. There is growing interest in natural antioxidants, and effective compounds have already been identified from antioxidant classes such as carotenoids, essential oils, flavonoids and phenolic acids. The wide variety of sample matrices and analytes presents quite a challenge for the development of analytical techniques, and growing demands have been placed on sample pretreatment. In this study, three novel extraction techniques, namely supercritical fluid extraction (SFE), pressurised hot water extraction (PHWE) and dynamic sonication-assisted extraction (DSAE), were studied. SFE was used for the extraction of lycopene from tomato skins, and PHWE was used in the extraction of phenolic compounds from sage. DSAE was applied to the extraction of phenolic acids from Lamiaceae herbs. In the development of the extraction methodologies, the main parameters of the extraction were studied and the recoveries were compared to those achieved by conventional extraction techniques. In addition, the stability of lycopene was followed under different storage conditions. For the separation of the antioxidative compounds in the extracts, liquid chromatographic (LC) methods were utilised. Two novel LC techniques, namely ultra performance liquid chromatography (UPLC) and comprehensive two-dimensional liquid chromatography (LCxLC), were studied and compared with conventional high performance liquid chromatography (HPLC) for the separation of antioxidants in beverages and Lamiaceae herbs. In LCxLC, the selection of LC mode, column dimensions and flow rates was studied and optimised to obtain efficient separation of the target compounds. In addition, the separation powers of HPLC, UPLC, HPLCxHPLC and HPLCxUPLC were compared. To exploit the benefits of an integrated system, in which sample preparation and final separation are performed in a closed unit, dynamic sonication-assisted extraction was coupled on-line to a liquid chromatograph via a solid-phase trap. The increased sensitivity was utilised in the extraction of phenolic acids from Lamiaceae herbs. The results were compared to those achieved by the LCxLC system.
Abstract:
Advances in analysis techniques have led to a rapid accumulation of biological data in databases. Such data are often in the form of sequences of observations, examples including DNA sequences and the amino acid sequences of proteins. The scale and quality of the data promise answers to various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify areas in an amino acid sequence that are important for the function of the corresponding protein, or investigate how characteristics at the level of the DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with understanding the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main concern is modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which describes the structure of the data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also present a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills both requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
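The computational point in the last paragraph can be illustrated with a toy partition model. Assuming a Dirichlet-multinomial likelihood (one simple choice under which the claim holds), the per-cluster parameters integrate out in closed form, so a stochastic search only ever evaluates an analytic score; the data and the search strategy below are illustrative stand-ins, not the thesis models.

```python
# A toy sketch of the partition-model computation, assuming a
# Dirichlet-multinomial likelihood; data and search are stand-ins.
import numpy as np
from math import lgamma

ALPHA = 1.0    # symmetric Dirichlet prior over symbol frequencies
N_SYMBOLS = 4  # e.g. the DNA alphabet {A, C, G, T}

def cluster_log_marginal(counts):
    """log P(cluster data) with the categorical parameters integrated out."""
    n = counts.sum()
    return (lgamma(N_SYMBOLS * ALPHA) - lgamma(N_SYMBOLS * ALPHA + n)
            + sum(lgamma(ALPHA + c) - lgamma(ALPHA) for c in counts))

def score(partition, data):
    """Analytic log marginal likelihood of a whole partition."""
    partition = np.asarray(partition)
    return sum(
        cluster_log_marginal(
            np.bincount(data[partition == k].ravel(), minlength=N_SYMBOLS))
        for k in np.unique(partition))

rng = np.random.default_rng(0)
data = rng.integers(0, N_SYMBOLS, size=(20, 30))  # 20 sequences, 30 sites
part = rng.integers(0, 3, size=20)                # random initial clustering

# Greedy stochastic search: accept single-item moves that improve the score.
best = score(part, data)
for _ in range(500):
    proposal = part.copy()
    proposal[rng.integers(20)] = rng.integers(3)
    s = score(proposal, data)
    if s > best:
        part, best = proposal, s
print(best)
```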
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research focused on news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new and previously unreported, tracking the development of a previously reported event, and grouping together news stories that discuss the same event. The performance of traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems; it has been difficult to make the distinction between the same and merely similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of the class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. When comparing documents for event-based similarity, we look not only at matching terms, but also at how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experimental results. We ran experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
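The class-wise similarity framework can be sketched compactly: compare two documents separately within each semantic class and return a weighted combination of the per-class cosine similarities. The classes, weights, and documents below are hypothetical placeholders, not the thesis system.

```python
# A toy sketch of class-wise document similarity: hypothetical classes,
# weights, and documents, not the thesis system or its tuned parameters.
from collections import Counter
from math import sqrt

CLASS_WEIGHTS = {"persons": 0.3, "locations": 0.2, "terms": 0.5}

def cosine(c1, c2):
    """Cosine similarity between two term-count vectors."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm = (sqrt(sum(v * v for v in c1.values()))
            * sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

def similarity(doc1, doc2):
    """Weighted combination of per-class similarities; each class could
    also use its own measure, e.g. an ontology-aware one for locations."""
    return sum(
        w * cosine(Counter(doc1.get(cls, [])), Counter(doc2.get(cls, [])))
        for cls, w in CLASS_WEIGHTS.items())

d1 = {"persons": ["merkel"], "locations": ["berlin"], "terms": ["election", "vote"]}
d2 = {"persons": ["merkel"], "locations": ["paris"], "terms": ["vote", "summit"]}
print(round(similarity(d1, d2), 3))
```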