40 resultados para Bayesian belief network
Resumo:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
Resumo:
Elucidating the mechanisms responsible for the patterns of species abundance, diversity, and distribution within and across ecological systems is a fundamental research focus in ecology. Species abundance patterns are shaped in a convoluted way by interplays between inter-/intra-specific interactions, environmental forcing, demographic stochasticity, and dispersal. Comprehensive models and suitable inferential and computational tools for teasing out these different factors are quite limited, even though such tools are critically needed to guide the implementation of management and conservation strategies, the efficacy of which rests on a realistic evaluation of the underlying mechanisms. This is even more so in the prevailing context of concerns over climate change progress and its potential impacts on ecosystems. This thesis utilized the flexible hierarchical Bayesian modelling framework in combination with the computer intensive methods known as Markov chain Monte Carlo, to develop methodologies for identifying and evaluating the factors that control the structure and dynamics of ecological communities. These methodologies were used to analyze data from a range of taxa: macro-moths (Lepidoptera), fish, crustaceans, birds, and rodents. Environmental stochasticity emerged as the most important driver of community dynamics, followed by density dependent regulation; the influence of inter-specific interactions on community-level variances was broadly minor. This thesis contributes to the understanding of the mechanisms underlying the structure and dynamics of ecological communities, by showing directly that environmental fluctuations rather than inter-specific competition dominate the dynamics of several systems. This finding emphasizes the need to better understand how species are affected by the environment and acknowledge species differences in their responses to environmental heterogeneity, if we are to effectively model and predict their dynamics (e.g. for management and conservation purposes). The thesis also proposes a model-based approach to integrating the niche and neutral perspectives on community structure and dynamics, making it possible for the relative importance of each category of factors to be evaluated in light of field data.
Resumo:
Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices from all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-pace decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements decision-making sets for knowledge discovery and data mining tools and methods, and I study resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated to the existing network operations infrastructure.
Resumo:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
Resumo:
Many species inhabit fragmented landscapes, resulting either from anthropogenic or from natural processes. The ecological and evolutionary dynamics of spatially structured populations are affected by a complex interplay between endogenous and exogenous factors. The metapopulation approach, simplifying the landscape to a discrete set of patches of breeding habitat surrounded by unsuitable matrix, has become a widely applied paradigm for the study of species inhabiting highly fragmented landscapes. In this thesis, I focus on the construction of biologically realistic models and their parameterization with empirical data, with the general objective of understanding how the interactions between individuals and their spatially structured environment affect ecological and evolutionary processes in fragmented landscapes. I study two hierarchically structured model systems, which are the Glanville fritillary butterfly in the Åland Islands, and a system of two interacting aphid species in the Tvärminne archipelago, both being located in South-Western Finland. The interesting and challenging feature of both study systems is that the population dynamics occur over multiple spatial scales that are linked by various processes. My main emphasis is in the development of mathematical and statistical methodologies. For the Glanville fritillary case study, I first build a Bayesian framework for the estimation of death rates and capture probabilities from mark-recapture data, with the novelty of accounting for variation among individuals in capture probabilities and survival. I then characterize the dispersal phase of the butterflies by deriving a mathematical approximation of a diffusion-based movement model applied to a network of patches. I use the movement model as a building block to construct an individual-based evolutionary model for the Glanville fritillary butterfly metapopulation. I parameterize the evolutionary model using a pattern-oriented approach, and use it to study how the landscape structure affects the evolution of dispersal. For the aphid case study, I develop a Bayesian model of hierarchical multi-scale metapopulation dynamics, where the observed extinction and colonization rates are decomposed into intrinsic rates operating specifically at each spatial scale. In summary, I show how analytical approaches, hierarchical Bayesian methods and individual-based simulations can be used individually or in combination to tackle complex problems from many different viewpoints. In particular, hierarchical Bayesian methods provide a useful tool for decomposing ecological complexity into more tractable components.
Resumo:
Ongoing habitat loss and fragmentation threaten much of the biodiversity that we know today. As such, conservation efforts are required if we want to protect biodiversity. Conservation budgets are typically tight, making the cost-effective selection of protected areas difficult. Therefore, reserve design methods have been developed to identify sets of sites, that together represent the species of conservation interest in a cost-effective manner. To be able to select reserve networks, data on species distributions is needed. Such data is often incomplete, but species habitat distribution models (SHDMs) can be used to link the occurrence of the species at the surveyed sites to the environmental conditions at these locations (e.g. climatic, vegetation and soil conditions). The probability of the species occurring at unvisited location is next predicted by the model, based on the environmental conditions of those sites. The spatial configuration of reserve networks is important, because habitat loss around reserves can influence the persistence of species inside the network. Since species differ in their requirements for network configuration, the spatial cohesion of networks needs to be species-specific. A way to account for species-specific requirements is to use spatial variables in SHDMs. Spatial SHDMs allow the evaluation of the effect of reserve network configuration on the probability of occurrence of the species inside the network. Even though reserves are important for conservation, they are not the only option available to conservation planners. To enhance or maintain habitat quality, restoration or maintenance measures are sometimes required. As a result, the number of conservation options per site increases. Currently available reserve selection tools do however not offer the ability to handle multiple, alternative options per site. This thesis extends the existing methodology for reserve design, by offering methods to identify cost-effective conservation planning solutions when multiple, alternative conservation options are available per site. Although restoration and maintenance measures are beneficial to certain species, they can be harmful to other species with different requirements. This introduces trade-offs between species when identifying which conservation action is best applied to which site. The thesis describes how the strength of such trade-offs can be identified, which is useful for assessing consequences of conservation decisions regarding species priorities and budget. Furthermore, the results of the thesis indicate that spatial SHDMs can be successfully used to account for species-specific requirements for spatial cohesion - in the reserve selection (single-option) context as well as in the multi-option context. Accounting for the spatial requirements of multiple species and allowing for several conservation options is however complicated, due to trade-offs in species requirements. It is also shown that spatial SHDMs can be successfully used for gaining information on factors that drive a species spatial distribution. Such information is valuable to conservation planning, as better knowledge on species requirements facilitates the design of networks for species persistence. This methods and results described in this thesis aim to improve species probabilities of persistence, by taking better account of species habitat and spatial requirements. Many real-world conservation planning problems are characterised by a variety of conservation options related to protection, restoration and maintenance of habitat. Planning tools therefore need to be able to incorporate multiple conservation options per site, in order to continue the search for cost-effective conservation planning solutions. Simultaneously, the spatial requirements of species need to be considered. The methods described in this thesis offer a starting point for combining these two relevant aspects of conservation planning.
Resumo:
Tooth development is regulated by sequential and reciprocal interactions between epithelium and mesenchyme. The molecular mechanisms underlying this regulation are conserved and most of the participating molecules belong to several signalling families. Research focusing on mouse teeth has uncovered many aspects of tooth development, including molecular and evolutionary specifi cs, and in addition offered a valuable system to analyse the regulation of epithelial stem cells. In mice the spatial and temporal regulation of cell differentiation and the mechanisms of patterning during development can be analysed both in vivo and in vitro. Follistatin (Fst), a negative regulator of TGFβ superfamily signalling, is an important inhibitor during embryonic development. We showed the necessity of modulation of TGFβ signalling by Fst in three different regulatory steps during tooth development. First we showed that tinkering with the level of TGFβ signalling by Fst may cause variation in the molar cusp patterning and crown morphogenesis. Second, our results indicated that in the continuously growing mouse incisors asymmetric expression of Fst is responsible for the labial-lingual patterning of ameloblast differentiation and enamel formation. Two TGFβ superfamily signals, BMP and Activin, are required for proper ameloblast differentiation and Fst modulates their effects. Third, we identifi ed a complex signalling network regulating the maintenance and proliferation of epithelial stem cells in the incisor, and showed that Fst is an essential modulator of this regulation. FGF3 in cooperation with FGF10 stimulates proliferation of epithelial stem cells and transit amplifying cells in the labial cervical loop. BMP4 represses Fgf3 expression whereas Activin inhibits the repressive effect of BMP4 on the labial side. Thus, Fst inhibits Activin rather than BMP4 in the cervical loop area and limits the proliferation of lingual epithelium, thereby causing the asymmetric maintenance and proliferation of epithelial stem cells. In addition, we detected Lgr5, a Wnt target gene and an epithelial stem cell marker in the intestine, in the putative epithelial stem cells of the incisor, suggesting that Lgr5 is a marker of incisor stem cells but is not regulated by Wnt/β-catenin signalling in the incisor. Thus the epithelial stem cells in the incisor may not be directly regulated by Wnt/β-catenin signalling. In conclusion, we showed in the mouse incisors that modulating the balance between inductive and inhibitory signals constitutes a key mechanism regulating the epithelial stem cells and ameloblast differentiation. Furthermore, we found additional support for the location of the putative epithelial stem cells and for the stemness of these cells. In the mouse molar we showed the necessity of fi ne-tuning the signalling in the regulation of the crown morphogenesis, and that altering the levels of an inhibitor can cause variation in the crown patterning.
Resumo:
Accelerator mass spectrometry (AMS) is an ultrasensitive technique for measuring the concentration of a single isotope. The electric and magnetic fields of an electrostatic accelerator system are used to filter out other isotopes from the ion beam. The high velocity means that molecules can be destroyed and removed from the measurement background. As a result, concentrations down to one atom in 10^16 atoms are measurable. This thesis describes the construction of the new AMS system in the Accelerator Laboratory of the University of Helsinki. The system is described in detail along with the relevant ion optics. System performance and some of the 14C measurements done with the system are described. In a second part of the thesis, a novel statistical model for the analysis of AMS data is presented. Bayesian methods are used in order to make the best use of the available information. In the new model, instrumental drift is modelled with a continuous first-order autoregressive process. This enables rigorous normalization to standards measured at different times. The Poisson statistical nature of a 14C measurement is also taken into account properly, so that uncertainty estimates are much more stable. It is shown that, overall, the new model improves both the accuracy and the precision of AMS measurements. In particular, the results can be improved for samples with very low 14C concentrations or measured only a few times.