34 results for statistical framework


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is undergoing rapid development because of recent advances in the technologies by which molecular data can be obtained from living organisms. To extract the most information from such data, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo (MCMC) algorithm. The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes and at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where a quantitative phenotype has also been measured from the contemporary individuals. 
In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Planar curves arise naturally as interfaces between two regions of the plane. An important part of statistical physics is the study of lattice models. This thesis is about the interfaces of 2D lattice models. The scaling limit is an infinite system limit which is taken by letting the lattice mesh decrease to zero. At criticality, the scaling limit of an interface is one of the SLE curves (Schramm-Loewner evolution), introduced by Oded Schramm. This family of random curves is parametrized by a real variable, which determines the universality class of the model. The first and the second paper of this thesis study properties of SLEs. They contain two different methods to study the whole SLE curve, which is, in fact, the most interesting object from the statistical physics point of view. These methods are applied to study two symmetries of SLE: reversibility and duality. The first paper uses an algebraic method and a representation of the Virasoro algebra to find martingales common to different processes and, in that way, to confirm the symmetries for polynomial expected values of natural SLE data. In the second paper, a recursion is obtained for the same kind of expected values. The recursion is based on stationarity of the law of the whole SLE curve under an SLE-induced flow. The third paper deals with one of the most central questions of the field and provides a framework of estimates for describing 2D scaling limits by SLE curves. In particular, it is shown that a weak estimate on the probability of an annulus crossing implies that a random curve arising from a statistical physics model will have scaling limits and those will be well-described by Loewner evolutions with random driving forces.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Metabolism is the cellular subsystem responsible for generation of energy from nutrients and production of building blocks for larger macromolecules. Computational and statistical modeling of metabolism is vital to many disciplines including bioengineering, the study of diseases, drug target identification, and understanding the evolution of metabolism. In this thesis, we propose efficient computational methods for metabolic modeling. The techniques presented are targeted particularly at the analysis of large metabolic models encompassing the whole metabolism of one or several organisms. We concentrate on three major themes of metabolic modeling: metabolic pathway analysis, metabolic reconstruction and the study of evolution of metabolism. In the first part of this thesis, we study metabolic pathway analysis. We propose a novel modeling framework called gapless modeling to study biochemically viable metabolic networks and pathways. In addition, we investigate the utilization of atom-level information on metabolism to improve the quality of pathway analyses. We describe efficient algorithms for discovering both gapless and atom-level metabolic pathways, and conduct experiments with large-scale metabolic networks. The presented gapless approach offers a compromise in terms of complexity and feasibility between the previous graph-theoretic and stoichiometric approaches to metabolic modeling. Gapless pathway analysis shows that microbial metabolic networks are not as robust to random damage as suggested by previous studies. Furthermore the amino acid biosynthesis pathways of the fungal species Trichoderma reesei discovered from atom-level data are shown to closely correspond to those of Saccharomyces cerevisiae. In the second part, we propose computational methods for metabolic reconstruction in the gapless modeling framework. We study the task of reconstructing a metabolic network that does not suffer from connectivity problems. 
Such problems often limit the usability of reconstructed models, and typically require a significant amount of manual postprocessing. We formulate gapless metabolic reconstruction as an optimization problem and propose an efficient divide-and-conquer strategy to solve real-world instances of it. We also describe computational techniques for solving problems stemming from ambiguities in metabolite naming. These techniques have been implemented in ReMatch, a web-based software tool intended for reconstruction of models for 13C metabolic flux analysis. In the third part, we extend our scope from single to multiple metabolic networks and propose an algorithm for inferring gapless metabolic networks of ancestral species from phylogenetic data. Experimenting with 16 fungal species, we show that the method is able to generate results that are easily interpretable and that provide hypotheses about the evolution of metabolism.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Reuse of existing carefully designed and tested software improves the quality of new software systems and reduces their development costs. Object-oriented frameworks provide an established means for software reuse on the levels of both architectural design and concrete implementation. Unfortunately, because of the complexity of frameworks, which typically results from their flexibility and overall abstract nature, there are severe problems in using them. Patterns are generally accepted as a convenient way of documenting frameworks and their reuse interfaces. In this thesis it is argued, however, that mere static documentation is not enough to solve the problems related to framework usage. Instead, proper interactive assistance tools are needed in order to enable systematic framework-based software production. This thesis shows how patterns that document a framework's reuse interface can be represented as dependency graphs, and how dynamic lists of programming tasks can be generated from those graphs to assist the process of using a framework to build an application. This approach to framework specialization combines the ideas of framework cookbooks and task-oriented user interfaces. Tasks provide assistance in (1) creating new code that complies with the framework reuse interface specification, (2) assuring the consistency between existing code and the specification, and (3) adjusting existing code to meet the terms of the specification. Besides illustrating how task-orientation can be applied in the context of using frameworks, this thesis describes a systematic methodology for modeling any framework reuse interface in terms of software patterns based on dependency graphs. The methodology shows how framework-specific reuse interface specifications can be derived from a library of existing reusable pattern hierarchies. 
Since the methodology focuses on reusing patterns, it also alleviates the recognized problem of framework reuse interface specification becoming complicated and unmanageable for frameworks of realistic size. The ideas and methods proposed in this thesis have been tested through implementing a framework specialization tool called JavaFrames. JavaFrames uses role-based patterns that specify a reuse interface of a framework to guide framework specialization in a task-oriented manner. This thesis reports the results of case studies in which JavaFrames and the hierarchical framework reuse interface modeling methodology were applied to the Struts web application framework and the JHotDraw drawing editor framework.
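
The idea of generating a dynamic task list from a dependency graph can be illustrated with a minimal Python sketch. It is not taken from JavaFrames: the role names in DEPENDENCIES and the helpers pending_tasks and full_order are hypothetical, and the sketch assumes only that a task becomes available once every role it depends on has been completed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical reuse-interface roles (illustrative only, not from JavaFrames).
# Each role maps to the set of roles that must be completed before it.
DEPENDENCIES = {
    "ActionSubclass": set(),
    "executeMethod": {"ActionSubclass"},
    "ConfigEntry": {"ActionSubclass"},
    "ForwardMapping": {"ConfigEntry", "executeMethod"},
}


def pending_tasks(dependencies, done):
    """Return the roles that can be worked on now, sorted by name.

    A role is available when it is not yet done and all of the roles
    it depends on are in the `done` set.
    """
    return sorted(
        role
        for role, deps in dependencies.items()
        if role not in done and deps <= done
    )


def full_order(dependencies):
    """Return one valid order in which every task could be carried out."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(pending_tasks(DEPENDENCIES, set()))
    print(pending_tasks(DEPENDENCIES, {"ActionSubclass"}))
    print(full_order(DEPENDENCIES))
```

Recomputing pending_tasks after each completed step is what makes the list dynamic in this sketch: finishing one task can unlock several others.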

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Wireless technologies are continuously evolving. Second generation cellular networks have gained worldwide acceptance. Wireless LANs are commonly deployed in corporations and on university campuses, and their diffusion in public hotspots is growing. Third generation cellular systems have yet to become established everywhere; still, there is an impressive amount of research ongoing for deploying beyond-3G systems. These new wireless technologies combine the characteristics of WLAN-based and cellular networks to provide increased bandwidth. The common direction of all the efforts in wireless technologies is IP-based communication. Telephony services have been the killer application for cellular systems; their evolution to packet-switched networks is a natural path. Effective IP telephony signaling protocols, such as the Session Initiation Protocol (SIP) and the H.323 protocol, are needed to establish IP-based telephony sessions. However, IP telephony is just one service example of IP-based communication. IP-based multimedia sessions are expected to become popular and offer a wider range of communication capabilities than pure telephony. In order to conjoin the advances of the future wireless technologies with the potential of IP-based multimedia communication, the next step would be to obtain ubiquitous communication capabilities. According to this vision, people must be able to communicate even when no support from an infrastructured network is available, needed or desired. In order to achieve ubiquitous communication, end devices must integrate all the capabilities necessary for IP-based distributed and decentralized communication. Such capabilities are currently missing. For example, it is not possible to utilize native IP telephony signaling protocols in a totally decentralized way. This dissertation presents a solution for deploying the SIP protocol in a decentralized fashion without support of infrastructure servers. 
The proposed solution is mainly designed to fit the needs of decentralized mobile environments, and can be applied to small-scale ad-hoc networks as well as to larger networks with hundreds of nodes. A framework allowing discovery of SIP users in ad-hoc networks and the establishment of SIP sessions among them, in a fully distributed and secure way, is described and evaluated. Security support allows ad-hoc users to authenticate the sender of a message, and to verify the integrity of a received message. The distributed session management framework has been extended in order to achieve interoperability with the Internet and with native Internet applications. With limited extensions to the SIP protocol, we have designed and experimentally validated a SIP gateway allowing SIP signaling between ad-hoc networks with private addressing space and native SIP applications in the Internet. The design is completed by an application level relay that permits instant messaging sessions to be established in heterogeneous environments. The resulting framework constitutes a flexible and effective approach for the pervasive deployment of real time applications.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Tackling coastal eutrophication requires water protection measures based on status assessments of water quality. The main purpose of this thesis was to evaluate whether it is possible both scientifically and within the terms of the European Union Water Framework Directive (WFD) to assess the status of coastal marine waters reliably by using phytoplankton biomass (ww) and chlorophyll a (Chl) as indicators of eutrophication in Finnish coastal waters. Empirical approaches were used to study whether the criteria established for determining an indicator are fulfilled. The first criterion (i) was that an indicator should respond to anthropogenic stresses in a predictable manner and have low variability in its response. Summertime Chl could be predicted accurately from nutrient concentrations, but not from the external annual loads alone, because of the rapid effect of primary production and sedimentation close to the loading sources in summer. The most accurate predictions were achieved in the Archipelago Sea, where total phosphorus (TP) and total nitrogen (TN) alone accounted for 87% and 78% of the variation in Chl, respectively. In river estuaries, the TP mass-balance regression model predicted Chl most accurately when nutrients originated from point sources, whereas land-use regression models were most accurate in cases when nutrients originated mainly from diffuse sources. The inclusion of morphometry (e.g. mean depth) in nutrient models improved the accuracy of the predictions. The second criterion (ii) was associated with the WFD. It requires that an indicator should have type-specific reference conditions, which are defined as "conditions where the values of the biological quality elements are at high ecological status". In establishing reference conditions, the empirical approach could only be used in the outer coastal water types, where historical observations of Secchi depth from the early 1900s are available. 
The most accurate prediction was achieved in the Quark. In the inner coastal water types, reference Chl values, estimated from present monitoring data, are imprecise, not only because of the less accurate estimation method but also because the intrinsic characteristics, described for instance by morphometry, vary considerably inside these extensive inner coastal types. As for phytoplankton biomass, the reference values were less accurate than in the case of Chl, because it was possible to estimate reference conditions for biomass only by using the reconstructed Chl values, not the historical Secchi observations. A paleoecological approach was also applied to estimate annual average reference conditions for Chl. In Laajalahti, an urban embayment off Helsinki, strongly loaded by municipal waste waters in the 1960s and 1970s, reference conditions prevailed in the mid- and late 1800s. The recovery of the bay from pollution has been delayed as a consequence of benthic release of nutrients. Laajalahti will probably not achieve the good quality objectives of the WFD on time. The third criterion (iii) was associated with coastal management, including the resources it has available. Analyses of Chl are cheap and fast to carry out compared to the analyses of phytoplankton biomass and species composition, a fact that affects the number of samples that can be taken and thereby the reliability of assessments. However, analyses of phytoplankton biomass and species composition provide more metrics for ecological classification, and these metrics reveal various aspects of eutrophication that Chl alone does not.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Many species inhabit fragmented landscapes, resulting either from anthropogenic or from natural processes. The ecological and evolutionary dynamics of spatially structured populations are affected by a complex interplay between endogenous and exogenous factors. The metapopulation approach, simplifying the landscape to a discrete set of patches of breeding habitat surrounded by unsuitable matrix, has become a widely applied paradigm for the study of species inhabiting highly fragmented landscapes. In this thesis, I focus on the construction of biologically realistic models and their parameterization with empirical data, with the general objective of understanding how the interactions between individuals and their spatially structured environment affect ecological and evolutionary processes in fragmented landscapes. I study two hierarchically structured model systems, which are the Glanville fritillary butterfly in the Åland Islands, and a system of two interacting aphid species in the Tvärminne archipelago, both being located in South-Western Finland. The interesting and challenging feature of both study systems is that the population dynamics occur over multiple spatial scales that are linked by various processes. My main emphasis is in the development of mathematical and statistical methodologies. For the Glanville fritillary case study, I first build a Bayesian framework for the estimation of death rates and capture probabilities from mark-recapture data, with the novelty of accounting for variation among individuals in capture probabilities and survival. I then characterize the dispersal phase of the butterflies by deriving a mathematical approximation of a diffusion-based movement model applied to a network of patches. I use the movement model as a building block to construct an individual-based evolutionary model for the Glanville fritillary butterfly metapopulation. 
I parameterize the evolutionary model using a pattern-oriented approach, and use it to study how the landscape structure affects the evolution of dispersal. For the aphid case study, I develop a Bayesian model of hierarchical multi-scale metapopulation dynamics, where the observed extinction and colonization rates are decomposed into intrinsic rates operating specifically at each spatial scale. In summary, I show how analytical approaches, hierarchical Bayesian methods and individual-based simulations can be used individually or in combination to tackle complex problems from many different viewpoints. In particular, hierarchical Bayesian methods provide a useful tool for decomposing ecological complexity into more tractable components.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Population dynamics are generally viewed as the result of intrinsic (purely density dependent) and extrinsic (environmental) processes. Both components, and potential interactions between those two, have to be modelled in order to understand and predict dynamics of natural populations; a topic that is of great importance in population management and conservation. This thesis focuses on modelling environmental effects in population dynamics and how effects of potentially relevant environmental variables can be statistically identified and quantified from time series data. Chapter I presents some useful models of multiplicative environmental effects for unstructured density dependent populations. The presented models can be written as standard multiple regression models that are easy to fit to data. Chapters II to IV constitute empirical studies that statistically model environmental effects on population dynamics of several migratory bird species with different life history characteristics and migration strategies. In Chapter II, spruce cone crops are found to have a strong positive effect on the population growth of the great spotted woodpecker (Dendrocopos major), while cone crops of pine, another important food resource for the species, do not effectively explain population growth. The study compares rate- and ratio-dependent effects of cone availability, using state-space models that distinguish between process and observation error in the time series data. Chapter III shows how drought, in combination with settling behaviour during migration, produces asymmetric spatially synchronous patterns of population dynamics in North American ducks (genus Anas). Chapter IV investigates the dynamics of a Finnish population of skylark (Alauda arvensis), and points out effects of rainfall and habitat quality on population growth. 
Because the skylark time series and some of the environmental variables included show strong positive autocorrelation, the statistical significances are calculated using a Monte Carlo method, where random autocorrelated time series are generated. Chapter V is a simulation-based study, showing that ignoring observation error in analyses of population time series data can bias the estimated effects and measures of uncertainty, if the environmental variables are autocorrelated. It is concluded that the use of state-space models is an effective way to reach more accurate results. In summary, there are several biological assumptions and methodological issues that can affect the inferential outcome when estimating environmental effects from time series data, and that therefore need special attention. The functional form of the environmental effects and potential interactions between environment and population density are important to deal with. Other issues that should be considered are assumptions about density dependent regulation, modelling potential observation error, and when needed, accounting for spatial and/or temporal autocorrelation.
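
The Monte Carlo significance calculation mentioned for Chapter IV can be sketched in Python as follows. The abstract does not specify the test statistic or how the random autocorrelated series are generated, so this sketch makes two labelled assumptions: surrogate series come from an AR(1) process whose coefficient is the lag-1 autocorrelation of the environmental variable, and the statistic is the absolute Pearson correlation. All function names are hypothetical.

```python
import math
import random
import statistics


def ar1_series(n, phi, rng):
    """Generate n values of an AR(1) process x[t] = phi * x[t-1] + noise."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a (non-constant) series."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def correlation(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    dx = sum((a - mx) ** 2 for a in x)
    dy = sum((b - my) ** 2 for b in y)
    return num / math.sqrt(dx * dy)


def monte_carlo_p_value(growth, env, n_sim=2000, seed=0):
    """Share of AR(1) surrogates correlating with `growth` at least as
    strongly as the observed environmental series does (assumed test)."""
    rng = random.Random(seed)
    observed = abs(correlation(growth, env))
    phi = lag1_autocorr(env)  # assumption: match surrogate autocorrelation to env
    hits = 0
    for _ in range(n_sim):
        surrogate = ar1_series(len(env), phi, rng)
        if abs(correlation(growth, surrogate)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_sim + 1)
```

Because the surrogates share the autocorrelation of the environmental variable, the resulting null distribution is wider than one built from independent noise, which is the reason for using such a test on strongly autocorrelated series.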

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this thesis, the solar wind-magnetosphere-ionosphere coupling is studied observationally, with the main focus on the ionospheric currents in the auroral region. The thesis consists of five research articles and an introductory part that summarises the most important results reached in the articles and places them in a wider context within the field of space physics. Ionospheric measurements are provided by the International Monitor for Auroral Geomagnetic Effects (IMAGE) magnetometer network, by the low-orbit CHAllenging Minisatellite Payload (CHAMP) satellite, by the European Incoherent SCATter (EISCAT) radar, and by the Imager for Magnetopause-to-Aurora Global Exploration (IMAGE) satellite. Magnetospheric observations, on the other hand, are acquired from the four spacecraft of the Cluster mission, and solar wind observations from the Advanced Composition Explorer (ACE) and Wind spacecraft. Within the framework of this study, a new method for determining the ionospheric currents from low-orbit satellite-based magnetic field data is developed. In contrast to previous techniques, all three current density components can be determined on a matching spatial scale, and the validity of the necessary one-dimensionality approximation, and thus, the quality of the results, can be estimated directly from the data. The new method is applied to derive an empirical model for estimating the Hall-to-Pedersen conductance ratio from ground-based magnetic field data, and to investigate the statistical dependence of the large-scale ionospheric currents on solar wind and geomagnetic parameters. Equations describing the amount of field-aligned current in the auroral region, as well as the location of the auroral electrojets, as a function of these parameters are derived. Moreover, the mesoscale (10-1000 km) ionospheric equivalent currents related to two magnetotail plasma sheet phenomena, bursty bulk flows and flux ropes, are studied. 
Based on the analysis of 22 events, the typical equivalent current pattern related to bursty bulk flows is established. For the flux ropes, on the other hand, only two conjugate events are found. As the equivalent current patterns during these two events are not similar, it is suggested that the ionospheric signatures of a flux rope depend on the orientation and the length of the structure, but analysis of additional events is required to determine the possible ionospheric connection of flux ropes.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An efficient and statistically robust solution for the identification of asteroids among numerous sets of astrometry is presented. In particular, numerical methods have been developed for the short-term identification of asteroids at discovery, and for the long-term identification of scarcely observed asteroids over apparitions, a task which has been lacking a robust method until now. The methods are based on the solid foundation of statistical orbital inversion properly taking into account the observational uncertainties, which allows for the detection of practically all correct identifications. Through the use of dimensionality-reduction techniques and efficient data structures, the exact methods have a log-linear, that is, O(n log n), computational complexity, where n is the number of included observation sets. The methods developed are thus suitable for future large-scale surveys which anticipate a substantial increase in the astrometric data rate. Due to the discontinuous nature of asteroid astrometry, separate sets of astrometry must be linked to a common asteroid from the very first discovery detections onwards. The reason for the discontinuity in the observed positions is the rotation of the observer with the Earth as well as the motion of the asteroid and the observer about the Sun. Therefore, the aim of identification is to find a set of orbital elements that reproduce the observed positions with residuals similar to the inevitable observational uncertainty. Unless the astrometric observation sets are linked, the corresponding asteroid is eventually lost as the uncertainty of the predicted positions grows too large to allow successful follow-up. 
Whereas the presented identification theory and the numerical comparison algorithm are generally applicable, that is, also in fields other than astronomy (e.g., in the identification of space debris), the numerical methods developed for asteroid identification can immediately be applied to all objects on heliocentric orbits with negligible effects due to non-gravitational forces in the time frame of the analysis. The methods developed have been successfully applied to various identification problems. Simulations have shown that the methods developed are able to find virtually all correct linkages despite challenges such as numerous scarce observation sets, astrometric uncertainty, numerous objects confined to a limited region on the celestial sphere, long linking intervals, and substantial parallaxes. Tens of previously unknown main-belt asteroids have been identified with the short-term method in a preliminary study to locate asteroids among numerous unidentified sets of single-night astrometry of moving objects, and scarce astrometry obtained nearly simultaneously with Earth-based and space-based telescopes has been successfully linked despite a substantial parallax. Using the long-term method, thousands of realistic 3-linkages typically spanning several apparitions have so far been found among designated observation sets each spanning less than 48 hours.
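
How sorting can avoid comparing every pair of observation sets is illustrated by the following Python sketch. It is not the method of the thesis: it assumes, purely for illustration, that each observation set has already been reduced to a single scalar key, and it uses a sorted list with binary search as a stand-in for the efficient data structures mentioned in the abstract. The function name candidate_pairs and the tolerance parameter tol are hypothetical.

```python
from bisect import bisect_right


def candidate_pairs(keys, tol):
    """Return index pairs (i, j), i < j, whose keys differ by at most tol.

    Sorting costs O(n log n) and each binary search costs O(log n), so the
    total cost is O(n log n) plus the number of pairs reported, instead of
    the O(n^2) cost of comparing every pair directly.
    """
    order = sorted(range(len(keys)), key=keys.__getitem__)
    sorted_keys = [keys[i] for i in order]
    pairs = []
    for pos, key in enumerate(sorted_keys):
        # Keys are sorted, so all partners of `key` lie between pos and hi.
        hi = bisect_right(sorted_keys, key + tol)
        for other in range(pos + 1, hi):
            i, j = order[pos], order[other]
            pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    # Five hypothetical observation sets, each reduced to one scalar key.
    print(candidate_pairs([0.0, 5.0, 0.05, 5.02, 9.0], 0.1))
```

In a real application the candidate pairs found this way would still have to be verified, for example by checking that a single orbit reproduces both observation sets within the observational uncertainty.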