18 results for probability distributions

in Helda - Digital Repository of the University of Helsinki


Relevance:

60.00%

Publisher:

Abstract:

The possible temporal periodicity in the age distribution of terrestrial impact craters has attracted wide discussion ever since the phenomenon was first reported in a series of prominent scientific articles in 1984. Although in the light of current knowledge it is questionable whether the observed periodicity reflects a real physical phenomenon, it remains possible that the periodicity truly exists and could be detected with a larger and more accurate impact crater data set. In this study, simulated temporal density and cumulative distribution functions of crater ages were constructed for the cases in which craters are produced by either a fully periodic or a fully random process. In addition to these two extreme cases, distributions were also constructed for two combinations of them. These models also make it possible to account for various uncertainties in the age determination of the craters. From these distributions, simulated time series of crater ages of different lengths were generated. Finally, Rayleigh's method was used to search the simulated time series for the periodicity present in the underlying distribution. Based on our study, detecting temporal periodicity in crater time series is nearly impossible if only one third of the craters are produced by the periodic phenomenon, even if a crater data set larger and more accurate than the current one were to become available in the future. If two thirds of the meteorite impacts are periodic, detection is possible but requires a considerably more comprehensive crater data set than is currently available. Based on this study, there is reason to suspect that the observed temporal periodicity of craters is not a real phenomenon.
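
As an editorial illustration of the approach summarized above, the sketch below simulates crater ages as a mixture of a periodic and a random process and scans trial periods with the Rayleigh statistic. All names and parameter values (period, time span, dating error, sample size) are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_crater_ages(n, span=250.0, period=26.0, periodic_fraction=1/3,
                         age_sigma=2.0):
    """Ages (Myr) from a mixture of a periodic and a uniform (random) process."""
    n_per = int(n * periodic_fraction)
    cycles = rng.integers(0, int(span / period), size=n_per)
    periodic = cycles * period + rng.normal(0.0, age_sigma, size=n_per)
    random_part = rng.uniform(0.0, span, size=n - n_per)
    return np.concatenate([periodic, random_part])

def rayleigh_power(ages, trial_period):
    """Rayleigh statistic Z = R^2 / n for phases folded at trial_period."""
    phase = 2.0 * np.pi * ages / trial_period
    n = len(ages)
    return (np.cos(phase).sum() ** 2 + np.sin(phase).sum() ** 2) / n

ages = simulate_crater_ages(n=100)
periods = np.linspace(10.0, 60.0, 500)
powers = np.array([rayleigh_power(ages, p) for p in periods])
best = periods[powers.argmax()]
# For a single trial period, the p-value of the Rayleigh test is approx. exp(-Z);
# scanning many periods requires a trials correction, omitted here.
print(f"best period {best:.1f} Myr, Z = {powers.max():.2f}, "
      f"single-trial p approx {np.exp(-powers.max()):.3g}")
```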

Relevance:

60.00%

Publisher:

Abstract:

Advances in analysis techniques have led to a rapid accumulation of biological data in databases. Such data often take the form of sequences of observations; examples include DNA sequences and the amino acid sequences of proteins. The scale and quality of the data promise answers to various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify areas in an amino acid sequence which are important for the function of the corresponding protein, or investigate how characteristics at the level of the DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with understanding the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main concern is modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which describes the structure of the data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills both requirements. Problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
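
The last two sentences refer to integrating out the parameters of a partition model analytically. A minimal sketch of how such a marginal likelihood can be computed and used to score alternative partitions, assuming a generic Dirichlet-multinomial model (the hyperparameter ALPHA and the toy data are invented for illustration, not the thesis's actual model):

```python
import numpy as np
from scipy.special import gammaln

ALPHABET = 4          # e.g. DNA: A, C, G, T
ALPHA = 1.0           # symmetric Dirichlet prior hyperparameter (assumed)

def log_marginal(counts):
    """Log marginal likelihood of categorical counts with the Dirichlet
    prior integrated out analytically (Dirichlet-multinomial)."""
    counts = np.asarray(counts, dtype=float)
    a = np.full_like(counts, ALPHA)
    return (gammaln(a.sum()) - gammaln((a + counts).sum())
            + (gammaln(a + counts) - gammaln(a)).sum())

def partition_score(data, labels):
    """Score a partition: clusters contribute independently, so no
    parameter values are ever sampled -- only counts are needed."""
    return sum(log_marginal(np.bincount(data[labels == k].ravel(),
                                        minlength=ALPHABET))
               for k in set(labels))

# Toy data: 6 "sequences" of length 20 over a 4-letter alphabet.
rng = np.random.default_rng(1)
data = rng.integers(0, ALPHABET, size=(6, 20))
one_cluster = np.zeros(6, dtype=int)
two_clusters = np.array([0, 0, 0, 1, 1, 1])
print(partition_score(data, one_cluster), partition_score(data, two_clusters))
```

Because the score depends on the data only through cluster counts, a stochastic search over partitions never has to integrate numerically, which is the efficiency gain the abstract alludes to.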

Relevance:

30.00%

Publisher:

Abstract:

Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or to base the inference on the likelihood function alone, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where the parametric assumptions of standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using a hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling to problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two hierarchical Bayesian modeling. Because maximum likelihood can be presented as a special case of Bayesian inference, but not the other way round, the introductory part of this work presents a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox developed herein, to show that they are also justifiable under this more general framework. The assumption of exchangeability and de Finetti's representation theorem are applied repeatedly to justify the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that the same reasoning can also be applied under sampling from a finite population. The main emphasis is on probability-based inference under incomplete observation due to study design, illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for the analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied to a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates together with a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible in this case as well.
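
The claim that maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode under flat priors can be made concrete with a small numerical sketch. The exponential model, the log-rate parametrization, and all numbers below are illustrative assumptions, not taken from the thesis:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
y = rng.exponential(scale=2.0, size=50)   # data from Exp(rate = 0.5)

def neg_log_posterior(log_rate):
    # Flat prior on the log-rate: the posterior is proportional
    # to the likelihood, so the mode is the MLE.
    rate = np.exp(log_rate[0])
    return -(len(y) * np.log(rate) - rate * y.sum())

res = minimize(neg_log_posterior, x0=[0.0], method="BFGS")
mode = res.x[0]
sd = np.sqrt(res.hess_inv[0, 0])   # curvature at the mode -> Gaussian scale

# The 95% interval of the Gaussian (Laplace) approximation at the mode,
# mapped back to the rate scale, is the familiar Wald-type interval.
lo, hi = norm.interval(0.95, loc=mode, scale=sd)
print(f"rate MLE {np.exp(mode):.3f}, approx. 95% interval "
      f"({np.exp(lo):.3f}, {np.exp(hi):.3f})")
```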

Relevance:

20.00%

Publisher:

Abstract:

Stochastic filtering is, in general terms, the estimation of indirectly observed states given observed data; the conditional expectation is the most accurate such estimate, given the observations, in the context of a probability space. In this thesis, I present the theory of filtering using two different kinds of observation processes: the first is a diffusion process, discussed in the first chapter, while the third chapter introduces the second, a counting process. The majority of the fundamental results of stochastic filtering are stated in the form of key equations, such as the unnormalized Zakai equation, which leads to the Kushner-Stratonovich equation. The latter, also known as the normalized Zakai equation or the Fujisaki-Kallianpur-Kunita (FKK) equation, shows the difference between the estimates obtained using a diffusion process and a counting process. I also give an example for the linear Gaussian case, which underlies the construction of the so-called Kalman-Bucy filter. As the unnormalized and normalized Zakai equations are stated in terms of the conditional distribution, a density for these distributions is developed through these equations and stated in Kushner's theorem. Kushner's theorem, however, takes the form of a stochastic partial differential equation whose solution must be verified to exist and to be unique; this is covered in the second chapter.
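
For the linear Gaussian case mentioned above, a minimal discrete-time analogue of the Kalman-Bucy filter can be sketched as follows (a generic textbook discretization with assumed model constants, not code from the thesis):

```python
import numpy as np

rng = np.random.default_rng(3)

# Discretized linear Gaussian model: x_k = a x_{k-1} + w_k, y_k = x_k + v_k.
a, q, r = 0.95, 0.1, 0.5          # transition, process and observation noise
n = 200
x = np.zeros(n); y = np.zeros(n)
for k in range(1, n):
    x[k] = a * x[k - 1] + rng.normal(0, np.sqrt(q))
    y[k] = x[k] + rng.normal(0, np.sqrt(r))

# Kalman filter: the conditional mean is the optimal (MMSE) state estimate.
m, p = 0.0, 1.0                   # prior mean and variance
est = np.zeros(n)
for k in range(n):
    m_pred, p_pred = a * m, a * a * p + q        # predict
    gain = p_pred / (p_pred + r)                 # Kalman gain
    m = m_pred + gain * (y[k] - m_pred)          # update with the innovation
    p = (1 - gain) * p_pred
    est[k] = m

print(f"RMSE of filter estimate: {np.sqrt(np.mean((est - x) ** 2)):.3f}")
```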

Relevance:

20.00%

Publisher:

Abstract:

Ongoing habitat loss and fragmentation threaten much of the biodiversity that we know today. As such, conservation efforts are required if we want to protect biodiversity. Conservation budgets are typically tight, making the cost-effective selection of protected areas difficult. Therefore, reserve design methods have been developed to identify sets of sites that together represent the species of conservation interest in a cost-effective manner. To select reserve networks, data on species distributions are needed. Such data are often incomplete, but species habitat distribution models (SHDMs) can be used to link the occurrence of a species at surveyed sites to the environmental conditions at those locations (e.g. climatic, vegetation and soil conditions). The probability of the species occurring at unvisited locations is then predicted by the model, based on the environmental conditions of those sites. The spatial configuration of reserve networks is important, because habitat loss around reserves can influence the persistence of species inside the network. Since species differ in their requirements for network configuration, the spatial cohesion of networks needs to be species-specific. One way to account for species-specific requirements is to use spatial variables in SHDMs. Spatial SHDMs allow the evaluation of the effect of reserve network configuration on the probability of occurrence of the species inside the network. Even though reserves are important for conservation, they are not the only option available to conservation planners. To enhance or maintain habitat quality, restoration or maintenance measures are sometimes required. As a result, the number of conservation options per site increases. Currently available reserve selection tools do not, however, offer the ability to handle multiple, alternative options per site. This thesis extends the existing methodology for reserve design by offering methods to identify cost-effective conservation planning solutions when multiple, alternative conservation options are available per site. Although restoration and maintenance measures are beneficial to certain species, they can be harmful to other species with different requirements. This introduces trade-offs between species when identifying which conservation action is best applied to which site. The thesis describes how the strength of such trade-offs can be identified, which is useful for assessing the consequences of conservation decisions regarding species priorities and budget. Furthermore, the results of the thesis indicate that spatial SHDMs can be successfully used to account for species-specific requirements for spatial cohesion, in the reserve selection (single-option) context as well as in the multi-option context. Accounting for the spatial requirements of multiple species while allowing for several conservation options is, however, complicated, due to trade-offs in species requirements. It is also shown that spatial SHDMs can be successfully used for gaining information on the factors that drive a species' spatial distribution. Such information is valuable to conservation planning, as better knowledge of species requirements facilitates the design of networks for species persistence. The methods and results described in this thesis aim to improve species' probabilities of persistence by taking better account of species' habitat and spatial requirements.
Many real-world conservation planning problems are characterised by a variety of conservation options related to the protection, restoration and maintenance of habitat. Planning tools therefore need to be able to incorporate multiple conservation options per site in order to continue the search for cost-effective conservation planning solutions. Simultaneously, the spatial requirements of species need to be considered. The methods described in this thesis offer a starting point for combining these two relevant aspects of conservation planning.
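
A minimal sketch of the multi-option selection problem described above, using a greedy cost-effectiveness heuristic. The costs, benefits, and the coverage rule are invented for illustration; the thesis's actual optimization methods are more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(4)
n_sites, n_species, n_options = 20, 5, 3   # assumed problem size
budget = 10.0

# benefit[i, o, s]: probability of occurrence of species s at site i under
# option o (e.g. protect / restore / maintain); cost[i, o]: cost of option o.
benefit = rng.uniform(0.0, 1.0, size=(n_sites, n_options, n_species))
cost = rng.uniform(0.5, 2.0, size=(n_sites, n_options))

chosen = {}                     # site -> selected option (one option per site)
spent = 0.0
covered = np.zeros(n_species)   # expected number of occurrences per species

while True:
    best = None
    for i in range(n_sites):
        if i in chosen:
            continue
        for o in range(n_options):
            if spent + cost[i, o] > budget:
                continue
            # Gain: improvement for the currently worst-covered species per
            # unit cost -- a crude stand-in for species-specific targets.
            gain = benefit[i, o, covered.argmin()] / cost[i, o]
            if best is None or gain > best[0]:
                best = (gain, i, o)
    if best is None:
        break                   # nothing affordable remains
    _, i, o = best
    chosen[i] = o
    spent += cost[i, o]
    covered += benefit[i, o]

print(f"{len(chosen)} sites selected, cost {spent:.2f}, "
      f"min expected coverage {covered.min():.2f}")
```

Trade-offs between species appear here as the dependence of the selected options on the coverage rule: a rule favouring one species can lower the expected coverage of another.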

Relevance:

20.00%

Publisher:

Abstract:

Climate change will influence the living conditions of all life on Earth. For some species the change in environmental conditions that has occurred so far has already increased the risk of extinction, and the extinction risk is predicted to increase for large numbers of species in the future. Some species may have time to adapt to the changing environmental conditions, but the rate and magnitude of the change are too great to allow many species to survive via evolutionary changes. Species' responses to climate change have been documented for some decades. Some groups of species, like many insects, respond readily to changes in temperature conditions and have shifted their distributions northwards to new climatically suitable regions. Such range shifts have been well documented, especially in temperate zones. In this context, butterflies have been studied more than any other group of species, partly because their past geographical ranges are well documented, which facilitates species-climate modelling and other analyses. The aim of the modelling studies is to examine to what extent shifts in species distributions can be explained by climatic and other factors. Models can also be used to predict the future distributions of species. In this thesis, I have studied the response to climate change of one species of butterfly within one geographically restricted area. The study species, the European map butterfly (Araschnia levana), has expanded rapidly northwards in Finland during the last two decades. I used statistical and dynamic modelling approaches in combination with field studies to analyse the effects of climate warming and landscape structure on the expansion. I studied the possible role of molecular variation in phosphoglucose isomerase (PGI), a glycolytic enzyme affecting flight metabolism and thereby flight performance, in the observed expansion of the map butterfly at two separate expansion fronts in Finland. The expansion rate of the map butterfly was shown to be correlated with the frequency of warmer-than-average summers during the study period. This result is in line with the greater probability of occurrence of the second generation during warm summers and with previous results on this species showing greater mobility of second- than first-generation individuals. The results of a field study in this thesis indicated low mobility of the first-generation butterflies. Climatic variables alone were not sufficient to explain the observed expansion in Finland. There are also problems in transferring the climate model from the regions for which data were available to construct it to new regions. The climate model predicted a wider distribution in the south-western part of Finland than has been observed. Dynamic modelling of the expansion in response to landscape structure suggested that habitat and landscape structure influence the rate of expansion. In southern Finland the landscape structure may have slowed down the expansion rate. The results on PGI suggested that allelic variation in this enzyme may influence flight performance and thereby the rate of expansion. Genetic differences between the populations at the two expansion fronts may explain, at least partly, the observed differences in the rate of expansion. Individuals with the genotype associated with a high flight metabolic rate were most frequent in eastern Finland, where the rate of range expansion has been highest.
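
A hedged sketch of a statistical species distribution model of the kind referred to above: logistic regression on an environmental covariate plus a distance-weighted connectivity covariate. All data, the dispersal-scale parameter alpha, and the covariate choices are invented for illustration and are not the thesis's models:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 300
xy = rng.uniform(0, 100, size=(n, 2))          # site coordinates
temperature = rng.normal(15, 3, size=n)        # environmental covariate
# Synthetic occupancy driven by temperature (logistic response).
occupied = (rng.random(n) < 1 / (1 + np.exp(-(temperature - 15)))).astype(int)

# Spatial covariate: distance-weighted sum of occupancy at neighbouring
# sites (a simple connectivity measure; alpha sets the dispersal scale).
alpha = 0.1
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)                    # exclude the site itself
connectivity = (np.exp(-alpha * d) * occupied[None, :]).sum(axis=1)

X = np.column_stack([temperature, connectivity])
model = LogisticRegression().fit(X, occupied)
print("coefficients (temperature, connectivity):", model.coef_[0])
```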

Relevance:

20.00%

Publisher:

Abstract:

This thesis studies quantile residuals and uses different methodologies to develop test statistics that are applicable in evaluating linear and nonlinear time series models based on continuous distributions. Models based on mixtures of distributions are of special interest because it turns out that for those models traditional residuals, often referred to as Pearson residuals, are not appropriate. As such models have become more and more popular in practice, especially with financial time series data, there is a need for reliable diagnostic tools that can be used to evaluate them. The aim of the thesis is to show how such diagnostic tools can be obtained and used in model evaluation. The quantile residuals considered here are defined in such a way that, when the model is correctly specified and its parameters are consistently estimated, they are approximately independent with a standard normal distribution. All the tests derived in the thesis are pure significance tests and are theoretically sound in that they properly take into account the uncertainty caused by parameter estimation. In Chapter 2 a general framework based on the likelihood function and smooth functions of univariate quantile residuals is derived that can be used to obtain misspecification tests for various purposes. Three easy-to-use tests aimed at detecting non-normality, autocorrelation, and conditional heteroscedasticity in quantile residuals are formulated. It also turns out that these tests can be interpreted as Lagrange multiplier or score tests, so that they are asymptotically optimal against local alternatives. Chapter 3 extends the concept of quantile residuals to multivariate models. The framework of Chapter 2 is generalized, and tests aimed at detecting non-normality, serial correlation, and conditional heteroscedasticity in multivariate quantile residuals are derived from it. Score test interpretations are obtained for the serial correlation and conditional heteroscedasticity tests and, in a rather restricted special case, for the normality test. In Chapter 4 the tests are constructed using the empirical distribution function of quantile residuals. The so-called Khmaladze martingale transformation is applied in order to eliminate the uncertainty caused by parameter estimation. Various test statistics are considered, so that critical bounds for histogram-type plots as well as quantile-quantile and probability-probability plots of quantile residuals are obtained. Chapters 2, 3, and 4 contain simulations and empirical examples which illustrate the finite-sample size and power properties of the derived tests and show how the tests and related graphical tools based on residuals are applied in practice.
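
The definition of quantile residuals above translates directly into code: pass each observation through the model's fitted distribution function and then through the standard normal quantile function. A sketch for a two-component Gaussian mixture, with parameters assumed known for brevity (the thesis's actual test statistics are not reproduced here):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
# Data from a two-component Gaussian mixture.
y = np.where(rng.random(500) < 0.3,
             rng.normal(-2.0, 0.5, 500),
             rng.normal(1.0, 1.0, 500))

# Suppose these parameters were estimated from the data.
w, mu, sigma = np.array([0.3, 0.7]), np.array([-2.0, 1.0]), np.array([0.5, 1.0])

# Quantile residuals: r = Phi^{-1}(F_model(y)). Under a correct model they
# are approximately i.i.d. standard normal, unlike Pearson residuals.
F = (w * norm.cdf((y[:, None] - mu) / sigma)).sum(axis=1)
r = norm.ppf(F)

print(f"mean {r.mean():.3f}, var {r.var():.3f}")     # approx. 0 and 1
lag1 = np.corrcoef(r[:-1], r[1:])[0, 1]              # approx. 0 (no autocorr.)
print(f"lag-1 autocorrelation {lag1:.3f}")
```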

Relevance:

20.00%

Publisher:

Abstract:

We present a measurement of the top quark mass with t-tbar dilepton events produced in p-pbar collisions at the Fermilab Tevatron at $\sqrt{s} = 1.96$ TeV and collected by the CDF II detector. A sample of 328 events with an electron or muon and an isolated track, corresponding to an integrated luminosity of 2.9 fb$^{-1}$, is selected as t-tbar candidates. To account for the unconstrained event kinematics, we scan over the phase space of the azimuthal angles ($\phi_{\nu_1},\phi_{\nu_2}$) of the neutrinos and reconstruct the top quark mass for each ($\phi_{\nu_1},\phi_{\nu_2}$) pair by minimizing a $\chi^2$ function under the t-tbar dilepton hypothesis. We assign $\chi^2$-dependent weights to the solutions in order to build a preferred mass for each event. Preferred-mass distributions (templates) are built from simulated t-tbar and background events, and parameterized in order to provide continuous probability density functions. A likelihood fit to the mass distribution in data as a weighted sum of signal and background probability density functions gives a top quark mass of $165.5^{+3.4}_{-3.3}$ (stat.) $\pm 3.1$ (syst.) GeV/$c^2$.
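
The final fitting step, a likelihood fit of the preferred-mass distribution to a weighted sum of signal and background densities, can be sketched generically. The toy densities (a Gaussian signal over a flat background) and all numbers below are illustrative assumptions, not CDF's parameterized templates:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, uniform

rng = np.random.default_rng(7)
LO, HI = 100.0, 250.0                      # preferred-mass range (assumed)

# Toy "data": a signal peak plus a flat background component.
m_sig = rng.normal(165.0, 15.0, size=230)
m_bkg = rng.uniform(LO, HI, size=98)
m = np.clip(np.concatenate([m_sig, m_bkg]), LO, HI)

def nll(theta):
    """Negative log-likelihood of a weighted sum of signal and background
    pdfs; theta = (top mass, signal fraction), widths fixed for brevity."""
    mass, f_sig = theta
    if not (0.0 < f_sig < 1.0):
        return np.inf
    pdf = (f_sig * norm.pdf(m, mass, 15.0)
           + (1 - f_sig) * uniform.pdf(m, LO, HI - LO))
    return -np.log(pdf).sum()

res = minimize(nll, x0=[170.0, 0.5], method="Nelder-Mead")
print(f"fitted mass {res.x[0]:.1f}, signal fraction {res.x[1]:.2f}")
```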

Relevance:

20.00%

Publisher:

Abstract:

We present a search for standard model Higgs boson production in association with a W boson in proton-antiproton collisions at a center-of-mass energy of 1.96 TeV. The search employs data collected with the CDF II detector that correspond to an integrated luminosity of approximately 1.9 fb$^{-1}$. We select events consistent with a signature of a single charged lepton, missing transverse energy, and two jets. Jets corresponding to bottom quarks are identified with a secondary vertex tagging method, a jet probability tagging method, and a neural network filter. We use kinematic information in an artificial neural network to improve discrimination between signal and background compared to previous analyses. The observed number of events and the neural network output distributions are consistent with the standard model background expectations, and we set 95% confidence level upper limits on the production cross section times branching fraction ranging from 1.2 to 1.1 pb, or 7.5 to 102 times the standard model expectation, for Higgs boson masses from 110 to 150 GeV/$c^2$, respectively.
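
A minimal sketch of the neural-network discrimination step on toy kinematic features. The feature distributions and network settings are invented for illustration and bear no relation to the CDF analysis details:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n = 2000
# Toy kinematic features (e.g. dijet mass, missing E_T, lepton p_T): signal
# and background drawn from overlapping but distinct distributions.
sig = rng.normal([115.0, 40.0, 45.0], [15.0, 15.0, 15.0], size=(n, 3))
bkg = rng.normal([90.0, 30.0, 35.0], [30.0, 20.0, 20.0], size=(n, 3))
X = np.vstack([sig, bkg])
labels = np.concatenate([np.ones(n), np.zeros(n)])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)
# The network output is the discriminant whose observed distribution would
# be compared with the background expectation when setting limits.
print(f"test accuracy {net.score(X_te, y_te):.2f}")
```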

Relevance:

20.00%

Publisher:

Abstract:

We report the observation of the bottom, doubly strange baryon $\Omega_b^-$ through the decay chain $\Omega_b^- \to J/\psi\,\Omega^-$, where $J/\psi \to \mu^+\mu^-$, $\Omega^- \to \Lambda K^-$, and $\Lambda \to p\pi^-$, using 4.2 fb$^{-1}$ of data from $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV recorded with the Collider Detector at Fermilab. A signal is observed whose probability of arising from a background fluctuation is $4.0 \times 10^{-8}$, or 5.5 Gaussian standard deviations. The $\Omega_b^-$ mass is measured to be $6054.4 \pm 6.8$ (stat.) $\pm 0.9$ (syst.) MeV/$c^2$. The lifetime of the $\Omega_b^-$ baryon is measured to be $1.13^{+0.53}_{-0.40}$ (stat.) $\pm 0.02$ (syst.) ps. In addition, for the $\Xi_b^-$ baryon we measure a mass of $5790.9 \pm 2.6$ (stat.) $\pm 0.8$ (syst.) MeV/$c^2$ and a lifetime of $1.56^{+0.27}_{-0.25}$ (stat.) $\pm 0.02$ (syst.) ps. Under the assumption that the $\Xi_b^-$ and $\Omega_b^-$ are produced with kinematic distributions similar to those of the $\Lambda_b^0$ baryon, we find $\sigma(\Xi_b^-)\,\mathcal{B}(\Xi_b^- \to J/\psi\,\Xi^-) / \sigma(\Lambda_b^0)\,\mathcal{B}(\Lambda_b^0 \to J/\psi\,\Lambda) = 0.167^{+0.037}_{-0.025}$ (stat.) $\pm 0.012$ (syst.) and $\sigma(\Omega_b^-)\,\mathcal{B}(\Omega_b^- \to J/\psi\,\Omega^-) / \sigma(\Lambda_b^0)\,\mathcal{B}(\Lambda_b^0 \to J/\psi\,\Lambda) = 0.045^{+0.017}_{-0.012}$ (stat.) $\pm 0.004$ (syst.) for baryons produced with transverse momentum in the range 6-20 GeV/c.
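
The quoted significance can be checked directly by converting the background-fluctuation probability to Gaussian standard deviations; assuming a two-sided convention reproduces the quoted 5.5:

```python
from scipy.stats import norm

p = 4.0e-8                     # background-fluctuation probability
print(f"{norm.isf(p / 2):.1f} sigma (two-sided)")   # -> 5.5
print(f"{norm.isf(p):.1f} sigma (one-sided)")       # -> 5.4
```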

Relevance:

20.00%

Publisher:

Abstract:

Measurements of inclusive charged-hadron transverse-momentum and pseudorapidity distributions are presented for proton-proton collisions at sqrt(s) = 0.9 and 2.36 TeV. The data were collected with the CMS detector during the LHC commissioning in December 2009. For non-single-diffractive interactions, the average charged-hadron transverse momentum is measured to be 0.46 +/- 0.01 (stat.) +/- 0.01 (syst.) GeV/c at 0.9 TeV and 0.50 +/- 0.01 (stat.) +/- 0.01 (syst.) GeV/c at 2.36 TeV, for pseudorapidities between -2.4 and +2.4. The measured pseudorapidity densities in the central region, dN(charged)/d(eta) for |eta| < 0.5, are also reported at both energies.

Relevance:

20.00%

Publisher:

Abstract:

The objective of this paper is to improve option risk monitoring by examining the information content of implied volatility and by introducing the calculation of a single-sum expected risk exposure similar to Value-at-Risk. The figure is calculated in two steps: first, the value of a portfolio of options is estimated for a number of different market scenarios; second, the information content of the estimated scenarios is summarized in a single-sum risk measure. This involves the use of probability theory and return distributions, which confronts the user with the problem of non-normality in the return distribution of the underlying asset. Here the hyperbolic distribution is used as one alternative for dealing with heavy tails. The results indicate that the information content of implied volatility is useful when predicting future large returns in the underlying asset. Further, the hyperbolic distribution provides a good fit to historical returns, enabling a more accurate definition of statistical intervals and extreme events.
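
A hedged sketch of the second step described above: fit a heavy-tailed distribution to (here simulated) historical returns and summarize risk as a low quantile. SciPy's generalized hyperbolic family is used with the shape parameter p fixed at 1, which gives the hyperbolic distribution; the generic fit routine may need tuning on real data, and all numbers are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
# Stand-in for historical daily returns: a heavy-tailed (Student t) sample.
returns = stats.t.rvs(df=4, scale=0.01, size=2000, random_state=rng)

# Fit the hyperbolic distribution (genhyperbolic with p fixed at 1).
p_, a, b, loc, scale = stats.genhyperbolic.fit(
    returns, fp=1, loc=returns.mean(), scale=returns.std())
fitted = stats.genhyperbolic(p_, a, b, loc=loc, scale=scale)

# Single-sum risk figure: the 1% return quantile (a Value-at-Risk style
# measure), compared with the thin-tailed normal alternative.
var99_hyp = fitted.ppf(0.01)
var99_norm = stats.norm.ppf(0.01, returns.mean(), returns.std())
print(f"1% quantile: hyperbolic {var99_hyp:.4f}, normal {var99_norm:.4f}")
```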

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we examine the predictability of observed volatility smiles in three major European index options markets, utilising the historical return distributions of the respective underlying assets. The analysis applies the Black (1976) pricing model adjusted in accordance with the Jarrow-Rudd methodology proposed in 1982: the expected future returns are adjusted for the third and fourth central moments, as these represent deviations from normality in the distributions of observed returns and are thus considered one possible explanation for the existence of the smile. The results indicate that the inclusion of the higher moments in the pricing model reduces the volatility smile to some extent, compared with the unadjusted Black-76 model. However, as the smile is partly a function of supply, demand, and liquidity, and as such intricate to model, this modification does not appear sufficient to fully capture the characteristics of the smile.
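
To illustrate the kind of higher-moment adjustment involved, the sketch below prices a call with the standard Black-76 formula and then numerically against a Gram-Charlier expansion of the standardized log-return density with third- and fourth-moment terms. This is a related moment expansion in the spirit of the Jarrow-Rudd adjustment, not the paper's exact method; the drift is not corrected for the adjustment terms, and all inputs are invented:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def black76_call(F, K, sigma, T, r):
    """Standard Black (1976) call on a forward price F."""
    d1 = (np.log(F / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return np.exp(-r * T) * (F * norm.cdf(d1) - K * norm.cdf(d2))

def gram_charlier_call(F, K, sigma, T, r, skew, exkurt):
    """Call priced against a Gram-Charlier density with Hermite terms for
    skewness and excess kurtosis; with skew = exkurt = 0 this reduces to
    the Black-76 price (illustrative only)."""
    s = sigma * np.sqrt(T)
    def integrand(z):
        # Lognormal-consistent mean of log(F_T / F) is -s^2 / 2.
        f_T = F * np.exp(-0.5 * s**2 + s * z)
        dens = norm.pdf(z) * (1 + skew / 6 * (z**3 - 3*z)
                              + exkurt / 24 * (z**4 - 6*z**2 + 3))
        return max(f_T - K, 0.0) * dens
    price, _ = quad(integrand, -10, 10)
    return np.exp(-r * T) * price

F, K, sigma, T, r = 100.0, 105.0, 0.2, 0.5, 0.03
print(f"Black-76:            {black76_call(F, K, sigma, T, r):.4f}")
print(f"skew/kurt adjusted:  {gram_charlier_call(F, K, sigma, T, r, -0.5, 1.0):.4f}")
```

Repricing across a strike grid with negative skewness and positive excess kurtosis raises the prices of low-strike calls relative to Black-76, which is how such adjustments flatten part of the smile.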