198 results for likelihood to publication
Abstract:
Olkiluoto Island is situated in the northern Baltic Sea, near the southwestern coast of Finland, and is the proposed location of a spent nuclear fuel repository. This study examined Holocene palaeoseismicity in the Olkiluoto area and the surrounding sea areas using computer simulations together with acoustic-seismic, sedimentological and dating methods. The most abundant rock type on the island is migmatitic mica gneiss, intruded by tonalites, granodiorites and granites. The surrounding Baltic Sea seabed consists of Palaeoproterozoic crystalline bedrock, which is largely covered by younger Mesoproterozoic sedimentary rocks. The area contains several ancient deep-seated fracture zones that divide it into bedrock blocks. The response of the bedrock at the Olkiluoto site was modelled for four future ice-age scenarios. Each scenario produced shear displacements of fractures with different times of occurrence and varying recovery rates. Generally, the larger the maximum ice load, the larger the permanent shear displacements. For a base case, the maximum shear displacements were a few centimetres at the proposed repository level, at approximately 500 m b.s.l. High-resolution, low-frequency echo-sounding was used to examine Holocene submarine sedimentary structures and possible direct and indirect indicators of palaeoseismic activity in the northern Baltic Sea. Echo-sounding profiles of Holocene submarine sediments revealed slides and slumps, normal faults, debris flows and turbidite-type structures. The profiles also showed pockmarks and other structures related to gas or groundwater seepage, which might be related to fracture zone activation. Evidence of postglacial reactivation in the study area was derived from the spatial occurrence of some of these structures, especially the faults and the seepages, in the vicinity of old bedrock fracture zones.
Palaeoseismic event(s) (a single event or several) in the Olkiluoto area were dated, and the palaeoenvironment was characterized, using palaeomagnetic, biostratigraphical and lithostratigraphical methods; combining these methods enhanced the reliability of the chronology. Combined lithostratigraphy, biostratigraphy and palaeomagnetic stratigraphy yielded an estimated age of 10 650 to 10 200 cal. years BP for the palaeoseismic event(s). All Holocene sediment faults in the northern Baltic Sea occur at the same stratigraphical level, the age of which is estimated at 10 700 cal. years BP (9500 radiocarbon years BP). Their movement is suggested to have been triggered by palaeoseismic event(s) when the Late Weichselian ice sheet was retreating from the site and bedrock stresses were released along the bedrock fracture zones. No younger or repeated traces of seismic events were found, which corroborates the suggestion that the major seismic activity occurred within a short time during and after the last deglaciation. The origin of the gas/groundwater seepages remains unclear. Their reflections in the echo-sounding profiles imply that part of the gas is derived from the organic-bearing Litorina and modern gyttja clays. However, at least some of the gas is derived from the bedrock. Additional information could be gained by analysing pore water from the pockmarks. Information on postglacial fault activation and on possible gas and/or fluid discharges under high hydraulic heads is relevant to the safety assessment of a planned spent nuclear fuel repository in the region.
Abstract:
The distribution of mammals and their morphological and ecological traits are affected by both short-term and long-term environmental changes, especially fluctuations in climate and vegetation. This work studied the adaptation of mammals to climate change in Eurasia over the last 24 million years. The study focused particularly on the last two million years, during which the climate changed strongly and human activity began to become significant. For this reason it is often difficult to tell which of these two factors caused the extinction of a mammal species or its disappearance from a region. The material consisted of extensive Russian-language literature, whose data, having remained untranslated, had previously been left outside Western research. The work also used the NOW database, which contains fossil mammal localities and their ages.
Abstract:
The Taita Hills in southeastern Kenya form the northernmost part of Africa’s Eastern Arc Mountains, which have been identified by Conservation International as one of the top ten biodiversity hotspots on Earth. As in many areas of the developing world, over recent decades the Taita Hills have experienced significant population growth, leading to major changes in land use and land cover (LULC) as well as escalating land degradation, particularly soil erosion. Multi-temporal, medium-resolution multispectral optical satellite data, such as imagery from the SPOT HRV, HRVIR and HRG sensors, provide a valuable source of information for environmental monitoring and modelling at a landscape level at local and regional scales. However, using multi-temporal SPOT data in quantitative remote sensing studies requires the removal of atmospheric effects and the derivation of the surface reflectance factor. Furthermore, in areas of rugged terrain such as the Taita Hills, topographic correction is necessary to derive comparable reflectance throughout a SPOT scene. Reliable monitoring of LULC change over time and modelling of land degradation and of human population distribution and abundance are of crucial importance to sustainable development, natural resource management, biodiversity conservation, and understanding and mitigating climate change and its impacts. The main purpose of this thesis was to develop and validate enhanced processing of SPOT satellite imagery for use in environmental monitoring and modelling at a landscape level, in regions of the developing world with limited ancillary data availability.
The Taita Hills formed the application study site, whilst the Helsinki metropolitan region was used as a control site for validating and assessing the applied atmospheric correction techniques; there, multiangular field reflectance measurements were taken, and horizontal-visibility meteorological data concurrent with image acquisition were available. The proposed historical empirical line method (HELM) for absolute atmospheric correction was the only applied technique that derived the surface reflectance factor within an RMSE of < 0.02 ρs in the SPOT visible and near-infrared bands, an accuracy level identified as a benchmark for successful atmospheric correction. A multi-scale segmentation/object relationship modelling (MSS/ORM) approach was applied to map LULC in the Taita Hills from the multi-temporal SPOT imagery. This object-based procedure yielded significant improvements over a uni-scale maximum-likelihood technique. The derived LULC data were used, in combination with low-cost GIS layers describing elevation, rainfall and soil type, to model land degradation in the Taita Hills in the form of potential soil loss, using the simple universal soil loss equation (USLE). Furthermore, human population distribution and abundance were modelled with satisfactory results using only SPOT- and GIS-derived data and non-Gaussian predictive modelling techniques. The SPOT-derived LULC data were found to be unnecessary as a predictor, because first- and second-order image texture measures had greater power to explain variation in dwelling unit occurrence and abundance. The feasibility of implementing the procedures locally in the developing world using low-cost or freely available data and software was also considered. The techniques discussed in this thesis are considered equally applicable to other medium- and high-resolution optical satellite imagery, as well as to the SPOT data used here.
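The empirical line idea underlying HELM can be sketched as a per-band linear fit between at-sensor values and the known surface reflectance of reference targets, followed by an RMSE check against independent field measurements. This is a minimal sketch of a basic empirical line correction only: the historical selection of reference targets that distinguishes HELM is not reproduced, and the function names and the synthetic numbers in the usage note are hypothetical.

```python
import math


def fit_empirical_line(dn, reflectance):
    """Least-squares gain and offset mapping at-sensor values to surface reflectance (one band)."""
    n = len(dn)
    mean_x = sum(dn) / n
    mean_y = sum(reflectance) / n
    sxx = sum((x - mean_x) ** 2 for x in dn)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dn, reflectance))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def apply_line(dn, gain, offset):
    """Convert at-sensor values of the same band to reflectance with the fitted line."""
    return [gain * x + offset for x in dn]


def rmse(predicted, reference):
    """Root-mean-square error between corrected and field-measured reflectance."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))


def meets_benchmark(predicted, reference, threshold=0.02):
    # 0.02 reflectance units is the accuracy benchmark quoted in the abstract
    return rmse(predicted, reference) < threshold
```

For example, with made-up calibration targets `dn = [20, 45, 80, 150]` and `refl = [0.02, 0.07, 0.14, 0.28]`, `fit_empirical_line(dn, refl)` returns a gain of about 0.002 and an offset of about -0.02; the corrected values of validation pixels can then be compared with field measurements using `meets_benchmark`.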
Abstract:
Whether a statistician complements a probability model for observed data with a prior distribution and carries out fully probabilistic inference, or bases the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may matter little if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode under flat priors. However, in situations where the parametric assumptions of standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work comprises five articles, all of which apply probability modeling to various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two apply hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, the introductory part of this work presents a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox developed here, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly to justify the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that the same reasoning also applies under sampling from a finite population. The main emphasis is on probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example.
The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
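The remark that maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode under a flat prior can be checked numerically for a single binomial proportion. In this minimal sketch (the binomial example and all function names are illustrative assumptions, not taken from the articles), the curvature of the log-posterior at the mode k/n gives an approximate posterior standard deviation, which is compared with the exact Beta posterior.

```python
import math


def log_posterior(p, k, n):
    # binomial log-likelihood up to a constant; under a flat prior on p
    # this is also the log-posterior up to a constant
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def laplace_approx(k, n, h=1e-4):
    """Mode and standard deviation of the Gaussian approximation at the posterior mode."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n so that the mode lies inside (0, 1)")
    mode = k / n  # the MLE, which coincides with the posterior mode under a flat prior
    # central finite difference for the second derivative (curvature) at the mode
    d2 = (log_posterior(mode + h, k, n) - 2 * log_posterior(mode, k, n)
          + log_posterior(mode - h, k, n)) / h ** 2
    return mode, math.sqrt(-1.0 / d2)


def beta_posterior_sd(k, n):
    # exact posterior under a flat Beta(1, 1) prior is Beta(k + 1, n - k + 1)
    a, b = k + 1, n - k + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
```

With k = 30 successes in n = 100 trials, the approximate standard deviation is close to the familiar sqrt(p(1 - p)/n) and differs from the exact Beta posterior standard deviation only in the fourth decimal place, which illustrates why the choice matters little when the likelihood dominates the prior.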
Abstract:
The focus of this study is the statistical analysis of categorical responses where the response values are dependent on each other. The most typical example of this kind of dependence is when repeated responses have been obtained from the same study unit. For example, in Paper I, the response of interest is pneumococcal nasopharyngeal carriage (yes/no) in 329 children. For each child, carriage is measured nine times during the first 18 months of life, and thus the repeated responses on each child cannot be assumed independent of each other. In this example, the interest typically lies in the carriage prevalence and in whether different risk factors affect it. Regression analysis is the established method for studying the effects of risk factors. To make correct inferences from the regression model, the associations between repeated responses need to be taken into account. The analysis of repeated categorical responses typically focuses on regression modelling. However, further insights can also be gained by investigating the structure of the association. The central theme of this study is the development of joint regression and association models. The analysis of repeated, or otherwise clustered, categorical responses is computationally difficult. Likelihood-based inference is often feasible only when the number of repeated responses for each study unit is small. Paper IV presents an algorithm that substantially facilitates maximum likelihood fitting, especially as the number of repeated responses increases. In addition, a notable result of this work is freely available software for likelihood-based estimation of clustered categorical responses.
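Why likelihood-based inference becomes hard as the number of repeated responses grows can be seen from the saturated model: with T binary responses per study unit, a fully general joint distribution has 2**T response patterns, so nine carriage measurements per child already give 512 cells. The sketch below is a hypothetical illustration, not the Paper IV algorithm; it computes the maximised saturated log-likelihood by counting patterns, a common reference point when assessing more parsimonious joint regression and association models.

```python
import math
from collections import Counter


def saturated_loglik(responses):
    """Maximised log-likelihood of the saturated model for clustered binary responses.

    Each element of `responses` is one study unit's tuple of T binary responses.
    The saturated model treats each of the 2**T possible patterns as a multinomial
    cell, so its MLE is simply the observed relative frequency of each pattern.
    """
    counts = Counter(tuple(r) for r in responses)
    n = len(responses)
    return sum(c * math.log(c / n) for c in counts.values())


def n_free_parameters(T):
    # number of response patterns minus the sum-to-one constraint
    return 2 ** T - 1
```

Unobserved patterns contribute nothing to the maximised log-likelihood, but they still count towards the 2**T - 1 free parameters, which is what makes unrestricted joint models impractical for long response sequences.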
Abstract:
Microarrays are high-throughput biological assays that allow thousands of genes to be screened for their expression. The main idea behind microarrays is to compute for each gene a unique signal that is directly proportional to the quantity of mRNA hybridized on the chip. The large number of processing steps, and the errors associated with each step, make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions. This thesis focuses on developing methods for improving the gene signal and on using this improved signal for higher-level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment yields a signal with low noise, as the effect of unwanted variation is minimized and the precision of the estimates of the parameters of interest is maximized. Second, a system for improving the gene signal using three scans at varying scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied to these three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of the genes. Third, a novel image segmentation approach that separates the fluorescent signal from undesired noise is developed using an additional dye, SYBR green RNA II. This technique helps to identify only the signal from the hybridized DNA, while signal corresponding to dust, scratches, dye spills and other noise sources is excluded. Fourth, an integrated statistical model is developed in which signal correction, systematic array effects, dye effects and differential expression are modelled jointly, as opposed to applying several methods of analysis sequentially.
The methods described here have been tested only on cDNA microarrays but can, with some modifications, also be applied to other high-throughput technologies. Keywords: High-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS.
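The motivation for multiple scans can be illustrated with a deliberately simplified, non-Bayesian stand-in for the latent intensity model: if each scan is assumed to be a known multiple of a common latent intensity, truncated at the scanner's saturation level, each spot's latent intensity can be estimated by rescaling its unsaturated readings and averaging them. The known gains, the 16-bit saturation value of 65535 and all names are assumptions for illustration; the thesis instead estimates the calibrated signal with a Bayesian latent intensity model.

```python
def combine_scans(scans, gains, saturation=65535):
    """Estimate one latent intensity per spot from several scans of the same array.

    scans[s][g] is the reading of spot g in scan s, and gains[s] is the assumed
    known relative sensitivity of scan s. Readings at or above `saturation` are
    ignored, because they only bound the latent intensity from below.
    """
    n_spots = len(scans[0])
    estimates = []
    for g in range(n_spots):
        # put every unsaturated reading of spot g on the common latent scale
        rescaled = [scan[g] / gain for scan, gain in zip(scans, gains) if scan[g] < saturation]
        # average the rescaled readings; a spot saturated in every scan stays undetermined
        estimates.append(sum(rescaled) / len(rescaled) if rescaled else float("nan"))
    return estimates
```

Dim spots benefit from the high-sensitivity scan, while bright spots that saturate there are still estimated from the lower-sensitivity scans. A plain average is used here for simplicity; weighting scans by their noise level would be a natural refinement.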
Abstract:
Many problems in analysis have been solved using the theory of Hodge structures. P. Deligne began to treat these structures in a categorical way. Following him, we introduce the categories of mixed real and complex Hodge structures. The category of mixed Hodge structures over the field of real or complex numbers is a rigid abelian tensor category, and in fact a neutral Tannakian category. Therefore it is equivalent to the category of representations of an affine group scheme. Direct sums of pure Hodge structures of different weights over the real or complex numbers can be realized as representations of the torus group, whose group of complex points is the Cartesian product of two punctured complex planes. Mixed Hodge structures turn out to consist of the data of a direct sum of pure Hodge structures of different weights together with a nilpotent automorphism. Therefore mixed Hodge structures correspond to the representations of a certain semidirect product of a nilpotent group and the torus group acting on it.
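The correspondence between direct sums of pure Hodge structures and representations of the torus can be written out explicitly. The display below follows one common sign convention (Deligne's, with negative exponents; other authors use positive exponents), and the symbol \mathbb{S} for the torus is notation introduced here for illustration.

$$
V_{\mathbb{C}} = \bigoplus_{p,q} V^{p,q}, \qquad V_n = \bigoplus_{p+q=n} V^{p,q} \quad \text{(the pure summand of weight } n\text{)},
$$

$$
\mathbb{S}(\mathbb{C}) \cong \mathbb{C}^{\times} \times \mathbb{C}^{\times}, \qquad (z_1, z_2)\cdot v = z_1^{-p}\, z_2^{-q}\, v \quad \text{for } v \in V^{p,q}.
$$

For a real Hodge structure one additionally requires \overline{V^{p,q}} = V^{q,p}, and an element z of \mathbb{S}(\mathbb{R}) = \mathbb{C}^{\times} acts through (z_1, z_2) = (z, \bar{z}), that is, by z^{-p}\bar{z}^{-q} on V^{p,q}.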
Abstract:
Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Due to computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms rely on asymptotic analysis and with the first two model classes we also discuss how to compute exact rational number solutions.
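For the multinomial model, the normalizing sum in the NML criterion (the parametric complexity C(K, n) for K categories and sample size n) can be computed without asymptotics. This sketch uses the recurrence C(K + 2, n) = C(K + 1, n) + (n / K) C(K, n), which is known for the multinomial complexity; it is offered as an illustration under that assumption and is not necessarily identical to the algorithms in the thesis. The function names are hypothetical.

```python
import math


def _binary_complexity(n):
    # C(2, n) by direct summation over the count h of the first category;
    # Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1
    return sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h) for h in range(n + 1))


def multinomial_complexity(K, n):
    """Parametric complexity C(K, n) of the K-category multinomial at sample size n."""
    if K < 1:
        raise ValueError("K must be at least 1")
    if n == 0 or K == 1:
        return 1.0
    c_prev, c_cur = 1.0, _binary_complexity(n)  # C(1, n) and C(2, n)
    for k in range(1, K - 1):
        # recurrence: C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n)
        c_prev, c_cur = c_cur, c_cur + (n / k) * c_prev
    return c_cur


def nml_log_prob(counts):
    """Log NML probability of a data sequence summarised by its category counts."""
    n = sum(counts)
    log_ml = sum(h * math.log(h / n) for h in counts if h > 0)  # maximised log-likelihood
    return log_ml - math.log(multinomial_complexity(len(counts), n))
```

For example, `math.exp(nml_log_prob([1, 1]))` gives 0.1: the maximum-likelihood probability of a sequence with one outcome in each of two categories is 0.25, and C(2, 2) = 2.5. Because every term is rational, replacing the floats with `fractions.Fraction` would give exact rational values.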
Abstract:
What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so that we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, this thesis aims to understand the structure of the stimulus that is the raison d'être of the visual system. Following the hypothesis that optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis, we construct a variety of statistical models to discover structure in natural images that can be linked to receptive field properties of neurons in the primary visual cortex, such as simple and complex cells. We show that representing images with phase-invariant, complex cell-like units yields a better statistical description of the visual environment than linear simple cell units do, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the natural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally, we show how models like those presented here can be extended to capture whole visual scenes rather than just small image patches.
By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.
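The phase invariance of complex cell-like units can be illustrated with the classical energy model: the outputs of an even and an odd Gabor filter are squared and summed, so the pooled response to a grating barely changes with the grating's phase, while each linear filter's response does. This 1-D sketch uses hand-built filters with arbitrarily chosen parameters (sigma 4, wavelength 8, 41 samples) and hypothetical function names; in the thesis, by contrast, the complex cell pooling is learned from natural images rather than fixed like this.

```python
import math


def gabor_pair(sigma, wavelength, half_width):
    """Even (cosine) and odd (sine) 1-D Gabor filters sharing one Gaussian envelope."""
    omega = 2 * math.pi / wavelength
    xs = range(-half_width, half_width + 1)
    envelope = [math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
    even = [e * math.cos(omega * x) for e, x in zip(envelope, xs)]
    odd = [e * math.sin(omega * x) for e, x in zip(envelope, xs)]
    return even, odd


def grating(phase, wavelength, half_width):
    """Sinusoidal grating sampled on the same grid as the filters."""
    omega = 2 * math.pi / wavelength
    return [math.cos(omega * x + phase) for x in range(-half_width, half_width + 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simple_and_complex(stimulus, even, odd):
    s_even = dot(stimulus, even)  # linear "simple cell": depends on stimulus phase
    s_odd = dot(stimulus, odd)
    energy = math.sqrt(s_even ** 2 + s_odd ** 2)  # "complex cell": pooled squared outputs
    return s_even, energy
```

Feeding `grating(phase, 8.0, 20)` for several phases into `simple_and_complex` with `gabor_pair(4.0, 8.0, 20)` shows the even-filter output swinging from its maximum to nearly zero, while the energy stays essentially constant.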