955 resultados para Bayesian statistical modelling


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article considers alternative methods to calculate the fair premium rate of crop insurance contracts based on county yields. The premium rate was calculated using parametric and nonparametric approaches to estimate the conditional agricultural yield density. These methods were applied to a data set of county yield provided by the Statistical and Geography Brazilian Institute (IBGE), for the period of 1990 through 2002, for soybean, corn and wheat, in the State of Paran. In this article, we propose methodological alternatives to pricing crop insurance contracts resulting in more accurate premium rates in a situation of limited data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The goal of this work is to try to create a statistical model, based only on easily computable parameters from the CSP problem to predict runtime behaviour of the solving algorithms, and let us choose the best algorithm to solve the problem. Although it seems that the obvious choice should be MAC, experimental results obtained so far show, that with big numbers of variables, other algorithms perfom much better, specially for hard problems in the transition phase.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Lecture notes for a first year statistical modelling course.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The impact of projected climate change on wine production was analysed for the Demarcated Region of Douro, Portugal. A statistical grapevine yield model (GYM) was developed using climate parameters as predictors. Statistically significant correlations were identified between annual yield and monthly mean temperatures and monthly precipitation totals during the growing cycle. These atmospheric factors control grapevine yield in the region, with the GYM explaining 50.4% of the total variance in the yield time series in recent decades. Anomalously high March rainfall (during budburst, shoot and inflorescence development) favours yield, as well as anomalously high temperatures and low precipitation amounts in May and June (May: flowering and June: berry development). The GYM was applied to a regional climate model output, which was shown to realistically reproduce the GYM predictors. Finally, using ensemble simulations under the A1B emission scenario, projections for GYM-derived yield in the Douro Region, and for the whole of the twenty-first century, were analysed. A slight upward trend in yield is projected to occur until about 2050, followed by a steep and continuous increase until the end of the twenty-first century, when yield is projected to be about 800 kg/ha above current values. While this estimate is based on meteorological parameters alone, changes due to elevated CO2 may further enhance this effect. In spite of the associated uncertainties, it can be stated that projected climate change may significantly benefit wine yield in the Douro Valley.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Extreme rainfall events have triggered a significant number of flash floods in Madeira Island along its past and recent history. Madeira is a volcanic island where the spatial rainfall distribution is strongly affected by its rugged topography. In this thesis, annual maximum of daily rainfall data from 25 rain gauge stations located in Madeira Island were modelled by the generalised extreme value distribution. Also, the hypothesis of a Gumbel distribution was tested by two methods and the existence of a linear trend in both distributions parameters was analysed. Estimates for the 50– and 100–year return levels were also obtained. Still in an univariate context, the assumption that a distribution function belongs to the domain of attraction of an extreme value distribution for monthly maximum rainfall data was tested for the rainy season. The available data was then analysed in order to find the most suitable domain of attraction for the sampled distribution. In a different approach, a search for thresholds was also performed for daily rainfall values through a graphical analysis. In a multivariate context, a study was made on the dependence between extreme rainfall values from the considered stations based on Kendall’s τ measure. This study suggests the influence of factors such as altitude, slope orientation, distance between stations and their proximity of the sea on the spatial distribution of extreme rainfall. Groups of three pairwise associated stations were also obtained and an adjustment was made to a family of extreme value copulas involving the Marshall–Olkin family, whose parameters can be written as a function of Kendall’s τ association measures of the obtained pairs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Anaerobic threshold (AT) is usually estimated as a change point problem by visual analysis of the cardiorespiratory response to incremental dynamic exercise. In this study, two phase linear (TPL) models of the linear-linear and linear-quadratic type were used for the estimation of AT. The correlation coefficient between the classical and statistical approaches was 0.88, and 0.89 after outlier exclusion. The TPL models provide a simple method for estimating AT that can be easily implemented using a digital computer for the automatic pattern recognition of AT.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions, each of these is transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Poly(methylvinylether-co-maleic acid) (PMVE/MA) is commonly used as a component of pharmaceutical platforms, principally to enhance interactions with biological substrates (mucoadhesion). However, the limited knowledge on the rheological properties of this polymer and their relationships with mucoadhesion has negated the biomedical use of this polymer as a mono-component platform. This study presents a comprehensive study of the rheological properties of aqueous PMVE/MA platforms and defines their relationships with mucoadhesion using multiple regression analysis. Using dilute solution viscometry the intrinsic viscosities of un-neutralised PMVE/MA and PMVE/MA neutralised using NaOH or TEA were 22.32 ± 0.89 dL g-1, 274.80 ± 1.94 dL g-1 and 416.49 ± 2.21 dL g-1 illustrating greater polymer chain expansion following neutralisation using Triethylamine (TEA). PMVE/MA platforms exhibited shear-thinning properties. Increasing polymer concentration increased the consistencies, zero shear rate (ZSR) viscosities (determined from flow rheometry), storage and loss moduli, dynamic viscosities (defined using oscillatory analysis) and mucoadhesive properties, yet decreased the loss tangents of the neutralised polymer platforms. TEA neutralised systems possessed significantly and substantially greater consistencies, ZSR and dynamic viscosities, storage and loss moduli, mucoadhesion and lower loss tangents than their NaOH counterparts. Multiple regression analysis enabled identification of the dominant role of polymer viscoelasticity on mucoadhesion (r > 0.98). The mucoadhesive properties of PMVE/MA platforms were considerable and were greater than those of other platforms that have successfully been shown to enhance in vivo retention when applied to the oral cavity, indicating a positive role for PMVE/MA mono-component platforms for pharmaceutical and biomedical applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

When actuaries face with the problem of pricing an insurance contract that contains different types of coverage, such as a motor insurance or homeowner's insurance policy, they usually assume that types of claim are independent. However, this assumption may not be realistic: several studies have shown that there is a positive correlation between types of claim. Here we introduce different regression models in order to relax the independence assumption, including zero-inflated models to account for excess of zeros and overdispersion. These models have been largely ignored to multivariate Poisson date, mainly because of their computational di±culties. Bayesian inference based on MCMC helps to solve this problem (and also lets us derive, for several quantities of interest, posterior summaries to account for uncertainty). Finally, these models are applied to an automobile insurance claims database with three different types of claims. We analyse the consequences for pure and loaded premiums when the independence assumption is relaxed by using different multivariate Poisson regression models and their zero-inflated versions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical analyses of measurements that can be described by statistical models are of essence in astronomy and in scientific inquiry in general. The sensitivity of such analyses, modelling approaches, and the consequent predictions, is sometimes highly dependent on the exact techniques applied, and improvements therein can result in significantly better understanding of the observed system of interest. Particularly, optimising the sensitivity of statistical techniques in detecting the faint signatures of low-mass planets orbiting the nearby stars is, together with improvements in instrumentation, essential in estimating the properties of the population of such planets, and in the race to detect Earth-analogs, i.e. planets that could support liquid water and, perhaps, life on their surfaces. We review the developments in Bayesian statistical techniques applicable to detections planets orbiting nearby stars and astronomical data analysis problems in general. We also discuss these techniques and demonstrate their usefulness by using various examples and detailed descriptions of the respective mathematics involved. We demonstrate the practical aspects of Bayesian statistical techniques by describing several algorithms and numerical techniques, as well as theoretical constructions, in the estimation of model parameters and in hypothesis testing. We also apply these algorithms to Doppler measurements of nearby stars to show how they can be used in practice to obtain as much information from the noisy data as possible. Bayesian statistical techniques are powerful tools in analysing and interpreting noisy data and should be preferred in practice whenever computational limitations are not too restrictive.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The identification of compositional changes in fumarolic gases of active and quiescent volcanoes is one of the most important targets in monitoring programs. From a general point of view, many systematic (often cyclic) and random processes control the chemistry of gas discharges, making difficult to produce a convincing mathematical-statistical modelling. Changes in the chemical composition of volcanic gases sampled at Vulcano Island (Aeolian Arc, Sicily, Italy) from eight different fumaroles located in the northern sector of the summit crater (La Fossa) have been analysed by considering their dependence from time in the period 2000-2007. Each intermediate chemical composition has been considered as potentially derived from the contribution of the two temporal extremes represented by the 2000 and 2007 samples, respectively, by using inverse modelling methodologies for compositional data. Data pertaining to fumaroles F5 and F27, located on the rim and in the inner part of La Fossa crater, respectively, have been used to achieve the proposed aim. The statistical approach has allowed us to highlight the presence of random and not random fluctuations, features useful to understand how the volcanic system works, opening new perspectives in sampling strategies and in the evaluation of the natural risk related to a quiescent volcano