943 results for Statistical modelling
Abstract:
The election of an Australian Labor Party (ALP) Government in 2007 saw ‘social inclusion’ emerge as the official and overarching social policy agenda. Being ‘included’ was subsequently defined by the ALP Government as being able to ‘have the resources, opportunities and capabilities needed to learn, work, engage and have a voice’. Various researchers in Australia demonstrated an interest in social inclusion, as it enabled them to construct a multi-dimensional framework for measuring disadvantage. This research program resulted in various forms of statistical modelling based on some agreement about what it means to be included in society. The multi-dimensional approach taken by academic researchers, however, did not necessarily translate to a new model of social policy development or implementation. We argue that, similar to the experience of the UK, Australia’s social inclusion policy agenda was for the most part narrowly and individually defined by politicians and policy makers, particularly in terms of equating being employed with being included. We conclude with a discussion of the need to strengthen the social inclusion framework by adopting an understanding of social inequality and social justice that is more relational and less categorical.
Abstract:
Species distribution models (SDMs) are considered to exemplify pattern-based rather than process-based models of a species' response to its environment. Hence, when used to map species distribution, the purpose of SDMs can be viewed as interpolation, since species response is measured at a few sites in the study region, and the aim is to interpolate species response at intermediate sites. Increasingly, however, SDMs are also being used to extrapolate species-environment relationships beyond the limits of the study region as represented by the training data. Regardless of whether SDMs are to be used for interpolation or extrapolation, the debate over how to implement SDMs focusses on evaluating the quality of the SDM, both ecologically and mathematically. This paper proposes a framework that includes useful tools previously employed to address uncertainty in habitat modelling. Together with existing frameworks for addressing uncertainty more generally when modelling, we then outline how these existing tools help inform development of a broader framework for addressing uncertainty, specifically when building habitat models. As discussed earlier, we focus on extrapolation rather than interpolation, where the emphasis on predictive performance is diluted by concerns for robustness and ecological relevance. We are cognisant of the dangers of excessively propagating uncertainty. Thus, although the framework provides a smorgasbord of approaches, it is intended that the exact menu selected for a particular application is small in size and targets the most important sources of uncertainty. We conclude with some guidance on a strategic approach to identifying these important sources of uncertainty. Whilst various aspects of uncertainty in SDMs have previously been addressed, either as the main aim of a study or as a necessary element of constructing SDMs, this is the first paper to provide a more holistic view.
Abstract:
description and analysis of geographically indexed health data with respect to demographic, environmental, behavioural, socioeconomic, genetic, and infectious risk factors (Elliott and Wartenberg 2004). Disease maps can be useful for estimating relative risk; ecological analyses, incorporating area and/or individual-level covariates; or cluster analyses (Lawson 2009). As aggregated data are often more readily available, one common method of mapping disease is to aggregate the counts of disease at some geographical areal level, and present them as choropleth maps (Devesa et al. 1999; Population Health Division 2006). Therefore, this chapter will focus exclusively on methods appropriate for areal data...
Abstract:
Instead of regarding a particular type of gambling activity (for example, electronic gambling machines, table games) as an isolated factor for problem gambling, recent research suggests that gambling involvement (for example, as measured by the number of different types of gambling activities played) should also be considered. Using a large sample of the Victorian adult population, this study found that the strength of association between problem gambling and the type of gambling reduced after adjusting for gambling involvement. This finding supports recent research indicating that gambling involvement is an important factor in assessing the risk of problem gambling. The study also provides insights into the measurement of gambling involvement and offers alternative statistical modelling for analysing problem gambling.
Abstract:
Compositional data analysis usually deals with relative information between parts where the total (abundances, mass, amount, etc.) is unknown or uninformative. This article addresses the question of what to do when the total is known and is of interest. Tools used in this case are reviewed and analysed, in particular the relationship between the positive orthant of D-dimensional real space, the product space of the real line times the D-part simplex, and their Euclidean space structures. The first alternative corresponds to data analysis taking logarithms of each component, and the second to treating a log-transformed total jointly with a composition describing the distribution of component amounts. Real data on total abundances of phytoplankton in an Australian river motivated the present study and are used for illustration.
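The two alternatives described in this abstract can be sketched numerically. The following is a minimal illustration only, assuming a hypothetical three-part vector of positive amounts. The function names, the example numbers, and the use of the centred log-ratio (clr) as coordinates for the composition are illustrative assumptions, not details taken from the article.

```python
import math

def log_components(amounts):
    """First alternative: take the logarithm of each positive amount."""
    return [math.log(a) for a in amounts]

def closure(amounts):
    """Rescale positive amounts to proportions summing to 1 (a point in the simplex)."""
    total = sum(amounts)
    return [a / total for a in amounts]

def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def total_and_composition(amounts):
    """Second alternative: the log-transformed total paired with the composition."""
    return math.log(sum(amounts)), closure(amounts)

def reconstruct(log_total, composition):
    """Recover the original amounts from (log total, composition)."""
    total = math.exp(log_total)
    return [total * p for p in composition]

# Hypothetical abundances of three taxa (illustrative numbers only).
amounts = [120.0, 30.0, 50.0]
log_total, composition = total_and_composition(amounts)
recovered = reconstruct(log_total, composition)

print(log_components(amounts))
print(log_total, composition, clr(composition))
print(recovered)
```

In this sketch both encodings determine the original amounts, so the choice between them concerns which structure the analysis works in rather than what information is kept.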
Abstract:
The increase in global temperature has been attributed to increased atmospheric concentrations of greenhouse gases (GHG), mainly CO2. The threat of severe and complex socio-economic and ecological implications of climate change has initiated an international process that aims to reduce emissions, to increase C sinks, and to protect existing C reservoirs. The Kyoto Protocol is an offspring of this process. The Kyoto Protocol and its accords state that signatory countries need to monitor their forest C pools, and to follow the guidelines set by the IPCC in the preparation, reporting and quality assessment of the C pool change estimates. The aims of this thesis were i) to estimate the changes in the carbon stocks of vegetation and soil in Finnish forests from 1922 to 2004, ii) to evaluate the applied methodology by using empirical data, iii) to assess the reliability of the estimates by means of uncertainty analysis, iv) to assess the effect of forest C sinks on the reliability of the entire national GHG inventory, and finally, v) to present an application of model-based stratification to a large-scale sampling design of soil C stock changes. The applied methodology builds on measured forest inventory data (or modelled stand data), and uses statistical modelling to predict biomasses and litter production, as well as a dynamic soil C model to predict the decomposition of litter. The mean vegetation C sink of Finnish forests from 1922 to 2004 was 3.3 Tg C a⁻¹, and the mean soil C sink was 0.7 Tg C a⁻¹. Soil is slowly accumulating C as a consequence of increased growing stock and of soil C stocks that are unsaturated in relation to the current detritus input to soil, which is higher than at the beginning of the period. Annual estimates of vegetation and soil C stock changes fluctuated considerably during the period, and were frequently opposite (e.g. vegetation was a sink but soil was a source).
The inclusion of vegetation sinks into the national GHG inventory of 2003 increased its uncertainty from between -4% and 9% to ± 19% (95% CI), and further inclusion of upland mineral soils increased it to ± 24%. The uncertainties of annual sinks can be reduced most efficiently by concentrating on the quality of the model input data. Despite the decreased precision of the national GHG inventory, the inclusion of uncertain sinks improves its accuracy due to the larger sectoral coverage of the inventory. If the national soil sink estimates were prepared by repeated soil sampling of model-stratified sample plots, the uncertainties would be accounted for in the stratum formation and sample allocation. Otherwise, the increases of sampling efficiency by stratification remain smaller. The highly variable and frequently opposite annual changes in ecosystem C pools imply the importance of full ecosystem C accounting. If forest C sink estimates are to be used in practice, average sink estimates seem a more reasonable basis than annual estimates. This is because annual forest sinks vary considerably and annual estimates are uncertain, which has severe consequences for the reliability of the total national GHG balance. The estimation of average sinks should still be based on annual or even more frequent data, due to the non-linear decomposition process that is influenced by the annual climate. The methodology used in this study to predict forest C sinks can be transferred to other countries with some modifications. The ultimate verification of sink estimates should be based on comparison with empirical data, in which case the model-based stratification presented in this study can serve to improve the efficiency of the sampling design.
Abstract:
There has been a recent spate of high profile infrastructure cost overruns in Australia and internationally. This is just the tip of a longer-term and more deeply-seated problem with initial budget estimating practice, well recognised in both academic research and industry reviews: the problem of uncertainty. A case study of the Sydney Opera House is used to identify and illustrate the key causal factors and system dynamics of cost overruns. It is conventionally the role of risk management to deal with such uncertainty, but the type and extent of the uncertainty involved in complex projects is shown to render established risk management techniques ineffective. This paper considers a radical advance on current budget estimating practice which involves a particular approach to statistical modelling complemented by explicit training in estimating practice. The statistical modelling approach combines the probability management techniques of Savage, which operate on actual distributions of values rather than flawed representations of distributions, and the data pooling technique of Skitmore, where the size of the reference set is optimised. Estimating training employs particular calibration development methods pioneered by Hubbard, which reduce the bias of experts caused by over-confidence and improve the consistency of subjective decision-making. A new framework for initial budget estimating practice is developed based on the combined statistical and training methods, with each technique being explained and discussed.
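One way to read "operating on actual distributions of values" is to keep each cost item as a vector of outcomes and do the arithmetic trial by trial, instead of combining single-number summaries. The sketch below illustrates that reading only; it is an assumption, not the method of Savage, Skitmore or Hubbard as implemented in the paper, and all item names and figures are hypothetical.

```python
# Hypothetical cost outcomes (in $m) for two budget items over the same five
# reference trials; the numbers are illustrative only.
item_a = [10, 12, 11, 15, 20]
item_b = [5, 9, 6, 5, 7]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Arithmetic on the stored distributions: add the items trial by trial.
total_per_trial = [a + b for a, b in zip(item_a, item_b)]

# Arithmetic on single-number summaries of each item.
sum_of_means = mean(item_a) + mean(item_b)
sum_of_worst_cases = max(item_a) + max(item_b)

print(total_per_trial)                           # [15, 21, 17, 20, 27]
print(mean(total_per_trial), sum_of_means)
print(max(total_per_trial), sum_of_worst_cases)  # 27 29
```

In this toy data the means add up, but the worst case of the total (27) is lower than the sum of the individual worst cases (29), because the two items do not peak in the same trial. Summaries alone cannot show this.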
Abstract:
Habitat fragmentation produces patches of suitable habitat surrounded by unfavourable matrix habitat. A species may persist in such a fragmented landscape in an equilibrium between the extinctions and recolonizations of local populations, thus forming a metapopulation. Migration between local populations is necessary for the long-term persistence of a metapopulation. The Glanville fritillary butterfly (Melitaea cinxia) forms a metapopulation in the Åland islands in Finland. There is migration between the populations, the extent of which is affected by several environmental factors and by variation in the phenotype of individual butterflies. Different allelic forms of the glycolytic enzyme phosphoglucose isomerase (Pgi) have been identified as a possible genetic factor influencing flight performance and migration rate in this species. The frequency of a certain Pgi allele, Pgi-f, follows the same pattern in relation to population age and connectivity as migration propensity. Furthermore, variation in flight metabolic performance, which is likely to affect migration propensity, has been linked to genetic variation in Pgi or a closely linked locus. The aim of this study was to investigate the association between Pgi genotype and migration propensity in the Glanville fritillary at both the individual and population levels using a statistical modelling approach. A mark-release-recapture (MRR) study was conducted in a habitat patch network of M. cinxia in Åland to collect data on the movements of individual butterflies. Larval samples from the study area were also collected for population-level examinations. Each butterfly and larva was genotyped at the Pgi locus. Two mathematical models of migration were parameterised with the MRR data: the Virtual Migration (VM) model and the spatially explicit diffusion model. The numbers of emigrants predicted by the VM model were compared with the observed numbers for populations with high and low frequencies of Pgi-f.
Posterior predictive data sets were simulated based on the parameters of the diffusion model. Lack of fit between the observed values and the model-predicted values of several movement descriptors was detected, and the effect of Pgi genotype on the deviations was assessed by randomizations including the genotype information. This study revealed a possible difference in the effect of Pgi genotype on migration propensity between the two sexes in the Glanville fritillary. Females with the Pgi-f allele and males without it moved more between habitat patches, which is probably related to differences in the function of flight in the two sexes. Females may use their high flight capacity to migrate between habitat patches to find suitable oviposition sites, whereas males may use it to acquire mates by keeping a territory and fighting off other intruding males, possibly causing them to emigrate. The results were consistent across different movement descriptors and at the individual and population levels. The effect of Pgi is likely to be dependent on the structure of the landscape and the prevailing environmental conditions.
Abstract:
The importance of long-range prediction of rainfall patterns for devising and planning agricultural strategies cannot be overemphasized. However, the prediction of rainfall patterns remains a difficult problem and the desired level of accuracy has not been reached. The conventional methods for prediction of rainfall use either dynamical or statistical modelling. In this article we report the results of a new modelling technique using artificial neural networks. Artificial neural networks are especially useful where the dynamical processes and their interrelations for a given phenomenon are not known with sufficient accuracy. Since conventional neural networks were found to be unsuitable for simulating and predicting rainfall patterns, a generalized structure of a neural network was explored and found to provide consistent prediction (hindcast) of all-India annual mean rainfall with good accuracy. Performance and consistency of this network are evaluated and compared with those of other (conventional) neural networks. It is shown that the generalized network can make consistently good predictions of annual mean rainfall. Immediate application and potential of such a prediction system are discussed.
Abstract:
In this work we chiefly deal with two broad classes of problems in computational materials science: determining the doping mechanism in a semiconductor and developing an extreme-condition equation of state. While solving certain aspects of these questions is well-trodden ground, both require extending the reach of existing methods to fully answer them. Here we choose to build upon the framework of density functional theory (DFT), which provides an efficient means to investigate a system from a quantum mechanical description.
Zinc phosphide (Zn3P2) could be the basis for cheap and highly efficient solar cells. Its use in this regard is limited by the difficulty of n-type doping the material. In an effort to understand the mechanism behind this, the energetics and electronic structure of intrinsic point defects in zinc phosphide are studied using generalized Kohn-Sham theory and the Heyd, Scuseria, and Ernzerhof (HSE) hybrid functional for exchange and correlation. A novel 'perturbation extrapolation' is utilized to extend the use of the computationally expensive HSE functional to this large-scale defect system. According to the calculations, the formation energies of charged phosphorus interstitial defects are very low in n-type Zn3P2, and these defects act as 'electron sinks', nullifying the desired doping and lowering the Fermi level back towards the p-type regime. Going forward, this insight provides clues to fabricating useful zinc phosphide based devices. In addition, the methodology developed for this work can be applied to further doping studies in other systems.
Accurate determination of high pressure and temperature equations of state is fundamental in a variety of fields. However, it is often very difficult to cover a wide range of temperatures and pressures in a laboratory setting. Here we develop methods to determine a multi-phase equation of state for Ta through computation. The typical means of investigating thermodynamic properties is via 'classical' molecular dynamics, where the atomic motion is calculated from Newtonian mechanics with the electronic effects abstracted away into an interatomic potential function. For our purposes, a 'first principles' approach such as DFT is useful, as a classical potential is typically valid for only a portion of the phase diagram (i.e. whatever part it has been fit to). Furthermore, at extremes of temperature and pressure, quantum effects become critical to accurately capturing an equation of state and are very hard to capture even in complex model potentials. This requires extending the inherently zero-temperature DFT to predict the finite-temperature response of the system. Statistical modelling and thermodynamic integration are used to extend our results over all phases, as well as phase-coexistence regions which are at the limits of typical DFT validity. We deliver the most comprehensive and accurate equation of state yet produced for Ta. This work also lends insights that can be applied to further equation of state work in many other materials.
Abstract:
158 p.
Abstract:
Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a model-based method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model's hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy.
Abstract:
This report studies when and why two Hidden Markov Models (HMMs) may represent the same stochastic process. HMMs are characterized in terms of equivalence classes whose elements represent identical stochastic processes. This characterization yields polynomial time algorithms to detect equivalent HMMs. We also find fast algorithms to reduce HMMs to essentially unique and minimal canonical representations. The reduction to a canonical form leads to the definition of 'Generalized Markov Models' which are essentially HMMs without the positivity constraint on their parameters. We discuss how this generalization can yield more parsimonious representations of stochastic processes at the cost of the probabilistic interpretation of the model parameters.
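The idea that two differently parameterised HMMs can represent the same stochastic process can be illustrated with a small brute-force check. This is a sketch under stated assumptions: it compares sequence probabilities only up to a fixed length and at exponential cost, so it is not the polynomial-time equivalence test the report describes. The function names and the three toy models are hypothetical.

```python
import math
from itertools import product

def sequence_probability(initial, transition, emission, observations):
    """Forward algorithm: probability that the HMM emits `observations`."""
    n = len(initial)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n)) * emission[j][obs]
            for j in range(n)
        ]
    return sum(alpha)

def agree_up_to(hmm_a, hmm_b, n_symbols, max_length):
    """Check that two HMMs assign equal probability to every observation
    sequence of length 1..max_length (brute force, exponential cost)."""
    for length in range(1, max_length + 1):
        for seq in product(range(n_symbols), repeat=length):
            p_a = sequence_probability(*hmm_a, seq)
            p_b = sequence_probability(*hmm_b, seq)
            if not math.isclose(p_a, p_b, abs_tol=1e-12):
                return False
    return True

# Each hypothetical HMM is (initial distribution, transition matrix, emission matrix).
one_state = ([1.0], [[1.0]], [[0.3, 0.7]])
two_state_same_emissions = (
    [0.5, 0.5],
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.3, 0.7], [0.3, 0.7]],
)
two_state_distinct_emissions = (
    [0.5, 0.5],
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.9, 0.1], [0.1, 0.9]],
)

print(agree_up_to(one_state, two_state_same_emissions, n_symbols=2, max_length=5))      # True
print(agree_up_to(one_state, two_state_distinct_emissions, n_symbols=2, max_length=5))  # False
```

The first pair agrees because both hidden states of the two-state model emit with the same probabilities, so the hidden dynamics never affect the output. Here the one-state model acts as a smaller representation of the same process.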