973 results for conditional models


Relevance:

30.00% 30.00%

Publisher:

Abstract:

We evaluate the performance of several specification tests for Markov regime-switching time-series models. We consider the Lagrange multiplier (LM) and dynamic specification tests of Hamilton (1996) and Ljung–Box tests based on both the generalized residual and a standard-normal residual constructed using the Rosenblatt transformation. The size and power of the tests are studied using Monte Carlo experiments. We find that the LM tests have the best size and power properties. The Ljung–Box tests exhibit slight size distortions, though tests based on the Rosenblatt transformation perform better than the generalized residual-based tests. The tests exhibit impressive power to detect both autocorrelation and autoregressive conditional heteroscedasticity (ARCH). The tests are illustrated with a Markov-switching generalized ARCH (GARCH) model fitted to the US dollar–British pound exchange rate, with the finding that both autocorrelation and GARCH effects are needed to adequately fit the data.
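As a rough illustration of the Ljung–Box statistic mentioned in this abstract, the sketch below computes Q = n(n+2) Σ ρ_k² / (n − k) for an arbitrary residual series. It is not the authors' procedure: the function name `ljung_box` and the toy series are hypothetical, and the construction of generalized or Rosenblatt-transformed residuals from a fitted Markov-switching model is not shown.

```python
def ljung_box(residuals, max_lag):
    """Ljung-Box Q statistic for autocorrelation up to max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    d = [r - mean for r in residuals]
    c0 = sum(v * v for v in d)  # n times the lag-0 autocovariance
    if c0 == 0:
        raise ValueError("series has zero variance")
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(d[t] * d[t - k] for t in range(k, n)) / c0
        q += rho_k * rho_k / (n - k)
    return n * (n + 2) * q


# Toy series (hypothetical): perfectly alternating residuals are
# strongly negatively autocorrelated at lag 1.
q1 = ljung_box([1.0, -1.0] * 50, max_lag=1)
print(round(q1, 2))  # 100.98
```

Under the null of no autocorrelation, Q is usually compared with a chi-square distribution with `max_lag` degrees of freedom (the 5% critical value for one degree of freedom is about 3.84), so the toy series above would be rejected.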

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We explore the empirical usefulness of conditional coskewness to explain the cross-section of equity returns. We find that coskewness is an important determinant of the returns to equity, and that the pricing relationship varies through time. In particular, we find that when the conditional market skewness is positive, investors are willing to sacrifice 7.87% annually per unit of gamma (a standardized measure of coskewness risk), while they only demand a premium of 1.80% when the market is negatively skewed. A similar picture emerges from the coskewness factor of Harvey and Siddique (Harvey, C., Siddique, A., 2000a. Conditional skewness in asset pricing tests. Journal of Finance 55, 1263–1295.) (a portfolio that is long stocks with small coskewness with the market and short high coskewness stocks), which earns 5.00% annually when the market is positively skewed but only 2.81% when the market is negatively skewed. The conditional two-moment CAPM and a conditional Fama and French (Fama, E., French, K., 1992. The cross-section of expected stock returns. Journal of Finance 47, 427–465.) three-factor model are rejected, but a model which includes coskewness is not rejected by the data. The model also passes a structural break test which many existing asset pricing models fail.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this thesis, the issue of incorporating uncertainty for environmental modelling informed by imagery is explored by considering uncertainty in deterministic modelling, measurement uncertainty and uncertainty in image composition. Incorporating uncertainty in deterministic modelling is extended for use with imagery using the Bayesian melding approach. In the application presented, slope steepness is shown to be the main contributor to total uncertainty in the Revised Universal Soil Loss Equation. A spatial sampling procedure is also proposed to assist in implementing Bayesian melding given the increased data size with models informed by imagery. Measurement error models are another approach to incorporating uncertainty when data is informed by imagery. These models for measurement uncertainty, considered in a Bayesian conditional independence framework, are applied to ecological data generated from imagery. The models are shown to be appropriate and useful in certain situations. Measurement uncertainty is also considered in the context of change detection when two images are not co-registered. An approach for detecting change in two successive images is proposed that is not affected by registration. The procedure uses the Kolmogorov-Smirnov test on homogeneous segments of an image to detect change, with the homogeneous segments determined using a Bayesian mixture model of pixel values. Using the mixture model to segment an image also allows for uncertainty in the composition of an image. This thesis concludes by comparing several different Bayesian image segmentation approaches that allow for uncertainty regarding the allocation of pixels to different ground components. Each segmentation approach is applied to a data set of chlorophyll values and shown to have different benefits and drawbacks depending on the aims of the analysis.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Plant biosecurity requires statistical tools to interpret field surveillance data in order to manage pest incursions that threaten crop production and trade. Ultimately, management decisions need to be based on the probability that an area is infested or free of a pest. Current informal approaches to delimiting pest extent rely upon expert ecological interpretation of presence / absence data over space and time. Hierarchical Bayesian models provide a cohesive statistical framework that can formally integrate the available information on both pest ecology and data. The overarching method involves constructing an observation model for the surveillance data, conditional on the hidden extent of the pest and uncertain detection sensitivity. The extent of the pest is then modelled as a dynamic invasion process that includes uncertainty in ecological parameters. Modelling approaches to assimilate this information are explored through case studies on spiralling whitefly, Aleurodicus dispersus and red banded mango caterpillar, Deanolis sublimbalis. Markov chain Monte Carlo simulation is used to estimate the probable extent of pests, given the observation and process model conditioned by surveillance data. Statistical methods, based on time-to-event models, are developed to apply hierarchical Bayesian models to early detection programs and to demonstrate area freedom from pests. The value of early detection surveillance programs is demonstrated through an application to interpret surveillance data for exotic plant pests with uncertain spread rates. The model suggests that typical early detection programs provide a moderate reduction in the probability of an area being infested but a dramatic reduction in the expected area of incursions at a given time. Estimates of spiralling whitefly extent are examined at local, district and state-wide scales. 
The local model estimates the rate of natural spread and the influence of host architecture, host suitability and inspector efficiency. These parameter estimates can support the development of robust surveillance programs. Hierarchical Bayesian models for the human-mediated spread of spiralling whitefly are developed for the colonisation of discrete cells connected by a modified gravity model. By estimating dispersal parameters, the model can be used to predict the extent of the pest over time. An extended model predicts the climate restricted distribution of the pest in Queensland. These novel human-mediated movement models are well suited to demonstrating area freedom at coarse spatio-temporal scales. At finer scales, and in the presence of ecological complexity, exploratory models are developed to investigate the capacity for surveillance information to estimate the extent of red banded mango caterpillar. It is apparent that excessive uncertainty about observation and ecological parameters can impose limits on inference at the scales required for effective management of response programs. The thesis contributes novel statistical approaches to estimating the extent of pests and develops applications to assist decision-making across a range of plant biosecurity surveillance activities. Hierarchical Bayesian modelling is demonstrated as both a useful analytical tool for estimating pest extent and a natural investigative paradigm for developing and focussing biosecurity programs.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Despite considerable success in the treatment of early-stage localized prostate cancer (PC), the acute inadequacy of late-stage PC treatment and the inherent heterogeneity of PC pose a formidable challenge. Clearly, an improved understanding of PC genesis and progression, along with the development of new targeted therapies, is warranted. Animal models, especially transgenic immunocompetent mouse models, have proven to be the best ally in this respect. A series of models have been developed by modulating the expression of genes implicated in cancer genesis and progression; mainly, modulation of the expression of oncogenes, steroid hormone receptors, growth factors and their receptors, cell cycle and apoptosis regulators, and tumor suppressor genes has been used. Such models have contributed significantly to our understanding of the molecular and pathological aspects of PC initiation and progression. In particular, the transgenic mouse models based on multiple genetic alterations can more accurately address the inherent complexity of PC, not only in revealing the mechanisms of tumorigenesis and progression but also for clinically relevant evaluation of new therapies. Further, with advances in conditional knockout technologies, otherwise embryonically lethal gene changes can be incorporated, leading to the development of new-generation transgenics and thus adding significantly to our existing knowledge base. Different models and their relevance to PC research are discussed.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
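The core exponentiated gradient update described in this abstract can be sketched on a toy problem. The following is only an illustration of the generic multiplicative update on the probability simplex, not the paper's dual formulation for log-linear or max-margin models; the objective, the names `eg_step` and `minimise_on_simplex`, the step size and the iteration count are all assumptions made for the example.

```python
import math


def eg_step(w, grad, eta):
    """One exponentiated-gradient step: multiplicative update, then renormalise."""
    unnorm = [wi * math.exp(-eta * gi) for wi, gi in zip(w, grad)]
    z = sum(unnorm)
    return [u / z for u in unnorm]  # stays on the simplex by construction


def minimise_on_simplex(grad_fn, dim, eta=0.5, steps=500):
    """Batch EG starting from the uniform distribution."""
    w = [1.0 / dim] * dim
    for _ in range(steps):
        w = eg_step(w, grad_fn(w), eta)
    return w


# Toy convex objective (hypothetical): f(w) = 0.5 * ||w - target||^2,
# whose minimiser over the simplex is `target` itself.
target = [0.7, 0.2, 0.1]
grad_fn = lambda w: [wi - ti for wi, ti in zip(w, target)]
w = minimise_on_simplex(grad_fn, dim=3)
print([round(wi, 3) for wi in w])
```

Because the update multiplies and renormalises, the iterates remain positive and sum to one without any explicit projection, which is what makes EG a natural fit for simplex-constrained problems such as the duals described above.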

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The measurement error model is a well established statistical method for regression problems in medical sciences, although rarely used in ecological studies. While the situations in which it is appropriate may be less common in ecology, there are instances in which there may be benefits in its use for prediction and estimation of parameters of interest. We have chosen to explore this topic using a conditional independence model in a Bayesian framework using a Gibbs sampler, as this gives a great deal of flexibility, allowing us to analyse a number of different models without losing generality. Using simulations and two examples, we show how the conditional independence model can be used in ecology, and when it is appropriate.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The research objectives of this thesis were to contribute to Bayesian statistical methodology, specifically to risk assessment methodology and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations.
This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead, CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals.
These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term-by-term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days' data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The Australian e-Health Research Centre (AEHRC) recently participated in the ShARe/CLEF eHealth Evaluation Lab Task 1. The goal of this task is to individuate mentions of disorders in free-text electronic health records and map disorders to SNOMED CT concepts in the UMLS metathesaurus. This paper details our participation in this ShARe/CLEF task. Our approaches are based on using the clinical natural language processing tool Metamap and Conditional Random Fields (CRF) to individuate mentions of disorders and then to map those to SNOMED CT concepts. Empirical results obtained on the 2013 ShARe/CLEF task highlight that our instance of Metamap (after filtering irrelevant semantic types), although achieving a high level of precision, is only able to identify a small proportion of disorders (about 21% to 28%) from free-text health records. On the other hand, the addition of the CRF models allows for a much higher recall (57% to 79%) of disorders from free text, without appreciable detriment to precision. When evaluating the accuracy of the mapping of disorders to SNOMED CT concepts in the UMLS, we observe that the mapping obtained by our filtered instance of Metamap delivers state-of-the-art effectiveness if only spans individuated by our system are considered ('relaxed' accuracy).

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A spatial process observed over a lattice or a set of irregular regions is usually modeled using a conditionally autoregressive (CAR) model. The neighborhoods within a CAR model are generally formed deterministically using the inter-distances or boundaries between the regions. An extension of the CAR model is proposed in this article, where the selection of the neighborhood depends on unknown parameter(s). This extension is called a Stochastic Neighborhood CAR (SNCAR) model. The resulting model shows flexibility in accurately estimating covariance structures for data generated from a variety of spatial covariance models. Specific examples are illustrated using data generated from some common spatial covariance functions as well as real data concerning radioactive contamination of the soil in Switzerland after the Chernobyl accident.
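For readers unfamiliar with the standard CAR construction that this abstract extends, the sketch below builds the precision matrix Q = τ(D − ρW) of a proper CAR model from deterministic edge-sharing neighbourhoods on a small regular lattice. It does not implement the SNCAR extension; the function names, the 3×3 lattice and the values of `rho` and `tau` are illustrative assumptions.

```python
def rook_neighbours(n_rows, n_cols):
    """Deterministic neighbourhoods: cells that share an edge on a regular lattice."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nbrs


def car_precision(nbrs, rho, tau):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model."""
    n = len(nbrs)
    q = [[0.0] * n for _ in range(n)]
    for i, js in nbrs.items():
        q[i][i] = tau * len(js)      # D: number of neighbours on the diagonal
        for j in js:
            q[i][j] = -tau * rho     # W: binary adjacency off the diagonal
    return q


def conditional_mean(i, x, nbrs, rho):
    """E[x_i | all other x] = rho times the average of the neighbouring values."""
    return rho * sum(x[j] for j in nbrs[i]) / len(nbrs[i])


nbrs = rook_neighbours(3, 3)
Q = car_precision(nbrs, rho=0.9, tau=2.0)
print(Q[4][4], nbrs[4])  # centre cell: 8.0 [1, 7, 3, 5]
```

In this fixed-neighbourhood form, `nbrs` is decided once from the geometry; the extension described in the abstract instead lets the neighbourhood selection depend on unknown parameters.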

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A discrete agent-based model on a periodic lattice of arbitrary dimension is considered. Agents move to nearest-neighbor sites by a motility mechanism accounting for general interactions, which may include volume exclusion. The partial differential equation describing the average occupancy of the agent population is derived systematically. A diffusion equation arises for all types of interactions and is nonlinear except for the simplest interactions. In addition, multiple species of interacting subpopulations give rise to an advection-diffusion equation for each subpopulation. This work extends and generalizes previous specific results, providing a construction method for determining the transport coefficients in terms of a single conditional transition probability, which depends on the occupancy of sites in an influence region. These coefficients characterize the diffusion of agents in a crowded environment in biological and physical processes.
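The simplest interaction covered by this kind of model, pure volume exclusion, can be sketched as a random walk on a one-dimensional periodic lattice in which a move is aborted when the target site is occupied. This is a minimal illustration only: the function name `exclusion_step`, the lattice size, the initial condition and the random-sequential update rule are assumptions, and the general interactions and multiple species discussed in the abstract are not modelled.

```python
import random


def exclusion_step(occupied, rng):
    """One sweep: N randomly chosen agents each attempt a nearest-neighbour move."""
    n_sites = len(occupied)
    agents = [i for i, occ in enumerate(occupied) if occ]
    for _ in range(len(agents)):
        k = rng.randrange(len(agents))
        site = agents[k]
        target = (site + rng.choice((-1, 1))) % n_sites  # periodic boundary
        if not occupied[target]:  # volume exclusion: abort if the target is occupied
            occupied[site] = False
            occupied[target] = True
            agents[k] = target
    return occupied


rng = random.Random(0)
lattice = [i < 10 for i in range(50)]  # 10 agents packed at the left of 50 sites
for _ in range(100):
    exclusion_step(lattice, rng)
print(sum(lattice))  # agent number is conserved: 10
```

Averaging the occupancy of each site over many independent realisations gives the quantity whose continuum limit the abstract describes by a diffusion equation.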

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This thesis has contributed to the advancement of knowledge in disease modelling by addressing interesting and crucial issues relevant to modelling health data over space and time. The research has led to an increased understanding of spatial scales, temporal scales, and spatial smoothing for modelling diseases, in terms of their methodology and applications. This research is of particular significance to researchers seeking to employ statistical modelling techniques over space and time in various disciplines. A broad class of statistical models is employed to assess what impact spatial and temporal scales have on simulated and real data.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, the effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in terms of stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation in performance when the training data or the parameters vary slightly, is a major issue for machine learning models, but even more so in the active learning framework, which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological features, syntactic features, or semantic features) and c) the Gaussian prior variance as one of the important CRF parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographical, morphological and contextual features as a group of basic features play an important role in learning effective models across all iterations.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Background Multilevel and spatial models are being increasingly used to obtain substantive information on area-level inequalities in cancer survival. Multilevel models assume independent geographical areas, whereas spatial models explicitly incorporate geographical correlation, often via a conditional autoregressive prior. However the relative merits of these methods for large population-based studies have not been explored. Using a case-study approach, we report on the implications of using multilevel and spatial survival models to study geographical inequalities in all-cause survival. Methods Multilevel discrete-time and Bayesian spatial survival models were used to study geographical inequalities in all-cause survival for a population-based colorectal cancer cohort of 22,727 cases aged 20–84 years diagnosed during 1997–2007 from Queensland, Australia. Results Both approaches were viable on this large dataset, and produced similar estimates of the fixed effects. After adding area-level covariates, the between-area variability in survival using multilevel discrete-time models was no longer significant. Spatial inequalities in survival were also markedly reduced after adjusting for aggregated area-level covariates. Only the multilevel approach however, provided an estimation of the contribution of geographical variation to the total variation in survival between individual patients. Conclusions With little difference observed between the two approaches in the estimation of fixed effects, multilevel models should be favored if there is a clear hierarchical data structure and measuring the independent impact of individual- and area-level effects on survival differences is of primary interest. Bayesian spatial analyses may be preferred if spatial correlation between areas is important and if the priority is to assess small-area variations in survival and map spatial patterns. 
Both approaches can be readily fitted to geographically enabled survival data from international settings.