970 results for information criteria


Relevance: 100.00%

Publisher:

Abstract:

1. Ecological data sets often use clustered measurements or use repeated sampling in a longitudinal design. Choosing the correct covariance structure is an important step in the analysis of such data, as the covariance describes the degree of similarity among the repeated observations. 2. Three methods for choosing the covariance are: the Akaike information criterion (AIC), the quasi-information criterion (QIC), and the deviance information criterion (DIC). We compared the methods using a simulation study and using a data set that explored effects of forest fragmentation on avian species richness over 15 years. 3. The overall success was 80.6% for the AIC, 29.4% for the QIC and 81.6% for the DIC. For the forest fragmentation study the AIC and DIC selected the unstructured covariance, whereas the QIC selected the simpler autoregressive covariance. Graphical diagnostics suggested that the unstructured covariance was probably correct. 4. We recommend using DIC for selecting the correct covariance structure.
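As a rough illustration of how an AIC comparison between covariance structures works, the sketch below counts the free covariance parameters of a few common structures and picks the structure with the lowest AIC. The function names, the candidate set, and the log-likelihood values are hypothetical and are not taken from the study; it also assumes a full-likelihood fit in which mean and covariance parameters are both counted.

```python
def n_cov_params(structure, t):
    """Number of free covariance parameters for t repeated measurements."""
    if structure == "independence":
        return 1                    # one common variance
    if structure in ("compound_symmetry", "ar1"):
        return 2                    # one variance plus one correlation
    if structure == "unstructured":
        return t * (t + 1) // 2     # every variance and covariance is free
    raise ValueError(f"unknown structure: {structure}")


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def select_by_aic(logliks, t, n_mean_params):
    """Return the structure with the smallest AIC and all AIC scores."""
    scores = {
        s: aic(ll, n_mean_params + n_cov_params(s, t))
        for s, ll in logliks.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical maximised log-likelihoods for three fitted models.
example_logliks = {"independence": -520.0, "ar1": -480.0, "unstructured": -470.0}
best, scores = select_by_aic(example_logliks, t=5, n_mean_params=3)
print(best, scores)
```

With these made-up numbers the unstructured model fits best but pays for 15 covariance parameters, so the autoregressive structure has the lowest AIC.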

Relevance: 100.00%

Publisher:

Abstract:

The problem of model selection for a univariate long memory time series is investigated once a semiparametric estimator for the long memory parameter has been used. Standard information criteria are not consistent in this case. A Modified Information Criterion (MIC) that overcomes these difficulties is introduced, and proofs of its asymptotic validity are provided. The results are general and cover a wide range of short memory processes. Simulation evidence compares the new and existing methodologies, and empirical applications to monthly inflation and daily realized volatility are presented.

Relevance: 100.00%

Publisher:

Abstract:

We consider the finite sample properties of model selection by information criteria in conditionally heteroscedastic models. Recent theoretical results show that certain popular criteria are consistent in that they will select the true model asymptotically with probability 1. To examine the empirical relevance of this property, Monte Carlo simulations are conducted for a set of non-nested data generating processes (DGPs), with the set of candidate models consisting of all types of model used as DGPs. In addition, not only is the best model considered but also those with similar values of the information criterion, called close competitors, thus forming a portfolio of eligible models. To supplement the simulations, the criteria are applied to a set of economic and financial series. In the simulations, the criteria are largely ineffective at identifying the correct model, either as best or as a close competitor, the parsimonious GARCH(1, 1) model being preferred for most DGPs. In contrast, asymmetric models are generally selected to represent actual data. This leads to the conjecture that the properties of parameterizations of processes commonly used to model heteroscedastic data are more similar than may be imagined and that more attention needs to be paid to the behaviour of the standardized disturbances of such models, both in simulation exercises and in empirical modelling.
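For readers unfamiliar with the GARCH(1, 1) benchmark mentioned above, the following minimal sketch computes its conditional variance path with the standard recursion sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]. The function name, the parameter values, and the choice to start at the unconditional variance are illustrative assumptions, not details taken from the paper.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1, 1) model for a series of returns.

    The recursion is started at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    var = omega / (1.0 - alpha - beta)
    variances = []
    for r in returns:
        variances.append(var)                      # variance in force at time t
        var = omega + alpha * r * r + beta * var   # update for time t + 1
    return variances


# Hypothetical returns: a large shock at the third observation.
path = garch11_variances([0.0, 0.0, 3.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
print(path)
```

The shock raises the next period's variance and then decays at rate beta, which is the volatility clustering that the richer asymmetric models in the candidate set extend.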

Relevance: 100.00%

Publisher:

Abstract:

This paper uses appropriately modified information criteria to select models from the GARCH family, which are subsequently used for predicting US dollar exchange rate return volatility. The out-of-sample forecast accuracy of models chosen in this manner compares favourably on mean absolute error grounds, although less favourably on mean squared error grounds, with those generated by the commonly used GARCH(1, 1) model. An examination of the orders of models selected by the criteria reveals that (1, 1) models are typically selected less than 20% of the time.
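The two accuracy measures can rank forecasts differently because squaring penalises occasional large misses more heavily. The toy sketch below uses entirely hypothetical numbers, not the paper's data, to show one forecast winning on mean absolute error while losing on mean squared error.

```python
def mean_absolute_error(actual, forecast):
    """Average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_squared_error(actual, forecast):
    """Average of (actual - forecast) squared."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical realised values and two competing forecasts.
actual = [0.0, 0.0, 0.0, 0.0]
forecast_a = [1.0, 1.0, 1.0, 1.0]   # consistently off by 1
forecast_b = [0.0, 0.0, 0.0, 3.0]   # usually exact, one large miss

mae_a = mean_absolute_error(actual, forecast_a)
mae_b = mean_absolute_error(actual, forecast_b)
mse_a = mean_squared_error(actual, forecast_a)
mse_b = mean_squared_error(actual, forecast_b)
print(mae_a, mae_b, mse_a, mse_b)
```

Here forecast_b has the lower mean absolute error and the higher mean squared error, the same pattern of disagreement between the two measures that the abstract reports.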

Relevance: 100.00%

Publisher:

Abstract:

As is well known, when using an information criterion to select the number of common factors in factor models, the appropriate penalty is generally indeterminate in the sense that it can be scaled by an arbitrary constant, c say, without affecting consistency. In an influential paper, Hallin and Liška (J Am Stat Assoc 102:603–617, 2007) propose a data-driven procedure for selecting the appropriate value of c. However, by removing one source of indeterminacy, the new procedure simultaneously creates several new ones, which make for rather complicated implementation, a problem that has been largely overlooked in the literature. By providing an extensive analysis using both simulated and real data, the current paper fills this gap.

Relevance: 100.00%

Publisher:

Abstract:

In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion (AIC) have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is no longer an asymptotically unbiased estimator of the Akaike information, and in fact favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that leads to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.

Relevance: 70.00%

Publisher:

Abstract:

BACKGROUND: Whilst multimorbidity is more prevalent with increasing age, approximately 30% of middle-aged adults (45-64 years) are also affected. Several prescribing criteria have been developed to optimise medication use in older people (≥65 years) with little focus on potentially inappropriate prescribing (PIP) in middle-aged adults. We have developed a set of explicit prescribing criteria called PROMPT (PRescribing Optimally in Middle-aged People's Treatments) which may be applied to prescribing datasets to determine the prevalence of PIP in this age-group.

METHODS: A literature search was conducted to identify published prescribing criteria for all age groups, with the Project Steering Group (convened for this study) adding further criteria for consideration, all of which were reviewed for relevance to middle-aged adults. These criteria underwent a two-round Delphi process, using an expert panel consisting of general practitioners, pharmacists and clinical pharmacologists from the United Kingdom and Republic of Ireland. Using web-based questionnaires, 17 panellists were asked to indicate their level of agreement with each criterion via a 5-point Likert scale (1 = Strongly Disagree, 5 = Strongly Agree) to assess the applicability to middle-aged adults in the absence of clinical information. Criteria were accepted/rejected/revised dependent on the panel's level of agreement using the median response/interquartile range and additional comments.

RESULTS: Thirty-four criteria were rated in the first round of this exercise and consensus was achieved on 17 criteria which were accepted into the PROMPT criteria. Consensus was not reached on the remaining 17, and six criteria were removed following a review of the additional comments. The second round of this exercise focused on the remaining 11 criteria, some of which were revised following the first exercise. Five criteria were accepted from the second round, providing a final list of 22 criteria [gastro-intestinal system (n = 3), cardiovascular system (n = 4), respiratory system (n = 4), central nervous system (n = 6), infections (n = 1), endocrine system (n = 1), musculoskeletal system (n = 2), duplicates (n = 1)].

CONCLUSIONS: PROMPT is the first set of prescribing criteria developed for use in middle-aged adults. The utility of these criteria will be tested in future studies using prescribing datasets.
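A consensus rule based on the median and interquartile range of Likert ratings could look like the sketch below. The abstract does not state the thresholds used, so the cut-offs here (median of 4 or more to accept, 2 or less to reject, interquartile range of at most 1 for agreement) and the function name are purely hypothetical; the study also drew on panellists' written comments, which this sketch ignores.

```python
import statistics


def classify_criterion(ratings, accept_median=4, reject_median=2, max_iqr=1):
    """Classify one criterion from 5-point Likert ratings.

    Thresholds are illustrative assumptions, not those of the PROMPT study.
    """
    median = statistics.median(ratings)
    q1, _, q3 = statistics.quantiles(ratings, n=4)  # quartile cut points
    agreed = (q3 - q1) <= max_iqr                   # panel broadly agrees
    if agreed and median >= accept_median:
        return "accept"
    if agreed and median <= reject_median:
        return "reject"
    return "revise"


# Hypothetical ratings from a 17-member panel.
strong_support = [4] * 10 + [5] * 7
split_panel = [1] * 6 + [3] * 5 + [5] * 6
strong_opposition = [1] * 10 + [2] * 7

print(classify_criterion(strong_support))
print(classify_criterion(split_panel))
print(classify_criterion(strong_opposition))
```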

Relevance: 60.00%

Publisher:

Abstract:

Modelling of interferometric signals related to tear film surface quality is considered. In the context of tear film surface quality estimation in normal healthy eyes, two clinical parameters are of interest: the build-up time and the average interblink surface quality. The former is closely related to the signal derivative, while the latter is related to the signal itself. Polynomial signal models, chosen for a particular set of noisy interferometric measurements, can be optimally selected, in some sense, with a range of information criteria such as AIC, MDL, Cp, and CME. Those criteria, however, do not always guarantee that the true derivative of the signal is accurately represented, and they often overestimate it. Here, a practical method for judicious selection of model order in a polynomial fitting to a signal is proposed so that the derivative of the signal is adequately represented. The paper highlights the importance of context-based signal modelling in model order selection.
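The conventional baseline that the paper argues against, choosing a polynomial order by AIC alone, can be sketched as follows. This is not the derivative-aware method the paper proposes. It assumes independent Gaussian noise, uses the form n * log(RSS / n) + 2k with additive constants dropped, and runs on a synthetic cubic signal; all names and values are illustrative.

```python
import math
import random


def polyfit_ls(x, y, order):
    """Least-squares polynomial coefficients (lowest degree first)."""
    m = order + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y x^i.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate a polynomial given coefficients, lowest degree first."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def gaussian_aic(x, y, order):
    """AIC of a polynomial fit under i.i.d. Gaussian noise."""
    coeffs = polyfit_ls(x, y, order)
    rss = sum((yi - polyval(coeffs, xi)) ** 2 for xi, yi in zip(x, y))
    n = len(x)
    k = order + 2  # polynomial coefficients plus the noise variance
    return n * math.log(rss / n) + 2 * k


def select_order(x, y, max_order):
    """Polynomial order with the smallest AIC among 0..max_order."""
    return min(range(max_order + 1), key=lambda p: gaussian_aic(x, y, p))


# Synthetic noisy cubic on [-1, 1] (hypothetical test signal).
rng = random.Random(0)
xs = [i / 25 - 1 for i in range(51)]
ys = [1 - 2 * xi + 3 * xi ** 3 + rng.gauss(0, 0.05) for xi in xs]
best_order = select_order(xs, ys, max_order=6)
print(best_order)
```

A criterion of this kind scores only how well the signal itself is fitted, which is why, as the abstract notes, it gives no guarantee about the derivative of the selected polynomial.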

Relevance: 60.00%

Publisher:

Abstract:

The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. 
Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criterion (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken.
By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
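The zero-inflated Poisson (ZIP) component mentioned in this abstract is a two-part mixture: with probability pi an observation is a structural zero, otherwise it follows an ordinary Poisson distribution. The sketch below gives only the standard textbook probability mass function, with illustrative names and parameter values; the thesis's spatial CAR and shared-component structure is not represented.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of count k.

    pi is the probability of a structural zero; lam is the Poisson mean
    of the non-inflated component.
    """
    if k < 0 or lam <= 0 or not 0.0 <= pi <= 1.0:
        raise ValueError("need k >= 0, lam > 0 and 0 <= pi <= 1")
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


# Hypothetical parameters: 30% structural zeros, Poisson mean of 2.
p_zero_zip = zip_pmf(0, lam=2.0, pi=0.3)
p_zero_poisson = poisson_pmf(0, lam=2.0)
print(p_zero_zip, p_zero_poisson)
```

The ZIP model assigns more probability to zero than a Poisson model with the same lam, which is how it accommodates the excess zeros seen in sparse outcomes.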

Relevance: 60.00%

Publisher:

Abstract:

Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts, that is, variation over and above that accounted for by the Poisson density. The extra-variation, or dispersion, is theorized to capture unaccounted-for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models, which is tantamount to assuming that unaccounted-for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord by exploring additional dispersion functions and using an independent data set, which presents an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov Chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in the mean structure, while the remainder included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criterion (DIC) statistics.
The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences on expected crash counts are likely to be different from the factors that might help to explain unaccounted-for variation in crashes across sites.

Relevance: 60.00%

Publisher:

Abstract:

Genetic research into complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, paved the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we first investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus turns to the methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extend the model to detect interaction effects in population-based case-control studies. In this part of the study, we also develop a machine learning algorithm to cope with large-scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.

Relevance: 60.00%

Publisher:

Abstract:

Spatial data analysis has become increasingly important in studies of ecology and economics during the last decade. One focus of spatial data analysis is how to select predictors, variance functions and correlation functions. However, in general, the true covariance function is unknown and the working covariance structure is often misspecified. In this paper, our aim is to find a good strategy for identifying the best model from the candidate set using model selection criteria. We evaluate the ability of some information criteria (corrected Akaike information criterion, Bayesian information criterion (BIC) and residual information criterion (RIC)) to choose the optimal model when the working correlation function, the working variance function and the working mean function are correct or misspecified. Simulations are carried out for small to moderate sample sizes. Four candidate covariance functions (exponential, Gaussian, Matérn and rational quadratic) are used in the simulation studies. Summarizing the simulation results, we find that a misspecified working correlation structure can still capture some spatial correlation information in model fitting. When the sample size is large enough, BIC and RIC perform well even if the working covariance is misspecified. Moreover, the performance of these information criteria is related to the average level of model fit, which can be indicated by the average adjusted R-squared, and overall RIC performs well.
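Two of the criteria named in this abstract differ from plain AIC only in their penalty terms, as the sketch below shows using the standard textbook forms of the corrected AIC and the BIC. The log-likelihood, parameter count, and sample size are hypothetical. The residual information criterion is left out because its definition is not given in the abstract.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical fit: log-likelihood -100 with 3 parameters on 50 observations.
example = {
    "AIC": aic(-100.0, 3),
    "AICc": aicc(-100.0, 3, 50),
    "BIC": bic(-100.0, 3, 50),
}
print(example)
```

The AICc correction vanishes as n grows, while the BIC penalty per parameter, log n, exceeds AIC's constant 2 once n is 8 or more, so BIC favours smaller models in large samples.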

Relevance: 60.00%

Publisher:

Abstract:

This paper is concerned with the study of the equilibrium exchange of ammonium ions with two natural zeolite samples sourced in Australia from Castle Mountain Zeolites and Zeolite Australia. A range of sorption models including Langmuir Vageler, Competitive Langmuir, Freundlich, Temkin, Dubinin Astakhov and Brouers–Sotolongo were applied in order to gain insight into the exchange process. In contrast to most previous studies, non-linear regression was used in all instances to determine the best fit of the experimental data. Castle Mountain natural zeolite was found to exhibit higher ammonium capacity than Zeolite Australia material when in the freshly received state, and this behavior was related to the greater amount of sodium ions present relative to calcium ions on the zeolite exchange sites. The zeolite capacity for ammonium ions was also found to be dependent on the solution normality, with a 35–60% increase in uptake noted when increasing the ammonium concentration from 250 to 1000 mg/L. The optimal fit of the equilibrium data was achieved by the Freundlich expression, as confirmed by use of Akaike's Information Criterion. It was emphasized that the bottle-point method chosen influenced the isotherm profile in several ways, and could lead to misleading interpretation of experiments, especially if the constant zeolite mass approach was followed. Pre-treatment of natural zeolite with acid and subsequently sodium hydroxide promoted the uptake of ammonium species by at least 90%. This paper highlighted the factors which should be taken into account when investigating ammonium ion exchange with natural zeolites.

Relevance: 60.00%

Publisher:

Abstract:

The interdependence of the Greek and other European stock markets and the subsequent portfolio implications are examined in the wavelet and variational mode decomposition domains. In applying the decomposition techniques, we analyze the structural properties of the data and distinguish between short- and long-term dynamics of stock market returns. First, GARCH-type models are fitted to obtain the standardized residuals. Next, different copula functions are evaluated, and based on the conventional information criteria and time-varying parameter, the Joe-Clayton copula is chosen to model the tail dependence between the stock markets. The short-run lower tail dependence time paths show a sudden increase in comovement during the global financial crises. The results of the long-run dependence suggest that European stock markets have higher interdependence with the Greek stock market. Individual countries' Value at Risk (VaR) separates the countries into two distinct groups. Finally, the two-asset portfolio VaR measures indicate potential markets for diversifying Greek stock market investments.