912 results for Modified Information Criteria
Abstract:
The problem of model selection for a univariate long memory time series is investigated once a semiparametric estimator for the long memory parameter has been used. Standard information criteria are not consistent in this case. A Modified Information Criterion (MIC) that overcomes these difficulties is introduced and proofs that show its asymptotic validity are provided. The results are general and cover a wide range of short memory processes. Simulation evidence compares the new and existing methodologies and empirical applications in monthly inflation and daily realized volatility are presented.
Abstract:
This paper uses appropriately modified information criteria to select models from the GARCH family, which are subsequently used for predicting US dollar exchange rate return volatility. The out-of-sample forecast accuracy of models chosen in this manner compares favourably on mean absolute error grounds, although less favourably on mean squared error grounds, with those generated by the commonly used GARCH(1, 1) model. An examination of the orders of models selected by the criteria reveals that (1, 1) models are typically selected less than 20% of the time.
Abstract:
1. Ecological data sets often use clustered measurements or use repeated sampling in a longitudinal design. Choosing the correct covariance structure is an important step in the analysis of such data, as the covariance describes the degree of similarity among the repeated observations. 2. Three methods for choosing the covariance are: the Akaike information criterion (AIC), the quasi-information criterion (QIC), and the deviance information criterion (DIC). We compared the methods using a simulation study and using a data set that explored effects of forest fragmentation on avian species richness over 15 years. 3. The overall success was 80.6% for the AIC, 29.4% for the QIC and 81.6% for the DIC. For the forest fragmentation study the AIC and DIC selected the unstructured covariance, whereas the QIC selected the simpler autoregressive covariance. Graphical diagnostics suggested that the unstructured covariance was probably correct. 4. We recommend using DIC for selecting the correct covariance structure.
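As a minimal sketch of the AIC comparison described in this abstract, the Python snippet below scores candidate covariance structures by AIC = 2k - 2 ln L and keeps the smallest value. The log-likelihoods are hypothetical placeholders, not results from the study, and the parameter counts assume t = 4 repeated measures with identical fixed effects in every candidate, so only covariance parameters are counted. The function names are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def covariance_param_count(structure, t):
    """Number of free covariance parameters for t repeated measures."""
    counts = {
        "unstructured": t * (t + 1) // 2,  # every variance and covariance is free
        "ar1": 2,                          # one variance, one autocorrelation
        "compound_symmetry": 2,            # one variance, one common covariance
    }
    return counts[structure]


def select_by_aic(log_likelihoods, t):
    """Return the structure with the smallest AIC and all AIC scores."""
    scores = {
        name: aic(ll, covariance_param_count(name, t))
        for name, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical maximized log-likelihoods (illustrative only, not study data)
example_loglik = {"unstructured": -480.0, "ar1": -495.0, "compound_symmetry": -500.0}
best, scores = select_by_aic(example_loglik, t=4)
print(best, scores)
```

With these made-up numbers the unstructured covariance wins because its likelihood gain outweighs the penalty for its extra parameters; a smaller gain would tip the choice toward the simpler structures.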
Abstract:
HIV risk in vulnerable groups such as itinerant male street labourers is often examined via a focus on individual determinants. This study provides a test of a modified Information-Motivation-Behavioral Skills (IMB) model to predict condom use behaviour among male street workers in urban Vietnam. In a cross-sectional survey using a social mapping technique, 450 male street labourers from 13 districts of Hanoi, Vietnam were recruited and interviewed. Collected data were first examined for completeness; structural equation modelling was then employed to test the model fit. Condoms were used inconsistently by many of these men, and usage varied in relation to a number of factors. A modified IMB model had a better fit than the original IMB model in predicting condom use behaviour. This modified model accounted for 49% of the variance, versus 10% by the original version. In the modified model, the influence of psychosocial factors was moderately high, whilst the influence of HIV prevention information, motivation and perceived behavioural skills was moderately low, explaining in part the limited level of condom use behaviour. This study provides insights into social factors that should be taken into account in public health planning to promote safer sexual behaviour among Asian male street labourers.
Abstract:
We consider the finite sample properties of model selection by information criteria in conditionally heteroscedastic models. Recent theoretical results show that certain popular criteria are consistent in that they will select the true model asymptotically with probability 1. To examine the empirical relevance of this property, Monte Carlo simulations are conducted for a set of non–nested data generating processes (DGPs) with the set of candidate models consisting of all types of model used as DGPs. In addition, not only is the best model considered but also those with similar values of the information criterion, called close competitors, thus forming a portfolio of eligible models. To supplement the simulations, the criteria are applied to a set of economic and financial series. In the simulations, the criteria are largely ineffective at identifying the correct model, either as best or a close competitor, the parsimonious GARCH(1, 1) model being preferred for most DGPs. In contrast, asymmetric models are generally selected to represent actual data. This leads to the conjecture that the properties of parameterizations of processes commonly used to model heteroscedastic data are more similar than may be imagined and that more attention needs to be paid to the behaviour of the standardized disturbances of such models, both in simulation exercises and in empirical modelling.
Abstract:
In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion (AIC) have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is no longer an asymptotically unbiased estimator of the Akaike information, and in fact favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that leads to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.
Abstract:
Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, which paved the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we first investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus turns to methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extend the model to detect interaction effects in population-based case-control studies. In this part of the study, we also develop a machine learning algorithm to cope with large-scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.
Abstract:
In two papers [Proc. SPIE 4471, 272-280 (2001) and Appl. Opt. 43, 2709-2721 (2004)], a logarithmic phase mask was proposed and proved to be effective in extending the depth of field; however, according to our research, this mask is not ideal because the corresponding defocused modulation transfer function has large oscillations in the low-frequency region, even when the mask is optimized. So, in a previously published paper [Opt. Lett. 33, 1171-1173 (2008)], we proposed an improved logarithmic phase mask by making a small modification. The new mask can not only eliminate the drawbacks to a certain extent but can also be even less sensitive to focus errors according to Fisher information criteria. However, the performance comparison was carried out without optimizing the modified mask, which was not a fair comparison. In this manuscript, we first optimize the modified logarithmic phase mask before analyzing its performance, and more convincing results are obtained from the analysis of several frequently used metrics. © 2010 Optical Society of America
Abstract:
BACKGROUND: Whilst multimorbidity is more prevalent with increasing age, approximately 30% of middle-aged adults (45-64 years) are also affected. Several prescribing criteria have been developed to optimise medication use in older people (≥65 years) with little focus on potentially inappropriate prescribing (PIP) in middle-aged adults. We have developed a set of explicit prescribing criteria called PROMPT (PRescribing Optimally in Middle-aged People's Treatments) which may be applied to prescribing datasets to determine the prevalence of PIP in this age-group.
METHODS: A literature search was conducted to identify published prescribing criteria for all age groups, with the Project Steering Group (convened for this study) adding further criteria for consideration, all of which were reviewed for relevance to middle-aged adults. These criteria underwent a two-round Delphi process, using an expert panel consisting of general practitioners, pharmacists and clinical pharmacologists from the United Kingdom and Republic of Ireland. Using web-based questionnaires, 17 panellists were asked to indicate their level of agreement with each criterion via a 5-point Likert scale (1 = Strongly Disagree, 5 = Strongly Agree) to assess the applicability to middle-aged adults in the absence of clinical information. Criteria were accepted/rejected/revised dependent on the panel's level of agreement using the median response/interquartile range and additional comments.
RESULTS: Thirty-four criteria were rated in the first round of this exercise and consensus was achieved on 17 criteria which were accepted into the PROMPT criteria. Consensus was not reached on the remaining 17, and six criteria were removed following a review of the additional comments. The second round of this exercise focused on the remaining 11 criteria, some of which were revised following the first exercise. Five criteria were accepted from the second round, providing a final list of 22 criteria [gastro-intestinal system (n = 3), cardiovascular system (n = 4), respiratory system (n = 4), central nervous system (n = 6), infections (n = 1), endocrine system (n = 1), musculoskeletal system (n = 2), duplicates (n = 1)].
CONCLUSIONS: PROMPT is the first set of prescribing criteria developed for use in middle-aged adults. The utility of these criteria will be tested in future studies using prescribing datasets.
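The consensus step in the METHODS paragraph (median response and interquartile range of 5-point Likert ratings) can be sketched as follows. The abstract does not state the cut-offs the panel used, so the thresholds `accept_median`, `reject_median` and `max_iqr` below are hypothetical assumptions chosen only for illustration, as is the function name.

```python
from statistics import median, quantiles


def classify_criterion(ratings, accept_median=4, reject_median=2, max_iqr=1):
    """Classify one criterion from panel ratings on a 1-5 Likert scale.

    The threshold defaults are assumptions for illustration; they are
    not taken from the PROMPT study.
    """
    med = median(ratings)
    q1, _, q3 = quantiles(ratings, n=4)  # quartile cut points
    iqr = q3 - q1
    if iqr <= max_iqr and med >= accept_median:
        return "accept"   # panel agrees the criterion applies
    if iqr <= max_iqr and med <= reject_median:
        return "reject"   # panel agrees the criterion does not apply
    return "revise"       # no consensus: revise and re-rate in the next round


# Hypothetical ratings from 17 panellists
print(classify_criterion([5] * 17))
print(classify_criterion([1, 2, 3, 4, 5] * 3 + [3, 3]))
```

A narrow interquartile range is used here as the measure of agreement, so a criterion with a high median but widely spread ratings is sent back for revision rather than accepted.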
Abstract:
The thesis covers various aspects of the modeling and analysis of finite mean time series with symmetric stable distributed innovations. Time series analysis based on Box and Jenkins methods is the most popular approach, in which the models are linear and the errors are Gaussian. We highlight the limitations of classical time series analysis tools, explore some generalized tools, and organize the approach in parallel to the classical setup. In the present thesis we mainly study the estimation and prediction of the signal plus noise model, assuming that the signal and the noise follow models with symmetric stable innovations. We start the thesis with some motivating examples and application areas of alpha stable time series models. Classical time series analysis and the corresponding theory based on finite variance models are extensively discussed in the second chapter. We also survey the existing theories and methods corresponding to infinite variance models in the same chapter. In the third chapter we present a linear filtering method for computing the filter weights assigned to the observations when estimating the unobserved signal in a general noisy environment. Here we consider both the signal and the noise as stationary processes with infinite variance innovations. We derive semi-infinite, doubly infinite and asymmetric signal extraction filters based on minimum dispersion criteria. Finite length filters based on Kalman-Levy filters are developed and the pattern of the filter weights is identified. Simulation studies show that the proposed methods perform well in signal extraction for processes with infinite variance. Parameter estimation of autoregressive signals observed in a symmetric stable noise environment is discussed in the fourth chapter. Here we use higher order Yule-Walker type estimation based on the auto-covariation function and illustrate the methods by simulation and by application to sea surface temperature data.
We increase the number of Yule-Walker equations and propose an ordinary least squares estimate of the autoregressive parameters. The singularity problem of the auto-covariation matrix is addressed and a modified version of the Generalized Yule-Walker method is derived using singular value decomposition. In the fifth chapter of the thesis we introduce the partial covariation function as a tool for stable time series analysis where covariance or partial covariance is ill defined. Asymptotic results for the partial auto-covariation are studied and its application in model identification of stable autoregressive models is discussed. We generalize the Durbin-Levinson algorithm to include infinite variance models in terms of the partial auto-covariation function and introduce a new information criterion for consistent order estimation of stable autoregressive models. In chapter six we explore the application of the techniques discussed in the previous chapter in signal processing. Frequency estimation of a sinusoidal signal observed in a symmetric stable noisy environment is discussed in this context. Here we introduce a parametric spectrum analysis and a frequency estimate using the power transfer function. An estimate of the power transfer function is obtained using the modified generalized Yule-Walker approach. Another important problem in statistical signal processing is to identify the number of sinusoidal components in an observed signal. We use a modified version of the proposed information criterion for this purpose.
Abstract:
Objective: Criteria for metabolic syndrome (MS) differ, particularly regarding the definition of central obesity; consequently, there could be differences in the assessment of cardiovascular risk. We estimated the prevalence of metabolic syndrome, compared the agreement of the World Health Organization (WHO) criteria with the standard and a modified National Cholesterol Education Program (NCEP) criterion, and investigated whether additional factors were associated with the diagnosis of the syndrome in a Japanese descendant population.
Methods: In this cross-sectional, population-based survey, 1166 Japanese-Brazilians (533 men, 633 women) aged 57.4 ± 12.4 years with mean body mass index (BMI) and waist of 25.2 ± 4.0 kg/m² and 84.5 ± 10.6 cm, respectively, were included. McNemar and kappa statistics were used to assess the concordance of the WHO criteria with the standard and a modified NCEP criterion (waist of 90 and 80 cm for men and women, respectively). In logistic regression analysis, a number of metabolic variables and albumin-to-creatinine ratio were included to test independent associations with metabolic syndrome defined by the modified NCEP criterion.
Results: According to WHO, 55.4% (95% CI 52.5-58.2%) of the subjects had MS, and according to NCEP, 47.4% (95% CI 44.6-50.0%). The WHO criterion detected 48.3% of central obese subjects while NCEP detected only 14.0%. Kappa statistics showed a good strength of agreement (k = 0.67, p < 0.01) between WHO and NCEP standard definitions of MS. Using the modified NCEP criterion for Asians, more subjects with metabolic syndrome were identified (58%) and agreement with WHO was improved (k = 0.72, p < 0.001). However, similar Framingham risk scores were attributed to the subsets of subjects classified by any of the three criteria. Areas under the receiver operating characteristic curves, obtained for the modified waist values to diagnose metabolic syndrome according to WHO, were > 0.80 and corresponded, respectively, to sensitivity and specificity of 63 and 83% for men and 77 and 72% for women. In the final logistic regression model, age, male sex, BMI and homeostasis model assessment-insulin resistance, but not albumin-to-creatinine ratio (ACR), were independently associated with the syndrome.
Conclusions: A high prevalence of MS, independent of the criterion considered, was found in this Japanese-Brazilian population. Replacing the waist cutoffs with those proposed by WHO for Asians led to this diagnosis in a higher number of subjects with elevated cardiovascular risk. Our data did not support the inclusion of ACR in the classical definition of MS in Japanese descendants, as previously suggested by WHO.
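The two agreement statistics named in the Methods (kappa and McNemar) can be computed from a 2x2 cross-classification of subjects by two definitions. The sketch below uses the textbook formulas for Cohen's kappa and the uncorrected McNemar chi-square; the cell counts are hypothetical and are not the study's data, and the function names are illustrative.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table.

    a = positive by both definitions, b = positive by the first only,
    c = positive by the second only, d = negative by both.
    """
    n = a + b + c + d
    observed = (a + d) / n  # proportion of subjects on which the definitions agree
    # Agreement expected by chance from the marginal totals
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def mcnemar_chi2(b, c):
    """McNemar chi-square (no continuity correction) from the discordant cells."""
    return (b - c) ** 2 / (b + c)


# Hypothetical counts for two definitions applied to 100 subjects
print(cohen_kappa(40, 10, 5, 45))
print(mcnemar_chi2(10, 5))
```

Kappa summarizes how far agreement exceeds chance, whereas McNemar's statistic tests whether one definition classifies systematically more subjects as positive than the other, which is why the two are reported together.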
Abstract:
Modelling of interferometric signals related to tear film surface quality is considered. In the context of tear film surface quality estimation in normal healthy eyes, two clinical parameters are of interest: the build-up time, and the average interblink surface quality. The former is closely related to the signal derivative while the latter to the signal itself. Polynomial signal models, chosen for a particular set of noisy interferometric measurements, can be optimally selected, in some sense, with a range of information criteria such as AIC, MDL, Cp, and CME. Those criteria, however, do not always guarantee that the true derivative of the signal is accurately represented and they often overestimate it. Here, a practical method for judicious selection of model order in a polynomial fitting to a signal is proposed so that the derivative of the signal is adequately represented. The paper highlights the importance of context-based signal modelling in model order selection.
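For context, the conventional criterion-based order selection that this abstract contrasts with can be sketched as below. This is not the method proposed in the paper; it is the standard Gaussian least-squares form AIC = n ln(RSS/n) + 2(p + 1) applied to polynomial fits of order p. The test signal, noise term and function names are assumptions made for illustration.

```python
import math


def polyfit_ls(x, y, order):
    """Least-squares polynomial coefficients (lowest power first) via normal equations."""
    m = order + 1
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def rss(x, y, coef):
    """Residual sum of squares of the fitted polynomial."""
    return sum(
        (yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
        for xi, yi in zip(x, y)
    )


def aic_order(x, y, max_order):
    """Return the order with the smallest AIC and the AIC of every candidate order."""
    n = len(x)
    scores = {}
    for p in range(max_order + 1):
        r = max(rss(x, y, polyfit_ls(x, y, p)), 1e-300)  # guard against log(0)
        scores[p] = n * math.log(r / n) + 2 * (p + 1)
    return min(scores, key=scores.get), scores


# Assumed test signal: a quadratic plus a small deterministic disturbance
x = [i / 49 for i in range(50)]
y = [1 + 2 * xi + 3 * xi ** 2 + 0.05 * math.sin(37 * i) for i, xi in enumerate(x)]
best_order, scores = aic_order(x, y, max_order=5)
print(best_order)
```

The criterion only rewards fit to the signal values; nothing in it checks how well the derivative of the fitted polynomial tracks the true derivative, which is the gap the abstract points to.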
Abstract:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. 
Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criterion (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken.
By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.