67 results for misspecification


Relevance: 20.00%

Abstract:

Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotic distribution of the correlation information criterion (CIC) is derived, and a new method for selecting a working ICS is proposed by standardizing the selection criterion as a p-value. The CIC test is found to be powerful in detecting misspecification of the working ICS, and the standardized CIC test is also shown to perform satisfactorily for working ICS selection. Simulation studies and applications to two real longitudinal datasets illustrate how these criteria and tests can be used.
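
For readers who want to experiment, the CIC can be computed from standard GEE output. Below is a minimal sketch in Python with statsmodels, following the usual definition CIC(R) = tr(Omega_I * V_R), where Omega_I is the model-based precision under working independence and V_R is the robust sandwich covariance under the candidate structure R; the asymptotic standardization to a p-value proposed in the paper is not reproduced, and the toy data are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def cic(y, X, groups, cov_struct):
    """Correlation information criterion for a candidate working ICS:
    trace of (model-based precision under working independence) times
    (robust sandwich covariance under the candidate structure)."""
    ind = sm.GEE(y, X, groups=groups,
                 cov_struct=sm.cov_struct.Independence()).fit()
    omega_ind = np.linalg.inv(ind.cov_naive)
    cand = sm.GEE(y, X, groups=groups, cov_struct=cov_struct).fit()
    return np.trace(omega_ind @ cand.cov_robust)

# Toy data: 50 clusters of size 4 with an exchangeable random effect.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(50), 4)
X = sm.add_constant(rng.normal(size=(200, 1)))
y = (X @ np.array([1.0, 0.5]) + np.repeat(rng.normal(size=50), 4)
     + rng.normal(size=200))

for cs in (sm.cov_struct.Independence(), sm.cov_struct.Exchangeable()):
    print(type(cs).__name__, round(cic(y, X, ids, cs), 3))  # smaller is better
```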

Relevance: 20.00%

Abstract:

The approach of generalized estimating equations (GEE) is based on the framework of generalized linear models but allows specification of a working matrix for modeling within-subject correlations. The variance is often assumed to be a known function of the mean. This article investigates the impact of misspecifying the variance function on estimators of the mean parameters for quantitative responses. Our numerical studies indicate that (1) correct specification of the variance function can improve estimation efficiency even if the correlation structure is misspecified; (2) misspecification of the variance function affects estimators for within-cluster covariates far more than those for cluster-level covariates; and (3) if the variance function is misspecified, a correct choice of correlation structure does not necessarily improve estimation efficiency. We illustrate the impact of different variance functions using a real data set on cow growth.
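
Point (1) is easy to probe by simulation. In the hypothetical sketch below (Python/statsmodels, not the cow-growth data), responses are generated with variance proportional to the squared mean, so the Gamma variance function is correct and the constant-variance Gaussian working model is misspecified; the robust standard errors of the slope can then be compared.

```python
import numpy as np
import statsmodels.api as sm

# Simulated positive responses with Var(y) proportional to mu^2, so the
# Gamma variance function is correct and the Gaussian one is not.
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(200), 4)
x = rng.normal(size=800)                    # within-cluster covariate
X = sm.add_constant(x)
mu = np.exp(0.2 + 0.3 * x)
y = rng.gamma(shape=10.0, scale=mu / 10.0)  # E[y] = mu, Var(y) = mu^2 / 10

for fam in (sm.families.Gaussian(sm.families.links.Log()),  # misspecified
            sm.families.Gamma(sm.families.links.Log())):    # correct
    res = sm.GEE(y, X, groups=ids, family=fam,
                 cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(type(fam).__name__, float(res.bse[1]))  # robust SE of the slope
```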

Relevance: 20.00%

Abstract:

The method of generalised estimating equations for regression modelling of clustered outcomes allows for specification of a working matrix that is intended to approximate the true correlation matrix of the observations. We investigate the asymptotic relative efficiency of the generalised estimating equation for the mean parameters when the correlation parameters are estimated by various methods. The asymptotic relative efficiency depends on three features of the analysis: (i) the discrepancy between the working correlation structure and the unobservable true correlation structure, (ii) the method by which the correlation parameters are estimated, and (iii) the 'design', by which we mean both the structure of the predictor matrices within clusters and the distribution of cluster sizes. Analytical and numerical studies of realistic data-analysis scenarios show that the choice of working covariance model has a substantial impact on regression estimator efficiency. Protection against avoidable loss of efficiency associated with covariance misspecification is obtained when a 'Gaussian estimation' pseudolikelihood procedure is used with an AR(1) structure.
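
The 'Gaussian estimation' step for the AR(1) correlation parameter amounts to maximizing a multivariate-normal pseudolikelihood in the residuals. A minimal sketch of that step only, assuming equal cluster sizes and residuals already standardized to unit variance (a full fit would alternate this with the GEE mean-parameter update):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ar1_pseudologlik(alpha, resid):
    """Gaussian pseudolikelihood kernel for an AR(1) working correlation,
    given a list of equal-length, standardized cluster residual vectors."""
    t = len(resid[0])
    # AR(1) correlation matrix: R[i, j] = alpha ** |i - j|.
    R = alpha ** np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
    Rinv = np.linalg.inv(R)
    _, logdet = np.linalg.slogdet(R)
    quad = sum(r @ Rinv @ r for r in resid)
    return -0.5 * (len(resid) * logdet + quad)

def fit_ar1(resid):
    """Gaussian-estimation step: maximize the pseudolikelihood over alpha."""
    out = minimize_scalar(lambda a: -ar1_pseudologlik(a, resid),
                          bounds=(-0.99, 0.99), method="bounded")
    return out.x
```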

Relevance: 20.00%

Abstract:

Birds represent the most diverse extant tetrapod clade, comprising ca. 10,000 living species, and the timing of the crown avian radiation remains hotly debated. The fossil record supports a primarily Cenozoic radiation of crown birds, whereas molecular divergence dating analyses generally imply that this radiation was well underway during the Cretaceous. Furthermore, substantial differences have been noted between published divergence estimates, variously attributed to clock model, calibration regime, and gene type. One underappreciated phenomenon is that the disparity between fossil ages and molecular dates tends to be proportionally greater for shallower nodes in the avian Tree of Life. Here, we explore potential drivers of this disparity through a set of analyses applying various calibration strategies and coding methods to a mitochondrial genome dataset and an 18-gene nuclear dataset, both sampled across 72 taxa. Our analyses support the occurrence of two deep divergences (i.e., the Palaeognathae/Neognathae split and the Galloanserae/Neoaves split) well within the Cretaceous, followed by a rapid radiation of Neoaves near the K-Pg boundary. However, 95% highest posterior density intervals for most basal divergences in Neoaves cross the boundary, and we emphasize that, barring unreasonably strict prior distributions, distinguishing between a rapid Early Paleocene radiation and a Late Cretaceous radiation may be beyond the resolving power of currently favored divergence dating methods. In contrast to recent observations for placental mammals, constraining all divergences within Neoaves to occur in the Cenozoic does not result in unreasonably high inferred substitution rates. Comparisons of nuclear DNA (nDNA) versus mitochondrial DNA (mtDNA) datasets and of NT- versus RY-coded mitochondrial data reveal patterns of disparity consistent with substitution model misspecifications that produce tree compression/tree extension artifacts, which may explain some discordance between previous divergence estimates based on different sequence types. Comparisons of fully calibrated and nominally calibrated trees support a correlation between body mass and apparent dating error. Overall, our results are consistent with (but do not require) a Paleogene radiation for most major clades of crown birds.

Relevance: 20.00%

Abstract:

This paper investigates the performance of the tests proposed by Hadri and by Hadri and Larsson for testing stationarity in heterogeneous panel data under model misspecification. The panel tests are based on the well-known KPSS test (cf. Kwiatkowski et al.), which considers two models: stationarity around a deterministic level and stationarity around a deterministic trend. As far as we know, there has been no study of the statistical properties of these tests when the wrong model is used. We also consider the simultaneous presence of the two types of model in a panel. We employ two asymptotic frameworks: joint asymptotics, where T, N -> infinity simultaneously, and fixed-T asymptotics, where T is fixed and N is allowed to grow indefinitely. We use Monte Carlo experiments to investigate the effects of misspecification for sample sizes typically used in practice. The results indicate that tests derived under the assumption of fixed T have smaller size distortions than those derived under the joint asymptotics, particularly for panels with relatively small T and large N (micro-panels). We also find that choosing a deterministic trend when a deterministic level is true does not significantly affect the properties of the test, whereas choosing a deterministic level when a deterministic trend is true leads to extreme over-rejection. Therefore, when unsure which model has generated the data, we suggest using the model with a trend. We also propose a new statistic for testing stationarity in mixed panel data where the mixture is known. The performance of this new test is very good for both the T-asymptotic and fixed-T cases, although the T-asymptotic statistic is slightly undersized when T is very small.
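
For concreteness, the individual KPSS statistic underlying the Hadri-type panel tests can be sketched as follows; the `trend` flag is exactly the level-versus-trend model choice whose misspecification is studied above. The standardizing moments, which differ between the level and trend cases and between the fixed-T and joint asymptotics, are omitted from this minimal Python sketch.

```python
import numpy as np

def kpss_stat(y, trend=False, lags=0):
    """KPSS statistic for a single series: stationarity around a level
    (trend=False) or around a deterministic trend (trend=True), with a
    Bartlett-kernel long-run variance estimate."""
    t = len(y)
    X = (np.column_stack([np.ones(t), np.arange(t)]) if trend
         else np.ones((t, 1)))
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # regression residuals
    s = np.cumsum(e)                                   # partial sums
    lrv = e @ e / t
    for k in range(1, lags + 1):
        lrv += 2.0 * (1.0 - k / (lags + 1)) * (e[k:] @ e[:-k]) / t
    return (s @ s) / (t ** 2 * lrv)

def panel_mean_stat(panel, trend=False, lags=0):
    """Cross-sectional average of KPSS statistics: the building block of
    the Hadri-type panel statistics (standardizing moments omitted)."""
    return float(np.mean([kpss_stat(y, trend, lags) for y in panel]))
```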

Relevance: 20.00%

Abstract:

Objectives. This paper seeks to assess the effect of regression model misspecification on statistical power in a variety of situations.

Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method for calculating the correlation between correct and misspecified models with simple mathematical forms is derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) are used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provide a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present simulation studies demonstrating that misspecification in regression can produce marked decreases in power with small sample sizes. The categorical model had the greatest power, ranging from 0.877 to 0.936 depending on the sample size and outcome variable used, while the power of the fractional polynomial model was close to that of the linear model, ranging from 0.69 to 0.83, and appeared to be affected by this model's increased degrees of freedom.

Conclusion. Correlations between alternative model specifications can provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Public health data from NHANES 2007-2008 were used to demonstrate situations where the correct model specification is unknown or complex. Simulation of power for misspecified models confirmed the results based on the correlation method and also illustrated the effect of model degrees of freedom on power.
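
The correlation-based approximation is easy to illustrate. In the hypothetical Python sketch below, log(x) is taken as the true functional form (an assumption for illustration only, not the NHANES analysis), and each candidate specification is scored by the absolute correlation of its transformed covariate with the true form; by the argument above, specifications with higher correlations lose less power.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.5, 5.0, size=100_000)
true_form = np.log(x)            # assumed true specification (illustrative)

specs = {
    "linear": x,
    "categorical (quartiles)": np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75])),
    "fractional polynomial (x**-0.5)": x ** -0.5,
}
for name, z in specs.items():
    r = abs(np.corrcoef(true_form, z)[0, 1])   # |corr| with the true form
    print(f"{name}: {r:.3f}")
```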

Relevance: 10.00%

Abstract:

The generic IS-success constructs first identified by DeLone and McLean (1992) continue to be widely employed in research. Yet recent work by Petter et al. (2007) has cast doubt on the validity of many mainstream constructs employed in IS research over the past three decades, critiquing the almost universal conceptualization and validation of these constructs as reflective when in many studies the measures appear to have been implicitly operationalized as formative. Cited examples of proper specification of the DeLone and McLean constructs are few, particularly in light of their extensive employment in IS research. This paper introduces a four-stage formative construct development framework: Conceive > Operationalize > Respond > Validate (CORV). Employing the CORV framework in an archival analysis of research published in top outlets from 1985 to 2007, the paper explores the extent of possible problems with past IS research due to potential misspecification of the four application-related success dimensions: Individual-Impact, Organizational-Impact, System-Quality and Information-Quality. Results suggest major concerns where there is a mismatch between the Respond and Validate stages. A general dearth of attention to the Operationalize and Respond stages in methodological writings is also observed.

Relevance: 10.00%

Abstract:

A national-level safety analysis tool is needed to complement existing analytical tools for assessing the safety impacts of roadway design alternatives. FHWA has sponsored the development of the Interactive Highway Safety Design Model (IHSDM), roadway design and redesign software that estimates the safety effects of alternative designs. Considering the importance of IHSDM in shaping future safety-related transportation investment decisions, FHWA justifiably sponsored research with the sole intent of independently validating some of the statistical models and algorithms in IHSDM. Statistical model validation aims to accomplish several important tasks, including (a) assessment of the logical defensibility of proposed models, (b) assessment of the transferability of models over future time periods and across different geographic locations, and (c) identification of areas in which future model improvements should be made. These three activities are reported for five proposed types of rural intersection crash prediction models. Internal validation revealed that the crash models potentially suffer from omitted variables that affect safety, site-selection and countermeasure-selection bias, poorly measured and surrogate variables, and misspecification of model functional forms. External validation indicated that the models could not perform on par with their performance during estimation. Recommendations for improving the state of the practice include the systematic conduct of carefully designed before-and-after studies, improvements in data standardization and collection practices, and the development of analytical methods for combining the results of before-and-after studies with cross-sectional studies in a meaningful and useful way.

Relevance: 10.00%

Abstract:

For over half a century, it has been known that the rate of morphological evolution appears to vary with the time frame of measurement. Rates of microevolutionary change, measured between successive generations, were found to be far higher than rates of macroevolutionary change inferred from the fossil record. More recently, it has been suggested that rates of molecular evolution are also time dependent, with the estimated rate depending on the timescale of measurement. This followed surprising observations that estimates of mutation rates, obtained in studies of pedigrees and laboratory mutation-accumulation lines, exceeded long-term substitution rates by an order of magnitude or more. Although a range of studies have provided evidence for such a pattern, the hypothesis remains relatively contentious. Furthermore, there is ongoing discussion about the factors that can cause molecular rate estimates to be dependent on time. Here we present an overview of our current understanding of time-dependent rates. We provide a summary of the evidence for time-dependent rates in animals, bacteria and viruses. We review the various biological and methodological factors that can cause rates to be time dependent, including the effects of natural selection, calibration errors, model misspecification and other artefacts. We also describe the challenges in calibrating estimates of molecular rates, particularly on the intermediate timescales that are critical for an accurate characterization of time-dependent rates. This has important consequences for the use of molecular-clock methods to estimate timescales of recent evolutionary events.

Relevance: 10.00%

Abstract:

Despite recent methodological advances in inferring the time-scale of biological evolution from molecular data, the fundamental question of whether our substitution models are sufficiently well specified to accurately estimate branch lengths has received little attention. I examine this implicit assumption of all molecular dating methods using a vertebrate mitochondrial protein-coding dataset. Comparison with analyses in which the data are RY-coded (AG → R; CT → Y) suggests that even rates-across-sites maximum likelihood greatly under-compensates for multiple substitutions in the standard (ACGT) NT-coded data, which have been subject to greater phylogenetic signal erosion. Accordingly, the fossil record indicates that branch lengths inferred from the NT-coded data translate into divergence time overestimates when calibrated from deeper in the tree. Intriguingly, RY-coding led to the opposite result. The underlying NT and RY substitution model misspecifications likely relate, respectively, to “hidden” rate heterogeneity and to changes in substitution processes across the tree, for which I provide simulated examples. Given the magnitude of the inferred molecular dating errors, branch-length estimation biases may partly explain current conflicts with some palaeontological dating estimates.
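
The RY recoding itself is trivial; a minimal sketch in Python (gaps and ambiguity codes pass through unchanged):

```python
# RY-recoding collapses the four nucleotides into purines and pyrimidines
# (A, G -> R; C, T -> Y), discarding the transition substitutions that are
# most prone to saturation.
RY_TABLE = str.maketrans("AGCTagct", "RRYYrryy")

def ry_code(seq: str) -> str:
    return seq.translate(RY_TABLE)

print(ry_code("ATGGCCTAA"))  # RYRRYYYRR
```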

Relevance: 10.00%

Abstract:

This study compared the performance of one local and three robust optimality criteria, in terms of the resulting standard errors, for a one-parameter and a two-parameter nonlinear model with uncertainty in the parameter values. The designs were also compared under misspecification of the prior parameter distribution. For the two-parameter model, the impact of different correlations between the parameters on the optimal design was examined. The designs and standard errors were solved analytically whenever possible and numerically otherwise.
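
As a toy illustration of local versus robust criteria, consider the one-parameter model E[y] = exp(-theta * x) (an assumed example, not necessarily a model used in the study). The locally optimal single design point maximizes the squared parameter sensitivity, giving x* = 1/theta, while a simple robust alternative averages the criterion over a prior on theta:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Locally optimal single design point for E[y] = exp(-theta * x):
# maximize (d/dtheta exp(-theta * x))^2 = (x * exp(-theta * x))^2.
def locally_optimal_x(theta):
    obj = lambda x: -(x * np.exp(-theta * x)) ** 2
    return minimize_scalar(obj, bounds=(1e-6, 100.0), method="bounded").x

# One simple robust alternative: average the criterion over a discrete
# prior on theta, so the design hedges against parameter uncertainty.
def prior_averaged_x(thetas, weights):
    th, w = np.asarray(thetas), np.asarray(weights)
    obj = lambda x: -np.average((x * np.exp(-th * x)) ** 2, weights=w)
    return minimize_scalar(obj, bounds=(1e-6, 100.0), method="bounded").x

print(locally_optimal_x(0.5))                   # ~2.0 (= 1/theta)
print(prior_averaged_x([0.25, 0.5, 1.0], [1, 2, 1]))
```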

Relevance: 10.00%

Abstract:

A flexible and simple Bayesian decision-theoretic design for dose-finding trials is proposed in this paper. To reduce the computational burden, we adopt a working model with conjugate priors, which is flexible enough to fit all monotonic dose-toxicity curves and produces analytic posterior distributions. We also discuss how to use a proper utility function to reflect the interests of the trial. Patients are allocated based not only on the utility function but also on the chosen dose-selection rule. The most popular dose-selection rule is the one-step-look-ahead (OSLA), which selects the best-so-far dose. A more complicated rule, such as the two-step-look-ahead, is theoretically more efficient than OSLA only when the required distributional assumptions are met, which is often not the case in practice. We carried out extensive simulation studies to evaluate these two dose-selection rules and found that OSLA was often more efficient than the two-step-look-ahead under the proposed Bayesian structure. Moreover, our simulation results show that the proposed Bayesian method outperforms several popular Bayesian methods and that the negative impact of prior misspecification can be managed in the design stage.
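
A heavily simplified sketch of an OSLA-style rule follows. It is hypothetical in two respects: it uses independent Beta posteriors per dose and a distance-to-target utility, whereas the paper's conjugate working model enforces monotone dose-toxicity curves and a proper utility function.

```python
import numpy as np

TARGET_TOXICITY = 0.30

def osla_next_dose(alpha, beta):
    """One-step-look-ahead under independent Beta posteriors: assign the
    next cohort to the dose whose posterior mean toxicity is closest to
    the target (a simple stand-in for the paper's utility function)."""
    post_mean = np.asarray(alpha) / (np.asarray(alpha) + np.asarray(beta))
    return int(np.argmin(np.abs(post_mean - TARGET_TOXICITY)))

# Weak Beta(0.5, 1.5) priors on four doses; observe 1 toxicity among 3
# patients at dose index 2 and update conjugately.
a = np.array([0.5, 0.5, 0.5, 0.5])
b = np.array([1.5, 1.5, 1.5, 1.5])
a[2] += 1.0   # toxicities observed
b[2] += 2.0   # non-toxicities observed
print(osla_next_dose(a, b))
```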

Relevance: 10.00%

Abstract:

A modeling paradigm is proposed for covariate, variance and working correlation structure selection in longitudinal data analysis. Appropriate selection of covariates is a prerequisite for correct variance modeling, and appropriate selection of both covariates and the variance function is in turn vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria share a common theoretical root in approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information about correlation structures. The proposed model selection strategies are outlined, and a Monte Carlo assessment of their finite-sample properties is reported. Two longitudinal studies are used for illustration.
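
A sketch of the covariate-selection step for a fixed working correlation structure, using statsmodels' QIC as a stand-in for the EQIC used in the paper (EQIC is not implemented in statsmodels); the full paradigm above also iterates over variance functions and correlation structures.

```python
import numpy as np
import statsmodels.api as sm

def forward_qic(y, candidates, groups, cov_struct):
    """Greedy forward covariate selection under a fixed working correlation
    structure, scored by QIC: at each step, add the candidate column that
    lowers QIC the most, and stop when no candidate improves it."""
    chosen = []
    current = np.ones((len(y), 1))                     # intercept only
    best = sm.GEE(y, current, groups=groups,
                  cov_struct=cov_struct).fit().qic()[0]
    while len(chosen) < len(candidates):
        step = None
        for name, col in candidates.items():          # name -> 1-D array
            if name in chosen:
                continue
            trial = np.column_stack([current, col])
            q = sm.GEE(y, trial, groups=groups,
                       cov_struct=cov_struct).fit().qic()[0]
            if q < best:
                best, step = q, (name, col)
        if step is None:
            break
        chosen.append(step[0])
        current = np.column_stack([current, step[1]])
    return chosen, best
```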