964 results for Bayes information criterion


Relevance: 100.00%

Abstract:

This study focuses on multiple linear regression models relating six climate indices (temperature humidity index THI, environmental stress index ESI, equivalent temperature index ETI, heat load index HLI, modified HLI (HLI new), and respiratory rate predictor RRP) to three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) are applied to select the best model for the milk predictands with the smallest number of climate predictors. Uncertainty is estimated by bootstrap resampling, and cross-validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors, with p-value < 0.001 and R² (0.50, 0.49) respectively. In summer, milk yield with the independent variables THI, ETI, and ESI shows the strongest relation (p-value < 0.001), with R² = 0.69. For fat and protein the results are only marginal. This method is suggested for studies of the impact of climate variability/change on agriculture and food science when only short time series or data with large uncertainty are available.
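As a rough illustration of AIC-based predictor selection in a linear regression, the sketch below compares an intercept-only model with a one-predictor model using the Gaussian form AIC = n ln(RSS/n) + 2k. The THI and milk-yield numbers are invented for illustration and are not data from the study; the function names are hypothetical.

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares when the prediction is just the mean of y."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)

def rss_simple_regression(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

def gaussian_aic(rss, n, n_coef):
    """AIC of a Gaussian linear model, up to an additive constant.

    The parameter count is the number of regression coefficients
    plus one for the error variance.
    """
    return n * math.log(rss / n) + 2 * (n_coef + 1)

# Invented example values (assumption): a heat index and milk yield.
thi = [60, 62, 64, 66, 68, 70, 72, 74]
milk_yield = [30.1, 29.4, 29.0, 28.1, 27.6, 26.9, 26.5, 25.6]

n = len(milk_yield)
aic_null = gaussian_aic(rss_intercept_only(milk_yield), n, n_coef=1)
aic_thi = gaussian_aic(rss_simple_regression(thi, milk_yield), n, n_coef=2)
print("AIC intercept-only:", round(aic_null, 2))
print("AIC with THI:", round(aic_thi, 2))
```

The model with the lower AIC is preferred; with several candidate indices the same comparison would be repeated over candidate subsets.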

Relevance: 100.00%

Abstract:

The thesis deals with the problem of Model Selection (MS) motivated by information and prediction theory, focusing on parametric time series (TS) models. The main contribution of the thesis is the extension to the multivariate case of the Misspecification-Resistant Information Criterion (MRIC), a recently introduced criterion that solves Akaike’s original research problem posed 50 years ago, which led to the definition of the AIC. The importance of MS is witnessed by the huge amount of literature devoted to it and published in scientific journals of many different disciplines. Despite such widespread treatment, the contributions that adopt a mathematically rigorous approach are not so numerous, and one of the aims of this project is to review and assess them. Chapter 2 discusses methodological aspects of MS from information theory. Information criteria (IC) for the i.i.d. setting are surveyed along with their asymptotic properties, as are the cases of small samples, misspecification, and further estimators. Chapter 3 surveys criteria for TS. IC and prediction criteria are considered for univariate models (AR, ARMA) in the time and frequency domain; parametric multivariate models (VARMA, VAR); nonparametric nonlinear models (NAR); and high-dimensional models. The MRIC answers Akaike’s original question on efficient criteria for possibly-misspecified (PM) univariate TS models in multi-step prediction with high-dimensional data and nonlinear models. Chapter 4 extends the MRIC to PM multivariate TS models for multi-step prediction, introducing the Vectorial MRIC (VMRIC). We show that the VMRIC is asymptotically efficient by proving the decomposition of the MSPE matrix and the consistency of its Method-of-Moments Estimator (MoME), for Least Squares multi-step prediction with a univariate regressor. Chapter 5 extends the VMRIC to the general multiple regressor case by showing that the MSPE matrix decomposition holds, obtaining consistency for its MoME, and proving its efficiency. The chapter concludes with a digression on the conditions for PM VARX models.
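The sketch below is not the MRIC or VMRIC. It only illustrates the two ingredients the abstract names: a least-squares multi-step (h-step-ahead) predictor with a single regressor, and the mean squared prediction error (MSPE) used to judge it. The toy series and the function names are assumptions made for illustration.

```python
def direct_ls_coefficient(series, h):
    """Least-squares coefficient for predicting x[t+h] from x[t] (no intercept)."""
    regressors = series[:-h]
    targets = series[h:]
    numerator = sum(x * y for x, y in zip(regressors, targets))
    denominator = sum(x * x for x in regressors)
    return numerator / denominator

def mspe(series, h, beta):
    """In-sample mean squared error of the h-step predictor beta * x[t]."""
    errors = [y - beta * x for x, y in zip(series[:-h], series[h:])]
    return sum(e * e for e in errors) / len(errors)

# Noise-free toy series x[t] = 0.5 ** t (assumption), so x[t+2] = 0.25 * x[t].
series = [0.5 ** t for t in range(12)]
beta_2 = direct_ls_coefficient(series, h=2)
print("2-step coefficient:", beta_2)
print("2-step MSPE:", mspe(series, 2, beta_2))
```

In practice the MSPE would be estimated for each candidate model and horizon, and a criterion would then trade it off against model complexity.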

Relevance: 100.00%

Abstract:

Background: Inflammation is associated with heart failure (HF) risk factors and also directly affects myocardial function. However, the association between inflammation and HF risk in older adults has not been adequately evaluated. Methods: The association of baseline serum concentrations of interleukin-6 (IL-6), tumor necrosis factor alpha (TNF-α), and C-reactive protein (CRP) with incident HF was assessed with Cox proportional hazards models among 2610 older persons without prevalent HF enrolled in the Health, Aging, and Body Composition (Health ABC) Study (age, 73.6±2.9 years; 48.3% men; 59.6% white). Results: Median (interquartile range) baseline concentrations of IL-6, TNF-α, and CRP were 1.80 (1.23, 2.76) pg/mL, 3.14 (2.41, 4.06) pg/mL, and 1.64 (0.99, 3.04) µg/mL, respectively. On follow-up (median, 9.4 years), 311 participants (11.9%) developed HF. In models controlling for clinical predictors of HF and incident coronary heart disease, doubling of IL-6, TNF-α, and CRP concentrations was associated with 34% (95% CI, 18-52%; P<.001), 33% (95% CI, 9-63%; P=.006), and 13% (95% CI, 3-24%; P=.01) increase in HF risk, respectively. In models including all 3 markers, IL-6 and TNF-α, but not CRP, remained significant. Findings were similar across sex and race. Post-HF ejection fraction (EF) was available in 239 (76.8%) cases. When only cases with preserved EF were considered (n=105), IL-6 (HR per doubling, 1.57; 95% CI, 1.28-1.94; P<.001), TNF-α (HR per doubling, 1.59; 95% CI, 1.12-2.26; P=.01), and CRP (HR per doubling, 1.23; 95% CI, 1.05-1.44; P=.01) were all associated with HF risk in adjusted models. In contrast, when only cases with reduced EF (n=134) were considered, only IL-6 attained marginal significance in adjusted models (HR per doubling, 1.20; 95% CI, 0.99-1.46; P=.06). Participants with 2 or 3 markers above median had pronounced HF risk in adjusted models (HR, 1.66; 95% CI, 1.12-2.46; P=.01; and HR, 1.76; 95% CI, 1.16-2.65; P=.007, respectively). Addition of IL-6 to the clinical Health ABC HF model improved discrimination (C index from 0.717 to 0.734; P=.001) and fit (decreased Bayes information criterion by 17.8; P<.001). Conclusions: Inflammatory markers are associated with HF risk among older adults and may improve HF risk stratification.
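To make the "decreased Bayes information criterion" statement concrete, here is a minimal sketch of BIC = k ln(n) − 2 ln(L) and of the change in BIC when one predictor is added. The log-likelihood values are invented. Using the number of events (311) as n is one convention for Cox models and is an assumption here, not something the abstract states.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayes (Schwarz) information criterion; lower values indicate better fit."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def bic_change(loglik_base, loglik_extended, extra_params, n_obs):
    """BIC(extended) - BIC(base); a negative value favours the extended model."""
    return extra_params * math.log(n_obs) - 2.0 * (loglik_extended - loglik_base)

# Invented log-likelihoods (assumption) for a base model and one with an added marker.
n_events = 311
loglik_base, loglik_extended = -1700.0, -1688.0
delta = bic_change(loglik_base, loglik_extended, extra_params=1, n_obs=n_events)
print("Change in BIC:", round(delta, 2))
```

An added predictor lowers the BIC only when twice its gain in log-likelihood exceeds ln(n) per extra parameter.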

Relevance: 100.00%

Abstract:

OBJECTIVES: The purpose of this study was to evaluate the association between inflammation and heart failure (HF) risk in older adults. BACKGROUND: Inflammation is associated with HF risk factors and also directly affects myocardial function. METHODS: The association of baseline serum concentrations of interleukin (IL)-6, tumor necrosis factor-alpha, and C-reactive protein (CRP) with incident HF was assessed with Cox models among 2,610 older persons without prevalent HF enrolled in the Health ABC (Health, Aging, and Body Composition) study (age 73.6 ± 2.9 years; 48.3% men; 59.6% white). RESULTS: During follow-up (median 9.4 years), HF developed in 311 (11.9%) participants. In models controlling for clinical characteristics, ankle-arm index, and incident coronary heart disease, doubling of IL-6, tumor necrosis factor-alpha, and CRP concentrations was associated with 29% (95% confidence interval: 13% to 47%; p < 0.001), 46% (95% confidence interval: 17% to 84%; p = 0.001), and 9% (95% confidence interval: -1% to 24%; p = 0.087) increase in HF risk, respectively. In models including all 3 markers, IL-6 and tumor necrosis factor-alpha, but not CRP, remained significant. These associations were similar across sex and race and persisted in models accounting for death as a competing event. Post-HF ejection fraction was available in 239 (76.8%) cases; inflammatory markers had a stronger association with HF with preserved ejection fraction. Repeat IL-6 and CRP determinations at 1-year follow-up did not provide incremental information. Addition of IL-6 to the clinical Health ABC HF model improved model discrimination (C index from 0.717 to 0.734; p = 0.001) and fit (decreased Bayes information criterion by 17.8; p < 0.001). CONCLUSIONS: Inflammatory markers are associated with HF risk among older adults and may improve HF risk stratification.
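The "per doubling" effects can be read as follows: if a marker enters the Cox model on a logarithmic scale, the hazard ratio for a doubling is exp(β · ln 2 / ln b), where b is the base of the logarithm used. The sketch below shows that conversion. How the study actually transformed the markers is an assumption, and the function names are hypothetical.

```python
import math

def hr_per_doubling(beta, log_base=2.0):
    """Hazard ratio for a doubling of a marker modelled as log_base(marker).

    beta is the Cox coefficient per one-unit increase of the log-transformed marker.
    """
    return math.exp(beta * math.log(2.0) / math.log(log_base))

def percent_increase(hazard_ratio):
    """Express a hazard ratio as a percent increase in risk."""
    return (hazard_ratio - 1.0) * 100.0

# A coefficient on the log2 scale that corresponds to the reported 29% increase.
beta_log2 = math.log(1.29)
print(round(percent_increase(hr_per_doubling(beta_log2)), 1))

# The same effect if the marker had been modelled on the natural-log scale.
beta_ln = math.log(1.29) / math.log(2.0)
print(round(hr_per_doubling(beta_ln, log_base=math.e), 2))
```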

Relevance: 100.00%

Abstract:

This study aimed to evaluate the validity, reliability, and factorial invariance of the complete (34-item) and shortened (8-item and 16-item) versions of the Body Shape Questionnaire (BSQ) when applied to Brazilian university students. A total of 739 female students with a mean age of 20.44 (standard deviation = 2.45) years participated. Confirmatory factor analysis was conducted to verify the degree to which the one-factor structure satisfies the proposal for the BSQ's expected structure. Two items of the 34-item version were excluded because they had factor weights (λ) < .40. All models had adequate convergent validity (average variance extracted = .43-.58; composite reliability = .85-.97) and internal consistency (α = .85-.97). The 8-item B version was considered the best shortened BSQ version (Akaike information criterion = 84.07, Bayes information criterion = 157.75, Browne-Cudeck criterion = 84.46), with strong invariance for independent samples (Δχ²λ(7) = 5.06, Δχ²Cov(8) = 5.11, Δχ²Res(16) = 19.30). © 2014 Elsevier Ltd. All rights reserved.
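The convergent-validity figures quoted above are usually computed from standardized factor loadings. The sketch below uses the common textbook definitions, average variance extracted (AVE) as the mean squared loading and composite reliability (CR) as (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The loadings are invented, and whether the study used exactly these formulas is an assumption.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings of a factor."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """(sum of loadings)^2 divided by itself plus the summed error variances."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1.0 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)

# Invented standardized loadings (assumption) for a short one-factor scale.
loadings = [0.7, 0.7, 0.7, 0.7]
print("AVE:", round(average_variance_extracted(loadings), 3))
print("CR:", round(composite_reliability(loadings), 3))
```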

Relevance: 100.00%

Abstract:

We consider model selection uncertainty in linear regression. We study theoretically and by simulation the approach of Buckland and co-workers, who proposed estimating a parameter common to all models under study by taking a weighted average over the models, using weights obtained from information criteria or the bootstrap. This approach is compared with the usual approach in which the 'best' model is used, and with Bayesian model averaging. The weighted predictor behaves similarly to model averaging, with generally more realistic mean-squared errors than the usual model-selection-based estimator.
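A minimal sketch of the weighted-average idea, assuming the weights take the familiar form w_i ∝ exp(−Δ_i / 2), where Δ_i is the difference between model i's information criterion and the smallest value. The estimates and criterion values are invented, and the function names are hypothetical.

```python
import math

def ic_weights(ic_values):
    """Model weights proportional to exp(-0.5 * (IC_i - min IC))."""
    best = min(ic_values)
    raw = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(raw)
    return [r / total for r in raw]

def model_averaged_estimate(estimates, ic_values):
    """Weighted average of per-model estimates of a common parameter."""
    weights = ic_weights(ic_values)
    return sum(w * e for w, e in zip(weights, estimates))

# Invented per-model estimates and AIC values (assumption).
estimates = [1.8, 2.1, 2.6]
aic_values = [102.3, 101.0, 106.4]
print([round(w, 3) for w in ic_weights(aic_values)])
print(round(model_averaged_estimate(estimates, aic_values), 3))
```

Subtracting the minimum before exponentiating leaves the weights unchanged and avoids underflow when the criterion values are large.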

Relevance: 100.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance: 100.00%

Abstract:

Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented by investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means of addressing the RTM problem. This method requires information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop the SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003 along with traffic and geometric data were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data used were based on one-mile segments that contain homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. The scatter plots of data show that the relationships between crashes and traffic exposure are nonlinear, with crashes increasing with traffic exposure at an increasing rate. Four regression models, namely, Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and Akaike's Information Criterion (AIC). The NBRM model was found to be appropriate for only one category and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model was found to give the best fit when the count data exhibit excess zeros and over-dispersion for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected, as traffic volume is only one of the many factors contributing to the overall crash experience, and the SPFs are to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.
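To illustrate how AIC can separate a plain count model from a zero-inflated one, the sketch below fits intercept-only Poisson and zero-inflated Poisson (ZIP) models to invented crash counts and compares their AIC values. It is a simplification: there are no covariates and no Negative Binomial variants, and the ZIP fit uses a basic EM loop. It is not the dissertation's procedure.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of i.i.d. Poisson counts with mean lam."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson: extra zeros with probability pi."""
    total = 0.0
    for y in counts:
        pois = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
        total += math.log(pi + (1 - pi) * pois) if y == 0 else math.log((1 - pi) * pois)
    return total

def fit_zip(counts, n_iter=200):
    """EM estimates of (pi, lam) for an intercept-only ZIP model."""
    n = len(counts)
    n_zeros = sum(1 for y in counts if y == 0)
    pi, lam = 0.5, max(sum(counts) / n, 1e-6)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        z0 = pi / (pi + (1 - pi) * math.exp(-lam))
        n_structural = z0 * n_zeros
        # M-step: update the mixing proportion and the Poisson mean.
        pi = n_structural / n
        lam = sum(counts) / (n - n_structural)
    return pi, lam

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Invented segment crash counts with many zeros (assumption).
crashes = [0] * 30 + [3, 4, 5, 2, 6, 4, 3, 5, 4, 4]
lam_poisson = sum(crashes) / len(crashes)
pi_zip, lam_zip = fit_zip(crashes)
aic_poisson = aic(poisson_loglik(crashes, lam_poisson), 1)
aic_zip = aic(zip_loglik(crashes, pi_zip, lam_zip), 2)
print("Poisson AIC:", round(aic_poisson, 1), "ZIP AIC:", round(aic_zip, 1))
```

On data with this many zeros the ZIP model has the lower AIC despite its extra parameter.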

Relevance: 90.00%

Abstract:

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
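For contrast with the integrated MML approach, the BIC baseline mentioned above works on pre-estimated candidate models: each candidate number of clusters is fitted separately and the one with the smallest BIC is kept. The sketch below shows only that selection step. It assumes the categorical variables are independent within a cluster, which fixes the parameter count, and the log-likelihoods are invented.

```python
import math

def n_free_parameters(n_clusters, levels_per_variable):
    """Free parameters of a multinomial mixture with independent variables per cluster."""
    mixing = n_clusters - 1
    within = n_clusters * sum(levels - 1 for levels in levels_per_variable)
    return mixing + within

def bic(log_likelihood, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def select_n_clusters(loglik_by_k, levels_per_variable, n_obs):
    """Return the cluster count with the lowest BIC, plus all BIC scores."""
    scores = {
        k: bic(ll, n_free_parameters(k, levels_per_variable), n_obs)
        for k, ll in loglik_by_k.items()
    }
    return min(scores, key=scores.get), scores

# Invented maximized log-likelihoods (assumption) for K = 1, 2, 3 clusters.
levels = [2, 3, 4]
loglik_by_k = {1: -1500.0, 2: -1400.0, 3: -1395.0}
best_k, scores = select_n_clusters(loglik_by_k, levels, n_obs=500)
print("Selected number of clusters:", best_k)
```

Because each candidate must be fitted before it can be scored, the cost grows with the number of candidates, which is the overhead a single integrated algorithm avoids.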

Relevance: 90.00%

Abstract:

The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and that it serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
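The step of summing each site's likelihood over several branch-length sets can be sketched as a weighted mixture. Below, the per-site log-likelihoods under each branch-length set are taken as given (invented numbers). Computing them from a tree and an alignment, and the reversible-jump sampler itself, are outside this sketch.

```python
import math

def site_log_likelihood(log_liks, weights):
    """log of sum_j w_j * L_j for one site, computed stably from log L_j."""
    m = max(log_liks)
    return m + math.log(sum(w * math.exp(l - m) for w, l in zip(weights, log_liks)))

def alignment_log_likelihood(per_site_log_liks, weights):
    """Total log-likelihood: sites are treated as independent, so site terms add."""
    return sum(site_log_likelihood(site, weights) for site in per_site_log_liks)

# Invented log-likelihoods (assumption): rows are sites, columns are branch-length sets.
per_site = [[-2.0, -5.0], [-6.0, -3.0]]
mixture = alignment_log_likelihood(per_site, [0.5, 0.5])
single_set = alignment_log_likelihood(per_site, [1.0, 0.0])
print("Mixture:", round(mixture, 3), "Single set:", round(single_set, 3))
```

In this toy case each site is fitted better by a different branch-length set, so the mixture's log-likelihood exceeds that of either set alone.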

Relevance: 90.00%

Abstract:

We consider the finite sample properties of model selection by information criteria in conditionally heteroscedastic models. Recent theoretical results show that certain popular criteria are consistent in that they will select the true model asymptotically with probability 1. To examine the empirical relevance of this property, Monte Carlo simulations are conducted for a set of non-nested data generating processes (DGPs) with the set of candidate models consisting of all types of model used as DGPs. In addition, not only is the best model considered but also those with similar values of the information criterion, called close competitors, thus forming a portfolio of eligible models. To supplement the simulations, the criteria are applied to a set of economic and financial series. In the simulations, the criteria are largely ineffective at identifying the correct model, either as the best model or as a close competitor, the parsimonious GARCH(1, 1) model being preferred for most DGPs. In contrast, asymmetric models are generally selected to represent actual data. This leads to the conjecture that the properties of parameterizations of processes commonly used to model heteroscedastic data are more similar than may be imagined and that more attention needs to be paid to the behaviour of the standardized disturbances of such models, both in simulation exercises and in empirical modelling.
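The "portfolio of eligible models" idea can be sketched as keeping every candidate whose information criterion lies within some tolerance of the best one. The tolerance of 2 and the criterion values below are assumptions for illustration; the abstract does not say how closeness was defined.

```python
def close_competitors(ic_by_model, tolerance=2.0):
    """Models whose criterion is within `tolerance` of the minimum, best first."""
    best_value = min(ic_by_model.values())
    eligible = [name for name, value in ic_by_model.items()
                if value - best_value <= tolerance]
    return sorted(eligible, key=lambda name: ic_by_model[name])

# Invented criterion values (assumption) for three candidate volatility models.
ic_by_model = {
    "GARCH(1,1)": 1000.0,
    "EGARCH(1,1)": 1001.2,
    "GJR-GARCH(1,1)": 1005.0,
}
portfolio = close_competitors(ic_by_model)
print(portfolio)
```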

Relevance: 90.00%

Abstract:

Studies investigating the use of random regression models for genetic evaluation of milk production in Zebu cattle are scarce. In this study, 59,744 test-day milk yield records from 7,810 first lactations of purebred dairy Gyr (Bos indicus) and crossbred (dairy Gyr × Holstein) cows were used to compare random regression models in which additive genetic and permanent environmental effects were modeled using orthogonal Legendre polynomials or linear spline functions. Residual variances were modeled considering 1, 5, or 10 classes of days in milk. Five classes fitted the changes in residual variances over the lactation adequately and were used for model comparison. The model that fitted linear spline functions with 6 knots provided the lowest sum of residual variances across lactation. On the other hand, according to the deviance information criterion (DIC) and Bayesian information criterion (BIC), a model using third-order and fourth-order Legendre polynomials for additive genetic and permanent environmental effects, respectively, provided the best fit. However, the high rank correlation (0.998) between this model and that applying third-order Legendre polynomials for additive genetic and permanent environmental effects, indicates that, in practice, the same bulls would be selected by both models. The last model, which is less parameterized, is a parsimonious option for fitting dairy Gyr breed test-day milk yield records. © 2013 American Dairy Science Association.
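In random regression models of this kind, days in milk (DIM) are typically rescaled to [-1, 1] and expanded into normalized Legendre polynomial covariates. The sketch below builds those covariates with the standard three-term recurrence. The DIM range of 5 to 305 and the reading of "third-order" as polynomial degree 3 are assumptions, since conventions differ between papers.

```python
import math

def standardize_dim(dim, dim_min=5, dim_max=305):
    """Map days in milk linearly onto the interval [-1, 1]."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0

def legendre_covariates(x, max_degree):
    """Normalized Legendre polynomials of degree 0..max_degree evaluated at x."""
    values = [1.0, x]
    for n in range(1, max_degree):
        # Recurrence: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    values = values[: max_degree + 1]
    # Normalization used in random regression: sqrt((2n + 1) / 2) * P_n(x)
    return [math.sqrt((2 * n + 1) / 2.0) * p for n, p in enumerate(values)]

# Covariates for a test-day record at mid-lactation (DIM 155 maps to x = 0).
covariates = legendre_covariates(standardize_dim(155), max_degree=3)
print([round(c, 4) for c in covariates])
```

Each animal's additive genetic and permanent environmental curves are then modelled as linear combinations of these covariates with animal-specific random coefficients.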

Relevance: 90.00%

Abstract:

In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion (AIC) have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is no longer an asymptotically unbiased estimator of the Akaike information, and in fact favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that leads to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.

Relevance: 90.00%

Abstract:

We present two approaches to cluster dialogue-based information obtained by the speech understanding module and the dialogue manager of a spoken dialogue system. The purpose is to estimate a language model related to each cluster, and use them to dynamically modify the model of the speech recognizer at each dialogue turn. In the first approach we build the cluster tree using local decisions based on a Maximum Normalized Mutual Information criterion. In the second one we take global decisions, based on the optimization of the global perplexity of the combination of the cluster-related LMs. Our experiments show a relative reduction of the word error rate of 15.17%, which helps to improve the performance of the understanding and the dialogue manager modules.
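The quantity optimized in the second approach, the perplexity of a combination of cluster-related language models, can be sketched with linear interpolation. The per-word probabilities and interpolation weights below are invented. How the system actually combines its models is not specified in the abstract, so linear interpolation is an assumption.

```python
import math

def interpolate(word_probs_by_lm, weights):
    """Per-word probabilities of a linear interpolation of several language models."""
    return [sum(w * p for w, p in zip(weights, probs))
            for probs in zip(*word_probs_by_lm)]

def perplexity(word_probs):
    """exp of the negative mean log-probability over the words of a text."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# Invented probabilities (assumption) that two cluster LMs assign to the same 2 words.
lm_a = [0.1, 0.1]
lm_b = [0.3, 0.3]
combined = interpolate([lm_a, lm_b], weights=[0.5, 0.5])
print("Perplexity of the combination:", round(perplexity(combined), 3))
```

Lower perplexity on held-out dialogue turns indicates interpolation weights, or a clustering, that predicts the user's words better.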