952 results for Schwarz Information Criterion
Abstract:
We carried out a discriminant analysis with identity by descent (IBD) at each marker as inputs, and the sib pair type (affected-affected versus affected-unaffected) as the output. Using simple logistic regression for this discriminant analysis, we illustrate the importance of comparing models with different numbers of parameters. Such model comparisons are best carried out using either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). When AIC (or BIC) stepwise variable selection was applied to the German Asthma data set, a group of markers was selected that provides the best fit to the data (assuming an additive effect). Interestingly, these 25-26 markers were not identical to those with the highest (in magnitude) single-locus lod scores.
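As a minimal sketch of the model comparison described above, the two criteria can be computed from a model's maximised log-likelihood. The log-likelihoods, parameter counts, and sample size below are hypothetical and are not taken from the German Asthma data set.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian (Schwarz) information criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical fits: (maximised log-likelihood, number of parameters).
candidates = {
    "5 markers": (-310.0, 6),
    "25 markers": (-280.0, 26),
}
n_obs = 500  # hypothetical number of sib pairs

for name, (ll, k) in candidates.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n_obs), 1))
```

With these made-up numbers the AIC favours the larger model while the BIC favours the smaller one, because the BIC penalty per parameter, log n, exceeds 2 once n is larger than about 7.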
Abstract:
This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty is estimated by bootstrap resampling. Cross-validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September of 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R² (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI shows the strongest relationship (p-value < 0.001) with R² (0.69). For fat and protein the results are only marginal. This method is suggested for impact studies of climate variability/change in the agriculture and food science fields when only short time series or data with large uncertainty are available.
Abstract:
This thesis deals with the Bayesian analysis of functional data in a hydrological context. The main objective is to model water flow data parsimoniously while adequately reproducing their statistical characteristics. Functional data analysis leads us to treat water flow time series as functions to be modelled with a nonparametric method. First, the functions are made more homogeneous by synchronizing them. Then, given a sample of homogeneous curves, we model their statistical characteristics using Bayesian regression splines within a fairly general probabilistic framework. More specifically, we study a family of continuous distributions, which includes those of the exponential family, from which the observations may arise. In addition, to obtain a flexible nonparametric modelling tool, we treat the interior knots, which define the elements of the regression spline basis, as random quantities. We then use reversible jump MCMC to explore the posterior distribution of the interior knots. To simplify this procedure in our general modelling context, we consider approximations of the marginal distribution of the observations, namely one based on the Schwarz information criterion and another that relies on the Laplace approximation. Besides modelling the central tendency of a sample of curves, we also propose a methodology for simultaneously modelling the central tendency and the dispersion of these curves, again within our general probabilistic framework.
Finally, since we study a variety of statistical distributions for the observations, we put forward an approach for determining the most suitable distributions for a given sample of curves.
Abstract:
In this study, the Schwarz Information Criterion (SIC) is applied to detect change-points in time series of surface water quality variables. Change-point analysis allowed change-points to be detected in both the mean and the variance of the series under study. Time variations in environmental data are complex and can hinder the identification of change-points when traditional models are applied to this type of problem. The assumptions of normality and absence of correlation do not hold for some time series, so a simulation study is carried out to evaluate the methodology’s performance when applied to data that are non-normal and/or correlated in time.
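The idea of SIC-based change-point detection can be sketched as follows. This is a simplified illustration and not the study's procedure: it assumes independent normal observations with a common variance and looks for at most one change in the mean only, whereas the study also considers changes in variance. The function name `sic_mean_change` and the `min_seg` parameter are hypothetical.

```python
import math


def _rss(values):
    """Residual sum of squares around the sample mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def sic_mean_change(x, min_seg=2):
    """Return (k, sic_null, sic_best) for a single change in the mean.

    k is the number of observations before the change, or None when the
    no-change model has the smallest SIC. Assumes i.i.d. normal errors
    with a common variance (a simplification made for this sketch).
    """
    n = len(x)
    if n < 2 * min_seg:
        raise ValueError("series too short")
    rss_null = _rss(x)
    if rss_null <= 0:
        raise ValueError("series is constant")
    const = n * math.log(2 * math.pi) + n
    # No change: one mean and one variance, i.e. 2 parameters.
    sic_null = const + n * math.log(rss_null / n) + 2 * math.log(n)
    best_k, best_sic = None, sic_null
    for k in range(min_seg, n - min_seg + 1):
        rss = _rss(x[:k]) + _rss(x[k:])
        if rss <= 0:
            continue  # degenerate split, skipped in this sketch
        # Change at k: two means and one variance, i.e. 3 parameters.
        sic_k = const + n * math.log(rss / n) + 3 * math.log(n)
        if sic_k < best_sic:
            best_k, best_sic = k, sic_k
    return best_k, sic_null, best_sic


shifted = [0.0, 0.1, -0.1, 0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 5.0, 5.1, 4.9]
print(sic_mean_change(shifted)[0])
```

A change is declared only when some split has a smaller SIC than the no-change model, so the extra mean parameter must pay for its log n penalty.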
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with that of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to recover the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
Abstract:
1. Ecological data sets often use clustered measurements or use repeated sampling in a longitudinal design. Choosing the correct covariance structure is an important step in the analysis of such data, as the covariance describes the degree of similarity among the repeated observations. 2. Three methods for choosing the covariance are: the Akaike information criterion (AIC), the quasi-information criterion (QIC), and the deviance information criterion (DIC). We compared the methods using a simulation study and using a data set that explored effects of forest fragmentation on avian species richness over 15 years. 3. The overall success was 80.6% for the AIC, 29.4% for the QIC and 81.6% for the DIC. For the forest fragmentation study the AIC and DIC selected the unstructured covariance, whereas the QIC selected the simpler autoregressive covariance. Graphical diagnostics suggested that the unstructured covariance was probably correct. 4. We recommend using DIC for selecting the correct covariance structure.
Abstract:
Maternal and infant mortality is a global health issue with a significant social and economic impact. Each year, over half a million women worldwide die due to complications related to pregnancy or childbirth, four million infants die in the first 28 days of life, and eight million infants die in the first year. Ninety-nine percent of maternal and infant deaths occur in developing countries. Reducing maternal and infant mortality is among the key international development goals. In China, the national maternal mortality ratio and infant mortality rate were reduced greatly in the past two decades, yet a large discrepancy remains between urban and rural areas. To address this problem, a large-scale Safe Motherhood Programme was initiated in 2000. The programme was implemented in Guangxi in 2003. Interventions in the programme included both demand-side and supply-side interventions focusing on increasing health service use and improving birth outcomes. Little is known about the effects and economic outcomes of the Safe Motherhood Programme in Guangxi, although it has been implemented for seven years. The aim of this research is to estimate the effectiveness and cost-effectiveness of the interventions in the Safe Motherhood Programme in Guangxi, China. The objectives of this research include: 1. To evaluate whether the changes in health service use and birth outcomes are associated with the interventions in the Safe Motherhood Programme. 2. To estimate the cost-effectiveness of the interventions in the Safe Motherhood Programme and quantify the uncertainty surrounding the decision. 3. To assess the expected value of perfect information associated with both the whole decision and individual parameters, and interpret the findings to inform priority setting in further research and policy making in this area. A quasi-experimental study design was used in this research to assess the effectiveness of the programme in increasing health service use and improving birth outcomes.
The study subjects were 51 intervention counties and 30 control counties. Data on health service use, birth outcomes and socio-economic factors from 2001 to 2007 were collected from the programme database and statistical yearbooks. Based on the profile plots of the data, general linear mixed models were used to evaluate the effectiveness of the programme while controlling for the effects of baseline levels of the response variables, change of socio-economic factors over time and correlations among repeated measurements from the same county. Redundant multicollinear variables were deleted from the mixed model using the results of the multicollinearity diagnostics. For each response variable, the best covariance structure was selected from 15 alternatives according to fit statistics including the Akaike information criterion, the Finite-population corrected Akaike information criterion, and Schwarz's Bayesian information criterion. Residual diagnostics were used to validate the model assumptions. Statistical inferences were made to show the effect of the programme on health service use and birth outcomes. A decision analytic model was developed to evaluate the cost-effectiveness of the programme, quantify the decision uncertainty, and estimate the expected value of perfect information associated with the decision. The model was used to describe the transitions between health states for women and infants and reflect the change of both costs and health benefits associated with implementing the programme. Results gained from the mixed models and other relevant evidence identified were synthesised appropriately to inform the input parameters of the model. Incremental cost-effectiveness ratios of the programme were calculated for the two groups of intervention counties over time. Uncertainty surrounding the parameters was dealt with using probabilistic sensitivity analysis, and uncertainty relating to model assumptions was handled using scenario analysis.
Finally, the expected value of perfect information for both the whole model and individual parameters in the model was estimated to inform priority setting in further research in this area. The annual change rates of the antenatal care rate and the institutionalised delivery rate were improved significantly in the intervention counties after the programme was implemented. Significant improvements were also found in the annual change rates of the maternal mortality ratio, the infant mortality rate, the incidence rate of neonatal tetanus and the mortality rate of neonatal tetanus in the intervention counties after the implementation of the programme. The annual change rate of the neonatal mortality rate was also improved, although the improvement was only close to statistical significance. The influences of the socio-economic factors on the health service use indicators and birth outcomes were identified. The rural income per capita had a significant positive impact on the health service use indicators, and a significant negative impact on the birth outcomes. The number of beds in healthcare institutions per 1,000 population and the number of rural telephone subscribers per 1,000 were found to be significantly and positively related to the institutionalised delivery rate. The length of highway per square kilometre negatively influenced the maternal mortality ratio. The percentage of employed persons in the primary industry had a significant negative impact on the institutionalised delivery rate, and a significant positive impact on the infant mortality rate and neonatal mortality rate. The incremental costs of implementing the programme over the existing practice were US $11.1 million from the societal perspective, and US $13.8 million from the perspective of the Ministry of Health.
Overall, 28,711 life years were generated by the programme, producing an overall incremental cost-effectiveness ratio of US $386 from the societal perspective, and US $480 from the perspective of the Ministry of Health, both of which were below the threshold willingness-to-pay ratio of US $675. The expected net monetary benefit generated by the programme was US $8.3 million from the societal perspective, and US $5.5 million from the perspective of the Ministry of Health. The overall probability that the programme was cost-effective was 0.93 and 0.89 from the two perspectives, respectively. The incremental cost-effectiveness ratio of the programme was insensitive to the different estimates of the three parameters relating to the model assumptions. Further research could be conducted to reduce the uncertainty surrounding the decision, in which the upper limit of investment was US $0.6 million from the societal perspective, and US $1.3 million from the perspective of the Ministry of Health. It is also worthwhile to get a more precise estimate of the improvement of infant mortality rate. The population expected value of perfect information for individual parameters associated with this parameter was US $0.99 million from the societal perspective, and US $1.14 million from the perspective of the Ministry of Health. The findings from this study have shown that the interventions in the Safe Motherhood Programme were both effective and cost-effective in increasing health service use and improving birth outcomes in rural areas of Guangxi, China. Therefore, the programme represents a good public health investment and should be adopted and further expanded to an even broader area if possible. This research provides economic evidence to inform efficient decision making in improving maternal and infant health in developing countries.
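The headline cost-effectiveness figures above can be checked with a short calculation. This is an illustrative sketch using only the rounded numbers quoted in the abstract (it is not the thesis's decision analytic model), so it matches the reported ratios and net monetary benefits only approximately.

```python
def icer(incremental_cost, life_years_gained):
    """Incremental cost-effectiveness ratio: extra cost per life year gained."""
    return incremental_cost / life_years_gained


def net_monetary_benefit(incremental_cost, life_years_gained, wtp):
    """Net monetary benefit at a willingness-to-pay threshold per life year."""
    return wtp * life_years_gained - incremental_cost


LIFE_YEARS = 28_711      # life years generated, from the abstract
WTP = 675.0              # threshold willingness-to-pay, US $ per life year
COSTS = {                # incremental costs in US $, rounded in the abstract
    "societal": 11.1e6,
    "Ministry of Health": 13.8e6,
}

for perspective, cost in COSTS.items():
    ratio = icer(cost, LIFE_YEARS)
    nmb = net_monetary_benefit(cost, LIFE_YEARS, WTP)
    # The programme is cost-effective when the ratio is below the
    # threshold, which is equivalent to a positive net monetary benefit.
    print(perspective, round(ratio), round(nmb / 1e6, 1), ratio < WTP)
```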
Abstract:
The problem of model selection for a univariate long memory time series is investigated for the case in which a semiparametric estimator of the long memory parameter has been used. Standard information criteria are not consistent in this case. A Modified Information Criterion (MIC) that overcomes these difficulties is introduced, and proofs of its asymptotic validity are provided. The results are general and cover a wide range of short memory processes. Simulation evidence compares the new and existing methodologies, and empirical applications to monthly inflation and daily realized volatility are presented.
Abstract:
We consider the finite sample properties of model selection by information criteria in conditionally heteroscedastic models. Recent theoretical results show that certain popular criteria are consistent in that they will select the true model asymptotically with probability 1. To examine the empirical relevance of this property, Monte Carlo simulations are conducted for a set of non-nested data generating processes (DGPs) with the set of candidate models consisting of all types of model used as DGPs. In addition, not only is the best model considered but also those with similar values of the information criterion, called close competitors, thus forming a portfolio of eligible models. To supplement the simulations, the criteria are applied to a set of economic and financial series. In the simulations, the criteria are largely ineffective at identifying the correct model, either as best or a close competitor, the parsimonious GARCH(1, 1) model being preferred for most DGPs. In contrast, asymmetric models are generally selected to represent actual data. This leads to the conjecture that the properties of parameterizations of processes commonly used to model heteroscedastic data are more similar than may be imagined and that more attention needs to be paid to the behaviour of the standardized disturbances of such models, both in simulation exercises and in empirical modelling.
Abstract:
In this article, we present the EM algorithm for maximum likelihood estimation of an asymmetric linear calibration model under the assumption of skew-normally distributed errors. A simulation study is conducted to evaluate the performance of the calibration estimator in interpolation and extrapolation situations. As an application to a real data set, we fitted the model to a dimensional measurement method that calculates testicular volume with a caliper and is calibrated using ultrasonography as the standard method. With this methodology, we do not need to transform the variables to obtain symmetrical errors. Another interesting aspect of the approach is that the transformation developed to make the information matrix nonsingular when the skewness parameter is near zero leaves the parameter of interest unchanged. Model fitting is implemented, and the choice between the usual calibration model and the model proposed in this article was evaluated using the Akaike information criterion, Schwarz's Bayesian information criterion and the Hannan-Quinn criterion.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)