975 results for Box-Cox model
Abstract:
In this letter, a Box-Cox transformation-based radial basis function (RBF) neural network is introduced, in which the RBF neural network represents the transformed system output. Initially, a fixed, moderate-sized RBF model base is derived using a rank-revealing orthogonal matrix triangularization (QR decomposition). A new fast identification algorithm is then introduced, using the Gauss-Newton algorithm to derive the required Box-Cox transformation based on a maximum likelihood estimator. The main contribution of this letter is to exploit the special structure of the proposed RBF neural network for computational efficiency, by utilizing a matrix block decomposition lemma for the inverse. Finally, the Box-Cox transformation-based RBF neural network, with good generalization and sparsity, is identified based on the derived optimal Box-Cox transformation and a D-optimality-based orthogonal forward regression algorithm. The proposed algorithm and its efficacy are demonstrated with an illustrative example, in comparison with support vector machine regression.
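The Box-Cox step of this pipeline is easy to reproduce in isolation. Below is a minimal sketch, assuming synthetic positive output data, that estimates the transformation parameter by maximum likelihood with scipy; the letter's RBF model base, Gauss-Newton search, and forward regression are not reproduced.

```python
# Minimal sketch: ML estimation of the Box-Cox parameter with scipy
# (synthetic data; the letter's RBF machinery is not reproduced).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=1.0, sigma=0.6, size=500)   # positive system output

# stats.boxcox with lmbda=None returns the transformed data and the
# lambda that maximizes the profile log-likelihood
y_bc, lam_hat = stats.boxcox(y)
print(f"lambda_hat = {lam_hat:.3f}")

# y_bc = (y**lam - 1) / lam is the transformed output that an RBF
# network would then be trained to represent
```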
Abstract:
In most studies on beef cattle longevity, only the cows reaching a given number of calvings by a specific age are considered in the analyses. With the aim of evaluating all cows with a productive life in the herds, and taking into consideration the different forms of management on each farm, it was proposed to measure cow longevity from the age at last calving (ALC), that is, the most recent calving registered in the files. The objective was to characterize this trait in order to study the longevity of Nellore cattle, using Kaplan-Meier estimators and the Cox model. The covariables and class effects considered in the models were age at first calving (AFC), year and season of birth of the cow, and farm. The variable studied (ALC) was classified as presenting complete information (uncensored = 1) or incomplete information (censored = 0), using the criterion of the difference between the date of each cow's last calving and the date of the latest calving at each farm. If this difference was >36 months, the cow was considered to have failed; if not, the cow was censored, indicating that future calving remained possible for her. The records of 11 791 animals from 22 farms within the Nellore Breed Genetic Improvement Program ('Nellore Brazil') were used. In the estimation process using the Kaplan-Meier model, AFC was classified into three age groups. In individual analyses, the log-rank and Wilcoxon tests in the Kaplan-Meier model showed that all covariables and class effects had significant effects (P < 0.05) on ALC. In the analysis considering all covariables and class effects, using the Wald test in the Cox model, only the season of birth of the cow was not significant for ALC (P > 0.05). This analysis indicated that each month added to AFC diminished the risk of the cow's failure in the herd by 2%. Nonetheless, this does not imply that animals with a younger AFC were less profitable. Cows with greater numbers of calvings were more precocious than those with fewer calvings. Copyright © The Animal Consortium 2012.
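The censoring rule described above translates directly into code. The sketch below, assuming mock records and the lifelines API, applies the 36-month criterion within each farm and fits a Kaplan-Meier curve to ALC; column names and data are illustrative.

```python
# Sketch of the censoring rule and Kaplan-Meier fit (mock records;
# the 36-month window follows the abstract, column names are invented).
import pandas as pd
from lifelines import KaplanMeierFitter

df = pd.DataFrame({
    "farm": [1, 1, 2, 2],
    "alc_months": [96, 120, 84, 110],          # age at last calving (ALC)
    "last_calving_date": pd.to_datetime(
        ["2008-03-01", "2011-05-01", "2007-01-01", "2011-02-01"]),
})

# Failed if the cow's last calving is >36 months before the farm's latest one
latest = df.groupby("farm")["last_calving_date"].transform("max")
gap_months = (latest - df["last_calving_date"]).dt.days / 30.44  # approx. months
df["failed"] = (gap_months > 36).astype(int)   # 1 = uncensored, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(durations=df["alc_months"], event_observed=df["failed"])
print(kmf.survival_function_)
```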
Abstract:
The Box-Cox transformation is a technique mostly used to make the probability distribution of time series data approximately normal, which helps statistical and neural models produce more accurate forecasts. However, it introduces a bias when the transformation is reversed on the predicted data. The statistical methods for performing a bias-free reversion necessarily require the assumption that the transformed data are Gaussian, which rarely holds for real-world time series. The aim of this study was therefore to provide an effective method for removing the bias when the Box-Cox transformation is reversed. The developed method is based on a focused time-lagged feedforward neural network, which does not require any assumption about the distribution of the transformed data. To evaluate the performance of the proposed method, numerical simulations were conducted in which the Mean Absolute Percentage Error, the Theil Inequality Index and the Signal-to-Noise ratio of 20-step-ahead forecasts of 40 time series were compared; the results indicate that the proposed reversion method is valid and justifies further study. (C) 2014 Elsevier B.V. All rights reserved.
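The retransformation bias this study targets can be exhibited in a few lines. The sketch below, on synthetic skewed data, shows that naively inverting the Box-Cox transform of a transformed-space mean underestimates the original-scale mean; the paper's neural-network correction itself is not reproduced here.

```python
# Sketch of the Box-Cox retransformation bias (synthetic data; this only
# exhibits the bias the paper's neural-network method is built to remove).
import numpy as np
from scipy import stats, special

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=20000)   # skewed "series"

z, lam = stats.boxcox(x)                          # forward transform

# Naively reverting the mean in transformed space underestimates E[x],
# since the inverse transform is convex for lambda < 1 (Jensen's inequality)
naive = special.inv_boxcox(z.mean(), lam)
print(f"true mean  = {x.mean():.3f}")
print(f"naive back = {naive:.3f}   <- biased low")
```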
Abstract:
Professor Sir David R. Cox (DRC) is widely acknowledged as among the most important scientists of the second half of the twentieth century. He inherited the mantle of statistical science from Pearson and Fisher, advanced their ideas, and translated statistical theory into practice so as to forever change the application of statistics in many fields, especially biology and medicine. The logistic and proportional hazards models he substantially developed are arguably among the most influential biostatistical methods in current practice. This paper looks forward over the period from DRC's 80th to 90th birthdays to speculate about the future of biostatistics, drawing lessons from DRC's contributions along the way. We consider "Cox's model" (CM) of biostatistics, an approach to statistical science that: formulates scientific questions or quantities in terms of parameters gamma in probability models f(y; gamma) that represent, in a parsimonious fashion, the underlying scientific mechanisms (Cox, 1997); partitions the parameters gamma = (theta, eta) into a subset of interest theta and other "nuisance parameters" eta necessary to complete the probability distribution (Cox and Hinkley, 1974); develops methods of inference about the scientific quantities that depend as little as possible upon the nuisance parameters (Barndorff-Nielsen and Cox, 1989); and thinks critically about the appropriate conditional distribution on which to base inferences. We briefly review exciting biomedical and public health challenges that are capable of driving statistical developments in the next decade. We discuss the statistical models and model-based inferences central to the CM approach, contrasting them with the computationally intensive strategies for prediction and inference advocated by Breiman and others (e.g. Breiman, 2001) and with more traditional design-based methods of inference (Fisher, 1935). We discuss the hierarchical (multi-level) model as an example of the future challenges and opportunities for model-based inference. We then consider the role of conditional inference, a second key element of the CM. Recent examples from genetics are used to illustrate these ideas. Finally, the paper examines causal inference and statistical computing, two other topics we believe will be central to biostatistics research and practice in the coming decade. Throughout the paper, we attempt to indicate how DRC's work and the "Cox Model" have set a standard of excellence to which all can aspire in the future.
Abstract:
For the first time, we introduce a class of transformed symmetric models that extends the Box and Cox models to more general symmetric models. The new class includes all symmetric continuous distributions with a possible non-linear structure for the mean and enables the fitting of a wide range of models to several data types. The proposed methods offer more flexible alternatives to the Box-Cox and other existing procedures. We derive a very simple iterative process for fitting these models by maximum likelihood, whereas direct unconditional maximization would be more difficult. We give simple formulae to estimate the parameter that indexes the transformation of the response variable and the moments of the original dependent variable, generalizing previously published results. We discuss inference on the model parameters. The usefulness of the new class of models is illustrated in an application to a real dataset.
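One concrete member of this class is a Box-Cox-type transformation whose transformed response follows a Student-t (symmetric, heavy-tailed) law. The sketch below, with illustrative data and a fixed degrees-of-freedom value, fits such a model by direct numerical maximum likelihood, including the Jacobian term; the paper itself derives a simpler iterative fitting scheme, which is not reproduced here.

```python
# Sketch of one transformed symmetric model: Box-Cox transform with
# Student-t errors, fit by direct numerical ML (illustrative data/df).
import numpy as np
from scipy import optimize, stats

def boxcox(y, lam):
    return np.log(y) if abs(lam) < 1e-8 else (y**lam - 1.0) / lam

def neg_loglik(params, y, df=5):
    lam, mu, log_s = params
    z = boxcox(y, lam)
    ll = stats.t.logpdf(z, df, loc=mu, scale=np.exp(log_s)).sum()
    ll += (lam - 1.0) * np.log(y).sum()     # Jacobian of the transformation
    return -ll

rng = np.random.default_rng(2)
y = rng.lognormal(0.5, 0.4, size=300)
fit = optimize.minimize(neg_loglik, x0=[1.0, 0.0, 0.0], args=(y,),
                        method="Nelder-Mead")
lam_hat, mu_hat, s_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(f"lambda = {lam_hat:.3f}, mu = {mu_hat:.3f}, scale = {s_hat:.3f}")
```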
Abstract:
Road accidents are a very relevant issue in many countries, and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. For the selection of the explanatory variables and of the response transformation parameter within the Bayesian framework, TIM and 3IM (two-input and three-input model) procedures are proposed. The procedure also uses the DIC and pseudo-R2 goodness-of-fit criteria. The model to which the methodology is applied is a dynamic regression model with a Box-Cox transformation (BCT) of the explanatory variables and an autoregressive (AR) structure for the response. An initial set of 22 explanatory variables is identified, and the effects of these factors on the fatal accident frequency in Spain during 2000-2012 are estimated. The dependent variable is constructed taking the stochastic trend component into account.
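The model family is straightforward to sketch. The example below, with illustrative data, fits a regression on Box-Cox-transformed regressors with AR(1) errors via statsmodels and compares one- and two-regressor candidates; AIC is used here as a convenient stand-in for the paper's Bayesian DIC criterion.

```python
# Sketch of the model family: regression on Box-Cox-transformed regressors
# with AR(1) errors; AIC stands in for the paper's DIC (data illustrative).
import numpy as np
from scipy import stats
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
n = 156                                    # e.g. monthly data, 2000-2012
x1 = rng.gamma(3.0, 2.0, n)                # candidate explanatory variables
x2 = rng.gamma(2.0, 1.5, n)                # (must be positive for Box-Cox)
y = 50.0 + 4.0 * np.log(x1) + rng.normal(0.0, 1.0, n)

def bct(x):
    xt, _ = stats.boxcox(x)                # ML lambda for each regressor
    return xt

candidates = {"1 regressor": bct(x1)[:, None],
              "2 regressors": np.column_stack([bct(x1), bct(x2)])}
for name, exog in candidates.items():
    res = ARIMA(y, exog=exog, order=(1, 0, 0)).fit()
    print(f"{name}: AIC = {res.aic:.1f}")
```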
Abstract:
The main objective of this dissertation is to analyze the demand for money in Brazil over the period 1974 to 2008, noting that it includes sub-periods of both high and low inflation, and taking into account alternative hypotheses about the formation of expectations. The specification adopted is that of Tourinho (1995), which generalizes Cagan (1956) by allowing a more flexible functional form and by incorporating other variables, beyond expected inflation, as explanatory variables. These extensions prove important for modeling the demand for real money balances in the period considered here. Cagan's semi-log functional form is rejected in favor of a flexible Box-Cox functional form, and the coefficients of the real interest rate and of the variance of inflation are significant, showing the importance of including these variables in the model. The function estimated for the full period is compared with those estimated for the sub-periods of high and moderate inflation, in order to verify the stability of the adopted formulation. We conclude that the hypothesis that it is stable can be rejected. Cagan's model is also generalized here in another dimension, by considering alternative expectation-formation mechanisms, which may be adaptive, as in the original model, or rational. The hypothesis that adaptive expectations are rational is also considered. We conclude that imposing the rationality condition on the model with adaptive expectations does not produce important changes in the estimated values.
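The flexible functional form can be sketched as follows: Cagan's semi-log specification is the lambda -> 0 limit of a Box-Cox transform of real balances. The code below, on synthetic data with hypothetical variable names, estimates lambda jointly with the coefficients of expected inflation, the real interest rate, and the inflation variance by least squares; the dissertation's expectation-formation mechanisms are not modeled.

```python
# Sketch of a Box-Cox money-demand form nesting Cagan's semi-log as
# lambda -> 0 (synthetic data; variable names are hypothetical).
import numpy as np
from scipy import optimize

def boxcox(m, lam):
    return np.log(m) if abs(lam) < 1e-8 else (m**lam - 1.0) / lam

def sse(params, m, pi_e, r, var_pi):
    lam, a, b, c, d = params
    resid = boxcox(m, lam) - (a - b * pi_e + c * r + d * var_pi)
    return (resid**2).sum()

rng = np.random.default_rng(4)
pi_e = rng.gamma(2.0, 0.05, 140)            # expected inflation
r = rng.normal(0.06, 0.02, 140)             # real interest rate
var_pi = rng.gamma(1.0, 0.01, 140)          # inflation variance
m = np.exp(2.0 - 3.0 * pi_e + 0.5 * r) * rng.lognormal(0.0, 0.05, 140)

fit = optimize.minimize(sse, x0=[0.5, 2.0, 3.0, 0.5, 0.0],
                        args=(m, pi_e, r, var_pi), method="Nelder-Mead")
print(f"lambda_hat = {fit.x[0]:.3f}")       # near 0 -> semi-log not rejected
```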
Abstract:
Model selection between competing models is a key consideration in the discovery of prognostic multigene signatures. The use of appropriate statistical performance measures, as well as verification of the biological significance of the signatures, is imperative to maximise the chance of external validation of the generated signatures. Current approaches in time-to-event studies often use only a single measure of performance in model selection, such as log-rank test p-values, or dichotomise the follow-up times at some phase of the study to facilitate signature discovery. In this study we improve the prognostic signature discovery process through the application of the multivariate partial Cox model, combined with the concordance index, the hazard ratio of predictions, independence from available clinical covariates, and biological enrichment as measures of signature performance. The proposed framework was applied to discover prognostic multigene signatures from early breast cancer data. The partial Cox model, combined with the multiple performance measures, was used both to guide the selection of the optimal panel of prognostic genes and to predict risk within cross-validation, without dichotomising the follow-up times at any stage. The signatures were successfully externally cross-validated in independent breast cancer datasets, yielding a hazard ratio of 2.55 [1.44, 4.51] for the top-ranking signature.
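The scoring side of this framework, fitting a Cox model and evaluating the concordance index of its risk predictions, can be sketched with lifelines on mock data; gene names, sample size, and data generation below are illustrative, and the multivariate partial Cox fitting and gene panel search are not reproduced.

```python
# Sketch of Cox-model risk scoring with the concordance index
# (lifelines API; mock data, illustrative gene names).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(5)
n = 200
df = pd.DataFrame({"gene1": rng.normal(size=n), "gene2": rng.normal(size=n)})
risk = 0.8 * df["gene1"] - 0.5 * df["gene2"]
df["time"] = rng.exponential(scale=np.exp(-risk))      # follow-up time
df["event"] = rng.integers(0, 2, n)                    # 1 = event observed

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# Concordance index: a higher predicted hazard should mean shorter survival,
# hence the negation of the partial hazard as the ranking score
c_index = concordance_index(df["time"],
                            -cph.predict_partial_hazard(df),
                            df["event"])
print(f"c-index = {c_index:.3f}")
print("hazard ratios:", np.exp(cph.params_).round(2).to_dict())
```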
Abstract:
The purpose of this master's thesis is to describe the properties of the double Pareto-lognormal distribution, to show how explanatory variables can be introduced into the model, and to present its broad potential for applications in actuarial science and finance. First, we give the definition of the double Pareto-lognormal distribution and present some of its properties based on the work of Reed and Jorgensen (2004). The parameters can be estimated using the method of moments or maximum likelihood. Next, we add an explanatory variable to our model; the procedure for estimating the parameters of this model is also discussed. Third, numerical applications of our model are illustrated and some useful statistical tests are performed.
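The distribution's generative form makes it easy to work with: following Reed and Jorgensen (2004), a double Pareto-lognormal variate can be written as exp(Z + W), with Z normal and W an asymmetric Laplace variate. The sketch below, with illustrative parameter values, simulates the distribution and checks the closed-form mean, which is finite only when alpha > 1.

```python
# Sketch of the double Pareto-lognormal generative form: X = exp(Z + W),
# Z ~ N(nu, tau^2), W asymmetric Laplace (illustrative parameter values).
import numpy as np

def rdpln(size, alpha, beta, nu, tau, rng=None):
    """Simulate double Pareto-lognormal variates."""
    rng = rng or np.random.default_rng()
    z = rng.normal(nu, tau, size)                          # lognormal body
    w = rng.exponential(1 / alpha, size) - rng.exponential(1 / beta, size)
    return np.exp(z + w)                                   # Pareto-type tails

a, b, nu, tau = 2.5, 1.8, 0.0, 0.5
x = rdpln(100_000, a, b, nu, tau, rng=np.random.default_rng(6))

# Moment check: E[X] = exp(nu + tau^2/2) * a*b / ((a - 1) * (b + 1))
theory = np.exp(nu + tau**2 / 2) * a * b / ((a - 1) * (b + 1))
print(f"sample mean = {x.mean():.3f}, theoretical mean = {theory:.3f}")
```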