983 resultados para Bayesian free-knot regression splines


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cette thèse porte sur l'analyse bayésienne de données fonctionnelles dans un contexte hydrologique. L'objectif principal est de modéliser des données d'écoulements d'eau d'une manière parcimonieuse tout en reproduisant adéquatement les caractéristiques statistiques de celles-ci. L'analyse de données fonctionnelles nous amène à considérer les séries chronologiques d'écoulements d'eau comme des fonctions à modéliser avec une méthode non paramétrique. Dans un premier temps, les fonctions sont rendues plus homogènes en les synchronisant. Ensuite, disposant d'un échantillon de courbes homogènes, nous procédons à la modélisation de leurs caractéristiques statistiques en faisant appel aux splines de régression bayésiennes dans un cadre probabiliste assez général. Plus spécifiquement, nous étudions une famille de distributions continues, qui inclut celles de la famille exponentielle, de laquelle les observations peuvent provenir. De plus, afin d'avoir un outil de modélisation non paramétrique flexible, nous traitons les noeuds intérieurs, qui définissent les éléments de la base des splines de régression, comme des quantités aléatoires. Nous utilisons alors le MCMC avec sauts réversibles afin d'explorer la distribution a posteriori des noeuds intérieurs. Afin de simplifier cette procédure dans notre contexte général de modélisation, nous considérons des approximations de la distribution marginale des observations, nommément une approximation basée sur le critère d'information de Schwarz et une autre qui fait appel à l'approximation de Laplace. En plus de modéliser la tendance centrale d'un échantillon de courbes, nous proposons aussi une méthodologie pour modéliser simultanément la tendance centrale et la dispersion de ces courbes, et ce dans notre cadre probabiliste général. Finalement, puisque nous étudions une diversité de distributions statistiques au niveau des observations, nous mettons de l'avant une approche afin de déterminer les distributions les plus adéquates pour un échantillon de courbes donné.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the investment decisions considering the presence of financial constraints of 373 large Brazilian firms from 1997 to 2004, using panel data. A Bayesian econometric model was used considering ridge regression for multicollinearity problems among the variables in the model. Prior distributions are assumed for the parameters, classifying the model into random or fixed effects. We used a Bayesian approach to estimate the parameters, considering normal and Student t distributions for the error and assumed that the initial values for the lagged dependent variable are not fixed, but generated by a random process. The recursive predictive density criterion was used for model comparisons. Twenty models were tested and the results indicated that multicollinearity does influence the value of the estimated parameters. Controlling for capital intensity, financial constraints are found to be more important for capital-intensive firms, probably due to their lower profitability indexes, higher fixed costs and higher degree of property diversification.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main object of this paper is to discuss the Bayes estimation of the regression coefficients in the elliptically distributed simple regression model with measurement errors. The posterior distribution for the line parameters is obtained in a closed form, considering the following: the ratio of the error variances is known, informative prior distribution for the error variance, and non-informative prior distributions for the regression coefficients and for the incidental parameters. We proved that the posterior distribution of the regression coefficients has at most two real modes. Situations with a single mode are more likely than those with two modes, especially in large samples. The precision of the modal estimators is studied by deriving the Hessian matrix, which although complicated can be computed numerically. The posterior mean is estimated by using the Gibbs sampling algorithm and approximations by normal distributions. The results are applied to a real data set and connections with results in the literature are reported. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the investment decisions considering the presence of financial constraints of 373 large Brazilian firms from 1997 to 2004, using panel data. A Bayesian econometric model was used considering ridge regression for multicollinearity problems among the variables in the model. Prior distributions are assumed for the parameters, classifying the model into random or fixed effects. We used a Bayesian approach to estimate the parameters, considering normal and Student t distributions for the error and assumed that the initial values for the lagged dependent variable are not fixed, but generated by a random process. The recursive predictive density criterion was used for model comparisons. Twenty models were tested and the results indicated that multicollinearity does influence the value of the estimated parameters. Controlling for capital intensity, financial constraints are found to be more important for capital-intensive firms, probably due to their lower profitability indexes, higher fixed costs and higher degree of property diversification.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dans ce mémoire, nous cherchons à modéliser des tables à deux entrées monotones en lignes et/ou en colonnes, pour une éventuelle application sur les tables de mortalité. Nous adoptons une approche bayésienne non paramétrique et représentons la forme fonctionnelle des données par splines bidimensionnelles. L’objectif consiste à condenser une table de mortalité, c’est-à-dire de réduire l’espace d’entreposage de la table en minimisant la perte d’information. De même, nous désirons étudier le temps nécessaire pour reconstituer la table. L’approximation doit conserver les mêmes propriétés que la table de référence, en particulier la monotonie des données. Nous travaillons avec une base de fonctions splines monotones afin d’imposer plus facilement la monotonie au modèle. En effet, la structure flexible des splines et leurs dérivées faciles à manipuler favorisent l’imposition de contraintes sur le modèle désiré. Après un rappel sur la modélisation unidimensionnelle de fonctions monotones, nous généralisons l’approche au cas bidimensionnel. Nous décrivons l’intégration des contraintes de monotonie dans le modèle a priori sous l’approche hiérarchique bayésienne. Ensuite, nous indiquons comment obtenir un estimateur a posteriori à l’aide des méthodes de Monte Carlo par chaînes de Markov. Finalement, nous étudions le comportement de notre estimateur en modélisant une table de la loi normale ainsi qu’une table t de distribution de Student. L’estimation de nos données d’intérêt, soit la table de mortalité, s’ensuit afin d’évaluer l’amélioration de leur accessibilité.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The considerable search for synergistic agents in cancer research is motivated by the therapeutic benefits achieved by combining anti-cancer agents. Synergistic agents make it possible to reduce dosage while maintaining or enhancing a desired effect. Other favorable outcomes of synergistic agents include reduction in toxicity and minimizing or delaying drug resistance. Dose-response assessment and drug-drug interaction analysis play an important part in the drug discovery process, however analysis are often poorly done. This dissertation is an effort to notably improve dose-response assessment and drug-drug interaction analysis. The most commonly used method in published analysis is the Median-Effect Principle/Combination Index method (Chou and Talalay, 1984). The Median-Effect Principle/Combination Index method leads to inefficiency by ignoring important sources of variation inherent in dose-response data and discarding data points that do not fit the Median-Effect Principle. Previous work has shown that the conventional method yields a high rate of false positives (Boik, Boik, Newman, 2008; Hennessey, Rosner, Bast, Chen, 2010) and, in some cases, low power to detect synergy. There is a great need for improving the current methodology. We developed a Bayesian framework for dose-response modeling and drug-drug interaction analysis. First, we developed a hierarchical meta-regression dose-response model that accounts for various sources of variation and uncertainty and allows one to incorporate knowledge from prior studies into the current analysis, thus offering a more efficient and reliable inference. Second, in the case that parametric dose-response models do not fit the data, we developed a practical and flexible nonparametric regression method for meta-analysis of independently repeated dose-response experiments. Third, and lastly, we developed a method, based on Loewe additivity that allows one to quantitatively assess interaction between two agents combined at a fixed dose ratio. The proposed method makes a comprehensive and honest account of uncertainty within drug interaction assessment. Extensive simulation studies show that the novel methodology improves the screening process of effective/synergistic agents and reduces the incidence of type I error. We consider an ovarian cancer cell line study that investigates the combined effect of DNA methylation inhibitors and histone deacetylation inhibitors in human ovarian cancer cell lines. The hypothesis is that the combination of DNA methylation inhibitors and histone deacetylation inhibitors will enhance antiproliferative activity in human ovarian cancer cell lines compared to treatment with each inhibitor alone. By applying the proposed Bayesian methodology, in vitro synergy was declared for DNA methylation inhibitor, 5-AZA-2'-deoxycytidine combined with one histone deacetylation inhibitor, suberoylanilide hydroxamic acid or trichostatin A in the cell lines HEY and SKOV3. This suggests potential new epigenetic therapies in cell growth inhibition of ovarian cancer cells.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a Bayesian framework for regression problems, which covers areas which are usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Regression problems are concerned with predicting the values of one or more continuous quantities, given the values of a number of input variables. For virtually every application of regression, however, it is also important to have an indication of the uncertainty in the predictions. Such uncertainties are expressed in terms of the error bars, which specify the standard deviation of the distribution of predictions about the mean. Accurate estimate of error bars is of practical importance especially when safety and reliability is an issue. The Bayesian view of regression leads naturally to two contributions to the error bars. The first arises from the intrinsic noise on the target data, while the second comes from the uncertainty in the values of the model parameters which manifests itself in the finite width of the posterior distribution over the space of these parameters. The Hessian matrix which involves the second derivatives of the error function with respect to the weights is needed for implementing the Bayesian formalism in general and estimating the error bars in particular. A study of different methods for evaluating this matrix is given with special emphasis on the outer product approximation method. The contribution of the uncertainty in model parameters to the error bars is a finite data size effect, which becomes negligible as the number of data points in the training set increases. A study of this contribution is given in relation to the distribution of data in input space. It is shown that the addition of data points to the training set can only reduce the local magnitude of the error bars or leave it unchanged. Using the asymptotic limit of an infinite data set, it is shown that the error bars have an approximate relation to the density of data in input space.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

L’intérêt principal de cette recherche porte sur la validation d’une méthode statistique en pharmaco-épidémiologie. Plus précisément, nous allons comparer les résultats d’une étude précédente réalisée avec un devis cas-témoins niché dans la cohorte utilisé pour tenir compte de l’exposition moyenne au traitement : – aux résultats obtenus dans un devis cohorte, en utilisant la variable exposition variant dans le temps, sans faire d’ajustement pour le temps passé depuis l’exposition ; – aux résultats obtenus en utilisant l’exposition cumulative pondérée par le passé récent ; – aux résultats obtenus selon la méthode bayésienne. Les covariables seront estimées par l’approche classique ainsi qu’en utilisant l’approche non paramétrique bayésienne. Pour la deuxième le moyennage bayésien des modèles sera utilisé pour modéliser l’incertitude face au choix des modèles. La technique utilisée dans l’approche bayésienne a été proposée en 1997 mais selon notre connaissance elle n’a pas été utilisée avec une variable dépendante du temps. Afin de modéliser l’effet cumulatif de l’exposition variant dans le temps, dans l’approche classique la fonction assignant les poids selon le passé récent sera estimée en utilisant des splines de régression. Afin de pouvoir comparer les résultats avec une étude précédemment réalisée, une cohorte de personnes ayant un diagnostique d’hypertension sera construite en utilisant les bases des données de la RAMQ et de Med-Echo. Le modèle de Cox incluant deux variables qui varient dans le temps sera utilisé. Les variables qui varient dans le temps considérées dans ce mémoire sont iv la variable dépendante (premier évènement cérébrovasculaire) et une des variables indépendantes, notamment l’exposition

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequen-tist MARS model, a support vector machine classifier and an optimally pruned classification tree.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We review several asymmetrical links for binary regression models and present a unified approach for two skew-probit links proposed in the literature. Moreover, under skew-probit link, conditions for the existence of the ML estimators and the posterior distribution under improper priors are established. The framework proposed here considers two sets of latent variables which are helpful to implement the Bayesian MCMC approach. A simulation study to criteria for models comparison is conducted and two applications are made. Using different Bayesian criteria we show that, for these data sets, the skew-probit links are better than alternative links proposed in the literature.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this study was to evaluate the performance of stacked species distribution models in predicting the alpha and gamma species diversity patterns of two important plant clades along elevation in the Andes. We modelled the distribution of the species in the Anthurium genus (53 species) and the Bromeliaceae family (89 species) using six modelling techniques. We combined all of the predictions for the same species in ensemble models based on two different criteria: the average of the rescaled predictions by all techniques and the average of the best techniques. The rescaled predictions were then reclassified into binary predictions (presence/absence). By stacking either the original predictions or binary predictions for both ensemble procedures, we obtained four different species richness models per taxa. The gamma and alpha diversity per elevation band (500 m) was also computed. To evaluate the prediction abilities for the four predictions of species richness and gamma diversity, the models were compared with the real data along an elevation gradient that was independently compiled by specialists. Finally, we also tested whether our richness models performed better than a null model of altitudinal changes of diversity based on the literature. Stacking of the ensemble prediction of the individual species models generated richness models that proved to be well correlated with the observed alpha diversity richness patterns along elevation and with the gamma diversity derived from the literature. Overall, these models tend to overpredict species richness. The use of the ensemble predictions from the species models built with different techniques seems very promising for modelling of species assemblages. Stacking of the binary models reduced the over-prediction, although more research is needed. The randomisation test proved to be a promising method for testing the performance of the stacked models, but other implementations may still be developed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Evidence suggests that children with developmental coordination disorder (DCD) have lower levels of cardiorespiratory fitness (CRF) compared to children without the condition. However, these studies were restricted to field-based methods in order to predict V02 peak in the determination of CRF. Such field tests have been criticised for their ability to provide a valid prediction of V02 peak and vulnerability to psychological aspects in children with DCD, such as low perceived adequacy toward physical activity. Moreover, the contribution of physical activity to the variance in V02 peak between the two groups is unknown. The purpose of our study was to determine the mediating role of physical activity and perceived adequacy towards physical activity on V02 peak in children with significant motor impairments. This prospective case-control design involved 122 (age 12-13 years) children with significant motor impairments (n=61) and healthy matched controls (n=61) based on age, gender and school location. Participants had been previously assessed for motor proficiency and classified as a probable DCD (p-DCD) or healthy control using the movement ABC test. V02 peak was measured by a progressive exercise test on a cycle ergometer. Perceived adequacy was measured using a 7 -item subscale from Children's Selfperception of Adequacy and Predilection for Physical Activity scale. Physical activity was monitored for seven days with the Actical® accelerometer. Children with p-DCD had significantly lower V02 peak (48.76±7.2 ml/ffm/min; p:50.05) compared to controls (53.12±8.2 ml/ffm/min), even after correcting for fat free mass. Regression analysis demonstrated that perceived adequacy and physical activity were significant mediators in the relationship between p-DCD and V02 peak. In conclusion, using a stringent laboratory assessment, the results of the current study verify the findings of earlier studies, adding low CRF to the list of health consequences associated with DCD. It seems that when testing for CRF in this population, there is a need to consider the psychological barriers associated with their condition. Moreover, strategies to increase physical activity in children with DCD may result in improvement in their CRF.