784 resultados para ADAPTIVE REGRESSION SPLINES


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a numerically simple routine for locally adaptive smoothing. The locally heterogeneous regression function is modelled as a penalized spline with a smoothly varying smoothing parameter modelled as another penalized spline. This is being formulated as hierarchical mixed model, with spline coe±cients following a normal distribution, which by itself has a smooth structure over the variances. The modelling exercise is in line with Baladandayuthapani, Mallick & Carroll (2005) or Crainiceanu, Ruppert & Carroll (2006). But in contrast to these papers Laplace's method is used for estimation based on the marginal likelihood. This is numerically simple and fast and provides satisfactory results quickly. We also extend the idea to spatial smoothing and smoothing in the presence of non normal response.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this study was to evaluate the performance of stacked species distribution models in predicting the alpha and gamma species diversity patterns of two important plant clades along elevation in the Andes. We modelled the distribution of the species in the Anthurium genus (53 species) and the Bromeliaceae family (89 species) using six modelling techniques. We combined all of the predictions for the same species in ensemble models based on two different criteria: the average of the rescaled predictions by all techniques and the average of the best techniques. The rescaled predictions were then reclassified into binary predictions (presence/absence). By stacking either the original predictions or binary predictions for both ensemble procedures, we obtained four different species richness models per taxa. The gamma and alpha diversity per elevation band (500 m) was also computed. To evaluate the prediction abilities for the four predictions of species richness and gamma diversity, the models were compared with the real data along an elevation gradient that was independently compiled by specialists. Finally, we also tested whether our richness models performed better than a null model of altitudinal changes of diversity based on the literature. Stacking of the ensemble prediction of the individual species models generated richness models that proved to be well correlated with the observed alpha diversity richness patterns along elevation and with the gamma diversity derived from the literature. Overall, these models tend to overpredict species richness. The use of the ensemble predictions from the species models built with different techniques seems very promising for modelling of species assemblages. Stacking of the binary models reduced the over-prediction, although more research is needed. The randomisation test proved to be a promising method for testing the performance of the stacked models, but other implementations may still be developed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Within the regression framework, we show how different levels of nonlinearity influence the instantaneous firing rate prediction of single neurons. Nonlinearity can be achieved in several ways. In particular, we can enrich the predictor set with basis expansions of the input variables (enlarging the number of inputs) or train a simple but different model for each area of the data domain. Spline-based models are popular within the first category. Kernel smoothing methods fall into the second category. Whereas the first choice is useful for globally characterizing complex functions, the second is very handy for temporal data and is able to include inner-state subject variations. Also, interactions among stimuli are considered. We compare state-of-the-art firing rate prediction methods with some more sophisticated spline-based nonlinear methods: multivariate adaptive regression splines and sparse additive models. We also study the impact of kernel smoothing. Finally, we explore the combination of various local models in an incremental learning procedure. Our goal is to demonstrate that appropriate nonlinearity treatment can greatly improve the results. We test our hypothesis on both synthetic data and real neuronal recordings in cat primary visual cortex, giving a plausible explanation of the results from a biological perspective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

1Recent studies demonstrated the sensitivity of northern forest ecosystems to changes in the amount and duration of snow cover at annual to decadal time scales. However, the consequences of snowfall variability remain uncertain for ecological variables operating at longer time scales, especially the distributions of forest communities. 2The Great Lakes region of North America offers a unique setting to examine the long-term effects of variable snowfall on forest communities. Lake-effect snow produces a three-fold gradient in annual snowfall over tens of kilometres, and dramatic edaphic variations occur among landform types resulting from Quaternary glaciations. We tested the hypothesis that these factors interact to control the distributions of mesic (dominated by Acer saccharum, Tsuga canadensis and Fagus grandifolia) and xeric forests (dominated by Pinus and Quercus spp.) in northern Lower Michigan. 3We compiled pre-European-settlement vegetation data and overlaid these data with records of climate, water balance and soil, onto Landtype Association polygons in a geographical information system. We then used multivariate adaptive regression splines to model the abundance of mesic vegetation in relation to environmental controls. 4Snowfall is the most predictive among five variables retained by our model, and it affects model performance 29% more than soil texture, the second most important variable. The abundance of mesic trees is high on fine-textured soils regardless of snowfall, but it increases with snowfall on coarse-textured substrates. Lake-effect snowfall also determines the species composition within mesic forests. The weighted importance of A. saccharum is significantly greater than of T. canadensis or F. grandifolia within the lake-effect snowbelt, whereas T. canadensis is more plentiful outside the snowbelt. These patterns are probably driven by the influence of snowfall on soil moisture, nutrient availability and fire return intervals. 5Our results imply that a key factor dictating the spatio-temporal patterns of forest communities in the vast region around the Great Lakes is how the lake-effect snowfall regime responds to global change. Snowfall reductions will probably cause a major decrease in the abundance of ecologically and economically important species, such as A. saccharum.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62G08, 62P30.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Cette thèse porte sur l'analyse bayésienne de données fonctionnelles dans un contexte hydrologique. L'objectif principal est de modéliser des données d'écoulements d'eau d'une manière parcimonieuse tout en reproduisant adéquatement les caractéristiques statistiques de celles-ci. L'analyse de données fonctionnelles nous amène à considérer les séries chronologiques d'écoulements d'eau comme des fonctions à modéliser avec une méthode non paramétrique. Dans un premier temps, les fonctions sont rendues plus homogènes en les synchronisant. Ensuite, disposant d'un échantillon de courbes homogènes, nous procédons à la modélisation de leurs caractéristiques statistiques en faisant appel aux splines de régression bayésiennes dans un cadre probabiliste assez général. Plus spécifiquement, nous étudions une famille de distributions continues, qui inclut celles de la famille exponentielle, de laquelle les observations peuvent provenir. De plus, afin d'avoir un outil de modélisation non paramétrique flexible, nous traitons les noeuds intérieurs, qui définissent les éléments de la base des splines de régression, comme des quantités aléatoires. Nous utilisons alors le MCMC avec sauts réversibles afin d'explorer la distribution a posteriori des noeuds intérieurs. Afin de simplifier cette procédure dans notre contexte général de modélisation, nous considérons des approximations de la distribution marginale des observations, nommément une approximation basée sur le critère d'information de Schwarz et une autre qui fait appel à l'approximation de Laplace. En plus de modéliser la tendance centrale d'un échantillon de courbes, nous proposons aussi une méthodologie pour modéliser simultanément la tendance centrale et la dispersion de ces courbes, et ce dans notre cadre probabiliste général. Finalement, puisque nous étudions une diversité de distributions statistiques au niveau des observations, nous mettons de l'avant une approche afin de déterminer les distributions les plus adéquates pour un échantillon de courbes donné.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Dans ce mémoire, nous cherchons à modéliser des tables à deux entrées monotones en lignes et/ou en colonnes, pour une éventuelle application sur les tables de mortalité. Nous adoptons une approche bayésienne non paramétrique et représentons la forme fonctionnelle des données par splines bidimensionnelles. L’objectif consiste à condenser une table de mortalité, c’est-à-dire de réduire l’espace d’entreposage de la table en minimisant la perte d’information. De même, nous désirons étudier le temps nécessaire pour reconstituer la table. L’approximation doit conserver les mêmes propriétés que la table de référence, en particulier la monotonie des données. Nous travaillons avec une base de fonctions splines monotones afin d’imposer plus facilement la monotonie au modèle. En effet, la structure flexible des splines et leurs dérivées faciles à manipuler favorisent l’imposition de contraintes sur le modèle désiré. Après un rappel sur la modélisation unidimensionnelle de fonctions monotones, nous généralisons l’approche au cas bidimensionnel. Nous décrivons l’intégration des contraintes de monotonie dans le modèle a priori sous l’approche hiérarchique bayésienne. Ensuite, nous indiquons comment obtenir un estimateur a posteriori à l’aide des méthodes de Monte Carlo par chaînes de Markov. Finalement, nous étudions le comportement de notre estimateur en modélisant une table de la loi normale ainsi qu’une table t de distribution de Student. L’estimation de nos données d’intérêt, soit la table de mortalité, s’ensuit afin d’évaluer l’amélioration de leur accessibilité.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

L’intérêt principal de cette recherche porte sur la validation d’une méthode statistique en pharmaco-épidémiologie. Plus précisément, nous allons comparer les résultats d’une étude précédente réalisée avec un devis cas-témoins niché dans la cohorte utilisé pour tenir compte de l’exposition moyenne au traitement : – aux résultats obtenus dans un devis cohorte, en utilisant la variable exposition variant dans le temps, sans faire d’ajustement pour le temps passé depuis l’exposition ; – aux résultats obtenus en utilisant l’exposition cumulative pondérée par le passé récent ; – aux résultats obtenus selon la méthode bayésienne. Les covariables seront estimées par l’approche classique ainsi qu’en utilisant l’approche non paramétrique bayésienne. Pour la deuxième le moyennage bayésien des modèles sera utilisé pour modéliser l’incertitude face au choix des modèles. La technique utilisée dans l’approche bayésienne a été proposée en 1997 mais selon notre connaissance elle n’a pas été utilisée avec une variable dépendante du temps. Afin de modéliser l’effet cumulatif de l’exposition variant dans le temps, dans l’approche classique la fonction assignant les poids selon le passé récent sera estimée en utilisant des splines de régression. Afin de pouvoir comparer les résultats avec une étude précédemment réalisée, une cohorte de personnes ayant un diagnostique d’hypertension sera construite en utilisant les bases des données de la RAMQ et de Med-Echo. Le modèle de Cox incluant deux variables qui varient dans le temps sera utilisé. Les variables qui varient dans le temps considérées dans ce mémoire sont iv la variable dépendante (premier évènement cérébrovasculaire) et une des variables indépendantes, notamment l’exposition

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequen-tist MARS model, a support vector machine classifier and an optimally pruned classification tree.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The considerable search for synergistic agents in cancer research is motivated by the therapeutic benefits achieved by combining anti-cancer agents. Synergistic agents make it possible to reduce dosage while maintaining or enhancing a desired effect. Other favorable outcomes of synergistic agents include reduction in toxicity and minimizing or delaying drug resistance. Dose-response assessment and drug-drug interaction analysis play an important part in the drug discovery process, however analysis are often poorly done. This dissertation is an effort to notably improve dose-response assessment and drug-drug interaction analysis. The most commonly used method in published analysis is the Median-Effect Principle/Combination Index method (Chou and Talalay, 1984). The Median-Effect Principle/Combination Index method leads to inefficiency by ignoring important sources of variation inherent in dose-response data and discarding data points that do not fit the Median-Effect Principle. Previous work has shown that the conventional method yields a high rate of false positives (Boik, Boik, Newman, 2008; Hennessey, Rosner, Bast, Chen, 2010) and, in some cases, low power to detect synergy. There is a great need for improving the current methodology. We developed a Bayesian framework for dose-response modeling and drug-drug interaction analysis. First, we developed a hierarchical meta-regression dose-response model that accounts for various sources of variation and uncertainty and allows one to incorporate knowledge from prior studies into the current analysis, thus offering a more efficient and reliable inference. Second, in the case that parametric dose-response models do not fit the data, we developed a practical and flexible nonparametric regression method for meta-analysis of independently repeated dose-response experiments. Third, and lastly, we developed a method, based on Loewe additivity that allows one to quantitatively assess interaction between two agents combined at a fixed dose ratio. The proposed method makes a comprehensive and honest account of uncertainty within drug interaction assessment. Extensive simulation studies show that the novel methodology improves the screening process of effective/synergistic agents and reduces the incidence of type I error. We consider an ovarian cancer cell line study that investigates the combined effect of DNA methylation inhibitors and histone deacetylation inhibitors in human ovarian cancer cell lines. The hypothesis is that the combination of DNA methylation inhibitors and histone deacetylation inhibitors will enhance antiproliferative activity in human ovarian cancer cell lines compared to treatment with each inhibitor alone. By applying the proposed Bayesian methodology, in vitro synergy was declared for DNA methylation inhibitor, 5-AZA-2'-deoxycytidine combined with one histone deacetylation inhibitor, suberoylanilide hydroxamic acid or trichostatin A in the cell lines HEY and SKOV3. This suggests potential new epigenetic therapies in cell growth inhibition of ovarian cancer cells.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Sediment samples and hydrographic conditions were studied at 28 stations around Iceland. At these sites, Conductivity-Temperature-Depth (CTD) casts were conducted to collect hydrographic data and multicorer casts were conductd to collect data on sediment characteristics including grain size distribution, carbon and nitrogen concentration, and chloroplastic pigment concentration. A total of 14 environmental predictors were used to model sediment characteristics around Iceland on regional geographic space. For these, two approaches were used: Multivariate Adaptation Regression Splines (MARS) and randomForest regression models. RandomForest outperformed MARS in predicting grain size distribution. MARS models had a greater tendency to over- and underpredict sediment values in areas outside the environmental envelope defined by the training dataset. We provide first GIS layers on sediment characteristics around Iceland, that can be used as predictors in future models. Although models performed well, more samples, especially from the shelf areas, will be needed to improve the models in future.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper presents the general regression neural networks (GRNN) as a nonlinear regression method for the interpolation of monthly wind speeds in complex Alpine orography. GRNN is trained using data coming from Swiss meteorological networks to learn the statistical relationship between topographic features and wind speed. The terrain convexity, slope and exposure are considered by extracting features from the digital elevation model at different spatial scales using specialised convolution filters. A database of gridded monthly wind speeds is then constructed by applying GRNN in prediction mode during the period 1968-2008. This study demonstrates that using topographic features as inputs in GRNN significantly reduces cross-validation errors with respect to low-dimensional models integrating only geographical coordinates and terrain height for the interpolation of wind speed. The spatial predictability of wind speed is found to be lower in summer than in winter due to more complex and weaker wind-topography relationships. The relevance of these relationships is studied using an adaptive version of the GRNN algorithm which allows to select the useful terrain features by eliminating the noisy ones. This research provides a framework for extending the low-dimensional interpolation models to high-dimensional spaces by integrating additional features accounting for the topographic conditions at multiple spatial scales. Copyright (c) 2012 Royal Meteorological Society.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)