885 results for Bayesian ridge regression
Abstract:
Additive and nonadditive genetic effects on preweaning weight gain (PWG) of a commercial crossbred population were estimated using different genetic models and estimation methods. The data set consisted of 103,445 records on purebred and crossbred Nelore-Hereford calves raised under pasture conditions on farms located in the south, southeast, and midwest Brazilian regions. In addition to breed additive and dominance effects, models including different epistasis covariables were tested. Models considering joint additive and environment (latitude) by genetic effects interactions were also applied. In a first step, analyses were carried out under animal models. In a second step, preadjusted records were analyzed using ordinary least squares (OLS) and ridge regression (RR). The results reinforced evidence that breed additive and dominance effects are not sufficient to explain the observed variability in preweaning traits of Bos taurus x Bos indicus calves, and that genotype x environment interaction plays an important role in the evaluation of crossbred calves. Data were ill-conditioned for estimating the effects of genotype x environment interactions, and models including these effects presented multicollinearity problems. In this case, RR seemed to be a powerful tool for obtaining more plausible and stable estimates. Estimated prediction error variances and variance inflation factors were drastically reduced, and many effects that were not significant under ordinary least squares became significant under RR. Predictions of PWG based on RR estimates were more acceptable from a biological perspective. In temperate and subtropical regions, calves with intermediate genetic compositions (close to 1/2 Nelore) exhibited greater predicted PWG. In the tropics, predicted PWG increased linearly as the genotype got closer to Nelore. © 2006 American Society of Animal Science. All rights reserved.
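The abstract's central computational claim — that ridge regression stabilizes estimates and shrinks variance inflation factors when the data are ill-conditioned — can be illustrated with a small sketch. The following Python snippet is a hedged, generic illustration of that effect on simulated collinear predictors, not the authors' animal-model analysis; the variable names and the penalty value are hypothetical.

```python
# Minimal sketch: OLS vs. ridge regression on nearly collinear predictors.
# Illustrates the general effect described in the abstract; penalty alpha
# and the simulated data are arbitrary.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)       # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # true effects: 2 and 1

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)            # alpha chosen for illustration

print("OLS coefficients:  ", ols.coef_)        # unstable under collinearity
print("Ridge coefficients:", ridge.coef_)      # shrunken, more stable

# Variance inflation factor of x1: 1 / (1 - R^2 of x1 regressed on x2)
r2 = LinearRegression().fit(x2.reshape(-1, 1), x1).score(x2.reshape(-1, 1), x1)
print("VIF(x1) in the OLS design:", 1.0 / (1.0 - r2))
```

With the strong collinearity simulated here, the OLS coefficients drift far from the true values while the ridge estimates stay close to them, mirroring the stability gain reported in the abstract.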
Abstract:
This paper presents a general modeling approach to investigate and predict measurement errors in active energy meters of both induction and electronic types. The measurement error modeling is based on the Generalized Additive Model (GAM), the Ridge Regression method, and experimental results of the meters provided by a measurement system. The measurement system provides a database of 26 pairs of test waveforms captured in a real electrical distribution system, with different load characteristics (industrial, commercial, agricultural, and residential), covering different harmonic distortions and balanced and unbalanced voltage conditions. To illustrate the proposed approach, the measurement error models are discussed and several results derived from experimental tests are presented in the form of three-dimensional graphs and generalized as error equations. © 2009 IEEE.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
The study was designed to investigate the impact of air pollution on monthly inhalation/nebulization procedures in Ribeirão Preto, São Paulo State, Brazil, from 2004 to 2010. To assess the relationship between the procedures and particulate matter (PM10), a Bayesian Poisson regression model was used, including a random factor that captured extra-Poisson variability between counts. Particulate matter was associated with the monthly number of inhalation/nebulization procedures, but the inclusion of covariates (temperature, precipitation, and season of the year) suggests a possible confounding effect. Although other studies have linked particulate matter to an increasing number of visits due to respiratory morbidity, the results of this study suggest that such associations should be interpreted with caution.
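The model class described here — a Poisson regression on monthly counts with a random term absorbing extra-Poisson variability — can be sketched generically in PyMC. This is a hedged illustration under assumed variable names (pm10, counts) and simulated data, not the study's actual specification or priors.

```python
# Sketch of a Bayesian Poisson regression with an observation-level random
# effect for overdispersion, as a generic stand-in for the model class
# described in the abstract. Data, priors, and names are illustrative only.
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
months = 84                                   # e.g., a 2004-2010 monthly series
pm10 = rng.normal(50, 10, size=months)        # hypothetical PM10 values
counts = rng.poisson(200, size=months)        # hypothetical procedure counts

with pm.Model() as model:
    alpha = pm.Normal("alpha", 0.0, 10.0)
    beta = pm.Normal("beta", 0.0, 1.0)                  # PM10 effect
    sigma_u = pm.HalfNormal("sigma_u", 1.0)
    u = pm.Normal("u", 0.0, sigma_u, shape=months)      # extra-Poisson variability
    mu = pm.math.exp(alpha + beta * (pm10 - pm10.mean()) + u)
    pm.Poisson("y", mu=mu, observed=counts)
    idata = pm.sample(1000, tune=1000, chains=2)
```

Covariates such as temperature, precipitation, and season would enter the linear predictor in the same way as the centered PM10 term above.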
Abstract:
This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, more scalable to larger studies, and more easily fit than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and repeated measures within those individuals, since comparing diseased to non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and allow for time-varying covariates and segment-of-the-night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence between Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.
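The likelihood equivalence the authors rely on can be stated compactly. Assuming a piecewise constant hazard for subject i in interval j, with an event indicator and the time at risk in that interval, the survival log-likelihood contribution matches a Poisson log-likelihood with a log(time) offset. The block below is a generic statement of this classical identity under assumed notation, not a quotation from the paper.

```latex
% Piecewise-exponential contribution for subject i in interval j:
\ell_{ij} = d_{ij}\log\lambda_{ij} - \lambda_{ij} t_{ij},
\qquad \lambda_{ij} = \exp\!\left(x_{ij}^{\top}\beta\right).

% Poisson contribution for d_{ij} \sim \mathrm{Poisson}(\mu_{ij}) with offset \log t_{ij}:
\mu_{ij} = t_{ij}\exp\!\left(x_{ij}^{\top}\beta\right)
\;\Longrightarrow\;
\ell_{ij}^{\mathrm{Pois}} = d_{ij}\log\mu_{ij} - \mu_{ij} - \log(d_{ij}!)
= d_{ij}\log\lambda_{ij} - \lambda_{ij} t_{ij} + d_{ij}\log t_{ij} - \log(d_{ij}!).

% The extra terms do not involve \beta, so both likelihoods yield the same
% inference for the regression coefficients.
```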
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to the large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting. Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods from biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
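As a rough illustration of what a "sum-of-trees" survival formulation can look like, an AFT-style version (one of the model families mentioned above) may be written as below. This is a generic sketch under assumed notation (tree structures T_j and leaf parameters M_j), not the thesis's exact specification or priors.

```latex
% Generic AFT-style sum-of-trees model (illustrative notation):
\log T_i = \sum_{j=1}^{m} g\!\left(x_i;\, \mathcal{T}_j, \mathcal{M}_j\right)
           + \sigma\,\varepsilon_i,
\qquad \varepsilon_i \sim F_{\varepsilon},
% where g(x_i; T_j, M_j) is the prediction of regression tree T_j with leaf
% parameters M_j, and x_i collects gene expression and clinical covariates.
```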
Abstract:
Pancreatic cancer is the fourth most common cause of cancer death in the United States, with a five-year survival rate below 5% under current treatments, largely because it is usually detected at a late stage. Identifying a high-risk population in which to launch effective preventive strategies and interventions to control this highly lethal disease is urgently needed. The genetic etiology of pancreatic cancer has not been well profiled. We hypothesized that genetic variants left unidentified by previous genome-wide association studies (GWAS) of pancreatic cancer, owing to stringent statistical thresholds or missing interaction analyses, may be unveiled using alternative approaches. To achieve this aim, we explored genetic susceptibility to pancreatic cancer in terms of marginal associations of pathways and genes, as well as their interactions with risk factors. We conducted pathway- and gene-based analyses using GWAS data from 3141 pancreatic cancer patients and 3367 controls of European ancestry. Using the gene set ridge regression in association studies (GRASS) method, we analyzed 197 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Using the logistic kernel machine (LKM) test, we analyzed 17906 genes defined by the University of California Santa Cruz (UCSC) database. Using the likelihood ratio test (LRT) in a logistic regression model, we analyzed 177 pathways and 17906 genes for interactions with risk factors in 2028 pancreatic cancer patients and 2109 controls of European ancestry. After adjusting for multiple comparisons, six pathways were marginally associated with risk of pancreatic cancer (P < 0.00025): Fc epsilon RI signaling, maturity onset diabetes of the young, neuroactive ligand-receptor interaction, long-term depression (Ps < 0.0002), and the olfactory transduction and vascular smooth muscle contraction pathways (P = 0.0002). Nine genes were marginally associated with pancreatic cancer risk (P < 2.62 × 10^-5), including five previously reported genes (ABO, HNF1A, CLPTM1L, SHH and MYC) and four novel genes (OR13C4, OR13C3, KCNA6 and HNF4G). Three pathways significantly interacted with risk factors in modifying the risk of pancreatic cancer (P < 2.82 × 10^-4): the chemokine signaling pathway with obesity (P < 1.43 × 10^-4), the calcium signaling pathway (P < 2.27 × 10^-4) and the MAPK signaling pathway with diabetes (P < 2.77 × 10^-4). However, none of the 17906 genes tested for interactions survived correction for multiple comparisons. In summary, our GWAS study unveiled previously unidentified genetic susceptibility to pancreatic cancer using alternative methods. These novel findings provide new perspectives on genetic susceptibility to, and molecular mechanisms of, pancreatic cancer and, once confirmed, will shed light on the prevention and treatment of this disease.
Abstract:
Incremental (uplift) models are statistical models that were initially developed in the marketing field. They involve two groups, a control group and a treatment group, both compared with respect to a binary response variable (the response choices are "yes" or "no"). The purpose of these models is to detect the effect of the treatment on the individuals under study. Since these individuals are not all customers, we will call them "prospects". This effect can be negative, null, or positive depending on the characteristics of the individuals making up the different groups. The objective of this thesis is to compare incremental models from a Bayesian point of view and from a frequentist point of view. The incremental models used in practice are those of Lo (2002) and Lai (2004), which were originally formulated from a frequentist point of view. In this thesis, the Bayesian approach is therefore applied and compared with the frequentist approach. The simulations are carried out on data generated with logistic regressions. The parameters of these regressions are then estimated with Monte Carlo simulations in the Bayesian approach and compared with those obtained in the frequentist approach. Parameter estimation has a direct influence on the model's ability to correctly predict the effect of the treatment on individuals. We consider three prior distributions for Bayesian estimation of the parameters, chosen so that the priors are non-informative: the transformed beta distribution, the Cauchy distribution, and the normal distribution. Over the course of the study, we observe that Bayesian methods have a real positive impact on the targeting of individuals in small samples.
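As a point of reference for the model class being compared, Lo's (2002) incremental model is usually described as a logistic regression that includes a treatment indicator and treatment-by-covariate interactions, with uplift obtained as the difference between the predicted response probabilities under treatment and under control. The sketch below is a hedged, frequentist-flavored illustration of that idea on simulated data; the names and coefficients are hypothetical and it does not reproduce the thesis's Bayesian estimation.

```python
# Minimal sketch of an interaction-style incremental (uplift) model:
# fit P(y | x, treatment) with treatment-by-covariate interactions, then
# score uplift as P(y | x, T=1) - P(y | x, T=0). Illustrative data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n)                              # 0 = control, 1 = treatment
logit = -1.0 + 0.8 * x[:, 0] + t * (0.5 - 1.0 * x[:, 1])    # heterogeneous treatment effect
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

def design(x, t):
    # covariates, treatment indicator, and treatment-by-covariate interactions
    return np.column_stack([x, t, x * t[:, None]])

model = LogisticRegression(max_iter=1000).fit(design(x, t), y)

uplift = (model.predict_proba(design(x, np.ones(n, dtype=int)))[:, 1]
          - model.predict_proba(design(x, np.zeros(n, dtype=int)))[:, 1])
print("mean predicted uplift:", uplift.mean())
```

In a Bayesian variant of the same model, the regression coefficients would receive the priors discussed in the thesis (transformed beta, Cauchy, or normal) and uplift would be averaged over their posterior draws.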
Abstract:
Viruses play a key role in the complex aetiology of bovine respiratory disease (BRD). Bovine viral diarrhoea virus 1 (BVDV-1) is widespread in Australia and has been shown to contribute to BRD occurrence. As part of a prospective longitudinal study on BRD, effects of exposure to BVDV-1 on risk of BRD in Australian feedlot cattle were investigated. A total of 35,160 animals were enrolled at induction (when animals were identified and characteristics recorded), held in feedlot pens with other cattle (cohorts) and monitored for occurrence of BRD over the first 50 days following induction. Biological samples collected from all animals were tested to determine which animals were persistently infected (PI) with BVDV-1. Data obtained from the Australian National Livestock Identification System database were used to determine which groups of animals that had been together at the farm of origin and at 28 days prior to induction (and were enrolled in the study) contained a PI animal, and hence to identify animals that had probably been exposed to a PI animal prior to induction. Multi-level Bayesian logistic regression models were fitted to estimate the effects of exposure to BVDV-1 on the risk of occurrence of BRD. Although only a total of 85 study animals (0.24%) were identified as being PI with BVDV-1, BVDV-1 was detected by quantitative polymerase chain reaction in 59% of cohorts. The PI animals were at moderately increased risk of BRD (OR 1.9; 95% credible interval 1.0-3.2). Exposure to BVDV-1 in the cohort was also associated with a moderately increased risk of BRD (OR 1.7; 95% credible interval 1.1-2.5) regardless of whether or not a PI animal was identified within the cohort. Additional analyses indicated that a single quantitative real-time PCR test is useful for distinguishing PI animals from transiently infected animals. The results of the study suggest that removal of PI animals and/or vaccination, both before feedlot entry, would reduce the impact of BVDV-1 on BRD risk in cattle in Australian feedlots. Economic assessment of these strategies under Australian conditions is required. © 2016 Elsevier B.V.
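The core model named here — a multi-level Bayesian logistic regression with animals grouped into feedlot cohorts — can be sketched generically as a random-intercept logistic model. The snippet below is an illustrative PyMC sketch under hypothetical variable names (pi_status, exposed_cohort, cohort_idx) and simulated data, not the study's actual model or priors.

```python
# Generic sketch of a multi-level (random-intercept) Bayesian logistic
# regression for a binary disease outcome, with a cohort-level random effect.
# All data and names are hypothetical placeholders.
import numpy as np
import pymc as pm

rng = np.random.default_rng(3)
n_cohorts, n_animals = 30, 3000
cohort_idx = rng.integers(0, n_cohorts, size=n_animals)
pi_status = rng.binomial(1, 0.003, size=n_animals)       # persistently infected animal
exposed_cohort = rng.binomial(1, 0.6, size=n_animals)    # BVDV-1 detected in cohort
brd = rng.binomial(1, 0.15, size=n_animals)              # BRD occurrence

with pm.Model() as model:
    intercept = pm.Normal("intercept", 0.0, 2.0)
    b_pi = pm.Normal("b_pi", 0.0, 1.0)
    b_exposed = pm.Normal("b_exposed", 0.0, 1.0)
    sigma_c = pm.HalfNormal("sigma_c", 1.0)
    u_cohort = pm.Normal("u_cohort", 0.0, sigma_c, shape=n_cohorts)
    logit_p = (intercept + b_pi * pi_status
               + b_exposed * exposed_cohort + u_cohort[cohort_idx])
    pm.Bernoulli("brd", logit_p=logit_p, observed=brd)
    idata = pm.sample(1000, tune=1000, chains=2)
```

Posterior draws of exp(b_pi) and exp(b_exposed) would correspond to the odds ratios with credible intervals reported in the abstract.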
Abstract:
There is a need to identify factors that are able to influence health in old age and to develop interventions that could slow down the process of aging and its associated pathologies. Lifestyle modifications, and especially nutrition, appear to be promising strategies to promote healthy aging, but their impact on aging biomarkers has been poorly investigated. In the first part of this work, we evaluated the impact of a one-year Mediterranean-like diet, delivered within the framework of the NU-AGE project in 120 elderly subjects, on epigenetic age acceleration measures assessed with Horvath's clock. We observed a rejuvenation of participants after the nutritional intervention. The effect was more marked in the group of Polish females and in subjects who were epigenetically older at baseline. In the second part of this work, we developed a new model of epigenetic biomarker, based on a gene-targeted approach with the EpiTYPER® system. We selected six regions of interest (associated with the ELOVL2, NHLRC1, SIRT7/MAFG, AIM2, EDARADD and TFAP2E genes) and constructed our model through a ridge regression analysis. In controls, estimation of chronological age was accurate, with a correlation coefficient between predicted and chronological age of 0.92 and a mean absolute deviation of 4.70 years. Our model was able to capture phenomena of accelerated or decelerated aging, in subjects with Down syndrome and in centenarians and their offspring, respectively. Applying our model to samples from the NU-AGE project, we observed results similar to those obtained with the canonical epigenetic clock, with a rejuvenation of the individuals after one year of nutritional intervention. Together, our findings indicate that nutrition can promote epigenetic rejuvenation and that epigenetic age acceleration measures could be suitable biomarkers to evaluate the impact of such interventions. We demonstrated that the effect of the dietary intervention is country-, sex- and individual-specific, thus suggesting the need for a personalized approach to nutritional interventions.
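The age-prediction model described here — a ridge regression trained on DNA methylation measurements from a handful of targeted regions — can be sketched generically. The snippet below is an illustrative Python sketch on simulated methylation values for six hypothetical regions (named after the genes mentioned above); it does not use the thesis's EpiTYPER data or coefficients.

```python
# Sketch of an epigenetic-clock-style age predictor: ridge regression on
# methylation levels from a small number of targeted regions. Simulated,
# illustrative data only; region names mirror the genes in the abstract.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
regions = ["ELOVL2", "NHLRC1", "SIRT7_MAFG", "AIM2", "EDARADD", "TFAP2E"]
n = 400
age = rng.uniform(20, 100, size=n)
# Hypothetical linear age trends in methylation, plus measurement noise.
slopes = rng.normal(0.004, 0.002, size=len(regions))
meth = 0.3 + np.outer(age, slopes) + rng.normal(0.0, 0.03, size=(n, len(regions)))

X_train, X_test, y_train, y_test = train_test_split(meth, age, random_state=0)
clock = Ridge(alpha=1.0).fit(X_train, y_train)     # penalty value is arbitrary

pred = clock.predict(X_test)
print("mean absolute deviation (years):", np.mean(np.abs(pred - y_test)))
print("correlation (predicted vs chronological):", np.corrcoef(pred, y_test)[0, 1])
```

Age acceleration would then be computed as the residual of predicted age on chronological age, and compared before and after an intervention.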
Abstract:
This paper considers the instrumental variable regression model when there is uncertainty about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainty can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very flexible and can be easily adapted to analyze any of the different priors that have been proposed in the Bayesian instrumental variables literature. We show how to calculate the probability of any relevant restriction (e.g. the posterior probability that over-identifying restrictions hold) and discuss diagnostic checking using the posterior distribution of discrepancy vectors. We illustrate our methods in a returns-to-schooling application.
Abstract:
This paper proposes a common and tractable framework for analyzing different definitions of fixed and random effects in a constant-slope, variable-intercept model. It is shown that, regardless of whether effects (i) are treated as parameters or as an error term, (ii) are estimated in different stages of a hierarchical model, or whether (iii) correlation between effects and regressors is allowed, when the same information on effects is introduced into all estimation methods, the resulting slope estimator is also the same across methods. If different methods produce different results, it is ultimately because different information is being used for each method.
Abstract:
In this paper, we compare the performance of two statistical approaches for the analysis of data obtained from social research. In the first approach, we use normal models with joint regression modelling for the mean and for the variance heterogeneity. In the second approach, we use hierarchical models. In the first case, individual and social variables are included as explanatory variables in the regression modelling for the mean and for the variance, while in the second case, the variance at level 1 of the hierarchical model depends on the individuals (age of the individuals), and at level 2 of the hierarchical model the variance is assumed to change according to socioeconomic stratum. Applying these methodologies, we analyze a Colombian height data set to find differences that can be explained by socioeconomic conditions. We also present some theoretical and empirical results concerning the two models. From this comparative study, we conclude that it is better to jointly model the mean and the variance heterogeneity in all cases. We also observe that convergence of the Gibbs sampling chain used in the Markov Chain Monte Carlo method for jointly modeling the mean and variance heterogeneity is quickly achieved.
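The first approach described above — a normal model with one regression for the mean and another for the variance heterogeneity — can be written generically as y ~ Normal(X*beta, exp(Z*gamma)). The PyMC sketch below is a hedged illustration of that joint specification with simulated data and made-up covariate names; it is not the paper's Colombian height analysis or its Gibbs sampler.

```python
# Sketch of a normal model with joint regression for the mean and for the
# log-standard-deviation (variance heterogeneity). Simulated data only.
import numpy as np
import pymc as pm

rng = np.random.default_rng(5)
n = 1000
age = rng.uniform(5, 18, size=n)
stratum = rng.integers(0, 3, size=n)             # hypothetical socioeconomic stratum
height = 100 + 5 * age + rng.normal(0, 4 + stratum, size=n)

with pm.Model() as model:
    # Mean model: height ~ age
    b0 = pm.Normal("b0", 100.0, 50.0)
    b1 = pm.Normal("b1", 0.0, 10.0)
    # Variance model: log sigma ~ stratum
    g0 = pm.Normal("g0", 0.0, 2.0)
    g1 = pm.Normal("g1", 0.0, 1.0)
    mu = b0 + b1 * age
    sigma = pm.math.exp(g0 + g1 * stratum)
    pm.Normal("height", mu=mu, sigma=sigma, observed=height)
    idata = pm.sample(1000, tune=1000, chains=2)
```

The hierarchical alternative discussed in the abstract would instead place the variance structure on grouped levels (individuals at level 1, socioeconomic strata at level 2) rather than in a single regression for the log-variance.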