Biblioteca Digital

919 resultados para hierarchical generalized linear model

A Functional-Based Distribution Diagnostic for a Linear Model with Correlated Outcomes: Technical Report

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite the widespread popularity of linear models for correlated outcomes (e.g. linear mixed modesl and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. Linear functions or the resulting "rotated" residuals are used to construct an empirical cumulative distribution function (ECDF), whose stochastic limit is characterized. We describe a resampling technique that serves as a computationally efficient parametric bootstrap for generating representatives of the stochastic limit of the ECDF. Through functionals, such representatives are used to construct global tests for the hypothesis of normal margional errors. In addition, we demonstrate that the ECDF of the predicted random effects, as described by Lange and Ryan (1989), can be formulated as a special case of our approach. Thus, our method supports both omnibus and directed tests. Our method works well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series).

Classification Using Generalized Partial Least Squares

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions but they also present challenge of analyzing data with large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in the context of generalized linear regression based on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.

Cholesky Residuals for Assessing Normal Errors in a Linear Model with Correlated Outcomes: Technical Report

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite the widespread popularity of linear models for correlated outcomes (e.g. linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated margional residual vector by the Cholesky decomposition of the inverse of the estimated margional variance matrix. The resulting "rotated" residuals are used to construct an empirical cumulative distribution function and pointwise standard errors. The theoretical framework, including conditions and asymptotic properties, involves technical details that are motivated by Lange and Ryan (1989), Pierce (1982), and Randles (1982). Our method appears to work well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series). Our methods can produce satisfactory results even for models that do not satisfy all of the technical conditions stated in our theory.

Model Checking for ROC Regression Analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of continuous diagnostic test. To account for factors that might invluence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may render invalid and misleading results. To date practical model checking techniques suitable for validating existing ROC regression models are not yet available. In this paper, we develop cumulative residual based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual process and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the Cystic Fibrosis registry.

An insulin infusion advisory system for type 1 diabetes patients based on non-linear model predictive control methods

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, an Insulin Infusion Advisory System (IIAS) for Type 1 diabetes patients, which use insulin pumps for the Continuous Subcutaneous Insulin Infusion (CSII) is presented. The purpose of the system is to estimate the appropriate insulin infusion rates. The system is based on a Non-Linear Model Predictive Controller (NMPC) which uses a hybrid model. The model comprises a Compartmental Model (CM), which simulates the absorption of the glucose to the blood due to meal intakes, and a Neural Network (NN), which simulates the glucose-insulin kinetics. The NN is a Recurrent NN (RNN) trained with the Real Time Recurrent Learning (RTRL) algorithm. The output of the model consists of short term glucose predictions and provides input to the NMPC, in order for the latter to estimate the optimum insulin infusion rates. For the development and the evaluation of the IIAS, data generated from a Mathematical Model (MM) of a Type 1 diabetes patient have been used. The proposed control strategy is evaluated at multiple meal disturbances, various noise levels and additional time delays. The results indicate that the implemented IIAS is capable of handling multiple meals, which correspond to realistic meal profiles, large noise levels and time delays.

A novel method for evaluating an interaction effect of correlated continuous covariates in a linear model

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interaction effect is an important scientific interest for many areas of research. Common approach for investigating the interaction effect of two continuous covariates on a response variable is through a cross-product term in multiple linear regression. In epidemiological studies, the two-way analysis of variance (ANOVA) type of method has also been utilized to examine the interaction effect by replacing the continuous covariates with their discretized levels. However, the implications of model assumptions of either approach have not been examined and the statistical validation has only focused on the general method, not specifically for the interaction effect.^ In this dissertation, we investigated the validity of both approaches based on the mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when the interaction effect exists because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant variance assumptions required by ANOVA are not satisfied in the model where the continuous covariates are replaced with their discretized levels. Therefore, naïve application of ANOVA method may lead to an incorrect conclusion. ^ Given the problems identified above, we proposed a novel method modifying from the traditional ANOVA approach to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived based on the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. According to the simulation study, the proposed method is more powerful then the least squares regression and the ANOVA method in detecting the interaction effect when data comes from a trivariate normal distribution. The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and baseline age-by-weight interaction effect was found significant in predicting the change from baseline in NIHSS at Month-3 among patients received t-PA therapy.^

A Bayesian approach to estimating the regression coefficients of a multinomial logit model

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A Bayesian approach to estimation of the regression coefficients of a multinominal logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least square (BGLS) analysis. Two cases involving multinominal logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over Bayesian approaches. ^

Detecting genetic and nutritional lung cancer risk factors related to folate metabolism using Bayesian generalized linear models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex diseases, such as cancer, are caused by various genetic and environmental factors, and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically. Bayesian generalized linear models using student-t prior distributions on coefficients, is a novel method to simultaneously analyze genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for required sample size to achieve a high power of variable selection using Bayesian generalize linear models, considering different scales of number of candidate covariates. ^ Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions, including an interaction between SHMT1 and vitamin B12, an interaction between MTRR and total fat intake, and an interaction between MTR and alcohol use, were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.^

Application of the general linear model to assess the effect of missing data on bone marrow mononuclear cell therapy with infusion timing and follow-up MRI

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With most clinical trials, missing data presents a statistical problem in evaluating a treatment's efficacy. There are many methods commonly used to assess missing data; however, these methods leave room for bias to enter the study. This thesis was a secondary analysis on data taken from TIME, a phase 2 randomized clinical trial conducted to evaluate the safety and effect of the administration timing of bone marrow mononuclear cells (BMMNC) for subjects with acute myocardial infarction (AMI).^ We evaluated the effect of missing data by comparing the variance inflation factor (VIF) of the effect of therapy between all subjects and only subjects with complete data. Through the general linear model, an unbiased solution was made for the VIF of the treatment's efficacy using the weighted least squares method to incorporate missing data. Two groups were identified from the TIME data: 1) all subjects and 2) subjects with complete data (baseline and follow-up measurements). After the general solution was found for the VIF, it was migrated Excel 2010 to evaluate data from TIME. The resulting numerical value from the two groups was compared to assess the effect of missing data.^ The VIF values from the TIME study were considerably less in the group with missing data. By design, we varied the correlation factor in order to evaluate the VIFs of both groups. As the correlation factor increased, the VIF values increased at a faster rate in the group with only complete data. Furthermore, while varying the correlation factor, the number of subjects with missing data was also varied to see how missing data affects the VIF. When subjects with only baseline data was increased, we saw a significant rate increase in VIF values in the group with only complete data while the group with missing data saw a steady and consistent increase in the VIF. The same was seen when we varied the group with follow-up only data. This essentially showed that the VIFs steadily increased when missing data is not ignored. When missing data is ignored as with our comparison group, the VIF values sharply increase as correlation increases.^

Generalized optoelectronic model of series-connected multijunction solar cells

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The emission of light from each junction in a series-connected multijunction solar cell both complicates and elucidates the understanding of its performance under arbitrary conditions. Bringing together many recent advances in this understanding, we present a general 1-D model to describe luminescent coupling that arises from both voltage-driven electroluminescence and voltage-independent photoluminescence in nonideal junctions that include effects such as Sah-Noyce-Shockley (SNS) recombination with n ≠ 2, Auger recombination, shunt resistance, reverse-bias breakdown, series resistance, and significant dark area losses. The individual junction voltages and currents are experimentally determined from measured optical and electrical inputs and outputs of the device within the context of the model to fit parameters that describe the devices performance under arbitrary input conditions. Techniques to experimentally fit the model are demonstrated for a four-junction inverted metamorphic solar cell, and the predictions of the model are compared with concentrator flash measurements.

Mapping Whidbey Island and Possession Sound landslide susceptibility using an additive linear model

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Senior thesis written for Oceanography 445

Comparison of virulence gene profiles of Escherichia coli strains isolated from healthy and diarrheic swine

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A combination of uni- and multiplex PCR assays targeting 58 virulence genes (VGs) associated with Escherichia coli strains causing intestinal and extraintestinal disease in humans and other mammals was used to analyze the VG repertoire of 23 commensal E. coli isolates from healthy pigs and 52 clinical isolates associated with porcine neonatal diarrhea (ND) and postweaning diarrhea (PWD). The relationship between the presence and absence of VGs was interrogated using three statistical methods. According to the generalized linear model, 17 of 58 VGs were found to be significant (P < 0.05) in distinguishing between commensal and clinical isolates. Nine of the 17 genes represented by iha, hlyA, aidA, east1, aah, fimH, iroN(E).(coli), traT, and saa have not been previously identified as important VGs in clinical porcine isolates in Australia. The remaining eight VGs code for fimbriae (F4, F5, F18, and F41) and toxins (STa, STh, LT, and Stx2), normally associated with porcine enterotoxigenic E. coli. Agglomerative hierarchical algorithm analysis grouped E. coli strains into subclusters based primarily on their serogroup. Multivariate analyses of clonal relationships based on the 17 VGs were collapsed into two-dimensional space by principal coordinate analysis. PWD clones were distributed in two quadrants, separated from ND and commensal clones, which tended to cluster within one quadrant. Clonal subclusters within quadrants were highly correlated with serogroups. These methods of analysis provide different perspectives in our attempts to understand how commensal and clinical porcine enterotoxigenic E. coli strains have evolved and are engaged in the dynamic process of losing or acquiring VGs within the pig population.

Likelihood-based inference for tweedie generalized linear models

Relevância:

100.00% 100.00%

Publicador:

An upper bound on the Bayesian error bars for generalized linear regression

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the Bayesian framework, predictions for a regression problem are expressed in terms of a distribution of output values. The mode of this distribution corresponds to the most probable output, while the uncertainty associated with the predictions can conveniently be expressed in terms of error bars. In this paper we consider the evaluation of error bars in the context of the class of generalized linear regression models. We provide insights into the dependence of the error bars on the location of the data points and we derive an upper bound on the true error bars in terms of the contributions from individual data points which are themselves easily evaluated.

A hierarchical latent variable model for data visualization

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visualization has proven to be a powerful and widely-applicable tool the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.

«
1
2
3
4
5
6
7
8
...
61
62
»