15 results for Variable-coefficients

in DigitalCommons@The Texas Medical Center


Relevance:

30.00%

Abstract:

The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data are multiply imputed. Missing data of predictor variables are multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process.

For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
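The abstract does not say how the logistic regression coefficients were combined across the imputed data sets. A common choice is Rubin's rules, sketched below as an assumption rather than as the author's procedure. The function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical estimates and squared standard errors from m = 3 imputations.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance grows with the disagreement between imputations, which is one reason a test based on it can become conservative when much of the data is missing.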

Relevance:

20.00%

Abstract:

The Lyme disease agent Borrelia burgdorferi can persistently infect humans and other animals despite host active immune responses. This is facilitated, in part, by the vls locus, a complex system consisting of the vlsE expression site and an adjacent set of 11 to 15 silent vls cassettes. Segments of nonexpressed cassettes recombine with the vlsE region during infection of mammalian hosts, resulting in combinatorial antigenic variation of the VlsE outer surface protein. We now demonstrate that synthesis of VlsE is regulated during the natural mammal-tick infectious cycle, being activated in mammals but repressed during tick colonization. Examination of cultured B. burgdorferi cells indicated that the spirochete controls vlsE transcription levels in response to environmental cues. Analysis of PvlsE::gfp fusions in B. burgdorferi indicated that VlsE production is controlled at the level of transcriptional initiation, and regions of 5' DNA involved in the regulation were identified. Electrophoretic mobility shift assays detected qualitative and quantitative changes in patterns of protein-DNA complexes formed between the vlsE promoter and cytoplasmic proteins, suggesting the involvement of DNA-binding proteins in the regulation of vlsE, with at least one protein acting as a transcriptional activator.

Relevance:

20.00%

Abstract:

An exact knowledge of the kinetic nature of the interaction between the stimulatory G protein (Gs) and the adenylyl cyclase catalytic unit (C) is essential for interpreting the effects of Gs mutations and expression levels on cellular response to a wide variety of hormones, drugs, and neurotransmitters. In particular, insight into the association of these proteins could lead to progress in tumor biology, where single spontaneous mutations in G proteins have been associated with the formation of tumors (118). The question this work attempts to answer is whether adenylyl cyclase activation by epinephrine-stimulated β2-adrenergic receptors occurs via Gs proteins by a Gs-to-C shuttle or a Gs-C precoupled mechanism. The two forms of activation are distinguishable by the effect of Gs levels on epinephrine-stimulated EC50 values for cyclase activation.

We have made stable transfectants of S49 cyc⁻ cells with the gene for the α protein of Gs (αs), which is under the control of the mouse mammary tumor virus LTR promoter (110). Expression of Gsα was then controlled by incubation of the cells for various times with 5 μM dexamethasone. Expression of Gsα led to the appearance of GTP shifts in the competitive binding of epinephrine with ¹²⁵ICYP to the β-adrenergic receptors and to agonist-dependent adenylyl cyclase activity. High expression of Gsα resulted in lower EC50 values for the adenylyl cyclase activity in response to epinephrine than did low expression. By kinetic modelling, this result is consistent with the existence of a shuttle mechanism for adenylyl cyclase activation by hormones.

One item of concern that remains to be addressed is the extent to which activation of adenylyl cyclase occurs by a "pure" shuttle mechanism. Kinetic and biochemical experiments by other investigators have revealed that adenylyl cyclase activation by hormones may occur via a Gs-C precoupled mechanism (80, 94, 97). Activation of adenylyl cyclase, therefore, probably does not occur by either a pure "Shuttle" or "Gs-C Precoupled" mechanism, but rather by a "Hybrid" mechanism. The extent to which either the shuttle or precoupled mechanism contributes to hormone-stimulated adenylyl cyclase activity is the subject of ongoing research.

Relevance:

20.00%

Abstract:

Public preferences for policy are formed in a little-understood process that is not adequately described by traditional economic theory of choice. In this paper I suggest that U.S. aggregate support for health reform can be modeled as tradeoffs among a small number of behavioral values and the stage of policy development. The theory underlying the model is based on Samuelson et al.'s (1986) work and Wilke's (1991) elaboration of it as the Greed/Efficiency/Fairness (GEF) hypothesis of motivation in the management of resource dilemmas, and behavioral economics informed by Kahneman and Thaler's prospect theory.

The model developed in this paper employs ordered probit econometric techniques applied to data derived from U.S. polls taken from 1990 to mid-2003 that measured support for health reform proposals. Outcome data are four-tiered Likert counts; independent variables are dummies representing the presence or absence of operationalizations of each behavioral variable, along with an integer representing policy process stage. Marginal effects of each independent variable predict how support levels change on triggering that variable. Model estimation results indicate a vanishingly small likelihood that all coefficients are zero, and all variables have the signs expected from model theory.

Three hypotheses were tested: support will drain from health reform policy as it becomes increasingly well-articulated and approaches enactment; reforms appealing to fairness through universal health coverage will enjoy a higher degree of support than those targeted more narrowly; health reforms calling for government operation of the health finance system will achieve lower support than those that do not. Model results support the first and last hypotheses. Contrary to expectations, universal health care proposals did not provide incremental support beyond those targeted to “deserving” populations (children, the elderly, working families). In addition, loss of autonomy (e.g., restrictions on choice of care giver) is found to be the “third rail” of health reform, with significantly reduced support. When applied to a hypothetical health reform in which an employer-mandated Medical Savings Account policy is the centerpiece, the model predicts support that may be insufficient for enactment. These results indicate that the method developed in the paper may prove valuable to health policy designers.

Relevance:

20.00%

Abstract:

Ordinal logistic regression models are used to analyze a dependent variable with multiple outcomes that can be ranked, but they have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable.

In this methodological study, four regression models are proposed. The first is the multinomial logistic model. The second is the adjacent-category logit model. The third is the proportional odds model and the fourth is the continuation-ratio model. We illustrate and compare the fit of these models using data from the survey designed by the University of Texas, School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in the completion of colorectal cancer screening (CRCS).

The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four ordinal logistic models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) sensitivity analysis by fitting and comparing different models.
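As an illustration of one of the four models, the sketch below turns proportional odds parameters into category probabilities. It assumes the parametrization P(Y <= j | x) = logistic(cutpoint_j - beta·x), which is one common convention and not necessarily the one used in the thesis. The function names and numbers are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(cutpoints, beta, x):
    """Category probabilities under a proportional odds model.

    cutpoints: increasing thresholds, one fewer than the number of categories
    beta:      regression coefficients shared by all cumulative logits
    x:         covariate values for one subject
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [logistic(a - eta) for a in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Hypothetical three-category outcome with a single covariate.
baseline = proportional_odds_probs([-1.0, 1.0], [0.5], [0.0])
shifted = proportional_odds_probs([-1.0, 1.0], [0.5], [2.0])
```

Because a single coefficient vector is shared across all cumulative logits, a positive coefficient under this sign convention moves probability toward the higher categories, as comparing `baseline` with `shifted` shows.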

Relevance:

20.00%

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities.

Random Forests can deal with large-scale data sets without rigorous data preprocessing. It has a robust variable importance ranking measure. Proposed is a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. The study's results support recommending Random Forests for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations.

We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.

Relevance:

20.00%

Abstract:

Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. Also, this study estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification were created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters. β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. The 0-1 misclassification of X1 was hypothesized as 2.98% and the mean of the simulated estimates was 1.54% and, in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X1, having both copies of the APOE 4 allele, changed from 1.377 to 1.418, demonstrating that the estimates of the odds ratio changed when the analysis included adjustment for misclassification.

Relevance:

20.00%

Abstract:

Studies on the relationship between psychosocial determinants and HIV risk behaviors have produced little evidence to support hypotheses based on theoretical relationships. One limitation inherent in many articles in the literature is the method of measurement of the determinants and the analytic approach selected.

To reduce the misclassification associated with unit scaling of measures specific to internalized homonegativity, I evaluated the psychometric properties of the Reactions to Homosexuality scale in a confirmatory factor analytic framework. In addition, I assessed the measurement invariance of the scale across racial/ethnic classifications in a sample of men who have sex with men. The resulting measure contained eight items loading on three first-order factors. Invariance assessment identified metric and partial strong invariance between racial/ethnic groups in the sample.

Application of the updated measure to a structural model allowed for the exploration of direct and indirect effects of internalized homonegativity on unprotected anal intercourse. Pathways identified in the model show that drug and alcohol use at last sexual encounter, the number of sexual partners in the previous three months and sexual compulsivity all contribute directly to risk behavior. Internalized homonegativity reduced the likelihood of exposure to drugs, alcohol or higher numbers of partners. For men who developed compulsive sexual behavior as a coping strategy for internalized homonegativity, there was an increase in the prevalence odds of risk behavior.

In the final stage of the analysis, I conducted a latent profile analysis of the items in the updated Reactions to Homosexuality scale. This analysis identified five distinct profiles, which suggested that the construct was not homogeneous in samples of men who have sex with men. Lack of prior consideration of these distinct manifestations of internalized homonegativity may have contributed to the analytic difficulty in identifying a relationship between the trait and high-risk sexual practices.

Relevance:

20.00%

Abstract:

A Bayesian approach to estimation of the regression coefficients of a multinomial logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least squares (BGLS) analysis. Two cases involving multinomial logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over other Bayesian approaches.

Relevance:

20.00%

Abstract:

Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias.

Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates, including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used the predictive distribution as a weight to average the logistic regressions performed on all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study. They are: (1) analysis based on only the complete cases, (2) substituting the mean of the observed values for the missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.

Relevance:

20.00%

Abstract:

Because of its simplicity and low cost, arm circumference (AC) is being used increasingly in screening for protein energy malnutrition among pre-school children in many parts of the developing world, especially where minimally trained health workers are employed. The objectives of this study were as follows: (1) To determine the relationship of the AC measure with weight for age and weight for height in the detection of malnutrition among pre-school children in a Guatemalan Indian village. (2) To determine the performance of minimally trained promoters under field conditions in measuring AC, weight and height. (3) To describe the practical aspects of taking AC measures versus weight, age and height.

The study was conducted in San Pablo La Laguna, one of four villages situated on the shores of Lake Atitlan, Guatemala, in which a program of simplified medical care was implemented by the Institute of Nutrition of Central America and Panama (INCAP). Weight, height, AC and age data were collected for 144 chronically malnourished children. The measurements obtained by the trained investigator under the controlled conditions of the health post were correlated against one another, and AC was found to have a correlation with weight for age of 0.7127 and with weight for height of 0.7911, both well within the 0.65 to 0.80 range reported in the literature. False positive and false negative analysis showed that AC was more sensitive when compared with weight for height than with weight for age. This was fortunate since, especially in areas with widespread chronic malnutrition, weight for height detects those acute cases in immediate danger of complicating illness or death. Moreover, most of the cases identified as malnourished by AC but not by weight for height (false positives) were either young or very stunted, which made their selection by AC better than by weight for height. The large number of cases detected by weight for age but not by AC (false negative rate: 40%) were, however, mostly beyond the critical age period and had normal weight for height.

The performance of AC, weight for height and weight for age under field conditions in the hands of minimally trained health workers was also analyzed by correlating these measurements against the same criterion measurements taken under the ideally controlled conditions of the health post. AC had the highest correlation with itself, indicating that it deteriorated the least in the move to the field. Moreover, there was a high correlation between AC in the field and criterion weight for height (0.7509); this correlation was almost as high as that for field weight for height versus the same measure in the health post (0.7588). The implication is that field errors are so great for the compounded weight for height variable that, in the field, AC is about as good a predictor of the ideal weight for height measure.

Minimally trained health workers made more errors than the investigator, as exemplified by their lower intra-observer correlation coefficients. Their measurements were consistently larger than the investigator's for all measures. Also, there was a great deal of variability between these minimally trained workers, indicating that careful training and follow-up are necessary for the success of the AC measure.

AC has many practical advantages compared to the other anthropometric tools. It does not require age data, which are often unreliable in these settings, and does not require the sophisticated subtraction and two-dimensional table-handling skills that weight for age and weight for height require. The measure is also more easily applied, with less disturbance to the child and the community. The AC tape is cheap and not easily damaged or jarred out of calibration while being transported in rugged settings, as is often the case with weight scales. Moreover, it can be kept in a health worker's pocket at all times for continual use in a wide range of settings.
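The false positive and false negative analysis described above rests on a two-by-two table comparing AC with a criterion measure. The sketch below shows the usual definitions. The function name and the counts are hypothetical and chosen only so that the false negative rate comes out at 40%; they are not the study's data.

```python
def screening_metrics(true_pos, false_pos, false_neg, true_neg):
    """Summarize a screening test against a criterion measure.

    true_pos:  malnourished by both the test and the criterion
    false_pos: malnourished by the test only
    false_neg: malnourished by the criterion only
    true_neg:  malnourished by neither
    """
    criterion_pos = true_pos + false_neg
    criterion_neg = false_pos + true_neg
    return {
        "sensitivity": true_pos / criterion_pos,
        "false_negative_rate": false_neg / criterion_pos,
        "specificity": true_neg / criterion_neg,
        "false_positive_rate": false_pos / criterion_neg,
    }


# Hypothetical counts: AC as the test, weight for age as the criterion.
metrics = screening_metrics(true_pos=60, false_pos=10, false_neg=40, true_neg=90)
```

Sensitivity and the false negative rate sum to one, so a 40% false negative rate against weight for age corresponds to a sensitivity of 60% against that criterion.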

Relevance:

20.00%

Abstract:

The purpose of this study was to examine, in the context of an economic model of health production, the relationship between inputs (health-influencing activities) and fitness.

Primary data were collected from 204 employees of a large insurance company at the time of their enrollment in an industrially based health promotion program. The inputs of production included medical care use, exercise, smoking, drinking, eating, coronary disease history, and obesity. The variables of age, gender and education, known to affect the production process, were also examined. Two estimates of fitness were used: self-report and a physiologic estimate based on exercise treadmill performance. Ordinary least squares and two-stage least squares regression analyses were used to estimate the fitness production functions.

In the production of self-reported fitness status, the coefficients for the exercise, smoking, eating, and drinking production inputs and the control variable of gender were statistically significant and possessed theoretically correct signs. In the production of physiologic fitness, exercise, smoking and gender were statistically significant. Exercise and gender were theoretically consistent while smoking was not. Results are compared with previous analyses of health production.

Relevance:

20.00%

Abstract:

Complex diseases, such as cancer, are caused by various genetic and environmental factors and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically challenging. The Bayesian generalized linear model with Student-t prior distributions on coefficients is a novel method to simultaneously analyze genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for the sample size required to achieve high power of variable selection using Bayesian generalized linear models, considering different scales of the number of candidate covariates.

Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions, including an interaction between SHMT1 and vitamin B12, an interaction between MTRR and total fat intake, and an interaction between MTR and alcohol use, were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.

Relevance:

20.00%

Abstract:

This thesis project is motivated by the potential problem of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. The instrumental variable (IV) method is one of the statistical tools for overcoming this problem. A Mendelian randomization study uses genetic variants as IVs in a genetic association study. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between risk of pancreatic cancer and the circulating levels of soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom the genotyping data were available (178 cases and 177 controls). A two-stage IV analysis using generalized method of moments-structural mean models (GMM-SMM) was conducted and the relative risk (RR) was calculated. In the first-stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was associated neither with risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second-stage analysis, the GMM-SMM model failed to converge due to non-concavity, probably because of the small sample size. Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 is a potentially good genetic IV for testing the causality between the risk of pancreatic cancer and sRAGE levels. A larger sample size is required to conduct a credible IV analysis.

Relevance:

20.00%

Abstract:

The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.

The study found that the Cg* statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a χ² distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.

The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.

Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal-size groups by separating ties should be avoided.
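For readers unfamiliar with the statistic, the sketch below computes a Hosmer-Lemeshow-type value from observed outcomes and fitted probabilities, grouping by quantiles of risk and never separating tied probabilities. It is a minimal illustration under those assumptions, not the simulation code or either tie-handling method from the thesis. The function name and data are hypothetical.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, number of groups actually formed).

    y: observed 0/1 outcomes
    p: fitted probabilities from a logistic model, same order as y
    Tied probabilities are kept in the same group, so groups may be unequal
    in size and fewer than requested. Assumes each group's mean fitted
    probability lies strictly between 0 and 1.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    bins = []
    start = 0
    for g in range(1, groups + 1):
        end = round(g * n / groups)
        # Push the boundary forward so identical probabilities stay together.
        while 0 < end < n and pairs[end][0] == pairs[end - 1][0]:
            end += 1
        if end > start:
            bins.append(pairs[start:end])
            start = end
    statistic = 0.0
    for group in bins:
        size = len(group)
        observed = sum(yi for _, yi in group)
        expected = sum(pi for pi, _ in group)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic, len(bins)


# Hypothetical fitted probabilities with two risk levels.
fitted = [0.25] * 4 + [0.75] * 4
good_fit = hosmer_lemeshow([0, 0, 0, 1, 1, 1, 1, 0], fitted, groups=2)
poor_fit = hosmer_lemeshow([0, 0, 0, 0, 1, 1, 1, 1], fitted, groups=2)
```

The value is conventionally compared with a chi-square distribution on the number of groups minus two degrees of freedom, subject to the caveats the abstract raises about categorical-only models and sparse groups.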