12 results for variable aggregation
in DigitalCommons@The Texas Medical Center
Abstract:
Alzheimer's disease (AD) is characterized by the cerebral accumulation of misfolded and aggregated amyloid-beta protein (Abeta). Disease symptoms can be alleviated, in vitro and in vivo, by 'beta-sheet breaker' pentapeptides that reduce plaque load. However, the peptide nature of these compounds makes them biologically unstable and unable to penetrate membranes with high efficiency. The main goal of this study was to use computational methods to identify small molecule mimetics with better drug-like properties. For this purpose, the docked conformations of the active peptides were used to identify compounds with similar activities. A series of related beta-sheet breaker peptides were docked to solid state NMR structures of a fibrillar form of Abeta. The lowest energy conformations of the active peptides were used to design three-dimensional (3D) pharmacophores, suitable for screening the NCI database with Unity. Small molecular weight compounds with physicochemical features and a conformation similar to the active peptides were selected and ranked by docking and biochemical parameters. Of 16 diverse compounds selected for experimental screening, 2 prevented and reversed Abeta aggregation at 2-3 μM concentration, as measured by Thioflavin T (ThT) fluorescence and ELISA assays. They also prevented the toxic effects of aggregated Abeta on neuroblastoma cells. Their low molecular weight and aqueous solubility make them promising lead compounds for treating AD.
Abstract:
The Lyme disease agent Borrelia burgdorferi can persistently infect humans and other animals despite host active immune responses. This is facilitated, in part, by the vls locus, a complex system consisting of the vlsE expression site and an adjacent set of 11 to 15 silent vls cassettes. Segments of nonexpressed cassettes recombine with the vlsE region during infection of mammalian hosts, resulting in combinatorial antigenic variation of the VlsE outer surface protein. We now demonstrate that synthesis of VlsE is regulated during the natural mammal-tick infectious cycle, being activated in mammals but repressed during tick colonization. Examination of cultured B. burgdorferi cells indicated that the spirochete controls vlsE transcription levels in response to environmental cues. Analysis of PvlsE::gfp fusions in B. burgdorferi indicated that VlsE production is controlled at the level of transcriptional initiation, and regions of 5' DNA involved in the regulation were identified. Electrophoretic mobility shift assays detected qualitative and quantitative changes in patterns of protein-DNA complexes formed between the vlsE promoter and cytoplasmic proteins, suggesting the involvement of DNA-binding proteins in the regulation of vlsE, with at least one protein acting as a transcriptional activator.
Abstract:
An exact knowledge of the kinetic nature of the interaction between the stimulatory G protein (Gs) and the adenylyl cyclase catalytic unit (C) is essential for interpreting the effects of Gs mutations and expression levels on cellular response to a wide variety of hormones, drugs, and neurotransmitters. In particular, insight into the association of these proteins could lead to progress in tumor biology, where single spontaneous mutations in G proteins have been associated with the formation of tumors (118). The question this work attempts to answer is whether adenylyl cyclase activation by epinephrine-stimulated β2-adrenergic receptors occurs via Gs proteins by a Gs-to-C shuttle or a Gs-C precoupled mechanism. The two forms of activation are distinguishable by the effect of Gs levels on epinephrine-stimulated EC50 values for cyclase activation.
We have made stable transfectants of S49 cyc⁻ cells with the gene for the α protein of Gs (αs), which is under the control of the mouse mammary tumor virus LTR promoter (110). Expression of Gsα was then controlled by incubation of the cells for various times with 5 μM dexamethasone. Expression of Gsα led to the appearance of GTP shifts in the competitive binding of epinephrine with ¹²⁵ICYP to the β-adrenergic receptors and to agonist-dependent adenylyl cyclase activity. High expression of Gsα resulted in lower EC50 values for the adenylyl cyclase activity in response to epinephrine than did low expression. By kinetic modelling, this result is consistent with the existence of a shuttle mechanism for adenylyl cyclase activation by hormones.
One item of concern that remains to be addressed is the extent to which activation of adenylyl cyclase occurs by a "pure" shuttle mechanism. Kinetic and biochemical experiments by other investigators have revealed that adenylyl cyclase activation by hormones may occur via a Gs-C precoupled mechanism (80, 94, 97). Activation of adenylyl cyclase, therefore, probably does not occur by either a pure "Shuttle" or "Gs-C Precoupled" mechanism, but rather by a "Hybrid" mechanism. The extent to which either the shuttle or precoupled mechanism contributes to hormone-stimulated adenylyl cyclase activity is the subject of ongoing research.
Abstract:
Prostate cancer (PC) is a significant economic and health burden in the U.S. and Europe, but its causes are largely unknown. The most significant risk factors (after gender) are age and family history of the disease. A gene with high penetrance but low frequency on chromosome 1q, HPC1, has been suggested to cause a proportion of the familial aggregation of PC, but other more common genes, conferring less risk, are also thought to contribute to disease predisposition. We have pursued a strategy to study both types of genetic risk in PC. To identify high-penetrance genes, affected men from thirteen families have been genotyped for genetic linkage analysis at six microsatellite markers spanning 45 cM of 1q24-25. Both LOD score and non-parametric statistics provide no significant support for HPC1 in this genomic region, although 3 of the families did combine to produce a LOD score of 0.9. These families will be included in a genome-wide search for other PC predisposition genes as part of a multinational collaboration.
For study of common genetic factors in PC development, leukocyte DNA samples from an unselected series of 55 patients and 67 controls have been examined for genetic differences in two other candidate genes, the androgen receptor gene, hAR, at Xq11-12, and the vitamin D receptor gene, hVDR, at 12q12-14. hAR was typed for two trinucleotide repeat length polymorphisms, (CAG)n and (GGC)n, encoding polyglutamine and polyglycine tracts, respectively, which have been implicated in PC susceptibility. These data, combined with similarly processed patients and controls from the U.K., show no consistent association of allele length with PC risk. A novel finding, however, has been a significant association between the number of GGC repeats and the length of time between diagnosis and relapse in stage T1-T4 Caucasian patients, irrespective of therapy and age of the patient. Of 49 patients who relapsed out of 108 entering the study, those with 16 or fewer GGC repeats had an average relapse-free period of 101 (±7.7) months, while for those with more than 16 repeats the period averaged 48 (±2.9) months, a difference of 2.1-fold or 4.4 years.
The second gene, hVDR, was genotyped at two polymorphisms, a synonymous C/T substitution in exon 9 identified by differential TaqI enzymatic digestion and a variable-length polyA tract in the 3′ UTR. Although these polymorphisms are in strong linkage disequilibrium, only the polyA region showed a possible association with PC risk. Men homozygous for alleles with fewer than 18 A's had an increased risk (OR = 3.0, p = 0.0578) compared to controls. This result is opposite to the findings of others and may either indicate off-setting random errors which together balance out to no significant overall effect or reflect more complex genetic and/or environmental associations.
Overall, this research suggests that single gene familial predisposition may be less prominent in PC than in other cancers and that the characteristics of PC pathology may be useful in identifying the effects of common genetic factors.
Abstract:
The genetic etiology of stroke likely reflects the influence of multiple loci with small effects, each modulating different pathophysiological processes. This research project utilized three analytical strategies to address the paucity of information related to the identification and characterization of genetic variation associated with stroke in the general population.
First, the general contribution of familial factors to stroke susceptibility was evaluated in a population-based sample of unrelated individuals. Increased risk of subclinical cerebral infarction was observed among individuals with a positive parental history of stroke. This association did not appear to be mediated by established stroke risk factors, specifically blood pressure levels or hypertension status.
The need to identify specific gene variation associated with stroke in the general population was addressed by evaluating seven candidate gene polymorphisms in a population-based sample of unrelated individuals. Three polymorphisms were significantly associated with increased subclinical cerebral infarction or incident clinical ischemic stroke risk. These relationships include the G-protein β3 subunit 825C/T polymorphism and clinical stroke in Whites, the lipoprotein lipase S/X447 polymorphism and subclinical and clinical stroke in men, and the angiotensin I-converting enzyme Ins/Del polymorphism and subclinical stroke in White men. These associations did not appear to be obfuscated by the stroke risk factors adjusted for in the analysis models, specifically blood pressure levels or anti-hypertensive medication use.
The final research strategy considered, on a genome-wide scale, the idea that genetic variation may contribute to the occurrence of hypertension or stroke through a common etiologic pathway. Genomic regions were identified for which significant evidence of heterogeneity was observed among hypertensive sibpairs stratified by family history of stroke information. Regions identified on chromosome 15 in African Americans, and chromosome 13 in Whites and African Americans, suggest the presence of genes influencing hypertension and stroke susceptibility.
Insight into the role of genetics in stroke is useful for the potential early identification of individuals at increased risk for stroke and improved understanding of the etiology of the disease. The ultimate goal of these endeavors is to guide the development of therapeutic intervention and informed prevention to provide a lasting and positive impact on public health.
Abstract:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data are multiply imputed. Missing data on predictor variables are multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process.
For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
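The abstract above does not say how the regression coefficients from the multiply imputed data sets were combined. A common choice is Rubin's rules, and the following minimal sketch assumes that choice; the function name `pool_rubin` and the input numbers are illustrative only and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    Assumption: the study may have pooled differently; this is a generic sketch.
    """
    m = len(estimates)
    q_bar = mean(estimates)                     # pooled point estimate
    u_bar = mean(se ** 2 for se in std_errors)  # within-imputation variance
    b = variance(estimates)                     # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b             # total variance
    return q_bar, math.sqrt(total)


# Illustrative inputs: the same logistic regression coefficient from 3 imputations.
estimate, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(estimate, se)
```

A Type I error rate in a simulation of this kind would then be the share of replications in which the pooled estimate divided by its pooled standard error exceeds the critical value although the true coefficient is zero.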
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities.
Random Forests can deal with large-scale data sets without rigorous data preprocessing. It has a robust variable importance ranking measure. Proposed is a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.
When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is recommended for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations.
We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
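The abstract does not spell out how the data noise level was turned into a cut-off. One possible reading, sketched below, is to compute importance scores (for example, mean decrease in accuracy) for deliberately added random noise variables and to keep only the real predictors whose importance exceeds a chosen quantile of those noise scores. The function `select_above_noise`, the variable names, and all numbers are hypothetical.

```python
import math


def select_above_noise(importances, noise_importances, quantile=1.0):
    """Keep predictors whose importance exceeds a noise-based cut-off.

    importances: dict mapping predictor name -> importance score.
    noise_importances: scores obtained for pure-noise variables (assumed input).
    quantile: which quantile of the noise scores serves as the cut-off
              (1.0 means the largest noise score).
    """
    ranked = sorted(noise_importances)
    index = min(len(ranked) - 1, max(0, math.ceil(quantile * len(ranked)) - 1))
    cutoff = ranked[index]
    selected = [name for name, score in importances.items() if score > cutoff]
    selected.sort(key=lambda name: importances[name], reverse=True)
    return cutoff, selected


# Hypothetical importance scores, not results from the thesis.
scores = {"smoking": 0.031, "age": 0.012, "snp_17": 0.002, "snp_42": -0.001}
noise_scores = [0.001, 0.003, -0.002, 0.0005]
print(select_above_noise(scores, noise_scores))
```

Lowering `quantile` loosens the cut-off, which mirrors the abstract's remark that the threshold can be adjusted after synthetic positive control experiments.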
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. Also, this study estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification were created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters. β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. The X1 0-1 misclassification was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X1, having both copies of the APOE 4 allele, changed from 1.377 to 1.418, demonstrating that the estimates of the odds ratio changed when the analysis included adjustment for misclassification.
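The likelihood described above can be sketched as follows, assuming a proportional-odds form for the ordinal model and misclassification of the outcome only; the abstract does not state the exact parameterization, so the cutpoints, the misclassification matrix, and the function names are hypothetical. The coefficients 0.50 and 0.04 are the hypothesized values quoted in the abstract.

```python
import math


def category_probs(x, betas, cutpoints):
    """P(Y = k | x) under a proportional-odds model (assumed form):
    logit P(Y <= k | x) = cutpoint_k - sum(beta_i * x_i)."""
    eta = sum(b * xi for b, xi in zip(betas, x))
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


def observed_probs(true_probs, misclass):
    """Mix true-category probabilities through misclass[k][j] = P(observed j | true k)."""
    n = len(true_probs)
    return [sum(true_probs[k] * misclass[k][j] for k in range(n)) for j in range(n)]


def neg_log_lik(data, betas, cutpoints, misclass):
    """Negative log-likelihood of observed (possibly misclassified) outcomes."""
    total = 0.0
    for x, y_obs in data:
        probs = observed_probs(category_probs(x, betas, cutpoints), misclass)
        total -= math.log(probs[y_obs])
    return total


# Hypothetical cutpoints, misclassification rates, and toy records.
betas = [0.50, 0.04]
cutpoints = [0.0, 1.5]
misclass = [[0.97, 0.03, 0.00], [0.02, 0.96, 0.02], [0.00, 0.05, 0.95]]
data = [([1.0, 10.0], 0), ([0.0, 25.0], 2), ([1.0, 40.0], 1)]
print(neg_log_lik(data, betas, cutpoints, misclass))
```

A derivative-free optimizer such as `scipy.optimize.minimize` with `method="Nelder-Mead"` could then be applied to `neg_log_lik` over the model and misclassification parameters, which corresponds to the simplex search the abstract mentions.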
Abstract:
Studies on the relationship between psychosocial determinants and HIV risk behaviors have produced little evidence to support hypotheses based on theoretical relationships. One limitation inherent in many articles in the literature is the method of measurement of the determinants and the analytic approach selected.
To reduce the misclassification associated with unit scaling of measures specific to internalized homonegativity, I evaluated the psychometric properties of the Reactions to Homosexuality scale in a confirmatory factor analytic framework. In addition, I assessed the measurement invariance of the scale across racial/ethnic classifications in a sample of men who have sex with men. The resulting measure contained eight items loading on three first-order factors. Invariance assessment identified metric and partial strong invariance between racial/ethnic groups in the sample.
Application of the updated measure to a structural model allowed for the exploration of direct and indirect effects of internalized homonegativity on unprotected anal intercourse. Pathways identified in the model show that drug and alcohol use at last sexual encounter, the number of sexual partners in the previous three months and sexual compulsivity all contribute directly to risk behavior. Internalized homonegativity reduced the likelihood of exposure to drugs, alcohol or higher numbers of partners. For men who developed compulsive sexual behavior as a coping strategy for internalized homonegativity, there was an increase in the prevalence odds of risk behavior.
In the final stage of the analysis, I conducted a latent profile analysis of the items in the updated Reactions to Homosexuality scale. This analysis identified five distinct profiles, which suggested that the construct was not homogeneous in samples of men who have sex with men. Lack of prior consideration of these distinct manifestations of internalized homonegativity may have contributed to the analytic difficulty in identifying a relationship between the trait and high-risk sexual practices.
Abstract:
This thesis project is motivated by the potential problem of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. The instrumental variable (IV) method is one of the statistical tools to overcome this problem. A Mendelian randomization study uses genetic variants as IVs in a genetic association study. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between risk of pancreatic cancer and the circulating levels of soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom the genotyping data were available (178 cases and 177 controls). A two-stage IV method using generalized method of moments-structural mean models (GMM-SMM) was conducted and the relative risk (RR) was calculated. In the first-stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was associated neither with risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second-stage analysis, the GMM-SMM model failed to converge due to non-concaveness, probably because of the small sample size. Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 was a potentially good genetic IV for testing the causality between the risk of pancreatic cancer and sRAGE levels. A larger sample size is required to conduct a credible IV analysis.
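The two-stage logic above can be illustrated with a much simpler estimator than the GMM-SMM model the thesis used. The sketch below swaps in the linear ratio (Wald) estimator for a single instrument, together with the first-stage F statistic that gauges instrument strength; it assumes a continuous outcome, and the function names and toy data are hypothetical and unrelated to the study's data.

```python
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def first_stage_f(z, x):
    """F statistic for regressing the exposure x on a single instrument z."""
    n = len(z)
    b = slope(z, x)
    a = mean(x) - b * mean(z)
    mx = mean(x)
    rss = sum((xi - a - b * zi) ** 2 for zi, xi in zip(z, x))
    tss = sum((xi - mx) ** 2 for xi in x)
    return (tss - rss) / (rss / (n - 2))


def wald_ratio(z, x, y):
    """IV estimate: instrument-outcome slope divided by instrument-exposure slope."""
    return slope(z, y) / slope(z, x)


# Toy data: z = variant allele count, x = exposure level, y = continuous outcome.
z = [0, 0, 1, 1, 2, 2]
x = [5.0, 5.2, 4.1, 3.9, 3.0, 3.2]
y = [2.0 * xi for xi in x]
print(first_stage_f(z, x), wald_ratio(z, x, y))
```

A first-stage F statistic above roughly 10 is a widely used rule of thumb for a non-weak instrument, which is consistent with the abstract describing F = 29.2 as potentially strong.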
Abstract:
The purpose of this study was to determine the effects of nutrient intake, genetic factors and common household environmental factors on the aggregation of fasting blood glucose among Mexican-Americans in Starr County, Texas. This study was designed to determine: (a) the proportion of variation of fasting blood glucose concentration explained by unmeasured genetic and common household environmental effects; (b) the degree of familial aggregation of measures of nutrient intake; and (c) the extent to which the familial aggregation of fasting blood glucose is explained by nutrient intake and its aggregation. The method of path analysis was employed to determine these various effects.
Genes play an important role in fasting blood glucose: genetic variation was found to explain about 40% of the total variation in fasting blood glucose. Common household environmental effects, on the other hand, explained less than 3% of the variation in fasting blood glucose levels among individuals. Common household effects, however, did have significant effects on measures of nutrient intake, though they explained only about 10% of the total variance in nutrient intake. Finally, there was significant familial aggregation of nutrient intake measures, but their aggregation did not contribute significantly to the familial aggregation of fasting blood glucose. These results imply that similarities among relatives for fasting blood glucose are not due to similarities in nutrient intake among relatives.
Abstract:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.
The study found that the Cg* statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles of risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a χ² distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.
The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.
Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal-size groups by separating ties should be avoided.
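The statistic discussed above can be sketched in a few lines. This is a generic implementation of the Hosmer-Lemeshow grouping idea, not the thesis's simulation code: observations are sorted by estimated probability, cut into roughly equal groups (deciles of risk by default), and tied probabilities are kept in the same group, in line with the abstract's advice against separating ties. The function name and the toy inputs are hypothetical.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, number of groups actually used).

    y: observed 0/1 outcomes; p: fitted probabilities from a logistic model.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat, used, start = 0.0, 0, 0
    for g in range(1, groups + 1):
        end = max(start, round(g * n / groups))
        # Extend the boundary so identical probabilities stay together.
        while 0 < end < n and pairs[end][0] == pairs[end - 1][0]:
            end += 1
        if end <= start:
            continue  # this quantile was absorbed by an earlier tie extension
        chunk = pairs[start:end]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
        used += 1
        start = end
    return stat, used


# Toy inputs: two risk strata whose observed event counts match expectation.
probs = [0.1] * 10 + [0.9] * 10
outcomes = [0] * 9 + [1] + [1] * 9 + [0]
print(hosmer_lemeshow(outcomes, probs, groups=2))
```

The statistic is conventionally compared with a χ² distribution on the number of groups minus 2 degrees of freedom; as the abstract cautions, that reference distribution is unreliable when the model contains only categorical covariates with few covariate patterns.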