915 results for deviance information criteria, model averaging, MCMC, genomewide association studies, epistasis, logistic regression, stochastic search algorithm, case-control studies, Type I diabetes, single nucleotide polymorphism, gene expression programming
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have given researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining such data with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to create effective predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were shown not only to predict the disease phenotypes effectively, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some of it was extended to run on parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
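The nested cross-validation mentioned in this abstract can be sketched as follows. This is a minimal, dependency-free illustration and not the author's implementation: the filter score (absolute difference of class means), the nearest-centroid classifier, the candidate feature counts in `grid`, and all function names are assumptions chosen for brevity. What it shows is that feature selection and the choice of the number of features use training rows only, so the outer folds give an accuracy estimate that is not inflated by the selection step.

```python
import random
from statistics import mean


def k_folds(n, k, rng):
    """Split indices 0..n-1 into k shuffled, roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def col_mean(X, rows, j):
    """Mean of feature j over the given rows (0.0 if rows is empty)."""
    return mean(X[i][j] for i in rows) if rows else 0.0


def top_features(X, y, rows, n_feat):
    """Filter selection on `rows` only: rank by |mean(class 1) - mean(class 0)|."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]

    def score(j):
        return abs(col_mean(X, pos, j) - col_mean(X, neg, j))

    return sorted(range(len(X[0])), key=score, reverse=True)[:n_feat]


def accuracy(X, y, train, test, n_feat):
    """Select features and fit a nearest-centroid rule on train; score on test."""
    feats = top_features(X, y, train, n_feat)
    centroids = {
        c: [col_mean(X, [i for i in train if y[i] == c], j) for j in feats]
        for c in (0, 1)
    }
    correct = 0
    for i in test:
        dist = {
            c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
            for c in (0, 1)
        }
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test)


def nested_cv(X, y, grid=(1, 5, 20), outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates accuracy; inner loop picks the feature count."""
    rng = random.Random(seed)
    outer_scores = []
    for test in k_folds(len(y), outer_k, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        inner = k_folds(len(train), inner_k, rng)

        def inner_score(n_feat):
            scores = []
            for fold in inner:
                val = [train[p] for p in fold]
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                scores.append(accuracy(X, y, fit, val, n_feat))
            return mean(scores)

        best = max(grid, key=inner_score)  # chosen without seeing `test`
        outer_scores.append(accuracy(X, y, train, test, best))
    return mean(outer_scores)
```

Here `X` is a list of samples (each a list of numeric features) and `y` a list of 0/1 labels. Selecting features on the full data set before cross-validating would leak information from the test folds, which is the source of the optimism the abstract warns about.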
Abstract:
Introduction: This study is the first in Africa to explore the association between exposure to organochlorine pesticides (OCPs) and the risk of type 2 diabetes. The study was conducted in the Borgou region of northern Benin, where intensive pesticide use in cotton farming coincides with a high prevalence of diabetes relative to other regions of the country. Objectives: 1) To describe the level of exposure of diabetic and non-diabetic people in Borgou through serum levels of selected organochlorine pesticides; 2) to explore the relationship between the risk of type 2 diabetes and serum OCP concentrations; 3) to examine the association of general obesity, body fat percentage and abdominal obesity with serum OCP concentrations; 4) to explore the contribution of selected dietary and non-dietary exposure sources to serum OCP concentrations. Methods: This case-control study included 258 adults aged 18 to 65 years, identified by two blood glucose values (capillary and venous) with a threshold of 7 mmol/l for diabetic subjects and 5.6 mmol/l for non-diabetic controls. The 129 controls were matched to the 129 cases on ethnicity, age (± 5 years), sex and place of residence. Personal information and information on modes of exposure were collected by questionnaire. Serum OCP concentrations were measured by gas chromatography coupled with mass spectrometry. General obesity was defined as BMI ≥ 30 kg/m2. Abdominal obesity was determined from waist circumference according to the consensus criteria of Alberti et al. for the definition of the metabolic syndrome. Body fat percentage was measured by bioelectrical impedance and considered high at thresholds of 33% in women and 25% in men.
Results: Comparing the third and first tertiles of p,p’-DDE and p,p’-DDT concentrations, subjects in the third tertile were two to three times more likely to have diabetes than those in the first tertile. The likelihood of abdominal or general obesity (controlling for diabetic status) was three to five times higher in the top tertile for three of the four detectable OCPs, namely p,p’-DDT, β-HCH and trans-nonachlor. The socioeconomic factors associated with serum OCPs were a high level of education, higher income and urban residence. The non-dietary exposure sources significantly associated with serum OCP concentrations were mixed (primary and secondary) occupational exposure to pesticides and consumption of local tobacco. Purchasing several food groups, as opposed to producing them at home, was associated with higher OCP levels. The weekly consumption frequency of fish, vegetables, cheese, dried yam, millet, palm oil and certain legumes such as soy, néré (African locust bean), cowpea and Bambara groundnut was significantly associated with serum OCPs. Conclusion: The study demonstrated the relationship between serum levels of organochlorine pesticides on the one hand and diabetes or obesity on the other. The OCP concentrations observed in Borgou are fairly high and should be monitored and compared with those in other regions of the country. The factors contributing to these high levels are a high level of education, higher income, urban residence, and the purchase and frequency of consumption of several foods.
The contribution of the mixture of pollutants to which the inhabitants of this region are exposed to the rising prevalence of diabetes deserves examination, in particular the pesticides currently used in the region for cash crops and other persistent pollutants. These results add to knowledge of emerging risk factors for diabetes such as environmental pollutants, including pesticides. The public health implications are important both for the prevention of chronic diseases and for raising the awareness of the country's political authorities of the need for appropriate agricultural and health policy aimed at reducing pesticide exposure.
Abstract:
Genetic association analyses of family-based studies with ordered categorical phenotypes are often conducted using methods either for quantitative or for binary traits, which can lead to suboptimal analyses. Here we present an alternative likelihood-based method of analysis for single nucleotide polymorphism (SNP) genotypes and ordered categorical phenotypes in nuclear families of any size. Our approach, which extends our previous work for binary phenotypes, permits straightforward inclusion of covariate, gene-gene and gene-covariate interaction terms in the likelihood, incorporates a simple model for ascertainment and allows for family-specific effects in the hypothesis test. Additionally, our method produces interpretable parameter estimates and valid confidence intervals. We assess the proposed method using simulated data, and apply it to a polymorphism in the C-reactive protein (CRP) gene typed in families collected to investigate human systemic lupus erythematosus. By including sex interactions in the analysis, we show that the polymorphism is associated with anti-nuclear autoantibody (ANA) production in females, while there appears to be no effect in males.
Abstract:
We introduce a procedure for association-based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and inclusion of covariate information. Standard generalized linear models are used to relate phenotype and its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. In modelling the covariate and gene-covariate interaction terms, our method has advantages over recently proposed conditional score tests that include covariate information via a two-stage modelling approach. We apply our method in a study of human systemic lupus erythematosus and the C-reactive protein that includes sex as a covariate.
Abstract:
BACKGROUND: This study examined the association of -866G/A, Ala55Val, 45bpI/D, and -55C/T polymorphisms at the uncoupling protein (UCP) 3-2 loci with type 2 diabetes in Asian Indians. METHODS: A case-control study was performed among 1,406 unrelated subjects (487 with type 2 diabetes and 919 normal glucose-tolerant [NGT]), chosen from the Chennai Urban Rural Epidemiology Study, an ongoing population-based study in Southern India. The polymorphisms were genotyped using polymerase chain reaction-restriction fragment length polymorphism and direct sequencing. Haplotype frequencies were estimated using an expectation-maximization algorithm. Linkage disequilibrium was estimated from the estimates of haplotypic frequencies. RESULTS: The genotype (P = 0.00006) and the allele (P = 0.00007) frequencies of Ala55Val of the UCP2 gene showed a significant protective effect against the development of type 2 diabetes. The odds ratios (adjusted for age, sex, and body mass index) for diabetes were 0.72 for individuals carrying Ala/Val and 0.37 for individuals carrying Val/Val. Homeostasis model assessment of insulin resistance and 2-h plasma glucose were significantly lower among Val-allele carriers than among those with the Ala/Ala genotype within the NGT group. The genotype (P = 0.02) and the allele (P = 0.002) frequencies of -55C/T of the UCP3 gene showed a significant protective effect against the development of diabetes. The odds ratio for diabetes was 0.79 for individuals carrying CT and 0.61 for individuals carrying TT. The haplotype analyses further confirmed the association of Ala55Val with diabetes: haplotypes carrying the Ala allele were significantly more frequent in cases than in controls. CONCLUSIONS: Ala55Val and -55C/T polymorphisms at the UCP3-2 loci are associated with a significantly reduced risk of developing type 2 diabetes in Asian Indians.
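As an illustration of the two estimation steps named in the methods, the sketch below runs an expectation-maximization loop for haplotype frequencies at two biallelic markers and then derives linkage disequilibrium from the estimated frequencies. It is a simplified textbook version under stated assumptions, not the software used in the study: it handles only two markers, genotypes are coded as counts (0, 1, 2) of one allele, and the function names are hypothetical. Only double heterozygotes have ambiguous phase, so the E-step splits each of them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
def alleles(g):
    """Genotype count 0/1/2 -> the two alleles carried, e.g. 1 -> (0, 1)."""
    return (g // 2, (g + 1) // 2)


def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate frequencies of haplotypes (a, b) from unphased genotype pairs.

    genotypes: non-empty list of (g1, g2), each the count of allele 1
    at marker 1 and marker 2 respectively.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}  # uninformative starting point
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: phase is ambiguous, either 00/11 or 01/10.
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is determined: pair the alleles directly.
                a1, a2 = alleles(g1), alleles(g2)
                counts[(a1[0], a2[0])] += 1
                counts[(a1[1], a2[1])] += 1
        # M-step: each individual contributes two haplotypes.
        total = 2 * len(genotypes)
        freq = {h: c / total for h, c in counts.items()}
    return freq


def linkage_disequilibrium(freq):
    """Return (D, r^2) computed from two-marker haplotype frequencies."""
    p_a = freq[(1, 0)] + freq[(1, 1)]  # allele 1 frequency at marker 1
    p_b = freq[(0, 1)] + freq[(1, 1)]  # allele 1 frequency at marker 2
    d = freq[(1, 1)] - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2
```

For example, `em_haplotype_freqs([(0, 0), (2, 2), (1, 1)])` can be passed to `linkage_disequilibrium` to obtain D and r² for that toy sample.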
Abstract:
Background: There is evidence that physical activity (PA) can attenuate the influence of the fat mass- and obesity-associated (FTO) genotype on the risk of developing obesity. However, whether providing personalized information on FTO genotype leads to changes in PA is unknown. Objective: The purpose of this study was to determine if disclosing FTO risk had an impact on change in PA following a 6-month intervention. Methods: The single nucleotide polymorphism (SNP) rs9939609 in the FTO gene was genotyped in 1279 participants of the Food4Me study, a four-arm, Web-based randomized controlled trial (RCT) in 7 European countries on the effects of personalized advice on nutrition and PA. PA was measured objectively using a TracmorD accelerometer and was self-reported using the Baecke questionnaire at baseline and 6 months. Differences in baseline PA variables between risk (AA and AT genotypes) and nonrisk (TT genotype) carriers were tested using multiple linear regression. Impact of FTO risk disclosure on PA change at 6 months was assessed among participants with inadequate PA, by including an interaction term in the model: disclosure (yes/no) × FTO risk (yes/no). Results: At baseline, data on PA were available for 874 and 405 participants with the risk and nonrisk FTO genotypes, respectively. There were no significant differences in objectively measured or self-reported baseline PA between risk and nonrisk carriers. A total of 807 of the 1120 participants (72.05%) in the personalized groups were encouraged to increase PA at baseline. Knowledge of FTO risk had no impact on PA in either risk or nonrisk carriers after the 6-month intervention. Attrition was higher in nonrisk participants for whom genotype was disclosed (P=.01) compared with their at-risk counterparts. Conclusions: No association between baseline PA and FTO risk genotype was observed. There was no added benefit of disclosing FTO risk on changes in PA in this personalized intervention.
Further RCT studies are warranted to confirm whether disclosure of nonrisk genetic test results has adverse effects on engagement in behavior change.
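The disclosure × FTO risk interaction term described in the methods can be illustrated with a small sketch. It relies on a general property of linear models: with two binary predictors and their product and no other covariates, the least-squares coefficients equal simple contrasts of the four cell means. The study's own model presumably included further covariates, so this is an unadjusted simplification, and the function name and record layout are hypothetical.

```python
from statistics import mean


def interaction_estimates(records):
    """Coefficients of y = b0 + b1*disclosure + b2*risk + b3*disclosure*risk.

    records: iterable of (disclosure, risk, pa_change) with 0/1 indicators.
    All four disclosure/risk combinations must be present.
    """
    cells = {}
    for disclosure, risk, change in records:
        cells.setdefault((disclosure, risk), []).append(change)
    m = {key: mean(values) for key, values in cells.items()}
    return {
        "intercept": m[(0, 0)],
        "disclosure": m[(1, 0)] - m[(0, 0)],
        "risk": m[(0, 1)] - m[(0, 0)],
        # Difference in differences: does disclosure change PA more
        # in risk carriers than in nonrisk carriers?
        "interaction": m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)],
    }
```

An interaction coefficient near zero corresponds to the reported finding that knowledge of FTO risk made no difference to PA change in either genotype group.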
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The recent appreciation of the role played by endogenous counterregulatory mechanisms in controlling the outcome of the host inflammatory response requires specific analysis of their spatial and temporal profiles. In this study, we have focused on the glucocorticoid-regulated anti-inflammatory mediator annexin 1. Induction of peritonitis in wild-type mice rapidly (4 h) produced the expected signs of inflammation, including marked activation of resident cells (e.g., mast cells) and migration of blood-borne leukocytes, mirrored by blood neutrophilia. These changes subsided after 48-96 h. In annexin 1-null mice, the peritonitis response was exaggerated (∼40% at 4 h), with increased granulocyte migration and cytokine production. In blood leukocytes, annexin 1 gene expression was activated at 4, but not 24, h postzymosan, whereas protein levels were increased at both time points. Locally, endothelial and mast cell annexin 1 gene expression was not detectable in basal conditions, whereas it was switched on during the inflammatory response. The significance of annexin 1 system plasticity in the anti-inflammatory properties of dexamethasone was assessed. Clear induction of the annexin 1 gene in response to dexamethasone treatment was evident in the circulating and migrated leukocytes, and in connective tissue mast cells; this was associated with the steroid's failure to inhibit leukocyte trafficking, cytokine synthesis, and mast cell degranulation in the annexin 1-null mouse. In conclusion, understanding how inflammation is brought under control will help clarify the complex interplay between pro- and anti-inflammatory pathways operating during the host response to injury and infection. Copyright © 2006 by The American Association of Immunologists, Inc.
Abstract:
Background: The sequencing and publication of the cattle genome and the identification of single nucleotide polymorphism (SNP) molecular markers have provided new tools for animal genetic evaluation and genomic-enhanced selection. These new tools aim to increase the accuracy and scope of selection while decreasing generation interval. The objective of this study was to evaluate the gain in accuracy obtained by using genomic information (Clarifide® - Pfizer) in the genetic evaluation of Brazilian Nellore cattle. Review: The application of genome-wide association studies (GWAS) is recognized as one of the most practical approaches to modern genetic improvement. Genomic selection is perhaps most suited to the improvement of traits with low heritability in zebu cattle. The primary interest in livestock genomics has been to estimate the effects of all the markers on the chip, conduct cross-validation to determine accuracy, and apply the resulting information in GWAS either alone [9] or in combination with bull test and pedigree-based genetic evaluation data. The cost of SNP50K genotyping, however, limits the commercial application of GWAS based on all the SNPs on the chip. Nevertheless, reasonable predictability and accuracy can be achieved in GWAS by using an assay that contains an optimally selected predictive subset of markers, as opposed to all the SNPs on the chip. The best way to integrate genomic information into genetic improvement programs is to have it included in traditional genetic evaluations. This approach combines traditional expected progeny differences based on phenotype and pedigree with the genomic breeding values based on the markers. Including the different sources of information in a multiple-trait genetic evaluation model for within-breed dairy cattle selection is yielding excellent results.
However, given the wide genetic diversity of zebu breeds, the high-density panel used for genomic selection in dairy cattle (Illumina Bovine SNP50 array) appears insufficient for across-breed genomic predictions and selection in beef cattle. Today there is only one breed-specific targeted SNP panel, with genomic predictions developed using animals from across the entire population of the Nellore breed (www.pfizersaudeanimal.com), which enables genomically enhanced selection. Genomic profiles are a way to enhance our current selection tools to achieve more accurate predictions for younger animals. Material and Methods: We analyzed the age at first calving (AFC), accumulated productivity (ACP), stayability (STAY) and heifer pregnancy at 30 months (HP30) in Nellore cattle, fitting two different animal models: 1) a traditional single-trait model, and 2) a two-trait model where the genomic breeding value or molecular value prediction (MVP) was included as a correlated trait. All mixed model analyses were performed using the statistical software ASREML 3.0. Results: Genetic correlation estimates between AFC, ACP, STAY, HP30 and respective MVPs ranged from 0.29 to 0.46. Results also showed an increase of 56%, 36%, 62% and 19% in the estimated accuracy of AFC, ACP, STAY and HP30, respectively, when MVP information was included in the animal model. Conclusion: Depending upon the trait, integration of MVP information into genetic evaluation resulted in increased accuracy of 19% to 62% as compared to accuracy from traditional genetic evaluation. GE-EPD will be an effective tool to enable faster genetic improvement through more dependable selection of young animals.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Summary: Neuropathic pain (NP) is a well-recognized feature of leprosy neuropathy. However, the diagnosis of NP is difficult using only clinical criteria. In the study reported here, by means of conventional nerve conduction studies, the authors sought an association between long-latency responses and NP complaints in leprosy patients with type 1 and 2 reactions. Twenty-seven ulnar nerves of leprosy patients, 18 with type 1 reaction (T1R) and 9 with type 2 reaction (T2R), were followed up for 6 months before and after steroid treatment. Clinical characteristics of pain complaints and clinical function were assessed, as well as the presence of F- and A-waves of the ulnar nerve using nerve conduction studies. The clinical and the neurophysiologic findings were compared to note positive concordances (presence of NP and A-waves together) and negative concordances (absence of NP and A-waves together) before and after treatment. Both reactions presented a high frequency of A-waves (61.1% in T1R and 66.7% in T2R, P < 0.05) and prolonged F-waves (69.4% in T1R and 65.8% in T2R, P = 0.4). No concordances were seen between pain complaints and F-waves. However, significant concordances between NP and A-waves were observed, although restricted to the T2R group (χ² = 5.65, P = 0.04). After treatment, there was a significant reduction in pain complaints, as well as in the presence of F- and A-waves, in both groups (P < 0.05 for all comparisons). In conclusion, the presence of A-waves correlates well with pain complaints of neuropathic characteristics in leprosy patients, especially in those with type 2 reaction. This response probably shares mechanisms with the small-fiber dysfunction seen in these patients with NP, such as demyelination, intraneural edema, and axonal sprouting. Further studies using specific tools for small-fiber assessment are warranted to confirm our findings.
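The concordance analysis described above amounts to a 2×2 table (NP present or absent against A-waves present or absent). The sketch below computes the observed concordance and an uncorrected Pearson chi-square test for such a table. It is an assumption that this resembles the authors' test: the abstract does not say whether a continuity correction or an exact test was used, and with samples as small as those reported an exact test may be more appropriate. The function names and the cell layout are hypothetical.

```python
import math


def concordance(a, b, c, d):
    """Share of nerves where the two findings agree.

    a: NP and A-waves both present (positive concordance)
    b: NP present, A-waves absent
    c: NP absent, A-waves present
    d: both absent (negative concordance)
    """
    return (a + d) / (a + b + c + d)


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square statistic and p-value (1 df)."""
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Calling `chi_square_2x2(5, 1, 1, 2)` on made-up counts, for instance, returns the statistic together with its two-sided p-value.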
Abstract:
Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS. Methods: We developed the Adjusting Association Priors with Text (AdAPT) method, which searches through PubMed abstracts for pre-assigned keywords and key concepts and uses this information to assign each single nucleotide polymorphism (SNP) a prior probability of association with the phenotype of interest. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer. Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log additive p-value [p(trend)] = 2.5 × 10⁻³). The combined OR for having one additional rare allele was 0.83 (95% CI: 0.76-0.90), and this association was independent of previously identified susceptibility SNPs in this gene region that are associated with overall upper aerodigestive tract (UADT) cancer. We also investigated whether rs991316 was associated with other cancers of the UADT, but no additional association signal was found.
Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).
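To make the ranking step concrete, the sketch below computes a Bayes False Discovery Probability from an effect estimate, its standard error and a prior probability of association, using the usual approximate Bayes factor with a normal prior on the log odds ratio. It illustrates the BFDP calculation only: how AdAPT turns PubMed text into priors is not shown, and the prior standard deviation `prior_sd` and the prior probabilities used with it are placeholders, not values from the study.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor for H0 (no effect) versus H1.

    beta_hat: estimated log odds ratio; se: its standard error;
    prior_sd: SD of the N(0, prior_sd^2) prior on the effect under H1.
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta_hat / se
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * w / (v + w))


def bfdp(beta_hat, se, prior_sd, prior_assoc):
    """Posterior probability of the null, given a prior probability of association."""
    prior_odds_null = (1 - prior_assoc) / prior_assoc
    x = approx_bayes_factor(beta_hat, se, prior_sd) * prior_odds_null
    return x / (1 + x)


def rank_by_bfdp(snps, prior_sd):
    """snps: list of (name, beta_hat, se, prior_assoc); lowest BFDP first."""
    return sorted(snps, key=lambda s: bfdp(s[1], s[2], prior_sd, s[3]))
```

Because the prior enters through the prior odds, two SNPs with identical association statistics are ordered by their priors, which is how literature-derived information can move a SNP up the ranking relative to a p-value ordering.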
Abstract:
Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based online tool or as R language scripts at the supplemental website.
Abstract:
Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. In order to define and understand such molecular networks, several statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results: We have applied the SVAR model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.
Conclusion: The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
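The abstract does not state which procedure is used to control the false discovery rate. As a generic illustration of what FDR control over candidate network edges looks like, the sketch below applies the Benjamini-Hochberg step-up procedure, a widely used method that is swapped in here as a stand-in and is not claimed to be the SVAR test. The function name and the level `q` are illustrative.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Rejects the k smallest p-values, where k is the largest rank such
    that p_(k) <= q * k / m (m = number of tests).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff = rank  # keep the largest rank that passes
    rejected = set(order[:cutoff])
    return [i in rejected for i in range(m)]
```

In a network setting each p-value would correspond to one candidate edge (for example, one gene's lagged effect on another), and the rejected hypotheses would be the edges retained in the inferred network.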
Abstract:
The domestic dog offers a unique opportunity to explore the genetic basis of disease, morphology and behaviour. Humans share many diseases with our canine companions, making dogs an ideal model organism for comparative disease genetics. Using newly developed resources, genome-wide association studies in dog breeds are proving to be exceptionally powerful. Towards this aim, veterinarians and geneticists from 12 European countries are collaborating to collect and analyse the DNA from large cohorts of dogs suffering from a range of carefully defined diseases of relevance to human health. This project, named LUPA, has already delivered considerable results. The consortium has collaborated to develop a new high density single nucleotide polymorphism (SNP) array. Mutations for four monogenic diseases have been identified and the information has been utilised to find mutations in human patients. Several complex diseases have been mapped and fine mapping is underway. These findings should ultimately lead to a better understanding of the molecular mechanisms underlying complex diseases in both humans and their best friend.