998 results for Bayes factor


Relevance:

60.00%

Abstract:

Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.

Relevance:

60.00%

Abstract:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.

Relevance:

60.00%

Abstract:

Doctoral thesis, Statistics and Operations Research (Probability and Statistics), Universidade de Lisboa, Faculdade de Ciências, 2014

Relevance:

60.00%

Abstract:

Background: Bipolar disorder is a highly heritable polygenic disorder. Recent enrichment analyses suggest that there may be true risk variants for bipolar disorder in the expression quantitative trait loci (eQTL) in the brain. Aims: We sought to assess the impact of eQTL variants on bipolar disorder risk by combining data from both bipolar disorder genome-wide association studies (GWAS) and brain eQTL. Method: To detect single nucleotide polymorphisms (SNPs) that influence expression levels of genes associated with bipolar disorder, we jointly analysed data from a bipolar disorder GWAS (7481 cases and 9250 controls) and a genome-wide brain (cortical) eQTL (193 healthy controls) using a Bayesian statistical method, with independent follow-up replications. The identified risk SNP was then further tested for association with hippocampal volume (n = 5775) and cognitive performance (n = 342) among healthy individuals. Results: Integrative analysis revealed a significant association between a brain eQTL rs6088662 on chromosome 20q11.22 and bipolar disorder (log Bayes factor = 5.48; bipolar disorder P = 5.85 × 10⁻⁵). Follow-up studies across multiple independent samples confirmed the association of the risk SNP (rs6088662) with gene expression and bipolar disorder susceptibility (P = 3.54 × 10⁻⁸). Further exploratory analysis revealed that rs6088662 is also associated with hippocampal volume and cognitive performance in healthy individuals. Conclusions: Our findings suggest that 20q11.22 is likely a risk region for bipolar disorder; they also highlight the informative value of integrating functional annotation of genetic variants for gene expression in advancing our understanding of the biological basis underlying complex disorders, such as bipolar disorder.

Relevance:

60.00%

Abstract:

A Bayesian binary regression model is developed to predict in-hospital death in patients with acute myocardial infarction. Markov chain Monte Carlo (MCMC) methods are used for inference and validation. A model-building strategy based on the Bayes factor is proposed, and validation aspects are discussed extensively in this article, including the posterior distribution of the concordance index and residual analysis. Determining risk factors from variables available when the patient arrives at the hospital is very important for deciding on the course of treatment. The identified model proves highly reliable and accurate, with a correct classification rate of 88% and a concordance index of 83%.

Relevance:

60.00%

Abstract:

An important goal of Zebu breeding programs is to improve reproductive performance. A major obstacle to the genetic improvement of reproductive traits is that recording the time an animal takes to reach sexual maturity is costly. Another issue is that accurate estimates of breeding values are obtained only long after the young bulls have gone through selection. One way to overcome these problems is to use traits that indicate the reproductive efficiency of the herd and are easier to measure, such as age at first calving. A further problem is that heifers that have conceived once may fail to conceive in the next breeding season, which increases production costs. Thus, increasing heifer rebreeding rates should improve the economic efficiency of the herd. Response to selection for these traits tends to be slow, since they have low heritability and phenotypic information becomes available only later in the life of the animal. Genome-wide association studies (GWAS) are useful for investigating the genetic mechanisms that underlie these traits by identifying the genes and metabolic pathways involved. Data from 1853 females belonging to the Agricultural Jacarezinho LTDA were used. Genotyping was performed using the BovineHD BeadChip (777 962 single nucleotide polymorphisms (SNPs)) according to the protocol of Illumina - Infinium Assay II® Multi-Sample HiScan with the unit SQ™ System. After quality control, 305 348 SNPs were used for GWAS. Forty-two and 19 SNPs had a Bayes factor greater than 150 for heifer rebreeding and age at first calving, respectively. All significant SNPs for age at first calving were also significant for heifer rebreeding. These 42 SNPs were located next to or within 35 genes that were distributed over 18 chromosomes and comprised 27 protein-encoding genes, six pseudogenes and two miscellaneous noncoding RNAs.
The use of the Bayes factor to determine the significance of SNPs allowed us to identify two sets of 42 and 19 significant SNPs for heifer rebreeding and age at first calving, which explain 11.35% and 6.42% of their phenotypic variance, respectively. These SNPs provide relevant information to help elucidate which genes affect these traits.
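
For context, the cut-off of 150 used in the abstract above coincides with the "very strong" category of the widely cited Kass and Raftery (1995) scale for Bayes factors. The abstract does not say which scale the authors followed, so the Python sketch below is only an illustration of that scale; the function name is hypothetical.

```python
def kass_raftery_category(bayes_factor: float) -> str:
    """Label a Bayes factor (alternative vs. null) on the Kass-Raftery scale.

    Illustrative helper only; it is not taken from any of the listed papers.
    """
    if bayes_factor <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bayes_factor < 1:
        return "favors the null"
    if bayes_factor <= 3:
        return "not worth more than a bare mention"
    if bayes_factor <= 20:
        return "positive"
    if bayes_factor <= 150:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    # 151 clears the 150 threshold quoted in the abstract; 3.5 clears a threshold of 3.
    for bf in (2.2, 3.5, 151.0):
        print(bf, "->", kass_raftery_category(bf))
```

On this scale a threshold of 3, which some other studies use, corresponds only to "positive" evidence, so the choice of cut-off changes how conservative the SNP selection is.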

Relevance:

60.00%

Abstract:

Source routes and spatial diffusion of capuchin monkeys over the past 6 million years, reconstructed in SPREAD 1.0.6 from the MCC tree. The map shows the 10 different regions with which distinct samples were associated. The different transmission routes were calculated from the average rate over time. Only rates with a Bayes factor > 3 were considered significantly different from zero. Significant diffusion pathways are highlighted in colors ranging from dark brown to red, with dark brown marking the less significant rates and deep red the most significant.

Relevance:

60.00%

Abstract:

We report on measurements of neutrino oscillation using data from the T2K long-baseline neutrino experiment collected between 2010 and 2013. In an analysis of muon neutrino disappearance alone, we find the following estimates and 68% confidence intervals for the two possible mass hierarchies. Normal hierarchy: sin²θ₂₃ = 0.514 (+0.055/−0.056) and Δm²₃₂ = (2.51 ± 0.10) × 10⁻³ eV²/c⁴. Inverted hierarchy: sin²θ₂₃ = 0.511 ± 0.055 and Δm²₁₃ = (2.48 ± 0.10) × 10⁻³ eV²/c⁴. The analysis accounts for multi-nucleon mechanisms in neutrino interactions, which were found to introduce negligible bias. We describe our first analyses that combine measurements of muon neutrino disappearance and electron neutrino appearance to estimate four oscillation parameters, |Δm²|, sin²θ₂₃, sin²θ₁₃, δCP, and the mass hierarchy. Frequentist and Bayesian intervals are presented for combinations of these parameters, with and without including recent reactor measurements. At 90% confidence level and including reactor measurements, we exclude the region δCP = [0.15, 0.83]π for normal hierarchy and δCP = [−0.08, 1.09]π for inverted hierarchy. The T2K and reactor data weakly favor the normal hierarchy with a Bayes factor of 2.2. The most probable values and 68% 1D credible intervals for the other oscillation parameters, when reactor data are included, are: sin²θ₂₃ = 0.528 (+0.055/−0.038) and |Δm²₃₂| = (2.51 ± 0.11) × 10⁻³ eV²/c⁴.
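
To give a sense of what a Bayes factor of 2.2 means, it can be converted into a posterior probability once prior odds are fixed. The Python sketch below assumes equal prior probability for the two hierarchies; the abstract does not state the prior, and the function name is hypothetical.

```python
def posterior_probability(bayes_factor: float, prior_prob: float = 0.5) -> float:
    """Posterior probability of the favored hypothesis.

    Uses posterior odds = Bayes factor * prior odds. The default prior of 0.5
    (equal odds) is an assumption, not a value taken from the abstract.
    """
    if bayes_factor <= 0:
        raise ValueError("a Bayes factor must be positive")
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # With equal priors, a Bayes factor of 2.2 gives 2.2 / 3.2 for the favored hypothesis.
    print(round(posterior_probability(2.2), 4))
```

Under that equal-prior assumption the favored hierarchy would have a posterior probability of roughly 69%, which is consistent with the abstract's wording that the data favor it only weakly.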

Relevance:

30.00%

Abstract:

This study aimed at evaluating the validity, reliability, and factorial invariance of the complete (34-item) and shortened (8-item and 16-item) versions of the Body Shape Questionnaire (BSQ) when applied to Brazilian university students. A total of 739 female students with a mean age of 20.44 (standard deviation = 2.45) years participated. Confirmatory factor analysis was conducted to verify the degree to which the one-factor structure satisfies the proposal for the BSQ's expected structure. Two items of the 34-item version were excluded because they had factor weights (λ) < .40. All models had adequate convergent validity (average variance extracted = .43-.58; composite reliability = .85-.97) and internal consistency (α = .85-.97). The 8-item B version was considered the best shortened BSQ version (Akaike information criterion = 84.07, Bayes information criterion = 157.75, Browne-Cudeck criterion = 84.46), with strong invariance for independent samples (Δχ²λ(7) = 5.06, Δχ²Cov(8) = 5.11, Δχ²Res(16) = 19.30). © 2014 Elsevier Ltd. All rights reserved.
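
The convergent-validity figures quoted above (average variance extracted and composite reliability) are commonly computed from standardized factor loadings with the Fornell-Larcker formulas. The Python sketch below assumes that convention; the abstract does not spell out the formulas, and the loadings are made-up values, not the BSQ estimates.

```python
from typing import Sequence


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE: mean of the squared standardized loadings of one factor."""
    if not loadings:
        raise ValueError("at least one loading is required")
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings: Sequence[float]) -> float:
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    For standardized loadings, each item's error variance is 1 - loading^2.
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1.0 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)


if __name__ == "__main__":
    # Hypothetical standardized loadings for a three-item factor.
    example = [0.7, 0.8, 0.6]
    print("AVE:", average_variance_extracted(example))
    print("CR:", composite_reliability(example))
```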

Relevance:

30.00%

Abstract:

The aim of the present study was to propose and evaluate the use of factor analysis (FA) in obtaining latent variables (factors) that represent a set of pig traits simultaneously, for use in genome-wide selection (GWS) studies. We used crosses between outbred F2 populations of Brazilian Piau × commercial pigs. Data were obtained on 345 F2 pigs, genotyped for 237 SNPs, with 41 traits. FA allowed us to obtain four biologically interpretable factors: "weight", "fat", "loin", and "performance". These factors were used as dependent variables in multiple regression models of genomic selection (Bayes A, Bayes B, RR-BLUP, and Bayesian LASSO). The use of FA is presented as an interesting alternative to select individuals for multiple variables simultaneously in GWS studies; accuracy measurements of the factors were similar to those obtained when the original traits were considered individually. The similarities between the top 10% of individuals selected by the factor, and those selected by the individual traits, were also satisfactory. Moreover, the estimated marker effects for the traits were similar to those found for the relevant factor.