834 results for Naïve Bayes classifier
Abstract:
New applications of genetic data to questions of historical biogeography have revolutionized our understanding of how organisms have come to occupy their present distributions. Phylogenetic methods in combination with divergence time estimation can reveal biogeographical centres of origin, differentiate between hypotheses of vicariance and dispersal, and reveal the directionality of dispersal events. Despite their power, however, phylogenetic methods can sometimes yield patterns that are compatible with multiple, equally well-supported biogeographical hypotheses. In such cases, additional approaches must be integrated to differentiate among conflicting dispersal hypotheses. Here, we use a synthetic approach that draws upon the analytical strengths of coalescent and population genetic methods to augment phylogenetic analyses in order to assess the biogeographical history of Madagascar's Triaenops bats (Chiroptera: Hipposideridae). Phylogenetic analyses of mitochondrial DNA sequence data for Malagasy and east African Triaenops reveal a pattern that equally supports two competing hypotheses. While the phylogeny cannot determine whether Africa or Madagascar was the centre of origin for the species investigated, it serves as the essential backbone for the application of coalescent and population genetic methods. From the application of these methods, we conclude that a hypothesis of two independent but unidirectional dispersal events from Africa to Madagascar is best supported by the data.
Abstract:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
Association between DNA damage response and repair genes and risk of invasive serous ovarian cancer.
Abstract:
BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. Statistical analysis proceeded in two stages: a screen using marginal Bayes factors (BFs) for 484 SNPs and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and genotypes of other SNPs and allowed for uncertainty in the genetic parameterizations of the SNPs and number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median per-allele odds ratio (OR) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median per-allele OR = 0.69; 95% CI = 0.53-0.91) in CHEK2, rs2078486 (median per-allele OR = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median per-allele OR = 1.65; 95% CI = 1.20-2.26) in TP53, rs411697 (median rare-homozygote OR = 0.53; 95% CI = 0.35-0.79) in BACH1 and rs10131 (median rare-homozygote OR not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in linkage disequilibrium (LD) with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.
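The screening threshold mentioned above (a Bayes factor greater than 10) can be related to a posterior probability once a prior probability of association is fixed. The sketch below shows that standard conversion (posterior odds = Bayes factor × prior odds). It is not the study's code: the function name and the example prior values are assumptions made purely for illustration.

```python
def posterior_probability(bayes_factor: float, prior_probability: float) -> float:
    """Convert a Bayes factor (association vs. no association) into a
    posterior probability of association, given a prior probability.

    Hypothetical helper for illustration; not taken from the study.
    """
    if bayes_factor < 0:
        raise ValueError("bayes_factor must be non-negative")
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Assumed priors, chosen only to show how strongly the prior matters.
    for prior in (0.5, 0.05):
        print(prior, round(posterior_probability(10.0, prior), 3))
```

With even prior odds, a Bayes factor of 10 corresponds to a posterior probability of 10/11; with a sceptical prior of 0.05 the same Bayes factor gives 10/29, which is why the choice of prior matters when interpreting a fixed threshold.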
Abstract:
Currently, no available pathological or molecular measures of tumor angiogenesis predict response to antiangiogenic therapies used in clinical practice. Recognizing that tumor endothelial cells (EC) and EC activation and survival signaling are the direct targets of these therapies, we sought to develop an automated platform for quantifying activity of critical signaling pathways and other biological events in EC of patient tumors by histopathology. Computer image analysis of EC in highly heterogeneous human tumors by a statistical classifier trained using examples selected by human experts performed poorly due to subjectivity and selection bias. We hypothesized that the analysis can be optimized by a more active process to aid experts in identifying informative training examples. To test this hypothesis, we incorporated a novel active learning (AL) algorithm into FARSIGHT image analysis software that aids the expert by seeking out informative examples for the operator to label. The resulting FARSIGHT-AL system identified EC with specificity and sensitivity consistently greater than 0.9 and outperformed traditional supervised classification algorithms. The system modeled individual operator preferences and generated reproducible results. Using the results of EC classification, we also quantified proliferation (Ki67) and activity in important signal transduction pathways (MAP kinase, STAT3) in immunostained human clear cell renal cell carcinoma and other tumors. FARSIGHT-AL enables characterization of EC in conventionally preserved human tumors in a more automated process suitable for testing and validating in clinical trials. The results of our study support a unique opportunity for quantifying angiogenesis in a manner that can now be tested for its ability to identify novel predictive and response biomarkers.
Abstract:
BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Abstract:
In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).
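The model described above can be written out as a short hierarchy. The following is a sketch reconstructed from the abstract's wording alone; the symbols $x_i$, $\beta$, $\epsilon_i$, $\varphi$, $a_0$ and $h$ are notational assumptions and may differ from the paper's own.

```latex
\begin{align}
  y_i &\sim \mathrm{NB}(r, p_i), &
  \Pr(y_i = y) &= \frac{\Gamma(y + r)}{y!\,\Gamma(r)}\,(1 - p_i)^{r}\,p_i^{\,y}, \\
  \operatorname{logit}(p_i) &= x_i^{\top}\beta + \ln \epsilon_i, &
  \epsilon_i &\sim \log\mathcal{N}\!\left(0, \varphi^{-1}\right), \\
  r &\sim \mathrm{Gamma}(a_0, 1/h).
\end{align}
```

Under this parameterization the NB distribution is also a gamma mixture of Poissons, a standard identity: if $\lambda_i \sim \mathrm{Gamma}\big(r,\ p_i/(1-p_i)\big)$ (shape, scale) and $y_i \mid \lambda_i \sim \mathrm{Poisson}(\lambda_i)$, then marginally $y_i \sim \mathrm{NB}(r, p_i)$ with mean $r\,p_i/(1-p_i)$. The gamma-distributed rate and the lognormal term $\epsilon_i$ are one way to read the "two different kinds of random effects" mentioned in the abstract.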
Abstract:
Learning multiple tasks across heterogeneous domains is a challenging problem since the feature space may not be the same for different tasks. We assume the data in multiple tasks are generated from a latent common domain via sparse domain transforms and propose a latent probit model (LPM) to jointly learn the domain transforms, and the shared probit classifier in the common domain. To learn meaningful task relatedness and avoid over-fitting in classification, we introduce sparsity in the domain transforms matrices, as well as in the common classifier. We derive theoretical bounds for the estimation error of the classifier in terms of the sparsity of domain transforms. An expectation-maximization algorithm is derived for learning the LPM. The effectiveness of the approach is demonstrated on several real datasets.
Abstract:
CD8+ T cells are associated with long term control of virus replication to low or undetectable levels in a population of HIV+ therapy-naïve individuals known as virus controllers (VCs; <5000 RNA copies/ml and CD4+ lymphocyte counts >400 cells/µl). These subjects' ability to control viremia in the absence of therapy makes them the gold standard for the type of CD8+ T-cell response that should be induced with a vaccine. Studying the regulation of CD8+ T cell responses in these VCs provides the opportunity to discover mechanisms of durable control of HIV-1. Previous research has shown that the CD8+ T cell population in VCs is heterogeneous in its ability to inhibit virus replication and distinct T cells are responsible for virus inhibition. Further defining both the functional properties and regulation of the specific features of the select CD8+ T cells responsible for potent control of viremia in the VCs would enable better evaluation of T cell-directed vaccine strategies and may inform the design of new therapies.
Here we discuss the progress made in elucidating the features and regulation of CD8+ T cell response in virus controllers. We first detail the development of assays to quantify CD8+ T cells' ability to inhibit virus replication. This includes the use of a multi-clade HIV-1 panel which can subsequently be used as a tool for evaluation of T cell directed vaccines. We used these assays to evaluate the CD8+ response among cohorts of HIV-1 seronegative, HIV-1 acutely infected, and HIV-1 chronically infected (both VC and chronic viremic) patients. Contact and soluble CD8+ T cell virus inhibition assays (VIAs) are able to distinguish these patient groups based on the presence and magnitude of the responses. When employed in conjunction with peptide stimulation, the soluble assay reveals peptide stimulation induces CD8+ T cell responses with a prevalence of Gag p24 and Nef specificity among the virus controllers tested. Given this prevalence, we aimed to determine the gene expression profile of Gag p24-, Nef-, and unstimulated CD8+ T cells. RNA was isolated from CD8+ T-cells from two virus controllers with strong virus inhibition and one seronegative donor after a 5.5 hour stimulation period then analyzed using the Illumina Human BeadChip platform (Duke Center for Human Genome Variation). Analysis revealed that 565 (242 Nef and 323 Gag) genes were differentially expressed in CD8+ T-cells that were able to inhibit virus replication compared to those that could not. We compared the differentially expressed genes to published data sets from other CD8+ T-cell effector function experiments focusing our analysis on the most recurring genes with immunological, gene regulatory, apoptotic or unknown functions. The most commonly identified gene in these studies was TNFRSF9. Using PCR in a larger cohort of virus controllers we confirmed the up-regulation of TNFRSF9 in Gag p24 and Nef-specific CD8+ T cell mediated virus inhibition. 
We also observed an increase in the mRNA encoding antiviral cytokines macrophage inflammatory proteins (MIP-1α, MIP-1αP, MIP-1β), interferon gamma (IFN-γ), granulocyte-macrophage colony-stimulating factor (GM-CSF), and recently identified lymphotactin (XCL1).
Our previous work suggests the CD8+ T-cell response to HIV-1 can be regulated at the level of gene regulation. Because RNA abundance is modulated by transcription of new mRNAs and decay of new and existing RNA, we aimed to evaluate the net rate of transcription and mRNA decay for the cytokines we identified as differentially regulated. To estimate the rate of mRNA synthesis and decay, we stimulated isolated CD8+ T-cells with Gag p24 and Nef peptides, adding 4-thiouridine (4SU) during the final hour of stimulation, which allowed for separation of RNA made during the final hour of stimulation. Subsequent PCR of RNA isolated from these cells allowed us to determine how much mRNA was made for our genes of interest during the final hour, which we used to calculate the rate of transcription. To assess whether stimulation caused a change in RNA stability, we calculated the decay rates of these mRNAs over time. In Gag p24 and Nef stimulated T cells, the abundance of the mRNA of many of the cytokines examined was dependent on changes in both transcription and mRNA decay, with evidence for potential differences in the regulation of mRNA between Nef and Gag specific CD8+ T cells. The results were highly reproducible: in one subject measured in three independent experiments, the results were concordant.
These data suggest that mRNA stability, in addition to transcription, is key in regulating the direct anti-HIV-1 function of antigen-specific memory CD8+ T cells by enabling rapid recall of anti-HIV-1 effector functions, namely the production and increased stability of antiviral cytokines. We have started to uncover the mechanisms employed by CD8+ T cell subsets with antigen-specific anti-HIV-1 activity, which in turn enhances our ability to inhibit virus replication by informing both cure strategies and HIV-1 vaccine designs that aim to reduce transmission and can aid in blocking HIV-1 acquisition.
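The abstract above mentions calculating mRNA decay rates over time but does not say how. One common way to do so is sketched below, under the assumption of first-order (exponential) decay, which is not confirmed by the source. The function name, the time units and the example values are hypothetical.

```python
import math


def fit_decay_rate(times_h, abundances):
    """Estimate a first-order decay constant k (per hour) and half-life (hours).

    Fits ln(abundance) = ln(A0) - k * t by ordinary least squares.
    Assumes exponential decay; this is an illustrative model choice,
    not the method reported in the abstract.
    """
    if len(times_h) != len(abundances) or len(times_h) < 2:
        raise ValueError("need at least two matching time/abundance points")
    if any(a <= 0 for a in abundances):
        raise ValueError("abundances must be positive to take logarithms")

    logs = [math.log(a) for a in abundances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))

    k = -sxy / sxx  # the decay constant is the negative slope of the log-linear fit
    half_life = math.log(2) / k if k > 0 else math.inf
    return k, half_life


if __name__ == "__main__":
    # Synthetic example: abundance decaying with k = 0.5 per hour.
    times = [0.0, 1.0, 2.0, 3.0]
    values = [100.0 * math.exp(-0.5 * t) for t in times]
    print(fit_decay_rate(times, values))
```

Comparing the fitted half-life between stimulated and unstimulated cells would then indicate whether stimulation changed RNA stability, in the spirit of the comparison the abstract describes.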
Abstract:
Participants with posttraumatic stress disorder (PTSD) and participants with a trauma but without PTSD wrote narratives of their trauma and, for comparison, of the most-important and the happiest events that occurred within a year of their trauma. They then rated these three events on coherence. Based on participants' self-ratings and on naïve-observer scorings of the participants' narratives, memories of traumas were not more incoherent than the comparison memories in participants in general or in participants with PTSD. This study comprehensively assesses narrative coherence using a full two (PTSD or not) by two (traumatic event or not) design. The results are counter to most prevalent theoretical views of memory for trauma.
Abstract:
Simian-human immunodeficiency viruses (SHIVs) that mirror natural transmitted/founder (T/F) viruses in man are needed for evaluation of HIV-1 vaccine candidates in nonhuman primates. Currently available SHIVs contain HIV-1 env genes from chronically-infected individuals and do not reflect the characteristics of biologically relevant HIV-1 strains that mediate human transmission. We chose to develop clade C SHIVs, as clade C is the major infecting subtype of HIV-1 in the world. We constructed 10 clade C SHIVs expressing Env proteins from T/F viruses. Three of these ten clade C SHIVs (SHIV KB9 C3, SHIV KB9 C4 and SHIV KB9 C5) replicated in naïve rhesus monkeys. These three SHIVs are mucosally transmissible and are neutralized by sCD4 and several HIV-1 broadly neutralizing antibodies. However, like natural T/F viruses, they exhibit low Env reactivity and a Tier 2 neutralization sensitivity. Of note, none of the clade C T/F SHIVs elicited detectable autologous neutralizing antibodies in the infected monkeys, even though antibodies that neutralized a heterologous Tier 1 HIV-1 were generated. Challenge with these three new clade C SHIVs will provide biologically relevant tests for vaccine protection in rhesus macaques.
Abstract:
OBJECTIVE: To characterize B-cell subsets in patients with muscle-specific tyrosine kinase (MuSK) myasthenia gravis (MG). METHODS: In accordance with Human Immunology Project Consortium guidelines, we performed polychromatic flow cytometry and ELISA assays in peripheral blood samples from 18 patients with MuSK MG and 9 healthy controls. To complement a B-cell phenotype assay that evaluated maturational subsets, we measured B10 cell percentages, plasma B cell-activating factor (BAFF) levels, and MuSK antibody titers. Immunologic variables were compared with healthy controls and clinical outcome measures. RESULTS: As expected, patients treated with rituximab had high percentages of transitional B cells and plasmablasts and thus were excluded from subsequent analysis. The remaining patients with MuSK MG and controls had similar percentages of total B cells and naïve, memory, isotype-switched, plasmablast, and transitional B-cell subsets. However, patients with MuSK MG had higher BAFF levels and lower percentages of B10 cells. In addition, we observed an increase in MuSK antibody levels with more severe disease. CONCLUSIONS: We found prominent B-cell pathology in the distinct form of MG with MuSK autoantibodies. Increased BAFF levels have been described in other autoimmune diseases, including acetylcholine receptor antibody-positive MG. This finding suggests a role for BAFF in the survival of B cells in MuSK MG, which has important therapeutic implications. B10 cells, a recently described rare regulatory B-cell subset that potently blocks Th1 and Th17 responses, were reduced, which suggests a potential mechanism for the breakdown in immune tolerance in patients with MuSK MG.
Abstract:
Subteratogenic and other low-level chronic exposures to toxicant mixtures are an understudied threat to environmental and human health. It is especially important to understand the effects of these exposures for contaminants such as polycyclic aromatic hydrocarbons (PAHs), a large group of more than 100 individual compounds that are important environmental (including aquatic) contaminants. Aquatic sediments constitute a major sink for hydrophobic pollutants, and studies show PAHs can persist in sediments over time. Furthermore, estuarine systems (namely breeding grounds) are of particular concern, as they are highly impacted by a wide variety of pollutants, and estuarine fishes are often exposed to some of the highest levels of contaminants of any vertebrate taxon. Acute embryonic exposure to PAHs results in cardiac teratogenesis in fish, and early life exposure to certain individual PAHs and PAH mixtures causes heart alterations with decreased swimming capacity in adult fish. Consequently, the heart and cardiorespiratory system are thought to be targets of PAH mixture exposure. While many studies have investigated acute, teratogenic PAH exposures, few studies have longitudinally examined the impacts of subtle, subteratogenic PAH mixture exposures, which are arguably more broadly applicable to environmental contamination scenarios. The goal of this dissertation was to highlight the later-life consequences of early-life exposure to subteratogenic concentrations of a complex, environmentally relevant PAH mixture.
A unique population of Fundulus heteroclitus (the Atlantic killifish or mummichog, hereafter referred to as killifish) has adapted to creosote-based polycyclic aromatic hydrocarbons (PAHs) found at the Atlantic Wood Industries (AW) Superfund site in the southern branch of the Elizabeth River, VA, USA. This killifish population survives in a site heavily contaminated with a mixture of PAHs from former creosote operations. The fish have developed resistance to the acute toxicity and teratogenic effects caused by the mixture of PAHs in sediment from the site. The primary goal of this dissertation was to compare and contrast later-life outcomes of early-life, subteratogenic PAH mixture exposure in both the Atlantic Wood killifish (AW) and a naïve reference population of killifish from King’s Creek (KC; a relatively uncontaminated tributary of the Severn River, VA). Killifish from both populations were exposed to subteratogenic concentrations of a complex PAH-sediment extract, Elizabeth River Sediment Extract (ERSE), made by collecting sediment from the AW site. Fish were reared over a 5-month period in the laboratory, during which they were examined for a variety of molecular, physiological and behavioral responses.
The central aims of my dissertation were to determine alterations to embryonic gene expression, larval swimming activity, adult behavior, heart structure, enzyme activity, and swimming/cardiorespiratory performance following subteratogenic exposure to ERSE. I hypothesized that subteratogenic exposure to ERSE would impair cardiac ontogenic processes in a way that would be detectable via gene expression in embryos, and that the misregulation of cardiac genes would help to explain activity changes, behavioral deficits, and later-life swimming deficiencies. I also hypothesized that fish heart structure would be altered. In addition, I hypothesized that the AW killifish population would be resistant to developmental exposures and perform normally in later life challenges. To investigate these hypotheses, a series of experiments were carried out in PAH-adapted killifish from Elizabeth River and in reference killifish. As an ancillary project to the primary aims of the dissertation, I examined the toxicity of weaker aryl hydrocarbon receptor (AHR) agonists in combination with fluoranthene (FL), an inhibitor of cytochrome P4501A1 (CYP1A1). This side project was conducted in both Danio rerio (zebrafish) and the KC and AW killifish.
Embryonic gene expression was measured in both killifish populations over an ERSE dose response with multiple time points (12, 24, 48, and 144 hours post exposure). Genes known to play critical roles in cardiac structure/development, cardiac function, and angiogenesis were elevated, indicating cardiac damage and activation of cardiovascular repair mechanisms. These data helped to inform later-life swimming performance and cardiac histology studies. Behavior was assessed during light and dark cycles in larvae of both populations following developmental exposure to ERSE. While KC killifish showed activity differences following exposure, AW killifish showed no significant changes even at concentrations that would cause overt cardiac toxicity in KC killifish. Juvenile behavior experiments demonstrated hyperactivity following ERSE exposure in KC killifish, but no significant behavioral changes in AW killifish. Adult swimming performance via prolonged critical swimming capacity (Ucrit) demonstrated performance costs in the AW killifish. Furthermore, swimming performance decline was observed in KC killifish following exposure to increasing dilutions of ERSE. Lastly, cardiac histology suggested that early-life exposure to ERSE could result in cardiac structural alteration and extravasation of blood into the pericardial cavity.
Responses to AHR agonists resulted in a ranking of relative potency for agonists, and determined which agonists, when combined with FL, caused cardiac teratogenesis. These experiments showed interesting species differences for zebrafish and killifish. To probe mechanisms responsible for cardiotoxicity, a CYP1A-morpholino and an AHR2-morpholino were used to mimic FL effects or attempt to rescue cardiac deformities, respectively. Findings suggested that the cardiac toxicity elicited by weak agonist + FL exposure was likely driven by AHR-independent mechanisms. These studies stand in contrast to previous research from our lab showing that moderate AHR agonist + FL caused cardiac toxicity that can be partially rescued by AHR-morpholino knockdown.
My findings provide a better characterization of the mechanisms of PAH toxicity and advance our understanding of how subteratogenic mixtures of PAHs exert their toxic action in naïve killifish. Furthermore, these studies will provide a framework for investigating how subteratogenic exposures to PAH mixtures can impact aquatic organismal health and performance. Most importantly, these experiments have the potential to help inform risk assessment in fish, mammals, and potentially humans. Ultimately, this research will help protect populations exposed to subtle PAH-contamination.
Abstract:
We recently developed an approach for testing the accuracy of network inference algorithms by applying them to biologically realistic simulations with known network topology. Here, we seek to determine the degree to which the network topology and data sampling regime influence the ability of our Bayesian network inference algorithm, NETWORKINFERENCE, to recover gene regulatory networks. NETWORKINFERENCE performed well at recovering feedback loops and multiple targets of a regulator with small amounts of data, but required more data to recover multiple regulators of a gene. When collecting the same number of data samples at different intervals from the system, the best recovery was produced by sampling intervals long enough such that sampling covered propagation of regulation through the network but not so long such that intervals missed internal dynamics. These results further elucidate the possibilities and limitations of network inference based on biological data.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.
Abstract:
The purpose of this research was to use next generation sequencing to identify mutations in patients with primary immunodeficiency diseases whose pathogenic gene mutations had not been identified. Remarkably, four unrelated patients were found by next generation sequencing to have the same heterozygous mutation in an essential donor splice site of PIK3R1 (NM_181523.2:c.1425+1G>A) found in three prior reports. All four had the Hyper IgM syndrome, lymphadenopathy and short stature, and one also had SHORT syndrome. They were investigated with in vitro immune studies, RT-PCR, and immunoblotting studies of the mutation's effect on mTOR pathway signaling. All patients had very low percentages of memory B cells and class-switched memory B cells and reduced numbers of naïve CD4+ and CD8+ T cells. RT-PCR confirmed the presence of both an abnormal 273 base-pair (bp) size and a normal 399 bp size band in the patient and only the normal band was present in the parents. Following anti-CD40 stimulation, patient's EBV-B cells displayed higher levels of S6 phosphorylation (mTOR complex 1 dependent event), Akt phosphorylation at serine 473 (mTOR complex 2 dependent event), and Akt phosphorylation at threonine 308 (PI3K/PDK1 dependent event) than controls, suggesting elevated mTOR signaling downstream of CD40. These observations suggest that amino acids 435-474 in PIK3R1 are important for its stability and also its ability to restrain PI3K activity. Deletion of Exon 11 leads to constitutive activation of PI3K signaling. This is the first report of this mutation and immunologic abnormalities in SHORT syndrome.