11 results for I Mass Function
at Duke University
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, even though uncertainty quantification remains essential in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the grounds that "n=all" is of little relevance outside certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
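The reduced-rank factorization mentioned above can be illustrated with a small sketch. It assumes the usual latent class (PARAFAC) form, in which the probability of a cell is a mixture over latent classes of products of class-conditional probabilities. The function name `parafac_pmf` and the toy numbers are illustrative assumptions, not taken from the thesis.

```python
from itertools import product
from math import isclose, prod


def parafac_pmf(class_weights, conditionals):
    """Build the joint pmf of p categorical variables under a latent class model.

    class_weights[h]      = P(latent class = h)
    conditionals[j][h][c] = P(y_j = c | latent class = h)
    Returns a dict mapping each cell (c_1, ..., c_p) to its probability.
    """
    n_classes = len(class_weights)
    levels = [len(cond[0]) for cond in conditionals]
    pmf = {}
    for cell in product(*(range(d) for d in levels)):
        # Mixture over classes of a product of per-variable probabilities.
        pmf[cell] = sum(
            class_weights[h]
            * prod(conditionals[j][h][c] for j, c in enumerate(cell))
            for h in range(n_classes)
        )
    return pmf


if __name__ == "__main__":
    # Toy example (assumed values): two latent classes, three binary variables.
    weights = [0.6, 0.4]
    per_variable = [[0.9, 0.1], [0.2, 0.8]]  # rows are classes
    table = parafac_pmf(weights, [per_variable] * 3)
    print(len(table), "cells; total probability =", sum(table.values()))
```

With k latent classes the resulting probability tensor is a sum of k rank-one nonnegative terms, which is the sense in which the latent structure model is low rank.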
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
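The slow mixing described above can be seen in a minimal sketch of a truncated Normal data augmentation sampler for an intercept-only probit model. The setup is an assumption made for illustration (a single intercept, a N(0, prior_var) prior, inverse-CDF sampling of the latent variables, one success among several hundred trials); it is not the thesis's experimental design, and the function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0.

    Uses the inverse-CDF method, which is adequate for moderate |mean|
    but loses accuracy far in the tails.
    """
    p_nonpositive = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    if positive:
        u = rng.uniform(p_nonpositive, 1.0)
    else:
        u = rng.uniform(0.0, p_nonpositive)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(u)


def probit_gibbs(y, n_iter, prior_var=100.0, seed=0):
    """Data augmentation Gibbs sampler for P(y_i = 1) = Phi(beta)."""
    rng = random.Random(seed)
    post_var = 1.0 / (len(y) + 1.0 / prior_var)
    beta, chain = 0.0, []
    for _ in range(n_iter):
        # Step 1: latent z_i | beta, y_i are truncated normals.
        z_sum = sum(sample_truncated_normal(beta, yi == 1, rng) for yi in y)
        # Step 2: beta | z is Gaussian with variance of order 1/n.
        beta = rng.gauss(post_var * z_sum, post_var ** 0.5)
        chain.append(beta)
    return chain


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    return num / sum((a - m) ** 2 for a in xs)


if __name__ == "__main__":
    rare = [1] + [0] * 499  # one success in 500 trials
    draws = probit_gibbs(rare, 300)
    print("lag-1 autocorrelation:", lag1_autocorrelation(draws))
```

Each update of beta moves by roughly 1/sqrt(n), while the posterior for a rare-event intercept is much wider than that, so successive draws are highly correlated.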
Abstract:
Described here is a mass spectrometry-based screening assay for the detection of protein-ligand binding interactions in multicomponent protein mixtures. The assay utilizes an oxidation labeling protocol that involves using hydrogen peroxide to selectively oxidize methionine residues in proteins in order to probe the solvent accessibility of these residues as a function of temperature. The extent to which methionine residues in a protein are oxidized after specified reaction times at a range of temperatures is determined in a MALDI analysis of the intact proteins and/or an LC-MS analysis of tryptic peptide fragments generated after the oxidation reaction is quenched. Ultimately, the mass spectral data are used to construct thermal denaturation curves for the detected proteins. In this proof-of-principle work, the protocol is applied to a four-protein model mixture composed of ubiquitin, ribonuclease A (RNaseA), cyclophilin A (CypA), and bovine carbonic anhydrase II (BCAII). The new protocol's ability to detect protein-ligand binding interactions by comparing thermal denaturation data obtained in the absence and in the presence of ligand is demonstrated using cyclosporin A (CsA) as a test ligand. The known binding interaction between CsA and CypA was detected using both the MALDI- and LC-MS-based readouts described here.
Atmospheric neutrino oscillation analysis with subleading effects in Super-Kamiokande I, II, and III
Abstract:
We present a search for nonzero $\theta_{13}$ and deviations of $\sin^2\theta_{23}$ from 0.5 in the oscillations of atmospheric neutrino data from Super-Kamiokande I, II, and III. No distortions of the neutrino flux consistent with nonzero $\theta_{13}$ are found and both neutrino mass hierarchy hypotheses are in agreement with the data. The data are best fit at $\Delta m^2 = 2.1 \times 10^{-3}\ \mathrm{eV}^2$, $\sin^2\theta_{13} = 0.0$, and $\sin^2\theta_{23} = 0.5$. In the normal (inverted) hierarchy $\theta_{13}$ and $\Delta m^2$ are constrained at the one-dimensional 90% C.L. to $\sin^2\theta_{13} < 0.04\ (0.09)$ and $1.9\ (1.7) \times 10^{-3} < \Delta m^2 < 2.6\ (2.7) \times 10^{-3}\ \mathrm{eV}^2$. The atmospheric mixing angle is within $0.407 \le \sin^2\theta_{23} \le 0.583$ at 90% C.L. © 2010 The American Physical Society.
Abstract:
BACKGROUND: The lactogenic hormones prolactin (PRL) and placental lactogens (PL) play central roles in reproduction and mammary development. Their actions are mediated via binding to PRL receptor (PRLR), highly expressed in brown adipose tissue (BAT), yet their impact on adipocyte function and metabolism remains unclear. METHODOLOGY/PRINCIPAL FINDINGS: PRLR knockout (KO) newborn mice were phenotypically characterized in terms of thermoregulation and their BAT differentiation assayed for gene expression studies. Derived brown preadipocyte cell lines were established to evaluate the molecular mechanisms involved in PRL signaling on BAT function. Here, we report that newborn mice lacking PRLR have hypotrophic BAT depots that express low levels of adipocyte nuclear receptor PPARgamma2, its coactivator PGC-1alpha, uncoupling protein 1 (UCP1) and the beta3 adrenoceptor, reducing mouse viability during cold challenge. Immortalized PRLR KO preadipocytes fail to undergo differentiation into mature adipocytes, a defect reversed by reintroduction of PRLR. That the effects of the lactogens in BAT are at least partly mediated by Insulin-like Growth Factor-2 (IGF-2) is supported by: i) a striking reduction in BAT IGF-2 expression in PRLR KO mice and in PRLR-deficient preadipocytes; ii) induction of cellular IGF-2 expression by PRL through JAK2/STAT5 pathway activation; and iii) reversal of defective differentiation in PRLR KO cells by exogenous IGF-2. CONCLUSIONS: Our findings demonstrate that the lactogens act in concert with IGF-2 to control brown adipocyte differentiation and growth. Given the prominent role of brown adipose tissue during the perinatal period, our results identified prolactin receptor signaling as a major player and a potential therapeutic target in protecting newborn mammals against hypothermia.
Abstract:
Regions of the hamster alpha 1-adrenergic receptor (alpha 1AR) that are important in GTP-binding protein (G protein)-mediated activation of phospholipase C were determined by studying the biological functions of mutant receptors constructed by recombinant DNA techniques. A chimeric receptor consisting of the beta 2-adrenergic receptor (beta 2AR) into which the putative third cytoplasmic loop of the alpha 1AR had been placed activated phosphatidylinositol metabolism as effectively as the native alpha 1AR, as did a truncated alpha 1AR lacking the last 47 residues in its cytoplasmic tail. Substitutions of beta 2AR amino acid sequence in the intermediate portions of the third cytoplasmic loop of the alpha 1AR or at the N-terminal portion of the cytoplasmic tail caused marked decreases in receptor coupling to phospholipase C. Conservative substitutions of two residues in the C terminus of the third cytoplasmic loop (Ala293→Leu, Lys290→His) increased the potency of agonists for stimulating phosphatidylinositol metabolism by up to 2 orders of magnitude. These data indicate (i) that the regions of the alpha 1AR that determine coupling to phosphatidylinositol metabolism are similar to those previously shown to be involved in coupling of beta 2AR to adenylate cyclase stimulation and (ii) that point mutations of a G-protein-coupled receptor can cause remarkable increases in sensitivity of biological response.
Abstract:
Light is a critical environmental signal that regulates every phase of the plant life cycle, from germination to floral initiation. Of the many light receptors in the model plant Arabidopsis thaliana, the red- and far-red light-sensing phytochromes (phys) are arguably the best studied, but the earliest events in the phy signaling pathway remain poorly understood. One of the earliest phy signaling events is the translocation of photoactivated phys from the cytoplasm to the nucleus, where they localize to subnuclear foci termed photobodies; in continuous light, photobody localization correlates closely with the light-dependent inhibition of embryonic stem growth. Despite a growing body of evidence supporting the biological significance of photobodies in light signaling, photobodies have also been shown to be dispensable for seedling growth inhibition in continuous light, so their physiological importance remains controversial; additionally, the molecular components that are required for phy localization to photobodies are largely unknown. The overall goal of my dissertation research was to gain insight into the early steps of phy signaling by further defining the role of photobodies in this process and identifying additional intragenic and extragenic requirements for phy localization to photobodies.
Even though the domain structure of phys has been extensively studied, not all of the intramolecular requirements for phy localization to photobodies are known. Previous studies have shown that the entire C-terminus of phys is both necessary and sufficient for their localization to photobodies. However, the importance of the individual subdomains of the C-terminus is still unclear. For example, a truncation lacking part of the most C-terminal domain, the histidine kinase-related domain (HKRD), can still localize to small photobodies in the light and behaves like a weak allele. However, a point mutation within the HKRD renders the entire molecule completely inactive. To resolve this discrepancy, I explored the hypothesis that this point mutation might impair the dimerization of the HKRD; dimerization has been shown to occur via the C-terminus of phy and is required for more efficient signaling. I show that this point mutation impairs nuclear localization of phy as well as its subnuclear localization to photobodies. Additionally, yeast two-hybrid analysis shows that the wild-type HKRD can homodimerize but that the HKRD containing the point mutation fails to dimerize either with itself or with wild-type HKRD. These results demonstrate that dimerization of the HKRD is required for both nuclear and photobody localization of phy.
Studies of seedlings grown in diurnal conditions show that photoactivated phy can persist into darkness to repress seedling growth; a seedling's growth rate is therefore fastest at the end of the night. To test the idea that photobodies could be involved in regulating seedling growth in the dark, I compared the growth of two transgenic Arabidopsis lines, one in which phy can localize to photobodies (PBG), and one in which it cannot (NGB). Despite these differences in photobody morphology, both lines are capable of transducing light signals and inhibiting seedling growth in continuous light. After the transition from red light to darkness, the PBG line was able to repress seedling growth, as well as the accumulation of the growth-promoting, light-labile transcription factor PHYTOCHROME INTERACTING FACTOR 3 (PIF3), for eighteen hours, and this correlated perfectly with the presence of photobodies. Reducing the amount of active phy by either reducing the light intensity or adding a phy-inactivating far-red pulse prior to darkness led to faster accumulation of PIF3 and earlier seedling growth. In contrast, the NGB line accumulated PIF3 even in the light, and seedling growth was only repressed for six hours; this behavior was similar in NGB regardless of the light treatment. These results suggest that photobodies are required for the degradation of PIF3 and for the prolonged stabilization of active phy in darkness. They also support the hypothesis that photobody localization of phys could serve as an instructive cue during the light-to-dark transition, thereby fine-tuning light-dependent responses in darkness.
In addition to determining an intragenic requirement for photobody localization and further exploring the significance of photobodies in phy signaling, I wanted to identify extragenic regulators of photobody localization. A recent study identified one such factor, HEMERA (HMR); hmr mutants do not form large photobodies, and they are tall and albino in the light. To identify other components in the HMR-mediated branch of the phy signaling pathway, I performed a forward genetic screen for suppressors of a weak hmr allele. Surprisingly, the first three mutants isolated from the screen were alleles of the same novel gene, SON OF HEMERA (SOH). The soh mutations rescue all of the phenotypes associated with the weak hmr allele, and they do so in an allele-specific manner, suggesting a direct interaction between SOH and HMR. Null soh alleles, which were isolated in an independent, tall, albino screen, are defective in photobody localization, demonstrating that SOH is an extragenic regulator of phy localization to photobodies that works in the same genetic pathway as HMR.
In this work, I show that dimerization of the HKRD is required for both the nuclear and photobody localization of phy. I also demonstrate a tight correlation between photobody localization and PIF3 degradation, further establishing the significance of photobodies in phy signaling. Finally, I identify a novel gene, SON OF HEMERA, whose product is necessary for phy localization to photobodies in the light, thereby isolating a new extragenic determinant of photobody localization. These results are among the first to focus exclusively on one of the earliest cellular responses to light - photobody localization of phys - and they promise to open up new avenues into the study of a poorly understood facet of the phy signaling pathway.
Abstract:
Lymphomas comprise a diverse group of malignancies derived from immune cells. High throughput sequencing has recently emerged as a powerful and versatile method for analysis of the cancer genome and transcriptome. As these data continue to emerge, the crucial work lies in sorting through the wealth of information to home in on the critical aspects that will give us a better understanding of biology and new insight into how to treat disease. Finding the important signals within these large data sets is one of the major challenges of next generation sequencing.
In this dissertation, I have developed several complementary strategies to describe the genetic underpinnings of lymphomas. I begin with developing a better method for RNA sequencing that enables strand-specific total RNA sequencing and alternative splicing profiling in the same analysis. I then combine this RNA sequencing technique with whole exome sequencing to better understand the global landscape of aberrations in these diseases. Finally, I use traditional cell and molecular biology techniques to define the consequences of major genetic alterations in lymphoma.
Through this analysis, I find recurrent silencing mutations in the G alpha binding protein GNA13 and associated focal adhesion proteins. I aim to describe how loss-of-function mutations in GNA13 can be oncogenic in the context of germinal center B cell biology. Using in vitro techniques including liquid chromatography-mass spectrometry and knockdown and overexpression of genes in B cell lymphoma cell lines, I determine protein binding partners and downstream effectors of GNA13. I also develop a transgenic mouse model to study the role of GNA13 in the germinal center in vivo to determine effects of GNA13 deletion on germinal center structure and cell migration.
Thus, I have developed complementary approaches that span the spectrum from discovery to context-dependent gene models that afford a better understanding of the biological function of aberrant events and ultimately result in a better understanding of disease.
Abstract:
The size, shape, and connectivity of water bodies (lakes, ponds, and wetlands) can have important effects on ecological communities and ecosystem processes, but how these characteristics are influenced by land use and land cover change over broad spatial scales is not known. Intensive alteration of water bodies during urban development, including construction, burial, drainage, and reshaping, may select for certain morphometric characteristics and influence the types of water bodies present in cities. We used a database of over one million water bodies in 100 cities across the conterminous United States to compare the size distributions, connectivity (as intersection with surface flow lines), and shape (as measured by shoreline development factor) of water bodies in different land cover classes. Water bodies in all urban land covers were dominated by lakes and ponds, while reservoirs and wetlands comprised only a small fraction of the sample. In urban land covers, as compared to surrounding undeveloped land, water body size distributions converged on moderate sizes, shapes shifted toward less tortuous shorelines, and the number and area of water bodies that intersected surface flow lines (i.e., streams and rivers) decreased. Potential mechanisms responsible for changing the characteristics of urban water bodies include preferential removal, physical reshaping or addition of water bodies, and selection of locations for development. The relative contributions of each mechanism likely change as cities grow. The larger size and reduced surface connectivity of urban water bodies may affect the role of internal dynamics and sensitivity to catchment processes. More broadly, these results illustrate the complex nature of urban watersheds and highlight the need to develop a conceptual framework for urban water bodies.
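The shape metric named above can be made concrete with a short sketch. It assumes the standard limnological definition of the shoreline development factor, the ratio of a water body's shoreline length to the circumference of a circle of equal area; the abstract does not state the formula, and the function name is illustrative.

```python
import math


def shoreline_development_factor(shoreline_length, area):
    """Ratio of shoreline length to the circumference of an equal-area circle.

    Assumed definition: SDF = L / (2 * sqrt(pi * A)), with L and A in
    consistent units (for example metres and square metres).
    A perfect circle gives 1.0; more tortuous shorelines give larger values.
    """
    if shoreline_length <= 0 or area <= 0:
        raise ValueError("shoreline_length and area must be positive")
    return shoreline_length / (2.0 * math.sqrt(math.pi * area))


if __name__ == "__main__":
    # A circular pond of radius 50 and a square pond of side 100.
    print(shoreline_development_factor(2 * math.pi * 50, math.pi * 50 ** 2))
    print(shoreline_development_factor(4 * 100, 100 ** 2))
```

Under this definition, a shift toward "less tortuous shorelines" corresponds to values moving closer to 1.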
Abstract:
The short arms of the ten acrocentric human chromosomes share several repetitive DNAs, including ribosomal RNA genes (rDNA). The rDNA arrays correspond to nucleolar organizing regions that coalesce each cell cycle to form the nucleolus. Telomere disruption by expressing a mutant version of telomere binding protein TRF2 (dnTRF2) causes non-random acrocentric fusions, as well as large-scale nucleolar defects. The mechanisms responsible for acrocentric chromosome sensitivity to dysfunctional telomeres are unclear. In this study, we show that TRF2 normally associates with the nucleolus and rDNA. However, when telomeres are crippled by dnTRF2 or RNAi knockdown of TRF2, gross nucleolar and chromosomal changes occur. We used the controllable dnTRF2 system to precisely dissect the timing and progression of nucleolar and chromosomal instability induced by telomere dysfunction, demonstrating that nucleolar changes precede the DNA damage and morphological changes that occur at acrocentric short arms. The rDNA repeat arrays on the short arms decondense, and are coated by RNA polymerase I transcription binding factor UBF, physically linking acrocentrics to one another as they become fusogenic. These results highlight the importance of telomere function in nucleolar stability and structural integrity of acrocentric chromosomes, particularly the rDNA arrays. Telomeric stress is widely accepted to cause DNA damage at chromosome ends, but our findings suggest that it also disrupts chromosome structure beyond the telomere region, specifically within the rDNA arrays located on acrocentric chromosomes. These results have relevance for Robertsonian translocation formation in humans and mechanisms by which acrocentric-acrocentric fusions are promoted by DNA damage and repair.
Abstract:
Vaccine-induced HIV antibodies were evaluated in serum samples collected from healthy Tanzanian volunteers participating in a phase I/II placebo-controlled double blind trial using multi-clade, multigene HIV-DNA priming and recombinant modified vaccinia Ankara (HIV-MVA) virus boosting (HIVIS03). The HIV-DNA vaccine contained plasmids expressing HIV-1 gp160 subtypes A, B, C, Rev B, Gag A, B and RTmut B, and the recombinant HIV-MVA boost expressed CRF01_AE HIV-1 Env subtype E and Gag-Pol subtype A. While no neutralizing antibodies were detected using pseudoviruses in the TZM-bl cell assay, this prime-boost vaccination induced neutralizing antibodies in 83% of HIVIS03 vaccinees when a peripheral blood mononuclear cell (PBMC) assay using luciferase reporter-infectious molecular clones (LucR-IMC) was employed. The serum neutralizing activity was significantly (but not completely) reduced upon depletion of natural killer (NK) cells from PBMC (p=0.006), indicating a role for antibody-mediated Fcγ-receptor function. High levels of antibody-dependent cellular cytotoxicity (ADCC)-mediating antibodies against CRF01_AE and/or subtype B were subsequently demonstrated in 97% of the sera of vaccinees. The magnitude of ADCC-mediating antibodies against CM235 CRF01_AE IMC-infected cells correlated with neutralizing antibodies against CM235 in the IMC/PBMC assay. In conclusion, HIV-DNA priming followed by two HIV-MVA boosts elicited potent ADCC responses in a high proportion of Tanzanian vaccinees. Our findings highlight the potential of HIV-DNA prime HIV-MVA boost vaccines for induction of functional antibody responses and suggest this vaccine regimen and ADCC studies as potentially important new avenues in HIV vaccine development. TRIAL REGISTRATION: Controlled-Trials ISRCTN90053831; The Pan African Clinical Trials Registry ATMR2009040001075080 (currently PACTR2009040001075080).
Abstract:
The role of the GTPase-activating protein (GAP) that deactivates ADP-ribosylation factor 1 (ARF1) during the formation of coat protein I (COPI) vesicles has been unclear. GAP was originally thought to antagonize vesicle formation by triggering uncoating, but later studies suggest that GAP promotes cargo sorting, a process that occurs during vesicle formation. Recent models have attempted to reconcile these seemingly contradictory roles by suggesting that cargo proteins suppress GAP activity during vesicle formation, but whether GAP truly antagonizes coat recruitment in this process has not been assessed directly. We have reconstituted the formation of COPI vesicles by incubating Golgi membrane with purified soluble components, and find that ARFGAP1 in the presence of GTP promotes vesicle formation and cargo sorting. Moreover, the presence of GTPgammaS blocks not only vesicle uncoating but also vesicle formation, by preventing the proper recruitment of GAP to nascent vesicles. Examining how GAP functions in vesicle formation, we find that GAP on the reconstituted vesicles is at least as abundant as COPI and that GAP binds directly to the dilysine motif of cargo proteins. Collectively, these findings suggest that ARFGAP1 promotes vesicle formation by functioning as a component of the COPI coat.