21 resultados para LATENT
em Duke University
Resumo:
This article examines the behavior of equity trading volume and volatility for the individual firms composing the Standard & Poor's 100 composite index. Using multivariate spectral methods, we find that fractionally integrated processes best describe the long-run temporal dependencies in both series. Consistent with a stylized mixture-of-distributions hypothesis model in which the aggregate "news"-arrival process possesses long-memory characteristics, the long-run hyperbolic decay rates appear to be common across each volume-volatility pair.
Resumo:
We develop a model for stochastic processes with random marginal distributions. Our model relies on a stick-breaking construction for the marginal distribution of the process, and introduces dependence across locations by using a latent Gaussian copula model as the mechanism for selecting the atoms. The resulting latent stick-breaking process (LaSBP) induces a random partition of the index space, with points closer in space having a higher probability of being in the same cluster. We develop an efficient and straightforward Markov chain Monte Carlo (MCMC) algorithm for computation and discuss applications in financial econometrics and ecology. This article has supplementary material online.
Resumo:
Tumor microenvironmental stresses, such as hypoxia and lactic acidosis, play important roles in tumor progression. Although gene signatures reflecting the influence of these stresses are powerful approaches to link expression with phenotypes, they do not fully reflect the complexity of human cancers. Here, we describe the use of latent factor models to further dissect the stress gene signatures in a breast cancer expression dataset. The genes in these latent factors are coordinately expressed in tumors and depict distinct, interacting components of the biological processes. The genes in several latent factors are highly enriched in chromosomal locations. When these factors are analyzed in independent datasets with gene expression and array CGH data, the expression values of these factors are highly correlated with copy number alterations (CNAs) of the corresponding BAC clones in both the cell lines and tumors. Therefore, variation in the expression of these pathway-associated factors is at least partially caused by variation in gene dosage and CNAs among breast cancers. We have also found the expression of two latent factors without any chromosomal enrichment is highly associated with 12q CNA, likely an instance of "trans"-variations in which CNA leads to the variations in gene expression outside of the CNA region. In addition, we have found that factor 26 (1q CNA) is negatively correlated with HIF-1alpha protein and hypoxia pathways in breast tumors and cell lines. This agrees with, and for the first time links, known good prognosis associated with both a low hypoxia signature and the presence of CNA in this region. Taken together, these results suggest the possibility that tumor segmental aneuploidy makes significant contributions to variation in the lactic acidosis/hypoxia gene signatures in human cancers and demonstrate that latent factor analysis is a powerful means to uncover such a linkage.
Resumo:
We discuss a general approach to dynamic sparsity modeling in multivariate time series analysis. Time-varying parameters are linked to latent processes that are thresholded to induce zero values adaptively, providing natural mechanisms for dynamic variable inclusion/selection. We discuss Bayesian model specification, analysis and prediction in dynamic regressions, time-varying vector autoregressions, and multivariate volatility models using latent thresholding. Application to a topical macroeconomic time series problem illustrates some of the benefits of the approach in terms of statistical and economic interpretations as well as improved predictions. Supplementary materials for this article are available online. © 2013 Copyright Taylor and Francis Group, LLC.
Resumo:
Learning multiple tasks across heterogeneous domains is a challenging problem since the feature space may not be the same for different tasks. We assume the data in multiple tasks are generated from a latent common domain via sparse domain transforms and propose a latent probit model (LPM) to jointly learn the domain transforms, and the shared probit classifier in the common domain. To learn meaningful task relatedness and avoid over-fitting in classification, we introduce sparsity in the domain transforms matrices, as well as in the common classifier. We derive theoretical bounds for the estimation error of the classifier in terms of the sparsity of domain transforms. An expectation-maximization algorithm is derived for learning the LPM. The effectiveness of the approach is demonstrated on several real datasets.
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Resumo:
The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation or a disease, to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant problems with current diffusion models: (1) that contagions can only spread along observed ties, and (2) that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion - the use of enterprise collaboration software in a large technology company. I focus the empirical study on when people abandon innovations, a crucial, and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze when people abandon software to a high degree of detail.
To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space, and simulates diffusion over the social space. Theoretically, the social space model integrates both actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives an explicit form to the reciprocal influences that cognition and social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion with schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it is spread without changes, the schema diffusion model allows people to modify information they receive to fit an underlying mental model of the information before they pass the information to others. Combining the latent space models with a schema notion for actors improves our models for social diffusion both theoretically and practically.
The empirical case study focuses on how the changing value of an innovation, introduced by the innovations' network externalities, influences when people abandon the innovation. In it, I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovation, but also suggests a new approach -- computerized collaboration systems -- to collecting and analyzing data on organizational processes.
Resumo:
This dissertation documents the results of a theoretical and numerical study of time dependent storage of energy by melting a phase change material. The heating is provided along invading lines, which change from single-line invasion to tree-shaped invasion. Chapter 2 identifies the special design feature of distributing energy storage in time-dependent fashion on a territory, when the energy flows by fluid flow from a concentrated source to points (users) distributed equidistantly on the area. The challenge in this chapter is to determine the architecture of distributed energy storage. The chief conclusion is that the finite amount of storage material should be distributed proportionally with the distribution of the flow rate of heating agent arriving on the area. The total time needed by the source stream to ‘invade’ the area is cumulative (the sum of the storage times required at each storage site), and depends on the energy distribution paths and the sequence in which the users are served by the source stream. Chapter 3 shows theoretically that the melting process consists of two phases: “invasion” thermal diffusion along the invading line, which is followed by “consolidation” as heat diffuses perpendicularly to the invading line. This chapter also reports the duration of both phases and the evolution of the melt layer around the invading line during the two-dimensional and three-dimensional invasion. It also shows that the amount of melted material increases in time according to a curve shaped as an S. These theoretical predictions are validated by means of numerical simulations in chapter 4. This chapter also shows that the heat transfer rate density increases (i.e., the S curve becomes steeper) as the complexity and number of degrees of freedom of the structure are increased, in accord with the constructal law. The optimal geometric features of the tree structure are detailed in this chapter. Chapter 5 documents a numerical study of time-dependent melting where the heat transfer is convection dominated, unlike in chapter 3 and 4 where the melting is ruled by pure conduction. In accord with constructal design, the search is for effective heat-flow architectures. The volume-constrained improvement of the designs for heat flow begins with assuming the simplest structure, where a single line serves as heat source. Next, the heat source is endowed with freedom to change its shape as it grows. The objective of the numerical simulations is to discover the geometric features that lead to the fastest melting process. The results show that the heat transfer rate density increases as the complexity and number of degrees of freedom of the structure are increased. Furthermore, the angles between heat invasion lines have a minor effect on the global performance compared to other degrees of freedom: number of branching levels, stem length, and branch lengths. The effect of natural convection in the melt zone is documented.
Resumo:
Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
Resumo:
We exploit the distributional information contained in high-frequency intraday data in constructing a simple conditional moment estimator for stochastic volatility diffusions. The estimator is based on the analytical solutions of the first two conditional moments for the latent integrated volatility, the realization of which is effectively approximated by the sum of the squared high-frequency increments of the process. Our simulation evidence indicates that the resulting GMM estimator is highly reliable and accurate. Our empirical implementation based on high-frequency five-minute foreign exchange returns suggests the presence of multiple latent stochastic volatility factors and possible jumps. © 2002 Elsevier Science B.V. All rights reserved.
Resumo:
Knowing one's HIV status is particularly important in the setting of recent tuberculosis (TB) exposure. Blood tests for assessment of tuberculosis infection, such as the QuantiFERON Gold in-tube test (QFT; Cellestis Limited, Carnegie, Victoria, Australia), offer the possibility of simultaneous screening for TB and HIV with a single blood draw. We performed a cross-sectional analysis of all contacts to a highly infectious TB case in a large meatpacking factory. Twenty-two percent were foreign-born and 73% were black. Contacts were tested with both tuberculin skin testing (TST) and QFT. HIV testing was offered on an opt-out basis. Persons with TST >or=10 mm, positive QFT, and/or positive HIV test were offered latent TB treatment. Three hundred twenty-six contacts were screened: TST results were available for 266 people and an additional 24 reported a prior positive TST for a total of 290 persons with any TST result (89.0%). Adequate QFT specimens were obtained for 312 (95.7%) of persons. Thirty-two persons had QFT results but did not return for TST reading. Twenty-two percent met the criteria for latent TB infection. Eighty-eight percent accepted HIV testing. Two (0.7%) were HIV seropositive; both individuals were already aware of their HIV status, but one had stopped care a year previously. None of the HIV-seropositive persons had latent TB, but all were offered latent TB treatment per standard guidelines. This demonstrates that opt-out HIV testing combined with QFT in a large TB contact investigation was feasible and useful. HIV testing was also widely accepted. Pairing QFT with opt-out HIV testing should be strongly considered when possible.
Resumo:
We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.
Resumo:
OBJECTIVE: To examine the associations between attention-deficit/hyperactivity disorder (ADHD) symptoms, obesity and hypertension in young adults in a large population-based cohort. DESIGN, SETTING AND PARTICIPANTS: The study population consisted of 15,197 respondents from the National Longitudinal Study of Adolescent Health, a nationally representative sample of adolescents followed from 1995 to 2009 in the United States. Multinomial logistic and logistic models examined the odds of overweight, obesity and hypertension in adulthood in relation to retrospectively reported ADHD symptoms. Latent curve modeling was used to assess the association between symptoms and naturally occurring changes in body mass index (BMI) from adolescence to adulthood. RESULTS: Linear association was identified between the number of inattentive (IN) and hyperactive/impulsive (HI) symptoms and waist circumference, BMI, diastolic blood pressure and systolic blood pressure (all P-values for trend <0.05). Controlling for demographic variables, physical activity, alcohol use, smoking and depressive symptoms, those with three or more HI or IN symptoms had the highest odds of obesity (HI 3+, odds ratio (OR)=1.50, 95% confidence interval (CI) = 1.22-2.83; IN 3+, OR = 1.21, 95% CI = 1.02-1.44) compared with those with no HI or IN symptoms. HI symptoms at the 3+ level were significantly associated with a higher OR of hypertension (HI 3+, OR = 1.24, 95% CI = 1.01-1.51; HI continuous, OR = 1.04, 95% CI = 1.00-1.09), but associations were nonsignificant when models were adjusted for BMI. Latent growth modeling results indicated that compared with those reporting no HI or IN symptoms, those reporting 3 or more symptoms had higher initial levels of BMI during adolescence. Only HI symptoms were associated with change in BMI. CONCLUSION: Self-reported ADHD symptoms were associated with adult BMI and change in BMI from adolescence to adulthood, providing further evidence of a link between ADHD symptoms and obesity.
Resumo:
Phosphorylation of G-protein-coupled receptors plays an important role in regulating their function. In this study the G-protein-coupled receptor phosphatase (GRP) capable of dephosphorylating G-protein-coupled receptor kinase-phosphorylated receptors is described. The GRP activity of bovine brain is a latent oligomeric form of protein phosphatase type 2A (PP-2A) exclusively associated with the particulate fraction. GRP activity is observed only when assayed in the presence of protamine or when phosphatase-containing fractions are subjected to freeze/thaw treatment under reducing conditions. Consistent with its identification as a member of the PP-2A family, the GRP is potently inhibited by okadaic acid but not by I-2, the specific inhibitor of protein phosphatase type 1. Solubilization of the membrane-associated GRP followed by gel filtration in the absence of detergent yields a 150-kDa peak of latent receptor phosphatase activity. Western blot analysis of this phosphatase reveals a likely subunit composition of AB alpha C. PP-2A of this subunit composition has previously been characterized as a soluble enzyme, yet negligible soluble GRP activity was observed. The subcellular distribution and substrate specificity of the GRP suggests significant differences between it and previously characterized forms of PP-2A.