10 resultados para Bayesian statistical decision theory
em Duke University
Resumo:
The advances in three related areas of state-space modeling, sequential Bayesian learning, and decision analysis are addressed, with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic networks studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.
Chapter 1 summarizes the three areas/problems and the key idea of emulating in those areas. Chapter 2 discusses the sequential analysis of latent threshold models with use of emulating models that allows for analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes the method for modeling the steaming data of counts observed on a large network that relies on emulating the whole, dependent network model by independent, conjugate sub-models customized to each set of flow. Chapter 5 reviews those advances and makes the concluding remarks.
Resumo:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
Resumo:
BACKGROUND: Guidance for appropriate utilisation of transthoracic echocardiograms (TTEs) can be incorporated into ordering prompts, potentially affecting the number of requests. METHODS: We incorporated data from the 2011 Appropriate Use Criteria for Echocardiography, the 2010 National Institute for Clinical Excellence Guideline on Chronic Heart Failure, and American College of Cardiology Choosing Wisely list on TTE use for dyspnoea, oedema and valvular disease into electronic ordering systems at Durham Veterans Affairs Medical Center. Our primary outcome was TTE orders per month. Secondary outcomes included rates of outpatient TTE ordering per 100 visits and frequency of brain natriuretic peptide (BNP) ordering prior to TTE. Outcomes were measured for 20 months before and 12 months after the intervention. RESULTS: The number of TTEs ordered did not decrease (338±32 TTEs/month prior vs 320±33 afterwards, p=0.12). Rates of outpatient TTE ordering decreased minimally post intervention (2.28 per 100 primary care/cardiology visits prior vs 1.99 afterwards, p<0.01). Effects on TTE ordering and ordering rate significantly interacted with time from intervention (p<0.02 for both), as the small initial effects waned after 6 months. The percentage of TTE orders with preceding BNP increased (36.5% prior vs 42.2% after for inpatients, p=0.01; 10.8% prior vs 14.5% after for outpatients, p<0.01). CONCLUSIONS: Ordering prompts for TTEs initially minimally reduced the number of TTEs ordered and increased BNP measurement at a single institution, but the effect on TTEs ordered was likely insignificant from a utilisation standpoint and decayed over time.
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Resumo:
Brain-computer interfaces (BCI) have the potential to restore communication or control abilities in individuals with severe neuromuscular limitations, such as those with amyotrophic lateral sclerosis (ALS). The role of a BCI is to extract and decode relevant information that conveys a user's intent directly from brain electro-physiological signals and translate this information into executable commands to control external devices. However, the BCI decision-making process is error-prone due to noisy electro-physiological data, representing the classic problem of efficiently transmitting and receiving information via a noisy communication channel.
This research focuses on P300-based BCIs which rely predominantly on event-related potentials (ERP) that are elicited as a function of a user's uncertainty regarding stimulus events, in either an acoustic or a visual oddball recognition task. The P300-based BCI system enables users to communicate messages from a set of choices by selecting a target character or icon that conveys a desired intent or action. P300-based BCIs have been widely researched as a communication alternative, especially in individuals with ALS who represent a target BCI user population. For the P300-based BCI, repeated data measurements are required to enhance the low signal-to-noise ratio of the elicited ERPs embedded in electroencephalography (EEG) data, in order to improve the accuracy of the target character estimation process. As a result, BCIs have relatively slower speeds when compared to other commercial assistive communication devices, and this limits BCI adoption by their target user population. The goal of this research is to develop algorithms that take into account the physical limitations of the target BCI population to improve the efficiency of ERP-based spellers for real-world communication.
In this work, it is hypothesised that building adaptive capabilities into the BCI framework can potentially give the BCI system the flexibility to improve performance by adjusting system parameters in response to changing user inputs. The research in this work addresses three potential areas for improvement within the P300 speller framework: information optimisation, target character estimation and error correction. The visual interface and its operation control the method by which the ERPs are elicited through the presentation of stimulus events. The parameters of the stimulus presentation paradigm can be modified to modulate and enhance the elicited ERPs. A new stimulus presentation paradigm is developed in order to maximise the information content that is presented to the user by tuning stimulus paradigm parameters to positively affect performance. Internally, the BCI system determines the amount of data to collect and the method by which these data are processed to estimate the user's target character. Algorithms that exploit language information are developed to enhance the target character estimation process and to correct erroneous BCI selections. In addition, a new model-based method to predict BCI performance is developed, an approach which is independent of stimulus presentation paradigm and accounts for dynamic data collection. The studies presented in this work provide evidence that the proposed methods for incorporating adaptive strategies in the three areas have the potential to significantly improve BCI communication rates, and the proposed method for predicting BCI performance provides a reliable means to pre-assess BCI performance without extensive online testing.
Resumo:
Social decision-making is often complex, requiring the decision-maker to make social inferences about another person in addition to engaging traditional decision-making processes. However, until recently, much research in neuroeconomics and behavioral economics has examined social decision-making while failing to take into account the importance of the social context and social cognitive processes that are engaged when viewing another person. Using social psychological theory to guide our hypotheses, four research studies investigate the role of social cognition and person perception in guiding economic decisions made in social contexts. The first study (Chapter 2) demonstrates that only specific types of social information engage brain regions implicated in social cognition and hinder learning in social contexts. Study 2 (Chapter 3) extends these findings and examines contexts in which this social information is used to generalize across contexts to form predictions about another person’s behavior. Study 3 (Chapter 4) demonstrates that under certain contexts these social cognitive processes may be withheld in order to more effectively complete the task at hand. Last, Study 4 (Chapter 5) examines how this knowledge of social cognitive processing can be used to change behavior in a prosocial group context. Taken together, these studies add to the growing body of literature examining decision-making in social contexts and highlight the importance of social cognitive processing in guiding these decisions. Although social cognitive processing typically facilitates social interactions, these processes may alter economic decision-making in social contexts.
Resumo:
Humans are natural politicians. We obsessively collect social information that is both observable (e.g., about third-party relationships) and unobservable (e.g., about others’ psychological states), and we strategically employ that information to manage our cooperative and competitive relationships. To what extent are these abilities unique to our species, and how did they evolve? The present dissertation seeks to contribute to these two questions. To do so, I take a comparative perspective, investigating social decision-making in humans’ closest living relatives, bonobos and chimpanzees. In Chapter 1, I review existing literature on theory of mind—or the ability to understand others’ psychological states—in these species. I also present a theoretical framework to guide further investigation of social cognition in bonobos and chimpanzees based on hypotheses about the proximate and ultimate origins of their species differences. In Chapter 2, I experimentally investigate differences in the prosocial behavior of bonobos and chimpanzees, revealing species-specific prosocial motivations that appear to be less flexible than those exhibited by humans. In Chapter 3, I explore through decision-making experiments bonobos’ ability to evaluate others based on their prosocial or antisocial behavior during third-party interactions. Bonobos do track the interactions of third-parties and evaluate actors based on these interactions. However, they do not exhibit the human preference for those who are prosocial towards others, instead consistently favoring an antisocial individual. The motivation to prefer those who demonstrate a prosocial disposition may be a unique feature of human psychology that contributes to our ultra-cooperative nature. In Chapter 4, I investigate the adaptive value of social cognition in wild primates. I show that the recruitment behavior of wild chimpanzees at Gombe National Park, Tanzania is consistent with the use of third-party knowledge, and that those who appear to use third-party knowledge receive immediate proximate benefits. They escape further aggression from their opponents. These findings directly support the social intelligence hypothesis that social cognition has evolved in response to the demands of competing with one’s own group-mates. Thus, the studies presented here help to better characterize the features of social decision-making that are unique to humans, and how these abilities evolved.
Resumo:
As the world population continues to grow past seven billion people and global challenges continue to persist including resource availability, biodiversity loss, climate change and human well-being, a new science is required that can address the integrated nature of these challenges and the multiple scales on which they are manifest. Sustainability science has emerged to fill this role. In the fifteen years since it was first called for in the pages of Science, it has rapidly matured, however its place in the history of science and the way it is practiced today must be continually evaluated. In Part I, two chapters address this theoretical and practical grounding. Part II transitions to the applied practice of sustainability science in addressing the urban heat island (UHI) challenge wherein the climate of urban areas are warmer than their surrounding rural environs. The UHI has become increasingly important within the study of earth sciences given the increased focus on climate change and as the balance of humans now live in urban areas.
In Chapter 2 a novel contribution to the historical context of sustainability is argued. Sustainability as a concept characterizing the relationship between humans and nature emerged in the mid to late 20th century as a response to findings used to also characterize the Anthropocene. Emerging from the human-nature relationships that came before it, evidence is provided that suggests Sustainability was enabled by technology and a reorientation of world-view and is unique in its global boundary, systematic approach and ambition for both well being and the continued availability of resources and Earth system function. Sustainability is further an ambition that has wide appeal, making it one of the first normative concepts of the Anthropocene.
Despite its widespread emergence and adoption, sustainability science continues to suffer from definitional ambiguity within the academe. In Chapter 3, a review of efforts to provide direction and structure to the science reveals a continuum of approaches anchored at either end by differing visions of how the science interfaces with practice (solutions). At one end, basic science of societally defined problems informs decisions about possible solutions and their application. At the other end, applied research directly affects the options available to decision makers. While clear from the literature, survey data further suggests that the dichotomy does not appear to be as apparent in the minds of practitioners.
In Chapter 4, the UHI is first addressed at the synoptic, mesoscale. Urban climate is the most immediate manifestation of the warming global climate for the majority of people on earth. Nearly half of those people live in small to medium sized cities, an understudied scale in urban climate research. Widespread characterization would be useful to decision makers in planning and design. Using a multi-method approach, the mesoscale UHI in the study region is characterized and the secular trend over the last sixty years evaluated. Under isolated ideal conditions the findings indicate a UHI of 5.3 ± 0.97 °C to be present in the study area, the magnitude of which is growing over time.
Although urban heat islands (UHI) are well studied, there remain no panaceas for local scale mitigation and adaptation methods, therefore continued attention to characterization of the phenomenon in urban centers of different scales around the globe is required. In Chapter 5, a local scale analysis of the canopy layer and surface UHI in a medium sized city in North Carolina, USA is conducted using multiple methods including stationary urban sensors, mobile transects and remote sensing. Focusing on the ideal conditions for UHI development during an anticyclonic summer heat event, the study observes a range of UHI intensity depending on the method of observation: 8.7 °C from the stationary urban sensors; 6.9 °C from mobile transects; and, 2.2 °C from remote sensing. Additional attention is paid to the diurnal dynamics of the UHI and its correlation with vegetation indices, dewpoint and albedo. Evapotranspiration is shown to drive dynamics in the study region.
Finally, recognizing that a bridge must be established between the physical science community studying the Urban Heat Island (UHI) effect, and the planning community and decision makers implementing urban form and development policies, Chapter 6 evaluates multiple urban form characterization methods. Methods evaluated include local climate zones (LCZ), national land cover database (NCLD) classes and urban cluster analysis (UCA) to determine their utility in describing the distribution of the UHI based on three standard observation types 1) fixed urban temperature sensors, 2) mobile transects and, 3) remote sensing. Bivariate, regression and ANOVA tests are used to conduct the analyses. Findings indicate that the NLCD classes are best correlated to the UHI intensity and distribution in the study area. Further, while the UCA method is not useful directly, the variables included in the method are predictive based on regression analysis so the potential for better model design exists. Land cover variables including albedo, impervious surface fraction and pervious surface fraction are found to dominate the distribution of the UHI in the study area regardless of observation method.
Chapter 7 provides a summary of findings, and offers a brief analysis of their implications for both the scientific discourse generally, and the study area specifically. In general, the work undertaken does not achieve the full ambition of sustainability science, additional work is required to translate findings to practice and more fully evaluate adoption. The implications for planning and development in the local region are addressed in the context of a major light-rail infrastructure project including several systems level considerations like human health and development. Finally, several avenues for future work are outlined. Within the theoretical development of sustainability science, these pathways include more robust evaluations of the theoretical and actual practice. Within the UHI context, these include development of an integrated urban form characterization model, application of study methodology in other geographic areas and at different scales, and use of novel experimental methods including distributed sensor networks and citizen science.
Resumo:
This dissertation contributes to the rapidly growing empirical research area in the field of operations management. It contains two essays, tackling two different sets of operations management questions which are motivated by and built on field data sets from two very different industries --- air cargo logistics and retailing.
The first essay, based on the data set obtained from a world leading third-party logistics company, develops a novel and general Bayesian hierarchical learning framework for estimating customers' spillover learning, that is, customers' learning about the quality of a service (or product) from their previous experiences with similar yet not identical services. We then apply our model to the data set to study how customers' experiences from shipping on a particular route affect their future decisions about shipping not only on that route, but also on other routes serviced by the same logistics company. We find that customers indeed borrow experiences from similar but different services to update their quality beliefs that determine future purchase decisions. Also, service quality beliefs have a significant impact on their future purchasing decisions. Moreover, customers are risk averse; they are averse to not only experience variability but also belief uncertainty (i.e., customer's uncertainty about their beliefs). Finally, belief uncertainty affects customers' utilities more compared to experience variability.
The second essay is based on a data set obtained from a large Chinese supermarket chain, which contains sales as well as both wholesale and retail prices of un-packaged perishable vegetables. Recognizing the special characteristics of this particularly product category, we develop a structural estimation model in a discrete-continuous choice model framework. Building on this framework, we then study an optimization model for joint pricing and inventory management strategies of multiple products, which aims at improving the company's profit from direct sales and at the same time reducing food waste and thus improving social welfare.
Collectively, the studies in this dissertation provide useful modeling ideas, decision tools, insights, and guidance for firms to utilize vast sales and operations data to devise more effective business strategies.
Resumo:
Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.