3 resultados para Analysis and statistical methods
em Duke University
                                
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
                                
Resumo:
The advances in three related areas of state-space modeling, sequential Bayesian learning, and decision analysis are addressed, with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic networks studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.
Chapter 1 summarizes the three areas/problems and the key idea of emulating in those areas. Chapter 2 discusses the sequential analysis of latent threshold models with use of emulating models that allows for analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes the method for modeling the steaming data of counts observed on a large network that relies on emulating the whole, dependent network model by independent, conjugate sub-models customized to each set of flow. Chapter 5 reviews those advances and makes the concluding remarks.
                                
Resumo:
The dissertation consists of three chapters related to the low-price guarantee marketing strategy and energy efficiency analysis. The low-price guarantee is a marketing strategy in which firms promise to charge consumers the lowest price among their competitors. Chapter 1 addresses the research question "Does a Low-Price Guarantee Induce Lower Prices'' by looking into the retail gasoline industry in Quebec where there was a major branded firm which started a low-price guarantee back in 1996. Chapter 2 does a consumer welfare analysis of low-price guarantees to drive police indications and offers a new explanation of the firms' incentives to adopt a low-price guarantee. Chapter 3 develops the energy performance indicators (EPIs) to measure energy efficiency of the manufacturing plants in pulp, paper and paperboard industry.
Chapter 1 revisits the traditional view that a low-price guarantee results in higher prices by facilitating collusion. Using accurate market definitions and station-level data from the retail gasoline industry in Quebec, I conducted a descriptive analysis based on stations and price zones to compare the price and sales movement before and after the guarantee was adopted. I find that, contrary to the traditional view, the stores that offered the guarantee significantly decreased their prices and increased their sales. I also build a difference-in-difference model to quantify the decrease in posted price of the stores that offered the guarantee to be 0.7 cents per liter. While this change is significant, I do not find the response in comeptitors' prices to be significant. The sales of the stores that offered the guarantee increased significantly while the competitors' sales decreased significantly. However, the significance vanishes if I use the station clustered standard errors. Comparing my observations and the predictions of different theories of modeling low-price guarantees, I conclude the empirical evidence here supports that the low-price guarantee is a simple commitment device and induces lower prices.
Chapter 2 conducts a consumer welfare analysis of low-price guarantees to address the antitrust concerns and potential regulations from the government; explains the firms' potential incentives to adopt a low-price guarantee. Using station-level data from the retail gasoline industry in Quebec, I estimated consumers' demand of gasoline by a structural model with spatial competition incorporating the low-price guarantee as a commitment device, which allows firms to pre-commit to charge the lowest price among their competitors. The counterfactual analysis under the Bertrand competition setting shows that the stores that offered the guarantee attracted a lot more consumers and decreased their posted price by 0.6 cents per liter. Although the matching stores suffered a decrease in profits from gasoline sales, they are incentivized to adopt the low-price guarantee to attract more consumers to visit the store likely increasing profits at attached convenience stores. Firms have strong incentives to adopt a low-price guarantee on the product that their consumers are most price-sensitive about, while earning a profit from the products that are not covered in the guarantee. I estimate that consumers earn about 0.3% more surplus when the low-price guarantee is in place, which suggests that the authorities should not be concerned and regulate low-price guarantees. In Appendix B, I also propose an empirical model to look into how low-price guarantees would change consumer search behavior and whether consumer search plays an important role in estimating consumer surplus accurately.
Chapter 3, joint with Gale Boyd, describes work with the pulp, paper, and paperboard (PP&PB) industry to provide a plant-level indicator of energy efficiency for facilities that produce various types of paper products in the United States. Organizations that implement strategic energy management programs undertake a set of activities that, if carried out properly, have the potential to deliver sustained energy savings. Energy performance benchmarking is a key activity of strategic energy management and one way to enable companies to set energy efficiency targets for manufacturing facilities. The opportunity to assess plant energy performance through a comparison with similar plants in its industry is a highly desirable and strategic method of benchmarking for industrial energy managers. However, access to energy performance data for conducting industry benchmarking is usually unavailable to most industrial energy managers. The U.S. Environmental Protection Agency (EPA), through its ENERGY STAR program, seeks to overcome this barrier through the development of manufacturing sector-based plant energy performance indicators (EPIs) that encourage U.S. industries to use energy more efficiently. In the development of the energy performance indicator tools, consideration is given to the role that performance-based indicators play in motivating change; the steps necessary for indicator development, from interacting with an industry in securing adequate data for the indicator; and actual application and use of an indicator when complete. How indicators are employed in EPA’s efforts to encourage industries to voluntarily improve their use of energy is discussed as well. The chapter describes the data and statistical methods used to construct the EPI for plants within selected segments of the pulp, paper, and paperboard industry: specifically pulp mills and integrated paper & paperboard mills. The individual equations are presented, as are the instructions for using those equations as implemented in an associated Microsoft Excel-based spreadsheet tool.
 
                    