3 resultados para Distance between Populations based on Continuous and Discrete Variables

em Duke University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract

Continuous variable is one of the major data types collected by the survey organizations. It can be incomplete such that the data collectors need to fill in the missingness. Or, it can contain sensitive information which needs protection from re-identification. One of the approaches to protect continuous microdata is to sum them up according to different cells of features. In this thesis, I represents novel methods of multiple imputation (MI) that can be applied to impute missing values and synthesize confidential values for continuous and magnitude data.

The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.

The second method is for releasing synthetic continuous micro data by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on the protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.

The third method is for imputing missing values in continuous and categorical variables. It is extended from a hierarchically coupled mixture model with local dependence. However, the new method separates the variables into non-focused (e.g., almost-fully-observed) and focused (e.g., missing-a-lot) ones. The sub-model structure of focused variables is more complex than that of non-focused ones. At the same time, their cluster indicators are linked together by tensor factorization and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving the strongly associated non-focused variables to the side of focused ones can help to improve estimation accuracy, which is examined by several simulation studies. And this method is applied to data from the American Community Survey.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While molecular and cellular processes are often modeled as stochastic processes, such as Brownian motion, chemical reaction networks and gene regulatory networks, there are few attempts to program a molecular-scale process to physically implement stochastic processes. DNA has been used as a substrate for programming molecular interactions, but its applications are restricted to deterministic functions and unfavorable properties such as slow processing, thermal annealing, aqueous solvents and difficult readout limit them to proof-of-concept purposes. To date, whether there exists a molecular process that can be programmed to implement stochastic processes for practical applications remains unknown.

In this dissertation, a fully specified Resonance Energy Transfer (RET) network between chromophores is accurately fabricated via DNA self-assembly, and the exciton dynamics in the RET network physically implement a stochastic process, specifically a continuous-time Markov chain (CTMC), which has a direct mapping to the physical geometry of the chromophore network. Excited by a light source, a RET network generates random samples in the temporal domain in the form of fluorescence photons which can be detected by a photon detector. The intrinsic sampling distribution of a RET network is derived as a phase-type distribution configured by its CTMC model. The conclusion is that the exciton dynamics in a RET network implement a general and important class of stochastic processes that can be directly and accurately programmed and used for practical applications of photonics and optoelectronics. Different approaches to using RET networks exist with vast potential applications. As an entropy source that can directly generate samples from virtually arbitrary distributions, RET networks can benefit applications that rely on generating random samples such as 1) fluorescent taggants and 2) stochastic computing.

By using RET networks between chromophores to implement fluorescent taggants with temporally coded signatures, the taggant design is not constrained by resolvable dyes and has a significantly larger coding capacity than spectrally or lifetime coded fluorescent taggants. Meanwhile, the taggant detection process becomes highly efficient, and the Maximum Likelihood Estimation (MLE) based taggant identification guarantees high accuracy even with only a few hundred detected photons.

Meanwhile, RET-based sampling units (RSU) can be constructed to accelerate probabilistic algorithms for wide applications in machine learning and data analytics. Because probabilistic algorithms often rely on iteratively sampling from parameterized distributions, they can be inefficient in practice on the deterministic hardware traditional computers use, especially for high-dimensional and complex problems. As an efficient universal sampling unit, the proposed RSU can be integrated into a processor / GPU as specialized functional units or organized as a discrete accelerator to bring substantial speedups and power savings.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Monitoring and enforcement are perhaps the biggest challenges in the design and implementation of environmental policies in developing countries where the actions of many small informal actors cause significant impacts on the ecosystem services and where the transaction costs for the state to regulate them could be enormous. This dissertation studies the potential of innovative institutions based on decentralized coordination and enforcement to induce better environmental outcomes. Such policies have in common that the state plays the role of providing the incentives for organization but the process of compliance happens through decentralized agreements, trust building, signaling and monitoring. I draw from the literatures in collective action, common-pool resources, game-theory and non-point source pollution to develop the instruments proposed here. To test the different conditions in which such policies could be implemented I designed two field-experiments that I conducted with small-scale gold miners in the Colombian Pacific and with users and providers of ecosystem services in the states of Veracruz, Quintana Roo and Yucatan in Mexico. This dissertation is organized in three essays.

The first essay, “Collective Incentives for Cleaner Small-Scale Gold Mining on the Frontier: Experimental Tests of Compliance with Group Incentives given Limited State Monitoring”, examines whether collective incentives, i.e. incentives provided to a group conditional on collective compliance, could “outsource” the required local monitoring, i.e. induce group interactions that extend the reach of the state that can observe only aggregate consequences in the context of small-scale gold mining. I employed a framed field-lab experiment in which the miners make decisions regarding mining intensity. The state sets a collective target for an environmental outcome, verifies compliance and provides a group reward for compliance which is split equally among members. Since the target set by the state transforms the situation into a coordination game, outcomes depend on expectations of what others will do. I conducted this experiment with 640 participants in a mining region of the Colombian Pacific and I examine different levels of policy severity and their ordering. The findings of the experiment suggest that such instruments can induce compliance but this regulation involves tradeoffs. For most severe targets – with rewards just above costs – raise gains if successful but can collapse rapidly and completely. In terms of group interactions, better outcomes are found when severity initially is lower suggesting learning.

The second essay, “Collective Compliance can be Efficient and Inequitable: Impacts of Leaders among Small-Scale Gold Miners in Colombia”, explores the channels through which communication help groups to coordinate in presence of collective incentives and whether the reached solutions are equitable or not. Also in the context of small-scale gold mining in the Colombian Pacific, I test the effect of communication in compliance with a collective environmental target. The results suggest that communication, as expected, helps to solve coordination challenges but still some groups reach agreements involving unequal outcomes. By examining the agreements that took place in each group, I observe that the main coordination mechanism was the presence of leaders that help other group members to clarify the situation. Interestingly, leaders not only helped groups to reach efficiency but also played a key role in equity by defining how the costs of compliance would be distributed among group members.

The third essay, “Creating Local PES Institutions and Increasing Impacts of PES in Mexico: A real-Time Watershed-Level Framed Field Experiment on Coordination and Conditionality”, considers the creation of a local payments for ecosystem services (PES) mechanism as an assurance game that requires the coordination between two groups of participants: upstream and downstream. Based on this assurance interaction, I explore the effect of allowing peer-sanctions on upstream behavior in the functioning of the mechanism. This field-lab experiment was implemented in three real cases of the Mexican Fondos Concurrentes (matching funds) program in the states of Veracruz, Quintana Roo and Yucatan, where 240 real users and 240 real providers of hydrological services were recruited and interacted with each other in real time. The experimental results suggest that initial trust-game behaviors align with participants’ perceptions and predicts baseline giving in assurance game. For upstream providers, i.e. those who get sanctioned, the threat and the use of sanctions increase contributions. Downstream users contribute less when offered the option to sanction – as if that option signal an uncooperative upstream – then the contributions rise in line with the complementarity in payments of the assurance game.