11 resultados para Intractable Likelihood
em CaltechTHESIS
Resumo:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model leads to new principled algorithms for smoothing and clustering of spike data.
Resumo:
The main theme running through these three chapters is that economic agents are often forced to respond to events that are not a direct result of their actions or other agents actions. The optimal response to these shocks will necessarily depend on agents' understanding of how these shocks arise. The economic environment in the first two chapters is analogous to the classic chain store game. In this setting, the addition of unintended trembles by the agents creates an environment better suited to reputation building. The third chapter considers the competitive equilibrium price dynamics in an overlapping generations environment when there are supply and demand shocks.
The first chapter is a game theoretic investigation of a reputation building game. A sequential equilibrium model, called the "error prone agents" model, is developed. In this model, agents believe that all actions are potentially subjected to an error process. Inclusion of this belief into the equilibrium calculation provides for a richer class of reputation building possibilities than when perfect implementation is assumed.
In the second chapter, maximum likelihood estimation is employed to test the consistency of this new model and other models with data from experiments run by other researchers that served as the basis for prominent papers in this field. The alternate models considered are essentially modifications to the standard sequential equilibrium. While some models perform quite well in that the nature of the modification seems to explain deviations from the sequential equilibrium quite well, the degree to which these modifications must be applied shows no consistency across different experimental designs.
The third chapter is a study of price dynamics in an overlapping generations model. It establishes the existence of a unique perfect-foresight competitive equilibrium price path in a pure exchange economy with a finite time horizon when there are arbitrarily many shocks to supply or demand. One main reason for the interest in this equilibrium is that overlapping generations environments are very fruitful for the study of price dynamics, especially in experimental settings. The perfect foresight assumption is an important place to start when examining these environments because it will produce the ex post socially efficient allocation of goods. This characteristic makes this a natural baseline to which other models of price dynamics could be compared.
Resumo:
Quantum computing offers powerful new techniques for speeding up the calculation of many classically intractable problems. Quantum algorithms can allow for the efficient simulation of physical systems, with applications to basic research, chemical modeling, and drug discovery; other algorithms have important implications for cryptography and internet security.
At the same time, building a quantum computer is a daunting task, requiring the coherent manipulation of systems with many quantum degrees of freedom while preventing environmental noise from interacting too strongly with the system. Fortunately, we know that, under reasonable assumptions, we can use the techniques of quantum error correction and fault tolerance to achieve an arbitrary reduction in the noise level.
In this thesis, we look at how additional information about the structure of noise, or "noise bias," can improve or alter the performance of techniques in quantum error correction and fault tolerance. In Chapter 2, we explore the possibility of designing certain quantum gates to be extremely robust with respect to errors in their operation. This naturally leads to structured noise where certain gates can be implemented in a protected manner, allowing the user to focus their protection on the noisier unprotected operations.
In Chapter 3, we examine how to tailor error-correcting codes and fault-tolerant quantum circuits in the presence of dephasing biased noise, where dephasing errors are far more common than bit-flip errors. By using an appropriately asymmetric code, we demonstrate the ability to improve the amount of error reduction and decrease the physical resources required for error correction.
In Chapter 4, we analyze a variety of protocols for distilling magic states, which enable universal quantum computation, in the presence of faulty Clifford operations. Here again there is a hierarchy of noise levels, with a fixed error rate for faulty gates, and a second rate for errors in the distilled states which decreases as the states are distilled to better quality. The interplay of of these different rates sets limits on the achievable distillation and how quickly states converge to that limit.
Resumo:
The dissertation studies the general area of complex networked systems that consist of interconnected and active heterogeneous components and usually operate in uncertain environments and with incomplete information. Problems associated with those systems are typically large-scale and computationally intractable, yet they are also very well-structured and have features that can be exploited by appropriate modeling and computational methods. The goal of this thesis is to develop foundational theories and tools to exploit those structures that can lead to computationally-efficient and distributed solutions, and apply them to improve systems operations and architecture.
Specifically, the thesis focuses on two concrete areas. The first one is to design distributed rules to manage distributed energy resources in the power network. The power network is undergoing a fundamental transformation. The future smart grid, especially on the distribution system, will be a large-scale network of distributed energy resources (DERs), each introducing random and rapid fluctuations in power supply, demand, voltage and frequency. These DERs provide a tremendous opportunity for sustainability, efficiency, and power reliability. However, there are daunting technical challenges in managing these DERs and optimizing their operation. The focus of this dissertation is to develop scalable, distributed, and real-time control and optimization to achieve system-wide efficiency, reliability, and robustness for the future power grid. In particular, we will present how to explore the power network structure to design efficient and distributed market and algorithms for the energy management. We will also show how to connect the algorithms with physical dynamics and existing control mechanisms for real-time control in power networks.
The second focus is to develop distributed optimization rules for general multi-agent engineering systems. A central goal in multiagent systems is to design local control laws for the individual agents to ensure that the emergent global behavior is desirable with respect to the given system level objective. Ideally, a system designer seeks to satisfy this goal while conditioning each agent’s control on the least amount of information possible. Our work focused on achieving this goal using the framework of game theory. In particular, we derived a systematic methodology for designing local agent objective functions that guarantees (i) an equivalence between the resulting game-theoretic equilibria and the system level design objective and (ii) that the resulting game possesses an inherent structure that can be exploited for distributed learning, e.g., potential games. The control design can then be completed by applying any distributed learning algorithm that guarantees convergence to the game-theoretic equilibrium. One main advantage of this game theoretic approach is that it provides a hierarchical decomposition between the decomposition of the systemic objective (game design) and the specific local decision rules (distributed learning algorithms). This decomposition provides the system designer with tremendous flexibility to meet the design objectives and constraints inherent in a broad class of multiagent systems. Furthermore, in many settings the resulting controllers will be inherently robust to a host of uncertainties including asynchronous clock rates, delays in information, and component failures.
Resumo:
The Hamilton Jacobi Bellman (HJB) equation is central to stochastic optimal control (SOC) theory, yielding the optimal solution to general problems specified by known dynamics and a specified cost functional. Given the assumption of quadratic cost on the control input, it is well known that the HJB reduces to a particular partial differential equation (PDE). While powerful, this reduction is not commonly used as the PDE is of second order, is nonlinear, and examples exist where the problem may not have a solution in a classical sense. Furthermore, each state of the system appears as another dimension of the PDE, giving rise to the curse of dimensionality. Since the number of degrees of freedom required to solve the optimal control problem grows exponentially with dimension, the problem becomes intractable for systems with all but modest dimension.
In the last decade researchers have found that under certain, fairly non-restrictive structural assumptions, the HJB may be transformed into a linear PDE, with an interesting analogue in the discretized domain of Markov Decision Processes (MDP). The work presented in this thesis uses the linearity of this particular form of the HJB PDE to push the computational boundaries of stochastic optimal control.
This is done by crafting together previously disjoint lines of research in computation. The first of these is the use of Sum of Squares (SOS) techniques for synthesis of control policies. A candidate polynomial with variable coefficients is proposed as the solution to the stochastic optimal control problem. An SOS relaxation is then taken to the partial differential constraints, leading to a hierarchy of semidefinite relaxations with improving sub-optimality gap. The resulting approximate solutions are shown to be guaranteed over- and under-approximations for the optimal value function. It is shown that these results extend to arbitrary parabolic and elliptic PDEs, yielding a novel method for Uncertainty Quantification (UQ) of systems governed by partial differential constraints. Domain decomposition techniques are also made available, allowing for such problems to be solved via parallelization and low-order polynomials.
The optimization-based SOS technique is then contrasted with the Separated Representation (SR) approach from the applied mathematics community. The technique allows for systems of equations to be solved through a low-rank decomposition that results in algorithms that scale linearly with dimensionality. Its application in stochastic optimal control allows for previously uncomputable problems to be solved quickly, scaling to such complex systems as the Quadcopter and VTOL aircraft. This technique may be combined with the SOS approach, yielding not only a numerical technique, but also an analytical one that allows for entirely new classes of systems to be studied and for stability properties to be guaranteed.
The analysis of the linear HJB is completed by the study of its implications in application. It is shown that the HJB and a popular technique in robotics, the use of navigation functions, sit on opposite ends of a spectrum of optimization problems, upon which tradeoffs may be made in problem complexity. Analytical solutions to the HJB in these settings are available in simplified domains, yielding guidance towards optimality for approximation schemes. Finally, the use of HJB equations in temporal multi-task planning problems is investigated. It is demonstrated that such problems are reducible to a sequence of SOC problems linked via boundary conditions. The linearity of the PDE allows us to pre-compute control policy primitives and then compose them, at essentially zero cost, to satisfy a complex temporal logic specification.
Resumo:
There is a growing interest in taking advantage of possible patterns and structures in data so as to extract the desired information and overcome the curse of dimensionality. In a wide range of applications, including computer vision, machine learning, medical imaging, and social networks, the signal that gives rise to the observations can be modeled to be approximately sparse and exploiting this fact can be very beneficial. This has led to an immense interest in the problem of efficiently reconstructing a sparse signal from limited linear observations. More recently, low-rank approximation techniques have become prominent tools to approach problems arising in machine learning, system identification and quantum tomography.
In sparse and low-rank estimation problems, the challenge is the inherent intractability of the objective function, and one needs efficient methods to capture the low-dimensionality of these models. Convex optimization is often a promising tool to attack such problems. An intractable problem with a combinatorial objective can often be "relaxed" to obtain a tractable but almost as powerful convex optimization problem. This dissertation studies convex optimization techniques that can take advantage of low-dimensional representations of the underlying high-dimensional data. We provide provable guarantees that ensure that the proposed algorithms will succeed under reasonable conditions, and answer questions of the following flavor:
- For a given number of measurements, can we reliably estimate the true signal?
- If so, how good is the reconstruction as a function of the model parameters?
More specifically, i) Focusing on linear inverse problems, we generalize the classical error bounds known for the least-squares technique to the lasso formulation, which incorporates the signal model. ii) We show that intuitive convex approaches do not perform as well as expected when it comes to signals that have multiple low-dimensional structures simultaneously. iii) Finally, we propose convex relaxations for the graph clustering problem and give sharp performance guarantees for a family of graphs arising from the so-called stochastic block model. We pay particular attention to the following aspects. For i) and ii), we aim to provide a general geometric framework, in which the results on sparse and low-rank estimation can be obtained as special cases. For i) and iii), we investigate the precise performance characterization, which yields the right constants in our bounds and the true dependence between the problem parameters.
Resumo:
In this thesis we build a novel analysis framework to perform the direct extraction of all possible effective Higgs boson couplings to the neutral electroweak gauge bosons in the H → ZZ(*) → 4l channel also referred to as the golden channel. We use analytic expressions of the full decay differential cross sections for the H → VV' → 4l process, and the dominant irreducible standard model qq ̄ → 4l background where 4l = 2e2μ,4e,4μ. Detector effects are included through an explicit convolution of these analytic expressions with transfer functions that model the detector responses as well as acceptance and efficiency effects. Using the full set of decay observables, we construct an unbinned 8-dimensional detector level likelihood function which is con- tinuous in the effective couplings, and includes systematics. All potential anomalous couplings of HVV' where V = Z,γ are considered, allowing for general CP even/odd admixtures and any possible phases. We measure the CP-odd mixing between the tree-level HZZ coupling and higher order CP-odd couplings to be compatible with zero, and in the range [−0.40, 0.43], and the mixing between HZZ tree-level coupling and higher order CP -even coupling to be in the ranges [−0.66, −0.57] ∪ [−0.15, 1.00]; namely compatible with a standard model Higgs. We discuss the expected precision in determining the various HVV' couplings in future LHC runs. A powerful and at first glance surprising prediction of the analysis is that with 100-400 fb-1, the golden channel will be able to start probing the couplings of the Higgs boson to diphotons in the 4l channel. We discuss the implications and further optimization of the methods for the next LHC runs.
Resumo:
This thesis studies decision making under uncertainty and how economic agents respond to information. The classic model of subjective expected utility and Bayesian updating is often at odds with empirical and experimental results; people exhibit systematic biases in information processing and often exhibit aversion to ambiguity. The aim of this work is to develop simple models that capture observed biases and study their economic implications.
In the first chapter I present an axiomatic model of cognitive dissonance, in which an agent's response to information explicitly depends upon past actions. I introduce novel behavioral axioms and derive a representation in which beliefs are directionally updated. The agent twists the information and overweights states in which his past actions provide a higher payoff. I then characterize two special cases of the representation. In the first case, the agent distorts the likelihood ratio of two states by a function of the utility values of the previous action in those states. In the second case, the agent's posterior beliefs are a convex combination of the Bayesian belief and the one which maximizes the conditional value of the previous action. Within the second case a unique parameter captures the agent's sensitivity to dissonance, and I characterize a way to compare sensitivity to dissonance between individuals. Lastly, I develop several simple applications and show that cognitive dissonance contributes to the equity premium and price volatility, asymmetric reaction to news, and belief polarization.
The second chapter characterizes a decision maker with sticky beliefs. That is, a decision maker who does not update enough in response to information, where enough means as a Bayesian decision maker would. This chapter provides axiomatic foundations for sticky beliefs by weakening the standard axioms of dynamic consistency and consequentialism. I derive a representation in which updated beliefs are a convex combination of the prior and the Bayesian posterior. A unique parameter captures the weight on the prior and is interpreted as the agent's measure of belief stickiness or conservatism bias. This parameter is endogenously identified from preferences and is easily elicited from experimental data.
The third chapter deals with updating in the face of ambiguity, using the framework of Gilboa and Schmeidler. There is no consensus on the correct way way to update a set of priors. Current methods either do not allow a decision maker to make an inference about her priors or require an extreme level of inference. In this chapter I propose and axiomatize a general model of updating a set of priors. A decision maker who updates her beliefs in accordance with the model can be thought of as one that chooses a threshold that is used to determine whether a prior is plausible, given some observation. She retains the plausible priors and applies Bayes' rule. This model includes generalized Bayesian updating and maximum likelihood updating as special cases.
Resumo:
Several patients of P. J. Vogel who had undergone cerebral commissurotomy for the control of intractable epilepsy were tested on a variety of tasks to measure aspects of cerebral organization concerned with lateralization in hemispheric function. From tests involving identification of shapes it was inferred that in the absence of the neocortical commissures, the left hemisphere still has access to certain types of information from the ipsilateral field. The major hemisphere can still make crude differentiations between various left-field stimuli, but is unable to specify exact stimulus properties. Most of the time the major hemisphere, having access to some ipsilateral stimuli, dominated the minor hemisphere in control of the body.
Competition for control of the body between the hemispheres is seen most clearly in tests of minor hemisphere language competency, in which it was determined that though the minor hemisphere does possess some minimal ability to express language, the major hemisphere prevented its expression much of the time. The right hemisphere was superior to the left in tests of perceptual visualization, and the two hemispheres appeared to use different strategies in attempting to solve the problems, namely, analysis for the left hemisphere and synthesis for the right hemisphere.
Analysis of the patients' verbal and performance I.Q.'s, as well as observations made throughout testing, suggest that the corpus callosum plays a critical role in activities that involve functions in which the minor hemisphere normally excels, that the motor expression of these functions may normally come through the major hemisphere by way of the corpus callosum.
Lateral specialization is thought to be an evolutionary adaptation which overcame problems of a functional antagonism between the abilities normally associated with the two hemispheres. The tests of perception suggested that this function lateralized into the mute hemisphere because of an active counteraction by language. This latter idea was confirmed by the finding that left-handers, in whom there is likely to be bilateral language centers, are greatly deficient on tests of perception.
Resumo:
The problem of global optimization of M phase-incoherent signals in N complex dimensions is formulated. Then, by using the geometric approach of Landau and Slepian, conditions for optimality are established for N = 2 and the optimal signal sets are determined for M = 2, 3, 4, 6, and 12.
The method is the following: The signals are assumed to be equally probable and to have equal energy, and thus are represented by points ṡi, i = 1, 2, …, M, on the unit sphere S1 in CN. If Wik is the halfspace determined by ṡi and ṡk and containing ṡi, i.e. Wik = {ṙϵCN:| ≥ | ˂ṙ, ṡk˃|}, then the Ʀi = ∩/k≠i Wik, i = 1, 2, …, M, the maximum likelihood decision regions, partition S1. For additive complex Gaussian noise ṅ and a received signal ṙ = ṡiejϴ + ṅ, where ϴ is uniformly distributed over [0, 2π], the probability of correct decoding is PC = 1/πN ∞/ʃ/0 r2N-1e-(r2+1)U(r)dr, where U(r) = 1/M M/Ʃ/i=1 Ʀi ʃ/∩ S1 I0(2r | ˂ṡ, ṡi˃|)dσ(ṡ), and r = ǁṙǁ.
For N = 2, it is proved that U(r) ≤ ʃ/Cα I0(2r|˂ṡ, ṡi˃|)dσ(ṡ) – 2K/M. h(1/2K [Mσ(Cα)-σ(S1)]), where Cα = {ṡϵS1:|˂ṡ, ṡi˃| ≥ α}, K is the total number of boundaries of the net on S1 determined by the decision regions, and h is the strictly increasing strictly convex function of σ(Cα∩W), (where W is a halfspace not containing ṡi), given by h = ʃ/Cα∩W I0 (2r|˂ṡ, ṡi˃|)dσ(ṡ). Conditions for equality are established and these give rise to the globally optimal signal sets for M = 2, 3, 4, 6, and 12.
Resumo:
The centralized paradigm of a single controller and a single plant upon which modern control theory is built is no longer applicable to modern cyber-physical systems of interest, such as the power-grid, software defined networks or automated highways systems, as these are all large-scale and spatially distributed. Both the scale and the distributed nature of these systems has motivated the decentralization of control schemes into local sub-controllers that measure, exchange and act on locally available subsets of the globally available system information. This decentralization of control logic leads to different decision makers acting on asymmetric information sets, introduces the need for coordination between them, and perhaps not surprisingly makes the resulting optimal control problem much harder to solve. In fact, shortly after such questions were posed, it was realized that seemingly simple decentralized optimal control problems are computationally intractable to solve, with the Wistenhausen counterexample being a famous instance of this phenomenon. Spurred on by this perhaps discouraging result, a concerted 40 year effort to identify tractable classes of distributed optimal control problems culminated in the notion of quadratic invariance, which loosely states that if sub-controllers can exchange information with each other at least as quickly as the effect of their control actions propagates through the plant, then the resulting distributed optimal control problem admits a convex formulation.
The identification of quadratic invariance as an appropriate means of "convexifying" distributed optimal control problems led to a renewed enthusiasm in the controller synthesis community, resulting in a rich set of results over the past decade. The contributions of this thesis can be seen as being a part of this broader family of results, with a particular focus on closing the gap between theory and practice by relaxing or removing assumptions made in the traditional distributed optimal control framework. Our contributions are to the foundational theory of distributed optimal control, and fall under three broad categories, namely controller synthesis, architecture design and system identification.
We begin by providing two novel controller synthesis algorithms. The first is a solution to the distributed H-infinity optimal control problem subject to delay constraints, and provides the only known exact characterization of delay-constrained distributed controllers satisfying an H-infinity norm bound. The second is an explicit dynamic programming solution to a two player LQR state-feedback problem with varying delays. Accommodating varying delays represents an important first step in combining distributed optimal control theory with the area of Networked Control Systems that considers lossy channels in the feedback loop. Our next set of results are concerned with controller architecture design. When designing controllers for large-scale systems, the architectural aspects of the controller such as the placement of actuators, sensors, and the communication links between them can no longer be taken as given -- indeed the task of designing this architecture is now as important as the design of the control laws themselves. To address this task, we formulate the Regularization for Design (RFD) framework, which is a unifying computationally tractable approach, based on the model matching framework and atomic norm regularization, for the simultaneous co-design of a structured optimal controller and the architecture needed to implement it. Our final result is a contribution to distributed system identification. Traditional system identification techniques such as subspace identification are not computationally scalable, and destroy rather than leverage any a priori information about the system's interconnection structure. We argue that in the context of system identification, an essential building block of any scalable algorithm is the ability to estimate local dynamics within a large interconnected system. To that end we propose a promising heuristic for identifying the dynamics of a subsystem that is still connected to a large system. We exploit the fact that the transfer function of the local dynamics is low-order, but full-rank, while the transfer function of the global dynamics is high-order, but low-rank, to formulate this separation task as a nuclear norm minimization problem. Finally, we conclude with a brief discussion of future research directions, with a particular emphasis on how to incorporate the results of this thesis, and those of optimal control theory in general, into a broader theory of dynamics, control and optimization in layered architectures.