6 resultados para Constraint programming (Computer science)
em Duke University
Resumo:
Nucleic Acid hairpins have been a subject of study for the last four decades. They are composed of single strand that is
hybridized to itself, and the central section forming an unhybridized loop. In nature, they stabilize single stranded RNA, serve as nucleation
sites for RNA folding, protein recognition signals, mRNA localization and regulation of mRNA degradation. On the other hand,
DNA hairpins in biological contexts have been studied with respect to forming cruciform structures that can regulate gene expression.
The use of DNA hairpins as fuel for synthetic molecular devices, including locomotion, was proposed and experimental demonstrated in 2003. They
were interesting because they bring to the table an on-demand energy/information supply mechanism.
The energy/information is hidden (from hybridization) in the hairpin’s loop, until required.
The energy/information is harnessed by opening the stem region, and exposing the single stranded loop section.
The loop region is now free for possible hybridization and help move the system into a thermodynamically favourable state.
The hidden energy and information coupled with
programmability provides another functionality, of selectively choosing what reactions to hide and
what reactions to allow to proceed, that helps develop a topological sequence of events.
Hairpins have been utilized as a source of fuel for many different DNA devices. In this thesis, we program four different
molecular devices using DNA hairpins, and experimentally validate them in the
laboratory. 1) The first device: A
novel enzyme-free autocatalytic self-replicating system composed entirely of DNA that operates isothermally. 2) The second
device: Time-Responsive Circuits using DNA have two properties: a) asynchronous: the final output is always correct
regardless of differences in the arrival time of different inputs.
b) renewable circuits which can be used multiple times without major degradation of the gate motifs
(so if the inputs change over time, the DNA-based circuit can re-compute the output correctly based on the new inputs).
3) The third device: Activatable tiles are a theoretical extension to the Tile assembly model that enhances
its robustness by protecting the sticky sides of tiles until a tile is partially incorporated into a growing assembly.
4) The fourth device: Controlled Amplification of DNA catalytic system: a device such that the amplification
of the system does not run uncontrollably until the system runs out of fuel, but instead achieves a finite
amount of gain.
Nucleic acid circuits with the ability
to perform complex logic operations have many potential practical applications, for example the ability to achieve point of care diagnostics.
We discuss the designs of our DNA Hairpin molecular devices, the results we have obtained, and the challenges we have overcome
to make these truly functional.
Resumo:
Distributed Computing frameworks belong to a class of programming models that allow developers to
launch workloads on large clusters of machines. Due to the dramatic increase in the volume of
data gathered by ubiquitous computing devices, data analytic workloads have become a common
case among distributed computing applications, making Data Science an entire field of
Computer Science. We argue that Data Scientist's concern lays in three main components: a dataset,
a sequence of operations they wish to apply on this dataset, and some constraint they may have
related to their work (performances, QoS, budget, etc). However, it is actually extremely
difficult, without domain expertise, to perform data science. One need to select the right amount
and type of resources, pick up a framework, and configure it. Also, users are often running their
application in shared environments, ruled by schedulers expecting them to specify precisely their resource
needs. Inherent to the distributed and concurrent nature of the cited frameworks, monitoring and
profiling are hard, high dimensional problems that block users from making the right
configuration choices and determining the right amount of resources they need. Paradoxically, the
system is gathering a large amount of monitoring data at runtime, which remains unused.
In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit
monitoring data to learn about workloads, and process user requests into a tailored execution
context. In this work, we study different techniques that have been used to make steps toward
such system awareness, and explore a new way to do so by implementing machine learning
techniques to recommend a specific subset of system configurations for Apache Spark applications.
Furthermore, we present an in depth study of Apache Spark executors configuration, which highlight
the complexity in choosing the best one for a given workload.
Resumo:
Allocating resources optimally is a nontrivial task, especially when multiple
self-interested agents with conflicting goals are involved. This dissertation
uses techniques from game theory to study two classes of such problems:
allocating resources to catch agents that attempt to evade them, and allocating
payments to agents in a team in order to stabilize it. Besides discussing what
allocations are optimal from various game-theoretic perspectives, we also study
how to efficiently compute them, and if no such algorithms are found, what
computational hardness results can be proved.
The first class of problems is inspired by real-world applications such as the
TOEFL iBT test, course final exams, driver's license tests, and airport security
patrols. We call them test games and security games. This dissertation first
studies test games separately, and then proposes a framework of Catcher-Evader
games (CE games) that generalizes both test games and security games. We show
that the optimal test strategy can be efficiently computed for scored test
games, but it is hard to compute for many binary test games. Optimal Stackelberg
strategies are hard to compute for CE games, but we give an empirically
efficient algorithm for computing their Nash equilibria. We also prove that the
Nash equilibria of a CE game are interchangeable.
The second class of problems involves how to split a reward that is collectively
obtained by a team. For example, how should a startup distribute its shares, and
what salary should an enterprise pay to its employees. Several stability-based
solution concepts in cooperative game theory, such as the core, the least core,
and the nucleolus, are well suited to this purpose when the goal is to avoid
coalitions of agents breaking off. We show that some of these solution concepts
can be justified as the most stable payments under noise. Moreover, by adjusting
the noise models (to be arguably more realistic), we obtain new solution
concepts including the partial nucleolus, the multiplicative least core, and the
multiplicative nucleolus. We then study the computational complexity of those
solution concepts under the constraint of superadditivity. Our result is based
on what we call Small-Issues-Large-Team games and it applies to popular
representation schemes such as MC-nets.
Resumo:
Subspaces and manifolds are two powerful models for high dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging at an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.
A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the size of the training set is small. A learning machine with large numbers of parameters, e.g., deep neural network, can well describe a very complicated data distribution. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from suffer from degraded performance when generalizing to unseen (test) data.
From the perspective of complexity of function classes, such a learning machine has a huge capacity (complexity), which tends to overfit. The manifold model provides us with a way of regularizing the learning machine, so as to reduce the generalization error, therefore mitigate overfiting. Two different overfiting-preventing approaches are proposed, one from the perspective of data variation, the other from capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are demonstrated, showing an obvious advantage of the proposed approaches when the training set is small.
Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more the local neighborhoods, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighbourhoods. Deviation (of each datum) from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical {\em changepoint detection} technique, which only works for one dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
Resumo:
This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.
The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.
Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.
Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.
The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.
Resumo:
With the popularization of GPS-enabled devices such as mobile phones, location data are becoming available at an unprecedented scale. The locations may be collected from many different sources such as vehicles moving around a city, user check-ins in social networks, and geo-tagged micro-blogging photos or messages. Besides the longitude and latitude, each location record may also have a timestamp and additional information such as the name of the location. Time-ordered sequences of these locations form trajectories, which together contain useful high-level information about people's movement patterns.
The first part of this thesis focuses on a few geometric problems motivated by the matching and clustering of trajectories. We first give a new algorithm for computing a matching between a pair of curves under existing models such as dynamic time warping (DTW). The algorithm is more efficient than standard dynamic programming algorithms both theoretically and practically. We then propose a new matching model for trajectories that avoids the drawbacks of existing models. For trajectory clustering, we present an algorithm that computes clusters of subtrajectories, which correspond to common movement patterns. We also consider trajectories of check-ins, and propose a statistical generative model, which identifies check-in clusters as well as the transition patterns between the clusters.
The second part of the thesis considers the problem of covering shortest paths in a road network, motivated by an EV charging station placement problem. More specifically, a subset of vertices in the road network are selected to place charging stations so that every shortest path contains enough charging stations and can be traveled by an EV without draining the battery. We first introduce a general technique for the geometric set cover problem. This technique leads to near-linear-time approximation algorithms, which are the state-of-the-art algorithms for this problem in either running time or approximation ratio. We then use this technique to develop a near-linear-time algorithm for this
shortest-path cover problem.