3 results for cloud computing resources

at Duke University

Relevance:

100.00%

Publisher:

Abstract:

Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data in public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and cloud software platforms. Given user-specified requirements in terms of time, monetary cost, and risk tolerance, Cumulon automatically makes intelligent decisions on implementation alternatives, execution parameters, as well as hardware provisioning and configuration settings, such as what type of machines and how many of them to acquire. Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests the best bidding strategies for them. Cumulon explores two alternative approaches toward supporting such markets, with different trade-offs between system and optimization complexity. An experimental study shows the efficiency of Cumulon's execution engine, as well as the optimizer's effectiveness in finding the optimal plan in the vast plan space.
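The provisioning decision described above (which machine type, and how many machines, under a time constraint) can be illustrated with a small brute-force search. This is only a sketch under loud assumptions and not Cumulon's optimizer: the `MachineType` catalogue, the prices and speeds, the perfectly parallel runtime model, and per-started-hour billing are all hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MachineType:
    name: str
    price_per_hour: float  # hypothetical price, not a real cloud quote
    speed: float           # hypothetical work units per hour per machine


def estimate_hours(work, machine, count):
    # Toy model (assumption): the job parallelizes perfectly across machines.
    return work / (machine.speed * count)


def cheapest_plan(work, machines, max_count, deadline_hours):
    """Return (cost, machine name, count, hours) for the cheapest plan that
    meets the deadline, or None if no plan does."""
    best = None
    for machine, count in product(machines, range(1, max_count + 1)):
        hours = estimate_hours(work, machine, count)
        if hours > deadline_hours:
            continue
        # Assumption: every machine is billed per started hour.
        cost = count * machine.price_per_hour * math.ceil(hours)
        if best is None or cost < best[0]:
            best = (cost, machine.name, count, hours)
    return best


catalogue = [MachineType("small", 1.0, 10.0), MachineType("large", 3.0, 40.0)]
plan = cheapest_plan(work=400, machines=catalogue, max_count=10, deadline_hours=5)
print(plan)
```

A real optimizer would also have to weigh implementation alternatives, execution parameters, and risk tolerance, which this sketch leaves out.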

Relevance:

30.00%

Publisher:

Abstract:

Allocating resources optimally is a nontrivial task, especially when multiple self-interested agents with conflicting goals are involved. This dissertation uses techniques from game theory to study two classes of such problems: allocating resources to catch agents that attempt to evade them, and allocating payments to agents in a team in order to stabilize it. Besides discussing what allocations are optimal from various game-theoretic perspectives, we also study how to efficiently compute them, and if no such algorithms are found, what computational hardness results can be proved.

The first class of problems is inspired by real-world applications such as the TOEFL iBT test, course final exams, driver's license tests, and airport security patrols. We call them test games and security games. This dissertation first studies test games separately, and then proposes a framework of Catcher-Evader games (CE games) that generalizes both test games and security games. We show that the optimal test strategy can be efficiently computed for scored test games, but it is hard to compute for many binary test games. Optimal Stackelberg strategies are hard to compute for CE games, but we give an empirically efficient algorithm for computing their Nash equilibria. We also prove that the Nash equilibria of a CE game are interchangeable.

The second class of problems involves how to split a reward that is collectively obtained by a team. For example, how should a startup distribute its shares, and what salary should an enterprise pay its employees? Several stability-based solution concepts in cooperative game theory, such as the core, the least core, and the nucleolus, are well suited to this purpose when the goal is to avoid coalitions of agents breaking off. We show that some of these solution concepts can be justified as the most stable payments under noise. Moreover, by adjusting the noise models (to be arguably more realistic), we obtain new solution concepts including the partial nucleolus, the multiplicative least core, and the multiplicative nucleolus. We then study the computational complexity of those solution concepts under the constraint of superadditivity. Our result is based on what we call Small-Issues-Large-Team games and it applies to popular representation schemes such as MC-nets.
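To make the stability idea behind the core concrete, the following sketch checks whether a payment vector lies in the core of a small cooperative game. It is an illustration using the standard textbook definition, not the dissertation's algorithms; the three-player game, the function names, and the dictionary representation of the characteristic function are assumptions chosen for the example. The enumeration is exponential in the number of players, so it only suits tiny games.

```python
from itertools import combinations


def coalitions(players):
    """Yield every nonempty coalition as a frozenset."""
    for size in range(1, len(players) + 1):
        for group in combinations(players, size):
            yield frozenset(group)


def max_excess(v, players, payoff):
    """Largest v(S) - x(S) over proper nonempty coalitions S.
    A positive value means some coalition gains by breaking off."""
    grand = frozenset(players)
    return max(
        v.get(s, 0) - sum(payoff[i] for i in s)
        for s in coalitions(players)
        if s != grand
    )


def in_core(v, players, payoff, tol=1e-9):
    """A payoff is in the core if it distributes exactly v(N) and no
    coalition receives less than it could obtain on its own."""
    grand = frozenset(players)
    efficient = abs(sum(payoff[i] for i in players) - v.get(grand, 0)) <= tol
    return efficient and max_excess(v, players, payoff) <= tol


# Hypothetical game: any pair earns 60, the whole team earns 120.
players = ["a", "b", "c"]
v = {frozenset(p): 60 for p in combinations(players, 2)}
v[frozenset(players)] = 120

equal_split = {"a": 40, "b": 40, "c": 40}
lopsided = {"a": 100, "b": 10, "c": 10}
print(in_core(v, players, equal_split), in_core(v, players, lopsided))
```

The least core relaxes this test by minimizing the maximum excess instead of requiring it to be non-positive, which is why `max_excess` is exposed as a separate function.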

Relevance:

30.00%

Publisher:

Abstract:

Distributed computing frameworks belong to a class of programming models that allow developers to launch workloads on large clusters of machines. Due to the dramatic increase in the volume of data gathered by ubiquitous computing devices, data analytics workloads have become a common case among distributed computing applications, making data science an entire field of computer science. We argue that a data scientist's concerns lie in three main components: a dataset, a sequence of operations they wish to apply to this dataset, and any constraints related to their work (performance, QoS, budget, etc.). However, it is extremely difficult to perform data science without domain expertise. One needs to select the right amount and type of resources, pick a framework, and configure it. Also, users often run their applications in shared environments, ruled by schedulers that expect them to specify their resource needs precisely. Because of the distributed and concurrent nature of these frameworks, monitoring and profiling are hard, high-dimensional problems that keep users from making the right configuration choices and determining the right amount of resources they need. Paradoxically, the system gathers a large amount of monitoring data at runtime, which remains unused.

In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit monitoring data to learn about workloads, and able to turn user requests into a tailored execution context. In this work, we study different techniques that have been used to make steps toward such system awareness, and explore a new way to do so by implementing machine learning techniques to recommend a specific subset of system configurations for Apache Spark applications. Furthermore, we present an in-depth study of Apache Spark executor configuration, which highlights the complexity of choosing the best one for a given workload.
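To give a sense of the size of the executor configuration space, the sketch below enumerates candidate settings for a homogeneous cluster. It is an illustration only, not the study's method: the helper `executor_candidates`, the cluster dimensions, and the choice to reserve one core and 1 GB per node are assumptions, and memory overhead is ignored. The property names `spark.executor.cores`, `spark.executor.instances`, and `spark.executor.memory` are standard Spark configuration keys.

```python
def executor_candidates(nodes, cores_per_node, mem_gb_per_node,
                        reserved_cores=1, reserved_mem_gb=1):
    """Enumerate executor layouts for a homogeneous cluster.

    Assumption: one core and 1 GB per node are left to the OS and daemons,
    and executor memory overhead is not modeled.
    """
    usable_cores = cores_per_node - reserved_cores
    usable_mem = mem_gb_per_node - reserved_mem_gb
    candidates = []
    for cores in range(1, usable_cores + 1):
        per_node = usable_cores // cores   # executors that fit on one node
        mem = usable_mem // per_node       # whole GB per executor
        if mem < 1:
            continue
        candidates.append({
            "spark.executor.cores": str(cores),
            "spark.executor.instances": str(per_node * nodes),
            "spark.executor.memory": f"{mem}g",
        })
    return candidates


# Hypothetical cluster: 4 nodes, 16 cores and 64 GB each.
candidates = executor_candidates(nodes=4, cores_per_node=16, mem_gb_per_node=64)
for conf in candidates:
    print(conf)
```

Even this simplified model yields many valid layouts for one small cluster, and which of them performs best depends on the workload.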