1 result for P System

at Duke University



Publisher:

Abstract:

Distributed computing frameworks belong to a class of programming models that allow developers to launch workloads on large clusters of machines. Due to the dramatic increase in the volume of data gathered by ubiquitous computing devices, data analytic workloads have become a common case among distributed computing applications, making Data Science an entire field of Computer Science. We argue that a data scientist's concerns lie in three main components: a dataset, a sequence of operations they wish to apply to this dataset, and some constraints they may have related to their work (performance, QoS, budget, etc.). However, it is extremely difficult to perform data science without domain expertise. One needs to select the right amount and type of resources, pick a framework, and configure it. Also, users often run their applications in shared environments, ruled by schedulers that expect them to specify their resource needs precisely. Because of the distributed and concurrent nature of these frameworks, monitoring and profiling are hard, high-dimensional problems that prevent users from making the right configuration choices and determining the right amount of resources they need. Paradoxically, the system gathers a large amount of monitoring data at runtime, which remains unused. In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit monitoring data to learn about workloads, and able to process user requests into a tailored execution context.
In this work, we study different techniques that have been used to make steps toward such system awareness, and explore a new way to do so by implementing machine learning techniques to recommend a specific subset of system configurations for Apache Spark applications. Furthermore, we present an in-depth study of Apache Spark executor configuration, which highlights the complexity of choosing the best one for a given workload.