Biblioteca Digital

991 resultados para multi-execution

Fighting Evasive Malware with DVasion

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Malware is a foundational component of cyber crime that enables an attacker to modify the normal operation of a computer or access sensitive, digital information. Despite the extensive research performed to identify such programs, existing schemes fail to detect evasive malware, an increasingly popular class of malware that can alter its behavior at run-time, making it difficult to detect using today’s state of the art malware analysis systems. In this thesis, we present DVasion, a comprehensive strategy that exposes such evasive behavior through a multi-execution technique. DVasion successfully detects behavior that would have been missed by traditional, single-execution approaches, while addressing the limitations of previously proposed multi-execution systems. We demonstrate the accuracy of our system through strong parallels with existing work on evasive malware, as well as uncover the hidden behavior within 167 of 1,000 samples.

Fault-Tolerant Average Execution Time Optimization for General Purpose Multi-Processor System-on Chips

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Fault-tolerance is due to the semiconductor technology development important, not only for safety-critical systems but also for general-purpose (non-safety critical) systems. However, instead of guaranteeing that deadlines always are met, it is for general-purpose systems important to minimize the average execution time (AET) while ensuring fault-tolerance. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that includes bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). And, for a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET including bus communication overhead when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and job-to-processor assignment when using voting, and (3) defining fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.

CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.

Pi: A Parallel Architecture Interface for Multi-Model Execution

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis defines Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements including low latency communication, fast task switching, low cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models including hybrids of existing models. Pi also forms a basis of comparison for architectural components.

A Task-Graph Execution Manager for Reconfigurable Multi-tasking Systems

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Reconfigurable hardware can be used to build multi tasking systems that dynamically adapt themselves to the requirements of the running applications. This is especially useful in embedded systems, since the available resources are very limited and the reconfigurable hardware can be reused for different applications. In these systems computations are frequently represented as task graphs that are executed taking into account their internal dependencies and the task schedule. The management of the task graph execution is critical for the system performance. In this regard, we have developed two dif erent versions, a software module and a hardware architecture, of a generic task-graph execution manager for reconfigurable multi-tasking systems. The second version reduces the run-time management overheads by almost two orders of magnitude. Hence it is especially suitable for systems with exigent timing constraints. Both versions include specific support to optimize the reconfiguration process.

Multi-robot control in highly dynamic, competitive environments

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The control and coordination of multiple mobile robots is a challenging task; particularly in environments with multiple, rapidly moving obstacles and agents. This paper describes a robust approach to multi-robot control, where robustness is gained from competency at every layer of robot control. The layers are: (i) a central coordination system (MAPS), (ii) an action system (AES), (iii) a navigation module, and (iv) a low level dynamic motion control system. The multi-robot coordination system assigns each robot a role and a sub-goal. Each robots action execution system then assumes the assigned role and attempts to achieve the specified sub-goal. The robots navigation system directs the robot to specific goal locations while ensuring that the robot avoids any obstacles. The motion system maps the heading and speed information from the navigation system to force-constrained motion. This multi-robot system has been extensively tested and applied in the robot soccer domain using both centralized and distributed coordination.

CubIT : Large-scale multi-user presentation and collaboration

Relevância:

30.00% 30.00%

Publicador:

Resumo:

CubIT is a multi-user, large-scale presentation and collaboration framework installed at the Queensland University of Technology’s (QUT) Cube facility, an interactive facility made up 48 multi-touch screens and very large projected display screens. CubIT was built to make the Cube facility accessible to QUT’s academic and student population. The system allows users to upload, interact with and share media content on the Cube’s very large display surfaces. CubIT implements a unique combination of features including RFID authentication, content management through multiple interfaces, multi-user shared workspace support, drag and drop upload and sharing, dynamic state control between different parts of the system and execution and synchronisation of the system across multiple computing nodes.

Multi-party computation with conversion of secret sharing

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Classical results in unconditionally secure multi-party computation (MPC) protocols with a passive adversary indicate that every n-variate function can be computed by n participants, such that no set of size t < n/2 participants learns any additional information other than what they could derive from their private inputs and the output of the protocol. We study unconditionally secure MPC protocols in the presence of a passive adversary in the trusted setup (‘semi-ideal’) model, in which the participants are supplied with some auxiliary information (which is random and independent from the participant inputs) ahead of the protocol execution (such information can be purchased as a “commodity” well before a run of the protocol). We present a new MPC protocol in the trusted setup model, which allows the adversary to corrupt an arbitrary number t < n of participants. Our protocol makes use of a novel subprotocol for converting an additive secret sharing over a field to a multiplicative secret sharing, and can be used to securely evaluate any n-variate polynomial G over a field F, with inputs restricted to non-zero elements of F. The communication complexity of our protocol is O(ℓ · n 2) field elements, where ℓ is the number of non-linear monomials in G. Previous protocols in the trusted setup model require communication proportional to the number of multiplications in an arithmetic circuit for G; thus, our protocol may offer savings over previous protocols for functions with a small number of monomials but a large number of multiplications.

Grids with multiple batch systems for performance enhancement of multi-component and parameter sweep parallel applications

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, we evaluate the benefits of using Grids with multiple batch systems to improve the performance of multi-component and parameter sweep parallel applications by reduction in queue waiting times. Using different job traces of different loads, job distributions and queue waiting times corresponding to three different queuing policies(FCFS, conservative and EASY backfilling), we conducted a large number of experiments using simulators of two important classes of applications. The first simulator models Community Climate System Model (CCSM), a prominent multi-component application and the second simulator models parameter sweep applications. We compare the performance of the applications when executed on multiple batch systems and on a single batch system for different system and application configurations. We show that there are a large number of configurations for which application execution using multiple batch systems can give improved performance over execution on a single system.

Software Pipelined Execution of Stream Programs on GPUs

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem - both scheduling and assignment of filters to processors - as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipelin parallelism. Further it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single threaded CPU.

Accelerating multi-core simulators

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Simulation is an important means of evaluating new microarchitectures. With the invention of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequency, the wall clock time required for simulating an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially in the case of multi-threaded workload due to indeterminacy introduced due to simultaneously executing various threads. In this paper, we propose a technique for speeding multi-core simulation. The model of the processor core and cache are replaced with functional models, to achieve speedup. A timed Petri net model is used to estimate the execution time of the processor and the memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict performance of data parallel applications or multiprogramming workload on CMP platform with various cache hierarchies and shared bus interconnect. The error in estimation of the execution time of an application is within 6%. The speedup achieved ranges between an average of 2x--4x over the cycle accurate simulator.

Strategies for Rescheduling Tightly-Coupled Parallel Applications in Multi-Cluster Grids

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As computational Grids are increasingly used for executing long running multi-phase parallel applications, it is important to develop efficient rescheduling frameworks that adapt application execution in response to resource and application dynamics. In this paper, three strategies or algorithms have been developed for deciding when and where to reschedule parallel applications that execute on multi-cluster Grids. The algorithms derive rescheduling plans that consist of potential points in application execution for rescheduling and schedules of resources for application execution between two consecutive rescheduling points. Using large number of simulations, it is shown that the rescheduling plans developed by the algorithms can lead to large decrease in application execution times when compared to executions without rescheduling on dynamic Grid resources. The rescheduling plans generated by the algorithms are also shown to be competitive when compared to the near-optimal plans generated by brute-force methods. Of the algorithms, genetic algorithm yielded the most efficient rescheduling plans with 9-12% smaller average execution times than the other algorithms.

Executing Long-running Multi-component Applications on Batch Grids

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Computational grids are increasingly being used for executing large multi-component scientific applications. The most widely reported advantages of application execution on grids are the performance benefits, in terms of speeds, problem sizes or quality of solutions, due to increased number of processors. We explore the possibility of improved performance on grids without increasing the application’s processor space. For this, we consider grids with multiple batch systems. We explore the challenges involved in and the advantages of executing long-running multi-component applications on multiple batch sites with a popular multi-component climate simulation application, CCSM, as the motivation.We have performed extensive simulation studies to estimate the single and multi-site execution rates of the applications for different system characteristics.Our experiments show that in many cases, multiple batch executions can have better execution rates than a single site execution.

Adaptive Executions of Multi-Physics Coupled Applications on Batch Grids

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Long running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long durations of simulations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for execution of long running multi-physics coupled applications across multiple batch systems of a Grid. Our framework, apart from coordinating the executions of the component jobs of an application on different batch systems, also automatically resubmits the jobs multiple times to the batch queues to continue and sustain long running executions. As the set of active batch systems available for execution changes, our framework performs migration and rescheduling of components using a robust rescheduling decision algorithm. We have used our framework for improving the application throughput of a foremost long running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.

MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a decentralized/peer-to-peer architecture-based parallel version of the vector evaluated particle swarm optimization (VEPSO) algorithm for multi-objective design optimization of laminated composite plates using message passing interface (MPI). The design optimization of laminated composite plates being a combinatorially explosive constrained non-linear optimization problem (CNOP), with many design variables and a vast solution space, warrants the use of non-parametric and heuristic optimization algorithms like PSO. Optimization requires minimizing both the weight and cost of these composite plates, simultaneously, which renders the problem multi-objective. Hence VEPSO, a multi-objective variant of the PSO algorithm, is used. Despite the use of such a heuristic, the application problem, being computationally intensive, suffers from long execution times due to sequential computation. Hence, a parallel version of the PSO algorithm for the problem has been developed to run on several nodes of an IBM P720 cluster. The proposed parallel algorithm, using MPI's collective communication directives, establishes a peer-to-peer relationship between the constituent parallel processes, deviating from the more common master-slave approach, in achieving reduction of computation time by factor of up to 10. Finally we show the effectiveness of the proposed parallel algorithm by comparing it with a serial implementation of VEPSO and a parallel implementation of the vector evaluated genetic algorithm (VEGA) for the same design problem. (c) 2012 Elsevier Ltd. All rights reserved.

«
1
2
3
4
5
6
7
8
...
66
67
»