965 resultados para ubiquitous multi-core framework


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an innovative fusion based multi-classifier email classification on a ubiquitous multi-core architecture. Many approaches use text-based single classifiers or multiple weakly trained classifiers to identify spam messages from a large email corpus. We build upon our previous work on multi-core by apply our ubiquitous multi-core framework to run our fusion based multi-classifier architecture. By running each classifier process in parallel within their dedicated core, we greatly improve the performance of our proposed multi-classifier based filtering system. Our proposed architecture also provides a safeguard of user mailbox from different malicious attacks. Our experimental results show that we achieved an average of 30% speedup at the average cost of 1.4 ms. We also reduced the instance of false positive, which is one of the key challenges in spam filtering system, and increases email classification accuracy substantially compared with single classification techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Distributed Denial of Service attacks is one of the most challenging areas to deal with in Security. Not only do security managers have to deal with flood and vulnerability attacks. They also have to consider whether they are from legitimate or malicious attackers. In our previous work we developed a framework called bodyguard, which is to help security software developers from the current serialized paradigm, to a multi-core paradigm. In this paper, we update our research work by moving our bodyguard paradigm, into our new Ubiquitous Multi-Core Framework. From this shift, we show a marked improvement from our previous result of 20% to 110% speedup performance with an average cost of 1.5 ms. We also conducted a second series of experiments, which we trained up Neural Network, and tested it against actual DDoS attack traffic. From these experiments, we were able to achieve an average of 93.36%, of this attack traffic.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an innovative fusion-based multi-classifier e-mail classification on a ubiquitous multicore architecture. Many previous approaches used text-based single classifiers to identify spam messages from a large e-mail corpus with some amount of false positive tradeoffs. Researchers are trying to prevent false positive in their filtering methods, but so far none of the current research has claimed zero false positive results. In e-mail classification false positive can potentially cause serious problems for the user. In this paper, we use fusion-based multi-classifier classification technique in a multi-core framework. By running each classifier process in parallel within their dedicated core, we greatly improve the performance of our multi-classifier-based filtering system in terms of running time, false positive rate, and filtering accuracy. Our proposed architecture also provides a safeguard of user mailbox from different malicious attacks. Our experimental results show that we achieved an average of 30% speedup at an average cost of 1.4 ms. We also reduced the instances of false positives, which are one of the key challenges in a spam filtering system, and increases e-mail classification accuracy substantially compared with single classification techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For more than a decade now, multimedia developers have usually "ride the waves", so to speak, with the coming of each generation of microprocessors, which allows their applications, designs and programs to usually running more proficiently, efficiently and effectively. This so-called "free" ride seems to be coming to an end, with results of increases clock speeds, the widening of the gap in processor and memory performance, and the tradeoffs that are needed to meet the former two points, with the new multi-core systems. In this paper, we build upon our previous work within multi-core systems, by proposing a ubiquitous multi-core (UM) design. The goal of such a framework is help researchers to plan and implement their multimedia applications so they can take advantage of speed up computations of multi-core systems and allow real-time multimedia. As our experiments show, our UM system increases performance speeds at an average of 100%, with the average execution cost of 1.4ms, showing that multimedia can use multi-core resources efficiently and effectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation which is used to achieve high performance on a range of leading multi-core processors. Results are presented for a simple model stencil running on Intel and AMD CPUs as well as the NVIDIA GT200 GPU. The generality of the framework is demonstrated through the implementation of a complete application consisting of many different stencil computations, taken from the field of computational fluid dynamics. © 2010 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The last decade has witnessed a major shift towards the deployment of embedded applications on multi-core platforms. However, real-time applications have not been able to fully benefit from this transition, as the computational gains offered by multi-cores are often offset by performance degradation due to shared resources, such as main memory. To efficiently use multi-core platforms for real-time systems, it is hence essential to tightly bound the interference when accessing shared resources. Although there has been much recent work in this area, a remaining key problem is to address the diversity of memory arbiters in the analysis to make it applicable to a wide range of systems. This work handles diverse arbiters by proposing a general framework to compute the maximum interference caused by the shared memory bus and its impact on the execution time of the tasks running on the cores, considering different bus arbiters. Our novel approach clearly demarcates the arbiter-dependent and independent stages in the analysis of these upper bounds. The arbiter-dependent phase takes the arbiter and the task memory-traffic pattern as inputs and produces a model of the availability of the bus to a given task. Then, based on the availability of the bus, the arbiter-independent phase determines the worst-case request-release scenario that maximizes the interference experienced by the tasks due to the contention for the bus. We show that the framework addresses the diversity problem by applying it to a memory bus shared by a fixed-priority arbiter, a time-division multiplexing (TDM) arbiter, and an unspecified work-conserving arbiter using applications from the MediaBench test suite. We also experimentally evaluate the quality of the analysis by comparison with a state-of-the-art TDM analysis approach and consistently showing a considerable reduction in maximum interference.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today's security program developers are not only facing an uphill battle of developing and implementing. But now have to take into consideration, the emergence of next generation of multi-core system, and its effect on security application design. In our previous work, we developed a framework called bodyguard. The objective of this framework was to help security software developers, shift from their use of serialized paradigm, to a multi-core paradigm. Working within this paradigm, we developed a security bodyguard system called Farmer. This abstract framework placed particular applications into categories, like security or multi-media, which were ran on separate core processors within the multi-core system. With further analysis of the bodyguard paradigm, we found that this paradigm was suitable to be used in other computer science areas, such as spam filtering and multi-media. In this paper, we update our research work within the bodyguard paradigm, and showed a marked improvement of 110% speedup performance with an average cost of 1.5 ms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of efficient synchronization mechanisms is crucial for implementing fine grained parallel programs on modern shared cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single- Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform and a sketch proof of correctness is presented. The queues proposed have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications. © 2012 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An approach of building distributed decision support systems is proposed. There is defined a framework of a distributed DSS and examined questions of problem formulation and solving using artificial intellectual agents in system core.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose an extension to the I/O device architecture, as recommended in the PCI-SIG IOV specification, for virtualizing network I/O devices. The aim is to enable fine-grained controls to a virtual machine on the I/O path of a shared device. The architecture allows native access of I/O devices to virtual machines and provides device level QoS hooks for controlling VM specific device usage. For evaluating the architecture we use layered queuing network (LQN) models. We implement the architecture and evaluate it using simulation techniques, on the LQN model, to demonstrate the benefits. With the architecture, the benefit for network I/O is 60% more than what can be expected on the existing architecture. Also, the proposed architecture improves scalability in terms of the number of virtual machines intending to share the I/O device.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Simulation is an important means of evaluating new microarchitectures. With the invention of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequency, the wall clock time required for simulating an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially in the case of multi-threaded workload due to indeterminacy introduced due to simultaneously executing various threads. In this paper, we propose a technique for speeding multi-core simulation. The model of the processor core and cache are replaced with functional models, to achieve speedup. A timed Petri net model is used to estimate the execution time of the processor and the memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict performance of data parallel applications or multiprogramming workload on CMP platform with various cache hierarchies and shared bus interconnect. The error in estimation of the execution time of an application is within 6%. The speedup achieved ranges between an average of 2x--4x over the cycle accurate simulator.