955 results for Quadratic, sieve, CUDA, OpenMP, SoC, Tegra K1


Relevance:

20.00%

Publisher:

Abstract:

The rapid development of diverse hardware platforms is twofold: on one side, the push for exascale performance in big data processing and management; on the other, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is a general virtualization system developed in 2009 and first introduced in 2010, providing a completely transparent layer between GPUs and virtual machines. This paper presents the latest achievements and developments of GVirtuS, which now supports CUDA 6.5, memory management, and scheduling. Thanks to its new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs, on local workstations, computing clusters, and distributed cloud appliances.
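For context on what "transparent remoting" means in practice (this sketch is not taken from the paper), the program below is an ordinary CUDA runtime application. Under a remoting layer such as GVirtuS, the cudaMalloc/cudaMemcpy calls and the kernel launch issued inside a VM would be intercepted by a front-end library and forwarded to a back-end on the GPU-equipped host, with no changes to the application source.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: double each element in place.
__global__ void scale(float *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));                      // under GVirtuS this call is remoted
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, n);                  // launch forwarded to the back-end
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[0] = %f\n", h[0]);                            // expect 2.0
    delete[] h;
    return 0;
}
```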

Relevance:

20.00%

Publisher:

Abstract:

By the Golod–Shafarevich theorem, an associative algebra $R$ given by $n$ generators and $<n^2/3$ homogeneous quadratic relations is not 5-step nilpotent. We prove that this estimate is optimal. Namely, we show that for every positive integer $n$, there is an algebra $R$ given by $n$ generators and $\lceil n^2/3\rceil$ homogeneous quadratic relations such that $R$ is 5-step nilpotent.
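As a compact restatement of the two bounds quoted above (notation added here, not in the abstract), write r for the number of homogeneous quadratic relations of an algebra R on n generators:

```latex
% Golod--Shafarevich bound:
r < \tfrac{n^2}{3} \;\Longrightarrow\; R \text{ is not 5-step nilpotent.}

% Optimality (the result of this paper):
\text{for every } n \text{ there exists } R \text{ with } r = \left\lceil \tfrac{n^2}{3} \right\rceil
\text{ that is 5-step nilpotent.}
```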

Relevance:

20.00%

Publisher:

Abstract:

In the book 'Quadratic Algebras' by Polishchuk and Positselski [23], algebras with a small number of generators (n = 2, 3) are considered. For some numbers r of relations, the possible Hilbert series are listed, and those arising from Koszul algebras are identified. The first case where this was not possible, namely three generators (n = 3) and six relations (r = 6), is formulated as an open problem. We give a complete answer to this question: for quadratic algebras with dim A_1 = dim A_2 = 3, we list all possible Hilbert series and determine which of them can come from Koszul algebras and which cannot. As a consequence of this classification, we found an algebra that serves as a counterexample to another problem from the same book [23] (Chapter 7, Sec. 1, Conjecture 2), which asserts that a Koszul algebra of finite global homological dimension d has dim A_1 ≥ d. Namely, the 3-generated algebra A given by the relations xx + yx = xz = zy = 0 is Koszul, and its Koszul dual algebra A^! has a Hilbert series of degree 4, H_{A^!}(t) = 1 + 3t + 3t^2 + 2t^3 + t^4; hence A has global homological dimension 4.
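The last inference relies on a standard fact about Koszul algebras that the abstract leaves implicit: the Hilbert series of a Koszul algebra A and of its quadratic dual A^! satisfy

```latex
H_A(t)\, H_{A^{!}}(-t) = 1,
```

and for a Koszul algebra the global homological dimension equals the top nonzero degree of A^!; since H_{A^!}(t) above has degree 4, gl.dim A = 4.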

Relevance:

20.00%

Publisher:

Abstract:

Hexagonal resonant triad patterns are shown to exist as stable solutions of a particular type of nonlinear field in which no cubic field nonlinearity is present. The zero 'dc' Fourier mode is shown to stabilize these patterns, which are produced by a pure quadratic field nonlinearity. Closed-form solutions and stability results are obtained near the critical point, complemented by numerical studies far from the critical point. These results are obtained using a neural field based on the Helmholtzian operator. Constraints on the structure and parameters of a general pure quadratic neural field which supports hexagonal patterns are obtained.
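For orientation only (this is the textbook form of a resonant triad, not the specific equations of the paper): hexagonal patterns arise from three modes whose wave vectors sum to zero, and a quadratic field nonlinearity couples their complex amplitudes A_1, A_2, A_3 at leading order as

```latex
\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3 = 0, \qquad |\mathbf{k}_i| = k_c,

\dot{A}_1 = \mu A_1 + \gamma\, \bar{A}_2 \bar{A}_3 \quad (\text{and cyclic permutations}),
```

where μ measures the distance from threshold and γ the quadratic coupling; the paper's contribution concerns how the zero-wavenumber ('dc') mode affects the stability of such triads in a Helmholtzian neural field.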

Relevance:

20.00%

Publisher:

Abstract:

Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse engineering of software is of great importance within the security and forensics fields, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Unified Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the different binary output formats which may be encountered from the CUDA compiler, and their implications for reverse engineering. We then demonstrate the process of disassembling an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.
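As a concrete, hypothetical illustration of the kind of information a default nvcc build embeds (the example and names below are not from the paper), consider the small kernel that follows. NVIDIA's toolkit ships tools such as cuobjdump and nvdisasm that can typically recover human-readable PTX or disassembled SASS, including mangled kernel names, from a binary compiled with default settings.

```cuda
#include <cuda_runtime.h>

// A deliberately recognizable "proprietary" kernel. With default nvcc settings,
// its mangled name and its PTX are embedded in the fat binary, e.g.:
//   nvcc -o demo demo.cu
//   cuobjdump --dump-ptx  demo   # recovers the embedded PTX, kernel names included
//   cuobjdump --dump-sass demo   # disassembles the compiled SASS machine code
__global__ void secretScoringKernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.5f * in[i] * in[i] + 3.0f;   // the "algorithm" an analyst would recover
}

int main() {
    const int n = 1024;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    secretScoringKernel<<<(n + 127) / 128, 128>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```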

Relevance:

20.00%

Publisher:

Abstract:

The Nha Trang Bay (latitude 12°15'N) and central areas of Vietnam present strong ecological differences from other parts of the country.

Relevance:

20.00%

Publisher:

Abstract:

This document presents an implementation of the travelling salesman problem using a custom implementation of a self-organizing map, building on earlier solutions and adapting them to the CUDA architecture, together with a comparison of an efficient CUDA C/C++ implementation against one using the GPU functions included in MATLAB's Parallel Computing Toolbox. The proposed solution reduces by almost a quarter the number of iterations needed to reach a good solution to the problem, in addition to the inherent gains from using parallel architectures. The solution also studies the speed improvement obtained from the targeted use of shared memory, one of the most powerful tools for improving performance. Regarding execution times, the conclusion is that the best option is to launch a CUDA kernel from MATLAB through the functionality included in the Parallel Computing Toolbox.
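The shared-memory optimization mentioned above is not spelled out in the abstract; as a hedged sketch of the usual pattern in an SOM-based TSP solver, the kernel below finds, for one city, the nearest neuron (best matching unit) on the ring, staging per-thread partial minima in shared memory for a block-level reduction. Names and the node layout are illustrative, not taken from the cited implementation.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

// One block searches the whole neuron ring for the neuron closest to (cx, cy).
// Each thread scans a strided subset; partial minima are reduced in shared memory.
__global__ void bestMatchingUnit(const float2 *neurons, int numNeurons,
                                 float cx, float cy, int *bmuIndex) {
    extern __shared__ float shared[];        // blockDim.x distances followed by blockDim.x indices
    float *sDist = shared;
    int   *sIdx  = (int *)&shared[blockDim.x];

    float best = FLT_MAX;
    int bestI = -1;
    for (int i = threadIdx.x; i < numNeurons; i += blockDim.x) {
        float dx = neurons[i].x - cx, dy = neurons[i].y - cy;
        float d = dx * dx + dy * dy;
        if (d < best) { best = d; bestI = i; }
    }
    sDist[threadIdx.x] = best;
    sIdx[threadIdx.x]  = bestI;
    __syncthreads();

    // Tree reduction over the block's partial minima.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s && sDist[threadIdx.x + s] < sDist[threadIdx.x]) {
            sDist[threadIdx.x] = sDist[threadIdx.x + s];
            sIdx[threadIdx.x]  = sIdx[threadIdx.x + s];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) *bmuIndex = sIdx[0];
}

// Example launch:
//   bestMatchingUnit<<<1, 256, 256 * (sizeof(float) + sizeof(int))>>>(dNeurons, n, cx, cy, dBmu);
```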

Relevance:

20.00%

Publisher:

Abstract:

International audience

Relevance:

20.00%

Publisher:

Abstract:

The equivalence of noncommutative U(N) quantum field theories related by θ-exact Seiberg-Witten maps is proven in this paper to all orders in perturbation theory with respect to the coupling constant. We show that this holds for super Yang-Mills theories with N=0, 1, 2, 4 supersymmetry. A direct check of this equivalence relation is performed by computing the one-loop quantum corrections to the quadratic part of the effective action in the noncommutative U(1) gauge theory with N=0, 1, 2, 4 supersymmetry.

Relevance:

20.00%

Publisher:

Abstract:

Production companies use raw materials to compose end-products, and they often make different products from the same raw materials. This research focuses on producing two end-products consisting of (partly) the same raw materials as cheaply as possible. Each product has its own demand and quality requirements, expressed as quadratic constraints. Minimizing the costs subject to these quadratic constraints is a global optimization problem, which can be difficult because of possible local optima. Therefore, the multimodal character of the (bi-)blend problem is investigated. Standard optimization packages (solvers) in Matlab and GAMS were tested on their ability to solve the problem. In total, 20 test cases were generated or taken from the literature to test the solvers' effectiveness and efficiency. The research also gives insight into adjusting the quadratic constraints of the problem in order to obtain a robust formulation of the bi-blend problem.
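The abstract does not state the model explicitly; a generic single-blend formulation of this type (illustrative notation, not the paper's) minimizes raw-material cost over mixing fractions subject to quadratic quality constraints, and the bi-blend problem couples two such products through shared raw-material availability:

```latex
\min_{x \in \mathbb{R}^n} \; c^{\top} x
\quad \text{s.t.} \quad
x^{\top} Q_j x + b_j^{\top} x \le d_j \;\; (j = 1,\dots,m), \qquad
\sum_{i=1}^{n} x_i = 1, \qquad x \ge 0,
```

where x_i is the fraction of raw material i in the blend; indefinite matrices Q_j are what make the feasible set nonconvex and the problem multimodal.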

Relevance:

20.00%

Publisher:

Abstract:

Given the recent advent of NGS technologies, capable of sequencing entire human genomes at reduced time and cost, the ability to extract information from the data plays a fundamental role in the development of research. The computational problems connected with such analyses currently fall under the Big Data topic, with databases containing various kinds of experimental data of ever-growing size. This thesis deals with the implementation and benchmarking of the QDANet PRO algorithm, developed by the Biophysics group of the University of Bologna: the method processes high-dimensional data to extract a low-dimensional Signature of features with high classification performance, through an analysis pipeline that includes dimensionality-reduction algorithms. The method can also be generalized to the analysis of non-biological data, provided they are characterized by high volume and complexity, the typical traits of Big Data. The QDANet PRO algorithm evaluates the performance of all possible pairs of features, estimating their discriminating power with a Naive Bayes Quadratic Classifier and then ranking them. Once a performance threshold has been selected, a network of features is built, from which the connected components are determined. Each subgraph is analyzed separately and reduced using network-theoretic methods until the final Signature is extracted. The method, previously tested on some datasets available to the research group with positive results, was compared with results obtained on omics databases available in the literature, which constitute a reference in the field, and with existing algorithms that perform similar tasks. To reduce computational time, the algorithm was implemented in C++ on HPC, with the most critical parts parallelized using OpenMP libraries.
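A minimal host-side C++/OpenMP sketch, assuming a generic pairwise scoring routine, of how the most expensive step described above (scoring all feature pairs) is typically parallelized; the function names, the placeholder score, and the flattened pair index are illustrative, not taken from the QDANet PRO code.

```cuda
#include <omp.h>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the real scoring step: QDANet PRO uses the cross-validated
// performance of a Naive Bayes Quadratic Classifier on the feature pair (i, j).
// Here a cheap placeholder keeps the sketch self-contained.
static double pairScore(std::size_t i, std::size_t j,
                        const std::vector<std::vector<double>> &features) {
    double s = 0.0;
    for (std::size_t k = 0; k < features[i].size(); ++k)
        s += features[i][k] * features[j][k];
    return std::fabs(s);
}

// Score all n*(n-1)/2 feature pairs. This loop dominates the run time and is the
// natural target for the OpenMP parallelization mentioned in the thesis.
std::vector<double> scoreAllPairs(const std::vector<std::vector<double>> &features) {
    const std::size_t n = features.size();
    std::vector<double> scores(n * (n - 1) / 2);
    #pragma omp parallel for schedule(dynamic)
    for (long long p = 0; p < (long long)scores.size(); ++p) {
        // Recover the pair (i, j), i < j, from the flattened index p.
        std::size_t i = 0, offset = 0;
        while (offset + (n - 1 - i) <= (std::size_t)p) { offset += n - 1 - i; ++i; }
        std::size_t j = i + 1 + ((std::size_t)p - offset);
        scores[p] = pairScore(i, j, features);
    }
    return scores;
}
```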

Relevance:

20.00%

Publisher:

Abstract:

Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware, so improving system performance is a complicated task for both software and hardware designers. The aim of this research is to develop a hardware acceleration workflow for software applications, so that system performance can be improved under constraints on energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can yield significant performance improvements for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to an H.264 Coder-Decoder (CODEC) core. The hotspot function of the target application is identified using critical attributes such as cycles per loop, loop rounds, etc. (2) A hardware acceleration method based on a Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods, central bus design and co-processor design, are implemented for comparison in the proposed architecture. (3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed, and the trade-off among these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. (4) A system verification platform is designed based on the Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and lower resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system reaches a 2.8X performance improvement and saves 31.84% energy consumption with the Bus-IP design; the co-processor design achieves 7.9X performance and saves 75.85% energy consumption.
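For context (not a calculation from the thesis): when only a profiled hotspot is moved to a hardware accelerator, the achievable overall speedup is bounded by Amdahl's law,

```latex
S_{\text{overall}} = \frac{1}{(1 - f) + \dfrac{f}{s}},
```

where f is the fraction of execution time spent in the accelerated hotspot and s is the accelerator's local speedup; this is why profiling to find functions with a large f (high cycles per loop, many loop rounds) precedes the accelerator design step.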

Relevance:

20.00%

Publisher:

Abstract:

Image and video compression play a major role in the world today, allowing the storage and transmission of large multimedia content volumes. However, the processing of this information requires high computational resources, hence improving the computational performance of these compression algorithms is very important. The Multidimensional Multiscale Parser (MMP) is a pattern-matching-based compression algorithm for multimedia contents, namely images, achieving high compression ratios while maintaining good image quality (Rodrigues et al. [2008]). However, in comparison with other existing algorithms, this algorithm takes considerable time to execute. Therefore, two parallel implementations for GPUs were proposed by Ribeiro [2016] and Silva [2015], in CUDA and OpenCL-GPU, respectively. In this dissertation, to complement that work, we propose two parallel versions that run the MMP algorithm on the CPU: one resorting to OpenMP and another that converts the existing OpenCL-GPU version into OpenCL-CPU. The proposed solutions improve the computational performance of MMP by 3x and 2.7x, respectively. The High Efficiency Video Coding (HEVC/H.265) standard is the most recent standard for compression of image and video. Its impressive compression performance makes it a target for many adaptations, particularly for holoscopic image/video processing (or light field). Some of the proposed modifications to encode this new multimedia content are based on geometry-based disparity compensations (SS), developed by Conti et al. [2014], and a Geometric Transformations (GT) module, proposed by Monteiro et al. [2015]. These compression algorithms for holoscopic images based on HEVC implement a specific search for similar micro-images that is more efficient than the one performed by HEVC, but their implementation is considerably slower than HEVC. In order to achieve better execution times, we chose the OpenCL API as the GPU-enabling language to increase the module's performance. With its most costly setting, we are able to reduce the GT module execution time from 6.9 days to less than 4 hours, effectively attaining a speedup of 45x.