904 results for 291605 Processor Architectures


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Consider scheduling of real-time tasks on a multiprocessor where migration is forbidden. Specifically, consider the problem of determining a task-to-processor assignment for a given collection of implicit-deadline sporadic tasks upon a multiprocessor platform in which there are two distinct types of processors. For this problem, we propose a new algorithm, LPC (task assignment based on solving a Linear Program with Cutting planes). The algorithm offers the following guarantee: for a given task set and platform, if a feasible task-to-processor assignment exists, then LPC also finds a feasible task-to-processor assignment, but on a platform in which each processor is 1.5× faster and which has three additional processors. For systems with a large number of processors, LPC has a better approximation ratio than state-of-the-art algorithms. To the best of our knowledge, this is the first work that develops a provably good real-time task assignment algorithm using cutting planes.
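To make "feasible task-to-processor assignment" concrete, the following is a minimal sketch of a feasibility check, not the LPC algorithm itself. It assumes each processor runs partitioned EDF, for which a set of implicit-deadline sporadic tasks is schedulable on one processor of speed s exactly when their utilizations sum to at most s. The function name, the data layout, and the sample utilizations are all hypothetical.

```python
from fractions import Fraction as F

def is_feasible(assignment, utilization, proc_type, speed=F(1)):
    """Check a task-to-processor assignment (assumed scheduler: partitioned EDF).

    assignment[i]     -> index of the processor that task i is assigned to
    utilization[i][k] -> utilization of task i on a processor of type k
    proc_type[p]      -> type (0 or 1) of processor p
    speed             -> speed of every processor relative to the original platform
    """
    load = {}
    for task, proc in enumerate(assignment):
        u = utilization[task][proc_type[proc]]
        load[proc] = load.get(proc, F(0)) + u
    # A processor of speed s can sustain a total utilization of at most s.
    return all(total <= speed for total in load.values())

# Hypothetical data: three tasks, one processor of each type.
proc_type = [0, 1]
utilization = [(F(1, 2), F(1, 4)), (F(3, 4), F(1, 2)), (F(1, 4), F(3, 4))]

print(is_feasible([0, 1, 0], utilization, proc_type))            # True
print(is_feasible([0, 0, 1], utilization, proc_type))            # False
print(is_feasible([0, 0, 1], utilization, proc_type, F(3, 2)))   # True
```

The last call shows the role of a speedup factor: an assignment that overloads a unit-speed processor can become feasible when every processor is 1.5× faster. The three additional processors in the LPC guarantee are not modeled here.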

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Consider the problem of scheduling a task set $\tau$ of implicit-deadline sporadic tasks to meet all deadlines on a t-type heterogeneous multiprocessor platform where tasks may access multiple shared resources. The multiprocessor platform has $m_k$ processors of type $k$, where $k \in \{1, 2, \ldots, t\}$. The execution time of a task depends on the type of processor on which it executes. The set of shared resources is denoted by $R$. For each task $\tau_i$, there is a resource set $R_i \subseteq R$ such that for each job of $\tau_i$, during one phase of its execution, the job requests to hold the resource set $R_i$ exclusively, with the interpretation that (i) the job makes a single request to hold all the resources in $R_i$ and (ii) at all times when a job of $\tau_i$ holds $R_i$, no other job holds any resource in $R_i$. Each job of task $\tau_i$ may request the resource set $R_i$ at most once during its execution. A job is allowed to migrate when it requests a resource set and when it releases the resource set, but not at other times. Our goal is to design a scheduling algorithm for this problem and prove its performance. We propose an algorithm, LP-EE-vpr, which offers the guarantee that if an implicit-deadline sporadic task set is schedulable on a t-type heterogeneous multiprocessor platform by an optimal scheduling algorithm that allows a job to migrate only when it requests or releases a resource set, then our algorithm also meets the deadlines with the same restriction on job migration, if given processors $4 \times \left(1 + \mathit{MAXP} \times \left\lceil \frac{|P| \times \mathit{MAXP}}{\min\{m_1, m_2, \ldots, m_t\}} \right\rceil\right)$ times as fast. (Here $\mathit{MAXP}$ and $|P|$ are computed based on the resource sets that tasks request.) For the special case in which each task requests at most one resource, the bound of LP-EE-vpr collapses to $4 \times \left(1 + \left\lceil \frac{|R|}{\min\{m_1, m_2, \ldots, m_t\}} \right\rceil\right)$. To the best of our knowledge, LP-EE-vpr is the first algorithm with a proven performance guarantee for real-time scheduling of sporadic tasks with resource sharing on t-type heterogeneous multiprocessors.
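As a worked illustration of the speed bound, the sketch below evaluates it for sample inputs. It assumes the ceiling term is a fraction with min{m1, m2, …, mt} as its denominator, which is the reading consistent with the special case stated in the abstract. The abstract does not define how MAXP and |P| are derived, so they are taken here as given integers; the function name and the sample values are hypothetical.

```python
def lp_ee_vpr_speed_bound(maxp, p_size, processors_per_type):
    """Evaluate 4 * (1 + MAXP * ceil(|P| * MAXP / min(m_1..m_t))).

    maxp                -> MAXP, taken as a given positive integer
    p_size              -> |P|, taken as a given positive integer
    processors_per_type -> [m_1, ..., m_t], processors available of each type
    """
    m_min = min(processors_per_type)
    # Integer ceiling division avoids floating-point rounding.
    ceiling = -(-(p_size * maxp) // m_min)
    return 4 * (1 + maxp * ceiling)

# Special case from the abstract: each task requests at most one resource,
# so MAXP = 1 and |P| = |R|. With |R| = 3 and min(m) = 2: 4 * (1 + ceil(3/2)).
print(lp_ee_vpr_speed_bound(1, 3, [2, 4]))   # 12

# A hypothetical general case: MAXP = 2, |P| = 3, min(m) = 4.
print(lp_ee_vpr_speed_bound(2, 3, [4, 8]))   # 20
```

The bound shrinks as the smallest processor group grows, because the ceiling term depends only on the least-populated processor type.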

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Task scheduling is one of the key mechanisms to ensure timeliness in embedded real-time systems. Such systems often need to execute not only application tasks but also some urgent routines (e.g., error-detection actions, consistency checkers, interrupt handlers) with minimum latency. Although fixed-priority schedulers such as Rate-Monotonic (RM) are in line with this need, they usually make only a low processor utilization available to the system. Moreover, this availability usually decreases with the number of tasks considered. If dynamic-priority schedulers such as Earliest Deadline First (EDF) are applied instead, high system utilization can be guaranteed, but the minimum latency for executing urgent routines may not be ensured. In this paper we describe a scheduling model in which urgent routines are executed at the highest priority level and all other system tasks are scheduled by EDF. We show that the guaranteed processor utilization for the assumed scheduling model is at least as high as the one provided by RM for two tasks, namely $2(\sqrt{2} - 1)$. Seven polynomial-time tests for checking the system timeliness are derived and proved correct. The proposed tests are compared against each other and against an exact test with exponential running time.
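The two-task figure quoted above is the classic Liu and Layland utilization bound for RM, n(2^(1/n) − 1), evaluated at n = 2. The short sketch below computes the bound for several task counts to show how it decreases with n, whereas EDF on a single processor guarantees schedulability of implicit-deadline tasks up to utilization 1. The function name is an illustrative choice, and the sketch does not implement the paper's scheduling model or its seven tests.

```python
import math

def rm_utilization_bound(n):
    """Liu and Layland bound: any n implicit-deadline periodic tasks whose total
    utilization is at most n * (2**(1/n) - 1) are schedulable under RM."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n * (2 ** (1.0 / n) - 1)

for n in (1, 2, 3, 10):
    print(n, round(rm_utilization_bound(n), 3))

# The bound for two tasks equals 2 * (sqrt(2) - 1), about 0.828,
# and the sequence approaches ln 2 (about 0.693) as n grows.
print(round(math.log(2), 3))
```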

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents a new parallel implementation of a previously proposed hyperspectral coded aperture (HYCA) algorithm for compressive sensing on graphics processing units (GPUs). The HYCA method combines the ideas of spectral unmixing and compressive sensing, exploiting the high spatial correlation that can be observed in the data and the generally low number of endmembers needed to explain the data. The proposed implementation exploits the GPU architecture at a low level, thus taking full advantage of the computational power of GPUs by using shared memory and coalesced memory accesses. The proposed algorithm is evaluated not only in terms of reconstruction error but also in terms of computational performance using two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN. Experimental results using real data reveal significant speedups with respect to the serial implementation.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Final Master's project submitted for the degree of Master in Communication Networks and Multimedia Engineering

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Hyperspectral imaging has become one of the main topics in remote sensing. Hyperspectral images comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area, generating large data volumes of several GBs per flight. This high spectral resolution can be used for object detection and to discriminate between different objects based on their spectral characteristics. One of the main problems in hyperspectral analysis is the presence of mixed pixels, which arise when the spatial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important tasks in hyperspectral data exploitation. However, unmixing algorithms can be computationally very expensive and highly power-consuming, which compromises their use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have been shown to benefit from this hardware, taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagrangian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method, called simplex identification via split augmented Lagrangian (SISAL), aims to identify the endmembers of a scene, i.e., it is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of the SISAL method presented in this work exploits the GPU architecture at a low level, using shared memory and coalesced memory accesses.
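The notion of a mixed pixel under linear unmixing can be illustrated with the standard linear mixing model, in which an observed spectrum is a weighted sum of endmember spectra. The sketch below is a forward model only, not SISAL or its GPU implementation. It assumes the usual abundance constraints (non-negative, summing to one); the function name and the spectral values are made up for illustration.

```python
def mix_pixel(endmembers, abundances):
    """Linear mixing model: pixel[b] = sum_j abundances[j] * endmembers[j][b].

    endmembers -> list of spectra, one list of band values per pure material
    abundances -> fraction of the pixel covered by each material
    """
    if len(endmembers) != len(abundances):
        raise ValueError("one abundance is needed per endmember")
    # Assumed constraints: abundances are non-negative and sum to one.
    if any(a < 0 for a in abundances) or abs(sum(abundances) - 1.0) > 1e-9:
        raise ValueError("abundances must be non-negative and sum to one")
    bands = len(endmembers[0])
    return [
        sum(a * spectrum[b] for a, spectrum in zip(abundances, endmembers))
        for b in range(bands)
    ]

# Two hypothetical endmembers observed in three spectral bands.
material_a = [0.25, 0.5, 0.75]
material_b = [0.75, 0.25, 0.125]

# A pixel covered half by each material matches neither pure spectrum.
print(mix_pixel([material_a, material_b], [0.5, 0.5]))  # [0.5, 0.375, 0.4375]
```

Unmixing is the inverse problem: given many such mixed pixels, recover the endmember spectra and the abundances, even when no pixel in the scene is pure.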

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Remote hyperspectral sensors collect large amounts of data per flight, usually with low spatial resolution. It is known that the bandwidth of the connection between the satellite/airborne platform and the ground station is limited, so an onboard compression method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of a compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPUs) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral datasets, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results obtained using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA, GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4 GHz) with 16 GB of memory.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The application of compressive sensing (CS) to hyperspectral images has been an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS on commodity graphics processing units (GPUs): two of them, termed P-HYCA and P-HYCA-FAST, and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST. The HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. In the P-HYCA-FAST and P-CHYCA-FAST implementations, this library has been replaced by a fast iterative method, which leads to very significant speedup factors that help meet real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compression ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data, revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Coupling five rigid or flexible bis(pyrazolato)-based tectons with late transition metal ions allowed us to isolate 18 coordination polymers (CPs). As assessed by thermal analysis, all of them possess a remarkable thermal stability, their decomposition temperatures lying in the range of 340-500 °C. As demonstrated by N2 adsorption measurements at 77 K, their Langmuir specific surface areas span the rather vast range of 135-1758 m²/g, in agreement with the porous or dense polymeric architectures retrieved by powder X-ray diffraction structure solution methods. Two representative families of CPs, built up with either rigid or flexible spacers, were tested as catalysts in (i) the microwave-assisted solvent-free peroxidative oxidation of alcohols by t-BuOOH, and (ii) the peroxidative oxidation of cyclohexane to cyclohexanol and cyclohexanone by H2O2 in acetonitrile. Those CPs bearing the rigid spacer, concurrently possessing higher specific surface areas, are more active than the corresponding ones with the flexible spacer. Moreover, the two copper(I)-containing CPs investigated exhibit the highest efficiency in both reactions, leading selectively to a maximum product yield of 92% (and TON up to 1.5 × 10³) in the oxidation of 1-phenylethanol and of 11% in the oxidation of cyclohexane, the latter value being higher than that granted by the current industrial process.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In recent years, the teaching and learning process has changed significantly thanks to the emergence of the Internet. New tools to support teaching have appeared, among which remote laboratories stand out. Many educational institutions now make remote laboratories available in their courses, allowing teachers and students to carry out real experiments over the Internet. These laboratories are implemented with different architectures and infrastructures and are supported by several remotely accessible laboratory modules (e.g., measurement instruments). However, their adoption in education is still limited, owing to: (i) the lack of resources and technical skills in educational institutions to develop them, (ii) the difficulty of sharing laboratory modules across different infrastructures, and (iii) the limited ability to reconfigure them with those modules. To overcome these limitations, a prototype whose architecture is based on the IEEE 1451.0 standard and on FPGA technology was conceived and developed as part of a doctoral work [1]. In addition to ensuring standardized development of and access to a remote laboratory, this prototype also promotes the sharing of laboratory modules across different infrastructures. That work explored the reconfiguration capability of FPGAs to embed several modules in the laboratory infrastructure, all described in files written in hardware description languages and structured according to the IEEE 1451.0 standard. Defining these modules requires creating binary data structures (Transducer Electronic Data Sheets, TEDSs), as well as other files that allow them to be connected to the laboratory infrastructure. However, creating these files is quite complex, since it requires several calculations and conversions.
Given this complexity, this dissertation describes the development of a Web application for reading and writing TEDSs. In addition to a study of remote laboratories, it describes the IEEE 1451.0 standard, with particular attention to its architecture and to the structure of the different TEDSs. To put the developed application in context, it also briefly presents a prototype of a reconfigurable remote laboratory whose reconfiguration is supported by this application. Finally, it describes the verification of the Web application, in order to draw conclusions about its contribution to simplifying that reconfiguration.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

There is currently a growing need for custom-built software that can adapt quickly to the constant changes in the customer's business area. Each customer has specific problems to solve and often cannot devote a large amount of resources to achieving the intended goals. In response to these problems, several software architectures and development methodologies have emerged that enable the agile development of highly configurable applications, which can be customized by any of their users. This dynamism, brought to applications in the form of models that are customized by users and interpreted by a generic platform, creates greater challenges when testing, since there are considerably more variables than in an application with a traditional architecture. It is necessary, at all times, to guarantee the integrity of all models, as well as of the platform responsible for interpreting them, without having to constantly develop applications to support tests on the different models. This thesis focuses on one application, the myMIS platform, which interprets management-oriented models written in a domain-specific language. It assesses the current state of the platform and defines a proposal of testing practices to apply in its development. The proposal resulting from this thesis showed that, despite the difficulties inherent in the application's architecture, tests can be developed generically, and the same logic can be used to test several distinct models.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this talk, we discuss a scheduling problem that originated at TAP - Maintenance & Engineering - the maintenance, repair and overhaul organization of Portugal’s leading airline. In the repair process of aircraft engines, the operations to be scheduled may be executed on a certain workstation by any processor of a given set, and the objective is to minimize the total weighted tardiness. A mixed integer linear programming formulation, based on flexible job shop scheduling, is presented here, along with computational experiments on a real instance, provided by TAP-ME, from a regular working week. The model was also tested using benchmark instances available in the literature.
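The objective named above, total weighted tardiness, is conventionally defined as the sum over jobs of weight × max(0, completion time − due date). The sketch below evaluates that objective for a fixed schedule. It does not reproduce the talk's mixed integer formulation, and the job data and function name are invented for illustration.

```python
def total_weighted_tardiness(jobs):
    """Sum of w * max(0, C - d) over all jobs.

    jobs -> iterable of (completion_time, due_date, weight) tuples
    """
    return sum(w * max(0, c - d) for c, d, w in jobs)

# Hypothetical schedule: only the first job finishes after its due date.
schedule = [
    (10, 8, 3),   # 2 time units late, weight 3 -> contributes 6
    (5, 7, 2),    # early -> contributes 0
    (12, 12, 1),  # exactly on time -> contributes 0
]
print(total_weighted_tardiness(schedule))  # 6
```

Finishing early earns no credit under this objective, so a solver gains nothing from completing low-weight jobs ahead of schedule if that delays a high-weight one.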

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Text based on the paper presented at the conference "Autonomous systems: inter-relations of technical and societal issues", held at Monte de Caparica (Portugal), Universidade Nova de Lisboa, on 5 and 6 November 2009 and organized by IET-Research Centre on Enterprise and Work Innovation

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Maintaining a high level of data security with a low impact on system performance is particularly challenging in wireless multimedia applications. Protocols that are used for wireless local area network (WLAN) security are known to significantly degrade performance. In this paper, we propose an enhanced security system for a WLAN. Our new design aims to decrease the processing delay and increase both the speed and throughput of the system, thereby making it more efficient for multimedia applications. Our design is based on the idea of offloading computationally intensive encryption and authentication services to the end systems’ CPUs. The security operations are performed by the host’s central processor (which is usually a powerful processor) before delivering the data to a wireless card (which usually has a low-performance processor). By adopting this design, we show that both the delay and the jitter are significantly reduced. At the access point, we improve the performance of network processing hardware for real-time cryptographic processing by using a specialized processor implemented with field-programmable gate array technology. Furthermore, we use enhanced techniques to implement the Counter (CTR) Mode with Cipher Block Chaining Message Authentication Code Protocol (CCMP) and the CTR protocol. Our experiments show that data encryption and authentication take 20–40 μs on different end-host CPUs (e.g., Intel Core i5, i7, and AMD 6-Core), compared with 10–50 ms when performed on the wireless card. Furthermore, when compared with the standard WiFi Protected Access II (WPA2), results show that our proposed security system improves the speed by up to 3.7 times.
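Counter (CTR) mode, on which CCMP builds, turns a block primitive into a stream cipher: a keystream block is derived from a nonce and an incrementing counter, then XORed with the data. The sketch below shows only that structure. It is an illustration under loud assumptions: CCMP uses AES as the block primitive, whereas this sketch substitutes HMAC-SHA256 from the Python standard library so that it runs without third-party packages, and it omits the CBC-MAC authentication step entirely. It is not the authors' implementation and is not suitable for real use.

```python
import hashlib
import hmac

BLOCK = 32  # bytes of keystream per counter value (SHA-256 digest size)

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for encrypting (nonce || counter) with AES, as CCMP would do.
    return hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the CTR keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        block = keystream_block(key, nonce, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)

key = b"hypothetical-demo-key"
nonce = b"packet-0001"   # must never repeat under the same key
message = b"multimedia frame payload " * 4

ciphertext = ctr_xor(key, nonce, message)
print(ctr_xor(key, nonce, ciphertext) == message)  # True
```

Because each keystream block depends only on the nonce and the counter, blocks can be computed independently and in parallel, which is one reason CTR-based schemes suit fast host CPUs and dedicated hardware.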

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g., 4G, CDMA). Recently proposed CGRAs offer time-multiplexing and dynamic application parallelism to enhance device utilization and reduce energy consumption, at the cost of additional memory (up to 50% of the area of the overall platform). To reduce the memory overheads, novel CGRAs employ either statistical compression, intermediate compact representation, or multicasting. Each compaction technique has different properties (i.e., compression ratio, decompression time, and decompression energy) and is best suited for a particular class of applications. However, existing research only deals with these methods separately. Moreover, it only analyzes the compaction ratio and does not evaluate the associated energy overheads. To tackle these issues, we propose a polymorphic compression architecture that interleaves these techniques in a single platform. The proposed architecture allows each application to take advantage of a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Simulation results, using different applications (FFT, matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (up to 52%), decompression energy (up to 4 orders of magnitude), and configuration time (from 33 n to 1.5 s) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.