414 results for runtime bloat
Abstract:
The correctness of a hard real-time system depends on its ability to meet all its deadlines. Existing real-time systems use either a pure real-time scheduler or a real-time scheduler embedded as a real-time scheduling class in the scheduler of an operating system (OS). Existing implementations of schedulers in multicore systems that support both real-time and non-real-time tasks permit non-real-time tasks to execute on all cores with priorities lower than those of real-time tasks, but interrupts and softirqs associated with these non-real-time tasks can execute on any core with priorities higher than those of real-time tasks. As a result, the execution overhead of real-time tasks is quite large in these systems, which, in turn, affects their runtime. So that hard real-time tasks can be executed in such systems with minimal interference from other Linux tasks, we propose, in this paper, an integrated scheduler architecture, called SchedISA, which aims to considerably reduce the execution overhead of real-time tasks in these systems. To test the efficacy of the proposed scheduler, we implemented the partitioned earliest deadline first (P-EDF) scheduling algorithm in SchedISA on Linux kernel version 3.8 and conducted experiments on an Intel Core i7 processor with eight logical cores. We compared the execution overhead of real-time tasks in this implementation of SchedISA with that in SCHED_DEADLINE's P-EDF implementation, which concurrently executes real-time and non-real-time tasks in Linux on all cores. The experimental results show that the execution overhead of real-time tasks in this implementation of SchedISA is considerably less than that in SCHED_DEADLINE. We believe that, with further refinement of SchedISA, the execution overhead of real-time tasks in SchedISA can be reduced to a predictable maximum, making it suitable for scheduling hard real-time tasks without affecting the CPU share of Linux tasks.
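As a rough illustration of the partitioned EDF (P-EDF) policy named in this abstract, the sketch below assigns periodic tasks to cores and then picks, on one core, the ready job with the earliest absolute deadline. The `Task` type, the first-fit heuristic, the per-core utilization bound of 1.0, and the use of the period as the relative deadline are all assumptions made for illustration; none of this is taken from SchedISA or SCHED_DEADLINE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical periodic task; the period doubles as the relative deadline."""
    name: str
    wcet: float    # worst-case execution time
    period: float  # activation period

    @property
    def utilization(self) -> float:
        return self.wcet / self.period


def partition_first_fit(tasks, num_cores):
    """Assign each task to the first core whose total utilization stays <= 1.0."""
    cores = [[] for _ in range(num_cores)]
    load = [0.0] * num_cores
    for task in tasks:
        for i in range(num_cores):
            # Small tolerance guards against floating-point rounding.
            if load[i] + task.utilization <= 1.0 + 1e-12:
                cores[i].append(task)
                load[i] += task.utilization
                break
        else:
            raise ValueError(f"task {task.name} does not fit on any core")
    return cores


def edf_pick(ready_jobs):
    """Return the name of the ready job with the earliest absolute deadline.

    ready_jobs is a list of (absolute_deadline, job_name) tuples.
    """
    return min(ready_jobs)[1] if ready_jobs else None
```

Once tasks are partitioned, each core runs its own EDF queue independently, which is what distinguishes P-EDF from a global EDF policy with a single shared queue.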
Abstract:
The growing number of applications and processing units in modern Multiprocessor Systems-on-Chips (MPSoCs) comes along with reduced time to market. Different IP cores can come from different vendors, and their trust levels also differ, but they typically use a Network-on-Chip (NoC) as their communication infrastructure. An MPSoC can have multiple Trusted Execution Environments (TEEs). Apart from performance, power, and area research in the field of MPSoCs, robust and secure system design is also gaining importance in the research community. To build a secure system, the designer must know beforehand all the kinds of attack that are possible on the respective system (MPSoC). In this paper we survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, namely a router attack targeted at the NoC architecture. We show the validity of this attack by analyzing different present-day NoC architectures and show that they are all vulnerable to this type of attack. By launching a router attack, an attacker can control the whole chip very easily, which makes it a very serious issue. Both routing table-based and routing logic-based routers are vulnerable to such attacks. In this paper, we address attacks on routing tables. We propose different monitoring-based countermeasures against routing table-based router attacks in an MPSoC having multiple TEEs. Synthesis results show that the proposed countermeasures, viz. Runtime-monitor, Restart-monitor, Intermediate manager, and Auditor, occupy areas that are 26.6, 22, 0.2, and 12.2 % of the area of a routing table-based router. Apart from these, we propose an Ejection address checker and a Local monitoring module inside a router, which cause 3.4 and 10.6 % increases in router area, respectively. Simulation results are also given, which show the effectiveness of the proposed monitoring-based countermeasures.
Abstract:
Time division multiple access (TDMA) based channel access mechanisms perform better than contention-based channel access mechanisms in terms of channel utilization, reliability, and power consumption, especially for high data rate applications in wireless sensor networks (WSNs). Most of the existing distributed TDMA scheduling techniques can be classified as either static or dynamic. The primary purpose of static TDMA scheduling algorithms is to improve channel utilization by generating a shorter schedule. However, they usually take longer to produce a schedule, and hence are not suitable for WSNs in which the network topology changes dynamically. Dynamic TDMA scheduling algorithms, on the other hand, generate a schedule quickly, but they are not efficient in terms of the length of the generated schedule. In this paper, we propose a novel scheme for TDMA scheduling in WSNs that can generate a compact schedule similar to those of static scheduling algorithms, while its runtime performance matches that of dynamic scheduling algorithms. Furthermore, the proposed distributed TDMA scheduling algorithm can trade off schedule length against the time required to generate the schedule. This allows developers of WSNs to tune performance according to the requirements of the prevalent WSN applications and the need to perform re-scheduling. Finally, the proposed TDMA scheduling is tolerant of packet loss caused by an erroneous wireless channel. The algorithm has been simulated using the Castalia simulator to compare its performance with that of other algorithms in terms of generated schedule length and the time required to generate the TDMA schedule. Simulation results show that the proposed algorithm generates a compact schedule in very little time.
Abstract:
In wireless sensor networks (WSNs), contention occurs when two or more nodes in proximity simultaneously try to access the channel. Contention causes collisions, which are very likely to occur when traffic is correlated. Excessive collisions affect not only the reliability and QoS of the application but also the lifetime of the network. It is well known that random access mechanisms do not handle correlated contention efficiently and therefore suffer from high collision rates. Most of the existing TDMA scheduling techniques try to find an optimal or a sub-optimal schedule. Usually, correlated contention persists only for a short duration, so it is not worthwhile to take a long time to generate an optimal or a sub-optimal schedule. We propose a randomized distributed TDMA scheduling (RD-TDMA) algorithm to quickly generate a feasible schedule (not necessarily optimal) to handle correlated contention in WSNs. In RD-TDMA, a node in the network negotiates a slot with its neighbors using a message exchange mechanism. The proposed protocol has been simulated using the Castalia simulator to evaluate its runtime performance. Simulation results show that the RD-TDMA algorithm considerably reduces the time required to schedule.
Abstract:
This paper compares parallel and distributed implementations of an iterative, Gibbs sampling, machine learning algorithm. Distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM [1], in which parameters are learnt using an instance blocked Gibbs sampling, with a step consisting of a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However, our focus here is on runtime performance, embodied by iteration duration and ease of development, deployment, and debugging, rather than on NLP-relevant scores. © 2010 IEEE.
Abstract:
This thesis describes the design, construction and performance of a high-pressure xenon gas time projection chamber (TPC) for the study of double beta decay in ^(136)Xe. The TPC, when operating at 5 atm, can accommodate 28 moles of 60% enriched ^(136)Xe. The TPC has operated as a detector at Caltech since 1986. It is capable of reconstructing a charged particle trajectory and can easily distinguish between different kinds of charged particles. A gas purification and xenon gas recovery system were developed. The electronics for the 338 channels of readout were developed along with a data acquisition system. Currently, the detector is being prepared at the University of Neuchatel for installation in the low background laboratory situated in the St. Gotthard tunnel, Switzerland. In one year of runtime the detector should be sensitive to a 0ν lifetime of the order of 10^(24) y, which corresponds to a neutrino mass in the range 0.3 to 3.3 eV.
Abstract:
The growing demand for computing power has driven the research and development of digital processors that are increasingly dense in transistors and run at faster clock rates, although limiting factors such as power consumption, heat dissipation, manufacturing complexity, and commercial cost cannot be ignored. Quantum computing takes a different approach to information processing: its elementary unit of storage is the quantum version of the bit, the q-bit or quantum bit, which holds a superposition of two states, unlike the classical bit, which records only one of them. Quantum simulators that run on conventional computers make it possible to execute quantum algorithms but, because they are software products, they suffer reduced performance owing to the computational model and memory limitations. This dissertation presents a hardware-implementable version of a coprocessor for simulating quantum operations, using an application-specific architecture that can exploit parallelism through component replication and pipelining. The architecture includes a quantum state memory, which stores the individual and group states of the q-bits; a scratch memory, which stores the quantum operators for two or more q-bits built at runtime; a calculation unit, responsible for computing products of complex numbers, the basis of the tensor and matrix products needed to execute quantum operations; a measurement unit, needed to determine the quantum state of the machine; and a control unit, which ensures the correct operation of the datapath components by means of a microprogram and a few other auxiliary components.
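The operations this abstract assigns to the calculation unit (complex products underlying tensor and matrix products) can be sketched in software as follows. This is a minimal sketch, assuming a two-qubit register and standard Hadamard and identity gates; the names `kron`, `apply_operator`, `H`, and `I2` are chosen for this illustration and are not part of the coprocessor architecture described in the dissertation.

```python
import math

S = 1 / math.sqrt(2)

# Single-qubit gates as nested lists of complex numbers.
H = [[complex(S), complex(S)],
     [complex(S), complex(-S)]]   # Hadamard gate
I2 = [[complex(1), complex(0)],
      [complex(0), complex(1)]]   # identity gate


def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as nested lists."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def apply_operator(op, state):
    """Matrix-vector product: apply an operator to a state vector."""
    return [sum(m * v for m, v in zip(row, state)) for row in op]


# Build a two-qubit operator at run time and apply it to |00>.
two_qubit_op = kron(H, I2)
new_state = apply_operator(two_qubit_op, [1, 0, 0, 0])
```

Here `new_state` holds equal amplitudes on the basis states |00> and |10>, that is, the first qubit is in superposition while the second is unchanged. The operator for n qubits has 2^n rows and columns, which is the memory growth that limits software simulators.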
Abstract:
Most wearable activity recognition systems assume a predefined sensor deployment that remains unchanged during runtime. However, this assumption does not reflect real-life conditions. During the normal use of such systems, users may place the sensors in a position different from the predefined sensor placement. Sensors may also move from their original location to a different one because of a loose attachment. Activity recognition systems trained on activity patterns characteristic of a given sensor deployment are likely to fail when sensors are displaced. In this work, we explore the effects of sensor displacement induced by both the intentional misplacement of sensors and self-placement by the user. The effects of sensor displacement are analyzed for standard activity recognition techniques, as well as for an alternate robust sensor fusion method proposed in a previous work. While classical recognition models show little tolerance to sensor displacement, the proposed method is shown to have a notable capacity to assimilate the changes in sensor position introduced by self-placement, and it provides considerable improvements for large misplacements.
Abstract:
One of the most important problems in large organizations is choosing where to locate industrial plants, distribution centers, or even retail outlets. This logistics problem is a strategic decision that can have a significant impact on the total cost of the product sold. Several works in the literature address this problem. The aim of this work is therefore to analyze the facility location problem as posed by different authors and to define a model that is as well suited as possible to the fuel distribution market in Brazil. To that end, we analyzed the refining and distribution flow practiced in this segment and how the corresponding transportation cost is formed. Constraints such as storage capacity, the range of products offered, and the levels of the distribution hierarchy were considered. Based on this analysis, a mathematical model was defined for reducing freight costs while taking the tax burden into account. The mathematical model was implemented in C and makes it possible to simulate the problem. Parallel computing techniques were applied to reduce the execution time of the algorithm. The results obtained with the Single Uncapacitated Facility Location Problem (SUFLP) model, simulated in both the sequential and the parallel versions of the program, show gains of up to 5% in cost savings and a reduction in execution time of more than 50%.
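To make the kind of problem concrete, here is a minimal sketch of a generic uncapacitated facility location instance solved by exhaustive search. It assumes only a fixed opening cost per facility and a transport cost per facility-customer pair; the tax burden, product range, and distribution hierarchy of the model described above are not reproduced, and `solve_uflp` is a hypothetical name used only here.

```python
from itertools import combinations


def solve_uflp(fixed_cost, transport_cost):
    """Exhaustively solve a tiny uncapacitated facility location instance.

    fixed_cost[i]        -- cost of opening facility i
    transport_cost[i][j] -- cost of serving customer j from facility i
    Returns (best_total_cost, tuple_of_open_facilities).
    """
    n_fac = len(fixed_cost)
    n_cust = len(transport_cost[0])
    best = None
    # Try every non-empty subset of facilities.
    for k in range(1, n_fac + 1):
        for open_set in combinations(range(n_fac), k):
            cost = sum(fixed_cost[i] for i in open_set)
            # With no capacity limits, each customer uses its cheapest open facility.
            cost += sum(
                min(transport_cost[i][j] for i in open_set)
                for j in range(n_cust)
            )
            if best is None or cost < best[0]:
                best = (cost, open_set)
    return best
```

The subsets can be evaluated independently of one another, which is one reason this family of problems lends itself to parallel evaluation; exhaustive search is only practical for very small instances.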
Abstract:
Although partially observable Markov decision processes (POMDPs) have shown great promise as a framework for dialog management in spoken dialog systems, important scalability issues remain. This paper tackles the problem of scaling slot-filling POMDP-based dialog managers to many slots with a novel technique called composite point-based value iteration (CSPBVI). CSPBVI creates a "local" POMDP policy for each slot; at runtime, each slot nominates an action and a heuristic chooses which action to take. Experiments in dialog simulation show that CSPBVI successfully scales POMDP-based dialog managers without compromising performance gains over baseline techniques and preserving robustness to errors in user model estimation. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
Abstract:
In this article, we detail the methodology developed to construct arbitrarily high order schemes, both linear and WENO, on 3D mixed-element unstructured meshes made up of general convex polyhedral elements. The approach is tailored specifically for the solution of scalar level set equations for application to incompressible two-phase flow problems. The construction of WENO schemes on 3D unstructured meshes is notoriously difficult, as it involves a much higher level of complexity than 2D approaches. This is due to the multiplicity of geometrical considerations introduced by the extra dimension, especially on mixed-element meshes. Therefore, we have specifically developed a number of algorithms to handle mixed-element meshes composed of convex polyhedra with convex polygonal faces. The contribution of this work concerns several areas of interest: the formulation of an improved methodology in 3D, the minimisation of computational runtime in the implementation through the maximum use of pre-processing operations, the generation of novel methods to handle complex 3D mixed-element meshes and finally the application of the method to the transport of a scalar level set. © 2012 Global-Science Press.
Abstract:
This paper presents a novel, three-dimensional, single-pile model, formulated in the wavenumber domain and adapted to account for boundary conditions using the superposition of loading cases. The pile is modelled as a column in axial vibration, and a Euler-Bernoulli beam in lateral vibration. The surrounding soil is treated as a viscoelastic continuum. The response of the pile is presented in terms of the stiffness and damping coefficients, and also the magnitude and phase of the pile-head frequency-response function. Comparison with existing models shows that excellent agreement is observed between this model, a boundary-element formulation, and an elastic-continuum-type formulation. This three-dimensional model has an accuracy equivalent to a 3D boundary-element model, and a runtime similar to a 2D plane-strain analytical model. Analysis of the response of the single pile illustrates a difference in axial and lateral vibration behaviour; the displacement along the pile is relatively invariant under axial loads, but in lateral vibration the pile exhibits localised deformations. This implies that a plane-strain assumption is valid for axial loadings and only at higher frequencies for lateral loadings. © 2013 Elsevier Ltd.
Abstract:
We are developing a wind turbine blade optimisation package, CoBOLDT (COmputational Blade Optimisation and Load Deflation Tool), for the optimisation of large horizontal-axis wind turbines. The core consists of the Multi-Objective Tabu Search (MOTS), which controls a spline parameterisation module, a fast geometry generation and a stationary Blade Element Momentum (BEM) code to optimise an initial wind turbine blade design. The objective functions we investigate are the Annual Energy Production (AEP) and the flapwise blade root bending moment (MY0) for a stationary wind speed of 50 m/s. For this task we use nine parameters which define the blade chord, the blade twist (4 parameters each) and the blade radius. Throughout the optimisation a number of binary constraints are defined to limit the noise emission, to allow for transportation on land and to control the aerodynamic conditions during all phases of turbine operation. The test case shows that MOTS is capable of finding enhanced designs very quickly and efficiently and will provide a rich and well explored Pareto front for the designer to choose from. The optimised blade design could improve the AEP of the initial blade by 5% with the same flapwise root bending moment or reduce MY0 by 7.5% with the original energy yield. Due to the fast runtime of order 10 seconds per design, a huge number of optimisation iterations is possible without the need for a large computing cluster. This also allows for increased design flexibility through the introduction of more parameters per blade function or parameterisation of the airfoils in the future. © 2012 by Nordex Energy GmbH.
Abstract:
We are developing a wind turbine blade optimisation package, CoBOLDT (COmputational Blade Optimisation and Load Deflation Tool), for the optimisation of large horizontal-axis wind turbines. The core consists of the Multi-Objective Tabu Search (MOTS), which controls a spline parameterisation module, a fast geometry generation and a stationary Blade Element Momentum (BEM) code to optimise an initial wind turbine blade design. The objective functions we investigate are the Annual Energy Production (AEP) and the flapwise blade root bending moment (MY0) for a stationary wind speed of 50 m/s. For this task we use nine parameters which define the blade chord, the blade twist (4 parameters each) and the blade radius. Throughout the optimisation a number of binary constraints are defined to limit the noise emission, to allow for transportation on land and to control the aerodynamic conditions during all phases of turbine operation. The test case shows that MOTS is capable of finding enhanced designs very quickly and efficiently and will provide a rich and well explored Pareto front for the designer to choose from. The optimised blade design could improve the AEP of the initial blade by 5% with the same flapwise root bending moment or reduce MY0 by 7.5% with the original energy yield. Due to the fast runtime of order 10 seconds per design, a huge number of optimisation iterations is possible without the need for a large computing cluster. This also allows for increased design flexibility through the introduction of more parameters per blade function or parameterisation of the airfoils in the future. © 2012 AIAA.
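Abstract:
OpenMP is a shared-memory parallel programming standard that supports Fortran and C/C++. It is based on a fork-join parallel execution model that divides a program into parallel regions and serial regions. In recent years, OpenMP has been widely used for parallel programming on SMP (Symmetric Multi-Processing) and multicore architectures. With the development of multicore processors, how real applications can make full use of multiple processor cores to improve computational efficiency has also become a research focus. In scientific computing, loops are among the most central targets of parallelization. Taking into account load balance, scheduling overhead, synchronization overhead, and other factors, the OpenMP standard defines several scheduling strategies: Static, Dynamic, Guided, and Runtime. To address the weakness that Guided scheduling is ill-suited to decreasing loops, this work proposes an improved scheduling strategy, new_guided, and implements it in the OMPi compiler. The main idea of new_guided is to apply Static scheduling to the first half of the iterations and Guided scheduling to the second half. In addition, this work evaluates the different scheduling strategies on multicore processors for different loop structures. The results show that, in general, OpenMP's default Static strategy performs worst; for regular loops and increasing loops, the Dynamic, Guided, and new_guided strategies perform about the same; for decreasing loops, Dynamic and new_guided perform comparably and both outperform Guided; and for random loops whose computation is concentrated in the middle, such as computing the Mandelbrot set, Dynamic outperforms the other strategies, with new_guided falling between Dynamic and Guided. With the advent and development of multicore processors, multithreaded programming has become unavoidable. Sparse matrix-vector multiplication (SpMV) is a very important and heavily used scientific computing kernel. Its memory accesses are generally highly irregular, which makes existing SpMV algorithms relatively inefficient. The number of cores on multicore processor chips is steadily increasing, which makes parallel acceleration of SpMV on multicore processors very important. This work introduces two common sparse matrix storage formats, CSR and BCSR, and implements multicore parallel SpMV using OpenMP. It also discusses optimization techniques such as register blocking and column index compression, as well as the effect of different scheduling strategies on multithreaded SpMV. Tests on a Dawning Tiankuo S4800A1 server show that most matrices achieve scalable, and in some cases superlinear, speedups, but for some larger matrices the speedup is not significant. In our tests, SpMV optimized with register blocking was on average 28.09% faster than the multithreaded CSR-based SpMV. Within the multithreaded CSR-based SpMV, the program with column index optimization was on average 13.05% faster than before the optimization. In addition, this work implements a scheduling strategy based on the number of nonzeros, in which each thread processes nearly the same number of nonzeros. We tested and analyzed it against the three scheduling strategies provided by the OpenMP standard. The results show that, compared with the strategies provided by OpenMP, the nonzero-based strategy achieves better load balance, and that Dynamic and Guided scheduling perform about the same in multithreaded SpMV, both outperforming Static scheduling.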
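The idea of scheduling by nonzero count can be sketched as follows: split the rows of a CSR matrix into contiguous chunks that hold roughly equal numbers of nonzeros, then run the usual CSR row loop on each chunk. This is a sequential Python sketch under stated assumptions, not the OpenMP implementation evaluated above; `spmv_csr` and `split_rows_by_nnz` are hypothetical names, and in a real multithreaded version each chunk would be handed to one thread.

```python
from bisect import bisect_left


def spmv_csr(row_ptr, col_idx, values, x, row_begin=0, row_end=None):
    """Compute rows [row_begin, row_end) of y = A @ x for a CSR matrix A."""
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for r in range(row_begin, row_end):
        acc = 0.0
        # Nonzeros of row r live in positions row_ptr[r] .. row_ptr[r + 1] - 1.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def split_rows_by_nnz(row_ptr, num_threads):
    """Return row boundaries so each chunk holds about nnz / num_threads nonzeros."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for t in range(1, num_threads):
        target = nnz * t / num_threads
        # First row boundary whose cumulative nonzero count reaches the target.
        cut = bisect_left(row_ptr, target, lo=bounds[-1], hi=n_rows)
        bounds.append(cut)
    bounds.append(n_rows)
    return bounds
```

For a matrix whose rows contain 3, 1, 1, and 3 nonzeros, two chunks of two rows each carry four nonzeros apiece. When the nonzeros are unevenly distributed, the chunks hold different numbers of rows, whereas a plain static split by row count would give the denser chunk more work.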