686 resultados para Processors
Resumo:
The aim of this paper is to analyse vulnerability and robustness of small and medium size enterprises (SMEs) supply chains and to consider contextual factors that might influence the success of their disturbance management: Risky product and business environment. By using an exploratory case study it is shown how these contextual factors attribute vulnerability sources, contribute to the robustness of a company’s performance and supply chain vulnerability, as well as how a company seeks to manage internal and external vulnerability sources. The exploratory case is based on a fresh food supply chain of a manufacturing SME operating in a developing market.
Case findings suggest that fresh food supply chains of a manufacturing SME in developing markets are prone to disruptions of their logistics and production processes due to ‘riskiness’ of fresh food products, the ‘riskiness’ of developing markets, as well as ‘riskiness’ of SMEs themselves. However, this does not necessarily indicate the vulnerability of an SME and its entire supply chain. Findings indicate that SMEs can be very successful in disturbance management by selective use of redesign strategies that aim to prevent or reduce the impact of disturbances. More precise, it is likely that an SME can achieve robust performance by employing preventive redesign strategies in managing disturbances that result from internal, company related vulnerability sources, while impact reduction strategies are likely to contribute to robust performance of an SME if used to manage disturbances that result from internal, supply chain related vulnerability sources, as well as external vulnerability sources.
Resumo:
There is demand for an easily programmable, high performance image processing platform based on FPGAs. In previous work, a novel, high performance processor - IPPro was developed and a Histogram of Orientated Gradients (HOG) algorithm study undertaken on a Xilinx Zynq platform. Here, we identify and explore a number of mapping strategies to improve processing efficiency for soft-cores and a number of options for creation of a division coprocessor. This is demonstrated for the revised high definition HOG implementation on a Zynq platform, resulting in a performance of 328 fps which represents a 146% speed improvement over the original realization and a tenfold reduction in energy.
Resumo:
Recent developments of high-end processors recognize temperature monitoring and tuning as one of the main challenges towards achieving higher performance given the growing power and temperature constraints. To address this challenge, one needs both suitable thermal energy abstraction and corresponding instrumentation. Our model is based on application-specific parameters such as power consumption, execution time, and asymptotic temperature as well as hardware-specific parameters such as half time for thermal rise or fall. As observed with our out-of-band instrumentation and monitoring infrastructure, the temperature changes follow a relatively slow capacitor-style charge-discharge process. Therefore, we use the lumped thermal model that initiates an exponential process whenever there is a change in processor’s power consumption. Initial experiments with two codes – Firestarter and Nekbone – validate our thermal energy model and demonstrate its use for analyzing and potentially improving the application-specific balance between temperature, power, and performance.
Resumo:
A preliminary version of this paper appeared in Proceedings of the 31st IEEE Real-Time Systems Symposium, 2010, pp. 239–248.
Resumo:
Consider the problem of designing an algorithm with a high utilisation bound for scheduling sporadic tasks with implicit deadlines on identical processors. A task is characterised by its minimum interarrival time and its execution time. Task preemption and migration is permitted. Still, low preemption and migration counts are desirable. We formulate an algorithm with a utilisation bound no less than 66.¯6%, characterised by worst-case preemption counts comparing favorably against the state-of-the-art.
Resumo:
Bank switching in embedded processors having partitioned memory architecture results in code size as well as run time overhead. An algorithm and its application to assist the compiler in eliminating the redundant bank switching codes introduced and deciding the optimum data allocation to banked memory is presented in this work. A relation matrix formed for the memory bank state transition corresponding to each bank selection instruction is used for the detection of redundant codes. Data allocation to memory is done by considering all possible permutation of memory banks and combination of data. The compiler output corresponding to each data mapping scheme is subjected to a static machine code analysis which identifies the one with minimum number of bank switching codes. Even though the method is compiler independent, the algorithm utilizes certain architectural features of the target processor. A prototype based on PIC 16F87X microcontrollers is described. This method scales well into larger number of memory blocks and other architectures so that high performance compilers can integrate this technique for efficient code generation. The technique is illustrated with an example
Resumo:
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
Resumo:
A patent describing a computer architecture which implements a Perspex instruction.
Resumo:
A processing system comprises a plurality of processors (12) and communication means (20) arranged to carry messages between the processors, wherein each of the processors (12) has an operating instruction memory field (32, 34, 36) arranged to hold stored operating instructions including a re-routing target address. Each processor is arranged to receive a message (38) including operating instructions including a target address. On receipt of the message, each processor is arranged to: check the target address in the message to determine whether it corresponds to an address associated with the processor; if the target address in the message does correspond to an address associated with the processor, to check the operating instructions in the message to determine whether the message is to be re-routed; and, if the message is to be re-routed, to replace operating instructions within the message with the stored operating instructions, and place the message on the communication means for delivery to the re-routing target address.
Resumo:
This paper presents the use of a multiprocessor architecture for the performance improvement of tomographic image reconstruction. Image reconstruction in computed tomography (CT) is an intensive task for single-processor systems. We investigate the filtered image reconstruction suitability based on DSPs organized for parallel processing and its comparison with the Message Passing Interface (MPI) library. The experimental results show that the speedups observed for both platforms were increased in the same direction of the image resolution. In addition, the execution time to communication time ratios (Rt/Rc) as a function of the sample size have shown a narrow variation for the DSP platform in comparison with the MPI platform, which indicates its better performance for parallel image reconstruction.
Resumo:
This work presents the development of an IEEE 1451.2 protocol controller based on a low-cost FPGA that is directly connected to the parallel port of a conventional personal computer. In this manner it is possible to implement a Network Capable Application Processor (NCAP) based on a personal computer, without parallel port modifications. This approach allows supporting the ten signal lines of the 10-wire IEEE 1451.2 Transducer Independent Interface (TII), that connects the network processor to the Smart Transducer Interface Module (STIM) also defined in the IEEE 1451.2 standard. The protocol controller is connected to the STIM through the TII's physical interface, enabling the portability of the application at the transducer and network processor level. The protocol controller architecture was fully developed in VHDL language and we have projected a special prototype configured in a general-purpose programmable logic device. We have implemented two versions of the protocol controller, which is based on IEEE 1451 standard, and we have obtained results using simulation and experimental tests. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
The hydrologic risk (and the hydro-geologic one, closely related to it) is, and has always been, a very relevant issue, due to the severe consequences that may be provoked by a flooding or by waters in general in terms of human and economic losses. Floods are natural phenomena, often catastrophic, and cannot be avoided, but their damages can be reduced if they are predicted sufficiently in advance. For this reason, the flood forecasting plays an essential role in the hydro-geological and hydrological risk prevention. Thanks to the development of sophisticated meteorological, hydrologic and hydraulic models, in recent decades the flood forecasting has made a significant progress, nonetheless, models are imperfect, which means that we are still left with a residual uncertainty on what will actually happen. In this thesis, this type of uncertainty is what will be discussed and analyzed. In operational problems, it is possible to affirm that the ultimate aim of forecasting systems is not to reproduce the river behavior, but this is only a means through which reducing the uncertainty associated to what will happen as a consequence of a precipitation event. In other words, the main objective is to assess whether or not preventive interventions should be adopted and which operational strategy may represent the best option. The main problem for a decision maker is to interpret model results and translate them into an effective intervention strategy. To make this possible, it is necessary to clearly define what is meant by uncertainty, since in the literature confusion is often made on this issue. Therefore, the first objective of this thesis is to clarify this concept, starting with a key question: should be the choice of the intervention strategy to adopt based on the evaluation of the model prediction based on its ability to represent the reality or on the evaluation of what actually will happen on the basis of the information given by the model forecast? Once the previous idea is made unambiguous, the other main concern of this work is to develope a tool that can provide an effective decision support, making possible doing objective and realistic risk evaluations. In particular, such tool should be able to provide an uncertainty assessment as accurate as possible. This means primarily three things: it must be able to correctly combine all the available deterministic forecasts, it must assess the probability distribution of the predicted quantity and it must quantify the flooding probability. Furthermore, given that the time to implement prevention strategies is often limited, the flooding probability will have to be linked to the time of occurrence. For this reason, it is necessary to quantify the flooding probability within a horizon time related to that required to implement the intervention strategy and it is also necessary to assess the probability of the flooding time.
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.