936 results for Thread safe parallel run-time
Abstract:
Embedded software systems in vehicles are of rapidly increasing commercial importance for the automotive industry. Current systems employ a static run-time environment because of the difficulty and cost of developing dynamic systems in a high-integrity embedded control context. A dynamic system, in the sense of a dynamic system configuration, would greatly increase the flexibility of the offered functionality and enable customised software configuration for individual vehicles, adding customer value through plug-and-play capability and increasing quality through its inherent ability to adjust to changes in hardware and software. We envisage an automotive system containing a variety of components, from a multitude of organizations, not necessarily known at development time. The system dynamically adapts its configuration to suit the run-time system constraints. This paper presents our vision for future automotive control systems, which will be investigated in an EU research project referred to as DySCAS (Dynamically Self-Configuring Automotive Systems). We propose a self-configuring vehicular control system architecture, with capabilities that include automatic discovery and inclusion of new devices, self-optimisation to make the best use of the available processing, storage and communication resources, self-diagnostics and ultimately self-healing. Such an architecture has benefits extending to reduced development and maintenance costs, improved passenger safety and comfort, and flexible owner customisation. Specifically, this paper addresses the following issues: the state of the art of embedded software systems in vehicles, emphasising the current limitations arising from fixed run-time configurations; and the benefits and challenges of dynamic configuration, giving rise to opportunities for self-healing, self-optimisation, and the automatic inclusion of users’ Consumer Electronic (CE) devices.
Our proposal for a dynamically reconfigurable automotive software system platform is outlined, and a typical use case is presented to illustrate the benefits of the envisioned dynamic capabilities.
Abstract:
Dynamically reconfigurable hardware is a promising technology that combines in the same device both the high performance and the flexibility that many recent applications demand. However, one of its main drawbacks is the reconfiguration overhead, which involves significant delays in task execution, usually on the order of hundreds of milliseconds, as well as high energy consumption. One of the most powerful ways to tackle this problem is configuration reuse, since reusing a task does not involve any reconfiguration overhead. In this paper we propose a configuration replacement policy for reconfigurable systems that maximizes task reuse in highly dynamic environments. We have integrated this policy in an external task-graph execution manager that applies task prefetch by loading and executing the tasks as soon as possible (ASAP). However, we have also modified this ASAP technique in order to make the replacements more flexible, by taking into account the mobility of the tasks and delaying some of the reconfigurations. In addition, this replacement policy is a hybrid design-time/run-time approach, which performs the bulk of the computations at design time in order to save run-time computations. Our results illustrate that the proposed strategy outperforms other state-of-the-art replacement policies in terms of reuse rates and achieves near-optimal reconfiguration overhead reductions. In addition, by performing the bulk of the computations at design time, we reduce the execution time of the replacement technique by a factor of 10 with respect to an equivalent purely run-time one.
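The abstract does not spell out the replacement policy, so the following is only a minimal sketch of the general idea of reuse-aware replacement, not the authors' method. It swaps in a plainly different, textbook rule: when a unit must be overwritten, evict the loaded configuration whose next use lies furthest ahead in a task sequence that is assumed to be known in advance. The function name `run_with_reuse`, the flat task sequence and the unit count are all hypothetical.

```python
def run_with_reuse(task_sequence, num_units):
    """Count reconfigurations and reuses for a known task sequence.

    Assumption: a "furthest next use" eviction rule stands in for the
    paper's policy, which the abstract does not describe in detail.
    """
    loaded = []            # configurations currently held by the units
    reconfigurations = 0   # tasks that had to be loaded
    reuses = 0             # tasks already present, so no overhead

    for i, task in enumerate(task_sequence):
        if task in loaded:
            reuses += 1
            continue

        reconfigurations += 1
        if len(loaded) < num_units:
            loaded.append(task)   # a free unit is still available
            continue

        future = task_sequence[i + 1:]

        def next_use(cfg):
            # Distance to the next request; never-reused configurations rank last.
            return future.index(cfg) if cfg in future else len(future)

        victim = max(loaded, key=next_use)
        loaded[loaded.index(victim)] = task

    return reconfigurations, reuses


reconfigs, reuses = run_with_reuse(["A", "B", "C", "A", "B", "D", "A"], num_units=2)
print(reconfigs, reuses)
```

Keeping configuration "A" resident in this toy sequence is what produces the reuses; a policy that ignored future requests could evict it and pay an extra reconfiguration.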
Abstract:
Reconfigurable HW can be used to build a hardware multitasking system where tasks can be assigned to the reconfigurable HW at run-time according to the requirements of the running applications. Normally the execution in this kind of system is controlled by an embedded processor. In these systems tasks are frequently represented as subtask graphs, where a subtask is the basic scheduling unit that can be assigned to a reconfigurable HW unit. In order to control the execution of these tasks, the processor must manage complex data structures at run-time, such as graphs or linked lists, which may generate significant execution-time penalties. In addition, HW/SW communications are frequently a system bottleneck. Hence, it is of great interest to find a way to reduce the run-time SW computations and the HW/SW communications. To this end we have developed a HW execution manager that controls the execution of subtask graphs over a set of reconfigurable units. This manager receives as input a subtask graph coupled to a subtask schedule, and guarantees its proper execution. In addition, it includes support to reduce the execution-time overhead due to reconfigurations. With this HW support the execution of task graphs can be managed efficiently, generating only very small run-time penalties.
Abstract:
Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time according to the requirements of the running applications. These tasks are frequently represented as directed acyclic graphs and their execution is typically controlled by an embedded processor that schedules the graph execution. In order to improve the efficiency of the system, the scheduler can apply prefetch and reuse techniques that can greatly reduce the reconfiguration latencies. For an embedded processor all these computations represent a heavy computational load that can significantly reduce the system performance. To overcome this problem we have implemented a HW scheduler using reconfigurable resources. In addition, we have implemented both prefetch and replacement techniques that achieve results as good as those of previous complex SW approaches, while demanding just a few clock cycles to carry out the computations. We consider that the HW cost of the system (in our experiments 3% of a Virtex-II PRO xc2vp30 FPGA) is affordable, taking into account the great efficiency of the techniques applied to hide the reconfiguration latency and the negligible run-time penalty introduced by the scheduler computations.
Abstract:
Reconfigurable hardware can be used to build multitasking systems that dynamically adapt themselves to the requirements of the running applications. This is especially useful in embedded systems, since the available resources are very limited and the reconfigurable hardware can be reused for different applications. In these systems computations are frequently represented as task graphs that are executed taking into account their internal dependencies and the task schedule. The management of the task graph execution is critical for the system performance. In this regard, we have developed two different versions, a software module and a hardware architecture, of a generic task-graph execution manager for reconfigurable multitasking systems. The second version reduces the run-time management overheads by almost two orders of magnitude. Hence it is especially suitable for systems with stringent timing constraints. Both versions include specific support to optimize the reconfiguration process.
Abstract:
Purpose: To develop and validate a simple, efficient and reliable liquid chromatographic-mass spectrometric (LC-MS/MS) method for the quantitative determination of two dermatological drugs, Lamisil® (terbinafine) and Proscar® (finasteride), in split tablet dosage form. Methods: Thirty tablets each of the 2 studied medications were randomly selected. Tablets were weighed and divided into 3 groups. Ten tablets of each drug were kept intact; another group of 10 tablets was manually split into halves using a tablet cutter and weighed with an analytical balance; a third group was split into quarters and weighed. All intact and split tablets were individually dissolved in a water:methanol mixture (4:1), sonicated, filtered and further diluted with mobile phase. Optimal chromatographic separation and mass spectrometric detection were achieved using an Agilent 1200 HPLC system coupled with an Agilent 6410 triple quadrupole mass spectrometer. Analytes were eluted through an Agilent Eclipse Plus C8 analytical column (150 mm × 4.6 mm, 5 μm) with a mobile phase composed of solvent A (water) containing 0.1% formic acid and 5 mM ammonium formate pH 7.5, and solvent B (acetonitrile mixed with water in a ratio A:B 55:45) at a flow rate of 0.8 mL min⁻¹ with a total run time of 12 min. Mass spectrometric detection was carried out using positive ionization mode with analyte quantitation monitored by multiple reaction monitoring (MRM) mode. Results: The proposed analytical method proved to be specific, robust and adequately sensitive. The results showed a good linear fit over the concentration range of 20 - 100 ng mL⁻¹ for both analytes, with a correlation coefficient (r²) ≥ 0.999 and 0.998 for finasteride and terbinafine, respectively. Following tablet splitting, the drug content of the split tablets fell outside of the proxy USP specification for at least 14 halves (70 %) and 34 quarters (85 %) of FIN, as well as 16 halves (80 %) and 37 quarters (92.5 %) of TBN.
Mean weight loss after splitting was 0.58 and 2.22 % for FIN half- and quarter-tablets, respectively, and 3.96 and 4.09 % for TBN half- and quarter-tablets, respectively. Conclusion: The proposed LC-MS/MS method has successfully been used to provide precise drug content uniformity of split tablets of FIN and TBN. Unequal distribution of the drug in the split tablets is indicated by the high standard deviation beyond the accepted value. Hence, it is recommended not to split non-scored tablets, especially for medications with significant toxicity.
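The out-of-specification percentages quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each split group contained 10 tablets, so that there were 20 halves and 40 quarters per drug; these totals are inferred from the Methods description and are not stated explicitly in the abstract. The helper name `percent` is hypothetical.

```python
def percent(count, total):
    """Share of units outside specification, in percent."""
    return 100 * count / total


# Assumed group sizes: 10 tablets per split group, i.e. 20 halves and 40 quarters.
HALVES = 10 * 2
QUARTERS = 10 * 4

# Counts of out-of-specification units as reported in the abstract.
out_of_spec = {
    ("FIN", "halves"): (14, HALVES),
    ("FIN", "quarters"): (34, QUARTERS),
    ("TBN", "halves"): (16, HALVES),
    ("TBN", "quarters"): (37, QUARTERS),
}

shares = {key: percent(count, total) for key, (count, total) in out_of_spec.items()}
for (drug, form), share in shares.items():
    print(f"{drug} {form}: {share} %")
```

Under that assumption the computed shares match the reported 70 %, 85 %, 80 % and 92.5 %.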
Abstract:
An automated on-line SPE-LC-MS/MS method was developed for the quantitation of multiple classes of antibiotics in environmental waters. High sensitivity in the low ng/L range was accomplished by using large volume injections with 10 mL of sample. Positive confirmation of analytes was achieved using two selected reaction monitoring (SRM) transitions per antibiotic, and quantitation was performed using an internal standard approach. Samples were extracted using online solid phase extraction; then, using a column switching technique, extracted samples were immediately passed through liquid chromatography and analyzed by tandem mass spectrometry. The total run time per sample was 20 min. The statistically calculated method detection limits for various environmental samples were between 1.2 and 63 ng/L. Furthermore, the method was validated in terms of precision, accuracy and linearity. The developed analytical methodology was used to measure the occurrence of antibiotics in reclaimed waters (n=56), surface waters (n=53), ground waters (n=8) and drinking waters (n=54) collected from different parts of South Florida. In reclaimed waters, the most frequently detected antibiotics were nalidixic acid, erythromycin, clarithromycin, azithromycin, trimethoprim, sulfamethoxazole and ofloxacin (19.3-604.9 ng/L). Detection of antibiotics in reclaimed waters indicates that they cannot be completely removed by conventional wastewater treatment processes. Furthermore, the average mass load of antibiotics released into the local environment through reclaimed water was estimated as 0.248 kg/day. Among the surface water samples, Miami River (reaching up to 580 ng/L) and Black Creek canal (up to 124 ng/L) showed the highest concentrations of antibiotics. No traces of antibiotics were found in ground waters. On the other hand, erythromycin (monitored as anhydro erythromycin) was detected in 82% of the drinking water samples (n.d.-66 ng/L).
The developed approach is suitable for both research and monitoring applications. Major metabolites of antibiotics in reclaimed waters were identified and quantified using a high-resolution benchtop Q-Exactive Orbitrap mass spectrometer. A phase I metabolite of erythromycin was tentatively identified in full scan based on accurate mass measurement. Using the extracted ion chromatogram (XIC), high-resolution data-dependent MS/MS spectra and metabolic profiling software, the metabolite was identified as desmethyl anhydro erythromycin with molecular formula C36H63NO12 and m/z 702.4423. The molar concentration of the metabolite relative to erythromycin was on the order of 13 %. To my knowledge, this is the first report of this metabolite in reclaimed water. Another compound, acetyl-sulfamethoxazole, a phase II metabolite of sulfamethoxazole, was also identified in reclaimed water, and the mole fraction of the metabolite represented 36 % of the cumulative sulfamethoxazole concentration. The results illustrate the importance of also including metabolites in routine analysis to obtain a mass balance for a better understanding of the occurrence, fate and distribution of antibiotics in the environment. Finally, all the antibiotics detected in reclaimed and surface waters were investigated to assess the potential risk to aquatic organisms. The surface water antibiotic concentrations, which represented real-time exposure conditions, revealed that the macrolide antibiotics erythromycin, clarithromycin and tylosin, along with the quinolone antibiotic ciprofloxacin, were suspected to induce high toxicity to aquatic biota. Preliminary results show that, among the antibiotic groups tested, macrolides posed the highest ecological threat, and therefore they may need to be further evaluated with long-term exposure studies considering bioaccumulation factors and a larger number of species.
Overall, the occurrence of antibiotics in the aquatic environment poses an ecological health concern.
Abstract:
In the presented thesis work, a meshfree method with distance fields is applied to create a novel computational approach which enables inclusion of realistic geometric models of the microstructure and liberates Finite Element Analysis (FEA) from the dependence on, and limitations of, meshing fine microstructural features such as splats and porosity. Manufacturing processes of ceramics produce materials with a complex porosity microstructure. The geometry of pores, their size and their location substantially affect the macro-scale physical properties of the material. The complex structure and geometry of the pores severely limit the application of modern Finite Element Analysis methods because they require construction of spatial grids (meshes) that conform to the geometric shape of the structure. As a result, there are virtually no effective tools available for predicting the overall mechanical and thermal properties of porous materials based on their microstructure. The approach in this thesis relies on separate handling and control of geometric and physical computational models, which are seamlessly combined at solution run time. Using the proposed approach we will determine the effective thermal conductivity tensor of real porous ceramic materials featuring both isotropic and anisotropic thermal properties. This work involved development and implementation of numerical algorithms, data structures, and software.
Abstract:
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. 
We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
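As a rough illustration of how a priority can drive the two decisions named in this abstract (which threads to load into the hardware contexts, and which context to switch to), here is a minimal software sketch. It is not the thesis's hardware mechanism; the `Thread` record, the convention that a larger number means a more critical thread, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int          # assumed convention: larger value = more critical
    stalled: bool = False  # True while waiting on a long-latency operation


def load_contexts(threads, num_contexts):
    """Pick the highest-priority threads to occupy the hardware contexts."""
    return sorted(threads, key=lambda t: t.priority, reverse=True)[:num_contexts]


def next_context(loaded, current):
    """On a context switch, choose the highest-priority loaded thread that can run."""
    ready = [t for t in loaded if not t.stalled and t is not current]
    return max(ready, key=lambda t: t.priority) if ready else None


threads = [
    Thread("spin", 1),
    Thread("critical", 9),
    Thread("worker", 5),
    Thread("background", 0),
]
loaded = load_contexts(threads, num_contexts=3)

# The critical thread issues a remote memory reference and stalls.
running = loaded[0]
running.stalled = True
switched_to = next_context(loaded, running)
print(running.name, "->", switched_to.name)
```

In this sketch a spinning thread with low priority only receives cycles once every more critical loaded thread is stalled, which is the effect the abstract describes for synchronization primitives.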
Abstract:
The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles needed to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide programmers to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size.
Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.
Abstract:
The objective was to study the flow pattern in a plate heat exchanger (PHE) through residence time distribution (RTD) experiments. The tested PHE had flat plates and was part of a laboratory-scale pasteurization unit. Series flow and parallel flow configurations were tested with a variable number of passes and channels per pass. Owing to the small scale of the equipment and the short residence times, it was necessary to take into account the influence of the tracer detection unit on the RTD data. Four theoretical RTD models were fitted: combined, series combined, generalized convection and axial dispersion. The combined model provided the best fit and was useful to quantify the active and dead space volumes of the PHE and their dependence on its configuration. Results suggest that the axial dispersion model would present good results for a larger number of passes because of the turbulence associated with the changes of pass. This type of study can be useful to compare the hydraulic performance of different plates or to provide data for the evaluation of heat-induced changes that occur in the processing of heat-sensitive products. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
Dynamic parallel scheduling using work-stealing has gained popularity in academia and industry for its good performance, ease of implementation and theoretical bounds on space and time. Cores treat their own double-ended queues (deques) as a stack, pushing and popping threads from the bottom, but treat the deque of another randomly selected busy core as a queue, stealing threads only from the top, whenever they are idle. However, this standard approach cannot be directly applied to real-time systems, where the importance of parallelising tasks is increasing due to the limitations of multiprocessor scheduling theory regarding parallelism. Using one deque per core is obviously a source of priority inversion since high priority tasks may eventually be enqueued after lower priority tasks, possibly leading to deadline misses as in this case the lower priority tasks are the candidates when a stealing operation occurs. Our proposal is to replace the single non-priority deque of work-stealing with ordered per-processor priority deques of ready threads. The scheduling algorithm starts with a single deque per-core, but unlike traditional work-stealing, the total number of deques in the system may now exceed the number of processors. Instead of stealing randomly, cores steal from the highest priority deque.
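The proposal above can be pictured with a small single-threaded sketch: each core keeps one deque per priority level, works on its own highest-priority deque from the bottom, and an idle core steals from the top of the highest-priority deque found on any other core. This is only an illustration under stated assumptions, not the authors' algorithm: the class and function names are hypothetical, a lower number is assumed to mean a higher priority, and no locking or real-time analysis is modelled.

```python
import heapq
from collections import deque


class Core:
    """A core holding ready threads in per-priority deques (lower number = higher priority)."""

    def __init__(self, name):
        self.name = name
        self.deques = {}      # priority -> deque of tasks
        self.priorities = []  # min-heap of priorities that currently have a deque

    def push(self, priority, task):
        # The owner pushes at the bottom (right end) of the deque for that priority.
        if priority not in self.deques:
            self.deques[priority] = deque()
            heapq.heappush(self.priorities, priority)
        self.deques[priority].append(task)

    def top_priority(self):
        # Highest pending priority, or None when the core has no queued work.
        return self.priorities[0] if self.priorities else None

    def _take(self, from_bottom):
        if not self.priorities:
            return None
        priority = self.priorities[0]
        dq = self.deques[priority]
        task = dq.pop() if from_bottom else dq.popleft()
        if not dq:
            # Drop empty deques so the deque count tracks the pending priorities.
            heapq.heappop(self.priorities)
            del self.deques[priority]
        return priority, task

    def pop_local(self):
        """Owner side: stack behaviour on the highest-priority deque."""
        return self._take(from_bottom=True)

    def steal_from(self):
        """Thief side: queue behaviour, taking from the top of the deque."""
        return self._take(from_bottom=False)


def steal(idle_core, cores):
    """An idle core steals from the highest-priority deque among the other cores."""
    victims = [c for c in cores if c is not idle_core and c.top_priority() is not None]
    if not victims:
        return None
    victim = min(victims, key=lambda c: c.top_priority())
    return victim.steal_from()


a, b, c = Core("a"), Core("b"), Core("c")
b.push(5, "b-low-1")
b.push(5, "b-low-2")
c.push(1, "c-high-1")
c.push(1, "c-high-2")
c.push(3, "c-mid")
first_stolen = steal(a, [a, b, c])
print(first_stolen)
```

With random victim selection the idle core could have picked core `b` and run a priority-5 task while priority-1 work was waiting on core `c`; choosing the victim by priority avoids that inversion in this toy setup.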
Abstract:
Embedded real-time applications increasingly have high computation requirements that must be met within specific deadlines but that follow highly variable patterns, depending on the data available at a given instant. The current trend towards parallel processing in the embedded domain provides higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization and more processing capacity that is available but unusable in the overall system. A solution to this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, in peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables structured reasoning about the timing behaviour of these hybrid architectures.
Abstract:
Article in Press, Corrected Proof
Abstract:
Presented at INForum - Simpósio de Informática (INFORUM 2015). 7 to 8, Sep, 2015. Portugal.