952 resultados para parallel systems


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The difficulties encountered in implementing large scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multi Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code into a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract. Two ideas taken from Bayesian optimization and classifier systems are presented for personnel scheduling based on choosing a suitable scheduling rule from a set for each person's assignment. Unlike our previous work of using genetic algorithms whose learning is implicit, the learning in both approaches is explicit, i.e. we are able to identify building blocks directly. To achieve this target, the Bayesian optimization algorithm builds a Bayesian network of the joint probability distribution of the rules used to construct solutions, while the adapted classifier system assigns each rule a strength value that is constantly updated according to its usefulness in the current situation. Computational results from 52 real data instances of nurse scheduling demonstrate the success of both approaches. It is also suggested that the learning mechanism in the proposed approaches might be suitable for other scheduling problems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this document, we wish to describe statistics, data and the importance of the 13th CONTECSI – International Conference on Information Systems and Technology Management, which took place in the University of São Paulo, from June 1st through 3rd and was organized by TECSI/EAC/FEA/USP/ECA/POLI. This report presents statistics of the 13th CONTECSI, Goals and Objectives, Program, Plenary Sessions, Doctoral Consortium, Parallel Sessions, Honorable Mentions and Committees. We would like to point out the huge importance of the financial aid given by CAPES, CNPq, FAPESP, as well as the support of FEA USP, POLI USP, ECA USP, ANPAD, AIS, ISACA, UNINOVE, Mackenzie, Universidade do Porto, Rutgers School/USA, São Paulo Convention Bureau and CCINT-FEA-USP.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this thesis we present a proof-of-concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce a compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs as well as examine their performance. We compare the performance of our HTA codes to comparable codes written in MPI as well as current state-of-the-art linear algebra routines.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, we study the Zeeman splitting effects in the parallel magnetic field versus temperature phase diagram of two-dimensional superconductors with one graphene-like band and the orbital effects of perpendicular magnetic fields in isotropic two-dimensional semi-metallic superconductors. We show that when parallel magnetic fields are applied to graphene and as the intraband interaction decreases to a critical value, the width of the metastability region present in the phase diagram decreases, vanishing completely at that critical value. In the case of two-band superconductors with one graphene-like band, a new critical interaction, associated primarily with the graphene-like band, is required in order for a second metastability region to be present in the phase diagram. For intermediate values of this interaction, a low-temperature first-order transition line bifurcates at an intermediate temperature into a first-order transition between superconducting phases and a second-order transition line between the normal and the superconducting states. In our study on the upper critical fields in generic semi-metallic superconductors, we find that the pair propagator decays faster than that of a superconductor with a metallic band. As result, the zero field band gap equation does not have solution for weak intraband interactions, meaning that there is a critical intraband interaction value in order for a superconducting phase to be present in semi-metallic superconductors. Finally, we show that the out-of-plane critical magnetic field versus temperature phase diagram displays a positive curvature, contrasting with the parabolic-like behaviour typical of metallic superconductors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract. Two ideas taken from Bayesian optimization and classifier systems are presented for personnel scheduling based on choosing a suitable scheduling rule from a set for each person's assignment. Unlike our previous work of using genetic algorithms whose learning is implicit, the learning in both approaches is explicit, i.e. we are able to identify building blocks directly. To achieve this target, the Bayesian optimization algorithm builds a Bayesian network of the joint probability distribution of the rules used to construct solutions, while the adapted classifier system assigns each rule a strength value that is constantly updated according to its usefulness in the current situation. Computational results from 52 real data instances of nurse scheduling demonstrate the success of both approaches. It is also suggested that the learning mechanism in the proposed approaches might be suitable for other scheduling problems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The evolution of wireless communication systems leads to Dynamic Spectrum Allocation for Cognitive Radio, which requires reliable spectrum sensing techniques. Among the spectrum sensing methods proposed in the literature, those that exploit cyclostationary characteristics of radio signals are particularly suitable for communication environments with low signal-to-noise ratios, or with non-stationary noise. However, such methods have high computational complexity that directly raises the power consumption of devices which often have very stringent low-power requirements. We propose a strategy for cyclostationary spectrum sensing with reduced energy consumption. This strategy is based on the principle that p processors working at slower frequencies consume less power than a single processor for the same execution time. We devise a strict relation between the energy savings and common parallel system metrics. The results of simulations show that our strategy promises very significant savings in actual devices.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Part 6: Engineering and Implementation of Collaborative Networks

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Current industry proposals for Hardware Transactional Memory (HTM) focus on best-effort solutions (BE-HTM) where hardware limits are imposed on transactions. These designs may show a significant performance degradation due to high contention scenarios and different hardware and operating system limitations that abort transactions, e.g. cache overflows, hardware and software exceptions, etc. To deal with these events and to ensure forward progress, BE-HTM systems usually provide a software fallback path to execute a lock-based version of the code. In this paper, we propose a hardware implementation of an irrevocability mechanism as an alternative to the software fallback path to gain insight into the hardware improvements that could enhance the execution of such a fallback. Our mechanism anticipates the abort that causes the transaction serialization, and stalls other transactions in the system so that transactional work loss is mini- mized. In addition, we evaluate the main software fallback path approaches and propose the use of ticket locks that hold precise information of the number of transactions waiting to enter the fallback. Thus, the separation of transactional and fallback execution can be achieved in a precise manner. The evaluation is carried out using the Simics/GEMS simulator and the complete range of STAMP transactional suite benchmarks. We obtain significant performance benefits of around twice the speedup and an abort reduction of 50% over the software fallback path for a number of benchmarks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The trend related to the turnover of internal combustion engine vehicles with EVs goes by the name of electrification. The push electrification experienced in the last decade is linked to the still ongoing evolution in power electronics technology for charging systems. This is the reason why an evolution in testing strategies and testing equipment is crucial too. The project this dissertation is based on concerns the investigation of a new EV simulator design. that optimizes the structure of the testing equipment used by the company who commissioned this work. Project requirements can be summarized in the following two points: space occupation reduction and parallel charging implementation. Some components were completely redesigned, and others were substituted with equivalent ones that could perform the same tasks. In this way it was possible to reduce the space occupation of the simulator, as well as to increase the efficiency of the testing device. Moreover, the possibility of conjugating different charging simulations could be investigated by parallelly launching two testing procedures on a unique machine, properly predisposed for supporting the two charging protocols used. On the back of the results achieved in the body of this dissertation, a new design for the EV simulator was proposed. In this way, space reduction was obtained, and space occupation efficiency was improved with the proposed new design. The testing device thus resulted to be way more compact, enabling to gain in safety and productivity, along with a 25% cost reduction. Furthermore, parallel charging was implemented in the proposed new design since the conducted tests clearly showed the feasibility of parallel charging sessions. The results presented in this work can thus be implemented to build the first prototype of the new EV simulator.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this thesis, a tube-based Distributed Economic Predictive Control (DEPC) scheme is presented for a group of dynamically coupled linear subsystems. These subsystems are components of a large scale system and control inputs are computed based on optimizing a local economic objective. Each subsystem is interacting with its neighbors by sending its future reference trajectory, at each sampling time. It solves a local optimization problem in parallel, based on the received future reference trajectories of the other subsystems. To ensure recursive feasibility and a performance bound, each subsystem is constrained to not deviate too much from its communicated reference trajectory. This difference between the plan trajectory and the communicated one is interpreted as a disturbance on the local level. Then, to ensure the satisfaction of both state and input constraints, they are tightened by considering explicitly the effect of these local disturbances. The proposed approach averages over all possible disturbances, handles tightened state and input constraints, while satisfies the compatibility constraints to guarantee that the actual trajectory lies within a certain bound in the neighborhood of the reference one. Each subsystem is optimizing a local arbitrary economic objective function in parallel while considering a local terminal constraint to guarantee recursive feasibility. In this framework, economic performance guarantees for a tube-based distributed predictive control (DPC) scheme are developed rigorously. It is presented that the closed-loop nominal subsystem has a robust average performance bound locally which is no worse than that of a local robust steady state. Since a robust algorithm is applying on the states of the real (with disturbances) subsystems, this bound can be interpreted as an average performance result for the real closed-loop system. To this end, we present our outcomes on local and global performance, illustrated by a numerical example.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of the Ph.D. research project was to explore Dual Fuel combustion and hybridization. Natural gas-diesel Dual Fuel combustion was experimentally investigated on a 4-Stroke, 2.8 L, turbocharged, light-duty Diesel engine, considering four operating points in the range between low to medium-high loads at 3000 rpm. Then, a numerical analysis was carried out using a customized version of the KIVA-3V code, in order to optimize the diesel injection strategy of the highest investigated load. A second KIVA-3V model was used to analyse the interchangeability between natural gas and biogas on an intermediate operating point. Since natural gas-diesel Dual Fuel combustion suffers from poor combustion efficiency at low loads, the effects of hydrogen enriched natural gas on Dual Fuel combustion were investigated using a validated Ansys Forte model, followed by an optimization of the diesel injection strategy and a sensitivity analysis to the swirl ratio, on the lowest investigated load. Since one of the main issues of Low Temperature Combustion engines is the low power density, 2-Stroke engines, thanks to the double frequency compared to 4-Stroke engines, may be more suitable to operate in Dual Fuel mode. Therefore, the application of gasoline-diesel Dual Fuel combustion to a modern 2-Stroke Diesel engine was analysed, starting from the investigation of gasoline injection and mixture formation. As far as hybridization is concerned, a MATLAB-Simulink model was built to compare a conventional (combustion) and a parallel-hybrid powertrain applied to a Formula SAE race car.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on micro-controller class of devices. To fit the limited memory storage capability of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data down to byte and sub-byte formats, in the integer domain. However, the current generation of micro-controller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions both at software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvements in performance and energy efficiency, compared to current State-of-the-Art (SoA) STM32 micro-controller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain specific instruction set architecture (ISA) extensions to deal with sub-byte integer arithmetic computation. The solution, including the ISA extensions and the micro-architecture to support them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude. To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedded systems are increasingly integral to daily life, improving and facilitating the efficiency of modern Cyber-Physical Systems which provide access to sensor data, and actuators. As modern architectures become increasingly complex and heterogeneous, their optimization becomes a challenging task. Additionally, ensuring platform security is important to avoid harm to individuals and assets. This study primarily addresses challenges in contemporary Embedded Systems, focusing on platform optimization and security enforcement. The initial section of this study delves into the application of machine learning methods to efficiently determine the optimal number of cores for a parallel RISC-V cluster to minimize energy consumption using static source code analysis. Results demonstrate that automated platform configuration is not only viable but also that there is a moderate performance trade-off when relying solely on static features. The second part focuses on addressing the problem of heterogeneous device mapping, which involves assigning tasks to the most suitable computational device in a heterogeneous platform for optimal runtime. The contribution of this section lies in the introduction of novel pre-processing techniques, along with a training framework called Siamese Networks, that enhances the classification performance of DeepLLVM, an advanced approach for task mapping. Importantly, these proposed approaches are independent from the specific deep-learning model used. Finally, this research work focuses on addressing issues concerning the binary exploitation of software running in modern Embedded Systems. It proposes an architecture to implement Control-Flow Integrity in embedded platforms with a Root-of-Trust, aiming to enhance security guarantees with limited hardware modifications. The approach involves enhancing the architecture of a modern RISC-V platform for autonomous vehicles by implementing a side-channel communication mechanism that relays control-flow changes executed by the process running on the host core to the Root-of-Trust. This approach has limited impact on performance and it is effective in enhancing the security of embedded platforms.