806 resultados para Desktop parallel computing


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern scientific discoveries are driven by an unsatisfiable demand for computational resources. High-Performance Computing (HPC) systems are an aggregation of computing power to deliver considerably higher performance than one typical desktop computer can provide, to solve large problems in science, engineering, or business. An HPC room in the datacenter is a complex controlled environment that hosts thousands of computing nodes that consume electrical power in the range of megawatts, which gets completely transformed into heat. Although a datacenter contains sophisticated cooling systems, our studies indicate quantitative evidence of thermal bottlenecks in real-life production workload, showing the presence of significant spatial and temporal thermal and power heterogeneity. Therefore minor thermal issues/anomalies can potentially start a chain of events that leads to an unbalance between the amount of heat generated by the computing nodes and the heat removed by the cooling system originating thermal hazards. Although thermal anomalies are rare events, anomaly detection/prediction in time is vital to avoid IT and facility equipment damage and outage of the datacenter, with severe societal and business losses. For this reason, automated approaches to detect thermal anomalies in datacenters have considerable potential. This thesis analyzed and characterized the power and thermal characteristics of a Tier0 datacenter (CINECA) during production and under abnormal thermal conditions. Then, a Deep Learning (DL)-powered thermal hazard prediction framework is proposed. The proposed models are validated against real thermal hazard events reported for the studied HPC cluster while in production. This thesis is the first empirical study of thermal anomaly detection and prediction techniques of a real large-scale HPC system to the best of my knowledge. For this thesis, I used a large-scale dataset, monitoring data of tens of thousands of sensors for around 24 months with a data collection rate of around 20 seconds.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on micro-controller class of devices. To fit the limited memory storage capability of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data down to byte and sub-byte formats, in the integer domain. However, the current generation of micro-controller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions both at software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvements in performance and energy efficiency, compared to current State-of-the-Art (SoA) STM32 micro-controller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain specific instruction set architecture (ISA) extensions to deal with sub-byte integer arithmetic computation. The solution, including the ISA extensions and the micro-architecture to support them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude. To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Continuum parallel robots (CPRs) are manipulators employing multiple flexible beams arranged in parallel and connected to a rigid end-effector. CPRs promise higher payload and accuracy than serial CRs while keeping great flexibility. As the risk of injury during accidental contacts between a human and a CPR should be reduced, CPRs may be used in large-scale collaborative tasks or assisted robotic surgery. There exist various CPR designs, but the prototype conception is rarely based on performance considerations, and the CPRs realization in mainly based on intuitions or rigid-link parallel manipulators architectures. This thesis focuses on the performance analysis of CPRs, and the tools needed for such evaluation, such as workspace computation algorithms. In particular, workspace computation strategies for CPRs are essential for the performance assessment, since the CPRs workspace may be used as a performance index or it can serve for optimal-design tools. Two new workspace computation algorithms are proposed in this manuscript, the former focusing on the workspace volume computation and the certification of its numerical results, while the latter aims at computing the workspace boundary only. Due to the elastic nature of CPRs, a key performance indicator for these robots is the stability of their equilibrium configurations. This thesis proposes the experimental validation of the equilibrium stability assessment on a real prototype, demonstrating limitations of some commonly used assumptions. Additionally, a performance index measuring the distance to instability is originally proposed in this manuscript. Differently from the majority of the existing approaches, the clear advantage of the proposed index is a sound physical meaning; accordingly, the index can be used for a more straightforward performance quantification, and to derive robot specifications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern High-Performance Computing HPC systems are gradually increasing in size and complexity due to the correspondent demand of larger simulations requiring more complicated tasks and higher accuracy. However, as side effects of the Dennard’s scaling approaching its ultimate power limit, the efficiency of software plays also an important role in increasing the overall performance of a computation. Tools to measure application performance in these increasingly complex environments provide insights into the intricate ways in which software and hardware interact. The monitoring of the power consumption in order to save energy is possible through processors interfaces like Intel Running Average Power Limit RAPL. Given the low level of these interfaces, they are often paired with an application-level tool like Performance Application Programming Interface PAPI. Since several problems in many heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver algorithm can decrease significantly the time spent to compute its resolution. One of the most widely used algorithms deployed for the resolution of large simulation is the Gaussian Elimination, which has its most popular implementation for HPC systems in the Scalable Linear Algebra PACKage ScaLAPACK library. However, another relevant algorithm, which is increasing in popularity in the academic field, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and Gaussian Elimination from ScaLAPACK to profile their execution during the resolution of linear systems above the HPC architecture offered by CINECA. Moreover, it also collates the energy and power values for different ranks, nodes, and sockets configurations. The monitoring tools employed to track the energy consumption of these algorithms are PAPI and RAPL, that will be integrated with the parallel execution of the algorithms managed with the Message Passing Interface MPI.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The idea of Grid Computing originated in the nineties and found its concrete applications in contexts like the SETI@home project where a lot of computers (offered by volunteers) cooperated, performing distributed computations, inside the Grid environment analyzing radio signals trying to find extraterrestrial life. The Grid was composed of traditional personal computers but, with the emergence of the first mobile devices like Personal Digital Assistants (PDAs), researchers started theorizing the inclusion of mobile devices into Grid Computing; although impressive theoretical work was done, the idea was discarded due to the limitations (mainly technological) of mobile devices available at the time. Decades have passed, and now mobile devices are extremely more performant and numerous than before, leaving a great amount of resources available on mobile devices, such as smartphones and tablets, untapped. Here we propose a solution for performing distributed computations over a Grid Computing environment that utilizes both desktop and mobile devices, exploiting the resources from day-to-day mobile users that alternatively would end up unused. The work starts with an introduction on what Grid Computing is, the evolution of mobile devices, the idea of integrating such devices into the Grid and how to convince device owners to participate in the Grid. Then, the tone becomes more technical, starting with an explanation on how Grid Computing actually works, followed by the technical challenges of integrating mobile devices into the Grid. Next, the model, which constitutes the solution offered by this study, is explained, followed by a chapter regarding the realization of a prototype that proves the feasibility of distributed computations over a Grid composed by both mobile and desktop devices. To conclude future developments and ideas to improve this project are presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Prince Maximilian zu Wied's great exploration of coastal Brazil in 1815-1817 resulted in important collections of reptiles, amphibians, birds, and mammals, many of which were new species later described by Wied himself The bulk of his collection was purchased for the American Museum of Natural History in 1869, although many ""type specimens"" had disappeared earlier. Wied carefully identified his localities but did not designate type specimens or type localities, which are taxonomic concepts that were not yet established. Information and manuscript names on a fraction (17 species) of his Brazilian reptiles and amphibians were transmitted by Wied to Prof. Heinrich Rudolf Schinz at the University of Zurich. Schinz included these species (credited to their discoverer ""Princ. Max."") in the second volume of Das Thierreich ... (1822). Most are junior objective synonyms of names published by Wied. However, six of the 17 names used by Schinz predate Wied's own publications. Three were manuscript names never published by Wied because he determined the species to be previously known. (1) Lacerta vittata Schinz, 1822 (a nomen oblitum) = Lacerta striata sensu Wied (a misidentification, non Linnaeus nec sensu Merrem) = Kentropyx calcarata Spix, 1825, herein qualified as a nomen protectum. (2) Polychrus virescens Schinz, 1822 = Lacerta marmorata Linnaeus, 1758 (now Polychrus marmoratus). (3) Scincus cyanurus Schinz, 1822 (a nomen oblitum) = Gymnophthalmus quadrilineatus sensu Wied (a misidentification, non Linnaeus nec sensu Merrem) = Micrablepharus maximiliani (Reinhardt and Lutken, ""1861"" [1862]), herein qualified as a nomen protectum. Qualifying Scincus cyanurus Schinz, 1822, as a nomen oblitum also removes the problem of homonymy with the later-named Pacific skink Scincus cyanurus Lesson (= Emoia cyanura). The remaining three names used by Schinz are senior objective synonyms that take priority over Wied's names. (4) Bufo cinctus Schinz, 1822, is senior to Bufo cinctus Wied, 1823; both, however, are junior synonyms of Bufo crucifer Wied, 1821 = Chaunus crucifer (Wied). (5) Agama picta Schinz, 1822, is senior to Agama picta Wied, 1823, requiring a change of authorship for this poorly known species, to be known as Enyalius pictus (Schinz). (6) Lacerta cyanomelas Schinz, 1822, predates Teius cyanomelas Wied, 1824 (1822-1831) both nomina oblita. Wied's illustration and description shows cyanomelas as apparently conspecific with the recently described but already well-known Cnemidophorus nativo Rocha et al., 1997, which is the valid name because of its qualification herein as a nomen protectum. The preceding specific name cyanomelas (as corrected in an errata section) is misspelled several ways in different copies of Schinz's original description (""cyanom las,"" ""cyanomlas,"" and cyanom""). Loosening, separation, and final loss of the last three letters of movable type in the printing chase probably accounts for the variant misspellings.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper discusses the integrated design of parallel manipulators, which exhibit varying dynamics. This characteristic affects the machine stability and performance. The design methodology consists of four main steps: (i) the system modeling using flexible multibody technique, (ii) the synthesis of reduced-order models suitable for control design, (iii) the systematic flexible model-based input signal design, and (iv) the evaluation of some possible machine designs. The novelty in this methodology is to take structural flexibilities into consideration during the input signal design; therefore, enhancing the standard design process which mainly considers rigid bodies dynamics. The potential of the proposed strategy is exploited for the design evaluation of a two degree-of-freedom high-speed parallel manipulator. The results are experimentally validated. (C) 2010 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this work is to verify the possibility to correlating specific gravity and wood hardness parallel and perpendicular to the grain. The purpose is to offer one more tool to help in the decision about wood species choice for use in floors and sleepers. To reach this intent, we considered the results of standard tests (NBR 7190:1997, Timber Structures Design, Annex B, Brazilian Association of Technical Standards) to determine hardness parallel and normal to the grain in fourteen tropical high density wood species (over 850 kg/m(3), at 12% moisture content). For each species twelve determinations were made, based on the material obtained at Sao Carlos and its regional wood market. Statistical analysis led to some expressions to describe the cited properties relationships, with a determination coefficient about 0.8.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper addresses the use of optimization techniques in the design of a steel riser. Two methods are used: the genetic algorithm, which imitates the process of natural selection, and the simulated annealing, which is based on the process of annealing of a metal. Both of them are capable of searching a given solution space for the best feasible riser configuration according to predefined criteria. Optimization issues are discussed, such as problem codification, parameter selection, definition of objective function, and restrictions. A comparison between the results obtained for economic and structural objective functions is made for a case study. Optimization method parallelization is also addressed. [DOI: 10.1115/1.4001955]

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Every year, the number of discarded electro-electronic products is increasing. For this reason recycling is needed, to avoid wasting non-renewable natural resources. The objective of this work is to study the recycling of materials from parallel wire cable through Unit operations of mineral processing. Parallel wire cables are basically composed of polymer and copper. The following unit operations were tested: grinding, size classification, dense medium separation, electrostatic separation, scrubbing, panning, and elutriation. It was observed that the operations used obtained copper and PVC concentrates with a low degree of cross contamination. It was Concluded that total liberation of the materials was accomplished after grinding to less than 3 mm, using a cage mill. Separation using panning and elutriation presented the best results in terms of recovery and cross contamination. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The objective was to study the flow pattern in a plate heat exchanger (PHE) through residence time distribution (RTD) experiments. The tested PHE had flat plates and it was part of a laboratory scale pasteurization unit. Series flow and parallel flow configurations were tested with a variable number of passes and channels per pass. Owing to the small scale of the equipment and the short residence times, it was necessary to take into account the influence of the tracer detection unit on the RID data. Four theoretical RID models were adjusted: combined, series combined, generalized convection and axial dispersion. The combined model provided the best fit and it was useful to quantify the active and dead space volumes of the PHE and their dependence on its configuration. Results suggest that the axial dispersion model would present good results for a larger number of passes because of the turbulence associated with the changes of pass. This type of study can be useful to compare the hydraulic performance of different plates or to provide data for the evaluation of heat-induced changes that occur in the processing of heat-sensitive products. (C) 2011 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Scheduling parallel and distributed applications efficiently onto grid environments is a difficult task and a great variety of scheduling heuristics has been developed aiming to address this issue. A successful grid resource allocation depends, among other things, on the quality of the available information about software artifacts and grid resources. In this article, we propose a semantic approach to integrate selection of equivalent resources and selection of equivalent software artifacts to improve the scheduling of resources suitable for a given set of application execution requirements. We also describe a prototype implementation of our approach based on the Integrade grid middleware and experimental results that illustrate its benefits. Copyright (C) 2009 John Wiley & Sons, Ltd.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work presents a method for predicting resource availability in opportunistic grids by means of use pattern analysis (UPA), a technique based on non-supervised learning methods. This prediction method is based on the assumption of the existence of several classes of computational resource use patterns, which can be used to predict the resource availability. Trace-driven simulations validate this basic assumptions, which also provide the parameter settings for the accurate learning of resource use patterns. Experiments made with an implementation of the UPA method show the feasibility of its use in the scheduling of grid tasks with very little overhead. The experiments also demonstrate the method`s superiority over other predictive and non-predictive methods. An adaptative prediction method is suggested to deal with the lack of training data at initialization. Further adaptative behaviour is motivated by experiments which show that, in some special environments, reliable resource use patterns may not always be detected. Copyright (C) 2009 John Wiley & Sons, Ltd.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One of the challenges in scientific visualization is to generate software libraries suitable for the large-scale data emerging from tera-scale simulations and instruments. We describe the efforts currently under way at SDSC and NPACI to address these challenges. The scope of the SDSC project spans data handling, graphics, visualization, and scientific application domains. Components of the research focus on the following areas: intelligent data storage, layout and handling, using an associated “Floor-Plan” (meta data); performance optimization on parallel architectures; extension of SDSC’s scalable, parallel, direct volume renderer to allow perspective viewing; and interactive rendering of fractional images (“imagelets”), which facilitates the examination of large datasets. These concepts are coordinated within a data-visualization pipeline, which operates on component data blocks sized to fit within the available computing resources. A key feature of the scheme is that the meta data, which tag the data blocks, can be propagated and applied consistently. This is possible at the disk level, in distributing the computations across parallel processors; in “imagelet” composition; and in feature tagging. The work reflects the emerging challenges and opportunities presented by the ongoing progress in high-performance computing (HPC) and the deployment of the data, computational, and visualization Grids.