Digital Library

952 results for parallel systems

Analytical Approximations to Predict Performance Measures of Manufacturing Systems with Job Failures and Parallel Processing

Relevance: 40.00%

Abstract:

Parallel processing is prevalent in many manufacturing and service systems. Many manufactured products are built and assembled from several components fabricated in parallel lines. An example of this configuration is observed at a manufacturing facility equipped to assemble and test web servers. A typical web server assembly line is characterized by multiple products, job circulation, and parallel processing. The primary objective of this research was to develop analytical approximations to predict performance measures of manufacturing systems with job failures and parallel processing. The analytical formulations extend previous queueing models used in assembly manufacturing systems in that they can handle serial and various parallel processing configurations with multiple product classes, and job circulation due to random part failures. In addition, correction terms obtained via regression analysis were added to the approximations to reduce the error between the analytical approximations and the simulation models. Markovian and general type manufacturing systems, with multiple product classes, job circulation due to failures, and fork and join systems to model parallel processing were studied. In both the Markovian and general cases, the approximations without correction terms performed quite well for one- and two-product problem instances. However, the flow time error increased as the number of products and the net traffic intensity increased. Therefore, correction terms for single and fork-join stations were developed via regression analysis to deal with more than two products. The numerical comparisons showed that the approximations perform remarkably well when the correction factors are used. In general, the average flow time error was reduced from 38.19% to 5.59% in the Markovian case, and from 26.39% to 7.23% in the general case.
All the equations stated in the analytical formulations were implemented as a set of Matlab scripts. Using this set, operations managers of web server assembly lines, or of manufacturing and service systems with similar characteristics, can estimate different system performance measures and make judicious decisions, especially on setting delivery due dates, capacity planning, and bottleneck mitigation.

VarSys Introduction: First International Workshop on Variability in Parallel and Distributed Systems

Relevance: 40.00%

A parallel method for solving pentadiagonal systems of linear equations

Relevance: 40.00%

Abstract:

A new parallel approach for solving a pentadiagonal linear system is presented. The parallel partition method for this system and the TW parallel partition method on a chain of P processors are introduced and discussed. The result of this algorithm is a reduced pentadiagonal linear system of order P − 2, compared with a system of order 2P − 2 for the parallel partition method. More importantly, the new method involves only half as many communication startups as the parallel partition method (and other standard parallel methods) and hence is a far more efficient parallel algorithm.

Energy consumption of parallel algorithms for solving linear systems on HPC architecture

Relevance: 40.00%

Abstract:

Modern High-Performance Computing (HPC) systems are gradually increasing in size and complexity, driven by the demand for larger simulations that require more complicated tasks and higher accuracy. However, as a side effect of Dennard scaling approaching its ultimate power limit, software efficiency also plays an important role in increasing the overall performance of a computation. Tools that measure application performance in these increasingly complex environments provide insights into the intricate ways in which software and hardware interact. Power consumption can be monitored, with the aim of saving energy, through processor interfaces such as Intel Running Average Power Limit (RAPL). Because these interfaces are low level, they are often paired with an application-level tool such as the Performance Application Programming Interface (PAPI). Since problems in many heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver can significantly decrease the time spent computing the solution. One of the most widely used algorithms for solving large simulations is Gaussian Elimination, whose most popular implementation for HPC systems is in the Scalable Linear Algebra PACKage (ScaLAPACK) library. Another relevant algorithm, which is gaining popularity in academia, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and of Gaussian Elimination from ScaLAPACK, profiling their execution while solving linear systems on the HPC architecture offered by CINECA. It also collates the energy and power values for different configurations of ranks, nodes, and sockets. The monitoring tools employed to track the energy consumption of these algorithms are PAPI and RAPL, which are integrated with the parallel execution of the algorithms managed through the Message Passing Interface (MPI).

Integrating structural and input design of a 2-DOF high-speed parallel manipulator: A flexible model-based approach

Relevance: 30.00%

Abstract:

This paper discusses the integrated design of parallel manipulators, which exhibit varying dynamics. This characteristic affects machine stability and performance. The design methodology consists of four main steps: (i) system modeling using a flexible multibody technique, (ii) synthesis of reduced-order models suitable for control design, (iii) systematic flexible model-based input signal design, and (iv) evaluation of some possible machine designs. The novelty of this methodology is that it takes structural flexibilities into consideration during the input signal design, thereby enhancing the standard design process, which mainly considers rigid-body dynamics. The potential of the proposed strategy is exploited for the design evaluation of a two degree-of-freedom high-speed parallel manipulator. The results are experimentally validated. (C) 2010 Elsevier Ltd. All rights reserved.

Numerical study of tensile tests conducted on systems with elastic-plastic films deposited onto elastic-plastic substrates

Relevance: 30.00%

Abstract:

In this work, a series of two-dimensional plane-strain finite element analyses was conducted to further understand the stress distribution during tensile tests on coated systems. Besides the film and the substrate, the finite element model also considered a number of cracks perpendicular to the film/substrate interface. Different from analyses commonly found in the literature, the mechanical behavior of both film and substrate was considered elastic-perfectly plastic in part of the analyses. Together with the film yield stress and the number of film cracks, other variables that were considered were crack tip geometry, the distance between two consecutive cracks and the presence of an interlayer. The analysis was based on the normal stresses parallel to the loading axis (sigma(xx)), which are responsible for cohesive failures that are observed in the film during this type of test. Results indicated that some configurations studied in this work have significantly reduced the value of sigma(xx) at the film/substrate interface and close to the pre-defined crack tips. Furthermore, in all the cases studied the values of sigma(xx) were systematically larger at the film/substrate interface than at the film surface. (C) 2010 Elsevier B.V. All rights reserved.

Aqueous two-phase micellar systems in an oscillatory flow micro-reactor: study of perspectives and experimental performance

Relevance: 30.00%

Abstract:

BACKGROUND: Aqueous two-phase micellar systems (ATPMS) are micellar surfactant solutions with physical properties that make them very efficient for the extraction/concentration of biological products. This work discusses the possible applicability and importance of a novel oscillatory flow micro-reactor (micro-OFR) envisaged for parallel screening and/or development of industrial bioprocesses in ATPMS. Based on the technology of oscillatory flow mixing (OFM), this batch or continuous micro-reactor has been presented as a new small-scale alternative for biological or physical-chemical applications. RESULTS: ATPMS experiments were carried out under different OFM conditions (times, temperatures, oscillation frequencies and amplitudes) for the extraction of glucose-6-phosphate dehydrogenase (G6PD) in Triton X-114/buffer with Cibacron Blue as affinity ligand. CONCLUSION: The results suggest the potential use of OFR, making this process a promising new alternative for the purification or pre-concentration of bioproducts. Although the applied homogenization and extraction conditions did not improve the partitioning selectivity of the target enzyme, at rest temperature they influenced the partitioning behavior in Triton X-114 ATPMS. (C) 2011 Society of Chemical Industry

Practical parallel coset enumeration

Relevance: 30.00%

Abstract:

Coset enumeration is a most important procedure for investigating finitely presented groups. We present a practical parallel procedure for coset enumeration on shared memory processors. The shared memory architecture is particularly interesting because such parallel computation is both faster and cheaper. The lower cost comes when the program requires large amounts of memory, and additional CPUs allow us to lower the time that the expensive memory is being used. Rather than report on a suite of test cases, we take a single, typical case and analyze the performance factors in depth. The parallelization is achieved through a master-slave architecture. This results in an interesting phenomenon, whereby the CPU time is divided into a sequential and a parallel portion, and the parallel part demonstrates a speedup that is linear in the number of processors. We describe an early version for which only 40% of the program was parallelized, and we describe how this was modified to achieve 90% parallelization while using 15 slave processors and a master. In the latter case, a sequential time of 158 seconds was reduced to 29 seconds using 15 slaves.
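The split into a sequential portion and a linearly scaling parallel portion described in this abstract matches the shape of Amdahl's law. The sketch below is an illustration only, not code from the paper: the function names are hypothetical, and counting just the 15 slaves as workers (ignoring the master) is an assumption. It computes the ideal speedup for a given parallel fraction and, in the other direction, the parallel fraction implied by a measured pair of run times.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when `parallel_fraction` of the run time scales linearly over `workers`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def implied_parallel_fraction(t_sequential: float, t_parallel: float, workers: int) -> float:
    """Invert Amdahl's law: the parallel fraction that would explain a measured speedup."""
    speedup = t_sequential / t_parallel
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Figures quoted in the abstract: 40% and 90% parallelized, 15 slaves, 158 s -> 29 s.
    print(f"ideal speedup at 40% parallel: {amdahl_speedup(0.40, 15):.2f}")
    print(f"ideal speedup at 90% parallel: {amdahl_speedup(0.90, 15):.2f}")
    print(f"implied parallel fraction:     {implied_parallel_fraction(158, 29, 15):.3f}")
```

Under these assumptions the measured reduction from 158 s to 29 s corresponds to a parallel fraction of about 0.87, which is in line with the roughly 90% parallelization reported.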

Parallel processing and image analysis in the eyes of mantis shrimps

Relevance: 30.00%

Abstract:

The compound eyes of mantis shrimps, a group of tropical marine crustaceans, incorporate principles of serial and parallel processing of visual information that may be applicable to artificial imaging systems. Their eyes include numerous specializations for analysis of the spectral and polarizational properties of light, and include more photoreceptor classes for analysis of ultraviolet light, color, and polarization than occur in any other known visual system. This is possible because receptors in different regions of the eye are anatomically diverse and incorporate unusual structural features, such as spectral filters, not seen in other compound eyes. Unlike eyes of most other animals, eyes of mantis shrimps must move to acquire some types of visual information and to integrate color and polarization with spatial vision. Information leaving the retina appears to be processed into numerous parallel data streams leading into the central nervous system, greatly reducing the analytical requirements at higher levels. Many of these unusual features of mantis shrimp vision may inspire new sensor designs for machine vision.

A 2D numerical model for simulating the physics of fault systems

Relevance: 30.00%

Abstract:

Simulations provide a powerful means to help gain the understanding of crustal fault system physics required to progress towards the goal of earthquake forecasting. Cellular Automata are efficient enough to probe system dynamics but their simplifications render interpretations questionable. In contrast, sophisticated elasto-dynamic models yield more convincing results but are too computationally demanding to explore phase space. To help bridge this gap, we develop a simple 2D elastodynamic model of parallel fault systems. The model is discretised onto a triangular lattice and faults are specified as split nodes along horizontal rows in the lattice. A simple numerical approach is presented for calculating the forces at medium and split nodes such that general nonlinear frictional constitutive relations can be modeled along faults. Single and multi-fault simulation examples are presented using a nonlinear frictional relation that is slip and slip-rate dependent in order to illustrate the model.

A Wireless EEG Acquisition Platform based on Embedded Systems

Relevance: 30.00%

Abstract:

This paper proposes a wireless EEG acquisition platform based on an Open Multimedia Architecture Platform (OMAP) embedded system. A high-impedance active dry electrode was tested for improving the scalp-electrode interface. The sigma-delta ADS1298 analog-to-digital converter was used, and a kernel-space character driver was developed to manage the communications between the converter unit and the OMAP's ARM core. The acquired EEG signal data are processed by a user-space application, which accesses the driver's memory, saves the data to an SD card, and transmits them through a wireless TCP/IP socket to a PC. The electrodes were tested through the alpha wave replacement phenomenon. The experimental results showed the expected alpha rhythm (8-13 Hz) reactivity to the eye-opening task. The driver spends about 725 μs to acquire and store the data samples. The application takes about 244 μs to get the data from the driver and 1.4 ms to save them to the SD card. A WiFi throughput of 12.8 Mbps was measured, which results in a transmission time of 5 ms for 512 kb of data. The embedded system consumes about 200 mAh when wireless is off and 400 mAh when it is on. The system exhibits reliable performance in recording EEG signals and transmitting them wirelessly. Beyond microcontroller-based architectures, the proposed platform demonstrates that powerful ARM processors running embedded operating systems can be programmed with real-time constraints at the kernel level in order to control hardware, while maintaining their parallel processing abilities in high-level software applications.

Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems

Relevance: 30.00%

Abstract:

A novel high-throughput and scalable unified architecture for computing the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency of the proposed structure, which presents a higher throughput per unit of area than other similar, recently published designs targeting the H.264/AVC standard. These results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x compared with pure software implementations of the transform algorithms, therefore allowing the real-time computation of all the above-mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
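For orientation, the 4 x 4 and 2 x 2 transforms mentioned in this abstract can be written in a few lines of software. The sketch below is a plain software reference, not the hardware architecture proposed in the paper: it uses the well-known H.264/AVC forward core transform matrix and the 2 x 2 Hadamard matrix applied to chroma DC coefficients, and leaves out the scaling that the standard folds into quantisation. The helper names are hypothetical.

```python
# Forward 4x4 core transform matrix of H.264/AVC (integer approximation of the DCT).
CF4 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# 2x2 Hadamard matrix, used in H.264/AVC for chroma DC coefficients.
H2 = [
    [1, 1],
    [1, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def separable_transform(matrix, block):
    """Compute matrix * block * matrix^T, i.e. a row pass followed by a column pass."""
    return matmul(matmul(matrix, block), transpose(matrix))


if __name__ == "__main__":
    flat_block = [[1] * 4 for _ in range(4)]
    print(separable_transform(CF4, flat_block))      # energy of a flat block lands in the DC term
    print(separable_transform(H2, [[1, 2], [3, 4]]))
```

Because every transform here has the same matrix * block * matrix^T shape, one routine parameterised by the matrix covers both block sizes, which gives some intuition for why a single unified datapath is a natural design target.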

Practical Quality Control: Comparison of Methods on the Quantification of Stationary Phases in Paper and Thin-Layer Chromatographic Systems

Relevance: 30.00%

Abstract:

Introduction: Paper and thin-layer chromatography methods are frequently used in Classic Nuclear Medicine for the determination of radiochemical purity (RCP) of radiopharmaceutical preparations. An aliquot of the radiopharmaceutical to be tested is spotted at the origin of a chromatographic strip (stationary phase), which in turn is placed in a chromatographic chamber in order to separate and quantify the radiochemical species present in the radiopharmaceutical preparation. There are several methods for RCP measurement, based on the use of equipment such as dose calibrators, well scintillation counters, radiochromatographic scanners and gamma cameras. The purpose of this study was to compare these quantification methods for the determination of RCP. Material and Methods: 99mTc-Tetrofosmin and 99mTc-HDP were the radiopharmaceuticals chosen to serve as the basis for this study. For the determination of RCP of 99mTc-Tetrofosmin we used ITLC-SG (2.5 x 10 cm) and 2-butanone (99mTc-tetrofosmin Rf = 0.55, 99mTcO4- Rf = 1.0, other labeled impurities 99mTc-RH Rf = 0.0). For the determination of RCP of 99mTc-HDP, Whatman 31ET and acetone were used (99mTc-HDP Rf = 0.0, 99mTcO4- Rf = 1.0, other labeled impurities Rf = 0.0). After the development of the solvent front, the strips were allowed to dry and then imaged on the gamma camera (256x256 matrix; zoom 2; LEHR parallel-hole collimator; 5-minute image) and on the radiochromatogram scanner. Then, the strips were cut at Rf 0.8 in the case of 99mTc-tetrofosmin and at Rf 0.5 in the case of 99mTc-HDP. The resulting pieces were smashed in an assay tube (to minimize the effect of counting geometry) and counted in the dose calibrator and in the well scintillation counter (for 1 minute). The RCP was calculated using the formula: % 99mTc-Complex = [(99mTc-Complex) / (Total amount of 99mTc-labeled species)] x 100. Statistical analysis was done using the test of hypotheses for the difference between means in independent samples.
Results: The gamma camera based method demonstrated higher operator dependency (especially concerning the drawing of the ROIs), and the measurements obtained using the dose calibrator were very sensitive to the amount of activity spotted on the chromatographic strip, so the use of a minimum activity of 3.7 MBq is essential to minimize quantification errors. The radiochromatographic scanner and the well scintillation counter showed concordant results and demonstrated the highest level of precision. Conclusions: Methods based on radiochromatographic scanners and well scintillation counters proved to be the most accurate and the least operator-dependent.
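The RCP formula quoted in this abstract is a simple ratio, and a short sketch makes the arithmetic explicit. The function name and the counts used below are hypothetical illustration values, not data from the study; the function assumes the counts for the labeled complex and for the remaining labeled species have already been measured and background-corrected.

```python
def radiochemical_purity(complex_counts: float, other_counts: float) -> float:
    """Percentage of activity in the labeled complex relative to all labeled species."""
    total = complex_counts + other_counts
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * complex_counts / total


if __name__ == "__main__":
    # Hypothetical counts from the two pieces of a cut strip.
    print(radiochemical_purity(complex_counts=9500, other_counts=500))
```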

Supporting intra-task parallelism in real-time multiprocessor systems

Relevance: 30.00%

Abstract:

Modern real-time systems increasingly generate heavy and dynamic computational loads, and it is becoming unrealistic to expect them to be implemented on uniprocessor systems. In fact, the shift from single-processor to multiprocessor systems can be seen, in both the general-purpose and the embedded domains, as an energy-efficient way of improving application performance. At the same time, the proliferation of multiprocessor platforms has made parallel programming a topic of great interest, with dynamic parallelism rapidly gaining popularity as a programming model. The idea behind this model is to encourage programmers to expose all opportunities for parallelism simply by marking potential parallel regions within their applications. The system treats all these annotations purely as hints, which the language itself may ignore and replace with equivalent sequential constructs. How the computation is actually subdivided and mapped onto the processors is therefore the responsibility of the compiler and the underlying computing system. Removing this burden from the programmer considerably reduces programming complexity, which usually translates into higher productivity. However, unless the underlying scheduling mechanism is simple and fast enough to keep the overall overhead low, the benefits of generating such fine-grained parallelism will be merely hypothetical. From this scheduling perspective, algorithms that employ a work-stealing policy are increasingly popular, with proven efficiency in terms of time, space, and communication requirements.
However, these algorithms take into account neither timing constraints nor any other form of task prioritisation, which prevents them from being applied directly to real-time systems. Moreover, they are traditionally implemented in the language runtime, creating a two-level scheduling system in which the predictability essential to a real-time system cannot be guaranteed. This thesis describes how the work-stealing approach can be redesigned to meet real-time requirements while preserving the fundamental principles that have produced such good results. Very briefly, the single conventional work queue (deque) is replaced by a queue of deques, sorted in increasing order of task priority. We then apply the well-known dynamic scheduling algorithm G-EDF on top, blend the rules of both, and thus our proposal is born: the RTWS scheduling algorithm. Taking advantage of the modularity offered by the Linux scheduler, RTWS is added as a new scheduling class in order to evaluate in practice whether the proposed algorithm is viable, that is, whether it delivers the desired efficiency and schedulability. Modifying the Linux kernel is a difficult task, owing to the complexity of its internal functions and the strong interdependencies between its subsystems. Nevertheless, one of the goals of this thesis was to make sure that RTWS is more than an interesting concept. A significant part of this document is therefore devoted to discussing the implementation of RTWS and to exposing problematic situations, many of them not considered in theory, such as the mismatch between several synchronisation mechanisms.
The experimental results show that RTWS, compared with another practical work on dynamic scheduling of tasks with timing constraints, significantly reduces scheduling overhead through efficient and scalable (at least up to 8 CPUs) control of migrations and context switches, while also achieving good dynamic load balancing at low cost. However, the evaluation revealed a flaw in the RTWS implementation: it gives up stealing work too easily, which leaves the CPU in question idle for periods when overall system utilisation is low. Although the work focused on keeping the scheduling cost low and achieving good data locality, the schedulability of the system was never neglected. In fact, the proposed scheduling algorithm proved to be quite robust, missing no deadlines in the experiments carried out. We can therefore state that some priority inversion, caused by the BAS stealing sub-policy, does not compromise the schedulability goals and even helps to reduce contention on the data structures. Even so, RTWS also supports a deterministic stealing sub-policy: PAS. The experimental evaluation, however, did not give a clear picture of the impact of one versus the other. Overall, though, we can conclude that RTWS is a promising solution for efficient scheduling of parallel tasks with timing constraints.
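The "queue of deques" idea summarised in this abstract can be pictured with a small single-threaded sketch. This is not the RTWS kernel code and makes several assumptions that the abstract does not state: priority is modelled as an absolute deadline (earlier is more urgent, in the spirit of EDF), the most urgent deque is found with a plain `min` scan rather than a sorted queue, there is no locking, and the class and function names are hypothetical.

```python
from collections import deque


class DeadlineDeques:
    """Per-CPU work store: one deque of work items per absolute deadline (hypothetical)."""

    def __init__(self):
        self._by_deadline = {}  # absolute deadline -> deque of work items

    def push(self, deadline, item):
        """Owner adds a work item belonging to the task with the given deadline."""
        self._by_deadline.setdefault(deadline, deque()).append(item)

    def earliest_deadline(self):
        """Deadline of the most urgent task held here, or None when empty."""
        return min(self._by_deadline) if self._by_deadline else None

    def _take(self, from_owner_end):
        deadline = self.earliest_deadline()
        if deadline is None:
            return None
        items = self._by_deadline[deadline]
        item = items.pop() if from_owner_end else items.popleft()
        if not items:
            del self._by_deadline[deadline]
        return deadline, item

    def pop_local(self):
        """Owner takes the most recently pushed item of the most urgent task."""
        return self._take(from_owner_end=True)

    def steal(self):
        """Thief takes the oldest item of the most urgent task."""
        return self._take(from_owner_end=False)


def steal_from_most_urgent(victims):
    """Illustrative victim choice: steal from the CPU holding the earliest deadline."""
    candidates = [v for v in victims if v.earliest_deadline() is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda v: v.earliest_deadline()).steal()
```

The owner and the thief work on opposite ends of the same deque, as in classic work stealing; the only change in this sketch is that both first select the deque of the most urgent task.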

Real-time scheduling of parallel tasks in the Linux Kernel

Relevance: 30.00%

Abstract:

This paper proposes a global multiprocessor scheduling algorithm for the Linux kernel that combines the global EDF scheduler with a priority-aware work-stealing load balancing scheme, enabling parallel real-time tasks to be executed on more than one processor at a given time instant. We state that some priority inversion may actually be acceptable, provided it helps reduce contention, communication, synchronisation and coordination between parallel threads, while still guaranteeing the system's expected predictability. Experimental results demonstrate the low scheduling overhead of the proposed approach compared with an existing real-time deadline-oriented scheduling class for the Linux kernel.
