976 results for heterogeneous system


Relevance:

100.00%

Publisher:

Abstract:

* This work was financially supported by RFBR-04-01-00858.

Relevance:

70.00%

Publisher:

Abstract:

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs and a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.

Relevance:

70.00%

Publisher:

Abstract:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.

We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improving prediction accuracy compared to stand-alone machine-learning algorithms, they also provide a probabilistic estimation of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.

Relevance:

70.00%

Publisher:

Abstract:

In existing studies on fault-tolerant scheduling, the active replication scheme uses ε + 1 replicas of each task to tolerate ε failures. In this paper, however, we show that more replicas do not always lead to higher reliability. Moreover, more replicas imply more resource consumption and higher economic cost. To address this problem, and with the aim of satisfying the user's reliability requirement with minimum resources, this paper proposes a new fault-tolerant scheduling algorithm: MaxRe. The algorithm incorporates reliability analysis into the active replication scheme, and both theoretical analysis and experiments show that the schedule produced by MaxRe satisfies the user's reliability requirements. The MaxRe scheduling algorithm achieves the corresponding reliability with up to 70% fewer resources than the FTSA algorithm.
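As a rough illustration of the reliability reasoning behind active replication (this is not the MaxRe algorithm, which the abstract does not specify), the sketch below assumes that each replica succeeds independently with a known probability and adds replicas, most reliable first, until a target reliability is reached. The function names and the independence assumption are hypothetical. Under this simplified model extra replicas never reduce reliability, so it does not capture the effect the abstract reports.

```python
from math import prod

def task_reliability(replica_reliabilities):
    """Probability that at least one replica succeeds.

    Assumes replicas fail independently (an illustrative assumption).
    """
    return 1.0 - prod(1.0 - r for r in replica_reliabilities)

def min_replicas(candidate_reliabilities, target):
    """Pick the most reliable candidates first until the target is met.

    Returns the chosen reliabilities, or None if the target cannot be
    reached even when every candidate is used.
    """
    chosen = []
    for r in sorted(candidate_reliabilities, reverse=True):
        chosen.append(r)
        if task_reliability(chosen) >= target:
            return chosen
    return None
```

For example, with candidate processors of reliability 0.7, 0.9 and 0.8 and a target of 0.97, two replicas (0.9 and 0.8) are enough in this model, rather than one replica per available processor.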

Relevance:

70.00%

Publisher:

Abstract:

This paper proposes the development of a heterogeneous system based on the AT90CAN128 microcontroller, in which the CAN protocol and the IEEE 802.15.4 standard are connected. The module is able to manage and monitor sensors and actuators using CAN and, through the 802.15.4 wireless standard, communicate with the other network modules. © 2011 IEEE.

Relevance:

70.00%

Publisher:

Abstract:

Computational performance increasingly depends on parallelism, and many systems rely on heterogeneous resources such as GPUs and FPGAs to accelerate computationally intensive applications. However, implementations for such heterogeneous systems are often hand-crafted and optimised to one computation scenario, and it can be challenging to maintain high performance when application parameters change. In this paper, we demonstrate that machine learning can help to dynamically choose parameters for task scheduling and load-balancing based on changing characteristics of the incoming workload. We use a financial option pricing application as a case study. We propose a simulation of processing financial tasks on a heterogeneous system with GPUs and FPGAs, and show how dynamic, on-line optimisations could improve such a system. We compare on-line and batch processing algorithms, and we also consider cases with no dynamic optimisations.

Relevance:

70.00%

Publisher:

Abstract:

FPGAs and GPUs are often used when real-time performance in video processing is required. An accelerated processor is chosen based on task-specific priorities (power consumption, processing time and detection accuracy), and this decision is normally made once at design time. All three characteristics are important, particularly in battery-powered systems. Here we propose a method for moving selection of the processing platform from a single design-time choice to a continuous run-time one. We implement Histogram of Oriented Gradients (HOG) detectors for cars and people and Mixture of Gaussians (MoG) motion detectors running across FPGA, GPU and CPU in a heterogeneous system. We use this to detect illegally parked vehicles in urban scenes. Power, time and accuracy information for each detector is characterised. An anomaly measure is assigned to each detected object based on its trajectory and location, when compared to learned contextual movement patterns. This drives processor and implementation selection, so that scenes with high behavioural anomalies are processed with faster but more power-hungry implementations, while routine or static time periods are processed with power-optimised, less accurate, slower versions. Real-time performance is evaluated on video datasets including i-LIDS. Compared to power-optimised static selection, automatic dynamic implementation mapping is 10% more accurate but draws 12 W extra power in our testbed desktop system.

Relevance:

60.00%

Publisher:

Abstract:

Enzymatic hydrolysis of cellulose is highly complex because the enzymatic mechanism is unclear and many factors affect the heterogeneous system. It is therefore difficult to build a theoretical model to study cellulose hydrolysis by cellulase. An artificial neural network (ANN) was used to simulate and predict this enzymatic reaction and was compared with the response surface model (RSM). The independent variables were cellulase amount X_1, substrate concentration X_2, and reaction time X_3, and the response variables were reducing sugar concentration Y_1 and transformation rate of the raw material Y_2. The experimental results showed that ANN was much more suitable for studying the kinetics of the enzymatic hydrolysis than RSM. During the simulation process, relative errors produced by the ANN model were clearly smaller than those produced by RSM, except for one point and the central experimental points. During the prediction process, values produced by the ANN model were much closer to the experimental values than those produced by RSM. These results showed that ANN is a persuasive tool for studying the kinetics of cellulose hydrolysis catalyzed by cellulase.

Relevance:

60.00%

Publisher:

Abstract:

In this study, novel liver-targeted doxorubicin (DOX) loaded alginate (ALG) nanoparticles were prepared by a CaCl2 crosslinking method. Glycyrrhetinic acid (GA, a liver-targeting molecule) modified alginate (GA-ALG) was synthesized in a heterogeneous system, and the structure of GA-ALG and the degree of substitution of GA were analyzed by 1H NMR, FT-IR and elemental analysis. The drug release profile under simulated physiological conditions and cytotoxicity experiments on drug-loaded GA-ALG nanoparticles were carried out in vitro. Transmission electron microscopy (TEM) and dynamic light scattering (DLS) analysis showed that drug-loaded GA-ALG nanoparticles have a spherical structure with a mean hydrodynamic diameter of around 214 ± 11 nm.

Relevance:

60.00%

Publisher:

Abstract:

In this thesis, numerical methods for determining the eigenfunctions, their adjoints and the corresponding eigenvalues of the two-group neutron diffusion equations representing any heterogeneous system are investigated. First, the classical power iteration method is modified so that the calculation of modes higher than the fundamental mode is possible. Thereafter, the Explicitly-Restarted Arnoldi method, belonging to the class of Krylov subspace methods, is touched upon. Although the modified power iteration method is a computationally expensive algorithm, its main advantage is its robustness, i.e. the method always converges to the desired eigenfunctions without any need for the user to set any parameter in the algorithm. On the other hand, the Arnoldi method, which requires some parameters to be defined by the user, is a very efficient method for calculating eigenfunctions of large sparse systems of equations with minimal computational effort. These methods are thereafter used for off-line analysis of the stability of Boiling Water Reactors. Since several oscillation modes are usually excited (global and regional oscillations) when unstable conditions are encountered, characterizing the stability of the reactor using, for instance, the Decay Ratio as a stability indicator might be difficult if the contributions from the individual modes are not separated from each other. Such a modal decomposition is applied to a stability test performed at the Swedish Ringhals-1 unit in September 2002, after the use of the Arnoldi method for pre-calculating the different eigenmodes of the neutron flux throughout the reactor. The modal decomposition clearly demonstrates the excitation of both the global and regional oscillations. Furthermore, such oscillations are found to be intermittent, with a time-varying phase shift between the first and second azimuthal modes.
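To make the idea of reaching modes above the fundamental one with power iteration more concrete, here is a minimal sketch on a small symmetric matrix. It uses Hotelling deflation, a standard textbook technique swapped in for illustration; the thesis's own modification is not described in the abstract, and the two-group diffusion operator is not symmetric, which is why the thesis also needs the adjoint eigenfunctions. All function names are hypothetical.

```python
import math

def mat_vec(a, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def power_iteration(a, iters=500):
    """Return (eigenvalue, unit eigenvector) of the dominant mode of a."""
    # Arbitrary non-zero start vector; it must not be orthogonal to the mode sought.
    x = [1.0 / (i + 1) for i in range(len(a))]
    for _ in range(iters):
        y = mat_vec(a, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient of the unit vector x estimates the eigenvalue.
    eigenvalue = sum(xi * yi for xi, yi in zip(x, mat_vec(a, x)))
    return eigenvalue, x

def deflate(a, eigenvalue, v):
    """Hotelling deflation: remove a converged mode from a symmetric matrix."""
    n = len(a)
    return [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
            for i in range(n)]

def leading_modes(a, count):
    """Compute the `count` largest-magnitude modes by repeated deflation."""
    modes = []
    for _ in range(count):
        eigenvalue, v = power_iteration(a)
        modes.append((eigenvalue, v))
        a = deflate(a, eigenvalue, v)
    return modes
```

Calling `leading_modes([[2.0, 1.0], [1.0, 2.0]], 2)` yields eigenvalues close to 3 and 1. The fixed iteration count stands in for a proper convergence test, and the sketch assumes the requested modes have distinct, non-zero eigenvalues.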

Relevance:

60.00%

Publisher:

Abstract:

Intermediate band formation on silicon layers for solar cell applications was achieved by titanium implantation and laser annealing. A two-layer heterogeneous system, consisting of the implanted layer and the un-implanted substrate, was formed. In this work, we present for the first time electrical characterization results which show that recombination is suppressed when the Ti concentration is high enough to overcome the Mott limit, in agreement with the intermediate band theory. Clear differences have been observed between samples implanted with doses under or over the Mott limit. Samples implanted under the Mott limit have capacitance values much lower than the un-implanted ones, as expected for a highly doped semiconductor Schottky junction. However, when the Mott limit is surpassed, the samples have much higher capacitance, revealing that the intermediate band is formed. The capacitance increase is due to the large amount of charge trapped at the intermediate band, even at low temperatures. Ti deep levels have been measured by admittance spectroscopy. These deep levels are located at energies which vary from 0.20 to 0.28 eV below the conduction band for implantation doses in the range 10^13-10^14 at./cm^2. For doses over the Mott limit, the implanted atoms become nonrecombinant. Capacitance-voltage transient technique measurements prove that the fabricated devices consist of two layers, in which the implanted layer and the substrate behave as an n+/n junction.

Relevance:

60.00%

Publisher:

Abstract:

In the last decade, with the expansion of organizational scope and the tendency for outsourcing, there has been an increasing need for Business Process Integration (BPI), understood as the sharing of data and applications among business processes. The research efforts and development paths in BPI pursued by many academic groups and system vendors, targeting heterogeneous system integration, continue to face several conceptual and technological challenges. This article begins with a brief review of major approaches and emerging standards to address BPI. Further, we introduce a rule-driven messaging approach to BPI, which is based on the harmonization of messages in order to compose a new, often cross-organizational process. We will then introduce the design of a temporal first order language (Harmonized Messaging Calculus) that provides the formal foundation for general rules governing the business process execution. Definitions of the language terms, formulae, safety, and expressiveness are introduced and considered in detail.

Relevance:

60.00%

Publisher:

Abstract:

Semiconductor chip packaging has evolved from single chip packaging to 3D heterogeneous system integration using multichip stacking in a single module. One of the key challenges in 3D integration is the high density interconnects that need to be formed between the chips with through-silicon-vias (TSVs) and inter-chip interconnects. Anisotropic Conductive Film (ACF) technology is one of the low-temperature, fine-pitch interconnect methods, and has been considered as a potential replacement for solder interconnects in line with continuous scaling of the interconnects in the IC industry. However, conventional ACF materials face challenges in accommodating the reduced pad and pitch size due to the micro-size particles and the particle agglomeration issue. A new interconnect material - Nanowire Anisotropic Conductive Film (NW-ACF), composed of high density copper nanowires of ~200 nm diameter and 10-30 µm length that are vertically distributed in a polymeric template, is developed in this work to tackle the constraints of conventional ACFs and to serve as an inter-chip interconnect solution for potential three-dimensional (3D) applications.

Relevance:

60.00%

Publisher:

Abstract:

Soil is a complex heterogeneous system comprising highly variable and dynamic micro-habitats that have significant impacts on the growth and activity of resident microbiota. A question addressed in this research is how soil structure affects the temporal dynamics and spatial distribution of bacteria. Using repacked microcosms, the effect of bulk-density, aggregate sizes and water content on growth and distribution of introduced Pseudomonas fluorescens and Bacillus subtilis bacteria was determined. Soil bulk-density and aggregate sizes were altered to manipulate the characteristics of the pore volume where bacteria reside and through which distribution of solutes and nutrients is controlled. X-ray CT was used to characterise the pore geometry of repacked soil microcosms. Soil porosity, connectivity and soil-pore interface area declined with increasing bulk-density. In samples that differ in pore geometry, its effect on growth and extent of spread of introduced bacteria was investigated. The growth rate of bacteria decreased with increasing bulk-density, consistent with a significant difference in pore geometry. To measure the ability of bacteria to spread through soil, placement experiments were developed. Bacteria were capable of spreading several centimetres through soil. Bacteria spread faster and further in soil with larger and better connected pore volumes. To study the spatial distribution in detail, a methodology was developed that combines X-ray microtomography, to characterize the soil structure, with fluorescence microscopy, to visualize and quantify bacteria in soil sections. The influence of pore characteristics on distribution of bacteria was analysed at macro- and microscales. Soil porosity, connectivity and soil-pore interface influenced bacterial distribution only at the macroscale.
The method developed was applied to investigate the effect of soil pore characteristics on the extent of spread of bacteria introduced locally towards a C source in soil. Soil-pore interface influenced spread of bacteria and colonization; higher bacterial densities were therefore found in soil with higher pore volumes. The results of this research thus showed that pore geometry affects the growth and spread of bacteria in soil. The method developed showed how a thin sectioning technique can be combined with 3D X-ray CT to visualize bacterial colonization of a 3D pore volume. This novel combination of methods is a significant step towards a full mechanistic understanding of microbial dynamics in structured soils.