985 resultados para Data Allocation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-GPU machines are being increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. Existing works that propose to automate data allocations for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). BBMM can perform at runtime, during standard set operations like union, intersection, and difference, finding subset and superset relations on hyperrectangular regions of array data (bounding boxes). It uses these operations along with some compiler assistance to identify, allocate, and manage data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) and as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bank switching in embedded processors having partitioned memory architecture results in code size as well as run time overhead. An algorithm and its application to assist the compiler in eliminating the redundant bank switching codes introduced and deciding the optimum data allocation to banked memory is presented in this work. A relation matrix formed for the memory bank state transition corresponding to each bank selection instruction is used for the detection of redundant codes. Data allocation to memory is done by considering all possible permutation of memory banks and combination of data. The compiler output corresponding to each data mapping scheme is subjected to a static machine code analysis which identifies the one with minimum number of bank switching codes. Even though the method is compiler independent, the algorithm utilizes certain architectural features of the target processor. A prototype based on PIC 16F87X microcontrollers is described. This method scales well into larger number of memory blocks and other architectures so that high performance compilers can integrate this technique for efficient code generation. The technique is illustrated with an example

Relevância:

60.00% 60.00%

Publicador:

Resumo:

数据分配是研究数据如何分布到多个物理节点的NP-Complete问题.给出数据分配算法的数学模型,提出基于时序片段评价的数据分配算法——DATE.该算法利用数据在短时域访问量分布不均的特点,将多目标优化问题转化为单一目标求解,采用蜜蜂算法(collective Honey bee behavior)调整参数并反馈算法结果,以实现系统负载均衡.随机实验结果表明,DATE相比于同类Random,roundrobin,Bubba算法在系统总时段均衡ET、系统时段内均衡值ES、系统最大波峰值EM 3个指标中表现更优.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.This dissertation contributes to an architecture oriented code validation, error localization and optimization technique assisting the embedded system designer in software debugging, to make it more effective at early detection of software bugs that are otherwise hard to detect, using the static analysis of machine codes. The focus of this work is to develop methods that automatically localize faults as well as optimize the code and thus improve the debugging process as well as quality of the code.Validation is done with the help of rules of inferences formulated for the target processor. The rules govern the occurrence of illegitimate/out of place instructions and code sequences for executing the computational and integrated peripheral functions. The stipulated rules are encoded in propositional logic formulae and their compliance is tested individually in all possible execution paths of the application programs. An incorrect sequence of machine code pattern is identified using slicing techniques on the control flow graph generated from the machine code.An algorithm to assist the compiler to eliminate the redundant bank switching codes and decide on optimum data allocation to banked memory resulting in minimum number of bank switching codes in embedded system software is proposed. A relation matrix and a state transition diagram formed for the active memory bank state transition corresponding to each bank selection instruction is used for the detection of redundant codes. Instances of code redundancy based on the stipulated rules for the target processor are identified.This validation and optimization tool can be integrated to the system development environment. It is a novel approach independent of compiler/assembler, applicable to a wide range of processors once appropriate rules are formulated. Program states are identified mainly with machine code pattern, which drastically reduces the state space creation contributing to an improved state-of-the-art model checking. Though the technique described is general, the implementation is architecture oriented, and hence the feasibility study is conducted on PIC16F87X microcontrollers. The proposed tool will be very useful in steering novices towards correct use of difficult microcontroller features in developing embedded systems.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This document provides data for the case study presented in our recent earthwork planning papers. Some results are also provided in a graphical format using Excel.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Montado ecosystem in the Alentejo Region, south of Portugal, has enormous agro-ecological and economics heterogeneities. A definition of homogeneous sub-units among this heterogeneous ecosystem was made, but for them is disposal only partial statistical information about soil allocation agro-forestry activities. The paper proposal is to recover the unknown soil allocation at each homogeneous sub-unit, disaggregating a complete data set for the Montado ecosystem area using incomplete information at sub-units level. The methodological framework is based on a Generalized Maximum Entropy approach, which is developed in thee steps concerning the specification of a r order Markov process, the estimates of aggregate transition probabilities and the disaggregation data to recover the unknown soil allocation at each homogeneous sub-units. The results quality is evaluated using the predicted absolute deviation (PAD) and the "Disagegation Information Gain" (DIG) and shows very acceptable estimation errors.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper provides new evidence on the determinants of the allocation of the US federal budget to the states and tests the capability of congressional, electoral and partisan theories to explain such allocation. We find that socio-economic characteristics are important explanatory variables but are not sufficient to explain the disparities in the distribution of federal monies. First, prestige committee membership is not conducive to pork-barrelling. We do not find any evidence that marginal states receive more funding; on the opposite, safe states tend to be rewarded. Also, states that are historically "swing" in presidential elections tend to receive more funds. Finally, we find strong evidence supporting partisan theories of budget allocation. States whose governor has the same political affiliation of the President receive more federal funds; while states whose representatives belong to a majority opposing the president party receive less funds.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Classic group recommender systems focus on providing suggestions for a fixed group of people. Our work tries to give an inside look at design- ing a new recommender system that is capable of making suggestions for a sequence of activities, dividing people in subgroups, in order to boost over- all group satisfaction. However, this idea increases problem complexity in more dimensions and creates great challenge to the algorithm’s performance. To understand the e↵ectiveness, due to the enhanced complexity and pre- cise problem solving, we implemented an experimental system from data collected from a variety of web services concerning the city of Paris. The sys- tem recommends activities to a group of users from two di↵erent approaches: Local Search and Constraint Programming. The general results show that the number of subgroups can significantly influence the Constraint Program- ming Approaches’s computational time and e�cacy. Generally, Local Search can find results much quicker than Constraint Programming. Over a lengthy period of time, Local Search performs better than Constraint Programming, with similar final results.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper a utilization of the high data-rates channels by threading of sending and receiving is studied. As a communication technology evolves the higher speeds are used more and more in various applications. But generating traffic with Gbps data-rates also brings some complications. Especially if UDP protocol is used and it is necessary to avoid packet fragmentation, for example for high-speed reliable transport protocols based on UDP. For such situation the Ethernet network packet size has to correspond to standard 1500 bytes MTU[1], which is widely used in the Internet. System may not has enough capacity to send messages with necessary rate in a single-threaded mode. A possible solution is to use more threads. It can be efficient on widespread multicore systems. Also the fact that in real network non-constant data flow can be expected brings another object of study –- an automatic adaptation to the traffic which is changing during runtime. Cases investigated in this paper include adjusting number of threads to a given speed and keeping speed on a given rate when CPU gets heavily loaded by other processes while sending data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

National Highway Traffic Safety Administration, Washington, D.C.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

For multiple heterogeneous multicore server processors across clouds and data centers, the aggregated performance of the cloud of clouds can be optimized by load distribution and balancing. Energy efficiency is one of the most important issues for large-scale server systems in current and future data centers. The multicore processor technology provides new levels of performance and energy efficiency. The present paper aims to develop power and performance constrained load distribution methods for cloud computing in current and future large-scale data centers. In particular, we address the problem of optimal power allocation and load distribution for multiple heterogeneous multicore server processors across clouds and data centers. Our strategy is to formulate optimal power allocation and load distribution for multiple servers in a cloud of clouds as optimization problems, i.e., power constrained performance optimization and performance constrained power optimization. Our research problems in large-scale data centers are well-defined multivariable optimization problems, which explore the power-performance tradeoff by fixing one factor and minimizing the other, from the perspective of optimal load distribution. It is clear that such power and performance optimization is important for a cloud computing provider to efficiently utilize all the available resources. We model a multicore server processor as a queuing system with multiple servers. Our optimization problems are solved for two different models of core speed, where one model assumes that a core runs at zero speed when it is idle, and the other model assumes that a core runs at a constant speed. Our results in this paper provide new theoretical insights into power management and performance optimization in data centers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A substantial body of research is focused on understanding the relationships between socio-demographics, land-use characteristics, and mode specific attributes on travel mode choice and time-use patterns. Residential and commercial densities, inter-mixing of land uses, and route directness in conjunction with transportation performance characteristics interact to influence accessibility to destinations as well as time spent traveling and engaging in activities. This study uniquely examines the activity durations undertaken for out-of-home subsistence; maintenance, and discretionary activities. Also examined are total tour durations (summing all activity categories within a tour). Cross-sectional activities are obtained from household activity travel survey data from the Atlanta Metropolitan Region. Time durations allocated to weekdays and weekends are compared. The censoring and endogeneity between activity categories and within individuals are captured using multiple equations Tobit models. The analysis and modeling reveal that land-use characteristics such as net residential density and the number of commercial parcels within a kilometer of a residence are associated with differences in weekday and weekend time-use allocations. Household type and structure are significant predictors across the three activity categories, but not for overall travel times. Tour characteristics such as time-of-day and primary travel mode of the tours also affect traveler's out-of-home activity-tour time-use patterns.