3 resultados para ArcGis Runtime SDK for Androide
em Massachusetts Institute of Technology
Resumo:
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
Resumo:
Dynamic optimization has several key advantages. This includes the ability to work on binary code in the absence of sources and to perform optimization across module boundaries. However, it has a significant disadvantage viz-a-viz traditional static optimization: it has a significant runtime overhead. There can be performance gain only if the overhead can be amortized. In this paper, we will quantitatively analyze the runtime overhead introduced by a dynamic optimizer, DynamoRIO. We found that the major overhead does not come from the optimizer's operation. Instead, it comes from the extra code in the code cache added by DynamoRIO. After a detailed analysis, we will propose a method of trace construction that ameliorate the overhead introduced by the dynamic optimizer, thereby reducing the runtime overhead of DynamoRIO. We believe that the result of the study as well as the proposed solution is applicable to other scenarios such as dynamic code translation and managed execution that utilizes a framework similar to that of dynamic optimization.
Resumo:
It has been widely known that a significant part of the bits are useless or even unused during the program execution. Bit-width analysis targets at finding the minimum bits needed for each variable in the program, which ensures the execution correctness and resources saving. In this paper, we proposed a static analysis method for bit-widths in general applications, which approximates conservatively at compile time and is independent of runtime conditions. While most related work focus on integer applications, our method is also tailored and applicable to floating point variables, which could be extended to transform floating point number into fixed point numbers together with precision analysis. We used more precise representations for data value ranges of both scalar and array variables. Element level analysis is carried out for arrays. We also suggested an alternative for the standard fixed-point iterations in bi-directional range analysis. These techniques are implemented on the Trimaran compiler structure and tested on a set of benchmarks to show the results.