62 resultados para Code optimization


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Simultaneous multithreading processors dynamically share processor resources between multiple threads. In general, shared SMT resources may be managed explicitly, for instance, by dynamically setting queue occupation bounds for each thread as in the DCRA and Hill-Climbing policies. Alternatively, resources may be managed implicitly; that is, resource usage is controlled by placing the desired instruction mix in the resources. In this case, the main resource management tool is the instruction fetch policy which must predict the behavior of each thread (branch mispredictions, long-latency loads, etc.) as it fetches instructions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Branch prediction feeds a speculative execution processor core with instructions. Branch mispredictions are inevitable and have negative effects on performance and energy consumption. With the advent of highly accurate conditional branch predictors, nonconditional branch instructions are gaining importance.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable.We attack this via an automated approach that builds accurate predictive models. We simulate sampled points, using results to teach our models the function describing relationships among design parameters. The models can be queried and are very fast, enabling efficient design tradeoff discovery. We validate our approach via two uniprocessor sensitivity studies, predicting IPC with only 1–2% error. In an experimental study using the approach, training on 1% of a 250-K-point CMP design space allows our models to predict performance with only 4–5% error. Our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment, achieving net time savings of three-four orders of magnitude.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.

Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.

Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cold-formed steel portal frames are a popular form of construction for low-rise commercial, light industrial and agricultural buildings with spans of up to 20 m. In this article, a real-coded genetic algorithm is described that is used to minimize the cost of the main frame of such buildings. The key decision variables considered in this proposed algorithm consist of both the spacing and pitch of the frame as continuous variables, as well as the discrete section sizes.A routine taking the structural analysis and frame design for cold-formed steel sections is embedded into a genetic algorithm. The results show that the real-coded genetic algorithm handles effectively the mixture of design variables, with high robustness and consistency in achieving the optimum solution. All wind load combinations according to Australian code are considered in this research. Results for frames with knee braces are also included, for which the optimization achieved even larger savings in cost.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A neural network based tool has been developed to assist in the process of code transformation. The tool offers advice on appropriate transformations within a knowledge-driven, semi-automatic parallelisation environment. We have identified the essential characteristics of codes relevant to loop transformations. A Kohonen network is used to discover structure in the characterised codes thus revealing new knowledge that may be brought to bear on the mapping between codes and transformations or transformation sequences. A transform selector based on this process has been developed and successfully applied to the parallelisation of sequential codes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This study presents a reproducible, cost-effective in vitro encrustation model and, furthermore, describes the effects of components of the artificial urine and the presence of agents that modify the action of urease on encrustation on commercially available ureteral stents. The encrustation model involved the use of small-volume reactors (700 mL) containing artificial urine and employing an orbital incubator (at 37 degrees C) to ensure controlled stirring. The artificial urine contained sources of calcium and magnesium (both as chlorides), albumin and urease. Alteration of the ratio (% w/w) of calcium salt to magnesium salt affected the mass of encrustation, with the greatest encrustation noted whenever magnesium was excluded from the artificial urine. Increasing the concentration of albumin, designed to mimic the presence of protein in urine, significantly decreased the mass of both calcium and magnesium encrustation until a plateau was observed. Finally, exclusion of urease from the artificial urine significantly reduced encrustation due to the indirect effects of this enzyme on pH. Inclusion of the urease inhibitor, acetohydroxamic acid, or urease substrates (methylurea or ethylurea) into the artificial medium markedly reduced encrustation on ureteral stents. In conclusion, this study has described the design of a reproducible, cost-effective in vitro encrustation model. Encrustation was markedly reduced on biomaterials by the inclusion of agents that modify the action of urease. These agents may, therefore, offer a novel clinical approach to the control of encrustation on urological medical devices. (c) 2005 Wiley Periodicals, Inc.