100 resultados para task difficulty
Resumo:
Task dataflow languages simplify the specification of parallel programs by dynamically detecting and enforcing dependencies between tasks. These languages are, however, often restricted to a single level of parallelism. This language design is reflected in the runtime system, where a master thread explicitly generates a task graph and worker threads execute ready tasks and wake-up their dependents. Such an approach is incompatible with state-of-the-art schedulers such as the Cilk scheduler, that minimize the creation of idle tasks (work-first principle) and place all task creation and scheduling off the critical path. This paper proposes an extension to the Cilk scheduler in order to reconcile task dependencies with the work-first principle. We discuss the impact of task dependencies on the properties of the Cilk scheduler. Furthermore, we propose a low-overhead ticket-based technique for dependency tracking and enforcement at the object level. Our scheduler also supports renaming of objects in order to increase task-level parallelism. Renaming is implemented using versioned objects, a new type of hyper object. Experimental evaluation shows that the unified scheduler is as efficient as the Cilk scheduler when tasks have no dependencies. Moreover, the unified scheduler is more efficient than SMPSS, a particular implementation of a task dataflow language.
Resumo:
We present BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range, multidimensional array tile or dynamic region. BDDT uses a block-based dependence analysis with arbitrary granularity. The analysis is applicable to existing C programs without having to restructure object or array allocation, and provides flexibility in array layouts and tile dimensions.
We evaluate BDDT using a representative set of benchmarks, and we compare it to SMPSs (the equivalent runtime system in StarSs) and OpenMP. BDDT performs comparable to or better than SMPSs and is able to cope with task granularity as much as one order of magnitude finer than SMPSs. Compared to OpenMP, BDDT performs up to 3.9× better for benchmarks that benefit from dynamic dependence analysis. BDDT provides additional data annotations to bypass dependence analysis. Using these annotations, BDDT outperforms OpenMP also in benchmarks where dependence analysis does not discover additional parallelism, thanks to a more efficient implementation of the runtime system.
Resumo:
Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.
Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.
Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.
Resumo:
Prader-Willi syndrome (PWS) and Fragile X syndrome (FraX) are associated with distinctive cognitive and behavioural profiles. We examined whether repetitive behaviours in the two syndromes were associated with deficits in specific executive functions. PWS, FraX, and typically developing (TD) children were assessed for executive functioning using the Test of Everyday Attention for Children and an adapted Simon spatial interference task. Relative to the TD children, children with PWS and FraX showed greater costs of attention switching on the Simon task, but after controlling for intellectual ability, these switching deficits were only significant in the PWS group. Children with PWS and FraX also showed significantly increased preference for routine and differing profiles of other specific types of repetitive behaviours. A measure of switch cost from the Simon task was positively correlated to scores on preference for routine questionnaire items and was strongly associated with scores on other items relating to a preference for predictability. It is proposed that a deficit in attention switching is a component of the endophenotypes of both PWS and FraX and is associated with specific behaviours. This proposal is discussed in the context of neurocognitive pathways between genes and behaviour.
Resumo:
According to a higher order reasoning account, inferential reasoning processes underpin the widely observed cue competition effect of blocking in causal learning. The inference required for blocking has been described as modus tollens (if p then q, not q therefore not p). Young children are known to have difficulties with this type of inference, but research with adults suggests that this inference is easier if participants think counterfactually. In this study, 100 children (51 five-year-olds and 49 six- to seven-year-olds) were assigned to two types of pretraining groups. The counterfactual group observed demonstrations of cues paired with outcomes and answered questions about what the outcome would have been if the causal status of cues had been different, whereas the factual group answered factual questions about the same demonstrations. Children then completed a causal learning task. Counterfactual pretraining enhanced levels of blocking as well as modus tollens reasoning but only for the younger children. These findings provide new evidence for an important role for inferential reasoning in causal learning.
Resumo:
We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate the per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy among cores and overcomes the limitation of current works which would otherwise make parallel accounting impossible to achieve. We demonstrate the value of TProf by characterizing a set of task parallel programs, where we find that data locality, memory access patterns and task working sets are responsible for significant variance in energy consumption between seemingly homogeneous tasks. In addition, we identify opportunities for fine-grain energy optimization by applying per-task Dynamic Voltage and Frequency Scaling (DVFS).