893 resultados para Scalable Nanofabrication
Resumo:
Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.
Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.
Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.
Resumo:
Many modern networks are \emph{reconfigurable}, in the sense that the topology of the network can be changed by the nodes in the network. For example, peer-to-peer, wireless and ad-hoc networks are reconfigurable. More generally, many social networks, such as a company's organizational chart; infrastructure networks, such as an airline's transportation network; and biological networks, such as the human brain, are also reconfigurable. Modern reconfigurable networks have a complexity unprecedented in the history of engineering, resembling more a dynamic and evolving living animal rather than a structure of steel designed from a blueprint. Unfortunately, our mathematical and algorithmic tools have not yet developed enough to handle this complexity and fully exploit the flexibility of these networks. We believe that it is no longer possible to build networks that are scalable and never have node failures. Instead, these networks should be able to admit small, and maybe, periodic failures and still recover like skin heals from a cut. This process, where the network can recover itself by maintaining key invariants in response to attack by a powerful adversary is what we call \emph{self-healing}. Here, we present several fast and provably good distributed algorithms for self-healing in reconfigurable dynamic networks. Each of these algorithms have different properties, a different set of gaurantees and limitations. We also discuss future directions and theoretical questions we would like to answer. %in the final dissertation that this document is proposed to lead to.
Resumo:
Newborn babies can require significant amounts of medication containing excipients intended to improve the drug formulation. Most medicines given to neonates have been developed for adults or older children and contain excipients thought to be safe in these age groups. Many excipients have been used widely in neonates without obvious adverse effects. Some excipients may be toxic in high amounts in which case they need careful risk assessment. Alternatively, it is conceivable that ill-founded fears about excipients mean that potentially useful medicines are not made available to newborn babies. Choices about excipient exposure can occur at several stages throughout the lifecycle of a medicine, from product development through to clinical use. Making these choices requires a scalable approach to analysing the overall risk. In this contribution we examine these issues.
Resumo:
Performance evaluation of parallel software and architectural exploration of innovative hardware support face a common challenge with emerging manycore platforms: they are limited by the slow running time and the low accuracy of software simulators. Manycore FPGA prototypes are difficult to build, but they offer great rewards. Software running on such prototypes runs orders of magnitude faster than current simulators. Moreover, researchers gain significant architectural insight during the modeling process. We use the Formic FPGA prototyping board [1], which specifically targets scalable and cost-efficient multi-board prototyping, to build and test a 64-board model of a 512-core, MicroBlaze-based, non-coherent hardware prototype with a full network-on-chip in a 3D-mesh topology. We expand the hardware architecture to include the ARM Versatile Express platforms and build a 520-core heterogeneous prototype of 8 Cortex-A9 cores and 512 MicroBlaze cores. We then develop an MPI library for the prototype and evaluate it extensively using several bare-metal and MPI benchmarks. We find that our processor prototype is highly scalable, models faithfully single-chip multicore architectures, and is a very efficient platform for parallel programming research, being 50,000 times faster than software simulation.
Resumo:
Ambisonics and Higher Order Ambisonics (HOA) are scalable spatial audio techniques that attempt to present a sound scene to listeners over as large an area as possible. A localisation experiment was carried out to investigate the performance of a first and third order system at three listening positions - one in the centre and two off-centre - using a 5 m radius loudspeaker array. The results are briefly presented and compared to those of an earlier experiment on a 2.2 m loudspeaker array. In both experiments the off-centre listeners were placed such that the ratio of distance from the centre to the array radius was constant in both experiments. The test used a reverse target-pointer adjustment method to determine the error, both signed and absolute, for each combination of listening position and system. The results for both arrays were found to be very similar, suggesting that the relative amplitude of the loudspeakers, which were the same in both cases, was more dominant for localisation than the arrival time differences, which differed between array sizes.
Resumo:
The paper presents IPPro which is a high performance, scalable soft-core processor targeted for image processing applications. It has been based on the Xilinx DSP48E1 architecture using the ZYNQ Field Programmable Gate Array and is a scalar 16-bit RISC processor that operates at 526MHz, giving 526MIPS of performance. Each IPPro core uses 1 DSP48, 1 Block RAM and 330 Kintex-7 slice-registers, thus making the processor as compact as possible whilst maintaining flexibility and programmability. A key aspect of the approach is in reducing the application design time and implementation effort by using multiple IPPro processors in a SIMD mode. For different applications, this allows us to exploit different levels of parallelism and mapping for the specified processing architecture with the supported instruction set. In this context, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard with the colour and morphology operations accelerated using multiple IPPros. Simulation and experimental results demonstrate that the processing platform is able to achieve a speedup of 15 to 33 times for colour filtering and morphology operations respectively, with a reduced design effort and time.
Resumo:
Ubiquitous parallel computing aims to make parallel programming accessible to a wide variety of programming areas using deterministic and scale-free programming models built on a task abstraction. However, it remains hard to reconcile these attributes with pipeline parallelism, where the number of pipeline stages is typically hard-coded in the program and defines the degree of parallelism.
This paper introduces hyperqueues, a programming abstraction that enables the construction of deterministic and scale-free pipeline parallel programs. Hyperqueues extend the concept of Cilk++ hyperobjects to provide thread-local views on a shared data structure. While hyperobjects are organized around private local views, hyperqueues require shared concurrent views on the underlying data structure. We define the semantics of hyperqueues and describe their implementation in a work-stealing scheduler. We demonstrate scalable performance on pipeline-parallel PARSEC benchmarks and find that hyperqueues provide comparable or up to 30% better performance than POSIX threads and Intel's Threading Building Blocks. The latter are highly tuned to the number of available processing cores, while programs using hyperqueues are scale-free.
Resumo:
This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.
Resumo:
Energy consumption and total cost of ownership are daunting challenges for Datacenters, because they scale disproportionately with performance. Datacenters running financial analytics may incur extremely high operational costs in order to meet performance and latency requirements of their hosted applications. Recently, ARM-based microservers have emerged as a viable alternative to high-end servers, promising scalable performance via scale-out approaches and low energy consumption. In this paper, we investigate the viability of ARM-based microservers for option pricing, using the Monte Carlo and Binomial Tree kernels. We compare an ARM-based microserver against a state-of-the-art x86 server. We define application-related but platform-independent energy and performance metrics to compare those platforms fairly in the context of datacenters for financial analytics and give insight on the particular requirements of option pricing. Our experiments show that through scaling out energyefficient compute nodes within a 2U rack-mounted unit, an ARM-based microserver consumes as little as about 60% of the energy per option pricing compared to an x86 server, despite having significantly slower cores. We also find that the ARM microserver scales enough to meet a high fraction of market throughput demand, while consuming up to 30% less energy than an Intel server
Resumo:
Credal networks generalize Bayesian networks by relaxing the requirement of precision of probabilities. Credal networks are considerably more expressive than Bayesian networks, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal networks. The algorithm is based on an important representation result we prove for general credal networks: that any credal network can be equivalently reformulated as a credal network with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal network is then updated by L2U, a loopy approximate algorithm for binary credal networks. Overall, we generalize L2U to non-binary credal networks, obtaining a scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences with respect to other state-of-the-art algorithms is evaluated by extensive numerical tests.
Resumo:
Credal nets generalize Bayesian nets by relaxing the requirement of precision of probabilities. Credal nets are considerably more expressive than Bayesian nets, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal nets. The algorithm is based on an important representation result we prove for general credal nets: that any credal net can be equivalently reformulated as a credal net with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal net is updated by L2U, a loopy approximate algorithm for binary credal nets. Thus, we generalize L2U to non-binary credal nets, obtaining an accurate and scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences is evaluated by empirical tests.
Resumo:
Belief merging operators combine multiple belief bases (a profile) into a collective one. When the conjunction of belief bases is consistent, all the operators agree on the result. However, if the conjunction of belief bases is inconsistent, the results vary between operators. There is no formal manner to measure the results and decide on which operator to select. So, in this paper we propose to evaluate the result of merging operators by using three ordering relations (fairness, satisfaction and strength) over operators for a given profile. Moreover, a relation of conformity over operators is introduced in order to classify how well the operator conforms to the definition of a merging operator. By using the four proposed relations we provide a comparison of some classical merging operators and evaluate the results for some specific profiles.
Resumo:
Knowledge is an important component in many intelligent systems.
Since items of knowledge in a knowledge base can be conflicting, especially if
there are multiple sources contributing to the knowledge in this base, significant
research efforts have been made on developing inconsistency measures for
knowledge bases and on developing merging approaches. Most of these efforts
start with flat knowledge bases. However, in many real-world applications, items
of knowledge are not perceived with equal importance, rather, weights (which
can be used to indicate the importance or priority) are associated with items of
knowledge. Therefore, measuring the inconsistency of a knowledge base with
weighted formulae as well as their merging is an important but difficult task. In
this paper, we derive a numerical characteristic function from each knowledge
base with weighted formulae, based on the Dempster-Shafer theory of evidence.
Using these functions, we are able to measure the inconsistency of the knowledge
base in a convenient and rational way, and are able to merge multiple knowledge
bases with weighted formulae, even if knowledge in these bases may be
inconsistent. Furthermore, by examining whether multiple knowledge bases are
dependent or independent, they can be combined in different ways using their
characteristic functions, which cannot be handled (or at least have never been
considered) in classic knowledge based merging approaches in the literature.
Resumo:
Gender profiling is a fundamental task that helps CCTV systems to
provide better service for intelligent surveillance. Since subjects being detected
by CCTVs are not always cooperative, a few profiling algorithms are proposed
to deal with situations when faces of subjects are not available, among which
the most common approach is to analyze subjects’ body shape information. In
addition, there are some drawbacks for normal profiling algorithms considered
in real applications. First, the profiling result is always uncertain. Second, for a
time-lasting gender profiling algorithm, the result is not stable. The degree of
certainty usually varies, sometimes even to the extent that a male is classified
as a female, and vice versa. These facets are studied in a recent paper [16] using
Dempster-Shafer theory. In particular, Denoeux’s cautious rule is applied for
fusion mass functions through time lines. However, this paper points out that if
severe mis-classification is happened at the beginning of the time line, the result
of applying Denoeux’s rule could be disastrous. To remedy this weakness,
in this paper, we propose two generalizations to the DS approach proposed in
[16] that incorporates time-window and time-attenuation, respectively, in applying
Denoeux’s rule along with time lines, for which the DS approach is a special
case. Experiments show that these two generalizations do provide better results
than their predecessor when mis-classifications happen.
Resumo:
Belief revision is the process that incorporates, in a consistent way,
a new piece of information, called input, into a belief base. When both belief
bases and inputs are propositional formulas, a set of natural and rational properties, known as AGM postulates, have been proposed to define genuine revision operations. This paper addresses the following important issue : How to revise a partially pre-ordered information (representing initial beliefs) with a new partially pre-ordered information (representing inputs) while preserving AGM postulates? We first provide a particular representation of partial pre-orders (called units) using the concept of closed sets of units. Then we restate AGM postulates in this framework by defining counterparts of the notions of logical entailment and logical consistency. In the second part of the paper, we provide some examples of revision operations that respect our set of postulates. We also prove that our revision methods extend well-known lexicographic revision and natural revision for both cases where the input is either a single propositional formula or a total pre-order.