867 results for Fine-grained microstructure
Abstract:
We constructed a parallelizing compiler that utilizes partial evaluation to achieve efficient parallel object code from very high-level, data-independent source programs. On several important scientific applications, the compiler attains parallel performance equivalent to or better than the best observed results from the manual restructuring of code. This is the first attempt to capitalize on partial evaluation's ability to expose low-level parallelism. New static scheduling techniques are used to exploit the fine-grained parallelism of the computations. The compiler maps the computation graph resulting from partial evaluation onto the Supercomputer Toolkit, a parallel computer with eight VLIW processors.
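A minimal Python sketch (our illustration, not the Supercomputer Toolkit compiler; all names are invented) of the key step: partially evaluating a data-independent routine whose size is known at compile time unrolls it into a flat graph of scalar operations whose fine-grained parallelism a static scheduler can then exploit.

    # Sketch only: partial evaluation of a fixed-size dot product leaves no loop,
    # just a DAG of scalar operations for the scheduler to place on VLIW slots.
    from dataclasses import dataclass, field
    from itertools import count

    _fresh = count()

    @dataclass
    class Node:
        op: str                  # "input", "mul", or "add"
        args: tuple = ()
        uid: int = field(default_factory=lambda: next(_fresh))

    def specialize_dot(n):
        """Residual program for an n-element dot product: no loop, only a DAG."""
        xs = [Node("input") for _ in range(n)]
        ys = [Node("input") for _ in range(n)]
        products = [Node("mul", (x, y)) for x, y in zip(xs, ys)]
        acc, adds = products[0], []
        for p in products[1:]:
            acc = Node("add", (acc, p))
            adds.append(acc)
        return products + adds

    graph = specialize_dot(8)
    print(len(graph), "scalar ops")   # 8 multiplies + 7 adds
    # All 8 multiplies are mutually independent and could issue in the same cycle
    # on a wide enough machine; only the add chain is sequential, and a scheduler
    # could reassociate it into a log-depth tree.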
Abstract:
Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol that optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512-node machine.
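A toy Python model (not Hamal's actual protocol; the names and drop probability are ours) of a fault-tolerant, idempotent messaging scheme over a discarding network: the sender retransmits until acknowledged, and the receiver uses sequence numbers so duplicate deliveries have no effect.

    # Toy model: the network may silently discard packets, the sender retransmits
    # until an acknowledgement arrives, and sequence numbers make delivery idempotent.
    import random

    class Receiver:
        def __init__(self):
            self.delivered = set()
            self.log = []

        def on_packet(self, seq, payload):
            if seq not in self.delivered:      # idempotence: duplicates are ignored
                self.delivered.add(seq)
                self.log.append(payload)
            return seq                          # every copy is acknowledged

    class Sender:
        def __init__(self, receiver, drop_prob=0.3):
            self.receiver, self.drop_prob = receiver, drop_prob
            self.next_seq = 0

        def send(self, payload):
            seq, self.next_seq = self.next_seq, self.next_seq + 1
            while True:                                        # retransmit until acked
                if random.random() >= self.drop_prob:          # request survives?
                    self.receiver.on_packet(seq, payload)
                    if random.random() >= self.drop_prob:      # ack survives?
                        return seq

    rx = Receiver()
    tx = Sender(rx)
    for i in range(5):
        tx.send(f"msg-{i}")
    assert rx.log == [f"msg-{i}" for i in range(5)]   # each message delivered exactly once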
Abstract:
Hunt, C., Elrishi, H., Gilbertson, D., Grattan, J., McLaren, S., Pyatt, B., Rushworth, G., Barker, G. Early-Holocene environments in the Wadi Faynan, Jordan. The Holocene, 2004, 14(6), pp. 921-930.
Abstract:
Extensible systems allow services to be configured and deployed for the specific needs of individual applications. This paper describes a safe and efficient method for user-level extensibility that requires only minimal changes to the kernel. A sandboxing technique is described that supports multiple logical protection domains within the same address space at user level. This approach allows applications to register sandboxed code with the system, which may then be executed in the context of any process. Our approach differs from other implementations that require special hardware support, such as segmentation or tagged translation look-aside buffers (TLBs), to either implement multiple protection domains in a single address space, or to support fast switching between address spaces. Likewise, we do not require the entire system to be written in a type-safe language to provide fine-grained protection domains. Instead, our user-level sandboxing technique requires only page-based virtual memory support, and the requirement that extension code is either written in a type-safe language or supplied by a trusted source. Using a fast method of upcalls, we show how our sandboxing technique for implementing logical protection domains provides significant performance improvements over traditional methods of invoking user-level services. Experimental results show our approach to be an efficient method for extensibility, with inter-protection-domain communication costs close to those of hardware-based solutions leveraging segmentation.
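A loose conceptual model in Python (the paper's isolation comes from page-based virtual memory protection and fast upcalls, which an interpreter cannot reproduce; the function names are ours) of the programming model: extension code is registered with the system and later invoked through an upcall dispatcher that contains any fault inside its logical protection domain.

    # Conceptual model only, not a real isolation boundary: registered extension
    # code runs in a restricted namespace, and faults are contained and reported
    # rather than crashing the host process.
    _registry = {}

    def register_extension(name, source):
        """Register sandboxed extension code; it must define handle(event)."""
        _registry[name] = compile(source, f"<extension:{name}>", "exec")

    def upcall(name, event):
        """Invoke an extension in the context of the current process."""
        domain = {"__builtins__": {"len": len, "min": min, "max": max}}  # whitelist
        try:
            exec(_registry[name], domain)
            return True, domain["handle"](event)
        except Exception as fault:              # fault contained within the domain
            return False, repr(fault)

    register_extension("count_bytes", "def handle(event):\n    return len(event)\n")
    print(upcall("count_bytes", b"probe"))      # (True, 5)
    print(upcall("count_bytes", None))          # (False, 'TypeError(...)') -- contained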
Abstract:
The CIL compiler for core Standard ML compiles whole programs using a novel typed intermediate language (TIL) with intersection and union types and flow labels on both terms and types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce customized data representations. However, duplication incurs compile-time space costs that are potentially much greater than are incurred in TILs employing type-level abstraction or quantification. In this paper, we present empirical data on the compile-time space costs of using CIL as an intermediate language. The data shows that these costs can be made tractable by using sufficiently fine-grained flow analyses together with standard hash-consing techniques. The data also suggests that non-duplicating formulations of intersection (and union) types would not achieve significantly better space complexity.
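A minimal Python sketch (not CIL itself; the toy type grammar is ours) of the hash-consing that keeps the duplicated representation tractable: structurally equal type terms are shared as a single node, so repeated intersection and union types cost one table entry rather than one copy per occurrence.

    # Hash-consing sketch: every structurally equal type term maps to one shared node.
    _table = {}

    def mk(constructor, *children):
        """Return the unique node for (constructor, children)."""
        key = (constructor, children)
        if key not in _table:
            _table[key] = key                   # the key tuple itself serves as the node
        return _table[key]

    def intersection(a, b): return mk("and", a, b)
    def union(a, b):        return mk("or", a, b)
    def arrow(a, b):        return mk("->", a, b)
    INT = mk("int")

    t1 = intersection(arrow(INT, INT), arrow(INT, union(INT, INT)))
    t2 = intersection(arrow(INT, INT), arrow(INT, union(INT, INT)))
    assert t1 is t2                             # one shared node, not two copies
    print(len(_table), "distinct type nodes")   # 5, however often the terms recur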
Abstract:
Growing interest in inference and prediction of network characteristics is justified by its importance for a variety of network-aware applications. One widely adopted strategy to characterize network conditions relies on active, end-to-end probing of the network. Active end-to-end probing techniques differ in (1) the structural composition of the probes they use (e.g., number and size of packets, the destination of various packets, the protocols used, etc.), (2) the entity making the measurements (e.g., sender vs. receiver), and (3) the techniques used to combine measurements in order to infer specific metrics of interest. In this paper, we present Periscope: a Linux API that enables the definition of new probing structures and inference techniques from user space through a flexible interface. Periscope requires no support from clients beyond the ability to respond to ICMP ECHO REQUESTs and is designed to minimize user/kernel crossings and to ensure various constraints (e.g., back-to-back packet transmissions, fine-grained timing measurements). We show how to use Periscope for two different probing purposes, namely the measurement of shared packet losses between pairs of endpoints and the measurement of subpath bandwidth. Results from Internet experiments for both of these goals are also presented.
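Periscope itself is a kernel-level API, so the following Python sketch only illustrates the inference side (our own simplification, not the paper's estimator): given the outcomes of back-to-back probe pairs sent toward two endpoints, comparing the conditional loss rate with the marginal loss rate indicates whether the two paths share a lossy segment.

    # Inference sketch: if losses toward A and B were independent, P(B lost | A lost)
    # would be close to P(B lost); a ratio well above 1 suggests a shared bottleneck.
    def shared_loss_score(outcomes):
        """outcomes: list of (lost_to_A, lost_to_B) booleans, one per back-to-back pair."""
        n = len(outcomes)
        p_b_given_a = (sum(1 for a, b in outcomes if a and b) /
                       max(1, sum(1 for a, _ in outcomes if a)))
        p_b = sum(b for _, b in outcomes) / n
        return p_b_given_a / p_b if p_b else float("nan")

    # Synthetic outcomes: most losses hit both destinations together.
    pairs = [(True, True)] * 8 + [(False, False)] * 90 + [(True, False), (False, True)]
    print(round(shared_loss_score(pairs), 2))   # >> 1: losses are strongly correlated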
Abstract:
Calligraphic writing presents a rich set of challenges to the human movement control system. These challenges include: initial learning, and recall from memory, of prescribed stroke sequences; critical timing of stroke onsets and durations; fine control of grip and contact forces; and letter-form invariance under voluntary size scaling, which entails fine control of stroke direction and amplitude during recruitment and derecruitment of musculoskeletal degrees of freedom. Experimental and computational studies in behavioral neuroscience have made rapid progress toward explaining the learning, planning and control exercised in tasks that share features with calligraphic writing and drawing. This article summarizes computational neuroscience models and related neurobiological data that reveal critical operations spanning from parallel sequence representations to fine force control. Part one addresses stroke sequencing. It treats competitive queuing (CQ) models of sequence representation, performance, learning, and recall. Part two addresses letter size scaling and motor equivalence. It treats cursive handwriting models together with models in which sensory-motor transformations are performed by circuits that learn inverse differential kinematic mappings. Part three addresses fine-grained control of timing and transient forces, by treating circuit models that learn to solve inverse dynamics problems.
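A minimal Python sketch of the competitive queuing (CQ) idea reviewed in part one (a generic textbook version, not any specific model from the article): a parallel activation gradient over the planned strokes is loaded at once, the most active item wins and is performed, the winner is suppressed, and the cycle repeats.

    # CQ sketch: parallel plan (activation gradient) -> choose max -> suppress -> repeat.
    import numpy as np

    def cq_recall(gradient, labels):
        """gradient: parallel plan (higher activation = earlier); returns recall order."""
        activation = np.array(gradient, dtype=float)
        produced = []
        while np.any(activation > 0):
            winner = int(np.argmax(activation))   # competitive choice layer
            produced.append(labels[winner])
            activation[winner] = 0.0              # self-inhibition of the winner
        return produced

    # Primacy gradient over the four strokes of a letter form, loaded in parallel.
    print(cq_recall([0.9, 0.7, 0.5, 0.3], ["down", "loop", "cross", "lift"]))
    # ['down', 'loop', 'cross', 'lift']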
Abstract:
Much work has been done on learning from failure in search to boost solving of combinatorial problems, such as clause learning and clause weighting in Boolean satisfiability (SAT), nogood and explanation-based learning, and constraint weighting in constraint satisfaction problems (CSPs). Many of the top solvers in SAT use clause learning to good effect. A similar approach (nogood learning) has not had as large an impact in CSPs. Constraint weighting is a less fine-grained approach where the information learnt gives an approximation as to which variables may be the sources of greatest contention. In this work we present two methods for learning from search using restarts, in order to identify these critical variables prior to solving. Both methods are based on the conflict-directed heuristic (weighted-degree heuristic) introduced by Boussemart et al. and are aimed at producing a better-informed version of the heuristic by gathering information through restarting and probing of the search space prior to solving, while minimizing the overhead of these restarts. We further examine the impact of different sampling strategies and different measurements of contention, and assess different restarting strategies for the heuristic. Finally, two applications for constraint weighting are considered in detail: dynamic constraint satisfaction problems and unary resource scheduling problems.
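A self-contained Python sketch of the probing idea on a toy graph-colouring CSP (not the authors' solver; it increments a constraint's weight whenever that constraint rejects a value, a simplification of the domain-wipeout trigger used by the weighted-degree heuristic): cheap randomised restarts gather weights first, and the final run orders variables by a dom/wdeg-style ratio.

    import random
    from collections import defaultdict

    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("b", "e")]
    domains = {v: [0, 1, 2] for v in "abcde"}
    weights = defaultdict(lambda: 1)                # one weight per constraint (edge)

    def conflicts(var, val, assignment):
        """Edges that reject `val` for `var` given the current assignment."""
        return [e for e in edges if var in e
                and assignment.get(e[0] if e[1] == var else e[1]) == val]

    def search(order_key, node_limit=None, learn=False):
        assignment, nodes = {}, 0
        def bt():
            nonlocal nodes
            if len(assignment) == len(domains):
                return True
            nodes += 1
            if node_limit and nodes > node_limit:   # probes are deliberately short
                return False
            var = min((v for v in domains if v not in assignment), key=order_key)
            for val in domains[var]:
                bad = conflicts(var, val, assignment)
                if bad:
                    if learn:                       # conflict-directed weighting
                        for e in bad:
                            weights[e] += 1
                    continue
                assignment[var] = val
                if bt():
                    return True
                del assignment[var]
            return False
        return bt(), dict(assignment)

    for _ in range(20):                             # probing phase: cheap random restarts
        search(lambda v: random.random(), node_limit=10, learn=True)

    def dom_wdeg(v):                                # weight-informed variable ordering
        return len(domains[v]) / sum(weights[e] for e in edges if v in e)

    print(search(dom_wdeg)[1])                      # a valid 3-colouring of the graph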
Abstract:
FUELCON is an expert system for optimized refueling design in nuclear engineering. This task is crucial for keeping down operating costs at a plant without compromising safety. FUELCON proposes sets of alternative configurations of allocation of fuel assemblies that are each positioned in the planar grid of a horizontal section of a reactor core. Results are simulated, and an expert user can also use FUELCON to revise rulesets and improve on his or her heuristics. The successful completion of FUELCON led this research team to undertake a panoply of sequel projects, for which we provide a comparative formal discussion at the meta-architectural level. In this paper, we demonstrate a novel adaptive technique that learns the optimal allocation heuristic for the various cores. The algorithm is a hybrid of a fine-grained neural network and symbolic computation components. This hybrid architecture is sensitive enough to learn the particular characteristics of the ‘in-core fuel management problem’ at hand, and is powerful enough to use this information fully to automatically revise heuristics, thus improving upon those provided by a human expert.
Abstract:
The solution process for diffusion problems usually involves the time development separately from the space solution. A finite difference algorithm in time requires a sequential time development in which all previous values must be determined prior to the current value. The Stehfest Laplace transform algorithm, however, allows time solutions without the knowledge of prior values. It is of interest to be able to develop a time-domain decomposition suitable for implementation in a parallel environment. One such possibility is to use the Laplace transform to develop coarse-grained solutions which act as the initial values for a set of fine-grained solutions. The independence of the Laplace transform solutions means that we do indeed have a time-domain decomposition process. Any suitable time solver can be used for the fine-grained solution. To illustrate the technique, we shall use an Euler solver in time together with the dual reciprocity boundary element method for the space solution.
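An illustrative Python sketch (not the paper's boundary element code; the test transform and step counts are arbitrary) of the decomposition: Gaver-Stehfest inversion produces the coarse-grained value at each chosen time directly from the Laplace-space solution, independently of earlier times, and each value then seeds an independent fine-grained Euler integration over its own subinterval.

    # Gaver-Stehfest inversion gives mutually independent coarse-grained values;
    # each one can seed a fine-grained time solver on its own subinterval.
    from math import factorial, log

    def stehfest_weights(N=12):
        V = []
        for i in range(1, N + 1):
            s = sum(k ** (N // 2) * factorial(2 * k)
                    / (factorial(N // 2 - k) * factorial(k) * factorial(k - 1)
                       * factorial(i - k) * factorial(2 * k - i))
                    for k in range((i + 1) // 2, min(i, N // 2) + 1))
            V.append((-1) ** (N // 2 + i) * s)
        return V

    def invert(F, t, N=12):
        """Numerical inverse Laplace transform of F(s) at time t > 0."""
        V = stehfest_weights(N)
        return log(2) / t * sum(V[i - 1] * F(i * log(2) / t) for i in range(1, N + 1))

    # Toy diffusion-like decay dy/dt = -y, y(0) = 1, whose transform is 1/(s + 1).
    F = lambda s: 1.0 / (s + 1.0)
    coarse_times = [0.5, 1.0, 1.5, 2.0]
    coarse = [invert(F, t) for t in coarse_times]          # independent of each other

    def euler_refine(y0, t0, t1, steps=1000):              # fine-grained time solver
        y, dt = y0, (t1 - t0) / steps
        for _ in range(steps):
            y += dt * (-y)
        return y

    # Each processor could refine its own subinterval from its coarse initial value.
    fine_end = [euler_refine(y0, t0, t0 + 0.5) for t0, y0 in zip(coarse_times, coarse)]
    print([round(v, 4) for v in coarse])      # ~[0.6065, 0.3679, 0.2231, 0.1353] = exp(-t)
    print([round(v, 4) for v in fine_end])    # ~exp(-(t0 + 0.5)) for each subinterval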
Abstract:
This paper describes a methodology for deploying flexible dynamic configuration into embedded systems whilst preserving the reliability advantages of static systems. The methodology is based on the concept of decision points (DP), which are strategically placed to achieve fine-grained distribution of self-management logic to meet application-specific requirements. DP logic can be changed easily and independently of the host component, enabling self-management behavior to be deferred beyond the point of system deployment. A transparent Dynamic Wrapper mechanism (DW) automatically detects and handles problems arising from the evaluation of self-management logic within each DP and ensures that the dynamic aspects of the system collapse down to statically defined default behavior, guaranteeing safety and correctness despite failures. Dynamic context management contributes to flexibility and removes the need for design-time binding of context providers and consumers, thus facilitating run-time composition and incremental component upgrade.
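An illustrative Python sketch of the decision-point pattern (the class and method names are ours, not the paper's API): the host component consults a DP wherever a self-management choice is needed, the DP's logic can be deployed or replaced after shipping, and a wrapper catches any failure in that logic and collapses to the statically defined default.

    class DecisionPoint:
        def __init__(self, name, default):
            self.name = name
            self.default = default           # statically defined, always-safe behavior
            self.logic = None                # deployable/replaceable after shipping

        def deploy(self, logic):
            self.logic = logic

        def decide(self, context):           # the transparent dynamic wrapper
            if self.logic is not None:
                try:
                    return self.logic(context)
                except Exception:
                    pass                     # fault in DP logic: fall back to the default
            return self.default(context)

    # Host component: choose a buffering policy for a sensor stream.
    dp = DecisionPoint("buffer_policy", default=lambda ctx: 8)       # fixed depth
    print(dp.decide({"battery": 0.9}))                               # 8 (static default)

    dp.deploy(lambda ctx: 2 if ctx["battery"] < 0.2 else 32)         # new self-management logic
    print(dp.decide({"battery": 0.9}))                               # 32 (dynamic decision)
    print(dp.decide({}))                                             # KeyError -> falls back to 8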
Abstract:
High-speed field-programmable gate array (FPGA) implementations of an adaptive least mean square (LMS) filter, with application in an electronic support measures (ESM) digital receiver, are presented. They employ "fine-grained" pipelining, i.e., pipelining within the processor, which results in an increased output latency when used in the recursive LMS system. Therefore, the major challenge is to maintain a low-latency output whilst increasing the number of pipeline stages in the filter for higher speeds. Using the delayed LMS (DLMS) algorithm, fine-grained pipelined FPGA implementations using both the direct form (DF) and the transposed form (TF) are considered and compared. It is shown that the direct-form LMS filter utilizes the FPGA resources more efficiently, thereby allowing a 120 MHz sampling rate.
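A behavioural Python sketch of the delayed LMS (DLMS) update that both implementations are built around (algorithm only; the fine-grained pipelining itself is a hardware matter, and the filter length, step size and delay below are arbitrary): the coefficient update uses an error and data vector that are several samples old, which is what tolerates the extra pipeline latency.

    # DLMS: the weight update uses e[n - D] and x[n - D], so a D-cycle pipeline
    # delay in the error path does not break the recursion.
    import numpy as np

    def dlms(x, d, num_taps=16, mu=0.01, delay=4):
        """Adaptive FIR filter whose coefficient update uses a `delay`-old error."""
        w = np.zeros(num_taps)
        e = np.zeros(len(x))
        for n in range(num_taps - 1, len(x)):
            u_now = x[n - num_taps + 1:n + 1][::-1]     # current data vector, newest first
            e[n] = d[n] - w @ u_now                     # filter output error
            m = n - delay                               # delayed update index
            if m >= num_taps - 1:
                u_old = x[m - num_taps + 1:m + 1][::-1]
                w += mu * e[m] * u_old                  # update with stale error and data
        return w, e

    # System-identification toy: recover an unknown 4-tap channel from noisy data.
    rng = np.random.default_rng(0)
    h = np.array([0.5, -0.3, 0.2, 0.1])
    x = rng.standard_normal(20000)
    d = np.convolve(x, h, mode="full")[:len(x)] + 0.01 * rng.standard_normal(len(x))
    w, _ = dlms(x, d)
    print(np.round(w[:4], 2))        # approximately [0.5, -0.3, 0.2, 0.1]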
Abstract:
In the absence of a firm link between individual meteorites and their asteroidal parent bodies, asteroids are typically characterized only by their light reflection properties, and grouped accordingly into classes. On 6 October 2008, a small asteroid was discovered with a flat reflectance spectrum in the 554-995 nm wavelength range, and designated 2008 TC3 (refs 4-6). It subsequently hit the Earth. Because it exploded at 37 km altitude, no macroscopic fragments were expected to survive. Here we report that a dedicated search along the approach trajectory recovered 47 meteorites, fragments of a single body named Almahata Sitta, with a total mass of 3.95 kg. Analysis of one of these meteorites shows it to be an achondrite, a polymict ureilite, anomalous in its class: ultra-fine-grained and porous, with large carbonaceous grains. The combined asteroid and meteorite reflectance spectra identify the asteroid as F class, now firmly linked to dark carbon-rich anomalous ureilites, a material so fragile it was not previously represented in meteorite collections.
Abstract:
Many genetic studies have demonstrated an association between the 7-repeat (7r) allele of a 48-base pair variable number of tandem repeats (VNTR) in exon 3 of the DRD4 gene and the phenotype of attention deficit hyperactivity disorder (ADHD). Previous studies have shown inconsistent associations between the 7r allele and neurocognitive performance in children with ADHD. We investigated the performance of 128 children with and without ADHD on the Fixed and Random versions of the Sustained Attention to Response Task (SART). We employed time-series analyses of reaction-time data to allow a fine-grained analysis of reaction time variability, a candidate endophenotype for ADHD. Children were grouped into either the 7r-present group (possessing at least one copy of the 7r allele) or the 7r-absent group. The ADHD group made significantly more commission errors and was significantly more variable in RT in terms of fast moment-to-moment variability than the control group, but no effect of genotype was found on these measures. Children with ADHD without the 7r allele made significantly more omission errors, were significantly more variable in the slow frequency domain and showed less sensitivity to the signal (d') than those children with ADHD with the 7r and control children with or without the 7r. These results highlight the utility of time-series analyses of reaction time data for delineating the neuropsychological deficits associated with ADHD and the DRD4 VNTR. Absence of the 7-repeat allele in children with ADHD is associated with a neurocognitive profile of drifting sustained attention that gives rise to variable and inconsistent performance. (c) 2008 Wiley-Liss, Inc.
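A hedged Python sketch of the kind of time-series decomposition of reaction times described (our own minimal version with invented parameters, not the authors' pipeline): fast moment-to-moment variability is taken from successive trial-to-trial differences, while slow drifts in sustained attention appear as the share of spectral power at low frequencies.

    # Fast variability = trial-to-trial differences; slow variability = share of
    # RT spectral power in a low-frequency band.
    import numpy as np

    def rt_variability(rts, trial_period=2.0, slow_band=(0.0, 0.05)):
        rts = np.asarray(rts, dtype=float)
        fast = np.std(np.diff(rts))                      # moment-to-moment variability (s)
        detrended = rts - rts.mean()
        power = np.abs(np.fft.rfft(detrended)) ** 2
        freqs = np.fft.rfftfreq(len(rts), d=trial_period)
        slow = power[(freqs > slow_band[0]) & (freqs <= slow_band[1])].sum() / power.sum()
        return fast, slow                                # seconds, proportion of power

    rng = np.random.default_rng(1)
    trials = np.arange(200)
    steady = 0.45 + 0.02 * rng.standard_normal(200)                     # consistent RTs
    drifting = (0.45 + 0.02 * rng.standard_normal(200)
                + 0.08 * np.sin(2 * np.pi * trials / 80))               # slow attentional drift
    print(rt_variability(steady))      # small share of power at slow frequencies
    print(rt_variability(drifting))    # much larger slow-frequency share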
Abstract:
The research reported here is based on the standard laboratory experiments routinely performed in order to measure various geotechnical parameters. These experiments require consolidation of fine-grained samples in triaxial or stress path apparatus. The time required for consolidation depends on the permeability of the soil and the length of the drainage path. The consolidation time is often of the order of several weeks in large clay-dominated samples. Long testing periods can be problematic, as they can delay decisions on design and construction methods. Accelerating the consolidation process requires a reduction in effective drainage length, and this is usually achieved by placing filter drains around the sample. The purpose of the research reported in this paper is to assess whether these filter drains work effectively and, if not, to determine what modifications to the filter drains are needed. The findings have shown that use of a double filter reduces the consolidation time severalfold.
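A back-of-the-envelope Python illustration (the numbers are invented, not taken from the paper) of why shortening the drainage path helps so much: one-dimensional consolidation time scales with the square of the drainage path length, t = T_v * H_dr^2 / c_v, so an arrangement that halves the effective drainage length cuts the waiting time by roughly a factor of four.

    # Terzaghi-style scaling: consolidation time grows with the square of the
    # drainage path length, so halving the path gives roughly a 4x speed-up.
    def consolidation_time_days(H_dr_m, c_v_m2_per_year, T_v=0.848):  # T_v for ~90% consolidation
        return T_v * H_dr_m ** 2 / c_v_m2_per_year * 365.0

    c_v = 1.0                                              # m^2/year, a low-permeability clay
    print(round(consolidation_time_days(0.10, c_v), 1))    # ~3.1 days, 100 mm drainage path
    print(round(consolidation_time_days(0.05, c_v), 1))    # ~0.8 days: path halved, ~4x faster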