955 results for Boolean Computations
Abstract:
In this paper we present a design methodology for algorithm/architecture co-design of a voltage-scalable, process-variation-aware motion estimator based on significance-driven computation. The fundamental premise of our approach is that not all computations are equally significant in shaping the output response of video systems. We use a statistical technique to intelligently identify these significant/not-so-significant computations at the algorithmic level and subsequently modify the underlying architecture so that the significant computations are performed error-free under voltage over-scaling. Furthermore, our design includes an adaptive quality compensation (AQC) block which "tunes" the algorithm and architecture depending on the magnitude of voltage over-scaling and the severity of process variations. Simulation results show average power savings of ~33% for the proposed architecture compared to a conventional implementation in 90 nm CMOS technology. The maximum output quality loss in terms of Peak Signal-to-Noise Ratio (PSNR) was ~1 dB, without incurring any throughput penalty.
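To make the premise concrete, the following is a minimal sketch (not the paper's actual architecture) of splitting a sum-of-absolute-differences computation into a significant MSB part, which would be kept error-free under voltage over-scaling, and a less significant LSB part that may be allowed to fail. The function name and the bit-split point are illustrative assumptions.

```python
import numpy as np

def sad_significance_split(block_a, block_b, msb_bits=4, total_bits=8):
    """Split a SAD computation into a protected MSB part and a
    less-significant LSB part (illustrative only)."""
    diff = np.abs(block_a.astype(int) - block_b.astype(int))
    shift = total_bits - msb_bits
    msb = diff >> shift              # significant part: computed error-free
    lsb = diff & ((1 << shift) - 1)  # less significant: may see VOS errors
    return (int(msb.sum()) << shift) + int(lsb.sum())
```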
Abstract:
In this paper, a low-complexity system for spectral analysis of heart rate variability (HRV) is presented. The main idea of the proposed approach is the implementation of the Fast-Lomb periodogram, a ubiquitous tool in spectral analysis, using a wavelet-based Fast Fourier Transform. Interestingly, we show that the proposed approach enables the classification of processed data into more and less significant based on their contribution to output quality. Based on such a classification, a percentage of the less significant data is pruned, leading to a significant reduction in algorithmic complexity with minimal quality degradation. Indeed, our results indicate that the proposed system can achieve up to a 45% reduction in the number of computations with only 4.9% average error in output quality compared to a conventional FFT-based HRV system.
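As a rough illustration of the pruning idea (a sketch, not the paper's wavelet-based Fast-Lomb implementation), one can rank spectral coefficients by magnitude, zero out the least significant fraction, and measure the resulting quality loss:

```python
import numpy as np

def prune_spectrum(x, prune_frac=0.45):
    """Zero out the least-significant spectral coefficients (by magnitude)
    and report the relative error of the pruned reconstruction."""
    X = np.fft.rfft(x)
    drop = np.argsort(np.abs(X))[:int(len(X) * prune_frac)]
    X_pruned = X.copy()
    X_pruned[drop] = 0.0
    x_rec = np.fft.irfft(X_pruned, len(x))
    return X_pruned, np.linalg.norm(x - x_rec) / np.linalg.norm(x)
```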
Abstract:
In this paper, we propose a system-level design approach considering voltage over-scaling (VOS) that achieves error resiliency using unequal error protection of different computation elements, while incurring minor quality degradation. Depending on user specifications and the severity of process variations/channel noise, the degree of VOS in each block of the system is adaptively tuned to ensure minimum system power while providing "just the right" amount of quality and robustness. This is achieved by taking block-level interactions into consideration and ensuring that, under any change of operating conditions, only the "less crucial" computations, which contribute less to block/system output quality, are affected. The proposed approach applies unequal error protection to the various blocks of a system (logic and memory) and spans multiple layers of the design hierarchy (algorithm, architecture, and circuit). The design methodology, when applied to a multimedia subsystem, shows large power benefits (up to 69% improvement in power consumption) at reasonable image quality while tolerating errors introduced by VOS, process variations, and channel noise.
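A toy illustration of the adaptive tuning loop is sketched below; the structure, numbers and names are purely hypothetical, and the paper's actual tuning spans algorithm, architecture and circuit. The idea shown is simply to lower each block's supply voltage greedily while a user-specified quality budget holds:

```python
def tune_vos(blocks, quality_budget):
    """blocks: {name: [(vdd, power, quality_loss_dB), ...]},
    each list ordered from nominal vdd to most aggressive VOS."""
    setting = {name: opts[0] for name, opts in blocks.items()}
    spent = 0.0
    for name, opts in blocks.items():
        for vdd, power, loss in opts[1:]:
            extra = loss - setting[name][2]   # marginal quality cost
            if spent + extra <= quality_budget:
                spent += extra
                setting[name] = (vdd, power, loss)
    return setting
```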
Abstract:
We propose a methodology for optimizing the execution of data-parallel (sub-)tasks on the CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores such that the global execution time of the overall data-parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data-parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning, between CPU and GPU cores, of the tasks derived from structured data-parallel patterns/algorithmic skeletons. The model takes into account both hardware-related and application-dependent parameters, and computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data-parallel pattern, scheduling part of the tasks to the GPU and part to the CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures assess both the performance model's properties and the autonomic module's effectiveness.
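The flavour of such an analytical model fits in a few lines (a sketch under simplifying assumptions, not FastFlow's actual model): if each CPU task costs t_cpu and each GPU task costs t_gpu including offload overhead, balancing the two completion times yields the GPU share directly.

```python
def gpu_fraction(t_cpu_task, t_gpu_task, t_offload=0.0):
    """Fraction f of tasks for the GPU such that both sides finish together:
    f * n * t_gpu == (1 - f) * n * t_cpu  =>  f = t_cpu / (t_cpu + t_gpu)."""
    t_gpu = t_gpu_task + t_offload      # include per-task transfer overhead
    return t_cpu_task / (t_cpu_task + t_gpu)

# e.g. a GPU task 4x cheaper (plus some offload cost) gets ~80% of the work:
print(gpu_fraction(t_cpu_task=2.0, t_gpu_task=0.4, t_offload=0.1))  # 0.8
```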
Abstract:
When implementing autonomic management of multiple non-functional concerns, a trade-off must be found between the ability to develop management of the individual concerns independently (following the separation-of-concerns principle) and the detection and resolution of conflicts that may arise when combining the independently developed management code. Here we discuss strategies to establish this trade-off and introduce a model-checking-based methodology aimed at simplifying the discovery and handling of conflicts arising from the deployment, within the same parallel application, of independently developed management policies. Preliminary results are shown demonstrating the feasibility of the approach.
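As a toy example of the kind of conflict such a methodology must surface (hypothetical policies; a real model checker would explore the full reachable state space rather than enumerate a table), consider two managers that disagree on the parallelism degree in some states:

```python
from itertools import product

# Two hypothetical managers map an observed (load, temperature) state to an
# action on the parallelism degree; a conflict is any state where they disagree.
power_policy = {("low", "hot"): "decrease", ("low", "cool"): "decrease",
                ("high", "hot"): "decrease", ("high", "cool"): "keep"}
perf_policy  = {("low", "hot"): "keep",     ("low", "cool"): "decrease",
                ("high", "hot"): "increase", ("high", "cool"): "increase"}

for state in product(["low", "high"], ["hot", "cool"]):
    a, b = power_policy[state], perf_policy[state]
    if a != b:
        print(f"conflict in {state}: power wants {a}, performance wants {b}")
```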
Abstract:
Methods are presented for developing synthesisable FFT cores. These are based on a modular approach in which parameterisable blocks are cascaded to implement the computations required across a range of typical FFT signal flow graphs. The underlying architectural approach combines a digit-serial data organisation with generic commutator blocks to produce systems that offer 100% processor utilisation with storage requirements lower than those of previous designs. The approach has been used to create generators for the automated synthesis of FFT cores that are portable across a broad range of silicon technologies. The resulting chip designs are competitive with manual methods but with significant reductions in design time.
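For reference, the cascaded-stage structure such generators target mirrors the recursion of the textbook radix-2 FFT, where each recursion level corresponds to one butterfly/commutator stage of a pipelined core (a software sketch, not the generated hardware):

```python
import cmath

def fft_radix2(x):
    """Textbook radix-2 decimation-in-time FFT; len(x) must be a power of 2.
    Each recursion level plays the role of one cascaded hardware stage."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    t = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + t[k] for k in range(n // 2)] + \
           [even[k] - t[k] for k in range(n // 2)]
```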
Abstract:
We show that, if M is a subspace lattice with the property that the rank-one subspace of its operator algebra is weak* dense, L is a commutative subspace lattice, and P is the lattice of all projections on a separable Hilbert space, then L⊗M⊗P is reflexive. If M is moreover an atomic Boolean subspace lattice while L is any subspace lattice, we provide a concrete lattice-theoretic description of L⊗M in terms of projection-valued functions defined on the set of atoms of M. As a consequence, we show that the Lattice Tensor Product Formula holds for Alg M and any other reflexive operator algebra, and give several further corollaries of these results.
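For the reader's convenience, the Lattice Tensor Product Formula referred to here is usually stated as follows (standard formulation; notation assumed):

```latex
\[
  \operatorname{Alg}(\mathcal{L}_1 \otimes \mathcal{L}_2)
  \;=\;
  \operatorname{Alg}\mathcal{L}_1 \,\bar{\otimes}\, \operatorname{Alg}\mathcal{L}_2 ,
\]
```

where ⊗̄ denotes the weak*-closed spatial tensor product of the two algebras.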
Abstract:
Approximate execution is a viable technique for energy-constrained environments, provided that applications have the mechanisms to produce outputs of the highest possible quality within the given energy budget. We introduce a framework for energy-constrained execution with controlled and graceful quality loss. A simple programming model allows users to express the relative importance of computations for the quality of the end result, as well as minimum quality requirements. The significance-aware runtime system uses an application-specific analytical energy model to identify the degree of concurrency and approximation that maximizes quality while meeting user-specified energy constraints. Evaluation on a dual-socket 8-core server shows that the proposed framework predicts the optimal configuration with high accuracy, enabling energy-constrained executions that result in significantly higher quality compared to loop perforation, a compiler approximation technique.
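A minimal sketch of what such a significance-aware runtime might do is shown below; the task model, cost fields and energy accounting are all illustrative assumptions, not the framework's API:

```python
# Run accurate versions of the most significant tasks first; switch to the
# approximate version, or elide the task entirely, once the remaining
# energy budget (estimated by a model) would otherwise be exceeded.
def run_with_budget(tasks, budget):
    """tasks: list of (significance, cost_accurate, cost_approx, fn, fn_approx)."""
    results = []
    for sig, c_acc, c_apx, fn, fn_apx in sorted(tasks, key=lambda t: -t[0]):
        if budget >= c_acc:
            budget -= c_acc
            results.append(fn())
        elif budget >= c_apx:
            budget -= c_apx
            results.append(fn_apx())
        # else: task elided entirely (graceful quality loss)
    return results
```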
Abstract:
Fully Homomorphic Encryption (FHE) is a recently developed cryptographic technique which allows computations on encrypted data. There are many interesting applications for this encryption method, especially within cloud computing. However, its computational complexity is such that it is not yet practical for real-time applications. This work proposes optimised hardware architectures for the encryption step of an integer-based FHE scheme with the aim of improving its practicality. A low-area design and a high-speed parallel design are proposed and implemented on a Xilinx Virtex-7 FPGA, targeting the available DSP slices, which offer high-speed multiplication and accumulation. Both use the Comba multiplication scheduling method to manage the large multiplications required with unevenly sized multiplicands and to minimise the number of read and write operations to RAM. Results show that speed-up factors of 3.6 and 10.4 can be achieved for the encryption step with medium-sized security parameters for the low-area and parallel designs respectively, compared to a benchmark software implementation on an Intel Core2 Duo E8400 platform running at 3 GHz.
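Comba scheduling itself is standard product-scanning multiprecision multiplication: the inner loop walks one output column at a time, so each result word is written exactly once, which is what minimises RAM traffic. A plain-software sketch (little-endian limbs, illustrative limb width):

```python
def comba_mul(a, b, base=2**32):
    """Comba (product-scanning) multiply of little-endian limb arrays:
    accumulate one output column at a time, writing each result limb once."""
    n, m = len(a), len(b)
    r, carry = [0] * (n + m), 0
    for k in range(n + m - 1):
        acc = carry
        for i in range(max(0, k - m + 1), min(n, k + 1)):
            acc += a[i] * b[k - i]          # all partial products of column k
        r[k], carry = acc % base, acc // base
    r[n + m - 1] = carry
    return r
```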
Abstract:
The end of Dennard scaling has pushed power consumption into a first-order concern for current systems, on par with performance. As a result, near-threshold voltage computing (NTVC) has been proposed as a potential means to tackle the limited cooling capacity of CMOS technology. Hardware operating at NTV consumes significantly less power, at the cost of lower frequency and thus reduced performance, as well as increased error rates. In this paper, we investigate whether a low-power system-on-chip based on ARM's asymmetric big.LITTLE technology can be an alternative to conventional high-performance multicore processors in terms of power/energy in an unreliable scenario. For our study, we use the Conjugate Gradient solver, an algorithm representative of the computations performed by a large range of scientific and engineering codes.
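For reference, the solver in question is the classical (unpreconditioned) Conjugate Gradient iteration; a textbook sketch follows, which also shows why it is a good stress test: a single corrupted dot product or matrix-vector product propagates into every subsequent search direction.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Textbook CG for symmetric positive-definite A."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                    # initial residual
    p = r.copy()                     # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p    # conjugate direction update
        rs = rs_new
    return x
```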
Abstract:
Suitably functionalised carboxylic acids undergo a previously unknown photoredox reaction when irradiated with UVA light in the presence of maleimide. Maleimide was found to act synergistically as a radical-generating photooxidant and as a radical acceptor, negating the need for an extrinsic photoredox catalyst. Modest to excellent yields of the product chromenopyrroledione, thiochromenopyrroledione and pyrroloquinolinedione derivatives were obtained in thirteen preparative photolyses. In situ NMR spectroscopy was used to study each reaction. Reactant decay and product build-up were monitored, enabling reaction profiles to be plotted. A plausible mechanism, whereby photo-excited maleimide acts as an oxidant to generate a radical ion pair, has been postulated and is supported by UV/Vis spectroscopy and DFT computations. The radical-cation reactive intermediates were also characterised in solution by EPR spectroscopy.
Abstract:
A number of neural networks can be formulated as linear-in-the-parameters models. Training such networks can then be transformed into a model selection problem, where a compact model is selected from all candidates using subset selection algorithms. Forward selection methods are popular, fast subset selection approaches; however, they may produce only suboptimal models and can become trapped in a local minimum. More recently, a two-stage fast recursive algorithm (TSFRA) combining forward selection and backward model refinement has been proposed to improve the compactness and generalization performance of the model. This paper proposes unified two-stage orthogonal least squares methods instead of the fast-recursive-based methods. In contrast to the TSFRA, this paper derives a new, simplified relationship between the forward and backward stages to avoid repetitive computations, using the inherent orthogonal properties of the least squares methods. Furthermore, a new term-exchanging scheme for backward model refinement is introduced to reduce the computational demand. Finally, given the error reduction ratio criterion, effective and efficient forward and backward subset selection procedures are proposed. Extensive examples are presented to demonstrate the improved model compactness achieved by the proposed technique in comparison with some popular methods.
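To fix ideas, a sketch of the classical forward OLS stage driven by the error reduction ratio is shown below (textbook version only; the paper's contribution is the simplified forward/backward relationship and the term-exchange refinement, which are not reproduced here):

```python
import numpy as np

def forward_ols(P, y, n_terms):
    """Classical forward selection by error reduction ratio (ERR):
    greedily pick the candidate regressor whose component, orthogonalised
    against the terms already chosen, explains most of the output energy."""
    selected, Q = [], []
    for _ in range(n_terms):
        best_j, best_err, best_w = None, -1.0, None
        for j in range(P.shape[1]):
            if j in selected:
                continue
            w = P[:, j].astype(float).copy()
            for q in Q:                       # Gram-Schmidt step
                w -= (q @ P[:, j]) / (q @ q) * q
            if w @ w < 1e-12:                 # numerically dependent column
                continue
            err = (w @ y) ** 2 / ((w @ w) * (y @ y))
            if err > best_err:
                best_j, best_err, best_w = j, err, w
        selected.append(best_j)
        Q.append(best_w)
    return selected
```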
Abstract:
We show that the set of Schur idempotents with hyperreflexive range is a Boolean lattice which contains all contractions. We establish a preservation result for sums which implies that the weak* closed span of a hyperreflexive and a ternary masa-bimodule is hyperreflexive, and prove that the weak* closed span of finitely many tensor products of a hyperreflexive space and a hyperreflexive range of a Schur idempotent (respectively, a ternary masa-bimodule) is hyperreflexive.
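Recall (standard definition, stated here only for context) that a weak* closed subspace S of B(H) is hyperreflexive when the distance to it is controlled by the rank-one functionals in its preannihilator:

```latex
\[
  d(T,\mathcal{S}) \;\le\; C \,
  \sup\bigl\{\, |\langle T\xi,\eta\rangle| \;:\;
    \|\xi\|\le 1,\ \|\eta\|\le 1,\
    \langle S\xi,\eta\rangle = 0 \ \text{for all } S\in\mathcal{S} \,\bigr\}
\]
```

for some constant C and all operators T.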
Abstract:
Molecular information gathering and processing, a young field of applied chemistry, is undergoing healthy growth. Progress is occurring both in terms of conceptual development and in terms of the strengthening of older concepts with new examples. This review critically surveys these two broad avenues. We consider some cases where molecules emulate one of the building blocks of electronic logic gates. We then examine molecular emulation of various Boolean logic gates carrying one, two or three inputs. Some single-input gates are popular information-gathering devices. Special systems, such as the 'lab-on-a-molecule' and molecular keypad locks, also receive attention. A situation deviating from the Boolean blueprint is also discussed. Some pointers are offered for maintaining the upward trajectory of the field.
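As a concrete instance of the Boolean emulation surveyed here, a two-input molecular AND gate (in the style of fluorescent PET sensors switched by H+ and Na+ binding) realises the truth table below; the sketch is illustrative only, not a model of any specific system in the review:

```python
# Output is "high" fluorescence only when both chemical inputs are bound.
def molecular_and_gate(h_plus_bound, na_plus_bound):
    return "fluorescence ON" if (h_plus_bound and na_plus_bound) else "OFF"

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, "->", molecular_and_gate(*inputs))
```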
Abstract:
An adhesive elasto-plastic contact model for the discrete element method with three-dimensional non-spherical particles is proposed and investigated to achieve quantitative prediction of cohesive powder flowability. Simulations have been performed for uniaxial consolidation followed by unconfined compression to failure using this model. The model is shown to be capable of predicting the experimental flow function (unconfined compressive strength vs. prior consolidation stress) for a limestone powder, which was selected as a reference solid in the Europe-wide PARDEM research network. Contact plasticity in the model is shown to affect flowability significantly and is thus essential for producing satisfactory computations of the behaviour of a cohesive granular material. The model predicts a linear relationship between the normalized unconfined compressive strength and the product of the coordination number and solid fraction, in line with the Rumpf model for the tensile strength of particulate agglomerates. Even when the contact adhesion is forced to remain constant, the increasing unconfined strength arising from stress consolidation is still predicted; this has its origin in the contact plasticity leading to microstructural evolution of the coordination number. The filled porosity is predicted to increase as the contact adhesion increases. Under confined compression, the porosity reduces more gradually for load-dependent adhesion than for constant adhesion. The contribution of the adhesive force to the limiting friction was found to have a significant effect on the bulk unconfined strength. The results provide new insights and propose a micromechanics-based measure for characterising the strength and flowability of cohesive granular materials.
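For context, the Rumpf model referred to is commonly written (for an agglomerate of monosized spheres; notation assumed) as

```latex
\[
  \sigma_t \;=\; \frac{1-\varepsilon}{\pi}\,\frac{Z\,F}{d^{2}},
\]
```

where σ_t is the tensile strength, ε the porosity (so 1−ε is the solid fraction), Z the coordination number, F the inter-particle bond force, and d the particle diameter; hence the predicted linearity in the product (1−ε)Z.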