955 results for Boolean Computations
Abstract:
In this article, we analyse several discontinuous Galerkin (DG) methods for the Stokes problem under minimal regularity on the solution. We assume that the velocity $u$ belongs to $[H_0^1(\Omega)]^d$ and the pressure $p$ belongs to $L_0^2(\Omega)$. First, we analyse standard DG methods assuming that the right-hand side $f$ belongs to $[H^{-1}(\Omega) \cap L^1(\Omega)]^d$. A DG method that is well defined for $f$ belonging to $[H^{-1}(\Omega)]^d$ is then investigated. The methods under study include stabilized DG methods using equal-order spaces and inf-sup stable ones where the pressure space is one polynomial degree less than the velocity space.
Abstract:
In this paper, a numerical investigation is performed to study the mixed convective flow and heat transfer characteristics past a square cylinder in cross flow at incidence. Using air (Pr = 0.71) as the working fluid, computations are carried out at a representative Reynolds number (Re) of 100. The angle of incidence is varied over $0^\circ \le \alpha \le 45^\circ$. The effect of superimposed positive and negative cross-flow buoyancy is introduced by varying the Richardson number (Ri) in the range $-1.0 \le Ri \le 1.0$. The detailed features of flow topology and heat transport are analyzed critically for different angles of incidence. The thermofluidic forces acting on the cylinder during mixed convection are captured in terms of the drag ($C_D$), lift ($C_L$), and moment ($C_M$) coefficients. The results show that the lateral width of the cylinder wake reduces with increasing $\alpha$ and the isotherms spread out widely. In the range $0^\circ < \alpha < 45^\circ$, $C_D$ reduces with increasing Ri. The functional dependence of $C_M$ on Ri is linear. The thermal boundary layer thickness reduces with increasing angle of incidence. The global rate of heat transfer from the cylinder increases with increasing $\alpha$. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
H.264/advanced video coding surveillance video encoders use the Skip mode specified by the standard to reduce bandwidth. They also use multiple frames as reference for motion-compensated prediction. In this paper, we propose two techniques to reduce the bandwidth and computational cost of static camera surveillance video encoders without affecting detection and recognition performance. A spatial sampler is proposed to sample pixels that are segmented using a Gaussian mixture model. Modified weight updates are derived for the parameters of the mixture model to reduce floating point computations. The storage pattern of the parameters in memory is also modified to improve cache performance. Skip selection is performed using the segmentation results of the sampled pixels. The second contribution is a low computational cost algorithm to choose the reference frames. The proposed reference frame selection algorithm reduces the cost of coding uncovered background regions. We also study the number of reference frames required to achieve good coding efficiency. Distortion over foreground pixels is measured to quantify the performance of the proposed techniques. Experimental results show bit rate savings of up to 94.5% over methods proposed in the literature on video surveillance data sets. The proposed techniques also provide up to 74.5% reduction in compression complexity without increasing the distortion over the foreground regions in the video sequence.
Abstract:
In this paper we present a framework for realizing arbitrary instruction set extensions (IE) that are identified post-silicon. The proposed framework has two components, namely an IE synthesis methodology and the architecture of a reconfigurable data-path for realization of such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or those that must process data streams. The reconfigurable hardware called HyperCell comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general purpose processor based solutions, by fully pipelining the data-path. (C) 2014 Elsevier B.V. All rights reserved.
Abstract:
Single fluid schemes that rely on an interface function for phase identification in multicomponent compressible flows are widely used to study hydrodynamic flow phenomena in several diverse applications. Simulations based on standard numerical implementation of these schemes suffer from an artificial increase in the width of the interface function owing to the numerical dissipation introduced by an upwind discretization of the governing equations. In addition, monotonicity requirements which ensure that the sharp interface function remains bounded at all times necessitate use of low-order accurate discretization strategies. This results in a significant reduction in accuracy along with a loss of intricate flow features. In this paper we develop a nonlinear transformation based interface capturing method which achieves superior accuracy without compromising the simplicity, computational efficiency and robustness of the original flow solver. A nonlinear map from the signed distance function to the sigmoid type interface function is used to effectively couple a standard single fluid shock and interface capturing scheme with a high-order accurate constrained level set reinitialization method in a way that allows for oscillation-free transport of the sharp material interface. Imposition of a maximum principle, which ensures that the multidimensional preconditioned interface capturing method does not produce new maxima or minima even in the extreme events of interface merger or breakup, allows for an explicit determination of the interface thickness in terms of the grid spacing. A narrow band method is formulated in order to localize computations pertinent to the preconditioned interface capturing method. Numerical tests in one dimension reveal a significant improvement in accuracy and convergence; in stark contrast to the conventional scheme, the proposed method retains its accuracy and convergence characteristics in a shifted reference frame. 
Results from the test cases in two dimensions show that the nonlinear transformation based interface capturing method outperforms both the conventional method and an interface capturing method without nonlinear transformation in resolving intricate flow features such as sheet jetting in the shock-induced cavity collapse. The ability of the proposed method to account for gravitational and surface tension forces in addition to compressibility is demonstrated through a fully three-dimensional model problem concerning droplet splash and formation of a crown-like feature. (C) 2014 Elsevier Inc. All rights reserved.
Abstract:
In today's API-rich world, programmer productivity depends heavily on the programmer's ability to discover the required APIs. In this paper, we present a technique and tool, called MATHFINDER, to discover APIs for mathematical computations by mining unit tests of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code to compute the expression by mapping its subexpressions to API method calls. For each subexpression, MATHFINDER searches for a method such that there is a mapping between method inputs and variables of the subexpression. The subexpression, when evaluated on the test inputs of the method under this mapping, should produce results that match the method output on a large number of tests. We implemented MATHFINDER as an Eclipse plugin for discovery of third-party Java APIs and performed a user study to evaluate its effectiveness. In the study, the use of MATHFINDER resulted in a 2x improvement in programmer productivity. In 96% of the subexpressions queried for in the study, MATHFINDER retrieved the desired API methods as the top-most result. The top-most pseudo-code snippet to implement the entire expression was correct in 93% of the cases. Since the number of methods and unit tests to mine could be large in practice, we also implement MATHFINDER in a MapReduce framework and evaluate its scalability and response time.
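The matching step described in this abstract can be illustrated with a minimal sketch. It is not the MATHFINDER implementation: the function name `mine_method`, the unit-test table, and the 90% agreement threshold are all hypothetical choices made for illustration. The sketch tries every assignment of method inputs to subexpression variables and keeps the methods whose recorded test outputs agree with the subexpression on most tests.

```python
from itertools import permutations


def mine_method(subexpr, variables, unit_tests, min_match=0.9, tol=1e-9):
    """Rank (method, input mapping) pairs by agreement with recorded unit tests.

    subexpr    -- callable taking keyword arguments named after `variables`
    unit_tests -- dict: method name -> list of (inputs_tuple, expected_output)
    All names and the threshold are illustrative assumptions.
    """
    ranked = []
    for name, tests in unit_tests.items():
        if not tests or len(tests[0][0]) != len(variables):
            continue  # arity mismatch: this method cannot implement the subexpression
        for perm in permutations(range(len(variables))):
            hits = 0
            for inputs, expected in tests:
                # Bind each variable to one input position of the method.
                env = {var: inputs[pos] for var, pos in zip(variables, perm)}
                try:
                    value = subexpr(**env)
                except (ArithmeticError, ValueError):
                    continue  # expression undefined on this test input
                if abs(value - expected) <= tol:
                    hits += 1
            score = hits / len(tests)
            if score >= min_match:
                ranked.append((score, name, dict(zip(variables, perm))))
    ranked.sort(key=lambda entry: -entry[0])
    return [(name, mapping, score) for score, name, mapping in ranked]


# Hypothetical unit tests harvested from two API methods.
tests = {
    "hypot": [((3.0, 4.0), 5.0), ((5.0, 12.0), 13.0)],
    "pow": [((2.0, 3.0), 8.0), ((3.0, 2.0), 9.0)],
}
result = mine_method(lambda x, y: x ** y, ["x", "y"], tests)
print(result)
```

Here the query `x ** y` matches only the hypothetical `pow` method, with `x` bound to the first input and `y` to the second.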
Abstract:
Social insects provide an excellent platform to investigate flow of information in regulatory systems since their successful social organization is essentially achieved by effective information transfer through complex connectivity patterns among the colony members. Network representation of such behavioural interactions offers a powerful tool for structural as well as dynamical analysis of the underlying regulatory systems. In this paper, we focus on the dominance interaction networks in the tropical social wasp Ropalidia marginata, a species where behavioural observations indicate that such interactions are principally responsible for the transfer of information between individuals about their colony needs, resulting in a regulation of their own activities. Our research reveals that the dominance networks of R. marginata are structurally similar to a class of naturally evolved information processing networks, a fact confirmed also by the predominance of a specific substructure, the 'feed-forward loop', which is a key functional component in many other information transfer networks. The dynamical analysis through Boolean modelling confirms that the networks are sufficiently stable under small fluctuations and yet capable of more efficient information transfer compared to their randomized counterparts. Our results suggest the involvement of a common structural design principle in different biological regulatory systems and a possible similarity with respect to the effect of selection on the organization levels of such systems. The findings are also consistent with the hypothesis that dominance behaviour has been shaped by natural selection to co-opt the information transfer process in such social insect species, in addition to its primal function of mediation of reproductive competition in the colony.
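Two ingredients named in this abstract, the feed-forward loop and Boolean dynamics on a directed network, can be sketched as follows. The abstract does not state the update rule used by the authors, so the OR rule below is an assumption chosen only for illustration, as are the function names and the three-node example network.

```python
def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    count = 0
    for a, targets in succ.items():
        for b in targets:
            if b == a:
                continue  # ignore self-loops
            for c in succ.get(b, ()):
                if c != a and c != b and c in targets:
                    count += 1
    return count


def step(state, edges):
    """One synchronous Boolean update.

    Assumed rule (not taken from the paper): a node becomes active if any of
    its in-neighbours is active; nodes without inputs keep their state.
    """
    preds = {}
    for a, b in edges:
        preds.setdefault(b, []).append(a)
    return {
        node: any(state[p] for p in preds[node]) if node in preds else value
        for node, value in state.items()
    }


# Hypothetical three-node network forming a single feed-forward loop.
ffl_edges = [(1, 2), (2, 3), (1, 3)]
ffl_count = count_feed_forward_loops(ffl_edges)
next_state = step({1: True, 2: False, 3: False}, ffl_edges)
print(ffl_count, next_state)
```

Stability under small fluctuations could then be probed by flipping one node in the initial state and comparing the two trajectories after a few calls to `step`.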
Abstract:
Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs both during an initial coding phase and during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code composed of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform an extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library. 
Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.
Abstract:
For a domain $\Omega$ in $\mathbb{C}$ and an operator $T$ in $B_n(\Omega)$, Cowen and Douglas construct a Hermitian holomorphic vector bundle $E_T$ over $\Omega$ corresponding to $T$. The Hermitian holomorphic vector bundle $E_T$ is obtained as a pull-back of the tautological bundle $S(n, H)$ defined over $\mathrm{Gr}(n, H)$ by a nondegenerate holomorphic map $z \mapsto \ker(T - z)$, $z \in \Omega$. To find the answer to the converse, Cowen and Douglas studied the jet bundle in their foundational paper. The computations in this paper for the curvature of the jet bundle are rather intricate. They have given a set of invariants to determine if two rank $n$ Hermitian holomorphic vector bundles are equivalent. These invariants are complicated and not easy to compute. It is natural to expect that the equivalence of Hermitian holomorphic jet bundles should be easier to characterize. In fact, in the case of the Hermitian holomorphic jet bundle $J^k(L_f)$, we have shown that the curvature of the line bundle $L_f$ completely determines the class of $J^k(L_f)$. In the case of the rank Hermitian holomorphic vector bundle $E_f$, we have calculated the curvature of the jet bundle $J^k(E_f)$ and also obtained a trace formula for the jet bundle $J^k(E_f)$.
Abstract:
Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviate the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.
Abstract:
In this paper, we study codes with locality that can recover from two erasures via a sequence of two local parity-check computations. By a local parity-check computation, we mean recovery via a single parity-check equation of small Hamming weight. Earlier approaches considered recovery in parallel; the sequential approach allows us to potentially construct codes with improved minimum distance. These codes, which we refer to as locally 2-reconstructible codes, are a natural generalization, along one direction, of the codes with all-symbol locality introduced by Gopalan et al., in which recovery from a single erasure is considered. By studying the generalized Hamming weights of the dual code, we derive upper bounds on the minimum distance of locally 2-reconstructible codes and provide constructions for a family of codes based on Turán graphs that are optimal with respect to this bound. The minimum distance bound derived here is universal in the sense that no code which permits all-symbol local recovery from 2 erasures can have larger minimum distance, regardless of the approach adopted. Our approach also leads to a new bound on the minimum distance of codes with all-symbol locality for the single-erasure case.
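The difference between parallel and sequential recovery can be seen in a small binary example. The sketch below is only an illustration under assumed data: the five-symbol code, its two weight-3 parity checks, and the function name `recover_sequentially` are hypothetical and are not one of the constructions from the paper. When symbols 1 and 2 are both erased, the first check contains two erasures and cannot be used directly, but the second check restores symbol 2, after which the first check restores symbol 1.

```python
def recover_sequentially(word, checks):
    """Fill erased symbols (None) using parity checks one erasure at a time.

    word   -- list of 0/1 values, with None marking an erasure
    checks -- list of index tuples; the XOR of the symbols in each tuple is 0
    """
    word = list(word)
    progress = True
    while progress and None in word:
        progress = False
        for check in checks:
            erased = [i for i in check if word[i] is None]
            if len(erased) == 1:
                # Exactly one unknown in this check: it equals the XOR of the rest.
                parity = 0
                for i in check:
                    if word[i] is not None:
                        parity ^= word[i]
                word[erased[0]] = parity
                progress = True
    return word


# Hypothetical code: c0 ^ c1 ^ c2 = 0 and c2 ^ c3 ^ c4 = 0.
checks = [(0, 1, 2), (2, 3, 4)]
codeword = [1, 0, 1, 1, 0]
received = [1, None, None, 1, 0]  # symbols 1 and 2 erased
recovered = recover_sequentially(received, checks)
print(recovered)
```

If no check ever contains exactly one erasure, the loop stops and the remaining erasures stay marked as `None`.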
Abstract:
The entropy generation due to mixed convective heat transfer of nanofluids past a rotating circular cylinder placed in a uniform cross stream is investigated via a streamline upwind Petrov-Galerkin based finite element method. Nanosized copper (Cu) particles suspended in water are used with Prandtl number (Pr) = 6.9. The computations are carried out at a representative Reynolds number (Re) of 100. The dimensionless cylinder rotation rate, $\alpha$, is varied between 0 and 2. The range of nanoparticle volume fractions ($\phi$) considered is $0 \le \phi \le 5\%$. The effect of aiding buoyancy is introduced by considering two fixed values of the Richardson number (Ri), 0.5 and 1.0. A new model for predicting the effective viscosity and thermal conductivity of dilute suspensions of nanoscale colloidal particles is presented. The model addresses the details of the agglomeration-deagglomeration in tune with the pertinent variations in the effective particulate dimensions, volume fractions, as well as the aggregate structure of the particulate system. The total entropy generation is found to decrease sharply with cylinder rotation rates and nanoparticle volume fractions. An increase in nanoparticle agglomeration decreases the heat transfer irreversibility. The Bejan number falls sharply with increase in $\alpha$ and $\phi$.
Abstract:
The three-dimensional diffuse optical tomographic (3-D DOT) image reconstruction algorithm is computationally complex and requires extensive matrix computations, which hampers reconstruction in real time. In this paper, we present near real-time 3-D DOT image reconstruction based on the Broyden approach for updating the Jacobian matrix. The Broyden method simplifies the algorithm by avoiding re-computation of the Jacobian matrix in each iteration. We have developed CPU and heterogeneous CPU/GPU code for 3-D DOT image reconstruction on the C and MATLAB programming platforms. We have used the Compute Unified Device Architecture (CUDA) programming framework and the CUDA linear algebra library (CULA) to utilize the massively parallel computational power of GPUs (NVIDIA Tesla K20c). The computation time achieved by the C implementation on a CPU/GPU system, for measurements on 3 planes and an FEM mesh of 19172 tetrahedral elements, is 806 milliseconds per iteration.
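The idea of updating the Jacobian instead of recomputing it can be sketched with the standard rank-one Broyden formula, $J_{k+1} = J_k + (\Delta y - J_k \Delta x)\,\Delta x^T / (\Delta x^T \Delta x)$. The abstract does not say which Broyden variant the authors use, so this choice is an assumption, and the function name and the 2x2 data are hypothetical. Plain Python lists are used here instead of a GPU library to keep the example self-contained.

```python
def broyden_update(J, dx, dy):
    """Rank-one Broyden update of a Jacobian estimate.

    J  -- current Jacobian estimate as a list of rows
    dx -- change in the parameters between two iterations
    dy -- observed change in the model output for that step
    Returns a new matrix satisfying the secant condition J_new * dx = dy.
    """
    denom = sum(d * d for d in dx)
    if denom == 0.0:
        return [row[:] for row in J]  # no step taken, nothing to learn
    J_dx = [sum(a * d for a, d in zip(row, dx)) for row in J]
    residual = [y - p for y, p in zip(dy, J_dx)]
    return [
        [a + r * d / denom for a, d in zip(row, dx)]
        for row, r in zip(J, residual)
    ]


# Hypothetical 2x2 example starting from the identity matrix.
J0 = [[1.0, 0.0], [0.0, 1.0]]
dx = [1.0, 2.0]
dy = [3.0, 4.0]
J1 = broyden_update(J0, dx, dy)
# Check the secant condition: J1 applied to dx should reproduce dy.
check = [sum(a * d for a, d in zip(row, dx)) for row in J1]
print(J1, check)
```

Each update costs one matrix-vector product and one outer product, which is far cheaper than re-assembling a full Jacobian from the forward model.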
Abstract:
Block-structured adaptive mesh refinement (AMR) has been used to obtain numerical solutions for many scientific applications. Some block-structured AMR approaches have focused on forming patches of non-uniform sizes where the size of a patch can be tuned to the geometry of a region of interest. In this paper, we develop strategies for adaptive execution of block-structured AMR applications on GPUs, for hyperbolic directionally split solvers. While effective hybrid execution strategies exist for applications with uniform patches, our work considers efficient execution of non-uniform patches with different workloads. Our techniques include bin-packing work units to load balance GPU computations, adaptive asynchronism between CPU and GPU executions using a knapsack formulation, and scheduling communications for multi-GPU executions. Our experiments with synthetic and real data, for single-GPU and multi-GPU executions, on Tesla S1070 and Fermi C2070 clusters, show that our strategies result in a speedup of up to 3.23 over existing strategies.
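Load balancing of non-uniform patches by bin packing can be illustrated with a common greedy heuristic: sort the work units by decreasing cost and always place the next one in the currently lightest bin. This is a generic sketch and not the strategy from the paper; the function name `balance_patches`, the cost values, and the notion of a "bin" as one GPU work batch are assumptions for illustration.

```python
import heapq


def balance_patches(workloads, num_bins):
    """Greedy longest-first assignment of patches to bins.

    workloads -- list of per-patch cost estimates (index = patch id)
    num_bins  -- number of bins, e.g. GPU batches (hypothetical interpretation)
    Returns a list of (patch_ids, total_load), one entry per bin.
    """
    # Heap entries are (load, bin index, members); the unique bin index
    # breaks ties, so the member lists are never compared.
    heap = [(0.0, b, []) for b in range(num_bins)]
    heapq.heapify(heap)
    order = sorted(enumerate(workloads), key=lambda item: -item[1])
    for patch_id, cost in order:
        load, b, members = heapq.heappop(heap)  # lightest bin so far
        members.append(patch_id)
        heapq.heappush(heap, (load + cost, b, members))
    return [(members, load) for load, b, members in sorted(heap, key=lambda e: e[1])]


# Hypothetical patch costs for five non-uniform patches and two bins.
assignment = balance_patches([8, 7, 6, 5, 4], 2)
print(assignment)
```

The heuristic is not optimal (here the loads end up as 17 and 13, while 15 and 15 is achievable), but it is simple, runs in O(n log n), and is a common baseline for this kind of scheduling.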
Abstract:
We investigated the nature of the cohesive energy between graphane sheets via multiple CH···HC interactions, using density functional theory (DFT) computations, including dispersion correction (Grimme's D3 approach), of [n]graphane $\sigma$ dimers (n = 6-73). For comparison, we also evaluated the binding between graphene sheets that display prototypical $\pi/\pi$ interactions. The results were analyzed using the block-localized wave function (BLW) method, which is a variant of ab initio valence bond (VB) theory. BLW interprets the intermolecular interactions in terms of frozen interaction energy ($\Delta E_F$) composed of electrostatic and Pauli repulsion interactions, polarization ($\Delta E_{pol}$), charge-transfer interaction ($\Delta E_{CT}$), and dispersion effects ($\Delta E_{disp}$). The BLW analysis reveals that the cohesive energy between graphane sheets is dominated by two stabilizing effects, namely intermolecular London dispersion and two-way charge transfer energy due to the $\sigma_{\mathrm{CH}} \to \sigma^{*}_{\mathrm{HC}}$ interactions. The shift of the electron density around the nonpolar covalent C-H bonds involved in the intermolecular interaction decreases the C-H bond lengths uniformly by 0.001 Å. The $\Delta E_{CT}$ term, which accounts for about 15% of the total binding energy, results in the accumulation of electron density in the interface area between two layers. This accumulated electron density thus acts as an electronic glue for the graphane layers and constitutes an important driving force in the self-association and stability of graphane under ambient conditions. Similarly, the double-faced adhesive tape style of charge transfer interactions was also observed among graphene sheets, in which it accounts for about 18% of the total binding energy. The binding energy between graphane sheets is additive and can be expressed as a sum of CH···HC interactions, or as a function of the number of C-H bonds.