964 resultados para Computer input-outpus equipment.
Resumo:
CAELinux is a Linux distribution which is bundled with free software packages related to Computer Aided Engineering (CAE). The free software packages include software that can build a three dimensional solid model, programs that can mesh a geometry, software for carrying out Finite Element Analysis (FEA), programs that can carry out image processing etc. Present work has two goals: 1) To give a brief description of CAELinux 2) To demonstrate that CAELinux could be useful for Computer Aided Engineering, using an example of the three dimensional reconstruction of a pig liver from a stack of CT-scan images. One can note that instead of using CAELinux, using commercial software for reconstructing the liver would cost a lot of money. One can also note that CAELinux is a free and open source operating system and all software packages that are included in the operating system are also free. Hence one can conclude that CAELinux could be a very useful tool in application areas like surgical simulation which require three dimensional reconstructions of biological organs. Also, one can see that CAELinux could be a very useful tool for Computer Aided Engineering, in general.
Resumo:
We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in R-d, compute the aggregate of the weights of the points that lie inside a d-dimensional orthogonal query rectangle. The aggregates we consider in this paper include COUNT, sum, and MAX. First, we develop a structure for answering two-dimensional range-COUNT queries that uses O(N/B) disk blocks and answers a query in O(log(B) N) I/Os, where N is the number of input points and B is the disk block size. The structure can be extended to obtain a near-linear-size structure for answering range-sum queries using O(log(B) N) I/Os, and a linear-size structure for answering range-MAX queries in O(log(B)(2) N) I/Os. Our structures can be made dynamic and extended to higher dimensions. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
The Reeb graph of a scalar function tracks the evolution of the topology of its level sets. This paper describes a fast algorithm to compute the Reeb graph of a piecewise-linear (PL) function defined over manifolds and non-manifolds. The key idea in the proposed approach is to maximally leverage the efficient contour tree algorithm to compute the Reeb graph. The algorithm proceeds by dividing the input into a set of subvolumes that have loop-free Reeb graphs using the join tree of the scalar function and computes the Reeb graph by combining the contour trees of all the subvolumes. Since the key ingredient of this method is a series of union-find operations, the algorithm is fast in practice. Experimental results demonstrate that it outperforms current generic algorithms by a factor of up to two orders of magnitude, and has a performance on par with algorithms that are catered to restricted classes of input. The algorithm also extends to handle large data that do not fit in memory.
Resumo:
A unit cube in (or a k-cube in short) is defined as the Cartesian product R (1) x R (2) x ... x R (k) where R (i) (for 1 a parts per thousand currency sign i a parts per thousand currency sign k) is a closed interval of the form a (i) , a (i) + 1] on the real line. A k-cube representation of a graph G is a mapping of the vertices of G to k-cubes such that two vertices in G are adjacent if and only if their corresponding k-cubes have a non-empty intersection. The cubicity of G is the minimum k such that G has a k-cube representation. From a geometric embedding point of view, a k-cube representation of G = (V, E) yields an embedding such that for any two vertices u and v, ||f(u) - f(v)||(a) a parts per thousand currency sign 1 if and only if . We first present a randomized algorithm that constructs the cube representation of any graph on n vertices with maximum degree Delta in O(Delta ln n) dimensions. This algorithm is then derandomized to obtain a polynomial time deterministic algorithm that also produces the cube representation of the input graph in the same number of dimensions. The bandwidth ordering of the graph is studied next and it is shown that our algorithm can be improved to produce a cube representation of the input graph G in O(Delta ln b) dimensions, where b is the bandwidth of G, given a bandwidth ordering of G. Note that b a parts per thousand currency sign n and b is much smaller than n for many well-known graph classes. Another upper bound of b + 1 on the cubicity of any graph with bandwidth b is also shown. Together, these results imply that for any graph G with maximum degree Delta and bandwidth b, the cubicity is O(min{b, Delta ln b}). The upper bound of b + 1 is used to derive upper bounds for the cubicity of circular-arc graphs, cocomparability graphs and AT-free graphs in terms of the maximum degree Delta.
Resumo:
Most of the existing WCET estimation methods directly estimate execution time, ET, in cycles. We propose to study ET as a product of two factors, ET = IC * CPI, where IC is instruction count and CPI is cycles per instruction. Considering directly the estimation of ET may lead to a highly pessimistic estimate since implicitly these methods may be using worst case IC and worst case CPI. We hypothesize that there exists a functional relationship between CPI and IC such that CPI=f(IC). This is ascertained by computing the covariance matrix and studying the scatter plots of CPI versus IC. IC and CPI values are obtained by running benchmarks with a large number of inputs using the cycle accurate architectural simulator, Simplescalar on two different architectures. It is shown that the benchmarks can be grouped into different classes based on the CPI versus IC relationship. For some benchmarks like FFT, FIR etc., both IC and CPI are almost a constant irrespective of the input. There are other benchmarks that exhibit a direct or an inverse relationship between CPI and IC. In such a case, one can predict CPI for a given IC as CPI=f(IC). We derive the theoretical worst case IC for a program, denoted as SWIC, using integer linear programming(ILP) and estimate WCET as SWIC*f(SWIC). However, if CPI decreases sharply with IC then measured maximum cycles is observed to be a better estimate. For certain other benchmarks, it is observed that the CPI versus IC relationship is either random or CPI remains constant with varying IC. In such cases, WCET is estimated as the product of SWIC and measured maximum CPI. It is observed that use of the proposed method results in tighter WCET estimates than Chronos, a static WCET analyzer, for most benchmarks for the two architectures considered in this paper.
Resumo:
In this paper, we investigate the achievable rate region of Gaussian multiple access channels (MAC) with finite input alphabet and quantized output. With finite input alphabet and an unquantized receiver, the two-user Gaussian MAC rate region was studied. In most high throughput communication systems based on digital signal processing, the analog received signal is quantized using a low precision quantizer. In this paper, we first derive the expressions for the achievable rate region of a two-user Gaussian MAC with finite input alphabet and quantized output. We show that, with finite input alphabet, the achievable rate region with the commonly used uniform receiver quantizer has a significant loss in the rate region compared. It is observed that this degradation is due to the fact that the received analog signal is densely distributed around the origin, and is therefore not efficiently quantized with a uniform quantizer which has equally spaced quantization intervals. It is also observed that the density of the received analog signal around the origin increases with increasing number of users. Hence, the loss in the achievable rate region due to uniform receiver quantization is expected to increase with increasing number of users. We, therefore, propose a novel non-uniform quantizer with finely spaced quantization intervals near the origin. For a two-user Gaussian MAC with a given finite input alphabet and low precision receiver quantization, we show that the proposed non-uniform quantizer has a significantly larger rate region compared to what is achieved with a uniform quantizer.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
Constellation Constrained (CC) capacity regions of two-user Gaussian Multiple Access Channels (GMAC) have been recently reported, wherein an appropriate angle of rotation between the constellations of the two users is shown to enlarge the CC capacity region. We refer to such a scheme as the Constellation Rotation (CR) scheme. In this paper, we propose a novel scheme called the Constellation Power Allocation (CPA) scheme, wherein the instantaneous transmit power of the two users are varied by maintaining their average power constraints. We show that the CPA scheme offers CC sum capacities equal (at low SNR values) or close (at high SNR values) to those offered by the CR scheme with reduced decoding complexity for QAM constellations. We study the robustness of the CPA scheme for random phase offsets in the channel and unequal average power constraints for the two users. With random phase offsets in the channel, we show that the CC sum capacity offered by the CPA scheme is more than the CR scheme at high SNR values. With unequal average power constraints, we show that the CPA scheme provides maximum gain when the power levels are close, and the advantage diminishes with the increase in the power difference.
Resumo:
Fast and efficient channel estimation is key to achieving high data rate performance in mobile and vehicular communication systems, where the channel is fast time-varying. To this end, this work proposes and optimizes channel-dependent training schemes for reciprocal Multiple-Input Multiple-Output (MIMO) channels with beamforming (BF) at the transmitter and receiver. First, assuming that Channel State Information (CSI) is available at the receiver, a channel-dependent Reverse Channel Training (RCT) signal is proposed that enables efficient estimation of the BF vector at the transmitter with a minimum training duration of only one symbol. In contrast, conventional orthogonal training requires a minimum training duration equal to the number of receive antennas. A tight approximation to the capacity lower bound on the system is derived, which is used as a performance metric to optimize the parameters of the RCT. Next, assuming that CSI is available at the transmitter, a channel-dependent forward-link training signal is proposed and its power and duration are optimized with respect to an approximate capacity lower bound. Monte Carlo simulations illustrate the significant performance improvement offered by the proposed channel-dependent training schemes over the existing channel-agnostic orthogonal training schemes.
Resumo:
In this letter, we compute the secrecy rate of decode-and-forward (DF) relay beamforming with finite input alphabet of size M. Source and relays operate under a total power constraint. First, we observe that the secrecy rate with finite-alphabet input can go to zero as the total power increases, when we use the source power and the relay weights obtained assuming Gaussian input. This is because the capacity of an eavesdropper can approach the finite-alphabet capacity of 1/2 log(2) M with increasing total power, due to the inability to completely null in the direction of the eavesdropper. We then propose a transmit power control scheme where the optimum source power and relay weights are obtained by carrying out transmit power (source power plus relay power) control on DF with Gaussian input using semi-definite programming, and then obtaining the corresponding source power and relay weights which maximize the secrecy rate for DF with finite-alphabet input. The proposed power control scheme is shown to achieve increasing secrecy rates with increasing total power with a saturation behavior at high total powers.
Resumo:
The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds with a speedup factor of upto 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.
Resumo:
Electrical switching studies on amorphous Si15Te74Ge11 thin film devices show interesting changes in the switching behavior with changes in the input energy supplied; the input energy determines the extent of crystallization in the active volume, which is reflected in the value of SET resistances. This in turn, determines the trend exhibited by switching voltage (V-t) for different input conditions. The results obtained are analyzed on the basis of the amount of Joule heat generated, which determines the temperature of the active volume. Depending on the final temperature, devices are rendered either in the intermediate state with a resistance of 5*10(2) Omega or the ON state with a resistance of 5*10(1) Omega. (C) 2013 Elsevier B.V. All rights reserved.
Resumo:
The boxicity (cubicity) of a graph G, denoted by box(G) (respectively cub(G)), is the minimum integer k such that G can be represented as the intersection graph of axis parallel boxes (cubes) in ℝ k . The problem of computing boxicity (cubicity) is known to be inapproximable in polynomial time even for graph classes like bipartite, co-bipartite and split graphs, within an O(n 0.5 − ε ) factor for any ε > 0, unless NP = ZPP. We prove that if a graph G on n vertices has a clique on n − k vertices, then box(G) can be computed in time n22O(k2logk) . Using this fact, various FPT approximation algorithms for boxicity are derived. The parameter used is the vertex (or edge) edit distance of the input graph from certain graph families of bounded boxicity - like interval graphs and planar graphs. Using the same fact, we also derive an O(nloglogn√logn√) factor approximation algorithm for computing boxicity, which, to our knowledge, is the first o(n) factor approximation algorithm for the problem. We also present an FPT approximation algorithm for computing the cubicity of graphs, with vertex cover number as the parameter.
Resumo:
Topological methods have been successfully used to identify features in scalar fields and to measure their importance. In this paper, we define a notion of topological saliency that captures the relative importance of a topological feature with respect to other features in its local neighborhood. Features are identified by extreme points of an input scalar field, and their importance measured by the so-called topological persistence. Computing the topological saliency of all features for varying neighborhood sizes results in a saliency plot that serves as a summary of relative importance of all topological features. We develop a convenient tool for users to interactively select and inspect features using the saliency plot. We demonstrate the use of topological saliency together with the rich information encoded in the saliency plot in several applications, including key feature identification, scalar field simplification, and feature clustering. (C) 2013 Elsevier Ltd. All rights reserved.
Resumo:
In this paper, an input receiver with a hysteresis characteristic that can work at voltage levels between 0.9 V and 5 V is proposed. The input receiver can be used as a wide voltage range Schmitt trigger also. At the same time, reliable circuit operation is ensured. According to the research findings, this is the first time a wide voltage range Schmitt trigger is being reported. The proposed circuit is compared with previously reported input receivers, and it is shown that the circuit has better noise immunity. The proposed input receiver ends the need for a separate Schmitt trigger and input buffer. The frequency of operation is also higher than that of the previously reported receiver. The circuit is simulated using HSPICE at 035-mu m standard thin oxide technology. Monte Carlo analysis is conducted at different process conditions, showing that the proposed circuit works well for different process conditions at different voltage levels of operation. A noise impulse of (V-CC/2) magnitude is added to the input voltage to show that the receiver receives the correct logic level even in the presence of noise. Here, V-CC is the fixed voltage supply of 3.3 V.