171 resultados para Heterogeneous Regressions Algorithms
Resumo:
CMPs enable simultaneous execution of multiple applications on the same platforms that share cache resources. Diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the issues of larger power consumption with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. In this paper in order to address the issues relating to power-aware performance of caches, we propose a caching structure that addresses the following: 1. Definition of application-specific cache partitions as an aggregation of caching units (molecules). The parameters of each molecule namely size, associativity and line size are chosen so that the power consumed by it and access time are optimal for the given technology. 2. Application-Specific resizing of cache partitions with variable and adaptive associativity per cache line, way size and variable line size. 3. A replacement policy that is transparent to the partition in terms of size, heterogeneity in associativity and line size. Through simulation studies we establish the superiority of molecular cache (caches built as aggregations of molecules) that offers a 29% power advantage over that of an equivalently performing traditional cache.
Resumo:
Grover's database search algorithm, although discovered in the context of quantum computation, can be implemented using any physical system that allows superposition of states. A physical realization of this algorithm is described using coupled simple harmonic oscillators, which can be exactly solved in both classical and quantum domains. Classical wave algorithms are far more stable against decoherence compared to their quantum counterparts. In addition to providing convenient demonstration models, they may have a role in practical situations, such as catalysis.
Resumo:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires which leads to delay in execution and significantly high energy consumption.In this paper, we propose a new instruction scheduling algorithm that exploits scheduling slacks of instructions and communication slacks of data values together to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. Our instruction scheduling algorithm achieves 35% and 40% reduction in communication energy, whereas the overall energy-delay product improves by 4.5% and 6.5% respectively for 2 cluster and 4 cluster machines with marginal increase (1.6% and 1.1%) in execution time. Our test bed uses the Trimaran compiler infrastructure.
Resumo:
Four algorithms, all variants of Simultaneous Perturbation Stochastic Approximation (SPSA), are proposed. The original one-measurement SPSA uses an estimate of the gradient of objective function L containing an additional bias term not seen in two-measurement SPSA. As a result, the asymptotic covariance matrix of the iterate convergence process has a bias term. We propose a one-measurement algorithm that eliminates this bias, and has asymptotic convergence properties making for easier comparison with the two-measurement SPSA. The algorithm, under certain conditions, outperforms both forms of SPSA with the only overhead being the storage of a single measurement. We also propose a similar algorithm that uses perturbations obtained from normalized Hadamard matrices. The convergence w.p. 1 of both algorithms is established. We extend measurement reuse to design two second-order SPSA algorithms and sketch the convergence analysis. Finally, we present simulation results on an illustrative minimization problem.
Resumo:
Let G - (V, E) be a weighted undirected graph having nonnegative edge weights. An estimate (delta) over cap (u, v) of the actual distance d( u, v) between u, v is an element of V is said to be of stretch t if and only if delta(u, v) <= (delta) over cap (u, v) <= t . delta(u, v). Computing all-pairs small stretch distances efficiently ( both in terms of time and space) is a well-studied problem in graph algorithms. We present a simple, novel, and generic scheme for all-pairs approximate shortest paths. Using this scheme and some new ideas and tools, we design faster algorithms for all-pairs t-stretch distances for a whole range of stretch t, and we also answer an open question posed by Thorup and Zwick in their seminal paper [J. ACM, 52 (2005), pp. 1-24].
Resumo:
There are a number of large networks which occur in many problems dealing with the flow of power, communication signals, water, gas, transportable goods, etc. Both design and planning of these networks involve optimization problems. The first part of this paper introduces the common characteristics of a nonlinear network (the network may be linear, the objective function may be non linear, or both may be nonlinear). The second part develops a mathematical model trying to put together some important constraints based on the abstraction for a general network. The third part deals with solution procedures; it converts the network to a matrix based system of equations, gives the characteristics of the matrix and suggests two solution procedures, one of them being a new one. The fourth part handles spatially distributed networks and evolves a number of decomposition techniques so that we can solve the problem with the help of a distributed computer system. Algorithms for parallel processors and spatially distributed systems have been described.There are a number of common features that pertain to networks. A network consists of a set of nodes and arcs. In addition at every node, there is a possibility of an input (like power, water, message, goods etc) or an output or none. Normally, the network equations describe the flows amoungst nodes through the arcs. These network equations couple variables associated with nodes. Invariably, variables pertaining to arcs are constants; the result required will be flows through the arcs. To solve the normal base problem, we are given input flows at nodes, output flows at nodes and certain physical constraints on other variables at nodes and we should find out the flows through the network (variables at nodes will be referred to as across variables).The optimization problem involves in selecting inputs at nodes so as to optimise an objective function; the objective may be a cost function based on the inputs to be minimised or a loss function or an efficiency function. The above mathematical model can be solved using Lagrange Multiplier technique since the equalities are strong compared to inequalities. The Lagrange multiplier technique divides the solution procedure into two stages per iteration. Stage one calculates the problem variables % and stage two the multipliers lambda. It is shown that the Jacobian matrix used in stage one (for solving a nonlinear system of necessary conditions) occurs in the stage two also.A second solution procedure has also been imbedded into the first one. This is called total residue approach. It changes the equality constraints so that we can get faster convergence of the iterations.Both solution procedures are found to coverge in 3 to 7 iterations for a sample network.The availability of distributed computer systems — both LAN and WAN — suggest the need for algorithms to solve the optimization problems. Two types of algorithms have been proposed — one based on the physics of the network and the other on the property of the Jacobian matrix. Three algorithms have been deviced, one of them for the local area case. These algorithms are called as regional distributed algorithm, hierarchical regional distributed algorithm (both using the physics properties of the network), and locally distributed algorithm (a multiprocessor based approach with a local area network configuration). The approach used was to define an algorithm that is faster and uses minimum communications. These algorithms are found to converge at the same rate as the non distributed (unitary) case.
Resumo:
The k-colouring problem is to colour a given k-colourable graph with k colours. This problem is known to be NP-hard even for fixed k greater than or equal to 3. The best known polynomial time approximation algorithms require n(delta) (for a positive constant delta depending on k) colours to colour an arbitrary k-colourable n-vertex graph. The situation is entirely different if we look at the average performance of an algorithm rather than its worst-case performance. It is well known that a k-colourable graph drawn from certain classes of distributions can be ii-coloured almost surely in polynomial time. In this paper, we present further results in this direction. We consider k-colourable graphs drawn from the random model in which each allowed edge is chosen independently with probability p(n) after initially partitioning the vertex set into ii colour classes. We present polynomial time algorithms of two different types. The first type of algorithm always runs in polynomial time and succeeds almost surely. Algorithms of this type have been proposed before, but our algorithms have provably exponentially small failure probabilities. The second type of algorithm always succeeds and has polynomial running time on average. Such algorithms are more useful and more difficult to obtain than the first type of algorithms. Our algorithms work as long as p(n) greater than or equal to n(-1+is an element of) where is an element of is a constant greater than 1/4.
Resumo:
A new formulation is suggested for the fixed end-point regulator problem, which, in conjunction with the recently developed integration-free algorithms, provides an efficient means of obtaining numerical solutions to such problems.
Resumo:
The domination and Hamilton circuit problems are of interest both in algorithm design and complexity theory. The domination problem has applications in facility location and the Hamilton circuit problem has applications in routing problems in communications and operations research.The problem of deciding if G has a dominating set of cardinality at most k, and the problem of determining if G has a Hamilton circuit are NP-Complete. Polynomial time algorithms are, however, available for a large number of restricted classes. A motivation for the study of these algorithms is that they not only give insight into the characterization of these classes but also require a variety of algorithmic techniques and data structures. So the search for efficient algorithms, for these problems in many classes still continues.A class of perfect graphs which is practically important and mathematically interesting is the class of permutation graphs. The domination problem is polynomial time solvable on permutation graphs. Algorithms that are already available are of time complexity O(n2) or more, and space complexity O(n2) on these graphs. The Hamilton circuit problem is open for this class.We present a simple O(n) time and O(n) space algorithm for the domination problem on permutation graphs. Unlike the existing algorithms, we use the concept of geometric representation of permutation graphs. Further, exploiting this geometric notion, we develop an O(n2) time and O(n) space algorithm for the Hamilton circuit problem.
Resumo:
Three different types of consistencies, viz., semiweak, weak, and strong, of a read-only transaction in a schedule s of a set T of transactions are defined and these are compared with the existing notions of consistencies of a read-only transaction in a schedule. We present a technique that enables a user to control the consistency of a read-only transaction in heterogeneous locking protocols. Since the weak consistency of a read-only transaction improves concurrency in heterogeneous locking protocols, the users can help to improve concurrency in heterogeneous locking protocols by supplying the consistency requirements of read-only transactions. A heterogeneous locking protocol P' derived from a locking protocol P that uses exclusive mode locks only and ensures serializability need not be deadlock-free. We present a sufficient condition that ensures the deadlock-freeness of Pprime, when P is deadlock-free and all the read-only transactions in Pprime are two phase.
Resumo:
A hypomonotectic alloy of Al-4.5wt%Cd has been manufactured by melt spinning and the resulting microstructure examined by transmission electron microscopy. As-melt spun hypomonotectic Al-4.5wt%Cd consists of a homogeneous distribution of faceted 5 to 120 nm diameter cadmium particles embedded in a matrix of aluminium, formed during the monotectic solidification reaction. The cadmium particles exhibit an orientation relationship with the aluminium matrix of {111}Al//{0001}Cd and lang110rangAlAl//lang11¯20> Cd, with four cadmium particle variants depending upon which of the four {111}Al planes is parallel to {0001}Cd. The cadmium particles exibit a distorted cuboctahedral shape, bounded by six curved {100}Al//{20¯23}Cd facets, six curved {111}Al/{40¯43}Cd facets and two flat {111}Al//{0001}Cd facets. The as-melt spun cadmium particle shape is metastable and the cadmium particles equilibrate during heat treatment below the cadmium melting point, becoming elongated to increase the surface area and decrease the separation of the {111}Al//{0001}Cd facets. The equilibrium cadmium particle shape and, therefore, the anisotropy of solid aluminium-solid cadmium and solid aluminium -liquid cadmium surface energies have been monitored by in situ heating in the transmission electron microscope over the temperature range between room temperature and 420 °C. The anisotropy of solid aluminium-solid cadmium surface energy is constant between room temperature and the cadmium melting point, with the {100}Al//{20¯23}Cd surface energy on average 40% greater than the {111}Al//{0001}Cd surface energy, and 10% greater than the {111}Al//{40¯43Cd surface energy. When the cadmium particles melt at temperatures above 321 °C, the {100}Al//{20¯23}Cd facets disappear and the {111}Al//{40¯43}Cd and {111}A1//{0001}Cd surface energies become equal. The {111}Al facets do not disappear when the cadmium particles melt, and the anisotropy of solid aluminium-liquid cadmium surface energy decreases gradually with increasing temperature above the cadmium melting point. The kinetics of cadmium solidification have been examined by heating and cooling experiments in a differential scanning calorimeter over a range of heating and cooling rates. Cadmium particle solidification is nucleated catalytically by the surrounding aluminium matrix on the {111}Al faceted surfaces, with an undercooling of 56 K and a contact angle of 42 °. The nucleation kinetics of cadmium particle solidification are in good agreement with the hemispherical cap model of heterogeneous nucleation.
Resumo:
A spanning tree T of a graph G is said to be a tree t-spanner if the distance between any two vertices in T is at most t times their distance in G. A graph that has a tree t-spanner is called a tree t-spanner admissible graph. The problem of deciding whether a graph is tree t-spanner admissible is NP-complete for any fixed t >= 4 and is linearly solvable for t <= 2. The case t = 3 still remains open. A chordal graph is called a 2-sep chordal graph if all of its minimal a - b vertex separators for every pair of non-adjacent vertices a and b are of size two. It is known that not all 2-sep chordal graphs admit tree 3-spanners This paper presents a structural characterization and a linear time recognition algorithm of tree 3-spanner admissible 2-sep chordal graphs. Finally, a linear time algorithm to construct a tree 3-spanner of a tree 3-spanner admissible 2-sep chordal graph is proposed. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
IEEE 802.16 standards for Wireless Metropolitan Area Networks (WMANs) include a mesh mode of operation for improving the coverage and throughput of the network. In this paper, we consider the problem of routing and centralized scheduling for such networks. We first fix the routing, which reduces the network to a tree. We then present a finite horizon dynamic programming framework. Using it we obtain various scheduling algorithms depending upon the cost function. Next we consider simpler suboptimal algorithms and compare their performances.
Resumo:
Instability and dewetting engendered by the van der Waals force in soft thin (<100 nm) linear viscoelastic solid (e. g., elastomeric gel) films on uniform and patterned surfaces are explored. Linear stability analysis shows that, although the elasticity of the film controls the onset of instability and the corresponding critical wavelength, the dominant length-scale remains invariant with the elastic modulus of the film. The unstable modes are found to be long-wave, for which a nonlinear long-wave analysis and simulations are performed to uncover the dynamics and morphology of dewetting. The stored elastic energy slows down the temporal growth of instability significantly. The simulations also show that a thermodynamically stable film with zero-frequency elasticity can be made unstable in the presence of physico-chemical defects on the substrate and can follow an entirely different pathway with far fewer holes as compared to the viscous films. Further, the elastic restoring force can retard the growth of a depression adjacent to the hole-rim and thus suppress the formation of satellite holes bordering the primary holes. These findings are in contrast to the dewetting of viscoelastic liquid films where nonzero frequency elasticity accelerates the film rupture and promotes the secondary instabilities. Thus, the zero-frequency elasticity can play a major role in imposing a better-defined long-range order to the dewetted structures by arresting the secondary instabilities. (C) 2011 American Institute of Physics. doi: 10.1063/1.3554748]