163 results for structured parallel computations
Abstract:
The unending quest for performance improvement, coupled with advancements in integrated circuit technology, has led to the development of a new architectural paradigm. The speculative multithreaded (SpMT) architecture philosophy relies on aggressive speculative execution for improved performance. However, aggressive speculative execution is a mixed blessing: it improves performance when successful, but adversely affects energy consumption (and performance) because of useless computation in the event of mis-speculation. Dynamic instruction criticality information can be usefully applied to control and guide such aggressive speculative execution. In this paper, we present a model of micro-execution for the SpMT architecture that we have developed to determine dynamic instruction criticality. We have also developed two novel techniques that use the criticality information, namely delaying non-critical loads and criticality-based thread prediction, to reduce useless computation and energy consumption. Experimental results showing the break-up of critical instructions and the effectiveness of the proposed techniques in reducing energy consumption are presented in the context of a multiscalar processor that implements the SpMT architecture. Our experiments show 17.7% and 11.6% reductions in dynamic energy for criticality-based thread prediction and the criticality-based delayed load scheme, respectively, while the improvements in dynamic energy-delay product are 13.9% and 5.5%, respectively. (c) 2012 Published by Elsevier B.V.
Abstract:
A series of novel organic-inorganic hybrid membranes has been prepared employing Nafion and acid-functionalized meso-structured molecular sieves (MMS) with varying structures and surface areas. Acid-functionalized silica nanopowder of surface area 60 m²/g, silica meso-structured cellular foam (MSU-F) of surface area 470 m²/g and silica meso-structured hexagonal frame network (MCM-41) of surface area 900 m²/g have been employed as potential filler materials to form hybrid membranes with a Nafion framework. The structural behavior, water uptake, proton conductivity and methanol permeability of these hybrid membranes have been investigated. DMFCs employing Nafion-silica MSU-F and Nafion-silica MCM-41 hybrid membranes deliver peak-power densities of 127 mW/cm² and 100 mW/cm², respectively, while a peak-power density of only 48 mW/cm² is obtained with the DMFC employing a pristine recast Nafion membrane under identical operating conditions. The aforesaid characteristics of the hybrid membranes could be exclusively attributed to the presence of pendant sulfonic acid groups in the filler, which provide fairly continuous proton-conducting pathways between filler and matrix in the hybrid membranes, facilitating proton transport without any trade-off between proton conductivity and methanol crossover. (C) 2012 The Electrochemical Society. [DOI: 10.1149/2.036211jes] All rights reserved.
Abstract:
Combustion instability events in lean premixed combustion systems can cause spatio-temporal variations in unburnt mixture fuel/air ratio. This provides a driving mechanism for heat-release oscillations when they interact with the flame. Several Reduced Order Modelling (ROM) approaches to predict the characteristics of these oscillations have been developed in the past. The present paper compares results for flame describing function characteristics determined from a ROM approach based on the level-set method, with corresponding results from detailed, fully compressible reacting flow computations for the same two dimensional slot flame configuration. The comparison between these results is seen to be sensitive to small geometric differences in the shape of the nominally steady flame used in the two computations. When the results are corrected to account for these differences, describing function magnitudes are well predicted for frequencies lesser than and greater than a lower and upper cutoff respectively due to amplification of flame surface wrinkling by the convective Darrieus-Landau (DL) instability. However, good agreement in describing function phase predictions is seen as the ROM captures the transit time of wrinkles through the flame correctly. Also, good agreement is seen for both magnitude and phase of the flame response, for large forcing amplitudes, at frequencies where the DL instability has a minimal influence. Thus, the present ROM can predict flame response as long as the DL instability, caused by gas expansion at the flame front, does not significantly alter flame front perturbation amplitudes as they traverse the flame. (C) 2012 The Combustion Institute. Published by Elsevier Inc. All rights reserved.
Abstract:
This paper presents a decentralized, peer-to-peer, parallel version of the vector evaluated particle swarm optimization (VEPSO) algorithm for multi-objective design optimization of laminated composite plates using the message passing interface (MPI). The design optimization of laminated composite plates is a combinatorially explosive constrained non-linear optimization problem (CNOP) with many design variables and a vast solution space, which warrants the use of non-parametric and heuristic optimization algorithms like PSO. Optimization requires minimizing both the weight and the cost of these composite plates simultaneously, which renders the problem multi-objective. Hence VEPSO, a multi-objective variant of the PSO algorithm, is used. Despite the use of such a heuristic, the application problem is computationally intensive and suffers from long execution times due to sequential computation. Hence, a parallel version of the PSO algorithm for the problem has been developed to run on several nodes of an IBM P720 cluster. The proposed parallel algorithm, using MPI's collective communication directives, establishes a peer-to-peer relationship between the constituent parallel processes, deviating from the more common master-slave approach, and reduces computation time by a factor of up to 10. Finally, we show the effectiveness of the proposed parallel algorithm by comparing it with a serial implementation of VEPSO and a parallel implementation of the vector evaluated genetic algorithm (VEGA) for the same design problem. (c) 2012 Elsevier Ltd. All rights reserved.
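To make the VEPSO idea concrete, the following is a minimal single-process sketch, not the paper's MPI implementation. It assumes one swarm per objective, a ring exchange in which each swarm is guided by the best position of its neighbouring swarm, and a one-dimensional toy problem instead of the composite-plate design problem. The function name `vepso` and all parameter values are hypothetical.

```python
import random


def vepso(objectives, lo, hi, n_particles=20, iters=100,
          w=0.7, c1=1.5, c2=1.5, seed=0):
    """Toy VEPSO on a 1-D search interval [lo, hi] (illustrative only).

    One swarm is kept per objective. Each swarm evaluates particles with
    its own objective, but the social term of the velocity update uses
    the best position found by the neighbouring swarm (ring exchange).
    """
    rng = random.Random(seed)
    n_swarms = len(objectives)
    pos = [[rng.uniform(lo, hi) for _ in range(n_particles)]
           for _ in range(n_swarms)]
    vel = [[0.0] * n_particles for _ in range(n_swarms)]
    pbest = [row[:] for row in pos]  # personal best positions
    gbest = [min(pbest[s], key=objectives[s]) for s in range(n_swarms)]

    for _ in range(iters):
        for s in range(n_swarms):
            f = objectives[s]
            guide = gbest[(s + 1) % n_swarms]  # best of the neighbour swarm
            for i in range(n_particles):
                r1, r2 = rng.random(), rng.random()
                vel[s][i] = (w * vel[s][i]
                             + c1 * r1 * (pbest[s][i] - pos[s][i])
                             + c2 * r2 * (guide - pos[s][i]))
                # Clamp the new position to the search interval.
                pos[s][i] = min(hi, max(lo, pos[s][i] + vel[s][i]))
                if f(pos[s][i]) < f(pbest[s][i]):
                    pbest[s][i] = pos[s][i]
            gbest[s] = min(pbest[s], key=f)
    return gbest


# Two conflicting toy objectives: minimise x^2 and (x - 2)^2.
best = vepso([lambda x: x * x, lambda x: (x - 2) ** 2], -5.0, 5.0)
```

In a peer-to-peer parallel variant, each process would presumably own a group of particles and the exchange of best positions would be done with a collective operation, so that no process acts as a master. That mapping is an assumption here, not a detail given in the abstract.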
Abstract:
The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often not observable from query words alone. Accurate identification of intent from query words remains a challenging problem, primarily because it is extremely difficult to discover these dimensions. The problem is often significantly compounded by the lack of representative training samples. We present a generic, extensible framework for learning the multi-dimensional representation of user intent from the query words. The approach models the latent relationships between facets using a tree-structured distribution, which leads to an efficient and convergent algorithm, FastQ, for identifying the multi-faceted intent of users based on just the query words. We also incorporated WordNet to extend the system's capabilities to queries that contain words that do not appear in the training data. Empirical results show that FastQ yields accurate identification of intent when compared to a gold standard.
Operator-splitting finite element algorithms for computations of high-dimensional parabolic problems
Abstract:
An operator-splitting finite element method for solving high-dimensional parabolic equations is presented. The stability and the error estimates are derived for the proposed numerical scheme. Furthermore, two variants of fully practical operator-splitting finite element algorithms, based on the quadrature points and the nodal points, respectively, are presented. Both the quadrature and the nodal point based operator-splitting algorithms are validated using a three-dimensional (3D) test problem. The numerical results obtained with the full 3D computations and the operator-split 2D + 1D computations are found to be in good agreement with the analytical solution. Further, the optimal order of convergence is obtained in both variants of the operator-splitting algorithms. (C) 2012 Elsevier Inc. All rights reserved.
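The general idea of operator splitting for a parabolic problem can be sketched as follows. This is a simplified illustration under several assumptions that differ from the paper: it uses finite differences rather than finite elements, it splits a 2D heat equation into two 1D solves rather than a 3D problem into 2D + 1D, and it uses sequential (Lie) splitting with backward Euler in time. All function names are hypothetical.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_1d_step(line, r):
    """One backward-Euler step of u_t = u_ss along a grid line.

    Zero Dirichlet values are assumed at both ends; r = dt / h**2.
    """
    n = len(line)
    return thomas([-r] * n, [1 + 2 * r] * n, [-r] * n, line)


def split_heat_2d(n=31, dt=1e-3, steps=50):
    """Solve u_t = u_xx + u_yy on the unit square by x/y splitting.

    Returns the maximum nodal error against the analytical solution
    exp(-2 pi^2 t) sin(pi x) sin(pi y).
    """
    h = 1.0 / (n + 1)
    r = dt / h ** 2
    xs = [(i + 1) * h for i in range(n)]  # interior nodes only
    u = [[math.sin(math.pi * x) * math.sin(math.pi * y) for y in xs]
         for x in xs]
    for _ in range(steps):
        # Sub-step 1: diffuse in x, one independent 1D solve per y-line.
        for j in range(n):
            col = implicit_1d_step([u[i][j] for i in range(n)], r)
            for i in range(n):
                u[i][j] = col[i]
        # Sub-step 2: diffuse in y, one independent 1D solve per x-line.
        for i in range(n):
            u[i] = implicit_1d_step(u[i], r)
    decay = math.exp(-2 * math.pi ** 2 * steps * dt)
    return max(abs(u[i][j]
                   - decay * math.sin(math.pi * xs[i]) * math.sin(math.pi * xs[j]))
               for i in range(n) for j in range(n))


error = split_heat_2d()
```

Each sub-step consists of independent low-dimensional solves, which is what makes splitting attractive for high-dimensional problems. In this toy case the two directional operators commute, so the remaining error comes from the time and space discretization rather than from the splitting itself.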
Abstract:
Identical parallel-connected converters with unequal load sharing have unequal terminal voltages. The difference in terminal voltages is more pronounced in case of back-to-back connected converters, operated in power-circulation mode for the purpose of endurance tests. In this paper, a synchronous reference frame based analysis is presented to estimate the grid current distortion in interleaved, grid-connected converters with unequal terminal voltages. Influence of carrier interleaving angle on rms grid current ripple is studied theoretically as well as experimentally. Optimum interleaving angle to minimize the rms grid current ripple is investigated for different applications of parallel converters. The applications include unity power factor rectifiers, inverters for renewable energy sources, reactive power compensators, and circulating-power test set-up used for thermal testing of high-power converters. Optimum interleaving angle is shown to be a strong function of the average of the modulation indices of the two converters, irrespective of the application. The findings are verified experimentally on two parallel-connected converters, circulating reactive power of up to 150 kVA between them.
Abstract:
We propose a novel method of constructing Dispersion Matrices (DM) for Coherent Space-Time Shift Keying (CSTSK) relying on arbitrary PSK signal sets by exploiting codes from division algebras. We show that classic codes from Cyclic Division Algebras (CDA) may be interpreted as DMs conceived for PSK signal sets. Hence various benefits of CDA codes such as their ability to achieve full diversity are inherited by CSTSK. We demonstrate that the proposed CDA based DMs are capable of achieving a lower symbol error ratio than the existing DMs generated using the capacity as their optimization objective function for both perfect and imperfect channel estimation.
Abstract:
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of the iteration space and a set of tiling hyperplanes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. However, existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load-balance whenever possible. We first provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start for programs with affine data accesses. We then provide an approach to find such hyperplanes. Experimental evaluation on a 12-core Intel Westmere shows that our code is able to outperform a tuned domain-specific stencil code generator by 4% to 27%, and previous compiler techniques by a factor of 2x to 10.14x.
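The abstract does not spell out its conditions, so the sketch below rests on two assumptions commonly used in the tiling literature. A hyperplane h is taken to be a valid tiling hyperplane when h·d >= 0 for every dependence vector d. Concurrent start along a face with normal f is taken to be possible when f is a strictly positive combination of the chosen hyperplanes. The check is restricted to two dimensions (time, space), and the function names are hypothetical.

```python
from fractions import Fraction


def is_valid_hyperplane(h, deps):
    """Assumed legality test: h . d >= 0 for every dependence vector d."""
    return all(sum(hc * dc for hc, dc in zip(h, d)) >= 0 for d in deps)


def concurrent_start_2d(h1, h2, face):
    """Check whether face = a*h1 + b*h2 with a > 0 and b > 0.

    Solves the 2x2 linear system exactly with Cramer's rule.
    """
    det = h1[0] * h2[1] - h1[1] * h2[0]
    if det == 0:
        return False  # hyperplanes are linearly dependent
    a = Fraction(face[0] * h2[1] - face[1] * h2[0], det)
    b = Fraction(h1[0] * face[1] - h1[1] * face[0], det)
    return a > 0 and b > 0


# Dependences of a 3-point 1D stencil in (t, i) coordinates:
# u[t+1][i] reads u[t][i-1], u[t][i], u[t][i+1].
deps = [(1, -1), (1, 0), (1, 1)]
time_face = (1, 0)  # normal of the face along which tiles should start

diamond = ((1, 1), (1, -1))  # diamond-shaped tiles
skewed = ((1, 0), (1, 1))    # classic time skewing

diamond_ok = concurrent_start_2d(*diamond, time_face)
skewed_ok = concurrent_start_2d(*skewed, time_face)
```

Under these assumptions, both hyperplane pairs are legal for the stencil, but only the diamond pair places the time face strictly inside the cone of the hyperplanes. The skewed pair puts it on the boundary of the cone, which corresponds to pipelined start-up.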
Abstract:
We show that every graph of maximum degree 3 can be represented as the intersection graph of axis parallel boxes in three dimensions, that is, every vertex can be mapped to an axis parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes just touch at their boundaries. Further, this construction can be realized in linear time.
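The definition of an intersection graph of axis-parallel boxes can be illustrated with a short sketch. It only checks a given representation; it does not implement the paper's linear-time construction. Boxes are assumed to be closed, so two boxes that merely touch at their boundaries count as intersecting. The names and the example boxes are hypothetical.

```python
from itertools import combinations


def boxes_intersect(a, b):
    """Closed axis-parallel boxes, each given as (low_corner, high_corner).

    Two boxes intersect iff their intervals overlap on every axis.
    """
    (alo, ahi), (blo, bhi) = a, b
    return all(l1 <= h2 and l2 <= h1
               for l1, h1, l2, h2 in zip(alo, ahi, blo, bhi))


def intersection_graph(boxes):
    """Return the edge set of the intersection graph of named boxes."""
    return {frozenset((u, v))
            for u, v in combinations(boxes, 2)
            if boxes_intersect(boxes[u], boxes[v])}


# Three unit cubes in a row: neighbours share a face, the ends are disjoint,
# so the intersection graph is the path a - b - c.
example = {
    "a": ((0, 0, 0), (1, 1, 1)),
    "b": ((1, 0, 0), (2, 1, 1)),
    "c": ((2, 0, 0), (3, 1, 1)),
}
edges = intersection_graph(example)
```

Here every pair of intersecting boxes touches only at a shared face, which mirrors the kind of representation described in the abstract.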
Abstract:
In this paper, we study the diversity-multiplexing-gain tradeoff (DMT) of wireless relay networks under the half-duplex constraint. It is often unclear what penalty, if any, is imposed by the half-duplex constraint on the DMT of such networks. We study two classes of networks. The first class, called KPP(I) networks, is the class of networks with the relays organized in K parallel paths between the source and the destination. While we assume that there is no direct source-destination path, the K relaying paths can interfere with each other. The second class, termed layered networks, comprises relays organized in layers, where links exist only between adjacent layers. We present a communication scheme based on static schedules and amplify-and-forward relaying for these networks. We also show that for KPP(I) networks with K >= 3, the proposed schemes can achieve full-duplex DMT performance, thus demonstrating that there is no performance hit on the DMT due to the half-duplex constraint. We also show that, for layered networks, a linear DMT of d(r) = d_max (1 - r)^+ between the maximum diversity d_max and the maximum multiplexing gain (MG), r_max = 1, is achievable. We adapt existing DMT optimal coding schemes to these networks, thus specifying the end-to-end communication strategy explicitly.
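For reference, the linear tradeoff quoted above can be written out with the usual definitions of multiplexing gain and diversity gain. The symbols ρ (signal-to-noise ratio), R(ρ) (rate) and P_e(ρ) (error probability) do not appear in the abstract and are assumed here to follow the standard convention.

```latex
r = \lim_{\rho \to \infty} \frac{R(\rho)}{\log \rho},
\qquad
d(r) = -\lim_{\rho \to \infty} \frac{\log P_e(\rho)}{\log \rho},
\qquad
d(r) = d_{\max}\,(1 - r)^{+},
\quad (x)^{+} = \max(x, 0),
\quad 0 \le r \le r_{\max} = 1 .
```

The line joins the point (0, d_max), full diversity at zero multiplexing gain, to the point (1, 0), maximum multiplexing gain at zero diversity.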
Abstract:
Among the mu-conotoxins that block vertebrate voltage-gated sodium channels (VGSCs), some have been shown to be potent analgesics following systemic administration in mice. We have determined the solution structure of a new representative of this family, mu-BuIIIB, and established its disulfide connectivities by direct mass spectrometric collision-induced dissociation fragmentation of the peptide with disulfides intact. The major oxidative folding product adopts a 1-4/2-5/3-6 pattern with the following disulfide bridges: Cys5-Cys17, Cys6-Cys23, and Cys13-Cys24. The solution structure reveals that the unique N-terminal extension in mu-BuIIIB, which is also present in mu-BuIIIA and mu-BuIIIC but absent in other mu-conotoxins, forms part of a short α-helix encompassing Glu3 to Asn8. This helix is packed against the rest of the toxin and stabilized by the Cys5-Cys17 and Cys6-Cys23 disulfide bonds. As such, the side chain of Val1 is located close to the aromatic rings of Trp16 and His20, which are located on the canonical helix that displays several residues found to be essential for VGSC blockade in related mu-conotoxins. Mutations of residues 2 and 3 in the N-terminal extension enhanced the potency of mu-BuIIIB for Na(v)1.3. One analogue, [D-Ala2]BuIIIB, showed a 40-fold increase, making it the most potent peptide blocker of this channel characterized to date and thus a useful new tool with which to characterize this channel. On the basis of previous results for related mu-conotoxins, the dramatic effects of mutations at the N-terminus were unanticipated and suggest that further gains in potency might be achieved by additional modifications of this region.
Abstract:
In this paper, we consider inference for the component and system lifetime distributions of a k-unit parallel system with independent components, based on system data. The components are assumed to have identical Weibull distributions. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose a β-expectation tolerance interval and a β-content γ-level tolerance interval for the life distribution of the system. The performance of the estimators and tolerance intervals is investigated via a simulation study. A simulated dataset is analyzed for illustration.
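The model behind this abstract can be sketched in a few lines. A parallel system of k independent components fails when its last component fails, so the system lifetime is the maximum of k component lifetimes and its distribution function is the component distribution function raised to the power k. The code below assumes the two-parameter Weibull form F(t) = 1 - exp(-(t/scale)^shape). It only writes down the system log-likelihood and does not reproduce the paper's estimation procedure or tolerance intervals. All names are hypothetical.

```python
import math
import random


def system_cdf(t, k, shape, scale):
    """P(system lifetime <= t) for k i.i.d. Weibull components in parallel."""
    if t <= 0:
        return 0.0
    component_cdf = -math.expm1(-((t / scale) ** shape))
    return component_cdf ** k


def system_loglik(times, k, shape, scale):
    """Log-likelihood of observed system lifetimes.

    System density: k * F(t)**(k - 1) * f(t), with F and f the component
    Weibull distribution function and density.
    """
    total = 0.0
    for t in times:
        z = (t / scale) ** shape
        log_f = math.log(shape / scale) + (shape - 1) * math.log(t / scale) - z
        log_F = math.log(-math.expm1(-z))
        total += math.log(k) + (k - 1) * log_F + log_f
    return total


def simulate_system(n, k, shape, scale, seed=0):
    """Draw n system lifetimes as the maximum of k component lifetimes."""
    rng = random.Random(seed)
    # random.weibullvariate takes (scale, shape) in that order.
    return [max(rng.weibullvariate(scale, shape) for _ in range(k))
            for _ in range(n)]


data = simulate_system(20000, k=3, shape=2.0, scale=1.0)
empirical = sum(t <= 1.0 for t in data) / len(data)
theoretical = system_cdf(1.0, 3, 2.0, 1.0)
```

Maximizing `system_loglik` over shape and scale, for example with a numerical optimizer, would give maximum likelihood estimates from system-level data alone, which is the setting the abstract describes.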
Abstract:
Seven double cysteine mutants of maltose binding protein (MBP) were generated, each with one cysteine in the active cleft at position 298 and the second cysteine distributed over both domains of the protein. These cysteines were spin labeled, and the distances between the labels in biradical pairs were determined by pulsed double electron-electron resonance (DEER) measurements. The values were compared with theoretical predictions of distances between the labels in biradicals constructed by molecular modeling from the crystal structure of MBP without maltose, and were found to be in excellent agreement. MBP is in a molten globule state at pH 3.3 and is known to still bind its substrate maltose. The nitroxide spin label was sufficiently stable under these conditions. In preliminary experiments, DEER measurements were carried out with one of the mutants, yielding a broad distance distribution, as is to be expected if there is no explicit tertiary structure and the individual helices point in all possible directions.
Abstract:
Many conducting polymers, though they have good material properties, are not solution processable. Hence an alternative method of film fabrication, pulsed laser deposition, was explored in this work. PDTCPA, a donor-acceptor-donor type polymer with absorption from 900 nm to 300 nm, was deposited by both UV and IR lasers to understand the effect of deposition parameters on film quality. It was observed that laser ablation of PDTCPA does not alter its chemical structure, hence retaining the chemical integrity of the polymer. Microscopic studies of the ablated films show that the IR laser ablated films are particulate in nature, while the UV laser ablated films are deposited as a smooth continuous layer. The morphology of the film influences its electrical characteristics: the current-voltage characteristics of these films show that films deposited by UV laser are p rectifying, while those deposited by IR laser behave more like resistors.