53 resultados para scalability
Resumo:
In this paper, we propose an efficient source routing algorithm for unicast flows, which addresses the scalability problem associated with the basic source routing technique. Simulation results indicate that the proposed algorithm indeed helps in reducing the message overhead considerably, and at the same time it gives comparable performance in terms of resource utilization across a wide range of workloads.
Resumo:
Real-Time services are traditionally supported on circuit switched network. However, there is a need to port these services on packet switched network. Architecture for audio conferencing application over the Internet in the light of ITU-T H.323 recommendations is considered. In a conference, considering packets only from a set of selected clients can reduce speech quality degradation because mixing packets from all clients can lead to lack of speech clarity. A distributed algorithm and architecture for selecting clients for mixing is suggested here based on a new quantifier of the voice activity called “Loudness Number” (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving and speech quality enhancement. Client selection for playing out tries to mimic a physical conference where the most vocal participants attract more attention. The contributions of the paper are expected to aid H.323 recommendations implementations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.
Resumo:
In the present study, the mechanical behaviour of CSM (chopped strand mat)-based GFRC (glass fibre-reinforced composite) plates with single and multiple hemispheres under compressive loads has been investigated both experimentally and numerically. The basic stress-strain behaviours arc identified with quasi-static tests on two-ply coupon laminates and short cylinders, and these are followed up with compressive tests in a UTM (universal testing machine) on single- and multiple-hemisphere plates. The ability of an explicit LS-DYNA solver in predicting the complex material behaviour of composite hemispheres, including failure, is demonstrated. The relevance and scalability of the present class of structural components as `force-multipliers' and `energy-multipliers' have been justified by virtue of findings that as the number of hemispheres in a panel increased from one to four, peak load and average absorbed energy rose by factors of approximately four and six, respectively. The performance of a composite hemisphere has been compared to similar-sized steel and aluminium hemispheres, and the former is found to be of distinctly higher specific energy than the steel specimen. A simulation-based study has also been carried out on a composite 2 x 2-hemisphere panel under impact loads and its behaviour approaching that of an ideal energy absorber has been predicted. In summary, the present investigation has established the efficacy of composite plates with hemispherical force multipliers as potential energy-absorbing countermeasures and the suitability of CAE (computer-aided engineering) for their design.
Resumo:
The Morse-Smale complex is a topological structure that captures the behavior of the gradient of a scalar function on a manifold. This paper discusses scalable techniques to compute the Morse-Smale complex of scalar functions defined on large three-dimensional structured grids. Computing the Morse-Smale complex of three-dimensional domains is challenging as compared to two-dimensional domains because of the non-trivial structure introduced by the two types of saddle criticalities. We present a parallel shared-memory algorithm to compute the Morse-Smale complex based on Forman's discrete Morse theory. The algorithm achieves scalability via synergistic use of the CPU and the GPU. We first prove that the discrete gradient on the domain can be computed independently for each cell and hence can be implemented on the GPU. Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU. Simultaneously, the extremasaddle connections are computed using a tree traversal algorithm on the GPU.
Resumo:
In this article, we investigate the performance of a volume integral equation code on BlueGene/L system. Volume integral equation (VIE) is solved for homogeneous and inhomogeneous dielectric objects for radar cross section (RCS) calculation in a highly parallel environment. Pulse basis functions and point matching technique is used to convert the volume integral equation into a set of simultaneous linear equations and is solved using parallel numerical library ScaLAPACK on IBM's distributed-memory supercomputer BlueGene/L by different number of processors to compare the speed-up and test the scalability of the code.
Resumo:
This paper illustrates a Wavelet Coefficient based approach using experiments to understand the sensitivity of ultrasonic signals due to parametric variation of a crack configuration in a metal plate. A PZT patch sensor/actuator system integrated to a metal plate with through-thickness crack is used. The proposed approach uses piezoelectric patches, which can be used to both actuate and sense the ultrasonic signals. While this approach leads to more flexibility and reduced cost for larger scalability of the sensor/actuator network, the complexity of the signals increases as compared to what is encountered in conventional ultrasonic NDE problems using selective wave modes. A Damage Index (DI) has been introduced, which is function of wavelet coefficient. Experiments have been carried out for various crack sizes, crack orientations and band-limited tone-burst signal through FIR filter. For a 1 cm long crack interrogated with 20 kHz tone-burst signal, the Damage Index (DI) for the horizontal crack orientation increases by about 70% with respect to that for 135 degrees oriented crack and it increases by about 33% with respect to the vertically oriented crack. The detailed results reported in this paper is a step forward to developing computational schemes for parametric identification of damage using sensor/actuator network and ultrasonic wave.
Resumo:
Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, require high-fidelity compute intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we present a strategy for parallel execution of multiple nested domain simulations based on partitioning the 2-D processor grid into disjoint rectangular regions associated with each domain. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies result in performance improvement of up to 33% with topology-oblivious mapping and up to additional 7% with topology-aware mapping over the default sequential strategy.
Resumo:
The Lovasz θ function of a graph, is a fundamental tool in combinatorial optimization and approximation algorithms. Computing θ involves solving a SDP and is extremely expensive even for moderately sized graphs. In this paper we establish that the Lovasz θ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM−θ graphs, on which the Lovasz θ function can be approximated well by a one-class SVM. This leads to a novel use of SVM techniques to solve algorithmic problems in large graphs e.g. identifying a planted clique of size Θ(n√) in a random graph G(n,12). A classic approach for this problem involves computing the θ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM−θ graph, and as a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. Further, we introduce the notion of a ''common orthogonal labeling'' which extends the notion of a ''orthogonal labelling of a single graph (used in defining the θ function) to multiple graphs. The problem of finding the optimal common orthogonal labelling is cast as a Multiple Kernel Learning problem and is used to identify a large common dense region in multiple graphs. The proposed algorithm achieves an order of magnitude scalability compared to the state of the art.
Resumo:
Moore's Law has driven the semiconductor revolution enabling over four decades of scaling in frequency, size, complexity, and power. However, the limits of physics are preventing further scaling of speed, forcing a paradigm shift towards multicore computing and parallelization. In effect, the system is taking over the role that the single CPU was playing: high-speed signals running through chips but also packages and boards connect ever more complex systems. High-speed signals making their way through the entire system cause new challenges in the design of computing hardware. Inductance, phase shifts and velocity of light effects, material resonances, and wave behavior become not only prevalent but need to be calculated accurately and rapidly to enable short design cycle times. In essence, to continue scaling with Moore's Law requires the incorporation of Maxwell's equations in the design process. Incorporating Maxwell's equations into the design flow is only possible through the combined power that new algorithms, parallelization and high-speed computing provide. At the same time, incorporation of Maxwell-based models into circuit and system-level simulation presents a massive accuracy, passivity, and scalability challenge. In this tutorial, we navigate through the often confusing terminology and concepts behind field solvers, show how advances in field solvers enable integration into EDA flows, present novel methods for model generation and passivity assurance in large systems, and demonstrate the power of cloud computing in enabling the next generation of scalable Maxwell solvers and the next generation of Moore's Law scaling of systems. We intend to show the truly symbiotic growing relationship between Maxwell and Moore!
Resumo:
This paper presents an advanced single network adaptive critic (SNAC) aided nonlinear dynamic inversion (NDI) approach for simultaneous attitude control and trajectory tracking of a micro-quadrotor. Control of micro-quadrotors is a challenging problem due to its small size, strong coupling in pitch-yaw-roll and aerodynamic effects that often need to be ignored in the control design process to avoid mathematical complexities. In the proposed SNAC aided NDI approach, the gains of the dynamic inversion design are selected in such a way that the resulting controller behaves closely to a pre-synthesized SNAC controller for the output regulation problem. However, since SNAC is based on optimal control theory, it makes the dynamic inversion controller to operate near optimal and enhances its robustness property as well. More important, it retains two major benefits of dynamic inversion: (i) closed form expression of the controller and (ii) easy scalability to command tracking application even without any apriori knowledge of the reference command. Effectiveness of the proposed controller is demonstrated from six degree-of-freedom simulation studies of a micro-quadrotor. It has also been observed that the proposed SNAC aided NDI approach is more robust to modeling inaccuracies, as compared to the NDI controller designed independently from time domain specifications.
Resumo:
In this paper, we propose a novel authentication protocol for MANETs requiring stronger security. The protocol works on a two-tier network architecture with client nodes and authentication server nodes, and supports dynamic membership. We use an external membership granting server (MGS) to provide stronger security with dynamic membership. However, the external MGS in our protocol is semi-online instead of being online, i.e., the MGS cannot initiate a connection with a network node but any network node can communicate with the MGS whenever required. To ensure efficiency, the protocol uses symmetric key cryptography to implement the authentication service. However, to achieve storage scalability, the protocol uses a pseudo random function (PRF) to bind the secret key of a client to its identity using the secret key of its server. In addition, the protocol possesses an efficient server revocation mechanism along with an efficient server re-assignment mechanism, which makes the protocol robust against server node compromise.
Resumo:
Automatic and accurate detection of the closure-burst transition events of stops and affricates serves many applications in speech processing. A temporal measure named the plosion index is proposed to detect such events, which are characterized by an abrupt increase in energy. Using the maxima of the pitch-synchronous normalized cross correlation as an additional temporal feature, a rule-based algorithm is designed that aims at selecting only those events associated with the closure-burst transitions of stops and affricates. The performance of the algorithm, characterized by receiver operating characteristic curves and temporal accuracy, is evaluated using the labeled closure-burst transitions of stops and affricates of the entire TIMIT test and training databases. The robustness of the algorithm is studied with respect to global white and babble noise as well as local noise using the TIMIT test set and on telephone quality speech using the NTIMIT test set. For these experiments, the proposed algorithm, which does not require explicit statistical training and is based on two one-dimensional temporal measures, gives a performance comparable to or better than the state-of-the-art methods. In addition, to test the scalability, the algorithm is applied on the Buckeye conversational speech corpus and databases of two Indian languages. (C) 2014 Acoustical Society of America.
Resumo:
Multi-GPU machines are being increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. Existing works that propose to automate data allocations for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). BBMM can perform at runtime, during standard set operations like union, intersection, and difference, finding subset and superset relations on hyperrectangular regions of array data (bounding boxes). It uses these operations along with some compiler assistance to identify, allocate, and manage data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) and as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.
Resumo:
To combine the advantages of both stability and optimality-based designs, a single network adaptive critic (SNAC) aided nonlinear dynamic inversion approach is presented in this paper. Here, the gains of a dynamic inversion controller are selected in such a way that the resulting controller behaves very close to a pre-synthesized SNAC controller in the output regulation sense. Because SNAC is based on optimal control theory, it makes the dynamic inversion controller operate nearly optimal. More important, it retains the two major benefits of dynamic inversion, namely (i) a closed-form expression of the controller and (ii) easy scalability to command tracking applications without knowing the reference commands a priori. An extended architecture is also presented in this paper that adapts online to system modeling and inversion errors, as well as reduced control effectiveness, thereby leading to enhanced robustness. The strengths of this hybrid method of applying SNAC to optimize an nonlinear dynamic inversion controller is demonstrated by considering a benchmark problem in robotics, that is, a two-link robotic manipulator system. Copyright (C) 2013 John Wiley & Sons, Ltd.
Resumo:
In today's API-rich world, programmer productivity depends heavily on the programmer's ability to discover the required APIs. In this paper, we present a technique and tool, called MATHFINDER, to discover APIs for mathematical computations by mining unit tests of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code to compute the expression by mapping its subexpressions to API method calls. For each subexpression, MATHFINDER searches for a method such that there is a mapping between method inputs and variables of the subexpression. The subexpression, when evaluated on the test inputs of the method under this mapping, should produce results that match the method output on a large number of tests. We implemented MATHFINDER as an Eclipse plugin for discovery of third-party Java APIs and performed a user study to evaluate its effectiveness. In the study, the use of MATHFINDER resulted in a 2x improvement in programmer productivity. In 96% of the subexpressions queried for in the study, MATHFINDER retrieved the desired API methods as the top-most result. The top-most pseudo-code snippet to implement the entire expression was correct in 93% of the cases. Since the number of methods and unit tests to mine could be large in practice, we also implement MATHFINDER in a MapReduce framework and evaluate its scalability and response time.