224 results for Parallel computing. Multilayer perceptron. OpenMP
Abstract:
Damage detection by measuring and analyzing vibration signals in a machine component is an established procedure in mechanical and aerospace engineering. This paper presents vibration signature analysis of steel bridge structures in a nonconventional way using artificial neural networks (ANN). Multilayer perceptrons have been adopted, using the back-propagation algorithm for network training. The training patterns, in terms of vibration signatures, are generated analytically for a moving load traveling on a trussed bridge structure at a constant speed to simulate the inspection vehicle. Using the finite-element technique, the moving forces are converted into stationary time-dependent force functions in order to generate vibration signals in the structure, and these signals are used to train the network. The performance of the trained networks is examined for their capability to detect damage from unknown signatures taken independently at one, three, and five nodes. It has been observed that the prediction using the trained network with single-node signature measurement at a suitably chosen location is even better than that obtained with three-node and five-node measurement data.
Abstract:
This paper deals with the development of a new model for the cooling process on the runout table of hot strip mills. The suitability of different numerical methods for the solution of the proposed model equation is studied from the point of view of accuracy and computation time. Parallel solutions for the model equation are proposed.
Abstract:
Simulation is an important means of evaluating new microarchitectures. With the invention of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequency, the wall clock time required for simulating an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially in the case of multi-threaded workloads, because of the indeterminacy introduced by simultaneously executing threads. In this paper, we propose a technique for speeding up multi-core simulation. The models of the processor core and cache are replaced with functional models to achieve speedup. A timed Petri net model is used to estimate the execution time of the processor, and the memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict the performance of data parallel applications or multiprogramming workloads on a CMP platform with various cache hierarchies and a shared bus interconnect. The error in estimation of the execution time of an application is within 6%. The average speedup achieved ranges from 2x to 4x over the cycle-accurate simulator.
Abstract:
Parallel execution of computational mechanics codes requires efficient mesh-partitioning techniques. These mesh-partitioning techniques divide the mesh into a specified number of submeshes of approximately the same size and, at the same time, minimise the interface nodes of the submeshes. This paper describes a new mesh-partitioning technique employing Genetic Algorithms. The proposed algorithm operates on the deduced graph (dual or nodal graph) of the given finite element mesh rather than directly on the mesh itself. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialise an optimisation of the graph partition problem. In practice, a hierarchy of (usually more than two) graphs is used to obtain the final graph partition. The proposed partitioning algorithm is applied to graphs derived from unstructured finite element meshes describing practical engineering problems and also to several example graphs related to finite element meshes given in the literature. The test results indicate that the proposed GA-based graph partitioning algorithm generates high-quality partitions that are superior to those of spectral and multilevel graph partitioning algorithms.
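The two partitioning goals named in this abstract (submeshes of similar size, few interface connections) can be illustrated with a small scoring function of the kind a genetic algorithm might minimise. This is only a sketch under stated assumptions: the function names, the edge-cut measure on the graph, and the `penalty` weight are all hypothetical and are not taken from the paper.

```python
def edge_cut(edges, part):
    """Count graph edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part, n_parts):
    """Relative excess of the largest partition over the ideal size (0.0 = perfectly balanced)."""
    sizes = [0] * n_parts
    for p in part:
        sizes[p] += 1
    ideal = len(part) / n_parts
    return max(sizes) / ideal - 1.0


def fitness(edges, part, n_parts, penalty=10.0):
    """Hypothetical GA objective (lower is better): edge cut plus a weighted imbalance penalty."""
    return edge_cut(edges, part) + penalty * imbalance(part, n_parts)


# A 4-node path graph 0-1-2-3; part[i] is the partition index of node i.
path_edges = [(0, 1), (1, 2), (2, 3)]
contiguous = [0, 0, 1, 1]   # one cut edge, balanced
alternating = [0, 1, 0, 1]  # three cut edges, balanced
print(fitness(path_edges, contiguous, 2), fitness(path_edges, alternating, 2))
```

A chromosome here is simply the list `part`; how the paper encodes partitions and weights the two goals is not stated in the abstract.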
Abstract:
Thin films of ZnO, Li-doped ZnO (ZLO) and multilayers of ZnO and ZLO (ZnO/ZLO) were grown on silicon and Corning glass substrates by the pulsed laser deposition technique. Single-phase formation and the crystalline quality of the films were analyzed by X-ray diffraction, and the Li composition in the film was determined to be 15 wt% by X-ray photoelectron spectroscopy. The Raman spectrum reveals the hexagonal wurtzite structure of ZnO, ZLO and the ZnO/ZLO multilayer and confirms the single-phase formation. Films grown on Corning glass show more than 80% transmittance in the visible region, and the optical band gaps were calculated to be 3.245, 3.26 and 3.22 eV for ZnO, ZLO and ZnO/ZLO, respectively. An efficient blue emission was observed by photoluminescence (PL) in all films grown on silicon (1 0 0) substrates. PL measurements at different temperatures reveal that the PL emission intensity of the ZnO/ZLO multilayer was weakly dependent on temperature compared to the single layers of ZnO and ZLO, and that the wavelength of emission was independent of temperature. Our results indicate that the ZnO/ZLO multilayer can be used for the fabrication of blue light emitting diodes. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
In this paper, we present a differential-geometric approach to analyze the singularities of task space point trajectories of two- and three-degree-of-freedom serial and parallel manipulators. At non-singular configurations, the first-order, local properties are characterized by metric coefficients and, geometrically, by the shape and size of a velocity ellipse or an ellipsoid. At singular configurations, the determinant of the matrix of metric coefficients is zero, the velocity ellipsoid degenerates to an ellipse, a line or a point, and the area or the volume of the velocity ellipse or ellipsoid becomes zero. The degeneracies of the velocity ellipsoid or ellipse give a simple geometric picture of the possible task space velocities at a singular configuration. To study the second-order properties at a singularity, we use the derivatives of the metric coefficients and the rate of change of area or volume. The derivatives are shown to be related to the possible task space accelerations at a singular configuration. In the case of parallel manipulators, singularities may lead to either loss or gain of one or more degrees-of-freedom. For loss of one or more degrees-of-freedom, the possible velocities and accelerations are again obtained from a modified metric and derivatives of the metric coefficients. In the case of a gain of one or more degrees-of-freedom, the possible task space velocities can be pictured as growth to lines, ellipses, and ellipsoids. The theoretical results are illustrated with the help of a general spatial 2R manipulator and a three-degree-of-freedom RPSSPR-SPR parallel manipulator.
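As a rough illustration of the first-order picture described in this abstract, the sketch below computes metric coefficients and the velocity ellipse for a planar 2R arm. Several things here are assumptions and not taken from the paper: the planar (not spatial) 2R geometry, the definition of the metric as g = JᵀJ with J the end-point Jacobian, and all function and parameter names.

```python
import math


def metric_coefficients(l1, l2, theta1, theta2):
    """Return (g11, g12, g22) of g = J^T J for an assumed planar 2R arm with link lengths l1, l2."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    # End-point Jacobian of x = l1*c1 + l2*c12, y = l1*s1 + l2*s12.
    j11, j12 = -l1 * s1 - l2 * s12, -l2 * s12
    j21, j22 = l1 * c1 + l2 * c12, l2 * c12
    g11 = j11 * j11 + j21 * j21
    g12 = j11 * j12 + j21 * j22
    g22 = j12 * j12 + j22 * j22
    return g11, g12, g22


def velocity_ellipse(l1, l2, theta1, theta2):
    """Return (det_g, major, minor, area) of the velocity ellipse for unit-norm joint rates."""
    g11, g12, g22 = metric_coefficients(l1, l2, theta1, theta2)
    det_g = max(g11 * g22 - g12 * g12, 0.0)  # clamp round-off below zero
    trace = g11 + g22
    disc = math.sqrt(max(trace * trace - 4.0 * det_g, 0.0))
    # Semi-axes are the square roots of the eigenvalues of the symmetric 2x2 matrix g.
    major = math.sqrt((trace + disc) / 2.0)
    minor = math.sqrt(max((trace - disc) / 2.0, 0.0))
    area = math.pi * math.sqrt(det_g)
    return det_g, major, minor, area


# Elbow at 90 degrees (non-singular) versus fully stretched (singular).
print(velocity_ellipse(1.0, 1.0, 0.3, math.pi / 2))
print(velocity_ellipse(1.0, 1.0, 0.3, 0.0))
```

For this planar arm det g = (l1 l2 sin θ2)², so the determinant and the ellipse area vanish when the arm is fully stretched or folded and the ellipse collapses to a line, which matches the degeneracy the abstract describes.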
Abstract:
In this paper, a new parallel algorithm for nonlinear transient dynamic analysis of large structures is presented. An unconditionally stable Newmark-beta method (constant average acceleration technique) has been employed for time integration. The proposed parallel algorithm has been devised within the broad framework of domain decomposition techniques. However, unlike most of the existing parallel algorithms devised for structural dynamic applications, which are basically derived using nonoverlapped domains, the proposed algorithm uses overlapped domains. The parallel overlapped domain decomposition algorithm proposed in this paper has been formulated by splitting the mass, damping and stiffness matrices arising out of the finite element discretisation of a given structure. A predictor-corrector scheme has been formulated for iteratively improving the solution in each step. A computer program based on the proposed algorithm has been developed and implemented using the Message Passing Interface (MPI) as the software development environment. The PARAM-10000 MIMD parallel computer has been used to evaluate the performance. Numerical experiments have been conducted to validate as well as to evaluate the performance of the proposed parallel algorithm. Comparisons have been made with the conventional nonoverlapped domain decomposition algorithms. Numerical studies indicate that the proposed algorithm is superior in performance to the conventional domain decomposition algorithms. (C) 2003 Elsevier Ltd. All rights reserved.
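The time integrator named in this abstract, the Newmark-beta constant average acceleration scheme (γ = 1/2, β = 1/4), can be sketched for the simplest possible case. The example below is a serial, linear, single-degree-of-freedom version with hypothetical names; it does not attempt the paper's nonlinear analysis, its overlapped domain decomposition, or its predictor-corrector iteration.

```python
def newmark_average_acceleration(m, c, k, force, u0, v0, dt, n_steps):
    """Integrate m*u'' + c*u' + k*u = force(t) and return a list of (t, u, v) tuples."""
    gamma, beta = 0.5, 0.25  # constant average acceleration parameters
    u, v = u0, v0
    a = (force(0.0) - c * v - k * u) / m  # initial acceleration from equilibrium
    # The effective stiffness is constant for a linear system with a fixed step.
    k_eff = k + gamma / (beta * dt) * c + m / (beta * dt * dt)
    history = [(0.0, u, v)]
    for n in range(1, n_steps + 1):
        t = n * dt
        p_eff = (force(t)
                 + m * (u / (beta * dt * dt) + v / (beta * dt)
                        + (0.5 / beta - 1.0) * a)
                 + c * (gamma / (beta * dt) * u + (gamma / beta - 1.0) * v
                        + dt * (0.5 * gamma / beta - 1.0) * a))
        u_new = p_eff / k_eff
        a_new = ((u_new - u) / (beta * dt * dt) - v / (beta * dt)
                 - (0.5 / beta - 1.0) * a)
        v_new = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        u, v, a = u_new, v_new, a_new
        history.append((t, u, v))
    return history


# Free vibration of an undamped unit oscillator released from u = 1.
for t, u, v in newmark_average_acceleration(1.0, 0.0, 1.0, lambda t: 0.0, 1.0, 0.0, 0.1, 5):
    print(round(t, 2), u, v)
```

For an undamped linear oscillator this scheme conserves the discrete energy 0.5*m*v² + 0.5*k*u² for any step size, which is one way to see the unconditional stability mentioned in the abstract; accuracy, as opposed to stability, still depends on the step size.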
Abstract:
Protein folding and unfolding are complex phenomena, and it is accepted that multidomain proteins generally follow multiple pathways. Maltose-binding protein (MBP) is a large (a two-domain, 370-amino acid residue) bacterial periplasmic protein involved in maltose uptake. Despite the large size, it has been shown to exhibit an apparent two-state equilibrium unfolding in bulk experiments. Single-molecule studies can uncover rare events that are masked by averaging in bulk studies. Here, we use single-molecule force spectroscopy to study the mechanical unfolding pathways of MBP and its precursor protein (preMBP) in the presence and absence of ligands. Our results show that MBP exhibits kinetic partitioning on mechanical stretching and unfolds via two parallel pathways: one of them involves a mechanically stable intermediate (path I) whereas the other is devoid of it (path II). The apoMBP unfolds via path I in 62% of the mechanical unfolding events, and the remaining 38% follow path II. In the case of maltose-bound MBP, the protein unfolds via the intermediate in 79% of the cases, the remaining 21% via path II. Similarly, on binding to maltotriose, a ligand whose binding strength with the polyprotein is similar to that of maltose, the occurrence of the intermediate is comparable (82% via path I) with that of maltose. The precursor protein preMBP also shows a similar behavior upon mechanical unfolding. The percentages of molecules unfolding via path I are 53% in the apo form and 68% and 72% upon binding to maltose and maltotriose, respectively, for preMBP. These observations demonstrate that ligand binding can modulate the mechanical unfolding pathways of proteins by a kinetic partitioning mechanism. This could be a general mechanism in the unfolding of other large two-domain ligand-binding proteins of the bacterial periplasmic space.
Abstract:
MATLAB is an array language that was initially popular for rapid prototyping and is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus, our compiler automatically handles the composition of kernels, the mapping of kernels to the CPU and GPU, scheduling, and the insertion of required data transfers. The proposed compiler was implemented, and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.