443 results for Supercomputer
Abstract:
Salient object detection has become an important task in many image processing applications. Existing approaches exploit the background prior and the contrast prior to attain state-of-the-art results. In this paper, instead of using background cues, we estimate the foreground regions in an image using objectness proposals and utilize them to obtain smooth and accurate saliency maps. We propose a novel saliency measure called 'foreground connectivity', which determines how tightly a pixel or a region is connected to the estimated foreground. We use the values assigned by this measure as foreground weights and integrate them into an optimization framework to obtain the final saliency maps. We extensively evaluate the proposed approach on two benchmark databases and demonstrate that the results obtained are better than those of existing state-of-the-art approaches.
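As a rough illustration of the pipeline this abstract describes, the Python sketch below computes a foreground-connectivity weight per superpixel from an objectness-based foreground estimate and feeds it into a simple quadratic saliency optimization; the connectivity definition, the Gaussian weighting, and the objective are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def foreground_connectivity(edge_w, fg_idx, sigma=0.25):
    """edge_w: (n, n) colour distances between adjacent superpixels (0 where
    not adjacent); fg_idx: superpixels covered by the objectness-based
    foreground estimate. Returns a weight near 1 for superpixels tightly
    connected to the foreground and near 0 otherwise."""
    graph = csr_matrix(edge_w)                       # zero entries = no edge
    d = dijkstra(graph, indices=fg_idx).min(axis=0)  # geodesic cost to nearest foreground node
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def optimize_saliency(w_fg, smooth_w):
    """Quadratic objective: pull foreground-connected superpixels toward 1,
    the rest toward 0, with a smoothness term over adjacent superpixels."""
    w_bg = 1.0 - w_fg
    laplacian = np.diag(smooth_w.sum(axis=1)) - smooth_w
    A = np.diag(w_fg + w_bg) + laplacian
    return np.linalg.solve(A, w_fg)                  # one saliency value per superpixel
```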
Abstract:
This short communication presents our recent studies implementing numerical simulations of multi-phase flows on top-ranked supercomputer systems with distributed-memory architectures. The numerical model is designed to make full use of the capacity of the hardware. Satisfactory scalability, in terms of both the parallel speed-up rate and the size of the problem, has been obtained on two highly ranked massively parallel systems: the Earth Simulator (Earth Simulator Research Center, Yokohama, Kanagawa, Japan) and the TSUBAME (Tokyo Institute of Technology, Tokyo, Japan) supercomputers.
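For reference, the two notions of scalability mentioned above are commonly quantified as follows; the abstract does not state which exact metrics were used, so these are the standard definitions rather than the paper's own.

```latex
% strong scaling: fixed problem size, p processors
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p};
% weak scaling: problem size N grown in proportion to p
E_{\mathrm{weak}}(p) = \frac{T_1(N)}{T_p(pN)}.
```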
Abstract:
We investigated the structural, elastic, and electronic properties of cubic perovskite-type BaHfO3 using a first-principles method based on a plane-wave basis set. Analysis of the band structure shows that perovskite-type BaHfO3 is a wide-gap indirect semiconductor. The band gap is predicted to be 3.94 eV within the screened-exchange local density approximation (sX-LDA). The calculated equilibrium lattice constant of this compound is in good agreement with the available experimental and theoretical data reported in the literature. The independent elastic constants (C11, C12, and C44), bulk modulus B and its pressure derivative B', compressibility beta, shear modulus G, Young's modulus Y, Poisson's ratio nu, and Lamé constants (mu, lambda) are obtained and analyzed in comparison with the available theoretical and experimental data for both single-crystalline and polycrystalline BaHfO3. The bonding-charge density calculation makes it clear that covalent bonds exist between the Hf and O atoms and ionic bonds exist between the Ba atoms and the HfO3 ionic groups in BaHfO3. (C) 2009 Elsevier B.V. All rights reserved.
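For context, the polycrystalline quantities listed above follow from the three cubic elastic constants through standard relations; the Voigt shear modulus is shown here for illustration, although the paper may use Voigt-Reuss-Hill averaging.

```latex
B = \frac{C_{11} + 2C_{12}}{3}, \qquad
\beta = \frac{1}{B}, \qquad
G_{V} = \frac{C_{11} - C_{12} + 3C_{44}}{5}, \qquad
Y = \frac{9BG}{3B + G}, \qquad
\nu = \frac{3B - 2G}{2(3B + G)}, \qquad
\lambda = B - \frac{2G}{3}, \qquad \mu = G.
```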
Abstract:
We describe an approach to parallel compilation that seeks to harness the vast amount of fine-grain parallelism that is exposed through partial evaluation of numerically intensive scientific programs. We have constructed a compiler for the Supercomputer Toolkit parallel processor that uses partial evaluation to break down data abstractions and program structure, producing huge basic blocks that contain large amounts of fine-grain parallelism. We show that this fine-grain parallelism can be effectively utilized even on coarse-grain parallel architectures by selectively grouping operations together so as to adjust the parallelism grain size to match the inter-processor communication capabilities of the target architecture.
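A minimal sketch of the grain-size adjustment idea, under the assumption that operations arrive in topological order with known per-operation and per-message costs; this is illustrative only and is not the Toolkit compiler's actual grouping algorithm.

```python
def group_operations(ops, op_cost, msg_cost):
    """Greedily pack fine-grain operations (given in topological order) into
    coarser tasks until each task's compute time amortizes the cost of one
    inter-processor message on the target architecture."""
    groups, current, acc = [], [], 0.0
    for op in ops:
        current.append(op)
        acc += op_cost[op]
        if acc >= msg_cost:          # enough work to hide one communication
            groups.append(current)
            current, acc = [], 0.0
    if current:
        groups.append(current)
    return groups

# e.g. group_operations(range(1000), {i: 1.0 for i in range(1000)}, msg_cost=50.0)
# yields tasks of about 50 operations each.
```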
Abstract:
The simulation of subsonic aeroacoustic problems, such as the flow-generated sound of wind instruments, is well suited for parallel computing on a cluster of non-dedicated workstations. Simulations are demonstrated that employ 20 non-dedicated Hewlett-Packard workstations (HP9000/715) and achieve performance on this problem comparable to that of a dedicated 64-node CM-5 supercomputer with vector units. The success of the present approach depends on the low communication requirements of the problem (a low communication-to-computation ratio), which arise from the coarse-grain decomposition of the problem and the use of local-interaction methods. Many important problems may be suitable for this type of parallel computing, including computer vision, circuit simulation, and other subsonic flow problems.
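The low communication-to-computation ratio relied on above can be seen from a simple surface-to-volume argument for a cubic subdomain of n^3 grid cells per workstation with local (nearest-neighbour) interactions; the constant factor is illustrative.

```latex
\frac{t_{\mathrm{comm}}}{t_{\mathrm{comp}}}
  \;\propto\; \frac{6\,n^{2}}{n^{3}} \;=\; \frac{6}{n},
```

so the coarser the per-workstation subdomain, the smaller the ratio, and the less the slow workstation network matters.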
Abstract:
We constructed a parallelizing compiler that utilizes partial evaluation to generate efficient parallel object code from very high-level, data-independent source programs. On several important scientific applications, the compiler attains parallel performance equivalent to or better than the best observed results from manual restructuring of the code. This is the first attempt to capitalize on partial evaluation's ability to expose low-level parallelism. New static scheduling techniques are used to exploit the fine-grained parallelism of the computations. The compiler maps the computation graph resulting from partial evaluation onto the Supercomputer Toolkit, a parallel computer with eight VLIW processors.
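As an illustration of statically scheduling a fine-grained computation graph onto a small number of processors, the sketch below uses a simple greedy list scheduler; the compiler's actual techniques are more sophisticated, and the cost model here is an assumption.

```python
def list_schedule(ops, preds, cost, num_procs=8):
    """ops: operation ids in topological order; preds: op -> list of
    predecessor ops; cost: op -> cycles. Greedily place each operation on
    the processor where it can start earliest. Returns op -> (proc, start)."""
    proc_free = [0] * num_procs                  # next free cycle per processor
    finish, placement = {}, {}
    for op in ops:
        ready = max((finish[p] for p in preds.get(op, [])), default=0)
        proc = min(range(num_procs), key=lambda q: max(proc_free[q], ready))
        start = max(proc_free[proc], ready)
        placement[op] = (proc, start)
        finish[op] = start + cost[op]
        proc_free[proc] = finish[op]
    return placement
```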
Abstract:
This paper discusses the calculation of electron impact collision strengths and effective collision strengths for iron-peak elements of importance in the analysis of many astronomical and laboratory spectra. It commences with a brief overview of R-matrix theory, which is the basis of computer programs that have been widely used to calculate the relevant atomic data used in this analysis. A summary is then given of calculations carried out over the last 20 years for electron collisions with Fe II. The grand challenge represented by the calculation of accurate collision strengths and effective collision strengths for this ion is then discussed. A new parallel R-matrix program, PRMAT, which is being developed to meet this challenge, is then described, and results of recent calculations using this program to determine optically forbidden transitions in e- – Ni IV on a Cray T3E-1200 parallel supercomputer are presented. The implications of this e- – Ni IV calculation for the determination of accurate data from an isoelectronic e- – Fe II calculation are discussed, and finally some future directions of research are reviewed.
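For reference, the effective collision strength mentioned above is the Maxwellian average of the collision strength Omega_ij over the free-electron energy E_j of the final state at electron temperature T; this is the standard definition rather than anything specific to the paper.

```latex
\Upsilon_{ij}(T) \;=\; \int_{0}^{\infty} \Omega_{ij}(E_j)\,
  \exp\!\left(-\frac{E_j}{kT}\right)\,
  \mathrm{d}\!\left(\frac{E_j}{kT}\right).
```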
Abstract:
In this paper we describe the design of a parallel solution of the inhomogeneous Schrödinger equation, which arises in the construction of continuum orbitals in the R-matrix theory of atomic continuum processes. A prototype system, programmed in occam2 and implemented on a bi-directional pipeline of transputers, is described. Some timing results for the prototype system are presented, and the development of a full production system is discussed.
Abstract:
In radiotherapy, computed tomography (CT) provides the anatomical information about the patient that is used for dose calculation during treatment planning. To account for the heterogeneous composition of tissues, calculation techniques such as the Monte Carlo method are required to compute the dose accurately. Importing CT images into such a calculation requires that each voxel, expressed in Hounsfield units (HU), be converted into a physical quantity such as electron density (ED). This conversion is usually performed using an HU-ED calibration curve. An anomaly or artifact that appears in a CT image before calibration is liable to assign the wrong tissue to a voxel. Such errors can cause a critical loss of reliability in the dose calculation. This work aims to assign accurate values to CT image voxels in order to ensure the reliability of dose calculations during radiotherapy treatment planning. To this end, a study is carried out on artifacts reproduced by Monte Carlo simulation. To reduce computation time, the simulations are parallelized and ported to a supercomputer. A sensitivity study of the HU numbers in the presence of artifacts is then carried out through a statistical analysis of histograms. Beam hardening, the source of many artifacts, is studied in more depth. A review of the state of the art in beam-hardening correction is presented, followed by an explicit demonstration of an empirical correction.
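A minimal sketch of the HU-to-electron-density conversion described above. The calibration points below are illustrative placeholders; a real HU-ED curve is measured with a tissue-characterization phantom on the specific scanner.

```python
import numpy as np

# Recall HU = 1000 * (mu - mu_water) / mu_water, so water maps to 0 HU and
# air to about -1000 HU. ED values are relative to water.
HU_POINTS = np.array([-1000.0, 0.0, 100.0, 1500.0])   # air, water, soft tissue, bone (illustrative)
ED_POINTS = np.array([0.001, 1.000, 1.05, 1.85])       # relative electron densities (illustrative)

def hu_to_ed(hu_volume):
    """Piecewise-linear interpolation of the HU-ED calibration curve,
    applied voxel-wise to a CT volume given as a numpy array of HU values."""
    return np.interp(hu_volume, HU_POINTS, ED_POINTS)
```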
Abstract:
Segmentation of medical imagery is a challenging problem due to the complexity of the images, as well as to the absence of anatomical models that fully capture the possible deformations in each structure. Brain tissue is a particularly complex structure, and its segmentation is an important step for studies of temporal change in morphology, as well as for 3D visualization in surgical planning. In this paper, we present a method for segmentation of brain tissue from magnetic resonance images that is a combination of three existing techniques from the computer vision literature: EM segmentation, binary morphology, and active contour models. Each of these techniques has been customized for the problem of brain tissue segmentation so that the resulting method is more robust than its components. Finally, we present the results of a parallel implementation of this method on IBM's Power Visualization System supercomputer for a database of 20 brain scans, each with 256x256x124 voxels, and validate them against segmentations generated by neuroanatomy experts.
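A rough sketch of the three-stage combination using generic library components; the paper's customized EM segmentation and its active-contour refinement are not reproduced here, and the class-selection heuristic is an assumption.

```python
import numpy as np
from scipy import ndimage
from sklearn.mixture import GaussianMixture

def segment_brain(volume, n_classes=3):
    """volume: 3D MR intensity array. Returns a binary brain-tissue mask."""
    # 1) EM segmentation: fit a Gaussian mixture to voxel intensities.
    flat = volume.reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_classes).fit(flat)
    labels = gmm.predict(flat).reshape(volume.shape)
    # Crude stand-in: take the brightest class as tissue of interest.
    tissue = labels == np.argmax(gmm.means_.ravel())
    # 2) Binary morphology: remove small misclassified components, fill holes.
    tissue = ndimage.binary_opening(tissue, iterations=2)
    tissue = ndimage.binary_fill_holes(tissue)
    # 3) An active-contour model would refine this boundary; omitted here.
    return tissue
```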
Abstract:
A full assessment of para-virtualization is important, because without knowledge about the various overheads, users cannot judge whether using virtualization is a good idea. In this paper we are interested in assessing the overheads of running various benchmarks on bare metal as well as under para-virtualization. The idea is to see what the overheads of para-virtualization are, as well as the overheads of turning on monitoring and logging. The knowledge gained from assessing these benchmarks on the different systems will help a range of users understand the use of virtualization systems. In this paper we assess the overheads of using Xen, VMware, KVM and Citrix (see Table 1); these virtualization systems are used extensively by cloud users. We use various Netlib benchmarks, which have been developed by the University of Tennessee at Knoxville (UTK) and Oak Ridge National Laboratory (ORNL). To assess these virtualization systems, we run the benchmarks on bare metal, then under para-virtualization, and finally with monitoring and logging turned on. The latter is important because users are interested in the Service Level Agreements (SLAs) offered by cloud providers, and logging is a means of assessing the services bought and used from commercial providers.

We assess the virtualization systems on three different platforms: the Thamesblue supercomputer, the Hactar cluster and an IBM JS20 blade server (see Table 2), all of which are available at the University of Reading.

A functional virtualization system is multi-layered and is driven by its privileged components. Virtualization systems can host multiple guest operating systems, each running in its own domain, and the system schedules virtual CPUs and memory within each virtual machine (VM) to make the best use of the available resources. The guest operating system schedules each application accordingly. Virtualization can be deployed as full virtualization or para-virtualization. Full virtualization provides a total abstraction of the underlying physical system and creates a new virtual system in which the guest operating systems can run; no modifications are needed in the guest OS or application, i.e. the guest OS or application is not aware of the virtualized environment and runs normally. Para-virtualization requires modification of the guest operating systems that run on the virtual machines, i.e. these guest operating systems are aware that they are running on a virtual machine, and they provide near-native performance. Both para-virtualization and full virtualization can be deployed across various virtualized systems.

Para-virtualization is an OS-assisted form of virtualization in which some modifications are made to the guest operating system to enable better performance. In this kind of virtualization, the guest operating system is aware that it is running on virtualized hardware rather than on bare hardware. In para-virtualization, the device drivers in the guest operating system coordinate with the device drivers of the host operating system and reduce the performance overheads. The use of para-virtualization [0] is intended to avoid the bottleneck associated with slow hardware interrupts that exists when full virtualization is employed.
It has been shown [0] that para-virtualization does not impose significant performance overhead in high performance computing, and this in turn has implications for the use of cloud computing for hosting HPC applications. The "apparent" improvement in virtualization has led us to formulate the hypothesis that certain classes of HPC applications should be able to execute in a cloud environment with minimal performance degradation. In order to support this hypothesis, it is first necessary to define exactly what is meant by a "class" of application, and secondly it is necessary to observe application performance both within a virtual machine and when executing on bare hardware. A further potential complication is associated with the need for cloud service providers to support Service Level Agreements (SLAs), so that system utilisation can be audited.
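A minimal sketch of how the overhead comparison can be reported once the benchmark timings are collected; the timings below are hypothetical placeholders, not measured results.

```python
def overhead_pct(t_bare, t_other):
    """Relative slowdown of a configuration versus bare metal, in percent."""
    return 100.0 * (t_other - t_bare) / t_bare

# Hypothetical wall-clock times (seconds) for one Netlib benchmark run.
runs = {
    "bare metal":              100.0,
    "para-virtualized":        104.0,
    "para-virt. + monitoring": 107.5,
}
base = runs["bare metal"]
for name, t in runs.items():
    print(f"{name:25s} {overhead_pct(base, t):6.2f}% overhead")
```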