877 resultados para High-performance computing hyperspectral imaging


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Europe's widely distributed climate modelling expertise, now organized in the European Network for Earth System Modelling (ENES), is both a strength and a challenge. Recognizing this, the European Union's Program for Integrated Earth System Modelling (PRISM) infrastructure project aims at designing a flexible and friendly user environment to assemble, run and post-process Earth System models. PRISM was started in December 2001 with a duration of three years. This paper presents the major stages of PRISM, including: (1) the definition and promotion of scientific and technical standards to increase component modularity; (2) the development of an end-to-end software environment (graphical user interface, coupling and I/O system, diagnostics, visualization) to launch, monitor and analyse complex Earth system models built around state-of-art community component models (atmosphere, ocean, atmospheric chemistry, ocean bio-chemistry, sea-ice, land-surface); and (3) testing and quality standards to ensure high-performance computing performance on a variety of platforms. PRISM is emerging as a core strategic software infrastructure for building the European research area in Earth system sciences. Copyright (c) 2005 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Three naming strategies are discussed that allow the processes of a distributed application to continue being addressed by their original logical name, along all the migrations they may be forced to undertake because of performance-improvement goals. A simple centralised solution is firstly discussed which showed a software bottleneck with the increase of the number of processes; other two solutions are considered that entail different communication schemes and different communication overheads for the naming protocol. All these strategies are based on the facility that each process is allowed to survive after migration, even in its original site, only to provide a forwarding service to those communications that used its obsolete address.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A full assessment of para-­virtualization is important, because without knowledge about the various overheads, users can not understand whether using virtualization is a good idea or not. In this paper we are very interested in assessing the overheads of running various benchmarks on bare-­‐metal, as well as on para-­‐virtualization. The idea is to see what the overheads of para-­‐ virtualization are, as well as looking at the overheads of turning on monitoring and logging. The knowledge from assessing various benchmarks on these different systems will help a range of users understand the use of virtualization systems. In this paper we assess the overheads of using Xen, VMware, KVM and Citrix, see Table 1. These different virtualization systems are used extensively by cloud-­‐users. We are using various Netlib1 benchmarks, which have been developed by the University of Tennessee at Knoxville (UTK), and Oak Ridge National Laboratory (ORNL). In order to assess these virtualization systems, we run the benchmarks on bare-­‐metal, then on the para-­‐virtualization, and finally we turn on monitoring and logging. The later is important as users are interested in Service Level Agreements (SLAs) used by the Cloud providers, and the use of logging is a means of assessing the services bought and used from commercial providers. In this paper we assess the virtualization systems on three different systems. We use the Thamesblue supercomputer, the Hactar cluster and IBM JS20 blade server (see Table 2), which are all servers available at the University of Reading. A functional virtualization system is multi-­‐layered and is driven by the privileged components. Virtualization systems can host multiple guest operating systems, which run on its own domain, and the system schedules virtual CPUs and memory within each Virtual Machines (VM) to make the best use of the available resources. The guest-­‐operating system schedules each application accordingly. You can deploy virtualization as full virtualization or para-­‐virtualization. Full virtualization provides a total abstraction of the underlying physical system and creates a new virtual system, where the guest operating systems can run. No modifications are needed in the guest OS or application, e.g. the guest OS or application is not aware of the virtualized environment and runs normally. Para-­‐virualization requires user modification of the guest operating systems, which runs on the virtual machines, e.g. these guest operating systems are aware that they are running on a virtual machine, and provide near-­‐native performance. You can deploy both para-­‐virtualization and full virtualization across various virtualized systems. Para-­‐virtualization is an OS-­‐assisted virtualization; where some modifications are made in the guest operating system to enable better performance. In this kind of virtualization, the guest operating system is aware of the fact that it is running on the virtualized hardware and not on the bare hardware. In para-­‐virtualization, the device drivers in the guest operating system coordinate the device drivers of host operating system and reduce the performance overheads. The use of para-­‐virtualization [0] is intended to avoid the bottleneck associated with slow hardware interrupts that exist when full virtualization is employed. It has revealed [0] that para-­‐ virtualization does not impose significant performance overhead in high performance computing, and this in turn this has implications for the use of cloud computing for hosting HPC applications. The “apparent” improvement in virtualization has led us to formulate the hypothesis that certain classes of HPC applications should be able to execute in a cloud environment, with minimal performance degradation. In order to support this hypothesis, first it is necessary to define exactly what is meant by a “class” of application, and secondly it will be necessary to observe application performance, both within a virtual machine and when executing on bare hardware. A further potential complication is associated with the need for Cloud service providers to support Service Level Agreements (SLA), so that system utilisation can be audited.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a paralleled Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MV) are generated from the first-pass LHMEA and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We introduced hashtable into video processing and completed parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. It discusses how parallel video coding on load balanced multiprocessor systems can help, especially on motion estimation. The effect of load balancing for improved performance is discussed. The performance of the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reconfigurable computing is becoming an important new alternative for implementing computations. Field programmable gate arrays (FPGAs) are the ideal integrated circuit technology to experiment with the potential benefits of using different strategies of circuit specialization by reconfiguration. The final form of the reconfiguration strategy is often non-trivial to determine. Consequently, in this paper, we examine strategies for reconfiguration and, based on our experience, propose general guidelines for the tradeoffs using an area-time metric called functional density. Three experiments are set up to explore different reconfiguration strategies for FPGAs applied to a systolic implementation of a scalar quantizer used as a case study. Quantitative results for each experiment are given. The regular nature of the example means that the results can be generalized to a wide class of industry-relevant problems based on arrays.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The impending threat of global climate change and its regional manifestations is among the most important and urgent problems facing humanity. Society needs accurate and reliable estimates of changes in the probability of regional weather variations to develop science-based adaptation and mitigation strategies. Recent advances in weather prediction and in our understanding and ability to model the climate system suggest that it is both necessary and possible to revolutionize climate prediction to meet these societal needs. However, the scientific workforce and the computational capability required to bring about such a revolution is not available in any single nation. Motivated by the success of internationally funded infrastructure in other areas of science, this paper argues that, because of the complexity of the climate system, and because the regional manifestations of climate change are mainly through changes in the statistics of regional weather variations, the scientific and computational requirements to predict its behavior reliably are so enormous that the nations of the world should create a small number of multinational high-performance computing facilities dedicated to the grand challenges of developing the capabilities to predict climate variability and change on both global and regional scales over the coming decades. Such facilities will play a key role in the development of next-generation climate models, build global capacity in climate research, nurture a highly trained workforce, and engage the global user community, policy-makers, and stakeholders. We recommend the creation of a small number of multinational facilities with computer capability at each facility of about 20 peta-flops in the near term, about 200 petaflops within five years, and 1 exaflop by the end of the next decade. Each facility should have sufficient scientific workforce to develop and maintain the software and data analysis infrastructure. Such facilities will enable questions of what resolution, both horizontal and vertical, in atmospheric and ocean models, is necessary for more confident predictions at the regional and local level. Current limitations in computing power have placed severe limitations on such an investigation, which is now badly needed. These facilities will also provide the world's scientists with the computational laboratories for fundamental research on weather–climate interactions using 1-km resolution models and on atmospheric, terrestrial, cryospheric, and oceanic processes at even finer scales. Each facility should have enabling infrastructure including hardware, software, and data analysis support, and scientific capacity to interact with the national centers and other visitors. This will accelerate our understanding of how the climate system works and how to model it. It will ultimately enable the climate community to provide society with climate predictions, which are based on our best knowledge of science and the most advanced technology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The necessity and benefits for establishing the international Earth-system Prediction Initiative (EPI) are discussed by scientists associated with the World Meteorological Organization (WMO) World Weather Research Programme (WWRP), World Climate Research Programme (WCRP), International Geosphere–Biosphere Programme (IGBP), Global Climate Observing System (GCOS), and natural-hazards and socioeconomic communities. The proposed initiative will provide research and services to accelerate advances in weather, climate, and Earth system prediction and the use of this information by global societies. It will build upon the WMO, the Group on Earth Observations (GEO), the Global Earth Observation System of Systems (GEOSS) and the International Council for Science (ICSU) to coordinate the effort across the weather, climate, Earth system, natural-hazards, and socioeconomic disciplines. It will require (i) advanced high-performance computing facilities, supporting a worldwide network of research and operational modeling centers, and early warning systems; (ii) science, technology, and education projects to enhance knowledge, awareness, and utilization of weather, climate, environmental, and socioeconomic information; (iii) investments in maintaining existing and developing new observational capabilities; and (iv) infrastructure to transition achievements into operational products and services.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Collaborative mining of distributed data streams in a mobile computing environment is referred to as Pocket Data Mining PDM. Hoeffding trees techniques have been experimentally and analytically validated for data stream classification. In this paper, we have proposed, developed and evaluated the adoption of distributed Hoeffding trees for classifying streaming data in PDM applications. We have identified a realistic scenario in which different users equipped with smart mobile devices run a local Hoeffding tree classifier on a subset of the attributes. Thus, we have investigated the mining of vertically partitioned datasets with possible overlap of attributes, which is the more likely case. Our experimental results have validated the efficiency of our proposed model achieving promising accuracy for real deployment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In 2006 the Route load balancing algorithm was proposed and compared to other techniques aiming at optimizing the process allocation in grid environments. This algorithm schedules tasks of parallel applications considering computer neighborhoods (where the distance is defined by the network latency). Route presents good results for large environments, although there are cases where neighbors do not have an enough computational capacity nor communication system capable of serving the application. In those situations the Route migrates tasks until they stabilize in a grid area with enough resources. This migration may take long time what reduces the overall performance. In order to improve such stabilization time, this paper proposes RouteGA (Route with Genetic Algorithm support) which considers historical information on parallel application behavior and also the computer capacities and load to optimize the scheduling. This information is extracted by using monitors and summarized in a knowledge base used to quantify the occupation of tasks. Afterwards, such information is used to parameterize a genetic algorithm responsible for optimizing the task allocation. Results confirm that RouteGA outperforms the load balancing carried out by the original Route, which had previously outperformed others scheduling algorithms from literature.