774 resultados para GPGPU Parallel Computing
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
Markowitz showed that assets can be combined to produce an 'Efficient' portfolio that will give the highest level of portfolio return for any level of portfolio risk, as measured by the variance or standard deviation. These portfolios can then be connected to generate what is termed an 'Efficient Frontier' (EF). In this paper we discuss the calculation of the Efficient Frontier for combinations of assets, again using the spreadsheet Optimiser. To illustrate the derivation of the Efficient Frontier, we use the data from the Investment Property Databank Long Term Index of Investment Returns for the period 1971 to 1993. Many investors might require a certain specific level of holding or a restriction on holdings in at least some of the assets. Such additional constraints may be readily incorporated into the model to generate a constrained EF with upper and/or lower bounds. This can then be compared with the unconstrained EF to see whether the reduction in return is acceptable. To see the effect that these additional constraints may have, we adopt a fairly typical pension fund profile, with no more than 20% of the total held in Property. The paper shows that it is now relatively easy to use the Optimiser available in at least one spreadsheet (EXCEL) to calculate efficient portfolios for various levels of risk and return, both constrained and unconstrained, so as to be able to generate any number of Efficient Frontiers.
Resumo:
Both the (5,3) counter and (2,2,3) counter multiplication techniques are investigated for the efficiency of their operation speed and the viability of the architectures when implemented in a fast bipolar ECL technology. The implementation of the counters in series-gated ECL and threshold logic are contrasted for speed, noise immunity and complexity, and are critically compared with the fastest practical design of a full-adder. A novel circuit technique to overcome the problems of needing high fan-in input weights in threshold circuits through the use of negative weighted inputs is presented. The authors conclude that a (2,2,3) counter based array multiplier implemented in series-gated ECL should enable a significant increase in speed over conventional full adder based array multipliers.
Resumo:
The authors compare various array multiplier architectures based on (p,q) counter circuits. The tradeoff in multiplier design is always between adding complexity and increasing speed. It is shown that by using a (2,2,3) counter cell it is possible to gain a significant increase in speed over a conventional full-adder, carry-save array based approach. The increase in complexity should be easily accommodated using modern emitter-coupled-logic processes.
Resumo:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
Resumo:
The impending threat of global climate change and its regional manifestations is among the most important and urgent problems facing humanity. Society needs accurate and reliable estimates of changes in the probability of regional weather variations to develop science-based adaptation and mitigation strategies. Recent advances in weather prediction and in our understanding and ability to model the climate system suggest that it is both necessary and possible to revolutionize climate prediction to meet these societal needs. However, the scientific workforce and the computational capability required to bring about such a revolution is not available in any single nation. Motivated by the success of internationally funded infrastructure in other areas of science, this paper argues that, because of the complexity of the climate system, and because the regional manifestations of climate change are mainly through changes in the statistics of regional weather variations, the scientific and computational requirements to predict its behavior reliably are so enormous that the nations of the world should create a small number of multinational high-performance computing facilities dedicated to the grand challenges of developing the capabilities to predict climate variability and change on both global and regional scales over the coming decades. Such facilities will play a key role in the development of next-generation climate models, build global capacity in climate research, nurture a highly trained workforce, and engage the global user community, policy-makers, and stakeholders. We recommend the creation of a small number of multinational facilities with computer capability at each facility of about 20 peta-flops in the near term, about 200 petaflops within five years, and 1 exaflop by the end of the next decade. Each facility should have sufficient scientific workforce to develop and maintain the software and data analysis infrastructure. Such facilities will enable questions of what resolution, both horizontal and vertical, in atmospheric and ocean models, is necessary for more confident predictions at the regional and local level. Current limitations in computing power have placed severe limitations on such an investigation, which is now badly needed. These facilities will also provide the world's scientists with the computational laboratories for fundamental research on weather–climate interactions using 1-km resolution models and on atmospheric, terrestrial, cryospheric, and oceanic processes at even finer scales. Each facility should have enabling infrastructure including hardware, software, and data analysis support, and scientific capacity to interact with the national centers and other visitors. This will accelerate our understanding of how the climate system works and how to model it. It will ultimately enable the climate community to provide society with climate predictions, which are based on our best knowledge of science and the most advanced technology.
Resumo:
In a world where massive amounts of data are recorded on a large scale we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However alternative technologies have been derived that often produce better rules but do not scale well on large datasets. Such an alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale up results of a first prototype implementation.
Resumo:
The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel in a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.
Resumo:
Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With sheer amounts of data streams are now available for subscription on our smart mobile phones, the potential of using this data for decision making using data stream mining techniques has now been achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among the mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing the significant applications in this area. We have proposed using mobile software agents in this application for several reasons. Most importantly the autonomic intelligent behaviour of the agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in details in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.
Resumo:
In a world where data is captured on a large scale the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules, one is the divide and conquer approach, also known as the top down induction of decision trees; the other approach is called the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach.In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
Resumo:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data and a data warehouse. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular we look at two aspects, first how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories --- this is an important and challenging aspect of P-found because the data volumes involved are too large to be centralised. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling new scientific discoveries.
Resumo:
Java is becoming an increasingly popular language for developing distributed and parallel scientific and engineering applications. Jini is a Java-based infrastructure developed by Sun that can allegedly provide all the services necessary to support distributed applications. It is the aim of this paper to explore and investigate the services and properties that Jini actually provides and match these against the needs of high performance distributed and parallel applications written in Java. The motivation for this work is the need to develop a distributed infrastructure to support an MPI-like interface to Java known as MPJ. In the first part of the paper we discuss the needs of MPJ, the parallel environment that we wish to support. In particular we look at aspects such as reliability and ease of use. We then move on to sketch out the Jini architecture and review the components and services that Jini provides. In the third part of the paper we critically explore a Jini infrastructure that could be used to support MPJ. Here we are particularly concerned with Jini's ability to support reliably a cocoon of MPJ processes executing in a heterogeneous envirnoment. In the final part of the paper we summarise our findings and report on future work being undertaken on Jini and MPJ.