848 resultados para parallel benchmarks
Resumo:
The Intel R Xeon PhiTM is the first processor based on Intel’s MIC (Many Integrated Cores) architecture. It is a co-processor specially tailored for data-parallel computations, whose basic architectural design is similar to the ones of GPUs (Graphics Processing Units), leveraging the use of many integrated low computational cores to perform parallel computations. The main novelty of the MIC architecture, relatively to GPUs, is its compatibility with the Intel x86 architecture. This enables the use of many of the tools commonly available for the parallel programming of x86-based architectures, which may lead to a smaller learning curve. However, programming the Xeon Phi still entails aspects intrinsic to accelerator-based computing, in general, and to the MIC architecture, in particular. In this thesis we advocate the use of algorithmic skeletons for programming the Xeon Phi. Algorithmic skeletons abstract the complexity inherent to parallel programming, hiding details such as resource management, parallel decomposition, inter-execution flow communication, thus removing these concerns from the programmer’s mind. In this context, the goal of the thesis is to lay the foundations for the development of a simple but powerful and efficient skeleton framework for the programming of the Xeon Phi processor. For this purpose we build upon Marrow, an existing framework for the orchestration of OpenCLTM computations in multi-GPU and CPU environments. We extend Marrow to execute both OpenCL and C++ parallel computations on the Xeon Phi. We evaluate the newly developed framework, several well-known benchmarks, like Saxpy and N-Body, will be used to compare, not only its performance to the existing framework when executing on the co-processor, but also to assess the performance on the Xeon Phi versus a multi-GPU environment.
Resumo:
OutSystems Platform is used to develop, deploy, and maintain enterprise web an mobile web applications. Applications are developed through a visual domain specific language, in an integrated development environment, and compiled to a standard stack of web technologies. In the platform’s core, there is a compiler and a deployment service that transform the visual model into a running web application. As applications grow, compilation and deployment times increase as well, impacting the developer’s productivity. In the previous model, a full application was the only compilation and deployment unit. When the developer published an application, even if he only changed a very small aspect of it, the application would be fully compiled and deployed. Our goal is to reduce compilation and deployment times for the most common use case, in which the developer performs small changes to an application before compiling and deploying it. We modified the OutSystems Platform to support a new incremental compilation and deployment model that reuses previous computations as much as possible in order to improve performance. In our approach, the full application is broken down into smaller compilation and deployment units, increasing what can be cached and reused. We also observed that this finer model would benefit from a parallel execution model. Hereby, we created a task driven Scheduler that executes compilation and deployment tasks in parallel. Our benchmarks show a substantial improvement of the compilation and deployment process times for the aforementioned development scenario.
Resumo:
Mutual fund managers increasingly lend their holdings and/or use short sales to generate higher returns for their funds. This project presents a first look into the impact these practices on performance using the performance measures: i) Characteristic Selectivity (CS), the ability of the fund's managers to choose stocks that outperform their benchmarks; ii) Characteristic Timing (CT), the ability of the manager to time the market; iii) and Average Style (AS), the returns from funds systematically holding stocks with certain characteristics. These returns are computed through the DGTW benchmarks. The effect of other variables that have also been shown to impact fund’s returns – total net assets under management, investment styles, turnover and expense ratios – will also be analyzed. I find that managers who use short-sales do not exhibit better stock picking abilities than those who do not, while mutual funds that lend do present higher CS returns. In addition, while lending is not significant for the total performance of a fund, the employment of short-sales and of both short-sales and lending has a negative impact on the fund’s performance.
Resumo:
Combinatorial Optimization Problems occur in a wide variety of contexts and generally are NP-hard problems. At a corporate level solving this problems is of great importance since they contribute to the optimization of operational costs. In this thesis we propose to solve the Public Transport Bus Assignment problem considering an heterogeneous fleet and line exchanges, a variant of the Multi-Depot Vehicle Scheduling Problem in which additional constraints are enforced to model a real life scenario. The number of constraints involved and the large number of variables makes impracticable solving to optimality using complete search techniques. Therefore, we explore metaheuristics, that sacrifice optimality to produce solutions in feasible time. More concretely, we focus on the development of algorithms based on a sophisticated metaheuristic, Ant-Colony Optimization (ACO), which is based on a stochastic learning mechanism. For complex problems with a considerable number of constraints, sophisticated metaheuristics may fail to produce quality solutions in a reasonable amount of time. Thus, we developed parallel shared-memory (SM) synchronous ACO algorithms, however, synchronism originates the straggler problem. Therefore, we proposed three SM asynchronous algorithms that break the original algorithm semantics and differ on the degree of concurrency allowed while manipulating the learned information. Our results show that our sequential ACO algorithms produced better solutions than a Restarts metaheuristic, the ACO algorithms were able to learn and better solutions were achieved by increasing the amount of cooperation (number of search agents). Regarding parallel algorithms, our asynchronous ACO algorithms outperformed synchronous ones in terms of speedup and solution quality, achieving speedups of 17.6x. The cooperation scheme imposed by asynchronism also achieved a better learning rate than the original one.
Resumo:
Since the last decade of the twentieth century, the healthcare industry is paying attention to the environmental impact of their buildings and therefore new regulations, policy goals and Buildings Sustainability Assessment (HBSA) methods are being developed and implemented. At the present, healthcare is one of the most regulated industries and it is also one of the largest consumers of energy per net floor area. To assess the sustainability of healthcare buildings it is necessary to establish a set of benchmarks related with their life-cycle performance. They are both essential to rate the sustainability of a project and to support designers and other stakeholders in the process of designing and operating a sustainable building, by allowing the comparison to be made between a project and the conventional and best market practices. This research is focused on the methodology to set the benchmarks for resources consumption, waste production, operation costs and potential environmental impacts related to the operational phase of healthcare buildings. It aims at contributing to the reduction of the subjectivity found in the definition of the benchmarks used in Building Sustainability Assessment (BSA) methods, and it is applied in the Portuguese context. These benchmarks will be used in the development of a Portuguese HBSA method.
Resumo:
The Closest Vector Problem (CVP) and the Shortest Vector Problem (SVP) are prime problems in lattice-based cryptanalysis, since they underpin the security of many lattice-based cryptosystems. Despite the importance of these problems, there are only a few CVP-solvers publicly available, and their scalability was never studied. This paper presents a scalable implementation of an enumeration-based CVP-solver for multi-cores, which can be easily adapted to solve the SVP. In particular, it achieves super-linear speedups in some instances on up to 8 cores and almost linear speedups on 16 cores when solving the CVP on a 50-dimensional lattice. Our results show that enumeration-based CVP-solvers can be parallelized as effectively as enumeration-based solvers for the SVP, based on a comparison with a state of the art SVP-solver. In addition, we show that we can optimize the SVP variant of our solver in such a way that it becomes 35%-60% faster than the fastest enumeration-based SVP-solver to date.
Resumo:
"A workshop within the 19th International Conference on Applications and Theory of Petri Nets - ICATPN’1998"
Resumo:
3
Resumo:
2
Resumo:
1
Resumo:
Magdeburg, Univ., Fak. für Verfahrens- und Systemtechnik, Diss., 2012
Resumo:
Magdeburg, Univ., Fak. für Verfahrens- und Systemtechnik, Diss., 2012
Resumo:
Nowadays a huge attention of the academia and research teams is attracted to the potential of the usage of the 60 GHz frequency band in the wireless communications. The use of the 60GHz frequency band offers great possibilities for wide variety of applications that are yet to be implemented. These applications also imply huge implementation challenges. Such example is building a high data rate transceiver which at the same time would have very low power consumption. In this paper we present a prototype of Single Carrier -SC transceiver system, illustrating a brief overview of the baseband design, emphasizing the most important decisions that need to be done. A brief overview of the possible approaches when implementing the equalizer, as the most complex module in the SC transceiver, is also presented. The main focus of this paper is to suggest a parallel architecture for the receiver in a Single Carrier communication system. This would provide higher data rates that the communication system canachieve, for a price of higher power consumption. The suggested architecture of such receiver is illustrated in this paper,giving the results of its implementation in comparison with its corresponding serial implementation.
Resumo:
Magdeburg, Univ., Fak. für Naturwiss., Diss., 2014