742 results for implementations


Relevance:

10.00% 10.00%

Publisher:

Abstract:

The use of unstructured mesh codes on parallel machines is one of the most effective ways to solve large computational mechanics problems. Completely general geometries and complex behaviour can be modelled and, in principle, the inherent sparsity of many such problems can be exploited to obtain excellent parallel efficiencies. However, unlike with structured mesh codes, the problem of distributing the mesh across the memory of the machine, whilst minimising the amount of interprocessor communication, must be carefully addressed. This process is an overhead that is not incurred by a serial code, but it is shown to be rapidly computable at run time and can be tailored to the machine being used.
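The trade-off described above, between spreading the mesh evenly and keeping interprocessor communication low, can be made concrete with two simple measures. The sketch below is only an illustration: the tiny grid, the two partitions, and the function names `edge_cut` and `load_imbalance` are assumptions made here and are not taken from the abstract.

```python
from collections import Counter

def edge_cut(edges, part):
    """Count mesh edges whose two endpoints are owned by different processors."""
    return sum(1 for u, v in edges if part[u] != part[v])

def load_imbalance(part, nparts):
    """Ratio of the largest subdomain size to the ideal (average) size."""
    sizes = Counter(part.values())
    ideal = len(part) / nparts
    return max(sizes.get(p, 0) for p in range(nparts)) / ideal

# Hypothetical 2x3 grid of mesh nodes, numbered row by row:
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]

# Two ways of assigning nodes to two processors (node -> processor).
by_rows = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
interleaved = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}

for name, part in [("by_rows", by_rows), ("interleaved", interleaved)]:
    print(name, edge_cut(edges, part), load_imbalance(part, 2))
```

Both partitions are perfectly balanced, but the interleaved one cuts every edge, so every neighbour relation would need a message. A partitioner aims for the first kind of result.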

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The difficulties encountered in implementing large-scale computational mechanics (CM) codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multiple Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code in a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.
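The SPMD pattern described above (the same code on every subdomain, with neighbours exchanging boundary values) can be sketched without a message passing library by letting list entries stand in for processors. This is a minimal simulation under stated assumptions: the 1D Jacobi sweep, the three-way split, and the names `jacobi_step` and `spmd_step` are hypothetical and are not taken from PHYSICA.

```python
def jacobi_step(u):
    """One serial Jacobi sweep on a 1D array; the two end values stay fixed."""
    interior = [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]

def spmd_step(subdomains):
    """The same sweep, performed per subdomain after a halo exchange.

    Each entry of `subdomains` plays the role of one processor's local data.
    """
    nproc = len(subdomains)
    # Communication phase: each rank receives its neighbours' edge values.
    halos = []
    for rank in range(nproc):
        left = subdomains[rank - 1][-1] if rank > 0 else None
        right = subdomains[rank + 1][0] if rank < nproc - 1 else None
        halos.append((left, right))
    # Compute phase: every rank runs identical code on its own subdomain.
    new = []
    for local, (left, right) in zip(subdomains, halos):
        padded = ([] if left is None else [left]) + local + ([] if right is None else [right])
        swept = jacobi_step(padded)
        start = 0 if left is None else 1
        new.append(swept[start:start + len(local)])
    return new

serial = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0]
parts = [serial[:3], serial[3:6], serial[6:]]
for _ in range(5):
    serial = jacobi_step(serial)
    parts = spmd_step(parts)
print([x for p in parts for x in p] == serial)
```

The decomposed run reproduces the serial result because each subdomain sees exactly the neighbour values the serial sweep would have used. In a real code the halo lookup would be replaced by send and receive calls.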

Relevance:

10.00% 10.00%

Publisher:

Abstract:

It is now clear that the concept of an HPC compiler which automatically produces highly efficient parallel implementations is a pipe-dream. Another route is to recognise from the outset that user information is required and to develop tools that embed user interaction in the transformation of code from scalar to parallel form, and then use conventional compilers with a set of communication calls. This represents the key idea underlying the development of the CAPTools software environment. The initial version of CAPTools is focused upon single block structured mesh computational mechanics codes. The capability for unstructured mesh codes is under test now and block structured meshes will be included next. The parallelisation process can be completed rapidly for modest codes and the parallel performance approaches that delivered by hand parallelisation.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics fields, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Unified Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the different binary output formats which may be encountered from the CUDA compiler, and their implications on reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We currently live in an era in which advertising surrounds us in many forms and in which companies strive ever harder to make the message they want to convey effective. Conventional channels such as television, radio, and even billboards are becoming ineffective. In a very short time, over the last twenty years, the Internet has changed the way we live, and has even been compared to the Renaissance and the Industrial Revolution. The most recent generations were born surrounded by this advertising “boom”, which has made them immune to it. To get around this problem, Levinson presented in 1989 a way to minimise this effect while at the same time enabling small companies to compete with larger ones (Levinson, 2007). Guerrilla marketing is thus characterised by being normally associated with low-cost implementations, which are sometimes unrepeatable because they achieve a significant “wow” impact on the general public (Oliveira & Ferreira, 2013). The present study contributes to the existing guerrilla marketing literature by compiling the development of this topic up to the present day. To understand which factors influence the use of guerrilla marketing by Portuguese companies, 140 companies from across the country were surveyed using a questionnaire based on the study developed by Overbeek (2012). Through this exploratory research, in an area that has so far been little explored in Portugal, especially at the academic level, “it was found that there is great demand for this type of unconventional tool, so much so that 86.4% of the sample has already witnessed a guerrilla action; however, only 36.4% admit to having implemented one in their company, which raises the question of why the rate of use of this type of unconventional approach is so low” (Almeida & Au-Yong-Oliveira, 2015, p.1).
The explanation may be linked to the strong uncertainty avoidance that exists in Portugal (Hofstede, 2001) and to the fear of change and of trying new products in Portugal (Steenkamp et al., 1999). These factors will not change for decades, given how long it takes for national cultures to change (Hofstede, 2001). It was also found that, in the sample of 140 companies, people with degrees (at bachelor's and master's level) in Marketing (18.7% of the sample), Design (15.7%), Management (10.4%), and Information and Communication Technologies (7.9%) stand out. It can be concluded that these are the four fundamental areas, or at least that there is currently a need for knowledge in these four areas. Given the [small] size of the companies, an employee who has these four competences has a competitive advantage over the rest as far as hard skills are concerned.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The usefulness of computers in schools and training has been undisputed for some years. There is currently disagreement, however, about which tasks computers can perform independently. When the takeover of teaching functions by computer-based teaching systems is evaluated, shortcomings often have to be noted. The aim of the present work is to identify, starting from current practical implementations of computer-based teaching systems, different classes of central teaching competences (student modelling, subject knowledge, and instructional activities in the narrower sense). Within each class, the overall capabilities of the teaching systems and the necessary, complementary activities of human tutors are determined. The resulting classification scheme allows both the categorisation of typical teaching systems and the identification of specific competences that should receive greater attention in teacher and trainer education in the future. (DIPF/Orig.)

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The last two decades have seen many exciting examples of tiny robots, from a few cm³ down to less than one cm³. Although individually limited, a large group of these robots has the potential to work cooperatively and accomplish complex tasks. Two examples from nature that exhibit this type of cooperation are ant and bee colonies. Groups of tiny robots have the potential to assist in applications like search and rescue, military scouting, infrastructure and equipment monitoring, nano-manufacture, and possibly medicine. Most of these applications require the high level of autonomy that has been demonstrated by large robotic platforms, such as the iRobot and Honda ASIMO. However, when robot size shrinks, current approaches to achieving the necessary functions are no longer valid. This work focused on the challenges associated with electronics and fabrication. We addressed three major technical hurdles inherent to current approaches: 1) difficulty of compact integration; 2) need for real-time and power-efficient computations; 3) unavailability of commercial tiny actuators and motion mechanisms. The aim of this work was to provide enabling hardware technologies to achieve autonomy in tiny robots. We proposed a decentralized application-specific integrated circuit (ASIC) where each component is responsible for its own operation and autonomy to the greatest extent possible. The ASIC consists of electronics modules for the fundamental functions required to fulfill the desired autonomy: actuation, control, power supply, and sensing. The actuators and mechanisms could potentially be post-fabricated on the ASIC directly. This design makes for a modular architecture.
The following components were shown to work in physical implementations or simulations: 1) a tunable motion controller for ultralow frequency actuation; 2) a nonvolatile memory and programming circuit to achieve automatic and one-time programming; 3) a high-voltage circuit with the highest reported breakdown voltage in standard 0.5 μm CMOS; 4) thermal actuators fabricated using a CMOS-compatible process; 5) a low-power mixed-signal computational architecture for a robotic dynamics simulator; 6) a frequency-boost technique to achieve low jitter in ring oscillators. These contributions will be generally enabling for other systems with strict size and power constraints, such as wireless sensor nodes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

There is considerable interest in the use of genetic algorithms to solve problems arising in the areas of scheduling and timetabling. However, the classical genetic algorithm paradigm is not well equipped to handle the conflict between objectives and constraints that typically occurs in such problems. In order to overcome this, successful implementations frequently make use of problem-specific knowledge. This paper is concerned with the development of a GA for a nurse rostering problem at a major UK hospital. The structure of the constraints is used as the basis for a co-evolutionary strategy using co-operating sub-populations. Problem-specific knowledge is also used to define a system of incentives and disincentives, and a complementary mutation operator. Empirical results based on 52 weeks of live data show how these features are able to improve an unsuccessful canonical GA to the point where it is able to provide a practical solution to the problem.
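The conflict between objectives and constraints mentioned above is often handled in a canonical GA by folding constraint violations into the fitness as a weighted penalty. The sketch below shows that baseline idea only; it is not the paper's co-evolutionary scheme or its incentive system, and the data layout, the `penalty_weight` value, and the name `roster_cost` are assumptions made for illustration.

```python
def roster_cost(roster, demand, preference_cost, penalty_weight=100):
    """Preference cost plus a weighted penalty for uncovered shifts.

    roster[n] is the shift assigned to nurse n; demand[s] is the number of
    nurses required on shift s; preference_cost[n][s] is how much nurse n
    dislikes shift s. Lower cost is better.
    """
    cover = {s: 0 for s in demand}
    for shift in roster:
        cover[shift] += 1
    # Constraint part: total number of missing nurses over all shifts.
    shortfall = sum(max(0, demand[s] - cover[s]) for s in demand)
    # Objective part: sum of individual preference costs.
    preference = sum(preference_cost[n][s] for n, s in enumerate(roster))
    return preference + penalty_weight * shortfall

demand = {"day": 2, "night": 1}
preference_cost = [
    {"day": 0, "night": 5},
    {"day": 0, "night": 5},
    {"day": 1, "night": 2},
]

print(roster_cost(["day", "day", "day"], demand, preference_cost))
print(roster_cost(["day", "day", "night"], demand, preference_cost))
```

The all-day roster is the cheapest on preferences alone but leaves the night shift uncovered, so the penalty term dominates. Choosing the weight is the hard part, which is one reason problem-specific knowledge tends to be needed.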

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Over the last few years, more and more heuristic decision-making techniques have been inspired by nature, e.g. evolutionary algorithms, ant colony optimisation and simulated annealing. More recently, a novel computational intelligence technique inspired by immunology has emerged, called Artificial Immune Systems (AIS). This immune system inspired technique has already been useful in solving some computational problems. In this keynote, we will very briefly describe the immune system metaphors that are relevant to AIS. We will then give some illustrative real-world problems suitable for AIS use and show a step-by-step algorithm walkthrough. A comparison of AIS to other well-known algorithms and areas for future work will round this keynote off. It should be noted that as AIS is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from the examples given here.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self cells or non-self cells. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective, using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable biological information processing system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune system have been extracted and applied to the solution of real-world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat over time and from the examples given here.
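The self/non-self distinction described above is the basis of negative selection, one of the commonly cited Artificial Immune Systems algorithms. The sketch below is a minimal version under stated assumptions: bit-string samples, Hamming-distance matching, and the parameter values are choices made here, and the abstract does not say which algorithm the tutorial actually walks through.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def train_detectors(self_set, n_detectors, length, radius, rng):
    """Generate random detectors, discarding any that match a self string."""
    detectors = []
    while len(detectors) < n_detectors:
        candidate = tuple(rng.randint(0, 1) for _ in range(length))
        # Censoring step: keep only candidates far enough from all of self.
        if all(hamming(candidate, s) > radius for s in self_set):
            detectors.append(candidate)
    return detectors

def is_nonself(sample, detectors, radius):
    """A sample is flagged when it lies within `radius` of any detector."""
    return any(hamming(sample, d) <= radius for d in detectors)

# Hypothetical "self" data: two 8-bit patterns regarded as normal.
self_set = [(0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 1)]
detectors = train_detectors(self_set, n_detectors=20, length=8, radius=1,
                            rng=random.Random(0))

print(any(is_nonself(s, detectors, radius=1) for s in self_set))
```

By construction no self string can be flagged, because every surviving detector is more than `radius` away from all of them. How much of the non-self space is covered depends on the number of detectors and the radius.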

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We show that the decay of the inflaton field may be incomplete, while nevertheless successfully reheating the Universe and leaving a stable remnant that accounts for the present dark matter abundance. We note, in particular, that since the mass of the inflaton decay products is field dependent, one can construct models, endowed with an appropriate discrete symmetry, where inflaton decay is kinematically forbidden at late times and only occurs during the initial stages of field oscillations after inflation. We show that this is sufficient to ensure the transition to a radiation-dominated era and that inflaton particles typically thermalize in the process. They eventually decouple and freeze out, yielding a thermal dark matter relic. We discuss possible implementations of this generic mechanism within consistent cosmological and particle physics scenarios, for both single-field and hybrid inflation.
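The kinematic argument above can be illustrated with a toy parametrisation. Everything specific below is an assumption made for illustration and is not stated in the abstract: the inflaton $\phi$ has mass $m_\phi$ and decays into a fermion pair $\psi_\pm$ whose mass depends on the field value as $m_\pm(\phi) = |m_f \pm h\phi|$, with hypothetical parameters $m_f > m_\phi/2$ and $h > 0$, and with a discrete symmetry acting as $\phi \to -\phi$, $\psi_+ \leftrightarrow \psi_-$.

```latex
% Two-body decay is kinematically allowed only when
\phi \to \bar\psi_\pm \psi_\pm
\quad\Longleftrightarrow\quad
m_\phi > 2\, m_\pm(\phi) = 2\,\lvert m_f \pm h\phi \rvert .

% For the \psi_- channel this is a window in field space:
\frac{m_f - m_\phi/2}{h} \;<\; \phi \;<\; \frac{m_f + m_\phi/2}{h},

% which the oscillating field can reach only while its amplitude \Phi(t) satisfies
\Phi(t) \;>\; \frac{m_f - m_\phi/2}{h}.
```

Under these assumptions the field passes through the window only while the oscillation amplitude is large. Once expansion has damped $\Phi(t)$ below the threshold, $m_\pm \approx m_f > m_\phi/2$ throughout each oscillation and the decay channel is closed, leaving the remaining inflaton particles stable.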

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. 
We use OpenMP, the most popular model for shared-memory parallel programming, as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into GPRM's model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in a notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM's task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
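The idea of combining smaller tasks into larger ones to amortise task creation overhead can be sketched in a few lines. The example below uses Python's standard thread pool purely to show the granularity knob; it is not GPRM, it does not reproduce the thesis benchmarks, and the function names and the `rows_per_task` parameter are hypothetical. Python threads will not show real speed-ups for this kind of arithmetic, so only the structure is of interest.

```python
from concurrent.futures import ThreadPoolExecutor

def filter_row(row, kernel):
    """Sliding-window weighted sum over one image row (the small unit of work)."""
    k = len(kernel)
    return [sum(row[i + j] * kernel[j] for j in range(k))
            for i in range(len(row) - k + 1)]

def filter_chunk(rows, kernel):
    """One task: process a whole group of rows in a single call."""
    return [filter_row(r, kernel) for r in rows]

def filter_image(image, kernel, rows_per_task, workers=4):
    """Split the image into tasks of `rows_per_task` rows and run them in a pool."""
    chunks = [image[i:i + rows_per_task]
              for i in range(0, len(image), rows_per_task)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: filter_chunk(c, kernel), chunks))
    return [row for chunk in results for row in chunk]

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
kernel = [1, 1]

fine = filter_image(image, kernel, rows_per_task=1)    # three tiny tasks
coarse = filter_image(image, kernel, rows_per_task=2)  # two larger tasks
print(fine == coarse, fine)
```

The result is the same for any `rows_per_task`; only the number of tasks changes. When each row is cheap, fewer and larger tasks mean less scheduling work per unit of computation.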

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Simultaneous Localization and Mapping (SLAM) is a procedure used to determine the location of a mobile vehicle in an unknown environment, while constructing a map of the unknown environment at the same time. Mobile platforms, which make use of SLAM algorithms, have industrial applications in autonomous maintenance, such as the inspection of flaws and defects in oil pipelines and storage tanks. A typical SLAM system consists of four main components, namely, experimental setup (data gathering), vehicle pose estimation, feature extraction, and filtering. Feature extraction is the process of identifying significant features in the unknown environment, such as corners, edges, walls, and interior features. In this work, an original feature extraction algorithm specific to distance measurements obtained through SONAR sensor data is presented. This algorithm has been constructed by combining the SONAR Salient Feature Extraction Algorithm and the Triangulation Hough Based Fusion with point-in-polygon detection. The reconstructed maps obtained through simulations and experimental data with the fusion algorithm are compared to the maps obtained with existing feature extraction algorithms. Based on the results obtained, it is suggested that the proposed algorithm can be employed as an option for data obtained from SONAR sensors in environments where other forms of sensing are not viable. The algorithm fusion for feature extraction requires the vehicle pose estimation as an input, which is obtained from a vehicle pose estimation model. For the vehicle pose estimation, the author uses sensor integration to estimate the pose of the mobile vehicle. Different combinations of these sensors are studied (e.g., encoder, gyroscope, or encoder and gyroscope). The different sensor fusion techniques for the pose estimation are experimentally studied and compared.
The vehicle pose estimation model, which produces the least amount of error, is used to generate inputs for the feature extraction algorithm fusion. In the experimental studies, two different environmental configurations are used, one without interior features and another one with two interior features. Numerical and experimental findings are discussed. Finally, the SLAM algorithm is implemented along with the algorithms for feature extraction and vehicle pose estimation. Three different cases are experimentally studied, with the floor of the environment intentionally altered to induce slipping. Results obtained for implementations with and without SLAM are compared and discussed. The present work represents a step towards the realization of autonomous inspection platforms for performing concurrent localization and mapping in harsh environments.
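The abstract names point-in-polygon detection as one ingredient of the fused feature extraction algorithm but does not say how it is implemented. A common way to perform the test is ray casting, sketched below; the function name, the square "room", and the use of the test to separate interior points from exterior ones are assumptions made for illustration.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count crossings of a ray going in +x from `point`.

    `polygon` is a list of (x, y) vertices in order; an odd number of
    crossings means the point is inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical room boundary (a 4 m square) and three candidate points.
room = [(0, 0), (4, 0), (4, 4), (0, 4)]
for candidate in [(2, 2), (5, 2), (-1, 2)]:
    print(candidate, point_in_polygon(candidate, room))
```

In a mapping context a test like this could be used to decide whether a detected feature lies inside the outer boundary, which is one plausible way of separating interior features from walls.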

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array (HTA) data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this thesis we present a proof-of-concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs and examine their performance. We compare the performance of our HTA codes to comparable codes written in MPI as well as to current state-of-the-art linear algebra routines.
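To make the notion of explicit tiling concrete, the sketch below splits a small banded matrix into a grid of tiles that can be addressed individually. It is not the HTA library and does not use its API; the helper name `tile` and the example matrix are assumptions made for illustration.

```python
def tile(matrix, tile_rows, tile_cols):
    """Split a 2D list into a grid of tiles; tiles[i][j] is itself a 2D list."""
    return [
        [
            [row[c:c + tile_cols] for row in matrix[r:r + tile_rows]]
            for c in range(0, len(matrix[0]), tile_cols)
        ]
        for r in range(0, len(matrix), tile_rows)
    ]

# Hypothetical 4x4 tridiagonal system matrix.
A = [
    [4, 1, 0, 0],
    [1, 4, 1, 0],
    [0, 1, 4, 1],
    [0, 0, 1, 4],
]

tiles = tile(A, 2, 2)
print(tiles[0][0])  # diagonal block owned by the first partition
print(tiles[0][1])  # coupling block between the two partitions
```

For a banded matrix partitioned this way, the diagonal tiles can be worked on independently and the off-diagonal tiles hold the coupling between partitions, which is the block structure that SPIKE-style solvers build on.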
