117 results for Parallelizing Compilers
Abstract:
This paper describes a fast integer sorting algorithm, herein referred to as Bit-index sort, which does not use comparisons and is intended to sort partial permutations. Experimental results show execution times of linear order. Bit-index sort uses a bit array to classify input sequences of distinct integers, and exploits built-in bit functions in C compilers, supported by machine hardware, to retrieve the ordered output sequence. Results show that Bit-index sort outperforms the quicksort and counting sort algorithms in execution time. A parallel approach for Bit-index sort using two simultaneous threads is also included, which obtains further speedups of up to 1.6 over the sequential case.
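As an illustration of the classification and retrieval phases described above, the following is a minimal C sketch of a bit-array sort in the spirit of Bit-index sort, assuming distinct non-negative integers below a fixed bound; the exact algorithm, bound handling and optimizations in the paper may differ, and MAX_VALUE is a placeholder.

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_VALUE 4096                /* hypothetical upper bound on key values */
    #define WORDS (MAX_VALUE / 64)

    /* Sort n distinct integers in [0, MAX_VALUE) into ascending order. */
    void bit_index_sort(const uint32_t *in, size_t n, uint32_t *out)
    {
        uint64_t bits[WORDS] = {0};
        size_t k = 0;

        /* Classification: mark each input value in the bit array. */
        for (size_t i = 0; i < n; ++i)
            bits[in[i] / 64] |= 1ULL << (in[i] % 64);

        /* Retrieval: scan the words and pop set bits in ascending order,
           using the compiler built-in count-trailing-zeros. */
        for (size_t w = 0; w < WORDS; ++w) {
            uint64_t word = bits[w];
            while (word) {
                unsigned b = (unsigned)__builtin_ctzll(word);
                out[k++] = (uint32_t)(w * 64 + b);
                word &= word - 1;         /* clear the lowest set bit */
            }
        }
    }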
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
It is now clear that the concept of an HPC compiler which automatically produces highly efficient parallel implementations is a pipe-dream. Another route is to recognise from the outset that user information is required, to develop tools that embed user interaction in the transformation of code from scalar to parallel form, and then to use conventional compilers with a set of communication calls. This is the key idea underlying the development of the CAPTools software environment. The initial version of CAPTools is focused upon single-block structured mesh computational mechanics codes. The capability for unstructured mesh codes is now under test, and block structured meshes will be included next. The parallelisation process can be completed rapidly for modest codes, and the parallel performance approaches that delivered by hand parallelisations.
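As a hedged illustration of the kind of communication calls such a tool inserts alongside the original computation, the C sketch below shows a halo exchange for a one-dimensional block-partitioned mesh using standard MPI calls; it is not taken from CAPTools itself, and the names exchange_halos, u and nlocal are hypothetical.

    #include <mpi.h>

    /* u has layout [left halo | nlocal interior points | right halo]. */
    void exchange_halos(double *u, int nlocal, int rank, int nprocs)
    {
        MPI_Status st;
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send rightmost interior point to the right neighbour,
           receive the left halo from the left neighbour. */
        MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 0,
                     &u[0],      1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, &st);

        /* Send leftmost interior point to the left neighbour,
           receive the right halo from the right neighbour. */
        MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                     &u[nlocal + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, &st);
    }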
Abstract:
Many-core systems are emerging from the need for more computational power and power efficiency. However, many issues still surround many-core systems. These systems need specialized software before they can be fully utilized, and the hardware itself may differ from conventional computational systems. To gain efficiency from a many-core system, programs need to be parallelized. In many-core systems the cores are small and less powerful than cores used in traditional computing, so running a conventional program is not an efficient option. Also, in Network-on-Chip based processors the network might get congested and the cores might work at different speeds. In this thesis, a dynamic load balancing method is proposed and tested on the Intel 48-core Single-Chip Cloud Computer by parallelizing a fault simulator. The maximum speedup is difficult to obtain due to severe bottlenecks in the system. In order to exploit all the available parallelism of the Single-Chip Cloud Computer, a runtime approach capable of dynamically balancing the load during the fault simulation process is used. The proposed dynamic fault simulation approach on the Single-Chip Cloud Computer shows up to 45X speedup compared to a serial fault simulation approach. Many-core systems can draw enormous amounts of power, and if this power is not controlled properly, the system might get damaged. One way to manage power is to set a power budget for the system. But if this power is drawn by just a few of the many cores, those few cores get extremely hot and might get damaged. Due to the increase in power density, multiple thermal sensors are deployed on the chip area to provide real-time temperature feedback for thermal management techniques. Thermal sensor accuracy is highly prone to intra-die process variation and aging phenomena. These factors lead to a situation where thermal sensor values drift from the nominal values, which necessitates efficient calibration techniques before the sensor values are used. In addition, in modern many-core systems the cores support dynamic voltage and frequency scaling. Thermal sensors located on cores are sensitive to the core's current voltage level, meaning that dedicated calibration is needed for each voltage level. In this thesis, a general-purpose software-based auto-calibration approach is also proposed to calibrate thermal sensors over the different voltage levels.
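A minimal sketch of the general idea of dynamic load balancing over a shared pool of fault-simulation jobs is shown below, using C11 atomics and POSIX threads; simulate_fault, NUM_FAULTS, CHUNK and NUM_THREADS are placeholders, and the actual SCC runtime described in the thesis (48 cores, on-chip message passing) uses different mechanisms.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NUM_FAULTS  100000   /* hypothetical number of faults to simulate */
    #define CHUNK       64       /* jobs claimed per request */
    #define NUM_THREADS 4

    static atomic_int next_fault;

    /* Placeholder for the per-fault simulation kernel. */
    static void simulate_fault(int fault_id) { (void)fault_id; }

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            /* Each thread atomically claims the next chunk of work,
               so faster cores simply claim more chunks over time. */
            int start = atomic_fetch_add(&next_fault, CHUNK);
            if (start >= NUM_FAULTS)
                break;
            int end = start + CHUNK < NUM_FAULTS ? start + CHUNK : NUM_FAULTS;
            for (int f = start; f < end; ++f)
                simulate_fault(f);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; ++i)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; ++i)
            pthread_join(t[i], NULL);
        printf("simulated %d faults\n", NUM_FAULTS);
        return 0;
    }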
Abstract:
Many functional programs can be viewed as representation changers, that is, as functions that convert abstract values from one concrete representation to another. Examples of such programs include base-converters, binary adders and multipliers, and compilers. In this paper we give a number of different approaches to specifying representation changers (pointwise, functional, and relational), and present a simple technique that can be used to derive functional programs from the specifications.
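To make the notion of a representation changer concrete, here is a small C sketch (the paper itself works in a functional and relational setting, so this only illustrates the concept): the same abstract number is converted from a binary-digit representation to a decimal-digit representation. The names from_binary and to_decimal are illustrative.

    #include <stdio.h>

    /* Abstraction: binary digits (most significant first) -> integer value. */
    static unsigned from_binary(const int *digits, int n)
    {
        unsigned v = 0;
        for (int i = 0; i < n; ++i)
            v = 2 * v + digits[i];
        return v;
    }

    /* Representation: integer value -> decimal digits (most significant first). */
    static int to_decimal(unsigned v, int *digits)
    {
        int n = 0, tmp[16];
        do { tmp[n++] = (int)(v % 10); v /= 10; } while (v > 0);
        for (int i = 0; i < n; ++i)
            digits[i] = tmp[n - 1 - i];
        return n;
    }

    int main(void)
    {
        int bin[] = {1, 0, 1, 1, 0, 1};                 /* 45 in binary */
        int dec[16];
        int n = to_decimal(from_binary(bin, 6), dec);   /* the representation changer */
        for (int i = 0; i < n; ++i)
            printf("%d", dec[i]);
        printf("\n");                                    /* prints 45 */
        return 0;
    }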
Abstract:
This paper draws a parallel between document preparation and the traditional processes of compilation and link editing for computer programs. A block-based document model is described which allows for separate compilation of various portions of a document. These portions are brought together and merged by a linker program, called dlink, whose pilot implementation is based on ditroff and on its underlying intermediate code. In the light of experience with dlink, the requirements for a universal object-module language for documents are discussed. These requirements often resemble the characteristics of the intermediate codes used by programming-language compilers, but with interesting extra constraints which arise from the way documents are executed.
Abstract:
For various reasons, many Algol 68 compilers do not directly implement the parallel processing operations defined in the Revised Algol 68 Report. It is still possible, however, to perform parallel processing, multitasking and simulation, provided that the implementation permits the creation of a master routine for the coordination and initiation of processes under its control. The package described here is intended for real-time applications and runs in conjunction with the Algol 68R system; it extends and develops the original Algol 68RT package, which was designed for use with multiplexers at the Royal Radar Establishment, Malvern. The facilities provided, in addition to the synchronising operations, include an interface to an ICL Communications Processor, enabling the abstract processes to be realised as the interaction of several teletypes or visual display units with a real-time program providing a useful service.
Abstract:
The persistence concern implemented as an aspect has been studied since the appearance of the Aspect-Oriented paradigm. Persistence is frequently given as an example of a concern that can be aspectized, but to date no real-world solution has applied that paradigm. Such a solution should be able to enhance programmer productivity and make the application less prone to errors. To test the viability of that concept, in a previous study we developed a prototype that implements Orthogonal Persistence as an aspect. This first version of the prototype was already fully functional with all Java types, including arrays. In this work we present the results of our new research to overcome some limitations we identified in the prototype concerning data type abstraction and transparency. One of our goals was to avoid the Java standard idiom for genericity, based on casts, type tests and subtyping. Moreover, we also found the need to introduce some dynamic data type capabilities. We consider reflection to be the solution to those issues. To achieve that, we have extended our prototype with a new static weaver that preprocesses the application source code, altering the normal behavior of the Java compiler through newly generated reflective code.
Abstract:
During the lifetime of a research project, different partners develop several research prototype tools that share many common aspects. This is equally true for researchers as individuals and as groups: during a period of time they often develop several related tools to pursue a specific research line. Making research prototype tools easily accessible to the community is of utmost importance to promote the corresponding research, get feedback, and increase the tools' lifetime beyond the duration of a specific project. One way to achieve this is to build graphical user interfaces (GUIs) that facilitate trying tools; in particular, with web interfaces one avoids the overhead of downloading and installing the tools. Building GUIs from scratch is a tedious task, in particular for web interfaces, and thus it typically gets low priority when developing a research prototype. Often we opt for copying the GUI of one tool and modifying it to fit the needs of a new related tool. Apart from code duplication, these tools will "live" separately, even though we might benefit from having them all in a common environment, since they are related. This work aims at simplifying the process of building GUIs for research prototype tools. In particular, we present EasyInterface, a toolkit based on a novel methodology that provides an easy way to make research prototype tools available via different common environments such as a web interface, within Eclipse, etc. It includes a novel text-based output language that allows results to be presented graphically without requiring any knowledge of GUI/web programming. For example, an output of a tool could be (a structured version of) "highlight line number 10 of file ex.c" and "when the user clicks on line 10, open a dialog box with the text ...". The environment interprets this output and converts it to the corresponding visual effects. The advantage of this approach is that the output is interpreted equally by all environments of EasyInterface, e.g., the web interface, the Eclipse plugin, etc. EasyInterface has been developed in the context of the Envisage [5] project, and has been evaluated on tools developed in this project, which include static analyzers, test-case generators, compilers, simulators, etc. EasyInterface is open source and available on GitHub.
Abstract:
The quality and speed of genome sequencing have advanced as technology boundaries are stretched. This advancement has so far been divided into three generations. The first-generation methods enabled sequencing of clonal DNA populations. The second generation massively increased throughput by parallelizing many reactions, while the third-generation methods allow direct sequencing of single DNA molecules. The first techniques to sequence DNA were not developed until the mid-1970s, when two distinct sequencing methods were developed almost simultaneously, one by Alan Maxam and Walter Gilbert, and the other by Frederick Sanger. The first is a chemical method that cleaves DNA at specific points, while the second uses ddNTPs during synthesis of a copy from the DNA chain template. Both methods generate fragments of varying lengths that are then electrophoresed. Until the 1990s, the sequencing of DNA remained relatively expensive and was seen as a long process; in addition, the use of radiolabeled nucleotides compounded the problem through safety concerns and prevented automation. Advancements within the first generation include the replacement of radioactive labels by fluorescently labeled ddNTPs and cycle sequencing with thermostable DNA polymerase, which allow automation and signal amplification, making the process cheaper, safer and faster. Another method is pyrosequencing, which is based on the "sequencing by synthesis" principle; it differs from Sanger sequencing in that it relies on the detection of pyrophosphate release on nucleotide incorporation. By the end of the last millennium, parallelization of this method started Next Generation Sequencing (NGS), with 454 as the first of many methods able to process multiple samples, known as second-generation sequencing. Here electrophoresis was completely eliminated. One method that is sometimes used is SOLiD, based on sequencing by ligation of fluorescently dye-labeled di-base probes which compete to ligate to the sequencing primer; specificity of the di-base probe is achieved by interrogating every 1st and 2nd base in each ligation reaction. The widely used Solexa/Illumina method uses modified dNTPs containing so-called "reversible terminators" which block further polymerization; the terminator also carries a fluorescent label, which can be detected by a camera. The step preceding the third generation came from Ion Torrent, which developed a technique also based on "sequencing by synthesis"; its main feature is the detection of hydrogen ions released during base incorporation. The third generation, in turn, builds on nanotechnology advancements to process single DNA molecules in real-time synthesis sequencing systems such as PacBio; finally, nanopore sequencing, projected since 1995, uses nano-sensors in the form of channels obtained from bacteria that conduct the sample to a sensor, allowing the detection of each nucleotide residue in the DNA strand. The technological advancements of recent years have been so quick that they make one wonder: how do we imagine the next generation?
Abstract:
In the context of computer numerical control (CNC) and computer aided manufacturing (CAM), the capabilities of programming languages such as symbolic and intuitive programming, program portability and geometrical portfolio have special importance. They save time, help to avoid errors during part programming, and permit code re-usage. Our updated literature review indicates that the current state of the art presents voids in parametric programming, program portability and programming flexibility. In response to this situation, this article presents a compiler implementation for EGCL (Extended G-code Language), a new, enriched CNC programming language which allows the use of descriptive variable names, geometrical functions and flow-control statements (if-then-else, while). Our compiler produces low-level, generic, elementary ISO-compliant G-code, thus allowing flexibility in the choice of the executing CNC machine and portability. Our results show that readable variable names and flow-control statements allow simplified and intuitive part programming and permit re-usage of the programs. Future work includes allowing the programmer to define their own functions in EGCL, in contrast to the current status of having them as built-in library functions.
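As a hypothetical illustration only (EGCL's concrete syntax and code generator are not shown in the abstract), the C sketch below mimics how a compiler might expand a while-loop over a named depth variable into elementary, machine-independent ISO G-code blocks; the variable name, feed rate and retract height are invented for the example.

    #include <stdio.h>

    int main(void)
    {
        /* Source-level intent (pseudocode):
         *   depth = 0.0
         *   while (depth > -5.0) { drill at depth; depth = depth - 1.0 }
         */
        double depth = 0.0;
        while (depth > -5.0) {
            /* Emit one elementary G-code block per pass of the loop. */
            printf("G01 Z%.3f F100\n", depth);   /* linear feed move to current depth */
            printf("G00 Z5.000\n");              /* rapid retract */
            depth -= 1.0;
        }
        return 0;
    }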
Abstract:
In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and it is used by programmers to speed up their applications thanks to its low latency. Prior work from the authors proposed lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key to providing an effective synchronization mechanism at the local memory level. After a quick description of the main features of our HTM design for GPU local memory, in this work we gather together a number of proposals designed to improve those mechanisms with high impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such as the read/write sets, the probability of conflict between transactions, and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is the task of the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion of the analysis to be done in order to choose the best configuration.