969 results for Loops parallelization
Abstract:
Identifying opportunities for software parallelism takes a great deal of human time, but once code patterns amenable to parallelism have been identified, software can perform the task quickly. Automating this process therefore brings many benefits, such as saving time and reducing programmer errors [1]. This work aims to develop a software environment that identifies opportunities for parallelism in source code written in C and generates a program with the same behavior but a higher degree of parallelism, targeting graphics processors compatible with the CUDA architecture.
Abstract:
Speeding up sequential programs on multicores is a challenging problem that is in urgent need of a solution. Automatic parallelization of irregular pointer-intensive codes, exemplified by the SPECint codes, is a very hard problem. This paper shows that, with a helping hand, such auto-parallelization is possible and fruitful. This paper makes the following contributions: (i) A compiler framework for extracting pipeline-like parallelism from outer program loops is presented. (ii) Using a light-weight programming model based on annotations, the programmer helps the compiler to find thread-level parallelism. Each of the annotations specifies only a small piece of semantic information that compiler analysis misses, e.g. stating that a variable is dead at a certain program point. The annotations are designed such that correctness is easily verified. Furthermore, we present a tool for suggesting annotations to the programmer. (iii) The methodology is applied to auto-parallelize several SPECint benchmarks. For the benchmark with most parallelism (hmmer), we obtain a scalable 7-fold speedup on an AMD quad-core dual processor. The annotations constitute a parallel programming model that relies extensively on a sequential program representation. As a result, the complexity of debugging is not increased and the source code is not obscured. These properties could prove valuable in increasing the efficiency of parallel programming.
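As a rough illustration of what "pipeline-like parallelism from outer program loops" can mean, the sketch below splits the body of a sequential loop into two stages that run in separate threads and communicate through a bounded queue. It is a generic example, not the compiler framework or the annotation model described in the abstract; the stage functions, the squaring step, and the queue size are all hypothetical.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream (illustrative convention)


def stage_produce(items, out_q):
    """Stage 1: first half of each loop iteration (hypothetical work: squaring)."""
    for x in items:
        out_q.put(x * x)
    out_q.put(SENTINEL)


def stage_accumulate(in_q, results):
    """Stage 2: second half of each iteration, consuming stage 1 output in order."""
    total = 0
    while True:
        value = in_q.get()
        if value is SENTINEL:
            break
        total += value  # the loop-carried dependence stays inside one stage
        results.append(total)


def run_pipeline(items):
    """Run both stages concurrently and return the running sums of squares."""
    q = queue.Queue(maxsize=8)  # bounded queue decouples the two stages
    results = []
    t1 = threading.Thread(target=stage_produce, args=(items, q))
    t2 = threading.Thread(target=stage_accumulate, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

The point of the split is that the loop-carried state (`total`) lives in a single stage, so iterations of the first stage can run ahead of the second. In CPython the global interpreter lock means this particular sketch shows the structure rather than an actual speedup.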
Abstract:
Program specialization optimizes programs for known values of the input. It is often the case that the set of possible input values is unknown, or this set is infinite. However, a form of specialization can still be performed in such cases by means of abstract interpretation, specialization then being with respect to abstract values (substitutions), rather than concrete ones. We study the multiple specialization of logic programs based on abstract interpretation. This involves, in principle and based on information from global analysis, generating several versions of a program predicate for different uses of that predicate, optimizing these versions, and, finally, producing a new, "multiply specialized" program. While multiple specialization has received theoretical attention, little previous evidence exists on its practicality. In this paper we report on the incorporation of multiple specialization in a parallelizing compiler and quantify its effects. A novel approach to the design and implementation of the specialization system is proposed. The resulting implementation techniques produce specializations identical to those of the best previously proposed techniques but require little or no modification of some existing abstract interpreters. Our results show that, using the proposed techniques, the resulting "abstract multiple specialization" is indeed a relevant technique in practice. In particular, in the parallelizing compiler application, a good number of run-time tests are eliminated and invariants extracted automatically from loops, resulting generally in lower overheads and in several cases in increased speedups.
Abstract:
The invited presentation was delivered at the Queensland Department of Main Roads, Brisbane, Australia, on 17 June 2013.
Abstract:
Chlamydia trachomatis is a bacterial pathogen responsible for one of the most prevalent sexually transmitted infections worldwide. Its unique development cycle has limited our understanding of its pathogenic mechanisms. However, CtHtrA has recently been identified as a potential C. trachomatis virulence factor. CtHtrA is a tightly regulated quality control protein with a monomeric structural unit comprising a chymotrypsin-like protease domain and two PDZ domains. Activation of proteolytic activity relies on the C-terminus of the substrate allosterically binding to the PDZ1 domain, which triggers subsequent conformational change and oligomerization of the protein into 24-mers, enabling proteolysis. This activation is mediated by a cascade of precise structural arrangements, but the specific CtHtrA residues and structural elements required to facilitate activation are unknown. Using in vitro analysis guided by homology modeling, we show that residues Arg362 and Arg224, whose mutation is predicted to disrupt the interaction between the CtHtrA PDZ1 domain and loop L3, and between loop L3 and loop LD, respectively, are critical for the activation of proteolytic activity. We also demonstrate that mutations of residues Arg299 and Lys160, predicted to disrupt PDZ1 domain interactions with protease loop LC and strand β5, also influence proteolysis, implying their involvement in the CtHtrA mechanism of activation. This is the first investigation of protease loop LC and strand β5 with respect to their potential interactions with the PDZ1 domain. Given their high level of conservation in bacterial HtrA, these structural elements may be equally significant in the activation mechanism of DegP and other HtrA family members.
Abstract:
Loop detectors are widely used on motorway networks, where they provide point speeds and traffic volumes. Models have been proposed for temporal and spatial generalization of speed for average travel time estimation. Advances in technology provide complementary data sources such as the Bluetooth MAC Scanner (BMS), which detects the MAC ID of Bluetooth devices carried by travellers. Matching the data from two BMS stations provides individual vehicle travel times. Generally, loops on motorways are closely spaced, whereas BMS stations are placed a few kilometres apart. In this research, we fuse BMS and loop data to define the trajectories of the Bluetooth vehicles. The trajectories are used to estimate travel time statistics between any two points along the motorway. The proposed model is tested using simulation and validated with real data from the Pacific Motorway, Brisbane. Compared with linear-interpolation-based trajectories, the model provides significant improvements.
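The linear-interpolation baseline mentioned above can be sketched as follows. This is only the constant-speed baseline, not the proposed fusion model, whose details the abstract does not give; the function names, argument names, and units (kilometres and seconds) are assumptions made for illustration.

```python
def linear_trajectory(x_up, t_up, x_down, t_down, t):
    """Position at time t, assuming constant speed between two BMS stations.

    (x_up, t_up) and (x_down, t_down) are the location and detection time at
    the upstream and downstream stations (hypothetical names and units).
    """
    if not t_up <= t <= t_down:
        raise ValueError("t lies outside the observed interval")
    if t_down == t_up:
        return x_up
    frac = (t - t_up) / (t_down - t_up)
    return x_up + frac * (x_down - x_up)


def travel_time_between(x_up, t_up, x_down, t_down, x_a, x_b):
    """Travel time from x_a to x_b under the same constant-speed assumption."""
    if x_down == x_up:
        raise ValueError("stations must be at different locations")
    # Time is distributed in proportion to distance along the section.
    return (x_b - x_a) * (t_down - t_up) / (x_down - x_up)
```

For a vehicle matched at stations 4 km apart with a 200 s travel time, this baseline assigns 100 s to any 2 km sub-section, regardless of where congestion actually occurred; that is the limitation the loop data are meant to address.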
Abstract:
This thesis presents a novel program parallelization technique that incorporates dynamic and static scheduling. It uses a problem-specific pattern developed from prior knowledge of the targeted problem abstraction. Suitable for solving complex parallelization problems such as data-intensive all-to-all comparison constrained by memory, the technique delivers more robust and faster task scheduling than state-of-the-art techniques. The technique achieves good performance in data-intensive bioinformatics applications.
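To make the all-to-all comparison problem concrete, the sketch below enumerates every unordered pair of items and assigns the resulting comparison tasks to workers with a simple static round-robin schedule. It is a generic illustration and not the thesis's technique: it ignores the memory constraints and the dynamic scheduling the abstract refers to, and the function name and round-robin policy are assumptions.

```python
from itertools import combinations


def static_schedule(n_items, n_workers):
    """Assign all pairwise comparison tasks (i, j), i < j, to workers round-robin."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    buckets = [[] for _ in range(n_workers)]
    for k, pair in enumerate(combinations(range(n_items), 2)):
        buckets[k % n_workers].append(pair)
    return buckets
```

With `n_items` inputs there are `n_items * (n_items - 1) / 2` comparisons, so the task count grows quadratically; a round-robin split balances the number of tasks per worker but not the amount of data each worker must hold in memory.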
Abstract:
A conformational analysis of the synthetic peptide Boc-Cys-Pro-Val-Cys-NHMe has been carried out as a model for small disulfide loops in biologically active polypeptides. ¹H NMR studies (270 MHz) establish that the Val(3) and Cys(4) NH groups are solvent shielded, while ¹³C studies establish an all-trans peptide backbone. Circular dichroism and Raman spectroscopy provide evidence for a right-handed twist of the disulfide bond. Analysis of the vicinal (Jαβ) coupling constants for the two Cys residues establishes that χ1 ≈ ±60° for Cys(4), while some flexibility is suggested at Cys(1). Conformational energy calculations, imposing intramolecular hydrogen bonding constraints, favor a β-turn (type I) structure with Pro(2)-Val(3) as the corner residues. Theoretical and spectroscopic results are consistent with the presence of a transannular 4 → 1 hydrogen bond between Cys(1) CO and Cys(4) NH groups, with the Val NH being sterically shielded from the solvent environment.
Abstract:
In this thesis we examine multi-field inflationary models of the early Universe. Since non-Gaussianities may make it possible to discriminate between models of inflation, we compute deviations from a Gaussian spectrum of primordial perturbations by extending the delta-N formalism. We use N-flation as a concrete model; our findings show that these models are generically indistinguishable as long as the slow-roll approximation is still valid. Besides computing non-Gaussianities, we also investigate preheating after multi-field inflation. Within the framework of N-flation, we find that preheating via parametric resonance is suppressed, an indication that it is the old theory of preheating that is applicable. In addition to studying non-Gaussianities and preheating in multi-field inflationary models, we study magnetogenesis in the early Universe. To this end, we propose a mechanism to generate primordial magnetic fields via rotating cosmic string loops. Magnetic fields in the micro-Gauss range have been observed in galaxies and clusters, but their origin has remained elusive. We consider a network of strings and find that rotating cosmic string loops, which are continuously produced in such networks, are viable candidates for magnetogenesis with relevant strength and length scales, provided we use a high string tension and an efficient dynamo.
Abstract:
Mycobacterium tuberculosis (Mtb), a dreaded pathogen, has a unique cell envelope with a high fatty acid content that plays a crucial role in its pathogenesis. Acetyl Coenzyme A Carboxylase (ACC), an important enzyme that catalyzes the first reaction of fatty acid biosynthesis, is biotinylated by biotin acetyl-CoA carboxylase ligase (BirA). The ligand-binding loops in all apo BirAs known to date are disordered and attain an ordered structure only after undergoing a conformational change upon ligand binding. Here, we report that dehydration of Mtb-BirA crystals traps both the apo and active conformations in the asymmetric unit, and for the first time provides structural evidence of such a transformation. Recombinant Mtb-BirA was crystallized at room temperature, and diffraction data were collected at 295 K as well as at 120 K. Transfer of crystals to paraffin and paratone-N oil (cryoprotectants) prior to flash-freezing induced lattice shrinkage and an enhancement in the resolution of the X-ray diffraction data. Intriguingly, the crystal lattice rearrangement due to shrinkage in the dehydrated Mtb-BirA crystals induced structural ordering of the otherwise flexible ligand-binding loops L4 and L8 in apo BirA. In addition, crystal dehydration resulted in a shift of approximately 3.5 Å in the flexible loop L6, a proline-rich loop unique to the Mtb complex, as well as around the L11 region. The shift in loop L11 in the C-terminal domain on dehydration emulates the action responsible for complex formation with its protein ligand, the biotin carboxyl carrier protein (BCCP) domain of ACCA3. This is contrary to the involvement of loop L14 observed in the Pyrococcus horikoshii BirA-BCCP complex. Another interesting feature that emerges from this dehydrated structure is that the two subunits A and B, though related by a noncrystallographic twofold symmetry, assemble into an asymmetric dimer representing the ligand-bound and ligand-free states of the protein, respectively.
In-depth analyses of the sequence and the structure also provide answers to the reported lower affinities of Mtb-BirA toward ATP and biotin substrates. This dehydrated crystal structure not only provides key leads to the understanding of the structure/function relationships in the protein in the absence of any ligand-bound structure, but also demonstrates the merit of dehydration of crystals as an inimitable technique to have a glance at proteins in action.
Abstract:
This work describes the parallelization of High Resolution flow solver on unstructured meshes, HIFUN-3D, an unstructured data based finite volume solver for 3-D Euler equations. For mesh partitioning, we use METIS, a software based on multilevel graph partitioning. The unstructured graph used for partitioning is associated with weights both on its vertices and edges. The data residing on every processor is split into four layers. Such a novel procedure of handling data helps in maintaining the effectiveness of the serial code. The communication of data across the processors is achieved by explicit message passing using the standard blocking mode feature of Message Passing Interface (MPI). The parallel code is tested on PACE++128 available in CFD Center