119 resultados para Trenching machinery
Resumo:
Pervasive use of pointers in large-scale real-world applications continues to make points-to analysis an important optimization-enabler. Rapid growth of software systems demands a scalable pointer analysis algorithm. A typical inclusion-based points-to analysis iteratively evaluates constraints and computes a points-to solution until a fixpoint. In each iteration, (i) points-to information is propagated across directed edges in a constraint graph G and (ii) more edges are added by processing the points-to constraints. We observe that prioritizing the order in which the information is processed within each of the above two steps can lead to efficient execution of the points-to analysis. While earlier work in the literature focuses only on the propagation order, we argue that the other dimension, that is, prioritizing the constraint processing, can lead to even higher improvements on how fast the fixpoint of the points-to algorithm is reached. This becomes especially important as we prove that finding an optimal sequence for processing the points-to constraints is NP-Complete. The prioritization scheme proposed in this paper is general enough to be applied to any of the existing points-to analyses. Using the prioritization framework developed in this paper, we implement prioritized versions of Andersen's analysis, Deep Propagation, Hardekopf and Lin's Lazy Cycle Detection and Bloom Filter based points-to analysis. In each case, we report significant improvements in the analysis times (33%, 47%, 44%, 20% respectively) as well as the memory requirements for a large suite of programs, including SPEC 2000 benchmarks and five large open source programs.
Resumo:
Data Prefetchers identify and make use of any regularity present in the history/training stream to predict future references and prefetch them into the cache. The training information used is typically the primary misses seen at a particular cache level, which is a filtered version of the accesses seen by the cache. In this work we demonstrate that extending the training information to include secondary misses and hits along with primary misses helps improve the performance of prefetchers. In addition to empirical evaluation, we use the information theoretic metric entropy, to quantify the regularity present in extended histories. Entropy measurements indicate that extended histories are more regular than the default primary miss only training stream. Entropy measurements also help corroborate our empirical findings. With extended histories, further benefits can be achieved by triggering prefetches during secondary misses also. In this paper we explore the design space of extended prefetch histories and alternative prefetch trigger points for delta correlation prefetchers. We observe that different prefetch schemes benefit to a different extent with extended histories and alternative trigger points. Also the best performing design point varies on a per-benchmark basis. To meet these requirements, we propose a simple adaptive scheme that identifies the best performing design point for a benchmark-prefetcher combination at runtime. In SPEC2000 benchmarks, using all the L2 accesses as history for prefetcher improves the performance in terms of both IPC and misses reduced over techniques that use only primary misses as history. The adaptive scheme improves the performance of CZone prefetcher over Baseline by 4.6% on an average. These performance gains are accompanied by a moderate reduction in the memory traffic requirements.
Resumo:
High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit the resources in modern processor architectures. Nevertheless, selecting required compositions of loop transformations to achieve this remains a significantly challenging task; current compilers may be off by orders of magnitude in performance compared to hand-optimized programs. To address this fundamental challenge, we first present a convex characterization of all distinct, semantics-preserving, multidimensional affine transformations. We then bring together algebraic, algorithmic, and performance analysis results to design a tractable optimization algorithm over this highly expressive space. Our framework has been implemented and validated experimentally on a representative set of benchmarks running on state-of-the-art multi-core platforms.
Resumo:
In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class; but, the class proportions in this set are not the same as those in the test distribution to which the classifier will be actually applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods to deal with this situation. We empirically show that when the labeled training data is small, TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; also, when this estimate is used, both TSVM and EC/ER give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
Resumo:
The present approach uses stopwords and the gaps that oc- cur between successive stopwords –formed by contentwords– as features for sentiment classification.
Resumo:
Network Intrusion Detection Systems (NIDS) intercept the traffic at an organization's network periphery to thwart intrusion attempts. Signature-based NIDS compares the intercepted packets against its database of known vulnerabilities and malware signatures to detect such cyber attacks. These signatures are represented using Regular Expressions (REs) and strings. Regular Expressions, because of their higher expressive power, are preferred over simple strings to write these signatures. We present Cascaded Automata Architecture to perform memory efficient Regular Expression pattern matching using existing string matching solutions. The proposed architecture performs two stage Regular Expression pattern matching. We replace the substring and character class components of the Regular Expression with new symbols. We address the challenges involved in this approach. We augment the Word-based Automata, obtained from the re-written Regular Expressions, with counter-based states and length bound transitions to perform Regular Expression pattern matching. We evaluated our architecture on Regular Expressions taken from Snort rulesets. We were able to reduce the number of automata states between 50% to 85%. Additionally, we could reduce the number of transitions by a factor of 3 leading to further reduction in the memory requirements.
Resumo:
There have been several studies on the performance of TCP controlled transfers over an infrastructure IEEE 802.11 WLAN, assuming perfect channel conditions. In this paper, we develop an analytical model for the throughput of TCP controlled file transfers over the IEEE 802.11 DCF with different packet error probabilities for the stations, accounting for the effect of packet drops on the TCP window. Our analysis proceeds by combining two models: one is an extension of the usual TCP-over-DCF model for an infrastructure WLAN, where the throughput of a station depends on the probability that the head-of-the-line packet at the Access Point belongs to that station; the second is a model for the TCP window process for connections with different drop probabilities. Iterative calculations between these models yields the head-of-the-line probabilities, and then, performance measures such as the throughputs and packet failure probabilities can be derived. We find that, due to MAC layer retransmissions, packet losses are rare even with high channel error probabilities and the stations obtain fair throughputs even when some of them have packet error probabilities as high as 0.1 or 0.2. For some restricted settings we are also able to model tail-drop loss at the AP. Although involving many approximations, the model captures the system behavior quite accurately, as compared with simulations.
Resumo:
This paper describes a semi-automatic tool for annotation of multi-script text from natural scene images. To our knowledge, this is the maiden tool that deals with multi-script text or arbitrary orientation. The procedure involves manual seed selection followed by a region growing process to segment each word present in the image. The threshold for region growing can be varied by the user so as to ensure pixel-accurate character segmentation. The text present in the image is tagged word-by-word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can further be labeled into its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains information about the number of words in the image, word bounding boxes, script and ground truth Unicode. The toolkit, developed using MATLAB, can be used to generate ground truth and annotation for any generic document image. Thus, it is useful for researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolokit (MAST) is available for free download.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
Combating stress is one of the prime requirements for any organism. For parasitic microbes, stress levels are highest during the growth inside the host. Their survival depends on their ability to acclimatize and adapt to new environmental conditions. Robust cellular machinery for stress response is, therefore, both critical and essential especially for pathogenic microorganisms. Microbes have cleverly exploited stress proteins as virulence factors for pathogenesis in their hosts. Owing to its ability to sense and respond to the stress conditions, Heat shock protein 90 (Hsp90) is one of the key stress proteins utilized by parasitic microbes. There are growing evidences for the critical role played by Hsp90 in the growth of pathogenic organisms like Candida, Giardia, Plasmodium, Trypanosoma, and others. This review, therefore, explores potential of exploiting Hsp90 as a target for the treatment of infectious diseases. This molecular chaperone has already gained attention as an effective anti-cancer drug target. As a result, a lot of research has been done at laboratory, preclinical and clinical levels for several Hsp90 inhibitors as potential anti-cancer drugs. In addition, lot of data pertaining to toxicity studies, pharmacokinetics and pharmacodynamics studies, dosage regime, drug related toxicities, dose limiting toxicities as well as adverse drug reactions are available for Hsp90 inhibitors. Therefore, repurposing/repositioning strategies are also being explored for these compounds which have gone through advanced stage clinical trials. This review presents a comprehensive summary of current status of development of Hsp90 as a drug target and its inhibitors as candidate anti-infectives. A particular emphasis is laid on the possibility of repositioning strategies coupled with pharmaceutical solutions required for fulfilling needs for ever growing pharmaceutical infectious disease market.
Resumo:
Saccharomyces cerevisiae RAD50, MRE11, and XRS2 genes are essential for telomere length maintenance, cell cycle checkpoint signaling, meiotic recombination, and DNA double-stranded break (DSB) repair via nonhomologous end joining and homologous recombination. The DSB repair pathways that draw upon Mre11-Rad50-Xrs2 subunits are complex, so their mechanistic features remain poorly understood. Moreover, the molecular basis of DSB end resection in yeast mre11-nuclease deficient mutants and Mre11 nuclease-independent activation of ATM in mammals remains unknown and adds a new dimension to many unanswered questions about the mechanism of DSB repair. Here, we demonstrate that S. cerevisiae Mre11 (ScMre11) exhibits higher binding affinity for single-over double-stranded DNA and intermediates of recombination and repair and catalyzes robust unwinding of substrates possessing a 3' single-stranded DNA overhang but not of 5' overhangs or blunt-ended DNA fragments. Additional evidence disclosed that ScMre11 nuclease activity is dispensable for its DNA binding and unwinding activity, thus uncovering the molecular basis underlying DSB end processing in mre11 nuclease deficient mutants. Significantly, Rad50, Xrs2, and Sae2 potentiate the DNA unwinding activity of Mre11, thus underscoring functional interaction among the components of DSB end repair machinery. Our results also show that ScMre11 by itself binds to DSB ends, then promotes end bridging of duplex DNA, and directly interacts with Sae2. We discuss the implications of these results in the context of an alternative mechanism for DSB end processing and the generation of single-stranded DNA for DNA repair and homologous recombination.
Resumo:
The success of AAV2 mediated hepatic gene transfer in human trials for diseases such as hemophilia has been hampered by a combination of low transduction efficiency and a robust immune response directed against these vectors. We have previously shown that AAV2 is targeted for destruction in the cytoplasm by the host-cellular kinase/ubiquitination/proteasomal degradation machinery and modification of the serine(S)/threonine(T) kinase and lysine(K) targets on AAV capsid is beneficial. Thus targeted single mutations of S/T>A(S489A, S498A, T251A) and K>R (K532R) improved the efficiency of gene transfer in vivo as compared to wild type (WT)-AAV2 vectors (∼6-14 fold). In the present study, we evaluated if combined alteration of the phosphodegrons (PD), which are the phosphorylation sites recognized as degradation signals by ubiquitin ligases, improves further the gene transfer efficiency. Thus, we generated four multiple mutant vectors (PD: 1+3, S489A+K532R, PD: 1+3, S489A+K532R together with T251 residue which did not lie in any of the phosphodegrons but had shown increased transduction efficiency compared to the WT-AAV2 vector (∼6 fold) and was also conserved in 9 out of 10 AAV serotypes (AAV 1 to 10), PD: 1+3, S489A+K532R+S498A and a fourth combination of PD: 3, K532R+T251. We then evaluated them in vitro and in vivo and compared their gene transfer efficiency with either the WT-AAV2 or the best single mutant S489A-AAV2 vector. The novel multiple mutations on the AAV2 capsid did not affect the overall vector packaging efficiency. All the multiple AAV2 mutants showed superior transduction efficiency in HeLa cells in vitro when compared to either the WT (62-72% Vs 21%) or the single mutant S489A (62-72% Vs 50%) AAV2 vectors as demonstrated by FACS analysis (Fig. 1A). On hepatic gene transfer with 5x10^10 vgs per animal in C57BL/6 mice, all the multiple mutants showed increased transgene expression compared to either the WT-AAV2 (∼15-23 fold) or the S489A single mutant vector (∼2-3 fold) (Fig.1B and C). These novel multiple mutant AAV2 vectors also showed higher vector copy number in murine hepatocytes 4 weeks post transduction, as compared to the WT-AAV2 (∼5-6 Vs 1.4 vector copies/diploid genome) and further higher when compared to the single mutant S489A(∼5-6 fold Vs 3.8 fold) (Fig.1D). Further ongoing studies will demonstrate the therapeutic benefit of one or more of the multiple mutants vectors in preclinical models of hemophilia.
Resumo:
Recombinant AAV-8 vectors have shown significant promise for hepatic gene therapy of hemophilia B. However, the theme of AAV vector dose dependent immunotoxicity seen with AAV2 vectors earlier seem to re-emerge with AAV8 vectors as well. It is therefore important to develop novel AAV8 vectors that provide enhanced gene expression at significantly less vector doses. We hypothesized that AAV8 during its intracellular trafficking, are targeted for destruction in the cytoplasm by the host-cellular kinase/ubiquitination/proteasomal degradation machinery and modification of specific serine/threonine kinase or ubiquitination targets on AAV8 capsid (Fig.1A) may improve its transduction efficiency. To test this, point mutations at specific serine (S)/threonine (T) > alanine (A) or lysine (K)>arginine (R) residues were generated on AAV8 capsid. scAAV8-EGFP vectors containing the wild-type (WT) and each one of the 5 S/T/K-mutant(S276A, S501A, S671A, T251A and K137R) capsids were evaluated for their liver transduction efficiency at a dose of 5 X 1010 vgs/ animal in C57BL/6 mice in vivo. The best performing mutant was found to be the K137R vector in terms of either the gene expression (46-fold) or the vector copy numbers in the hepatocytes (22-fold) compared to WT-AAV8 (Fig.1B). The K137R-AAV8 vector that showed significantly decreased ubiquitination of the viral capsid had reduced activation of markers of innate immune response [IL-6, IL-12, tumor necrosis factor α, Kupffer cells and TLR-9]. In addition, animals injected with the K137R mutant also demonstrated decreased (2-fold) levels of cross-neutralizing antibodies when compared to animals that received the WT-AAV8 vector. To study further the utility of the novel AAV8-K137R mutant in a therapeutic setting, we delivered human coagulation factor IX (h.FIX) under the control of liver specific promoters (LP1 or hAAT) at two different doses (2.5x10^10 and 1x10^11 vgs per mouse) in 8-12 weeks old male C57BL/6 mice. As can be seen in Fig.1C/D, the circulating levels of h.FIX were higher in all the K137R-AAV8 treated groups as compared to the WT-AAV8 treated groups either at 2 weeks (62% vs 37% for hAAT constructs and 47% vs 21% for LP1 constructs) or 4 weeks (78% vs 56% for hAAT constructs and 64% vs 30% for LP1 constructs) post hepatic gene transfer. These studies demonstrate the feasibility of the use of this novel vector for potential gene therapy of hemophilia B.
Resumo:
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average for programs of the Parboil2 suite that we used in our work. Current GPUs therefore allow concurrent execution of kernels to improve utilization. In this work, we study concurrent execution of GPU kernels using multiprogram workloads on current NVIDIA Fermi GPUs. On two-program workloads from the Parboil2 benchmark suite we find concurrent execution is often no better than serialized execution. We identify that the lack of control over resource allocation to kernels is a major serialization bottleneck. We propose transformations that convert CUDA kernels into elastic kernels which permit fine-grained control over their resource usage. We then propose several elastic-kernel aware concurrency policies that offer significantly better performance and concurrency compared to the current CUDA policy. We evaluate our proposals on real hardware using multiprogrammed workloads constructed from benchmarks in the Parboil 2 suite. On average, our proposals increase system throughput (STP) by 1.21x and improve the average normalized turnaround time (ANTT) by 3.73x for two-program workloads when compared to the current CUDA concurrency implementation.
Resumo:
Estimating program worst case execution time(WCET) accurately and efficiently is a challenging task. Several programs exhibit phase behavior wherein cycles per instruction (CPI) varies in phases during execution. Recent work has suggested the use of phases in such programs to estimate WCET with minimal instrumentation. However the suggested model uses a function of mean CPI that has no probabilistic guarantees. We propose to use Chebyshev's inequality that can be applied to any arbitrary distribution of CPI samples, to probabilistically bound CPI of a phase. Applying Chebyshev's inequality to phases that exhibit high CPI variation leads to pessimistic upper bounds. We propose a mechanism that refines such phases into sub-phases based on program counter(PC) signatures collected using profiling and also allows the user to control variance of CPI within a sub-phase. We describe a WCET analyzer built on these lines and evaluate it with standard WCET and embedded benchmark suites on two different architectures for three chosen probabilities, p={0.9, 0.95 and 0.99}. For p= 0.99, refinement based on PC signatures alone, reduces average pessimism of WCET estimate by 36%(77%) on Arch1 (Arch2). Compared to Chronos, an open source static WCET analyzer, the average improvement in estimates obtained by refinement is 5%(125%) on Arch1 (Arch2). On limiting variance of CPI within a sub-phase to {50%, 10%, 5% and 1%} of its original value, average accuracy of WCET estimate improves further to {9%, 11%, 12% and 13%} respectively, on Arch1. On Arch2, average accuracy of WCET improves to 159% when CPI variance is limited to 50% of its original value and improvement is marginal beyond that point.