Biblioteca Digital

40 results for latency

Is Granger Causality a Viable Technique for Analyzing fMRI Data?

Relevance: 10.00%

Abstract:

Multivariate neural data provide the basis for assessing interactions in brain networks. Among myriad connectivity measures, Granger causality (GC) has proven to be statistically intuitive, easy to implement, and capable of generating meaningful results. Although its application to functional MRI (fMRI) data is increasing, several factors have been identified that appear to hinder its neural interpretability: (a) latency differences in the hemodynamic response function (HRF) across brain regions, (b) low sampling rates, and (c) noise. Recognizing that in basic and clinical neuroscience it is often the change of a dependent variable (e.g., GC) between experimental conditions, or between normal and pathological states, that is of interest, we address the question of whether systematic relationships exist between GC at the fMRI level and GC at the neural level. Simulated neural signals were convolved with a canonical HRF, down-sampled, and combined with added noise to generate simulated fMRI data. As the coupling parameters in the model were varied, fMRI GC and neural GC were calculated and their relationship examined. Three main results were found: (1) GC following HRF convolution is a monotonically increasing function of neural GC; (2) this monotonicity can be reliably detected as a positive correlation when realistic fMRI temporal resolution and noise levels are used; and (3) although the detectability of monotonicity declined in the presence of HRF latency differences, substantial recovery of detectability occurred after correcting for those differences. These results suggest that Granger causality is a viable technique for analyzing fMRI data when the questions are appropriately formulated.
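
The simulation pipeline described in this abstract (neural signals, HRF convolution, down-sampling, added noise, then GC at both levels) can be sketched roughly as follows. This is a toy illustration, not the authors' model: the bivariate AR(1) process, the single-gamma HRF shape, the 0.1 s time step, the 2 s TR, the noise level, and the function names `simulate_neural`, `to_fmri`, and `granger` are all assumptions made here. It does not attempt to reproduce the paper's findings.

```python
import math
import random

def simulate_neural(n, coupling, rng):
    """Toy bivariate AR(1) process in which x drives y with strength `coupling`."""
    x, y = [0.0], [0.0]
    for _ in range(1, n):
        x_prev, y_prev = x[-1], y[-1]
        x.append(0.5 * x_prev + rng.gauss(0, 1))
        y.append(0.5 * y_prev + coupling * x_prev + rng.gauss(0, 1))
    return x, y

def to_fmri(signal, rng, dt=0.1, tr=2.0, noise_sd=0.1):
    """Convolve with a gamma-shaped HRF, keep one sample per TR, add Gaussian noise."""
    # Assumed stand-in for a canonical HRF: t^5 * exp(-t), 30 s long, unit sum.
    taps = [(k * dt) ** 5 * math.exp(-k * dt) for k in range(int(round(30 / dt)))]
    total = sum(taps)
    hrf = [h / total for h in taps]
    step = int(round(tr / dt))
    out = []
    for t in range(0, len(signal), step):
        # Only the convolution outputs that survive down-sampling are computed.
        bold = sum(h * signal[t - k] for k, h in enumerate(hrf) if t - k >= 0)
        out.append(bold + rng.gauss(0, noise_sd))
    return out

def granger(src, dst):
    """Order-1 GC from src to dst: ln(restricted RSS / full RSS)."""
    def centered(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    target, own, other = centered(dst[1:]), centered(dst[:-1]), centered(src[:-1])
    tt, oo, ss = dot(target, target), dot(own, own), dot(other, other)
    t_o, t_s, o_s = dot(target, own), dot(target, other), dot(own, other)
    # Restricted model: dst predicted from its own past only.
    rss_restricted = tt - t_o * t_o / oo
    # Full model: dst predicted from its own past and the past of src
    # (two-regressor least squares solved via the normal equations).
    det = oo * ss - o_s * o_s
    b_own = (t_o * ss - t_s * o_s) / det
    b_other = (t_s * oo - t_o * o_s) / det
    rss_full = tt - b_own * t_o - b_other * t_s
    return math.log(rss_restricted / rss_full)

if __name__ == "__main__":
    rng = random.Random(0)
    for coupling in (0.2, 0.5, 0.8):
        x, y = simulate_neural(6000, coupling, rng)
        neural_gc = granger(x, y)
        fmri_gc = granger(to_fmri(x, rng), to_fmri(y, rng))
        print(coupling, round(neural_gc, 3), round(fmri_gc, 3))
```

Varying `coupling` and comparing the two printed GC columns mirrors the kind of relationship the abstract examines. With only a few hundred simulated fMRI samples and an order-1 model, the fMRI-level estimate in this sketch is noisy.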


Traffic engineered NoC for streaming applications

Relevance: 10.00%

Abstract:

Streaming applications demand hard bandwidth and throughput guarantees in a multiprocessor environment amid processes competing for resources. We present a Label Switching based Network-on-Chip (LS-NoC) motivated by the throughput guarantees offered by bandwidth reservation. Label switching is a packet relaying technique in which individual packets carry route information in the form of labels. A centralized LS-NoC management framework engineers traffic into Quality of Service (QoS) guaranteed routes. LS-NoC caters to the requirements of streaming applications in which communication channels are fixed over the lifetime of the application. The proposed NoC framework inherently supports heterogeneous and ad hoc systems-on-chip. The LS-NoC can be used in conjunction with a conventional best-effort NoC as a QoS-guaranteed communication network, or as a replacement for the conventional NoC. A multicast- and broadcast-capable label-switched router for the LS-NoC has been designed. A 5-port router with a 256-bit data bus and 4-bit labels occupies 0.431 mm² in 130 nm and delivers a peak bandwidth of 80 Gbits/s per link at 312.5 MHz. Bandwidth and latency guarantees of LS-NoC have been demonstrated on traffic from example streaming applications and on constant and variable bit rate traffic patterns. LS-NoC was found to have a competitive Area×Power/Throughput figure of merit compared with state-of-the-art NoCs providing QoS. Circuit switching with link-sharing abilities and support for asynchronous operation make LS-NoC a desirable choice for QoS servicing in chip multiprocessors. © 2013 Elsevier B.V. All rights reserved.
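
The peak-bandwidth figure quoted in this abstract is consistent with a simple width-times-clock calculation. The short check below assumes one full-width transfer per link per clock cycle. The abstract does not state this assumption, and the function name `peak_link_bandwidth_gbps` is made up for illustration.

```python
def peak_link_bandwidth_gbps(bus_width_bits, clock_mhz):
    """Peak link bandwidth in Gbit/s, assuming one full-width transfer per cycle."""
    bits_per_second = bus_width_bits * clock_mhz * 1e6
    return bits_per_second / 1e9

# 256-bit data bus clocked at 312.5 MHz, as quoted in the abstract.
print(peak_link_bandwidth_gbps(256, 312.5))  # 80.0
```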


CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters

Relevance: 10.00%

Abstract:

Rapid advancements in multi-core processor architectures, coupled with low-cost, low-latency, high-bandwidth interconnects, have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they have yet to be widely adopted for performance-critical code, mainly because of the relatively immature software frameworks and the effort involved in rewriting existing code in a new language. In this paper, we motivate and describe our initial study exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel and the growing popularity, support, and stability of the CUDA software stack collectively make CUDA a good candidate programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime, and a lightweight software distributed shared memory to manage parallel executions. Initial results on several standard CUDA benchmark programs show speedups of up to 7.5X on a cluster with 8 nodes, opening up an interesting direction for further research.


Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities

Relevance: 10.00%

Abstract:

The twin demands of energy efficiency and higher performance on DRAM are highly emphasized in multicore architectures. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAMs. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy or vice versa. One specific DRAM performance problem in multicores is that interleaved accesses from different cores can potentially degrade row-buffer locality. In this paper, based on the temporal and spatial locality characteristics of memory accesses, we propose a reorganization of the existing single large row-buffer in a DRAM bank into multiple sub-row buffers (MSRB). This reorganization not only improves row hit rates, and hence the average memory latency, but also brings down the energy consumed by the DRAM. The first major contribution of this work is proposing such a reorganization without requiring any significant changes to the existing widely accepted DRAM specifications. Our proposed reorganization improves weighted speedup by 35.8%, 14.5% and 21.6% in quad, eight and sixteen core workloads along with a 42%, 28% and 31% reduction in DRAM energy. The proposed MSRB organization enables opportunities for the management of multiple row-buffers at the memory controller level. Because the memory controller is aware of the behaviour of individual cores, it allows us to implement coordinated buffer allocation schemes for different cores that take program behaviour into account. We demonstrate two such schemes, namely Fairness Oriented Allocation and Performance Oriented Allocation, which show the flexibility that memory controllers can now exploit in our MSRB organization to improve overall performance and/or fairness. Further, the MSRB organization enables additional opportunities for DRAM intra-bank parallelism and selective early precharging of the LRU row-buffer to further improve memory access latencies. These two optimizations together provide an additional 5.9% performance improvement.
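
The row-buffer locality problem and the sub-row buffer idea in this abstract can be illustrated with a toy hit-rate model. Everything below is an assumption made for illustration: a single bank, a 1024-column row, plain LRU replacement, a round-robin interleaving of cores, and the hypothetical names `hit_rate` and `interleaved_streams`. It is not the paper's simulator or allocation policy.

```python
from collections import OrderedDict

def hit_rate(accesses, num_buffers, row_size=1024):
    """Fraction of accesses that hit in `num_buffers` LRU-managed sub-row buffers.

    Each buffer holds row_size / num_buffers consecutive columns of one row,
    so num_buffers=1 models a conventional single row buffer.
    """
    sub_size = row_size // num_buffers
    buffers = OrderedDict()  # keys: (row, sub-row index), oldest entry first
    hits = 0
    for row, col in accesses:
        key = (row, col // sub_size)
        if key in buffers:
            hits += 1
            buffers.move_to_end(key)  # mark as most recently used
        else:
            if len(buffers) == num_buffers:
                buffers.popitem(last=False)  # evict the least recently used
            buffers[key] = True
    return hits / len(accesses)

def interleaved_streams(num_cores, accesses_per_core, stride=8, row_size=1024):
    """Round-robin interleaving of cores that each walk through their own row."""
    trace = []
    for i in range(accesses_per_core):
        for core in range(num_cores):
            trace.append((core, (i * stride) % row_size))
    return trace

if __name__ == "__main__":
    trace = interleaved_streams(num_cores=2, accesses_per_core=128)
    print(hit_rate(trace, num_buffers=1))  # 0.0
    print(hit_rate(trace, num_buffers=4))  # 0.96875
```

In this toy trace the two cores evict each other's row on every access when there is a single buffer, while four sub-row buffers let both streams keep their current sub-row open. A single core streaming through one row would see the opposite trade-off, since smaller buffers mean more sub-row activations.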


Hypersensitivity of hypoxia grown Mycobacterium smegmatis to DNA damaging agents: Implications of the DNA repair deficiencies in attenuation of mycobacteria

Relevance: 10.00%

Abstract:

Mycobacteria are an important group of pathogenic bacteria. We generated a series of DNA repair deficient strains of Mycobacterium smegmatis, a model organism, to understand the importance of various DNA repair proteins (UvrB, Ung, UdgB, MutY and Fpg) in the survival of the pathogenic strains. Here, we compared the tolerance of the M. smegmatis strains to genotoxic stress (ROS and RNI) under aerobic, hypoxic and recovery conditions of growth by monitoring their survival. We show an increased susceptibility of mycobacteria to genotoxic stress under hypoxia. UvrB deficiency led to high susceptibility of M. smegmatis to the DNA damaging agents. Ung was second in importance in strains with single deficiencies. Interestingly, we observed that while deficiency of UdgB had only a minor impact on the strain's susceptibility, its combination with Ung deficiency had severe consequences for the strain's survival under genotoxic stress, suggesting a strong interdependence of different DNA repair pathways in safeguarding genomic integrity. Our observations reinforce the possibility of targeting DNA repair processes in mycobacteria for therapeutic intervention during the active growth and latency phases of the pathogen. The high susceptibility of the UvrB or the Ung/UdgB deficient strains to genotoxic stress may be exploited in the generation of attenuated strains of mycobacteria. © 2013 Elsevier Ireland Ltd. All rights reserved.


Toward a Scalable Working Set Size Estimation Method and Its Application for Chip Multiprocessors

Relevance: 10.00%

Abstract:

It is essential to accurately estimate the working set size (WSS) of an application for various optimizations, such as partitioning the cache among virtual machines or reducing the leakage power dissipated in an over-allocated cache by switching it off. However, state-of-the-art heuristics such as average memory access latency (AMAL) or cache miss ratio (CMR) are poorly correlated with the WSS of an application due to 1) over-sized caches and 2) their dispersed nature. Past studies focus on estimating the WSS of an application executing on a uniprocessor platform. Estimating the same for a chip multiprocessor (CMP) with a large dispersed cache is challenging due to the presence of concurrently executing threads/processes. Hence, we propose a scalable, highly accurate method to estimate the WSS of an application. We call this method the "tagged WSS (TWSS)" estimation method. We demonstrate the use of TWSS to switch off the over-allocated cache ways in Static and Dynamic Non-Uniform Cache Architectures (SNUCA, DNUCA) on a tiled CMP. In our implementation of adaptable-way SNUCA and DNUCA caches, the decision to alter associativity is made by each L2 controller. Hence, this approach scales better with the number of cores present on a CMP. It gives overall (geometric mean) 26% and 19% higher energy-delay product savings compared to the AMAL and CMR heuristics on SNUCA, respectively.


Measuring Glutathione Redox Potential of HIV-1-infected Macrophages

Relevance: 10.00%

Abstract:

Redox signaling plays a crucial role in the pathogenesis of human immunodeficiency virus type-1 (HIV-1). The majority of HIV redox research relies on measuring redox stress using invasive technologies, which are unreliable and do not provide information about the contributions of subcellular compartments. A major technological leap emerges from the development of genetically encoded redox-sensitive green fluorescent proteins (roGFPs), which provide sensitive and compartment-specific insights into redox homeostasis. Here, we exploited a roGFP-based specific bioprobe of glutathione redox potential (E-GSH; Grx1-roGFP2) and measured subcellular changes in E-GSH during various phases of HIV-1 infection using U1 monocytic cells (U937 cells latently infected with HIV-1). We show that although U937 and U1 cells demonstrate significantly reduced cytosolic and mitochondrial E-GSH (approximately -310 mV), active viral replication induces substantial oxidative stress (E-GSH more than -240 mV). Furthermore, exposure to a physiologically relevant oxidant, hydrogen peroxide (H2O2), induces significant deviations in subcellular E-GSH between U937 and U1, which distinctly modulates susceptibility to apoptosis. Using Grx1-roGFP2, we demonstrate that a marginal increase of approximately 25 mV in E-GSH is sufficient to switch HIV-1 from latency to reactivation, raising the possibility of purging HIV-1 by redox modulators without triggering detrimental changes in cellular physiology. Importantly, we show that bioactive lipids synthesized by clinical drug-resistant isolates of Mycobacterium tuberculosis reactivate HIV-1 through modulation of intracellular E-GSH. Finally, expression analysis of U1 cells and patient peripheral blood mononuclear cells demonstrated a major recalibration of cellular redox homeostatic pathways during persistence and active replication of HIV.


Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs

Relevance: 10.00%

Abstract:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviate the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.


PriDyn: Enabling Differentiated I/O Services in Cloud Using Dynamic Priorities

Relevance: 10.00%

Abstract:

Virtualization is one of the key enabling technologies for Cloud computing. Although it facilitates improved utilization of resources, virtualization can lead to performance degradation due to the sharing of physical resources such as CPU, memory, network interfaces, and disk controllers. Multi-tenancy can cause highly unpredictable performance for concurrent I/O applications running inside virtual machines that share local disk storage in the Cloud. Disk I/O requests in a typical Cloud setup may have varied requirements in terms of latency and throughput, as they arise from a range of heterogeneous applications with diverse performance goals. This necessitates providing differentiated performance services to different I/O applications. In this paper, we present PriDyn, a novel scheduling framework designed to consider I/O performance metrics of applications, such as acceptable latency, and convert them to an appropriate priority value for disk access based on the current system state. This framework aims to provide differentiated I/O service to various applications and to ensure predictable performance for critical applications in a multi-tenant Cloud environment. We demonstrate through experimental validation on real-world I/O traces that this framework achieves appreciable enhancements in I/O performance, indicating that this approach is a promising step towards enabling QoS guarantees on Cloud storage.


Sensing of Stimulus Artifact Suppressed Signals From Electrode Interfaces

Relevance: 10.00%

Abstract:

Stimulus artifacts inhibit reliable acquisition of biological evoked potentials for several milliseconds if an electrode contact is used for both electrical stimulation and recording. This hinders the measurement of evoked short-latency biological responses, which are otherwise elicited by stimulation in implantable prosthetic devices. We present an improved stimulus artifact suppression scheme using two-electrode simultaneous stimulation and differential readout with high-gain amplifiers. Substantial reduction of artifact duration has been shown to be possible through the common-mode rejection property of an instrumentation amplifier for electrode interfaces. The performance of this method depends on good matching of the electrode-electrolyte interface properties of the chosen electrode pair. A novel calibration algorithm has been developed that helps in artificial matching of impedance and thereby achieves the required performance in artifact suppression. Stimulus artifact duration has been reduced to 50 µs from the stimulation-cum-recording electrodes, which is approximately a 6x improvement over the present state of the art. The system is characterized with emulated resistor-capacitor loads and a variety of in-vitro metal electrodes dipped in a saline environment. The proposed method is expected to be useful for closed-loop electrical stimulation and recording studies, such as bidirectional neural prostheses of the retina, cochlea, brain, and spinal cord.

