668 results for Bitrate overhead
Abstract:
With no Channel State Information (CSI) at the users, transmission over the two-user Gaussian Multiple Access Channel with fading and finite input constellations suffers high error rates due to multiple access interference (MAI). However, perfect CSI at the users is an unrealistic assumption in the wireless scenario, as it would involve extremely large feedback overheads. In this paper we propose a scheme that removes the adverse effect of MAI using only quantized knowledge of the fade state at the transmitters, so that the associated overhead is nominal. One of the users rotates its constellation relative to the other, without varying the transmit power, to adapt to the existing channel conditions and meet a predetermined minimum Euclidean distance requirement in the equivalent constellation at the destination. The optimal rotation scheme is described for the case when both users use symmetric M-PSK constellations at the input, where M = 2^λ, λ being a positive integer. The strategy is illustrated by considering the example where both users use QPSK signal sets at the input. The case when the users use PSK constellations of different sizes is also considered. It is shown that the proposed scheme has considerably better error performance than the conventional non-adaptive scheme, at the cost of a feedback overhead of just ⌈log2(M^2/8 - M/4 + 2)⌉ + 1 bits for the M-PSK case.
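As a rough illustration of the overhead expression quoted in this abstract, the sketch below evaluates it for a few constellation sizes. The function name `feedback_bits` is hypothetical, and reading the outer brackets of the expression as a ceiling is an assumption, since the abstract's typography is ambiguous.

```python
import math


def feedback_bits(M: int) -> int:
    """Evaluate ceil(log2(M^2/8 - M/4 + 2)) + 1 for an M-PSK constellation.

    Assumption: the outer brackets in the abstract denote a ceiling.
    """
    # M must be a power of two (M = 2**lam with lam >= 1), as the abstract states.
    if M < 2 or M & (M - 1):
        raise ValueError("M must be a power of two, M >= 2")
    inner = M * M / 8 - M / 4 + 2
    return math.ceil(math.log2(inner)) + 1


if __name__ == "__main__":
    for M in (4, 8, 16):
        print(M, feedback_bits(M))
```

Under that reading, QPSK (M = 4) gives an inner term of 3 and therefore 3 feedback bits.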
Abstract:
Orthogonal frequency-division multiple access (OFDMA) systems divide the available bandwidth into orthogonal subchannels and exploit multiuser diversity and frequency selectivity to achieve high spectral efficiencies. However, they require a significant amount of channel state feedback for scheduling and rate adaptation and are sensitive to feedback delays. We develop a comprehensive analysis for OFDMA system throughput in the presence of feedback delays as a function of the feedback scheme, frequency-domain scheduler, and rate adaptation rule. Also derived are expressions for the outage probability, which captures the inability of a subchannel to successfully carry data due to the feedback scheme or feedback delays. Our model encompasses the popular best-n and threshold-based feedback schemes and the greedy, proportional fair, and round-robin schedulers that cover a wide range of throughput versus fairness tradeoff. It helps quantify the different robustness of the schedulers to feedback overhead and delays. Even at low vehicular speeds, it shows that small feedback delays markedly degrade the throughput and increase the outage probability. Further, given the feedback delay, the throughput degradation depends primarily on the feedback overhead and not on the feedback scheme itself. We also show how to optimize the rate adaptation thresholds as a function of feedback delay.
Abstract:
In the design of modulation schemes for the physical-layer network-coded two-way relaying scenario with two phases (Multiple Access (MA) phase and Broadcast (BC) phase), Koike-Akino et al. observed that adaptively changing the network coding map used at the relay according to the channel conditions greatly reduces the impact of multiple access interference, and that all these network coding maps should satisfy a requirement called the exclusive law. In [11] the case in which the end nodes use M-PSK signal sets is extensively studied using Latin Squares. This paper deals with the case in which the end nodes use square M-QAM signal sets. In a fading scenario, for certain channel conditions, termed singular fade states, the MA phase performance is greatly reduced. We show that square QAM signal sets lead to fewer singular fade states than PSK signal sets. Because of this, the complexity at the relay is enormously reduced. Moreover, fewer overhead bits are required in the BC phase. We find the number of singular fade states for PAM and QAM signal sets used at the end nodes. The fade state γe^{jθ} = 1 is a singular fade state for M-QAM for all values of M, and it is shown that certain block circulant Latin Squares remove this singular fade state. Simulation results are presented to show that QAM signal sets perform better than PSK.
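The exclusive law mentioned in this abstract requires that the relay's map give different outputs whenever exactly one of the two end-node symbols changes, which is why such maps can be tabulated as Latin squares. A minimal sketch of that check follows; the table layout (rows indexed by one user's symbol, columns by the other's) and the names `satisfies_exclusive_law`, `xor_map`, and `bad_map` are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def satisfies_exclusive_law(table: Sequence[Sequence[int]]) -> bool:
    """Return True if no relay symbol repeats within any row or any column.

    table[a][b] is the relay's output when the end nodes send symbols a and b.
    """
    rows_ok = all(len(set(row)) == len(row) for row in table)
    cols_ok = all(len(set(col)) == len(col) for col in zip(*table))
    return rows_ok and cols_ok


# Bitwise XOR on 4 symbols: every row and column is a permutation of 0..3.
xor_map = [[a ^ b for b in range(4)] for a in range(4)]

# Hypothetical map that repeats symbol 0 in the first column.
bad_map = [
    [0, 1, 2, 3],
    [0, 3, 1, 2],
    [2, 0, 3, 1],
    [3, 2, 0, 1],
]

if __name__ == "__main__":
    print(satisfies_exclusive_law(xor_map))
    print(satisfies_exclusive_law(bad_map))
```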
Abstract:
Cloud computing has become possible because virtualization technologies are available on commodity platforms. Measuring resource usage on virtualized servers is difficult because the performance counters used for resource accounting are not virtualized. Hence, many of the prevalent virtualization technologies, such as Xen, VMware, and KVM, use host-specific CPU usage monitoring, which is coarse-grained. In this paper, we present a performance monitoring tool for KVM-based virtual machines that measures the CPU overhead incurred by the hypervisor on behalf of the virtual machine, along with the CPU usage of the virtual machine itself. The fine-grained resource usage information provided by this tool can be used in diverse situations, such as resource provisioning to support performance-related QoS requirements, identification of bottlenecks during VM placement, and resource profiling of applications in cloud environments. We demonstrate a use case of this tool by measuring the performance of web servers hosted on a KVM-based virtualized server.
Abstract:
For transmission over the two-user Gaussian Multiple Access Channel with fading and finite constellations at the inputs, we propose a scheme that uses only quantized knowledge of the fade state at the users, with nominal feedback overhead. One of the users rotates its constellation, without varying the transmit power, to adapt to the existing channel conditions and meet a predetermined minimum Euclidean distance requirement in the equivalent constellation at the destination. The optimal modulation scheme is described for the case when both users use symmetric M-PSK constellations at the input, where M = 2^λ, λ being a positive integer. The strategy is illustrated by considering examples where both users use QPSK signal sets at the input. It is shown that the proposed scheme has considerably better error performance than the conventional non-adaptive scheme, at the cost of a feedback overhead of just ⌈log2(M^2/8 - M/4 + 2)⌉ + 1 bits for the M-PSK case.
Abstract:
A scheme for built-in self-test of analog signals with minimal area overhead for measuring on-chip voltages in an all-digital manner is presented. The method is well suited for a distributed architecture, where the routing of analog signals over long paths is minimized. A clock is routed serially to the sampling heads placed at the nodes of analog test voltages. This sampling head present at each test node, which consists of a pair of delay cells and a pair of flip-flops, locally converts the test voltage to a skew between a pair of subsampled signals, thus giving rise to as many subsampled signal pairs as the number of nodes. To measure a certain analog voltage, the corresponding subsampled signal pair is fed to a delay measurement unit to measure the skew between this pair. The concept is validated by designing a test chip in a UMC 130-nm CMOS process. Sub-millivolt accuracy for static signals is demonstrated for a measurement time of a few seconds, and an effective number of bits of 5.29 is demonstrated for low-bandwidth signals in the absence of sample-and-hold circuitry.
Abstract:
Multi-GPU machines are being increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. Existing works that propose to automate data allocations for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). At runtime, BBMM can perform standard set operations such as union, intersection, and difference, and determine subset and superset relations, on hyperrectangular regions of array data (bounding boxes). It uses these operations along with some compiler assistance to identify, allocate, and manage data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.
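To make the bounding-box operations in this abstract concrete, here is a minimal sketch of intersection, subset testing, and an enclosing box on hyperrectangles. The representation (one inclusive `(lo, hi)` index pair per array dimension) and the names `Box`, `intersect`, `is_subset`, and `enclosing_box` are assumptions for illustration; the abstract does not describe BBMM's actual data structures.

```python
from typing import Optional, Tuple

# Hypothetical representation: one inclusive (lo, hi) pair per array dimension.
Box = Tuple[Tuple[int, int], ...]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping box, or None if the boxes are disjoint."""
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:  # no overlap along this dimension
            return None
        out.append((lo, hi))
    return tuple(out)


def is_subset(a: Box, b: Box) -> bool:
    """Return True if box a lies entirely inside box b."""
    return all(blo <= alo and ahi <= bhi
               for (alo, ahi), (blo, bhi) in zip(a, b))


def enclosing_box(a: Box, b: Box) -> Box:
    """Smallest single box covering both a and b (may over-approximate the union)."""
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))


if __name__ == "__main__":
    tile_a = ((0, 9), (0, 9))
    tile_b = ((5, 14), (3, 6))
    print(intersect(tile_a, tile_b))
    print(is_subset(((2, 3), (4, 5)), tile_a))
    print(enclosing_box(tile_a, tile_b))
```

A region already resident on a GPU that is a superset of a newly requested box could, in principle, be reused without a new transfer, which is the kind of reuse the abstract attributes to tracking allocations as boxes.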
Abstract:
Dynamic analysis techniques have been proposed to detect potential deadlocks. Analyzing and comprehending each potential deadlock to determine whether it is feasible in a real execution requires significant programmer effort. Moreover, empirical evidence shows that existing analyses are quite imprecise. This imprecision further wastes the manual effort invested in reasoning about non-existent defects. In this paper, we address the imprecision of existing analyses and the subsequent manual effort necessary to reason about deadlocks. We propose a novel approach for deadlock detection by designing a dynamic analysis that intelligently leverages execution traces. To reduce the manual effort, we replay the program by making the execution follow a schedule derived from the observed trace. For a real deadlock, its feasibility is automatically verified if the replay causes the execution to deadlock. We have implemented our approach as part of WOLF and have analyzed many large (up to 160 KLoC) Java programs. Our experimental results show that we are able to automatically identify 74% of the reported defects as true (or false) positives, leaving very few defects for manual analysis. The overhead of our approach is negligible, making it a compelling tool for practical adoption.
Abstract:
Contemporary cellular standards, such as Long Term Evolution (LTE) and LTE-Advanced, employ orthogonal frequency-division multiplexing (OFDM) and use frequency-domain scheduling and rate adaptation. In conjunction with feedback reduction schemes, high downlink spectral efficiencies are achieved while limiting the uplink feedback overhead. One such important scheme that has been adopted by these standards is best-m feedback, in which every user feeds back its m largest subchannel (SC) power gains and their corresponding indices. We analyze the single cell average throughput of an OFDM system with uniformly correlated SC gains that employs best-m feedback and discrete rate adaptation. Our model incorporates three schedulers that cover a wide range of the throughput versus fairness tradeoff and feedback delay. We show that, for small m, correlation significantly reduces average throughput with best-m feedback. This result is pertinent as even in typical dispersive channels, correlation is high. We observe that the schedulers exhibit varied sensitivities to correlation and feedback delay. The analysis also leads to insightful expressions for the average throughput in the asymptotic regime of a large number of users.
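The best-m feedback rule described in this abstract (each user reports its m largest subchannel power gains together with their indices) can be sketched as follows. The function name `best_m_feedback` and the list-of-floats input are illustrative assumptions; quantization of the reported gains, which a real system would apply, is omitted.

```python
from typing import List, Tuple


def best_m_feedback(gains: List[float], m: int) -> List[Tuple[int, float]]:
    """Return (subchannel index, gain) pairs for the m largest gains.

    Pairs are ordered from the strongest subchannel downward.
    """
    if not 0 < m <= len(gains):
        raise ValueError("m must satisfy 0 < m <= number of subchannels")
    ranked = sorted(enumerate(gains), key=lambda pair: pair[1], reverse=True)
    return ranked[:m]


if __name__ == "__main__":
    sc_gains = [0.2, 1.5, 0.7, 3.1, 0.9]  # made-up subchannel power gains
    print(best_m_feedback(sc_gains, 2))
```

With the made-up gains above and m = 2, the user would report subchannels 3 and 1, so the scheduler learns nothing about that user's other three subchannels.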
Abstract:
The correctness of a hard real-time system depends on its ability to meet all its deadlines. Existing real-time systems use either a pure real-time scheduler or a real-time scheduler embedded as a real-time scheduling class in the scheduler of an operating system (OS). Existing implementations of schedulers in multicore systems that support real-time and non-real-time tasks permit the execution of non-real-time tasks in all the cores with priorities lower than those of real-time tasks, but interrupts and softirqs associated with these non-real-time tasks can execute in any core with priorities higher than those of real-time tasks. As a result, the execution overhead of real-time tasks is quite large in these systems, which, in turn, affects their runtime. So that hard real-time tasks can be executed in such systems with minimal interference from other Linux tasks, we propose an integrated scheduler architecture, called SchedISA, which aims to considerably reduce the execution overhead of real-time tasks in these systems. To test the efficacy of the proposed scheduler, we implemented the partitioned earliest deadline first (P-EDF) scheduling algorithm in SchedISA on Linux kernel version 3.8 and conducted experiments on an Intel Core i7 processor with eight logical cores. We compared the execution overhead of real-time tasks in this implementation of SchedISA with that in SCHED_DEADLINE's P-EDF implementation, which concurrently executes real-time and non-real-time tasks in Linux OS in all the cores. The experimental results show that the execution overhead of real-time tasks in this implementation of SchedISA is considerably less than that in SCHED_DEADLINE. We believe that, with further refinement of SchedISA, the execution overhead of real-time tasks in SchedISA can be reduced to a predictable maximum, making it suitable for scheduling hard real-time tasks without affecting the CPU share of Linux tasks.
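For readers unfamiliar with P-EDF, the core selection rule can be sketched in a few lines: each job is bound to one core, and each core independently runs the ready job with the earliest absolute deadline. This is a generic textbook illustration, not SchedISA or SCHED_DEADLINE code; the `Job` class and `pick_next` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    core: int        # core the job is statically partitioned onto
    deadline: float  # absolute deadline, in arbitrary time units


def pick_next(ready: List[Job], num_cores: int) -> Dict[int, Optional[Job]]:
    """For each core, choose the ready job with the earliest absolute deadline.

    Cores with no ready job map to None. Jobs never migrate between cores.
    """
    chosen: Dict[int, Optional[Job]] = {c: None for c in range(num_cores)}
    for job in ready:
        current = chosen[job.core]
        if current is None or job.deadline < current.deadline:
            chosen[job.core] = job
    return chosen


if __name__ == "__main__":
    ready_jobs = [
        Job("sensor", core=0, deadline=12.0),
        Job("control", core=0, deadline=8.0),
        Job("logger", core=1, deadline=30.0),
    ]
    for core, job in pick_next(ready_jobs, num_cores=3).items():
        print(core, job.name if job else "idle")
```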
Abstract:
A distributed system uses many servers to increase service availability and provide fault tolerance. Balancing the load among these servers is important for achieving better performance. Various hardware- and software-based load balancing solutions are available. However, the servers and the load balancer always incur overhead while communicating with each other and sharing availability and current load status information. The load balancer is always busy listening to clients' requests and redirecting them. It also needs to collect the servers' availability status frequently to keep itself up to date. Servers are busy not only serving clients but also sharing their current load information with load balancing algorithms. In this paper we propose and discuss the concept and system model for a software-based load balancer with an Availability-Checker and Load Reporters (LB-ACLRs), which reduces the overhead on the servers and the load balancer. We also describe the architectural components with their roles and responsibilities. We present a detailed analysis to show how our proposed Availability Checker significantly increases the performance of the system.
Abstract:
In this paper, we present Bi-Modal Cache - a flexible stacked DRAM cache organization which simultaneously achieves several objectives: (i) improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM, and (iv) reduction in off-chip bandwidth wastage. The Bi-Modal Cache addresses the miss rate versus off-chip bandwidth dilemma by organizing the data in a bi-modal fashion - blocks with high spatial locality are organized as large blocks and those with little spatial locality as small blocks. By adaptively selecting the right granularity of storage for individual blocks at run-time, the proposed DRAM cache organization is able to make judicious use of the available DRAM cache capacity as well as reduce the off-chip memory bandwidth consumption. The Bi-Modal Cache improves cache hit latency despite moving the metadata to DRAM by means of a small SRAM-based Way Locator. Further, by leveraging the tremendous internal bandwidth and capacity that stacked DRAM organizations provide, the Bi-Modal Cache enables efficient concurrent accesses to tags and data to reduce hit time. Through detailed simulations, we demonstrate that the Bi-Modal Cache achieves overall performance improvement (in terms of Average Normalized Turnaround Time (ANTT)) of 10.8%, 13.8%, and 14.0% in 4-core, 8-core, and 16-core workloads, respectively.
Abstract:
The bilateral filter is known to be quite effective in denoising images corrupted with small dosages of additive Gaussian noise. The denoising performance of the filter, however, is known to degrade quickly with the increase in noise level. Several adaptations of the filter have been proposed in the literature to address this shortcoming, but often at a substantial computational overhead. In this paper, we report a simple pre-processing step that can substantially improve the denoising performance of the bilateral filter, at almost no additional cost. The modified filter is designed to be robust at large noise levels, and often tends to perform poorly below a certain noise threshold. To get the best of the original and the modified filter, we propose to combine them in a weighted fashion, where the weights are chosen to minimize (a surrogate of) the oracle mean-squared-error (MSE). The optimally-weighted filter is thus guaranteed to perform better than either of the component filters in terms of the MSE, at all noise levels. We also provide a fast algorithm for the weighted filtering. Visual and quantitative denoising results on standard test images are reported which demonstrate that the improvement over the original filter is significant both visually and in terms of PSNR. Moreover, the denoising performance of the optimally-weighted bilateral filter is competitive with the computation-intensive non-local means filter.
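The weighted combination in this abstract can be illustrated with a simplified stand-in: a single weight w applied to one filter output and 1 - w to the other, chosen to minimize the squared error against the clean signal. This is an assumption-laden sketch, since it uses the clean signal directly (an oracle) whereas the abstract minimizes a surrogate of the oracle MSE, and the paper's weights need not take this one-parameter form. The names `oracle_weight`, `combine`, and `mse` are hypothetical.

```python
from typing import List


def mse(x: List[float], ref: List[float]) -> float:
    """Mean squared error between a signal and a reference."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, ref)) / len(ref)


def oracle_weight(a: List[float], b: List[float], clean: List[float]) -> float:
    """Weight w minimizing ||w*a + (1 - w)*b - clean||^2.

    Setting the derivative to zero gives w = <a - b, clean - b> / ||a - b||^2.
    """
    num = sum((ai - bi) * (ci - bi) for ai, bi, ci in zip(a, b, clean))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return num / den if den > 0 else 0.0  # identical outputs: any w works


def combine(a: List[float], b: List[float], w: float) -> List[float]:
    """Pointwise weighted combination w*a + (1 - w)*b."""
    return [w * ai + (1 - w) * bi for ai, bi in zip(a, b)]


if __name__ == "__main__":
    clean = [1.0, 2.0, 3.0, 4.0]
    out_a = [1.5, 2.5, 3.5, 4.5]  # made-up output of one filter
    out_b = [0.5, 1.5, 2.5, 3.5]  # made-up output of the other filter
    w = oracle_weight(out_a, out_b, clean)
    mixed = combine(out_a, out_b, w)
    print(w, mse(out_a, clean), mse(out_b, clean), mse(mixed, clean))
```

Because w = 0 and w = 1 reproduce the two component outputs, the minimizing weight can never give a larger squared error than either of them, which mirrors the guarantee stated in the abstract.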
Abstract:
We present a technique for independently exciting two resonant modes of vibration in a single-crystal silicon bulk mode microresonator using the same electrode configuration through control of the polarity of the DC actuation voltage. Applications of this technique may include built-in temperature compensation by the simultaneous selective excitation of two closely spaced modes that may have different temperature coefficients of resonant frequency. The technique is simple and requires minimum circuit overhead for implementation. The technique is implemented on square plate resonators with quality factors as high as 3.06 × 10^6. Copyright © 2008 by ASME.
Abstract:
Navassa is a small, undeveloped island in the Windward Passage between Jamaica and Haiti. It was designated a National Wildlife Refuge under the jurisdiction of the U.S. Fish and Wildlife Service in 1999, but the remote location makes management and enforcement challenging, and the area is regularly fished by artisanal fishermen from Haiti. In April 2006, the NOAA Center for Coastal Fisheries and Habitat Research conducted a research cruise to Navassa. The cruise produced the first high-resolution multibeam bathymetry for the area, which will facilitate habitat mapping and assist in refuge management. A major emphasis of the cruise was to study the impact of Haitian fishing gear on benthic habitats and fish communities; however, in 10 days on station only one small boat was observed with five fishermen and seven traps. Fifteen monitoring stations were established to characterize fish and benthic communities along the deep (28-34 m) shelf, as these areas have been largely unstudied by previous cruises. The fish communities included numerous squirrelfishes, triggerfishes, and parrotfishes. Snappers and grouper were also present but no small individuals were observed. Similarly, conch surveys indicated the population was in low abundance and was heavily skewed towards adults. Analysis of the benthic photoquadrats is currently underway. Other cruise activities included installation of a temperature logger network, sample collection for stable isotope analyses to examine trophic structure, and drop camera surveys to ground-truth habitat maps and overhead imagery. (PDF contains 58 pages)