954 results for Saturated throughput


Relevance:

10.00%

Publisher:

Abstract:

Prevalent virtualization technologies provide QoS support within the software layers of the virtual machine monitor (VMM) or the operating system of the virtual machine (VM). The QoS features are mostly provided as extensions to the existing software used for accessing the I/O device; as a result, applications sharing the I/O device lose performance because of crosstalk effects or reduced usable bandwidth. In this paper we examine the NIC sharing effects across VMs on a Xen virtualized server and present an alternate paradigm that improves the shared bandwidth and reduces the crosstalk effect on the VMs. We implement the proposed hardware-software changes in a layered queuing network (LQN) model and use simulation techniques to evaluate the architecture. We find that simple changes in the device architecture and associated system software lead to an application throughput improvement of up to 60%. The architecture also enables finer QoS controls at the device level and increases the scalability of device sharing across multiple virtual machines. We find that the performance improvement derived using the LQN model is comparable to that reported by similar but real implementations.

Relevance:

10.00%

Publisher:

Abstract:

Scalable Networks-on-Chip (NoCs) are needed to match the ever-increasing communication demands of large-scale Multi-Processor Systems-on-Chip (MPSoCs) for multimedia communication applications. The heterogeneous nature of application-specific on-chip cores, along with the specific communication requirements among the cores, calls for the design of application-specific NoCs for improved performance in terms of communication energy, latency, and throughput. In this work, we propose a methodology for the design of customized irregular networks-on-chip. The proposed method exploits a priori knowledge of the application's communication characteristics to generate an optimized network topology and corresponding routing tables.

Relevance:

10.00%

Publisher:

Abstract:

Earlier studies have exploited statistical multiplexing of flows in the core of the Internet to reduce the buffer requirement in routers. Reducing the memory requirement of routers is important as it enables an improvement in performance and at the same time a decrease in cost. In this paper, we observe that the links in the core of the Internet are typically over-provisioned and that this can be exploited to reduce the buffering requirement in routers. The small on-chip memory of a network processor (NP) can be effectively used to buffer packets during most regimes of traffic. We propose a dynamic buffering strategy which buffers packets in the receive and transmit buffers of an NP when the memory requirement is low. When the buffer requirement increases due to bursts in the traffic, memory is allocated to packets in the off-chip DRAM. This scheme effectively mitigates the DRAM access bottleneck, as only a part of the traffic is stored in the DRAM. We build a Petri net model and evaluate the proposed scheme with core-Internet-like traffic. At 77% link utilization, the dynamic buffering scheme has a drop rate of just 0.65%, whereas traditional DRAM buffering has a 4.64% packet drop rate. Even with a high link utilization of 90%, which rarely happens in the core, our dynamic buffering results in a packet drop rate of only 2.17%, while supporting a throughput of 7.39 Gbps. We study the proposed scheme under different conditions to understand the provisioning of processing threads and to determine the queue length at which packets must be buffered in the DRAM. We show that the proposed dynamic buffering strategy drastically reduces the buffering requirement while still maintaining low packet drop rates.
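The queue-length rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' Petri net model or their implementation: the class name `DynamicBuffer`, the parameters `onchip_threshold` and `dram_capacity`, and the refill policy on dequeue are all assumptions made for illustration. Packets stay in a small on-chip queue while occupancy is below the threshold and spill to a larger off-chip queue only during bursts.

```python
from collections import deque


class DynamicBuffer:
    """Hypothetical two-level FIFO: on-chip first, off-chip DRAM on overflow."""

    def __init__(self, onchip_threshold, dram_capacity):
        if onchip_threshold < 1:
            raise ValueError("onchip_threshold must be at least 1")
        self.onchip_threshold = onchip_threshold  # assumed spill threshold
        self.dram_capacity = dram_capacity        # assumed DRAM queue limit
        self.onchip = deque()                     # oldest packets live here
        self.dram = deque()                       # newer packets during bursts
        self.dropped = 0

    def enqueue(self, packet):
        """Store a packet and report where it went."""
        # Once DRAM holds packets, new arrivals must queue behind them
        # so that FIFO order is preserved.
        if not self.dram and len(self.onchip) < self.onchip_threshold:
            self.onchip.append(packet)
            return "onchip"
        if len(self.dram) < self.dram_capacity:
            self.dram.append(packet)
            return "dram"
        self.dropped += 1
        return "dropped"

    def dequeue(self):
        """Remove the oldest packet, refilling on-chip space from DRAM."""
        if not self.onchip:
            return None
        packet = self.onchip.popleft()
        if self.dram:
            self.onchip.append(self.dram.popleft())
        return packet
```

With a threshold of 2 and a DRAM capacity of 2, the first two arrivals stay on-chip, the next two go to DRAM, and a fifth is dropped; only the burst traffic ever touches DRAM, which is the effect the abstract attributes to the scheme.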

Relevance:

10.00%

Publisher:

Abstract:

In this paper we explore an implementation of a high-throughput streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE, Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-point FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact of using application-specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed compared with those of an ASIC implementation. In addition, we compare the performance gain with respect to a GPP.

Relevance:

10.00%

Publisher:

Abstract:

In Universal Mobile Telecommunication Systems (UMTS), the Downlink Shared Channel (DSCH) can be used for providing streaming services. The traffic model for streaming services is different from the commonly used continuously-backlogged model. Each connection specifies a required service rate over an interval of time, k, called the "control horizon". In this paper, our objective is to determine how k DSCH frames should be shared among a set of I connections. We need a scheduler that is efficient and fair, and we introduce the notion of discrepancy to balance the conflicting requirements of aggregate throughput and fairness. Our aim is to schedule the mobiles in such a way that the schedule minimizes the discrepancy over the k frames. We propose an optimal and computationally efficient algorithm, called STEM+. The proof of the optimality of STEM+, when applied to the UMTS rate sets, is the major contribution of this paper. We also show that STEM+ performs better in terms of both fairness and aggregate throughput compared to other scheduling algorithms. Thus, STEM+ achieves both fairness and efficiency and is therefore an appealing algorithm for scheduling streaming connections.

Relevance:

10.00%

Publisher:

Abstract:

Network processors today consist of multiple parallel processors (micro engines) with support for multiple threads to exploit the packet-level parallelism inherent in network workloads. With such concurrency, packet ordering at the output of the network processor cannot be guaranteed. This paper studies the effect of concurrency in network processors on packet ordering. We use a validated Petri net model of a commercial network processor, Intel IXP 2400, to determine the extent of packet reordering for an IPv4 forwarding application. Our study indicates that, in addition to the parallel processing in the network processor, the allocation scheme for the transmit buffer also adversely impacts packet ordering. In particular, our results reveal that this packet reordering results in a packet retransmission rate of up to 61%. We explore different transmit buffer allocation schemes, namely contiguous, strided, local, and global, which reduce the packet retransmission rate to 24%. We propose an alternative scheme, packet sort, which guarantees complete packet ordering while achieving a throughput of 2.5 Gbps. Further, packet sort outperforms the in-built packet ordering schemes in the IXP processor by up to 35%.
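The abstract does not say how packet sort works internally, so the sketch below should not be read as that scheme. It only illustrates the general idea of restoring order at the output of concurrent workers, using a sequence-number reorder buffer built on a min-heap. The class name `ReorderBuffer` and the assumption that each packet carries a consecutive sequence number are hypothetical.

```python
import heapq
import itertools


class ReorderBuffer:
    """Hypothetical reorder buffer: releases packets in sequence order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq          # next sequence number to release
        self.pending = []                  # min-heap of (seq, tie, payload)
        self._tie = itertools.count()      # tie-breaker so payloads are never compared

    def push(self, seq, payload):
        """Accept a packet that may be out of order; return packets now releasable."""
        heapq.heappush(self.pending, (seq, next(self._tie), payload))
        released = []
        while self.pending and self.pending[0][0] <= self.next_seq:
            head_seq, _, head_payload = heapq.heappop(self.pending)
            if head_seq == self.next_seq:
                released.append(head_payload)
                self.next_seq += 1
            # Otherwise head_seq < next_seq: a stale duplicate, silently discarded.
        return released
```

A packet that arrives early waits in the heap until every earlier sequence number has been released, so the output order is always the sequence order, at the cost of extra buffering and delay.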

Relevance:

10.00%

Publisher:

Abstract:

In this paper we are concerned with finding the maximum throughput that a mobile ad hoc network can support. Even when nodes are stationary, the problem of determining the capacity region has long been known to be NP-hard. Mobility introduces an additional dimension of complexity because nodes now also have to decide when they should initiate route discovery. Since route discovery involves communication and computation overhead, it should not be invoked very often. On the other hand, mobility implies that routes are bound to become stale, resulting in sub-optimal performance if routes are not updated. We attempt to gain some understanding of these effects by considering a simple one-dimensional network model. The simplicity of our model allows us to use stochastic dynamic programming (SDP) to find the maximum possible network throughput with ideal routing and medium access control (MAC) scheduling. Using the optimal value as a benchmark, we also propose and evaluate the performance of a simple threshold-based heuristic. Unlike the optimal policy, which requires considerable state information, the heuristic is very simple to implement and is not overly sensitive to the threshold value used. We find empirical conditions for our heuristic to be near-optimal, as well as network scenarios in which our simple heuristic does not perform very well. We provide extensive numerical and simulation results for different parameter settings of our model.

Relevance:

10.00%

Publisher:

Abstract:

Fixed and mobile relays are used, among other applications, in the downlink of cellular communications systems. Cooperation between relays can greatly increase their benefits in terms of extended coverage, increased reliability, and improved spectral efficiency. In this paper, we introduce the fundamental notion of asymmetric cooperation. For this, we consider a two-phase transmission protocol where, in the first phase, the base station (BS) sends several available messages to the relays over wireless links. However, depending on the channel state and the duration of the BS transmission, not all relays decode all messages. In the second phase, the relays, which may now have asymmetric message knowledge, use cooperative linear precoding for the transmission to the mobile stations. We show that for many channel configurations, asymmetric cooperation, although (slightly) sub-optimum for the second phase, is optimum from a total-throughput point of view, as it requires less time and energy in the first phase. We give analytical formulations for the optimum operating parameters and the achievable throughput, and show that under typical circumstances, a 20-30% throughput enhancement can be achieved over conventional systems.

Relevance:

10.00%

Publisher:

Abstract:

We consider a dense ad hoc wireless network comprising n nodes confined to a given two-dimensional region of fixed area. For the Gupta-Kumar random traffic model and a realistic interference and path loss model (i.e., the channel power gains are bounded above, and are bounded below by a strictly positive number), we study the scaling of the aggregate end-to-end throughput with respect to the network average power constraint, $\bar{P}$, and the number of nodes, n. The network power constraint $\bar{P}$ is related to the per-node power constraint, $P$, as $\bar{P} = nP$. For large $P$, we show that the throughput saturates as $\Theta(\log \bar{P})$, irrespective of the number of nodes in the network. For moderate $P$, which can accommodate spatial reuse to improve end-to-end throughput, we observe that the amount of spatial reuse feasible in the network is limited by the diameter of the network. In fact, we observe that the end-to-end path loss in the network and the amount of spatial reuse feasible in the network are inversely proportional. This puts a restriction on the gains achievable using the cooperative communication techniques studied in and, as these rely on direct long distance communication over the network.
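The large-power claim in this abstract can be restated in standard asymptotic notation. This is only a restatement, not a derivation from the paper. It writes the network average power constraint as $\bar{P}$ and the per-node constraint as $P$; the throughput symbol $T$ and the constants $c_1$, $c_2$, $P_0$ are introduced here for illustration, and reading "irrespective of the number of nodes" as "with constants that do not depend on $n$" is an assumption.

```latex
\[
  \bar{P} = nP, \qquad
  T(\bar{P}, n) = \Theta\bigl(\log \bar{P}\bigr) \quad \text{for large } P,
\]
\[
  \text{i.e. there exist constants } c_1, c_2 > 0 \text{ and } P_0,
  \text{ none depending on } n, \text{ such that }
  c_1 \log \bar{P} \;\le\; T(\bar{P}, n) \;\le\; c_2 \log \bar{P}
  \quad \text{whenever } P \ge P_0 .
\]
```

Under this reading, doubling the total power adds at most a constant $c_2 \log 2$ to the upper bound on throughput, which is the sense in which the throughput saturates.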