958 resultados para Scheduling


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Just-in-Time (JIT) compilers for Java can be augmented by making use of runtime profile information to produce better quality code and hence achieve higher performance. In a JIT compilation environment, the profile information obtained can be readily exploited in the same run to aid recompilation and optimization of frequently executed (hot) methods. This paper discusses a low overhead path profiling scheme for dynamically profiling AT produced native code. The profile information is used in recompilation during a subsequent invocation of the hot method. During recompilation tree regions along the hot paths are enlarged and instruction scheduling at the superblock level is performed. We have used the open source LaTTe AT compiler framework for our implementation. Our results on a SPARC platform for SPEC JVM98 benchmarks indicate that (i) there is a significant reduction in the number of tree regions along the hot paths, and (ii) profile aided recompilation in LaTTe achieves performance comparable to that of adaptive LaTTe in spite of retranslation and profiling overheads.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Large-grain synchronous dataflow graphs or multi-rate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multi-rate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multi-rate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that in constructing the rate-optimal schedule, it directly minimizes the memory requirement by choosing the schedule time of nodes appropriately. Lastly, a new circular-arc interval graph coloring algorithm has been proposed to further reduce the memory requirement by allowing buffer sharing among the arcs of the multi-rate dataflow graph. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also known as block schedules) proposed by Lee and Messerschmitt (IEEE Transactions on Computers, vol. 36, no. 1, 1987, pp. 24-35), (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao (Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, SC, Jan. 10-13, 1993, pp. 29-42), and (iii) the multi-rate software pipelining (MRSP) algorithm (Govindarajan and Gao, in Proceedings of the 1993 International Conference on Application Specific Array Processors, Venice, Italy, Oct. 25-27, 1993, pp. 77-88). Schedules generated for a number of random dataflow graphs and for a set of DSP application programs using the different scheduling methods are compared. The experimental results have demonstrated a significant improvement (10-20%) in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods, without sacrificing the computation rate. The MBRO method also gives a 20% average improvement in computation rate compared to Lee's Block scheduling method.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a prototype of a fuzzy system for alleviation of network overloads in the day-to-day operation of power systems. The control used for overload alleviation is real power generation rescheduling. Generation Shift Sensitivity Factors (GSSF) are computed accurately, using a more realistic operational load flow model. Overloading of lines and sensitivity of controlling variables are translated into fuzzy set notations to formulate the relation between overloading of line and controlling ability of generation scheduling. A fuzzy rule based system is formed to select the controllers, their movement direction and step size. Overall sensitivity of line loading to each of the generation is also considered in selecting the controller. Results obtained for network overload alleviation of two modified Indian power networks of 24 bus and 82 bus with line outage contingencies are presented for illustration purposes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hybrid wireless networks are extensively used in the superstores, market places, malls, etc. and provide high QoS (Quality of Service) to the end-users has become a challenging task. In this paper, we propose a policy-based transaction-aware QoS management architecture in a hybrid wireless superstore environment. The proposed scheme operates at the transaction level, for the downlink QoS management. We derive a policy for the estimation of QoS parameters, like, delay, jitter, bandwidth, availability, packet loss for every transaction before scheduling on the downlink. We also propose a QoS monitor which monitors the specified QoS and automatically adjusts the QoS according to the requirement. The proposed scheme has been simulated in hybrid wireless superstore environment and tested for various superstore transactions. The results shows that the policy-based transaction QoS management is enhance the performance and utilize network resources efficiently at the peak time of the superstore business.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The poor performance of TCP over multi-hop wireless networks is well known. In this paper we explore to what extent network coding can help to improve the throughput performance of TCP controlled bulk transfers over a chain topology multi-hop wireless network. The nodes use a CSMA/ CA mechanism, such as IEEE 802.11’s DCF, to perform distributed packet scheduling. The reverse flowing TCP ACKs are sought to be X-ORed with forward flowing TCP data packets. We find that, without any modification to theMAC protocol, the gain from network coding is negligible. The inherent coordination problem of carrier sensing based random access in multi-hop wireless networks dominates the performance. We provide a theoretical analysis that yields a throughput bound with network coding. We then propose a distributed modification of the IEEE 802.11 DCF, based on tuning the back-off mechanism using a feedback approach. Simulation studies show that the proposed mechanism when combined with network coding, improves the performance of a TCP session by more than 100%.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this article we study the problem of joint congestion control, routing and MAC layer scheduling in multi-hop wireless mesh network, where the nodes in the network are subjected to maximum energy expenditure rates. We model link contention in the wireless network using the contention graph and we model energy expenditure rate constraint of nodes using the energy expenditure rate matrix. We formulate the problem as an aggregate utility maximization problem and apply duality theory in order to decompose the problem into two sub-problems namely, network layer routing and congestion control problem and MAC layer scheduling problem. The source adjusts its rate based on the cost of the least cost path to the destination where the cost of the path includes not only the prices of the links in it but also the prices associated with the nodes on the path. The MAC layer scheduling of the links is carried out based on the prices of the links. We study the e�ects of energy expenditure rate constraints of the nodes on the optimal throughput of the network.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The lifetime calculation of large dense sensor networks with fixed energy resources and the remaining residual energy have shown that for a constant energy resource in a sensor network the fault rate at the cluster head is network size invariant when using the network layer with no MAC losses.Even after increasing the battery capacities in the nodes the total lifetime does not increase after a max limit of 8 times. As this is a serious limitation lots of research has been done at the MAC layer which allows to adapt to the specific connectivity, traffic and channel polling needs for sensor networks. There have been lots of MAC protocols which allow to control the channel polling of new radios which are available to sensor nodes to communicate. This further reduces the communication overhead by idling and sleep scheduling thus extending the lifetime of the monitoring application. We address the two issues which effects the distributed characteristics and performance of connected MAC nodes. (1) To determine the theoretical minimum rate based on joint coding for a correlated data source at the singlehop, (2a) to estimate cluster head errors using Bayesian rule for routing using persistence clustering when node densities are the same and stored using prior probability at the network layer, (2b) to estimate the upper bound of routing errors when using passive clustering were the node densities at the multi-hop MACS are unknown and not stored at the multi-hop nodes a priori. In this paper we evaluate many MAC based sensor network protocols and study the effects on sensor network lifetime. A renewable energy MAC routing protocol is designed when the probabilities of active nodes are not known a priori. From theoretical derivations we show that for a Bayesian rule with known class densities of omega1, omega2 with expected error P* is bounded by max error rate of P=2P* for single-hop. We study the effects of energy losses using cross-layer simulation of - large sensor network MACS setup, the error rate which effect finding sufficient node densities to have reliable multi-hop communications due to unknown node densities. The simulation results show that even though the lifetime is comparable the expected Bayesian posterior probability error bound is close or higher than Pges2P*.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we are concerned with finding the maximum throughput that a mobile ad hoc network can support. Even when nodes are stationary, the problem of determining the capacity region has long been known to be NP-hard. Mobility introduces an additional dimension of complexity because nodes now also have to decide when they should initiate route discovery. Since route discovery involves communication and computation overhead, it should not be invoked very often. On the other hand, mobility implies that routes are bound to become stale resulting in sub-optimal performance if routes are not updated. We attempt to gain some understanding of these effects by considering a simple one-dimensional network model. The simplicity of our model allows us to use stochastic dynamic programming (SDP) to find the maximum possible network throughput with ideal routing and medium access control (MAC) scheduling. Using the optimal value as a benchmark, we also propose and evaluate the performance of a simple threshold-based heuristic. Unlike the optimal policy which requires considerable state information, the heuristic is very simple to implement and is not overly sensitive to the threshold value used. We find empirical conditions for our heuristic to be near-optimal as well as network scenarios when our simple heuristic does not perform very well. We provide extensive numerical and simulation results for different parameter settings of our model.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Antenna selection allows multiple-antenna systems to achieve most of their promised diversity gain, while keeping the number of RF chains and, thus, cost/complexity low. In this paper we investigate antenna selection for fourth-generation OFDMA- based cellular communications systems, in particular, 3GPP LTE (long-term evolution) systems. We propose a training method for antenna selection that is especially suitable for OFDMA. By means of simulation, we evaluate the SNR-gain that can be achieved with our design. We find that the performance depends on the bandwidth assigned to each user, the scheduling method (round-robin or frequency-domain scheduling), and the Doppler spread. Furthermore, the signal-to-noise ratio of the training sequence plays a critical role. Typical SNR gains are around 2 dB, with larger values obtainable in certain circumstances.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Accurate system planning and performance evaluation requires knowledge of the joint impact of scheduling, interference, and fading. However, current analyses either require costly numerical simulations or make simplifying assumptions that limit the applicability of the results. In this paper, we derive analytical expressions for the spectral efficiency of cellular systems that use either the channel-unaware but fair round robin scheduler or the greedy, channel-aware but unfair maximum signal to interference ratio scheduler. As is the case in real deployments, non-identical co-channel interference at each user, both Rayleigh fading and lognormal shadowing, and limited modulation constellation sizes are accounted for in the analysis. We show that using a simple moment generating function-based lognormal approximation technique and an accurate Gaussian-Q function approximation leads to results that match simulations well. These results are more accurate than erstwhile results that instead used the moment-matching Fenton-Wilkinson approximation method and bounds on the Q function. The spectral efficiency of cellular systems is strongly influenced by the channel scheduler and the small constellation size that is typically used in third generation cellular systems.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. The large number of functional units connected to a large unified register file in VLIW architectures make power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in a relatively smaller area occupied by a register file leads to a high power density in the register file and makes it one of the prime hot-spots. This makes it highly susceptible to the possibility of a catastrophic heatstroke. This in turn impacts the performance and cost because of the need for periodic cool down and sophisticated packaging and cooling techniques respectively. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required thereby reducing the power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and associated energy consumption) become very high compared to the centralized VLIW architectures and this can be attributed to a large number of explicit inter-cluster communications. Snooping based clustered VLIW architectures provide very limited but very fast way of inter-cluster communication by allowing some of the functional units to directly read some of the operands from the register file of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit the limited snooping capability to reduce the register file energy consumption on an average by 12% and 18% and improve the overall performance by 5% and 11% for a 2-clustered and a 4-clustered machine respectively, over an earlier state-of-the-art clustered scheduling algorithm when evaluated in the context of snooping based clustered VLIW architectures.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Miniaturization of devices and the ensuing decrease in the threshold voltage has led to a substantial increase in the leakage component of the total processor energy consumption. Relatively simpler issue logic and the presence of a large number of function units in the VLIW and the clustered VLIW architectures attribute a large fraction of this leakage energy consumption in the functional units. However, functional units are not fully utilized in the VLIW architectures because of the inherent variations in the ILP of the programs. This underutilization is even more pronounced in the context of clustered VLIW architectures because of the contentions for the limited number of slow intercluster communication channels which lead to many short idle cycles.In the past, some architectural schemes have been proposed to obtain leakage energy bene .ts by aggressively exploiting the idleness of functional units. However, presence of many short idle cycles cause frequent transitions from the active mode to the sleep mode and vice-versa and adversely a ffects the energy benefits of a purely hardware based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assist such a hardware based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction of energy consumption of functional units with negligible performance degradation over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we consider the problem of association of wireless stations (STAs) with an access network served by a wireless local area network (WLAN) and a 3G cellular network. There is a set of WLAN Access Points (APs) and a set of 3G Base Stations (BSs) and a number of STAs each of which needs to be associated with one of the APs or one of the BSs. We concentrate on downlink bulk elastic transfers. Each association provides each ST with a certain transfer rate. We evaluate an association on the basis of the sum log utility of the transfer rates and seek the utility maximizing association. We also obtain the optimal time scheduling of service from a 3G BS to the associated STAs. We propose a fast iterative heuristic algorithm to compute an association. Numerical results show that our algorithm converges in a few steps yielding an association that is within 1% (in objective value) of the optimal (obtained through exhaustive search); in most cases the algorithm yields an optimal solution.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Flexible Manufacturing Systems (FMS), widely considered as the manufacturing technology of the future, are gaining increasing importance due to the immense advantages they provide in terms of cost, quality and productivity over the conventional manufacturing. An FMS is a complex interconnection of capital intensive resources and high levels of system performance is very crucial for survival in a competing environment.Discrete event simulation is one of the most popular methods for performance evaluation of FMS during planning, design and operation phases. Indeed fast simulators are suggested for selection of optimal strategies for flow control (which part type to enter and at what instant), AGV scheduling (which vehicle to carry which part), routing (which machine to process the part) and part selection (which part for processing next). In this paper we develop a C-net based model for an FMS and use the same for distributed discrete event simulation. We illustrate using examples the efficacy of destributed discrete event simulation for the performance evaluation of FMSs.