102 resultados para Software Architecture


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel comparator architecture is proposed for speed operation in low voltage environment. Performance comparison with a conventional regenerative comparator shows a speed-up of 41%. The proposed comparator is embedded in a continuous time sigma-delta ADC so as to reduce the quantizer delay and hence minimizes the excess loop delay problem. A performance enhancement of 1dB in the dynamic range of the ADC is achieved with this new comparator. We have implemented this ADC in a standard single-poly 8-Metal 0.13 mum UMC process. The entire system operates at 1.2 V supply providing a dynamic range of 32 dB consuming 720 muW of power and occupies an area of 0.1 mm2.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In achieving higher instruction level parallelism, software pipelining increases the register pressure in the loop. The usefulness of the generated schedule may be restricted to cases where the register pressure is less than the available number of registers. Spill instructions need to be introduced otherwise. But scheduling these spill instructions in the compact schedule is a difficult task. Several heuristics have been proposed to schedule spill code. These heuristics may generate more spill code than necessary, and scheduling them may necessitate increasing the initiation interval. We model the problem of register allocation with spill code generation and scheduling in software pipelined loops as a 0-1 integer linear program. The formulation minimizes the increase in initiation interval (II) by optimally placing spill code and simultaneously minimizes the amount of spill code produced. To the best of our knowledge, this is the first integrated formulation for register allocation, optimal spill code generation and scheduling for software pipelined loops. The proposed formulation performs better than the existing heuristics by preventing an increase in II in 11.11% of the loops and generating 18.48% less spill code on average among the loops extracted from Perfect Club and SPEC benchmarks with a moderate increase in compilation time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A customer reported problem (or Trouble Ticket) in software maintenance is typically solved by one or more maintenance engineers. The decision of allocating the ticket to one or more engineers is generally taken by the lead, based on customer delivery deadlines and a guided complexity assessment from each maintenance engineer. The key challenge in such a scenario is two folds, un-truthful (hiked up) elicitation of ticket complexity by each engineer to the lead and the decision of allocating the ticket to a group of engineers who will solve the ticket with in customer deadline. The decision of allocation should ensure Individual and Coalitional Rationality along with Coalitional Stability. In this paper we use game theory to examine the issue of truthful elicitation of ticket complexities by engineers for solving ticket as a group given a specific customer delivery deadline. We formulate this problem as strategic form game and propose two mechanisms, (1) Division of Labor (DOL) and (2) Extended Second Price (ESP). In the proposed mechanisms we show that truth telling by each engineer constitutes a Dominant Strategy Nash Equilibrium of the underlying game. Also we analyze the existence of Individual Rationality (IR) and Coalitional Rationality (CR) properties to motivate voluntary and group participation. We use Core, solution concept from co-operative game theory to analyze the stability of the proposed group based on the allocation and payments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The memory subsystem is a major contributor to the performance, power, and area of complex SoCs used in feature rich multimedia products. Hence, memory architecture of the embedded DSP is complex and usually custom designed with multiple banks of single-ported or dual ported on-chip scratch pad memory and multiple banks of off-chip memory. Building software for such large complex memories with many of the software components as individually optimized software IPs is a big challenge. In order to obtain good performance and a reduction in memory stalls, the data buffers of the application need to be placed carefully in different types of memory. In this paper we present a unified framework (MODLEX) that combines different data layout optimizations to address the complex DSP memory architectures. Our method models the data layout problem as multi-objective genetic algorithm (GA) with performance and power being the objectives and presents a set of solution points which is attractive from a platform design viewpoint. While most of the work in the literature assumes that performance and power are non-conflicting objectives, our work demonstrates that there is significant trade-off (up to 70%) that is possible between power and performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This problem is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architecture while the inner level explores placement of data sections (data layout problem) to minimize memory stalls. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of Multi-objective Genetic Algorithm (Memory Architecture exploration) and an efficient heuristic data placement algorithm. At the outer level the memory architecture exploration is done by picking memory modules directly from a ASIC memory Library. This helps in performing the memory architecture exploration in a integrated framework, where the memory allocation, memory exploration and data layout works in a tightly coupled way to yield optimal design points with respect to area, power and performance. We experimented our approach for 3 embedded applications and our approach explores several thousand memory architecture for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose the design and implementation of hardware architecture for spatial prediction based image compression scheme, which consists of prediction phase and quantization phase. In prediction phase, the hierarchical tree structure obtained from the test image is used to predict every central pixel of an image by its four neighboring pixels. The prediction scheme generates an error image, to which the wavelet/sub-band coding algorithm can be applied to obtain efficient compression. The software model is tested for its performance in terms of entropy, standard deviation. The memory and silicon area constraints play a vital role in the realization of the hardware for hand-held devices. The hardware architecture is constructed for the proposed scheme, which involves the aspects of parallelism in instructions and data. The processor consists of pipelined functional units to obtain the maximum throughput and higher speed of operation. The hardware model is analyzed for performance in terms throughput, speed and power. The results of hardware model indicate that the proposed architecture is suitable for power constrained implementations with higher data rate

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Miniaturization of devices and the ensuing decrease in the threshold voltage has led to a substantial increase in the leakage component of the total processor energy consumption. Relatively simpler issue logic and the presence of a large number of function units in the VLIW and the clustered VLIW architectures attribute a large fraction of this leakage energy consumption in the functional units. However, functional units are not fully utilized in the VLIW architectures because of the inherent variations in the ILP of the programs. This underutilization is even more pronounced in the context of clustered VLIW architectures because of the contentions for the limited number of slow intercluster communication channels which lead to many short idle cycles.In the past, some architectural schemes have been proposed to obtain leakage energy bene .ts by aggressively exploiting the idleness of functional units. However, presence of many short idle cycles cause frequent transitions from the active mode to the sleep mode and vice-versa and adversely a ffects the energy benefits of a purely hardware based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assist such a hardware based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction of energy consumption of functional units with negligible performance degradation over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the advent of Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance will be the key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant to server faults and hence fail to comply to multi-user, failure-prone networks such as the Internet. Current research areas concerning VoD often focus on increasing the throughput and reliability of single server, but rarely addresses the smooth provision of service during server as well as network failures. Reliable Server Pooling (RSerPool), being capable of providing high availability by using multiple redundant servers as single source point, can be a solution to overcome the above failures. During a possible server failure, the continuity of service is retained by another server. In order to achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present an elegant, simple, efficient and scalable approach which has been developed to facilitate the transfer of state by the client itself, using extended cookie mechanism, which ensures that there is no noticeable change in disruption or the video quality.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today 80 % of the content on the Web is in English, which is spoken by only 8% of the World population and 5% of Indian population. There is wealth of useful content in the various languages of the world other than English, which can be made available on the Internet. But, to date, for various reasons most of it is not yet available on the Internet. India itself has 18 officially recognized languages and scores of dialects. Although the medium of instruction for most of the higher education and research in India is English, substantial amount of literature by way of novels, textbooks, scholarly information are being generated in the other languages in the country. Many of the e-governance initiatives are in the respective state languages. In the past, support for different languages by the operating systems and the software packages were not very encouraging. However, with the advent of Unicode technology, operating systems and software packages are supporting almost all the major languages of the world that have scripts. In the work reported in this paper, we have explained the configuration changes that are needed for Eprints.org software to store multilingual content and to create a multilingual user interface.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Software transactional memory (STM) has been proposed as a promising programming paradigm for shared memory multi-threaded programs as an alternative to conventional lock based synchronization primitives. Typical STM implementations employ a conflict detection scheme, which works with uniform access granularity, tracking shared data accesses either at word/cache line or at object level. It is well known that a single fixed access tracking granularity cannot meet the conflicting goals of reducing false conflicts without impacting concurrency adversely. A fine grained granularity while improving concurrency can have an adverse impact on performance due to lock aliasing, lock validation overheads, and additional cache pressure. On the other hand, a coarse grained granularity can impact performance due to reduced concurrency. Thus, in general, a fixed or uniform granularity access tracking (UGAT) scheme is application-unaware and rarely matches the access patterns of individual application or parts of an application, leading to sub-optimal performance for different parts of the application(s). In order to mitigate the disadvantages associated with UGAT scheme, we propose a Variable Granularity Access Tracking (VGAT) scheme in this paper. We propose a compiler based approach wherein the compiler uses inter-procedural whole program static analysis to select the access tracking granularity for different shared data structures of the application based on the application's data access pattern. We describe our prototype VGAT scheme, using TL2 as our STM implementation. Our experimental results reveal that VGAT-STM scheme can improve the application performance of STAMP benchmarks from 1.87% to up to 21.2%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a Petri net model for a commercial network processor (Intel iXP architecture) which is a multithreaded multiprocessor architecture. We consider and model three different applications viz., IPv4 forwarding, network address translation, and IP security running on IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose a few architectural extensions and evaluate them in detail.