Prevalent virtualization technologies provide QoS support within the software layers of the virtual machine monitor (VMM) or the operating system of the virtual machine (VM). These QoS features are mostly provided as extensions to the existing software used for accessing the I/O device. As a result, applications sharing the I/O device lose performance through crosstalk effects or reduced usable bandwidth. In this paper we examine the NIC sharing effects across VMs on a Xen virtualized server and present an alternate paradigm that improves the shared bandwidth and reduces the crosstalk effect on the VMs. We implement the proposed hardware-software changes in a layered queuing network (LQN) model and use simulation techniques to evaluate the architecture. We find that simple changes in the device architecture and associated system software lead to application throughput improvement of up to 60%. The architecture also enables finer QoS controls at the device level and increases the scalability of device sharing across multiple virtual machines. We find that the performance improvement derived using the LQN model is comparable to that reported by similar real implementations.


Soft errors have become a major area of attention with device scaling and large-scale integration. Many variants of superscalar architecture have been proposed, focusing on program re-execution, thread re-execution and instruction re-execution. In this paper we propose a fault-tolerant micro-architecture for a pipelined RISC processor. The proposed architecture, Floating Resources Extended Pipeline (FREP), re-executes instructions using extended pipeline stages. The instructions are re-executed by a hybrid architecture with a suitable combination of space and time redundancy.


In this paper we explore an implementation of a high-throughput streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE, Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-point FFT we carry out an application-architecture design space exploration by examining various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact of using application-specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture on an arbitrary N-point FFT (N > 4096) is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed relative to an ASIC implementation. In addition, we compare the performance gain with respect to a GPP.


Today's feature-rich multimedia products require embedded system solutions with complex Systems-on-Chip (SoCs) to meet market expectations of high performance at low cost and low energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architectures, while the inner level explores the placement of data sections (the data layout problem) to minimize memory stalls. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforce a short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of a multi-objective genetic algorithm (for memory architecture exploration) and an efficient heuristic data placement algorithm. At the outer level, the memory architecture exploration is done by picking memory modules directly from an ASIC memory library. This helps in performing the memory architecture exploration in an integrated framework, where memory allocation, memory exploration and data layout work in a tightly coupled way to yield optimal design points with respect to area, power and performance. We evaluated our approach on 3 embedded applications; it explores several thousand memory architectures for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.


In this paper we propose the architecture of a SoC fabric onto which applications described in a HLL are synthesized. The fabric is a homogeneous layout of computation, storage and communication resources on silicon. Through a process of composition of resources (as opposed to decomposition of applications), application-specific computational structures are defined on the fabric at runtime to realize different modules of the applications in hardware. Applications synthesized on this fabric offer performance comparable to ASICs while retaining the programmability of processing cores. We outline the application synthesis methodology through examples, and compare our results with software implementations on traditional platforms with unbounded resources.


This paper proposes a Petri net model for a commercial network processor (Intel IXP architecture), which is a multithreaded multiprocessor architecture. We consider and model three different applications, viz., IPv4 forwarding, network address translation, and IP security, running on the IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, the architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose and evaluate a few architectural extensions.
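As a generic illustration of how a Petri net captures contention for a shared resource, the sketch below implements the standard enabling and firing rules on a toy net. The place and transition names (`thread_ready`, `unit_free`, `start`, `finish`) are hypothetical and are not taken from the paper's model of the IXP architecture.

```python
from typing import Dict, Tuple

Marking = Dict[str, int]
# A transition is (inputs, outputs); each maps a place name to an arc weight.
Transition = Tuple[Dict[str, int], Dict[str, int]]


def enabled(marking: Marking, t: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = t
    return all(marking.get(place, 0) >= w for place, w in inputs.items())


def fire(marking: Marking, t: Transition) -> Marking:
    """Consume tokens from input places and produce tokens in output places."""
    if not enabled(marking, t):
        raise ValueError("transition is not enabled")
    inputs, outputs = t
    new_marking = dict(marking)
    for place, w in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - w
    for place, w in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + w
    return new_marking


# Hypothetical toy net: two threads compete for one execution unit.
start: Transition = ({"thread_ready": 1, "unit_free": 1}, {"thread_running": 1})
finish: Transition = ({"thread_running": 1}, {"thread_ready": 1, "unit_free": 1})

m0: Marking = {"thread_ready": 2, "unit_free": 1, "thread_running": 0}
m1 = fire(m0, start)    # one thread occupies the unit
m2 = fire(m1, finish)   # the unit is released again
```

After `start` fires once, the second thread cannot start until `finish` returns the `unit_free` token, which is how a bottleneck resource shows up in such a model.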


The highest levels of security can be achieved through the use of more than one type of cryptographic algorithm for each security function. In this paper, the REDEFINE polymorphic architecture is presented as an architecture framework that can optimally support a varied set of crypto algorithms without losing high performance. The presented solution is capable of accelerating the advanced encryption standard (AES) and elliptic curve cryptography (ECC) cryptographic protocols, while still supporting different flavors of these algorithms as well as different underlying finite field sizes. The compelling feature of this cryptosystem is the ability to provide acceleration support for new field sizes as well as new (possibly proprietary) cryptographic algorithms decided upon after the cryptosystem is deployed.


Today's SoCs are complex designs with multiple embedded processors, memory subsystems, and application-specific peripherals. The memory architecture of embedded SoCs strongly influences the power and performance of the entire system. Further, the memory subsystem constitutes a major part (typically up to 70%) of the silicon area of current-day SoCs. In this article, we address on-chip memory architecture exploration for DSP processors whose memory is organized as multiple banks, where banks can be single- or dual-ported with non-uniform bank sizes. We propose two different methods for physical memory architecture exploration and identify the strengths and applicability of these methods in a systematic way. Both methods address memory architecture exploration for a given target application by considering the application's data access characteristics, and generate a set of Pareto-optimal design points that are interesting from power, performance and VLSI area perspectives. To the best of our knowledge, this is the first comprehensive work on memory space exploration at the physical memory level that integrates data layout and memory exploration to address the system objectives from both the hardware design and the application software development perspectives. Further, we propose an automatic framework that explores the design space, identifying hundreds of Pareto-optimal design points within a few hours of running on a standard desktop configuration.
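To make the notion of Pareto-optimal design points concrete, here is a minimal sketch of a dominance filter over (area, power, cycles) tuples, assuming all three objectives are minimized. The candidate values are made up for illustration, and the exploration methods of the article are not reproduced here.

```python
from typing import List, Sequence, Tuple

# Hypothetical design point: (area, power, cycles); lower is better for each.
DesignPoint = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative candidates only, not measured data.
candidates: List[DesignPoint] = [
    (10.0, 5.0, 100.0),
    (12.0, 4.0, 100.0),
    (12.0, 6.0, 120.0),  # worse than the first point in every objective
    (8.0, 7.0, 90.0),
]
front = pareto_front(candidates)
```

This quadratic filter is adequate for a few thousand candidates; larger design spaces would typically use a sorted or incremental variant.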


Video decoders used in emerging applications need to be flexible to handle a large variety of video formats and deliver scalable performance to handle wide variations in workloads. In this paper we propose a unified software and hardware architecture for video decoding to achieve scalable performance with flexibility. The lightweight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process-network-oriented compilation flow achieves realization-agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi-hardware and full-hardware implementations of a video decoder. An application quality-of-service-aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to another without any frame loss.


This paper presents a radix-4^3 based FFT architecture suitable for OFDM-based WLAN applications. The radix-4^3 parallel unrolled architecture presented here uses a radix-4 butterfly unit which takes all four inputs in parallel and can selectively produce one out of the four outputs. A 64-point FFT processor based on the proposed architecture has been implemented in a UMC 130 nm 1P8M CMOS process with a maximum clock frequency of 100 MHz and an area of 0.83 mm^2. The proposed processor provides a throughput of four times the clock rate and can finish one 64-point FFT computation in 16 clock cycles. For IEEE 802.11a/g WLAN, the processor needs to be operated at a clock rate of 5 MHz with a power consumption of 2.27 mW, which is 27% less than the previously reported low-power implementations.
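The behaviour of a radix-4 butterfly that reads four inputs and produces one selected output can be sketched in software as a single row of the 4-point DFT. This is only a functional model under that assumption; the function name `radix4_butterfly_output` is hypothetical and the sketch says nothing about the hardware structure used in the paper.

```python
import cmath
from typing import List, Sequence


def radix4_butterfly_output(x: Sequence[complex], k: int) -> complex:
    """Return only output k (0..3) of the 4-point DFT of the four inputs x."""
    if len(x) != 4 or k not in (0, 1, 2, 3):
        raise ValueError("expected four inputs and an output index in 0..3")
    # W_4 = exp(-2*pi*j/4) = -j, so the twiddle factors are powers of -j.
    twiddles = (1 + 0j, -1j, -1 + 0j, 1j)
    return sum(x[n] * twiddles[(n * k) % 4] for n in range(4))


def dft4(x: Sequence[complex]) -> List[complex]:
    """Reference 4-point DFT, used only to check the selective butterfly."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 4) for n in range(4))
        for k in range(4)
    ]


samples = [1, 2, 3, 4]
selected = [radix4_butterfly_output(samples, k) for k in range(4)]
```

Because the twiddle factors are 1, -j, -1 and j, each selected output needs only additions, subtractions and real/imaginary swaps, with no general multiplier.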


In this paper we propose a fully parallel 64K-point radix-4^4 FFT processor. The radix-4^4 parallel unrolled architecture uses a novel radix-4 butterfly unit which takes all four inputs in parallel and can selectively produce one out of the four outputs. The radix-4^4 block can take all 256 inputs in parallel and can use the select control signals to generate one out of the 256 outputs. The resultant 64K-point FFT processor shows a significant reduction in intermediate memory, but with increased hardware complexity. Compared to the state-of-the-art implementation [5], our architecture shows reduced latency with comparable throughput and area. The 64K-point FFT architecture was synthesized using a 130 nm CMOS technology, which resulted in a throughput of 1.4 GSPS and a latency of 47.7 µs with a maximum clock frequency of 350 MHz. When compared to [5], the latency is reduced by 303 µs with a 50.8% reduction in area.
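The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative, and the derived values (samples per clock, streaming time for one frame, implied latency of [5]) are back-of-the-envelope results rather than results reported by the authors.

```python
RADIX = 4
BLOCK_POINTS = RADIX ** 4            # a radix-4^4 block handles 256 points
FFT_POINTS = 64 * 1024               # 64K points

CLOCK_HZ = 350_000_000               # maximum clock frequency
THROUGHPUT_SPS = 1_400_000_000       # 1.4 GSPS
LATENCY_US = 47.7                    # reported latency
LATENCY_REDUCTION_US = 303.0         # reported reduction relative to [5]

# 64K equals 4^8, that is, 256 * 256.
is_square_of_block = FFT_POINTS == BLOCK_POINTS ** 2

# Samples processed per clock cycle implied by throughput and clock rate.
samples_per_clock = THROUGHPUT_SPS // CLOCK_HZ

# Time needed just to stream one 64K-point frame at the stated throughput.
frame_time_us = FFT_POINTS / THROUGHPUT_SPS * 1e6

# Latency of the reference design [5] implied by the stated reduction.
reference_latency_us = LATENCY_US + LATENCY_REDUCTION_US

print(samples_per_clock, round(frame_time_us, 2), round(reference_latency_us, 1))
```

The implied four samples per clock matches the four-input butterfly, and the reported latency is slightly above the time needed to stream one frame.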


This work contains the selected abstracts of the 6th European Conference on Energy Efficiency and Sustainability in Architecture and Planning, organized by the research group Quality of Life in Architecture of the University of the Basque Country (UPV/EHU). The conference is part of the XXXIV Summer Courses of the UPV/EHU and deals, in its fourth edition, with the topic Cities at risk: resilience and redundancy. Around this general theme there are five keynote lectures, given by Margaretha Breil (Euro-Mediterranean Centre for Climate Change), Cristina Garzillo Leemhuis (ICLEI), Ignasi Fontanals (OptiCits), Juan Carlos Barrios Montenegro (Global Action Plan) and Manuel Valdés López (Barcelona City Council). In addition, 24 abstracts selected by the scientific committee present current research in oral and poster sessions. A parallel aim of the conference is to strengthen the lines of research in energy efficiency and sustainability of the UPV/EHU research and education groups involved, in order to help reinforce R&D&I in their field of knowledge and to support the specific commitment of the Central and Basque Governments, as well as of other national and international institutions, to R&D&I activities in matters related to climate change, energy efficiency and environmental sustainability.


A neural network model, called an FBF network, is proposed for automatic parallel separation of multiple image figures from each other and their backgrounds in noisy grayscale or multi-colored images. The figures can then be processed in parallel by an array of self-organizing Adaptive Resonance Theory (ART) neural networks for automatic target recognition. An FBF network can automatically separate the disconnected but interleaved spirals that Minsky and Papert introduced in their book Perceptrons. The network's design also clarifies why humans cannot rapidly separate interleaved spirals, yet can rapidly detect conjunctions of disparity and color, or of disparity and motion, that distinguish target figures from surrounding distractors. Figure-ground separation is accomplished by iterating operations of a Feature Contour System (FCS) and a Boundary Contour System (BCS) in the order FCS-BCS-FCS, hence the term FBF, that have been derived from an analysis of biological vision. The FCS operations include the use of nonlinear shunting networks to compensate for variable illumination and nonlinear diffusion networks to control filling-in. A key new feature of an FBF network is the use of filling-in for figure-ground separation. The BCS operations include oriented filters joined to competitive and cooperative interactions designed to detect, regularize, and complete boundaries in up to 50 percent noise, while suppressing the noise. A modified CORT-X filter is described which uses both on-cells and off-cells to generate a boundary segmentation from a noisy image.
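To illustrate how a shunting network can compensate for variable illumination, the sketch below uses the steady state of a classic feedforward on-center off-surround shunting equation, x_i = B * I_i / (A + sum_k I_k). This textbook form is an assumption chosen for illustration; the actual FCS equations and parameters of the FBF network are not given in the abstract, and the names `A`, `B` and `shunting_steady_state` are hypothetical.

```python
from typing import List, Sequence


def shunting_steady_state(
    inputs: Sequence[float], A: float = 1.0, B: float = 1.0
) -> List[float]:
    """Steady-state activities of a feedforward shunting network.

    A is the passive decay rate and B the upper bound on activity; each cell
    is excited by its own input and inhibited by all the other inputs.
    """
    total = sum(inputs)
    return [B * value / (A + total) for value in inputs]


# The same reflectance pattern under dim and bright illumination.
pattern = [1.0, 2.0, 1.0]
dim = shunting_steady_state([10.0 * v for v in pattern])
bright = shunting_steady_state([1000.0 * v for v in pattern])
```

Although the bright input is 100 times stronger, both outputs stay below `B` and keep the same relative pattern, so the activities track reflectance ratios rather than absolute illumination.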


Can my immediate physical environment affect how I feel? The instinctive answer to this question must be a resounding yes. What might seem a throwaway remark is increasingly borne out by research in environmental and behavioural psychology, and in the more recent discipline of Evidence-Based Design. Research outcomes are beginning to converge with findings in neuroscience and neurophysiology, as we discover more about how the human brain and body function and react to environmental stimuli. What we see, hear, touch, and sense affects each of us psychologically and, by extension, physically, on a continual basis. The physical characteristics of our daily environment thus have the capacity to profoundly affect all aspects of our functioning, from biological systems to cognitive ability. This has long been understood on an intuitive basis, and utilised on a more conscious basis by architects and other designers. Recent research in evidence-based design, coupled with advances in neurophysiology, confirms what were previously held to be commonplaces, but also illuminates an almost frightening potential to do enormous good or, alternatively, terrible harm, by virtue of how we make our everyday surroundings. The thesis adopts a design methodology in its approach to exploring the potential use of wireless sensor networks in environments for elderly people. Vitruvian principles of commodity, firmness and delight inform the research process and become embedded in the final design proposals and research conclusions. The issue of person-environment fit becomes a key principle in describing a model of continuously evolving responsive architecture which makes the individual user its focus, with the intention of promoting wellbeing. The key research questions are: What are the key system characteristics of an adaptive therapeutic single-room environment?
How can embedded technologies be utilised to maximise the adaptive and therapeutic aspects of the personal life-space of an elderly person with dementia?