696 resultados para feeder reconfiguration
Resumo:
Systems relying on fixed hardware components with a static level of parallelism can suffer from an underuse of logical resources, since they have to be designed for the worst-case scenario. This problem is especially important in video applications due to the emergence of new flexible standards, like Scalable Video Coding (SVC), which offer several levels of scalability. In this paper, Dynamic and Partial Reconfiguration (DPR) of modern FPGAs is used to achieve run-time variable parallelism, by using scalable architectures where the size can be adapted at run-time. Based on this proposal, a scalable Deblocking Filter core (DF), compliant with the H.264/AVC and SVC standards has been designed. This scalable DF allows run-time addition or removal of computational units working in parallel. Scalability is offered together with a scalable parallelization strategy at the macroblock (MB) level, such that when the size of the architecture changes, MB filtering order is modified accordingly
Resumo:
SRAM-based FPGAs are sensitive to radiation effects. Soft errors can appear and accumulate, potentially defeating mitigation strategies deployed at the Application Layer. Therefore, Configuration Memory scrubbing is required to improve radiation tolerance of such FPGAs in space applications. Virtex FPGAs allow runtime scrubbing by means of dynamic partial reconfiguration. Even with scrubbing, intra-FPGA TMR systems are subjected to common-mode errors affecting more than one design domain. This is solved in inter-FPGA TMR systems at the expense of a higher cost, power and mass. In this context, a self-reference scrubber for device-level TMR system based on Xilinx Virtex FPGAs is presented. This scrubber allows for a fast SEU/MBU detection and correction by peer frame comparison without needing to access a golden configuration memory
Resumo:
Las FPAA´s son dispositivos analógicos programables. Estos dispositivos se basan en el uso de condensadores conmutados junto con amplificadores operacionales. Este tipo de tecnología presenta una serie de ventajas, ya que combinan las ventajas de dispositivos digitales, como la reprogramación en función de las variables del entorno que los rodean, con la diferencia de ser dispositivos analógicos, permitiendo la realización de una amplia gama de diseños analógicos en un solo chip. En este proyecto se ha realizado un estudio sobre el funcionamiento de los condensadores conmutados y su uso en el dispositivo AN221E04 del fabricante Anadigm. Una vez descrita la arquitectura del AN221E04 y explicadas las bases del funcionamiento de los condensadores conmutados, utilizando como ejemplo los modelos facilitados por Anadigm, se desarrolla un modelo de amplificador de instrumentación teórico y se describe la metodología para su implementación en un AN221E04 con el software Anadigm Designer 2. Una vez implementado este modelo de amplificador de instrumentación se han efectuado una serie de pruebas con el objetivo de estudiar la capacidad de estos dispositivos. Dichas pruebas ponen de manifiesto que las FPAA´s tienen una serie de ventajas a tener en cuenta a la hora de realizar diseños analógicos. La precisión obtenida por el modelo de amplificador de instrumentación realizado es más que aceptable, llegando a obtener errores de ganancia inferiores al 1% con ganancias de 200V/V sin tener la necesidad de realizar grandes ajustes. En las conclusiones de este estudio se exponen tanto ventajas como inconvenientes de la utilización de FPAA´s en diseños analógicos. La principal ventaja de este uso es el ahorro de costes, ya que una vez desarrollada una plataforma de diseño, la capacidad de reconfiguración permite utilizar dicha plataforma para un amplio abanico de aplicaciones, reduciendo el número de componentes y simplificando las etapas de diseño. Como desventaja, las FPAA´s tienen una serie de limitaciones qué hay que tener en cuenta en ciertos casos pudiendo hacer irrealizable un diseño concreto; como puede ser el valor máximo o mínimo de ganancia. The FPAA's are programmable analog devices. These devices rely on the use of switched capacitors together with operational amplifiers. This type of technology has a number of advantages, because they combine the advantages of digital devices such as the reprogramming function of the variables of the surrounding environment, with the difference being analog devices, allowing the realization of a wide range of designs analog on a single chip. This project has conducted a study on the operation of the switched capacitor and its use in the device AN221E04 from Anadigm. Having described the architecture of AN221E04 and explained the basis for the operation of the switched capacitor, using the example models provided by Anadigm is developing an instrumentation amplifier theory model and describes the methodology for implementation in a AN221E04 with the Anadigm Designer 2 software. Once implemented this instrumentation amplifier model, have made a series of tests in order to study the ability of these devices. These tests show that the FPAA's have a number of advantages to take into account when making analog designs. The accuracy obtained by the instrumentation amplifier model is made more than acceptable, earning gain errors of less than 1% with gains of 200V / V without the need for major adjustments. The conclusions of this study are presented both advantages and disadvantages of using FPAA's in analog designs. The main advantage of this application is the cost savings, because once developed a platform for design, reconfiguration capability allows you to use this platform for a wide range of applications, reducing component count and simplifying design stages. As a disadvantage, the FPAA's have a number of limitations which must be taken into account in certain cases may make impossible a specific design, such as the maximum or minimum gain, or the magnitude of the possible settings.
Resumo:
Single core capabilities have reached their maximum clock speed; new multicore architectures provide an alternative way to tackle this issue instead. The design of decoding applications running on top of these multicore platforms and their optimization to exploit all system computational power is crucial to obtain best results. Since the development at the integration level of printed circuit boards are increasingly difficult to optimize due to physical constraints and the inherent increase in power consumption, development of multiprocessor architectures is becoming the new Holy Grail. In this sense, it is crucial to develop applications that can run on the new multi-core architectures and find out distributions to maximize the potential use of the system. Today most of commercial electronic devices, available in the market, are composed of embedded systems. These devices incorporate recently multi-core processors. Task management onto multiple core/processors is not a trivial issue, and a good task/actor scheduling can yield to significant improvements in terms of efficiency gains and also processor power consumption. Scheduling of data flows between the actors that implement the applications aims to harness multi-core architectures to more types of applications, with an explicit expression of parallelism into the application. On the other hand, the recent development of the MPEG Reconfigurable Video Coding (RVC) standard allows the reconfiguration of the video decoders. RVC is a flexible standard compatible with MPEG developed codecs, making it the ideal tool to integrate into the new multimedia terminals to decode video sequences. With the new versions of the Open RVC-CAL Compiler (Orcc), a static mapping of the actors that implement the functionality of the application can be done once the application executable has been generated. This static mapping must be done for each of the different cores available on the working platform. It has been chosen an embedded system with a processor with two ARMv7 cores. This platform allows us to obtain the desired tests, get as much improvement results from the execution on a single core, and contrast both with a PC-based multiprocessor system. Las posibilidades ofrecidas por el aumento de la velocidad de la frecuencia de reloj de sistemas de un solo procesador están siendo agotadas. Las nuevas arquitecturas multiprocesador proporcionan una vía de desarrollo alternativa en este sentido. El diseño y optimización de aplicaciones de descodificación de video que se ejecuten sobre las nuevas arquitecturas permiten un mejor aprovechamiento y favorecen la obtención de mayores rendimientos. Hoy en día muchos de los dispositivos comerciales que se están lanzando al mercado están integrados por sistemas embebidos, que recientemente están basados en arquitecturas multinúcleo. El manejo de las tareas de ejecución sobre este tipo de arquitecturas no es una tarea trivial, y una buena planificación de los actores que implementan las funcionalidades puede proporcionar importantes mejoras en términos de eficiencia en el uso de la capacidad de los procesadores y, por ende, del consumo de energía. Por otro lado, el reciente desarrollo del estándar de Codificación de Video Reconfigurable (RVC), permite la reconfiguración de los descodificadores de video. RVC es un estándar flexible y compatible con anteriores codecs desarrollados por MPEG. Esto hace de RVC el estándar ideal para ser incorporado en los nuevos terminales multimedia que se están comercializando. Con el desarrollo de las nuevas versiones del compilador específico para el desarrollo de lenguaje RVC-CAL (Orcc), en el que se basa MPEG RVC, el mapeo estático, para entornos basados en multiprocesador, de los actores que integran un descodificador es posible. Se ha elegido un sistema embebido con un procesador con dos núcleos ARMv7. Esta plataforma nos permitirá llevar a cabo las pruebas de verificación y contraste de los conceptos estudiados en este trabajo, en el sentido del desarrollo de descodificadores de video basados en MPEG RVC y del estudio de la planificación y mapeo estático de los mismos.
Resumo:
Los arrays de ranuras son sistemas de antennas conocidos desde los años 40, principalmente destinados a formar parte de sistemas rádar de navíos de combate y grandes estaciones terrenas donde el tamaño y el peso no eran altamente restrictivos. Con el paso de los años y debido sobre todo a importantes avances en materiales y métodos de fabricación, el rango de aplicaciones de este tipo de sistemas radiantes creció en gran medida. Desde nuevas tecnologías biomédicas, sistemas anticolisión en automóviles y navegación en aviones, enlaces de comunicaciones de alta tasa binaria y corta distancia e incluso sistemas embarcados en satélites para la transmisión de señal de televisión. Dentro de esta familia de antennas, existen dos grupos que destacan por ser los más utilizados: las antennas de placas paralelas con las ranuras distribuidas de forma circular o espiral y las agrupaciones de arrays lineales construidos sobre guia de onda. Continuando con las tareas de investigación desarrolladas durante los últimos años en el Instituto de Tecnología de Tokyo y en el Grupo de Radiación de la Universidad Politécnica de Madrid, la totalidad de esta tesis se centra en este último grupo, aunque como se verá se separa en gran medida de las técnicas de diseño y metodologías convencionales. Los arrays de ranuras rectas y paralelas al eje de la guía rectangular que las alimenta son, sin ninguna duda, los modelos más empleados debido a la fiabilidad que presentan a altas frecuencias, su capacidad para gestionar grandes cantidades de potencia y la sencillez de su diseño y fabricación. Sin embargo, también presentan desventajas como estrecho ancho de banda en pérdidas de retorno y rápida degradación del diagrama de radiación con la frecuencia. Éstas son debidas a la naturaleza resonante de sus elementos radiantes: al perder la resonancia, el sistema global se desajusta y sus prestaciones degeneran. En arrays bidimensionales de slots rectos, el campo eléctrico queda polarizado sobre el plano transversal a las ranuras, correspondiéndose con el plano de altos lóbulos secundarios. Esta tesis tiene como objetivo el desarrollo de un método sistemático de diseño de arrays de ranuras inclinadas y desplazadas del centro (en lo sucesivo “ranuras compuestas”), definido en 1971 como uno de los desafíos a superar dentro del mundo del diseño de antennas. La técnica empleada se basa en el Método de los Momentos, la Teoría de Circuitos y la Teoría de Conexión Aleatoria de Matrices de Dispersión. Al tratarse de un método circuital, la primera parte de la tesis se corresponde con el estudio de la aplicabilidad de las redes equivalentes fundamentales, su capacidad para recrear fenómenos físicos de la ranura, las limitaciones y ventajas que presentan para caracterizar las diferentes configuraciones de slot compuesto. Se profundiza en las diferencias entre las redes en T y en ! y se condiciona la selección de una u otra dependiendo del tipo de elemento radiante. Una vez seleccionado el tipo de red a emplear en el diseño del sistema, se ha desarrollado un algoritmo de cascadeo progresivo desde el puerto alimentador hacia el cortocircuito que termina el modelo. Este algoritmo es independiente del número de elementos, la frecuencia central de funcionamiento, del ángulo de inclinación de las ranuras y de la red equivalente seleccionada (en T o en !). Se basa en definir el diseño del array como un Problema de Satisfacción de Condiciones (en inglés, Constraint Satisfaction Problem) que se resuelve por un método de Búsqueda en Retroceso (Backtracking algorithm). Como resultado devuelve un circuito equivalente del array completo adaptado a su entrada y cuyos elementos consumen una potencia acorde a una distribución de amplitud dada para el array. En toda agrupación de antennas, el acoplo mutuo entre elementos a través del campo radiado representa uno de los principales problemas para el ingeniero y sus efectos perjudican a las prestaciones globales del sistema, tanto en adaptación como en capacidad de radiación. El empleo de circuito equivalente se descartó por la dificultad que suponía la caracterización de estos efectos y su inclusión en la etapa de diseño. En esta tesis doctoral el acoplo también se ha modelado como una red equivalente cuyos elementos son transformadores ideales y admitancias, conectada al conjunto de redes equivalentes que representa el array. Al comparar los resultados estimados en términos de pérdidas de retorno y radiación con aquellos obtenidos a partir de programas comerciales populares como CST Microwave Studio se confirma la validez del método aquí propuesto, el primer método de diseño sistemático de arrays de ranuras compuestos alimentados por guía de onda rectangular. Al tratarse de ranuras no resonantes, el ancho de banda en pérdidas de retorno es mucho mas amplio que el que presentan arrays de slots rectos. Para arrays bidimensionales, el ángulo de inclinación puede ajustarse de manera que el campo quede polarizado en los planos de bajos lóbulos secundarios. Además de simulaciones se han diseñado, construido y medido dos prototipos centrados en la frecuencia de 12GHz, de seis y diez elementos. Las medidas de pérdidas de retorno y diagrama de radiación revelan excelentes resultados, certificando la bondad del método genuino Method of Moments - Forward Matching Procedure desarrollado a lo largo de esta tésis. Abstract The slot antenna arrays are well known systems from the decade of 40s, mainly intended to be part of radar systems of large warships and terrestrial stations where size and weight were not highly restrictive. Over the years, mainly due to significant advances in materials and manufacturing methods, the range of applications of this type of radiating systems grew significantly. From new biomedical technologies, collision avoidance systems in cars and aircraft navigation, short communication links with high bit transfer rate and even embedded systems in satellites for television broadcast. Within this family of antennas, two groups stand out as being the most frequent in the literature: parallel plate antennas with slots placed in a circular or spiral distribution and clusters of waveguide linear arrays. To continue the vast research work carried out during the last decades in the Tokyo Institute of Technology and in the Radiation Group at the Universidad Politécnica de Madrid, this thesis focuses on the latter group, although it represents a technique that drastically breaks with traditional design methodologies. The arrays of slots straight and parallel to the axis of the feeding rectangular waveguide are without a doubt the most used models because of the reliability that they present at high frequencies, its ability to handle large amounts of power and their simplicity of design and manufacturing. However, there also exist disadvantages as narrow bandwidth in return loss and rapid degradation of the radiation pattern with frequency. These are due to the resonant nature of radiating elements: away from the resonance status, the overall system performance and radiation pattern diminish. For two-dimensional arrays of straight slots, the electric field is polarized transverse to the radiators, corresponding to the plane of high side-lobe level. This thesis aims to develop a systematic method of designing arrays of angled and displaced slots (hereinafter "compound slots"), defined in 1971 as one of the challenges to overcome in the world of antenna design. The used technique is based on the Method of Moments, Circuit Theory and the Theory of Scattering Matrices Connection. Being a circuitry-based method, the first part of this dissertation corresponds to the study of the applicability of the basic equivalent networks, their ability to recreate the slot physical phenomena, their limitations and advantages presented to characterize different compound slot configurations. It delves into the differences of T and ! and determines the selection of the most suitable one depending on the type of radiating element. Once the type of network to be used in the system design is selected, a progressive algorithm called Forward Matching Procedure has been developed to connect the proper equivalent networks from the feeder port to shorted ending. This algorithm is independent of the number of elements, the central operating frequency, the angle of inclination of the slots and selected equivalent network (T or ! networks). It is based on the definition of the array design as a Constraint Satisfaction Problem, solved by means of a Backtracking Algorithm. As a result, the method returns an equivalent circuit of the whole array which is matched at its input port and whose elements consume a power according to a given amplitude distribution for the array. In any group of antennas, the mutual coupling between elements through the radiated field represents one of the biggest problems that the engineer faces and its effects are detrimental to the overall performance of the system, both in radiation capabilities and return loss. The employment of an equivalent circuit for the array design was discarded by some authors because of the difficulty involved in the characterization of the coupling effects and their inclusion in the design stage. In this thesis the coupling has also been modeled as an equivalent network whose elements are ideal transformers and admittances connected to the set of equivalent networks that represent the antennas of the array. By comparing the estimated results in terms of return loss and radiation with those obtained from popular commercial software as CST Microwave Studio, the validity of the proposed method is fully confirmed, representing the first method of systematic design of compound-slot arrays fed by rectangular waveguide. Since these slots do not work under the resonant status, the bandwidth in return loss is much wider than the longitudinal-slot arrays. For the case of two-dimensional arrays, the angle of inclination can be adjusted so that the field is polarized at the low side-lobe level plane. Besides the performed full-wave simulations two prototypes of six and ten elements for the X-band have been designed, built and measured, revealing excellent results and agreement with the expected results. These facts certify that the genuine technique Method of Moments - Matching Forward Procedure developed along this thesis is valid and trustable.
Resumo:
In order to achieve total selectivity at electrical distribution networks it is of great importance to analyze the defect currents at ungrounded power systems. This information will help to grant selectivity at electrical distribution networks ensuring that only the defect line or feeder is removed from service. In the present work a new selective and directional protection method for ungrounded power systems is evaluated. The new method measures only defect currents to detect earth faults and works with a directional criterion to determine the line under faulty conditions. The main contribution of this new technique is that it can detect earth faults in outgoing lines at any type of substation avoiding the possible mismatch of traditional directional earth fault relays. This detection technique is based on the comparison of the direction of a reference current to the direction of all earth fault capacitive currents at all the feeders connected to the same bus bars. This new method has been validated through computer simulations. The results for the different cases studied are remarkable, proving total validity and usefulness of the new method.
Resumo:
In this work, the power management techniques implemented in a high-performance node for Wireless Sensor Networks (WSN) based on a RAM-based FPGA are presented. This new node custom architecture is intended for high-end WSN applications that include complex sensor management like video cameras, high compute demanding tasks such as image encoding or robust encryption, and/or higher data bandwidth needs. In the case of these complex processing tasks, yet maintaining low power design requirements, it can be shown that the combination of different techniques such as extensive HW algorithm mapping, smart management of power islands to selectively switch on and off components, smart and low-energy partial reconfiguration, an adequate set of save energy modes and wake up options, all combined, may yield energy results that may compete and improve energy usage of typical low power microcontrollers used in many WSN node architectures. Actually, results show that higher complexity tasks are in favor of HW based platforms, while the flexibility achieved by dynamic and partial reconfiguration techniques could be comparable to SW based solutions.
Resumo:
La optimización de parámetros tales como el consumo de potencia, la cantidad de recursos lógicos empleados o la ocupación de memoria ha sido siempre una de las preocupaciones principales a la hora de diseñar sistemas embebidos. Esto es debido a que se trata de sistemas dotados de una cantidad de recursos limitados, y que han sido tradicionalmente empleados para un propósito específico, que permanece invariable a lo largo de toda la vida útil del sistema. Sin embargo, el uso de sistemas embebidos se ha extendido a áreas de aplicación fuera de su ámbito tradicional, caracterizadas por una mayor demanda computacional. Así, por ejemplo, algunos de estos sistemas deben llevar a cabo un intenso procesado de señales multimedia o la transmisión de datos mediante sistemas de comunicaciones de alta capacidad. Por otra parte, las condiciones de operación del sistema pueden variar en tiempo real. Esto sucede, por ejemplo, si su funcionamiento depende de datos medidos por el propio sistema o recibidos a través de la red, de las demandas del usuario en cada momento, o de condiciones internas del propio dispositivo, tales como la duración de la batería. Como consecuencia de la existencia de requisitos de operación dinámicos es necesario ir hacia una gestión dinámica de los recursos del sistema. Si bien el software es inherentemente flexible, no ofrece una potencia computacional tan alta como el hardware. Por lo tanto, el hardware reconfigurable aparece como una solución adecuada para tratar con mayor flexibilidad los requisitos variables dinámicamente en sistemas con alta demanda computacional. La flexibilidad y adaptabilidad del hardware requieren de dispositivos reconfigurables que permitan la modificación de su funcionalidad bajo demanda. En esta tesis se han seleccionado las FPGAs (Field Programmable Gate Arrays) como los dispositivos más apropiados, hoy en día, para implementar sistemas basados en hardware reconfigurable De entre todas las posibilidades existentes para explotar la capacidad de reconfiguración de las FPGAs comerciales, se ha seleccionado la reconfiguración dinámica y parcial. Esta técnica consiste en substituir una parte de la lógica del dispositivo, mientras el resto continúa en funcionamiento. La capacidad de reconfiguración dinámica y parcial de las FPGAs es empleada en esta tesis para tratar con los requisitos de flexibilidad y de capacidad computacional que demandan los dispositivos embebidos. La propuesta principal de esta tesis doctoral es el uso de arquitecturas de procesamiento escalables espacialmente, que son capaces de adaptar su funcionalidad y rendimiento en tiempo real, estableciendo un compromiso entre dichos parámetros y la cantidad de lógica que ocupan en el dispositivo. A esto nos referimos con arquitecturas con huellas escalables. En particular, se propone el uso de arquitecturas altamente paralelas, modulares, regulares y con una alta localidad en sus comunicaciones, para este propósito. El tamaño de dichas arquitecturas puede ser modificado mediante la adición o eliminación de algunos de los módulos que las componen, tanto en una dimensión como en dos. Esta estrategia permite implementar soluciones escalables, sin tener que contar con una versión de las mismas para cada uno de los tamaños posibles de la arquitectura. De esta manera se reduce significativamente el tiempo necesario para modificar su tamaño, así como la cantidad de memoria necesaria para almacenar todos los archivos de configuración. En lugar de proponer arquitecturas para aplicaciones específicas, se ha optado por patrones de procesamiento genéricos, que pueden ser ajustados para solucionar distintos problemas en el estado del arte. A este respecto, se proponen patrones basados en esquemas sistólicos, así como de tipo wavefront. Con el objeto de poder ofrecer una solución integral, se han tratado otros aspectos relacionados con el diseño y el funcionamiento de las arquitecturas, tales como el control del proceso de reconfiguración de la FPGA, la integración de las arquitecturas en el resto del sistema, así como las técnicas necesarias para su implementación. Por lo que respecta a la implementación, se han tratado distintos aspectos de bajo nivel dependientes del dispositivo. Algunas de las propuestas realizadas a este respecto en la presente tesis doctoral son un router que es capaz de garantizar el correcto rutado de los módulos reconfigurables dentro del área destinada para ellos, así como una estrategia para la comunicación entre módulos que no introduce ningún retardo ni necesita emplear recursos configurables del dispositivo. El flujo de diseño propuesto se ha automatizado mediante una herramienta denominada DREAMS. La herramienta se encarga de la modificación de las netlists correspondientes a cada uno de los módulos reconfigurables del sistema, y que han sido generadas previamente mediante herramientas comerciales. Por lo tanto, el flujo propuesto se entiende como una etapa de post-procesamiento, que adapta esas netlists a los requisitos de la reconfiguración dinámica y parcial. Dicha modificación la lleva a cabo la herramienta de una forma completamente automática, por lo que la productividad del proceso de diseño aumenta de forma evidente. Para facilitar dicho proceso, se ha dotado a la herramienta de una interfaz gráfica. El flujo de diseño propuesto, y la herramienta que lo soporta, tienen características específicas para abordar el diseño de las arquitecturas dinámicamente escalables propuestas en esta tesis. Entre ellas está el soporte para el realojamiento de módulos reconfigurables en posiciones del dispositivo distintas a donde el módulo es originalmente implementado, así como la generación de estructuras de comunicación compatibles con la simetría de la arquitectura. El router has sido empleado también en esta tesis para obtener un rutado simétrico entre nets equivalentes. Dicha posibilidad ha sido explotada para aumentar la protección de circuitos con altos requisitos de seguridad, frente a ataques de canal lateral, mediante la implantación de lógica complementaria con rutado idéntico. Para controlar el proceso de reconfiguración de la FPGA, se propone en esta tesis un motor de reconfiguración especialmente adaptado a los requisitos de las arquitecturas dinámicamente escalables. Además de controlar el puerto de reconfiguración, el motor de reconfiguración ha sido dotado de la capacidad de realojar módulos reconfigurables en posiciones arbitrarias del dispositivo, en tiempo real. De esta forma, basta con generar un único bitstream por cada módulo reconfigurable del sistema, independientemente de la posición donde va a ser finalmente reconfigurado. La estrategia seguida para implementar el proceso de realojamiento de módulos es diferente de las propuestas existentes en el estado del arte, pues consiste en la composición de los archivos de configuración en tiempo real. De esta forma se consigue aumentar la velocidad del proceso, mientras que se reduce la longitud de los archivos de configuración parciales a almacenar en el sistema. El motor de reconfiguración soporta módulos reconfigurables con una altura menor que la altura de una región de reloj del dispositivo. Internamente, el motor se encarga de la combinación de los frames que describen el nuevo módulo, con la configuración existente en el dispositivo previamente. El escalado de las arquitecturas de procesamiento propuestas en esta tesis también se puede beneficiar de este mecanismo. Se ha incorporado también un acceso directo a una memoria externa donde se pueden almacenar bitstreams parciales. Para acelerar el proceso de reconfiguración se ha hecho funcionar el ICAP por encima de la máxima frecuencia de reloj aconsejada por el fabricante. Así, en el caso de Virtex-5, aunque la máxima frecuencia del reloj deberían ser 100 MHz, se ha conseguido hacer funcionar el puerto de reconfiguración a frecuencias de operación de hasta 250 MHz, incluyendo el proceso de realojamiento en tiempo real. Se ha previsto la posibilidad de portar el motor de reconfiguración a futuras familias de FPGAs. Por otro lado, el motor de reconfiguración se puede emplear para inyectar fallos en el propio dispositivo hardware, y así ser capaces de evaluar la tolerancia ante los mismos que ofrecen las arquitecturas reconfigurables. Los fallos son emulados mediante la generación de archivos de configuración a los que intencionadamente se les ha introducido un error, de forma que se modifica su funcionalidad. Con el objetivo de comprobar la validez y los beneficios de las arquitecturas propuestas en esta tesis, se han seguido dos líneas principales de aplicación. En primer lugar, se propone su uso como parte de una plataforma adaptativa basada en hardware evolutivo, con capacidad de escalabilidad, adaptabilidad y recuperación ante fallos. En segundo lugar, se ha desarrollado un deblocking filter escalable, adaptado a la codificación de vídeo escalable, como ejemplo de aplicación de las arquitecturas de tipo wavefront propuestas. El hardware evolutivo consiste en el uso de algoritmos evolutivos para diseñar hardware de forma autónoma, explotando la flexibilidad que ofrecen los dispositivos reconfigurables. En este caso, los elementos de procesamiento que componen la arquitectura son seleccionados de una biblioteca de elementos presintetizados, de acuerdo con las decisiones tomadas por el algoritmo evolutivo, en lugar de definir la configuración de las mismas en tiempo de diseño. De esta manera, la configuración del core puede cambiar cuando lo hacen las condiciones del entorno, en tiempo real, por lo que se consigue un control autónomo del proceso de reconfiguración dinámico. Así, el sistema es capaz de optimizar, de forma autónoma, su propia configuración. El hardware evolutivo tiene una capacidad inherente de auto-reparación. Se ha probado que las arquitecturas evolutivas propuestas en esta tesis son tolerantes ante fallos, tanto transitorios, como permanentes y acumulativos. La plataforma evolutiva se ha empleado para implementar filtros de eliminación de ruido. La escalabilidad también ha sido aprovechada en esta aplicación. Las arquitecturas evolutivas escalables permiten la adaptación autónoma de los cores de procesamiento ante fluctuaciones en la cantidad de recursos disponibles en el sistema. Por lo tanto, constituyen un ejemplo de escalabilidad dinámica para conseguir un determinado nivel de calidad, que puede variar en tiempo real. Se han propuesto dos variantes de sistemas escalables evolutivos. El primero consiste en un único core de procesamiento evolutivo, mientras que el segundo está formado por un número variable de arrays de procesamiento. La codificación de vídeo escalable, a diferencia de los codecs no escalables, permite la decodificación de secuencias de vídeo con diferentes niveles de calidad, de resolución temporal o de resolución espacial, descartando la información no deseada. Existen distintos algoritmos que soportan esta característica. En particular, se va a emplear el estándar Scalable Video Coding (SVC), que ha sido propuesto como una extensión de H.264/AVC, ya que este último es ampliamente utilizado tanto en la industria, como a nivel de investigación. Para poder explotar toda la flexibilidad que ofrece el estándar, hay que permitir la adaptación de las características del decodificador en tiempo real. El uso de las arquitecturas dinámicamente escalables es propuesto en esta tesis con este objetivo. El deblocking filter es un algoritmo que tiene como objetivo la mejora de la percepción visual de la imagen reconstruida, mediante el suavizado de los "artefactos" de bloque generados en el lazo del codificador. Se trata de una de las tareas más intensivas en procesamiento de datos de H.264/AVC y de SVC, y además, su carga computacional es altamente dependiente del nivel de escalabilidad seleccionado en el decodificador. Por lo tanto, el deblocking filter ha sido seleccionado como prueba de concepto de la aplicación de las arquitecturas dinámicamente escalables para la compresión de video. La arquitectura propuesta permite añadir o eliminar unidades de computación, siguiendo un esquema de tipo wavefront. La arquitectura ha sido propuesta conjuntamente con un esquema de procesamiento en paralelo del deblocking filter a nivel de macrobloque, de tal forma que cuando se varía del tamaño de la arquitectura, el orden de filtrado de los macrobloques varia de la misma manera. El patrón propuesto se basa en la división del procesamiento de cada macrobloque en dos etapas independientes, que se corresponden con el filtrado horizontal y vertical de los bloques dentro del macrobloque. Las principales contribuciones originales de esta tesis son las siguientes: - El uso de arquitecturas altamente regulares, modulares, paralelas y con una intensa localidad en sus comunicaciones, para implementar cores de procesamiento dinámicamente reconfigurables. - El uso de arquitecturas bidimensionales, en forma de malla, para construir arquitecturas dinámicamente escalables, con una huella escalable. De esta forma, las arquitecturas permiten establecer un compromiso entre el área que ocupan en el dispositivo, y las prestaciones que ofrecen en cada momento. Se proponen plantillas de procesamiento genéricas, de tipo sistólico o wavefront, que pueden ser adaptadas a distintos problemas de procesamiento. - Un flujo de diseño y una herramienta que lo soporta, para el diseño de sistemas reconfigurables dinámicamente, centradas en el diseño de las arquitecturas altamente paralelas, modulares y regulares propuestas en esta tesis. - Un esquema de comunicaciones entre módulos reconfigurables que no introduce ningún retardo ni requiere el uso de recursos lógicos propios. - Un router flexible, capaz de resolver los conflictos de rutado asociados con el diseño de sistemas reconfigurables dinámicamente. - Un algoritmo de optimización para sistemas formados por múltiples cores escalables que optimice, mediante un algoritmo genético, los parámetros de dicho sistema. Se basa en un modelo conocido como el problema de la mochila. - Un motor de reconfiguración adaptado a los requisitos de las arquitecturas altamente regulares y modulares. Combina una alta velocidad de reconfiguración, con la capacidad de realojar módulos en tiempo real, incluyendo el soporte para la reconfiguración de regiones que ocupan menos que una región de reloj, así como la réplica de un módulo reconfigurable en múltiples posiciones del dispositivo. - Un mecanismo de inyección de fallos que, empleando el motor de reconfiguración del sistema, permite evaluar los efectos de fallos permanentes y transitorios en arquitecturas reconfigurables. - La demostración de las posibilidades de las arquitecturas propuestas en esta tesis para la implementación de sistemas de hardware evolutivos, con una alta capacidad de procesamiento de datos. - La implementación de sistemas de hardware evolutivo escalables, que son capaces de tratar con la fluctuación de la cantidad de recursos disponibles en el sistema, de una forma autónoma. - Una estrategia de procesamiento en paralelo para el deblocking filter compatible con los estándares H.264/AVC y SVC que reduce el número de ciclos de macrobloque necesarios para procesar un frame de video. - Una arquitectura dinámicamente escalable que permite la implementación de un nuevo deblocking filter, totalmente compatible con los estándares H.264/AVC y SVC, que explota el paralelismo a nivel de macrobloque. El presente documento se organiza en siete capítulos. En el primero se ofrece una introducción al marco tecnológico de esta tesis, especialmente centrado en la reconfiguración dinámica y parcial de FPGAs. También se motiva la necesidad de las arquitecturas dinámicamente escalables propuestas en esta tesis. En el capítulo 2 se describen las arquitecturas dinámicamente escalables. Dicha descripción incluye la mayor parte de las aportaciones a nivel arquitectural realizadas en esta tesis. Por su parte, el flujo de diseño adaptado a dichas arquitecturas se propone en el capítulo 3. El motor de reconfiguración se propone en el 4, mientras que el uso de dichas arquitecturas para implementar sistemas de hardware evolutivo se aborda en el 5. El deblocking filter escalable se describe en el 6, mientras que las conclusiones finales de esta tesis, así como la descripción del trabajo futuro, son abordadas en el capítulo 7. ABSTRACT The optimization of system parameters, such as power dissipation, the amount of hardware resources and the memory footprint, has been always a main concern when dealing with the design of resource-constrained embedded systems. This situation is even more demanding nowadays. Embedded systems cannot anymore be considered only as specific-purpose computers, designed for a particular functionality that remains unchanged during their lifetime. Differently, embedded systems are now required to deal with more demanding and complex functions, such as multimedia data processing and high-throughput connectivity. In addition, system operation may depend on external data, the user requirements or internal variables of the system, such as the battery life-time. All these conditions may vary at run-time, leading to adaptive scenarios. As a consequence of both the growing computational complexity and the existence of dynamic requirements, dynamic resource management techniques for embedded systems are needed. Software is inherently flexible, but it cannot meet the computing power offered by hardware solutions. Therefore, reconfigurable hardware emerges as a suitable technology to deal with the run-time variable requirements of complex embedded systems. Adaptive hardware requires the use of reconfigurable devices, where its functionality can be modified on demand. In this thesis, Field Programmable Gate Arrays (FPGAs) have been selected as the most appropriate commercial technology existing nowadays to implement adaptive hardware systems. There are different ways of exploiting reconfigurability in reconfigurable devices. Among them is dynamic and partial reconfiguration. This is a technique which consists in substituting part of the FPGA logic on demand, while the rest of the device continues working. The strategy followed in this thesis is to exploit the dynamic and partial reconfiguration of commercial FPGAs to deal with the flexibility and complexity demands of state-of-the-art embedded systems. The proposal of this thesis to deal with run-time variable system conditions is the use of spatially scalable processing hardware IP cores, which are able to adapt their functionality or performance at run-time, trading them off with the amount of logic resources they occupy in the device. This is referred to as a scalable footprint in the context of this thesis. The distinguishing characteristic of the proposed cores is that they rely on highly parallel, modular and regular architectures, arranged in one or two dimensions. These architectures can be scaled by means of the addition or removal of the composing blocks. This strategy avoids implementing a full version of the core for each possible size, with the corresponding benefits in terms of scaling and adaptation time, as well as bitstream storage memory requirements. Instead of providing specific-purpose architectures, generic architectural templates, which can be tuned to solve different problems, are proposed in this thesis. Architectures following both systolic and wavefront templates have been selected. Together with the proposed scalable architectural templates, other issues needed to ensure the proper design and operation of the scalable cores, such as the device reconfiguration control, the run-time management of the architecture and the implementation techniques have been also addressed in this thesis. With regard to the implementation of dynamically reconfigurable architectures, device dependent low-level details are addressed. Some of the aspects covered in this thesis are the area constrained routing for reconfigurable modules, or an inter-module communication strategy which does not introduce either extra delay or logic overhead. The system implementation, from the hardware description to the device configuration bitstream, has been fully automated by modifying the netlists corresponding to each of the system modules, which are previously generated using the vendor tools. This modification is therefore envisaged as a post-processing step. Based on these implementation proposals, a design tool called DREAMS (Dynamically Reconfigurable Embedded and Modular Systems) has been created, including a graphic user interface. The tool has specific features to cope with modular and regular architectures, including the support for module relocation and the inter-module communications scheme based on the symmetry of the architecture. The core of the tool is a custom router, which has been also exploited in this thesis to obtain symmetric routed nets, with the aim of enhancing the protection of critical reconfigurable circuits against side channel attacks. This is achieved by duplicating the logic with an exactly equal routing. In order to control the reconfiguration process of the FPGA, a Reconfiguration Engine suited to the specific requirements set by the proposed architectures was also proposed. Therefore, in addition to controlling the reconfiguration port, the Reconfiguration Engine has been enhanced with the online relocation ability, which allows employing a unique configuration bitstream for all the positions where the module may be placed in the device. Differently to the existing relocating solutions, which are based on bitstream parsers, the proposed approach is based on the online composition of bitstreams. This strategy allows increasing the speed of the process, while the length of partial bitstreams is also reduced. The height of the reconfigurable modules can be lower than the height of a clock region. The Reconfiguration Engine manages the merging process of the new and the existing configuration frames within each clock region. The process of scaling up and down the hardware cores also benefits from this technique. A direct link to an external memory where partial bitstreams can be stored has been also implemented. In order to accelerate the reconfiguration process, the ICAP has been overclocked over the speed reported by the manufacturer. In the case of Virtex-5, even though the maximum frequency of the ICAP is reported to be 100 MHz, valid operations at 250 MHz have been achieved, including the online relocation process. Portability of the reconfiguration solution to today's and probably, future FPGAs, has been also considered. The reconfiguration engine can be also used to inject faults in real hardware devices, and this way being able to evaluate the fault tolerance offered by the reconfigurable architectures. Faults are emulated by introducing partial bitstreams intentionally modified to provide erroneous functionality. To prove the validity and the benefits offered by the proposed architectures, two demonstration application lines have been envisaged. First, scalable architectures have been employed to develop an evolvable hardware platform with adaptability, fault tolerance and scalability properties. Second, they have been used to implement a scalable deblocking filter suited to scalable video coding. Evolvable Hardware is the use of evolutionary algorithms to design hardware in an autonomous way, exploiting the flexibility offered by reconfigurable devices. In this case, processing elements composing the architecture are selected from a presynthesized library of processing elements, according to the decisions taken by the algorithm, instead of being decided at design time. This way, the configuration of the array may change as run-time environmental conditions do, achieving autonomous control of the dynamic reconfiguration process. Thus, the self-optimization property is added to the native self-configurability of the dynamically scalable architectures. In addition, evolvable hardware adaptability inherently offers self-healing features. The proposal has proved to be self-tolerant, since it is able to self-recover from both transient and cumulative permanent faults. The proposed evolvable architecture has been used to implement noise removal image filters. Scalability has been also exploited in this application. Scalable evolvable hardware architectures allow the autonomous adaptation of the processing cores to a fluctuating amount of resources available in the system. Thus, it constitutes an example of the dynamic quality scalability tackled in this thesis. Two variants have been proposed. The first one consists in a single dynamically scalable evolvable core, and the second one contains a variable number of processing cores. Scalable video is a flexible approach for video compression, which offers scalability at different levels. Differently to non-scalable codecs, a scalable video bitstream can be decoded with different levels of quality, spatial or temporal resolutions, by discarding the undesired information. The interest in this technology has been fostered by the development of the Scalable Video Coding (SVC) standard, as an extension of H.264/AVC. In order to exploit all the flexibility offered by the standard, it is necessary to adapt the characteristics of the decoder to the requirements of each client during run-time. The use of dynamically scalable architectures is proposed in this thesis with this aim. The deblocking filter algorithm is the responsible of improving the visual perception of a reconstructed image, by smoothing blocking artifacts generated in the encoding loop. This is one of the most computationally intensive tasks of the standard, and furthermore, it is highly dependent on the selected scalability level in the decoder. Therefore, the deblocking filter has been selected as a proof of concept of the implementation of dynamically scalable architectures for video compression. The proposed architecture allows the run-time addition or removal of computational units working in parallel to change its level of parallelism, following a wavefront computational pattern. Scalable architecture is offered together with a scalable parallelization strategy at the macroblock level, such that when the size of the architecture changes, the macroblock filtering order is modified accordingly. The proposed pattern is based on the division of the macroblock processing into two independent stages, corresponding to the horizontal and vertical filtering of the blocks within the macroblock. The main contributions of this thesis are: - The use of highly parallel, modular, regular and local architectures to implement dynamically reconfigurable processing IP cores, for data intensive applications with flexibility requirements. - The use of two-dimensional mesh-type arrays as architectural templates to build dynamically reconfigurable IP cores, with a scalable footprint. The proposal consists in generic architectural templates, which can be tuned to solve different computational problems. •A design flow and a tool targeting the design of DPR systems, focused on highly parallel, modular and local architectures. - An inter-module communication strategy, which does not introduce delay or area overhead, named Virtual Borders. - A custom and flexible router to solve the routing conflicts as well as the inter-module communication problems, appearing during the design of DPR systems. - An algorithm addressing the optimization of systems composed of multiple scalable cores, which size can be decided individually, to optimize the system parameters. It is based on a model known as the multi-dimensional multi-choice Knapsack problem. - A reconfiguration engine tailored to the requirements of highly regular and modular architectures. It combines a high reconfiguration throughput with run-time module relocation capabilities, including the support for sub-clock reconfigurable regions and the replication in multiple positions. - A fault injection mechanism which takes advantage of the system reconfiguration engine, as well as the modularity of the proposed reconfigurable architectures, to evaluate the effects of transient and permanent faults in these architectures. - The demonstration of the possibilities of the architectures proposed in this thesis to implement evolvable hardware systems, while keeping a high processing throughput. - The implementation of scalable evolvable hardware systems, which are able to adapt to the fluctuation of the amount of resources available in the system, in an autonomous way. - A parallelization strategy for the H.264/AVC and SVC deblocking filter, which reduces the number of macroblock cycles needed to process the whole frame. - A dynamically scalable architecture that permits the implementation of a novel deblocking filter module, fully compliant with the H.264/AVC and SVC standards, which exploits the macroblock level parallelism of the algorithm. This document is organized in seven chapters. In the first one, an introduction to the technology framework of this thesis, specially focused on dynamic and partial reconfiguration, is provided. The need for the dynamically scalable processing architectures proposed in this work is also motivated in this chapter. In chapter 2, dynamically scalable architectures are described. Description includes most of the architectural contributions of this work. The design flow tailored to the scalable architectures, together with the DREAMs tool provided to implement them, are described in chapter 3. The reconfiguration engine is described in chapter 4. The use of the proposed scalable archtieectures to implement evolvable hardware systems is described in chapter 5, while the scalable deblocking filter is described in chapter 6. Final conclusions of this thesis, and the description of future work, are addressed in chapter 7.
Resumo:
The objective of this study was to build up a data set including productive performance and production factors data of growing-finishing (GF) pigs in Spain in order to perform a representative and reliable description of the traits of Spanish growing-finishing pig industry. Data from 764 batches from 452 farms belonging to nine companies (1,157,212 pigs) were collected between 2008 and 2010 through a survey including five parts: general, facilities, feeding, health status and performance. Most studied farms had only GF pigs on their facilities (94.7%), produced ‘industrial’ pigs (86.7%), had entire male and female (59.5%) and Pietrain-sired pigs (70.0%), housed between 13-20 pigs per pen (87.2%), had 50% of slatted floor (70%), single-space dry feeder (54.0%), nipple drinker (88.7%) and automatic ventilation systems (71.2%). A 75.0% of the farms used three feeding phases using mainly pelleted diets (91.0%), 61.3% performed three or more antibiotic treatments and 36.5% obtained water from the public supply. Continuous variables studied had the following average values: number of pigs placed per batch, 1,515 pigs; initial and final body weight, 19.0 and 108 kg; length of GF period, 136 days; culling rate, 1.4%; barn occupation, 99.7%; feed intake per pig and fattening cycle, 244 kg; daily gain, 0.657 kg; feed conversion ratio, 2.77 kg kg-1 and mortality rate, 4.3%. Data reflecting the practical situation of the Spanish growing and finishing pig production and it may contribute to develop new strategies in order to improve the productive and economic efficiency of GF pig units.
Resumo:
With the advent of cloud computing model, distributed caches have become the cornerstone for building scalable applications. Popular systems like Facebook [1] or Twitter use Memcached [5], a highly scalable distributed object cache, to speed up applications by avoiding database accesses. Distributed object caches assign objects to cache instances based on a hashing function, and objects are not moved from a cache instance to another unless more instances are added to the cache and objects are redistributed. This may lead to situations where some cache instances are overloaded when some of the objects they store are frequently accessed, while other cache instances are less frequently used. In this paper we propose a multi-resource load balancing algorithm for distributed cache systems. The algorithm aims at balancing both CPU and Memory resources among cache instances by redistributing stored data. Considering the possible conflict of balancing multiple resources at the same time, we give CPU and Memory resources weighted priorities based on the runtime load distributions. A scarcer resource is given a higher weight than a less scarce resource when load balancing. The system imbalance degree is evaluated based on monitoring information, and the utility load of a node, a unit for resource consumption. Besides, since continuous rebalance of the system may affect the QoS of applications utilizing the cache system, our data selection policy ensures that each data migration minimizes the system imbalance degree and hence, the total reconfiguration cost can be minimized. An extensive simulation is conducted to compare our policy with other policies. Our policy shows a significant improvement in time efficiency and decrease in reconfiguration cost.
Resumo:
One of the main concerns of evolvable and adaptive systems is the need of a training mechanism, which is normally done by using a training reference and a test input. The fitness function to be optimized during the evolution (training) phase is obtained by comparing the output of the candidate systems against the reference. The adaptivity that this type of systems may provide by re-evolving during operation is especially important for applications with runtime variable conditions. However, fully automated self-adaptivity poses additional problems. For instance, in some cases, it is not possible to have such reference, because the changes in the environment conditions are unknown, so it becomes difficult to autonomously identify which problem requires to be solved, and hence, what conditions should be representative for an adequate re-evolution. In this paper, a solution to solve this dependency is presented and analyzed. The system consists of an image filter application mapped on an evolvable hardware platform, able to evolve using two consecutive frames from a camera as both test and reference images. The system is entirely mapped in an FPGA, and native dynamic and partial reconfiguration is used for evolution. It is also shown that using such images, both of them being noisy, as input and reference images in the evolution phase of the system is equivalent or even better than evolving the filter with offline images. The combination of both techniques results in the completely autonomous, noise type/level agnostic filtering system without reference image requirement described along the paper.
Resumo:
Evolvable Hardware (EH) is a technique that consists of using reconfigurable hardware devices whose configuration is controlled by an Evolutionary Algorithm (EA). Our system consists of a fully-FPGA implemented scalable EH platform, where the Reconfigurable processing Core (RC) can adaptively increase or decrease in size. Figure 1 shows the architecture of the proposed System-on-Programmable-Chip (SoPC), consisting of a MicroBlaze processor responsible of controlling the whole system operation, a Reconfiguration Engine (RE), and a Reconfigurable processing Core which is able to change its size in both height and width. This system is used to implement image filters, which are generated autonomously thanks to the evolutionary process. The system is complemented with a camera that enables the usage of the platform for real time applications.
Resumo:
In this paper, an architecture based on a scalable and flexible set of Evolvable Processing arrays is presented. FPGA-native Dynamic Partial Reconfiguration (DPR) is used for evolution, which is done intrinsically, letting the system to adapt autonomously to variable run-time conditions, including the presence of transient and permanent faults. The architecture supports different modes of operation, namely: independent, parallel, cascaded or bypass mode. These modes of operation can be used during evolution time or during normal operation. The evolvability of the architecture is combined with fault-tolerance techniques, to enhance the platform with self-healing features, making it suitable for applications which require both high adaptability and reliability. Experimental results show that such a system may benefit from accelerated evolution times, increased performance and improved dependability, mainly by increasing fault tolerance for transient and permanent faults, as well as providing some fault identification possibilities. The evolvable HW array shown is tailored for window-based image processing applications.
Resumo:
While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today’s applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved by either more powerful microcontrollers, though more power consumption or, in general, any solution capable of accelerating task execution. At this point, the use of hardware based, and in particular FPGA solutions, might appear as a candidate technology, since though power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance high capacity state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, as well as a careful power-aware management system, to show that energy savings for certain higher-end applications can be achieved. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to proof that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements.
Resumo:
Evolvable hardware (EH) is an interesting alternative to conventional digital circuit design, since autonomous generation of solutions for a given task permits self-adaptivity of the system to changing environments, and they present inherent fault tolerance when evolution is intrinsically performed. Systems based on FPGAs that use Dynamic and Partial Reconfiguration (DPR) for evolving the circuit are an example. Also, thanks to DPR, these systems can be provided with scalability, a feature that allows a system to change the number of allocated resources at run-time in order to vary some feature, such as performance. The combination of both aspects leads to scalable evolvable hardware (SEH), which changes in size as an extra degree of freedom when trying to achieve the optimal solution by means of evolution. The main contributions of this paper are an architecture of a scalable and evolvable hardware processing array system, some preliminary evolution strategies which take scalability into consideration, and to show in the experimental results the benefits of combined evolution and scalability. A digital image filtering application is used as use case.