961 resultados para General-purpose computing on graphics processing units (GPGPU)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fluid inteliigence has been defined as an innate ability to reason which is measured commonly by the Raven's Progressive Matrices (RPM). Individual differences in fluid intelligence are currently explained by the Cascade model (Fry & Hale, 1996) and the Controlled Attention hypothesis (Engle, Kane, & Tuholski, 1999; Kane & Engle, 2002). The first theory is based on a complex relation among age, speed, and working memory which is described as a Cascade. The alternative to this theory, the Controlled Attention hypothesis, is based on the proposition that it is the executive attention component of working memory that explains performance on fluid intelligence tests. The first goal of this study was to examine whether the Cascade model is consistent within the visuo-spatial and verbal-numerical modalities. The second goal was to examine whether the executive attention component ofworking memory accounts for the relation between working memory and fluid intelligence. Two hundred and six undergraduate students between the ages of 18 and 28 completed a battery of cognitive tests selected to measure processing speed, working memory, and controlled attention which were selected from two cognitive modalities, verbalnumerical and visuo-spatial. These were used to predict performance on two standard measures of fluid intelligence: the Raven's Progressive Matrices (RPM) and the Shipley Institute of Living Scales (SILS) subtests. Multiple regression and Structural Equation Modeling (SEM) were used to test the Cascade model and to determine the independent and joint effects of controlled attention and working memory on general fluid intelligence. Among the processing speed measures only spatial scan was related to the RPM. No other significant relations were observed between processing speed and fluid intelligence. As 1 a construct, working memory was related to the fluid intelligence tests. Consistent with the predictions for the RPM there was support for the Cascade model within the visuo-spatial modality but not within the verbal-numerical modality. There was no support for the Cascade model with respect to the SILS tests. SEM revealed that there was a direct path between controlled attention and RPM and between working memory and RPM. However, a significant path between set switching and RPM explained the relation between controlled attention and RPM. The prediction that controlled attention mediated the relation between working memory and RPM was therefore not supported. The findings support the view that the Cascade model may not adequately explain individual differences in fluid intelligence and this may be due to the differential relations observed between working memory and fluid intelligence across different modalities. The findings also show that working memory is not a domain-general construct and as a result its relation with fluid intelligence may be dependent on the nature of the working memory modality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Le code source de la libraire développée accompagne ce dépôt dans l'état où il était à ce moment. Il est possible de trouver une version plus à jour sur github (http://github.com/abergeron).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many discussions about the music processing have occurred over the years. It is stated, on one hand, the existence of a single joint for grasping the music or any of its attributes by the Central Nervous System. Furthermore, it is claimed also the existence of multiple and diverse systems to understand each aspect of music. In general, model-independent set, studies focusing on the processing of sound components, specifically the musical tones, can significantly clarify the basic functioning of the auditory system and other higher brain functions. In this sense, one of the most prominent approaches in the study of sensory and perceptual processes of hearing, or changed unharmed, has been Neuroscience, which is interested in the interaction between the brain areas corresponding to different cognitive processes. Thus, the purpose of this study was to review the studies that dealt processing models of the attributes of tonal Western music, based on the conception that neuropsychological neural structures are interdependent sensory pathways.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The complexity of current and emerging architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven model is developed for a simple shallow water code on a Cray XE6 system, to explore how deployment choices such as domain decomposition and core affinity affect performance. The resource sharing present in modern multi-core architectures adds various levels of heterogeneity to the system. Shared resources often includes cache, memory, network controllers and in some cases floating point units (as in the AMD Bulldozer), which mean that the access time depends on the mapping of application tasks, and the core's location within the system. Heterogeneity further increases with the use of hardware-accelerators such as GPUs and the Intel Xeon Phi, where many specialist cores are attached to general-purpose cores. This trend for shared resources and non-uniform cores is expected to continue into the exascale era. The complexity of these systems means that various runtime scenarios are possible, and it has been found that under-populating nodes, altering the domain decomposition and non-standard task to core mappings can dramatically alter performance. To find this out, however, is often a process of trial and error. To better inform this process, a performance model was developed for a simple regular grid-based kernel code, shallow. The code comprises two distinct types of work, loop-based array updates and nearest-neighbour halo-exchanges. Separate performance models were developed for each part, both based on a similar methodology. Application specific benchmarks were run to measure performance for different problem sizes under different execution scenarios. These results were then fed into a performance model that derives resource usage for a given deployment scenario, with interpolation between results as necessary.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The petroleum production is associated to the produced water, which has dispersed and dissolved materials that damage not only the environment, but also the petroleum processing units. This study aims at the treatment of produced water focusing mainly on the removal of metals and oil and using this treated water as raw material for the production of sodium carbonate. Initially, it was addressed the removal of the following divalent metals: calcium, magnesium, barium, zinc, copper, iron, and cadmium. For this purpose, surfactants derived from vegetable oils, such as coconut oil, soybean oil, and sunflower oil, were used. The investigation showed that there is a stoichiometric relationship between the metals removed from the produced water and the surfactants used in the process of metals removal. It was also developed a model that correlates the hydrolysis constant of saponified coconut oil with the metal distribution between the resulting stages of the proposed process, flocs and aqueous phases, and relating the results with the pH of the medium. The correlation coefficient obtained was 0.963. Next, the process of producing washing soda (prefiro soda ahs ou sodium carbonate) started. The resulting water from the various treatment approaches from petroleum production water was used. During this stage of the research, it was observed that the surfactant assisted in the produced water treatment, by removing some metals and the dispersed oil entirety. The yield of sodium carbonate production was approximately 80%, and its purity was around 95%. It was also assessed, in the production of sodium carbonate, the influence of the type of reactor, using a continuous reactor and a batch reactor. These tests showed that the process with continuous reactor was not as efficient as the batch process. In general, it can be concluded that the production of sodium carbonate from water of oil production is a feasible process, rendering an effluent that causes a great environmental impact a raw material with large scale industrial use

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work presents a simplified architecture of a neurofuzzy controller for general purpose applications that tries to minimize the processing used in the several stages of hazy modeling of systems. The basic procedures of fuzzification and defuzzification are simplified to the maximum while the inference procedures are computed in a private way. The simplified architecture allows a fast and easy configuration of the neurofuzzy controller and the structuring rules that define the control actions is automatic. Th controller's Limits and performance are standardized and the control actions are previously calculated. For application, the industrial systems of fluid flow control will be considered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Healthcare, Human Computer Interfaces (HCI), Security and Biometry are the most promising application scenario directly involved in the Body Area Networks (BANs) evolution. Both wearable devices and sensors directly integrated in garments envision a word in which each of us is supervised by an invisible assistant monitoring our health and daily-life activities. New opportunities are enabled because improvements in sensors miniaturization and transmission efficiency of the wireless protocols, that achieved the integration of high computational power aboard independent, energy-autonomous, small form factor devices. Application’s purposes are various: (I) data collection to achieve off-line knowledge discovery; (II) user notification of his/her activities or in case a danger occurs; (III) biofeedback rehabilitation; (IV) remote alarm activation in case the subject need assistance; (V) introduction of a more natural interaction with the surrounding computerized environment; (VI) users identification by physiological or behavioral characteristics. Telemedicine and mHealth [1] are two of the leading concepts directly related to healthcare. The capability to borne unobtrusiveness objects supports users’ autonomy. A new sense of freedom is shown to the user, not only supported by a psychological help but a real safety improvement. Furthermore, medical community aims the introduction of new devices to innovate patient treatments. In particular, the extension of the ambulatory analysis in the real life scenario by proving continuous acquisition. The wide diffusion of emerging wellness portable equipment extended the usability of wearable devices also for fitness and training by monitoring user performance on the working task. The learning of the right execution techniques related to work, sport, music can be supported by an electronic trainer furnishing the adequate aid. HCIs made real the concept of Ubiquitous, Pervasive Computing and Calm Technology introduced in the 1988 by Marc Weiser and John Seeley Brown. They promotes the creation of pervasive environments, enhancing the human experience. Context aware, adaptive and proactive environments serve and help people by becoming sensitive and reactive to their presence, since electronics is ubiquitous and deployed everywhere. In this thesis we pay attention to the integration of all the aspects involved in a BAN development. Starting from the choice of sensors we design the node, configure the radio network, implement real-time data analysis and provide a feedback to the user. We present algorithms to be implemented in wearable assistant for posture and gait analysis and to provide assistance on different walking conditions, preventing falls. Our aim, expressed by the idea to contribute at the development of a non proprietary solutions, driven us to integrate commercial and standard solutions in our devices. We use sensors available on the market and avoided to design specialized sensors in ASIC technologies. We employ standard radio protocol and open source projects when it was achieved. The specific contributions of the PhD research activities are presented and discussed in the following. • We have designed and build several wireless sensor node providing both sensing and actuator capability making the focus on the flexibility, small form factor and low power consumption. The key idea was to develop a simple and general purpose architecture for rapid analysis, prototyping and deployment of BAN solutions. Two different sensing units are integrated: kinematic (3D accelerometer and 3D gyroscopes) and kinetic (foot-floor contact pressure forces). Two kind of feedbacks were implemented: audio and vibrotactile. • Since the system built is a suitable platform for testing and measuring the features and the constraints of a sensor network (radio communication, network protocols, power consumption and autonomy), we made a comparison between Bluetooth and ZigBee performance in terms of throughput and energy efficiency. Test in the field evaluate the usability in the fall detection scenario. • To prove the flexibility of the architecture designed, we have implemented a wearable system for human posture rehabilitation. The application was developed in conjunction with biomedical engineers who provided the audio-algorithms to furnish a biofeedback to the user about his/her stability. • We explored off-line gait analysis of collected data, developing an algorithm to detect foot inclination in the sagittal plane, during walk. • In collaboration with the Wearable Lab – ETH, Zurich, we developed an algorithm to monitor the user during several walking condition where the user carry a load. The remainder of the thesis is organized as follows. Chapter I gives an overview about Body Area Networks (BANs), illustrating the relevant features of this technology and the key challenges still open. It concludes with a short list of the real solutions and prototypes proposed by academic research and manufacturers. The domain of the posture and gait analysis, the methodologies, and the technologies used to provide real-time feedback on detected events, are illustrated in Chapter II. The Chapter III and IV, respectively, shown BANs developed with the purpose to detect fall and monitor the gait taking advantage by two inertial measurement unit and baropodometric insoles. Chapter V reports an audio-biofeedback system to improve balance on the information provided by the use centre of mass. A walking assistant based on the KNN classifier to detect walking alteration on load carriage, is described in Chapter VI.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In such territories where food production is mostly scattered in several small / medium size or even domestic farms, a lot of heterogeneous residues are produced yearly, since farmers usually carry out different activities in their properties. The amount and composition of farm residues, therefore, widely change during year, according to the single production process periodically achieved. Coupling high efficiency micro-cogeneration energy units with easy handling biomass conversion equipments, suitable to treat different materials, would provide many important advantages to the farmers and to the community as well, so that the increase in feedstock flexibility of gasification units is nowadays seen as a further paramount step towards their wide spreading in rural areas and as a real necessity for their utilization at small scale. Two main research topics were thought to be of main concern at this purpose, and they were therefore discussed in this work: the investigation of fuels properties impact on gasification process development and the technical feasibility of small scale gasification units integration with cogeneration systems. According to these two main aspects, the present work was thus divided in two main parts. The first one is focused on the biomass gasification process, that was investigated in its theoretical aspects and then analytically modelled in order to simulate thermo-chemical conversion of different biomass fuels, such as wood (park waste wood and softwood), wheat straw, sewage sludge and refuse derived fuels. The main idea is to correlate the results of reactor design procedures with the physical properties of biomasses and the corresponding working conditions of gasifiers (temperature profile, above all), in order to point out the main differences which prevent the use of the same conversion unit for different materials. At this scope, a gasification kinetic free model was initially developed in Excel sheets, considering different values of air to biomass ratio and the downdraft gasification technology as particular examined application. The differences in syngas production and working conditions (process temperatures, above all) among the considered fuels were tried to be connected to some biomass properties, such elementary composition, ash and water contents. The novelty of this analytical approach was the use of kinetic constants ratio in order to determine oxygen distribution among the different oxidation reactions (regarding volatile matter only) while equilibrium of water gas shift reaction was considered in gasification zone, by which the energy and mass balances involved in the process algorithm were linked together, as well. Moreover, the main advantage of this analytical tool is the easiness by which the input data corresponding to the particular biomass materials can be inserted into the model, so that a rapid evaluation on their own thermo-chemical conversion properties is possible to be obtained, mainly based on their chemical composition A good conformity of the model results with the other literature and experimental data was detected for almost all the considered materials (except for refuse derived fuels, because of their unfitting chemical composition with the model assumptions). Successively, a dimensioning procedure for open core downdraft gasifiers was set up, by the analysis on the fundamental thermo-physical and thermo-chemical mechanisms which are supposed to regulate the main solid conversion steps involved in the gasification process. Gasification units were schematically subdivided in four reaction zones, respectively corresponding to biomass heating, solids drying, pyrolysis and char gasification processes, and the time required for the full development of each of these steps was correlated to the kinetics rates (for pyrolysis and char gasification processes only) and to the heat and mass transfer phenomena from gas to solid phase. On the basis of this analysis and according to the kinetic free model results and biomass physical properties (particles size, above all) it was achieved that for all the considered materials char gasification step is kinetically limited and therefore temperature is the main working parameter controlling this step. Solids drying is mainly regulated by heat transfer from bulk gas to the inner layers of particles and the corresponding time especially depends on particle size. Biomass heating is almost totally achieved by the radiative heat transfer from the hot walls of reactor to the bed of material. For pyrolysis, instead, working temperature, particles size and the same nature of biomass (through its own pyrolysis heat) have all comparable weights on the process development, so that the corresponding time can be differently depending on one of these factors according to the particular fuel is gasified and the particular conditions are established inside the gasifier. The same analysis also led to the estimation of reaction zone volumes for each biomass fuel, so as a comparison among the dimensions of the differently fed gasification units was finally accomplished. Each biomass material showed a different volumes distribution, so that any dimensioned gasification unit does not seem to be suitable for more than one biomass species. Nevertheless, since reactors diameters were found out quite similar for all the examined materials, it could be envisaged to design a single units for all of them by adopting the largest diameter and by combining together the maximum heights of each reaction zone, as they were calculated for the different biomasses. A total height of gasifier as around 2400mm would be obtained in this case. Besides, by arranging air injecting nozzles at different levels along the reactor, gasification zone could be properly set up according to the particular material is in turn gasified. Finally, since gasification and pyrolysis times were found to considerably change according to even short temperature variations, it could be also envisaged to regulate air feeding rate for each gasified material (which process temperatures depend on), so as the available reactor volumes would be suitable for the complete development of solid conversion in each case, without even changing fluid dynamics behaviour of the unit as well as air/biomass ratio in noticeable measure. The second part of this work dealt with the gas cleaning systems to be adopted downstream the gasifiers in order to run high efficiency CHP units (i.e. internal engines and micro-turbines). Especially in the case multi–fuel gasifiers are assumed to be used, weightier gas cleaning lines need to be envisaged in order to reach the standard gas quality degree required to fuel cogeneration units. Indeed, as the more heterogeneous feed to the gasification unit, several contaminant species can simultaneously be present in the exit gas stream and, as a consequence, suitable gas cleaning systems have to be designed. In this work, an overall study on gas cleaning lines assessment is carried out. Differently from the other research efforts carried out in the same field, the main scope is to define general arrangements for gas cleaning lines suitable to remove several contaminants from the gas stream, independently on the feedstock material and the energy plant size The gas contaminant species taken into account in this analysis were: particulate, tars, sulphur (in H2S form), alkali metals, nitrogen (in NH3 form) and acid gases (in HCl form). For each of these species, alternative cleaning devices were designed according to three different plant sizes, respectively corresponding with 8Nm3/h, 125Nm3/h and 350Nm3/h gas flows. Their performances were examined on the basis of their optimal working conditions (efficiency, temperature and pressure drops, above all) and their own consumption of energy and materials. Successively, the designed units were combined together in different overall gas cleaning line arrangements, paths, by following some technical constraints which were mainly determined from the same performance analysis on the cleaning units and from the presumable synergic effects by contaminants on the right working of some of them (filters clogging, catalysts deactivation, etc.). One of the main issues to be stated in paths design accomplishment was the tars removal from the gas stream, preventing filters plugging and/or line pipes clogging At this scope, a catalytic tars cracking unit was envisaged as the only solution to be adopted, and, therefore, a catalytic material which is able to work at relatively low temperatures was chosen. Nevertheless, a rapid drop in tars cracking efficiency was also estimated for this same material, so that an high frequency of catalysts regeneration and a consequent relevant air consumption for this operation were calculated in all of the cases. Other difficulties had to be overcome in the abatement of alkali metals, which condense at temperatures lower than tars, but they also need to be removed in the first sections of gas cleaning line in order to avoid corrosion of materials. In this case a dry scrubber technology was envisaged, by using the same fine particles filter units and by choosing for them corrosion resistant materials, like ceramic ones. Besides these two solutions which seem to be unavoidable in gas cleaning line design, high temperature gas cleaning lines were not possible to be achieved for the two larger plant sizes, as well. Indeed, as the use of temperature control devices was precluded in the adopted design procedure, ammonia partial oxidation units (as the only considered methods for the abatement of ammonia at high temperature) were not suitable for the large scale units, because of the high increase of reactors temperature by the exothermic reactions involved in the process. In spite of these limitations, yet, overall arrangements for each considered plant size were finally designed, so that the possibility to clean the gas up to the required standard degree was technically demonstrated, even in the case several contaminants are simultaneously present in the gas stream. Moreover, all the possible paths defined for the different plant sizes were compared each others on the basis of some defined operational parameters, among which total pressure drops, total energy losses, number of units and secondary materials consumption. On the basis of this analysis, dry gas cleaning methods proved preferable to the ones including water scrubber technology in al of the cases, especially because of the high water consumption provided by water scrubber units in ammonia adsorption process. This result is yet connected to the possibility to use activated carbon units for ammonia removal and Nahcolite adsorber for chloride acid. The very high efficiency of this latter material is also remarkable. Finally, as an estimation of the overall energy loss pertaining the gas cleaning process, the total enthalpy losses estimated for the three plant sizes were compared with the respective gas streams energy contents, these latter obtained on the basis of low heating value of gas only. This overall study on gas cleaning systems is thus proposed as an analytical tool by which different gas cleaning line configurations can be evaluated, according to the particular practical application they are adopted for and the size of cogeneration unit they are connected to.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study examines the effects of a borderline-specific treatment, called general psychiatric management, on emotional change, outcome and therapeutic alliance of an outpatient presenting with borderline personality disorder. Based on the sequential model of emotional processing, emotional states were assessed in a 10-session setting. The case showed an increase in expressions of distress and no change in therapeutic alliance and tended towards general deterioration. Results suggest emotional processing may play a lesser role in general psychiatric management in early phase treatment than previously hypothezised.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present a scalable software architecture for on-line multi-camera video processing, that guarantees a good trade off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs), and the Central Unit. The Central Unit works as a supervisor of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application has been presented. In this paper, as case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of camera and can easily works with different image formats respecting the real time constraints. Moreover, the parallelization approach can be used in order to speed up the processing tasks with a low level of overhead

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hoy en día el uso de dispositivos portátiles multimedia es ya una realidad totalmente habitual. Además, estos dispositivos tienen una capacidad de cálculo y unos recursos gráficos y de memoria altos, tanto es así que por ejemplo en un móvil se pueden reproducir vídeos de muy alta calidad o tener capacidad para manejar entornos 3D. El precio del uso de estos recursos es un mayor consumo de batería que en ocasiones es demasiado alto y acortan en gran medida la vida de la carga útil de la batería. El Grupo de Diseño Electrónico y Microelectrónico de la Universidad Politécnica de Madrid ha abierto una línea de trabajo que busca la optimización del consumo de energía en este tipo de dispositivos, concretamente en el ámbito de la reproducción de vídeo. El enfoque para afrontar la solución del problema se basa en obtener un mayor rendimiento de la batería a costa de disminuir la experiencia multimedia del usuario. De esta manera, cuando la carga de la batería esté por debajo de un determinado umbral mientras el dispositivo esté reproduciendo un vídeo de alta calidad será el dispositivo quien se autoconfigure dinámicamente para consumir menos potencia en esta tarea, reduciendo la tasa de imágenes por segundo o la resolución del vídeo que se descodifica. Además de lo citado anteriormente se propone dividir la descodificación y la representación del vídeo en dos procesadores, uno de propósito general y otro para procesado digital de señal, con esto se consigue que tener la misma capacidad de cálculo que con un solo procesador pero a una frecuencia menor. Para materializar la propuesta se usará la tarjeta BeagleBoard basada en un procesador multinúcleo OMAP3530 de Texas Instrument que contiene dos núcleos: un ARM1 Cortex-A8 y un DSP2 de la familia C6000. Este procesador multinúcleo además permite modificar la frecuencia de reloj y la tensión de alimentación dinámicamente para conseguir reducir de este modo el consumo del terminal. Por otro lado, como reproductor de vídeos se utilizará una versión de MPlayer que integra un descodificador de vídeo escalable que permite elegir dinámicamente la resolución o las imágenes por segundo que se decodifican para posteriormente mostrarlas. Este reproductor se ejecutará en el núcleo ARM pero debido a la alta carga computacional de la descodificación de vídeos, y que el ARM no está optimizado para este tipo de procesado de datos, el reproductor debe encargar la tarea de la descodificación al DSP. El objetivo de este Proyecto Fin de Carrera consiste en que mientras el descodificador de vídeo está ejecutándose en el núcleo DSP y el Mplayer en el núcleo ARM del OMAP3530 se pueda elegir dinámicamente qué parte del vídeo se descodifica, es decir, seleccionar en tiempo real la calidad o capa del vídeo que se quiere mostrar. Haciendo esto, se podrá quitar carga computacional al núcleo ARM y asignársela al DSP el cuál puede procesarla a menor frecuencia para ahorrar batería. 1 ARM: Es una arquitectura de procesadores de propósito general basada en RISC (Reduced Instruction Set Computer). Es desarrollada por la empresa inglesa ARM holdings. 2 DSP: Procesador Digital de Señal (Digital Signal Processor). Es un sistema basado en procesador, el cual está orientado al cálculo matemático a altas velocidad. Generalmente poseen varias unidades aritmético-lógicas (ALUs) para conseguir realizar varias operaciones simultáneamente. SUMMARY. Nowadays, the use of multimedia devices is a well known reality. In addition, these devices have high graphics and calculus performance and a lot of memory as well. In instance, we can play high quality videos and 3D environments in a mobile phone. That kind of use may increase the device's power consumption and make shorter the battery duration. Electronic and Microelectronic Design Group of Technical University of Madrid has a research line which is looking for optimization of power consumption while these devices are playing videos. The solution of this trouble is based on taking more advantage of battery by decreasing multimedia user experience. On this way, when battery charge is under a threshold while device is playing a high quality video the device is going to configure itself dynamically in order to decrease its power consumption by decreasing frame per second rate, video resolution or increasing the noise in the decoded frame. It is proposed splitting decoding and representation tasks in two processors in order to have the same calculus capability with lower frecuency. The first one is specialized in digital signal processing and the other one is a general purpose processor. In order to materialize this proposal we will use a board called BeagleBoard which is based on a multicore processor called OMAP3530 from Texas Instrument. This processor includes two cores: ARM Cortex-A8 and a TMS320C64+ DSP core. Changing clock frequency and supply voltage is allowed by OMAP3530, we can decrease the power consumption on this way. On the other hand, MPlayer will be used as video player. It includes a scalable video decoder which let us changing dynamically the resolution or frames per second rate of the video in order to show it later. This player will be executed by ARM core but this is not optimized for this task, for that reason, DSP core will be used to decoding video. The target of this final career project is being able to choose which part of the video is decoded each moment while decoder is executed by DSP and Mplayer by ARM. It will be able to change in real time the video quality, resolution and frames per second that user want to show. On this way, reducing the computational charge within the processor will be possible.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose: A fully three-dimensional (3D) massively parallelizable list-mode ordered-subsets expectation-maximization (LM-OSEM) reconstruction algorithm has been developed for high-resolution PET cameras. System response probabilities are calculated online from a set of parameters derived from Monte Carlo simulations. The shape of a system response for a given line of response (LOR) has been shown to be asymmetrical around the LOR. This work has been focused on the development of efficient region-search techniques to sample the system response probabilities, which are suitable for asymmetric kernel models, including elliptical Gaussian models that allow for high accuracy and high parallelization efficiency. The novel region-search scheme using variable kernel models is applied in the proposed PET reconstruction algorithm. Methods: A novel region-search technique has been used to sample the probability density function in correspondence with a small dynamic subset of the field of view that constitutes the region of response (ROR). The ROR is identified around the LOR by searching for any voxel within a dynamically calculated contour. The contour condition is currently defined as a fixed threshold over the posterior probability, and arbitrary kernel models can be applied using a numerical approach. The processing of the LORs is distributed in batches among the available computing devices, then, individual LORs are processed within different processing units. In this way, both multicore and multiple many-core processing units can be efficiently exploited. Tests have been conducted with probability models that take into account the noncolinearity, positron range, and crystal penetration effects, that produced tubes of response with varying elliptical sections whose axes were a function of the crystal's thickness and angle of incidence of the given LOR. The algorithm treats the probability model as a 3D scalar field defined within a reference system aligned with the ideal LOR. Results: This new technique provides superior image quality in terms of signal-to-noise ratio as compared with the histogram-mode method based on precomputed system matrices available for a commercial small animal scanner. Reconstruction times can be kept low with the use of multicore, many-core architectures, including multiple graphic processing units. Conclusions: A highly parallelizable LM reconstruction method has been proposed based on Monte Carlo simulations and new parallelization techniques aimed at improving the reconstruction speed and the image signal-to-noise of a given OSEM algorithm. The method has been validated using simulated and real phantoms. A special advantage of the new method is the possibility of defining dynamically the cut-off threshold over the calculated probabilities thus allowing for a direct control on the trade-off between speed and quality during the reconstruction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Los sistemas empotrados han sido concebidos tradicionalmente como sistemas de procesamiento específicos que realizan una tarea fija durante toda su vida útil. Para cumplir con requisitos estrictos de coste, tamaño y peso, el equipo de diseño debe optimizar su funcionamiento para condiciones muy específicas. Sin embargo, la demanda de mayor versatilidad, un funcionamiento más inteligente y, en definitiva, una mayor capacidad de procesamiento comenzaron a chocar con estas limitaciones, agravado por la incertidumbre asociada a entornos de operación cada vez más dinámicos donde comenzaban a ser desplegados progresivamente. Esto trajo como resultado una necesidad creciente de que los sistemas pudieran responder por si solos a eventos inesperados en tiempo diseño tales como: cambios en las características de los datos de entrada y el entorno del sistema en general; cambios en la propia plataforma de cómputo, por ejemplo debido a fallos o defectos de fabricación; y cambios en las propias especificaciones funcionales causados por unos objetivos del sistema dinámicos y cambiantes. Como consecuencia, la complejidad del sistema aumenta, pero a cambio se habilita progresivamente una capacidad de adaptación autónoma sin intervención humana a lo largo de la vida útil, permitiendo que tomen sus propias decisiones en tiempo de ejecución. Éstos sistemas se conocen, en general, como sistemas auto-adaptativos y tienen, entre otras características, las de auto-configuración, auto-optimización y auto-reparación. Típicamente, la parte soft de un sistema es mayoritariamente la única utilizada para proporcionar algunas capacidades de adaptación a un sistema. Sin embargo, la proporción rendimiento/potencia en dispositivos software como microprocesadores en muchas ocasiones no es adecuada para sistemas empotrados. En este escenario, el aumento resultante en la complejidad de las aplicaciones está siendo abordado parcialmente mediante un aumento en la complejidad de los dispositivos en forma de multi/many-cores; pero desafortunadamente, esto hace que el consumo de potencia también aumente. Además, la mejora en metodologías de diseño no ha sido acorde como para poder utilizar toda la capacidad de cómputo disponible proporcionada por los núcleos. Por todo ello, no se están satisfaciendo adecuadamente las demandas de cómputo que imponen las nuevas aplicaciones. La solución tradicional para mejorar la proporción rendimiento/potencia ha sido el cambio a unas especificaciones hardware, principalmente usando ASICs. Sin embargo, los costes de un ASIC son altamente prohibitivos excepto en algunos casos de producción en masa y además la naturaleza estática de su estructura complica la solución a las necesidades de adaptación. Los avances en tecnologías de fabricación han hecho que la FPGA, una vez lenta y pequeña, usada como glue logic en sistemas mayores, haya crecido hasta convertirse en un dispositivo de cómputo reconfigurable de gran potencia, con una cantidad enorme de recursos lógicos computacionales y cores hardware empotrados de procesamiento de señal y de propósito general. Sus capacidades de reconfiguración han permitido combinar la flexibilidad propia del software con el rendimiento del procesamiento en hardware, lo que tiene la potencialidad de provocar un cambio de paradigma en arquitectura de computadores, pues el hardware no puede ya ser considerado más como estático. El motivo es que como en el caso de las FPGAs basadas en tecnología SRAM, la reconfiguración parcial dinámica (DPR, Dynamic Partial Reconfiguration) es posible. Esto significa que se puede modificar (reconfigurar) un subconjunto de los recursos computacionales en tiempo de ejecución mientras el resto permanecen activos. Además, este proceso de reconfiguración puede ser ejecutado internamente por el propio dispositivo. El avance tecnológico en dispositivos hardware reconfigurables se encuentra recogido bajo el campo conocido como Computación Reconfigurable (RC, Reconfigurable Computing). Uno de los campos de aplicación más exóticos y menos convencionales que ha posibilitado la computación reconfigurable es el conocido como Hardware Evolutivo (EHW, Evolvable Hardware), en el cual se encuentra enmarcada esta tesis. La idea principal del concepto consiste en convertir hardware que es adaptable a través de reconfiguración en una entidad evolutiva sujeta a las fuerzas de un proceso evolutivo inspirado en el de las especies biológicas naturales, que guía la dirección del cambio. Es una aplicación más del campo de la Computación Evolutiva (EC, Evolutionary Computation), que comprende una serie de algoritmos de optimización global conocidos como Algoritmos Evolutivos (EA, Evolutionary Algorithms), y que son considerados como algoritmos universales de resolución de problemas. En analogía al proceso biológico de la evolución, en el hardware evolutivo el sujeto de la evolución es una población de circuitos que intenta adaptarse a su entorno mediante una adecuación progresiva generación tras generación. Los individuos pasan a ser configuraciones de circuitos en forma de bitstreams caracterizados por descripciones de circuitos reconfigurables. Seleccionando aquellos que se comportan mejor, es decir, que tienen una mejor adecuación (o fitness) después de ser evaluados, y usándolos como padres de la siguiente generación, el algoritmo evolutivo crea una nueva población hija usando operadores genéticos como la mutación y la recombinación. Según se van sucediendo generaciones, se espera que la población en conjunto se aproxime a la solución óptima al problema de encontrar una configuración del circuito adecuada que satisfaga las especificaciones. El estado de la tecnología de reconfiguración después de que la familia de FPGAs XC6200 de Xilinx fuera retirada y reemplazada por las familias Virtex a finales de los 90, supuso un gran obstáculo para el avance en hardware evolutivo; formatos de bitstream cerrados (no conocidos públicamente); dependencia de herramientas del fabricante con soporte limitado de DPR; una velocidad de reconfiguración lenta; y el hecho de que modificaciones aleatorias del bitstream pudieran resultar peligrosas para la integridad del dispositivo, son algunas de estas razones. Sin embargo, una propuesta a principios de los años 2000 permitió mantener la investigación en el campo mientras la tecnología de DPR continuaba madurando, el Circuito Virtual Reconfigurable (VRC, Virtual Reconfigurable Circuit). En esencia, un VRC en una FPGA es una capa virtual que actúa como un circuito reconfigurable de aplicación específica sobre la estructura nativa de la FPGA que reduce la complejidad del proceso reconfiguración y aumenta su velocidad (comparada con la reconfiguración nativa). Es un array de nodos computacionales especificados usando descripciones HDL estándar que define recursos reconfigurables ad-hoc: multiplexores de rutado y un conjunto de elementos de procesamiento configurables, cada uno de los cuales tiene implementadas todas las funciones requeridas, que pueden seleccionarse a través de multiplexores tal y como ocurre en una ALU de un microprocesador. Un registro grande actúa como memoria de configuración, por lo que la reconfiguración del VRC es muy rápida ya que tan sólo implica la escritura de este registro, el cual controla las señales de selección del conjunto de multiplexores. Sin embargo, esta capa virtual provoca: un incremento de área debido a la implementación simultánea de cada función en cada nodo del array más los multiplexores y un aumento del retardo debido a los multiplexores, reduciendo la frecuencia de funcionamiento máxima. La naturaleza del hardware evolutivo, capaz de optimizar su propio comportamiento computacional, le convierten en un buen candidato para avanzar en la investigación sobre sistemas auto-adaptativos. Combinar un sustrato de cómputo auto-reconfigurable capaz de ser modificado dinámicamente en tiempo de ejecución con un algoritmo empotrado que proporcione una dirección de cambio, puede ayudar a satisfacer los requisitos de adaptación autónoma de sistemas empotrados basados en FPGA. La propuesta principal de esta tesis está por tanto dirigida a contribuir a la auto-adaptación del hardware de procesamiento de sistemas empotrados basados en FPGA mediante hardware evolutivo. Esto se ha abordado considerando que el comportamiento computacional de un sistema puede ser modificado cambiando cualquiera de sus dos partes constitutivas: una estructura hard subyacente y un conjunto de parámetros soft. De esta distinción, se derivan dos lineas de trabajo. Por un lado, auto-adaptación paramétrica, y por otro auto-adaptación estructural. El objetivo perseguido en el caso de la auto-adaptación paramétrica es la implementación de técnicas de optimización evolutiva complejas en sistemas empotrados con recursos limitados para la adaptación paramétrica online de circuitos de procesamiento de señal. La aplicación seleccionada como prueba de concepto es la optimización para tipos muy específicos de imágenes de los coeficientes de los filtros de transformadas wavelet discretas (DWT, DiscreteWavelet Transform), orientada a la compresión de imágenes. Por tanto, el objetivo requerido de la evolución es una compresión adaptativa y más eficiente comparada con los procedimientos estándar. El principal reto radica en reducir la necesidad de recursos de supercomputación para el proceso de optimización propuesto en trabajos previos, de modo que se adecúe para la ejecución en sistemas empotrados. En cuanto a la auto-adaptación estructural, el objetivo de la tesis es la implementación de circuitos auto-adaptativos en sistemas evolutivos basados en FPGA mediante un uso eficiente de sus capacidades de reconfiguración nativas. En este caso, la prueba de concepto es la evolución de tareas de procesamiento de imagen tales como el filtrado de tipos desconocidos y cambiantes de ruido y la detección de bordes en la imagen. En general, el objetivo es la evolución en tiempo de ejecución de tareas de procesamiento de imagen desconocidas en tiempo de diseño (dentro de un cierto grado de complejidad). En este caso, el objetivo de la propuesta es la incorporación de DPR en EHW para evolucionar la arquitectura de un array sistólico adaptable mediante reconfiguración cuya capacidad de evolución no había sido estudiada previamente. Para conseguir los dos objetivos mencionados, esta tesis propone originalmente una plataforma evolutiva que integra un motor de adaptación (AE, Adaptation Engine), un motor de reconfiguración (RE, Reconfiguration Engine) y un motor computacional (CE, Computing Engine) adaptable. El el caso de adaptación paramétrica, la plataforma propuesta está caracterizada por: • un CE caracterizado por un núcleo de procesamiento hardware de DWT adaptable mediante registros reconfigurables que contienen los coeficientes de los filtros wavelet • un algoritmo evolutivo como AE que busca filtros wavelet candidatos a través de un proceso de optimización paramétrica desarrollado específicamente para sistemas caracterizados por recursos de procesamiento limitados • un nuevo operador de mutación simplificado para el algoritmo evolutivo utilizado, que junto con un mecanismo de evaluación rápida de filtros wavelet candidatos derivado de la literatura actual, asegura la viabilidad de la búsqueda evolutiva asociada a la adaptación de wavelets. En el caso de adaptación estructural, la plataforma propuesta toma la forma de: • un CE basado en una plantilla de array sistólico reconfigurable de 2 dimensiones compuesto de nodos de procesamiento reconfigurables • un algoritmo evolutivo como AE que busca configuraciones candidatas del array usando un conjunto de funcionalidades de procesamiento para los nodos disponible en una biblioteca accesible en tiempo de ejecución • un RE hardware que explota la capacidad de reconfiguración nativa de las FPGAs haciendo un uso eficiente de los recursos reconfigurables del dispositivo para cambiar el comportamiento del CE en tiempo de ejecución • una biblioteca de elementos de procesamiento reconfigurables caracterizada por bitstreams parciales independientes de la posición, usados como el conjunto de configuraciones disponibles para los nodos de procesamiento del array Las contribuciones principales de esta tesis se pueden resumir en la siguiente lista: • Una plataforma evolutiva basada en FPGA para la auto-adaptación paramétrica y estructural de sistemas empotrados compuesta por un motor computacional (CE), un motor de adaptación (AE) evolutivo y un motor de reconfiguración (RE). Esta plataforma se ha desarrollado y particularizado para los casos de auto-adaptación paramétrica y estructural. • En cuanto a la auto-adaptación paramétrica, las contribuciones principales son: – Un motor computacional adaptable mediante registros que permite la adaptación paramétrica de los coeficientes de una implementación hardware adaptativa de un núcleo de DWT. – Un motor de adaptación basado en un algoritmo evolutivo desarrollado específicamente para optimización numérica, aplicada a los coeficientes de filtros wavelet en sistemas empotrados con recursos limitados. – Un núcleo IP de DWT auto-adaptativo en tiempo de ejecución para sistemas empotrados que permite la optimización online del rendimiento de la transformada para compresión de imágenes en entornos específicos de despliegue, caracterizados por tipos diferentes de señal de entrada. – Un modelo software y una implementación hardware de una herramienta para la construcción evolutiva automática de transformadas wavelet específicas. • Por último, en cuanto a la auto-adaptación estructural, las contribuciones principales son: – Un motor computacional adaptable mediante reconfiguración nativa de FPGAs caracterizado por una plantilla de array sistólico en dos dimensiones de nodos de procesamiento reconfigurables. Es posible mapear diferentes tareas de cómputo en el array usando una biblioteca de elementos sencillos de procesamiento reconfigurables. – Definición de una biblioteca de elementos de procesamiento apropiada para la síntesis autónoma en tiempo de ejecución de diferentes tareas de procesamiento de imagen. – Incorporación eficiente de la reconfiguración parcial dinámica (DPR) en sistemas de hardware evolutivo, superando los principales inconvenientes de propuestas previas como los circuitos reconfigurables virtuales (VRCs). En este trabajo también se comparan originalmente los detalles de implementación de ambas propuestas. – Una plataforma tolerante a fallos, auto-curativa, que permite la recuperación funcional online en entornos peligrosos. La plataforma ha sido caracterizada desde una perspectiva de tolerancia a fallos: se proponen modelos de fallo a nivel de CLB y de elemento de procesamiento, y usando el motor de reconfiguración, se hace un análisis sistemático de fallos para un fallo en cada elemento de procesamiento y para dos fallos acumulados. – Una plataforma con calidad de filtrado dinámica que permite la adaptación online a tipos de ruido diferentes y diferentes comportamientos computacionales teniendo en cuenta los recursos de procesamiento disponibles. Por un lado, se evolucionan filtros con comportamientos no destructivos, que permiten esquemas de filtrado en cascada escalables; y por otro, también se evolucionan filtros escalables teniendo en cuenta requisitos computacionales de filtrado cambiantes dinámicamente. Este documento está organizado en cuatro partes y nueve capítulos. La primera parte contiene el capítulo 1, una introducción y motivación sobre este trabajo de tesis. A continuación, el marco de referencia en el que se enmarca esta tesis se analiza en la segunda parte: el capítulo 2 contiene una introducción a los conceptos de auto-adaptación y computación autonómica (autonomic computing) como un campo de investigación más general que el muy específico de este trabajo; el capítulo 3 introduce la computación evolutiva como la técnica para dirigir la adaptación; el capítulo 4 analiza las plataformas de computación reconfigurables como la tecnología para albergar hardware auto-adaptativo; y finalmente, el capítulo 5 define, clasifica y hace un sondeo del campo del hardware evolutivo. Seguidamente, la tercera parte de este trabajo contiene la propuesta, desarrollo y resultados obtenidos: mientras que el capítulo 6 contiene una declaración de los objetivos de la tesis y la descripción de la propuesta en su conjunto, los capítulos 7 y 8 abordan la auto-adaptación paramétrica y estructural, respectivamente. Finalmente, el capítulo 9 de la parte 4 concluye el trabajo y describe caminos de investigación futuros. ABSTRACT Embedded systems have traditionally been conceived to be specific-purpose computers with one, fixed computational task for their whole lifetime. Stringent requirements in terms of cost, size and weight forced designers to highly optimise their operation for very specific conditions. However, demands for versatility, more intelligent behaviour and, in summary, an increased computing capability began to clash with these limitations, intensified by the uncertainty associated to the more dynamic operating environments where they were progressively being deployed. This brought as a result an increasing need for systems to respond by themselves to unexpected events at design time, such as: changes in input data characteristics and system environment in general; changes in the computing platform itself, e.g., due to faults and fabrication defects; and changes in functional specifications caused by dynamically changing system objectives. As a consequence, systems complexity is increasing, but in turn, autonomous lifetime adaptation without human intervention is being progressively enabled, allowing them to take their own decisions at run-time. This type of systems is known, in general, as selfadaptive, and are able, among others, of self-configuration, self-optimisation and self-repair. Traditionally, the soft part of a system has mostly been so far the only place to provide systems with some degree of adaptation capabilities. However, the performance to power ratios of software driven devices like microprocessors are not adequate for embedded systems in many situations. In this scenario, the resulting rise in applications complexity is being partly addressed by rising devices complexity in the form of multi and many core devices; but sadly, this keeps on increasing power consumption. Besides, design methodologies have not been improved accordingly to completely leverage the available computational power from all these cores. Altogether, these factors make that the computing demands new applications pose are not being wholly satisfied. The traditional solution to improve performance to power ratios has been the switch to hardware driven specifications, mainly using ASICs. However, their costs are highly prohibitive except for some mass production cases and besidesthe static nature of its structure complicates the solution to the adaptation needs. The advancements in fabrication technologies have made that the once slow, small FPGA used as glue logic in bigger systems, had grown to be a very powerful, reconfigurable computing device with a vast amount of computational logic resources and embedded, hardened signal and general purpose processing cores. Its reconfiguration capabilities have enabled software-like flexibility to be combined with hardware-like computing performance, which has the potential to cause a paradigm shift in computer architecture since hardware cannot be considered as static anymore. This is so, since, as is the case with SRAMbased FPGAs, Dynamic Partial Reconfiguration (DPR) is possible. This means that subsets of the FPGA computational resources can now be changed (reconfigured) at run-time while the rest remains active. Besides, this reconfiguration process can be triggered internally by the device itself. This technological boost in reconfigurable hardware devices is actually covered under the field known as Reconfigurable Computing. One of the most exotic fields of application that Reconfigurable Computing has enabled is the known as Evolvable Hardware (EHW), in which this dissertation is framed. The main idea behind the concept is turning hardware that is adaptable through reconfiguration into an evolvable entity subject to the forces of an evolutionary process, inspired by that of natural, biological species, that guides the direction of change. It is yet another application of the field of Evolutionary Computation (EC), which comprises a set of global optimisation algorithms known as Evolutionary Algorithms (EAs), considered as universal problem solvers. In analogy to the biological process of evolution, in EHW the subject of evolution is a population of circuits that tries to get adapted to its surrounding environment by progressively getting better fitted to it generation after generation. Individuals become circuit configurations representing bitstreams that feature reconfigurable circuit descriptions. By selecting those that behave better, i.e., with a higher fitness value after being evaluated, and using them as parents of the following generation, the EA creates a new offspring population by using so called genetic operators like mutation and recombination. As generations succeed one another, the whole population is expected to approach to the optimum solution to the problem of finding an adequate circuit configuration that fulfils system objectives. The state of reconfiguration technology after Xilinx XC6200 FPGA family was discontinued and replaced by Virtex families in the late 90s, was a major obstacle for advancements in EHW; closed (non publicly known) bitstream formats; dependence on manufacturer tools with highly limiting support of DPR; slow speed of reconfiguration; and random bitstream modifications being potentially hazardous for device integrity, are some of these reasons. However, a proposal in the first 2000s allowed to keep investigating in this field while DPR technology kept maturing, the Virtual Reconfigurable Circuit (VRC). In essence, a VRC in an FPGA is a virtual layer acting as an application specific reconfigurable circuit on top of an FPGA fabric that reduces the complexity of the reconfiguration process and increases its speed (compared to native reconfiguration). It is an array of computational nodes specified using standard HDL descriptions that define ad-hoc reconfigurable resources; routing multiplexers and a set of configurable processing elements, each one containing all the required functions, which are selectable through functionality multiplexers as in microprocessor ALUs. A large register acts as configuration memory, so VRC reconfiguration is very fast given it only involves writing this register, which drives the selection signals of the set of multiplexers. However, large overheads are introduced by this virtual layer; an area overhead due to the simultaneous implementation of every function in every node of the array plus the multiplexers, and a delay overhead due to the multiplexers, which also reduces maximum frequency of operation. The very nature of Evolvable Hardware, able to optimise its own computational behaviour, makes it a good candidate to advance research in self-adaptive systems. Combining a selfreconfigurable computing substrate able to be dynamically changed at run-time with an embedded algorithm that provides a direction for change, can help fulfilling requirements for autonomous lifetime adaptation of FPGA-based embedded systems. The main proposal of this thesis is hence directed to contribute to autonomous self-adaptation of the underlying computational hardware of FPGA-based embedded systems by means of Evolvable Hardware. This is tackled by considering that the computational behaviour of a system can be modified by changing any of its two constituent parts: an underlying hard structure and a set of soft parameters. Two main lines of work derive from this distinction. On one side, parametric self-adaptation and, on the other side, structural self-adaptation. The goal pursued in the case of parametric self-adaptation is the implementation of complex evolutionary optimisation techniques in resource constrained embedded systems for online parameter adaptation of signal processing circuits. The application selected as proof of concept is the optimisation of Discrete Wavelet Transforms (DWT) filters coefficients for very specific types of images, oriented to image compression. Hence, adaptive and improved compression efficiency, as compared to standard techniques, is the required goal of evolution. The main quest lies in reducing the supercomputing resources reported in previous works for the optimisation process in order to make it suitable for embedded systems. Regarding structural self-adaptation, the thesis goal is the implementation of self-adaptive circuits in FPGA-based evolvable systems through an efficient use of native reconfiguration capabilities. In this case, evolution of image processing tasks such as filtering of unknown and changing types of noise and edge detection are the selected proofs of concept. In general, evolving unknown image processing behaviours (within a certain complexity range) at design time is the required goal. In this case, the mission of the proposal is the incorporation of DPR in EHW to evolve a systolic array architecture adaptable through reconfiguration whose evolvability had not been previously checked. In order to achieve the two stated goals, this thesis originally proposes an evolvable platform that integrates an Adaptation Engine (AE), a Reconfiguration Engine (RE) and an adaptable Computing Engine (CE). In the case of parametric adaptation, the proposed platform is characterised by: • a CE featuring a DWT hardware processing core adaptable through reconfigurable registers that holds wavelet filters coefficients • an evolutionary algorithm as AE that searches for candidate wavelet filters through a parametric optimisation process specifically developed for systems featured by scarce computing resources • a new, simplified mutation operator for the selected EA, that together with a fast evaluation mechanism of candidate wavelet filters derived from existing literature, assures the feasibility of the evolutionary search involved in wavelets adaptation In the case of structural adaptation, the platform proposal takes the form of: • a CE based on a reconfigurable 2D systolic array template composed of reconfigurable processing nodes • an evolutionary algorithm as AE that searches for candidate configurations of the array using a set of computational functionalities for the nodes available in a run time accessible library • a hardware RE that exploits native DPR capabilities of FPGAs and makes an efficient use of the available reconfigurable resources of the device to change the behaviour of the CE at run time • a library of reconfigurable processing elements featured by position-independent partial bitstreams used as the set of available configurations for the processing nodes of the array Main contributions of this thesis can be summarised in the following list. • An FPGA-based evolvable platform for parametric and structural self-adaptation of embedded systems composed of a Computing Engine, an evolutionary Adaptation Engine and a Reconfiguration Engine. This platform is further developed and tailored for both parametric and structural self-adaptation. • Regarding parametric self-adaptation, main contributions are: – A CE adaptable through reconfigurable registers that enables parametric adaptation of the coefficients of an adaptive hardware implementation of a DWT core. – An AE based on an Evolutionary Algorithm specifically developed for numerical optimisation applied to wavelet filter coefficients in resource constrained embedded systems. – A run-time self-adaptive DWT IP core for embedded systems that allows for online optimisation of transform performance for image compression for specific deployment environments characterised by different types of input signals. – A software model and hardware implementation of a tool for the automatic, evolutionary construction of custom wavelet transforms. • Lastly, regarding structural self-adaptation, main contributions are: – A CE adaptable through native FPGA fabric reconfiguration featured by a two dimensional systolic array template of reconfigurable processing nodes. Different processing behaviours can be automatically mapped in the array by using a library of simple reconfigurable processing elements. – Definition of a library of such processing elements suited for autonomous runtime synthesis of different image processing tasks. – Efficient incorporation of DPR in EHW systems, overcoming main drawbacks from the previous approach of virtual reconfigurable circuits. Implementation details for both approaches are also originally compared in this work. – A fault tolerant, self-healing platform that enables online functional recovery in hazardous environments. The platform has been characterised from a fault tolerance perspective: fault models at FPGA CLB level and processing elements level are proposed, and using the RE, a systematic fault analysis for one fault in every processing element and for two accumulated faults is done. – A dynamic filtering quality platform that permits on-line adaptation to different types of noise and different computing behaviours considering the available computing resources. On one side, non-destructive filters are evolved, enabling scalable cascaded filtering schemes; and on the other, size-scalable filters are also evolved considering dynamically changing computational filtering requirements. This dissertation is organized in four parts and nine chapters. First part contains chapter 1, the introduction to and motivation of this PhD work. Following, the reference framework in which this dissertation is framed is analysed in the second part: chapter 2 features an introduction to the notions of self-adaptation and autonomic computing as a more general research field to the very specific one of this work; chapter 3 introduces evolutionary computation as the technique to drive adaptation; chapter 4 analyses platforms for reconfigurable computing as the technology to hold self-adaptive hardware; and finally chapter 5 defines, classifies and surveys the field of Evolvable Hardware. Third part of the work follows, which contains the proposal, development and results obtained: while chapter 6 contains an statement of the thesis goals and the description of the proposal as a whole, chapters 7 and 8 address parametric and structural self-adaptation, respectively. Finally, chapter 9 in part 4 concludes the work and describes future research paths.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present Tethered Monte Carlo, a simple, general purpose method of computing the effective potential of the order parameter (Helmholtz free energy). This formalism is based on a new statistical ensemble, closely related to the micromagnetic one, but with an extended configuration space (through Creutz-like demons). Canonical averages for arbitrary values of the external magnetic field are computed without additional simulations. The method is put to work in the two-dimensional Ising model, where the existence of exact results enables us to perform high precision checks. A rather peculiar feature of our implementation, which employs a local Metropolis algorithm, is the total absence, within errors, of critical slowing down for magnetic observables. Indeed, high accuracy results are presented for lattices as large as L = 1024.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present Tethered Monte Carlo, a simple, general purpose method of computing the effective potential of the order parameter (Helmholtz free energy). This formalism is based on a new statistical ensemble, closely related to the micromagnetic one, but with an extended configuration space (through Creutz-like demons). Canonical averages for arbitrary values of the external magnetic field are computed without additional simulations. The method is put to work in the two-dimensional Ising model, where the existence of exact results enables us to perform high precision checks. A rather peculiar feature of our implementation, which employs a local Metropolis algorithm, is the total absence, within errors, of critical slowing down for magnetic observables. Indeed, high accuracy results are presented for lattices as large as L = 1024.