611 resultados para SCIL processor


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Performance studies of actual parallel systems usually tend to concéntrate on the effectiveness of a given implementation. This is often done in the absolute, without quantitave reference to the potential parallelism contained in the programs from the point of view of the execution paradigm. We feel that studying the parallelism inherent to the programs is interesting, as it gives information about the best possible behavior of any implementation and thus allows contrasting the results obtained. We propose a method for obtaining ideal speedups for programs through a combination of sequential or parallel execution and simulation, and the algorithms that allow implementing the method. Our approach is novel and, we argüe, more accurate than previously proposed methods, in that a crucial part of the data - the execution times of tasks - is obtained from actual executions, while speedup is computed by simulation. This allows obtaining speedup (and other) data under controlled and ideal assumptions regarding issues such as number of processor, scheduling algorithm and overheads, etc. The results obtained can be used for example to evalúate the ideal parallelism that a program contains for a given model of execution and to compare such "perfect" parallelism to that obtained by a given implementation of that model. We also present a tool, IDRA, which implements the proposed method, and results obtained with IDRA for benchmark programs, which are then compared with those obtained in actual executions on real parallel systems.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

La temperatura es una preocupación que juega un papel protagonista en el diseño de circuitos integrados modernos. El importante aumento de las densidades de potencia que conllevan las últimas generaciones tecnológicas ha producido la aparición de gradientes térmicos y puntos calientes durante el funcionamiento normal de los chips. La temperatura tiene un impacto negativo en varios parámetros del circuito integrado como el retardo de las puertas, los gastos de disipación de calor, la fiabilidad, el consumo de energía, etc. Con el fin de luchar contra estos efectos nocivos, la técnicas de gestión dinámica de la temperatura (DTM) adaptan el comportamiento del chip en función en la información que proporciona un sistema de monitorización que mide en tiempo de ejecución la información térmica de la superficie del dado. El campo de la monitorización de la temperatura en el chip ha llamado la atención de la comunidad científica en los últimos años y es el objeto de estudio de esta tesis. Esta tesis aborda la temática de control de la temperatura en el chip desde diferentes perspectivas y niveles, ofreciendo soluciones a algunos de los temas más importantes. Los niveles físico y circuital se cubren con el diseño y la caracterización de dos nuevos sensores de temperatura especialmente diseñados para los propósitos de las técnicas DTM. El primer sensor está basado en un mecanismo que obtiene un pulso de anchura variable dependiente de la relación de las corrientes de fuga con la temperatura. De manera resumida, se carga un nodo del circuito y posteriormente se deja flotando de tal manera que se descarga a través de las corrientes de fugas de un transistor; el tiempo de descarga del nodo es la anchura del pulso. Dado que la anchura del pulso muestra una dependencia exponencial con la temperatura, la conversión a una palabra digital se realiza por medio de un contador logarítmico que realiza tanto la conversión tiempo a digital como la linealización de la salida. La estructura resultante de esta combinación de elementos se implementa en una tecnología de 0,35 _m. El sensor ocupa un área muy reducida, 10.250 nm2, y consume muy poca energía, 1.05-65.5nW a 5 muestras/s, estas cifras superaron todos los trabajos previos en el momento en que se publicó por primera vez y en el momento de la publicación de esta tesis, superan a todas las implementaciones anteriores fabricadas en el mismo nodo tecnológico. En cuanto a la precisión, el sensor ofrece una buena linealidad, incluso sin calibrar; se obtiene un error 3_ de 1,97oC, adecuado para tratar con las aplicaciones de DTM. Como se ha explicado, el sensor es completamente compatible con los procesos de fabricación CMOS, este hecho, junto con sus valores reducidos de área y consumo, lo hacen especialmente adecuado para la integración en un sistema de monitorización de DTM con un conjunto de monitores empotrados distribuidos a través del chip. Las crecientes incertidumbres de proceso asociadas a los últimos nodos tecnológicos comprometen las características de linealidad de nuestra primera propuesta de sensor. Con el objetivo de superar estos problemas, proponemos una nueva técnica para obtener la temperatura. La nueva técnica también está basada en las dependencias térmicas de las corrientes de fuga que se utilizan para descargar un nodo flotante. La novedad es que ahora la medida viene dada por el cociente de dos medidas diferentes, en una de las cuales se altera una característica del transistor de descarga |la tensión de puerta. Este cociente resulta ser muy robusto frente a variaciones de proceso y, además, la linealidad obtenida cumple ampliamente los requisitos impuestos por las políticas DTM |error 3_ de 1,17oC considerando variaciones del proceso y calibrando en dos puntos. La implementación de la parte sensora de esta nueva técnica implica varias consideraciones de diseño, tales como la generación de una referencia de tensión independiente de variaciones de proceso, que se analizan en profundidad en la tesis. Para la conversión tiempo-a-digital, se emplea la misma estructura de digitalización que en el primer sensor. Para la implementación física de la parte de digitalización, se ha construido una biblioteca de células estándar completamente nueva orientada a la reducción de área y consumo. El sensor resultante de la unión de todos los bloques se caracteriza por una energía por muestra ultra baja (48-640 pJ) y un área diminuta de 0,0016 mm2, esta cifra mejora todos los trabajos previos. Para probar esta afirmación, se realiza una comparación exhaustiva con más de 40 propuestas de sensores en la literatura científica. Subiendo el nivel de abstracción al sistema, la tercera contribución se centra en el modelado de un sistema de monitorización que consiste de un conjunto de sensores distribuidos por la superficie del chip. Todos los trabajos anteriores de la literatura tienen como objetivo maximizar la precisión del sistema con el mínimo número de monitores. Como novedad, en nuestra propuesta se introducen nuevos parámetros de calidad aparte del número de sensores, también se considera el consumo de energía, la frecuencia de muestreo, los costes de interconexión y la posibilidad de elegir diferentes tipos de monitores. El modelo se introduce en un algoritmo de recocido simulado que recibe la información térmica de un sistema, sus propiedades físicas, limitaciones de área, potencia e interconexión y una colección de tipos de monitor; el algoritmo proporciona el tipo seleccionado de monitor, el número de monitores, su posición y la velocidad de muestreo _optima. Para probar la validez del algoritmo, se presentan varios casos de estudio para el procesador Alpha 21364 considerando distintas restricciones. En comparación con otros trabajos previos en la literatura, el modelo que aquí se presenta es el más completo. Finalmente, la última contribución se dirige al nivel de red, partiendo de un conjunto de monitores de temperatura de posiciones conocidas, nos concentramos en resolver el problema de la conexión de los sensores de una forma eficiente en área y consumo. Nuestra primera propuesta en este campo es la introducción de un nuevo nivel en la jerarquía de interconexión, el nivel de trillado (o threshing en inglés), entre los monitores y los buses tradicionales de periféricos. En este nuevo nivel se aplica selectividad de datos para reducir la cantidad de información que se envía al controlador central. La idea detrás de este nuevo nivel es que en este tipo de redes la mayoría de los datos es inútil, porque desde el punto de vista del controlador sólo una pequeña cantidad de datos |normalmente sólo los valores extremos| es de interés. Para cubrir el nuevo nivel, proponemos una red de monitorización mono-conexión que se basa en un esquema de señalización en el dominio de tiempo. Este esquema reduce significativamente tanto la actividad de conmutación sobre la conexión como el consumo de energía de la red. Otra ventaja de este esquema es que los datos de los monitores llegan directamente ordenados al controlador. Si este tipo de señalización se aplica a sensores que realizan conversión tiempo-a-digital, se puede obtener compartición de recursos de digitalización tanto en tiempo como en espacio, lo que supone un importante ahorro de área y consumo. Finalmente, se presentan dos prototipos de sistemas de monitorización completos que de manera significativa superan la características de trabajos anteriores en términos de área y, especialmente, consumo de energía. Abstract Temperature is a first class design concern in modern integrated circuits. The important increase in power densities associated to recent technology evolutions has lead to the apparition of thermal gradients and hot spots during run time operation. Temperature impacts several circuit parameters such as speed, cooling budgets, reliability, power consumption, etc. In order to fight against these negative effects, dynamic thermal management (DTM) techniques adapt the behavior of the chip relying on the information of a monitoring system that provides run-time thermal information of the die surface. The field of on-chip temperature monitoring has drawn the attention of the scientific community in the recent years and is the object of study of this thesis. This thesis approaches the matter of on-chip temperature monitoring from different perspectives and levels, providing solutions to some of the most important issues. The physical and circuital levels are covered with the design and characterization of two novel temperature sensors specially tailored for DTM purposes. The first sensor is based upon a mechanism that obtains a pulse with a varying width based on the variations of the leakage currents on the temperature. In a nutshell, a circuit node is charged and subsequently left floating so that it discharges away through the subthreshold currents of a transistor; the time the node takes to discharge is the width of the pulse. Since the width of the pulse displays an exponential dependence on the temperature, the conversion into a digital word is realized by means of a logarithmic counter that performs both the timeto- digital conversion and the linearization of the output. The structure resulting from this combination of elements is implemented in a 0.35_m technology and is characterized by very reduced area, 10250 nm2, and power consumption, 1.05-65.5 nW at 5 samples/s, these figures outperformed all previous works by the time it was first published and still, by the time of the publication of this thesis, they outnumber all previous implementations in the same technology node. Concerning the accuracy, the sensor exhibits good linearity, even without calibration it displays a 3_ error of 1.97oC, appropriate to deal with DTM applications. As explained, the sensor is completely compatible with standard CMOS processes, this fact, along with its tiny area and power overhead, makes it specially suitable for the integration in a DTM monitoring system with a collection of on-chip monitors distributed across the chip. The exacerbated process fluctuations carried along with recent technology nodes jeop-ardize the linearity characteristics of the first sensor. In order to overcome these problems, a new temperature inferring technique is proposed. In this case, we also rely on the thermal dependencies of leakage currents that are used to discharge a floating node, but now, the result comes from the ratio of two different measures, in one of which we alter a characteristic of the discharging transistor |the gate voltage. This ratio proves to be very robust against process variations and displays a more than suficient linearity on the temperature |1.17oC 3_ error considering process variations and performing two-point calibration. The implementation of the sensing part based on this new technique implies several issues, such as the generation of process variations independent voltage reference, that are analyzed in depth in the thesis. In order to perform the time-to-digital conversion, we employ the same digitization structure the former sensor used. A completely new standard cell library targeting low area and power overhead is built from scratch to implement the digitization part. Putting all the pieces together, we achieve a complete sensor system that is characterized by ultra low energy per conversion of 48-640pJ and area of 0.0016mm2, this figure outperforms all previous works. To prove this statement, we perform a thorough comparison with over 40 works from the scientific literature. Moving up to the system level, the third contribution is centered on the modeling of a monitoring system consisting of set of thermal sensors distributed across the chip. All previous works from the literature target maximizing the accuracy of the system with the minimum number of monitors. In contrast, we introduce new metrics of quality apart form just the number of sensors; we consider the power consumption, the sampling frequency, the possibility to consider different types of monitors and the interconnection costs. The model is introduced in a simulated annealing algorithm that receives the thermal information of a system, its physical properties, area, power and interconnection constraints and a collection of monitor types; the algorithm yields the selected type of monitor, the number of monitors, their position and the optimum sampling rate. We test the algorithm with the Alpha 21364 processor under several constraint configurations to prove its validity. When compared to other previous works in the literature, the modeling presented here is the most complete. Finally, the last contribution targets the networking level, given an allocated set of temperature monitors, we focused on solving the problem of connecting them in an efficient way from the area and power perspectives. Our first proposal in this area is the introduction of a new interconnection hierarchy level, the threshing level, in between the monitors and the traditional peripheral buses that applies data selectivity to reduce the amount of information that is sent to the central controller. The idea behind this new level is that in this kind of networks most data are useless because from the controller viewpoint just a small amount of data |normally extreme values| is of interest. To cover the new interconnection level, we propose a single-wire monitoring network based on a time-domain signaling scheme that significantly reduces both the switching activity over the wire and the power consumption of the network. This scheme codes the information in the time domain and allows a straightforward obtention of an ordered list of values from the maximum to the minimum. If the scheme is applied to monitors that employ TDC, digitization resource sharing is achieved, producing an important saving in area and power consumption. Two prototypes of complete monitoring systems are presented, they significantly overcome previous works in terms of area and, specially, power consumption.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This document describes the basic steps to developed and embedded Linux-based system using the BeagleBoard. The document has been specifically written to use a BeagleBoard development system based on the OMAP `processor.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. Results We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Conclusions Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Sensor networks are increasingly becoming one of the main sources of Big Data on the Web. However, the observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse these data for other purposes than those for which they were originally set up. In this thesis we address these challenges, considering how we can transform streaming raw data to rich ontology-based information that is accessible through continuous queries for streaming data. Our main contribution is an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. We introduce novel query rewriting and data translation techniques that rely on mapping definitions relating streaming data models to ontological concepts. Specific contributions include: • The syntax and semantics of the SPARQLStream query language for ontologybased data access, and a query rewriting approach for transforming SPARQLStream queries into streaming algebra expressions. • The design of an ontology-based streaming data access engine that can internally reuse an existing data stream engine, complex event processor or sensor middleware, using R2RML mappings for defining relationships between streaming data models and ontology concepts. Concerning the sensor metadata of such streaming data sources, we have investigated how we can use raw measurements to characterize streaming data, producing enriched data descriptions in terms of ontological models. Our specific contributions are: • A representation of sensor data time series that captures gradient information that is useful to characterize types of sensor data. • A method for classifying sensor data time series and determining the type of data, using data mining techniques, and a method for extracting semantic sensor metadata features from the time series.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The Networks of Evolutionary Processors (NEPs) are computing mechanisms directly inspired from the behavior of cell populations more specifically the point mutations in DNA strands. These mechanisms are been used for solving NP-complete problems by means of a parallel computation postulation. This paper describes an implementation of the basic model of NEP using Web technologies and includes the possibility of designing some of the most common variants of it by means the use of the web page design which eases the configuration of a given problem. It is a system intended to be used in a multicore processor in order to benefit from the multi thread use.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Future high-quality consumer electronics will contain a number of applications running in a highly dynamic environment, and their execution will need to be efficiently arbitrated by the underlying platform software. The multimedia applications that currently execute in such similar contexts face frequent run-time variations in their resource demands, originated by the greedy nature of the multimedia processing itself. Changes in resource demands are triggered by numerous reasons (e.g. a switch in the input media compression format). Such situations require real-time adaptation mechanisms to adjust the system operation to the new requirements, and this must be done seamlessly to satisfy the user experience. One solution for efficiently managing application execution is to apply quality of service resource management techniques, based on assigning and enforcing resource contracts to applications. Most resource management solutions provide temporal isolation by enforcing resource assignments and avoiding any resource overruns. However, this has a clear limitation over the cost-effective resource usage. This paper presents a simple priority assignment scheme based on uniform priority bands to allow that greedy multimedia tasks incur in safe overruns that increase resource usage and do not threaten the timely execution of non-overrunning tasks. Experimental results show that the proposed priority assignment scheme in combination with a resource accounting mechanism preserves timely multimedia execution and delivery, achieves a higher cost-effective processor usage, and guarantees the execution isolation of non-overrunning tasks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Developing a herd localization system capable to operate unattended in communication-challenged areas arises from the necessity of improving current systems in terms of cost, autonomy or any other facilities that a certain target group (or overall users) may demand. A network architecture of herd localization is proposed with its corresponding hardware and a methodology to assess performance in different operating conditions. The system is designed taking into account an eventual environmental impact hence most nodes are simple, cheap and kinetically powered from animal movements-neither batteries nor sophisticated processor chips are needed. Other network elements integrating GPS and batteries operate with selectable duty cycles, thus reducing maintenance duties. Equipment has been tested on Scandinavian reindeer in Lapland and its element modeling is integrated into a simulator to analyze such localization network applicability for different use cases. Performance indicators (detection frequency, localization accuracy and delay) are fitted to assess the overall performance; system relative costs are enclosed also for a range of deployments.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Early propagation effect (EPE) is a critical problem in conventional dual-rail logic implementations against Side Channel Attacks (SCAs). Among previous EPE-resistant architectures, PA-DPL logic offers EPE-free capability at relatively low cost. However, its separate dual core structure is a weakness when facing concentrated EM attacks where a tiny EM probe can be precisely positioned closer to one of the two cores. In this paper, we present an PA-DPL dual-core interleaved structure to strengthen resistance against sophisticated EM attacks on Xilinx FPGA implementations. The main merit of the proposed structure is that every two routing in each signal pair are kept identical even the dual cores are interleaved together. By minimizing the distance between the complementary routings and instances of both cores, even the concentrated EM measurement cannot easily distinguish the minor EM field unbalance. In PA- DPL, EPE is avoided by compressing the evaluation phase to a small portion of the clock period, therefore, the speed is inevitably limited. Regarding this, we made an improvement to extend the duty cycle of evaluation phase to more than 40 percent, yielding a larger maximum working frequency. The detailed design flow is also presented. We validate the security improvement against EM attack by implementing a simplified AES co-processor in Virtex-5 FPGA.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper introduces a semantic language developed with the objective to be used in a semantic analyzer based on linguistic and world knowledge. Linguistic knowledge is provided by a Combinatorial Dictionary and several sets of rules. Extra-linguistic information is stored in an Ontology. The meaning of the text is represented by means of a series of RDF-type triples of the form predicate (subject, object). Semantic analyzer is one of the options of the multifunctional ETAP-3 linguistic processor. The analyzer can be used for Information Extraction and Question Answering. We describe semantic representation of expressions that provide an assessment of the number of objects involved and/or give a quantitative evaluation of different types of attributes. We focus on the following aspects: 1) parametric and non-parametric attributes; 2) gradable and non-gradable attributes; 3) ontological representation of different classes of attributes; 4) absolute and relative quantitative assessment; 5) punctual and interval quantitative assessment; 6) intervals with precise and fuzzy boundaries

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Dynamic thermal management techniques require a collection of on-chip thermal sensors that imply a significant area and power overhead. Finding the optimum number of temperature monitors and their location on the chip surface to optimize accuracy is an NP-hard problem. In this work we improve the modeling of the problem by including area, power and networking constraints along with the consideration of three inaccuracy terms: spatial errors, sampling rate errors and monitor-inherent errors. The problem is solved by the simulated annealing algorithm. We apply the algorithm to a test case employing three different types of monitors to highlight the importance of the different metrics. Finally we present a case study of the Alpha 21364 processor under two different constraint scenarios.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this work, a unified algorithm-architecture-circuit co-design environment for complex FPGA system development is presented. The main objective is to find an efficient methodology for designing a configurable optimized FPGA system by using as few efforts as possible in verification stage, so as to speed up the development period. A proposed high performance FFT/iFFT processor for Multiband Orthogonal Frequency Division Multiplexing Ultra Wideband (MB-OFDM UWB) system design process is given as an example to demonstrate the proposed methodology. This efficient design methodology is tested and considered to be suitable for almost all types of complex FPGA system designs and verifications.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

La difusión de TV3D actual utiliza formatos como el Side-by-Side o Top-and-Bottom, en los que cada par de imágenes, correspondiente a las vistas de los ojos derecho e izquierdo, se encapsula con la mitad de la resolución espacial en una sola imagen. Estas imágenes se muestran de manera casi simultánea de forma que el ojo humano compone una imagen con profundidad que se asemeja a la visión binocular natural. Desde hace un par de años las principales plataformas de televisión han empezado a crear canales con contenido 3D. La televisión 3D (TV3D) se ha introducido en los hogares gracias a los televisores estereoscópicos. Estos televisores, que son compatibles con los formatos antes mencionados, extraen de cada imagen sus dos vistas, recuperan la resolución original y presentan cada vista alternativamente en la pantalla, generando al mismo tiempo una señal de sincronismo para las gafas activas, creando de esta forma la sensación tridimensional de las imágenes. En este PFC se pretende realizar el diseño VHDL de un cambiador de formato que genere en tiempo real la secuencia de imágenes correspondiente a los ojos derecho e izquierdo, con resolución completa, a partir de una secuencia codificada en formato tipo Top-and-Bottom y el banco de test para su prueba. Este circuito se implementará como un periférico del procesador NIOS II de Altera. El diseño podría utilizarse como parte de un sistema que permita la visualización de las actuales emisiones de televisión 3D en un televisor convencional. La tecnología de referencia que se utilizará serán las FPGAs, más concretamente la tarjeta Cyclone III FPGA Starter Kit (EP3C25 FPGA) de Altera, junto a una tarjeta de ampliación de Microtronix con entrada y salida HDMI para video y audio. Además se pretende crear la documentación necesaria para el desarrollo de futuros trabajos relacionados con la televisión 3D. ABSTRACT Current TV3D broadcasting uses formats as Side-by-Side or Top-and-Bottom, where every single pair of images, corresponding to left and right eyes views, are encapsulated with half spatial resolution in one single image. These images are almost simultaneously displayed so that the human eye forms an image with depth resembling naturally binocular vision. From a couple of years the major TV platforms have begun to create 3D content channels. 3D Television (3DTV) has been introduced in homes through stereoscopic televisions. These televisions, which are compatible with the above formats, each image is extracted from the two views, and recover the original resolution and displays alternately each view in screen, while generating a synchronization signal for active glasses, thereby creating the three-dimensional sensation of the images. The main objective in this PFC is to make the design of an exchanger VHDL format in real time to generate the image sequence corresponding to the right and left eyes, with full resolution from an encoded sequence type format Top-and-Bottom and test bench for testing. This circuit is implemented as a Altera NIOS II processor peripheral.The design could be used as part of a system enabling the display of current television broadcasts 3D on a conventional television. The reference technology that will be use are FPGAs, more specifically Cyclone III FPGA Starter Card Kit (EP3C25 FPGA) Altera, along with an expansion card Microtronix with HDMI input and output video and audio. It also aims to create documentation for the development of future works related to 3D TV.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Cada vez es más frecuente que los sistemas de comunicaciones realicen buena parte de sus funciones (modulación y demodulación, codificación y decodificación...) mediante software en lugar de utilizar hardware dedicado. Esta técnica se denomina “Radio software”. El objetivo de este PFC es estudiar un algoritmo implementado en C empleado en sistemas de comunicaciones modernos, en concreto la decodificación de Viterbi, el cual se encarga de corregir los posibles errores producidos a lo largo de la comunicación, para poder trasladarlo a sistemas empotrados multiprocesador. Partiendo de un código en C para el decodificador que realiza todas sus operaciones en serie, en este Proyecto fin de carrera se ha paralelizado dicho código, es decir, que el trabajo que realizaba un solo hilo para el caso del código serie, es procesado por un número de hilos configurables por el usuario, persiguiendo que el tiempo de ejecución se reduzca, es decir, que el programa paralelizado se ejecute de una manera más rápida. El trabajo se ha realizado en un PC con sistema operativo Linux, pero la versión paralelizada del código puede ser empleada en un sistema empotrado multiprocesador en el cual cada procesador ejecuta el código correspondiente a uno de los hilos de la versión de PC. ABSTRACT It is increasingly common for communications systems to perform most of its functions (modulation and demodulation, coding and decoding) by software instead of than using dedicated hardware. This technique is called: “Software Radio”. The aim of the PFC is to study an implemented algorithm in C language used in modern communications systems, particularly Viterbi decoding, which amends any possible error produced during the communication, in order to be able to move multiprocessor embedded systems. Starting from a C code of the decoder that performs every single operation in serial, in this final project, this code has been parallelized, which means that the work used to be done by just a single thread in the case of serial code, is processed by a number of threads configured by the user, in order to decrease the execution time, meaning that the parallelized program is executed faster. The work has been carried out on a PC using Linux operating system, but the parallelized version of the code could also be used in an embedded multiprocessor system in which each processor executes the corresponding code to every single one of the threads of the PC version.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The latest video coding standards developed, like HEVC (High Efficiency Video Coding, approved in January 2013), require for their implementation the use of devices able to support a high computational load. Considering that currently it is not enough the usage of one unique Digital Signal Processor (DSP), multicore devices have appeared recently in the market. However, due to its novelty, the working methodology that allows produce solutions for these configurations is in a very initial state, since currently the most part of the work needs to be performed manually. In consequence, the objective set consists on finding methodologies that ease this process. The study has been focused on extend a methodology, under development, for the generation of solutions for PCs and embedded systems. During this study, the standards RVC (Reconfigurable Video Coding) and HEVC have been employed, as well as DSPs of the Texas Instruments company. In its development, it has been tried to address all the factors that influence both the development and deployment of these new implementations of video decoders, ranging from tools up to aspects of the partitioning of algorithms, without this can cause a drop in application performance. The results of this study are the description of the employed methodology, the characterization of the software migration process and performance measurements for the HEVC standard in an RVC-based implementation. RESUMEN Los estándares de codificación de vídeo desarrollados más recientemente, como HEVC (High Efficiency Video Coding, aprobado en enero de 2013), requieren para su implementación el uso de dispositivos capaces de soportar una elevada carga computacional. Teniendo en cuenta que actualmente no es suficiente con utilizar un único Procesador Digital de Señal (DSP), han aparecido recientemente dispositivos multinúcleo en el mercado. Sin embargo, debido a su novedad, la metodología de trabajo que permite elaborar soluciones para tales configuraciones se encuentra en un estado muy inicial, ya que actualmente la mayor parte del trabajo debe realizarse manualmente. En consecuencia, el objetivo marcado consiste en encontrar metodologías que faciliten este proceso. El estudio se ha centrado en extender una metodología, en desarrollo, para la generación de soluciones para PC y sistemas empotrados. Durante dicho estudio se han empleado los estándares RVC (Reconfigurable Video Coding) y HEVC, así como DSPs de la compañía Texas Instruments. En su desarrollo se ha tratado de atender a todos los factores que influyen tanto en el desarrollo como en la puesta en marcha de estas nuevas implementaciones de descodificadores de vídeo; abarcando desde las herramientas a utilizar hasta aspectos del particionado de los algoritmos, sin que por ello se produzca una reducción en el rendimiento de las aplicaciones. Los resultados de este estudio son una descripción de la metodología empleada, la caracterización del proceso de migración de software, y medidas de rendimiento para el estándar HEVC en una implementación basada en RVC.