979 results for minimalist hardware architecture


Relevance:

80.00%

Publisher:

Abstract:

The mechanical action of the heart is driven by electrical events in the cardiac cells, a property that places heart tissue among the excitable tissues. At the cellular level, the electrical event is the signal that triggers mechanical contraction: it induces a transient increase in intracellular calcium, which in turn carries the message of contraction to the contractile proteins of the cell. The primary goal of my project was to implement in CUDA (Compute Unified Device Architecture, a hardware architecture for parallel processing created by NVIDIA) a tissue model of the rabbit sinoatrial node, in order to evaluate the heterogeneity of its structure and how that variability influences the behavior of the cells. In particular, each cell has its own intrinsic discharge frequency, different from that of every other cell in the tissue, so it is interesting to study how the cells synchronize and what final discharge frequency they settle on once synchronized.
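The synchronization question raised in this abstract can be illustrated with a much simpler stand-in than the CUDA tissue model it describes. The sketch below uses mean-field Kuramoto phase oscillators, which is an assumption made here and not the model of the project; the function name `simulate_sync`, the coupling strength and the frequency range are all illustrative. In this simplified setting the oscillators lock to the mean of their intrinsic frequencies.

```python
import math

def simulate_sync(freqs_hz, coupling=10.0, dt=0.001, t_end=10.0):
    """Integrate mean-field Kuramoto phase oscillators with forward Euler.

    freqs_hz: intrinsic frequency of each oscillator ("cell"), in Hz.
    coupling: global coupling strength in rad/s (illustrative value).
    Returns (r, rates_hz): the phase-coherence order parameter r in [0, 1]
    and each oscillator's instantaneous frequency at the end of the run.
    """
    n = len(freqs_hz)
    omega = [2.0 * math.pi * f for f in freqs_hz]
    theta = [math.pi * i / n for i in range(n)]  # phases spread over half a cycle
    rates = list(omega)
    for _ in range(int(round(t_end / dt))):
        # Mean field: average of cos and sin over all phases.
        c = sum(math.cos(t) for t in theta) / n
        s = sum(math.sin(t) for t in theta) / n
        # (K/N) * sum_j sin(theta_j - theta_i) == K * (s*cos(theta_i) - c*sin(theta_i))
        rates = [w + coupling * (s * math.cos(t) - c * math.sin(t))
                 for w, t in zip(omega, theta)]
        theta = [t + dt * r for t, r in zip(theta, rates)]
    c = sum(math.cos(t) for t in theta) / n
    s = sum(math.sin(t) for t in theta) / n
    return math.hypot(c, s), [r / (2.0 * math.pi) for r in rates]

# Sixteen hypothetical cells with intrinsic rates spread evenly from 2.8 to 3.2 Hz.
freqs = [2.8 + 0.4 * i / 15 for i in range(16)]
r, rates = simulate_sync(freqs)
print("order parameter:", round(r, 3))
print("slowest and fastest final rate (Hz):", round(min(rates), 4), round(max(rates), 4))
```

With weaker coupling the same code shows partial or no locking, which is one way to explore how heterogeneity affects the common rate.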

Relevance:

80.00%

Publisher:

Abstract:

This paper addresses the modelling and validation of an evolvable hardware architecture which can be mapped on a 2D systolic structure implemented on commercial reconfigurable FPGAs. The adaptation capabilities of the architecture are exercised to validate its evolvability. The underlying proposal is the use of a library of reconfigurable components characterised by their partial bitstreams, which are used by the Evolutionary Algorithm to find a solution to a given task. Evolution of image noise filters is selected as the proof-of-concept application. Results show that the computation speed of the resulting evolved circuit is higher than with the Virtual Reconfigurable Circuits approach, and this can be exploited in the evolution process by using dynamic reconfiguration.
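The idea of an evolutionary algorithm choosing components from a library to build a noise filter can be sketched in software. The example below is a toy under stated assumptions: the component library, the one-dimensional signal, the chain of four stages and the (1+4) evolution strategy are all hypothetical choices made here, and nothing in it models partial bitstreams, the 2D systolic array or FPGA reconfiguration.

```python
import random

# Hypothetical component library: each entry maps a 3-sample window to one output.
LIBRARY = {
    "pass": lambda a, b, c: b,
    "min": lambda a, b, c: min(a, b, c),
    "max": lambda a, b, c: max(a, b, c),
    "median": lambda a, b, c: sorted((a, b, c))[1],
    "mean": lambda a, b, c: (a + b + c) // 3,
}
NAMES = list(LIBRARY)

def run_pipeline(genome, signal):
    """Apply the chosen components one after another (edge samples are replicated)."""
    for name in genome:
        op = LIBRARY[name]
        padded = [signal[0]] + signal + [signal[-1]]
        signal = [op(padded[i], padded[i + 1], padded[i + 2])
                  for i in range(len(signal))]
    return signal

def error(genome, noisy, clean):
    """Fitness: sum of absolute differences between filtered and clean signals."""
    return sum(abs(a - b) for a, b in zip(run_pipeline(genome, noisy), clean))

def evolve(noisy, clean, stages=4, offspring=4, generations=50, seed=0):
    """(1 + offspring) evolution strategy: mutate one stage, keep the best so far."""
    rng = random.Random(seed)
    parent = ["pass"] * stages
    best = error(parent, noisy, clean)
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            child = list(parent)
            child[rng.randrange(stages)] = rng.choice(NAMES)
            children.append((error(child, noisy, clean), child))
        e, child = min(children, key=lambda pair: pair[0])
        if e <= best:  # elitist: the parent is never replaced by a worse child
            parent, best = child, e
    return parent, best

# Square wave corrupted by salt-and-pepper noise (seeded for repeatability).
noise_rng = random.Random(1)
clean = [100 + 50 * ((i // 8) % 2) for i in range(64)]
noisy = [noise_rng.choice((0, 255)) if noise_rng.random() < 0.1 else v for v in clean]
genome, err = evolve(noisy, clean)
print("evolved chain:", genome, "error:", err)
```

Because selection is elitist, the evolved chain is never worse than the all-"pass" starting point on the training signal.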

Relevance:

80.00%

Publisher:

Abstract:

Graphics Processing Units have become a booster for the microelectronics industry. However, due to intellectual property issues, there is a serious lack of information on the implementation details of the hardware architecture behind GPUs. For instance, the way texture is handled and decompressed in a GPU to reduce bandwidth usage has never been dealt with in depth from a hardware point of view. This work presents a comparative study of the hardware implementation of different texture decompression algorithms for both conventional (PCs and video game consoles) and mobile platforms. Circuit synthesis is performed targeting both a reconfigurable hardware platform and a 90nm standard cell library. Area-delay trade-offs have been extensively analyzed, which allows us to compare the complexity of the decompressors and thus determine the suitability of the algorithms for systems with limited hardware resources.
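To give a sense of what a texture decompressor has to compute per block, here is a software sketch of decoding one S3TC/DXT1 block. The abstract does not say which algorithms were compared, so choosing DXT1 is an assumption made here for illustration; the helper names are hypothetical, and the integer rounding of the interpolated colours is one common choice that real decoders may not match exactly.

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a 16-bit RGB565 colour to 8 bits per channel by bit replication."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_dxt1_block(block):
    """Decode one 8-byte DXT1 block into 4 rows of 4 (R, G, B, A) tuples."""
    # Two little-endian RGB565 endpoints followed by sixteen 2-bit palette indices.
    c0, c1, bits = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-colour mode: two interpolated colours between the endpoints.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-colour mode: one midpoint colour plus transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]
    return [[palette[(bits >> (2 * (4 * y + x))) & 0x3] for x in range(4)]
            for y in range(4)]

# Red and blue endpoints; the first texel uses interpolated palette entry 2.
example = struct.pack("<HHI", 0xF800, 0x001F, 0b10)
print(decode_dxt1_block(example)[0])
```

The per-texel work is a table lookup plus, per block, a handful of small multiplications and divisions, which is the kind of cost an area-delay comparison of decompressors would weigh.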

Relevance:

80.00%

Publisher:

Abstract:

This thesis presents a novel framework for the analysis and optimization of the encoding and decoding delay for multiview video. The objective of this framework is to provide a systematic methodology for analyzing the delay in multiview encoders and decoders, together with useful tools for designing multiview encoders/decoders for applications with low-delay requirements. The proposed framework first characterizes the elements that influence delay performance: i) the multiview prediction structure, ii) the hardware model of the encoder/decoder, and iii) the frame processing times. Second, it provides algorithms for computing the encoding/decoding delay of any arbitrary multiview prediction structure. The core of the framework is a methodology for analyzing the multiview encoding/decoding delay that is independent of the hardware architecture of the encoder/decoder, complemented by a set of models that particularize this delay analysis to the characteristics of that hardware architecture. Among these models, those based on graph theory are especially relevant because they can decouple the influence of the different elements on the delay performance of the encoder/decoder by means of an abstraction of its processing capacity.
To reveal possible applications of this framework, this thesis presents some examples of its use in design problems that affect multiview encoders and decoders. This application scenario covers the following cases: strategies for designing prediction structures that take delay requirements into consideration in addition to rate-distortion performance; choosing the number of processors and analyzing processor speed requirements in multiview encoders/decoders given a target delay; and comparative analysis of the encoding delay performance of multiview encoders with different processing capabilities and hardware implementations.
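A graph-based delay computation of the kind described in this abstract can be sketched as a longest-path calculation over the prediction structure. The sketch assumes a hardware model with enough processors that a frame starts as soon as it has been captured and all its reference frames are finished, plus a single fixed processing time per frame; these assumptions, the two-view structure and the name `max_encoding_delay` are hypothetical and are not taken from the thesis.

```python
from functools import lru_cache

def max_encoding_delay(deps, capture_ms, proc_ms):
    """Worst-case delay between capturing a frame and finishing its encoding.

    deps: frame -> list of reference frames it is predicted from (a DAG).
    capture_ms: frame -> capture time in milliseconds.
    proc_ms: processing time per frame, assumed equal for all frames.
    """
    @lru_cache(maxsize=None)
    def done(frame):
        # A frame can start once it is captured and every reference is encoded.
        ready = max([capture_ms[frame]] + [done(ref) for ref in deps[frame]])
        return ready + proc_ms

    return max(done(f) - capture_ms[f] for f in deps)

# Two views, three time instants, one frame every 40 ms.
# View 0 is predicted temporally; view 1 also references view 0 at the same instant.
FRAME_PERIOD_MS = 40
deps, capture = {}, {}
for t in range(3):
    capture[(0, t)] = capture[(1, t)] = t * FRAME_PERIOD_MS
    deps[(0, t)] = [(0, t - 1)] if t else []
    deps[(1, t)] = [(0, t)] + ([(1, t - 1)] if t else [])

print(max_encoding_delay(deps, capture, proc_ms=30))  # 60
print(max_encoding_delay(deps, capture, proc_ms=50))  # 120
```

In the second call the processing time exceeds the frame period, so the delay accumulates from frame to frame. A model with a limited number of processors would add a scheduling step that this sketch leaves out.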

Relevance:

80.00%

Publisher:

Abstract:

We present a framework for the analysis of the decoding delay in multiview video coding (MVC). We show that in real-time applications, an accurate estimation of the decoding delay is essential to achieve a minimum communication latency. As opposed to single-view codecs, the complexity of the multiview prediction structure and the parallel decoding of several views require a systematic analysis of this decoding delay, which we carry out using graph theory and a model of the decoder hardware architecture. Our framework assumes a decoder implementation on general-purpose multi-core processors with multi-threading capabilities. For this hardware model, we show that frame processing times depend on the computational load of the decoder, and we provide an iterative algorithm to jointly compute frame processing times and decoding delay. Finally, we show that decoding delay analysis can be applied to design decoders with the objective of minimizing the communication latency of the MVC system.

Relevance:

80.00%

Publisher:

Abstract:

Data acquisition systems used in the diagnostics of thermonuclear fusion devices face important challenges due to the change in the data acquisition paradigm needed for long-pulse operation. Even in short-pulse devices, where data is mainly analyzed after the discharge has finished, a large amount of data remains unanalyzed, so a great deal of knowledge still lies undiscovered in the existing databases. Over the last decade the fusion community has made a strong effort to improve offline analysis methods to overcome this problem, but this has proved insufficient unless some of these methods can be run in real time. In long-pulse devices this new paradigm, in which data acquisition devices include local processing capabilities so that they can run advanced data analysis algorithms, will be a must.
The research in this thesis aims to determine whether the local real-time processing capacity of such systems can be increased by using GPUs. To that end, during the experimentation period, various proposals were evaluated through real use cases developed for several of the most representative fusion devices: ITER, JET and TCV. The conclusions and experience obtained have made it possible to propose a model and a development methodology for including this technology in acquisition systems for diagnostics of different kinds. The model defines not only the optimal hardware architecture for achieving this integration, but also the incorporation of this new processing resource into EPICS, one of the Supervisory Control and Data Acquisition (SCADA) systems currently most relevant in the fusion community, providing a complete solution. The proposal is complemented by a methodology that addresses the weaknesses detected and charts a path for integrating the solution into existing hardware and software standards. The final evaluation was performed through a use case representative of diagnostics that require image acquisition and processing for the international ITER device, and it has been successfully tested on ITER's premises. The solution proposed in this thesis has been included by the ITER IO in its catalog of standard solutions for the development of its future diagnostics. This has been possible thanks to the technology transfer agreement signed with National Instruments, which has permitted us to modify and update one of their core software products targeted at the acquisition systems used in these devices.

Relevance:

80.00%

Publisher:

Abstract:

The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exist some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using the iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonically increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
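The role of a learning rate smaller than one in IRLS can be illustrated on a far simpler problem than a mixture-of-experts network. The sketch below fits a two-parameter binary logistic regression, which is an assumption made here for brevity and not the multiclass setting of the paper; the data set, the rate of 0.5 and the function names are illustrative.

```python
import math

def log_likelihood(beta, xs, ys):
    """Bernoulli log likelihood of a logistic model with intercept and slope."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta[0] + beta[1] * x
        total += y * z - math.log1p(math.exp(z))
    return total

def damped_irls(xs, ys, rate=0.5, iterations=20):
    """IRLS (Newton) updates scaled by a learning rate below one."""
    b0, b1 = 0.0, 0.0
    history = [log_likelihood((b0, b1), xs, ys)]
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)              # IRLS weight for this observation
            g0 += y - p                    # gradient, intercept
            g1 += (y - p) * x              # gradient, slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 weighted least-squares system for the full Newton step.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += rate * d0
        b1 += rate * d1
        history.append(log_likelihood((b0, b1), xs, ys))
    return (b0, b1), history

# Small, non-separable illustrative data set.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]
beta, history = damped_irls(xs, ys)
print("coefficients:", beta)
print("log likelihood, first and last:", history[0], history[-1])
```

On this data the recorded log likelihood rises at every iteration. That is what one run shows, not a general guarantee.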

Relevance:

80.00%

Publisher:

Abstract:

Recursive filters are widely used in image analysis due to their efficiency and simple implementation. However, these filters have an initialisation problem which either produces unusable results near the image boundaries or requires costly approximate solutions such as extending the boundary manually. In this paper, we describe a method for the recursive filtering of symmetrically extended images for filters with a symmetric denominator. We begin with an analysis of symmetric extensions and their effect on non-recursive filtering operators. Based on the non-recursive case, we derive a formulation of recursive filtering on symmetric domains as a linear but spatially varying implicit operator. We then give an efficient method for decomposing and solving the linear implicit system, along with a proof that this decomposition always exists. This decomposition needs to be performed only once for each dimension of the image. This yields a filter which is both stable and consistent with the ideal infinite extension. The filter is efficient, requiring less computation than standard recursive filtering. We give experimental evidence to verify these claims. (c) 2005 Elsevier B.V. All rights reserved.

Relevance:

80.00%

Publisher:

Abstract:

A biologically realizable, unsupervised learning rule is described for the online extraction of object features, suitable for solving a range of object recognition tasks. Alterations to the basic learning rule are proposed which allow the rule to better suit the parameters of a given input space. One negative consequence of such modifications is the potential for learning instability. The criteria for such instability are modeled using digital filtering techniques, and the predicted regions of stability and instability are tested. The result is a family of learning rules which can be tailored to the specific environment, improving both convergence times and accuracy over the standard learning rule, while simultaneously ensuring learning stability.

Relevance:

80.00%

Publisher:

Abstract:

The bispectrum and third-order moment can be viewed as equivalent tools for testing for the presence of nonlinearity in stationary time series. This is because the bispectrum is the Fourier transform of the third-order moment. An advantage of the bispectrum is that its estimator comprises terms that are asymptotically independent at distinct bifrequencies under the null hypothesis of linearity. An advantage of the third-order moment is that its values in any subset of joint lags can be used in the test, whereas when using the bispectrum the entire (or truncated) third-order moment is required to construct the Fourier transform. In this paper, we propose a test for nonlinearity based upon the estimated third-order moment. We use the phase scrambling bootstrap method to give a nonparametric estimate of the variance of our test statistic under the null hypothesis. Using a simulation study, we demonstrate that the test attains its target significance level, with large power, when compared to an existing standard parametric test that uses the bispectrum. Further, we show how the proposed test can be used to identify the source of nonlinearity due to interactions at specific frequencies. We also investigate implications for heuristic diagnosis of nonstationarity.
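The quantity on which the proposed test is built, the estimated third-order moment at a pair of lags, can be computed directly in the time domain. The sketch below is a plain sample estimator written for illustration; the name `third_order_moment`, the division by the full series length and the restriction to non-negative lags are choices made here, and the phase scrambling bootstrap from the abstract is not implemented.

```python
def third_order_moment(x, r, s):
    """Sample estimate of E[x(t) x(t+r) x(t+s)] after removing the mean.

    r and s are non-negative lags; the sum is divided by the full series
    length, which gives the usual biased estimator.
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m = n - max(r, s)  # number of complete triples available
    return sum(z[t] * z[t + r] * z[t + s] for t in range(m)) / n

symmetric = [1.0, -1.0] * 50        # symmetric about its mean
skewed = [2.0, -1.0, -1.0] * 40     # zero mean but asymmetric

print(third_order_moment(symmetric, 0, 0))  # 0.0
print(third_order_moment(skewed, 0, 0))     # 2.0
print(third_order_moment(skewed, 1, 2))
```

Because each pair of lags is estimated separately, a test can use any subset of them, which is the advantage over the bispectrum that the abstract points out.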

Relevance:

80.00%

Publisher:

Abstract:

This paper explores the potential for the RAMpage memory hierarchy to use a microkernel with a small memory footprint in a specialized cache-speed static RAM (tightly-coupled memory, TCM). Dreamy memory is DRAM kept in low-power mode unless referenced. Simulations show that a small microkernel suits RAMpage well, in that it achieves significantly better speed and energy gains than a standard hierarchy from adding TCM. RAMpage, in its best 128KB L2 case, gained 11% speed using TCM and reduced energy by 14%. Equivalent conventional hierarchy gains were under 1%. While a 1MB L2 was significantly faster than the lower-energy cases with the smaller L2, the larger SRAM's energy does not justify the speed gain. Using a 128KB L2 cache in a conventional architecture resulted in a best-case overall run time of 2.58s, compared with the best dreamy mode run time (RAMpage without context switches on misses) of 3.34s, a speed penalty of 29%. Energy in the fastest 128KB L2 case was 2.18J vs. 1.50J, a reduction of 31%. The same RAMpage configuration without dreamy mode took 2.83s as simulated and used 2.39J, an acceptable trade-off (penalty under 10%) for being able to switch easily to a lower-energy mode.
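The percentages quoted in this abstract can be rechecked from the run times and energies it gives. The short calculation below assumes that the 2.18J figure belongs to the conventional 128KB L2 configuration and the 1.50J figure to the best dreamy-mode run; that pairing is a reading of the abstract, and the variable names are made up here.

```python
def percent_change(new, baseline):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

conventional_s, conventional_j = 2.58, 2.18   # conventional hierarchy, 128KB L2
dreamy_s, dreamy_j = 3.34, 1.50               # RAMpage, dreamy mode
rampage_s, rampage_j = 2.83, 2.39             # same RAMpage setup, no dreamy mode

print("dreamy speed penalty:   %.1f%%" % percent_change(dreamy_s, conventional_s))
print("dreamy energy saving:   %.1f%%" % -percent_change(dreamy_j, conventional_j))
print("RAMpage time penalty:   %.1f%%" % percent_change(rampage_s, conventional_s))
print("RAMpage energy penalty: %.1f%%" % percent_change(rampage_j, conventional_j))
```

Under that reading the figures come out at roughly 29%, 31%, and just under 10% for both non-dreamy penalties, which agrees with the abstract.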

Relevance:

80.00%

Publisher:

Abstract:

Information security devices must preserve security properties even in the presence of faults. This in turn requires a rigorous evaluation of the system behaviours resulting from component failures, especially how such failures affect information flow. We introduce a compositional method of static analysis for fail-secure behaviour. Our method uses reachability matrices to identify potentially undesirable information flows based on the fault modes of the system's components.
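The use of reachability matrices to spot undesirable information flows under component faults can be sketched as follows. This is a whole-system toy and not the compositional method of the paper: the three-component device, the fault modes and the rule that a fault only adds flow edges are hypothetical assumptions made for illustration.

```python
def reachability(n, edges):
    """Boolean reachability matrix of a directed graph (Warshall's algorithm)."""
    reach = [[i == j for j in range(n)] for i in range(n)]
    for a, b in edges:
        reach[a][b] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def insecure_fault_modes(n, normal_edges, fault_modes, source, sink):
    """Names of fault modes under which information can flow from source to sink."""
    bad = []
    for name, extra_edges in fault_modes.items():
        reach = reachability(n, list(normal_edges) + list(extra_edges))
        if reach[source][sink]:
            bad.append(name)
    return bad

# Hypothetical one-way device: LOW -> DIODE -> HIGH is the only intended flow.
LOW, DIODE, HIGH = 0, 1, 2
normal = [(LOW, DIODE), (DIODE, HIGH)]
faults = {
    "no_fault": [],
    "diode_short": [(HIGH, DIODE), (DIODE, LOW)],  # fault opens a reverse path
    "high_stuck": [(HIGH, HIGH)],                  # fault adds no new path
}
print(insecure_fault_modes(3, normal, faults, source=HIGH, sink=LOW))
```

Only the fault mode that introduces a reverse path is flagged, so the device in this toy is fail-secure with respect to the other listed fault.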

Relevance:

80.00%

Publisher:

Abstract:

The notorious "dimensionality curse" is a well-known phenomenon for any multi-dimensional index attempting to scale up to high dimensions. One well-known approach to overcoming the degradation in performance as dimensionality increases is to reduce the dimensionality of the original dataset before constructing the index. However, identifying the correlation among the dimensions and effectively reducing them are challenging tasks. In this paper, we present an adaptive Multi-level Mahalanobis-based Dimensionality Reduction (MMDR) technique for high-dimensional indexing. Our MMDR technique has four notable features compared to existing methods. First, it discovers elliptical clusters for more effective dimensionality reduction by using only the low-dimensional subspaces. Second, data points in the different axis systems are indexed using a single B+-tree. Third, our technique is highly scalable in terms of data size and dimension. Finally, it is also dynamic and adaptive to insertions. An extensive performance study was conducted using both real and synthetic datasets, and the results show that our technique not only achieves higher precision, but also enables queries to be processed efficiently. Copyright Springer-Verlag 2005
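The Mahalanobis distance behind the elliptical clusters mentioned in this abstract weights each direction by the spread of the cluster along it. The two-dimensional sketch below only illustrates that distance; the function name and the example covariance are made up here, and the multi-level reduction and B+-tree indexing of MMDR are not shown.

```python
import math

def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a cluster with given mean and covariance."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form with the inverse covariance (1/det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

# Elongated cluster: variance 4 along x, variance 1 along y.
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))
print(mahalanobis_2d((2.0, 0.0), mean, cov))  # 1.0
print(mahalanobis_2d((0.0, 2.0), mean, cov))  # 2.0
```

Both points are at Euclidean distance 2 from the centre, yet the one lying along the long axis of the ellipse is closer in the Mahalanobis sense, which is why this distance suits elongated clusters.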

Relevance:

80.00%

Publisher:

Abstract:

All signals that appear to be periodic have some sort of variability from period to period, regardless of how stable they appear to be in a data plot. A true sinusoidal time series is a deterministic function of time that never changes and thus has zero bandwidth around the sinusoid's frequency. A zero bandwidth is impossible in nature since all signals have some intrinsic variability over time. Deterministic sinusoids are used to model cycles as a mathematical convenience. Hinich [IEEE J. Oceanic Eng. 25 (2) (2000) 256-261] introduced a parametric statistical model, called the randomly modulated periodicity (RMP), that allows one to capture the intrinsic variability of a cycle. As with a deterministic periodic signal, the RMP can have a number of harmonics. The likelihood ratio test for this model when the amplitudes and phases are known is given in [M.J. Hinich, Signal Processing 83 (2003) 1349-1352]. A method for detecting an RMP whose amplitudes and phases are unknown random processes plus a stationary noise process is addressed in this paper. The only assumption on the additive noise is that it has finite dependence and finite moments. Using simulations based on a simple RMP model, we show a case where the new method can detect the signal when the signal is not detectable in a standard waterfall spectrogram display. (c) 2005 Elsevier B.V. All rights reserved.