917 resultados para Parallel or distributed processing
Resumo:
We present a technique to estimate accurate speedups for parallel logic programs with relative independence from characteristics of a given implementation or underlying parallel hardware. The proposed technique is based on gathering accurate data describing one execution at run-time, which is fed to a simulator. Alternative schedulings are then simulated and estimates computed for the corresponding speedups. A tool implementing the aforementioned techniques is presented, and its predictions are compared to the performance of real systems, showing good correlation.
Resumo:
In recent years a lot of research has been invested in parallel processing of numerical applications. However, parallel processing of Symbolic and AI applications has received less attention. This paper presents a system for parallel symbolic computitig, narned ACE, based on the logic programming paradigm. ACE is a computational model for the full Prolog language, capable of exploiting Or-parall< lism and Independent And-parallelism. In this paper vve focus on the implementation of the and-parallel part of the ACE system (ralled &ACE) on a shared memory multiprocessor, d< scribing its organization, some optimizations, and presenting some performance figures, proving the abilhy of &ACE to efficiently exploit parallelism.
Resumo:
In this paper we present a novel execution model for parallel implementation of logic programs which is capable of exploiting both independent and-parallelism and or-parallelism in an efficient way. This model extends the stack copying approach, which has been successfully applied in the Muse system to implement or-parallelism, by integrating it with proven techniques used to support independent and-parallelism. We show how all solutions to non-deterministic andparallel goals are found without repetitions. This is done through recomputation as in Prolog (and in various and-parallel systems, like &-Prolog and DDAS), i.e., solutions of and-parallel goals are not shared. We propose a scheme for the efficient management of the address space in a way that is compatible with the apparently incompatible requirements of both and- and or-parallelism. We also show how the full Prolog language, with all its extra-logical features, can be supported in our and-or parallel system so that its sequential semantics is preserved. The resulting system retains the advantages of both purely or-parallel systems as well as purely and-parallel systems. The stack copying scheme together with our proposed memory management scheme can also be used to implement models that combine dependent and-parallelism and or-parallelism, such as Andorra and Prometheus.
Resumo:
We argüe that in order to exploit both Independent And- and Or-parallelism in Prolog programs there is advantage in recomputing some of the independent goals, as opposed to all their solutions being reused. We present an abstract model, called the Composition- Tree, for representing and-or parallelism in Prolog Programs. The Composition-tree closely mirrors sequential Prolog execution by recomputing some independent goals rather than fully re-using them. We also outline two environment representation techniques for And-Or parallel execution of full Prolog based on the Composition-tree model abstraction. We argüe that these techniques have advantages over earlier proposals for exploiting and-or parallelism in Prolog.
Resumo:
In recent future, wireless sensor networks (WSNs) will experience a broad high-scale deployment (millions of nodes in the national area) with multiple information sources per node, and with very specific requirements for signal processing. In parallel, the broad range deployment of WSNs facilitates the definition and execution of ambitious studies, with a large input data set and high computational complexity. These computation resources, very often heterogeneous and driven on-demand, can only be satisfied by high-performance Data Centers (DCs). The high economical and environmental impact of the energy consumption in DCs requires aggressive energy optimization policies. These policies have been already detected but not successfully proposed. In this context, this paper shows the following on-going research lines and obtained results. In the field of WSNs: energy optimization in the processing nodes from different abstraction levels, including reconfigurable application specific architectures, efficient customization of the memory hierarchy, energy-aware management of the wireless interface, and design automation for signal processing applications. In the field of DCs: energy-optimal workload assignment policies in heterogeneous DCs, resource management policies with energy consciousness, and efficient cooling mechanisms that will cooperate in the minimization of the electricity bill of the DCs that process the data provided by the WSNs.
Resumo:
In recent future, wireless sensor networks ({WSNs}) will experience a broad high-scale deployment (millions of nodes in the national area) with multiple information sources per node, and with very specific requirements for signal processing. In parallel, the broad range deployment of {WSNs} facilitates the definition and execution of ambitious studies, with a large input data set and high computational complexity. These computation resources, very often heterogeneous and driven on-demand, can only be satisfied by high-performance Data Centers ({DCs}). The high economical and environmental impact of the energy consumption in {DCs} requires aggressive energy optimization policies. These policies have been already detected but not successfully proposed. In this context, this paper shows the following on-going research lines and obtained results. In the field of {WSNs}: energy optimization in the processing nodes from different abstraction levels, including reconfigurable application specific architectures, efficient customization of the memory hierarchy, energy-aware management of the wireless interface, and design automation for signal processing applications. In the field of {DCs}: energy-optimal workload assignment policies in heterogeneous {DCs}, resource management policies with energy consciousness, and efficient cooling mechanisms that will cooperate in the minimization of the electricity bill of the DCs that process the data provided by the WSNs.
Resumo:
We study a cognitive radio scenario in which the network of sec- ondary users wishes to identify which primary user, if any, is trans- mitting. To achieve this, the nodes will rely on some form of location information. In our previous work we proposed two fully distributed algorithms for this task, with and without a pre-detection step, using propagation parameters as the only source of location information. In a real distributed deployment, each node must estimate its own po- sition and/or propagation parameters. Hence, in this work we study the effect of uncertainty, or error in these estimates on the proposed distributed identification algorithms. We show that the pre-detection step significantly increases robustness against uncertainty in nodes' locations.
Resumo:
With the advent of cloud computing, many applications have embraced the ensuing paradigm shift towards modern distributed key-value data stores, like HBase, in order to benefit from the elastic scalability on offer. However, many applications still hesitate to make the leap from the traditional relational database model simply because they cannot compromise on the standard transactional guarantees of atomicity, isolation, and durability. To get the best of both worlds, one option is to integrate an independent transaction management component with a distributed key-value store. In this paper, we discuss the implications of this approach for durability. In particular, if the transaction manager provides durability (e.g., through logging), then we can relax durability constraints in the key-value store. However, if a component fails (e.g., a client or a key-value server), then we need a coordinated recovery procedure to ensure that commits are persisted correctly. In our research, we integrate an independent transaction manager with HBase. Our main contribution is a failure recovery middleware for the integrated system, which tracks the progress of each commit as it is flushed down by the client and persisted within HBase, so that we can recover reliably from failures. During recovery, commits that were interrupted by the failure are replayed from the transaction management log. Importantly, the recovery process does not interrupt transaction processing on the available servers. Using a benchmark, we evaluate the impact of component failure, and subsequent recovery, on application performance.
Resumo:
Esta Tesis aborda los problemas de eficiencia de las redes eléctrica desde el punto de vista del consumo. En particular, dicha eficiencia es mejorada mediante el suavizado de la curva de consumo agregado. Este objetivo de suavizado de consumo implica dos grandes mejoras en el uso de las redes eléctricas: i) a corto plazo, un mejor uso de la infraestructura existente y ii) a largo plazo, la reducción de la infraestructura necesaria para suplir las mismas necesidades energéticas. Además, esta Tesis se enfrenta a un nuevo paradigma energético, donde la presencia de generación distribuida está muy extendida en las redes eléctricas, en particular, la generación fotovoltaica (FV). Este tipo de fuente energética afecta al funcionamiento de la red, incrementando su variabilidad. Esto implica que altas tasas de penetración de electricidad de origen fotovoltaico es perjudicial para la estabilidad de la red eléctrica. Esta Tesis trata de suavizar la curva de consumo agregado considerando esta fuente energética. Por lo tanto, no sólo se mejora la eficiencia de la red eléctrica, sino que también puede ser aumentada la penetración de electricidad de origen fotovoltaico en la red. Esta propuesta conlleva grandes beneficios en los campos económicos, social y ambiental. Las acciones que influyen en el modo en que los consumidores hacen uso de la electricidad con el objetivo producir un ahorro energético o un aumento de eficiencia son llamadas Gestión de la Demanda Eléctrica (GDE). Esta Tesis propone dos algoritmos de GDE diferentes para cumplir con el objetivo de suavizado de la curva de consumo agregado. La diferencia entre ambos algoritmos de GDE reside en el marco en el cual estos tienen lugar: el marco local y el marco de red. Dependiendo de este marco de GDE, el objetivo energético y la forma en la que se alcanza este objetivo son diferentes. En el marco local, el algoritmo de GDE sólo usa información local. Este no tiene en cuenta a otros consumidores o a la curva de consumo agregado de la red eléctrica. Aunque esta afirmación pueda diferir de la definición general de GDE, esta vuelve a tomar sentido en instalaciones locales equipadas con Recursos Energéticos Distribuidos (REDs). En este caso, la GDE está enfocada en la maximización del uso de la energía local, reduciéndose la dependencia con la red. El algoritmo de GDE propuesto mejora significativamente el auto-consumo del generador FV local. Experimentos simulados y reales muestran que el auto-consumo es una importante estrategia de gestión energética, reduciendo el transporte de electricidad y alentando al usuario a controlar su comportamiento energético. Sin embargo, a pesar de todas las ventajas del aumento de auto-consumo, éstas no contribuyen al suavizado del consumo agregado. Se han estudiado los efectos de las instalaciones locales en la red eléctrica cuando el algoritmo de GDE está enfocado en el aumento del auto-consumo. Este enfoque puede tener efectos no deseados, incrementando la variabilidad en el consumo agregado en vez de reducirlo. Este efecto se produce porque el algoritmo de GDE sólo considera variables locales en el marco local. Los resultados sugieren que se requiere una coordinación entre las instalaciones. A través de esta coordinación, el consumo debe ser modificado teniendo en cuenta otros elementos de la red y buscando el suavizado del consumo agregado. En el marco de la red, el algoritmo de GDE tiene en cuenta tanto información local como de la red eléctrica. En esta Tesis se ha desarrollado un algoritmo autoorganizado para controlar el consumo de la red eléctrica de manera distribuida. El objetivo de este algoritmo es el suavizado del consumo agregado, como en las implementaciones clásicas de GDE. El enfoque distribuido significa que la GDE se realiza desde el lado de los consumidores sin seguir órdenes directas emitidas por una entidad central. Por lo tanto, esta Tesis propone una estructura de gestión paralela en lugar de una jerárquica como en las redes eléctricas clásicas. Esto implica que se requiere un mecanismo de coordinación entre instalaciones. Esta Tesis pretende minimizar la cantidad de información necesaria para esta coordinación. Para lograr este objetivo, se han utilizado dos técnicas de coordinación colectiva: osciladores acoplados e inteligencia de enjambre. La combinación de estas técnicas para llevar a cabo la coordinación de un sistema con las características de la red eléctrica es en sí mismo un enfoque novedoso. Por lo tanto, este objetivo de coordinación no es sólo una contribución en el campo de la gestión energética, sino también en el campo de los sistemas colectivos. Los resultados muestran que el algoritmo de GDE propuesto reduce la diferencia entre máximos y mínimos de la red eléctrica en proporción a la cantidad de energía controlada por el algoritmo. Por lo tanto, conforme mayor es la cantidad de energía controlada por el algoritmo, mayor es la mejora de eficiencia en la red eléctrica. Además de las ventajas resultantes del suavizado del consumo agregado, otras ventajas surgen de la solución distribuida seguida en esta Tesis. Estas ventajas se resumen en las siguientes características del algoritmo de GDE propuesto: • Robustez: en un sistema centralizado, un fallo o rotura del nodo central provoca un mal funcionamiento de todo el sistema. La gestión de una red desde un punto de vista distribuido implica que no existe un nodo de control central. Un fallo en cualquier instalación no afecta el funcionamiento global de la red. • Privacidad de datos: el uso de una topología distribuida causa de que no hay un nodo central con información sensible de todos los consumidores. Esta Tesis va más allá y el algoritmo propuesto de GDE no utiliza información específica acerca de los comportamientos de los consumidores, siendo la coordinación entre las instalaciones completamente anónimos. • Escalabilidad: el algoritmo propuesto de GDE opera con cualquier número de instalaciones. Esto implica que se permite la incorporación de nuevas instalaciones sin afectar a su funcionamiento. • Bajo coste: el algoritmo de GDE propuesto se adapta a las redes actuales sin requisitos topológicos. Además, todas las instalaciones calculan su propia gestión con un bajo requerimiento computacional. Por lo tanto, no se requiere un nodo central con un alto poder de cómputo. • Rápido despliegue: las características de escalabilidad y bajo coste de los algoritmos de GDE propuestos permiten una implementación rápida. No se requiere una planificación compleja para el despliegue de este sistema. ABSTRACT This Thesis addresses the efficiency problems of the electrical grids from the consumption point of view. In particular, such efficiency is improved by means of the aggregated consumption smoothing. This objective of consumption smoothing entails two major improvements in the use of electrical grids: i) in the short term, a better use of the existing infrastructure and ii) in long term, the reduction of the required infrastructure to supply the same energy needs. In addition, this Thesis faces a new energy paradigm, where the presence of distributed generation is widespread over the electrical grids, in particular, the Photovoltaic (PV) generation. This kind of energy source affects to the operation of the grid by increasing its variability. This implies that a high penetration rate of photovoltaic electricity is pernicious for the electrical grid stability. This Thesis seeks to smooth the aggregated consumption considering this energy source. Therefore, not only the efficiency of the electrical grid is improved, but also the penetration of photovoltaic electricity into the grid can be increased. This proposal brings great benefits in the economic, social and environmental fields. The actions that influence the way that consumers use electricity in order to achieve energy savings or higher efficiency in energy use are called Demand-Side Management (DSM). This Thesis proposes two different DSM algorithms to meet the aggregated consumption smoothing objective. The difference between both DSM algorithms lie in the framework in which they take place: the local framework and the grid framework. Depending on the DSM framework, the energy goal and the procedure to reach this goal are different. In the local framework, the DSM algorithm only uses local information. It does not take into account other consumers or the aggregated consumption of the electrical grid. Although this statement may differ from the general definition of DSM, it makes sense in local facilities equipped with Distributed Energy Resources (DERs). In this case, the DSM is focused on the maximization of the local energy use, reducing the grid dependence. The proposed DSM algorithm significantly improves the self-consumption of the local PV generator. Simulated and real experiments show that self-consumption serves as an important energy management strategy, reducing the electricity transport and encouraging the user to control his energy behavior. However, despite all the advantages of the self-consumption increase, they do not contribute to the smooth of the aggregated consumption. The effects of the local facilities on the electrical grid are studied when the DSM algorithm is focused on self-consumption maximization. This approach may have undesirable effects, increasing the variability in the aggregated consumption instead of reducing it. This effect occurs because the algorithm only considers local variables in the local framework. The results suggest that coordination between these facilities is required. Through this coordination, the consumption should be modified by taking into account other elements of the grid and seeking for an aggregated consumption smoothing. In the grid framework, the DSM algorithm takes into account both local and grid information. This Thesis develops a self-organized algorithm to manage the consumption of an electrical grid in a distributed way. The goal of this algorithm is the aggregated consumption smoothing, as the classical DSM implementations. The distributed approach means that the DSM is performed from the consumers side without following direct commands issued by a central entity. Therefore, this Thesis proposes a parallel management structure rather than a hierarchical one as in the classical electrical grids. This implies that a coordination mechanism between facilities is required. This Thesis seeks for minimizing the amount of information necessary for this coordination. To achieve this objective, two collective coordination techniques have been used: coupled oscillators and swarm intelligence. The combination of these techniques to perform the coordination of a system with the characteristics of the electric grid is itself a novel approach. Therefore, this coordination objective is not only a contribution in the energy management field, but in the collective systems too. Results show that the proposed DSM algorithm reduces the difference between the maximums and minimums of the electrical grid proportionally to the amount of energy controlled by the system. Thus, the greater the amount of energy controlled by the algorithm, the greater the improvement of the efficiency of the electrical grid. In addition to the advantages resulting from the smoothing of the aggregated consumption, other advantages arise from the distributed approach followed in this Thesis. These advantages are summarized in the following features of the proposed DSM algorithm: • Robustness: in a centralized system, a failure or breakage of the central node causes a malfunction of the whole system. The management of a grid from a distributed point of view implies that there is not a central control node. A failure in any facility does not affect the overall operation of the grid. • Data privacy: the use of a distributed topology causes that there is not a central node with sensitive information of all consumers. This Thesis goes a step further and the proposed DSM algorithm does not use specific information about the consumer behaviors, being the coordination between facilities completely anonymous. • Scalability: the proposed DSM algorithm operates with any number of facilities. This implies that it allows the incorporation of new facilities without affecting its operation. • Low cost: the proposed DSM algorithm adapts to the current grids without any topological requirements. In addition, every facility calculates its own management with low computational requirements. Thus, a central computational node with a high computational power is not required. • Quick deployment: the scalability and low cost features of the proposed DSM algorithms allow a quick deployment. A complex schedule of the deployment of this system is not required.
Resumo:
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, function of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells has only very recently been proposed (Jerusalem et al., 2013). In this paper, we present the implementation details of Neurite: the finite difference parallel program used in this reference. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite-explicit and implicit-were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between lectrophysiology and mechanics (Jerusalem et al., 2013). This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted.
Resumo:
This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying, and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. One problem related to the task of the automatic unification of different scores of sentiment lexicons is that there are multiple lexical entries for which the classification of positive, negative, or neutral {P, Z, N} depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient which measures how correlated lexical entries are with a value between 1 and -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated and so is the UnifiedMetrics procedure for CPU and GPU, respectively. Another problem is the high processing time required for computing all the lexical entries in the unification task. Thus, the USL approach computes a subset of lexical entries in each of the 1344 GPU cores and uses parallel processing in order to unify 155802 lexical entries. The results of the analysis conducted using the USL approach show that the USL has 95.430 lexical entries, out of which there are 35.201 considered to be positive, 22.029 negative, and 38.200 neutral. Finally, the runtime was 10 minutes for 95.430 lexical entries; this allows a reduction of the time computing for the UnifiedMetrics by 3 times.
Resumo:
Everybody has to coordinate several tasks everyday, usually in a manual manner. Recently, the concept of Task Automation Services has been introduced to automate and personalize the task coordination problem. Several user centered platforms and applications have arisen in the last years, that let their users configure their very own automations based on third party services. In this paper, we propose a new system architecture for Task Automation Services in a heterogeneous mobile, smart devices, and cloud services environment. Our architecture is based on the novel idea to employ distributed Complex Event Processing to implement innovative mixed execution profiles. The major advantage of the approach is its ability to incorporate context-awareness and real-time coordination in Task Automation Services.
Resumo:
We introduce the need for a distributed guideline-based decision sup-port (DSS) process, describe its characteristics, and explain how we implement-ed this process within the European Union?s MobiGuide project. In particular, we have developed a mechanism of sequential, piecemeal projection, i.e., 'downloading' small portions of the guideline from the central DSS server, to the local DSS in the patient's mobile device, which then applies that portion, us-ing the mobile device's local resources. The mobile device sends a callback to the central DSS when it encounters a triggering pattern predefined in the pro-jected module, which leads to an appropriate predefined action by the central DSS, including sending a new projected module, or directly controlling the rest of the workflow. We suggest that such a distributed architecture that explicitly defines a dialog between a central DSS server and a local DSS module, better balances the computational load and exploits the relative advantages of the cen-tral server and of the local mobile device.
Resumo:
In this work we review some earlier distributed algorithms developed by the authors and collaborators, which are based on two different approaches, namely, distributed moment estimation and distributed stochastic approximations. We show applications of these algorithms on image compression, linear classification and stochastic optimal control. In all cases, the benefit of cooperation is clear: even when the nodes have access to small portions of the data, by exchanging their estimates, they achieve the same performance as that of a centralized architecture, which would gather all the data from all the nodes.
Resumo:
La evolución de los teléfonos móviles inteligentes, dotados de cámaras digitales, está provocando una creciente demanda de aplicaciones cada vez más complejas que necesitan algoritmos de visión artificial en tiempo real; puesto que el tamaño de las señales de vídeo no hace sino aumentar y en cambio el rendimiento de los procesadores de un solo núcleo se ha estancado, los nuevos algoritmos que se diseñen para visión artificial han de ser paralelos para poder ejecutarse en múltiples procesadores y ser computacionalmente escalables. Una de las clases de procesadores más interesantes en la actualidad se encuentra en las tarjetas gráficas (GPU), que son dispositivos que ofrecen un alto grado de paralelismo, un excelente rendimiento numérico y una creciente versatilidad, lo que los hace interesantes para llevar a cabo computación científica. En esta tesis se exploran dos aplicaciones de visión artificial que revisten una gran complejidad computacional y no pueden ser ejecutadas en tiempo real empleando procesadores tradicionales. En cambio, como se demuestra en esta tesis, la paralelización de las distintas subtareas y su implementación sobre una GPU arrojan los resultados deseados de ejecución con tasas de refresco interactivas. Asimismo, se propone una técnica para la evaluación rápida de funciones de complejidad arbitraria especialmente indicada para su uso en una GPU. En primer lugar se estudia la aplicación de técnicas de síntesis de imágenes virtuales a partir de únicamente dos cámaras lejanas y no paralelas—en contraste con la configuración habitual en TV 3D de cámaras cercanas y paralelas—con información de color y profundidad. Empleando filtros de mediana modificados para la elaboración de un mapa de profundidad virtual y proyecciones inversas, se comprueba que estas técnicas son adecuadas para una libre elección del punto de vista. Además, se demuestra que la codificación de la información de profundidad con respecto a un sistema de referencia global es sumamente perjudicial y debería ser evitada. Por otro lado se propone un sistema de detección de objetos móviles basado en técnicas de estimación de densidad con funciones locales. Este tipo de técnicas es muy adecuada para el modelado de escenas complejas con fondos multimodales, pero ha recibido poco uso debido a su gran complejidad computacional. El sistema propuesto, implementado en tiempo real sobre una GPU, incluye propuestas para la estimación dinámica de los anchos de banda de las funciones locales, actualización selectiva del modelo de fondo, actualización de la posición de las muestras de referencia del modelo de primer plano empleando un filtro de partículas multirregión y selección automática de regiones de interés para reducir el coste computacional. Los resultados, evaluados sobre diversas bases de datos y comparados con otros algoritmos del estado del arte, demuestran la gran versatilidad y calidad de la propuesta. Finalmente se propone un método para la aproximación de funciones arbitrarias empleando funciones continuas lineales a tramos, especialmente indicada para su implementación en una GPU mediante el uso de las unidades de filtraje de texturas, normalmente no utilizadas para cómputo numérico. La propuesta incluye un riguroso análisis matemático del error cometido en la aproximación en función del número de muestras empleadas, así como un método para la obtención de una partición cuasióptima del dominio de la función para minimizar el error. ABSTRACT The evolution of smartphones, all equipped with digital cameras, is driving a growing demand for ever more complex applications that need to rely on real-time computer vision algorithms. However, video signals are only increasing in size, whereas the performance of single-core processors has somewhat stagnated in the past few years. Consequently, new computer vision algorithms will need to be parallel to run on multiple processors and be computationally scalable. One of the most promising classes of processors nowadays can be found in graphics processing units (GPU). These are devices offering a high parallelism degree, excellent numerical performance and increasing versatility, which makes them interesting to run scientific computations. In this thesis, we explore two computer vision applications with a high computational complexity that precludes them from running in real time on traditional uniprocessors. However, we show that by parallelizing subtasks and implementing them on a GPU, both applications attain their goals of running at interactive frame rates. In addition, we propose a technique for fast evaluation of arbitrarily complex functions, specially designed for GPU implementation. First, we explore the application of depth-image–based rendering techniques to the unusual configuration of two convergent, wide baseline cameras, in contrast to the usual configuration used in 3D TV, which are narrow baseline, parallel cameras. By using a backward mapping approach with a depth inpainting scheme based on median filters, we show that these techniques are adequate for free viewpoint video applications. In addition, we show that referring depth information to a global reference system is ill-advised and should be avoided. Then, we propose a background subtraction system based on kernel density estimation techniques. These techniques are very adequate for modelling complex scenes featuring multimodal backgrounds, but have not been so popular due to their huge computational and memory complexity. The proposed system, implemented in real time on a GPU, features novel proposals for dynamic kernel bandwidth estimation for the background model, selective update of the background model, update of the position of reference samples of the foreground model using a multi-region particle filter, and automatic selection of regions of interest to reduce computational cost. The results, evaluated on several databases and compared to other state-of-the-art algorithms, demonstrate the high quality and versatility of our proposal. Finally, we propose a general method for the approximation of arbitrarily complex functions using continuous piecewise linear functions, specially formulated for GPU implementation by leveraging their texture filtering units, normally unused for numerical computation. Our proposal features a rigorous mathematical analysis of the approximation error in function of the number of samples, as well as a method to obtain a suboptimal partition of the domain of the function to minimize approximation error.