910 resultados para implementations
Resumo:
A new representation of spatio-temporal random processes is proposed in this work. In practical applications, such processes are used to model velocity fields, temperature distributions, response of vibrating systems, to name a few. Finding an efficient representation for any random process leads to encapsulation of information which makes it more convenient for a practical implementations, for instance, in a computational mechanics problem. For a single-parameter process such as spatial or temporal process, the eigenvalue decomposition of the covariance matrix leads to the well-known Karhunen-Loeve (KL) decomposition. However, for multiparameter processes such as a spatio-temporal process, the covariance function itself can be defined in multiple ways. Here the process is assumed to be measured at a finite set of spatial locations and a finite number of time instants. Then the spatial covariance matrix at different time instants are considered to define the covariance of the process. This set of square, symmetric, positive semi-definite matrices is then represented as a third-order tensor. A suitable decomposition of this tensor can identify the dominant components of the process, and these components are then used to define a closed-form representation of the process. The procedure is analogous to the KL decomposition for a single-parameter process, however, the decompositions and interpretations vary significantly. The tensor decompositions are successfully applied on (i) a heat conduction problem, (ii) a vibration problem, and (iii) a covariance function taken from the literature that was fitted to model a measured wind velocity data. It is observed that the proposed representation provides an efficient approximation to some processes. Furthermore, a comparison with KL decomposition showed that the proposed method is computationally cheaper than the KL, both in terms of computer memory and execution time.
Resumo:
Designing and implementing thread-safe multithreaded libraries can be a daunting task as developers of these libraries need to ensure that their implementations are free from concurrency bugs, including deadlocks. The usual practice involves employing software testing and/or dynamic analysis to detect. deadlocks. Their effectiveness is dependent on well-designed multithreaded test cases. Unsurprisingly, developing multithreaded tests is significantly harder than developing sequential tests for obvious reasons. In this paper, we address the problem of automatically synthesizing multithreaded tests that can induce deadlocks. The key insight to our approach is that a subset of the properties observed when a deadlock manifests in a concurrent execution can also be observed in a single threaded execution. We design a novel, automatic, scalable and directed approach that identifies these properties and synthesizes a deadlock revealing multithreaded test. The input to our approach is the library implementation under consideration and the output is a set of deadlock revealing multithreaded tests. We have implemented our approach as part of a tool, named OMEN1. OMEN is able to synthesize multithreaded tests on many multithreaded Java libraries. Applying a dynamic deadlock detector on the execution of the synthesized tests results in the detection of a number of deadlocks, including 35 real deadlocks in classes documented as thread-safe. Moreover, our experimental results show that dynamic analysis on multithreaded tests that are either synthesized randomly or developed by third-party programmers are ineffective in detecting the deadlocks.
Resumo:
QR decomposition (QRD) is a widely used Numerical Linear Algebra (NLA) kernel with applications ranging from SONAR beamforming to wireless MIMO receivers. In this paper, we propose a novel Givens Rotation (GR) based QRD (GR QRD) where we reduce the computational complexity of GR and exploit higher degree of parallelism. This low complexity Column-wise GR (CGR) can annihilate multiple elements of a column of a matrix simultaneously. The algorithm is first realized on a Two-Dimensional (2 D) systolic array and then implemented on REDEFINE which is a Coarse Grained run-time Reconfigurable Architecture (CGRA). We benchmark the proposed implementation against state-of-the-art implementations to report better throughput, convergence and scalability.
Resumo:
The correctness of a hard real-time system depends its ability to meet all its deadlines. Existing real-time systems use either a pure real-time scheduler or a real-time scheduler embedded as a real-time scheduling class in the scheduler of an operating system (OS). Existing implementations of schedulers in multicore systems that support real-time and non-real-time tasks, permit the execution of non-real-time tasks in all the cores with priorities lower than those of real-time tasks, but interrupts and softirqs associated with these non-real-time tasks can execute in any core with priorities higher than those of real-time tasks. As a result, the execution overhead of real-time tasks is quite large in these systems, which, in turn, affects their runtime. In order that the hard real-time tasks can be executed in such systems with minimal interference from other Linux tasks, we propose, in this paper, an integrated scheduler architecture, called SchedISA, which aims to considerably reduce the execution overhead of real-time tasks in these systems. In order to test the efficacy of the proposed scheduler, we implemented partitioned earliest deadline first (P-EDF) scheduling algorithm in SchedISA on Linux kernel, version 3.8, and conducted experiments on Intel core i7 processor with eight logical cores. We compared the execution overhead of real-time tasks in the above implementation of SchedISA with that in SCHED_DEADLINE's P-EDF implementation, which concurrently executes real-time and non-real-time tasks in Linux OS in all the cores. The experimental results show that the execution overhead of real-time tasks in the above implementation of SchedISA is considerably less than that in SCHED_DEADLINE. We believe that, with further refinement of SchedISA, the execution overhead of real-time tasks in SchedISA can be reduced to a predictable maximum, making it suitable for scheduling hard real-time tasks without affecting the CPU share of Linux tasks.
Resumo:
Remote sensing of physiological parameters could be a cost effective approach to improving health care, and low-power sensors are essential for remote sensing because these sensors are often energy constrained. This paper presents a power optimized photoplethysmographic sensor interface to sense arterial oxygen saturation, a technique to dynamically trade off SNR for power during sensor operation, and a simple algorithm to choose when to acquire samples in photoplethysmography. A prototype of the proposed pulse oximeter built using commercial-off-the-shelf (COTS) components is tested on 10 adults. The dynamic adaptation techniques described reduce power consumption considerably compared to our reference implementation, and our approach is competitive to state-of-the-art implementations. The techniques presented in this paper may be applied to low-power sensor interface designs where acquiring samples is expensive in terms of power as epitomized by pulse oximetry.
Resumo:
This paper presents the experimental results for an attractive control scheme implementation using an 8 bit microcontroller. The power converter involved is a 3 phase full controlled bridge rectifier. A single quadrant DC drive has been realized and results have been presented for both open and closed loop implementations.
Resumo:
The charge-pump (CP) mismatch current is a dominant source of static phase error and reference spur in the nano-meter CMOS PLL implementations due to its worsened channel length modulation effect. This paper presents a charge-pump (CP) mismatch current reduction technique utilizing an adaptive body bias tuning of CP transistors and a zero CP mismatch current tracking PLL architecture for reference spur suppression. A chip prototype of the proposed circuit was implemented in 0.13 mu m CMOS technology. The frequency synthesizer consumes 8.2 mA current from a 13 V supply voltage and achieves a phase noise of -96.01 dBc/Hz @ 1 MHz offset from a 2.4 GHz RF carrier. The charge-pump measurements using the proposed calibration technique exhibited a mismatch current of less than 0.3 mu A (0.55%) over the VCO control voltage range of 0.3-1.0 V. The closed loop measurements show a minimized static phase error of within +/- 70 ps and a similar or equal to 9 dB reduction in reference spur level across the PLL output frequency range 2.4-2.5 GHz. The presented CP calibration technique compensates for the DC current mismatch and the mismatch due to channel length modulation effect and therefore improves the performance of CP-PLLs in nano-meter CMOS implementations. (C) 2015 Elsevier Ltd. All rights reserved.
Resumo:
Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy-even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.
Resumo:
This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each stage typically performs one of point-wise, stencil, reduction or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth preventing effective utilization of parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries like OpenCV or to optimize manually. While using libraries precludes optimization across library routines, manual optimization accounting for both parallelism and locality is very tedious. The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach primarily relies on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
Resumo:
This paper compares parallel and distributed implementations of an iterative, Gibbs sampling, machine learning algorithm. Distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM [1], in which parameters are learnt using an instance blocked Gibbs sampling, with a step consisting of a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However our focus here is on runtime performance, as opposed to NLP-relevant scores, embodied by iteration duration, ease of development, deployment and debugging. © 2010 IEEE.
Resumo:
Sequential Monte Carlo (SMC) methods are popular computational tools for Bayesian inference in non-linear non-Gaussian state-space models. For this class of models, we propose SMC algorithms to compute the score vector and observed information matrix recursively in time. We propose two different SMC implementations, one with computational complexity $\mathcal{O}(N)$ and the other with complexity $\mathcal{O}(N^{2})$ where $N$ is the number of importance sampling draws. Although cheaper, the performance of the $\mathcal{O}(N)$ method degrades quickly in time as it inherently relies on the SMC approximation of a sequence of probability distributions whose dimension is increasing linearly with time. In particular, even under strong \textit{mixing} assumptions, the variance of the estimates computed with the $\mathcal{O}(N)$ method increases at least quadratically in time. The $\mathcal{O}(N^{2})$ is a non-standard SMC implementation that does not suffer from this rapid degrade. We then show how both methods can be used to perform batch and recursive parameter estimation.
Resumo:
Optimal Bayesian multi-target filtering is in general computationally impractical owing to the high dimensionality of the multi-target state. The Probability Hypothesis Density (PHD) filter propagates the first moment of the multi-target posterior distribution. While this reduces the dimensionality of the problem, the PHD filter still involves intractable integrals in many cases of interest. Several authors have proposed Sequential Monte Carlo (SMC) implementations of the PHD filter. However, these implementations are the equivalent of the Bootstrap Particle Filter, and the latter is well known to be inefficient. Drawing on ideas from the Auxiliary Particle Filter (APF), a SMC implementation of the PHD filter which employs auxiliary variables to enhance its efficiency was proposed by Whiteley et. al. Numerical examples were presented for two scenarios, including a challenging nonlinear observation model, to support the claim. This paper studies the theoretical properties of this auxiliary particle implementation. $\mathbb{L}_p$ error bounds are established from which almost sure convergence follows.
Resumo:
Optimal Bayesian multi-target filtering is, in general, computationally impractical owing to the high dimensionality of the multi-target state. The Probability Hypothesis Density (PHD) filter propagates the first moment of the multi-target posterior distribution. While this reduces the dimensionality of the problem, the PHD filter still involves intractable integrals in many cases of interest. Several authors have proposed Sequential Monte Carlo (SMC) implementations of the PHD filter. However, these implementations are the equivalent of the Bootstrap Particle Filter, and the latter is well known to be inefficient. Drawing on ideas from the Auxiliary Particle Filter (APF), we present a SMC implementation of the PHD filter which employs auxiliary variables to enhance its efficiency. Numerical examples are presented for two scenarios, including a challenging nonlinear observation model.
Resumo:
A filosofia do acesso livre ao conhecimento científico surgiu da dificuldade das bibliotecas universitárias de todo mundo em manter atualizadas as assinaturas das coleções de periódicos científicos. Os repositórios institucionais são uma das ferramentas que se mostram como alternativa para a comunicação da Ciência livre de barreiras de acesso. A pesquisa tem por objetivo verificar quais são as perspectivas futuras das atuais políticas de implementação de repositórios institucionais de acesso livre no Brasil na opinião de especialistas na área, tendo como base a análise do estado da arte das implementações de Repositórios Institucionais no Brasil. Neste trabalho, a pesquisa é dividida em três etapas: a primeira consiste na coleta de dados descritivos dos repositórios institucionais da Universidade de Brasília (RIUnB) e do Superior Tribunal de Justiça (BDJur-STJ); já na segunda etapa de pesquisa, procede-se à consulta aos especialistas indagando-os acerca da situação atual das implementações de repositórios institucionais no Brasil; e, por fim, na terceira etapa de pesquisa consultamos estes mesmos especialistas sobre os desdobramentos futuros destas políticas no País. Utilizamos da técnica Delfos de pesquisa, onde os especialistas são consultados através de questionários constituídos de perguntas abertas, possibilitando assim chegar a um consenso das opiniões no final da pesquisa. Como resultado da pesquisa, é elaborado um quadro com a tabulação das respostas dos especialistas consultados revelando o panorama das perspectivas futuras das implementações de repositórios institucionais no Brasil na opinião dos especialistas que fazem parte da nossa amostra de pesquisa.
Resumo:
Traditional software development captures the user needs during the requirement analysis. The Web makes this endeavour even harder due to the difficulty to determine who these users are. In an attempt to tackle the heterogeneity of the user base, Web Personalization techniques are proposed to guide the users’ experience. In addition, Open Innovation allows organisations to look beyond their internal resources to develop new products or improve existing processes. This thesis sits in between by introducing Open Personalization as a means to incorporate actors other than webmasters in the personalization of web applications. The aim is to provide the technological basis that builds up a trusty environment for webmasters and companion actors to collaborate, i.e. "an architecture of participation". Such architecture very much depends on these actors’ profile. This work tackles three profiles (i.e. software partners, hobby programmers and end users), and proposes three "architectures of participation" tuned for each profile. Each architecture rests on different technologies: a .NET annotation library based on Inversion of Control for software partners, a Modding Interface in JavaScript for hobby programmers, and finally, a domain specific language for end-users. Proof-of-concept implementations are available for the three cases while a quantitative evaluation is conducted for the domain specific language.