Digital Library

104 results for Scalable Nanofabrication

A programmable architecture for layered multimedia streams in IPv6 networks

Relevance:

10.00%

Abstract:

A new configurable architecture is presented that offers multiple levels of video playback by accommodating variable levels of network utilization and bandwidth. By using scalable MPEG-4 encoding at the network edge together with specific video delivery protocols, media streaming components are merged to optimize video playback for IPv6 networks, thus improving QoS. This is achieved by introducing “programmable network functionality” (PNF), which splits layered video transmission and distributes it evenly over the available bandwidth, reducing the packet loss and delay caused by out-of-profile DiffServ classes. An FPGA design is given that improves performance metrics such as link utilization and end-to-end delay and, during congestion, improves on-time delivery of video frames by up to 80% compared with current “static” DiffServ.

FPGA Implementation of a Pipelined Gaussian Calculation for HMM-Based Large Vocabulary Speech Recognition

Relevance:

10.00%

Abstract:

A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and can calculate a full set of Gaussian results from 3825 acoustic models in 9.03 ms; coupled with a backend search of 5000 words, this has provided an accuracy of over 80%. Parallel implementations with up to 32 cores have been designed and successfully implemented at a clock frequency of 133 MHz.
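The abstract does not state the form of the Gaussian calculation. As a minimal sketch, assuming diagonal-covariance Gaussians (a common choice for HMM acoustic models, not confirmed by the abstract), each Gaussian's log-likelihood for one feature frame is a per-Gaussian constant plus a sum of independent per-dimension terms, which is the kind of arithmetic that maps naturally onto a pipeline. The function names `gaussian_log_likelihood` and `score_all` are hypothetical.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log-density of feature vector x under a diagonal-covariance Gaussian.

    Assumption: one variance per dimension (diagonal covariance).
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    d = len(x)
    # Per-Gaussian constant: -0.5 * (D*log(2*pi) + sum(log var_i)).
    # It does not depend on x, so a hardware design could precompute it.
    const = -0.5 * (d * math.log(2 * math.pi) + sum(math.log(v) for v in var))
    # Data-dependent part: each dimension contributes independently,
    # so the terms can be computed one per pipeline stage and accumulated.
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return const - 0.5 * quad

def score_all(x, models):
    """Score one frame against every (mean, var) pair in `models`."""
    return [gaussian_log_likelihood(x, m, v) for m, v in models]
```

In a recognizer, `score_all` would be called once per frame, and the resulting scores would be consumed by the backend search.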

Modal Representation of the Resonant Body within a Finite Difference Framework for Simulation of String Instruments

Relevance:

10.00%

Abstract:

This paper investigates numerical simulation of a string coupled transversely to a resonant body. Starting from a complete finite difference formulation, a second model is derived in which the body is represented in modal form. The main advantage of this hybrid form is that the body model is scalable, i.e. the computational complexity can be adjusted to the available processing power. Numerical results are calculated and discussed for simplified models in the form of string-string coupling and string-plate coupling.
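To illustrate why a modal body is scalable, here is a minimal sketch, not the paper's model: the body is assumed to be a sum of independent damped modes, each obeying q'' + 2σq' + ω²q = F(t) and discretized with centred finite differences. Keeping only the first `max_modes` modes reduces cost in proportion to the number of modes. The string, the coupling, and the modal shapes are omitted. Every mode is assumed to be driven and observed with unit weight, and the function name `simulate_modal_body` is hypothetical.

```python
import math

def simulate_modal_body(freqs_hz, decays, force, sample_rate, max_modes=None):
    """Sum of independent damped modes driven by `force` (one value per sample).

    Centred finite differences for q'' + 2*sigma*q' + omega^2*q = F give
    q[n+1] = ((2 - (omega*k)^2) q[n] - (1 - sigma*k) q[n-1] + k^2 F[n]) / (1 + sigma*k).
    Assumption: unit modal weights at both the input and the output point.
    """
    k = 1.0 / sample_rate
    modes = list(zip(freqs_hz, decays))
    if max_modes is not None:
        modes = modes[:max_modes]  # truncation trades accuracy for cost
    states = []
    for f, sigma in modes:
        omega = 2 * math.pi * f
        if omega * k >= 2:
            # The centred scheme requires omega*k < 2 for stability.
            raise ValueError(f"mode at {f} Hz is unstable at this sample rate")
        states.append([omega, sigma, 0.0, 0.0])  # omega, sigma, q[n], q[n-1]
    output = []
    for F in force:
        total = 0.0
        for s in states:
            omega, sigma, q, q_prev = s
            q_next = ((2 - (omega * k) ** 2) * q
                      - (1 - sigma * k) * q_prev
                      + k * k * F) / (1 + sigma * k)
            s[2], s[3] = q_next, q
            total += q_next
        output.append(total)
    return output
```

The cost per sample is one update per retained mode, which is the property the abstract describes as adjusting complexity to the available processing power.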

Physical model for the generation of ideal resources in multipartite quantum networking

Relevance:

10.00%

Abstract:

We propose a physical model for generating multipartite entangled states of spin-s particles that have important applications in distributed quantum information processing. Our protocol is based on a process in which mobile spins induce the interaction among remote scattering centers. As such, a major advantage lies in the management of stationary and well-separated spins. Among the states that can be generated, there is a class of N-qubit singlets allowing for optimal quantum telecloning in a scalable and controllable way. We also show how to prepare Aharonov, W, and Greenberger-Horne-Zeilinger states.

Advanced educational parallel DSP system based on TMS320C25 processors

Relevance:

10.00%

Abstract:

This paper describes the design, application, and evaluation of a user-friendly, flexible, scalable and inexpensive Advanced Educational Parallel (AdEPar) digital signal processing (DSP) system based on TMS320C25 digital processors to implement DSP algorithms. Graduate students will use this system in the DSP laboratory to work on advanced topics such as developing parallel DSP algorithms. Graduating senior students who have gained some experience in DSP can also use the system. The DSP laboratory has proved to be a useful tool in the hands of the instructor for teaching the mathematically oriented topics of DSP that students often find difficult to grasp. The DSP laboratory, with assigned projects, has greatly improved the students' ability to understand such complex topics as the fast Fourier transform algorithm, linear and circular convolution, and the theory and design of infinite impulse response (IIR) and finite impulse response (FIR) filters. The user-friendly PC software support of the AdEPar system makes it easy for students to develop DSP programs. This paper gives the architecture of the AdEPar DSP system. Interprocessor communication and PC-to-DSP communication are explained. The parallel debugger kernels and the restrictions of the system are described. Programming the AdEPar system is explained, and two benchmarks (parallel FFT and DES) are presented to show the system performance.

An On Demand Queue Management Architecture for a Programmable Traffic Manager

Relevance:

10.00%

Abstract:

A queue manager (QM) is a core traffic management (TM) function used to provide per-flow queuing in access and metro networks; however, current designs have limited scalability. An on-demand QM (OD-QM), part of a new modular field-programmable gate-array (FPGA)-based TM, is presented that dynamically maps active flows to the available physical resources; its scalability derives from the observation that there are only a few hundred active flows in a high-speed network. Simulations with real traffic show that it is a scalable, cost-effective approach that enhances per-flow queuing performance, thereby allowing per-flow QM without the need for extra external memory at speeds up to 10 Gbps. It utilizes 2.3%–16.3% of a Xilinx XC5VSX50t FPGA and runs at 111 MHz.
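The on-demand mapping idea can be sketched in software as follows. This is a minimal sketch, not the OD-QM hardware design. A flow is bound to a physical queue from a small free pool when its first packet arrives, and the queue is returned to the pool when the flow drains, so the pool only has to be sized for the active flows. The class name, the FIFO free list, and the drop-on-exhaustion policy are assumptions for illustration.

```python
from collections import deque

class OnDemandQueueManager:
    """Maps active flow IDs onto a small pool of physical queues on demand."""

    def __init__(self, num_physical_queues):
        self.free = deque(range(num_physical_queues))  # unallocated queue IDs
        self.flow_to_queue = {}                         # active flow -> queue ID
        self.queues = {q: deque() for q in range(num_physical_queues)}

    def enqueue(self, flow_id, packet):
        q = self.flow_to_queue.get(flow_id)
        if q is None:
            if not self.free:
                return False  # assumed policy: drop when no physical queue is free
            q = self.free.popleft()
            self.flow_to_queue[flow_id] = q
        self.queues[q].append(packet)
        return True

    def dequeue(self, flow_id):
        q = self.flow_to_queue.get(flow_id)
        if q is None:
            return None  # flow is not currently active
        packet = self.queues[q].popleft()
        if not self.queues[q]:
            # Flow went idle: release its physical queue for reuse.
            del self.flow_to_queue[flow_id]
            self.free.append(q)
        return packet
```

Because a mapped queue is released as soon as it empties, every flow present in `flow_to_queue` always has at least one packet queued.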

Fully hardware based WFQ architecture for high-speed QoS packet scheduling

Relevance:

10.00%

Abstract:

A full hardware implementation of a Weighted Fair Queuing (WFQ) packet scheduler is proposed. The circuit architecture presented has been implemented using Altera Stratix II FPGA technology, utilizing RLDII and QDRII memory components. In its current implementation, the circuit can provide fine-granularity Quality of Service (QoS) support at a line throughput rate of 12.8 Gb/s. The authors suggest that, due to the flexible and scalable modular circuit design approach used, the current circuit architecture can be targeted for a full ASIC implementation to deliver 50 Gb/s throughput. The circuit itself comprises three main components: a WFQ algorithm computation circuit, a tag/time-stamp sort and retrieval circuit, and a high-throughput shared buffer. The circuit targets the support of emerging wireline and wireless network nodes that focus on Service Level Agreements (SLAs) and Quality of Experience.
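The split between a tag computation circuit and a tag sort/retrieval circuit can be illustrated in software. This is a minimal sketch, not the authors' design. It uses the self-clocked fair queuing (SCFQ) approximation, in which virtual time is the finish tag of the packet most recently selected for service, rather than exact GPS-based WFQ virtual time. A binary heap stands in for the hardware sorter. The class name and parameters are hypothetical.

```python
import heapq
import itertools

class SelfClockedFairQueue:
    """Finish-tag scheduler using the SCFQ approximation of WFQ virtual time."""

    def __init__(self, weights):
        self.weights = dict(weights)                     # flow -> weight (share)
        self.last_finish = {f: 0.0 for f in self.weights}
        self.virtual_time = 0.0
        self.heap = []                                   # (finish tag, seq, flow, length)
        self.seq = itertools.count()                     # FIFO tie-break on equal tags

    def enqueue(self, flow, length):
        # Tag computation: start at max(previous finish of this flow, virtual time).
        start = max(self.last_finish[flow], self.virtual_time)
        finish = start + length / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, next(self.seq), flow, length))

    def dequeue(self):
        # Tag sort/retrieval: serve the packet with the smallest finish tag.
        if not self.heap:
            return None
        finish, _, flow, length = heapq.heappop(self.heap)
        self.virtual_time = finish  # SCFQ: virtual time follows the served tag
        return flow, length
```

With weights A=2 and B=1 and two 100-byte packets per flow, the finish tags are 50 and 100 for A and 100 and 200 for B, so flow A's packets are served first.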

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs

Relevance:

10.00%

Abstract:

Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access, to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses as a technique that enables software-configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multiword transfer initiation, completion notifications for software-selected sets of arbitrary-size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype and measured the logic overhead of basic NI functionality over a cache-only design to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions in simulated CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.

The Paralax Infrastructure: Automatic Parallelization With a Helping Hand

Relevance:

10.00%

Abstract:

Speeding up sequential programs on multicores is a challenging problem that is in urgent need of a solution. Automatic parallelization of irregular, pointer-intensive codes, exemplified by the SPECint codes, is a very hard problem. This paper shows that, with a helping hand, such auto-parallelization is possible and fruitful. This paper makes the following contributions: (i) A compiler framework for extracting pipeline-like parallelism from outer program loops is presented. (ii) Using a lightweight programming model based on annotations, the programmer helps the compiler to find thread-level parallelism. Each of the annotations specifies only a small piece of semantic information that compiler analysis misses, e.g. stating that a variable is dead at a certain program point. The annotations are designed so that correctness is easily verified. Furthermore, we present a tool for suggesting annotations to the programmer. (iii) The methodology is applied to autoparallelize several SPECint benchmarks. For the benchmark with the most parallelism (hmmer), we obtain a scalable 7-fold speedup on an AMD quad-core dual processor. The annotations constitute a parallel programming model that relies extensively on a sequential program representation. Hence, it does not increase the complexity of debugging, and it does not obscure the source code. These properties could prove valuable for increasing the efficiency of parallel programming.

Temporal extension of Laplacian Eigenmaps for unsupervised dimensionality reduction of time series

Relevance:

10.00%

Abstract:

A novel non-linear dimensionality reduction method, called Temporal Laplacian Eigenmaps, is introduced to process time series data efficiently. In this embedding-based approach, temporal information is intrinsic to the objective function, which produces descriptions of low-dimensional spaces with time coherence between data points. Since the proposed scheme also includes bidirectional mapping between data and embedded spaces and automatic tuning of key parameters, it offers the same benefits as mapping-based approaches. Experiments on a couple of computer vision applications demonstrate the superiority of the new approach to other dimensionality reduction methods in terms of accuracy. Moreover, its lower computational cost and generalisation abilities suggest that it is scalable to larger datasets. © 2010 IEEE.
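For readers unfamiliar with the base method, here is a minimal sketch of standard (non-temporal) Laplacian Eigenmaps in NumPy. It does not include the paper's temporal term, bidirectional mapping, or automatic parameter tuning. The sketch builds a symmetrized k-nearest-neighbour graph with heat-kernel weights and solves the generalized eigenproblem L y = λ D y through the equivalent symmetric normalized form. The function name and the parameters `n_neighbors` and `t` are assumptions for illustration.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, n_neighbors=5, t=1.0):
    """Standard Laplacian Eigenmaps embedding of the rows of X (no temporal term)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours, skipping the point itself (distance 0).
        nbrs = np.argsort(sq[i])[1:n_neighbors + 1]
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)  # heat-kernel weights
    W = np.maximum(W, W.T)                     # make the graph undirected
    d = W.sum(axis=1)
    L = np.diag(d) - W                         # graph Laplacian
    # L y = lam D y  <=>  D^-1/2 L D^-1/2 z = lam z, with y = D^-1/2 z.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs
    # Drop the trivial constant eigenvector (eigenvalue 0 for a connected graph).
    return Y[:, 1:n_components + 1]
```

On points sampled along a line, the one-dimensional embedding recovers the ordering along the line, up to sign. This ordering is the kind of structure that a temporal extension would additionally tie to the time index.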

Automatic Parallelization in the Paralax Compiler

Relevance:

10.00%

Abstract:

The efficient development of multi-threaded software has, for many years, been an unsolved problem in computer science. Finding a solution to this problem has become urgent with the advent of multi-core processors. Furthermore, the problem has become more complicated because multi-cores are everywhere (desktops, laptops, embedded systems). As such, they execute generic programs, which exhibit very different characteristics from the scientific applications that have been the focus of parallel computing in the past.
Implicitly parallel programming is an approach to parallel programming that promises high productivity and efficiency and rules out synchronization errors and race conditions by design. There are two main ingredients to implicitly parallel programming: (i) a conventional sequential programming language that is extended with annotations that describe the semantics of the program and (ii) an automatic parallelizing compiler that uses the annotations to increase the degree of parallelization.
It is extremely important that the annotations and the automatic parallelizing compiler are designed with the target application domain in mind. In this paper, we discuss the Paralax approach to implicitly parallel programming and we review how the annotations and the compiler design help to successfully parallelize generic programs. We evaluate Paralax on SPECint benchmarks, which are a model for such programs, and demonstrate scalable speedups, up to a factor of 6 on 8 cores.

Directed self-assembly of nanorod networks: bringing the top down to the bottom up

Relevance:

10.00%

Abstract:

Self-assembled electrodeposited nanorod materials have been shown to offer an exciting landscape for a wide array of research ranging from nanophotonics through to biosensing and magnetics. However, until now, the scope for site-specific preparation of the nanorods on wafers has been limited to local area definition. Furthermore, there has been little or no lateral control of nanorod height. In this work we present a scalable method for controlling the growth of the nanorods in the vertical direction as well as their lateral position. A focused ion beam (FIB) pre-patterns the Au cathode layer prior to the creation of the Anodized Aluminium Oxide (AAO) template on top. When the pre-patterning is of the same dimension as the pore spacing of the AAO template, lines of single nanorods are successfully grown. Further, for sub-200 nm wide features, the relationship between nanorod height and distance from the non-patterned cathode follows a quadratic growth rate obeying Faraday's law of electrodeposition. This facilitates lateral control of nanorod height combined with localised growth of the nanorods.
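The abstract appeals to Faraday's law of electrolysis. For reference, the law is written out below under simplifying assumptions not stated in the abstract: 100% current efficiency, and a rod of uniform cross-sectional area A and density ρ filling one pore. Here m is the deposited mass, Q the charge passed, I the current into the pore, M the molar mass of the deposited metal, z the number of electrons per ion, and F the Faraday constant (about 96485 C/mol). This relation alone does not reproduce the quadratic height dependence reported above, which also depends on how the current is distributed across the patterned cathode.

```latex
m = \frac{Q\,M}{z\,F}, \qquad Q = \int_0^{t} I(\tau)\,d\tau,
\qquad
h = \frac{m}{\rho\,A} = \frac{M}{z\,F\,\rho\,A}\int_0^{t} I(\tau)\,d\tau
```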

Optical transmission of periodic annular apertures in metal film on high-refractive index substrate: The role of the nanopillar shape

Relevance:

10.00%

Abstract:

The influence of annular aperture parameters on the optical transmission through arrays of coaxial apertures in a metal film on high-refractive-index substrates has been investigated experimentally and numerically. It is shown that the transmission resonances are related to plasmonic crystal effects rather than to the frequency cutoff behavior associated with annular apertures. The role of deviations from the ideal aperture shape occurring during the fabrication process has also been studied. Annular aperture arrays are considered in many applications for achieving high optical transmission through metal films, so understanding nanofabrication tolerances is important. (C) 2010 American Institute of Physics.

Methylation using dimethylcarbonate catalysed by ionic liquids under continuous flow conditions

Relevance:

10.00%

Abstract:

The ionic liquid tributylmethylammonium methylcarbonate has been employed as a catalytic base for clean N-methylation of indole with dimethylcarbonate. The reaction conditions were optimised under microwave heating to give 100% conversion and 100% selectivity to N-methylindole. They were subsequently transferred to a high-temperature/high-pressure (285 °C/150 bar) continuous flow process using a short (3 min) residence time and 2 mol% of the catalyst to efficiently methylate a variety of amine, phenol, thiophenol and carboxylic acid substrates. The extremely short residence times, versatility, and high selectivity have significant implications for the synthesis of a wide range of pharmaceutical intermediates, as high product throughputs can be obtained via this scalable continuous flow protocol. It has also been shown that the ionic liquid can be generated in situ from tributylamine, which has the net effect of transforming an ineffective stoichiometric base into a highly efficient catalyst for this broad class of reactions.

Wavelet packet transforms for system-on-chip applications

Relevance:

10.00%

Abstract:

A methodology for the production of silicon cores for wavelet packet decomposition has been developed. The scheme utilizes efficient scalable architectures for both orthonormal and biorthogonal wavelet transforms. The cores produced from these architectures can be readily scaled for any wavelet function and are easily configurable for any subband structure. The cores are fully parameterized in terms of wavelet choice and appropriate wordlengths. Designs produced are portable across a range of silicon foundries as well as FPGA and PLD technologies. A number of exemplar implementations have been produced.
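As a software illustration of what a wavelet packet decomposition computes, here is a minimal sketch using the orthonormal Haar filter pair. The abstract's cores are parameterized over wavelet choice and wordlength; Haar is swapped in here only for brevity. Unlike the plain discrete wavelet transform, a full packet tree splits both the approximation and the detail band at every level. An arbitrary subband structure would correspond to splitting only selected bands. The function names are hypothetical.

```python
import math

def haar_step(x):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def wavelet_packet(x, levels):
    """Full wavelet packet tree: split every band at each level.

    Returns the 2**levels leaf bands in natural (Paley) order,
    not sorted by frequency.
    """
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for band in nodes:
            a, d = haar_step(band)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes
```

Because the Haar pair is orthonormal, the total energy of the leaf bands equals the energy of the input signal, which makes a convenient sanity check for fixed-point implementations.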
