38 resultados para User interfaces (Computer systems)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Massively parallel networks of highly efficient, high performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and
cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures are highly susceptible to pipeline bubbles resulting from data and control hazards; the only way to mitigate against these is manual interleaving of
application tasks on each datapath, since no suitable automated interleaving approach exists. In this paper we describe a new automated integrated mapping/scheduling approach to map algorithm tasks to processors and a new low-complexity list scheduling technique to generate the interleaved schedules. When applied to a spatial Fixed-Complexity Sphere Decoding (FSD) detector
for next-generation Multiple-Input Multiple-Output (MIMO) systems, the resulting schedules achieve real-time performance for IEEE 802.11n systems on a network of 16-way SIMD processors on FPGA, enable better performance/complexity balance than current approaches and produce results comparable to handcrafted implementations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Realising high performance image and signal processing
applications on modern FPGA presents a challenging implementation problem due to the large data frames streaming through these systems. Specifically, to meet the high bandwidth and data storage demands of these applications, complex hierarchical memory architectures must be manually specified
at the Register Transfer Level (RTL). Automated approaches which convert high-level operation descriptions, for instance in the form of C programs, to an FPGA architecture, are unable to automatically realise such architectures. This paper
presents a solution to this problem. It presents a compiler to automatically derive such memory architectures from a C program. By transforming the input C program to a unique dataflow modelling dialect, known as Valved Dataflow (VDF), a mapping and synthesis approach developed for this dialect can
be exploited to automatically create high performance image and video processing architectures. Memory intensive C kernels for Motion Estimation (CIF Frames at 30 fps), Matrix Multiplication (128x128 @ 500 iter/sec) and Sobel Edge Detection (720p @ 30 fps), which are unrealisable by current state-of-the-art C-based synthesis tools, are automatically derived from a C description of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the rapid expansion of the internet and the increasing demand on Web servers, many techniques were developed to overcome the servers' hardware performance limitation. Mirrored Web Servers is one of the techniques used where a number of servers carrying the same "mirrored" set of services are deployed. Client access requests are then distributed over the set of mirrored servers to even up the load. In this paper we present a generic reference software architecture for load balancing over mirrored web servers. The architecture was designed adopting the latest NaSr architectural style [1] and described using the ADLARS [2] architecture description language. With minimal effort, different tailored product architectures can be generated from the reference architecture to serve different network protocols and server operating systems. An example product system is described and a sample Java implementation is presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Massive amount of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date geo-textual objects (e.g., geo-tagged Tweets) such that their locations meet users’ need and their texts are interesting to users. For example, a user may want to be updated with tweets near her home on the topic “dengue fever headache.” In this demonstration, we present SOPS, the Spatial-Keyword Publish/Subscribe System, that is capable of efficiently processing spatial keyword continuous queries. SOPS supports two types of queries: (1) Boolean Range Continuous (BRC) query that can be used to subscribe the geo-textual objects satisfying a boolean keyword expression and falling in a specified spatial region; (2) Temporal Spatial-Keyword Top-k Continuous (TaSK) query that continuously maintains up-to-date top-k most relevant results over a stream of geo-textual objects. SOPS enables users to formulate their queries and view the real-time results over a stream of geotextual objects by browser-based user interfaces. On the server side, we propose solutions to efficiently processing a large number of BRC queries (tens of millions) and TaSK queries over a stream of geo-textual objects.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

NanoStreams explores the design, implementation,and system software stack of micro-servers aimed at processingdata in-situ and in real time. These micro-servers can serve theemerging Edge computing ecosystem, namely the provisioningof advanced computational, storage, and networking capabilitynear data sources to achieve both low latency event processingand high throughput analytical processing, before consideringoff-loading some of this processing to high-capacity datacentres.NanoStreams explores a scale-out micro-server architecture thatcan achieve equivalent QoS to that of conventional rack-mountedservers for high-capacity datacentres, but with dramaticallyreduced form factors and power consumption. To this end,NanoStreams introduces novel solutions in programmable & con-figurable hardware accelerators, as well as the system softwarestack used to access, share, and program those accelerators.Our NanoStreams micro-server prototype has demonstrated 5.5×higher energy-efficiency than a standard Xeon Server. Simulationsof the microserver’s memory system extended to leveragehybrid DDR/NVM main memory indicated 5× higher energyefficiencythan a conventional DDR-based system. 

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Power capping is a fundamental method for reducing the energy consumption of a wide range of modern computing environments, ranging from mobile embedded systems to datacentres. Unfortunately, maximising performance and system efficiency under static power caps remains challenging, while maximising performance under dynamic power caps has been largely unexplored. We present an adaptive power capping method that reduces the power consumption and maximizes the performance of heterogeneous SoCs for mobile and server platforms. Our technique combines power capping with coordinated DVFS, data partitioning and core allocations on a heterogeneous SoC with ARM processors and FPGA resources. We design our framework as a run-time system based on OpenMP and OpenCL to utilise the heterogeneous resources. We evaluate it through five data-parallel benchmarks on the Xilinx SoC which allows fully voltage and frequency control. Our experiments show a significant performance boost of 30% under dynamic power caps with concurrent execution on ARM and FPGA, compared to a naive separate approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Monitoring multiple myeloma patients for relapse requires sensitive methods to measure minimal residual disease and to establish a more precise prognosis. The present study aimed to standardize a real-time quantitative polymerase chain reaction (PCR) test for the IgH gene with a JH consensus self-quenched fluorescence reverse primer and a VDJH or DJH allele-specific sense primer (self-quenched PCR). This method was compared with allele-specific real-time quantitative PCR test for the IgH gene using a TaqMan probe and a JH consensus primer (TaqMan PCR). We studied nine multiple myeloma patients from the Spanish group treated with the MM2000 therapeutic protocol. Self-quenched PCR demonstrated sensitivity of >or=10(-4) or 16 genomes in most cases, efficiency was 1.71 to 2.14, and intra-assay and interassay reproducibilities were 1.18 and 0.75%, respectively. Sensitivity, efficiency, and residual disease detection were similar with both PCR methods. TaqMan PCR failed in one case because of a mutation in the JH primer binding site, and self-quenched PCR worked well in this case. In conclusion, self-quenched PCR is a sensitive and reproducible method for quantifying residual disease in multiple myeloma patients; it yields similar results to TaqMan PCR and may be more effective than the latter when somatic mutations are present in the JH intronic primer binding site.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A new technique based on adaptive code-to-user allocation for interference management on the downlink of BPSK based TDD DS-CDMA systems is presented. The principle of the proposed technique is to exploit the dependency of multiple access interference on the instantaneous symbol values of the active users. The objective is to adaptively allocate the available spreading sequences to users on a symbol-by-symbol basis to optimize the decision variables at the downlink receivers. The presented simulations show an overall system BER performance improvement of more than an order of a magnitude with the proposed technique while the adaptation overhead is kept less than 10% of the available bandwidth.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access, to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses, as a technique that enables software configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software, for multiword transfer initiation, completion notifications for software selected sets of arbitrary size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and measure the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions with simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks, and efficient exploitation of the hardware bandwidth.