668 results for Bitrate overhead
Abstract:
Long reach passive optical networks (LR-PONs), which integrate fibre-to-the-home with metro networks, have been the subject of intensive research in recent years and are considered one of the most promising candidates for the next generation of optical access networks. Such systems ideally have reaches greater than 100km and bit rates of at least 10Gb/s per wavelength in the downstream and upstream directions. Due to the limited equipment sharing that is possible in access networks, the laser transmitters in the terminal units, which are usually the most expensive components, must be as cheap as possible. However, the requirement for low cost is generally incompatible with the need for a transmitter chirp characteristic that is optimised for such long reaches at 10Gb/s, and hence dispersion compensation is required. In this thesis electronic dispersion compensation (EDC) techniques are employed to increase the chromatic dispersion tolerance and to enhance the system performance at the expense of moderate additional implementation complexity. In order to use such EDC in LR-PON architectures, a number of challenges associated with the burst-mode nature of the upstream link need to be overcome. In particular, the EDC must be made adaptive from one burst to the next (burst-mode EDC, or BM-EDC) in time scales on the order of tens to hundreds of nanoseconds. Burst-mode operation of EDC has received little attention to date. The main objective of this thesis is to demonstrate the feasibility of such a concept and to identify the key BM-EDC design parameters required for applications in a 10Gb/s burst-mode link. This is achieved through a combination of simulations and transmission experiments utilising off-line data processing. The research shows that burst-to-burst adaptation can in principle be implemented efficiently, opening the possibility of low overhead, adaptive EDC-enabled burst-mode systems.
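The abstract does not detail the equalizer structure, but the burst-to-burst adaptation it describes maps naturally onto re-converging the taps of a feed-forward equalizer on each burst's preamble. Below is a minimal sketch in Python, assuming a real-valued LMS update over a known training sequence; the tap count, step size and all names are illustrative assumptions, not the thesis's design:

```python
import numpy as np

def lms_converge(rx, training, n_taps=7, mu=0.01):
    """Re-converge feed-forward equalizer taps on a burst preamble via LMS.

    rx       : received (dispersion-distorted) preamble samples
    training : known transmitted preamble symbols, same length as rx
    Returns the converged tap vector for use on the burst payload.
    """
    rx = np.asarray(rx, dtype=float)
    training = np.asarray(training, dtype=float)
    taps = np.zeros(n_taps)
    taps[n_taps // 2] = 1.0               # start from a pass-through filter
    for i in range(n_taps, len(rx)):
        window = rx[i - n_taps:i][::-1]   # most recent sample first
        y = taps @ window                 # equalizer output
        err = training[i] - y             # error against the known symbol
        taps += mu * err * window         # LMS tap update
    return taps
```

Because adaptation consumes only the preamble, the achievable per-burst overhead is set by how many training symbols the update needs to converge, which is the kind of overhead trade-off the thesis investigates.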
Abstract:
The mobile cloud computing model promises to address the resource limitations of mobile devices, but effectively implementing this model is difficult. Previous work on mobile cloud computing has required the user to have a continuous, high-quality connection to the cloud infrastructure. This is undesirable and possibly infeasible: the energy required on the mobile device to maintain a connection and to transfer sizeable amounts of data is large, and the bandwidth tends to be variable and low on cellular networks. The cloud deployment itself also needs to allocate scalable resources to the user efficiently. In this paper, we formulate best practices for efficiently managing the resources required for the mobile cloud model, namely energy, bandwidth and cloud computing resources. These practices can be realised with our mobile cloud middleware project, featuring the Cloud Personal Assistant (CPA). We compare this with other approaches in the area to highlight the importance of minimising the usage of these resources, and therefore ensuring successful adoption of the model by end users. Based on results from experiments performed with mobile devices, we develop a no-overhead decision model for task and data offloading to the CPA of a user, which provides efficient management of mobile cloud resources.
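The paper's decision model itself is not reproduced in the abstract; as a minimal sketch, an energy-based offload rule could compare local execution energy against the radio energy needed to ship the task's data at the currently measured bandwidth. All names and numbers below are illustrative assumptions:

```python
def should_offload(data_bytes, local_energy_j,
                   bandwidth_bps, radio_power_w, overhead_j=0.0):
    """Offload to the CPA only if the transfer energy (plus any fixed
    connection overhead) undercuts executing the task locally.

    All parameter names are illustrative, not the paper's notation.
    """
    transfer_time_s = 8.0 * data_bytes / bandwidth_bps
    offload_energy_j = radio_power_w * transfer_time_s + overhead_j
    return offload_energy_j < local_energy_j

# Example: 5 MB of input data on a 2 Mb/s cellular link costs ~24 J of
# radio energy, so a 15 J local execution should stay on the device.
print(should_offload(5_000_000, local_energy_j=15.0,
                     bandwidth_bps=2_000_000, radio_power_w=1.2))
```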
Abstract:
A parallel method for the dynamic partitioning of unstructured meshes is described. The method introduces a new iterative optimization technique known as relative gain optimization, which both balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of a quality equivalent to or higher than that of static partitioners (which do not reuse the existing partition), and does so much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the data migration caused by the static partitioners.
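The abstract does not define relative gain precisely; the sketch below shows the classic Kernighan-Lin-style migration gain (external minus internal edge weight) that iterative partition refiners of this family are built on, scoring each border vertex against its best destination subdomain. The data layout and names are illustrative:

```python
from collections import defaultdict

def migration_gains(adj, part):
    """Score each border vertex with a Kernighan-Lin-style gain:
    (edge weight to the best neighbouring subdomain) minus (edge weight
    kept inside its own subdomain). A positive gain means migrating the
    vertex would reduce the cut.

    adj  : {vertex: [(neighbour, edge_weight), ...]}
    part : {vertex: subdomain id}
    """
    gains = {}
    for v, nbrs in adj.items():
        external = defaultdict(float)
        internal = 0.0
        for u, w in nbrs:
            if part[u] == part[v]:
                internal += w
            else:
                external[part[u]] += w
        if external:                      # border vertex
            dest, ext = max(external.items(), key=lambda kv: kv[1])
            gains[v] = (ext - internal, dest)
    return gains
```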
Abstract:
Multilevel algorithms are a successful class of optimization techniques that address the mesh partitioning problem for mapping meshes onto parallel computers. They usually combine a graph contraction algorithm together with a local optimization method that refines the partition at each graph level. To date, these algorithms have been used almost exclusively to minimize the cut-edge weight in the graph with the aim of minimizing the parallel communication overhead. However, it has been shown that for certain classes of problems, the convergence of the underlying solution algorithm is strongly influenced by the shape or aspect ratio of the subdomains. Therefore, in this paper, the authors modify the multilevel algorithms to optimize a cost function based on the aspect ratio. Several variants of the algorithms are tested and shown to provide excellent results.
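The paper's cost function is not given in the abstract; one plausible aspect-ratio surrogate compares each subdomain's boundary length with that of an ideally shaped (circular, in 2-D) subdomain of the same area. A hedged sketch, with this surrogate as my assumption:

```python
import math

def aspect_ratio_cost(areas, boundaries):
    """Illustrative aspect-ratio cost: for each subdomain, compare its
    boundary length to the circumference of a circle of equal area (the
    ideal 2-D shape) and sum the ratios; 1.0 per subdomain is the
    unattainable optimum.

    areas, boundaries : per-subdomain area and boundary length.
    """
    cost = 0.0
    for a, b in zip(areas, boundaries):
        ideal = 2.0 * math.sqrt(math.pi * a)   # equal-area circle boundary
        cost += b / ideal
    return cost
```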
Abstract:
Multilevel algorithms are a successful class of optimisation techniques which address the mesh partitioning problem for distributing unstructured meshes onto parallel computers. They usually combine a graph contraction algorithm with a local optimisation method which refines the partition at each graph level. To date, these algorithms have been used almost exclusively to minimise the cut-edge weight in the graph with the aim of minimising the parallel communication overhead, but recently there has been a perceived need to take into account the communications network of the parallel machine. For example, the increasing use of SMP clusters (systems of multiprocessor compute nodes with very fast intra-node communications but relatively slow inter-node networks) suggests the use of hierarchical network models. Indeed, this requirement is exacerbated by early experiments with meta-computers (multiple supercomputers combined together, in extreme cases over inter-continental networks). In this paper, therefore, we modify a multilevel algorithm in order to minimise a cost function based on a model of the communications network. Several network models and variants of the algorithm are tested and we establish that it is possible to successfully guide the optimisation to reflect the chosen architecture.
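As an illustration of such a cost function (the paper's own model is not reproduced here), each cut edge can be weighted by a network-distance matrix so that the optimiser prefers cutting edges between processors with cheap links:

```python
def network_cost(adj, part, dist):
    """Cut cost weighted by a communications-network model: a cut edge
    between processors p and q costs its weight times dist[p][q], e.g.
    dist = 1 within an SMP node, 10 between nodes, 100 between sites
    (a hierarchical model; the specific values are illustrative).

    adj  : {vertex: [(neighbour, edge_weight), ...]}
    part : {vertex: processor id}
    """
    cost = 0.0
    for v, nbrs in adj.items():
        for u, w in nbrs:
            if u > v and part[u] != part[v]:   # count each cut edge once
                cost += w * dist[part[v]][part[u]]
    return cost
```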
Abstract:
This paper, chosen as a best paper from the 2005 SAMOS Workshop on Computer Systems, describes for the first time the major Abhainn project for automated system-level design of embedded signal processing systems. In particular, it describes four key novelties: novel algorithm modelling techniques for DSP systems, automated implementation realisation, algorithm transformation for system optimisation, and automated inter-processor communication. The approach is applied to two complex systems: a radar system and a sonar system. In both cases, technology which allows non-experts to automatically create low-overhead, high-performance embedded signal processing systems is exhibited.
Abstract:
The results of a study aimed at determining the most important experimental parameters for automated, quantitative analysis of solid dosage form pharmaceuticals (seized and model 'ecstasy' tablets) are reported. Data obtained with a macro-Raman spectrometer were complemented by micro-Raman measurements, which gave information on particle size and provided excellent data for developing statistical models of the sampling errors associated with collecting data as a series of grid points on the tablets' surface. Spectra recorded at single points on the surface of seized MDMA-caffeine-lactose tablets with a Raman microscope (λex = 785 nm, 3 μm diameter spot) were typically dominated by one or other of the three components, consistent with Raman mapping data which showed the drug and caffeine microcrystals were ca 40 μm in diameter. Spectra collected with a microscope from eight points on a 200 μm grid were combined, and in the resultant spectra the average value of the Raman band intensity ratio used to quantify the MDMA:caffeine ratio, μr, was 1.19 with an unacceptably high standard deviation, σr, of 1.20. In contrast, with a conventional macro-Raman system (150 μm spot diameter), combined eight-grid-point data gave μr = 1.47 with σr = 0.16. A simple statistical model which could be used to predict σr under the various conditions used was developed. The model showed that the decrease in σr on moving to a 150 μm spot was too large to be due entirely to the increased spot diameter but was consistent with the increased sampling volume that arose from a combination of the larger spot size and depth of focus in the macroscopic system. With the macro-Raman system, combining 64 grid points (0.5 mm spacing and 1-2 s accumulation per point) to give a single averaged spectrum for a tablet was found to be a practical balance between minimizing sampling errors and keeping overhead times at an acceptable level. The effectiveness of this sampling strategy was also tested by quantitative analysis of a set of model ecstasy tablets prepared from MDEA-sorbitol (0-30% by mass MDEA). A simple univariate calibration model of averaged 64-point data had R² = 0.998 and an r.m.s. standard error of prediction of 1.1%, whereas data obtained by sampling just four points on the same tablet showed deviations from the calibration of up to 5%.
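The statistical model itself is not reproduced in the abstract, but the grid-averaging strategy rests on the standard result that averaging N effectively independent samples shrinks the standard deviation by √N; the abstract's depth-of-focus observation suggests the effective sample count grows with the sampled volume per point as well as the point count. In the sketch below, N_eff and V_sampled are my labels, not the paper's notation:

```latex
% Assumed statistical basis for the grid-averaging strategy:
\sigma_r(N) \;\approx\; \frac{\sigma_r(1)}{\sqrt{N_{\mathrm{eff}}}},
\qquad N_{\mathrm{eff}} \propto N \cdot V_{\mathrm{sampled}}
```

This is consistent with the reported drop from σr = 1.20 (micro, eight points) to σr = 0.16 (macro, eight points): as the abstract notes, the spot-area increase alone cannot account for a roughly 7.5-fold reduction, but a larger effective sampling volume per point can.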
Abstract:
In this paper, the performance of the network coded amplify-forward cooperative protocol is studied. The use of network coding can reduce the bandwidth consumed by relay transmission, and hence increase the spectral efficiency of cooperative diversity. A distributed strategy of relay selection is applied to the cooperative scheme, which reduces system overhead and also facilitates the development of explicit expressions for information metrics such as outage probability and ergodic capacity. Both analytical and numerical results demonstrate that the proposed protocol can achieve large ergodic capacity and full diversity gain simultaneously.
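The closed-form expressions are not given in the abstract; as a hedged stand-in, here is a Monte Carlo sketch of outage probability under opportunistic (best-SNR) relay selection over Rayleigh fading, using the common harmonic-mean approximation of the amplify-and-forward end-to-end SNR. The 1/2 pre-log models a plain half-duplex relay slot, the spectral loss that the paper's network-coded protocol is designed to reduce; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def outage_prob(n_relays, snr_db, rate=1.0, trials=200_000):
    """Monte Carlo outage probability for a toy best-relay AF link over
    Rayleigh fading (harmonic-mean approximation of the end-to-end SNR).
    """
    snr = 10 ** (snr_db / 10)
    g1 = rng.exponential(size=(trials, n_relays))   # source->relay gains
    g2 = rng.exponential(size=(trials, n_relays))   # relay->destination gains
    e2e = snr * (g1 * g2) / (g1 + g2)               # per-relay end-to-end SNR
    best = e2e.max(axis=1)                          # distributed selection
    cap = 0.5 * np.log2(1 + best)                   # half-duplex pre-log 1/2
    return float(np.mean(cap < rate))

print(outage_prob(n_relays=3, snr_db=10))
```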
Abstract:
Lateralized behaviour in the felids has been subject to little investigation. We examined the paw use of 42 domestic cats on three tasks designed to determine whether the animals performed asymmetrical motor behaviour. The influence of the cats' sex and age on their paw preferences was also explored. The distribution of the cats' paw preferences differed significantly between the three tasks. Task 1, the most complex exercise involving retrieval of a food treat from an empty jar, encouraged the most apparent display of lateralized behaviour, with all but one animal showing a strong preference to use either their left or right paw consistently. Tasks 2 (an exercise involving reaching for a toy suspended overhead) and 3 (a challenge involving reaching for a toy moving along the ground) encouraged ambilateral motor performance. Lateralized behaviour was strongly sex related. Male and female cats showed paw preferences at the level of the population, but in opposite directions. Females had a greater preference for using their right paw; males were more inclined to adopt their left paw. Feline age was unrelated to either strength or direction of preferred paw use. Overall, the findings suggest that there are two distinct populations of paw preference in the cat that cluster strongly around the animals' sex. The results also point to a relationship between lateralized behaviour and task complexity. More apparent patterns of lateralized behaviour were evident on more complex manipulatory tasks, hinting at functional brain specialization in this species. © 2009 The Association for the Study of Animal Behaviour.
Abstract:
A new technique based on adaptive code-to-user allocation for interference management on the downlink of BPSK-based TDD DS-CDMA systems is presented. The principle of the proposed technique is to exploit the dependency of multiple access interference on the instantaneous symbol values of the active users. The objective is to adaptively allocate the available spreading sequences to users on a symbol-by-symbol basis to optimize the decision variables at the downlink receivers. The presented simulations show an overall system BER performance improvement of more than an order of magnitude with the proposed technique, while the adaptation overhead is kept below 10% of the available bandwidth.
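As a concrete (if deliberately brute-force) illustration of symbol-by-symbol allocation, the sketch below searches code-to-user assignments for the one that maximises the worst user's matched-filter decision margin for the current symbol vector. The paper's adaptive technique avoids this exhaustive search; the cost metric and names are my assumptions:

```python
from itertools import permutations
import numpy as np

def best_allocation(codes, symbols):
    """Brute-force symbol-by-symbol code-to-user allocation: pick the
    assignment maximising the worst user's matched-filter decision margin
    given the current BPSK symbol vector, i.e. steer the multiple access
    interference constructively.

    codes   : (n_codes, L) array of +/-1 spreading sequences
    symbols : length-n_users array of +/-1 symbols for this interval
    """
    codes = np.asarray(codes, dtype=float)
    symbols = np.asarray(symbols, dtype=float)
    R = codes @ codes.T / codes.shape[1]      # normalised cross-correlations
    best_perm, best_margin = None, -np.inf
    for perm in permutations(range(len(codes)), len(symbols)):
        Rp = R[np.ix_(perm, perm)]
        d = Rp @ symbols                      # matched-filter outputs
        margin = np.min(symbols * d)          # worst-case decision margin
        if margin > best_margin:
            best_perm, best_margin = perm, margin
    return best_perm
```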
Abstract:
Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access, to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses, as a technique that enables software configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software, for multiword transfer initiation, completion notifications for software-selected sets of arbitrary size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype and measured the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions with simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks, and efficient exploitation of the hardware bandwidth.
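As a software analogy for one of the event-response mechanisms (completion notifications for a software-selected set of transfers), the sketch below arms a counter with the number of outstanding transfers and posts a single notification when the last one completes. This models the semantics only, not the paper's hardware NI; all names are illustrative:

```python
import threading
from queue import Queue

class CompletionCounter:
    """Software model of a completion-notification counter: software arms
    the counter with the number of transfers it cares about; each completing
    transfer decrements it, and reaching zero posts one notification to a
    queue the waiting thread can block on.
    """
    def __init__(self, expected, notify_queue):
        self._remaining = expected
        self._lock = threading.Lock()
        self._q = notify_queue

    def transfer_done(self, tag):
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._q.put(("all-done", tag))

q = Queue()
counter = CompletionCounter(expected=3, notify_queue=q)
for t in range(3):
    counter.transfer_done(t)   # in hardware, an NI event response fires here
print(q.get())                 # -> ('all-done', 2), after the last transfer
```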
Abstract:
Computing has recently reached an inflection point with the introduction of multicore processors. On-chip thread-level parallelism is doubling approximately every other year. Concurrency lends itself naturally to allowing a program to trade performance for power savings by regulating the number of active cores; however, in several domains, users are unwilling to sacrifice performance to save power. We present a prediction model for identifying energy-efficient operating points of concurrency in well-tuned multithreaded scientific applications and a runtime system that uses live program analysis to optimize applications dynamically. We describe a dynamic phase-aware performance prediction model that combines multivariate regression techniques with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. Using our model, we develop a prediction-driven phase-aware runtime optimization scheme that throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each program phase. The use of prediction reduces the overhead of searching the optimization space while achieving near-optimal performance and power savings. A thorough evaluation of our approach shows a reduction in power consumption of 10.8 percent, simultaneous with an improvement in performance of 17.9 percent, resulting in energy savings of 26.7 percent.
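As a toy version of counter-driven concurrency prediction (the paper uses a richer multivariate, phase-aware regression model), one can fit throughput samples at a few thread counts and pick the concurrency at the knee of the fitted curve. The data below are made up for illustration:

```python
import numpy as np

def fit_scalability(threads, ipc):
    """Fit a simple quadratic model of throughput vs. thread count from
    hardware-counter samples, then pick the concurrency that maximises the
    predicted throughput (the knee of the scalability curve).
    """
    coeffs = np.polyfit(threads, ipc, deg=2)
    candidates = np.arange(1, max(threads) + 1)
    predicted = np.polyval(coeffs, candidates)
    return int(candidates[np.argmax(predicted)])

# Counter-derived throughput samples for one program phase (illustrative):
threads = np.array([1, 2, 4, 8, 16])
ipc     = np.array([1.0, 1.9, 3.4, 4.1, 3.6])   # scaling tails off past 8
print(fit_scalability(threads, ipc))            # concurrency near the knee
```

Throttling concurrency to this operating point is what lets the runtime trade excess cores for power savings without moving off the flat part of the scalability curve.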
Abstract:
Traditional static analysis fails to auto-parallelize programs with a complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard to discover by a programmer. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates which are then applied by automatic code transformation techniques.
This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data-structure level rather than at the element or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the belief that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in the program's outer loops. This observation validates our approach to whole-data-structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach allows higher degrees of parallelism to be discovered, yielding a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.
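Below is a minimal sketch of whole-data-structure dependence profiling, assuming each read and write is instrumented with the object touched, the loop iteration, and the pipeline stage; a write in an earlier iteration followed by a read in a later one marks a cross-iteration (pipeline) dependence. The instrumentation hooks and names are illustrative, not the tool's API:

```python
from collections import defaultdict

class DepProfiler:
    """Toy profiler recording reads/writes at whole-data-structure
    granularity (one record per object, not per element), which is what
    keeps the profiling overhead low. Cross-iteration write->read pairs on
    the same object indicate a pipeline dependence between loop stages.
    """
    def __init__(self):
        self.last_writer = {}            # object id -> (iteration, stage)
        self.deps = defaultdict(int)     # (producer, consumer) -> count

    def write(self, obj, iteration, stage):
        self.last_writer[id(obj)] = (iteration, stage)

    def read(self, obj, iteration, stage):
        w = self.last_writer.get(id(obj))
        if w and w[0] < iteration:       # produced in an earlier iteration
            self.deps[(w[1], stage)] += 1

p = DepProfiler()
buf = []
for i in range(3):
    p.write(buf, i, "produce")           # stage 1 fills the buffer
    p.read(buf, i + 1, "consume")        # stage 2 reads it next iteration
print(dict(p.deps))                      # -> {('produce', 'consume'): 3}
```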
Abstract:
The prevalence of multicore processors is bound to drive most kinds of software development towards parallel programming. To limit the difficulty and overhead of parallel software design and maintenance, it is crucial that parallel programming models allow an easy-to-understand, concise and dense representation of parallelism. Parallel programming models such as Cilk++ and Intel TBBs attempt to offer a better, higher-level abstraction for parallel programming than threads and locking synchronization. It is not straightforward, however, to express all patterns of parallelism in these models. Pipelines are an important parallel construct, but they are difficult to express in Cilk and TBBs in a straightforward way without a verbose restructuring of the code. In this paper we demonstrate that pipeline parallelism can be easily and concisely expressed in a Cilk-like language, which we extend with input, output and input/output dependency types on procedure arguments, enforced at runtime by the scheduler. We evaluate our implementation on real applications and show that our Cilk-like scheduler, extended to track and enforce these dependencies, has performance comparable to Cilk++.
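Below is a minimal serial model of the proposed argument-dependency types, assuming each spawned task tags its arguments 'in', 'out' or 'inout' and may run once the argument versions it reads have been produced. This sketches the semantics only; the paper enforces them inside a Cilk-like work-stealing scheduler, and all names here are illustrative:

```python
from collections import defaultdict

class DepScheduler:
    """Toy model of in/out/inout argument dependencies: spawn order fixes
    which version of each object a task reads, and a task becomes runnable
    once those versions have been produced.
    """
    def __init__(self):
        self.pending = []
        self.version = defaultdict(int)   # object id -> latest spawned version
        self.produced = set()             # (object id, version) computed so far

    def spawn(self, fn, *tagged_args):
        needs, prods, call_args = [], [], []
        for obj, mode in tagged_args:
            call_args.append(obj)
            if mode in ("in", "inout"):
                needs.append((id(obj), self.version[id(obj)]))
            if mode in ("out", "inout"):
                self.version[id(obj)] += 1
                prods.append((id(obj), self.version[id(obj)]))
        self.pending.append((fn, call_args, needs, prods))

    def run(self):
        # Run any task whose inputs exist; version 0 is available initially.
        # The earliest pending task is always runnable, so this terminates.
        while self.pending:
            for task in self.pending:
                fn, call_args, needs, prods = task
                if all(n in self.produced or n[1] == 0 for n in needs):
                    fn(*call_args)
                    self.produced.update(prods)
                    self.pending.remove(task)
                    break

sched = DepScheduler()
buf = {}
sched.spawn(lambda b: b.update(item=42), (buf, "out"))   # pipeline stage 1
sched.spawn(lambda b: print(b["item"]), (buf, "in"))     # pipeline stage 2
sched.run()                                              # prints 42
```

The point of the annotations is that the pipeline is expressed as ordinary spawns with tagged arguments, rather than as a restructured, explicitly staged program.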