598 resultados para processor
Resumo:
Computers employing some degree of data flow organisation are now well established as providing a possible vehicle for concurrent computation. Although data-driven computation frees the architecture from the constraints of the single program counter, processor and global memory, inherent in the classic von Neumann computer, there can still be problems with the unconstrained generation of fresh result tokens if a pure data flow approach is adopted. The advantages of allowing serial processing for those parts of a program which are inherently serial, and of permitting a demand-driven, as well as data-driven, mode of operation are identified and described. The MUSE machine described here is a structured architecture supporting both serial and parallel processing which allows the abstract structure of a program to be mapped onto the machine in a logical way.
Resumo:
Scientific applications rely heavily on floating point data types. Floating point operations are complex and require complicated hardware that is both area and power intensive. The emergence of massively parallel architectures like Rigel creates new challenges and poses new questions with respect to floating point support. The massively parallel aspect of Rigel places great emphasis on area efficient, low power designs. At the same time, Rigel is a general purpose accelerator and must provide high performance for a wide class of applications. This thesis presents an analysis of various floating point unit (FPU) components with respect to Rigel, and attempts to present a candidate design of an FPU that balances performance, area, and power and is suitable for massively parallel architectures like Rigel.
Resumo:
For various reasons, many Algol 68 compilers do not directly implement the parallel processing operations defined in the Revised Algol 68 Report. It is still possible however, to perform parallel processing, multitasking and simulation provided that the implementation permits the creation of a master routine for the coordination and initiation of processes under its control. The package described here is intended for real time applications and runs in conjunction with the Algol 68R system; it extends and develops the original Algol 68RT package, which was designed for use with multiplexers at the Royal Radar Establishment, Malvern. The facilities provided, in addition to the synchronising operations, include an interface to an ICL Communications Processor enabling the abstract processes to be realised as the interaction of several teletypes or visual display units with a real time program providing a useful service.
Resumo:
Heterogeneous computing systems have become common in modern processor architectures. These systems, such as those released by AMD, Intel, and Nvidia, include both CPU and GPU cores on a single die available with reduced communication overhead compared to their discrete predecessors. Currently, discrete CPU/GPU systems are limited, requiring larger, regular, highly-parallel workloads to overcome the communication costs of the system. Without the traditional communication delay assumed between GPUs and CPUs, we believe non-traditional workloads could be targeted for GPU execution. Specifically, this thesis focuses on the execution model of nested parallel workloads on heterogeneous systems. We have designed a simulation flow which utilizes widely used CPU and GPU simulators to model heterogeneous computing architectures. We then applied this simulator to non-traditional GPU workloads using different execution models. We also have proposed a new execution model for nested parallelism allowing users to exploit these heterogeneous systems to reduce execution time.
An Estimation of Distribution Algorithm with Intelligent Local Search for Rule-based Nurse Rostering
Resumo:
This paper proposes a new memetic evolutionary algorithm to achieve explicit learning in rule-based nurse rostering, which involves applying a set of heuristic rules for each nurse's assignment. The main framework of the algorithm is an estimation of distribution algorithm, in which an ant-miner methodology improves the individual solutions produced in each generation. Unlike our previous work (where learning is implicit), the learning in the memetic estimation of distribution algorithm is explicit, i.e. we are able to identify building blocks directly. The overall approach learns by building a probabilistic model, i.e. an estimation of the probability distribution of individual nurse-rule pairs that are used to construct schedules. The local search processor (i.e. the ant-miner) reinforces nurse-rule pairs that receive higher rewards. A challenging real world nurse rostering problem is used as the test problem. Computational results show that the proposed approach outperforms most existing approaches. It is suggested that the learning methodologies suggested in this paper may be applied to other scheduling problems where schedules are built systematically according to specific rules.
Resumo:
Salt use in meat products is changing. Consumers desire sea salt which may also contain trace metals and the government is demanding a reduction in sodium. Therefore a need exists to understand how varying impurity levels in salt affect meat quality. This study evaluated the effects of various salt preparations on lipid oxidation, sensory characteristics, protein extractability, and bind strength of ground turkey and pork. This study was a completely randomized design with 5 treatment groups and 6 replications in 2 species. Ground, turkey and pork meat was formulated into one hundred and fifty gram patties with sodium chloride (1%) containing varying amounts of metal impurities (copper, iron, and manganese). Samples were randomly assigned to frozen storage periods of 0, 3, 6, and 9 weeks. After storage, samples were packaged in PVC overwrap and stored under retail display for 5 days. Samples were evaluated for proximate analysis to ensure the fat content was similar for all of the starting material.Thiobarbituric acid reactive substances (TBARS) were determined on raw and cooked samples to evaluate lipid oxidation. A trained six member sensory panel evaluated the samples at each storage period for saltiness, off flavor, and oxidized odor. Break strength was conducted using a Texture Analyzer and compared with salt soluble proteins (increasing salt concentrations) to evaluate protein extractability characteristics. Statistical analyses were conducted using the MIXED procedure of SAS within repeated measures over time where appropriate. No significant differences were observed among the salt treatments for raw and cooked TBARS when the control group was removed (P>0.05). Sensory panelists detected increased levels of off flavor and oxidized odor over the entire storage duration. Less force was required to break the patties from the control group when compared with the salt treatments (P<0.05). As salt concentration increased salt-soluble protein extraction increased, but there was no effect of salt type. Overall, no meaningful statistical differences among the various salt treatments were observed for all of the parameters evaluated for turkey and pork. Salt at a 1% inclusion rate containing varying levels of copper, iron, and manganese impurities in ground turkey thigh meat and ground pork served as a prooxidant. However, if a meat processor uses a 1% inclusion rate of salt in turkey and pork regardless of impurities included, it is unlikely that differences in shelf life or protein functionality would be observed.
Resumo:
The evolution of wireless communication systems leads to Dynamic Spectrum Allocation for Cognitive Radio, which requires reliable spectrum sensing techniques. Among the spectrum sensing methods proposed in the literature, those that exploit cyclostationary characteristics of radio signals are particularly suitable for communication environments with low signal-to-noise ratios, or with non-stationary noise. However, such methods have high computational complexity that directly raises the power consumption of devices which often have very stringent low-power requirements. We propose a strategy for cyclostationary spectrum sensing with reduced energy consumption. This strategy is based on the principle that p processors working at slower frequencies consume less power than a single processor for the same execution time. We devise a strict relation between the energy savings and common parallel system metrics. The results of simulations show that our strategy promises very significant savings in actual devices.
An Estimation of Distribution Algorithm with Intelligent Local Search for Rule-based Nurse Rostering
Resumo:
This paper proposes a new memetic evolutionary algorithm to achieve explicit learning in rule-based nurse rostering, which involves applying a set of heuristic rules for each nurse's assignment. The main framework of the algorithm is an estimation of distribution algorithm, in which an ant-miner methodology improves the individual solutions produced in each generation. Unlike our previous work (where learning is implicit), the learning in the memetic estimation of distribution algorithm is explicit, i.e. we are able to identify building blocks directly. The overall approach learns by building a probabilistic model, i.e. an estimation of the probability distribution of individual nurse-rule pairs that are used to construct schedules. The local search processor (i.e. the ant-miner) reinforces nurse-rule pairs that receive higher rewards. A challenging real world nurse rostering problem is used as the test problem. Computational results show that the proposed approach outperforms most existing approaches. It is suggested that the learning methodologies suggested in this paper may be applied to other scheduling problems where schedules are built systematically according to specific rules.
Resumo:
This document presents GEmSysC, an unified cryptographic API for embedded systems. Software layers implementing this API can be built over existing libraries, allowing embedded software to access cryptographic functions in a consistent way that does not depend on the underlying library. The API complies to good practices for API design and good practices for embedded software development and took its inspiration from other cryptographic libraries and standards. The main inspiration for creating GEmSysC was the CMSIS-RTOS standard, which defines an unified API for embedded software in an implementation-independent way, but targets operating systems instead of cryptographic functions. GEmSysC is made of a generic core and attachable modules, one for each cryptographic algorithm. This document contains the specification of the core of GEmSysC and three of its modules: AES, RSA and SHA-256. GEmSysC was built targeting embedded systems, but this does not restrict its use only in such systems – after all, embedded systems are just very limited computing devices. As a proof of concept, two implementations of GEmSysC were made. One of them was built over wolfSSL, which is an open source library for embedded systems. The other was built over OpenSSL, which is open source and a de facto standard. Unlike wolfSSL, OpenSSL does not specifically target embedded systems. The implementation built over wolfSSL was evaluated in a Cortex- M3 processor with no operating system while the implementation built over OpenSSL was evaluated on a personal computer with Windows 10 operating system. This document displays test results showing GEmSysC to be simpler than other libraries in some aspects. These results have shown that both implementations incur in little overhead in computation time compared to the cryptographic libraries themselves. The overhead of the implementation has been measured for each cryptographic algorithm and is between around 0% and 0.17% for the implementation over wolfSSL and between 0.03% and 1.40% for the one over OpenSSL. This document also presents the memory costs for each implementation.
Resumo:
This work presents the development and application of a three-dimensional oil spill model for predicting the movement of an oil slick in the coastal waters of Singapore. In the model, the oil slick is divided into a number of small elements for simulating of the oil processes of spreading, advection, turbulent diffusion. This model is capable of predicting the horizontal movement of surface oil slick. Satellite images and field observations of oil slicks on the surface in the Singapore Straits are used to validate the newly developed model. Compared with the observations, the numerical results of the oil spill model show good conformity. In this study, the 3d model was generated using the geometrical data of Singapore Straits waters by GAMBIT which is a pre-processor of FLUENT programmed.
Resumo:
The big data era has dramatically transformed our lives; however, security incidents such as data breaches can put sensitive data (e.g. photos, identities, genomes) at risk. To protect users' data privacy, there is a growing interest in building secure cloud computing systems, which keep sensitive data inputs hidden, even from computation providers. Conceptually, secure cloud computing systems leverage cryptographic techniques (e.g., secure multiparty computation) and trusted hardware (e.g. secure processors) to instantiate a “secure” abstract machine consisting of a CPU and encrypted memory, so that an adversary cannot learn information through either the computation within the CPU or the data in the memory. Unfortunately, evidence has shown that side channels (e.g. memory accesses, timing, and termination) in such a “secure” abstract machine may potentially leak highly sensitive information, including cryptographic keys that form the root of trust for the secure systems. This thesis broadly expands the investigation of a research direction called trace oblivious computation, where programming language techniques are employed to prevent side channel information leakage. We demonstrate the feasibility of trace oblivious computation, by formalizing and building several systems, including GhostRider, which is a hardware-software co-design to provide a hardware-based trace oblivious computing solution, SCVM, which is an automatic RAM-model secure computation system, and ObliVM, which is a programming framework to facilitate programmers to develop applications. All of these systems enjoy formal security guarantees while demonstrating a better performance than prior systems, by one to several orders of magnitude.
Resumo:
Reconfigurable HW can be used to build a hardware multitasking system where tasks can be assigned to the reconfigurable HW at run-time according to the requirements of the running applications. Normally the execution in this kind of systems is controlled by an embedded processor. In these systems tasks are frequently represented as subtask graphs, where a subtask is the basic scheduling unit that can be assigned to a reconfigurable HW. In order to control the execution of these tasks, the processor must manage at run-time complex data structures, like graphs or linked list, which may generate significant execution-time penalties. In addition, HW/SW communications are frequently a system bottleneck. Hence, it is very interesting to find a way to reduce the run-time SW computations and the HW/SW communications. To this end we have developed a HW execution manager that controls the execution of subtask graphs over a set of reconfigurable units. This manager receives as input a subtask graph coupled to a subtask schedule, and guarantees its proper execution. In addition it includes support to reduce the execution-time overhead due to reconfigurations. With this HW support the execution of task graphs can be managed efficiently generating only very small run-time penalties.
Resumo:
The proliferation of new mobile communication devices, such as smartphones and tablets, has led to an exponential growth in network traffic. The demand for supporting the fast-growing consumer data rates urges the wireless service providers and researchers to seek a new efficient radio access technology, which is the so-called 5G technology, beyond what current 4G LTE can provide. On the other hand, ubiquitous RFID tags, sensors, actuators, mobile phones and etc. cut across many areas of modern-day living, which offers the ability to measure, infer and understand the environmental indicators. The proliferation of these devices creates the term of the Internet of Things (IoT). For the researchers and engineers in the field of wireless communication, the exploration of new effective techniques to support 5G communication and the IoT becomes an urgent task, which not only leads to fruitful research but also enhance the quality of our everyday life. Massive MIMO, which has shown the great potential in improving the achievable rate with a very large number of antennas, has become a popular candidate. However, the requirement of deploying a large number of antennas at the base station may not be feasible in indoor scenarios. Does there exist a good alternative that can achieve similar system performance to massive MIMO for indoor environment? In this dissertation, we address this question by proposing the time-reversal technique as a counterpart of massive MIMO in indoor scenario with the massive multipath effect. It is well known that radio signals will experience many multipaths due to the reflection from various scatters, especially in indoor environments. The traditional TR waveform is able to create a focusing effect at the intended receiver with very low transmitter complexity in a severe multipath channel. TR's focusing effect is in essence a spatial-temporal resonance effect that brings all the multipaths to arrive at a particular location at a specific moment. We show that by using time-reversal signal processing, with a sufficiently large bandwidth, one can harvest the massive multipaths naturally existing in a rich-scattering environment to form a large number of virtual antennas and achieve the desired massive multipath effect with a single antenna. Further, we explore the optimal bandwidth for TR system to achieve maximal spectral efficiency. Through evaluating the spectral efficiency, the optimal bandwidth for TR system is found determined by the system parameters, e.g., the number of users and backoff factor, instead of the waveform types. Moreover, we investigate the tradeoff between complexity and performance through establishing a generalized relationship between the system performance and waveform quantization in a practical communication system. It is shown that a 4-bit quantized waveforms can be used to achieve the similar bit-error-rate compared to the TR system with perfect precision waveforms. Besides 5G technology, Internet of Things (IoT) is another terminology that recently attracts more and more attention from both academia and industry. In the second part of this dissertation, the heterogeneity issue within the IoT is explored. One of the significant heterogeneity considering the massive amount of devices in the IoT is the device heterogeneity, i.e., the heterogeneous bandwidths and associated radio-frequency (RF) components. The traditional middleware techniques result in the fragmentation of the whole network, hampering the objects interoperability and slowing down the development of a unified reference model for the IoT. We propose a novel TR-based heterogeneous system, which can address the bandwidth heterogeneity and maintain the benefit of TR at the same time. The increase of complexity in the proposed system lies in the digital processing at the access point (AP), instead of at the devices' ends, which can be easily handled with more powerful digital signal processor (DSP). Meanwhile, the complexity of the terminal devices stays low and therefore satisfies the low-complexity and scalability requirement of the IoT. Since there is no middleware in the proposed scheme and the additional physical layer complexity concentrates on the AP side, the proposed heterogeneous TR system better satisfies the low-complexity and energy-efficiency requirement for the terminal devices (TDs) compared with the middleware approach.
Resumo:
Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time according to the requirements of the running applications. These tasks are frequently represented as direct acyclic graphs and their execution is typically controlled by an embedded processor that schedules the graph execution. In order to improve the efficiency of the system, the scheduler can apply prefetch and reuse techniques that can greatly reduce the reconfiguration latencies. For an embedded processor all these computations represent a heavy computational load that can significantly reduce the system performance. To overcome this problem we have implemented a HW scheduler using reconfigurable resources. In addition we have implemented both prefetch and replacement techniques that obtain as good results as previous complex SW approaches, while demanding just a few clock cycles to carry out the computations. We consider that the HW cost of the system (in our experiments 3% of a Virtex-II PRO xc2vp30 FPGA) is affordable taking into account the great efficiency of the techniques applied to hide the reconfiguration latency and the negligible run-time penalty introduced by the scheduler computations.
Resumo:
El sector eléctrico está experimentando cambios importantes tanto a nivel de gestión como a nivel de mercado. Una de las claves que están acelerando este cambio es la penetración cada vez mayor de los Sistemas de Generación Distribuida (DER), que están dando un mayor protagonismo al usuario a la hora de plantear la gestión del sistema eléctrico. La complejidad del escenario que se prevé en un futuro próximo, exige que los equipos de la red tenga la capacidad de interactuar en un sistema mucho más dinámico que en el presente, donde la interfaz de conexión deberá estar dotada de la inteligencia necesaria y capacidad de comunicación para que todo el sistema pueda ser gestionado en su conjunto de manera eficaz. En la actualidad estamos siendo testigos de la transición desde el modelo de sistema eléctrico tradicional hacia un nuevo sistema, activo e inteligente, que se conoce como Smart Grid. En esta tesis se presenta el estudio de un Dispositivo Electrónico Inteligente (IED) orientado a aportar soluciones para las necesidades que la evolución del sistema eléctrico requiere, que sea capaz de integrase en el equipamiento actual y futuro de la red, aportando funcionalidades y por tanto valor añadido a estos sistemas. Para situar las necesidades de estos IED se ha llevado a cabo un amplio estudio de antecedentes, comenzando por analizar la evolución histórica de estos sistemas, las características de la interconexión eléctrica que han de controlar, las diversas funciones y soluciones que deben aportar, llegando finalmente a una revisión del estado del arte actual. Dentro de estos antecedentes, también se lleva a cabo una revisión normativa, a nivel internacional y nacional, necesaria para situarse desde el punto de vista de los distintos requerimientos que deben cumplir estos dispositivos. A continuación se exponen las especificaciones y consideraciones necesarias para su diseño, así como su arquitectura multifuncional. En este punto del trabajo, se proponen algunos enfoques originales en el diseño, relacionados con la arquitectura del IED y cómo deben sincronizarse los datos, dependiendo de la naturaleza de los eventos y las distintas funcionalidades. El desarrollo del sistema continua con el diseño de los diferentes subsistemas que lo componen, donde se presentan algunos algoritmos novedosos, como el enfoque del sistema anti-islanding con detección múltiple ponderada. Diseñada la arquitectura y funciones del IED, se expone el desarrollo de un prototipo basado en una plataforma hardware. Para ello se analizan los requisitos necesarios que debe tener, y se justifica la elección de una plataforma embebida de altas prestaciones que incluye un procesador y una FPGA. El prototipo desarrollado se somete a un protocolo de pruebas de Clase A, según las normas IEC 61000-4-30 e IEC 62586-2, para comprobar la monitorización de parámetros. También se presentan diversas pruebas en las que se han estimado los retardos implicados en los algoritmos relacionados con las protecciones. Finalmente se comenta un escenario de prueba real, dentro del contexto de un proyecto del Plan Nacional de Investigación, donde este prototipo ha sido integrado en un inversor dotándole de la inteligencia necesaria para un futuro contexto Smart Grid.