537 resultados para overhead
Resumo:
The explosion in mobile data traffic is a driver for future network operator technologies, given its large potential to affect both network performance and generated revenue. The concept of distributed mobility management (DMM) has emerged in order to overcome efficiency-wise limitations in centralized mobility approaches, proposing not only the distribution of anchoring functions but also dynamic mobility activation sensitive to the applications needs. Nevertheless, there is not an acceptable solution for IP multicast in DMM environments, as the first proposals based on MLD Proxy are prone to tunnel replication problem or service disruption. We propose the application of PIM-SM in mobility entities as an alternative solution for multicast support in DMM, and introduce an architecture enabling mobile multicast listeners support over distributed anchoring frameworks in a network-efficient way. The architecture aims at providing operators with flexible options to provide multicast mobility, supporting three modes: the first one introduces basic IP multicast support in DMM; the second improves subscription time through extensions to the mobility protocol, obliterating the dependence on MLD protocol; and the third enables fast listener mobility by avoiding potentially slow multicast tree convergence latency in larger infrastructures, by benefiting from mobility tunnels. The different modes were evaluated by mathematical analysis regarding disruption time and packet loss during handoff against several parameters, total and tunneling packet delivery cost, and regarding packet and signaling overhead.
Resumo:
Heterogeneous computing systems have become common in modern processor architectures. These systems, such as those released by AMD, Intel, and Nvidia, include both CPU and GPU cores on a single die available with reduced communication overhead compared to their discrete predecessors. Currently, discrete CPU/GPU systems are limited, requiring larger, regular, highly-parallel workloads to overcome the communication costs of the system. Without the traditional communication delay assumed between GPUs and CPUs, we believe non-traditional workloads could be targeted for GPU execution. Specifically, this thesis focuses on the execution model of nested parallel workloads on heterogeneous systems. We have designed a simulation flow which utilizes widely used CPU and GPU simulators to model heterogeneous computing architectures. We then applied this simulator to non-traditional GPU workloads using different execution models. We also have proposed a new execution model for nested parallelism allowing users to exploit these heterogeneous systems to reduce execution time.
Resumo:
Many existing encrypted Internet protocols leak information through packet sizes and timing. Though seemingly innocuous, prior work has shown that such leakage can be used to recover part or all of the plaintext being encrypted. The prevalence of encrypted protocols as the underpinning of such critical services as e-commerce, remote login, and anonymity networks and the increasing feasibility of attacks on these services represent a considerable risk to communications security. Existing mechanisms for preventing traffic analysis focus on re-routing and padding. These prevention techniques have considerable resource and overhead requirements. Furthermore, padding is easily detectable and, in some cases, can introduce its own vulnerabilities. To address these shortcomings, we propose embedding real traffic in synthetically generated encrypted cover traffic. Novel to our approach is our use of realistic network protocol behavior models to generate cover traffic. The observable traffic we generate also has the benefit of being indistinguishable from other real encrypted traffic further thwarting an adversary's ability to target attacks. In this dissertation, we introduce the design of a proxy system called TrafficMimic that implements realistic cover traffic tunneling and can be used alone or integrated with the Tor anonymity system. We describe the cover traffic generation process including the subtleties of implementing a secure traffic generator. We show that TrafficMimic cover traffic can fool a complex protocol classification attack with 91% of the accuracy of real traffic. TrafficMimic cover traffic is also not detected by a binary classification attack specifically designed to detect TrafficMimic. We evaluate the performance of tunneling with independent cover traffic models and find that they are comparable, and, in some cases, more efficient than generic constant-rate defenses. We then use simulation and analytic modeling to understand the performance of cover traffic tunneling more deeply. We find that we can take measurements from real or simulated traffic with no tunneling and use them to estimate parameters for an accurate analytic model of the performance impact of cover traffic tunneling. Once validated, we use this model to better understand how delay, bandwidth, tunnel slowdown, and stability affect cover traffic tunneling. Finally, we take the insights from our simulation study and develop several biasing techniques that we can use to match the cover traffic to the real traffic while simultaneously bounding external information leakage. We study these bias methods using simulation and evaluate their security using a Bayesian inference attack. We find that we can safely improve performance with biasing while preventing both traffic analysis and defense detection attacks. We then apply these biasing methods to the real TrafficMimic implementation and evaluate it on the Internet. We find that biasing can provide 3-5x improvement in bandwidth for bulk transfers and 2.5-9.5x speedup for Web browsing over tunneling without biasing.
Resumo:
The last couple of decades have been the stage for the introduction of new telecommunication networks. It is expected that in the future all types of vehicles, such as cars, buses and trucks have the ability to intercommunicate and form a vehicular network. Vehicular networks display particularities when compared to other networks due to their continuous node mobility and their wide geographical dispersion, leading to a permanent network fragmentation. Therefore, the main challenges that this type of network entails relate to the intermittent connectivity and the long and variable delay in information delivery. To address the problems related to the intermittent connectivity, a new concept was introduced – Delay Tolerant Network (DTN). This architecture is built on a Store-Carry-and-Forward (SCF) mechanism in order to assure the delivery of information when there is no end-to-end path defined. Vehicular networks support a multiplicity of services, including the transportation of non-urgent information. Therefore, it is possible to conclude that the use of a DTN for the dissemination of non-urgent information is able to surpass the aforementioned challenges. The work developed focused on the use of DTNs for the dissemination of non-urgent information. This information is originated in the network service provider and should be available on mobile network terminals during a limited period of time. In order to do so, four different strategies were deployed: Random, Least Number of Hops First (LNHF), Local Rarest Bundle First (LRBF) e Local Rarest Generation First (LRGF). All of these strategies have a common goal: to disseminate content into the network in the shortest period of time and minimizing network congestion. This work also contemplates the analysis and implementation of techniques that reduce network congestion. The design, implementation and validation of the proposed strategies was divided into three stages. The first stage focused on creating a Matlab emulator for the fast implementation and strategy validation. This stage resulted in the four strategies that were afterwards implemented in the DTNs software Helix – developed in a partnership between Instituto de Telecomunicac¸˜oes (IT) and Veniam R , which are responsible for the largest operating vehicular network worldwide that is located in Oporto city. The strategies were later evaluated on an emulator that was built for the largescale testing of DTN. Both emulators account for vehicular mobility based on information previously collected from the real platform. Finally, the strategy that presented the best overall performance was tested on a real platform – in a lab environment – for concept and operability demonstration. It is possible to conclude that two of the implemented strategies (LRBF and LRGF) can be deployed in the real network and guarantee a significant delivery rate. The LRBF strategy has the best performance in terms of delivery. However, it needs to add a significant overhead to the network in order to work. In the future, tests of scalability should be conducted in a real environment in order to confirm the emulator results. The real implementation of the strategies should be accompanied by the introduction of new types of services for content distribution.
Resumo:
Cache-coherent non uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications use the system’s distributed memory banks to allocate data, and the automatic memory manager collects garbage left in these memory banks. The garbage collector may need to access remote memory banks, which entails access latency overhead and potential bandwidth saturation for the interconnection between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation using the Hotspot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies a locality richness which exists naturally in connected objects that contain a root object and its reachable set— ‘rooted sub-graphs’. Second, this dissertation leverages the locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism. A garbage collector thread processes a local root and its reachable set, which is likely to have a large number of objects in the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks of an established real-world DaCapo benchmark suite. In addition, evaluation involves a widely used SPECjbb benchmark and Neo4J graph database Java benchmark, as well as an artificial benchmark. The results of the NUMA-aware garbage collector on a multi-hop NUMA architecture show an average of 15% performance improvement. Furthermore, this performance gain is shown to be as a result of an improved NUMA memory access in a ccNUMA system. Fourth, the existing Hotspot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy uses outdated assumptions and it generates a constant thread count. In fact, the Hotspot JVM still uses this policy in the production version. This research shows that the optimal number of garbage collection threads is application-specific and configuring the optimal number of garbage collection threads yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique, which involves heuristics from dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average of 21% improvements to the garbage collection performance for DaCapo benchmarks.
Resumo:
Les langages de programmation typés dynamiquement tels que JavaScript et Python repoussent la vérification de typage jusqu’au moment de l’exécution. Afin d’optimiser la performance de ces langages, les implémentations de machines virtuelles pour langages dynamiques doivent tenter d’éliminer les tests de typage dynamiques redondants. Cela se fait habituellement en utilisant une analyse d’inférence de types. Cependant, les analyses de ce genre sont souvent coûteuses et impliquent des compromis entre le temps de compilation et la précision des résultats obtenus. Ceci a conduit à la conception d’architectures de VM de plus en plus complexes. Nous proposons le versionnement paresseux de blocs de base, une technique de compilation à la volée simple qui élimine efficacement les tests de typage dynamiques redondants sur les chemins d’exécution critiques. Cette nouvelle approche génère paresseusement des versions spécialisées des blocs de base tout en propageant de l’information de typage contextualisée. Notre technique ne nécessite pas l’utilisation d’analyses de programme coûteuses, n’est pas contrainte par les limitations de précision des analyses d’inférence de types traditionnelles et évite la complexité des techniques d’optimisation spéculatives. Trois extensions sont apportées au versionnement de blocs de base afin de lui donner des capacités d’optimisation interprocédurale. Une première extension lui donne la possibilité de joindre des informations de typage aux propriétés des objets et aux variables globales. Puis, la spécialisation de points d’entrée lui permet de passer de l’information de typage des fonctions appellantes aux fonctions appellées. Finalement, la spécialisation des continuations d’appels permet de transmettre le type des valeurs de retour des fonctions appellées aux appellants sans coût dynamique. Nous démontrons empiriquement que ces extensions permettent au versionnement de blocs de base d’éliminer plus de tests de typage dynamiques que toute analyse d’inférence de typage statique.
Resumo:
This document presents GEmSysC, an unified cryptographic API for embedded systems. Software layers implementing this API can be built over existing libraries, allowing embedded software to access cryptographic functions in a consistent way that does not depend on the underlying library. The API complies to good practices for API design and good practices for embedded software development and took its inspiration from other cryptographic libraries and standards. The main inspiration for creating GEmSysC was the CMSIS-RTOS standard, which defines an unified API for embedded software in an implementation-independent way, but targets operating systems instead of cryptographic functions. GEmSysC is made of a generic core and attachable modules, one for each cryptographic algorithm. This document contains the specification of the core of GEmSysC and three of its modules: AES, RSA and SHA-256. GEmSysC was built targeting embedded systems, but this does not restrict its use only in such systems – after all, embedded systems are just very limited computing devices. As a proof of concept, two implementations of GEmSysC were made. One of them was built over wolfSSL, which is an open source library for embedded systems. The other was built over OpenSSL, which is open source and a de facto standard. Unlike wolfSSL, OpenSSL does not specifically target embedded systems. The implementation built over wolfSSL was evaluated in a Cortex- M3 processor with no operating system while the implementation built over OpenSSL was evaluated on a personal computer with Windows 10 operating system. This document displays test results showing GEmSysC to be simpler than other libraries in some aspects. These results have shown that both implementations incur in little overhead in computation time compared to the cryptographic libraries themselves. The overhead of the implementation has been measured for each cryptographic algorithm and is between around 0% and 0.17% for the implementation over wolfSSL and between 0.03% and 1.40% for the one over OpenSSL. This document also presents the memory costs for each implementation.
Resumo:
Dissertação (mestrado)—Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2015.
Resumo:
During the lifetime of a research project, different partners develop several research prototype tools that share many common aspects. This is equally true for researchers as individuals and as groups: during a period of time they often develop several related tools to pursue a specific research line. Making research prototype tools easily accessible to the community is of utmost importance to promote the corresponding research, get feedback, and increase the tools’ lifetime beyond the duration of a specific project. One way to achieve this is to build graphical user interfaces (GUIs) that facilitate trying tools; in particular, with web-interfaces one avoids the overhead of downloading and installing the tools. Building GUIs from scratch is a tedious task, in particular for web-interfaces, and thus it typically gets low priority when developing a research prototype. Often we opt for copying the GUI of one tool and modifying it to fit the needs of a new related tool. Apart from code duplication, these tools will “live” separately, even though we might benefit from having them all in a common environment since they are related. This work aims at simplifying the process of building GUIs for research prototypes tools. In particular, we present EasyInterface, a toolkit that is based on novel methodology that provides an easy way to make research prototype tools available via common different environments such as a web-interface, within Eclipse, etc. It includes a novel text-based output language that allows to present results graphically without requiring any knowledge in GUI/Web programming. For example, an output of a tool could be (a structured version of) “highlight line number 10 of file ex.c” and “when the user clicks on line 10, open a dialog box with the text ...”. The environment will interpret this output and converts it to corresponding visual e_ects. The advantage of using this approach is that it will be interpreted equally by all environments of EasyInterface, e.g., the web-interface, the Eclipse plugin, etc. EasyInterface has been developed in the context of the Envisage [5] project, and has been evaluated on tools developed in this project, which include static analyzers, test-case generators, compilers, simulators, etc. EasyInterface is open source and available at GitHub2.
Resumo:
Dynamically reconfigurable hardware is a promising technology that combines in the same device both the high performance and the flexibility that many recent applications demand. However, one of its main drawbacks is the reconfiguration overhead, which involves important delays in the task execution, usually in the order of hundreds of milliseconds, as well as high energy consumption. One of the most powerful ways to tackle this problem is configuration reuse, since reusing a task does not involve any reconfiguration overhead. In this paper we propose a configuration replacement policy for reconfigurable systems that maximizes task reuse in highly dynamic environments. We have integrated this policy in an external taskgraph execution manager that applies task prefetch by loading and executing the tasks as soon as possible (ASAP). However, we have also modified this ASAP technique in order to make the replacements more flexible, by taking into account the mobility of the tasks and delaying some of the reconfigurations. In addition, this replacement policy is a hybrid design-time/run-time approach, which performs the bulk of the computations at design time in order to save run-time computations. Our results illustrate that the proposed strategy outperforms other state-ofthe-art replacement policies in terms of reuse rates and achieves near-optimal reconfiguration overhead reductions. In addition, by performing the bulk of the computations at design time, we reduce the execution time of the replacement technique by 10 times with respect to an equivalent purely run-time one.
Resumo:
Double Degree
Resumo:
Reconfigurable platforms are a promising technology that offers an interesting trade-off between flexibility and performance, which many recent embedded system applications demand, especially in fields such as multimedia processing. These applications typically involve multiple ad-hoc tasks for hardware acceleration, which are usually represented using formalisms such as Data Flow Diagrams (DFDs), Data Flow Graphs (DFGs), Control and Data Flow Graphs (CDFGs) or Petri Nets. However, none of these models is able to capture at the same time the pipeline behavior between tasks (that therefore can coexist in order to minimize the application execution time), their communication patterns, and their data dependencies. This paper proves that the knowledge of all this information can be effectively exploited to reduce the resource requirements and the timing performance of modern reconfigurable systems, where a set of hardware accelerators is used to support the computation. For this purpose, this paper proposes a novel task representation model, named Temporal Constrained Data Flow Diagram (TCDFD), which includes all this information. This paper also presents a mapping-scheduling algorithm that is able to take advantage of the new TCDFD model. It aims at minimizing the dynamic reconfiguration overhead while meeting the communication requirements among the tasks. Experimental results show that the presented approach achieves up to 75% of resources saving and up to 89% of reconfiguration overhead reduction with respect to other state-of-the-art techniques for reconfigurable platforms.
Resumo:
Reconfigurable HW can be used to build a hardware multitasking system where tasks can be assigned to the reconfigurable HW at run-time according to the requirements of the running applications. Normally the execution in this kind of systems is controlled by an embedded processor. In these systems tasks are frequently represented as subtask graphs, where a subtask is the basic scheduling unit that can be assigned to a reconfigurable HW. In order to control the execution of these tasks, the processor must manage at run-time complex data structures, like graphs or linked list, which may generate significant execution-time penalties. In addition, HW/SW communications are frequently a system bottleneck. Hence, it is very interesting to find a way to reduce the run-time SW computations and the HW/SW communications. To this end we have developed a HW execution manager that controls the execution of subtask graphs over a set of reconfigurable units. This manager receives as input a subtask graph coupled to a subtask schedule, and guarantees its proper execution. In addition it includes support to reduce the execution-time overhead due to reconfigurations. With this HW support the execution of task graphs can be managed efficiently generating only very small run-time penalties.
Resumo:
Power distribution systems are susceptible to extreme damage from natural hazards especially hurricanes. Hurricane winds can knock down distribution poles thereby causing damage to the system and power outages which can result in millions of dollars in lost revenue and restoration costs. Timber has been the dominant material used to support overhead lines in distribution systems. Recently however, utility companies have been searching for a cost-effective alternative to timber poles due to environmental concerns, durability, high cost of maintenance and need for improved aesthetics. Steel has emerged as a viable alternative to timber due to its advantages such as relatively lower maintenance cost, light weight, consistent performance, and invulnerability to wood-pecker attacks. Both timber and steel poles are prone to deterioration over time due to decay in the timber and corrosion of the steel. This research proposes a framework for conducting fragility analysis of timber and steel poles subjected to hurricane winds considering deterioration of the poles over time. Monte Carlo simulation was used to develop the fragility curves considering uncertainties in strength, geometry and wind loads. A framework for life-cycle cost analysis is also proposed to compare the steel and timber poles. The results show that steel poles can have superior reliability and lower life-cycle cost compared to timber poles, which makes them suitable substitutes.
Resumo:
Future power grids are envisioned to be serviced by heterogeneous arrangements of renewable energy sources. Due to their stochastic nature, energy storage distribution and management are pivotal in realizing microgrids serviced heavily by renewable energy assets. Identifying the required response characteristics to meet the operational requirements of a power grid are of great importance and must be illuminated in order to discern optimal hardware topologies. Hamiltonian Surface Shaping and Power Flow Control (HSSPFC) presents the tools to identify such characteristics. By using energy storage as actuation within the closed loop controller, the response requirements may be identified while providing a decoupled controller solution. A DC microgrid servicing a fixed RC load through source and bus level storage managed by HSSPFC was realized in hardware. A procedure was developed to calibrate the DC microgrid architecture of this work to the reduced order model used by the HSSPFC law. Storage requirements were examined through simulation and experimental testing. Bandwidth contributions between feed forward and PI components of the HSSPFC law are illuminated and suggest the need for well-known system losses to prevent the need for additional overhead in storage allocations. The following work outlines the steps taken in realizing a DC microgrid and presents design considerations for system calibration and storage requirements per the closed loop controls for future DC microgrids.