989 results for Memory management


Relevance: 70.00%

Abstract:

We present a type-based approach to statically derive symbolic closed-form formulae that characterize bounds on the heap memory usage of programs written in object-oriented languages. Given a program with size and alias annotations, our inference system computes the amount of memory required by each method to execute successfully, as well as the amount of memory released when the method returns. The analysis results obtained are useful for networked devices with limited computational resources as well as for embedded software.
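As a rough illustration of the kind of judgement such an analysis produces, consider the hypothetical Java method below; the @Size comment and the closed-form bounds are invented for this sketch and are not the paper's actual annotation language.

```java
// Hypothetical sketch of a size-annotated method the analysis might
// consume; @Size and the inferred bounds are illustrative, not the
// paper's actual annotation syntax.
import java.util.ArrayList;
import java.util.List;

class MemoryBounds {
    // For an input list of size n, the inference would derive a symbolic
    // bound such as: memRequired(n) = n + 1 cells (one cell per element
    // plus the result list header), memReleased = 0.
    static List<Integer> doubleAll(List<Integer> xs /* @Size(n) */) {
        List<Integer> out = new ArrayList<>(xs.size()); // 1 cell
        for (int x : xs) {
            out.add(2 * x);                             // n cells total
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(doubleAll(List.of(1, 2, 3))); // [2, 4, 6]
    }
}
```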

Relevance: 70.00%

Abstract:

Distributed Shared Memory (DSM) provides programmers with a shared memory environment in systems where memory is not physically shared. Clusters of Workstations (COWs), an often untapped source of computing power, are characterised by a very low cost/performance ratio. The combination of COWs with DSM provides an environment in which the programmer can use the well-known approaches and methods of programming for physically shared memory systems, and in which parallel processing can be carried out to make full use of the computing power and cost advantages of the COW. The aim of this research is to synthesise and develop a distributed shared memory system as an integral part of an operating system, in order to provide application programmers with a convenient environment in which the development and execution of parallel applications can be done easily, efficiently, and transparently. Furthermore, in order to satisfy our challenging design requirements, we want to demonstrate that the operating system into which the DSM system is integrated should be a distributed operating system.

This thesis reports a study into the synthesis of a DSM system within a microkernel and client-server based distributed operating system, one which uses both strict and weak consistency models, with a write-invalidate and write-update based approach for consistency maintenance. Furthermore, a unique automatic initialisation system is reported which allows the programmer to start the parallel execution of a group of processes with a single library call; the number and location of these processes are determined by the operating system based on system load information. The DSM system proposed takes a novel approach in that it provides programmers with a complete programming environment in which they can easily develop and run their code, or indeed run existing shared memory code. A set of demanding DSM system design requirements is presented, and the incentives for placing the DSM system within a distributed operating system, and in particular in the memory management server, are reported. The new DSM system is designed as an event-driven set of cooperating and distributed entities, and a detailed description of the events, and of the reactions to these events that make up the operation of the DSM system, is presented. This is followed by a pseudocode form of the detailed design of the main modules and of the activities of the primitives used in the proposed DSM system.

Quantitative results of performance tests, and qualitative results showing the ease of programming and use of the RHODOS DSM system, are reported. A study of five different applications is given, and the results of tests carried out on these applications are presented together with a discussion. A discussion of how RHODOS' DSM allows programmers to write shared memory code in an easy-to-use and familiar environment, and a comparative evaluation of RHODOS DSM against other DSM systems, is also presented. In particular, the ease of use and transparency of the DSM system have been demonstrated through the ease with which a moderately inexperienced undergraduate programmer was able to convert, write and run applications for the testing of the DSM system. Furthermore, the tests performed using physically shared memory show that the latter is indistinguishable from distributed shared memory; this is further evidence that the DSM system is fully transparent.
This study clearly demonstrates that the aim of the research has been achieved; it is possible to develop a programmer-friendly and efficient DSM system fully integrated within a distributed operating system. It is clear from this research that a DSM system integrated within a client-server and microkernel based distributed operating system makes shared memory operations transparent and almost completely removes the involvement of the programmer beyond the classical activities needed to deal with shared memory. The conclusion can be drawn that DSM, when implemented within a client-server and microkernel based distributed operating system, is one of the most encouraging approaches to parallel processing, since it delivers performance improvements with minimal programmer involvement.
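The abstract does not give the RHODOS library interface, but the single-call initialisation it describes has roughly the following shape; the sketch below is hypothetical Java, with threads standing in for the operating-system process placement that RHODOS performs from load information.

```java
// The RHODOS DSM API is not spelled out in the abstract; this sketch only
// illustrates the shape of "start a group of parallel workers with one
// call, letting the runtime pick how many and where". Names are invented;
// threads stand in for the operating system's process placement.
import java.util.ArrayList;
import java.util.List;

class DsmInit {
    // Single call: the runtime (here: available processors) decides the
    // number of workers, as RHODOS decides from system load information.
    static void parallelStart(Runnable work) throws InterruptedException {
        int n = Runtime.getRuntime().availableProcessors();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(work);
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        parallelStart(() ->
            System.out.println("worker on " + Thread.currentThread().getName()));
    }
}
```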

Relevance: 70.00%

Abstract:

This thesis is a study of performance management of Complex Event Processing (CEP) systems. Since CEP systems have characteristics distinct from other well-studied computer systems, such as batch and online transaction processing systems and database-centric applications, these characteristics introduce new challenges and opportunities for the performance management of CEP systems. The methodologies used to benchmark CEP systems in many performance studies focus on scaling the injected load, but do not consider the impact of the functional capabilities of CEP systems. This thesis proposes an approach that evaluates the performance of CEP engines' functional behaviours on events, and develops a benchmark platform for CEP systems: CEPBen. The CEPBen benchmark platform is developed to explore the fundamental functional performance of event processing systems: filtering, transformation and event pattern detection. It is also designed to provide a flexible environment for exploring new metrics and influential factors for CEP systems and for evaluating their performance. Studies on factors and new metrics are carried out with the CEPBen benchmark platform on Esper. Different measurement points for response time in the performance management of CEP systems are discussed, and the response time of a targeted event is proposed as a metric for quality-of-service evaluation, to be combined with the traditional response time of CEP systems. Maximum query load is proposed as a capacity indicator with respect to the complexity of queries, and the number of live objects in memory as a performance indicator with respect to memory management. Query depth is studied as a factor that influences CEP system performance.
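As a minimal, self-contained illustration of benchmarking one functional behaviour (filtering) per event, a sketch follows; it is not the CEPBen platform and does not use Esper's API.

```java
// Minimal stand-alone sketch of the benchmarking idea: measure response
// time of a single functional behaviour (here, filtering) per event.
// Illustrative only; not the CEPBen platform and not Esper code.
import java.util.Random;

class FilterBench {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int events = 1_000_000, matched = 0;
        long start = System.nanoTime();
        for (int i = 0; i < events; i++) {
            int price = rnd.nextInt(1000);
            if (price > 900) matched++;   // the filtering predicate under test
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d matched, %.1f ns/event%n",
                matched, (double) elapsed / events);
    }
}
```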

Relevance: 60.00%

Abstract:

Appearance-based localization is increasingly used for loop closure detection in metric SLAM systems. Since it relies only upon the appearance-based similarity between images from two locations, it can perform loop closure regardless of accumulated metric error. However, the computation time and memory requirements of current appearance-based methods scale linearly not only with the size of the environment but also with the operation time of the platform. These properties impose severe restrictions on long-term autonomy for mobile robots, as loop closure performance will inevitably degrade with increased operation time. We present a set of improvements to the appearance-based SLAM algorithm CAT-SLAM that constrain computation scaling and memory usage with minimal degradation in performance over time. The appearance-based comparison stage is accelerated by exploiting properties of the particle observation update, and nodes in the continuous trajectory map are removed according to a minimal information loss criterion. We demonstrate constant-time and constant-space loop closure detection in a large urban environment, with recall performance exceeding FAB-MAP by a factor of 3 at 100% precision, and investigate the minimum computational and memory requirements for maintaining mapping performance.
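The node-removal idea can be sketched generically as bounded-capacity pruning of the lowest-scoring node; the information score below is a placeholder, not CAT-SLAM's actual criterion.

```java
// Generic sketch of the bounded-memory pruning pattern the abstract
// describes: when the trajectory map exceeds a fixed capacity, drop the
// node whose removal loses the least information. The score is invented;
// CAT-SLAM's actual information loss criterion is not reproduced here.
import java.util.Comparator;
import java.util.PriorityQueue;

class TrajectoryMap {
    record Node(int id, double infoScore) {}

    private final int capacity;
    private final PriorityQueue<Node> nodes =
            new PriorityQueue<>(Comparator.comparingDouble(Node::infoScore));

    TrajectoryMap(int capacity) { this.capacity = capacity; }

    void add(Node n) {
        nodes.add(n);
        if (nodes.size() > capacity) {
            Node dropped = nodes.poll(); // least informative node
            System.out.println("pruned node " + dropped.id());
        }
    }

    public static void main(String[] args) {
        TrajectoryMap map = new TrajectoryMap(2);
        map.add(new Node(1, 0.9));
        map.add(new Node(2, 0.1));
        map.add(new Node(3, 0.5)); // exceeds capacity: prunes node 2
    }
}
```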

Relevance: 60.00%

Abstract:

The television quiz program Letters and Numbers, broadcast on the SBS network, has recently become quite popular in Australia. This paper explores the potential of this game to illustrate and engage student interest in a range of fundamental concepts of computer science and mathematics. The Numbers Game in particular has a rich mathematical structure whose analysis and solution involve concepts of counting and problem size, discrete (tree) structures, language theory, recurrences, computational complexity, and even advanced memory management. This paper presents an analysis of these games and their teaching applications, and reports some initial results from their use in student assignments.
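A minimal sketch of the tree search behind the Numbers Game (combine six numbers with the four arithmetic operations to reach a target, keeping intermediate results positive integers) might look as follows; the rules encoded here follow the common broadcast conventions, not necessarily the paper's exact formalisation.

```java
import java.util.ArrayList;
import java.util.List;

class NumbersGame {
    // Try every way of combining two numbers and recurse on the reduced
    // list; this is the (exponential) tree search whose size the paper's
    // counting analysis addresses.
    static boolean solve(List<Long> nums, long target) {
        if (nums.contains(target)) return true;
        for (int i = 0; i < nums.size(); i++) {
            for (int j = 0; j < nums.size(); j++) {
                if (i == j) continue;
                long a = nums.get(i), b = nums.get(j);
                List<Long> rest = new ArrayList<>(nums);
                rest.remove(Math.max(i, j)); // remove larger index first
                rest.remove(Math.min(i, j));
                for (long r : candidates(a, b)) {
                    rest.add(r);
                    if (solve(rest, target)) return true;
                    rest.remove(rest.size() - 1);
                }
            }
        }
        return false;
    }

    // Common rules: intermediate results must stay positive integers.
    static List<Long> candidates(long a, long b) {
        List<Long> out = new ArrayList<>(List.of(a + b, a * b));
        if (a > b) out.add(a - b);
        if (b != 0 && a % b == 0) out.add(a / b);
        return out;
    }

    public static void main(String[] args) {
        // ((100 + 6) * 3 * 75 - 50) / 25 = 952
        System.out.println(solve(List.of(25L, 50L, 75L, 100L, 3L, 6L), 952L));
    }
}
```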

Relevance: 60.00%

Abstract:

To improve the performance of communication between TTCN-3 test components, a shared-memory management framework supporting dynamic switching of memory management strategies was designed and implemented on a TTCN-3-based test platform, and three different shared-memory allocation and automatic reclamation strategies were implemented within this framework. Based on runtime statistics of shared-memory usage, the framework selects from these strategies the one with the best expected performance and dynamically switches the current memory management strategy to it. Running a series of concurrent test cases on the test platform shows that the framework improves the average performance of memory management and of the system as a whole.
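A rough sketch of the core idea, dynamically switching among allocation strategies based on runtime usage statistics, is given below; the interface, the two strategies and the selection heuristic are invented for illustration and are not the paper's framework.

```java
// Illustrative strategy-pattern sketch: several allocation strategies
// behind one interface, with the manager switching to whichever is
// expected to perform best on current usage statistics. Names and the
// selection heuristic are invented here.
import java.nio.ByteBuffer;

interface AllocStrategy {
    ByteBuffer allocate(int size);
    String name();
}

class ShmManager {
    private AllocStrategy current;
    private long requests, totalBytes;

    ShmManager(AllocStrategy initial) { current = initial; }

    ByteBuffer allocate(int size) {
        requests++; totalBytes += size;
        maybeSwitch();
        return current.allocate(size);
    }

    // Placeholder heuristic: small average requests favour a pooled
    // strategy, large ones a direct strategy.
    private void maybeSwitch() {
        double avg = (double) totalBytes / requests;
        AllocStrategy best = avg < 256 ? POOLED : DIRECT;
        if (best != current) {
            System.out.println("switching to " + best.name());
            current = best;
        }
    }

    static final AllocStrategy POOLED = new AllocStrategy() {
        public ByteBuffer allocate(int size) { return ByteBuffer.allocate(size); }
        public String name() { return "pooled"; }
    };
    static final AllocStrategy DIRECT = new AllocStrategy() {
        public ByteBuffer allocate(int size) { return ByteBuffer.allocateDirect(size); }
        public String name() { return "direct"; }
    };

    public static void main(String[] args) {
        ShmManager m = new ShmManager(POOLED);
        m.allocate(64);
        m.allocate(1 << 20); // large request triggers a switch
    }
}
```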

Relevance: 60.00%

Abstract:

Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low-level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy. I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.
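The lazy-allocation idea can be sketched at a high level as follows; SFAs are a hardware-level naming mechanism, so this Java map-based facet store is only an analogy, with invented names.

```java
// Sketch of the lazy facet allocation idea: a sparsely faceted array
// names a facet on every node, but storage is only materialized on first
// touch. Node ids and the facet type are invented for illustration.
import java.util.HashMap;
import java.util.Map;

class SparselyFacetedArray {
    private final int facetSize;
    private final Map<Integer, int[]> facets = new HashMap<>(); // node -> facet

    SparselyFacetedArray(int facetSize) { this.facetSize = facetSize; }

    // The facet for a node is allocated lazily, on first access.
    int[] facetOf(int nodeId) {
        return facets.computeIfAbsent(nodeId, id -> new int[facetSize]);
    }

    // The set of nodes holding facets is exactly what a reference-counting
    // collector must track in order to reclaim the SFA.
    int allocatedFacets() { return facets.size(); }

    public static void main(String[] args) {
        SparselyFacetedArray sfa = new SparselyFacetedArray(4);
        sfa.facetOf(7)[0] = 42;                    // only node 7 pays for storage
        System.out.println(sfa.allocatedFacets()); // 1
    }
}
```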

Relevance: 60.00%

Abstract:

We propose and evaluate an admission control paradigm for RTDBS, in which a transaction is submitted to the system as a pair of processes: a primary task and a recovery block. The execution requirements of the primary task are not known a priori, whereas those of the recovery block are known a priori. Upon the submission of a transaction, an Admission Control Mechanism is employed to decide whether to admit or reject that transaction. Once admitted, a transaction is guaranteed to finish executing before its deadline. A transaction is considered to have finished executing if exactly one of two things occurs: either its primary task is completed (successful commitment), or its recovery block is completed (safe termination). A committed transaction brings a profit to the system, whereas a terminated transaction brings none. The goal of the admission control and scheduling protocols (e.g., concurrency control, I/O scheduling, memory management) employed in the system is to maximize system profit. We describe a number of admission control strategies and contrast (through simulations) their relative performance.
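A hedged sketch of the admission decision follows: since the recovery block's requirement is known a priori, admission can be conditioned on guaranteeing that work before the deadline. The schedulability test below is a deliberately simple stand-in, not one of the paper's strategies.

```java
// Sketch of the admission decision: the recovery block's execution
// requirement is known a priori, so a transaction is admitted only if the
// system can still guarantee that much work before its deadline. This
// utilization-style test is invented for illustration.
import java.util.ArrayList;
import java.util.List;

class AdmissionControl {
    record Txn(String id, long recoveryCost, long deadline) {}

    private final List<Txn> admitted = new ArrayList<>();

    boolean admit(Txn t, long now) {
        // Work already promised to admitted transactions plus this
        // recovery block must fit before the new transaction's deadline.
        long promised = admitted.stream().mapToLong(Txn::recoveryCost).sum();
        boolean ok = now + promised + t.recoveryCost() <= t.deadline();
        if (ok) admitted.add(t);
        return ok;
    }

    public static void main(String[] args) {
        AdmissionControl ac = new AdmissionControl();
        System.out.println(ac.admit(new Txn("T1", 10, 100), 0)); // true
        System.out.println(ac.admit(new Txn("T2", 95, 100), 0)); // false
    }
}
```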

Relevance: 60.00%

Abstract:

Weak references provide the programmer with limited control over the process of memory management. By using them, a programmer can make decisions based on previous actions taken by the garbage collector. Although this is often helpful, the outcome of a program using weak references is less predictable due to the nondeterminism they introduce into program evaluation. It is therefore desirable to have a framework of formal tools to reason about weak references and programs that use them. We present several calculi that formalize various aspects of weak references, inspired by their implementation in Java. We provide a calculus to model multiple levels of non-strong references, where a different garbage collection policy is applied to each level. We consider different collection policies, such as eager collection and lazy collection. Mirroring their implementation in Java, we give the semantics of eager collection to weak references and the semantics of lazy collection to soft references. Moreover, we condition garbage collection on the availability of time and space resources: while time constraints are used to restrict garbage collection, space constraints are used to trigger it. Finalizers are a problematic feature in Java, especially when they interact with weak references. We provide a calculus to model finalizer evaluation. Since finalizers have little meaning in a language without side effects, we introduce a limited form of side effect into the calculus. We discuss determinism and the separate notion of uniqueness of (evaluation) outcome. We show that in our calculus, finalizer evaluation does not affect uniqueness of outcome.
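Since the calculi are modeled on Java's reference types, the behaviours they formalize can be seen directly in Java's java.lang.ref API; the short program below shows the eager/lazy distinction, and its nondeterministic outcome is precisely the unpredictability discussed above.

```java
// A WeakReference may be cleared as soon as its referent is only weakly
// reachable (eager collection); a SoftReference is kept until memory
// pressure demands otherwise (lazy collection). GC timing is
// nondeterministic, which is exactly what the calculi reason about.
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceDemo {
    public static void main(String[] args) {
        WeakReference<byte[]> weak = new WeakReference<>(new byte[1024]);
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);

        System.gc(); // a hint only: collection is not guaranteed

        // Typically the weak referent is gone after a GC cycle, while the
        // soft referent survives until the heap is under pressure.
        System.out.println("weak cleared: " + (weak.get() == null));
        System.out.println("soft cleared: " + (soft.get() == null));
    }
}
```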

Relevance: 60.00%

Abstract:

How can GPU acceleration be obtained as a service in a cluster? This question has become increasingly significant due to the inefficiency of installing GPUs on all nodes of a cluster. The research reported in this paper addresses the above question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), such that the nodes of a cluster can request the acceleration of a set of remote GPUs on demand. The rCUDA framework exploits virtualisation and ensures that multiple nodes can share the same GPU. In this paper we test the feasibility of the rCUDA framework on a real-world application employed in the financial risk industry that can benefit from AaaS in a production setting. The results confirm the feasibility of rCUDA and highlight that rCUDA achieves performance similar to CUDA, provides consistent results, and, more importantly, allows a single application to benefit from all the GPUs available in the cluster without losing efficiency.

Relevance: 60.00%

Abstract:

In order to optimize the in-memory representation of Scheme records in the Gambit compiler, we introduced into it a system of type annotations and vectors containing an abbreviated representation of records. The latter omit the reference to the type descriptor and the header usually present on every record, and instead use a typing tree covering all of memory to recover the vector containing a reference. These new features are implemented through changes to the Gambit runtime. We introduce new primitives into the language and modify the existing architecture to handle the new data types correctly. The garbage collector must be modified to account for records containing heterogeneous values with irregular alignments, and for the existence of references contained within other objects. Management of the typing tree must also be done automatically. We then run a series of performance tests to determine whether these new primitives make gains possible. We observe a major performance improvement in allocation and in garbage collector behaviour for large typed records and for vectors of records, typed or not. Slight overheads are, however, incurred on field accesses and, in the case of vectors of records, on access to the type descriptor.
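The typing-tree lookup can be illustrated outside Scheme: given an address, find the greatest region base at or below it and read that region's descriptor. The Java TreeMap sketch below is only an analogy for Gambit's memory-spanning tree; addresses and descriptors are invented.

```java
// Illustration of the typing-tree lookup: instead of a per-record header
// pointing at its type descriptor, a tree keyed on addresses maps each
// allocated region to the descriptor. A TreeMap stands in for Gambit's
// memory-spanning tree; addresses and descriptors are invented.
import java.util.TreeMap;

class TypingTree {
    record Descriptor(String typeName, int fieldCount) {}

    private final TreeMap<Long, Descriptor> regions = new TreeMap<>();

    void registerRegion(long baseAddress, Descriptor d) {
        regions.put(baseAddress, d);
    }

    // Find the descriptor for any address inside a region: the greatest
    // region base not exceeding the address.
    Descriptor descriptorFor(long address) {
        var e = regions.floorEntry(address);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        TypingTree tree = new TypingTree();
        tree.registerRegion(0x1000, new Descriptor("point3d", 3));
        System.out.println(tree.descriptorFor(0x1008)); // inside the point3d region
    }
}
```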

Relevance: 60.00%

Abstract:

The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles required to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

Relevance: 60.00%

Abstract:

In many application domains data can be naturally represented as graphs. When the application of analytical solutions to a given problem is infeasible, machine learning techniques can be a viable way to solve it. Classical machine learning techniques are defined for data represented in vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results from both the computational complexity and the predictive performance points of view. Kernel methods avoid an explicit mapping into vectorial form by relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results, from both the computational and the classification points of view, on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management in this setting. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods in this domain, obtaining state-of-the-art results.
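As a toy illustration of what a kernel function is, a similarity on pairs of structured objects, consider the vertex-label histogram kernel below; it is deliberately simple and is not one of the DAG-based kernels defined in the thesis.

```java
// Toy illustration of a graph kernel: a similarity function on pairs of
// graphs, here reduced to their vertex-label multisets. Not one of the
// thesis's DAG-based kernels.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LabelKernel {
    // k(g1, g2) = sum over labels of min(count in g1, count in g2)
    static int kernel(List<String> labels1, List<String> labels2) {
        Map<String, Integer> h1 = histogram(labels1), h2 = histogram(labels2);
        int k = 0;
        for (var e : h1.entrySet()) {
            k += Math.min(e.getValue(), h2.getOrDefault(e.getKey(), 0));
        }
        return k;
    }

    static Map<String, Integer> histogram(List<String> labels) {
        Map<String, Integer> h = new HashMap<>();
        for (String l : labels) h.merge(l, 1, Integer::sum);
        return h;
    }

    public static void main(String[] args) {
        System.out.println(kernel(List.of("C", "C", "O"),
                                  List.of("C", "O", "N"))); // 2
    }
}
```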

Relevance: 60.00%

Abstract:

Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of nondeterminism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this article is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The article describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory management, compile-time analysis, and execution visualization.

Relevance: 60.00%

Abstract:

Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid 80s there has been significant progress in the development of parallelizing compilers for logic programming (and more recently, constraint programming), resulting in quite capable parallelizers. The typical applications of these paradigms frequently involve irregular computations, and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce in a tutorial way some of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to some of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, and techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation (such as work-stealing schedulers), etc.
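Java's ForkJoinPool is a production example of the work-stealing scheduling mentioned above, and a threshold cut-off shows task granularity control; the sketch below combines the two.

```java
// Granularity control and work stealing, shown with Java's ForkJoinPool
// (a real work-stealing scheduler). The THRESHOLD cut-off is the
// granularity control: below it, tasks run sequentially rather than
// paying the cost of spawning parallel subtasks.
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 10_000; // granularity control
    final long[] data; final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // small task: run sequentially
            long s = 0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                         // may be stolen by an idle worker
        long right = new SumTask(data, mid, hi).compute();
        return right + left.join();
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1);
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 1000000
    }
}
```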