976 results for Memory space


Relevance:

70.00%

Publisher:

Abstract:

We describe the design of a directory-based shared memory architecture on a hierarchical network of hypercubes. The distributed directory scheme comprises two separate hierarchical networks for handling cache requests and transfers. Further, the scheme assumes a single address space, and each processing element views the entire network as a contiguous memory space. The directory stored at each node is the same size throughout the network; although this directory size grows with the network size, the architecture remains scalable. Analytical studies demonstrate that our scheme has superior performance characteristics compared with other schemes.
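
As a rough software illustration of the bookkeeping a distributed directory performs in a single address space, the sketch below keeps one fixed-size, full-map entry per cache line at its home node. The names (DirectoryEntry, home_node) and the flat line interleaving are assumptions for the example; the paper's hierarchical request and transfer networks are not modeled.

```python
# Minimal full-map directory sketch for a single shared address space.
NUM_NODES = 8          # processing elements
LINE_SIZE = 64         # bytes per cache line

class DirectoryEntry:
    """Sharers and exclusive owner of one cache line (fixed-size bookkeeping)."""
    def __init__(self):
        self.sharers = set()   # node ids holding a read-only copy
        self.owner = None      # node id holding the line exclusively, if any

def home_node(address):
    """Interleave cache lines across nodes so each node serves an equal
    slice of the contiguous memory space."""
    return (address // LINE_SIZE) % NUM_NODES

class DirectoryNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.entries = {}          # line address -> DirectoryEntry

    def read_request(self, line_addr, requester):
        entry = self.entries.setdefault(line_addr, DirectoryEntry())
        if entry.owner == requester:
            return                          # requester already holds it exclusively
        if entry.owner is not None:
            entry.sharers.add(entry.owner)  # owner downgrades to a shared copy
            entry.owner = None
        entry.sharers.add(requester)

    def write_request(self, line_addr, requester):
        entry = self.entries.setdefault(line_addr, DirectoryEntry())
        # Invalidations would be sent to all other sharers here.
        entry.sharers = {requester}
        entry.owner = requester

# Node 3 is the home of address 0xC0 when NUM_NODES == 8.
directory = DirectoryNode(home_node(0xC0))
directory.read_request(0xC0, requester=1)
directory.write_request(0xC0, requester=5)
print(directory.entries[0xC0].owner)        # 5
```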

Relevance:

70.00%

Publisher:

Abstract:

Today's SoCs are complex designs with multiple embedded processors, memory subsystems, and application-specific peripherals. The memory architecture of an embedded SoC strongly influences the power and performance of the entire system, and the memory subsystem constitutes a major part (typically up to 70%) of the silicon area of present-day SoCs. In this article, we address on-chip memory architecture exploration for DSP processors organized as multiple memory banks, where banks can be single- or dual-ported and of non-uniform sizes. We propose two different methods for physical memory architecture exploration and identify the strengths and applicability of each in a systematic way. Both methods explore the memory architecture for a given target application by considering the application's data-access characteristics, and both generate a set of Pareto-optimal design points that are interesting from power, performance, and VLSI-area perspectives. To the best of our knowledge, this is the first comprehensive work on memory-space exploration at the physical-memory level that integrates data layout and memory exploration to address the system objectives from both hardware-design and application-software-development perspectives. Further, we propose an automatic framework that explores the design space, identifying hundreds of Pareto-optimal design points within a few hours on a standard desktop configuration.
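
As a minimal sketch of the Pareto-filtering step such an exploration relies on, the snippet below keeps only the design points that are not dominated in power, performance (cycles), and area. The DesignPoint fields and candidate values are hypothetical; the paper's actual exploration methods and cost models are not reproduced.

```python
from typing import NamedTuple

class DesignPoint(NamedTuple):
    """A candidate memory architecture; all three metrics are minimized
    (cycles stands in for the inverse of performance)."""
    power_mw: float
    cycles: float
    area_mm2: float

def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """a dominates b if it is no worse in every metric and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the non-dominated (Pareto-optimal) design points."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

candidates = [
    DesignPoint(12.0, 900, 1.4),    # e.g. one dual-ported bank
    DesignPoint(9.5, 1100, 1.1),    # e.g. two single-ported banks
    DesignPoint(13.0, 1200, 1.5),   # dominated by the first point
]
print(pareto_front(candidates))     # the first two points survive
```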

Relevance:

70.00%

Publisher:

Abstract:

In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and programmers use it to speed up their applications thanks to its low latency. Prior work by the authors proposed lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict-detection mechanism. An efficient implementation of these features is key to providing an effective synchronization mechanism at the local-memory level. After a brief description of the main features of our HTM design for GPU local memory, this work gathers a number of proposals aimed at improving the mechanisms that have the highest impact on performance. First, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Second, the conflict-detection mechanism is optimized according to application characteristics such as the read/write sets, the probability of conflict between transactions, and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is the task of the compiler and runtime to determine which are more important for a given application. This work includes a discussion of the analysis required to choose the best configuration.
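
The conflict-detection idea can be sketched in software with read/write sets: two transactions conflict when one writes an address that the other reads or writes. The snippet below is only a simplified analogue of such a mechanism, not the proposed hardware design or the modified SIMT execution model.

```python
class Transaction:
    """Software stand-in for a hardware transaction's read and write sets."""
    def __init__(self, tid):
        self.tid = tid
        self.read_set = set()
        self.write_set = set()

    def read(self, addr):
        self.read_set.add(addr)

    def write(self, addr):
        self.write_set.add(addr)

def conflicts(t1, t2):
    """Write/write or read/write overlap between two transactions."""
    return bool(t1.write_set & t2.write_set
                or t1.write_set & t2.read_set
                or t1.read_set & t2.write_set)

def commit_order(transactions):
    """Greedily commit transactions, serializing (to retry later) those that
    conflict with an already committed one."""
    committed, serialized = [], []
    for t in transactions:
        if any(conflicts(t, c) for c in committed):
            serialized.append(t)      # would re-execute in a later pass
        else:
            committed.append(t)
    return committed, serialized

a, b = Transaction(0), Transaction(1)
a.write(0x10)
b.read(0x10)                                       # read/write conflict on 0x10
print([t.tid for t in commit_order([a, b])[0]])    # -> [0]
```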

Relevance:

60.00%

Publisher:

Abstract:

Although the incidence-matrix representation has been used to analyze Petri net based models of a system, it has the limitation that it does not preserve reflexive properties (i.e., the presence of self-loops) of Petri nets, yet in many practical applications self-loops play very important roles. This paper proposes a new representation scheme for general Petri nets. The scheme defines a matrix called the "reflexive incidence matrix" (RIM) C, which is a combination of two matrices, a "base matrix" Cb and a "power matrix" Cp, and it preserves the reflexive and other properties of the Petri net. A detailed analysis shows that the proposed scheme requires less memory space and less processing time for answering commonly encountered net queries than other schemes. Algorithms to generate the RIM from a given net description and to decompose the RIM into input and output function matrices are also given. The proposed representation scheme is very useful for modeling and analyzing systems with shared resources, chemical processes, network protocols, etc., and for evaluating the performance of asynchronous concurrent systems.
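
The problem the RIM addresses is easy to see with the standard incidence matrix C = C_out - C_in: a place that is both an input and an output of the same transition (a self-loop) cancels to zero and disappears. The sketch below shows that cancellation and how keeping the input and output function matrices separately preserves the loop; the actual base-matrix/power-matrix construction of the RIM is not reproduced here.

```python
# Petri net with 2 places and 2 transitions; p0 <-> t0 forms a self-loop.
# Rows are places, columns are transitions.
C_in = [[1, 0],    # arcs from places to transitions (input function)
        [0, 1]]
C_out = [[1, 1],   # arcs from transitions to places (output function)
         [0, 0]]

# Standard incidence matrix: the self-loop on (p0, t0) cancels to 0,
# so it cannot be recovered from C alone.
C = [[C_out[p][t] - C_in[p][t] for t in range(2)] for p in range(2)]
print(C)            # [[0, 1], [0, -1]] -- the p0 <-> t0 self-loop is invisible

# Keeping the input/output matrices (as the RIM decomposition does) preserves it.
self_loops = [(p, t) for p in range(2) for t in range(2)
              if C_in[p][t] and C_out[p][t]]
print(self_loops)   # [(0, 0)]
```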

Relevance:

60.00%

Publisher:

Abstract:

SIR is a computer system, programmed in the LISP language, which accepts information and answers questions expressed in a restricted form of English. The system demonstrates what can reasonably be called an ability to "understand" semantic information. SIR's semantic and deductive ability is based on the construction of an internal model, built from word associations and property lists, for the relational information normally conveyed in conversational statements. A format-matching procedure extracts semantic content from English sentences. If an input sentence is declarative, the system adds appropriate information to the model; if it is a question, the system searches the model until it either finds the answer or determines why it cannot. In all cases SIR reports its conclusions. The system has some capacity to recognize exceptions to general rules, resolve certain semantic ambiguities, and modify its model structure in order to save computer memory space. Judging from its conversational ability, SIR is a first step toward intelligent man-machine communication. The author proposes a next step by describing how to construct a more general system that is less complex yet more powerful than SIR. This proposed system contains a generalized version of the SIR model, a formal logical system called SIR1, and a computer program for testing the truth of SIR1 statements with respect to the generalized model by using partial proof procedures in the predicate calculus. The thesis also describes the formal properties of SIR1 and how they relate to the logical structure of SIR.
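
A toy version of the kind of property-list model described here (far simpler than the original LISP program, with a hypothetical "isa" relation) might store relations per word and chain them to answer questions:

```python
# Toy relational model in the spirit of SIR's property lists (not its LISP code).
model = {}   # word -> {relation: set of related words}

def add_fact(subject, relation, obj):
    model.setdefault(subject, {}).setdefault(relation, set()).add(obj)

def is_a(subject, category):
    """Follow 'isa' links transitively, e.g. 'boy' -> 'person' -> 'animal'."""
    seen, frontier = set(), {subject}
    while frontier:
        word = frontier.pop()
        if word == category:
            return True
        seen.add(word)
        frontier |= model.get(word, {}).get("isa", set()) - seen
    return False

add_fact("boy", "isa", "person")
add_fact("person", "isa", "animal")
print(is_a("boy", "animal"))   # True
print(is_a("animal", "boy"))   # False
```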

Relevance:

60.00%

Publisher:

Abstract:

A simulation program has been developed to calculate the power-spectral density of thin avalanche photodiodes (APDs), which are used in optical networks. The program extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory space and is very time-consuming. We describe our experiences in parallelizing the code using both MPI and OpenMP. Several array-partitioning schemes and scheduling policies are implemented and tested. Our results show that the OpenMP code is scalable up to 64 processors on an SGI Origin 2000 machine and has small average errors.
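
The expensive step is essentially an autocorrelation over long impulse-response records, which parallelizes naturally over lag values. The sketch below uses Python's multiprocessing as a stand-in for the MPI/OpenMP code described above; it illustrates only the loop-level parallelism, not the dead-space multiplication model itself, and the sample data are random placeholders.

```python
import random
from multiprocessing import Pool

random.seed(0)                                     # same data in every worker process
samples = [random.random() for _ in range(4096)]   # stand-in impulse-response samples

def autocorr_at_lag(lag):
    """One lag of a (biased) autocorrelation estimate."""
    n = len(samples)
    return sum(samples[i] * samples[i + lag] for i in range(n - lag)) / n

if __name__ == "__main__":
    lags = range(256)
    # Each worker handles a share of the lags, analogous to partitioning
    # the lag loop across OpenMP threads or MPI ranks.
    with Pool(processes=4) as pool:
        acf = pool.map(autocorr_at_lag, lags)
    print(acf[:3])
```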

Relevance:

60.00%

Publisher:

Abstract:

An important factor for high-speed optical communication is the availability of ultrafast and low-noise photodetectors. Among the semiconductor photodetectors commonly used in today's long-haul and metro-area fiber-optic systems, avalanche photodiodes (APDs) are often preferred over p-i-n photodiodes because of their internal gain, which significantly improves receiver sensitivity and alleviates the need for optical pre-amplification. Unfortunately, the carrier impact-ionization process that generates this gain is inherently random, resulting in fluctuations not only in the gain but also in the time response. We have recently developed a theory characterizing the autocorrelation function of APDs that incorporates the dead-space effect, an effect that is very significant in thin, high-performance APDs. This research extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory space and is very time-consuming. We describe our experiences in parallelizing the code in MPI and OpenMP using CAPTools. Several array-partitioning schemes and scheduling policies are implemented and tested. Our results show that the code is scalable up to 64 processors on an SGI Origin 2000 machine and has small average errors.
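
"Array partitioning schemes" here refers to how loop iterations or array rows are divided among processes; block and cyclic distributions are the two classic choices. The helper below is an illustrative sketch of those two mappings only, not the CAPTools-generated code.

```python
def block_partition(n, nprocs, rank):
    """Contiguous chunk of indices 0..n-1 assigned to `rank`."""
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return list(range(start, start + size))

def cyclic_partition(n, nprocs, rank):
    """Every nprocs-th index assigned to `rank` (round-robin)."""
    return list(range(rank, n, nprocs))

# 10 iterations over 3 processes:
print([block_partition(10, 3, r) for r in range(3)])
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print([cyclic_partition(10, 3, r) for r in range(3)])
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

A cyclic mapping tends to balance load better when the per-iteration cost varies with the index, as it does across autocorrelation lags.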

Relevance:

60.00%

Publisher:

Abstract:

In this paper, a hardware solution for packet classification based on multiple fields is presented. The proposed scheme focuses on a new architecture based on the decomposition method. A hash circuit is used to reduce the memory space required by the Recursive Flow Classification (RFC) algorithm. Implementation results show that the proposed architecture achieves a significant performance advantage, comparable to that of some well-known algorithms. The solution is based on Altera Stratix III FPGA technology.
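
The source of the memory saving can be illustrated in software: an RFC intermediate phase maps a pair of equivalence-class IDs to a new ID, and a dense cross-product table over all ID pairs is mostly empty, so indexing only the pairs that actually occur (here with a Python dict standing in for the hash circuit) needs far less space. The table sizes and pairs below are hypothetical, and the FPGA implementation is not modeled.

```python
# Dense RFC-style cross-product table: one slot per (id_a, id_b) pair.
NUM_IDS_A, NUM_IDS_B = 4096, 4096
dense_slots = NUM_IDS_A * NUM_IDS_B          # ~16.7M entries, mostly unused

# Only the pairs that actually occur in the rule set need an entry.
occurring_pairs = [(17, 3), (17, 9), (250, 3), (1023, 77)]   # illustrative
hash_table = {pair: new_id for new_id, pair in enumerate(occurring_pairs)}

def classify(id_a, id_b, default_id=0):
    """Look up the next-phase equivalence-class ID for a packet."""
    return hash_table.get((id_a, id_b), default_id)

print(classify(17, 9))                                  # 1
print(len(hash_table), "entries instead of", dense_slots)
```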

Relevance:

60.00%

Publisher:

Abstract:

The multi-bit trie is a popular approach for performing longest-prefix matching in packet classification; however, it can require long lookup times and consume memory space inefficiently. This paper presents an in-depth study of different variations of the multi-bit trie for IP address lookup. Our main aim is to study data structures that reduce memory space. The proposed approach has been implemented using the label method in two variants, and both show better results in terms of lookup speed, update time, and memory consumption.
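
A minimal fixed-stride multi-bit trie with controlled prefix expansion can be sketched as follows. It is a toy (8-bit addresses, a stride of 4, no label method and no update handling), not one of the variants evaluated in the paper.

```python
STRIDE = 4            # bits consumed per trie level (fixed stride)
ADDR_BITS = 8         # toy 8-bit addresses to keep the example small

class Node:
    def __init__(self):
        self.next_hop = [None] * (1 << STRIDE)   # expanded prefixes stored here
        self.child = [None] * (1 << STRIDE)

def insert(root, prefix, length, hop):
    """Insert `prefix` (an int of `length` bits, MSB first) via controlled
    prefix expansion. A full implementation would also track prefix lengths
    so that a shorter prefix never overwrites a longer one."""
    node, consumed = root, 0
    while length - consumed > STRIDE:
        idx = (prefix >> (length - consumed - STRIDE)) & ((1 << STRIDE) - 1)
        if node.child[idx] is None:
            node.child[idx] = Node()
        node, consumed = node.child[idx], consumed + STRIDE
    r = length - consumed                          # bits left after whole strides
    base = (prefix & ((1 << r) - 1)) << (STRIDE - r)
    for i in range(1 << (STRIDE - r)):             # expand into 2^(STRIDE-r) slots
        node.next_hop[base + i] = hop

def lookup(root, addr):
    """Longest-prefix match: remember the last next hop seen on the way down."""
    node, best, shift = root, None, ADDR_BITS
    while node is not None and shift >= STRIDE:
        shift -= STRIDE
        idx = (addr >> shift) & ((1 << STRIDE) - 1)
        if node.next_hop[idx] is not None:
            best = node.next_hop[idx]
        node = node.child[idx]
    return best

root = Node()
insert(root, 0b1010, 4, "hop-A")       # prefix 1010/4
insert(root, 0b101011, 6, "hop-B")     # longer prefix 101011/6
print(lookup(root, 0b10101101))        # hop-B (longest match wins)
print(lookup(root, 0b10100001))        # hop-A
print(lookup(root, 0b01000000))        # None
```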

Relevance:

60.00%

Publisher:

Abstract:

Statistical machine translation aims to automate translation by means of statistical models. In this work, we address one of the major challenges of the field: search (Brown et al., 1993). Reference statistical translation systems, such as Moses (Koehn et al., 2007), generally perform the search by exploring the space of prefixes with dynamic programming, a computationally expensive solution to this potentially NP-complete problem (Knight, 1999). We postulate that a local-search approach (Langlais et al., 2007) can lead to solutions that are just as interesting while using far less time and memory space (Russell and Norvig, 2010). Moreover, this type of search makes it easier to incorporate global models that require complete translations and allows non-contiguous modifications to be made to them, two tasks that are difficult when exploring the space of prefixes. Our experiments show that local search in statistical machine translation is a viable approach that is in line with the state of the art.
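
A hill-climbing decoder over complete translations can be sketched as below: starting from a full hypothesis, it repeatedly moves to the best-scoring neighbour obtained by swapping two phrases or re-translating one of them. The phrase table, scoring function, and neighbourhood are toy stand-ins, not Moses or the authors' system.

```python
# Toy local-search decoder: edits a complete translation in place.
phrase_options = {
    "la maison": ["the house", "the home"],
    "bleue": ["blue", "blue one"],
}
source = ["la maison", "bleue"]

def score(translation):
    """Stand-in for a global model score over a complete translation:
    rewards the bigram 'house blue' and shorter output."""
    text = " ".join(translation)
    return -len(text) + 5 * text.count("house blue")

def neighbors(translation):
    """Complete-hypothesis edits: swap two phrases or re-translate one phrase."""
    for i in range(len(translation)):
        for j in range(i + 1, len(translation)):
            t = translation[:]
            t[i], t[j] = t[j], t[i]
            yield t
        for alt in phrase_options[source[i]]:
            if alt != translation[i]:
                yield translation[:i] + [alt] + translation[i + 1:]

def local_search(start):
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current             # local optimum reached
        current = best

print(local_search(["the home", "blue one"]))   # -> ['the house', 'blue']
```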

Relevance:

60.00%

Publisher:

Abstract:

Applying gang scheduling can alleviate the blockade problem caused by purely space-sharing scheduling. Simply allowing jobs to run simultaneously on the same processors, as in conventional gang scheduling, may however introduce a large number of time slots into the system. As a consequence, the cost of context switches greatly increases, and each running job obtains only a small portion of the resources, including memory space and processor utilisation, so no job can finish its computation quickly. Therefore, the number of jobs allowed to run in the system should be limited. In this paper we present experimental results showing that by limiting the number of large jobs that time-share the same processors and applying the backfilling technique, we can greatly reduce the average number of time slots in the system and significantly improve the performance of both small and large jobs.
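
A toy slot allocator conveys the idea: jobs are placed into the free processors of existing time slots when possible, and a new slot is opened only while a cap on the number of slots has not been reached. The processor count and the cap are hypothetical, and this is not the scheduler evaluated in the paper.

```python
# Toy Ousterhout-matrix allocator: each time slot owns NUM_PROCS processors.
NUM_PROCS = 16
MAX_SLOTS = 3          # cap on time slots, the "limit" the abstract argues for

slots = []             # each entry is the number of processors still free in that slot

def try_schedule(job_procs):
    """Fill the job into an existing time slot if it fits (backfilling-style);
    otherwise open a new slot only while the cap has not been reached."""
    for i, free in enumerate(slots):
        if job_procs <= free:
            slots[i] -= job_procs          # placed into an existing slot
            return i
    if len(slots) < MAX_SLOTS:
        slots.append(NUM_PROCS - job_procs)
        return len(slots) - 1
    return None                            # job waits; avoids adding more slots

for procs in [16, 4, 8, 12, 16]:
    print(procs, "->", try_schedule(procs))
# 16 -> 0, 4 -> 1, 8 -> 1, 12 -> 2, 16 -> None (waits rather than adding a 4th slot)
```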

Relevance:

60.00%

Publisher:

Abstract:

Applying gang scheduling can alleviate the blockade problem caused by purely space-sharing scheduling. Simply allowing jobs to run simultaneously on the same processors, as in conventional gang scheduling, may however introduce a large number of time slots into the system. As a consequence, the cost of context switches greatly increases, and each running job obtains only a small portion of the resources, including memory space and processor utilisation, so no job can finish its computation quickly. In this paper we present experimental results showing that properly dividing jobs into different classes and applying a different scheduling strategy to each class can greatly reduce the average number of time slots in the system and significantly improve performance in terms of average slowdown.
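
A sketch of the class-based policy might look as follows; the class thresholds and the per-class strategies are entirely hypothetical and only stand in for whatever classification the paper evaluates.

```python
# Illustrative job classification; thresholds and strategies are hypothetical.
def classify(job_procs, job_runtime_s, total_procs=16):
    if job_procs > total_procs // 2 and job_runtime_s > 3600:
        return "large"        # wide and long: strictly limit time-sharing
    if job_procs > total_procs // 2:
        return "wide-short"   # wide but short: may briefly time-share
    return "small"            # narrow: good candidate for filling idle processors

strategy = {"large": "own slot, at most one extra time slot",
            "wide-short": "time-share existing slots",
            "small": "fill idle processors of existing slots"}

for procs, runtime in [(16, 7200), (12, 600), (2, 300)]:
    cls = classify(procs, runtime)
    print(procs, runtime, "->", cls, "/", strategy[cls])
```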

Relevance:

60.00%

Publisher:

Abstract:

This paper proposes an alternative algorithm for solving the median shortest path problem (MSPP) in the planning and design of urban transportation networks. The proposed vector-labeling algorithm labels each node with a vector of multiple, conflicting objectives and deletes cyclic, infeasible, and extreme-dominated paths in the criteria space by imposing a cyclic break (CB), a path cost constraint (PCC), and an access cost parameter (ACP), respectively. The output of the algorithm is a set of Pareto-optimal paths (POP), each with an objective vector, from a predetermined origin node to a destination node. The paper thus formulates an algorithm that identifies a non-inferior solution set of POP based on a non-dominated set of objective vectors, leaving the ultimate decision to decision-makers. A numerical experiment on an artificial transportation network is conducted to validate and compare the results. Sensitivity analysis shows that the proposed algorithm is more efficient than existing solutions in terms of execution time and memory space.
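
The core filtering step in such a vector-labeling scheme is the non-dominance test applied to the objective vectors of partial paths reaching a node. The snippet below sketches only that test (the CB, PCC, and ACP pruning rules are not modeled); the (travel_time, cost) criteria are illustrative.

```python
def dominates(a, b):
    """Objective vector a dominates b when it is no worse in every criterion
    and strictly better in at least one (all criteria are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def merge_labels(labels, new_label):
    """Keep only non-dominated labels at a node; drop the new label if an
    existing one dominates it (extreme-dominated paths are deleted)."""
    if any(dominates(old, new_label) for old in labels):
        return labels
    return [old for old in labels if not dominates(new_label, old)] + [new_label]

# Labels are (travel_time, cost) vectors of partial paths reaching one node.
labels = []
for vec in [(10, 7), (8, 9), (12, 8), (9, 6)]:
    labels = merge_labels(labels, vec)
print(labels)   # [(8, 9), (9, 6)] -- the non-dominated label set at this node
```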

Relevance:

60.00%

Publisher: