978 resultados para Distributed shared memory
Resumo:
Combinatorial Optimization Problems occur in a wide variety of contexts and generally are NP-hard problems. At a corporate level solving this problems is of great importance since they contribute to the optimization of operational costs. In this thesis we propose to solve the Public Transport Bus Assignment problem considering an heterogeneous fleet and line exchanges, a variant of the Multi-Depot Vehicle Scheduling Problem in which additional constraints are enforced to model a real life scenario. The number of constraints involved and the large number of variables makes impracticable solving to optimality using complete search techniques. Therefore, we explore metaheuristics, that sacrifice optimality to produce solutions in feasible time. More concretely, we focus on the development of algorithms based on a sophisticated metaheuristic, Ant-Colony Optimization (ACO), which is based on a stochastic learning mechanism. For complex problems with a considerable number of constraints, sophisticated metaheuristics may fail to produce quality solutions in a reasonable amount of time. Thus, we developed parallel shared-memory (SM) synchronous ACO algorithms, however, synchronism originates the straggler problem. Therefore, we proposed three SM asynchronous algorithms that break the original algorithm semantics and differ on the degree of concurrency allowed while manipulating the learned information. Our results show that our sequential ACO algorithms produced better solutions than a Restarts metaheuristic, the ACO algorithms were able to learn and better solutions were achieved by increasing the amount of cooperation (number of search agents). Regarding parallel algorithms, our asynchronous ACO algorithms outperformed synchronous ones in terms of speedup and solution quality, achieving speedups of 17.6x. The cooperation scheme imposed by asynchronism also achieved a better learning rate than the original one.
Resumo:
Este trabajo analiza el rendimiento de cuatro nodos de cómputo multiprocesador de memoria compartida para resolver el problema N-body. Se paraleliza el algoritmo serie, y se codifica usando el lenguaje C extendido con OpenMP. El resultado son dos variantes que obedecen a dos criterios de optimización diferentes: minimizar los requisitos de memoria y minimizar el volumen de cómputo. Posteriormente, se realiza un proceso de análisis de las prestaciones del programa sobre los nodos de cómputo. Se modela el rendimiento de las variantes secuenciales y paralelas de la aplicación, y de los nodos de cómputo; se instrumentan y ejecutan los programas para obtener resultados en forma de varias métricas; finalmente se muestran e interpretan los resultados, proporcionando claves que explican ineficiencias y cuellos de botella en el rendimiento y posibles líneas de mejora. La experiencia de este estudio concreto ha permitido esbozar una incipiente metodología de análisis de rendimiento, identificación de problemas y sintonización de algoritmos a nodos de cómputo multiprocesador de memoria compartida.
Resumo:
Performance analysis is the task of monitor the behavior of a program execution. The main goal is to find out the possible adjustments that might be done in order improve the performance. To be able to get that improvement it is necessary to find the different causes of overhead. Nowadays we are already in the multicore era, but there is a gap between the level of development of the two main divisions of multicore technology (hardware and software). When we talk about multicore we are also speaking of shared memory systems, on this master thesis we talk about the issues involved on the performance analysis and tuning of applications running specifically in a shared Memory system. We move one step ahead to take the performance analysis to another level by analyzing the applications structure and patterns. We also present some tools specifically addressed to the performance analysis of OpenMP multithread application. At the end we present the results of some experiments performed with a set of OpenMP scientific application.
Resumo:
In fear conditioning, an animal learns to associate an unconditioned stimulus (US), such as a shock, and a conditioned stimulus (CS), such as a tone, so that the presentation of the CS alone can trigger conditioned responses. Recent research on the lateral amygdala has shown that following cued fear conditioning, only a subset of higher-excitable neurons are recruited in the memory trace. Their selective deletion after fear conditioning results in a selective erasure of the fearful memory. I hypothesize that the recruitment of highly excitable neurons depends on responsiveness to stimuli, intrinsic excitability and local connectivity. In addition, I hypothesize that neurons recruited for an initial memory also participate in subsequent memories, and that changes in neuronal excitability affect secondary fear learning. To address these hypotheses, I will show that A) a rat can learn to associate two successive short-term fearful memories; B) neuronal populations in the LA are competitively recruited in the memory traces depending on individual neuronal advantages, as well as advantages granted by the local network. By performing two successive cued fear conditioning experiments, I found that rats were able to learn and extinguish the two successive short-term memories, when tested 1 hour after learning for each memory. These rats were equipped with a system of stable extracellular recordings that I developed, which allowed to monitor neuronal activity during fear learning. 233 individual putative pyramidal neurons could modulate their firing rate in response to the conditioned tone (conditioned neurons) and/or non- conditioned tones (generalizing neurons). Out of these recorded putative pyramidal neurons 86 (37%) neurons were conditioned to one or both tones. More precisely, one population of neurons encoded for a shared memory while another group of neurons likely encoded the memories' new features. Notably, in spite of a successful behavioral extinction, the firing rate of those conditioned neurons in response to the conditioned tone remained unchanged throughout memory testing. Furthermore, by analyzing the pre-conditioning characteristics of the conditioned neurons, I determined that it was possible to predict neuronal recruitment based on three factors: 1) initial sensitivity to auditory inputs, with tone-sensitive neurons being more easily recruited than tone- insensitive neurons; 2) baseline excitability levels, with more highly excitable neurons being more likely to become conditioned; and 3) the number of afferent connections received from local neurons, with neurons destined to become conditioned receiving more connections than non-conditioned neurons. - En conditionnement de la peur, un animal apprend à associer un stimulus inconditionnel (SI), tel un choc électrique, et un stimulus conditionné (SC), comme un son, de sorte que la présentation du SC seul suffit pour déclencher des réflexes conditionnés. Des recherches récentes sur l'amygdale latérale (AL) ont montré que, suite au conditionnement à la peur, seul un sous-ensemble de neurones plus excitables sont recrutés pour constituer la trace mnésique. Pour apprendre à associer deux sons au même SI, je fais l'hypothèse que les neurones entrent en compétition afin d'être sélectionnés lors du recrutement pour coder la trace mnésique. Ce recrutement dépendrait d'un part à une activation facilité des neurones ainsi qu'une activation facilité de réseaux de neurones locaux. En outre, je fais l'hypothèse que l'activation de ces réseaux de l'AL, en soi, est suffisante pour induire une mémoire effrayante. Pour répondre à ces hypothèses, je vais montrer que A) selon un processus de mémoire à court terme, un rat peut apprendre à associer deux mémoires effrayantes apprises successivement; B) des populations neuronales dans l'AL sont compétitivement recrutées dans les traces mnésiques en fonction des avantages neuronaux individuels, ainsi que les avantages consentis par le réseau local. En effectuant deux expériences successives de conditionnement à la peur, des rats étaient capables d'apprendre, ainsi que de subir un processus d'extinction, pour les deux souvenirs effrayants. La mesure de l'efficacité du conditionnement à la peur a été effectuée 1 heure après l'apprentissage pour chaque souvenir. Ces rats ont été équipés d'un système d'enregistrements extracellulaires stables que j'ai développé, ce qui a permis de suivre l'activité neuronale pendant l'apprentissage de la peur. 233 neurones pyramidaux individuels pouvaient moduler leur taux d'activité en réponse au son conditionné (neurones conditionnés) et/ou au son non conditionné (neurones généralisant). Sur les 233 neurones pyramidaux putatifs enregistrés 86 (37%) d'entre eux ont été conditionnés à un ou deux tons. Plus précisément, une population de neurones code conjointement pour un souvenir partagé, alors qu'un groupe de neurones différent code pour de nouvelles caractéristiques de nouveaux souvenirs. En particulier, en dépit d'une extinction du comportement réussie, le taux de décharge de ces neurones conditionné en réponse à la tonalité conditionnée est resté inchangée tout au long de la mesure d'apprentissage. En outre, en analysant les caractéristiques de pré-conditionnement des neurones conditionnés, j'ai déterminé qu'il était possible de prévoir le recrutement neuronal basé sur trois facteurs : 1) la sensibilité initiale aux entrées auditives, avec les neurones sensibles aux sons étant plus facilement recrutés que les neurones ne répondant pas aux stimuli auditifs; 2) les niveaux d'excitabilité des neurones, avec les neurones plus facilement excitables étant plus susceptibles d'être conditionnés au son ; et 3) le nombre de connexions reçues, puisque les neurones conditionné reçoivent plus de connexions que les neurones non-conditionnés. Enfin, nous avons constaté qu'il était possible de remplacer de façon satisfaisante le SI lors d'un conditionnement à la peur par des injections bilatérales de bicuculline, un antagoniste des récepteurs de l'acide y-Aminobutirique.
Resumo:
Estudi comparatiu amb benchmark del rendiment en dues plataformes multicore multithreading de diferents modalitats de paral·lelització de multiplicacions de matrius de nombres enters i de nombres en coma flotant mitjançant el model de memòria compartida OpenMP versió 2.5 i OpenMP versió 3.0.
Resumo:
Remote sensing spatial, spectral, and temporal resolutions of images, acquired over a reasonably sized image extent, result in imagery that can be processed to represent land cover over large areas with an amount of spatial detail that is very attractive for monitoring, management, and scienti c activities. With Moore's Law alive and well, more and more parallelism is introduced into all computing platforms, at all levels of integration and programming to achieve higher performance and energy e ciency. Being the geometric calibration process one of the most time consuming processes when using remote sensing images, the aim of this work is to accelerate this process by taking advantage of new computing architectures and technologies, specially focusing in exploiting computation over shared memory multi-threading hardware. A parallel implementation of the most time consuming process in the remote sensing geometric correction has been implemented using OpenMP directives. This work compares the performance of the original serial binary versus the parallelized implementation, using several multi-threaded modern CPU architectures, discussing about the approach to nd the optimum hardware for a cost-e ective execution.
Resumo:
BACKGROUND: The structure and organisation of ecological interactions within an ecosystem is modified by the evolution and coevolution of the individual species it contains. Understanding how historical conditions have shaped this architecture is vital for understanding system responses to change at scales from the microbial upwards. However, in the absence of a group selection process, the collective behaviours and ecosystem functions exhibited by the whole community cannot be organised or adapted in a Darwinian sense. A long-standing open question thus persists: Are there alternative organising principles that enable us to understand and predict how the coevolution of the component species creates and maintains complex collective behaviours exhibited by the ecosystem as a whole? RESULTS: Here we answer this question by incorporating principles from connectionist learning, a previously unrelated discipline already using well-developed theories on how emergent behaviours arise in simple networks. Specifically, we show conditions where natural selection on ecological interactions is functionally equivalent to a simple type of connectionist learning, 'unsupervised learning', well-known in neural-network models of cognitive systems to produce many non-trivial collective behaviours. Accordingly, we find that a community can self-organise in a well-defined and non-trivial sense without selection at the community level; its organisation can be conditioned by past experience in the same sense as connectionist learning models habituate to stimuli. This conditioning drives the community to form a distributed ecological memory of multiple past states, causing the community to: a) converge to these states from any random initial composition; b) accurately restore historical compositions from small fragments; c) recover a state composition following disturbance; and d) to correctly classify ambiguous initial compositions according to their similarity to learned compositions. We examine how the formation of alternative stable states alters the community's response to changing environmental forcing, and we identify conditions under which the ecosystem exhibits hysteresis with potential for catastrophic regime shifts. CONCLUSIONS: This work highlights the potential of connectionist theory to expand our understanding of evo-eco dynamics and collective ecological behaviours. Within this framework we find that, despite not being a Darwinian unit, ecological communities can behave like connectionist learning systems, creating internal conditions that habituate to past environmental conditions and actively recalling those conditions. REVIEWERS: This article was reviewed by Prof. Ricard V Solé, Universitat Pompeu Fabra, Barcelona and Prof. Rob Knight, University of Colorado, Boulder.
Resumo:
Das hier frei verfügbare Skript gehört zu einer gleichnamigen Vorlesung, die von Prof. Dr. Lutz Wegner bis zum Sommersemester 2007 gehalten wurde. Davor lief sie bis 1999 unter dem etwas irreführenden Titel „Ausgewählte Themen zu Rechnernetzen“. Behandelt wird die IPC in UNIX-basierten Rechnernetzen. Dazu gehören allgemeine Kenntnisse der Prozessumgebung, die fork- und exec-Systemaufrufe, Lock Files, Signale, Pipes, das Botschaftenkonzept (message queues), Semaphore, Shared Memory, Remote Procedure Calls, Sockets und Threads. Jedes Konzept wird mit kleinen Beispielen besprochen, die in C geschrieben sind. Der Quelltext liegt auf unseren Anlagen vor (für AIX, LINUX, Solaris). Grundlage der Vorlesung und des Skripts ist das ausgezeichnete Buch von John Shapley Gray „Interprocess Communications in UNIX“ aus dem Jahr 1998 bzw. die auf Linux angepasste Auflage desselben Buches „Interprocess Communications in LINUX“ aus dem Jahr 2003.
Resumo:
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
Resumo:
An important feature of a database management systems (DBMS) is its client/server architecture, where managing shared memory among the clients and the server is always an tough issue. However, similarity queries are specially sensitive to this kind of architecture, since the answer sizes vary widely. Usually, the answers of similarity query are fully processed to be sent in full to the user, who often is interested in just parts of the answer, e.g. just few elements closer or farther to the query reference. Compelling the DBMS to retrieve the full answer, further ignoring its majority is at least a waste of server processing power. Paging the answer is a technique that splits the answer onto several pages, following client requests. Despite the success of paging on traditional queries, little work has been done to support it in similarity queries. In this work, we present a technique that not only provides paging in similarity range or k-nearest neighbor queries, but also supports them in two variations: the forward similarity query and the backward similarity query. They return elements either increasingly farther of increasingly closer to the query reference. The reported experiments show that, depending on the proportion of the interesting part over the full answer, both techniques allow answering queries much faster than it is obtained in the non-paged way. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
Internet applications such as media streaming, collaborative computing and massive multiplayer are on the rise,. This leads to the need for multicast communication, but unfortunately group communications support based on IP multicast has not been widely adopted due to a combination of technical and non-technical problems. Therefore, a number of different application-layer multicast schemes have been proposed in recent literature to overcome the drawbacks. In addition, these applications often behave as both providers and clients of services, being called peer-topeer applications, and where participants come and go very dynamically. Thus, servercentric architectures for membership management have well-known problems related to scalability and fault-tolerance, and even peer-to-peer traditional solutions need to have some mechanism that takes into account member's volatility. The idea of location awareness distributes the participants in the overlay network according to their proximity in the underlying network allowing a better performance. Given this context, this thesis proposes an application layer multicast protocol, called LAALM, which takes into account the actual network topology in the assembly process of the overlay network. The membership algorithm uses a new metric, IPXY, to provide location awareness through the processing of local information, and it was implemented using a distributed shared and bi-directional tree. The algorithm also has a sub-optimal heuristic to minimize the cost of membership process. The protocol has been evaluated in two ways. First, through an own simulator developed in this work, where we evaluated the quality of distribution tree by metrics such as outdegree and path length. Second, reallife scenarios were built in the ns-3 network simulator where we evaluated the network protocol performance by metrics such as stress, stretch, time to first packet and reconfiguration group time
Resumo:
Artificial neural networks are usually applied to solve complex problems. In problems with more complexity, by increasing the number of layers and neurons, it is possible to achieve greater functional efficiency. Nevertheless, this leads to a greater computational effort. The response time is an important factor in the decision to use neural networks in some systems. Many argue that the computational cost is higher in the training period. However, this phase is held only once. Once the network trained, it is necessary to use the existing computational resources efficiently. In the multicore era, the problem boils down to efficient use of all available processing cores. However, it is necessary to consider the overhead of parallel computing. In this sense, this paper proposes a modular structure that proved to be more suitable for parallel implementations. It is proposed to parallelize the feedforward process of an RNA-type MLP, implemented with OpenMP on a shared memory computer architecture. The research consistes on testing and analizing execution times. Speedup, efficiency and parallel scalability are analyzed. In the proposed approach, by reducing the number of connections between remote neurons, the response time of the network decreases and, consequently, so does the total execution time. The time required for communication and synchronization is directly linked to the number of remote neurons in the network, and so it is necessary to investigate which one is the best distribution of remote connections
Resumo:
This work presents a scalable and efficient parallel implementation of the Standard Simplex algorithm in the multicore architecture to solve large scale linear programming problems. We present a general scheme explaining how each step of the standard Simplex algorithm was parallelized, indicating some important points of the parallel implementation. Performance analysis were conducted by comparing the sequential time using the Simplex tableau and the Simplex of the CPLEXR IBM. The experiments were executed on a shared memory machine with 24 cores. The scalability analysis was performed with problems of different dimensions, finding evidence that our parallel standard Simplex algorithm has a better parallel efficiency for problems with more variables than constraints. In comparison with CPLEXR , the proposed parallel algorithm achieved a efficiency of up to 16 times better
Resumo:
Research on the micro-structural characterization of metal-matrix composites uses X-ray computed tomography to collect information about the interior features of the samples, in order to elucidate their exhibited properties. The tomographic raw data needs several steps of computational processing in order to eliminate noise and interference. Our experience with a program (Tritom) that handles these questions has shown that in some cases the processing steps take a very long time and that it is not easy for a Materials Science specialist to interact with Tritom in order to define the most adequate parameter values and the proper sequence of the available processing steps. For easing the use of Tritom, a system was built which addresses the aspects described before and that is based on the OpenDX visualization system. OpenDX visualization facilities constitute a great benefit to Tritom. The visual programming environment of OpenDX allows an easy definition of a sequence of processing steps thus fulfilling the requirement of an easy use by non-specialists on Computer Science. Also the possibility of incorporating external modules in a visual OpenDX program allows the researchers to tackle the aspect of reducing the long execution time of some processing steps. The longer processing steps of Tritom have been parallelized in two different types of hardware architectures (message-passing and shared-memory); the corresponding parallel programs can be easily incorporated in a sequence of processing steps defined in an OpenDX program. The benefits of our system are illustrated through an example where the tool is applied in the study of the sensitivity to crushing – and the implications thereof – of the reinforcements used in a functionally graded syntactic metallic foam.
Resumo:
Máster Universitario en Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)