964 resultados para Distributed algorithm
Resumo:
L'objectif de cette thèse est de présenter différentes applications du programme de recherche de calcul conditionnel distribué. On espère que ces applications, ainsi que la théorie présentée ici, mènera à une solution générale du problème d'intelligence artificielle, en particulier en ce qui a trait à la nécessité d'efficience. La vision du calcul conditionnel distribué consiste à accélérer l'évaluation et l'entraînement de modèles profonds, ce qui est très différent de l'objectif usuel d'améliorer sa capacité de généralisation et d'optimisation. Le travail présenté ici a des liens étroits avec les modèles de type mélange d'experts. Dans le chapitre 2, nous présentons un nouvel algorithme d'apprentissage profond qui utilise une forme simple d'apprentissage par renforcement sur un modèle d'arbre de décisions à base de réseau de neurones. Nous démontrons la nécessité d'une contrainte d'équilibre pour maintenir la distribution d'exemples aux experts uniforme et empêcher les monopoles. Pour rendre le calcul efficient, l'entrainement et l'évaluation sont contraints à être éparse en utilisant un routeur échantillonnant des experts d'une distribution multinomiale étant donné un exemple. Dans le chapitre 3, nous présentons un nouveau modèle profond constitué d'une représentation éparse divisée en segments d'experts. Un modèle de langue à base de réseau de neurones est construit à partir des transformations éparses entre ces segments. L'opération éparse par bloc est implémentée pour utilisation sur des cartes graphiques. Sa vitesse est comparée à deux opérations denses du même calibre pour démontrer le gain réel de calcul qui peut être obtenu. Un modèle profond utilisant des opérations éparses contrôlées par un routeur distinct des experts est entraîné sur un ensemble de données d'un milliard de mots. Un nouvel algorithme de partitionnement de données est appliqué sur un ensemble de mots pour hiérarchiser la couche de sortie d'un modèle de langage, la rendant ainsi beaucoup plus efficiente. Le travail présenté dans cette thèse est au centre de la vision de calcul conditionnel distribué émis par Yoshua Bengio. Elle tente d'appliquer la recherche dans le domaine des mélanges d'experts aux modèles profonds pour améliorer leur vitesse ainsi que leur capacité d'optimisation. Nous croyons que la théorie et les expériences de cette thèse sont une étape importante sur la voie du calcul conditionnel distribué car elle cadre bien le problème, surtout en ce qui concerne la compétitivité des systèmes d'experts.
Resumo:
Decimal multiplication is an integral part of financial, commercial, and internet-based computations. A novel design for single digit decimal multiplication that reduces the critical path delay and area for an iterative multiplier is proposed in this research. The partial products are generated using single digit multipliers, and are accumulated based on a novel RPS algorithm. This design uses n single digit multipliers for an n × n multiplication. The latency for the multiplication of two n-digit Binary Coded Decimal (BCD) operands is (n + 1) cycles and a new multiplication can begin every n cycle. The accumulation of final partial products and the first iteration of partial product generation for next set of inputs are done simultaneously. This iterative decimal multiplier offers low latency and high throughput, and can be extended for decimal floating-point multiplication.
Resumo:
In Wireless Sensor Networks (WSN), neglecting the effects of varying channel quality can lead to an unnecessary wastage of precious battery resources and in turn can result in the rapid depletion of sensor energy and the partitioning of the network. Fairness is a critical issue when accessing a shared wireless channel and fair scheduling must be employed to provide the proper flow of information in a WSN. In this paper, we develop a channel adaptive MAC protocol with a traffic-aware dynamic power management algorithm for efficient packet scheduling and queuing in a sensor network, with time varying characteristics of the wireless channel also taken into consideration. The proposed protocol calculates a combined weight value based on the channel state and link quality. Then transmission is allowed only for those nodes with weights greater than a minimum quality threshold and nodes attempting to access the wireless medium with a low weight will be allowed to transmit only when their weight becomes high. This results in many poor quality nodes being deprived of transmission for a considerable amount of time. To avoid the buffer overflow and to achieve fairness for the poor quality nodes, we design a Load prediction algorithm. We also design a traffic aware dynamic power management scheme to minimize the energy consumption by continuously turning off the radio interface of all the unnecessary nodes that are not included in the routing path. By Simulation results, we show that our proposed protocol achieves a higher throughput and fairness besides reducing the delay
Resumo:
In this report, we discuss the application of global optimization and Evolutionary Computation to distributed systems. We therefore selected and classified many publications, giving an insight into the wide variety of optimization problems which arise in distributed systems. Some interesting approaches from different areas will be discussed in greater detail with the use of illustrative examples.
Resumo:
Research on autonomous intelligent systems has focused on how robots can robustly carry out missions in uncertain and harsh environments with very little or no human intervention. Robotic execution languages such as RAPs, ESL, and TDL improve robustness by managing functionally redundant procedures for achieving goals. The model-based programming approach extends this by guaranteeing correctness of execution through pre-planning of non-deterministic timed threads of activities. Executing model-based programs effectively on distributed autonomous platforms requires distributing this pre-planning process. This thesis presents a distributed planner for modelbased programs whose planning and execution is distributed among agents with widely varying levels of processor power and memory resources. We make two key contributions. First, we reformulate a model-based program, which describes cooperative activities, into a hierarchical dynamic simple temporal network. This enables efficient distributed coordination of robots and supports deployment on heterogeneous robots. Second, we introduce a distributed temporal planner, called DTP, which solves hierarchical dynamic simple temporal networks with the assistance of the distributed Bellman-Ford shortest path algorithm. The implementation of DTP has been demonstrated successfully on a wide range of randomly generated examples and on a pursuer-evader challenge problem in simulation.
Resumo:
We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.
Resumo:
In this paper we address the problem of extracting representative point samples from polygonal models. The goal of such a sampling algorithm is to find points that are evenly distributed. We propose star-discrepancy as a measure for sampling quality and propose new sampling methods based on global line distributions. We investigate several line generation algorithms including an efficient hardware-based sampling method. Our method contributes to the area of point-based graphics by extracting points that are more evenly distributed than by sampling with current algorithms
Resumo:
We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances.
Resumo:
This paper presents a parallel genetic algorithm to the Steiner Problem in Networks. Several previous papers have proposed the adoption of GAs and others metaheuristics to solve the SPN demonstrating the validity of their approaches. This work differs from them for two main reasons: the dimension and the characteristics of the networks adopted in the experiments and the aim from which it has been originated. The reason that aimed this work was namely to build a comparison term for validating deterministic and computationally inexpensive algorithms which can be used in practical engineering applications, such as the multicast transmission in the Internet. On the other hand, the large dimensions of our sample networks require the adoption of a parallel implementation of the Steiner GA, which is able to deal with such large problem instances.
Resumo:
In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.
Resumo:
The paper presents a design for a hardware genetic algorithm which uses a pipeline of systolic arrays. These arrays have been designed using systolic synthesis techniques which involve expressing the algorithm as a set of uniform recurrence relations. The final design divorces the fitness function evaluation from the hardware and can process chromosomes of different lengths, giving the design a generic quality. The paper demonstrates the design methodology by progressively re-writing a simple genetic algorithm, expressed in C code, into a form from which systolic structures can be deduced. This paper extends previous work by introducing a simplification to a previous systolic design for the genetic algorithm. The simplification results in the removal of 2N 2 + 4N cells and reduces the time complexity by 3N + 1 cycles.
Resumo:
Fully connected cubic networks (FCCNs) are a class of newly proposed hierarchical interconnection networks for multicomputer systems, which enjoy the strengths of constant node degree and good expandability. The shortest path routing in FCCNs is an open problem. In this paper, we present an oblivious routing algorithm for n-level FCCN with N = 8(n) nodes, and prove that this algorithm creates a shortest path from the source to the destination. At the costs of both an O(N)-parallel-step off-line preprocessing phase and a list of size N stored at each node, the proposed algorithm is carried out at each related node in O(n) time. In some cases the proposed algorithm is superior to the one proposed by Chang and Wang in terms of the length of the routing path. This justifies the utility of our routing strategy. (C) 2006 Elsevier Inc. All rights reserved.
Resumo:
A Bayesian Model Averaging approach to the estimation of lag structures is introduced, and applied to assess the impact of R&D on agricultural productivity in the US from 1889 to 1990. Lag and structural break coefficients are estimated using a reversible jump algorithm that traverses the model space. In addition to producing estimates and standard deviations for the coe¢ cients, the probability that a given lag (or break) enters the model is estimated. The approach is extended to select models populated with Gamma distributed lags of di¤erent frequencies. Results are consistent with the hypothesis that R&D positively drives productivity. Gamma lags are found to retain their usefulness in imposing a plausible structure on lag coe¢ cients, and their role is enhanced through the use of model averaging.
Resumo:
The past decade has witnessed explosive growth of mobile subscribers and services. With the purpose of providing better-swifter-cheaper services, radio network optimisation plays a crucial role but faces enormous challenges. The concept of Dynamic Network Optimisation (DNO), therefore, has been introduced to optimally and continuously adjust network configurations, in response to changes in network conditions and traffic. However, the realization of DNO has been seriously hindered by the bottleneck of optimisation speed performance. An advanced distributed parallel solution is presented in this paper, as to bridge the gap by accelerating the sophisticated proprietary network optimisation algorithm, while maintaining the optimisation quality and numerical consistency. The ariesoACP product from Arieso Ltd serves as the main platform for acceleration. This solution has been prototyped, implemented and tested. Real-project based results exhibit a high scalability and substantial acceleration at an average speed-up of 2.5, 4.9 and 6.1 on a distributed 5-core, 9-core and 16-core system, respectively. This significantly outperforms other parallel solutions such as multi-threading. Furthermore, augmented optimisation outcome, alongside high correctness and self-consistency, have also been fulfilled. Overall, this is a breakthrough towards the realization of DNO.
Resumo:
Higher order cumulant analysis is applied to the blind equalization of linear time-invariant (LTI) nonminimum-phase channels. The channel model is moving-average based. To identify the moving average parameters of channels, a higher-order cumulant fitting approach is adopted in which a novel relay algorithm is proposed to obtain the global solution. In addition, the technique incorporates model order determination. The transmitted data are considered as independently identically distributed random variables over some discrete finite set (e.g., set {±1, ±3}). A transformation scheme is suggested so that third-order cumulant analysis can be applied to this type of data. Simulation examples verify the feasibility and potential of the algorithm. Performance is compared with that of the noncumulant-based Sato scheme in terms of the steady state MSE and convergence rate.