866 resultados para Recommended Systems, Collaborative Filtering, Customization, Distributed Recommender
Resumo:
A key trait of Free and Open Source Software (FOSS) development is its distributed nature. Nevertheless, two project-level operations, the fork and the merge of program code, are among the least well understood events in the lifespan of a FOSS project. Some projects have explicitly adopted these operations as the primary means of concurrent development. In this study, we examine the effect of highly distributed software development, is found in the Linux kernel project, on collection and modelling of software development data. We find that distributed development calls for sophisticated temporal modelling techniques where several versions of the source code tree can exist at once. Attention must be turned towards the methods of quality assurance and peer review that projects employ to manage these parallel source trees. Our analysis indicates that two new metrics, fork rate and merge rate, could be useful for determining the role of distributed version control systems in FOSS projects. The study presents a preliminary data set consisting of version control and mailing list data.
Resumo:
A key trait of Free and Open Source Software (FOSS) development is its distributed nature. Nevertheless, two project-level operations, the fork and the merge of program code, are among the least well understood events in the lifespan of a FOSS project. Some projects have explicitly adopted these operations as the primary means of concurrent development. In this study, we examine the effect of highly distributed software development, is found in the Linux kernel project, on collection and modelling of software development data. We find that distributed development calls for sophisticated temporal modelling techniques where several versions of the source code tree can exist at once. Attention must be turned towards the methods of quality assurance and peer review that projects employ to manage these parallel source trees. Our analysis indicates that two new metrics, fork rate and merge rate, could be useful for determining the role of distributed version control systems in FOSS projects. The study presents a preliminary data set consisting of version control and mailing list data.
Resumo:
Erasure coding techniques are used to increase the reliability of distributed storage systems while minimizing storage overhead. Also of interest is minimization of the bandwidth required to repair the system following a node failure. In a recent paper, Wu et al. characterize the tradeoff between the repair bandwidth and the amount of data stored per node. They also prove the existence of regenerating codes that achieve this tradeoff. In this paper, we introduce Exact Regenerating Codes, which are regenerating codes possessing the additional property of being able to duplicate the data stored at a failed node. Such codes require low processing and communication overheads, making the system practical and easy to maintain. Explicit construction of exact regenerating codes is provided for the minimum bandwidth point on the storage-repair bandwidth tradeoff, relevant to distributed-mail-server applications. A sub-space based approach is provided and shown to yield necessary and sufficient conditions on a linear code to possess the exact regeneration property as well as prove the uniqueness of our construction. Also included in the paper, is an explicit construction of regenerating codes for the minimum storage point for parameters relevant to storage in peer-to-peer systems. This construction supports a variable number of nodes and can handle multiple, simultaneous node failures. All constructions given in the paper are of low complexity, requiring low field size in particular.
Resumo:
Motivated by certain situations in manufacturing systems and communication networks, we look into the problem of maximizing the profit in a queueing system with linear reward and cost structure and having a choice of selecting the streams of Poisson arrivals according to an independent Markov chain. We view the system as a MMPP/GI/1 queue and seek to maximize the profits by optimally choosing the stationary probabilities of the modulating Markov chain. We consider two formulations of the optimization problem. The first one (which we call the PUT problem) seeks to maximize the profit per unit time whereas the second one considers the maximization of the profit per accepted customer (the PAC problem). In each of these formulations, we explore three separate problems. In the first one, the constraints come from bounding the utilization of an infinite capacity server; in the second one the constraints arise from bounding the mean queue length of the same queue; and in the third one the finite capacity of the buffer reflect as a set of constraints. In the problems bounding the utilization factor of the queue, the solutions are given by essentially linear programs, while the problems with mean queue length constraints are linear programs if the service is exponentially distributed. The problems modeling the finite capacity queue are non-convex programs for which global maxima can be found. There is a rich relationship between the solutions of the PUT and PAC problems. In particular, the PUT solutions always make the server work at a utilization factor that is no less than that of the PAC solutions.
Resumo:
A detailed characterization of interference power statistics in CDMA systems is of considerable practical and theoretical interest. Such a characterization for uplink inter-cell interference has been difficult because of transmit power control, randomness in the number of interfering mobile stations, and randomness in their locations. We develop a new method to model the uplink inter-cell interference power as a lognormal distribution, and show that it is an order of magnitude more accurate than the conventional Gaussian approximation even when the average number of mobile stations per cell is relatively large and even outperforms the moment-matched lognormal approximation considered in the literature. The proposed method determines the lognormal parameters by matching its moment generating function with a new approximation of the moment generating function for the inter-cell interference. The method is tractable and exploits the elegant spatial Poisson process theory. Using several numerical examples, the accuracy of the proposed method in modeling the probability distribution of inter-cell interference is verified for both small and large values of interference.
Resumo:
There are a number of large networks which occur in many problems dealing with the flow of power, communication signals, water, gas, transportable goods, etc. Both design and planning of these networks involve optimization problems. The first part of this paper introduces the common characteristics of a nonlinear network (the network may be linear, the objective function may be non linear, or both may be nonlinear). The second part develops a mathematical model trying to put together some important constraints based on the abstraction for a general network. The third part deals with solution procedures; it converts the network to a matrix based system of equations, gives the characteristics of the matrix and suggests two solution procedures, one of them being a new one. The fourth part handles spatially distributed networks and evolves a number of decomposition techniques so that we can solve the problem with the help of a distributed computer system. Algorithms for parallel processors and spatially distributed systems have been described.There are a number of common features that pertain to networks. A network consists of a set of nodes and arcs. In addition at every node, there is a possibility of an input (like power, water, message, goods etc) or an output or none. Normally, the network equations describe the flows amoungst nodes through the arcs. These network equations couple variables associated with nodes. Invariably, variables pertaining to arcs are constants; the result required will be flows through the arcs. To solve the normal base problem, we are given input flows at nodes, output flows at nodes and certain physical constraints on other variables at nodes and we should find out the flows through the network (variables at nodes will be referred to as across variables).The optimization problem involves in selecting inputs at nodes so as to optimise an objective function; the objective may be a cost function based on the inputs to be minimised or a loss function or an efficiency function. The above mathematical model can be solved using Lagrange Multiplier technique since the equalities are strong compared to inequalities. The Lagrange multiplier technique divides the solution procedure into two stages per iteration. Stage one calculates the problem variables % and stage two the multipliers lambda. It is shown that the Jacobian matrix used in stage one (for solving a nonlinear system of necessary conditions) occurs in the stage two also.A second solution procedure has also been imbedded into the first one. This is called total residue approach. It changes the equality constraints so that we can get faster convergence of the iterations.Both solution procedures are found to coverge in 3 to 7 iterations for a sample network.The availability of distributed computer systems — both LAN and WAN — suggest the need for algorithms to solve the optimization problems. Two types of algorithms have been proposed — one based on the physics of the network and the other on the property of the Jacobian matrix. Three algorithms have been deviced, one of them for the local area case. These algorithms are called as regional distributed algorithm, hierarchical regional distributed algorithm (both using the physics properties of the network), and locally distributed algorithm (a multiprocessor based approach with a local area network configuration). The approach used was to define an algorithm that is faster and uses minimum communications. These algorithms are found to converge at the same rate as the non distributed (unitary) case.
Resumo:
Many real-time database applications arise in electronic financial services, safety-critical installations and military systems where enforcing security is crucial to the success of the enterprise. For real-time database systems supporting applications with firm deadlines, we investigate here the performance implications, in terms of killed transactions, of guaranteeing multilevel secrecy. In particular, we focus on the concurrency control (CC) aspects of this issue. Our main contributions are the following: First, we identify which among the previously proposed real-time CC protocols are capable of providing covert-channel-free security. Second, using a detailed simulation model, we profile the real-time performance of a representative set of these secure CC protocols for a variety of security-classified workloads and system configurations. Our experiments show that a prioritized optimistic CC protocol, OPT-WAIT, provides the best overall performance. Third, we propose and evaluate a novel "dual-CC" approach that allows the real-time database system to simultaneously use different CC mechanisms for guaranteeing security and for improving real-time performance. By appropriately choosing these different mechanisms, concurrency control protocols that provide even better performance than OPT-WAIT are designed. Finally, we propose and evaluate GUARD, an adaptive admission-control policy designed to provide fairness with respect to the distribution of killed transactions across security levels. Our experiments show that GUARD efficiently provides close to ideal fairness for real-time applications that can tolerate covert channel bandwidths of upto one bit per second.
Resumo:
In a storage system where individual storage nodes are prone to failure, the redundant storage of data in a distributed manner across multiple nodes is a must to ensure reliability. Reed-Solomon codes possess the reconstruction property under which the stored data can be recovered by connecting to any k of the n nodes in the network across which data is dispersed. This property can be shown to lead to vastly improved network reliability over simple replication schemes. Also of interest in such storage systems is the minimization of the repair bandwidth, i.e., the amount of data needed to be downloaded from the network in order to repair a single failed node. Reed-Solomon codes perform poorly here as they require the entire data to be downloaded. Regenerating codes are a new class of codes which minimize the repair bandwidth while retaining the reconstruction property. This paper provides an overview of regenerating codes including a discussion on the explicit construction of optimum codes.
Resumo:
A distributed system is a collection of networked autonomous processing units which must work in a cooperative manner. Currently, large-scale distributed systems, such as various telecommunication and computer networks, are abundant and used in a multitude of tasks. The field of distributed computing studies what can be computed efficiently in such systems. Distributed systems are usually modelled as graphs where nodes represent the processors and edges denote communication links between processors. This thesis concentrates on the computational complexity of the distributed graph colouring problem. The objective of the graph colouring problem is to assign a colour to each node in such a way that no two nodes connected by an edge share the same colour. In particular, it is often desirable to use only a small number of colours. This task is a fundamental symmetry-breaking primitive in various distributed algorithms. A graph that has been coloured in this manner using at most k different colours is said to be k-coloured. This work examines the synchronous message-passing model of distributed computation: every node runs the same algorithm, and the system operates in discrete synchronous communication rounds. During each round, a node can communicate with its neighbours and perform local computation. In this model, the time complexity of a problem is the number of synchronous communication rounds required to solve the problem. It is known that 3-colouring any k-coloured directed cycle requires at least ½(log* k - 3) communication rounds and is possible in ½(log* k + 7) communication rounds for all k ≥ 3. This work shows that for any k ≥ 3, colouring a k-coloured directed cycle with at most three colours is possible in ½(log* k + 3) rounds. In contrast, it is also shown that for some values of k, colouring a directed cycle with at most three colours requires at least ½(log* k + 1) communication rounds. Furthermore, in the case of directed rooted trees, reducing a k-colouring into a 3-colouring requires at least log* k + 1 rounds for some k and possible in log* k + 3 rounds for all k ≥ 3. The new positive and negative results are derived using computational methods, as the existence of distributed colouring algorithms corresponds to the colourability of so-called neighbourhood graphs. The colourability of these graphs is analysed using Boolean satisfiability (SAT) solvers. Finally, this thesis shows that similar methods are applicable in capturing the existence of distributed algorithms for other graph problems, such as the maximal matching problem.
Resumo:
Human activities extract and displace different substances and materials from the earth s crust, thus causing various environmental problems, such as climate change, acidification and eutrophication. As problems have become more complicated, more holistic measures that consider the origins and sources of pollutants have been called for. Industrial ecology is a field of science that forms a comprehensive framework for studying the interactions between the modern technological society and the environment. Industrial ecology considers humans and their technologies to be part of the natural environment, not separate from it. Industrial operations form natural systems that must also function as such within the constraints set by the biosphere. Industrial symbiosis (IS) is a central concept of industrial ecology. Industrial symbiosis studies look at the physical flows of materials and energy in local industrial systems. In an ideal IS, waste material and energy are exchanged by the actors of the system, thereby reducing the consumption of virgin material and energy inputs and the generation of waste and emissions. Companies are seen as part of the chains of suppliers and consumers that resemble those of natural ecosystems. The aim of this study was to analyse the environmental performance of an industrial symbiosis based on pulp and paper production, taking into account life cycle impacts as well. Life Cycle Assessment (LCA) is a tool for quantitatively and systematically evaluating the environmental aspects of a product, technology or service throughout its whole life cycle. Moreover, the Natural Step Sustainability Principles formed a conceptual framework for assessing the environmental performance of the case study symbiosis (Paper I). The environmental performance of the case study symbiosis was compared to four counterfactual reference scenarios in which the actors of the symbiosis operated on their own. The research methods used were process-based life cycle assessment (LCA) (Papers II and III) and hybrid LCA, which combines both process and input-output LCA (Paper IV). The results showed that the environmental impacts caused by the extraction and processing of the materials and the energy used by the symbiosis were considerable. If only the direct emissions and resource use of the symbiosis had been considered, less than half of the total environmental impacts of the system would have been taken into account. When the results were compared with the counterfactual reference scenarios, the net environmental impacts of the symbiosis were smaller than those of the reference scenarios. The reduction in environmental impacts was mainly due to changes in the way energy was produced. However, the results are sensitive to the way the reference scenarios are defined. LCA is a useful tool for assessing the overall environmental performance of industrial symbioses. It is recommended that in addition to the direct effects, the upstream impacts should be taken into account as well when assessing the environmental performance of industrial symbioses. Industrial symbiosis should be seen as part of the process of improving the environmental performance of a system. In some cases, it may be more efficient, from an environmental point of view, to focus on supply chain management instead.
Resumo:
In this paper we propose a novel technique to model and ana¿ lyze the performability of parallel and distributed architectures using GSPN-reward models.
Resumo:
Formal specification is vital to the development of distributed real-time systems as these systems are inherently complex and safety-critical. It is widely acknowledged that formal specification and automatic analysis of specifications can significantly increase system reliability. Although a number of specification techniques for real-time systems have been reported in the literature, most of these formalisms do not adequately address to the constraints that the aspects of 'distribution' and 'real-time' impose on specifications. Further, an automatic verification tool is necessary to reduce human errors in the reasoning process. In this regard, this paper is an attempt towards the development of a novel executable specification language for distributed real-time systems. First, we give a precise characterization of the syntax and semantics of DL. Subsequently, we discuss the problems of model checking, automatic verification of satisfiability of DL specifications, and testing conformance of event traces with DL specifications. Effective solutions to these problems are presented as extensions to the classical first-order tableau algorithm. The use of the proposed framework is illustrated by specifying a sample problem.
Resumo:
In the area of testing communication systems, the interfaces between systems to be tested and their testers have great impact on test generation and fault detectability. Several types of such interfaces have been standardized by the International Standardization Organization (ISO). A general distributed test architecture, containing distributed interfaces, has been presented in the literature for testing distributed systems based on the Open Distributing Processing (ODP) Basic Reference Model (BRM), which is a generalized version of ISO distributed test architecture. We study in this paper the issue of test selection with respect to such an test architecture. In particular, we consider communication systems that can be modeled by finite state machines with several distributed interfaces, called ports. A test generation method is developed for generating test sequences for such finite state machines, which is based on the idea of synchronizable test sequences. Starting from the initial effort by Sarikaya, a certain amount of work has been done for generating test sequences for finite state machines with respect to the ISO distributed test architecture, all based on the idea of modifying existing test generation methods to generate synchronizable test sequences. However, none studies the fault coverage provided by their methods. We investigate the issue of fault coverage and point out a fact that the methods given in the literature for the distributed test architecture cannot ensure the same fault coverage as the corresponding original testing methods. We also study the limitation of fault detectability in the distributed test architecture.
Resumo:
Beavers are often found to be in conflict with human interests by creating nuisances like building dams on flowing water (leading to flooding), blocking irrigation canals, cutting down timbers, etc. At the same time they contribute to raising water tables, increased vegetation, etc. Consequently, maintaining an optimal beaver population is beneficial. Because of their diffusion externality (due to migratory nature), strategies based on lumped parameter models are often ineffective. Using a distributed parameter model for beaver population that accounts for their spatial and temporal behavior, an optimal control (trapping) strategy is presented in this paper that leads to a desired distribution of the animal density in a region in the long run. The optimal control solution presented, imbeds the solution for a large number of initial conditions (i.e., it has a feedback form), which is otherwise nontrivial to obtain. The solution obtained can be used in real-time by a nonexpert in control theory since it involves only using the neural networks trained offline. Proper orthogonal decomposition-based basis function design followed by their use in a Galerkin projection has been incorporated in the solution process as a model reduction technique. Optimal solutions are obtained through a "single network adaptive critic" (SNAC) neural-network architecture.