873 results for constrained clustering
Abstract:
This paper proposes a novel protocol which uses the Internet Domain Name System (DNS) to partition Web clients into disjoint sets, each of which is associated with a single DNS server. We define an L-DNS cluster to be a grouping of Web clients that use the same Local DNS server to resolve Internet host names. We identify such clusters in real time using data obtained from a Web server in conjunction with that server's authoritative DNS, both instrumented with an implementation of our clustering algorithm. Using these clusters, we perform measurements from four distinct Internet locations. Our results show that L-DNS clustering enables a better estimation of the proximity of a Web client to a Web server than previously proposed techniques. Thus, in a Content Distribution Network, a DNS-based scheme that redirects a request from a Web client to one of many servers based on the client's name server coordinates (e.g., hops/latency/loss rates between the client and servers) would perform better with our algorithm.
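The grouping step described in this abstract can be illustrated with a minimal sketch. It assumes that (client, local DNS server) pairs have already been obtained by correlating Web-server and authoritative-DNS logs; that correlation is the paper's contribution and is not reproduced here. The function name `ldns_clusters` and the rule of keeping the first server seen for each client (so that the sets stay disjoint) are assumptions made for illustration only.

```python
from collections import defaultdict


def ldns_clusters(observations):
    """Group client addresses by the local DNS server observed for them.

    `observations` is an iterable of (client_ip, ldns_ip) pairs.
    Assumption: a client seen with several servers is assigned to the
    first one observed, which keeps the resulting sets disjoint.
    """
    assigned = {}
    for client_ip, ldns_ip in observations:
        # setdefault keeps the first server recorded for each client
        assigned.setdefault(client_ip, ldns_ip)

    clusters = defaultdict(set)
    for client_ip, ldns_ip in assigned.items():
        clusters[ldns_ip].add(client_ip)
    return dict(clusters)


if __name__ == "__main__":
    pairs = [
        ("10.0.0.1", "192.0.2.53"),
        ("10.0.0.2", "192.0.2.53"),
        ("10.0.0.3", "198.51.100.53"),
    ]
    for server, clients in ldns_clusters(pairs).items():
        print(server, sorted(clients))
```

Each key of the returned dictionary is one L-DNS cluster; proximity measurements could then be made once per cluster rather than once per client.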
Abstract:
Personal communication devices are increasingly equipped with sensors for passive monitoring of encounters and surroundings. We envision the emergence of services that enable a community of mobile users carrying such resource-limited devices to query such information at remote locations in the field in which they collectively roam. One approach to implement such a service is directed placement and retrieval (DPR), whereby readings/queries about a specific location are routed to a node responsible for that location. In a mobile, potentially sparse setting, where end-to-end paths are unavailable, DPR is not an attractive solution as it would require the use of delay-tolerant (flooding-based store-carry-forward) routing of both readings and queries, which is inappropriate for applications with data freshness constraints, and which is incompatible with stringent device power/memory constraints. Alternatively, we propose the use of amorphous placement and retrieval (APR), in which routing and field monitoring are integrated through the use of a cache management scheme coupled with an informed exchange of cached samples to diffuse sensory data throughout the network, in such a way that a query answer is likely to be found close to the query origin. We argue that knowledge of the distribution of query targets could be used effectively by an informed cache management policy to maximize the utility of collective storage of all devices. Using a simple analytical model, we show that the use of informed cache management is particularly important when the mobility model results in a non-uniform distribution of users over the field. We present results from extensive simulations which show that in sparsely-connected networks, APR is more cost-effective than DPR, that it provides extra resilience to node failure and packet losses, and that its use of informed cache management yields superior performance.
Abstract:
The need to cluster unknown data in order to better understand its relationship to known data is prevalent throughout science. Besides providing a better understanding of the data itself or revealing something about a new, unknown object, cluster analysis can help with data processing, data standardization, and outlier detection. Most clustering algorithms are based on known features or expectations, such as the popular partition-based, hierarchical, density-based, grid-based, and model-based algorithms. The choice of algorithm depends on many factors, including the type of data and the reason for clustering, but nearly all rely on some known properties of the data being analyzed. Recently, Li et al. proposed a new universal similarity metric that needs no prior knowledge about the objects being compared. Their similarity metric is based on the Kolmogorov complexity of an object, that is, the length of the object's minimal description. While the Kolmogorov complexity of an object is not computable, in "Clustering by Compression," Cilibrasi and Vitanyi use common compression algorithms to approximate the universal similarity metric and cluster objects with high success. Unfortunately, clustering using compression does not trivially extend to higher dimensions. Here we outline a method to adapt their procedure to images. We test these techniques on images of letters of the alphabet.
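The compression-based approximation mentioned in this abstract is usually written as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The sketch below computes it for byte strings, with zlib chosen as the compressor purely for illustration. It does not implement the image adaptation the abstract proposes.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Smaller values mean the compressor finds more shared structure.
    The choice of zlib is an assumption; any real compressor can be
    substituted for `compressed_size`.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 40
    b = b"pack my box with five dozen liquor jugs " * 40
    print("ncd(a, a) =", ncd(a, a))
    print("ncd(a, b) =", ncd(a, b))
```

A matrix of pairwise `ncd` values can be handed to any standard distance-based clustering routine.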
Abstract:
As the commoditization of sensing, actuation and communication hardware increases, so does the potential for dynamically tasked sense and respond networked systems (i.e., Sensor Networks or SNs) to replace existing disjoint and inflexible special-purpose deployments (closed-circuit security video, anti-theft sensors, etc.). While various solutions have emerged to many individual SN-centric challenges (e.g., power management, communication protocols, role assignment), perhaps the largest remaining obstacle to widespread SN deployment is that those who wish to deploy, utilize, and maintain a programmable Sensor Network lack the programming and systems expertise to do so. The contributions of this thesis center on the design, development and deployment of the SN Workbench (snBench). snBench embodies an accessible, modular programming platform coupled with a flexible and extensible run-time system that, together, support the entire life-cycle of distributed sensory services. As it is impossible to find a one-size-fits-all programming interface, this work advocates the use of tiered layers of abstraction that enable a variety of high-level, domain-specific languages to be compiled to a common (thin-waist) tasking language; this common tasking language is statically verified and can be subsequently re-translated, if needed, for execution on a wide variety of hardware platforms.
snBench provides: (1) a common sensory tasking language (Instruction Set Architecture) powerful enough to express complex SN services, yet simple enough to be executed by highly constrained resources with soft, real-time constraints, (2) a prototype high-level language (and corresponding compiler) to illustrate the utility of the common tasking language and the tiered programming approach in this domain, (3) an execution environment and a run-time support infrastructure that abstract a collection of heterogeneous resources into a single virtual Sensor Network, tasked via this common tasking language, and (4) novel formal methods (i.e., static analysis techniques) that verify safety properties and infer implicit resource constraints to facilitate resource allocation for new services. This thesis presents these components in detail, as well as two specific case-studies: the use of snBench to integrate physical and wireless network security, and the use of snBench as the foundation for semester-long student projects in a graduate-level Software Engineering course.
Abstract:
Routing protocols for ad-hoc networks assume that the nodes forming the network are either under a single authority, or else that they would be altruistically forwarding data for other nodes with no expectation of a return. These assumptions are unrealistic since in ad-hoc networks, nodes are likely to be autonomous and rational (selfish), and thus unwilling to help unless they have an incentive to do so. Providing such incentives is an important aspect that should be considered when designing ad-hoc routing protocols. In this paper, we propose a dynamic, decentralized routing protocol for ad-hoc networks that provides incentives in the form of payments to intermediate nodes used to forward data for others. In our Constrained Selfish Routing (CSR) protocol, game-theoretic approaches are used to calculate payments (incentives) that ensure both the truthfulness of participating nodes and the fairness of the CSR protocol. We show through simulations that CSR is an energy-efficient protocol and that it provides lower communication overhead in the best and average cases compared to existing approaches.
Abstract:
Research on the construction of logical overlay networks has gained significance in recent times. This is partly due to work on peer-to-peer (P2P) systems for locating and retrieving distributed data objects, and also scalable content distribution using end-system multicast techniques. However, there are emerging applications that require the real-time transport of data from various sources to potentially many thousands of subscribers, each having their own quality-of-service (QoS) constraints. This paper primarily focuses on the properties of two popular topologies found in interconnection networks, namely k-ary n-cubes and de Bruijn graphs. The regular structure of these graph topologies makes them easier to analyze and determine possible routes for real-time data than complete or irregular graphs. We show how these overlay topologies compare in their ability to deliver data according to the QoS constraints of many subscribers, each receiving data from specific publishing hosts. Comparisons are drawn on the ability of each topology to route data in the presence of dynamic system effects, due to end-hosts joining and departing the system. Finally, experimental results show the service guarantees and physical link stress resulting from efficient multicast trees constructed over both kinds of overlay networks.
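The regular structure that this abstract refers to can be seen in how neighbours are computed from a node's label alone. The sketch below uses the standard textbook definitions of the two topologies, with nodes represented as tuples of digits in base k. The function names are illustrative, and nothing here reflects the paper's routing or QoS logic.

```python
def debruijn_successors(node, k):
    """Successors of `node` in a de Bruijn graph over an alphabet of size k.

    A node is a tuple of n symbols; each successor drops the first
    symbol and appends one new symbol, giving k successors.
    """
    return [node[1:] + (symbol,) for symbol in range(k)]


def kary_ncube_neighbors(node, k):
    """Neighbours of `node` in a k-ary n-cube (a torus of radix k).

    A node is a tuple of n digits in base k; neighbours differ by
    +1 or -1 (mod k) in exactly one digit.
    """
    neighbors = set()
    for i, digit in enumerate(node):
        for step in (1, -1):
            changed = (digit + step) % k
            neighbors.add(node[:i] + (changed,) + node[i + 1:])
    neighbors.discard(node)  # only relevant in the degenerate case k == 1
    return sorted(neighbors)


if __name__ == "__main__":
    print(debruijn_successors((0, 1, 1), 2))
    print(kary_ncube_neighbors((0, 0), 3))
```

Because neighbours follow from the label, an overlay node needs no routing table beyond its own identifier to find its next hops.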
Abstract:
Overlay networks have become popular in recent times for content distribution and end-system multicasting of media streams. In the latter case, the motivation is based on the lack of widespread deployment of IP multicast and the ability to perform end-host processing. However, constructing routes between various end-hosts, so that data can be streamed from content publishers to many thousands of subscribers, each having their own QoS constraints, is still a challenging problem. First, any routes between end-hosts using trees built on top of overlay networks can increase stress on the underlying physical network, due to multiple instances of the same data traversing a given physical link. Second, because overlay routes between end-hosts may traverse physical network links more than once, they increase the end-to-end latency compared to IP-level routing. Third, algorithms for constructing efficient, large-scale trees that reduce link stress and latency are typically more complex. This paper therefore compares various methods to construct multicast trees between end-systems that vary in terms of implementation costs and their ability to support per-subscriber QoS constraints. We describe several algorithms that make trade-offs between algorithmic complexity, physical link stress and latency. While no algorithm is best in all three cases, we show how it is possible to efficiently build trees for several thousand subscribers with latencies within a factor of two of the optimal, and link stresses comparable to, or better than, existing technologies.
Abstract:
Within the iBench research project, our previous work created a domain-specific language, TRAFFIC [6], that facilitates the specification, programming, and maintenance of distributed applications over a network. It allows safety properties to be formalized in terms of types and subtyping relations. Extending our previous work, we add Hindley-Milner style polymorphism [8] with constraints [9] to the type system of TRAFFIC. This allows a programmer to use the for-all quantifier to describe the types of network components, raising the power and expressiveness of types beyond what was possible with propositional subtyping relations. Furthermore, we design our type system with a pluggable constraint system, so it can adapt to different application needs while maintaining soundness. In this paper, we show the soundness of the type system, which is not syntax-directed but makes typing derivations easier. We show that there is an equivalent syntax-directed type system, which is what a type checker program would implement to verify the safety of a network flow. This is followed by a discussion of several constraint systems: polymorphism with subtyping constraints, Linear Programming, and Constraint Handling Rules (CHR) [3]. Finally, we provide some examples to illustrate the workings of these constraint systems.
Abstract:
Spectral methods of graph partitioning have been shown to provide a powerful approach to the image segmentation problem. In this paper, we adopt a different approach, based on estimating the isoperimetric constant of an image graph. Our algorithm produces the high quality segmentations and data clustering of spectral methods, but with improved speed and stability.
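As a rough illustration of the quantity being estimated, the sketch below evaluates the isoperimetric ratio of one given node subset in a weighted graph: the total weight of edges crossing the boundary, divided by the smaller of the two sides' volumes. Taking volume to be the sum of weighted degrees is an assumption (node count is another common choice), and the function only scores a candidate partition. It is not the segmentation algorithm the abstract describes.

```python
def isoperimetric_ratio(edges, subset):
    """Boundary weight of `subset` divided by the smaller side's volume.

    `edges` maps an undirected edge (u, v) to a positive weight.
    Assumption: volume is the sum of weighted degrees of the nodes
    on one side of the partition.
    """
    subset = set(subset)
    boundary = 0.0
    vol_in = 0.0
    vol_out = 0.0
    for (u, v), weight in edges.items():
        # each endpoint contributes the edge weight to its side's volume
        for node in (u, v):
            if node in subset:
                vol_in += weight
            else:
                vol_out += weight
        # an edge lies on the boundary when exactly one endpoint is inside
        if (u in subset) != (v in subset):
            boundary += weight
    smaller = min(vol_in, vol_out)
    if smaller == 0:
        raise ValueError("subset must be a proper, non-isolated part of the graph")
    return boundary / smaller


if __name__ == "__main__":
    # two triangles joined by one weak edge
    graph = {
        (0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
        (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
        (2, 3): 0.1,
    }
    print(isoperimetric_ratio(graph, {0, 1, 2}))
    print(isoperimetric_ratio(graph, {0}))
```

A low ratio marks a subset that is weakly attached to the rest of the graph, which is the property a good segment boundary should have.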
Abstract:
Training data for supervised learning neural networks can be clustered such that the input/output pairs in each cluster are redundant. Redundant training data can adversely affect training time. In this paper we apply two clustering algorithms, ART2-A and the Generalized Equality Classifier, to identify training data clusters and thus reduce the training data and training time. The approach is demonstrated for a high-dimensional nonlinear continuous-time mapping. The demonstration shows a six-fold decrease in training time at little or no loss of accuracy in the handling of evaluation data.
Abstract:
A supersonic expansion containing acetylene seeded into Ar and produced from a circular nozzle is investigated using CW/cavity ring down spectroscopy, in the 1.5 μm range. The results, also involving experiments with pure acetylene and acetylene-He expansions, as well as slit nozzles, demonstrate that the denser central section in the expansion is slightly heated by the formation of acetylene aggregates, resulting in a dip in the monomer absorption line profiles. Acetylene-Ar aggregates are also formed at the edge of the circular nozzle expansion cone. © 2008 Elsevier B.V. All rights reserved.
Abstract:
The receptor deleted in colorectal cancer (DCC) directs dynamic polarizing activities in animals toward its extracellular ligand netrin. How DCC polarizes toward netrin is poorly understood. By performing live-cell imaging of the DCC orthologue UNC-40 during anchor cell invasion in Caenorhabditis elegans, we have found that UNC-40 clusters, recruits F-actin effectors, and generates F-actin in the absence of UNC-6 (netrin). Time-lapse analyses revealed that UNC-40 clusters assemble, disassemble, and reform at periodic intervals in different regions of the cell membrane. This oscillatory behavior indicates that UNC-40 clusters through a mechanism involving interlinked positive (formation) and negative (disassembly) feedback. We show that endogenous UNC-6 and ectopically provided UNC-6 orient and stabilize UNC-40 clustering. Furthermore, the UNC-40-binding protein MADD-2 (a TRIM family protein) promotes ligand-independent clustering and robust UNC-40 polarization toward UNC-6. Together, our data suggest that UNC-6 (netrin) directs polarized responses by stabilizing UNC-40 clustering. We propose that ligand-independent UNC-40 clustering provides a robust and adaptable mechanism to polarize toward netrin.
Abstract:
This paper presents two multilevel refinement algorithms for the capacitated clustering problem. Multilevel refinement is a collaborative technique capable of significantly aiding the solution process for optimisation problems. The central methodologies of the technique are filtering solutions from the search space and reducing the level of problem detail to be considered at each level of the solution process. The first multilevel algorithm uses a simple tabu search while the other executes a standard local search procedure. Both algorithms demonstrate that the multilevel technique is capable of aiding the solution process for this combinatorial optimisation problem.