8 results for Data-communication
in DRUM (Digital Repository at the University of Maryland)
Abstract:
In this work we introduce a new mathematical tool for the optimization of routes, topology design, and energy efficiency in wireless sensor networks. We introduce a vector field formulation that models communication in the network, and routing is performed in the direction of this vector field at every location of the network. The magnitude of the vector field at every location represents the density of the data traffic transiting through that location. We define the total communication cost in the network as the integral of a quadratic form of the vector field over the network area. With the above formulation, we introduce a mathematical machinery based on partial differential equations that closely parallels Maxwell's equations in electrostatic theory. We show that in order to minimize the cost, the routes should be found based on the solution of these partial differential equations. In our formulation, the sensors are sources of information, analogous to positive charges in electrostatics; the destinations are sinks of information, analogous to negative charges; and the network is analogous to an inhomogeneous dielectric medium with a variable dielectric constant (or permittivity coefficient). In one application of our mathematical model based on vector fields, we offer a scheme for energy-efficient routing. Our routing scheme is based on setting the permittivity coefficient to a high value in regions of the network where nodes have high residual energy, and to a low value in regions where the nodes have little energy left. Our simulations show that our method yields a significant increase in network lifetime compared to the shortest path and weighted shortest path schemes. Our initial focus is on the case where there is only one destination in the network; later we extend our approach to the case of multiple destinations. With multiple destinations, we need to partition the network into several areas known as the regions of attraction of the destinations. Each destination is responsible for collecting all messages generated in its region of attraction. The difficulty of the optimization problem in this case lies in defining the regions of attraction of the destinations and deciding how much communication load to assign to each destination to optimize the performance of the network. We use our vector field model to solve the optimization problem for this case. We define a vector field which is conservative, and hence can be written as the gradient of a scalar field (also known as a potential field). We then show that in the optimal assignment of the communication load of the network to the destinations, the value of that potential field should be equal at the locations of all the destinations. Another application of our vector field model is finding the optimal locations of the destinations in the network. We show that the vector field gives the gradient of the cost function with respect to the locations of the destinations. Based on this fact, we suggest an algorithm to be applied during the design phase of a network to relocate the destinations so as to reduce the communication cost function. The performance of our proposed schemes is confirmed by several examples and simulation experiments.
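As an informal illustration of the electrostatic analogy, the sketch below numerically solves the governing equation of a dielectric medium, ∇·(ε∇φ) = −ρ, on a small grid, with a positive "information charge" at a sensor and a negative charge at the destination; the routing field is then D = −ε∇φ. This is our own toy code, not the dissertation's: the grid size, charge placement, zero-potential boundary, and Jacobi solver are all assumptions made for illustration.

```python
import numpy as np

def solve_potential(eps, rho, h=1.0, iters=8000):
    """Jacobi iteration for div(eps * grad(phi)) = -rho on a uniform
    grid with a zero-potential boundary (illustrative only)."""
    # permittivity on cell faces: average of the two neighboring cells
    eE = 0.5 * (eps[1:-1, 1:-1] + eps[1:-1, 2:])
    eW = 0.5 * (eps[1:-1, 1:-1] + eps[1:-1, :-2])
    eN = 0.5 * (eps[1:-1, 1:-1] + eps[2:, 1:-1])
    eS = 0.5 * (eps[1:-1, 1:-1] + eps[:-2, 1:-1])
    phi = np.zeros_like(rho)
    for _ in range(iters):
        phi[1:-1, 1:-1] = (eE * phi[1:-1, 2:] + eW * phi[1:-1, :-2]
                           + eN * phi[2:, 1:-1] + eS * phi[:-2, 1:-1]
                           + h * h * rho[1:-1, 1:-1]) / (eE + eW + eN + eS)
    return phi

n = 64
eps = np.ones((n, n))   # raise eps where residual energy is high (assumption)
rho = np.zeros((n, n))
rho[16, 16] = +1.0      # a sensor: positive "information charge"
rho[48, 48] = -1.0      # the destination: negative charge (information sink)
phi = solve_potential(eps, rho)
gy, gx = np.gradient(phi)
Dx, Dy = -eps * gx, -eps * gy  # routing field: traffic flows along D to the sink
```

Raising eps in energy-rich regions, as in the energy-efficient routing scheme described above, steers the field, and hence the traffic, through those regions.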
In another part of this work, we focus on the notions of responsiveness and conformance of TCP traffic in communication networks. We introduce the notion of responsiveness for TCP aggregates and define it as the degree to which a TCP aggregate reduces its sending rate to the network in response to packet drops. We define metrics that describe the responsiveness of TCP aggregates, and suggest two methods for determining the values of these quantities. The first method is based on a test in which we intentionally drop a few packets from the aggregate and measure the resulting rate decrease of that aggregate. This kind of test is not robust to multiple simultaneous tests performed at different routers. We make the test robust to simultaneous tests by using ideas from the CDMA approach to multiple-access channels in communication theory. Based on this approach, we introduce a responsiveness test for aggregates that we call the CDMA-based Aggregate Perturbation Method (CAPM). We use CAPM to perform congestion control. A distinguishing feature of our congestion control scheme is that it maintains a degree of fairness among different aggregates. In the next step, we modify CAPM to offer methods for estimating the proportion of an aggregate of TCP traffic that does not conform to protocol specifications, and hence may belong to a DDoS attack. Our methods work by intentionally perturbing the aggregate, dropping a very small number of packets from it, and observing the response of the aggregate. We offer two methods for conformance testing. In the first method, we apply the perturbation tests to SYN packets sent at the start of the TCP three-way handshake, and we use the fact that the rate of ACK packets exchanged in the handshake should follow the rate of perturbations. In the second method, we apply the perturbation tests to the TCP data packets and use the fact that the rate of retransmitted data packets should follow the rate of perturbations. In both methods, we use signature-based perturbations, meaning that packet drops are performed at a rate given by a function of time. We exploit the analogy between our problem and multiple-access communication to design these signatures. Specifically, we assign orthogonal CDMA-based signatures to different routers in a distributed implementation of our methods. As a result of this orthogonality, performance does not degrade due to cross-interference between simultaneously testing routers. We demonstrate the efficacy of our methods through mathematical analysis and extensive simulation experiments.
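To make the signature idea concrete, the toy script below has two routers perturb the same aggregate using orthogonal Walsh-Hadamard signatures; by correlating the observed response against its own signature, each router recovers its test result undistorted by the other's. This is our own illustration of the orthogonality principle, not the CAPM implementation: the unit-gain response model, drop rates, and noise level are assumptions.

```python
import numpy as np

def hadamard(k):
    """Sylvester construction: 2^k x 2^k Walsh-Hadamard matrix of +/-1."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(0)
H = hadamard(3)             # 8 orthogonal signatures, one per router
T = H.shape[1]              # number of test slots
sig_a, sig_b = H[1], H[2]   # routers A and B test the aggregate concurrently

# each router modulates its packet-drop rate around a small base rate
base, amp = 0.01, 0.005
drop_rate = base + amp * sig_a + amp * sig_b  # aggregate sees both tests at once

# a responsive aggregate's rate decrease tracks the total perturbation
# (unit gain plus measurement noise, purely for illustration)
response = drop_rate + 0.001 * rng.standard_normal(T)

# correlating with each signature isolates that router's own test:
# orthogonality cancels the other router's contribution
gain_a = np.dot(response, sig_a) / (amp * T)
gain_b = np.dot(response, sig_b) / (amp * T)
print(gain_a, gain_b)  # both near 1 for a fully responsive aggregate
```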
Abstract:
Gemstone Team AUDIO (Assessing and Understanding Deaf Individuals' Occupations)
Abstract:
Gemstone Team FLIP (File Lending in Proximity)
Abstract:
Americans are accustomed to a wide range of data collection in their lives: census, polls, surveys, user registrations, and disclosure forms. When logging onto the Internet, users’ actions are tracked everywhere: clicking, typing, tapping, swiping, searching, and placing orders. All of this data is stored to create data-driven profiles of each user. Social network sites, furthermore, set the voluntary sharing of personal data as the default mode of engagement. But the time and energy people devote to creating this massive amount of data, on paper and online, are taken for granted. Few people would consider the time and energy they spend on data production to be labor. Even those who do acknowledge their labor for data tend to see it as accessory to the activities at hand. In the face of pervasive data collection and the rising time spent on screens, why do people keep ignoring their labor for data? How has labor for data become invisible, something disregarded by many users? What does invisible labor for data imply for everyday cultural practices in the United States? Invisible Labor for Data addresses these questions. I argue that three intertwined forces contribute to framing data production as void of labor: data production institutions throughout history, the Internet’s technological infrastructure (especially the implementation of algorithms), and the multiplication of virtual spaces. There is a common tendency in frameworks of human-computer interaction to deprive data and bodies of their materiality. My Introduction and Chapter 1 offer theoretical interventions by reinstating embodied materiality and redefining labor for data as an ongoing process. The middle chapters present case studies explaining how labor for data is pushed to the margins of narratives about data production. I focus on a nationwide debate in the 1960s over whether the U.S. should build a databank, contemporary Big Data practices in the data broker and Internet industries, and the group of people who are hired to produce data for other people’s avatars in virtual games. I conclude with a discussion of how the new development of crowdsourcing projects may usher in a new chapter in the exploitation of invisible and discounted labor for data.
Abstract:
In today’s big data world, data is being produced in massive volumes, at great velocity, and from a variety of sources such as mobile devices, sensors, a plethora of small devices connected to the Internet (the Internet of Things), social networks, communication networks, and many others. Interactive querying and large-scale analytics are increasingly used to derive value from this big data. A large portion of this data is stored and processed in the Cloud due to the several advantages the Cloud provides, such as scalability, elasticity, availability, low cost of ownership, and overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage, and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built, and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has traditionally been used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing size (progressive samples) for exploratory querying. This provides data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics.
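The following sketch illustrates only the generic progressive-sampling idea that motivates NOW!, not its progress semantics, determinism guarantees, or provenance machinery; the synthetic data and the normal-approximation error bars are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=40.0, size=1_000_000)  # stand-in for a large table

# progressive samples: each round grows the sample and refines the running
# estimate, reporting an early answer with an approximate error bar
n, total = 1_000, len(data)
while n <= total:
    sample = rng.choice(data, size=n, replace=False)
    est = sample.mean()                           # approximate AVG(column)
    ci = 1.96 * sample.std(ddof=1) / np.sqrt(n)   # ~95% confidence half-width
    print(f"n={n:>9,}  AVG ~= {est:8.3f} +/- {ci:.3f}")
    n *= 10
print(f"exact      AVG  = {data.mean():8.3f}")
```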
Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, and link prediction. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them into distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
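As a single-machine analogue of the neighborhood-centric programming model, the sketch below uses networkx as a stand-in: the user task is written against an extracted multi-hop neighborhood rather than a single vertex. NSCALE itself extracts and distributes such subgraphs across a cluster; the graph, the task, and the seed-selection rule here are illustrative assumptions.

```python
import networkx as nx

# toy graph standing in for a large social network
G = nx.karate_club_graph()

def local_task(subgraph, center):
    """User program written against a neighborhood, not a single vertex."""
    return nx.clustering(subgraph, center)

# "declaratively" specify the subgraphs of interest: 2-hop neighborhoods
# around high-degree vertices, then run the task on each extracted subgraph
seeds = [v for v, d in G.degree() if d >= 10]
for v in seeds:
    nbhd = nx.ego_graph(G, v, radius=2)  # multi-hop neighborhood extraction
    print(v, nbhd.number_of_nodes(), local_task(nbhd, v))
```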
Abstract:
Datacenters have emerged as the dominant form of computing infrastructure over the last two decades. The tremendous increase in the demands of data analysis has led to a proportional increase in power consumption, and datacenters are now one of the fastest-growing electricity consumers in the United States. Another rising concern is the loss of throughput due to network congestion. Scheduling models that do not explicitly account for data placement may lead to the transfer of large amounts of data over the network, causing unacceptable delays. In this dissertation, we study different scheduling models that are inspired by the dual objectives of minimizing energy costs and network congestion in a datacenter. Because datacenters are provisioned to handle peak workloads, average server utilization in most datacenters is very low. As a result, one can achieve substantial energy savings by selectively shutting down machines when demand is low. In this dissertation, we introduce the network-aware machine activation problem, which seeks a schedule that simultaneously minimizes the number of machines necessary and the congestion incurred in the network. Our model significantly generalizes well-studied combinatorial optimization problems such as hard-capacitated hypergraph covering and is thus strongly NP-hard; as a result, we focus on finding good approximation algorithms. Data-parallel computation frameworks such as MapReduce have popularized the design of applications that require a large amount of communication between different machines. Efficient scheduling of these communication demands is essential to guarantee efficient execution of the different applications. In the second part of the thesis, we study the approximability of the co-flow scheduling problem, which was recently introduced to capture these application-level demands. Finally, we also study the question, "In what order should one process jobs?" Often, precedence constraints specify a partial order over the set of jobs, and the objective is to find suitable schedules that satisfy the partial order. However, in the presence of hard deadline constraints, it may be impossible to find a schedule that satisfies all precedence constraints. In this thesis, we formalize different variants of job scheduling with soft precedence constraints and conduct the first systematic study of these problems.
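For intuition about the covering flavor of machine activation, here is a deliberately stripped-down greedy heuristic: it ignores network congestion entirely and is not the approximation algorithm developed in the dissertation; the demands and capacities are made up.

```python
# toy capacitated-covering version of machine activation: open the fewest
# machines whose capacities cover the total job demand (the network-aware
# problem studied in the dissertation also charges for congestion)

jobs = [4, 7, 2, 5, 9, 3]                           # hypothetical job demands
machines = {"m1": 10, "m2": 8, "m3": 15, "m4": 6}   # hypothetical capacities

demand = sum(jobs)
active = []
for m, cap in sorted(machines.items(), key=lambda kv: -kv[1]):
    if demand <= 0:
        break
    active.append(m)   # greedily open the largest remaining machine
    demand -= cap

print(active)  # machines to switch on; the rest can be shut down
```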
Abstract:
In energy harvesting communications, users transmit messages using energy harvested from nature. In such systems, the transmission policies of the users need to be carefully designed according to the energy arrival profiles. When the energy management policies are optimized, the resulting performance of the system depends only on the energy arrival profiles. In this dissertation, we introduce and analyze the notion of energy cooperation in energy harvesting communications, where users can share a portion of their harvested energy with other users via wireless energy transfer. This energy cooperation enables us to control and optimize the energy arrivals at the users to the extent possible. In the classical setting of cooperation, users help each other in the transmission of their data by exploiting the broadcast nature of wireless communications and the resulting overheard information. In contrast to this usual notion of cooperation, which operates at the signal level, the energy cooperation we introduce here operates at the battery energy level. In a multi-user setting, energy may be abundant at one user, in which case the loss incurred in transferring it to another user may be less than the gain it yields for the other user. It is this cooperation that we explore in this dissertation for several multi-user scenarios, where energy can be transferred from one user to another through a separate wireless energy transfer unit. We first consider the offline optimal energy management problem for several basic multi-user network structures with energy harvesting transmitters and one-way wireless energy transfer. In energy harvesting transmitters, energy arrivals in time impose energy causality constraints on the transmission policies of the users. In the presence of wireless energy transfer, the energy causality constraints take a new form: energy can flow in time from the past to the future for each user, and from one user to the other at each time. This requires careful joint management of energy flow in two separate dimensions, and different management policies are required depending on how the users share the common wireless medium and interact over it. In this context, we analyze several basic multi-user energy harvesting network structures with wireless energy transfer. To capture the main trade-offs and insights that arise from wireless energy transfer, we focus our attention on simple two- and three-user communication systems, such as the relay channel, the multiple access channel, and the two-way channel. Next, we focus on the delay minimization problem for networks. We consider a general network topology of energy harvesting and energy cooperating nodes. Each node harvests energy from nature, and all nodes may share a portion of their harvested energy with neighboring nodes through energy cooperation. We consider the joint data routing and capacity assignment problem for this setting under fixed data and energy routing topologies, and we determine the joint routing of energy and data in a general multi-user scenario with data and energy transfer. Next, we consider the cooperative energy harvesting diamond channel, where the source and two relays harvest energy from nature and the physical layer is modeled as the concatenation of a broadcast channel and a multiple access channel. Since the broadcast channel is degraded, one of the relays has the message of the other relay; the multiple access channel is therefore an extended multiple access channel with common data.
We determine the optimum power and rate allocation policies of the users in order to maximize the end-to-end throughput of this system. Finally, we consider the two-user cooperative multiple access channel with energy harvesting users. The users cooperate at the physical layer (data cooperation) by establishing common messages through overheard signals and then cooperatively transmitting them. For this channel model, we investigate the effect of intermittent data arrivals at the users. We find the optimal offline transmit power and rate allocation policy that maximizes the departure region. When the users can further cooperate at the battery level (energy cooperation), we find the jointly optimal offline transmit power and rate allocation policies, together with the energy transfer policy, that maximize the departure region.
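The energy causality constraints admit a compact convex-programming form. The single-user offline sketch below, written with cvxpy, maximizes throughput subject to the cumulative energy spent never exceeding the cumulative energy harvested; the harvest profile, unit slot lengths, and unit-noise AWGN rate are our illustrative assumptions, and multi-user energy cooperation would add transfer variables coupling the users' constraints.

```python
import cvxpy as cp
import numpy as np

# toy offline single-user problem: choose transmit powers p_t to maximize
# sum-rate under energy causality (energy spent by slot t cannot exceed
# energy harvested by slot t)
E = np.array([4.0, 0.0, 6.0, 2.0])  # energy harvested at the start of each slot
T = len(E)

p = cp.Variable(T, nonneg=True)
rate = cp.sum(cp.log(1 + p))        # AWGN rate with unit bandwidth and noise
causality = [cp.sum(p[: t + 1]) <= np.sum(E[: t + 1]) for t in range(T)]

cp.Problem(cp.Maximize(rate), causality).solve()
print(np.round(p.value, 3))  # optimal powers "water-fill" forward in time
```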
Abstract:
The past few decades have witnessed the widespread adoption of wireless devices such as cellular phones and Wi-Fi-connected laptops, and demand for wireless communication is expected to continue to increase. Though radio frequency (RF) communication has traditionally dominated this application space, recent decades have seen increasing interest in the use of optical wireless (OW) communication to supplement RF communications. In contrast to RF communication technology, OW systems offer largely unregulated electromagnetic spectrum and large bandwidths for communication. They also offer the potential to be highly secure against jamming and eavesdropping. Interest in OW has become especially keen in light of the maturation of light-emitting diode (LED) technology. This maturation, and the consequent emerging ubiquity of LED technology in lighting systems, has motivated the exploration of LEDs for wireless communication in a wide variety of applications. Recent interest in this field has largely focused on the potential for indoor local area networks (LANs) to be realized with increasingly common LED-based lighting systems. We envision the use of LED-based OW as a supplement to RF technology in communication between mobile platforms, which may include automobiles, robots, or unmanned aerial vehicles (UAVs). OW technology may be especially useful in what are known as RF-denied environments, in which RF communication may be prohibited or undesirable. The use of OW in these settings presents major challenges. In contrast to many RF systems, OW systems that operate at ranges beyond a few meters typically require relatively precise alignment. For example, some laser-based optical wireless communication systems require alignment precision to within small fractions of a degree. This level of alignment precision can be difficult to maintain between mobile platforms. Additionally, the use of OW systems in outdoor settings presents the challenge of interference from ambient light, which can be much brighter than any LED transmitter. This thesis addresses these challenges to the use of LED-based communication between mobile platforms. We propose and analyze a dual-link LED-based system that uses one link with a wide transmission beam and relaxed alignment constraints to support a narrower, precisely aligned, higher-data-rate link. The use of an optical link with relaxed alignment constraints to support the alignment of a more precisely aligned link motivates our exploration of a panoramic imaging receiver for estimating the range and bearing of neighboring nodes. The precision of such a system is analyzed and an experimental system is realized. Finally, we present an experimental prototype of a self-aligning LED-based link.
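To see why alignment dominates the link budget in such systems, consider the standard line-of-sight Lambertian model for an LED link; this is a textbook model rather than the thesis's experimental parameters, and every number below is illustrative.

```python
import numpy as np

def rx_power(Pt, half_angle_deg, d, phi_deg, psi_deg, area):
    """Line-of-sight received optical power, generalized Lambertian model.
    Pt: transmit optical power [W]; d: range [m]; phi: emission angle off
    the LED axis; psi: incidence angle at the detector; area: detector [m^2]."""
    m = -np.log(2) / np.log(np.cos(np.radians(half_angle_deg)))  # Lambertian order
    phi, psi = np.radians(phi_deg), np.radians(psi_deg)
    return Pt * (m + 1) / (2 * np.pi * d**2) * np.cos(phi) ** m * np.cos(psi) * area

# a narrow 5-degree half-angle beam: small pointing errors are costly
for err in [0.0, 2.0, 4.0]:
    print(err, rx_power(Pt=1.0, half_angle_deg=5.0, d=30.0,
                        phi_deg=err, psi_deg=err, area=1e-4))
```

With a 5° half-angle beam, a pointing error of only a few degrees already costs a large fraction of the received power, which is why a wide-beam link is useful for bootstrapping alignment of the narrow one.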