10 resultados para Dynamic storage allocation (Computer science)

em DRUM (Digital Repository at the University of Maryland)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Deployment of low power basestations within cellular networks can potentially increase both capacity and coverage. However, such deployments require efficient resource allocation schemes for managing interference from the low power and macro basestations that are located within each other’s transmission range. In this dissertation, we propose novel and efficient dynamic resource allocation algorithms in the frequency, time and space domains. We show that the proposed algorithms perform better than the current state-of-art resource management algorithms. In the first part of the dissertation, we propose an interference management solution in the frequency domain. We introduce a distributed frequency allocation scheme that shares frequencies between macro and low power pico basestations, and guarantees a minimum average throughput to users. The scheme seeks to minimize the total number of frequencies needed to honor the minimum throughput requirements. We evaluate our scheme using detailed simulations and show that it performs on par with the centralized optimum allocation. Moreover, our proposed scheme outperforms a static frequency reuse scheme and the centralized optimal partitioning between the macro and picos. In the second part of the dissertation, we propose a time domain solution to the interference problem. We consider the problem of maximizing the alpha-fairness utility over heterogeneous wireless networks (HetNets) by jointly optimizing user association, wherein each user is associated to any one transmission point (TP) in the network, and activation fractions of all TPs. Activation fraction of a TP is the fraction of the frame duration for which it is active, and together these fractions influence the interference seen in the network. To address this joint optimization problem which we show is NP-hard, we propose an alternating optimization based approach wherein the activation fractions and the user association are optimized in an alternating manner. The subproblem of determining the optimal activation fractions is solved using a provably convergent auxiliary function method. On the other hand, the subproblem of determining the user association is solved via a simple combinatorial algorithm. Meaningful performance guarantees are derived in either case. Simulation results over a practical HetNet topology reveal the superior performance of the proposed algorithms and underscore the significant benefits of the joint optimization. In the final part of the dissertation, we propose a space domain solution to the interference problem. We consider the problem of maximizing system utility by optimizing over the set of user and TP pairs in each subframe, where each user can be served by multiple TPs. To address this optimization problem which is NP-hard, we propose a solution scheme based on difference of submodular function optimization approach. We evaluate our scheme using detailed simulations and show that it performs on par with a much more computationally demanding difference of convex function optimization scheme. Moreover, the proposed scheme performs within a reasonable percentage of the optimal solution. We further demonstrate the advantage of the proposed scheme by studying its performance with variation in different network topology parameters.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is a tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real-time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework, and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to do eager or lazy replication in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhoods. I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries, and also allows partial pre-computation of the aggregates to minimize the query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs could be specified using both graph structure as well as activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the increasing complexity of today's software, the software development process is becoming highly time and resource consuming. The increasing number of software configurations, input parameters, usage scenarios, supporting platforms, external dependencies, and versions plays an important role in expanding the costs of maintaining and repairing unforeseeable software faults. To repair software faults, developers spend considerable time in identifying the scenarios leading to those faults and root-causing the problems. While software debugging remains largely manual, it is not the case with software testing and verification. The goal of this research is to improve the software development process in general, and software debugging process in particular, by devising techniques and methods for automated software debugging, which leverage the advances in automatic test case generation and replay. In this research, novel algorithms are devised to discover faulty execution paths in programs by utilizing already existing software test cases, which can be either automatically or manually generated. The execution traces, or alternatively, the sequence covers of the failing test cases are extracted. Afterwards, commonalities between these test case sequence covers are extracted, processed, analyzed, and then presented to the developers in the form of subsequences that may be causing the fault. The hypothesis is that code sequences that are shared between a number of faulty test cases for the same reason resemble the faulty execution path, and hence, the search space for the faulty execution path can be narrowed down by using a large number of test cases. To achieve this goal, an efficient algorithm is implemented for finding common subsequences among a set of code sequence covers. Optimization techniques are devised to generate shorter and more logical sequence covers, and to select subsequences with high likelihood of containing the root cause among the set of all possible common subsequences. A hybrid static/dynamic analysis approach is designed to trace back the common subsequences from the end to the root cause. A debugging tool is created to enable developers to use the approach, and integrate it with an existing Integrated Development Environment. The tool is also integrated with the environment's program editors so that developers can benefit from both the tool suggestions, and their source code counterparts. Finally, a comparison between the developed approach and the state-of-the-art techniques shows that developers need only to inspect a small number of lines in order to find the root cause of the fault. Furthermore, experimental evaluation shows that the algorithm optimizations lead to better results in terms of both the algorithm running time and the output subsequence length.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The size of online image datasets is constantly increasing. Considering an image dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encodes high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage capacity. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases where new images are added to the databases frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in binary codes by an imbalance penalty to obtain higher quality binary codes. We learn hash functions by an efficient algorithm where the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized incremental manner. For modifications like adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions. Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Graphical User Interface (GUI) is an integral component of contemporary computer software. A stable and reliable GUI is necessary for correct functioning of software applications. Comprehensive verification of the GUI is a routine part of most software development life-cycles. The input space of a GUI is typically large, making exhaustive verification difficult. GUI defects are often revealed by exercising parts of the GUI that interact with each other. It is challenging for a verification method to drive the GUI into states that might contain defects. In recent years, model-based methods, that target specific GUI interactions, have been developed. These methods create a formal model of the GUI’s input space from specification of the GUI, visible GUI behaviors and static analysis of the GUI’s program-code. GUIs are typically dynamic in nature, whose user-visible state is guided by underlying program-code and dynamic program-state. This research extends existing model-based GUI testing techniques by modelling interactions between the visible GUI of a GUI-based software and its underlying program-code. The new model is able to, efficiently and effectively, test the GUI in ways that were not possible using existing methods. The thesis is this: Long, useful GUI testcases can be created by examining the interactions between the GUI, of a GUI-based application, and its program-code. To explore this thesis, a model-based GUI testing approach is formulated and evaluated. In this approach, program-code level interactions between GUI event handlers will be examined, modelled and deployed for constructing long GUI testcases. These testcases are able to drive the GUI into states that were not possible using existing models. Implementation and evaluation has been conducted using GUITAR, a fully-automated, open-source GUI testing framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The central motif of this work is prediction and optimization in presence of multiple interacting intelligent agents. We use the phrase `intelligent agents' to imply in some sense, a `bounded rationality', the exact meaning of which varies depending on the setting. Our agents may not be `rational' in the classical game theoretic sense, in that they don't always optimize a global objective. Rather, they rely on heuristics, as is natural for human agents or even software agents operating in the real-world. Within this broad framework we study the problem of influence maximization in social networks where behavior of agents is myopic, but complication stems from the structure of interaction networks. In this setting, we generalize two well-known models and give new algorithms and hardness results for our models. Then we move on to models where the agents reason strategically but are faced with considerable uncertainty. For such games, we give a new solution concept and analyze a real-world game using out techniques. Finally, the richest model we consider is that of Network Cournot Competition which deals with strategic resource allocation in hypergraphs, where agents reason strategically and their interaction is specified indirectly via player's utility functions. For this model, we give the first equilibrium computability results. In all of the above problems, we assume that payoffs for the agents are known. However, for real-world games, getting the payoffs can be quite challenging. To this end, we also study the inverse problem of inferring payoffs, given game history. We propose and evaluate a data analytic framework and we show that it is fast and performant.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Over the last decade, success of social networks has significantly reshaped how people consume information. Recommendation of contents based on user profiles is well-received. However, as users become dominantly mobile, little is done to consider the impacts of the wireless environment, especially the capacity constraints and changing channel. In this dissertation, we investigate a centralized wireless content delivery system, aiming to optimize overall user experience given the capacity constraints of the wireless networks, by deciding what contents to deliver, when and how. We propose a scheduling framework that incorporates content-based reward and deliverability. Our approach utilizes the broadcast nature of wireless communication and social nature of content, by multicasting and precaching. Results indicate this novel joint optimization approach outperforms existing layered systems that separate recommendation and delivery, especially when the wireless network is operating at maximum capacity. Utilizing limited number of transmission modes, we significantly reduce the complexity of the optimization. We also introduce the design of a hybrid system to handle transmissions for both system recommended contents ('push') and active user requests ('pull'). Further, we extend the joint optimization framework to the wireless infrastructure with multiple base stations. The problem becomes much harder in that there are many more system configurations, including but not limited to power allocation and how resources are shared among the base stations ('out-of-band' in which base stations transmit with dedicated spectrum resources, thus no interference; and 'in-band' in which they share the spectrum and need to mitigate interference). We propose a scalable two-phase scheduling framework: 1) each base station obtains delivery decisions and resource allocation individually; 2) the system consolidates the decisions and allocations, reducing redundant transmissions. Additionally, if the social network applications could provide the predictions of how the social contents disseminate, the wireless networks could schedule the transmissions accordingly and significantly improve the dissemination performance by reducing the delivery delay. We propose a novel method utilizing: 1) hybrid systems to handle active disseminating requests; and 2) predictions of dissemination dynamics from the social network applications. This method could mitigate the performance degradation for content dissemination due to wireless delivery delay. Results indicate that our proposed system design is both efficient and easy to implement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Image (Video) retrieval is an interesting problem of retrieving images (videos) similar to the query. Images (Videos) are represented in an input (feature) space and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations both in real valued and binary space have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is a well known problem of retrieving same class images of the query. We address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting in the first part, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class ideally to a unique binary code. We refer to the binary codes of the images as `Semantic Binary Codes' and the unique code for all same class images as `Class Binary Code'. We also propose a new class­ based Hamming metric that dramatically reduces the retrieval times for larger databases, where only hamming distance is computed to the class binary codes. We also propose a Deep semantic binary code model, by replacing the output layer of a popular convolutional Neural Network (AlexNet) with the class binary codes and show that the hashing functions learned in this way outperforms the state­ of ­the art, and at the same time provide fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. For a given query image, we want to retrieve images that preserve the relative order i.e. we want to retrieve all same class images first and then, the related classes images before different class images. We learn such relationship aware binary codes by minimizing the similarity between inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from the other supervised binary encoding schemes as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related class retrieval results and show significant gains over the state­ of­ the art. High Dimensional descriptors like Fisher Vectors or Vector of Locally Aggregated Descriptors have shown to improve the performance of many computer vision applications including retrieval. In the third part, we will discuss an unsupervised technique for compressing high dimensional vectors into high dimensional binary codes, to reduce storage complexity. In this approach, we deviate from adopting traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm that is intractable for compressing high dimensional vectors. A practical hierarchical model that utilizes divide and conquer techniques using the Random Select and Adjust (RSA) procedure to compress such high dimensional vectors is presented. We show that our proposed high dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval based solution to the Zero shot event classification problem - a setting where no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Natural language processing has achieved great success in a wide range of ap- plications, producing both commercial language services and open-source language tools. However, most methods take a static or batch approach, assuming that the model has all information it needs and makes a one-time prediction. In this disser- tation, we study dynamic problems where the input comes in a sequence instead of all at once, and the output must be produced while the input is arriving. In these problems, predictions are often made based only on partial information. We see this dynamic setting in many real-time, interactive applications. These problems usually involve a trade-off between the amount of input received (cost) and the quality of the output prediction (accuracy). Therefore, the evaluation considers both objectives (e.g., plotting a Pareto curve). Our goal is to develop a formal understanding of sequential prediction and decision-making problems in natural language processing and to propose efficient solutions. Toward this end, we present meta-algorithms that take an existent batch model and produce a dynamic model to handle sequential inputs and outputs. Webuild our framework upon theories of Markov Decision Process (MDP), which allows learning to trade off competing objectives in a principled way. The main machine learning techniques we use are from imitation learning and reinforcement learning, and we advance current techniques to tackle problems arising in our settings. We evaluate our algorithm on a variety of applications, including dependency parsing, machine translation, and question answering. We show that our approach achieves a better cost-accuracy trade-off than the batch approach and heuristic-based decision- making approaches. We first propose a general framework for cost-sensitive prediction, where dif- ferent parts of the input come at different costs. We formulate a decision-making process that selects pieces of the input sequentially, and the selection is adaptive to each instance. Our approach is evaluated on both standard classification tasks and a structured prediction task (dependency parsing). We show that it achieves similar prediction quality to methods that use all input, while inducing a much smaller cost. Next, we extend the framework to problems where the input is revealed incremen- tally in a fixed order. We study two applications: simultaneous machine translation and quiz bowl (incremental text classification). We discuss challenges in this set- ting and show that adding domain knowledge eases the decision-making problem. A central theme throughout the chapters is an MDP formulation of a challenging problem with sequential input/output and trade-off decisions, accompanied by a learning algorithm that solves the MDP.