4 resultados para Modern science based resource management

em DRUM (Digital Repository at the University of Maryland)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Deployment of low power basestations within cellular networks can potentially increase both capacity and coverage. However, such deployments require efficient resource allocation schemes for managing interference from the low power and macro basestations that are located within each other’s transmission range. In this dissertation, we propose novel and efficient dynamic resource allocation algorithms in the frequency, time and space domains. We show that the proposed algorithms perform better than the current state-of-art resource management algorithms. In the first part of the dissertation, we propose an interference management solution in the frequency domain. We introduce a distributed frequency allocation scheme that shares frequencies between macro and low power pico basestations, and guarantees a minimum average throughput to users. The scheme seeks to minimize the total number of frequencies needed to honor the minimum throughput requirements. We evaluate our scheme using detailed simulations and show that it performs on par with the centralized optimum allocation. Moreover, our proposed scheme outperforms a static frequency reuse scheme and the centralized optimal partitioning between the macro and picos. In the second part of the dissertation, we propose a time domain solution to the interference problem. We consider the problem of maximizing the alpha-fairness utility over heterogeneous wireless networks (HetNets) by jointly optimizing user association, wherein each user is associated to any one transmission point (TP) in the network, and activation fractions of all TPs. Activation fraction of a TP is the fraction of the frame duration for which it is active, and together these fractions influence the interference seen in the network. To address this joint optimization problem which we show is NP-hard, we propose an alternating optimization based approach wherein the activation fractions and the user association are optimized in an alternating manner. The subproblem of determining the optimal activation fractions is solved using a provably convergent auxiliary function method. On the other hand, the subproblem of determining the user association is solved via a simple combinatorial algorithm. Meaningful performance guarantees are derived in either case. Simulation results over a practical HetNet topology reveal the superior performance of the proposed algorithms and underscore the significant benefits of the joint optimization. In the final part of the dissertation, we propose a space domain solution to the interference problem. We consider the problem of maximizing system utility by optimizing over the set of user and TP pairs in each subframe, where each user can be served by multiple TPs. To address this optimization problem which is NP-hard, we propose a solution scheme based on difference of submodular function optimization approach. We evaluate our scheme using detailed simulations and show that it performs on par with a much more computationally demanding difference of convex function optimization scheme. Moreover, the proposed scheme performs within a reasonable percentage of the optimal solution. We further demonstrate the advantage of the proposed scheme by studying its performance with variation in different network topology parameters.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ecological risk assessment (ERA) is a framework for monitoring risks of exposure and adverse effects of environmental stressors to populations or communities of interest. One tool of ERA is the biomarker, which is a characteristic of an organism that reliably indicates exposure to or effects of a stressor like chemical pollution. Traditional biomarkers which rely on characteristics at the tissue level and higher often detect only acute exposures to stressors. Sensitive molecular biomarkers may detect lower stressor levels than traditional biomarkers, which helps inform risk mitigation and restoration efforts before populations and communities are irreversibly affected. In this study I developed gene expression-based molecular biomarkers of exposure to metals and insecticides in the model toxicological freshwater amphipod Hyalella azteca. My goals were to not only create sensitive molecular biomarkers for these chemicals, but also to show the utility and versatility of H. azteca in molecular studies for toxicology and risk assessment. I sequenced and assembled the H. azteca transcriptome to identify reference and stress-response gene transcripts suitable for expression monitoring. I exposed H. azteca to sub-lethal concentrations of metals (cadmium and copper) and insecticides (DDT, permethrin, and imidacloprid). Reference genes used to create normalization factors were determined for each exposure using the programs BestKeeper, GeNorm, and NormFinder. Both metals increased expression of a nuclear transcription factor (Cnc), an ABC transporter (Mrp4), and a heat shock protein (Hsp90), giving evidence of general metal exposure signature. Cadmium uniquely increased expression of a DNA repair protein (Rad51) and increased Mrp4 expression more than copper (7-fold increase compared to 2-fold increase). Together these may be unique biomarkers distinguishing cadmium and copper exposures. DDT increased expression of Hsp90, Mrp4, and the immune response gene Lgbp. Permethrin increased expression of a cytochrome P450 (Cyp2j2) and decreased expression of the immune response gene Lectin-1. Imidacloprid did not affect gene expression. Unique biomarkers were seen for DDT and permethrin, but the genes studied were not sensitive enough to detect imidacloprid at the levels used here. I demonstrated that gene expression in H. azteca detects specific chemical exposures at sub-lethal concentrations, making expression monitoring using this amphipod a useful and sensitive biomarker for risk assessment of chemical exposure.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research investigates the feasibility of using web-based project management systems for dredging. To achieve this objective the research assessed both the positive and negative aspects of using web-based technology for the management of dredging projects. Information gained from literature review and prior investigations of dredging projects revealed that project performance, social, political, technical, and business aspects of the organization were important factors in deciding to use web-based systems for the management of dredging projects. These factors were used to develop the research assumptions. An exploratory case study methodology was used to gather the empirical evidence and perform the analysis. An operational prototype of the system was developed to help evaluate developmental and functional requirements, as well as the influence on performance, and on the organization. The evidence gathered from three case study projects, and from a survey of 31 experts, were used to validate the assumptions. Baselines, representing the assumptions, were created as a reference to assess the responses and qualitative measures. The deviation of the responses was used to evaluate for the analysis. Finally, the conclusions were assessed by validating the assumptions with the evidence, derived from the analysis. The research findings are as follows: 1. The system would help improve project performance. 2. Resistance to implementation may be experienced if the system is implemented. Therefore, resistance to implementation needs to be investigated further and more R&D work is needed in order to advance to the final design and implementation. 3. System may be divided into standalone modules in order to simplify the system and facilitate incremental changes. 4. The QA/QC conceptual approach used by this research needs to be redefined during future R&D to satisfy both owners and contractors. Yin (2009) Case Study Research Design and Methods was used to develop the research approach, design, data collection, and analysis. Markus (1983) Resistance Theory was used during the assumptions definition to predict potential problems to the implementation of web-based project management systems for the dredging industry. Keen (1981) incremental changes and facilitative approach tactics were used as basis to classify solutions, and how to overcome resistance to implementation of the web-based project management system. Davis (1989) Technology Acceptance Model (TAM) was used to assess the solutions needed to overcome the resistances to the implementation of web-base management systems for dredging projects.