921 resultados para Data Storage Solutions
Resumo:
Replication Data Management (RDM) aims at enabling the use of data collections from several iterations of an experiment. However, there are several major challenges to RDM from integrating data models and data from empirical study infrastructures that were not designed to cooperate, e.g., data model variation of local data sources. [Objective] In this paper we analyze RDM needs and evaluate conceptual RDM approaches to support replication researchers. [Method] We adapted the ATAM evaluation process to (a) analyze RDM use cases and needs of empirical replication study research groups and (b) compare three conceptual approaches to address these RDM needs: central data repositories with a fixed data model, heterogeneous local repositories, and an empirical ecosystem. [Results] While the central and local approaches have major issues that are hard to resolve in practice, the empirical ecosystem allows bridging current gaps in RDM from heterogeneous data sources. [Conclusions] The empirical ecosystem approach should be explored in diverse empirical environments.
Resumo:
We dedicate this paper to the memory of Prof. Andres Perez Estaún, who was a great and committed scientist, wonderful colleague and even better friend. The datasets in this work have been funded by Fundación Ciudad de la Energía (Spanish Government, www.ciuden.es) and by the European Union through the “European Energy Programme 15 for Recovery” and the Compostilla OXYCFB300 project. Dr. Juan Alcalde is currently funded by NERC grant NE/M007251/1. Simon Campbell and Samuel Cheyney are acknowledged for thoughtful comments on gravity inversion
Resumo:
The multiobjective optimization model studied in this paper deals with simultaneous minimization of finitely many linear functions subject to an arbitrary number of uncertain linear constraints. We first provide a radius of robust feasibility guaranteeing the feasibility of the robust counterpart under affine data parametrization. We then establish dual characterizations of robust solutions of our model that are immunized against data uncertainty by way of characterizing corresponding solutions of robust counterpart of the model. Consequently, we present robust duality theorems relating the value of the robust model with the corresponding value of its dual problem.
Resumo:
In this paper we examine multi-objective linear programming problems in the face of data uncertainty both in the objective function and the constraints. First, we derive a formula for the radius of robust feasibility guaranteeing constraint feasibility for all possible scenarios within a specified uncertainty set under affine data parametrization. We then present numerically tractable optimality conditions for minmax robust weakly efficient solutions, i.e., the weakly efficient solutions of the robust counterpart. We also consider highly robust weakly efficient solutions, i.e., robust feasible solutions which are weakly efficient for any possible instance of the objective matrix within a specified uncertainty set, providing lower bounds for the radius of highly robust efficiency guaranteeing the existence of this type of solutions under affine and rank-1 objective data uncertainty. Finally, we provide numerically tractable optimality conditions for highly robust weakly efficient solutions.
Resumo:
No more published?
Resumo:
Mode of access: Internet.
Resumo:
Cover title.
Resumo:
Mode of access: Internet.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-04
Resumo:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY WITH PRIOR ARRANGEMENT
Resumo:
The purpose of this paper is to investigate the technological development of electronic inventory solutions from perspective of patent analysis. We first applied the international patent classification to classify the top categories of data processing technologies and their corresponding top patenting countries. Then we identified the core technologies by the calculation of patent citation strength and standard deviation criterion for each patent. To eliminate those core innovations having no reference relationships with the other core patents, relevance strengths between core technologies were evaluated also. Our findings provide market intelligence not only for the research and development community, but for the decision making of advanced inventory solutions.
Resumo:
Electrical energy is an essential resource for the modern world. Unfortunately, its price has almost doubled in the last decade. Furthermore, energy production is also currently one of the primary sources of pollution. These concerns are becoming more important in data-centers. As more computational power is required to serve hundreds of millions of users, bigger data-centers are becoming necessary. This results in higher electrical energy consumption. Of all the energy used in data-centers, including power distribution units, lights, and cooling, computer hardware consumes as much as 80%. Consequently, there is opportunity to make data-centers more energy efficient by designing systems with lower energy footprint. Consuming less energy is critical not only in data-centers. It is also important in mobile devices where battery-based energy is a scarce resource. Reducing the energy consumption of these devices will allow them to last longer and re-charge less frequently. Saving energy in computer systems is a challenging problem. Improving a system's energy efficiency usually comes at the cost of compromises in other areas such as performance or reliability. In the case of secondary storage, for example, spinning-down the disks to save energy can incur high latencies if they are accessed while in this state. The challenge is to be able to increase the energy efficiency while keeping the system as reliable and responsive as before. This thesis tackles the problem of improving energy efficiency in existing systems while reducing the impact on performance. First, we propose a new technique to achieve fine grained energy proportionality in multi-disk systems; Second, we design and implement an energy-efficient cache system using flash memory that increases disk idleness to save energy; Finally, we identify and explore solutions for the page fetch-before-update problem in caching systems that can: (a) control better I/O traffic to secondary storage and (b) provide critical performance improvement for energy efficient systems.
Resumo:
Acknowledgements The authors would like to thank Jonathan Dick, Josie Geris, Jason Lessels, and Claire Tunaley for data collection and Audrey Innes for lab sample preparation. We also thank Christian Birkel for discussions about the model structure and comments on an earlier draft of the paper. Climatic data were provided by Iain Malcolm and Marine Scotland Fisheries at the Freshwater Lab, Pitlochry. Additional precipitation data were provided by the UK Meteorological Office and the British Atmospheric Data Centre (BADC).We thank the European Research Council ERC (project GA 335910 VEWA) for funding the VeWa project.
Resumo:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.