983 resultados para Data access


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Open Educational Resources (OER) are teaching, learning and research materials that have been released under an open licence that permits online access and re-use by others. The 2012 Paris OER Declaration encourages the open licensing of educational materials produced with public funds. Digital data and data sets produced as a result of scientific and non-scientific research are an increasingly important category of educational materials. This paper discusses the legal challenges presented when publicly funded research data is made available as OER, arising from intellectual property rights, confidentiality and information privacy laws, and the lack of a legal duty to ensure data quality. If these legal challenges are not understood, addressed and effectively managed, they may impede and restrict access to and re-use of research data. This paper identifies some of the legal challenges that need to be addressed and describes 10 proposed best practices which are recommended for adoption to so that publicly funded research data can be made available for access and re-use as OER.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The world has experienced a large increase in the amount of available data. Therefore, it requires better and more specialized tools for data storage and retrieval and information privacy. Recently Electronic Health Record (EHR) Systems have emerged to fulfill this need in health systems. They play an important role in medicine by granting access to information that can be used in medical diagnosis. Traditional systems have a focus on the storage and retrieval of this information, usually leaving issues related to privacy in the background. Doctors and patients may have different objectives when using an EHR system: patients try to restrict sensible information in their medical records to avoid misuse information while doctors want to see as much information as possible to ensure a correct diagnosis. One solution to this dilemma is the Accountable e-Health model, an access protocol model based in the Information Accountability Protocol. In this model patients are warned when doctors access their restricted data. They also enable a non-restrictive access for authenticated doctors. In this work we use FluxMED, an EHR system, and augment it with aspects of the Information Accountability Protocol to address these issues. The Implementation of the Information Accountability Framework (IAF) in FluxMED provides ways for both patients and physicians to have their privacy and access needs achieved. Issues related to storage and data security are secured by FluxMED, which contains mechanisms to ensure security and data integrity. The effort required to develop a platform for the management of medical information is mitigated by the FluxMED's workflow-based architecture: the system is flexible enough to allow the type and amount of information being altered without the need to change in your source code.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider a setting in which several operators offer downlink wireless data access services in a certain geographical region. Each operator deploys several base stations or access points, and registers some subscribers. In such a situation, if operators pool their infrastructure, and permit the possibility of subscribers being served by any of the cooperating operators, then there can be overall better user satisfaction, and increased operator revenue. We use coalitional game theory to investigate such resource pooling and cooperation between operators.We use utility functions to model user satisfaction, and show that the resulting coalitional game has the property that if all operators cooperate (i.e., form a grand coalition) then there is an operating point that maximizes the sum utility over the operators while providing the operators revenues such that no subset of operators has an incentive to break away from the coalition. We investigate whether such operating points can result in utility unfairness between users of the various operators. We also study other revenue sharing concepts, namely, the nucleolus and the Shapely value. Such investigations throw light on criteria for operators to accept or reject subscribers, based on the service level agreements proposed by them. We also investigate the situation in which only certain subsets of operators may be willing to cooperate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Pharma(ceuticals) industry is at a cross-roads. There are growing concerns that illegitimate products are penetrating the supply chain. There are proposals in many countries to apply RFID and other traceability technologies to solve this problem. However there are several trade-offs and one of the most crucial is between data visibility and confidentiality. In this paper, we use the TrakChain assessment framework tools to study the US Pharma supply chain and to compare candidate solutions to achieve traceability data security: Point-of-Dispense Authentication, Network-based electronic Pedigree, and Document-based electronic Pedigree. We also propose extensions to a supply chain authorization language that is able to capture expressive data sharing conditions considered necessary by the industry's trading partners. © 2013 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe an approach aimed at addressing the issue of joint exploitation of control (stream) and data parallelism in a skeleton based parallel programming environment, based on annotations and refactoring. Annotations drive efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use case applications/kernels may be optimized and discuss preliminary experiments with FastFlow assessing the theoretical results. © 2013 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In data-intensive distributed systems, replication is the most widely used approach to offer high data availability, low bandwidth consumption, increased fault-tolerance and improved scalability of the overall system. Replication-based systems implement replica control protocols that enforce a specified semantics of accessing the data. Also, the performance depends on a number of factors, the chief of which is the protocol used to maintain consistency among object replica. In this paper, we propose a new low-cost and high data availability protocol called the box-shaped grid structure for maintaining consistency of replicated data on networked distributed computing systems. We show that the proposed protocol provides high data availability, low communication costs, and increased fault-tolerance as compared to the baseline replica control protocols.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, distributed storage systems have been considering techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take into account application behavior to perform data access optimization. This limitation motivated this paper which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. In order to accomplish such a goal, this approach organizes application behaviors as time series and, then, analyzes and classifies those series according to their properties. By knowing properties, the approach selects modeling techniques to represent series and perform predictions, which are, later on, used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm this new approach reduces application execution time in about 50 percent, specially when handling large amounts of data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ontology-Based Data Access (OBDA) permite el acceso a diferentes tipos de fuentes de datos (tradicionalmente bases de datos) usando un modelo más abstracto proporcionado por una ontología. La reescritura de consultas (query rewriting) usa una ontología para reescribir una consulta en una consulta reescrita que puede ser evaluada en la fuente de datos. Las consultas reescritas recuperan las respuestas que están implicadas por la combinación de los datos explicitamente almacenados en la fuente de datos, la consulta original y la ontología. Al trabajar sólo sobre las queries, la reescritura de consultas permite OBDA sobre cualquier fuente de datos que puede ser consultada, independientemente de las posibilidades para modificarla. Sin embargo, producir y evaluar las consultas reescritas son procesos costosos que suelen volverse más complejos conforme la expresividad y tamaño de la ontología y las consultas aumentan. En esta tesis exploramos distintas optimizaciones que peuden ser realizadas tanto en el proceso de reescritura como en las consultas reescritas para mejorar la aplicabilidad de OBDA en contextos realistas. Nuestra contribución técnica principal es un sistema de reescritura de consultas que implementa las optimizaciones presentadas en esta tesis. Estas optimizaciones son las contribuciones principales de la tesis y se pueden agrupar en tres grupos diferentes: -optimizaciones que se pueden aplicar al considerar los predicados en la ontología que no están realmente mapeados con las fuentes de datos. -optimizaciones en ingeniería que se pueden aplicar al manejar el proceso de reescritura de consultas en una forma que permite reducir la carga computacional del proceso de generación de consultas reescritas. -optimizaciones que se pueden aplicar al considerar metainformación adicional acerca de las características de la ABox. En esta tesis proporcionamos demostraciones formales acerca de la corrección y completitud de las optimizaciones propuestas, y una evaluación empírica acerca del impacto de estas optimizaciones. Como contribución adicional, parte de este enfoque empírico, proponemos un banco de pruebas (benchmark) para la evaluación de los sistemas de reescritura de consultas. Adicionalmente, proporcionamos algunas directrices para la creación y expansión de esta clase de bancos de pruebas. ABSTRACT Ontology-Based Data Access (OBDA) allows accessing different kinds of data sources (traditionally databases) using a more abstract model provided by an ontology. Query rewriting uses such ontology to rewrite a query into a rewritten query that can be evaluated on the data source. The rewritten queries retrieve the answers that are entailed by the combination of the data explicitly stored in the data source, the original query and the ontology. However, producing and evaluating the rewritten queries are both costly processes that become generally more complex as the expressiveness and size of the ontology and queries increase. In this thesis we explore several optimisations that can be performed both in the rewriting process and in the rewritten queries to improve the applicability of OBDA in real contexts. Our main technical contribution is a query rewriting system that implements the optimisations presented in this thesis. These optimisations are the core contributions of the thesis and can be grouped into three different groups: -optimisations that can be applied when considering the predicates in the ontology that are actually mapped to the data sources. -engineering optimisations that can be applied by handling the process of query rewriting in a way that permits to reduce the computational load of the query generation process. -optimisations that can be applied when considering additional metainformation about the characteristics of the ABox. In this thesis we provide formal proofs for the correctness of the proposed optimisations, and an empirical evaluation about the impact of the optimisations. As an additional contribution, part of this empirical approach, we propose a benchmark for the evaluation of query rewriting systems. We also provide some guidelines for the creation and expansion of this kind of benchmarks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An implementation of Sem-ODB—a database management system based on the Semantic Binary Model is presented. A metaschema of Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency comparing to a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. An application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations. Such databases can be naturally accommodated by semantic model. A fixed predefined implementation is not enforced allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed. ^ A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over the existing interfaces. It is optimizable and parallelizable, supports the definition of semantic userviews and the interoperability of semantic databases with other data sources such as World Wide Web, relational, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases. ^ The analysis and high-level design of a system that exploits the superiority of the Semantic Database Model to other data models in expressive power and ease of use to allow uniform access to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files, and others via a common query interface is presented. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system to provide an ODBC interface to the WWW as a data source is discussed. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.