919 results for Data access
Abstract:
This work has focused on researching solutions to automate the task of enriching sensor-network data sources with linguistic descriptions, in order to facilitate the subsequent generation of natural language texts. Natural language descriptions make data accessible to a wider range of users and, as a consequence, allow better use of investments in sensor networks. The work considered the use of open databases to address the need for a large volume and diversity of geographical knowledge. Data enrichment was also analyzed within methodological approaches to data curation and within natural language generation methods. As a result, a general method was proposed, based on a generate-and-test strategy, that includes a way of representing and using heuristic knowledge with several reasoning stages to build linguistic descriptions for data enrichment. The general proposal was evaluated in three scenarios: two for generating geographic references over complex, real-scale sensor networks and one for generating temporal references. The evaluation results showed the practical validity of the general proposal, with performance improvements over other approaches. In addition, the analysis of the results made it possible to identify and quantify the expected impact of several lines of improvement in open databases.
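The generate-and-test strategy described above can be illustrated with a deliberately small sketch for geographic references. Everything here is an assumption for illustration, not the thesis's method: the `Place` records stand in for an open gazetteer, `describe` is a hypothetical helper, and the population threshold and the two distance thresholds play the role of the heuristic knowledge. Candidates are generated in order of preference (nearest place first) and each is tested against the heuristics until one is accepted.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Place:
    """A named location, standing in for an entry in an open geographic database."""
    name: str
    lat: float
    lon: float
    population: int


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def generate(lat: float, lon: float, places: Iterable[Place]) -> Iterator[Tuple[float, Place]]:
    """Generate step: candidate reference places, nearest first."""
    scored = [(haversine_km(lat, lon, p.lat, p.lon), p) for p in places]
    return iter(sorted(scored, key=lambda c: c[0]))


def describe(lat: float, lon: float, places: Iterable[Place],
             near_km: float = 5.0, max_km: float = 30.0,
             min_population: int = 10_000) -> Optional[str]:
    """Test step: accept the first candidate that passes the (hypothetical) heuristics."""
    for dist, place in generate(lat, lon, places):
        if place.population < min_population:
            continue  # too obscure to be a useful reference for most readers
        if dist <= near_km:
            return f"near {place.name}"
        if dist <= max_km:
            return f"about {round(dist)} km from {place.name}"
    return None  # no acceptable reference; a real system would try other strategies
```

For a sensor roughly 11 km north of Madrid with a tiny hamlet even closer, the hamlet is generated first but fails the population test, so the description falls back to the larger, more recognisable place.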
Abstract:
- Mobile telecommunications markets are an important part of the European Commission's strategy for completing the European Union Digital Single Market. The use of mobile telecommunications, particularly mobile data access, is growing and becoming an increasingly important input for the economy.
- The EU currently does not have a unified mobile telecommunications market. The EU compares favourably to the United States in terms of prices and connection speed, but lags behind in coverage of high-speed 4G wireless connections.
- Europe's long-term goal should be to make data access easier by increasing high-speed wireless coverage while keeping prices down for users. An increase in cross-border competition could help achieve that goal.
- The Commission has two important levers to help stimulate cross-border supply: (a) ensuring competition in intra-country mobile markets, which gives operators an incentive to expand into other jurisdictions, and (b) reducing mobile operators' costs of expanding into multiple EU countries. Further development of policies on international roaming and radio spectrum management will be central to this effort.
Abstract:
The Continuous Plankton Recorder (CPR) survey, operated by the Sir Alister Hardy Foundation for Ocean Science (SAHFOS), is the largest plankton monitoring programme in the world and has spanned more than 70 years. The dataset contains information from ~200 000 samples, with over 2.3 million records of individual taxa. Here we outline the evolution of the CPR database through changes in technology, and how this has increased data access. Recent high-impact publications and the expanded role of CPR data in marine management demonstrate the usefulness of the dataset. We argue that solely supplying data to the research community is not sufficient in the current research climate; to promote wider use, additional tools need to be developed to provide visual representation and summary statistics. We outline two software visualisation tools, SAHFOS WinCPR and the digital CPR Atlas, which provide access to CPR data for both researchers and non-plankton specialists. We also describe future directions of the database, data policy and the development of visualisation tools. We believe that the approach at SAHFOS to increase data accessibility and provide new visualisation tools has enhanced awareness of the data and led to the financial security of the organisation; it also provides a good model of how long-term monitoring programmes can evolve to help secure their future.
Abstract:
The purpose of this study was to develop, explicate, and validate a comprehensive model in order to more effectively assess community injury prevention needs, plan and target efforts, identify potential interventions, and provide a framework for an outcome-based evaluation of the effectiveness of interventions. A systems model approach was developed to conceptualize the major components of inputs, efforts, outcomes and feedback within a community setting. Profiling of multiple data sources demonstrated a community feedback mechanism that increased awareness of priority issues and elicited support from traditional as well as non-traditional injury prevention partners. Injury countermeasures including education, enforcement, engineering, and economic incentives were presented for their potential synergistic effect on the knowledge, attitudes, or behaviors of a targeted population. Levels of outcome data were classified into ultimate, intermediate and immediate indicators to assist with determining the effectiveness of intervention efforts. A collaboration between business and health care succeeded in obtaining access to, and use of, emergency-department-level injury data for monitoring the impact of community interventions. Evaluation of injury events and preventive efforts within the context of a dynamic community systems environment was applied to a study community, with examples detailing actual profiling and trending of injuries. The resulting model of community injury prevention was validated using a community focus group, community injury prevention coordinators, and national injury prevention experts.
Abstract:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single methodology generally accepted in academia or industry has emerged that provides ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources.
This thesis describes a heterogeneous database system being developed at the High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty of resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution that investigates the extents of the meta-data constructs of component schemas, shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which are the basis of semantic knowledge for integration and querying, using shared ontologies for context mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) the design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process, which acts as the interface between the integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
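As a rough illustration of contribution (iii), the sketch below derives semantic relations between classes of two component schemas by annotating each class with a concept from a shared ontology and comparing the concepts in the ontology's is-a hierarchy. It does not use Sem-ODM or Semantic SQL; the ontology, the schema annotations, and the relation names (`equivalent`, `subclass_of`, `superclass_of`) are all hypothetical, and the annotations are assumed to be supplied by hand, which is exactly the part a semi-automated technique would help with.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shared ontology: each concept maps to its parent (None = root).
ONTOLOGY_PARENT: Dict[str, Optional[str]] = {
    "Person": None,
    "Employee": "Person",
    "Student": "Person",
}


def ancestors(concept: str) -> List[str]:
    """All ancestors of a concept, nearest first."""
    result = []
    parent = ONTOLOGY_PARENT.get(concept)
    while parent is not None:
        result.append(parent)
        parent = ONTOLOGY_PARENT.get(parent)
    return result


def semantic_relation(c1: str, c2: str) -> Optional[str]:
    """Relation of concept c1 to concept c2 in the is-a hierarchy, if any."""
    if c1 == c2:
        return "equivalent"
    if c2 in ancestors(c1):
        return "subclass_of"     # c1 is more specific than c2
    if c1 in ancestors(c2):
        return "superclass_of"   # c1 is more general than c2
    return None


def relate(schema_a: Dict[str, str], schema_b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Pairwise relations between classes of two schemas annotated with ontology concepts."""
    relations = []
    for class_a, concept_a in schema_a.items():
        for class_b, concept_b in schema_b.items():
            rel = semantic_relation(concept_a, concept_b)
            if rel is not None:
                relations.append((class_a, rel, class_b))
    return relations


# Hypothetical component schemas: local class name -> shared-ontology concept.
db_a = {"Staff": "Employee", "People": "Person"}
db_b = {"Worker": "Employee", "Pupil": "Student"}
print(relate(db_a, db_b))
# prints: [('Staff', 'equivalent', 'Worker'), ('People', 'superclass_of', 'Worker'), ('People', 'superclass_of', 'Pupil')]
```

Mediating through shared concepts means the two schemas never have to agree on local names such as `Staff` and `Worker`; only their mappings to the ontology matter.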
Abstract:
Disk drives are the bottleneck in processing the large amounts of data used in almost all common applications. File systems attempt to reduce this bottleneck by storing data sequentially on disk, thereby reducing access latencies. Although this strategy is useful when data is retrieved sequentially, access patterns in real-world workloads are not necessarily sequential, and this mismatch degrades storage I/O performance. This thesis demonstrates that one way to improve storage performance is to reorganize data on disk drives in the same way in which it is mostly accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data, and dynamic, where access patterns change frequently over short periods of time; we propose, implement and evaluate layout strategies for each. Our strategies are implemented so that they can be seamlessly integrated into or removed from the system as desired. We evaluate our layout strategies for static policies using tree-structured XML data, where accesses to the storage device are mostly of two kinds: parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file system layout policy performs better by 5–54X. For the non-deep-focused queries, our native layout mechanism shows an improvement of 3–127X. To improve the performance of dynamic access patterns, we implement a self-optimizing storage system that rearranges popular blocks on a dedicated partition based on the observed workload characteristics. Our evaluation shows an improvement of over 80% in disk busy times across a range of workloads. These results show that applying knowledge of data access patterns to allocation decisions can substantially improve I/O performance.
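The dynamic strategy (observe the workload, then place popular blocks together on a dedicated partition) can be sketched as follows. This is a simplified model under stated assumptions, not the thesis's implementation: block numbers are plain integers, `capacity` is the hypothetical number of slots on the dedicated partition, popularity is a raw access count over one observation window, and hot blocks are laid out in order of first access so that blocks read in sequence end up adjacent. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def plan_hot_layout(trace: Sequence[int], capacity: int) -> Dict[int, int]:
    """Map the `capacity` most frequently accessed blocks to slots on a dedicated partition.

    Hot blocks are assigned consecutive slots in order of their first appearance in the
    trace, so blocks that tend to be read one after another are placed next to each other.
    """
    counts = Counter(trace)
    hot = {block for block, _ in counts.most_common(capacity)}
    mapping: Dict[int, int] = {}
    for block in trace:
        if block in hot and block not in mapping:
            mapping[block] = len(mapping)  # next free slot on the partition
    return mapping


def resolve(block: int, mapping: Dict[int, int], partition_base: int) -> int:
    """Redirect reads of a hot block to its copy; other blocks keep their original address."""
    if block in mapping:
        return partition_base + mapping[block]
    return block
```

A real system would also have to copy the block contents, keep the original locations consistent on writes, and re-plan as the workload shifts.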
Abstract:
Given the growing demand for mobile application development, driven by the increasingly common use of smartphones and tablets, there is a growing need in society for full remote data access from mobile applications, including in environments without connectivity, where network access is not available at all times. In response, this work proposes a framework whose main functions are to provide a persistence mechanism, replication and data synchronization, supporting the creation, deletion, update and display of persisted or requested data even when the mobile device has no network connectivity. From the point of view of architecture and programming practices, this was reflected in defining strategies to fulfil the main functions of the framework. A controlled study was carried out to validate the proposed solution; it found gains in reducing the number of lines of code and the time required to develop an application, without a significant increase in the cost of the operations.
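The core idea of such a framework (operate on a local copy while offline, queue every change, and synchronize when connectivity returns) can be sketched as below. The class and method names are hypothetical and the remote store is modelled as a plain dictionary; the conflict policy, where the server state wins after the queued operations have been pushed, is an assumption, since the abstract does not say how conflicts are resolved.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Op:
    """A queued change: kind is "upsert" or "delete"."""
    kind: str
    key: str
    value: Any = None


@dataclass
class OfflineStore:
    """Local persistence with an operation log that is replayed on synchronization."""
    local: Dict[str, Any] = field(default_factory=dict)
    pending: List[Op] = field(default_factory=list)

    def upsert(self, key: str, value: Any) -> None:
        # Create or update locally; works with or without connectivity.
        self.local[key] = value
        self.pending.append(Op("upsert", key, value))

    def delete(self, key: str) -> None:
        self.local.pop(key, None)
        self.pending.append(Op("delete", key))

    def get(self, key: str) -> Optional[Any]:
        # Reads are always served from the local copy.
        return self.local.get(key)

    def sync(self, remote: Dict[str, Any]) -> None:
        """Push queued operations in order, then pull the server state (server wins)."""
        for op in self.pending:
            if op.kind == "upsert":
                remote[op.key] = op.value
            else:
                remote.pop(op.key, None)
        self.pending.clear()
        self.local = dict(remote)
```

Replaying the log in order keeps the create/update/delete sequence intact; a production framework would also need retries for partial failures and a real conflict-resolution policy.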
Abstract:
A well-documented, publicly available, global data set of surface ocean carbon dioxide (CO2) parameters has been called for by international groups for nearly two decades. The Surface Ocean CO2 Atlas (SOCAT) project was initiated by the international marine carbon science community in 2007 with the aim of providing a comprehensive, publicly available, regularly updated, global data set of marine surface CO2 that has been subjected to quality control (QC). Much additional CO2 data, not yet made public via the Carbon Dioxide Information Analysis Center (CDIAC), was retrieved from data originators, public websites and other data centres. All data were put in a uniform format following a strict protocol. Quality control was carried out according to clearly defined criteria. Regional specialists performed the quality control, using state-of-the-art web-based tools specially developed for accomplishing this global team effort. SOCAT version 1.5 was made public in September 2011 and holds 6.3 million quality-controlled surface CO2 data points from the global oceans and coastal seas, spanning four decades (1968–2007). Three types of data products are available: individual cruise files, a merged complete data set and gridded products. With the rapid expansion of marine CO2 data collection and the importance of quantifying net global oceanic CO2 uptake and its changes, sustained data synthesis and data access are priorities.
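A gridded product of the kind mentioned above can be approximated by binning individual measurements into latitude/longitude cells and averaging within each cell. This is only a simple unweighted mean under assumed inputs (tuples of latitude, longitude, and a surface CO2 value) with a hypothetical `grid_mean` helper; it is not SOCAT's actual gridding procedure, which the abstract does not describe.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def grid_mean(points: Iterable[Tuple[float, float, float]],
              cell_deg: float = 1.0) -> Dict[Tuple[int, int], float]:
    """Average (lat, lon, value) measurements within cell_deg x cell_deg cells.

    Cells are keyed by integer indices floor(lat / cell_deg), floor(lon / cell_deg),
    so a point at lat -0.5 falls in row -1 rather than row 0.
    """
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for lat, lon, value in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

Using `math.floor` rather than `int` keeps cells consistent across the equator and the prime meridian, because `int` truncates toward zero and would merge the cells on either side.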
Abstract:
Data access and analyses were funded by Boehringer Ingelheim, who played no role in the conduct or reporting of the study.
Abstract:
We propose an ISA extension that decouples the data access and register write operations in a load instruction. We describe system and hardware support for decoupled loads. Furthermore, we show how compilers can generate better static instruction schedules by hoisting a decoupled load's data access above may-alias stores and branches. We find that decoupled loads improve performance, with a geometric mean speedup of 8.4%.
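One way to picture a load split into a data access and a separate register write is a small software model: the early access reads memory and records the address, any intervening store to the same address invalidates that record, and the later register write re-reads memory if its record was invalidated. This loosely resembles the advanced loads of the IA-64 architecture, but it is an assumption for illustration only; the abstract does not give the proposed ISA's semantics, and the class and method names are hypothetical.

```python
from typing import Dict, List


class DecoupledLoadUnit:
    """Toy model of a load split into an early data access and a later register write."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        # tag -> [address, value read early, still valid?]
        self.in_flight: Dict[str, List] = {}
        self.recoveries = 0

    def load_access(self, tag: str, addr: int) -> None:
        """Hoisted half: read memory now, possibly above stores that may alias."""
        self.in_flight[tag] = [addr, self.memory[addr], True]

    def store(self, addr: int, value: int) -> None:
        """A store invalidates any in-flight early access to the same address."""
        self.memory[addr] = value
        for entry in self.in_flight.values():
            if entry[0] == addr:
                entry[2] = False

    def load_write(self, tag: str) -> int:
        """Original position of the load: commit the value, re-reading if a store aliased."""
        addr, value, valid = self.in_flight.pop(tag)
        if not valid:
            self.recoveries += 1
            value = self.memory[addr]
        return value
```

When no intervening store actually aliases, the early value is used as is, which is what makes hoisting the access worthwhile; the recovery path costs time only when aliasing really occurs.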
Abstract:
Visualization and interpretation of geological observations into a cohesive geological model are essential to Earth sciences and related fields. Various emerging technologies offer approaches to multi-scale visualization of heterogeneous data, providing new opportunities that facilitate model development and interpretation processes. These include increased accessibility to 3D scanning technology, global connectivity, and Web-based interactive platforms. The geological sciences and geological engineering disciplines are adopting these technologies as volumes of data and physical samples greatly increase. However, a standardized and universally agreed upon workflow and approach have yet to properly be developed. In this thesis, the 3D scanning workflow is presented as a foundation for a virtual geological database. This database provides augmented levels of tangibility to students and researchers who have little to no access to locations that are remote or inaccessible. A Web-GIS platform was utilized jointly with customized widgets developed throughout the course of this research to aid in visualizing hand-sized/meso-scale geological samples within a geologic and geospatial context. This context is provided as a macro-scale GIS interface, where geophysical and geodetic images and data are visualized. Specifically, an interactive interface is developed that allows for simultaneous visualization to improve the understanding of geological trends and relationships. These developed tools will allow for rapid data access and global sharing, and will facilitate comprehension of geological models using multi-scale heterogeneous observations.
Abstract:
Modern software applications are becoming more dependent on database management systems (DBMSs). DBMSs are usually used as black boxes by software developers. For example, Object-Relational Mapping (ORM) is one of the most popular database abstraction approaches that developers use nowadays. Using ORM, objects in object-oriented languages are mapped to records in the database, and object manipulations are automatically translated to SQL queries. As a result of such conceptual abstraction, developers do not need deep knowledge of databases; however, all too often this abstraction leads to inefficient and incorrect database access code. Thus, this thesis proposes a series of approaches to improve the performance of database-centric software applications that are implemented using ORM. Our approaches focus on troubleshooting and detecting inefficient database accesses (i.e., performance problems) in the source code, and we rank the detected problems based on their severity. We first conduct an empirical study on the maintenance of ORM code in both open source and industrial applications. We find that ORM performance-related configurations are rarely tuned in practice, and there is a need for tools that can help improve and tune the performance of ORM-based applications. Thus, we propose approaches along two dimensions to help developers improve the performance of ORM-based applications: 1) helping developers write more performant ORM code; and 2) helping developers tune ORM configurations. To provide tooling support to developers, we first propose static analysis approaches to detect performance anti-patterns in the source code. We automatically rank the detected anti-pattern instances according to their performance impacts. Our study finds that by resolving the detected anti-patterns, the application performance can be improved by 34% on average.
We then discuss our experience and lessons learned when integrating our anti-pattern detection tool into industrial practice. We hope our experience can help improve the industrial adoption of future research tools. However, as static analysis approaches are prone to false positives and lack runtime information, we also propose dynamic analysis approaches to further help developers improve the performance of their database access code. We propose automated approaches to detect redundant data access anti-patterns in the database access code, and our study finds that resolving such redundant data access anti-patterns can improve application performance by an average of 17%. Finally, we propose an automated approach to tune performance-related ORM configurations using both static and dynamic analysis. Our study shows that our approach can help improve application throughput by 27–138%. Through our case studies on real-world applications, we show that all of our proposed approaches can provide valuable support to developers and help improve application performance significantly.
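The redundant-data-access idea can be illustrated on a recorded SQL log rather than on ORM code. In this hypothetical sketch, each log entry is a (transaction id, SQL string) pair; `redundant_queries` flags identical statements issued more than once in a transaction, and `repeated_templates` flags the same statement shape repeated with different literals, a common symptom of the ORM "N+1 queries" pattern. The log format, function names, and threshold are assumptions; this is not the thesis's dynamic analysis.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

LogEntry = Tuple[int, str]  # (transaction id, SQL text)


def normalize(sql: str) -> str:
    """Collapse whitespace so formatting differences do not hide duplicates."""
    return re.sub(r"\s+", " ", sql.strip())


def template(sql: str) -> str:
    """Replace quoted-string and integer literals with '?' to get the statement's shape."""
    shape = re.sub(r"'[^']*'", "?", normalize(sql))
    return re.sub(r"\b\d+\b", "?", shape)


def redundant_queries(log: Iterable[LogEntry]) -> Dict[Tuple[int, str], int]:
    """Identical statements executed more than once within the same transaction."""
    counts = Counter((txn, normalize(sql)) for txn, sql in log)
    return {key: n for key, n in counts.items() if n > 1}


def repeated_templates(log: Iterable[LogEntry], threshold: int = 3) -> Dict[Tuple[int, str], int]:
    """Statement shapes repeated at least `threshold` times in a transaction (possible N+1)."""
    counts = Counter((txn, template(sql)) for txn, sql in log)
    return {key: n for key, n in counts.items() if n >= threshold}
```

Identical repeats can usually be served from a cache or hoisted out of a loop, while repeated templates typically point to a loop that could be replaced by a single batched or joined query.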