46 results for multiple data sources

at Universidad Politécnica de Madrid


Relevance:

100.00%

Abstract:

In this work, we demonstrate how it is possible to sharply image multiple object points. The Simultaneous Multiple Surface (SMS) design method has usually been presented as a method to couple N wave-front pairs with N surfaces, but recent findings show that when using N surfaces, we can obtain M image points when N

Relevance:

100.00%

Abstract:

Neighbourhood representation and the scale used to measure the built environment have been treated in many ways. However, the existing literature is far from clear about which representation of neighbourhood is the most feasible. This paper presents an exhaustive analysis of built environment attributes at three spatial scales. For this purpose multiple data sources are integrated, and a set of 943 observations is analysed. The paper simultaneously analyses the influence of two methodological issues in the study of the relationship between built environment and travel behaviour: (1) detailed representation of neighbourhood, by testing different spatial scales; (2) the influence of unobserved individual sensitivity to built environment attributes. The results show that different spatial scales of built environment attributes produce different results. Hence, it is important to produce local and regional transport measures according to geographical scale. Additionally, the results show significant sensitivity to built environment attributes depending on place of residence. This effect, called residential sorting, takes on different magnitudes depending on the geographical scale used to measure the built environment attributes. The choice of spatial scale therefore puts the stability of model results at risk. Hence, transportation modellers and planners must take into account the effects of both self-selection and spatial scale.

Relevance:

90.00%

Abstract:

Publishing Linked Data is a process that involves several design decisions and technologies. Although some initial guidelines have been already provided by Linked Data publishers, these are still far from covering all the steps that are necessary (from data source selection to publication) or giving enough details about all these steps, technologies, intermediate products, etc. Furthermore, given the variety of data sources from which Linked Data can be generated, we believe that it is possible to have a single and unified method for publishing Linked Data, but we should rely on different techniques, technologies and tools for particular datasets of a given domain. In this paper we present a general method for publishing Linked Data and the application of the method to cover different sources from different domains.

Relevance:

90.00%

Abstract:

As the number of data sources publishing their data on the Web of Data is growing, we are experiencing an immense growth of the Linked Open Data cloud. The lack of control on the published sources, which could be untrustworthy or unreliable, along with their dynamic nature that often invalidates links and causes conflicts or other discrepancies, could lead to poor quality data. In order to judge data quality, a number of quality indicators have been proposed, coupled with quality metrics that quantify the “quality level” of a dataset. In addition to the above, some approaches address how to improve the quality of the datasets through a repair process that focuses on how to correct invalidities caused by constraint violations by either removing or adding triples. In this paper we argue that provenance is a critical factor that should be taken into account during repairs to ensure that the most reliable data is kept. Based on this idea, we propose quality metrics that take into account provenance and evaluate their applicability as repair guidelines in a particular data fusion setting.
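The idea of provenance-guided repair can be illustrated with a minimal sketch. It is not the metric proposed in the paper: it assumes that each triple carries a source label, that each source has a made-up trust score (`SOURCE_TRUST` is hypothetical), and that the only constraint is a functional property, i.e. a subject may have at most one value for a given predicate.

```python
from collections import defaultdict

# Hypothetical trust scores per source, assumed purely for illustration.
SOURCE_TRUST = {"curated": 0.9, "crawler": 0.4}


def repair_functional(quads, functional_predicates, trust=SOURCE_TRUST):
    """Resolve functional-property violations by keeping, for each
    (subject, predicate) pair, the triple from the most trusted source."""
    groups = defaultdict(list)
    kept = []
    for s, p, o, src in quads:
        if p in functional_predicates:
            groups[(s, p)].append((s, p, o, src))
        else:
            kept.append((s, p, o, src))  # unconstrained triples are kept as-is
    for candidates in groups.values():
        # Unknown sources default to a trust of 0.0.
        kept.append(max(candidates, key=lambda q: trust.get(q[3], 0.0)))
    return kept


quads = [
    ("ex:Madrid", "ex:population", "3200000", "curated"),
    ("ex:Madrid", "ex:population", "1000", "crawler"),
    ("ex:Madrid", "ex:country", "ex:Spain", "crawler"),
]
repaired = repair_functional(quads, {"ex:population"})
```

Here the conflicting population value from the less trusted source is removed, while the unconstrained triple from that same source survives.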

Relevance:

90.00%

Abstract:

Sensor networks are increasingly becoming one of the main sources of Big Data on the Web. However, the observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse these data for other purposes than those for which they were originally set up. In this thesis we address these challenges, considering how we can transform streaming raw data to rich ontology-based information that is accessible through continuous queries for streaming data. Our main contribution is an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. We introduce novel query rewriting and data translation techniques that rely on mapping definitions relating streaming data models to ontological concepts. Specific contributions include:
• The syntax and semantics of the SPARQLStream query language for ontology-based data access, and a query rewriting approach for transforming SPARQLStream queries into streaming algebra expressions.
• The design of an ontology-based streaming data access engine that can internally reuse an existing data stream engine, complex event processor or sensor middleware, using R2RML mappings for defining relationships between streaming data models and ontology concepts.
Concerning the sensor metadata of such streaming data sources, we have investigated how we can use raw measurements to characterize streaming data, producing enriched data descriptions in terms of ontological models. Our specific contributions are:
• A representation of sensor data time series that captures gradient information that is useful to characterize types of sensor data.
• A method for classifying sensor data time series and determining the type of data, using data mining techniques, and a method for extracting semantic sensor metadata features from the time series.
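What "gradient information" might look like for a sensor time series can be sketched as follows. This is only an illustration under stated assumptions, not the representation defined in the thesis: the function name `gradient_features` and the four summary features are hypothetical, and the input is assumed to be sorted by timestamp with no repeated timestamps.

```python
def gradient_features(series):
    """Summarize a time series by the slopes between consecutive samples.

    series: list of (timestamp, value) pairs, sorted by timestamp,
    with distinct timestamps.
    """
    slopes = [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(series, series[1:])
    ]
    n = len(slopes)
    if n == 0:
        return {"mean_abs_slope": 0.0, "rising": 0.0, "falling": 0.0, "flat": 0.0}
    return {
        "mean_abs_slope": sum(abs(s) for s in slopes) / n,
        "rising": sum(s > 0 for s in slopes) / n,   # share of increasing steps
        "falling": sum(s < 0 for s in slopes) / n,  # share of decreasing steps
        "flat": sum(s == 0 for s in slopes) / n,    # share of unchanged steps
    }


# Made-up temperature readings, one per minute (timestamps in seconds).
temperature = [(0, 20.0), (60, 20.5), (120, 21.0), (180, 21.0)]
features = gradient_features(temperature)
```

A feature vector of this kind could then be passed to any standard classifier to guess the type of sensor that produced the series.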

Relevance:

90.00%

Abstract:

In the near future, wireless sensor networks (WSNs) will experience a broad, large-scale deployment (millions of nodes at the national level) with multiple information sources per node and with very specific requirements for signal processing. In parallel, the broad deployment of WSNs facilitates the definition and execution of ambitious studies with large input data sets and high computational complexity. The computation resources these studies require, very often heterogeneous and provisioned on demand, can only be provided by high-performance Data Centers (DCs). The high economic and environmental impact of energy consumption in DCs requires aggressive energy optimization policies. The need for these policies has already been identified, but they have not yet been successfully proposed. In this context, this paper presents the following ongoing research lines and the results obtained. In the field of WSNs: energy optimization in the processing nodes at different abstraction levels, including reconfigurable application-specific architectures, efficient customization of the memory hierarchy, energy-aware management of the wireless interface, and design automation for signal processing applications. In the field of DCs: energy-optimal workload assignment policies in heterogeneous DCs, energy-conscious resource management policies, and efficient cooling mechanisms that together help to minimize the electricity bill of the DCs that process the data provided by the WSNs.

Relevance:

90.00%

Abstract:

In this position paper, we claim that the need for time-consuming data preparation and result interpretation tasks in knowledge discovery, as well as for the costly expert consultation and consensus-building activities required for ontology building, can be reduced by exploiting the interplay of data mining and ontology engineering. The aim is to obtain, in a semi-automatic way, new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative "feedback loop", of (a) data mining techniques to make implicit models emerge from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.

Relevance:

90.00%

Abstract:

Most forestry applications of airborne laser scanning (ALS) require the integration and simultaneous use of various data sources, pursuing a variety of different objectives. Projects based on remotely sensed data generally scale up the study area progressively over several data fusion stages: from the most detailed information obtained for a limited area (the field plot) to a more uncertain forest response sensed over a much larger extent (the airborne or satellite swath). All data sources ultimately rely on global navigation satellite systems (GNSS), which are especially error-prone when operating under forest canopies. Additional processing stages, such as orthorectification, may also be affected by vegetation, deteriorating the accuracy of the reference coordinates of optical imagery. These errors introduce noise into the models, as predictors are displaced from the true position of their response variable. The degree to which forest estimations are affected depends on the spatial dispersion of the variables involved and on the scale used. This thesis reviews the sources of positioning error that may affect the different inputs involved in an ALS-assisted forest inventory project, and how the properties of the forest canopy itself affect their magnitude, advising on methods for reducing them. It also discusses the most appropriate ways to assess accuracy and precision in each case, and how positioning errors actually affect the quality of forest estimations, with a view to cost-efficient planning of data acquisition. The final optimization of GNSS positioning and of the radiometry of the optical sensor made it possible to detect the importance of the latter in predicting relative density in a monospecific Pinus sylvestris L. forest.

Relevance:

90.00%

Abstract:

Replication Data Management (RDM) aims at enabling the use of data collections from several iterations of an experiment. However, there are several major challenges to RDM from integrating data models and data from empirical study infrastructures that were not designed to cooperate, e.g., data model variation of local data sources. [Objective] In this paper we analyze RDM needs and evaluate conceptual RDM approaches to support replication researchers. [Method] We adapted the ATAM evaluation process to (a) analyze RDM use cases and needs of empirical replication study research groups and (b) compare three conceptual approaches to address these RDM needs: central data repositories with a fixed data model, heterogeneous local repositories, and an empirical ecosystem. [Results] While the central and local approaches have major issues that are hard to resolve in practice, the empirical ecosystem allows bridging current gaps in RDM from heterogeneous data sources. [Conclusions] The empirical ecosystem approach should be explored in diverse empirical environments.

Relevance:

90.00%

Abstract:

Purpose – Linked data is gaining great interest in the cultural heritage domain as a new way of publishing, sharing and consuming data. The paper aims to provide a detailed method, and a tool called MARiMbA, for publishing linked data out of library catalogues in the MARC 21 format, along with their application to the catalogue of the National Library of Spain in the datos.bne.es project. Design/methodology/approach – First, the background of the case study is introduced. Second, the method and the process of its application are described. Third, each of the activities and tasks is defined and their application to the case study is discussed. Findings – The paper shows that the FRBR model can be applied to MARC 21 records following linked data best practices, that librarians can successfully participate in the process of linked data generation following a systematic method, and that the quality of the data sources can be improved as a result of the process. Originality/value – The paper proposes a detailed method for publishing and linking linked data from MARC 21 records, provides practical examples, and discusses the main issues found in the application to a real case. It also proposes the integration of a data curation activity and the participation of librarians in the linked data generation process.

Relevance:

90.00%

Abstract:

In recent years, the amount of real-time data generated has increased. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.

Relevance:

90.00%

Abstract:

Empirical Software Engineering (ESE) replication researchers need to store and manipulate experimental data for several purposes, in particular analysis and reporting. Current research needs call for sharing and preservation of experimental data as well. In a previous work, we analyzed Replication Data Management (RDM) needs. A novel concept, called Experimental Ecosystem, was proposed to solve current deficiencies in RDM approaches. The empirical ecosystem provides replication researchers with a common framework that transparently integrates local heterogeneous data sources. A typical situation where the Empirical Ecosystem is applicable is when several members of a research group, or several collaborating research groups, need to share and access each other's experimental results. However, to be able to apply the Empirical Ecosystem concept and deliver all the promised benefits, it is necessary to analyze the software architectures and tools that can properly support it.

Relevance:

90.00%

Abstract:

RDF streams are sequences of timestamped RDF statements or graphs, which can be generated by several types of data sources (sensors, social networks, etc.). They may provide data at high volumes and rates, and be consumed by applications that require real-time responses. Hence it is important to publish and interchange them efficiently. In this paper, we exploit a key feature of RDF data streams, which is the regularity of their structure and data values, proposing a compressed, efficient RDF interchange (ERI) format, which can reduce the amount of data transmitted when processing RDF streams. Our experimental evaluation shows that our format produces state-of-the-art streaming compression, remaining efficient in performance.
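How structural regularity can reduce the data transmitted may be easier to see with a toy encoder. The sketch below is not the ERI format itself; it only assumes that many stream items share the same sequence of predicates, so that sequence is sent once and later items carry values only. The message layout and the function names `encode_stream` and `decode_stream` are hypothetical.

```python
def encode_stream(items):
    """items: list of stream items, each a list of (predicate, value) tuples."""
    structures = {}  # tuple of predicates -> structure id
    messages = []
    for item in items:
        preds = tuple(p for p, _ in item)
        values = [v for _, v in item]
        if preds not in structures:
            # First time this shape is seen: announce it once.
            structures[preds] = len(structures)
            messages.append(("structure", structures[preds], preds))
        # Afterwards only the structure id and the values are sent.
        messages.append(("values", structures[preds], values))
    return messages


def decode_stream(messages):
    """Rebuild the original items from the message sequence."""
    structures = {}
    items = []
    for kind, sid, payload in messages:
        if kind == "structure":
            structures[sid] = payload
        else:
            items.append(list(zip(structures[sid], payload)))
    return items


# Made-up sensor observations that share the same predicates.
observations = [
    [("sosa:madeBySensor", "ex:s1"), ("sosa:hasSimpleResult", "20.5")],
    [("sosa:madeBySensor", "ex:s1"), ("sosa:hasSimpleResult", "20.7")],
]
messages = encode_stream(observations)
```

For these two observations the predicate sequence is transmitted only once, followed by two value-only messages, and `decode_stream(messages)` reconstructs the input.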

Relevance:

90.00%

Abstract:

Ontology-Based Data Access (OBDA) allows access to different kinds of data sources (traditionally databases) using a more abstract model provided by an ontology. Query rewriting uses such an ontology to rewrite a query into a rewritten query that can be evaluated on the data source. The rewritten queries retrieve the answers that are entailed by the combination of the data explicitly stored in the data source, the original query and the ontology. Because it works only on the queries, query rewriting enables OBDA over any data source that can be queried, regardless of whether that source can be modified. However, producing and evaluating the rewritten queries are both costly processes that generally become more complex as the expressiveness and size of the ontology and queries increase. In this thesis we explore several optimisations that can be performed both in the rewriting process and in the rewritten queries to improve the applicability of OBDA in realistic contexts. Our main technical contribution is a query rewriting system that implements the optimisations presented in this thesis. These optimisations are the core contributions of the thesis and can be grouped into three different groups:
- optimisations that can be applied when considering the predicates in the ontology that are actually mapped to the data sources.
- engineering optimisations that can be applied by handling the process of query rewriting in a way that reduces the computational load of the query generation process.
- optimisations that can be applied when considering additional metainformation about the characteristics of the ABox.
In this thesis we provide formal proofs of the correctness and completeness of the proposed optimisations, and an empirical evaluation of their impact. As an additional contribution, part of this empirical approach, we propose a benchmark for the evaluation of query rewriting systems. We also provide some guidelines for the creation and expansion of this kind of benchmark.
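A highly simplified sketch can make the rewriting idea, and the first group of optimisations, more concrete. It is not the system described in the thesis: it assumes an ontology made only of subclass axioms, a query asking for the instances of a single class, and a hypothetical set `mapped` of classes that have mappings to the data source.

```python
def rewrite(query_class, subclass_of, mapped):
    """Rewrite a query for the instances of `query_class` into the list of
    classes to query in the data source.

    subclass_of: list of (subclass, superclass) axioms.
    mapped: classes that actually have a mapping to the data source.
    """
    result, frontier = set(), [query_class]
    while frontier:
        c = frontier.pop()
        if c in result:
            continue
        result.add(c)
        # Any instance of a subclass is entailed to be an instance of c.
        frontier.extend(sub for sub, sup in subclass_of if sup == c)
    # Pruning step: classes without mappings can never return answers,
    # so the corresponding queries are dropped.
    return sorted(c for c in result if c in mapped)


# Made-up ontology and mappings for illustration.
axioms = [("Thermometer", "Sensor"), ("Sensor", "Device"), ("Camera", "Device")]
mapped = {"Thermometer", "Camera"}
rewritten = rewrite("Device", axioms, mapped)
```

In this example the query for `Device` expands to four classes, but only the two that are mapped remain in the rewritten query, which is the kind of saving the first optimisation group aims for.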