975 resultados para Structured data
Resumo:
In spite of the increasing presence of Semantic Web Facilities, only a limited amount of the available resources in the Internet provide a semantic access. Recent initiatives such as the emerging Linked Data Web are providing semantic access to available data by porting existing resources to the semantic web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are based on ad-hoc solutions complemented with graphical interfaces for speeding up the scraper development. This article proposes a generic framework for web scraping based on semantic technologies. This framework is structured in three levels: scraping services, semantic scraping model and syntactic scraping. The first level provides an interface to generic applications or intelligent agents for gathering information from the web at a high level. The second level defines a semantic RDF model of the scraping process, in order to provide a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies
Resumo:
It is very often the case that programs require passing, maintaining, and updating some notion of state. Prolog programs often implement such stateful computations by carrying this state in predicate arguments (or, alternatively, in the internal datábase). This often causes code obfuscation, complicates code reuse, introduces dependencies on the data model, and is prone to incorrect propagation of the state information among predicate calis. To partly solve these problems, we introduce contexts as a consistent mechanism for specifying implicit arguments and its threading in clause goals. We propose a notation and an interpretation for contexts, ranging from single goals to complete programs, give an intuitive semantics, and describe a translation into standard Prolog. We also discuss a particular light-weight implementation in Ciao Prolog, and we show the usefulness of our proposals on a series of examples and applications, including code directiy using contexts, DCGs, extended DCGs, logical loops and other custom control structures.
Resumo:
We present a framework specially designed to deal with structurally complex data, where all individuals have the same structure, as is the case in many medical domains. A structurally complex individual may be composed of any type of singlevalued or multivalued attributes, including time series, for example. These attributes are structured according to domain-dependent hierarchies. Our aim is to generate reference models of population groups. These models represent the population archetype and are very useful for supporting such important tasks as diagnosis, detecting fraud, analyzing patient evolution, identifying control groups, etc.
Resumo:
Esta monografía presenta los fundamentos, contexto y detalles técnicos de un Esquema de Aplicación para la incorporación de datos espaciales relativos al patrimonio cultural en el marco definido por la directiva europea INSPIRE sobre información geográfica. Abstract: This monograph presents the background, context and technical details of an Application Schema for the inclusion of cultural heritage spatial data into the INSPIRE framework. Nowadays, INSPIRE provides the most relevant framework for the dissemination and exchange of geographical data, covering many different thematic fields, particularly relevant for envi-ronmental datasets. Although cultural heritage elements are partially addressed within INSPIRE, there is no specific documentation on how these data should be considered, structured and published. This text aims to provide technical guidelines for decision makers, public administrations and the scientific community for the definition and implementation of harmonized datasets for cultural heritage, according to the interoperability principles of INSPIRE.
Resumo:
Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as “which particular data was input to a particular workflow to test a particular hypothesis?”, and “which particular conclusions were drawn from a particular workflow?”. Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment, allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well.
Resumo:
The Web of Data currently comprises ? 62 billion triples from more than 2,000 different datasets covering many fields of knowledge3. This volume of structured Linked Data can be seen as a particular case of Big Data, referred to as Big Semantic Data [4]. Obviously, powerful computational configurations are tradi- tionally required to deal with the scalability problems arising to Big Semantic Data. It is not surprising that this ?data revolution? has competed in parallel with the growth of mobile computing. Smartphones and tablets are massively used at the expense of traditional computers but, to date, mobile devices have more limited computation resources. Therefore, one question that we may ask ourselves would be: can (potentially large) semantic datasets be consumed natively on mobile devices? Currently, only a few mobile apps (e.g., [1, 9, 2, 8]) make use of semantic data that they store in the mobile devices, while many others access existing SPARQL endpoints or Linked Data directly. Two main reasons can be considered for this fact. On the one hand, in spite of some initial approaches [6, 3], there are no well-established triplestores for mobile devices. This is an important limitation because any po- tential app must assume both RDF storage and SPARQL resolution. On the other hand, the particular features of these devices (little storage space, less computational power or more limited bandwidths) limit the adoption of seman- tic data for different uses and purposes. This paper introduces our HDTourist mobile application prototype. It con- sumes urban data from DBpedia4 to help tourists visiting a foreign city. Although it is a simple app, its functionality allows illustrating how semantic data can be stored and queried with limited resources. Our prototype is implemented for An- droid, but its foundations, explained in Section 2, can be deployed in any other platform. The app is described in Section 3, and Section 4 concludes about our current achievements and devises the future work.
Resumo:
The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P.falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P.falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.
Resumo:
Several works deal with 3D data in SLAM problem. Data come from a 3D laser sweeping unit or a stereo camera, both providing a huge amount of data. In this paper, we detail an efficient method to extract planar patches from 3D raw data. Then, we use these patches in an ICP-like method in order to address the SLAM problem. Using ICP with planes is not a trivial task. It needs some adaptation from the original ICP. Some promising results are shown for outdoor environment.
Resumo:
Comunicación presentada en las XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 septiembre 2011.
Resumo:
Currently there are an overwhelming number of scientific publications in Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is its cost of updating that makes it obsolete easily. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeder enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Databases and DW architectures with QA systems. The great advantage of our framework is that decision makers can compare instantaneously internal data with external data from competitors, thereby allowing taking quick strategic decisions based on richer data.
Resumo:
PAS1192-2 (2013) outlines the “fundamental principles of Level 2 information modeling”, one of these principles is the use of what is commonly referred to as a Common Data Environment (CDE). A CDE could be described as an internet-enabled cloudhosting platform, accessible to all construction team members to access shared project information. For the construction sector to achieve increased productivity goals, the next generation of industry professionals will need to be educated in a way that provides them with an appreciation of Building Information Modelling (BIM) working methods, at all levels, including an understanding of how data in a CDE should be structured, managed, shared and published. This presents a challenge for educational institutions in terms of providing a CDE that addresses the requirements set out in PAS1192-2, and mirrors organisational and professional working practices without causing confusion due to over complexity. This paper presents the findings of a two-year study undertaken at Ulster University comparing the use of a leading industry CDE platform with one derived from the in-house Virtual Learning Environment (VLE), for the delivery of a student BIM project. The research methodology employed was a qualitative case study analysis, focusing on observations from the academics involved and feedback from students. The results of the study show advantages for both CDE platforms depending on the learning outcomes required.
Resumo:
Structured soils are characterized by the presence of inter- and intra-aggregate pore systems and aggregates, which show varying chemical, physical, and biological properties depending on the aggregate type and land use system. How far these aspects also affect the ion exchange processes and to what extent the interaction between the carbon distribution and kind of organic substances affect the internal soil strength as well as hydraulic properties like wettability are still under discussion. Thus, the objective of this research was to clarify the effect of soil aggregation on physical and chemical properties of structured soils at two scales: homogenized material and single aggregates. Data obtained by sequentially peeling off soil aggregates layers revealed gradients in the chemical composition from the aggregate surface to the aggregate core. In aggregates from long term untreated forest soils we found lower amounts of carbon in the external layer, while in arable soils the differentiation was not pronounced. However, soil aggregates originating from these sites exhibited a higher concentration of microbial activity in the outer aggregate layer and declined towards the interior. Furthermore, soil depth and the vegetation type affected the wettability. Aggregate strength depended. on water suction and differences in tillage treatments.
Resumo:
O trabalho desenvolvido analisa a Comunicação Social no contexto da internet e delineia novas metodologias de estudo para a área na filtragem de significados no âmbito científico dos fluxos de informação das redes sociais, mídias de notícias ou qualquer outro dispositivo que permita armazenamento e acesso a informação estruturada e não estruturada. No intento de uma reflexão sobre os caminhos, que estes fluxos de informação se desenvolvem e principalmente no volume produzido, o projeto dimensiona os campos de significados que tal relação se configura nas teorias e práticas de pesquisa. O objetivo geral deste trabalho é contextualizar a área da Comunicação Social dentro de uma realidade mutável e dinâmica que é o ambiente da internet e fazer paralelos perante as aplicações já sucedidas por outras áreas. Com o método de estudo de caso foram analisados três casos sob duas chaves conceituais a Web Sphere Analysis e a Web Science refletindo os sistemas de informação contrapostos no quesito discursivo e estrutural. Assim se busca observar qual ganho a Comunicação Social tem no modo de visualizar seus objetos de estudo no ambiente das internet por essas perspectivas. O resultado da pesquisa mostra que é um desafio para o pesquisador da Comunicação Social buscar novas aprendizagens, mas a retroalimentação de informação no ambiente colaborativo que a internet apresenta é um caminho fértil para pesquisa, pois a modelagem de dados ganha corpus analítico quando o conjunto de ferramentas promovido e impulsionado pela tecnologia permite isolar conteúdos e possibilita aprofundamento dos significados e suas relações.
Resumo:
A conventional neural network approach to regression problems approximates the conditional mean of the output vector. For mappings which are multi-valued this approach breaks down, since the average of two solutions is not necessarily a valid solution. In this article mixture density networks, a principled method to model conditional probability density functions, are applied to retrieving Cartesian wind vector components from satellite scatterometer data. A hybrid mixture density network is implemented to incorporate prior knowledge of the predominantly bimodal function branches. An advantage of a fully probabilistic model is that more sophisticated and principled methods can be used to resolve ambiguities.
Resumo:
Digital watermarking aims at embedding information in digital data. The watermark is usually required to be imperceptible, unremovable and to have a high information content. Unfortunately, these three requirements are contradicting. For example, having a more robust watermark makes it either more perceptible or/and less informative. For Gaussian data and additive white Gaussian noise, an optimal but also impractical scheme has already be devised. Since then, many practical schemes have tried to approach the theoretical limits. This paper investigate improvements to current state-of-the-art embedding schemes.