982 results for Data source
Abstract:
In 2008, on the 50th anniversary of the International Geophysical Year (IGY), WDCMARE presents with this CD publication 3632 data sets in Open Access, part of the most important results from 73 cruises of the research vessel METEOR between 1964 and 1985. The archive is a coherently organized collection of published and unpublished data sets produced by scientists from all marine research disciplines who participated in Meteor expeditions, measured environmental parameters during cruises, and investigated sample material after the cruises in the labs of the participating institutions. In most cases, the data were gathered from the Meteor Forschungsergebnisse, published by the Deutsche Forschungsgemeinschaft (DFG). A second important data source is the time series and radiosonde ascents from more than 20 years of ship weather observations, provided by the Deutscher Wetterdienst, Hamburg. The final inclusion of all data in the PANGAEA information system ensures secure archiving, future updates, and widespread distribution in electronic, machine-readable form with long-term access via the Internet. To produce this publication, all data sets with metadata were extracted from PANGAEA and organized in a directory structure on a CD, together with a search capability.
Abstract:
Publishing Linked Data is a process that involves several design decisions and technologies. Although some initial guidelines have already been provided by Linked Data publishers, these are still far from covering all the necessary steps (from data source selection to publication) or from giving enough detail about those steps, technologies, intermediate products, etc. Furthermore, given the variety of data sources from which Linked Data can be generated, we believe that it is possible to have a single, unified method for publishing Linked Data, while relying on different techniques, technologies and tools for particular datasets of a given domain. In this paper we present a general method for publishing Linked Data and the application of the method to cover different sources from different domains.
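One step that any such publishing method must cover is transforming source records into RDF. A minimal sketch of that step, serializing a tabular record as N-Triples, is shown below; the base URI, vocabulary term and record fields are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch of one step of a Linked Data publishing pipeline:
# turning a tabular record into RDF triples serialized as N-Triples.
# All URIs and field names below are illustrative assumptions.

BASE = "http://example.org/resource/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

def record_to_ntriples(record_id, fields):
    """Serialize one record as N-Triples lines (literal objects only)."""
    subject = f"<{BASE}{record_id}>"
    lines = []
    for prop, value in fields.items():
        lines.append(f'{subject} <{prop}> "{value}" .')
    return "\n".join(lines)

triples = record_to_ntriples("person/1", {FOAF_NAME: "Ada Lovelace"})
print(triples)
```

A real pipeline would also handle datatype and language tags, object properties linking resources, and interlinking with external datasets, which is where the design decisions mentioned in the abstract come in.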
Abstract:
Ontology-Based Data Access (OBDA) allows accessing different kinds of data sources (traditionally databases) using a more abstract model provided by an ontology. Query rewriting uses such an ontology to rewrite a query into a rewritten query that can be evaluated on the data source. The rewritten queries retrieve the answers that are entailed by the combination of the data explicitly stored in the data source, the original query and the ontology. Since it operates only on queries, query rewriting enables OBDA over any data source that can be queried, regardless of whether the source itself can be modified. However, producing and evaluating the rewritten queries are both costly processes that generally become more complex as the expressiveness and size of the ontology and queries increase. In this thesis we explore several optimisations that can be performed both in the rewriting process and in the rewritten queries to improve the applicability of OBDA in real contexts. Our main technical contribution is a query rewriting system that implements the optimisations presented in this thesis. These optimisations are the core contributions of the thesis and can be grouped into three different groups:
- optimisations that can be applied when considering which predicates in the ontology are actually mapped to the data sources;
- engineering optimisations that can be applied by handling the query rewriting process in a way that reduces the computational load of the query generation process;
- optimisations that can be applied when considering additional metainformation about the characteristics of the ABox.
In this thesis we provide formal proofs of the correctness and completeness of the proposed optimisations, and an empirical evaluation of their impact. As an additional contribution, as part of this empirical approach, we propose a benchmark for the evaluation of query rewriting systems. We also provide some guidelines for the creation and expansion of this kind of benchmark.
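The core mechanism the abstract describes can be illustrated with a toy rewriting over subclass axioms: an atomic query over a concept is expanded into the union of all its subclasses, so that answers entailed by the ontology can be retrieved from plainly stored data. The ontology and concept names below are made up for the example; this is not the thesis system:

```python
# Toy query rewriting over subclass axioms: the atomic query q(x) <- C(x)
# is expanded into the union of all (transitive) subclasses of C.
# The ontology below is an illustrative assumption.

subclass_of = {           # axioms: key is a subclass of value
    "Professor": "Teacher",
    "Teacher": "Person",
    "Student": "Person",
}

def rewrite(concept):
    """Return every concept whose instances are entailed to be `concept`."""
    result = {concept}
    changed = True
    while changed:            # close under the subclass axioms
        changed = False
        for sub, sup in subclass_of.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return sorted(result)

# q(x) <- Person(x) rewrites into a union over Person and its subclasses
print(rewrite("Person"))  # ['Person', 'Professor', 'Student', 'Teacher']
```

The blow-up visible even here (one atom becoming a union of four) is what the thesis's optimisations target, e.g. pruning concepts that have no mapping to the underlying data source.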
Abstract:
"GAO-03-176."
Abstract:
An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared to a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. The application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritance and many-to-many relations. Such databases can be naturally accommodated by the semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed.

A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over existing interfaces. It is optimizable and parallelizable, and supports the definition of semantic user views and the interoperability of semantic databases with other data sources such as the World Wide Web, relational, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases.

The analysis and high-level design of a system is presented that exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use, allowing uniform access, via a common query interface, to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files, and others. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system, providing an ODBC interface to the WWW as a data source, is discussed.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be achieved better than is possible from a single source alone.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively; developing analytical tools; and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database PlasmoTFBM, focusing on gene regulation in Plasmodium falciparum, contains diverse information and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.

The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
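The general idea of constraint-guided clustering can be sketched with a K-means variant in which must-link constraints force points to be assigned to a cluster jointly. This is only a sketch in the spirit of GCC, not the dissertation's algorithm; the data and constraints are illustrative assumptions:

```python
# Sketch of constraint-guided clustering (in the spirit of GCC, not the
# dissertation's algorithm): 1-D K-means where must-link constraints
# force pairs of points to be assigned to the same cluster jointly.
# Data and constraints are illustrative assumptions.

def constrained_kmeans(points, k, must_link, iters=20):
    # Must-linked points form one assignment unit; the rest are singletons.
    groups = [list(pair) for pair in must_link]
    linked = {i for pair in must_link for i in pair}
    groups += [[i] for i in range(len(points)) if i not in linked]

    centers = [points[i] for i in range(k)]  # naive initialization
    for _ in range(iters):
        assign = {}
        for g in groups:
            # A whole group goes to the center minimizing its summed distance.
            best = min(range(k),
                       key=lambda c: sum(abs(points[i] - centers[c]) for i in g))
            for i in g:
                assign[i] = best
        for c in range(k):  # recompute centers from current assignments
            members = [points[i] for i in assign if assign[i] == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign, centers

points = [1.0, 1.2, 0.9, 9.8, 10.1, 5.4]
# Constraint: point 5 (value 5.4) must share a cluster with point 4 (10.1).
assign, centers = constrained_kmeans(points, 2, [(4, 5)])
print(assign[5] == assign[4])  # True: the constraint kept them together
```

Without the constraint, the ambiguous point 5.4 could end up in either cluster; the constraint resolves it, which is how partial knowledge in the form of constraints can guide exploratory analysis.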
Abstract:
This research investigates the claim that Change Data Capture (CDC) technologies capture data changes in real time. Based on theory, our hypothesis is that real-time CDC is not achievable with traditional approaches (log scanning, triggers and timestamps), because traditional approaches to CDC require a resource to be polled, which prevents true real-time CDC. We propose an approach to CDC that encapsulates the data source with a set of web services. These web services propagate the changes to the targets and eliminate the need for polling. Additionally, we propose a framework for CDC technologies that allows changes to flow from source to target. This paper discusses current CDC technologies and presents the theory of why they are unable to deliver changes in real time. We then discuss our web service approach to CDC and the accompanying framework, explaining how they can produce real-time CDC. The paper concludes with a discussion of the research required to investigate the real-time capabilities of CDC technologies. © 2010 IEEE.
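The push-based idea behind the proposal can be sketched as a source that notifies registered targets on every change, so no target ever polls. The class and method names below are illustrative assumptions; the paper's design uses web services rather than in-process callbacks:

```python
# Sketch of push-based change propagation: the data source notifies
# registered targets on every write, eliminating polling. In the paper's
# design the listeners would be web service endpoints; here they are
# plain callbacks for illustration.

class ChangeCapturedSource:
    def __init__(self):
        self._data = {}
        self._listeners = []

    def subscribe(self, callback):
        """Register a target to receive change notifications."""
        self._listeners.append(callback)

    def write(self, key, value):
        """Apply a change and immediately propagate it to all targets."""
        old = self._data.get(key)
        self._data[key] = value
        for notify in self._listeners:
            notify(key, old, value)

received = []
source = ChangeCapturedSource()
source.subscribe(lambda k, old, new: received.append((k, old, new)))
source.write("price", 10)
source.write("price", 12)
print(received)  # [('price', None, 10), ('price', 10, 12)]
```

Contrast this with log scanning or timestamp columns, where the target must repeatedly query the source and the polling interval bounds how "real-time" the capture can be.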
Abstract:
The data set consists of maps of total velocity of surface currents in the Ibiza Channel, derived from HF radar measurements.
Abstract:
One of the most challenging tasks underlying many hyperspectral imagery applications is spectral unmixing, which decomposes a mixed pixel into a collection of reflectance spectra, called endmember signatures, and their corresponding fractional abundances. Independent Component Analysis (ICA) has recently been proposed as a tool to unmix hyperspectral data. The basic goal of ICA is to find a linear transformation to recover independent sources (abundance fractions) given only sensor observations that are unknown linear mixtures of the unobserved independent sources. In hyperspectral imagery, the sum of the abundance fractions associated with each pixel is constant due to physical constraints in the data acquisition process; thus, the sources cannot be independent. This paper addresses hyperspectral data source dependence and its impact on ICA performance. The study considers simulated and real data. In simulated scenarios, hyperspectral observations are described by a generative model that takes into account the degradation mechanisms normally found in hyperspectral applications. We conclude that ICA does not unmix all sources correctly. This conclusion is based on a study of the mutual information. Nevertheless, some sources may be well separated, mainly if the number of sources is large and the signal-to-noise ratio (SNR) is high.
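The dependence induced by the sum-to-one constraint can be seen numerically: fractions normalized to sum to one exhibit negative sample covariance, which already violates ICA's independence assumption. The sketch below uses synthetic fractions, not the paper's generative model:

```python
# Numerical illustration of why the sum-to-one constraint makes abundance
# fractions statistically dependent: fractions normalized to sum to 1
# show negative sample covariance, so ICA's independence assumption fails.
# Synthetic fractions only; this is not the paper's generative model.

import random

random.seed(0)
n = 20000
a1, a2 = [], []
for _ in range(n):
    r = [random.random() for _ in range(3)]   # three raw endmember weights
    s = sum(r)
    a1.append(r[0] / s)   # abundance of endmember 1
    a2.append(r[1] / s)   # abundance of endmember 2 (a3 = 1 - a1 - a2)

m1 = sum(a1) / n
m2 = sum(a2) / n
cov = sum((x - m1) * (y - m2) for x, y in zip(a1, a2)) / n
print(cov < 0)  # True: the constraint induces negative correlation
```

Intuitively, if one fraction in a pixel is large, the others must shrink to keep the sum at one, so the "sources" ICA tries to separate are coupled by construction.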
Abstract:
OBJECTIVE: Physical exercise, because of its inherent heat production, can lead to dehydration. Most studies that address the risks of dehydration and provide fluid replacement recommendations are directed at adults living in temperate climates; in tropical regions, however, little is known about the fluid replacement needs of physically active children. This review discusses the recommendations for this population and establishes the risks of sports practice in a tropical climate. DATA SOURCES: Systematic analysis surveying the national (SciELO) and international (Medline) literature for articles published between 1972 and 2009, with the following descriptors, alone or in combination: hydration, children, dehydration and fluid replacement. Articles published in Portuguese and English were selected. DATA SYNTHESIS: There are risks of dehydration and possible development of hyperthermia, especially if children are exposed to unfavorable climate conditions without adequate fluid replacement. The main factor triggering hyperthermia is children's poorer adaptation to temperature extremes compared with adults, since they have a larger body surface area and a lower capacity for thermoregulation by evaporation. CONCLUSIONS: Given the known factors involved in dehydration, the best recommendation, under a clearly unfavorable climate condition, is to establish a mandatory hydration plan with a flavored beverage containing added carbohydrates and sodium, avoiding significant fluid loss and decreased performance and, above all, aiming to reduce the health risks that hyperthermia and dehydration impose on physically active children.
Abstract:
OBJECTIVE: To describe and compare longitudinal studies that allow inferences about the influence of daycare attendance on the nutritional status of preschool children. DATA SOURCES: Systematic review of scientific papers published between January 1990 and December 2008. Studies were searched in the following databases: Lilacs, SciELO and PubMed. A manual search of the referenced articles was also carried out. The search took place between March 2008 and June 2009, and the descriptors used were: "daycare center", "nutritional status", "anthropometry", "food consumption", "anemia" and "school feeding". DATA SYNTHESIS: In the first stage of the study, 78 articles were retrieved, but only seven could be included. The other 71 did not present data contributing to the specific objective of this study. Among the articles found in the literature, few allow inferences about the influence that daycare may have on the nutritional status of preschoolers. However, longitudinal studies have shown a causal relation between the child's regular attendance at daycare and improvement in nutritional status. CONCLUSIONS: There is a positive relation between the child's attendance at daycare and improvement in nutritional status.
Abstract:
Population aging is a striking feature of the demographic transition. The study of underlying causes of death among the elderly reveals their epidemiological profile, although it may be hampered by the high proportion of ill-defined causes. The objective of this paper is to describe mortality from these causes among the elderly in Brazil. The data source was the Mortality Information System of the Ministry of Health. Among the variables, the main one was the ill-defined underlying cause of death [Chapter XVIII of the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10)]. These deaths among the elderly decreased by 35% between 1996 and 2005. Considering deaths at ages 60 to 69 and at 80 years and over, the proportions of ill-defined causes increased by 9.9% and 14.8%, respectively, in 2005. Methods aimed at reducing these deaths are suggested, emphasizing that the most important point is for physicians to complete death certificates properly, with the real underlying, consequential and terminal causes, the major objective of researchers in the field.
Abstract:
Brazilian science has grown rapidly during the last decades. One example is the increase in the country's share of the world's scientific publications within the main international databases. But what is the actual weight of international publications within overall Brazilian productivity? To answer this question, we have elaborated a new indicator, the International Publication Ratio (IPR). The data source was the Lattes Database, organized by one of the main Brazilian S&T funding agencies, which encompasses publication data from 1997 to 2004 for about 51,000 Brazilian researchers. The influences of distinct parameters, such as sector, field, career age and gender, are analyzed. We hope the data presented may help S&T managers and other interested parties to better understand the complexity underlying the concept of scientific productivity, especially in countries at the scientific periphery, such as Brazil.
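Read plainly, the indicator is a share of output published internationally. Since the abstract does not give the exact formula, the sketch below assumes a simple ratio; the numbers are hypothetical:

```python
# Plain-ratio reading of the International Publication Ratio (IPR):
# the share of a researcher's output published in international venues.
# The paper's exact formula is not given in the abstract, so this is an
# assumption; the example figures are hypothetical.

def international_publication_ratio(international_pubs, total_pubs):
    """IPR as international output over total output."""
    if total_pubs == 0:
        return 0.0
    return international_pubs / total_pubs

# A hypothetical researcher with 12 international papers out of 40 in 1997-2004:
print(international_publication_ratio(12, 40))  # 0.3
```

Aggregating such ratios by sector, field, career age or gender would give the parameter breakdowns the abstract describes.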
Abstract:
The majority of the world's population now resides in urban environments, and information on the internal composition and dynamics of these environments is essential to enable preservation of certain standards of living. Remotely sensed data, especially the global coverage of moderate-spatial-resolution satellites such as Landsat, the Indian Resource Satellite and the Système Pour l'Observation de la Terre (SPOT), offer a highly useful data source for mapping the composition of cities and examining their changes over time. The utility and range of applications for remotely sensed data in urban environments could be improved with a more appropriate conceptual model relating urban environments to the sampling resolutions of imaging sensors and processing routines. Hence, the aim of this work was to take the Vegetation-Impervious surface-Soil (VIS) model of urban composition and match it with the most appropriate image processing methodology to deliver information on VIS composition for urban environments. Several approaches were evaluated for mapping the urban composition of Brisbane city (south-east Queensland, Australia) using Landsat 5 Thematic Mapper data and 1:5000 aerial photographs. The methods evaluated were: image classification; interpretation of aerial photographs; and constrained linear mixture analysis. Over 900 reference sample points on four transects were extracted from the aerial photographs and used as a basis to check the output of the classification and mixture analysis. Distinctive zonations of VIS related to urban composition were found in the per-pixel classification and the aggregated air-photo interpretation; however, significant spectral confusion also resulted between classes. In contrast, the VIS fraction images produced from the mixture analysis enabled distinctive densities of commercial, industrial and residential zones within the city to be clearly defined, based on their relative amounts of vegetation cover. The soil fraction image served as an index for areas being (re)developed. The logical match of a low (L)-resolution spectral mixture analysis approach with the moderate spatial resolution image data ensured that the processing model matched the spectrally heterogeneous nature of urban environments at the scale of Landsat Thematic Mapper data.
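Constrained linear mixture analysis treats each pixel spectrum as a fraction-weighted sum of endmember spectra, with the fractions summing to one. For two endmembers the constraint leaves a single unknown with a closed-form least-squares solution, sketched below; the endmember spectra are illustrative, not Brisbane TM signatures:

```python
# Sketch of constrained linear mixture analysis for the two-endmember
# case: with fractions summing to one, a pixel is x = f*e1 + (1-f)*e2
# in every band, so f has a closed-form least-squares solution.
# Endmember spectra below are illustrative, not Brisbane TM signatures.

def unmix_two_endmembers(pixel, e1, e2):
    """Least-squares fraction of e1; the fraction of e2 is 1 - f."""
    num = sum((x - b) * (a - b) for x, a, b in zip(pixel, e1, e2))
    den = sum((a - b) ** 2 for a, b in zip(e1, e2))
    return num / den

veg = [0.05, 0.45, 0.30]        # "vegetation" endmember (3 bands)
imperv = [0.25, 0.25, 0.35]     # "impervious surface" endmember
mixed = [0.5 * v + 0.5 * i for v, i in zip(veg, imperv)]  # 50/50 pixel
f = unmix_two_endmembers(mixed, veg, imperv)
print(round(f, 6))  # 0.5
```

With three endmembers (V, I and S), the same idea becomes a constrained least-squares problem per pixel, and the resulting fraction images are what distinguish residential from commercial and industrial densities in the study.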
Abstract:
Measuring customers' perceptions can be a major problem for marketers of tourism and travel services. Much of the problem is to determine which attributes carry most weight in the purchasing decision. Older travellers weigh many travel features before making their travel decisions. This paper presents a descriptive analysis of neural network methodology and provides a research technique that assesses the weighting of different attributes, using an unsupervised neural network model to describe a consumer-product relationship. The development of this rich class of models was inspired by the neural architecture of the human brain. These models mathematically emulate the neurophysical structure and decision making of the human brain and, from a statistical perspective, are closely related to generalised linear models. Artificial neural networks are, however, nonlinear and do not require the same restrictive assumptions about the relationship between the independent and dependent variables. Using neural networks is one way to determine what trade-offs older travellers make as they decide their travel plans. The sample for this study comes from a syndicated data source of 200 valid cases from Western Australia. Among the senior groups, the segments "active learner", "relaxed family body", "careful participants" and "elementary vacation" were identified and discussed. © 2003 Published by Elsevier Science Ltd.
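One family of unsupervised neural networks commonly used for this kind of segmentation is competitive learning, as in self-organizing maps: each input pulls its closest "unit" toward itself, so units drift toward natural groupings of respondents. The sketch below is a generic illustration under that assumption, not the paper's model; the traveller data and parameters are made up:

```python
# Minimal competitive-learning sketch (the core of self-organizing maps,
# without the neighborhood function): each input moves its best-matching
# unit toward itself, so units converge on groups of similar respondents.
# Data and parameters are illustrative; this is not the paper's model.

import random

def train_units(data, n_units=2, epochs=50, lr=0.3, seed=1):
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for _ in range(epochs):
        for x in data:
            # best-matching unit = closest weight vector
            bmu = min(units,
                      key=lambda w: sum((wi - xi) ** 2 for wi, xi in zip(w, x)))
            for i in range(dim):  # move only the winner toward the input
                bmu[i] += lr * (x[i] - bmu[i])
    return units

# Hypothetical travellers rated on two attributes (price sensitivity,
# comfort preference) in [0, 1]; two clear groups are present.
data = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
units = train_units(data)
print([[round(w, 1) for w in u] for u in units])
```

After training, each unit's weight vector summarizes one traveller segment's attribute weightings, which is the kind of consumer-product description the paper derives from its senior-traveller sample.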