994 results for data source


Relevance: 70.00%

Abstract:

Objective: Several surveillance definitions of influenza-like illness (ILI) have been proposed, based on the presence of symptoms. Symptom data can be obtained from patients, medical records, or both. Past research has found that agreements between health record data and self-report are variable depending on the specific symptom. Therefore, we aimed to explore the implications of using data on influenza symptoms extracted from medical records, similar data collected prospectively from outpatients, and the combined data from both sources as predictors of laboratory-confirmed influenza. Methods: Using data from the Hutterite Influenza Prevention Study, we calculated: 1) the sensitivity, specificity and predictive values of individual symptoms within surveillance definitions; 2) how frequently surveillance definitions correlated to laboratory-confirmed influenza; and 3) the predictive value of surveillance definitions. Results: Of the 176 participants with reports from participants and medical records, 142 (81%) were tested for influenza and 37 (26%) were PCR positive for influenza. Fever (alone) and fever combined with cough and/or sore throat were highly correlated with being PCR positive for influenza for all data sources. ILI surveillance definitions, based on symptom data from medical records only or from both medical records and self-report, were better predictors of laboratory-confirmed influenza with higher odds ratios and positive predictive values. Discussion: The choice of data source to determine ILI will depend on the patient population, outcome of interest, availability of data source, and use for clinical decision making, research, or surveillance. © Canadian Public Health Association, 2012.
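The measures named in the Methods (sensitivity, specificity, predictive values, odds ratio) can all be read off a 2x2 table of symptom definition against PCR result. The sketch below is illustrative only: the cell counts are hypothetical and chosen merely to match the abstract's totals (142 tested, 37 PCR positive); the function name `diagnostic_metrics` is likewise an assumption, not something from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summarise a 2x2 table of case definition (rows) vs. PCR result (columns).

    tp: definition met, PCR positive      fp: definition met, PCR negative
    fn: definition not met, PCR positive  tn: definition not met, PCR negative
    """
    return {
        "sensitivity": tp / (tp + fn),          # share of PCR positives flagged
        "specificity": tn / (tn + fp),          # share of PCR negatives cleared
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "odds_ratio": (tp * tn) / (fp * fn),    # cross-product ratio
    }


# Hypothetical cell counts: 37 PCR positive (25 + 12), 105 PCR negative (30 + 75).
metrics = diagnostic_metrics(tp=25, fp=30, fn=12, tn=75)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function once per data source (medical records, self-report, combined) would give the kind of side-by-side comparison the abstract describes.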

Relevance: 70.00%

Abstract:

As wind generation increases, system impact studies rely on predictions of future generation and effective representation of wind variability. A well-established approach to investigate the impact of wind variability is to simulate generation using observations from 10 m meteorological mast data. However, there are problems with relying purely on historical wind-speed records or generation histories: mast data is often incomplete, not sited at relevant wind generation sites, and recorded at the wrong altitude above ground (usually 10 m), each of which may distort the generation profile. A possible complementary approach is to use reanalysis data, where data assimilation techniques are combined with state-of-the-art weather forecast models to produce complete gridded wind time series over an area. Previous investigations of reanalysis datasets have placed an emphasis on comparing reanalysis to meteorological site records, whereas this paper compares wind generation simulated using reanalysis data directly against historic wind generation records. Importantly, this comparison is conducted using raw reanalysis data (typical resolution ∼50 km), without relying on a computationally expensive “dynamical downscaling” for a particular target region. Although the raw reanalysis data cannot, by nature of its construction, represent the site-specific effects of sub-gridscale topography, it is nevertheless shown to be comparable to or better than the mast-based simulation in the region considered, and it is therefore argued that raw reanalysis data may offer a number of significant advantages as a data source.

Relevance: 70.00%

Abstract:

Recent research suggests Eurasian snow-covered area (SCA) influences the Arctic Oscillation (AO) via the polar vortex. This could be important for Northern Hemisphere winter season forecasting. A fairly strong negative correlation between October SCA and the AO, based on both monthly and daily observational data, has been noted in the literature. While reproducing these previous links when using the same data, we find no further evidence of the link when using an independent satellite data source, or when using a climate model.

Relevance: 70.00%

Abstract:

The aim of this study is to evaluate the variation of solar radiation data between different data sources that will be freely available at the Solar Energy Research Center (SERC). The data sources are compared for two locations: Stockholm, Sweden and Athens, Greece. For these locations, data is gathered for different tilt angles: 0°, 30°, 45° and 60°, facing south. The full dataset is available in two Excel files: “Stockholm annual irradiation” and “Athens annual irradiation”. The World Radiation Data Center (WRDC) is taken as the reference for the comparison with other datasets, because it has the longest recorded time span for Stockholm (1964–2010) and Athens (1964–1986), in the form of average monthly irradiation, expressed in kWh/m². The indicator defined for the data comparison is the estimated standard deviation. The mean bias error (MBE) and the root mean square error (RMSE) were also used as statistical indicators for the horizontal solar irradiation data. The variation in solar irradiation data is grouped into three categories: natural or inter-annual variability, variation due to different data sources, and variation due to different calculation models. The inter-annual variation is 140.4 kWh/m² or 14.4% for Stockholm and 124.3 kWh/m² or 8.0% for Athens. The estimated deviation for horizontal solar irradiation is 3.7% for Stockholm and 4.4% for Athens. At 30° tilt the estimated deviation is 4.5% for Stockholm and 3.6% for Athens; at 45° tilt, 5.2% and 4.5%; at 60° tilt, 5.9% and 7.0%. NASA’s SSE, SAM and RETScreen exhibited the highest deviation from WRDC’s data for Stockholm, and Satel-light for Athens. The main source of variation is the difference in horizontal solar irradiation. When different calculation models are used, such as those in PVSYST and Meteonorm, the variation increases by 1-2% per degree of tilt.
The location and altitude of the data source did not directly influence the variation from the WRDC data. Further work is suggested on: improving the methodology for selecting the location; examining the functional dependence of ground-reflected radiation on ambient temperature; the variation of ambient temperature and its impact on different solar energy systems; and the impact of variation in solar irradiation and ambient temperature on system output.
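As a rough illustration of the two statistical indicators named above, the sketch below computes the mean bias error and root mean square error of one data source against a reference series. It assumes the common definitions (mean of the differences, and square root of the mean squared difference); the study's exact formulas are not given in the abstract, and the irradiation values are made up for the example.

```python
import math


def mean_bias_error(estimates, reference):
    """Average signed difference; positive means the source overestimates."""
    diffs = [e - r for e, r in zip(estimates, reference)]
    return sum(diffs) / len(diffs)


def root_mean_square_error(estimates, reference):
    """Square root of the mean squared difference; penalises large deviations."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical monthly horizontal irradiation in kWh/m² (not data from the study).
reference = [10.0, 20.0, 30.0]   # stand-in for the reference series
candidate = [12.0, 18.0, 33.0]   # stand-in for another data source

mbe = mean_bias_error(candidate, reference)
rmse = root_mean_square_error(candidate, reference)
print(f"MBE = {mbe:.2f} kWh/m², RMSE = {rmse:.2f} kWh/m²")
```

Dividing either value by the mean of the reference series would express it as a percentage, which is the form most of the figures in the abstract take.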

Relevance: 70.00%

Abstract:

This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.

Relevance: 70.00%

Abstract:

Multimedia content understanding research requires a rigorous approach to deal with the complexity of the data. At the crux of this problem is the method to deal with multilevel data whose structure exists at multiple scales and across data sources. A common example is modeling tags jointly with images to improve retrieval, classification and tag recommendation. Associated contextual observation, such as metadata, is rich and can be exploited for content analysis. A major challenge is the need for a principled approach to systematically incorporate associated media with the primary data source of interest. Taking a factor modeling approach, we propose a framework that can discover low-dimensional structures for a primary data source together with other associated information. We cast this task as a subspace learning problem under the framework of Bayesian nonparametrics, so the subspace dimensionality and the number of clusters are learnt automatically from data instead of being set a priori. Using Beta processes as the building block, we construct random measures in a hierarchical structure to generate multiple data sources and capture their shared statistical structure at the same time. The model parameters are inferred efficiently using a novel combination of Gibbs and slice sampling. We demonstrate the applicability of the proposed model in three applications: image retrieval, automatic tag recommendation and image classification. Experiments using two real-world datasets show that our approach outperforms various state-of-the-art related methods.

Relevance: 70.00%

Abstract:

Nonnegative matrix factorization based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this chapter, we propose a novel joint matrix factorization framework which can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds the existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable to a wider context in data mining wherever one needs to exploit mutual and individual knowledge present across multiple data sources.
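To make the idea of shared and individual structures concrete, here is a minimal sketch of a two-source joint factorization. It is not the chapter's algorithm: it assumes two nonnegative tag-by-item matrices with the same tag rows, a squared-error objective, and standard multiplicative updates, and the function name `joint_nmf` and all parameter names are hypothetical. Each source is approximated as `[Ws, Wi] @ Hi`, where `Ws` is shared by both sources and `Wi` is specific to source `i`.

```python
import numpy as np


def joint_nmf(X1, X2, k_shared, k1, k2, n_iter=200, seed=0, eps=1e-9):
    """Factorize X1 ~ [Ws, W1] @ H1 and X2 ~ [Ws, W2] @ H2 with all factors >= 0."""
    if X1.shape[0] != X2.shape[0]:
        raise ValueError("both sources must share the same row (tag) dimension")
    rng = np.random.default_rng(seed)
    m = X1.shape[0]
    Ws = rng.random((m, k_shared))           # basis shared by both sources
    W1 = rng.random((m, k1))                 # basis specific to source 1
    W2 = rng.random((m, k2))                 # basis specific to source 2
    H1 = rng.random((k_shared + k1, X1.shape[1]))
    H2 = rng.random((k_shared + k2, X2.shape[1]))

    def loss():
        r1 = X1 - np.hstack([Ws, W1]) @ H1
        r2 = X2 - np.hstack([Ws, W2]) @ H2
        return float(np.sum(r1 ** 2) + np.sum(r2 ** 2))

    errors = [loss()]                        # error at the random initialisation
    for _ in range(n_iter):
        A1, A2 = np.hstack([Ws, W1]), np.hstack([Ws, W2])
        # Coefficient updates: ordinary NMF updates, one per source.
        H1 *= (A1.T @ X1) / (A1.T @ A1 @ H1 + eps)
        H2 *= (A2.T @ X2) / (A2.T @ A2 @ H2 + eps)
        # Shared basis update: pools evidence from both sources.
        num = X1 @ H1[:k_shared].T + X2 @ H2[:k_shared].T
        den = (A1 @ H1) @ H1[:k_shared].T + (A2 @ H2) @ H2[:k_shared].T + eps
        Ws *= num / den
        # Individual basis updates: each uses only its own source.
        R1 = np.hstack([Ws, W1]) @ H1
        W1 *= (X1 @ H1[k_shared:].T) / (R1 @ H1[k_shared:].T + eps)
        R2 = np.hstack([Ws, W2]) @ H2
        W2 *= (X2 @ H2[k_shared:].T) / (R2 @ H2[k_shared:].T + eps)
        errors.append(loss())
    return Ws, W1, W2, H1, H2, errors


# Synthetic nonnegative data standing in for two tag-by-item matrices.
rng = np.random.default_rng(42)
X1, X2 = rng.random((30, 40)), rng.random((30, 50))
Ws, W1, W2, H1, H2, errors = joint_nmf(X1, X2, k_shared=3, k1=2, k2=2)
print(f"squared error: {errors[0]:.1f} -> {errors[-1]:.1f}")
```

Setting `k_shared` to zero would reduce this to two independent factorizations, while setting `k1` and `k2` to zero would force both sources onto one common basis; the arbitrary sharing configurations mentioned in the abstract generalise this choice beyond two sources.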

Relevance: 70.00%

Abstract:

We are investigating the combination of wavelets and decision trees to detect ships and other maritime surveillance targets from medium resolution SAR images. Wavelets have inherent advantages to extract image descriptors while decision trees are able to handle different data sources. In addition, our work aims to consider oceanic features such as ship wakes and ocean spills. In this incipient work, Haar and Cohen-Daubechies-Feauveau 9/7 wavelets obtain detailed descriptors from targets and ocean features and are inserted with other statistical parameters and wavelets into an oblique decision tree. © 2011 Springer-Verlag.
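As an illustration of how wavelet coefficients can become image descriptors, the sketch below applies one level of the 2D Haar transform to an image patch and summarises each sub-band with a single number. Only the Haar case is shown (not the Cohen-Daubechies-Feauveau 9/7 wavelet), and the function names, the choice of summary statistics, and the test patch are assumptions made for the example rather than details from the paper.

```python
import numpy as np


def haar2d_level1(image):
    """One-level orthonormal 2D Haar transform of an array with even dimensions.

    Returns (approx, col_diff, row_diff, diag), each half the size of the input.
    """
    x = np.asarray(image, dtype=float)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image must be 2D with even height and width")
    s = np.sqrt(2.0)
    # Pair neighbouring columns: sums (low-pass) and differences (high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Pair neighbouring rows of each result.
    approx = (lo[0::2] + lo[1::2]) / s      # local averages
    row_diff = (lo[0::2] - lo[1::2]) / s    # change between adjacent rows
    col_diff = (hi[0::2] + hi[1::2]) / s    # change between adjacent columns
    diag = (hi[0::2] - hi[1::2]) / s        # diagonal detail
    return approx, col_diff, row_diff, diag


def subband_descriptors(image):
    """Hypothetical feature vector: mean absolute detail per band plus approx variance."""
    approx, col_diff, row_diff, diag = haar2d_level1(image)
    details = [float(np.mean(np.abs(b))) for b in (col_diff, row_diff, diag)]
    return details + [float(np.var(approx))]


# Made-up 8x8 patch standing in for a SAR image chip.
patch = np.arange(64, dtype=float).reshape(8, 8)
print(subband_descriptors(patch))
```

A feature vector like this, possibly concatenated with other statistics, is the kind of input a decision tree classifier could consume.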

Relevance: 70.00%

Abstract:

The accuracy of medicine use information was compared for a telephone interview and a mail questionnaire, using an in-home medicine check as the standard of assessment. The validity of medicine use information varied by data source, level of specificity of data, and respondent characteristics. The mail questionnaire was the more valid source of overall medicine use information. Implications for both service providers and researchers are provided.

Relevance: 70.00%

Abstract:

Using an extensive network of occurrence records for 293 plant species collected over the past 40 years across a climatically diverse geographic section of western North America, we find that plant species distributions were just as likely to shift upward (i.e., towards higher elevations) as downward (i.e., towards lower elevations), despite consistent warming across the study area. Although there was no clear directional response to climate warming across the entire study area, there was significant region-to-region variation in responses (i.e., from as many as 73% to as few as 32% of species shifting upward or downward). To understand the factors that might be controlling region-specific distributional shifts, we explored the relationship between the direction of change in distribution limits and the nature of recent climate change. We found that the direction of distribution limit shifts was explained by an interaction between the rate of change in local summer temperatures and seasonal precipitation. Specifically, species shifted upward at their upper elevational limit when snowfall declined at slower rates and minimum temperatures increased. By contrast, species shifted upward at their lower elevational limit when maximum temperatures increased or both temperature and precipitation decreased. Our results suggest that future species' elevational distribution shifts will be complex, depending on the interaction between seasonal temperature and precipitation change.

Relevance: 70.00%

Abstract:

In 2008, the 50th anniversary of the IGY (International Geophysical Year), WDCMARE presents with this CD publication 3632 data sets in Open Access as part of the most important results from 73 cruises of the research vessel METEOR between 1964 and 1985. The archive is a coherently organized collection of published and unpublished data sets produced by scientists of all marine research disciplines who participated in Meteor expeditions, measured environmental parameters during cruises and investigated sample material after the cruises in the labs of the participating institutions. In most cases, the data was gathered from the Meteor Forschungsergebnisse, published by the Deutsche Forschungsgemeinschaft (DFG). A second important data source is the time series and radiosonde ascents from more than 20 years of ship's weather observations, which were provided by the Deutscher Wetterdienst, Hamburg. The final inclusion of all data in the PANGAEA information system ensures secure archiving, future updates, and widespread distribution in electronic, machine-readable form with long-term access via the Internet. To produce this publication, all data sets with metadata were extracted from PANGAEA and organized in a directory structure on a CD together with a search capability.

Relevance: 70.00%

Abstract:

Publishing Linked Data is a process that involves several design decisions and technologies. Although some initial guidelines have already been provided by Linked Data publishers, these are still far from covering all the steps that are necessary (from data source selection to publication) or giving enough details about all these steps, technologies, intermediate products, etc. Furthermore, given the variety of data sources from which Linked Data can be generated, we believe that it is possible to have a single and unified method for publishing Linked Data, but we should rely on different techniques, technologies and tools for particular datasets of a given domain. In this paper we present a general method for publishing Linked Data and the application of the method to cover different sources from different domains.

Relevance: 70.00%

Abstract:

Ontology-Based Data Access (OBDA) allows access to different kinds of data sources (traditionally databases) using a more abstract model provided by an ontology. Query rewriting uses such an ontology to rewrite a query into a rewritten query that can be evaluated on the data source. The rewritten queries retrieve the answers that are entailed by the combination of the data explicitly stored in the data source, the original query and the ontology. Because it works only on the queries, query rewriting enables OBDA over any data source that can be queried, regardless of whether that source can be modified. However, producing and evaluating the rewritten queries are both costly processes that generally become more complex as the expressiveness and size of the ontology and queries increase. In this thesis we explore several optimisations that can be performed both in the rewriting process and in the rewritten queries to improve the applicability of OBDA in realistic contexts. Our main technical contribution is a query rewriting system that implements the optimisations presented in this thesis. These optimisations are the core contributions of the thesis and fall into three groups:
- optimisations that can be applied when considering the predicates in the ontology that are actually mapped to the data sources.
- engineering optimisations that can be applied by handling the process of query rewriting in a way that reduces the computational load of the query generation process.
- optimisations that can be applied when considering additional metainformation about the characteristics of the ABox.
In this thesis we provide formal proofs of the correctness and completeness of the proposed optimisations, and an empirical evaluation of their impact. As an additional contribution, part of this empirical approach, we propose a benchmark for the evaluation of query rewriting systems. We also provide some guidelines for the creation and expansion of this kind of benchmark.

Relevance: 70.00%

Abstract:

An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared to a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. An application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations. Such databases can be naturally accommodated by the semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed.
A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over the existing interfaces. It is optimizable and parallelizable, and supports the definition of semantic user views and the interoperability of semantic databases with other data sources such as the World Wide Web, relational, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases.
The analysis and high-level design of a system are presented. The system exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use to allow uniform access to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files, and others via a common query interface. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system, providing an ODBC interface to the WWW as a data source, is discussed.