994 results for Data source comparability


Relevance: 90.00%

Abstract:

Ontology-Based Data Access (OBDA) allows accessing different kinds of data sources (traditionally databases) using a more abstract model provided by an ontology. Query rewriting uses such an ontology to rewrite a query into a rewritten query that can be evaluated on the data source. The rewritten queries retrieve the answers that are entailed by the combination of the data explicitly stored in the data source, the original query and the ontology. Because it works only on the queries, query rewriting enables OBDA over any data source that can be queried, regardless of whether that source can be modified. However, producing and evaluating the rewritten queries are both costly processes that generally become more complex as the expressiveness and size of the ontology and queries increase. In this thesis we explore several optimisations that can be performed both in the rewriting process and on the rewritten queries to improve the applicability of OBDA in realistic settings. Our main technical contribution is a query rewriting system that implements the optimisations presented in this thesis. These optimisations are the core contributions of the thesis and fall into three groups:
- optimisations that can be applied when considering which predicates in the ontology are actually mapped to the data sources;
- engineering optimisations that handle the query rewriting process in a way that reduces the computational load of generating the rewritten queries;
- optimisations that can be applied when considering additional meta-information about the characteristics of the ABox.
In this thesis we provide formal proofs of the correctness and completeness of the proposed optimisations, and an empirical evaluation of their impact. As an additional contribution arising from this empirical work, we propose a benchmark for the evaluation of query rewriting systems. We also provide some guidelines for creating and extending benchmarks of this kind.
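As a rough illustration of query rewriting and of the first group of optimisations, the sketch below rewrites an atomic concept query using only subclass axioms (a simplifying assumption; real OBDA ontologies such as OWL 2 QL ones are more expressive) into a union of atomic queries, then drops disjuncts whose predicates have no mapping to the data source. It assumes a virtual setting where data reach the query only through mappings, so an unmapped disjunct can contribute no answers. The axioms, the `mapped` set, the table layout and all function names are hypothetical.

```python
from collections import deque

def rewrite(query_concept, subclass_axioms):
    """Return every concept whose instances are answers to `query_concept`.

    `subclass_axioms` holds (sub, sup) pairs read as "sub is subsumed by sup".
    The rewriting is the query concept plus all its direct and indirect
    subconcepts, found by breadth-first search over the axioms.
    """
    subs_of = {}
    for sub, sup in subclass_axioms:
        subs_of.setdefault(sup, []).append(sub)
    seen = {query_concept}
    queue = deque([query_concept])
    while queue:
        concept = queue.popleft()
        for sub in subs_of.get(concept, []):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return seen

def prune_unmapped(rewriting, mapped):
    """Drop disjuncts over predicates with no mapping: they can return no rows."""
    return {concept for concept in rewriting if concept in mapped}

def to_sql_union(concepts):
    """Render the rewriting as a UNION over hypothetical one-column tables."""
    return " UNION ".join(f"SELECT id FROM {c}" for c in sorted(concepts))

axioms = [("Professor", "Teacher"), ("Lecturer", "Teacher"),
          ("FullProfessor", "Professor")]
mapped = {"Professor", "Lecturer"}  # hypothetical: only these are mapped
full = rewrite("Teacher", axioms)   # 4 disjuncts before pruning
print(to_sql_union(prune_unmapped(full, mapped)))
# SELECT id FROM Lecturer UNION SELECT id FROM Professor
```

Pruning shrinks the query from four disjuncts to two without changing its answers, which is the kind of saving the first group of optimisations targets.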

Relevance: 90.00%

Abstract:

An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database and the top-level architecture of the database engine are defined. A new benchmarking technique is proposed that allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB is highly efficient compared with a relational database on a certain class of database applications. A new semantic benchmark is designed that allows evaluation of the performance of the features characteristic of semantic database applications. The application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritance and many-to-many relations. Such databases can be naturally accommodated by the semantic model. No fixed, predefined implementation is enforced, allowing the database designer to choose the most efficient structures available in the DBMS being tested. The results of the benchmark are analyzed.
A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface and has several advantages over existing interfaces. It is optimizable and parallelizable, and supports the definition of semantic user views and the interoperability of semantic databases with other data sources such as the World Wide Web, relational databases and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and exploits the rich semantics and inherent ergonomics of semantic databases.
The analysis and high-level design of a system are presented that exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use to provide uniform access to heterogeneous data sources, such as semantic databases, relational databases, web sites and ASCII files, via a common query interface. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system that provides an ODBC interface to the WWW as a data source is discussed.
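To make the idea of a query whose result is itself a mini-database more concrete, here is a toy in-memory model of binary facts. The `BinaryStore` class and its methods are hypothetical and only mimic the flavour of the Semantic Binary Model (categories plus binary relations); they are not Sem-ODB's interface, and the query model described above (schema graphs with interlinking conditionals) is far richer than this single-condition filter.

```python
from dataclasses import dataclass, field

@dataclass
class BinaryStore:
    """Toy semantic binary store (hypothetical, not the Sem-ODB API).

    Objects belong to categories, and every fact is a binary relation
    (subject, relation, value). Absent facts are simply not stored, so
    sparse data needs no NULL columns, and a relation may link one
    object to many values (many-to-many).
    """
    categories: dict = field(default_factory=dict)  # category -> set of objects
    facts: set = field(default_factory=set)         # {(subject, relation, value)}

    def add(self, category, obj):
        self.categories.setdefault(category, set()).add(obj)

    def relate(self, subject, relation, value):
        self.facts.add((subject, relation, value))

    def query(self, category, relation, condition):
        """Keep objects of `category` with a `relation` value meeting `condition`.

        The result is itself a BinaryStore, so it can be queried in the
        same way as the original store.
        """
        result = BinaryStore()
        members = self.categories.get(category, set())
        for subject, rel, value in self.facts:
            if rel == relation and condition(value) and subject in members:
                result.add(category, subject)
                result.relate(subject, rel, value)
        return result

store = BinaryStore()
for student in ("alice", "bob"):
    store.add("Student", student)
store.relate("alice", "enrolled_in", "Databases")
store.relate("alice", "enrolled_in", "AI")
store.relate("bob", "enrolled_in", "AI")

db_students = store.query("Student", "enrolled_in", lambda c: c == "Databases")
print(db_students.categories)  # {'Student': {'alice'}}
```

Because `query` returns the same type it is called on, results compose: a query can be run against the result of another query.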

Relevance: 90.00%

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple data sources is used appropriately and effectively, knowledge discovery can be achieved better than is possible from a single source.
Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements and much more; they motivate building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.
The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.
The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.
The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM, focuses on gene regulation in Plasmodium falciparum, contains diverse information and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.
The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
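The abstract does not give the details of GCC, so the sketch below shows a different, well-known way of adapting K-means to constraints: COP-KMeans, where pairwise must-link and cannot-link constraints are enforced during the assignment step. It illustrates the general idea of "K-means plus constraints" only and should not be read as the GCC algorithm. The points, constraints and the fixed iteration cap are illustrative assumptions.

```python
import math
import random

def violates(i, cluster, assignment, must_link, cannot_link):
    """True if putting point i in `cluster` breaks a constraint with an
    already-assigned point (unassigned partners are ignored for now)."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignment.get(other) not in (None, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignment.get(other) == cluster:
            return True
    return False

def cop_kmeans(points, k, must_link, cannot_link, iters=20, seed=0):
    """K-means whose assignment step never violates pairwise constraints.
    Returns {point index: cluster} or None if some point cannot be placed."""
    centres = random.Random(seed).sample(points, k)
    assignment = {}
    for _ in range(iters):
        assignment = {}
        for i, p in enumerate(points):
            # try clusters from nearest to farthest, keep the first feasible one
            order = sorted(range(k), key=lambda c: math.dist(p, centres[c]))
            for c in order:
                if not violates(i, c, assignment, must_link, cannot_link):
                    assignment[i] = c
                    break
            else:
                return None
        # update step: each centre moves to the mean of its members
        new_centres = []
        for c in range(k):
            members = [points[i] for i, ci in assignment.items() if ci == c]
            if members:
                new_centres.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centres.append(centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return assignment

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(cop_kmeans(points, 2, must_link=[(0, 1)], cannot_link=[(1, 2)]))
```

Returning `None` when a point has no feasible cluster mirrors the original COP-KMeans behaviour; a softer variant could instead penalise violations, which may suit noisy constraints extracted from incomplete data.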

Relevance: 90.00%

Abstract:

This research investigates the claim that Change Data Capture (CDC) technologies capture data changes in real time. Based on theory, our hypothesis states that real-time CDC is not achievable with traditional approaches (log scanning, triggers and timestamps). Traditional approaches to CDC require a resource to be polled, which prevents true real-time CDC. We propose an approach to CDC that encapsulates the data source with a set of web services. These web services propagate the changes to the targets and eliminate the need for polling. Additionally, we propose a framework for CDC technologies that allows changes to flow from source to target. This paper discusses current CDC technologies and presents the theory of why they are unable to deliver changes in real time. We then discuss our web service approach to CDC and the accompanying framework, explaining how they can produce real-time CDC. The paper concludes with a discussion of the research required to investigate the real-time capabilities of CDC technologies. © 2010 IEEE.
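A minimal sketch of the contrast drawn above, using in-memory stand-ins rather than real web services (the class names are hypothetical): with version-stamped, timestamp-style capture a consumer learns about a change only at its next poll, whereas routing every write through a wrapper lets the change be pushed to targets at write time. A version counter replaces wall-clock timestamps to keep the example deterministic.

```python
import itertools

class PolledTable:
    """Version-stamped table: consumers find changes only when they poll."""
    def __init__(self):
        self.rows = {}                      # key -> (value, version)
        self._versions = itertools.count(1)

    def write(self, key, value):
        self.rows[key] = (value, next(self._versions))

    def changes_since(self, watermark):
        """Return rows changed after `watermark` and the new watermark."""
        changed = {k: v for k, (v, ver) in self.rows.items() if ver > watermark}
        latest = max((ver for _, ver in self.rows.values()), default=watermark)
        return changed, latest


class ServiceWrappedTable:
    """Source wrapped by a (hypothetical) service layer that pushes changes."""
    def __init__(self):
        self.rows = {}
        self.subscribers = []               # callables taking (key, value)

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def write(self, key, value):
        self.rows[key] = value
        for notify in self.subscribers:     # propagate at write time, no polling
            notify(key, value)


polled = PolledTable()
polled.write("order-1", "shipped")
changes, watermark = polled.changes_since(0)   # seen only when this poll runs

received = []
pushed = ServiceWrappedTable()
pushed.subscribe(lambda k, v: received.append((k, v)))
pushed.write("order-1", "shipped")             # target notified immediately
print(changes, received)
```

In the polled design the delay before a target sees a change is bounded below by the polling interval; in the wrapped design it is the cost of the notification itself, which is the property the proposed approach relies on.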

Relevance: 90.00%

Abstract:

The data set consists of maps of total velocity of surface currents in the Ibiza Channel, derived from HF radar measurements.
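HF radar stations measure only the radial component of the surface current along their line of sight, so total velocity maps are usually produced by combining radials from two or more stations. The record does not describe the method used for this data set; the sketch below only illustrates the simplest two-station case, solving u cos θ_i + v sin θ_i = r_i for the east (u) and north (v) components. The angle convention (degrees counter-clockwise from east), the 15° geometry threshold and the function name are assumptions.

```python
import math

def total_velocity(r1, theta1_deg, r2, theta2_deg, min_angle_deg=15.0):
    """Combine two radial speeds into a total (u, v) current vector.

    r_i is the radial speed seen by station i along the unit vector at
    angle theta_i (degrees counter-clockwise from east, an assumed
    convention). Solves u*cos(t_i) + v*sin(t_i) = r_i by Cramer's rule.
    """
    t1, t2 = math.radians(theta1_deg), math.radians(theta2_deg)
    det = math.sin(t2 - t1)
    # nearly parallel (or anti-parallel) look directions make the system ill-conditioned
    if abs(det) < math.sin(math.radians(min_angle_deg)):
        raise ValueError("look directions too close to parallel")
    u = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    v = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return u, v

true_u, true_v = 0.30, -0.20                        # m/s, synthetic current
r1 = true_u * math.cos(math.radians(30)) + true_v * math.sin(math.radians(30))
r2 = true_u * math.cos(math.radians(120)) + true_v * math.sin(math.radians(120))
print(total_velocity(r1, 30, r2, 120))              # approximately (0.30, -0.20)
```

With more than two stations the same equations become an overdetermined system, usually solved by least squares.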

Relevance: 90.00%

Abstract:

One of the most challenging tasks underlying many hyperspectral imagery applications is spectral unmixing, which decomposes a mixed pixel into a collection of reflectance spectra, called endmember signatures, and their corresponding fractional abundances. Independent Component Analysis (ICA) has recently been proposed as a tool to unmix hyperspectral data. The basic goal of ICA is to find a linear transformation that recovers independent sources (abundance fractions) given only sensor observations that are unknown linear mixtures of the unobserved independent sources. In hyperspectral imagery, the sum of the abundance fractions associated with each pixel is constant due to physical constraints in the data acquisition process. Thus, the sources cannot be independent. This paper addresses hyperspectral data source dependence and its impact on ICA performance. The study considers simulated and real data. In the simulated scenarios, hyperspectral observations are described by a generative model that takes into account the degradation mechanisms normally found in hyperspectral applications. We conclude that ICA does not correctly unmix all sources. This conclusion is based on a study of the mutual information. Nevertheless, some sources may be well separated, particularly when the number of sources is large and the signal-to-noise ratio (SNR) is high.
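The dependence argument can be checked numerically. Assuming abundance fractions drawn from a Dirichlet distribution (a common choice for sum-to-one quantities, not necessarily the paper's generative model), any two fractions are negatively correlated; for Dirichlet(1, 1, 1) the theoretical correlation between two components is -0.5, so the independence assumption behind ICA cannot hold. The sample size and seed are arbitrary.

```python
import random

def sample_abundances(alphas, rng):
    """One abundance vector from Dirichlet(alphas), via normalised gamma draws.

    The entries are non-negative and sum to one, like abundance fractions.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)
pixels = [sample_abundances([1.0, 1.0, 1.0], rng) for _ in range(5000)]
a1 = [p[0] for p in pixels]
a2 = [p[1] for p in pixels]
# theory gives -0.5 for Dirichlet(1, 1, 1); the sample value should be close
print(round(correlation(a1, a2), 2))
```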

Relevance: 80.00%

Abstract:

Objective: To examine the impact on dental utilisation of the introduction of a participating provider scheme (Regional and Rural Oral Health Program (RROHP)). In this model, dentists receive higher third-party payments from a private health insurance fund for delivering an agreed range of preventive and diagnostic benefits at no out-of-pocket cost to insured patients. Data source/Study setting: Hospitals Contribution Fund of Australia (HCF) dental claims for all members resident in New South Wales over the six financial years from 1998/1999 to 2003/2004. Study design: This cohort study involves before-and-after analyses of dental claims experience over a six-year period for approximately 81,000 individuals in the intervention group (HCF members resident in regional and rural New South Wales, Australia) and 267,000 in the control group (HCF members resident in the Sydney area). Only claims for individuals who were members of HCF at 31 December 1997 were included. The analysis groups claims into the three years before the establishment of the RROHP and the three years after its implementation. Data collection/Extraction methods: The analysis is based on all claims submitted by users of services for visits between 1 July 1998 and 30 June 2004. In these data, approximately 1,000,000 services were provided to the intervention group and approximately 4,900,000 to the control group. Principal findings: Using Statistical Process Control (SPC) charts, special cause variation was identified in the total utilisation rate of private dental services in the intervention group after implementation. No such variation was present in the control group. On average, in the three years after implementation of the program, the utilisation rate of dental services by regional and rural residents of New South Wales who were members of HCF grew by 12.6%, more than eight times the growth rate of 1.5% observed in the control group (HCF members who were Sydney residents).
The differences were even more pronounced in the areas of service that were the focus of the program: diagnostic and preventive services. Conclusion: The implementation of a benefit design change, a participating provider scheme, that involved the removal of co-payments on a defined range of preventive and diagnostic dental services, combined with the establishment and promotion of a network of dentists, appears to have had a marked impact on HCF members' utilisation of dental services in regional and rural New South Wales, Australia.
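The abstract does not say which SPC chart was used. As one common option, an individuals (XmR) chart sets control limits at the mean ± 3σ, with σ estimated from the average moving range divided by the constant d2 = 1.128; points beyond the limits signal special cause variation (out-of-limit points are only one of several standard special-cause rules). The sketch below assumes that chart type and uses made-up utilisation rates, not data from the study.

```python
def xmr_limits(values):
    """Lower limit, centre line and upper limit of an individuals (XmR) chart.

    Sigma is estimated as the mean moving range divided by d2 = 1.128.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = sum(values) / len(values)
    sigma = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return centre - 3 * sigma, centre, centre + 3 * sigma

def special_cause_points(baseline, monitored):
    """Indices of monitored values outside limits set from the baseline period."""
    lcl, _, ucl = xmr_limits(baseline)
    return [i for i, v in enumerate(monitored) if v < lcl or v > ucl]

# hypothetical quarterly utilisation rates (services per 100 members)
before = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11]
after = [11, 12, 15, 16]
print(special_cause_points(before, after))  # [2, 3]
```

Fixing the limits on the pre-implementation period and then plotting post-implementation values against them is what makes a shift attributable to the intervention visible.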

Relevance: 80.00%

Abstract:

Children’s drawings provide rich qualitative data (Walker, 2008) and “valuable information for the assessment of children's environmental perceptions” (Barraza, 1999, p. 49). They are the primary data source being used to re-imagine school from a student perspective (Schratz & Steiner-Löffler, 1998) in a research project being carried out with primary school students in Queensland, Australia. This paper will report on the progress of this project, which addresses a mostly unmet need for students’ perspectives to be included in school design (Rudduck & Flutter, 2004). Grade 5/6 students in a number of primary schools have been invited to submit annotated drawings with up to 200 words of text illustrating their ideal educational spaces. Using purpose-designed analytical tools, the submissions will be compared across student backgrounds and school types to obtain a better understanding of the needs and educational desires of young people in relation to changing learning environments. The findings will inform consideration of the design and use of educational spaces, with all work exhibited through a dedicated website. The term ‘educational spaces’ avoids restrictive notions of what the concept of ‘school’ means, referring to any real or virtual space in which teaching and learning may occur or, as Ferguson and Seddon (2007) have referred to it, “the shifting imagery of education” that includes red-brick schools and dispersed learning networks. The theoretical framework for this study is grounded in the work of Greene (1995) and Wright-Mills (2001), who cited the deployment of critical and empathic imagination in addressing education reform.

Relevance: 80.00%

Abstract:

We investigate whether there was a causal effect of income changes on the health satisfaction of East and West Germans in the years following reunification. Our data source is the German Socio-Economic Panel (GSOEP) between 1984 and 2002; we fit a recently proposed fixed-effects ordinal estimator to our health measures and use a causal decomposition technique to account for panel attrition. We find evidence of a significant positive effect of income changes on health satisfaction, but the quantitative size of this effect is small. This holds for both current income and a measure of ‘permanent’ income.
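The study's fixed-effects ordinal estimator is not reproduced here. As a simpler illustration of what fixed effects buy, the linear within estimator below demeans income and the outcome separately for each person, which removes any time-invariant personal trait before the slope is estimated; a pooled regression that ignores person identity can give a very different answer when such traits are correlated with income. The panel rows, the linear model and the function names are hypothetical.

```python
from collections import defaultdict

def within_slope(panel):
    """Fixed-effects (within) slope for y = a_i + b*x + e.

    `panel` is a list of (person_id, x, y) rows; demeaning within each
    person removes the time-invariant effect a_i before estimating b.
    """
    by_person = defaultdict(list)
    for pid, x, y in panel:
        by_person[pid].append((x, y))
    num = den = 0.0
    for rows in by_person.values():
        mx = sum(x for x, _ in rows) / len(rows)
        my = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            num += (x - mx) * (y - my)
            den += (x - mx) ** 2
    return num / den

def pooled_slope(panel):
    """Ordinary pooled OLS slope that ignores who each row belongs to."""
    xs = [x for _, x, _ in panel]
    ys = [y for _, _, y in panel]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# synthetic panel: y = a_i + 0.5 * x, with a_i = 3 for p1 and -1 for p2
panel = [("p1", 1, 3.5), ("p1", 2, 4.0), ("p1", 3, 4.5),
         ("p2", 10, 4.0), ("p2", 12, 5.0)]
print(within_slope(panel), round(pooled_slope(panel), 3))
```

Here the within estimator recovers the true slope of 0.5, while the pooled slope is pulled far below it because the higher-income person has a lower baseline.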

Relevance: 80.00%

Abstract:

Background: Socioeconomically disadvantaged adults in developed countries experience a higher prevalence of a number of chronic diseases, such as cardiovascular disease, type 2 diabetes, osteoarthritis and some forms of cancer. Overweight and obesity are major risk factors for these diseases. Lower socioeconomic groups have a greater prevalence of overweight and obesity, and this may contribute to their higher morbidity and mortality. International studies suggest that socioeconomic groups may differ in their self-perceptions of weight status and their engagement in weight-control behaviours (WCBs). Research has shown that lower socioeconomic adults are more likely to underestimate their weight status and are less likely to engage in WCBs. This may contribute, in part, to the marked inequalities in weight status observed at the population level. Few Australian studies, and those somewhat limited, have examined the types of weight-control strategies people adopt, the barriers to their weight control, and the determinants of their perceived weight status and WCBs. Furthermore, there are no known Australian studies that have examined socioeconomic differences in these factors to better understand the reasons for socioeconomic inequalities in weight status. Hence, the overall aim of this Thesis is to examine why socioeconomically disadvantaged groups experience a greater prevalence of overweight and obesity than their more advantaged counterparts. Methods: This Thesis used data from two sources. Men and women aged 45 to 60 years were examined in both data sources. First, data from the longitudinal Australian Diabetes, Obesity and Lifestyle (AusDiab) Study were used to advance our knowledge and understanding of socioeconomic differences in weight change, perceived weight status and WCBs. A total of 2753 participants with measured weights at both baseline (1999-2000) and follow-up (2004-2005) were included in the analyses.
Percent weight change over the five-year interval was calculated, and perceived weight status, WCBs and highest attained education were collected at baseline. Second, the Candidate conducted a postal questionnaire survey of 1013 Brisbane residents (69.8% response rate) to investigate the relationship between socioeconomic position, determinants of perceived weight status, WCBs, and barriers to and reasons for weight control. A test-retest reliability study was conducted to determine the reliability of the new measures used in the questionnaire. Most new measures had substantial to almost perfect reliability, whether assessed by the kappa coefficient or by crude agreement. Results: The findings from the AusDiab Study (accepted for publication in the Australian and New Zealand Journal of Public Health) showed that low-educated men and women were more likely to be obese at baseline than their high-educated counterparts (O.R. = 1.97, 95% C.I. = 1.30-2.98 and O.R. = 1.52, 95% C.I. = 1.03-2.25, respectively). Over the five-year follow-up period (1999-2000 to 2004-05) there were no socioeconomic differences in weight change among men; however, socioeconomically disadvantaged women had greater weight gains. Participants perceiving themselves as overweight gained less weight than those who saw themselves as underweight or normal weight. There was no relationship between engaging in WCBs and five-year weight change. The postal questionnaire data showed that socioeconomically disadvantaged groups were less likely to engage in WCBs. If they did engage in weight control, they were less likely to adopt exercise strategies, including moderate and vigorous physical activity, but were more likely to decrease their sitting time to control their weight.
Socioeconomically disadvantaged adults reported more barriers to weight control, such as perceiving weight loss as expensive, as requiring a lot of cooking skill, as not being a high priority, and as involving eating differently from other people in the household. These results have been accepted for publication in Public Health Nutrition. The third manuscript (under review in Social Science and Medicine) examined socioeconomic differences in determinants of perceived weight status and reasons for weight control. The results showed that lower socioeconomic adults were more likely to give the following reasons for weight control: considering themselves too heavy, occupational requirements, and recommendations from their doctor, family members or friends. Conversely, high-income adults were more likely than those on lower incomes to report controlling their weight to improve their physical condition or to look more attractive. There were few socioeconomic differences in the determinants of perceived weight status. Conclusions: Educational inequalities in overweight/obesity among men and women may be due to misperceptions of weight status; overweight or obese individuals in low-educated groups may not perceive their weight as problematic and therefore may not pay attention to their energy-balance behaviours. Socioeconomic groups differ in their WCBs, and in their reasons for and perceived barriers to weight control. Health promotion programs should encourage weight control among lower socioeconomic groups. More specifically, they should encourage engagement in physical activity or exercise and in dietary strategies among disadvantaged groups. Furthermore, such programs should address potential barriers to weight control that disadvantaged groups may encounter. For example, disadvantaged groups perceive weight control as expensive, as requiring cooking skills, as not a high priority, and as involving eating differently from other people in the household.
Lastly, health promotion programs and policies aimed at reducing overweight and obesity should be tailored to the different reasons and motivations for weight control experienced by different socioeconomic groups. Weight-control interventions targeted at higher socioeconomic groups should use improving physical condition and attractiveness as motivational goals, whereas utilising social support may be more effective for encouraging weight control among lower socioeconomic groups.
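The test-retest reliability step can be illustrated with the two statistics the abstract names. Crude agreement is the share of respondents who answer identically both times; Cohen's kappa corrects it for chance agreement as (p_o - p_e) / (1 - p_e), and the labels "substantial" (0.61-0.80) and "almost perfect" (0.81-1.00) follow the conventional Landis and Koch bands. The answers below are hypothetical, not data from the questionnaire, and the function names are illustrative.

```python
from collections import Counter

def crude_agreement(test, retest):
    """Share of respondents giving the same answer on both occasions."""
    return sum(a == b for a, b in zip(test, retest)) / len(test)

def cohens_kappa(test, retest):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(test)
    p_o = crude_agreement(test, retest)
    c1, c2 = Counter(test), Counter(retest)
    p_e = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def landis_koch(kappa):
    """Conventional Landis and Koch labels for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

test = ["yes"] * 5 + ["no"] * 5      # hypothetical first administration
retest = ["yes"] * 4 + ["no"] * 6    # same respondents, second administration
print(crude_agreement(test, retest), round(cohens_kappa(test, retest), 2))  # 0.9 0.8
```

Reporting both figures is useful because crude agreement can look high simply when one answer dominates, while kappa discounts that chance agreement.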

Relevance: 80.00%

Abstract:

This article centres on a research project in which freehand drawings provided a richly creative and colourful data source of children’s imagined, ideal learning environments. Issues concerning the analysis of the visual data are discussed, in particular how imaginative content was analysed and how the analytical process was dependent on an accompanying, secondary data source comprising brief, explanatory written texts.

Relevance: 80.00%

Abstract:

This report is the eighth deliverable of the Real Time and Predictive Traveller Information project and the third deliverable of the Arterial Travel Time Information sub-project in the Integrated Traveller Information research Domain of the Smart Transport Research Centre. The primary objective of the Arterial Travel Time Information sub-project is to develop algorithms for real-time travel time estimation and prediction models for arterial traffic. Brisbane's arterial network is extensively equipped with Bluetooth MAC Scanners, which can provide travel time information. The literature offers limited knowledge of the Bluetooth-protocol-based data acquisition process and of the accuracy and reliability of analyses performed using the data. This report expands the body of knowledge surrounding the use of data from Bluetooth MAC Scanners (BMS) as a complementary traffic data source. A multi-layer simulation model named Traffic and Communication Simulation (TCS) is developed. TCS is used to model the theoretical properties of the BMS data and to analyse the accuracy and reliability of travel time estimation using the BMS data.
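The basic idea behind BMS travel times is to match the same (anonymised) device address at an upstream and a downstream scanner and take the time difference. A real BMS records several detections per device while it is inside a scanner's zone, and choosing which one to use is part of the accuracy question the report studies; this sketch simplifies to exactly one detection per device per scanner. The 1800-second cap, the median aggregation and the function names are assumptions.

```python
from statistics import median

def match_travel_times(upstream, downstream, max_tt=1800):
    """Travel-time samples (seconds) for devices seen at both scanners.

    `upstream` and `downstream` map an anonymised MAC address to a single
    detection time in seconds. Pairs with non-positive or implausibly long
    differences (more than `max_tt`) are discarded.
    """
    samples = []
    for mac, t_up in upstream.items():
        t_down = downstream.get(mac)
        if t_down is not None and 0 < t_down - t_up <= max_tt:
            samples.append(t_down - t_up)
    return samples

def estimate_travel_time(samples):
    """Median travel time; less sensitive than the mean to stops en route."""
    return median(samples) if samples else None

up = {"a": 0, "b": 10, "c": 20, "d": 30}
down = {"a": 120, "b": 140, "c": 900, "e": 50}
samples = match_travel_times(up, down)
print(samples, estimate_travel_time(samples))  # [120, 130, 880] 130
```

Device "c" illustrates why a robust statistic helps: a single vehicle that stops between scanners yields a long sample that would inflate a mean estimate.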

Relevance: 80.00%

Abstract:

The empirical analysis employs individual-level data from the Australian Health Survey combined with retrospective data on tobacco prices matched to the age at which each individual started and quit smoking. Split-population hazard models are estimated for both starting and quitting smoking. The analysis suggests that price plays a significant role in the decision to start smoking but not in the decision to quit. Further sensitivity analyses of different age groups and of an alternative data source question the robustness of the significant role of price in the smoking initiation decision. From a policy perspective, the results indicate that increases in tobacco taxation can be an important instrument in reducing the incidence of smoking, but should be combined with other mechanisms such as mandating smoke-free environments and anti-smoking education. Our results strongly support the targeting of anti-smoking campaigns towards teenagers.
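In a split-population (or "cure-rate") duration model, only a fraction δ of individuals would ever experience the event; the rest are never at risk. With an exponential hazard λ for the eventual starters, an observed start at time t contributes δλe^(-λt) to the likelihood, and a right-censored observation contributes 1 - δ + δe^(-λt). The sketch below assumes no covariates and an exponential baseline, which is much simpler than the study's models (which include tobacco price), and fits the two parameters by a crude grid search; the function names and grid ranges are hypothetical.

```python
import math

def split_population_loglik(delta, lam, durations, started):
    """Log-likelihood of an exponential split-population duration model.

    delta: probability that an individual ever starts (0 < delta < 1).
    lam:   hazard rate of starting among eventual starters.
    durations[i]: time at risk; started[i]: True if the start was observed,
    False if the observation is right-censored.
    """
    ll = 0.0
    for t, event in zip(durations, started):
        if event:
            ll += math.log(delta * lam) - lam * t
        else:
            ll += math.log(1.0 - delta + delta * math.exp(-lam * t))
    return ll

def fit_grid(durations, started, steps=49):
    """Crude maximum-likelihood fit by grid search over (delta, lam)."""
    best = None
    for i in range(1, steps + 1):
        delta = i / (steps + 1)                 # 0.02 .. 0.98
        for j in range(1, steps + 1):
            lam = j / 25                        # 0.04 .. 1.96 with the default steps
            ll = split_population_loglik(delta, lam, durations, started)
            if best is None or ll > best[0]:
                best = (ll, delta, lam)
    return best[1], best[2]

# deterministic illustration: 40 starters at exponential(0.3) quantiles,
# 60 people still not started when observation stops at t = 20
starts = [-math.log(1 - (k + 0.5) / 40) / 0.3 for k in range(40)]
durations = starts + [20.0] * 60
started = [True] * 40 + [False] * 60
print(fit_grid(durations, started))
```

Unlike a standard hazard model, which would eventually predict that everyone starts, the split-population form lets a long tail of censored never-smokers be explained by δ < 1 rather than by a tiny hazard.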

Relevance: 80.00%

Abstract:

Variations in the treatment of patients with similar symptoms across different hospitals substantially affect the quality and costs of healthcare. Consequently, it is important to understand the similarities and differences between practices across hospitals. This paper presents a case study on the application of process mining techniques to measure and quantify the differences in the treatment of patients presenting with chest pain symptoms across four South Australian hospitals. Our case study focuses on cross-organisational benchmarking of processes and their performance. Techniques such as clustering, process discovery, performance analysis and scientific workflows were applied to facilitate such comparative analyses. Lessons learned in overcoming unique challenges in cross-organisational process mining, such as ensuring population comparability, data granularity comparability and experimental repeatability, are also presented.
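The abstract lists process discovery among the techniques but not the specific algorithms. A basic building block of many discovery methods is the directly-follows graph: for each pair of activities (a, b), count how often b immediately follows a within a patient's case. Normalising the counts by log size is one simple way to compare hospitals with different patient volumes. The activity names, case ids and function names below are hypothetical.

```python
from collections import Counter

def directly_follows(event_log):
    """Count pairs (a, b) where activity b immediately follows a in a case.

    `event_log` maps a case id to its time-ordered list of activity names.
    """
    counts = Counter()
    for trace in event_log.values():
        counts.update(zip(trace, trace[1:]))
    return counts

def dfg_difference(log_a, log_b):
    """Difference in relative frequency of each directly-follows pair,
    so that hospitals with different patient volumes can be compared."""
    dfg_a, dfg_b = directly_follows(log_a), directly_follows(log_b)
    total_a = sum(dfg_a.values()) or 1
    total_b = sum(dfg_b.values()) or 1
    return {pair: dfg_a[pair] / total_a - dfg_b[pair] / total_b
            for pair in set(dfg_a) | set(dfg_b)}

hospital_a = {"p1": ["Triage", "ECG", "Troponin", "Discharge"],
              "p2": ["Triage", "ECG", "Discharge"]}
hospital_b = {"q1": ["Triage", "Troponin", "ECG", "Admit"]}
diff = dfg_difference(hospital_a, hospital_b)
print(diff[("Triage", "ECG")])  # 0.4
```

Such comparisons are only meaningful if both logs record activities at the same granularity, which is one of the comparability challenges the abstract mentions.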