894 results for DATA INTEGRATION
Abstract:
This paper highlights the challenges of integrating satellite monitoring systems, in particular those based on a Grid platform, and reviews possible solutions to these problems. We describe integration issues at two levels: the data integration level and the task management level (job submission, in Grid terms). We show an example of the described technologies applied to the integration of the monitoring systems of Ukraine (National Space Agency of Ukraine, NASU) and Russia (Space Research Institute RAS, IKI RAN). Another example concerns the development of an InterGrid infrastructure that integrates several regional and national Grid systems: the Ukrainian Academician Grid (with its satellite data processing Grid segment) and the RSGS Grid (Chinese Academy of Sciences).
Abstract:
Background: Major Depressive Disorder (MDD) is among the most prevalent and disabling medical conditions worldwide. Identification of clinical and biological markers ("biomarkers") of treatment response could personalize clinical decisions and lead to better outcomes. This paper describes the aims, design, and methods of a discovery study of biomarkers in antidepressant treatment response, conducted by the Canadian Biomarker Integration Network in Depression (CAN-BIND). The CAN-BIND research program investigates and identifies biomarkers that help to predict outcomes in patients with MDD treated with antidepressant medication. The primary objective of this initial study (known as CAN-BIND-1) is to identify individual and integrated neuroimaging, electrophysiological, molecular, and clinical predictors of response to sequential antidepressant monotherapy and adjunctive therapy in MDD. Methods: CAN-BIND-1 is a multisite initiative involving 6 academic health centres working collaboratively with other universities and research centres. In the 16-week protocol, patients with MDD are treated with a first-line antidepressant (escitalopram 10-20 mg/d) that, if clinically warranted after eight weeks, is augmented with an evidence-based, add-on medication (aripiprazole 2-10 mg/d). Comprehensive datasets are obtained using clinical rating scales; behavioural, dimensional, and functioning/quality of life measures; neurocognitive testing; genomic, genetic, and proteomic profiling from blood samples; combined structural and functional magnetic resonance imaging; and electroencephalography. De-identified data from all sites are aggregated within a secure neuroinformatics platform for data integration, management, storage, and analyses. Statistical analyses will include multivariate and machine-learning techniques to identify predictors, moderators, and mediators of treatment response. 
Discussion: From June 2013 to February 2015, a cohort of 134 participants (85 outpatients with MDD and 49 healthy participants) was evaluated at baseline. The clinical characteristics of this cohort are similar to those reported in other studies of MDD. Recruitment at all sites is ongoing to a target sample of 290 participants. CAN-BIND will identify biomarkers of treatment response in MDD through extensive clinical, molecular, and imaging assessments, in order to improve treatment practice and clinical outcomes. It will also create an innovative, robust platform and database for future research. Trial registration: ClinicalTrials.gov identifier NCT01655706. Registered July 27, 2012.
Abstract:
An Automatic Vehicle Location (AVL) system is a computer-based vehicle tracking system that is capable of determining a vehicle's location in real time. As a major technology of the Advanced Public Transportation System (APTS), AVL systems have been widely deployed by transit agencies for purposes such as real-time operation monitoring, computer-aided dispatching, and arrival time prediction. AVL systems make available a large amount of transit performance data that are valuable for transit performance management and planning. However, the difficulty of extracting useful information from the huge spatial-temporal database has hindered off-line applications of the AVL data.
In this study, a data mining process comprising data integration, cluster analysis, and multiple regression is proposed. The AVL-generated data are first integrated into a Geographic Information System (GIS) platform. A model-based cluster method is then employed to investigate the spatial and temporal patterns of transit travel speeds, which can easily be translated into travel times. The transit speed variations along the route segments are identified. Transit service periods such as morning peak, mid-day, afternoon peak, and evening are determined from analyses of transit travel speed variations at different times of day. The seasonal patterns of transit performance are investigated using analysis of variance (ANOVA). Travel speed models based on the clustered time-of-day intervals are developed using the factors identified as having significant effects on speed in each time-of-day period.
It was found that transit performance varied across seasons and time-of-day periods. The geographic location of a transit route segment also plays a role in the variation of transit performance. The results of this research indicate that advanced data mining techniques have good potential to provide automated techniques for assisting transit agencies in service planning, scheduling, and operations control.
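As a rough illustration of two steps mentioned in the abstract above (translating segment speed into travel time, and testing for seasonal differences with ANOVA), the following is a minimal sketch in Python. It is not the study's code: the function names and any input data are hypothetical, and the ANOVA shown is the plain textbook one-way F statistic.

```python
from statistics import mean


def travel_time_minutes(length_km, speed_kmh):
    """Translate a segment speed into a travel time (minutes)."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    return 60.0 * length_km / speed_kmh


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic.

    `groups` is a list of samples, e.g. one list of observed
    segment speeds per season (hypothetical input layout).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (e.g. seasons)
    n = len(all_values)   # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value relative to the F distribution with (k - 1, n - k) degrees of freedom would suggest that mean speeds differ between the groups; computing the p-value is left out of this sketch.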
Abstract:
The mediator software architecture design was developed to provide data integration and retrieval in distributed, heterogeneous environments. Since the initial conceptualization of this architecture, many new technologies have emerged that can facilitate the implementation of this design. The purpose of this thesis was to show that a mediator framework supporting users of mobile devices could be implemented using common software technologies available today. In addition, the prototype was developed with a view to providing a better understanding of what a mediator is and to exposing issues that will have to be addressed in fuller, more robust designs. The prototype developed for this thesis was implemented using various technologies, including Java, XML, and Simple Object Access Protocol (SOAP), among others. SOAP was used to accomplish inter-process communication. In the end, it is expected that more data-intensive software applications will be possible in a world with ever-increasing demands for information.
Abstract:
Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. One of the main challenges in data integration is reconciling semantic differences among data sources. Approaches that have been used to solve this problem can be categorized as schema-based and attribute-based. Schema-based approaches use schema information to identify the semantic similarity in data; furthermore, they focus on reconciling types before reconciling attributes. In contrast, attribute-based approaches use statistical and structural information about attributes to identify the semantic similarity of data in different sources. This research examines an approach to semantic reconciliation based on integrating properties expressed at different levels of abstraction or granularity using the concept of property precedence. Property precedence reconciles the meaning of attributes by identifying similarities between attributes based on what these attributes represent in the real world. In order to use property precedence for semantic integration, we need to identify the precedence of attributes within and across data sources. The goal of this research is to develop and evaluate a method and algorithms that identify precedence relations among attributes and build a property precedence graph (PPG) that can be used to support integration.
Abstract:
The authors would like to thank the College of Life Sciences of Aberdeen University and Marine Scotland Science which funded CP's PhD project. Skate tagging experiments were undertaken as part of Scottish Government project SP004. We thank Ian Burrett for help in catching the fish and the other fishermen and anglers who returned tags. We thank José Manuel Gonzalez-Irusta for extracting and making available the environmental layers used as environmental covariates in the environmental suitability modelling procedure. We also thank Jason Matthiopoulos for insightful suggestions on habitat utilization metrics as well as Stephen C.F. Palmer, and three anonymous reviewers for useful suggestions to improve the clarity and quality of the manuscript.
Abstract:
Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.
Abstract:
Abstract: Decision support systems have been widely used for years in companies to gain insights from internal data and thus make successful decisions. Lately, thanks to the increasing availability of open data, these systems are also integrating open data to enrich the decision-making process with external data. On the other hand, within an open-data scenario, decision support systems can also be useful for deciding which data should be opened, considering not only technical or legal constraints but also other requirements, such as the "reuse potential" of the data. In this talk, we focus on both issues: (i) open data for decision making, and (ii) decision making for opening data. We will first briefly comment on some research problems regarding the use of open data for decision making. Then, we will outline a novel decision-making approach (based on how open data is actually being used in open-source projects hosted on GitHub) for supporting open data publication. Bio of the speaker: Jose-Norberto Mazón holds a PhD from the University of Alicante (Spain). He is head of the "Cátedra Telefónica" on Big Data and coordinator of the Computing degree at the University of Alicante. He is also a member of the WaKe research group at the University of Alicante. His research work focuses on open data management, data integration and business intelligence within "big data" scenarios, and their application to the tourism domain (smart tourism destinations). He has published his research in international journals, such as Decision Support Systems, Information Sciences, Data & Knowledge Engineering and ACM Transactions on the Web. Finally, he is involved in the open data project at the University of Alicante, including its open data portal at http://datos.ua.es
Abstract:
This paper discusses a framework in which catalog service communities are built, linked for interaction, and constantly monitored and adapted over time. A catalog service community (represented as a peer node in a peer-to-peer network) in our system can be viewed as a domain-specific data integration mediator representing the domain knowledge and the registry information. Query routing among communities is performed to identify a set of data sources that are relevant to answering a given query. The system monitors the interactions between the communities to discover patterns that may lead to restructuring of the network (e.g., irrelevant peers removed, new relationships created, etc.).
Abstract:
This paper presents a methodology for estimating average travel time on signalized urban networks by integrating cumulative plots and probe data. This integration aims to reduce the relative deviations in the cumulative plots caused by midlink sources and sinks. For undersaturated traffic conditions, the concept of a virtual probe is introduced, so that accurate travel time can be obtained even when a real probe is unavailable. For oversaturated traffic conditions, only one probe per travel time estimation interval (360 s, or 3% of vehicles traversing the link as a probe) has the potential to provide accurate travel time.
Abstract:
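To make the cumulative-plot idea concrete, here is a minimal Python sketch of the classical calculation, not the paper's method: it assumes first-in-first-out traffic and no midlink sources or sinks, so the n-th vehicle counted upstream is the n-th vehicle counted downstream. The function name and the timestamp inputs are hypothetical.

```python
def average_travel_time(upstream_times, downstream_times):
    """Average link travel time from two detector timestamp lists.

    Each list holds the passing times (seconds) of vehicles at the
    upstream and downstream ends of a link. Under the FIFO and
    vehicle-conservation assumptions, the horizontal distance between
    the two cumulative plots at count n is the travel time of the
    n-th vehicle.
    """
    ups = sorted(upstream_times)
    downs = sorted(downstream_times)
    n = min(len(ups), len(downs))  # only vehicles seen at both ends
    if n == 0:
        raise ValueError("need at least one vehicle at each detector")
    # zip stops at the shorter list, i.e. after n matched vehicles.
    return sum(d - u for u, d in zip(ups, downs)) / n
```

When vehicles enter or leave midlink, the n-th upstream vehicle no longer corresponds to the n-th downstream vehicle and this matching drifts, which is the deviation the abstract says the probe data are used to reduce.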
Clinical information systems have become important tools in contemporary clinical patient care. However, it is unclear whether current clinical information systems are able to effectively support clinicians in decision-making processes. We conducted a survey to identify some of the decision-making issues related to the use of existing clinical information systems. The survey was conducted among the end users of the cardiac surgery unit, quality and safety unit, intensive care unit and clinical costing unit at The Prince Charles Hospital (TPCH). Based on the survey results and the reviewed literature, it was found that support from the current information systems for decision making is limited. The survey results also showed that the majority of respondents considered lack of data integration to be one of the major issues, followed by limited access to various databases, lack of time, and lack of efficient reporting and analysis tools. Furthermore, respondents pointed out that data quality is an issue, and that the three major data quality issues being faced are lack of data completeness, lack of consistency and lack of data accuracy. Conclusion: Support from current clinical information systems for decision-making processes in cardiac surgery at this institution is limited, and this could be addressed by integrating isolated clinical information systems.
Abstract:
This paper introduces PartSS, a new partition-based filtering for tasks performing string comparisons under edit distance constraints. PartSS offers improvements over the state-of-the-art method NGPP with the implementation of a new partitioning scheme and also improves filtering abilities by exploiting theoretical results on shifting and scaling ranges, thus accelerating the rate of calculating edit distance between strings. PartSS filtering has been implemented within two major tasks of data integration: similarity join and approximate membership extraction under edit distance constraints. The evaluation on an extensive range of real-world datasets demonstrates major gain in efficiency over NGPP and QGrams approaches.
Abstract:
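For readers unfamiliar with partition-based filtering, the sketch below illustrates the general pigeonhole idea in Python. It is not PartSS or NGPP, and it ignores the shifting and scaling ranges mentioned in the abstract; all function names are hypothetical. The principle: if two strings are within edit distance k and one is cut into k + 1 disjoint parts, at least one part must survive unchanged as a substring of the other, so pairs with no shared part can be discarded before the costly edit distance computation.

```python
def partition(s, parts):
    """Split s into `parts` contiguous pieces of near-equal length."""
    base, extra = divmod(len(s), parts)
    out, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        out.append(s[start:start + size])
        start += size
    return out


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def within_distance(s, t, k):
    """True if edit_distance(s, t) <= k, using cheap filters first."""
    if abs(len(s) - len(t)) > k:      # length filter
        return False
    if not any(p in t for p in partition(s, k + 1)):
        return False                  # pigeonhole (partition) filter
    return edit_distance(s, t) <= k   # exact verification
```

The filters never reject a true match; they only remove some candidates early. This version checks each part anywhere in `t`, which is weaker than position-aware schemes but keeps the sketch short.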
We are pleased to present the papers from the Australasian Health Informatics and Knowledge Management (HIKM) conference stream held on 20 January 2011 in Perth as a session of the Australasian Computer Science Week (ACSW) 2011. HIKM was formerly named Health Data and Knowledge Management; however, the inclusion of the term "health informatics" is timely given the current health reform. The submissions to HIKM 2011 demonstrated that Australasian researchers lead, with many research and development innovations coming to fruition. Some of these innovations can be seen here, and we believe further recognition will be achieved through the continuation of HIKM in the future. The HIKM conference is a review of health informatics related research, development and education opportunities. The conference papers were written to communicate with other researchers and share research findings, capturing each and every aspect of the health informatics field. These aspects are: conceptual models and architectures, privacy and quality of health data, health workflow management, patient journey analysis, health information retrieval, analysis and visualisation, data integration/linking, systems for integrated or coordinated care, electronic health records (EHRs) and personally controlled electronic health records (PCEHRs), health data ontologies, and standardisation in health data and clinical applications.