889 results for heterogeneous data sources


Relevance: 90.00%

Abstract:

Mixture models are a flexible tool for unsupervised clustering that has found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered.
The first source of data concerns symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS), and its analysis constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real-time neural activity in the brain.
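
The idea of uncertainty in a patient's cluster membership can be illustrated with a minimal sketch. It assumes a two-component univariate Gaussian mixture whose weights, means and standard deviations are known and made up for illustration (none of these values come from the thesis), and applies Bayes' rule to obtain the probability that an observation belongs to each component. A full Bayesian treatment would also place priors on the parameters themselves.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, weights, means, sds):
    """Posterior probability of each mixture component given observation x.

    The mixture parameters are treated as known (an assumption made
    purely for illustration).
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-cluster mixture on a symptom score (illustrative values).
weights = [0.5, 0.5]
means = [10.0, 30.0]
sds = [5.0, 5.0]

for score in (10.0, 20.0, 30.0):
    probs = membership_probabilities(score, weights, means, sds)
    print(score, [round(p, 3) for p in probs])
```

A score midway between the two component means is assigned to either cluster with equal probability, which is the kind of membership uncertainty a hard assignment would hide.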

Relevance: 90.00%

Abstract:

The National Road Safety Strategy 2011-2020 outlines plans to reduce the burden of road trauma via improvements and interventions relating to safe roads, safe speeds, safe vehicles, and safe people. It also highlights that a key aspect in achieving these goals is the availability of comprehensive data on the issue. The use of data is essential so that more in-depth epidemiologic studies of risk can be conducted as well as to allow effective evaluation of road safety interventions and programs. Before utilising data to evaluate the efficacy of prevention programs it is important for a systematic evaluation of the quality of underlying data sources to be undertaken to ensure any trends which are identified reflect true estimates rather than spurious data effects. However, there has been little scientific work specifically focused on establishing core data quality characteristics pertinent to the road safety field and limited work undertaken to develop methods for evaluating data sources according to these core characteristics. There are a variety of data sources in which traffic-related incidents and resulting injuries are recorded, which are collected for a variety of defined purposes. These include police reports, transport safety databases, emergency department data, hospital morbidity data and mortality data to name a few. However, as these data are collected for specific purposes, each of these data sources suffers from some limitations when seeking to gain a complete picture of the problem. Limitations of current data sources include: delays in data being available, lack of accurate and/or specific location information, and an underreporting of crashes involving particular road user groups such as cyclists. This paper proposes core data quality characteristics that could be used to systematically assess road crash data sources to provide a standardised approach for evaluating data quality in the road safety field. 
The potential for data linkage to qualitatively and quantitatively improve the quality and comprehensiveness of road crash data is also discussed.

Relevance: 90.00%

Abstract:

Background: Kallikrein 15 (KLK15)/Prostinogen is a plausible candidate for prostate cancer susceptibility. Elevated KLK15 expression has been reported in prostate cancer and it has been described as an unfavorable prognostic marker for the disease. Objectives: We performed a comprehensive analysis of association of variants in the KLK15 gene with prostate cancer risk and aggressiveness by genotyping tagSNPs, as well as putative functional SNPs identified by extensive bioinformatics analysis. Methods and Data Sources: Twelve out of 22 SNPs, selected on the basis of linkage disequilibrium pattern, were analyzed in an Australian sample of 1,011 histologically verified prostate cancer cases and 1,405 ethnically matched controls. Replication was sought from two existing genome-wide association studies (GWAS): the Cancer Genetic Markers of Susceptibility (CGEMS) project and a UK GWAS study. Results: Two KLK15 SNPs, rs2659053 and rs3745522, showed evidence of association (p < 0.05) but were not present on the GWAS platforms. KLK15 SNP rs2659056 was found to be associated with prostate cancer aggressiveness and showed evidence of association in a replication cohort of 5,051 patients from the UK, Australia, and the CGEMS dataset of US samples. A highly significant association with Gleason score was observed when the data were combined from these three studies, with an Odds Ratio (OR) of 0.85 (95% CI = 0.77-0.93; p = 2.7 × 10^-4). The rs2659056 SNP is predicted to alter binding of the RORalpha transcription factor, which has a role in the control of cell growth and differentiation and has been suggested to control the metastatic behavior of prostate cancer cells. Conclusions: Our findings suggest a role for KLK15 genetic variation in the etiology of prostate cancer among men of European ancestry, although further studies in very large sample sets are necessary to confirm effect sizes.

Relevance: 90.00%

Abstract:

This paper describes an innovative platform that facilitates the collection of objective safety data around occurrences at railway level crossings using data sources including forward-facing video, telemetry from trains and geo-referenced asset and survey data. This platform is being developed with support from the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper provides a description of the underlying accident causation model, the development methodology and refinement process as well as a description of the data collection platform. The paper concludes with a brief discussion of the benefits this project is expected to provide to the Australian rail industry.

Relevance: 90.00%

Abstract:

The decisions people make about medical treatments have a great impact on their lives. Health care practitioners, providers and patients often make decisions about medical treatments without complete understanding of the circumstances. The main reason for this is that medical data are available in fragmented, disparate and heterogeneous data silos. Without a centralised data warehouse structure to integrate these data silos, it is highly unlikely and impractical for the users to get all the information required on time to make a correct decision. In this research paper, a clinical data integration approach using SAS Clinical Data Integration Server tools is presented.

Relevance: 90.00%

Abstract:

Effective management of chronic diseases is a global health priority. A healthcare information system offers opportunities to address challenges of chronic disease management. However, the requirements of health information systems are often not well understood. The accuracy of requirements has a direct impact on the successful design and implementation of a health information system. Our research describes methods used to understand the requirements of health information systems for advanced prostate cancer management. A survey was conducted to identify heterogeneous sources of clinical records. It showed that the General Practitioner was the most common source of patients' clinical records (41%), followed by the Urologist (14%) and other clinicians (14%). Our research describes a method to identify diverse data sources and proposes a novel patient journey browser prototype that integrates disparate data sources.

Relevance: 90.00%

Abstract:

Diagnostics of rolling element bearings involves a combination of different techniques of signal enhancing and analysis. The most common procedure presents a first step of order tracking and synchronous averaging, able to remove the undesired components, synchronous with the shaft harmonics, from the signal, and a final step of envelope analysis to obtain the squared envelope spectrum. This indicator has been studied thoroughly, and statistically based criteria have been obtained, in order to identify damaged bearings. The statistical thresholds are valid only if all the deterministic components in the signal have been removed. Unfortunately, in various industrial applications, characterized by heterogeneous vibration sources, the first step of synchronous averaging is not sufficient to eliminate completely the deterministic components and an additional step of pre-whitening is needed before the envelope analysis. Different techniques have been proposed in the past with this aim: The most widely spread are linear prediction filters and spectral kurtosis. Recently, a new technique for pre-whitening has been proposed, based on cepstral analysis: the so-called cepstrum pre-whitening. Owing to its low computational requirements and its simplicity, it seems a good candidate to perform the intermediate pre-whitening step in an automatic damage recognition algorithm. In this paper, the effectiveness of the new technique will be tested on the data measured on a full-scale industrial bearing test-rig, able to reproduce the harsh conditions of operation. A benchmark comparison with the traditional pre-whitening techniques will be made, as a final step for the verification of the potentiality of the cepstrum pre-whitening.
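
The pre-whitening and envelope analysis steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly described form of cepstrum pre-whitening, in which zeroing the real cepstrum amounts to dividing the spectrum by its own magnitude, and it omits order tracking and synchronous averaging. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np


def cepstrum_prewhiten(x):
    """Flatten the magnitude spectrum of x while keeping its phase.

    Setting the real cepstrum to zero is equivalent to dividing the
    spectrum by its magnitude, which is what is done here.
    """
    spectrum = np.fft.fft(x)
    magnitude = np.abs(spectrum)
    magnitude[magnitude == 0.0] = 1.0  # avoid division by zero in empty bins
    return np.real(np.fft.ifft(spectrum / magnitude))


def squared_envelope_spectrum(x, fs):
    """Return (frequencies in Hz, amplitude) of the squared envelope spectrum."""
    n = len(x)
    # Build the analytic signal by zeroing negative-frequency bins.
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    sq_env = np.abs(analytic) ** 2
    sq_env = sq_env - sq_env.mean()  # remove the DC term
    ses = np.abs(np.fft.rfft(sq_env)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, ses


# Synthetic signal (illustrative only): a 1 kHz resonance modulated at 37 Hz,
# masked by a strong deterministic 50 Hz tone and broadband noise.
fs = 8192
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
signal = (
    (1.0 + 0.5 * np.cos(2 * np.pi * 37 * t)) * np.cos(2 * np.pi * 1000 * t)
    + 5.0 * np.cos(2 * np.pi * 50 * t)
    + 0.1 * rng.standard_normal(fs)
)

whitened = cepstrum_prewhiten(signal)
freqs, ses = squared_envelope_spectrum(whitened, fs)
```

Because the whitening step is a single forward and inverse FFT, its cost is small compared with fitting a linear prediction filter, which is consistent with the low computational requirements mentioned in the abstract.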

Relevance: 90.00%

Abstract:

Technological advances have led to an influx of affordable hardware that supports sensing, computation and communication. This hardware is increasingly deployed in public and private spaces, tracking and aggregating a wealth of real-time environmental data. Although these technologies are the focus of several research areas, there is a lack of research dealing with the problem of making these capabilities accessible to everyday users. This thesis represents a first step towards developing systems that will allow users to leverage the available infrastructure and create custom tailored solutions. It explores how this notion can be utilized in the context of energy monitoring to improve conventional approaches. The project adopted a user-centered design process to inform the development of a flexible system for real-time data stream composition and visualization. This system features an extensible architecture and defines a unified API for heterogeneous data streams. Rather than displaying the data in a predetermined fashion, it makes this information available as building blocks that can be combined and shared. It is based on the insight that individual users have diverse information needs and presentation preferences. Therefore, it allows users to compose rich information displays, incorporating personally relevant data from an extensive information ecosystem. The prototype was evaluated in an exploratory study to observe its natural use in a real-world setting, gathering empirical usage statistics and conducting semi-structured interviews. The results show that a high degree of customization does not warrant sustained usage. Other factors were identified, yielding recommendations for increasing the impact on energy consumption.

Relevance: 90.00%

Abstract:

This paper proposes an experimental study of quality metrics that can be applied to visual and infrared images acquired from cameras onboard an unmanned ground vehicle (UGV). The relevance of existing metrics in this context is discussed and a novel metric is introduced. Selected metrics are evaluated on data collected by a UGV in clear and challenging environmental conditions, represented in this paper by the presence of airborne dust or smoke. An example of application is given with monocular SLAM estimating the pose of the UGV while smoke is present in the environment. It is shown that the proposed novel quality metric can be used to anticipate situations where the quality of the pose estimate will be significantly degraded due to the input image data. This leads to decisions of advantageously switching between data sources (e.g. using infrared images instead of visual images).

Relevance: 90.00%

Abstract:

Loop detectors are the oldest and most widely used traffic data source. On urban arterials, they are mainly installed for signal control. Recently, state-of-the-art Bluetooth MAC Scanners (BMS) have captured the interest of stakeholders for use in area-wide traffic monitoring. Loop detectors provide flow, a fundamental traffic parameter, whereas BMS provide individual vehicle travel times between BMS stations. Hence, these two data sources complement each other and, if integrated, should increase the accuracy and reliability of traffic state estimation. This paper proposes a model that integrates loop and BMS data for seamless travel time and density estimation on urban signalised networks. The proposed model is validated using both real and simulated data, and the results indicate that the accuracy of the proposed model is over 90%.
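
Why the two sources complement each other can be seen from the fundamental traffic relation, flow = density × space-mean speed. The sketch below is not the model proposed in the paper; it is a simplified steady-state illustration that assumes a single section of known length, a loop-detector flow count, and a handful of hypothetical BMS travel times.

```python
def space_mean_speed_kmh(section_length_km, travel_times_s):
    """Space-mean speed over a section: length divided by mean travel time."""
    mean_travel_time_h = (sum(travel_times_s) / len(travel_times_s)) / 3600.0
    return section_length_km / mean_travel_time_h


def density_veh_per_km(flow_veh_per_h, speed_kmh):
    """Density from the fundamental relation flow = density * speed."""
    return flow_veh_per_h / speed_kmh


# Hypothetical inputs (illustrative values only).
section_length_km = 1.2                  # distance between two BMS stations
travel_times_s = [100.0, 120.0, 140.0]   # matched BMS travel times
flow_veh_per_h = 900.0                   # loop-detector flow

speed = space_mean_speed_kmh(section_length_km, travel_times_s)
density = density_veh_per_km(flow_veh_per_h, speed)
print(round(speed, 1), round(density, 1))
```

On a signalised network, queues and mid-link sources and sinks violate the steady-state assumption, which is one reason a dedicated integration model is needed.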

Relevance: 90.00%

Abstract:

Objective: To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Design: Systematic review. Data sources: The electronic databases searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. Selection criteria: For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. Methods: The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Results: Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. Conclusions: The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years.
With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see a continued growth and advancement in knowledge of text mining in the injury field.

Relevance: 90.00%

Abstract:

The reliance on police data for the counting of road crash injuries can be problematic, as it is well known that not all road crash injuries are reported to police, which leads to under-estimation of the overall burden of road crash injuries. The aim of this study was to use multiple linked data sources to estimate the extent of under-reporting of road crash injuries to police in the Australian state of Queensland. Data from the Queensland Road Crash Database (QRCD), the Queensland Hospital Admitted Patients Data Collection (QHAPDC), Emergency Department Information System (EDIS), and the Queensland Injury Surveillance Unit (QISU) for the year 2009 were linked. The completeness of road crash cases reported to police was examined via discordance rates between the police data (QRCD) and the hospital data collections. In addition, the potential bias of this discordance (under-reporting) was assessed based on gender, age, road user group, and regional location. Results showed that the level of under-reporting varied depending on the data set with which the police data were compared. When all hospital data collections were examined together, the estimated population of road crash injuries was approximately 28,000, with around two-thirds not linking to any record in the police data. The results also showed that under-reporting was more likely for motorcyclists, cyclists, males, young people, and injuries occurring in Remote and Inner Regional areas. These results have important implications for road safety research and policy in terms of: prioritising funding and resources; targeting road safety interventions into areas of higher risk; and estimating the burden of road crash injuries.
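
The discordance calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the linkage step is taken as already done, so each hospital record carries a hypothetical `link_id` that either matches a police record or does not, and the records and field names are invented rather than taken from the Queensland data collections.

```python
from collections import defaultdict


def discordance_rate(hospital_records, police_link_ids):
    """Share of hospital records with no linked police record."""
    police = set(police_link_ids)
    unlinked = [r for r in hospital_records if r["link_id"] not in police]
    return len(unlinked) / len(hospital_records)


def discordance_by_group(hospital_records, police_link_ids, key):
    """Discordance rate broken down by a record attribute such as road user group."""
    police = set(police_link_ids)
    totals = defaultdict(int)
    unlinked = defaultdict(int)
    for record in hospital_records:
        totals[record[key]] += 1
        if record["link_id"] not in police:
            unlinked[record[key]] += 1
    return {group: unlinked[group] / totals[group] for group in totals}


# Invented example records (not real data).
hospital_records = [
    {"link_id": 1, "road_user": "cyclist"},
    {"link_id": 2, "road_user": "cyclist"},
    {"link_id": 3, "road_user": "cyclist"},
    {"link_id": 4, "road_user": "driver"},
    {"link_id": 5, "road_user": "driver"},
    {"link_id": 6, "road_user": "driver"},
]
police_link_ids = [3, 4, 5]

overall = discordance_rate(hospital_records, police_link_ids)
by_group = discordance_by_group(hospital_records, police_link_ids, "road_user")
print(overall, by_group)
```

Comparing the per-group rates with the overall rate is one simple way to see whether under-reporting is concentrated in particular road user groups.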

Relevance: 90.00%

Abstract:

Huge amounts of data are generated from a variety of information sources in healthcare, and these data sources originate from a variety of clinical information systems and corporate data warehouses. The data derived from these sources are used for analysis and trending purposes, thus playing an influential role as a real-time decision-making tool. The unstructured, narrative data provided by these data sources qualify as healthcare big data, and researchers argue that the application of big data in healthcare might improve accountability and efficiency.

Relevance: 90.00%

Abstract:

This thesis examines the feasibility of a forest inventory method based on two-phase sampling in estimating forest attributes at the stand or substand levels for forest management purposes. The method is based on multi-source forest inventory combining auxiliary data consisting of remote sensing imagery or other geographic information and field measurements. Auxiliary data are utilized as first-phase data for covering all inventory units. Various methods were examined for improving the accuracy of the forest estimates. Pre-processing of auxiliary data in the form of correcting the spectral properties of aerial imagery was examined (I), as was the selection of aerial image features for estimating forest attributes (II). Various spatial units were compared for extracting image features in a remote sensing aided forest inventory utilizing very high resolution imagery (III). A number of data sources were combined and different weighting procedures were tested in estimating forest attributes (IV, V). Correction of the spectral properties of aerial images proved to be a straightforward and advantageous method for improving the correlation between the image features and the measured forest attributes. Testing different image features that can be extracted from aerial photographs (and other very high resolution images) showed that the images contain a wealth of relevant information that can be extracted only by utilizing the spatial organization of the image pixel values. Furthermore, careful selection of image features for the inventory task generally gives better results than inputting all extractable features to the estimation procedure. When the spatial units for extracting very high resolution image features were examined, an approach based on image segmentation generally showed advantages compared with a traditional sample plot-based approach. Combining several data sources resulted in more accurate estimates than any of the individual data sources alone. 
The best combined estimate can be derived by weighting the estimates produced by the individual data sources by the inverse values of their mean square errors. Despite the fact that the plot-level estimation accuracy in two-phase sampling inventory can be improved in many ways, the accuracy of forest estimates based mainly on single-view satellite and aerial imagery is a relatively poor basis for making stand-level management decisions.
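
The inverse mean square error weighting mentioned above can be written out in a few lines. The sketch below assumes each data source yields one estimate of the same forest attribute together with its mean square error; the numbers are invented for illustration and the function name is hypothetical.

```python
def combine_estimates(estimates, mean_square_errors):
    """Weighted mean of estimates, with weights proportional to 1 / MSE."""
    weights = [1.0 / mse for mse in mean_square_errors]
    total_weight = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total_weight


# Invented example: stem volume (m^3/ha) estimated from two data sources.
estimates = [200.0, 260.0]
mean_square_errors = [400.0, 1600.0]

combined = combine_estimates(estimates, mean_square_errors)
print(round(combined, 1))
```

The source with the smaller error dominates the result: here the normalised weights are 0.8 and 0.2, so the combined estimate lies much closer to the first value.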

Relevance: 90.00%

Abstract:

This study was undertaken by UKOLN on behalf of the Joint Information Systems Committee (JISC) in the period April to September 2008. Application profiles are metadata schemata which consist of data elements drawn from one or more namespaces, optimized for a particular local application. They offer a way for particular communities to base the interoperability specifications they create and use for their digital material on established open standards. This offers the potential for digital materials to be accessed, used and curated effectively both within and beyond the communities in which they were created. The JISC recognized the need to undertake a scoping study to investigate metadata application profile requirements for scientific data in relation to digital repositories, and specifically concerning descriptive metadata to support resource discovery and other functions such as preservation. This followed on from the development of the Scholarly Works Application Profile (SWAP) undertaken within the JISC Digital Repositories Programme and led by Andy Powell (Eduserv Foundation) and Julie Allinson (RRT UKOLN) on behalf of the JISC.
Aims and Objectives
1. To assess whether a single metadata AP for research data, or a small number thereof, would improve resource discovery or discovery-to-delivery in any useful or significant way.
2. If so, then to:
   a. assess whether the development of such AP(s) is practical and if so, how much effort it would take;
   b. scope a community uptake strategy that is likely to be successful, identifying the main barriers and key stakeholders.
3. Otherwise, to investigate how best to improve cross-discipline, cross-community discovery-to-delivery for research data, and make recommendations to the JISC and others as appropriate.
Approach
The Study used a broad conception of what constitutes scientific data, namely data gathered, collated, structured and analysed using a recognizably scientific method, with a bias towards quantitative methods.
The approach taken was to map out the landscape of existing data centres, repositories and associated projects, and conduct a survey of the discovery-to-delivery metadata they use or have defined, alongside any insights they have gained from working with this metadata. This was followed up by a series of unstructured interviews, discussing use cases for a Scientific Data Application Profile, and how widely a single profile might be applied. On the latter point, matters of granularity, the experimental/measurement contrast, the quantitative/qualitative contrast, the raw/derived data contrast, and the homogeneous/heterogeneous data collection contrast were discussed. The Study report was loosely structured according to the Singapore Framework for Dublin Core Application Profiles, and in turn considered: the possible use cases for a Scientific Data Application Profile; existing domain models that could either be used or adapted for use within such a profile; and a comparison of existing metadata profiles and standards to identify candidate elements for inclusion in the description set profile for scientific data. The report also considered how the application profile might be implemented, its relationship to other application profiles, the alternatives to constructing a Scientific Data Application Profile, the development effort required, and what could be done to encourage uptake in the community. The conclusions of the Study were validated through a reference group of stakeholders.