15 results for data science

in WestminsterResearch - UK


Relevance:

40.00%

Publisher:

Abstract:

Cloud computing offers massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure aware workflows is suggested and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.
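
As a hedged illustration of the MapReduce paradigm this integration builds on (not the WS-PGRADE/gUSE or Hadoop code itself), the following self-contained Python sketch decomposes a toy word-count job into map, shuffle and reduce phases; all names are invented for illustration.

```python
# Minimal in-process sketch of the MapReduce paradigm; in the gateway setting
# above, the phases would run as distributed Hadoop tasks over cloud nodes.
from collections import defaultdict

def map_phase(record):
    # Emit (key, value) pairs; here, one count per word.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Aggregate all values observed for a key.
    return key, sum(values)

def run_job(records):
    groups = defaultdict(list)
    for record in records:                  # map + shuffle
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

print(run_job(["big data on the cloud", "big workflows for big data"]))
# {'big': 3, 'data': 2, 'on': 1, 'the': 1, 'cloud': 1, 'workflows': 1, 'for': 1}
```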

Relevance:

30.00%

Publisher:

Abstract:

In the age of E-Business, many companies are faced with massive data sets that must be analysed to gain a competitive edge. These data sets are in many instances incomplete and quite often not of very high quality. Although statistical analysis can be used to pre-process these data sets, this technique has its own limitations. In this paper we present a system - and its underlying model - that can be used to test the integrity of existing data and pre-process the data into cleaner data sets to be mined. LH5 is a rule-based system, capable of self-learning, and is illustrated using a medical data set.
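
The abstract does not give LH5's rules or learning mechanism, so the sketch below is only a hypothetical illustration of rule-based integrity testing on a medical-style data set; the fields, thresholds and rules are assumptions.

```python
# Hypothetical rule-based data integrity check in the spirit of LH5.
RULES = [
    # (field, predicate, description) - invented example rules
    ("age",      lambda v: v is not None and 0 <= v <= 120, "age in [0, 120]"),
    ("systolic", lambda v: v is not None and 50 <= v <= 250, "plausible systolic BP"),
]

def check_record(record):
    """Return the list of rule descriptions the record violates."""
    return [desc for field, pred, desc in RULES if not pred(record.get(field))]

records = [
    {"age": 47, "systolic": 130},
    {"age": -3, "systolic": 130},   # fails the age rule
    {"age": 62, "systolic": None},  # missing value fails the BP rule
]
clean = [r for r in records if not check_record(r)]
print(len(clean), "of", len(records), "records pass all integrity rules")
```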

Relevance:

30.00%

Publisher:

Abstract:

Background: Successfully identifying relevant data for systematic reviews with a focus on safety may require retrieving information from a wider range of sources than for 'effectiveness' systematic reviews. Searching for safety data continues to prove a major challenge.

Objectives: To examine search methods used in systematic reviews of safety and to investigate indexing.

Methods: Systematic reviews focusing on the safety of complementary therapies and related interventions were retrieved through comprehensive searches of major databases. Data were extracted on search strategies, sources used and indexing in major databases. Safety-related search terms were compared against the index terms available on major databases. Data extraction, performed by one researcher using a pre-prepared template, was checked for accuracy by a second researcher.

Results: Screening of 2563 records identified 88 systematic reviews. The information sources used varied with the type of intervention being addressed. Comparison of search terms with available index terms revealed additional potentially relevant terms that could be used in constructing search strategies. Seventy-nine reviews were indexed on PubMed, 84 on EMBASE, 21 on CINAHL, 15 on AMED, 6 on PsycINFO, and 2 on BNI and HMIC. The mean number of generic safety-related indexing terms was 2.6 on PubMed records and 4.8 on EMBASE, where at least 61 unique terms were employed. The most frequently used indexing terms and subheadings were adverse effects, side effects, drug interactions and herb-drug interactions. Use of terms specifically referring to safety varied across databases.

Conclusions: Investigation of search methods revealed the range of information sources used, a list of which may prove a valuable resource for those planning to conduct systematic reviews of safety. The findings also indicate that there is potential to improve safety-related search strategies. Finally, the study provides insight into how safety studies are indexed on major databases and which terms are most effective for finding them.
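
As a minimal sketch of the term-comparison step described in the Methods (the term lists here are invented examples, not the study data), simple set operations suffice to surface index terms a search strategy has not yet exploited:

```python
# Compare safety-related search terms against a database's index terms.
search_terms = {"adverse effects", "side effects", "toxicity", "harm"}
index_terms  = {"adverse effects", "side effects", "drug interactions",
                "herb-drug interactions", "toxicity"}

covered = search_terms & index_terms      # terms already exploited
candidates = index_terms - search_terms   # potentially relevant additions
print("covered:", sorted(covered))
print("additional index terms to consider:", sorted(candidates))
```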

Relevance:

30.00%

Publisher:

Abstract:

We have developed an in-house pipeline for the processing and analysis of sequence data generated during Illumina technology-based metagenomic studies of the human gut microbiota. Each component of the pipeline has been selected following comparative analysis of available tools; however, the modular nature of the software facilitates replacement of any individual component with an alternative should a better tool become available in due course. The pipeline consists of quality analysis and trimming followed by taxonomic filtering of sequence data, allowing reads associated with samples to be binned according to whether they represent human, prokaryotic (bacterial/archaeal), viral, parasite, fungal or plant DNA. Viral, parasite, fungal and plant DNA can be assigned to species level on a presence/absence basis, allowing, for example, identification of dietary intake of plant-based foodstuffs and their derivatives. Prokaryotic DNA is subject to taxonomic and functional analyses, with assignment to taxonomic hierarchies (kingdom, class, order, family, genus, species, strain/subspecies) and abundance determination. After de novo assembly of sequence reads, genes within samples are predicted and used to build a non-redundant catalogue of genes. From this catalogue, per-sample gene abundance can be determined after normalization of the data based on gene length. Functional annotation of genes is achieved through mapping of gene clusters against KEGG proteins and through InterProScan. The pipeline is undergoing validation using the human faecal metagenomic data of Qin et al. (2014, Nature 513, 59–64). Outputs from the pipeline allow development of tools for the integration of metagenomic and metabolomic data, moving metagenomic studies beyond determination of gene richness and representation towards microbial-metabolite mapping. There is scope to improve the outputs from the viral, parasite, fungal and plant DNA analyses, depending on the depth of sequencing associated with samples. The pipeline can easily be adapted for the analysis of environmental and non-human animal samples, and for use with data generated via non-Illumina sequencing platforms.
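
A minimal sketch of the length-normalisation step described for the gene catalogue follows; the counts, lengths and the final scaling to relative abundances are illustrative assumptions, since the abstract does not specify exact formulas.

```python
# Per-sample gene abundance: counts scaled by gene length, then normalised
# so each sample's abundances sum to 1. Data values are invented.
def gene_abundances(read_counts, gene_lengths):
    scaled = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(scaled.values())
    return {g: v / total for g, v in scaled.items()}

counts  = {"geneA": 900, "geneB": 300}   # mapped reads per gene
lengths = {"geneA": 3000, "geneB": 500}  # gene length in bp
print(gene_abundances(counts, lengths))
# geneB is shorter, so its length-normalised abundance is double geneA's
```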

Relevance:

30.00%

Publisher:

Abstract:

Researchers want to analyse health care data, which may require large pools of compute and data resources. To obtain these they need access to Distributed Computing Infrastructures (DCIs), and using such infrastructures requires expertise that researchers may not have. Workflows can hide infrastructures from their users, but there are many workflow systems and they are not interoperable. Learning a workflow system and creating workflows in it may require significant effort, so it is not reasonable to expect that researchers will learn new workflow systems simply to run workflows built in other systems. As a result, the lack of interoperability prevents workflow sharing, and a vast amount of research effort is wasted. The FP7 Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs (SHIWA) project developed Coarse-Grained Interoperability (CGI) to enable workflow sharing, and created the SHIWA Simulation Platform (SSP) to support CGI as a production-level service. The paper describes how the CGI approach can be used for analysis and simulation in health care.
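
The CGI idea of treating a foreign workflow as a black box can be sketched as follows; this is not the SHIWA API, and the class, fields and command line are hypothetical.

```python
# Hypothetical sketch of coarse-grained interoperability: a non-native
# workflow is wrapped as a single black-box job, so the host workflow system
# only needs its inputs, outputs and an execution command.
from dataclasses import dataclass
import subprocess

@dataclass
class WrappedWorkflow:
    engine: str         # e.g. an invented foreign engine's CLI name
    bundle: str         # path to the exported workflow bundle
    inputs: list[str]   # staged input files
    outputs: list[str]  # files expected after execution

    def run(self):
        # The host delegates execution to the foreign engine's runner and
        # observes only files, never the foreign workflow's internals.
        subprocess.run([self.engine, "run", self.bundle, *self.inputs],
                       check=True)
        return self.outputs

# job = WrappedWorkflow("foreign-engine", "segment.bundle",
#                       ["scan.nii"], ["mask.nii"])
# job.run()  # requires the foreign engine to be installed
```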

Relevance:

30.00%

Publisher:

Abstract:

Physical location of data in cloud storage is an increasingly urgent problem. In a short time it has evolved from the concern of a few regulated businesses into an important consideration for many cloud storage users. One of the characteristics of cloud storage is the fluid transfer of data both within and among the data centres of a cloud provider. However, this fluidity has weakened guarantees with respect to control over data replicas, protection of data in transit and the physical location of data. This paper addresses the lack of reliable solutions for data placement control in cloud storage systems. We analyse the currently available solutions and identify their shortcomings. Furthermore, we describe a high-level architecture for a trusted, geolocation-based mechanism for data placement control in distributed cloud storage systems, which is the basis of ongoing work to define the detailed protocol and build a prototype of such a solution. This mechanism aims to provide granular control over the ability of tenants to access data placed on the geographically dispersed storage units comprising the cloud storage.
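
A minimal sketch of the kind of placement check such a mechanism might enforce is shown below; the policy tables and function are invented, and the paper's trusted geolocation attestation protocol is not reproduced here.

```python
# Geolocation-based placement check: a replica is allowed only where the
# storage unit's attested location satisfies the tenant's placement policy.
PLACEMENT_POLICY = {
    "tenant-a": {"UK", "EU"},  # tenant -> permitted jurisdictions
    "tenant-b": {"US"},
}

STORAGE_UNITS = {
    "unit-1": "UK",  # unit -> attested geolocation
    "unit-2": "US",
}

def may_place(tenant, unit):
    return STORAGE_UNITS.get(unit) in PLACEMENT_POLICY.get(tenant, set())

assert may_place("tenant-a", "unit-1")      # UK unit, UK/EU policy: allowed
assert not may_place("tenant-a", "unit-2")  # US unit: placement refused
```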

Relevance:

30.00%

Publisher:

Abstract:

Data registration refers to a series of techniques for matching or bringing similar objects or datasets together into alignment. These techniques enjoy widespread use in a diverse variety of applications, such as video coding, tracking, object and face detection and recognition, surveillance and satellite imaging, medical image analysis and structure from motion. Registration methods are as numerous as their manifold uses, from pixel-level and block- or feature-based methods to Fourier domain methods. This book is focused on providing algorithms and image and video techniques for registration and quality performance metrics. The authors provide various assessment metrics for measuring registration quality alongside analyses of registration techniques, introducing and explaining both familiar and state-of-the-art registration methodologies used in a variety of targeted applications.
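
As an example of the Fourier-domain methods such texts cover, the sketch below implements textbook phase correlation for recovering a translational offset between two images; it is a standard technique, not code from the book itself.

```python
# Phase correlation: recover a pure translation in the Fourier domain.
import numpy as np

def phase_correlation(fixed, moving):
    """Estimate the (dy, dx) shift that maps `fixed` onto `moving`."""
    F = np.fft.fft2(fixed)
    G = np.fft.fft2(moving)
    cross_power = np.conj(F) * G
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase, drop magnitude
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return dy, dx  # shifts are modulo the image size

rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, (5, 12), axis=(0, 1))  # known ground-truth shift
print(phase_correlation(img, shifted))        # -> (5, 12)
```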

Relevance:

30.00%

Publisher:

Abstract:

Objective: To explore people's experiences of starting antidepressant treatment.

Design: Qualitative interpretive approach combining thematic analysis with constant comparison. Relevant coding reports from the original studies (generated using NVivo) relating to initial experiences of antidepressants were explored in further detail, focusing on the ways in which participants discussed their experiences of taking or being prescribed an antidepressant for the first time.

Participants: 108 men and women aged 22–84 who had taken antidepressants for depression.

Setting: Respondents recruited throughout the UK during 2003–2004, 2008 and 2012–2013, and in Australia during 2010–2011.

Results: People expressed a wide range of feelings about initiating antidepressant use. Their attitudes towards starting antidepressants were shaped by stereotypes and stigma around perceived drug dependency and potentially extreme side effects. Anxieties were expressed about starting use, about how long the antidepressant might take to begin working, about how much it might help or hinder them, and about what to expect in the initial weeks. People worried about the possibility of experiencing adverse effects and the implications for their sense of self. Where people felt they had not been given sufficient time, information or support during their consultation to take the medicines, the uncertainty could be particularly unsettling and could affect their ongoing views on, and use of, antidepressants as a viable treatment option.

Conclusions: Our paper is the first to use multicountry data to explore in depth patients' existential concerns about starting antidepressant use. People need additional support when they make decisions about starting antidepressants. Health professionals can use our findings to better understand and explore patients' concerns before their patients start antidepressants. These insights are key to supporting patients, many of whom feel intimidated by the prospect of taking antidepressants, especially during the uncertain first few weeks of treatment.

Relevance:

30.00%

Publisher:

Abstract:

In order to accelerate computing the convex hull of a set of n points, a heuristic procedure is often applied to reduce the number of points to a set of s points, s ≤ n, which contains the same hull. We present an algorithm to precondition 2D data with integer coordinates bounded by a box of size p × q before building a 2D convex hull, with three distinct advantages. First, we prove that under the condition min(p, q) ≤ n the algorithm executes in time within O(n); second, no explicit sorting of data is required; and third, the reduced set of s points forms a simple polygonal chain and thus can be directly pipelined into an O(n) time convex hull algorithm. This paper empirically evaluates and quantifies the speedup gained by preconditioning a set of points with a method based on the proposed algorithm before using common convex hull algorithms to build the final hull. A speedup factor of at least four is consistently found in experiments on various datasets when the condition min(p, q) ≤ n holds; the smaller the ratio min(p, q)/n in the dataset, the greater the speedup factor achieved.
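
One way to realise such a reduction (a hedged sketch, not necessarily the paper's exact algorithm) is to keep only each column's extreme y values in a single unsorted pass: every hull vertex survives, because any other point in a column is a convex combination of that column's two extremes. The paper's further step of arranging the survivors into a simple polygonal chain is not reproduced here.

```python
# O(n) preconditioning sketch for integer points in a p x q box: bucket by x
# and keep only each column's min and max y, with no sorting.
def precondition(points, p):
    """Reduce integer points with 0 <= x < p to per-column y extremes."""
    lo = [None] * p
    hi = [None] * p
    for x, y in points:                 # one pass, no sorting
        if lo[x] is None or y < lo[x]:
            lo[x] = y
        if hi[x] is None or y > hi[x]:
            hi[x] = y
    reduced = [(x, lo[x]) for x in range(p) if lo[x] is not None]
    reduced += [(x, hi[x]) for x in range(p)
                if hi[x] is not None and hi[x] != lo[x]]
    return reduced  # feed this smaller set to any convex hull routine

pts = [(0, 0), (1, 5), (1, 2), (1, 3), (2, 1), (2, 4), (3, 3)]
print(precondition(pts, 4))  # (1, 3) is dropped: it can never be a hull vertex
```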

Relevance:

30.00%

Publisher:

Abstract:

The broad capabilities of current mobile devices have paved the way for Mobile Crowd Sensing (MCS) applications. The success of this emerging paradigm strongly depends on the quality of received data, which, in turn, is contingent on mass user participation; the broader the participation, the more useful these systems become. There is also an ongoing trend towards integrating MCS applications with emerging computing paradigms such as cloud computing. The intuition is that such a transition can significantly improve overall efficiency while at the same time offering stronger security and privacy-preserving mechanisms to the end-user. In this position paper, we dwell on the underpinnings of incorporating cloud computing techniques to handle the vast amount of data collected in MCS applications. That is, we present a list of core system, security and privacy requirements that must be met if such a transition is to be successful. To this end, we first address several competing challenges not previously considered in the literature, such as the scarce energy resources of battery-powered mobile devices and their limited computational resources, which often prevent the use of computationally heavy cryptographic operations and thus restrict the security services offered to the end-user. Finally, we present a use case scenario as a comprehensive example. Based on our findings, we posit open issues and challenges, and discuss possible ways to address them, so that security and privacy do not hinder the migration of MCS systems to the cloud.
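
To make the device-side constraint concrete, the sketch below shows a mobile client protecting a sensed reading with cheap symmetric cryptography before upload; the names and payload are invented, and the heavier privacy-preserving machinery the paper alludes to is precisely what such devices struggle to afford. It uses the third-party `cryptography` package.

```python
# A mobile client encrypting a sensed reading before sending it to the cloud.
# Fernet (AES + HMAC) provides confidentiality and integrity cheaply enough
# for battery-powered devices.
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice provisioned, not generated ad hoc
cipher = Fernet(key)

reading = {"sensor": "noise_db", "value": 62.4, "lat": 51.5, "lon": -0.14}
token = cipher.encrypt(json.dumps(reading).encode())

# upload(token)  # the cloud back end can only decrypt with the shared key
print(cipher.decrypt(token) == json.dumps(reading).encode())  # True
```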

Relevance:

30.00%

Publisher:

Abstract:

The objective of this study was to develop, test and benchmark a framework and a predictive risk model for hospital emergency readmission within 12 months. The model was developed using routinely collected Hospital Episode Statistics data covering inpatient hospital admissions in England. Three different timeframes were used for training, testing and benchmarking: the 1999 to 2004, 2000 to 2005 and 2004 to 2009 financial years. Each timeframe includes 20% of all inpatients admitted within the trigger year. Comparisons were made using positive predictive value, sensitivity and specificity for different risk cut-offs, risk bands and top risk segments, together with the receiver operating characteristic curve. The Bayes Point Machine constructed using this feature selection framework produces a risk probability for each admitted patient, and it was validated across different timeframes, sub-populations and cut-off points. At a risk cut-off of 50%, the positive predictive value was 69.3% to 73.7%, the specificity was 88.0% to 88.9% and the sensitivity was 44.5% to 46.3% across the different timeframes. The area under the receiver operating characteristic curve was 73.0% to 74.3%. The developed framework and model performed considerably better than existing modelling approaches, with high precision and moderate sensitivity.
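
A minimal sketch of the evaluation arithmetic (not the study's model or data) shows how per-patient risk probabilities yield positive predictive value, sensitivity and specificity at a chosen cut-off:

```python
# Turn risk probabilities into PPV, sensitivity and specificity at a cut-off.
import numpy as np

def metrics_at_cutoff(risk, readmitted, cutoff=0.5):
    pred = risk >= cutoff
    tp = np.sum(pred & readmitted)
    fp = np.sum(pred & ~readmitted)
    fn = np.sum(~pred & readmitted)
    tn = np.sum(~pred & ~readmitted)
    return {"PPV": tp / (tp + fp),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

rng = np.random.default_rng(1)
readmitted = rng.random(1000) < 0.3   # invented ~30% readmission rate
# crude synthetic risk scores that correlate with the outcome
risk = np.clip(0.3 + 0.4 * readmitted + rng.normal(0, 0.2, 1000), 0, 1)
print(metrics_at_cutoff(risk, readmitted, cutoff=0.5))
```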