282 results for Data anonymization and sanitization


Relevance: 100.00%

Abstract:

Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, heterogeneity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appealing to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers near equivalent answers compared with analyses of the full dataset under a controlled error rate. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally, it has the potential to add value to other Big Data sampling algorithms, in particular divide-and-conquer strategies, by determining efficient sub-samples.
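
As a rough illustration of the sub-sampling idea raised in this abstract, the sketch below greedily selects rows of a large design matrix so as to approximately maximise the determinant of the information matrix (a D-optimality criterion). It is a generic toy example on invented data, not the authors' method; the function name and the greedy heuristic are purely illustrative.

    # Illustrative only: greedy D-optimal sub-sampling of a "big" dataset.
    # Not the authors' algorithm -- just the general idea of analysing a
    # principled sub-sample instead of the full data.
    import numpy as np

    def greedy_d_optimal(X, k, seed=0):
        """Pick k rows of X that approximately maximise det(X_s.T @ X_s)."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        chosen = list(rng.choice(n, size=p, replace=False))   # random starting rows
        remaining = set(range(n)) - set(chosen)
        while len(chosen) < k:
            base = X[chosen]
            best_i, best_logdet = None, -np.inf
            for i in remaining:
                Xs = np.vstack([base, X[i]])
                sign, logdet = np.linalg.slogdet(Xs.T @ Xs)
                if sign > 0 and logdet > best_logdet:
                    best_i, best_logdet = i, logdet
            chosen.append(best_i)
            remaining.remove(best_i)
        return np.array(chosen)

    X = np.random.default_rng(1).normal(size=(2000, 4))   # stand-in for a big dataset
    idx = greedy_d_optimal(X, k=30)
    print("sub-sample size:", idx.size)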

Relevance: 100.00%

Abstract:

The world has experienced a large increase in the amount of available data, which calls for better and more specialized tools for data storage, retrieval and information privacy. Electronic Health Record (EHR) systems have recently emerged to fulfill this need in health systems, and they play an important role in medicine by granting access to information that can be used in medical diagnosis. Traditional systems focus on the storage and retrieval of this information, usually leaving privacy issues in the background. Doctors and patients may have different objectives when using an EHR system: patients try to restrict sensitive information in their medical records to avoid its misuse, while doctors want to see as much information as possible to ensure a correct diagnosis. One solution to this dilemma is the Accountable e-Health model, an access protocol model based on the Information Accountability Protocol. In this model, patients are warned when doctors access their restricted data, while authenticated doctors retain non-restrictive access. In this work we take FluxMED, an EHR system, and augment it with aspects of the Information Accountability Protocol to address these issues. The implementation of the Information Accountability Framework (IAF) in FluxMED provides ways for both patients and physicians to have their privacy and access needs met. Storage and data security are handled by FluxMED, which contains mechanisms to ensure security and data integrity. The effort required to develop a platform for the management of medical information is mitigated by FluxMED's workflow-based architecture: the system is flexible enough to allow the type and amount of information to be altered without changes to its source code.
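
The accountability mechanism described above can be illustrated with a small toy sketch: restricted fields remain readable by authenticated doctors, but every such read is logged and triggers a patient notification. This is not FluxMED or IAF code; every class and function name below is hypothetical.

    # Toy illustration of accountability-based access, inspired by the model
    # described above; names are invented and do not reflect FluxMED's API.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class MedicalRecord:
        patient_id: str
        data: dict                                     # field name -> value
        restricted: set = field(default_factory=set)   # fields flagged by the patient
        access_log: list = field(default_factory=list)

        def read(self, doctor_id: str, field_name: str) -> str:
            value = self.data[field_name]
            if field_name in self.restricted:
                # Access is not blocked, but it is recorded and the patient is told.
                event = {
                    "doctor": doctor_id,
                    "field": field_name,
                    "time": datetime.now(timezone.utc).isoformat(),
                }
                self.access_log.append(event)
                notify_patient(self.patient_id, event)
            return value

    def notify_patient(patient_id: str, event: dict) -> None:
        # Placeholder for an e-mail or patient-portal notification.
        print(f"[notice to {patient_id}] {event['doctor']} read '{event['field']}'")

    record = MedicalRecord("patient-42",
                           {"allergies": "penicillin", "psych_history": "..."},
                           restricted={"psych_history"})
    record.read("dr-lee", "allergies")        # unrestricted: no notification
    record.read("dr-lee", "psych_history")    # restricted: logged and notified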

Relevance: 100.00%

Abstract:

BACKGROUND Bronchiectasis is a major contributor to chronic respiratory morbidity and mortality worldwide. Wheeze and other asthma-like symptoms and bronchial hyperreactivity may occur in people with bronchiectasis. Physicians often use asthma treatments in patients with bronchiectasis. OBJECTIVES To assess the effects of inhaled long-acting beta2-agonists (LABA) combined with inhaled corticosteroids (ICS) in children and adults with bronchiectasis during (1) acute exacerbations and (2) stable state. SEARCH METHODS The Cochrane Airways Group searched the Cochrane Airways Group Specialised Register of Trials, which includes records identified from the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, EMBASE and other databases. The Cochrane Airways Group performed the latest searches in October 2013. SELECTION CRITERIA All randomised controlled trials (RCTs) of combined ICS and LABA compared with a control (placebo, no treatment, ICS as monotherapy) in children and adults with bronchiectasis not related to cystic fibrosis (CF). DATA COLLECTION AND ANALYSIS Two review authors extracted data independently using standard methodological procedures as expected by The Cochrane Collaboration. MAIN RESULTS We found no RCTs comparing the ICS and LABA combination with either placebo or usual care. We included one RCT that compared combined ICS and LABA with high-dose ICS in 40 adults with non-CF bronchiectasis without co-existent asthma. All participants received three months of high-dose budesonide dipropionate treatment (1600 micrograms). After three months, participants were randomly assigned to receive either high-dose budesonide dipropionate (1600 micrograms per day) or a combination of budesonide with formoterol (640 micrograms of budesonide and 18 micrograms of formoterol) for three months. The study was not blinded. We assessed it to be an RCT with overall high risk of bias. Data analysed in this review showed that those who received combined ICS-LABA (in stable state) had a significantly better transition dyspnoea index (mean difference (MD) 1.29, 95% confidence interval (CI) 0.40 to 2.18) and more cough-free days (MD 12.30, 95% CI 2.38 to 22.2) compared with those receiving ICS after three months of treatment. No significant difference was noted between groups in quality of life (MD -4.57, 95% CI -12.38 to 3.24), number of hospitalisations (odds ratio (OR) 0.26, 95% CI 0.02 to 2.79) or lung function (forced expiratory volume in one second (FEV1) and forced vital capacity (FVC)). Investigators reported 37 adverse events in the ICS group versus 12 events in the ICS-LABA group but did not mention the number of individuals experiencing adverse events. Hence differences between groups were not included in the analyses. We assessed the overall evidence to be low quality. AUTHORS' CONCLUSIONS In adults with bronchiectasis without co-existent asthma, during stable state, a small single trial with a high risk of bias suggests that combined ICS-LABA may improve dyspnoea and increase cough-free days in comparison with high-dose ICS. No data are provided for or against the use of combined ICS-LABA in adults with bronchiectasis during an acute exacerbation, or in children with bronchiectasis in a stable or acute state. The absence of high-quality evidence means that decisions to use or discontinue combined ICS-LABA in people with bronchiectasis may need to take account of the presence or absence of co-existing airway hyper-responsiveness and consideration of adverse events associated with combined ICS-LABA.

Relevance: 100.00%

Abstract:

Flos Chrysanthemum is a generic name for a particular group of edible plants, which also have medicinal properties. There are, in fact, twenty to thirty different cultivars, which are commonly used in beverages and for medicinal purposes. In this work, four Flos Chrysanthemum cultivars, Hangju, Taiju, Gongju, and Boju, were collected and chromatographic fingerprints were used to distinguish and assess these cultivars for quality control purposes. Chromatographic fingerprints contain rich chemical information but often suffer from baseline drift and peak shifts, which complicate data processing; adaptive iteratively reweighted penalized least squares and correlation optimized warping were therefore applied to correct the baselines and align the fingerprint peaks. The adjusted data were submitted to unsupervised and supervised pattern recognition methods. Principal component analysis was used to qualitatively differentiate the Flos Chrysanthemum cultivars. Partial least squares, continuum power regression, and K-nearest neighbors were used to predict the unknown samples. Finally, the elliptic joint confidence region method was used to evaluate the prediction ability of these models. The partial least squares and continuum power regression methods were shown to best represent the experimental results.
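
The pattern-recognition part of this workflow (PCA for unsupervised inspection, then supervised classifiers) can be mocked up with scikit-learn. The snippet below is a sketch only: synthetic data stand in for the preprocessed fingerprints, PLS regression on one-hot labels ("PLS-DA") stands in for the paper's PLS model, and the baseline-correction, warping, continuum power regression and elliptic joint confidence region steps are omitted.

    # Sketch of a fingerprint-classification workflow with scikit-learn;
    # synthetic spectra stand in for preprocessed chromatographic fingerprints.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_per_class, n_points, n_classes = 20, 300, 4          # four cultivars
    X = np.vstack([rng.normal(loc=c, scale=1.0, size=(n_per_class, n_points))
                   for c in range(n_classes)])
    y = np.repeat(np.arange(n_classes), n_per_class)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0, stratify=y)

    # Unsupervised view: scores on the first two principal components.
    scores = PCA(n_components=2).fit_transform(X_tr)
    print("PC score matrix:", scores.shape)

    # Supervised models: PLS on one-hot labels ("PLS-DA") and K-nearest neighbours.
    pls = PLSRegression(n_components=3).fit(X_tr, np.eye(n_classes)[y_tr])
    pls_pred = pls.predict(X_te).argmax(axis=1)
    knn_pred = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).predict(X_te)
    print("PLS-DA accuracy:", (pls_pred == y_te).mean())
    print("KNN accuracy:", (knn_pred == y_te).mean())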

Relevance: 100.00%

Abstract:

Increasingly large-scale applications are generating an unprecedented amount of data. However, the growing gap between computation and I/O capacity on High End Computing (HEC) machines creates a severe bottleneck for data analysis. Instead of moving data from its source to output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more contention for computing resources with the simulations, and such contention severely degrades simulation performance on HEC machines. Since different data processing strategies have different impacts on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path and, to find the best strategy for reducing data movement in a given situation, propose a flexible data analytics (FlexAnalytics) framework. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. The FlexAnalytics system enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases – scientific data compression and remote visualization – are used in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that the FlexAnalytics framework increases data transfer bandwidth and improves application end-to-end transfer performance.
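
A toy cost model illustrates the kind of placement trade-off FlexAnalytics reasons about: compressing output in situ before moving it only pays off when the time spent compressing plus sending the smaller payload beats sending the raw data. The bandwidth and throughput figures below are assumptions for illustration, not measurements from the paper.

    # Toy decision: compress simulation output in situ before transfer, or send raw?
    # Parameters are illustrative assumptions, not values from the FlexAnalytics study.
    import zlib
    import numpy as np

    data = np.random.default_rng(0).integers(0, 16, size=2_000_000,
                                             dtype=np.uint8).tobytes()
    compressed = zlib.compress(data, level=1)

    link_bandwidth = 100e6     # bytes/s, assumed network bandwidth
    compress_rate = 400e6      # bytes/s, assumed in-situ compression throughput

    t_raw = len(data) / link_bandwidth
    t_comp = len(data) / compress_rate + len(compressed) / link_bandwidth

    print(f"compression ratio: {len(data) / len(compressed):.2f}x")
    print(f"send raw:          {t_raw * 1e3:.1f} ms")
    print(f"compress + send:   {t_comp * 1e3:.1f} ms")
    print("decision:", "compress in situ" if t_comp < t_raw else "send raw")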

Relevance: 100.00%

Abstract:

Background Multilevel and spatial models are being increasingly used to obtain substantive information on area-level inequalities in cancer survival. Multilevel models assume independent geographical areas, whereas spatial models explicitly incorporate geographical correlation, often via a conditional autoregressive prior. However, the relative merits of these methods for large population-based studies have not been explored. Using a case-study approach, we report on the implications of using multilevel and spatial survival models to study geographical inequalities in all-cause survival. Methods Multilevel discrete-time and Bayesian spatial survival models were used to study geographical inequalities in all-cause survival for a population-based colorectal cancer cohort of 22,727 cases aged 20–84 years diagnosed during 1997–2007 from Queensland, Australia. Results Both approaches were viable on this large dataset, and produced similar estimates of the fixed effects. After adding area-level covariates, the between-area variability in survival using multilevel discrete-time models was no longer significant. Spatial inequalities in survival were also markedly reduced after adjusting for aggregated area-level covariates. Only the multilevel approach, however, provided an estimate of the contribution of geographical variation to the total variation in survival between individual patients. Conclusions With little difference observed between the two approaches in the estimation of fixed effects, multilevel models should be favored if there is a clear hierarchical data structure and measuring the independent impact of individual- and area-level effects on survival differences is of primary interest. Bayesian spatial analyses may be preferred if spatial correlation between areas is important and if the priority is to assess small-area variations in survival and map spatial patterns. Both approaches can be readily fitted to geographically enabled survival data from international settings.
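
For readers unfamiliar with the discrete-time survival formulation used here, the sketch below shows the basic person-period expansion fitted as an ordinary logistic regression with statsmodels. The data are simulated, and the multilevel (area random-effect) and Bayesian spatial components of the paper's models are deliberately omitted.

    # Sketch of a discrete-time survival setup: expand each subject into one row
    # per interval at risk and model the event indicator with logistic regression.
    # The multilevel/spatial terms used in the paper are not included here.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 500
    subjects = pd.DataFrame({
        "id": np.arange(n),
        "age": rng.integers(20, 85, n),
        "time": rng.integers(1, 11, n),     # follow-up interval reached (1-10)
        "event": rng.integers(0, 2, n),     # 1 = died, 0 = censored
    })

    # Person-period expansion: one row per subject-interval at risk.
    rows = []
    for _, s in subjects.iterrows():
        for t in range(1, int(s["time"]) + 1):
            rows.append({"year": t, "age": s["age"],
                         "died": int(s["event"] == 1 and t == s["time"])})
    pp = pd.DataFrame(rows)

    model = smf.logit("died ~ C(year) + age", data=pp).fit(disp=0)
    print(model.params.head())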

Relevance: 100.00%

Abstract:

Network topology and routing are two important factors in determining the communication costs of big data applications at large scale. For a given Cluster, Cloud, or Grid (CCG) system, the network topology is fixed, and static or dynamic routing protocols are preinstalled to direct the network traffic; users cannot change them once the system is deployed. Hence, it is hard for application developers to identify the optimal network topology and routing algorithm for their applications with distinct communication patterns. In this study, we design a CCG virtual system (CCGVS), which first uses container-based virtualization to allow users to create a farm of lightweight virtual machines on a single host. It then uses software-defined networking (SDN) techniques to control the network traffic among these virtual machines. Users can change the network topology and control the network traffic programmatically, thereby enabling application developers to evaluate their applications on the same system with different network topologies and routing algorithms. Preliminary experimental results with both synthetic big data programs and the NPB benchmarks show that CCGVS can represent application performance variations caused by network topology and routing algorithm.

Relevance: 100.00%

Abstract:

The recent trend for journals to require open access to primary data included in publications has been embraced by many biologists, but has caused apprehension amongst researchers engaged in long-term ecological and evolutionary studies. A worldwide survey of 73 principal investigators (PIs) with long-term studies revealed positive attitudes towards sharing data with the agreement or involvement of the PI, and 93% of PIs have historically shared data. Only 8% were in favor of uncontrolled, open access to primary data while 63% expressed serious concern. We present here their viewpoint on an issue that can have non-trivial scientific consequences. We discuss potential costs of public data archiving and provide possible solutions to meet the needs of journals and researchers.

Relevance: 100.00%

Abstract:

Improved sequencing technologies offer unprecedented opportunities for investigating the role of rare genetic variation in common disease. However, there are considerable challenges with respect to study design, data analysis and replication. Using pooled next-generation sequencing of 507 genes implicated in the repair of DNA in 1,150 samples, an analytical strategy focused on protein-truncating variants (PTVs) and a large-scale sequencing case-control replication experiment in 13,642 individuals, here we show that rare PTVs in the p53-inducible protein phosphatase PPM1D are associated with predisposition to breast cancer and ovarian cancer. PPM1D PTV mutations were present in 25 out of 7,781 cases versus 1 out of 5,861 controls (P = 1.12 × 10⁻⁵), including 18 mutations in 6,912 individuals with breast cancer (P = 2.42 × 10⁻⁴) and 12 mutations in 1,121 individuals with ovarian cancer (P = 3.10 × 10⁻⁹). Notably, all of the identified PPM1D PTVs were mosaic in lymphocyte DNA and clustered within a 370-base-pair region in the final exon of the gene, carboxy-terminal to the phosphatase catalytic domain. Functional studies demonstrate that the mutations result in enhanced suppression of p53 in response to ionizing radiation exposure, suggesting that the mutant alleles encode hyperactive PPM1D isoforms. Thus, although the mutations cause premature protein truncation, they do not result in the simple loss-of-function effect typically associated with this class of variant, but instead probably have a gain-of-function effect. Our results have implications for the detection and management of breast and ovarian cancer risk. More generally, these data provide new insights into the role of rare and of mosaic genetic variants in common conditions, and the use of sequencing in their identification.
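
The headline case-control comparison can be sanity-checked with a standard Fisher's exact test on the 2×2 table of PTV carriers versus non-carriers, using the counts reported above. The published analysis may have used a different test or adjustments, so this should only be expected to reproduce the reported p-value to within its order of magnitude.

    # 2x2 check of the reported counts (PPM1D PTV carriers vs non-carriers).
    # Treat the result as an order-of-magnitude check only; the paper's exact
    # statistical procedure may differ.
    from scipy.stats import fisher_exact

    carriers_cases, total_cases = 25, 7781
    carriers_controls, total_controls = 1, 5861
    table = [[carriers_cases, total_cases - carriers_cases],
             [carriers_controls, total_controls - carriers_controls]]

    odds_ratio, p_value = fisher_exact(table)
    print(f"odds ratio ~ {odds_ratio:.1f}, p = {p_value:.2e}")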

Relevance: 100.00%

Abstract:

Introduction Lifestyle interventions might be useful in the management of adverse effects of androgen deprivation therapy (ADT) in men with prostate cancer. Objectives To examine the effects of dietary and exercise interventions on quality of life (QoL), metabolic risk factors and androgen deficiency symptoms in men with prostate cancer undergoing ADT. Methods CINAHL, the Cochrane Library, Medline and PsycINFO were searched to identify randomised controlled trials published from January 2004 to October 2014. Data extraction and methodological quality assessment were independently conducted by two reviewers. Meta-analysis was conducted using RevMan® 5.3.5. Results Of 2183 articles retrieved, 11 studies met the inclusion criteria and had low risk of bias. Nine studies evaluated exercise (resistance and/or aerobic and/or counselling) and three evaluated dietary supplementation. The median sample size was 79 (33–121) and the median intervention duration was 12 weeks (12–24). Exercise improved QoL measures (SMD 0.26, 95% CI −0.01 to 0.53) but not body composition, metabolic risk or vasomotor symptoms. Qualitative analysis indicated that soy (or isoflavone) supplementation did not improve vasomotor symptoms but may improve QoL. Conclusions Few studies have evaluated the efficacy of lifestyle interventions in the management of adverse effects of ADT. We found inconclusive results for exercise in improving QoL and negative results for other outcomes. For soy-based products, we found negative results for modifying vasomotor symptoms and inconclusive results for improving QoL. Future work should investigate the best mode of exercise for improving QoL, and other interventions, such as dietary counselling, should be investigated for their potential to modify these outcomes.
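
For context on the pooled effect quoted above (SMD 0.26, 95% CI −0.01 to 0.53), the snippet below shows generic fixed-effect inverse-variance pooling of standardised mean differences, the kind of calculation meta-analysis software such as RevMan performs. The per-study SMDs and standard errors are invented for illustration and are not the trials in this review.

    # Generic fixed-effect inverse-variance pooling of standardised mean
    # differences; the study-level inputs below are illustrative, not real data.
    import numpy as np

    smd = np.array([0.35, 0.10, 0.30])   # per-study standardised mean differences
    se = np.array([0.20, 0.18, 0.25])    # their standard errors

    w = 1.0 / se**2                      # inverse-variance weights
    pooled = np.sum(w * smd) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    print(f"pooled SMD = {pooled:.2f}, 95% CI {lo:.2f} to {hi:.2f}")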

Relevance: 100.00%

Abstract:

New media technologies and the narrative turn in qualitative research have expanded the methods through which we gather data about, and share the findings of, groups who have traditionally been written about by others rather than telling their own stories to reveal the complexities of their experiences. This chapter explores two projects that use storytelling and technology in an effort to change public perceptions of a disadvantaged community or cohort whose specific circumstances are the result of policies beyond their control.

Relevance: 100.00%

Abstract:

This poster presents key features of how QUT’s integrated research data storage and management services work with researchers through their own individual or team research life cycle. By understanding the characteristics of research data, and the long-term need to store this data, QUT has provided resources and tools that support QUT’s goal of being a research intensive institute. Key to successful delivery and operation has been the focus upon researchers’ individual needs and the collaboration between providers, in particular, Information Technology Services, High Performance Computing and Research Support, and QUT Library. QUT’s Research Data Storage service provides all QUT researchers (staff and Higher Degree Research students (HDRs)) with a secure data repository throughout the research data lifecycle. Three distinct storage areas provide for raw research data to be acquired, project data to be worked on, and published data to be archived. Since the service was launched in late 2014, it has provided research project teams from all QUT faculties with acquisition, working or archival data space. Feedback indicates that the storage suits the unique needs of researchers and their data. As part of the workflow to establish storage space for researchers, Research Support Specialists and Research Data Librarians consult with researchers and HDRs to identify data storage requirements for projects and individual researchers, and to select and implement the most suitable data storage services and facilities. While research can be a journey into the unknown[1], a plan can help navigate through the uncertainty. Intertwined in the storage provision is QUT’s Research Data Management Planning tool. Launched in March 2015, it has already attracted 273 QUT staff and 352 HDR student registrations, and over 620 plans have been created (2/10/2015). Developed in collaboration with Office of Research Ethics and Integrity (OREI), uptake of the plan has exceeded expectations.

Relevance: 100.00%

Abstract:

Big Data and predictive analytics have received significant attention from the media and academic literature throughout the past few years, and it is likely that these emerging technologies will materially impact the mining sector. This short communication argues, however, that these technological forces will probably unfold differently in the mining industry than they have in many other sectors because of significant differences in the marginal cost of data capture and storage. To this end, we offer a brief overview of what Big Data and predictive analytics are, and explain how they are bringing about changes in a broad range of sectors. We discuss the “N=all” approach to data collection being promoted by many consultants and technology vendors in the marketplace but, by considering the economic and technical realities of data acquisition and storage, we then explain why an “n ≪ all” data collection strategy probably makes more sense for the mining sector. Finally, towards shaping the industry’s policies with regard to technology-related investments in this area, we conclude by putting forward a conceptual model for leveraging Big Data tools and analytical techniques that is a more appropriate fit for the mining sector.

Relevance: 100.00%

Abstract:

Data generated via user activity on social media platforms is routinely used for research across a wide range of social sciences and humanities disciplines. The availability of data through the Twitter APIs in particular has afforded new modes of research, including in media and communication studies; however, there are practical and political issues with gaining access to such data, and with the consequences of how that access is controlled. In their paper ‘Easy Data, Hard Data’, Burgess and Bruns (2015) discuss both the practical and political aspects of Twitter data as they relate to academic research, describing how communication research has been enabled, shaped and constrained by Twitter’s “regimes of access” to data, the politics of data use, and emerging economies of data exchange. This conceptual model, including the ‘easy data, hard data’ formulation, can also be applied to Sina Weibo. In this paper, we build on this model to explore the practical and political challenges and opportunities associated with the ‘regimes of access’ to Weibo data, and their consequences for digital media and communication studies. We argue that in the Chinese context, the politics of data access can be even more complicated than in the case of Twitter, which makes scientific research relying on large social data from this platform more challenging in some ways, but potentially richer and more rewarding in others.

Relevance: 100.00%

Abstract:

We consider rank regression for clustered data analysis and investigate the induced smoothing method for obtaining the asymptotic covariance matrices of the parameter estimators. We prove that the induced estimating functions are asymptotically unbiased and the resulting estimators are strongly consistent and asymptotically normal. The induced smoothing approach provides an effective way of obtaining asymptotic covariance matrices for between- and within-cluster estimators, and for a combined estimator that takes account of within-cluster correlations. We also carry out extensive simulation studies to assess the performance of the different estimators. The proposed methodology is substantially faster in computation and more stable in numerical results than existing methods. We apply the proposed methodology to a dataset from a randomized clinical trial.
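
A minimal, non-clustered sketch of the induced smoothing idea follows: the discontinuous sign function in a Wilcoxon-type rank estimating equation is replaced by a smooth normal-CDF surrogate, so the estimating equation can be solved with standard root-finding (and differentiated for variance estimation). This illustrates the smoothing device only, under simplifying assumptions; it is not the clustered estimators or covariance formulas developed in the paper.

    # Induced smoothing for rank regression, simplest (non-clustered) form:
    # replace sign(e_i - e_j) in the Wilcoxon-type estimating function with
    # 2*Phi((e_i - e_j)/r) - 1, which is smooth in beta.
    import numpy as np
    from scipy.optimize import root
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, p = 200, 2
    X = rng.normal(size=(n, p))
    beta_true = np.array([1.0, -0.5])
    y = X @ beta_true + rng.standard_t(df=3, size=n)   # heavy-tailed errors

    r = 1.0 / np.sqrt(n)                               # smoothing bandwidth

    def smoothed_score(beta):
        e = y - X @ beta
        de = e[:, None] - e[None, :]                   # pairwise residual differences
        dx = X[:, None, :] - X[None, :, :]             # pairwise covariate differences
        s = 2.0 * norm.cdf(de / r) - 1.0               # smooth surrogate for sign(de)
        return (dx * s[:, :, None]).sum(axis=(0, 1)) / (n * (n - 1))

    beta_hat = root(smoothed_score, x0=np.zeros(p)).x
    print("rank-regression estimate:", np.round(beta_hat, 3))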