64 results for Data anonymization and sanitization


Relevance:

100.00%

Publisher:

Abstract:

In this paper, a hybrid intelligent system that integrates the Self-Organizing Map (SOM) neural network, the kernel-based Maximum Entropy learning Rule (kMER), and the Probabilistic Neural Network (PNN) for data visualization and classification is proposed. The rationale of this Probabilistic SOM-kMER model is explained, and its applicability is demonstrated using two benchmark data sets. The results are analyzed and compared with those from a number of existing methods. The implication of the proposed hybrid system as a useful and usable data visualization and classification tool is discussed.
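
The abstract does not include the model's details, but the final PNN stage is standard enough to sketch. Below is a minimal, illustrative Python sketch of a Parzen-window PNN classifying over learned prototypes; plain k-means stands in for the SOM/kMER prototype-learning stage, and the bandwidth, prototype count and dataset are assumptions, not the paper's configuration.

```python
# Minimal sketch of the hybrid's final stage: a Probabilistic Neural Network
# (Parzen-window classifier) evaluated over learned prototypes. k-means
# stands in for the SOM/kMER prototype-learning stage described above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

def pnn_predict(prototypes, proto_labels, X, sigma=0.5):
    """Classify rows of X by averaging Gaussian kernels per class."""
    classes = np.unique(proto_labels)
    preds = []
    for x in X:
        d2 = np.sum((prototypes - x) ** 2, axis=1)   # squared distances
        k = np.exp(-d2 / (2.0 * sigma ** 2))
        scores = [k[proto_labels == c].mean() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

X, y = load_iris(return_X_y=True)
# Learn a few prototypes per class (stand-in for the SOM-kMER stage).
protos, labels = [], []
for c in np.unique(y):
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X[y == c])
    protos.append(km.cluster_centers_)
    labels.extend([c] * 5)
prototypes = np.vstack(protos)
# Resubstitution accuracy, for illustration only.
print((pnn_predict(prototypes, np.array(labels), X) == y).mean())
```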

Relevance:

100.00%

Publisher:

Abstract:

Data and information quality is a well-established research topic that gradually appears on decision-makers' top concern lists. Many studies have investigated generic data/information quality issues and factors by providing high-level abstract frameworks or models. Building on these previous studies, this study discusses the actual data quality issues that operation-level and middle-level managers encountered during the emergency department data collection and reporting processes. By mapping data quality issues to business processes, possible data quality issues are summarised under the well-known TOP model, and recommendations for data quality improvement are suggested.

Relevance:

100.00%

Publisher:

Abstract:

The steady increase of regulations, and its acceleration due to the financial crisis, heavily affects the management of regulatory compliance. Regulations such as Basel III and Solvency II particularly impact data warehouses and lead to many organizational and technical changes. From an IS perspective, modeling techniques for data warehouse requirement elicitation help to manage conceptual requirements. From a legal perspective, attempts to visualize regulatory requirements (so-called legal visualization approaches) have been developed. This paper investigates whether a conceptual modeling technique for regulatory-driven data warehouse requirements is applicable for representing data warehouse requirements in a legal environment. Applying the modeling technique H2 for Reporting in three extensive modeling projects provides three contributions. First, evidence for the applicability of a modeling technique for regulatory-driven data warehouse requirements is given. Second, lessons learned for further modeling projects are provided. Third, a discussion towards a combined perspective of information modeling and legal visualization is presented.

Relevance:

100.00%

Publisher:

Abstract:

The present paper reviews current evidence for the effectiveness and/or feasibility of inter-agency sharing of ED-recorded assault data to direct interventions that reduce alcohol-related or nightlife assaults, injury or violence. Potential data-sharing partners include police, local councils, liquor licensing regulators and venue management. A systematic review of the peer-reviewed literature was conducted. The initial search retrieved 19,506 articles; after removal of duplicates and articles not meeting the review criteria, n = 8 articles were included in the quantitative and narrative synthesis. Seven of the eight studies were conducted in UK EDs, with the remaining study presenting Australian data. All studies included in the review deemed data sharing a worthwhile pursuit. All but one of the studies attempting to measure intervention effectiveness reported substantial reductions in assaults and ED attendances post-intervention; the remaining study reported no change. Logistical feasibility concerns were minimal, the general consensus among authors being that data-sharing protocols and partnerships could easily be implemented in modern ED triage systems, with minimal cost, staff workload burden, impact on patient safety, service and anonymity, risk of harm displacement to other licensed venues, or increase in length of patient stay. However, one study reported a potential harm-displacement effect to streets surrounding intervention venues. Future data-sharing systems should triangulate ED, police and ambulance data sources, and assess intervention effectiveness using randomised controlled trials that account for variations in venue capacity, fluctuations in ED attendance and population levels, and seasonal variations in assault and injury, and that control for concurrent interventions.

Relevance:

100.00%

Publisher:

Abstract:

Statistical time series methods have proven to be a promising technique in structural health monitoring, since they provide a direct form of data analysis and eliminate the requirement for domain transformation. Recent research in structural health monitoring presents a number of statistical models that have been successfully used to construct quantified models of vibration response signals. Although most of these studies present viable results, the aspects of practical implementation, statistical model construction and decision-making procedures are often vaguely defined or omitted from the presented work. In this article, a comprehensive methodology is developed that utilizes an auto-regressive moving average with exogenous input (ARMAX) model to create quantified model estimates of experimentally acquired response signals. An iterative self-fitting algorithm is proposed to construct and fit the ARMAX model, which is capable of integrally finding an optimum set of ARMAX model parameters. After creating a dataset of quantified response signals, an unlabelled response signal can be identified according to the 'closest fit' available in the dataset. A unique averaging method is proposed and implemented for multi-sensor data fusion to decrease the margin of error across sensors, thus increasing the reliability of global damage identification. To demonstrate the effectiveness of the developed methodology, a steel frame structure subjected to various bolt-connection damage scenarios is tested. Damage identification results from the experimental study suggest that the proposed methodology can be employed as an efficient and functional damage identification tool. © The Author(s) 2014.
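
The iterative self-fitting algorithm itself is not given in the abstract, but the 'closest-fit' identification step can be sketched. The following Python fragment fits an ARMAX model (via statsmodels' ARIMA with exogenous input) to each labelled reference signal and labels an unknown signal by the nearest parameter vector; the model order, the synthetic signals and the Euclidean closest-fit criterion are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch of closest-fit identification: fit an ARMAX model to each
# known (labelled) response signal, then label an unknown signal by the
# nearest parameter vector. Data, order and distance metric are illustrative.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.arange(500)
exog = np.sin(0.1 * t)[:, None]          # measured excitation (exogenous input)

def simulate(freq):
    """Toy vibration response: sinusoid plus excitation plus noise."""
    return np.sin(freq * t) + 0.3 * exog[:, 0] + 0.1 * rng.standard_normal(len(t))

def armax_params(y):
    """Fit ARMA(2,1) with exogenous input; return its parameter vector."""
    res = ARIMA(y, exog=exog, order=(2, 0, 1)).fit()
    return res.params

# Reference dataset: one quantified model per known damage scenario.
reference = {label: armax_params(simulate(f))
             for label, f in [("healthy", 0.30), ("bolt_loose", 0.45)]}

unknown = armax_params(simulate(0.44))   # unlabelled response signal
closest = min(reference, key=lambda k: np.linalg.norm(reference[k] - unknown))
print("closest-fit label:", closest)
```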

Relevance:

100.00%

Publisher:

Abstract:

Energy consumption data are required to perform analysis, modelling, evaluation, and optimisation of energy usage in buildings. While a variety of energy consumption data sets have been examined and reported in the literature, a comprehensive categorisation and analysis of the available data sets is lacking. In this study, an overview of building energy consumption data is provided. Three common strategies for generating energy consumption data, i.e., measurement, survey, and simulation, are described. A number of important characteristics pertaining to each strategy and the resulting data sets are discussed. In addition, a directory of building energy consumption data sets is developed. The data sets are collected from either published papers or energy-related organisations. The main contributions of this study include establishing a resource pertaining to energy consumption data sets and providing information related to the characteristics and availability of the respective data sets, thereby facilitating and promoting research activities in energy consumption data analysis.

Relevance:

100.00%

Publisher:

Abstract:

This paper introduces an automated medical data classification method using the wavelet transform (WT) and an interval type-2 fuzzy logic system (IT2FLS). Wavelet coefficients, which serve as inputs to the IT2FLS, are a compact form of the original data, yet they exhibit highly discriminative features. The integration of WT and IT2FLS aims to cope with both the high-dimensional data challenge and uncertainty. The IT2FLS utilizes a hybrid learning process comprising unsupervised structure learning by fuzzy c-means (FCM) clustering and supervised parameter tuning by a genetic algorithm. This learning process is computationally expensive, especially when employed with high-dimensional data. The application of WT therefore reduces the computational burden and enhances the performance of the IT2FLS. Experiments are implemented with two frequently used medical datasets from the UCI Machine Learning Repository: Wisconsin breast cancer and Cleveland heart disease. A number of important metrics are computed to measure classification performance: accuracy, sensitivity, specificity and area under the receiver operating characteristic curve. The results demonstrate a significant dominance of the wavelet-IT2FLS approach over other machine learning methods, including the probabilistic neural network, support vector machine, fuzzy ARTMAP, and adaptive neuro-fuzzy inference system. The proposed approach is thus useful as a decision support system for clinicians and practitioners in medical practice. © 2015 Elsevier B.V. All rights reserved.
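
As a rough illustration of the wavelet front end, the sketch below compresses each record of the UCI Wisconsin breast cancer data with a discrete wavelet transform (PyWavelets) and classifies on the approximation coefficients. A plain k-NN classifier stands in for the IT2FLS, and the wavelet family and decomposition level are assumptions.

```python
# Minimal sketch of the wavelet front end: compress each record with a
# discrete wavelet transform and keep the approximation coefficients as
# compact features. k-NN stands in for the interval type-2 fuzzy system.
import numpy as np
import pywt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)   # UCI Wisconsin breast cancer

def wavelet_features(row, wavelet="db4", level=2):
    # Multilevel DWT; keep only the coarsest approximation coefficients.
    coeffs = pywt.wavedec(row, wavelet, level=level)
    return coeffs[0]

Xw = np.array([wavelet_features(r) for r in X])
print("original dims:", X.shape[1], "-> wavelet dims:", Xw.shape[1])
clf = KNeighborsClassifier(n_neighbors=5)
print("CV accuracy:", cross_val_score(clf, Xw, y, cv=5).mean().round(3))
```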

Relevance:

100.00%

Publisher:

Abstract:

Open data has created an unprecedented opportunity, with new challenges, for ecosystem scientists. Skills in data management are essential to acquire, manage, publish, access and re-use data. These skills span many disciplines and require trans-disciplinary collaboration. Science synthesis centres support analysis and synthesis through collaborative 'Working Groups' where domain specialists work together to synthesise existing information and provide insight into critical problems. The Australian Centre for Ecological Analysis and Synthesis (ACEAS) served a wide range of stakeholders, from scientists to policy-makers to managers. This paper investigates the level of sophistication in data management in the ecosystem science community through the lens of the ACEAS experience, and identifies the important factors required to benefit from this new data world and produce innovative science. ACEAS promoted the analysis and synthesis of data to solve transdisciplinary questions, and promoted the publication of the synthesised data. To do so, it provided support in many of the key skillsets required. Analysis and synthesis in multi-disciplinary, multi-organisational teams, and the publication of data, were new for most participants. Data were difficult to discover, access, and make ready for analysis, largely due to a lack of metadata. Data use and publication were hampered by concerns about data ownership and a desire for data citation. A web portal was created to visualise geospatial datasets and maximise data interpretation. By the end of the experience, there was a significant increase in appreciation of the importance of a Data Management Plan. It is extremely doubtful that the work would have occurred, or the data been delivered, without the support of the synthesis centre, as few of the participants had the necessary networks or skills. It is argued that participation in the Centre provided an important learning opportunity and has resulted in improved knowledge and understanding of good data management practices.

Relevance:

100.00%

Publisher:

Abstract:

Data sharing has never been easier with the advances of cloud computing, and accurate analysis of the shared data provides an array of benefits to both society and individuals. Data sharing with a large number of participants must take into account several issues, including efficiency, data integrity and the privacy of the data owner. The ring signature is a promising candidate for constructing an anonymous and authentic data-sharing system: it allows a data owner to anonymously authenticate data that can be put into the cloud for storage or analysis purposes. Yet the costly certificate verification in the traditional public key infrastructure (PKI) setting becomes a bottleneck that keeps this solution from scaling. Identity-based (ID-based) ring signatures, which eliminate the process of certificate verification, can be used instead. In this paper, we further enhance the security of ID-based ring signatures by providing forward security: if the secret key of any user is compromised, all previously generated signatures that include this user still remain valid. This property is especially important for any large-scale data-sharing system, as it is impossible to ask all data owners to re-authenticate their data whenever the secret key of a single user is compromised. We provide a concrete and efficient instantiation of our scheme, prove its security, and provide an implementation to demonstrate its practicality.
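
The forward-security property can be illustrated independently of the ring-signature construction. The toy Python sketch below evolves a signing key through a one-way hash chain and binds each tag to its time period; HMAC stands in for the actual (publicly verifiable) ID-based ring signature, so this shows only the key-update idea, not the paper's scheme.

```python
# Toy illustration of forward security only (HMAC stands in for the paper's
# publicly verifiable ID-based ring signature): the signing key evolves
# through a one-way hash chain and each tag is bound to its time period.
# Compromising the current key gives no way back to earlier keys, so tags
# produced in earlier periods remain trustworthy.
import hashlib
import hmac

def evolve(key: bytes) -> bytes:
    """One-way key update; old keys are unrecoverable from the new one."""
    return hashlib.sha256(b"key-update" + key).digest()

def tag(key: bytes, period: int, msg: bytes) -> bytes:
    # Bind the period into the tag so a verifier knows which epoch applies.
    return hmac.new(key, period.to_bytes(4, "big") + msg, hashlib.sha256).digest()

key = hashlib.sha256(b"initial secret").digest()
archive = []
for period in range(3):
    archive.append((period, tag(key, period, b"shared data block")))
    key = evolve(key)                  # discard the old key after updating

# Even if `key` (now at epoch 3) leaks, epochs 0..2 cannot be recomputed
# from it, so the archived tags need no re-authentication.
print(len(archive), "tags archived; current key epoch: 3")
```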

Relevance:

100.00%

Publisher:

Abstract:

Privacy is receiving growing attention from various parties, especially consumers, owing to the ease with which personal data can be collected and distributed. This research focuses on preserving privacy in social network data publishing. The study explores data anonymization mechanisms in order to improve the privacy protection of social network users. We identify a new type of privacy breach and propose an effective mechanism for privacy protection.
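
The abstract does not specify the proposed mechanism, so as background the sketch below illustrates one common social-network anonymization baseline, k-degree anonymity: edges are greedily added until every degree value is shared by at least k users, so no user is re-identifiable by degree alone. The greedy repair strategy and the parameters are illustrative assumptions.

```python
# Illustrative k-degree anonymity baseline (not the paper's mechanism):
# keep linking rare-degree nodes until every degree value occurs >= k times.
from collections import Counter
import networkx as nx

def k_degree_anonymize(g: nx.Graph, k: int = 2, max_rounds: int = 1000) -> nx.Graph:
    g = g.copy()
    for _ in range(max_rounds):
        counts = Counter(dict(g.degree()).values())
        rare = [n for n, d in g.degree() if counts[d] < k]
        if not rare:
            return g                     # every degree now occurs >= k times
        # Greedy, illustrative repair: link two non-adjacent rare nodes.
        linked = False
        for i, u in enumerate(rare):
            for v in rare[i + 1:]:
                if not g.has_edge(u, v):
                    g.add_edge(u, v)
                    linked = True
                    break
            if linked:
                break
        if not linked:                   # no repair possible among rare nodes
            break
    return g

g = nx.karate_club_graph()
anon = k_degree_anonymize(g, k=2)
print("degree histogram:", sorted(Counter(dict(anon.degree()).values()).items()))
```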

Relevance:

100.00%

Publisher:

Abstract:

Hybrid cloud is a widely used cloud architecture in large companies that can outsource data to the public cloud while still supporting various clients, such as mobile devices. However, such public cloud data outsourcing raises serious security concerns, such as how to preserve data confidentiality and how to regulate access policies to the data stored in the public cloud. To address these issues, we design a hybrid cloud architecture that supports secure and efficient data sharing, even with resource-limited devices, in which the private cloud serves as a gateway between the public cloud and the data user. Under this architecture, we propose an improved construction of attribute-based encryption that can delegate encryption/decryption computation, achieving flexible access control in the cloud and privacy preservation in data utilization, even with mobile devices. Extensive experiments show that the scheme further decreases the computational cost and space overhead at the user side, which makes it efficient for users with resource-limited mobile devices. In the process of delegating most of the encryption/decryption computation to the private cloud, the user does not disclose any information to the private cloud. We also consider communication security: when frequent attribute revocation happens, our scheme is able to resist certain attacks between the private cloud and the data user by employing anonymous key agreement.
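
The attribute-based construction cannot be reconstructed from the abstract, but the underlying delegation idea, outsourcing the heavy decryption work to a semi-trusted gateway via a blinded transformation key, can be shown with textbook ElGamal. Everything below (the group parameters, the blinding factor z) is an illustrative assumption, not the paper's scheme.

```python
# Toy illustration of the delegation idea only (textbook ElGamal, NOT the
# paper's attribute-based construction): the user hands the private-cloud
# gateway a *blinded* transformation key tk = sk * z^-1 mod q, the gateway
# performs the expensive exponentiation, and the user finishes with one
# cheap exponentiation. The gateway learns neither sk nor the plaintext.
import math
import secrets

p = 2**127 - 1                         # demo prime (NOT secure parameters)
q = p - 1                              # order of the multiplicative group
g = 3

sk = secrets.randbelow(q - 2) + 2      # user's secret key
pk = pow(g, sk, p)

# Encrypt m under pk: ciphertext (c1, c2) = (g^r, m * pk^r) mod p.
m = 123456789
r = secrets.randbelow(q - 2) + 2
c1, c2 = pow(g, r, p), (m * pow(pk, r, p)) % p

# User: pick an invertible blinding factor z once, send tk to the gateway.
z = secrets.randbelow(q - 2) + 2
while math.gcd(z, q) != 1:
    z = secrets.randbelow(q - 2) + 2
tk = (sk * pow(z, -1, q)) % q          # blinded transformation key

partial = pow(c1, tk, p)               # private cloud: the heavy work
shared = pow(partial, z, p)            # user: cheap finish, equals c1^sk
recovered = (c2 * pow(shared, -1, p)) % p
print(recovered == m)                  # True
```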

Relevance:

100.00%

Publisher:

Abstract:

This paper is written through the vision of integrating the Internet-of-Things (IoT) with the power of cloud computing and the intelligence of big data analytics. However, the integration of these three cutting-edge technologies is complex to understand. In this research, we first provide a security-centric view of a three-layered approach for understanding the technology, its gaps and its security issues. Then, through a series of lab experiments on different hardware, we collected performance data from all three layers, combined these data, and finally applied modern machine learning algorithms to distinguish 18 different activities and cyber-attacks. From our experiments, we find that the RandomForest classification algorithm can identify 93.9% of attacks and activities in this complex environment. To the best of our knowledge, no similar experiment on cyber-attack detection for IoT has been reported in the existing literature, whether with performance data or with a three-layered approach.
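
As a sketch of the final classification step, the fragment below trains scikit-learn's RandomForestClassifier on a synthetic stand-in for the authors' three-layer performance measurements, which are not available here; the feature dimensions and class structure are placeholders.

```python
# Minimal sketch of the final step: a random forest separating 18
# activity/attack classes from layered performance metrics. The synthetic
# feature matrix is a placeholder for the authors' three-layer
# (device, network, cloud) measurements.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_classes, per_class = 18, 100
y = np.repeat(np.arange(n_classes), per_class)
# Placeholder features standing in for CPU load, packet rates, latency, etc.
X = rng.normal(size=(y.size, 12)) + y[:, None] * 0.25

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
print("held-out accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 3))
```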

Relevance:

100.00%

Publisher:

Abstract:

The Securities and Exchange Commission (SEC) in the United States mandated a new digital reporting system for US companies in late 2008. The new generation of information provision was dubbed 'interactive data' by Chairman Cox (SEC, 2006a). Despite the promise of its name, we find that in the development of the project retail investors are invoked as calculative actors rather than engaged in dialogue. Similarly, the potential for the underlying technology to be applied in ways that encourage new forms of accountability appears to have been forfeited in the interests of enrolling company filers. We theorise the activities of the SEC, and in particular of its chairman at the time, Christopher Cox, over a three-year period both prior to and following the 'credit crisis'. We argue that individuals and institutions play a central role in advancing the socio-technical project that is constituted by interactive data. We adopt insights from ANT (Callon, 1986; Latour, 1987; Latour, 2005b) and governmentality (Miller, 2008; Miller and Rose, 2008) to show how regulators and the proponents of the technology have acted as spokespersons for the interactive data technology and the retail investor. We examine the way in which calculative accountability has been privileged in the SEC's construction of the retail investor as concerned with atomised, quantitative data (Kamuf, 2007; Roberts, 2009; Tsoukas, 1997). We find that the possibilities for the democratising effects of digital information on the Internet have not been realised in the interactive data project, and that the project contains risks for the very investors the SEC claims to seek to protect.

Relevance:

100.00%

Publisher:

Abstract:

The spectrum nature of, and heterogeneity within, autism spectrum disorders (ASD) pose a challenge for treatment. Personalising the syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of the syllabus. We investigate a data-driven approach in an attempt to disentangle this heterogeneity for syllabus personalisation. With the help of technology and a structured syllabus, data can be collected while a child with ASD masters skills. The performance data collected are, however, growing and contain missing elements, depending on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and is extended to incorporate data with missing elements. In this paper, for the first time, we present learning patterns deduced automatically by data mining and machine learning methods from intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which, being parametric, not only require us to specify the number of learning patterns in advance but also lack a principled approach to dealing with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By examining these patterns together with additional knowledge of the syllabus, it may be possible to monitor progress and dynamically modify the syllabus for improved learning.
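
The LPGM with an Indian buffet process prior is beyond a short sketch, but the core requirement, factorising a performance matrix with missing entries, can be illustrated with mask-aware multiplicative NMF updates (essentially the NMF baseline the paper compares against, extended to ignore unobserved cells). All data and sizes below are synthetic assumptions.

```python
# Minimal sketch of factorising a (children x skills) performance matrix
# with missing entries: NMF via multiplicative updates restricted to
# observed cells. The nonparametric LPGM would also infer n_patterns.
import numpy as np

rng = np.random.default_rng(0)
n_children, n_skills, n_patterns = 40, 25, 3

# Synthetic mastery scores with ~30% missing entries (unattempted skills).
W_true = rng.gamma(2.0, 1.0, (n_children, n_patterns))
H_true = rng.gamma(2.0, 1.0, (n_patterns, n_skills))
V = W_true @ H_true
M = (rng.random(V.shape) > 0.3).astype(float)   # 1 = observed, 0 = missing

W = rng.random((n_children, n_patterns)) + 0.1
H = rng.random((n_patterns, n_skills)) + 0.1
eps = 1e-9
for _ in range(500):
    # Lee-Seung updates weighted by the mask: missing cells exert no force.
    W *= ((M * V) @ H.T) / (((M * (W @ H)) @ H.T) + eps)
    H *= (W.T @ (M * V)) / ((W.T @ (M * (W @ H))) + eps)

err = np.linalg.norm(M * (V - W @ H)) / np.linalg.norm(M * V)
print("relative error on observed cells:", round(float(err), 4))
```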

Relevance:

100.00%

Publisher:

Abstract:

Recommendation systems adopt various techniques to recommend ranked lists of items that help users identify the items that best fit their personal tastes. Among the various recommendation algorithms, user- and item-based collaborative filtering methods have been very successful in both industry and academia. More recently, the rapid growth of the Internet and e-commerce applications has created great challenges for recommendation systems, as the number of users and the amount of available online information have been growing rapidly. These challenges include producing high-quality recommendations per second for millions of users and items, achieving high coverage under data sparsity, and increasing the scalability of recommendation systems. To obtain higher-quality recommendations under data sparsity, we propose in this paper a novel method to compute the similarity of different users based on side information, beyond user-item rating information, from various online recommendation and review sites. Furthermore, we take the special interests of users into consideration and combine three types of information (users, items, user-items) to predict the ratings of items. Then FUIR, a novel recommendation algorithm which fuses user and item information, is proposed to generate recommendation results for target users. We evaluate our proposed FUIR algorithm on three data sets, and the experimental results demonstrate that our FUIR algorithm is effective against sparse rating data and can produce higher-quality recommendations.
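
The exact FUIR formulation is not given in the abstract, but the stated idea, compensating sparse rating overlap with side information, can be sketched as a weighted blend of two cosine similarities. The weight alpha and the toy data are assumptions.

```python
# Illustrative sketch (not the exact FUIR formulation): blend rating-based
# similarity with similarity over side information, so users with little
# rating overlap can still be compared.
import numpy as np

def cosine(a, b, eps=1e-12):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

ratings = np.array([[5, 0, 0, 1],     # rows: users; 0 = unrated (sparse)
                    [0, 4, 0, 0],
                    [5, 0, 0, 2]], dtype=float)
side = np.array([[1, 0, 1],           # side-information profile per user,
                 [1, 0, 1],           # e.g. tags or demographic indicators
                 [0, 1, 0]], dtype=float)

def fused_similarity(u, v, alpha=0.5):
    # alpha trades rating evidence off against side-information evidence.
    return (alpha * cosine(ratings[u], ratings[v])
            + (1 - alpha) * cosine(side[u], side[v]))

# Users 0 and 1 share no rated items, yet side information still links them.
print(fused_similarity(0, 1), fused_similarity(0, 2))
```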