64 results for Data anonymization and sanitization


Relevance:

100.00%

Publisher:

Abstract:

With the advance of the Internet of Things (IoT), more M2M sensors and devices are being connected to the Internet. These sensors and devices generate sensor-based big data and bring new business opportunities and demands for creating and developing sensor-oriented big data infrastructures, platforms and analytics service applications. Driven by the IoT's connected sensor world, big data sensing is becoming a new concept and the next technology trend. It has a strong impact on many sensor-oriented applications, including smart cities, disaster control and monitoring, healthcare services, and environmental protection and climate change studies. This paper is written as a tutorial, providing informative concepts and a taxonomy for big data sensing and services. The paper not only discusses the motivation, research scope, and features of big data sensing and services, but also examines the services required in big data sensing based on state-of-the-art research. Moreover, the paper discusses big data sensing challenges, issues, and needs.

Relevance:

100.00%

Publisher:

Abstract:

Radio Frequency Identification (RFID) is an emerging wireless object identification technology with many potential applications such as supply chain management, personnel tracking and healthcare. However, security vulnerabilities of RFID systems have been a serious concern for their wide adoption in many applications. Although much work has been done to provide privacy and anonymity, little focus has been given to ensuring RFID data confidentiality and integrity, or to addressing the recovery of tampered data. To this end, we propose a lightweight steganography-based approach to ensure RFID data confidentiality and integrity as well as the recovery of tampered RFID data. © 2013 Springer-Verlag Berlin Heidelberg.
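
A minimal sketch of the tamper-detection idea behind such an approach (this is not the paper's steganographic scheme: the keyed digest, field sizes and record format are illustrative assumptions, and the digest is appended to the record rather than hidden inside the tag memory):

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # shared between the tag writer and the verifier (assumed)

def protect(payload: bytes) -> bytes:
    """Append a truncated keyed digest so later tampering can be detected."""
    digest = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:4]
    return payload + digest

def verify(record: bytes) -> bool:
    """Recompute the digest over the payload and compare with the stored one."""
    payload, stored = record[:-4], record[-4:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:4]
    return hmac.compare_digest(stored, expected)

record = protect(b"EPC:3034F87B;loc=DOCK7")
print(verify(record))                              # True: record is intact
tampered = record[:4] + b"XXXX" + record[8:]
print(verify(tampered))                            # False: payload was altered
```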

Relevance:

100.00%

Publisher:

Abstract:

While SQL injection attacks (SQLIA) have been plaguing web application systems for years, the possibility of them affecting RFID systems was only identified very recently. However, very little work exists to mitigate this serious security threat to RFID-enabled enterprise systems. In this paper, we propose a policy-based SQLIA detection and prevention method for RFID systems. The proposed technique creates data validation and sanitization policies during content analysis and enforces those policies during runtime monitoring. We tested all possible types of dynamic queries that may be generated in RFID systems against all possible types of attacks that can be mounted on those systems. We present an analysis and evaluation of the proposed approach to demonstrate its effectiveness in mitigating SQLIA.
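
As a rough illustration of the two phases described above (the policy format, parameter names and query templates below are assumptions, not the authors' implementation), validation policies derived offline can be enforced before any dynamic query is built at runtime:

```python
import re
import sqlite3

# Policies produced by content analysis: each query template lists the exact
# pattern every parameter must match (EPC codes, reader identifiers, ...).
POLICIES = {
    "lookup_tag": {"tag_id": re.compile(r"^[0-9A-F]{24}$")},
    "log_read": {
        "tag_id": re.compile(r"^[0-9A-F]{24}$"),
        "reader_id": re.compile(r"^[A-Za-z0-9_-]{1,32}$"),
    },
}

QUERIES = {
    "lookup_tag": "SELECT * FROM tags WHERE tag_id = ?",
    "log_read": "INSERT INTO reads (tag_id, reader_id) VALUES (?, ?)",
}

def validate(template: str, params: dict) -> bool:
    """Runtime monitoring: every parameter must satisfy its policy pattern."""
    policy = POLICIES[template]
    return set(params) == set(policy) and all(
        policy[name].fullmatch(str(value)) for name, value in params.items()
    )

def run_query(conn, template: str, params: dict):
    if not validate(template, params):
        raise ValueError(f"SQLIA policy violation in '{template}': {params}")
    # Parameterised execution keeps tag data out of the SQL text entirely.
    return conn.execute(QUERIES[template], tuple(params.values()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (tag_id TEXT)")
print(validate("lookup_tag", {"tag_id": "30F4A1" * 4}))     # True: 24 hex chars
print(validate("lookup_tag", {"tag_id": "X' OR '1'='1"}))   # False: injection rejected
```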

Relevance:

100.00%

Publisher:

Abstract:

Big data analytics for traffic accidents is a hot topic with significant value for smart and safe urban traffic. Based on the massive traffic accident data from October 2014 to March 2015 in Xiamen, China, we propose a novel accident-occurrence analytics method in both the spatial and temporal dimensions to predict when and where an accident of a specific crash type will occur, and by whom. First, we analyze and visualize accident occurrences from both temporal and spatial views. Second, we illustrate the spatio-temporal visualization results through two case studies covering multiple road segments and the impact of weather on crash types. These findings of accident occurrence analysis and visualization would not only help the traffic police department make instant personnel assignments among simultaneous accidents, but also inform individual drivers about accident-prone road sections and the time spans that require their greatest attention.
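
For illustration only (the records below are made up, not the Xiamen dataset), the core of the spatio-temporal occurrence analysis can be sketched as a simple aggregation of accidents by road segment and hour of day:

```python
import pandas as pd

# Toy accident records: timestamp, road segment and crash type per accident.
accidents = pd.DataFrame({
    "time": pd.to_datetime(["2014-10-03 08:15", "2014-10-03 08:40",
                            "2014-10-04 18:05", "2014-10-05 08:30"]),
    "segment": ["Segment A", "Segment A", "Segment B", "Segment A"],
    "crash_type": ["rear-end", "rear-end", "side-impact", "rear-end"],
})

accidents["hour"] = accidents["time"].dt.hour
# Count accidents per (segment, hour) cell to surface accident-prone places and times.
hotspots = (accidents.groupby(["segment", "hour"])
                      .size()
                      .sort_values(ascending=False))
print(hotspots.head())   # e.g. Segment A around 08:00 stands out
```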

Relevance:

100.00%

Publisher:

Abstract:

A two-stage hybrid model for data classification and rule extraction is proposed. The first stage uses a Fuzzy ARTMAP (FAM) classifier with Q-learning (known as QFAM) for incremental learning of data samples, while the second stage uses a Genetic Algorithm (GA) for rule extraction from QFAM. Given a new data sample, the resulting hybrid model, known as QFAM-GA, is able to predict the target class of the data sample as well as to give a fuzzy if-then rule to explain the prediction. To reduce network complexity, a pruning scheme using Q-values is applied to reduce the number of prototypes generated by QFAM. A 'don't care' technique is employed to minimize the number of input features using the GA. A number of benchmark problems are used to evaluate the effectiveness of QFAM-GA in terms of test accuracy, noise tolerance and model complexity (number of rules and total rule length). The results are comparable with, if not better than, those of many other models reported in the literature. The main significance of this research is a usable and useful intelligent model (i.e., QFAM-GA) for data classification in noisy conditions with the capability of yielding a set of explanatory rules with minimal antecedents. In addition, QFAM-GA is able to maximize accuracy and minimize model complexity simultaneously. The empirical outcomes demonstrate the potential impact of QFAM-GA in practical environments: it provides domain users with accurate predictions and concise justifications, allowing them to adopt QFAM-GA as a useful decision support tool in their decision-making processes.
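
Two of the complexity-reduction ideas in the abstract can be sketched in isolation (this is not the QFAM-GA implementation; the data structures and threshold are assumptions): prototypes with low Q-values are pruned, and extracted fuzzy if-then rules may carry GA-selected 'don't care' antecedents.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Prototype:
    weights: List[float]      # hypothetical prototype vector learned by QFAM
    target_class: int
    q_value: float            # usefulness score from Q-learning

def prune_prototypes(protos: List[Prototype], q_threshold: float) -> List[Prototype]:
    """Keep only prototypes whose Q-value exceeds the threshold."""
    return [p for p in protos if p.q_value > q_threshold]

@dataclass
class FuzzyRule:
    # One antecedent per input feature; None stands for a GA-selected "don't care".
    antecedents: List[Optional[str]]
    consequent: int

    def describe(self, feature_names: List[str]) -> str:
        parts = [f"{n} is {a}"
                 for n, a in zip(feature_names, self.antecedents) if a is not None]
        return "IF " + " AND ".join(parts) + f" THEN class {self.consequent}"

# Example: a rule where the GA marked the second feature as "don't care".
rule = FuzzyRule(antecedents=["low", None, "high"], consequent=1)
print(rule.describe(["feature_1", "feature_2", "feature_3"]))
```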

Relevance:

100.00%

Publisher:

Abstract:

Complexity analysis of a given time series is executed using various measures of irregularity, the most commonly used being Approximate entropy (ApEn), Sample entropy (SampEn) and Fuzzy entropy (FuzzyEn). However, the dependence of these measures on the critical tolerance parameter 'r' leads to unreliable results when r is chosen arbitrarily. Attempts to eliminate the use of r in entropy calculations introduced a new measure, distribution entropy (DistEn), based on the empirical probability distribution function (ePDF). DistEn completely avoids the use of a variance-dependent parameter like r and replaces it with a parameter M, the number of bins used in the histogram from which it is calculated. When tested on synthetic data, M has been observed to have a minimal effect on DistEn compared with the effect of r on the other entropy measures. DistEn is also reported to be relatively stable under data length (N) variations, at least for synthetic data. However, these claims have not been analyzed for physiological data. Our study evaluates the effect of data length N and bin number M on the performance of DistEn using both synthetic and physiologic time series data. Synthetic logistic data of 'Periodic' and 'Chaotic' levels of complexity and 40 RR interval time series belonging to two groups of a healthy aging population (young and elderly) have been used for the analysis. The stability and consistency of DistEn as a complexity measure as well as a classifier have been studied. Experiments show that the parameters N and M are more influential in determining DistEn performance for physiologic data than for synthetic data. Therefore, an arbitrary choice of M for a given data length N may not always yield good DistEn performance on physiologic data.
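
A compact sketch of the DistEn computation as it is commonly defined (embedding dimension m and bin count M as inputs; the logistic-map example and its parameters are only for demonstration):

```python
import numpy as np

def dist_en(x, m=2, M=512):
    """Distribution entropy (DistEn) of a 1-D time series.

    Minimal sketch: embed the series, take the empirical distribution of all
    pairwise Chebyshev distances with an M-bin histogram, and return its
    normalised Shannon entropy.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    # Embedding matrix: each row is one m-dimensional reconstruction vector.
    emb = np.array([x[i:i + m] for i in range(n)])
    # Pairwise Chebyshev (max-norm) distances, excluding self-comparisons.
    d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
    dists = d[np.triu_indices(n, k=1)]
    # Empirical probability distribution function with M bins.
    counts, _ = np.histogram(dists, bins=M)
    p = counts / counts.sum()
    p = p[p > 0]                          # ignore empty bins (0*log0 := 0)
    return -np.sum(p * np.log2(p)) / np.log2(M)

# Example: DistEn of a short logistic-map series in the chaotic regime.
xs = [0.4]
for _ in range(499):
    xs.append(3.9 * xs[-1] * (1 - xs[-1]))
print(round(dist_en(xs, m=2, M=64), 3))
```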

Relevance:

100.00%

Publisher:

Abstract:

This study focuses on soft boot snowboard bindings, examining how users interact with their bindings and proposing a possible solution to overcome the issues identified. Snowboarding is a multibillion-dollar sport that has only reached the mainstream in the last 30 years, and its technology has progressed considerably in that time. However, snowboard bindings have, for the most part, retained the same basic architecture for the last 20 years. This study took a user-centric point of view and used additive manufacturing technologies to generate a new snowboard binding that is completely adaptable to the user. The initial part of the study was a survey of 280 snowboarders focusing on preferences, style and habits. The survey drew respondents from over 15 nations, with the vast majority of boarders on the snow for five to fifty days a year. Significant emphasis was placed on the relationship between boarder binding set-up and the occurrence of pain and/or injury. The detailed survey found that boarders experienced pain in the front foot/toe area as a result of the toe strap being too tight; however, boarders wanted tighter bindings to increase responsiveness. Survey data were compared to ankle and foot biomechanics to characterise the trade-off between pain and responsiveness. The design stage of the study developed a binding that overcame the over-tightening problem while maintaining equivalent or better responsiveness compared to traditional bindings. The resulting design integrated the snowboard boot much more closely, using the sole as a "semi-rigid" platform and locking it in laterally between the heel cup and the new toe strap arrangement. The new design, developed using additive manufacturing techniques, was tested via qualitative and quantitative measures on the snow and in the lab. It was found that the new arrangement resulted in no loss of performance or responsiveness for the user. Due to the design and manufacturing approach, users have the ability to customise the design to their specific needs.

Relevance:

100.00%

Publisher:

Abstract:

Online social networks make it easier for people to find and communicate with other people based on shared interests, values, membership in particular groups, etc. Common social networks such as Facebook and Twitter have hundreds of millions or even billions of users scattered all around the world sharing interconnected data. Users demand low-latency access not only to their own data but also to their friends' data, which are often very large (e.g., videos and pictures). However, social network service providers have limited monetary capital and cannot store every piece of data everywhere to minimise users' data access latency. Geo-distributed cloud services with virtually unlimited capabilities are suitable for storing large-scale social network data across different geographical locations. This paper addresses key problems including how to optimally store and replicate these huge datasets and how to distribute requests across different datacenters. A novel genetic algorithm-based approach is used to find a near-optimal number of replicas for every user's data and a near-optimal placement of replicas to minimise monetary cost while satisfying latency requirements for all users. Experiments on a large Facebook dataset demonstrate our technique's effectiveness in outperforming other representative placement and replication strategies.
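
The optimisation target can be illustrated with a toy fitness function (the datacenters, prices, latencies and SLA value below are invented for the example; the paper's actual cost model and GA operators are not reproduced here):

```python
DATACENTERS = ["us-east", "eu-west", "ap-south"]                    # assumed locations
STORAGE_COST = {"us-east": 1.0, "eu-west": 1.2, "ap-south": 0.9}    # $ per GB (made up)
LATENCY = {  # assumed user-region -> datacenter latency in ms
    "US": {"us-east": 20, "eu-west": 90, "ap-south": 180},
    "EU": {"us-east": 90, "eu-west": 25, "ap-south": 140},
}
LATENCY_SLA_MS = 60

def fitness(chromosome, users):
    """chromosome: dict user_id -> set of datacenters holding that user's replicas.
    Lower is better: monetary cost plus a heavy penalty for latency violations."""
    cost = penalty = 0.0
    for uid, region, size_gb in users:
        replicas = chromosome[uid]
        cost += sum(STORAGE_COST[dc] * size_gb for dc in replicas)
        best_latency = min(LATENCY[region][dc] for dc in replicas)
        if best_latency > LATENCY_SLA_MS:
            penalty += 1000.0          # discourage any SLA violation
    return cost + penalty              # the GA minimises this value

users = [("u1", "US", 2.0), ("u2", "EU", 0.5)]
chromosome = {"u1": {"us-east"}, "u2": {"eu-west", "us-east"}}
print(fitness(chromosome, users))
```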

Relevance:

100.00%

Publisher:

Abstract:

With emerging trends in the Internet of Things (IoT) and Smart Cities, complex data transformation, aggregation and visualization problems are becoming increasingly common. These tasks support improved business intelligence, analytics and end-user access to data. However, in most cases the developers of these tasks are confronted with challenging problems including noisy data, diverse data formats, data modeling and an increasing demand for sophisticated visualization support. This paper describes our experiences with just such problems in the context of Household Travel Survey data integration and harmonization. We describe a common approach for addressing these harmonization tasks. We then discuss a set of lessons learned from our experience that we hope will be useful for others embarking on similar problems. We also identify several key directions and needs for future research and practical support in this area.
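
A small hedged example of such a harmonization step (column names, the code book and all values are hypothetical): two surveys with different schemas and units are mapped onto one common schema before any aggregation or visualization.

```python
import pandas as pd

# Two travel surveys with different column names, codings and units.
survey_a = pd.DataFrame({"hh_id": [1, 2], "trip_mode": ["CAR", "BUS"], "dist_km": [5.2, 11.0]})
survey_b = pd.DataFrame({"household": [7], "mode_code": [3], "distance_m": [2400]})

MODE_CODES = {1: "walk", 2: "car", 3: "bus"}       # assumed code book for survey B

common_a = survey_a.rename(columns={"hh_id": "household_id", "dist_km": "distance_km"})
common_a["mode"] = common_a.pop("trip_mode").str.lower()

common_b = pd.DataFrame({
    "household_id": survey_b["household"],
    "mode": survey_b["mode_code"].map(MODE_CODES),
    "distance_km": survey_b["distance_m"] / 1000.0,   # metres -> kilometres
})

harmonised = pd.concat(
    [common_a[["household_id", "mode", "distance_km"]], common_b],
    ignore_index=True,
)
print(harmonised)
```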

Relevance:

100.00%

Publisher:

Abstract:

Automating software engineering has been the dream of software engineers for decades, and data mining can play an important role in making this dream come true. Our recent research has shown that, to increase productivity and reduce the cost of software development, it is essential to have an effective and efficient mechanism to store, manage and utilize existing software resources, and thus to automate software analysis, testing and evaluation and to make use of existing software for new problems. This paper first provides a brief overview of traditional data mining, followed by a presentation of data mining in a broader sense. Second, it presents the idea and technology of the software warehouse as an innovative approach to managing software resources, drawing on the idea of the data warehouse, in which software assets are systematically accumulated, deposited, retrieved, packaged, managed and utilized, driven by data mining and OLAP technologies. Third, it presents the concepts and technologies of data mining and the data matrix, including the software warehouse, and their applications to software engineering. Perspectives on the role of the software warehouse and software mining in modern software development are addressed. We expect that the results will lead to a streamlined, highly efficient software development process and enhance productivity in response to the modern challenges of designing and developing software applications.

Relevance:

100.00%

Publisher:

Abstract:

Biophysical investigations of estuaries require a diversity of tasks to be undertaken by a number of disciplines, leading to a range of data requirements and dataflow pathways. Technology changes relating to data collection and storage have led to the need for metadata systems that describe the vast amounts of data now able to be stored electronically. Such a system is described as the first step in the creation of an efficient data management system for biophysical estuarine data.

Relevance:

100.00%

Publisher:

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true for gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to the essential objectives of reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for the initialization and mutation operations of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and user preferences.
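
A minimal sketch of the fusion idea (not the MF-GE code; the filters, scaling and probability bounds are assumptions): per-gene ranks from several filter methods are averaged into a goodness score, which then biases the genetic algorithm's initial population towards promising genes.

```python
import random
import numpy as np

def fuse_filter_rankings(rankings):
    """rankings: list of arrays, each giving a rank per gene (0 = best).
    Returns the probability of each gene being switched on in an initial chromosome."""
    n_genes = len(rankings[0])
    mean_rank = np.mean(np.vstack(rankings), axis=0)
    goodness = 1.0 - mean_rank / (n_genes - 1)       # best average rank -> weight near 1
    return 0.05 + 0.9 * goodness                     # keep every gene a small chance

def init_population(probs, pop_size=20, rng=random.Random(0)):
    """Biased initial population: 1 = gene selected, 0 = gene excluded."""
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]

# Three hypothetical filters ranking five genes (e.g. t-test, mutual information, ReliefF).
rankings = [np.array([0, 1, 2, 3, 4]),
            np.array([1, 0, 2, 4, 3]),
            np.array([0, 2, 1, 3, 4])]
population = init_population(fuse_filter_rankings(rankings), pop_size=4)
print(population)
```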