953 resultados para Sensor Networks and Data Streaming


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many applications in several domains such as telecommunications, network security, large scale sensor networks, require online processing of continuous data lows. They produce very high loads that requires aggregating the processing capacity of many nodes. Current Stream Processing Engines do not scale with the input load due to single-node bottlenecks. Additionally, they are based on static con?gurations that lead to either under or over-provisioning. In this paper, we present StreamCloud, a scalable and elastic stream processing engine for processing large data stream volumes. StreamCloud uses a novel parallelization technique that splits queries into subqueries that are allocated to independent sets of nodes in a way that minimizes the distribution overhead. Its elastic protocols exhibit low intrusiveness, enabling effective adjustment of resources to the incoming load. Elasticity is combined with dynamic load balancing to minimize the computational resources used. The paper presents the system design, implementation and a thorough evaluation of the scalability and elasticity of the fully implemented system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hoy en día, con la evolución continua y rápida de las tecnologías de la información y los dispositivos de computación, se recogen y almacenan continuamente grandes volúmenes de datos en distintos dominios y a través de diversas aplicaciones del mundo real. La extracción de conocimiento útil de una cantidad tan enorme de datos no se puede realizar habitualmente de forma manual, y requiere el uso de técnicas adecuadas de aprendizaje automático y de minería de datos. La clasificación es una de las técnicas más importantes que ha sido aplicada con éxito a varias áreas. En general, la clasificación se compone de dos pasos principales: en primer lugar, aprender un modelo de clasificación o clasificador a partir de un conjunto de datos de entrenamiento, y en segundo lugar, clasificar las nuevas instancias de datos utilizando el clasificador aprendido. La clasificación es supervisada cuando todas las etiquetas están presentes en los datos de entrenamiento (es decir, datos completamente etiquetados), semi-supervisada cuando sólo algunas etiquetas son conocidas (es decir, datos parcialmente etiquetados), y no supervisada cuando todas las etiquetas están ausentes en los datos de entrenamiento (es decir, datos no etiquetados). Además, aparte de esta taxonomía, el problema de clasificación se puede categorizar en unidimensional o multidimensional en función del número de variables clase, una o más, respectivamente; o también puede ser categorizado en estacionario o cambiante con el tiempo en función de las características de los datos y de la tasa de cambio subyacente. A lo largo de esta tesis, tratamos el problema de clasificación desde tres perspectivas diferentes, a saber, clasificación supervisada multidimensional estacionaria, clasificación semisupervisada unidimensional cambiante con el tiempo, y clasificación supervisada multidimensional cambiante con el tiempo. Para llevar a cabo esta tarea, hemos usado básicamente los clasificadores Bayesianos como modelos. La primera contribución, dirigiéndose al problema de clasificación supervisada multidimensional estacionaria, se compone de dos nuevos métodos de aprendizaje de clasificadores Bayesianos multidimensionales a partir de datos estacionarios. Los métodos se proponen desde dos puntos de vista diferentes. El primer método, denominado CB-MBC, se basa en una estrategia de envoltura de selección de variables que es voraz y hacia delante, mientras que el segundo, denominado MB-MBC, es una estrategia de filtrado de variables con una aproximación basada en restricciones y en el manto de Markov. Ambos métodos han sido aplicados a dos problemas reales importantes, a saber, la predicción de los inhibidores de la transcriptasa inversa y de la proteasa para el problema de infección por el virus de la inmunodeficiencia humana tipo 1 (HIV-1), y la predicción del European Quality of Life-5 Dimensions (EQ-5D) a partir de los cuestionarios de la enfermedad de Parkinson con 39 ítems (PDQ-39). El estudio experimental incluye comparaciones de CB-MBC y MB-MBC con los métodos del estado del arte de la clasificación multidimensional, así como con métodos comúnmente utilizados para resolver el problema de predicción de la enfermedad de Parkinson, a saber, la regresión logística multinomial, mínimos cuadrados ordinarios, y mínimas desviaciones absolutas censuradas. En ambas aplicaciones, los resultados han sido prometedores con respecto a la precisión de la clasificación, así como en relación al análisis de las estructuras gráficas que identifican interacciones conocidas y novedosas entre las variables. La segunda contribución, referida al problema de clasificación semi-supervisada unidimensional cambiante con el tiempo, consiste en un método nuevo (CPL-DS) para clasificar flujos de datos parcialmente etiquetados. Los flujos de datos difieren de los conjuntos de datos estacionarios en su proceso de generación muy rápido y en su aspecto de cambio de concepto. Es decir, los conceptos aprendidos y/o la distribución subyacente están probablemente cambiando y evolucionando en el tiempo, lo que hace que el modelo de clasificación actual sea obsoleto y deba ser actualizado. CPL-DS utiliza la divergencia de Kullback-Leibler y el método de bootstrapping para cuantificar y detectar tres tipos posibles de cambio: en las predictoras, en la a posteriori de la clase o en ambas. Después, si se detecta cualquier cambio, un nuevo modelo de clasificación se aprende usando el algoritmo EM; si no, el modelo de clasificación actual se mantiene sin modificaciones. CPL-DS es general, ya que puede ser aplicado a varios modelos de clasificación. Usando dos modelos diferentes, el clasificador naive Bayes y la regresión logística, CPL-DS se ha probado con flujos de datos sintéticos y también se ha aplicado al problema real de la detección de código malware, en el cual los nuevos ficheros recibidos deben ser continuamente clasificados en malware o goodware. Los resultados experimentales muestran que nuestro método es efectivo para la detección de diferentes tipos de cambio a partir de los flujos de datos parcialmente etiquetados y también tiene una buena precisión de la clasificación. Finalmente, la tercera contribución, sobre el problema de clasificación supervisada multidimensional cambiante con el tiempo, consiste en dos métodos adaptativos, a saber, Locally Adpative-MB-MBC (LA-MB-MBC) y Globally Adpative-MB-MBC (GA-MB-MBC). Ambos métodos monitorizan el cambio de concepto a lo largo del tiempo utilizando la log-verosimilitud media como métrica y el test de Page-Hinkley. Luego, si se detecta un cambio de concepto, LA-MB-MBC adapta el actual clasificador Bayesiano multidimensional localmente alrededor de cada nodo cambiado, mientras que GA-MB-MBC aprende un nuevo clasificador Bayesiano multidimensional. El estudio experimental realizado usando flujos de datos sintéticos multidimensionales indica los méritos de los métodos adaptativos propuestos. ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from an available training data, and secondly, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when the whole class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when the whole class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or can be also categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Through this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised unidimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both considered case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from the stationary data sets by their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out-of-date requiring to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general as it can be applied to several classification models. Using two different models, namely, naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where the new received files should be continuously classified into malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having a good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. Experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data aggregation in wireless sensor networks is employed to reduce the communication overhead and prolong the network lifetime. However, an adversary may compromise some sensor nodes, and use them to forge false values as the aggregation result. Previous secure data aggregation schemes have tackled this problem from different angles. The goal of those algorithms is to ensure that the Base Station (BS) does not accept any forged aggregation results. But none of them have tried to detect the nodes that inject into the network bogus aggregation results. Moreover, most of them usually have a communication overhead that is (at best) logarithmic per node. In this paper, we propose a secure and energy-efficient data aggregation scheme that can detect the malicious nodes with a constant per node communication overhead. In our solution, all aggregation results are signed with the private keys of the aggregators so that they cannot be altered by others. Nodes on each link additionally use their pairwise shared key for secure communications. Each node receives the aggregation results from its parent (sent by the parent of its parent) and its siblings (via its parent node), and verifies the aggregation result of the parent node. Theoretical analysis on energy consumption and communication overhead accords with our comparison based simulation study over random data aggregation trees.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite significant advancements in wireless sensor networks (WSNs), energy conservation remains one of the most important research challenges. Recently, the problem of energy conservation has been addressed by applying mobile sink as an effective technique that can enhance efficiency of energy consumption in the networks. In this paper, the energy conservation problem is firstly formulated to maximize the lifetime of WSN subject to delay and node energy constraints. Then, to solve the defined energy conservation problem, a data collection scheduling with a mobile sink scheme is proposed. In the proposed approach, the sink movement is governed by a type-2 fuzzy controller to be located at the best location and time to collect sensory data. We conducted extensive experiments to study the effectiveness of the proposed protocol and compared it against the streaming data delivery (SDD) and virtual circle combined straight routing (VCCS) protocols. We observed that the proposed protocol outperforms both SDD and VCCS approaches by reducing energy consumption, minimize delays and enhance data collection quality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Two new incremental models for online anomaly detection in data streams at nodes in wireless sensor networks are discussed. These models are incremental versions of a model that uses ellipsoids to detect first, second, and higher-ordered anomalies in arrears. The incremental versions can also be used this way but have additional capabilities offered by processing data incrementally as they arrive in time. Specifically, they can detect anomalies 'on-the-fly' in near real time. They can also be used to track temporal changes in near real-time because of sensor drift, cyclic variation, or seasonal changes. One of the new models has a mechanism that enables graceful degradation of inputs in the distant past (fading memory). Three real datasets from single sensors in deployed environmental monitoring networks are used to illustrate various facets of the new models. Examples compare the incremental version with the previous batch and dynamic models and show that the incremental versions can detect various types of dynamic anomalies in near real time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Process Control Systems (PCSs) or Supervisory Control and Data Acquisition (SCADA) systems have recently been added to the already wide collection of wireless sensor networks applications. The PCS/SCADA environment is somewhat more amenable to the use of heavy cryptographic mechanisms such as public key cryptography than other sensor application environments. The sensor nodes in the environment, however, are still open to devastating attacks such as node capture, which makes designing a secure key management challenging. In this paper, a key management scheme is proposed to defeat node capture attack by offering both forward and backward secrecies. Our scheme overcomes the pitfalls which Nilsson et al.'s scheme suffers from, and is not more expensive than their scheme.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present algorithms, systems, and experimental results for underwater data muling. In data muling a mobile agent interacts with static agents to upload, download, or transport data to a different physical location. We consider a system comprising an Autonomous Underwater Vehicle (AUV) and many static Underwater Sensor Nodes (USN) networked together optically and acoustically. The AUV can locate the static nodes using vision and hover above the static nodes for data upload. We describe the hardware and software architecture of this underwater system, as well as experimental data. © 2006 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Remote monitoring of animal behaviour in the environment can assist in managing both the animal and its environmental impact. GPS collars which record animal locations with high temporal frequency allow researchers to monitor both animal behaviour and interactions with the environment. These ground-based sensors can be combined with remotely-sensed satellite images to understand animal-landscape interactions. The key to combining these technologies is communication methods such as wireless sensor networks (WSNs). We explore this concept using a case-study from an extensive cattle enterprise in northern Australia and demonstrate the potential for combining GPS collars and satellite images in a WSN to monitor behavioural preferences and social behaviour of cattle.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the past few years, numerous data collection protocols have been developed for wireless sensor networks (WSNs). However, there has been no comparison of their relative performance in realistic environments. Here we report the results of an empirical study using a Fleck3 sensor network testbed for four different data collection protocols: One phase pull Directed Diffusion (DD), Expected Number of Transmissions (ETX), ETX with explicit acknowledgment (ETX-eAck), and ETX with implicit acknowledgment (ETX-iAck). Our empirical study provides useful insights for future sensor network deployments. When the required application end-to-end reliability is not strict (e.g., 70%) and link quality is good, DD and ETX are the best options because of their simplicity and low routing overhead. Both ETX-eAck and ETX-iAck achieve more than 90% end-to-end reliability when the link quality is reasonable (less than 25% packet loss). When the link quality is good, ETX-iAck introduces significantly less routing overhead (up to 50%) than ETX-eAck. However, if the radio transceiver supports variable packet length, ETX-eAck can outperform ETX-iAck when the link quality is poor. The important message from this paper is that choice of data collection protocol should come after the operating environment is understood. This understanding must include the characteristics of the radio transceiver, and link loss statistics from a long-term (across seasons and weather variation) radio survey of the site.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Robotics in mines, aerospace, underwater, everyday unstructured environments and sensor networks with communicating devices that collect data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Harmful Algal Blooms (HABs) have become an important environmental concern along the western coast of the United States. Toxic and noxious blooms adversely impact the economies of coastal communities in the region, pose risks to human health, and cause mortality events that have resulted in the deaths of thousands of fish, marine mammals and seabirds. One goal of field-based research efforts on this topic is the development of predictive models of HABs that would enable rapid response, mitigation and ultimately prevention of these events. In turn, these objectives are predicated on understanding the environmental conditions that stimulate these transient phenomena. An embedded sensor network (Fig. 1), under development in the San Pedro Shelf region off the Southern California coast, is providing tools for acquiring chemical, physical and biological data at high temporal and spatial resolution to help document the emergence and persistence of HAB events, supporting the design and testing of predictive models, and providing contextual information for experimental studies designed to reveal the environmental conditions promoting HABs. The sensor platforms contained within this network include pier-based sensor arrays, ocean moorings, HF radar stations, along with mobile sensor nodes in the form of surface and subsurface autonomous vehicles. FreewaveTM radio modems facilitate network communication and form a minimally-intrusive, wireless communication infrastructure throughout the Southern California coastal region, allowing rapid and cost-effective data transfer. An emerging focus of this project is the incorporation of a predictive ocean model that assimilates near-real time, in situ data from deployed Autonomous Underwater Vehicles (AUVs). The model then assimilates the data to increase the skill of both nowcasts and forecasts, thus providing insight into bloom initiation as well as the movement of blooms or other oceanic features of interest (e.g., thermoclines, fronts, river discharge, etc.). From these predictions, deployed mobile sensors can be tasked to track a designated feature. This focus has led to the creation of a technology chain in which algorithms are being implemented for the innovative trajectory design for AUVs. Such intelligent mission planning is required to maneuver a vehicle to precise depths and locations that are the sites of active blooms, or physical/chemical features that might be sources of bloom initiation or persistence. The embedded network yields high-resolution, temporal and spatial measurements of pertinent environmental parameters and resulting biology (see Fig. 1). Supplementing this with ocean current information and remotely sensed imagery and meteorological data, we obtain a comprehensive foundation for developing a fundamental understanding of HAB events. This then directs labor- intensive and costly sampling efforts and analyses. Additionally, we provide coastal municipalities, managers and state agencies with detailed information to aid their efforts in providing responsible environmental stewardship of their coastal waters.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A Wireless Sensor Network (WSN) is a set of sensors that are integrated with a physical environment. These sensors are small in size, and capable of sensing physical phenomena and processing them. They communicate in a multihop manner, due to a short radio range, to form an Ad Hoc network capable of reporting network activities to a data collection sink. Recent advances in WSNs have led to several new promising applications, including habitat monitoring, military target tracking, natural disaster relief, and health monitoring. The current version of sensor node, such as MICA2, uses a 16 bit, 8 MHz Texas Instruments MSP430 micro-controller with only 10 KB RAM, 128 KB program space, 512 KB external ash memory to store measurement data, and is powered by two AA batteries. Due to these unique specifications and a lack of tamper-resistant hardware, devising security protocols for WSNs is complex. Previous studies show that data transmission consumes much more energy than computation. Data aggregation can greatly help to reduce this consumption by eliminating redundant data. However, aggregators are under the threat of various types of attacks. Among them, node compromise is usually considered as one of the most challenging for the security of WSNs. In a node compromise attack, an adversary physically tampers with a node in order to extract the cryptographic secrets. This attack can be very harmful depending on the security architecture of the network. For example, when an aggregator node is compromised, it is easy for the adversary to change the aggregation result and inject false data into the WSN. The contributions of this thesis to the area of secure data aggregation are manifold. We firstly define the security for data aggregation in WSNs. In contrast with existing secure data aggregation definitions, the proposed definition covers the unique characteristics that WSNs have. Secondly, we analyze the relationship between security services and adversarial models considered in existing secure data aggregation in order to provide a general framework of required security services. Thirdly, we analyze existing cryptographic-based and reputationbased secure data aggregation schemes. This analysis covers security services provided by these schemes and their robustness against attacks. Fourthly, we propose a robust reputationbased secure data aggregation scheme for WSNs. This scheme minimizes the use of heavy cryptographic mechanisms. The security advantages provided by this scheme are realized by integrating aggregation functionalities with: (i) a reputation system, (ii) an estimation theory, and (iii) a change detection mechanism. We have shown that this addition helps defend against most of the security attacks discussed in this thesis, including the On-Off attack. Finally, we propose a secure key management scheme in order to distribute essential pairwise and group keys among the sensor nodes. The design idea of the proposed scheme is the combination between Lamport's reverse hash chain as well as the usual hash chain to provide both past and future key secrecy. The proposal avoids the delivery of the whole value of a new group key for group key update; instead only the half of the value is transmitted from the network manager to the sensor nodes. This way, the compromise of a pairwise key alone does not lead to the compromise of the group key. The new pairwise key in our scheme is determined by Diffie-Hellman based key agreement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For industrial wireless sensor networks, maintaining the routing path for a high packet delivery ratio is one of the key objectives in network operations. It is important to both provide the high data delivery rate at the sink node and guarantee a timely delivery of the data packet at the sink node. Most proactive routing protocols for sensor networks are based on simple periodic updates to distribute the routing information. A faulty link causes packet loss and retransmission at the source until periodic route update packets are issued and the link has been identified as broken. We propose a new proactive route maintenance process where periodic update is backed-up with a secondary layer of local updates repeating with shorter periods for timely discovery of broken links. Proposed route maintenance scheme improves reliability of the network by decreasing the packet loss due to delayed identification of broken links. We show by simulation that proposed mechanism behaves better than the existing popular routing protocols (AODV, AOMDV and DSDV) in terms of end-to-end delay, routing overhead, packet reception ratio.