957 resultados para stream processing crowdsensing scheduling traffic analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In piattaforme di Stream Processing è spesso necessario eseguire elaborazioni differenziate degli stream di input. Questa tesi ha l'obiettivo di realizzare uno scheduler in grado di attribuire priorità di esecuzione differenti agli operatori deputati all'elaborazione degli stream.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The initial timing of face-specific effects in event-related potentials (ERPs) is a point of contention in face processing research. Although effects during the time of the N170 are robust in the literature, inconsistent effects during the time of the P100 challenge the interpretation of the N170 as being the initial face-specific ERP effect. The interpretation of the early P100 effects are often attributed to low-level differences between face stimuli and a host of other image categories. Research using sophisticated controls for low-level stimulus characteristics (Rousselet, Husk, Bennett, & Sekuler, 2008) report robust face effects starting at around 130 ms following stimulus onset. The present study examines the independent components (ICs) of the P100 and N170 complex in the context of a minimally controlled low-level stimulus set and a clear P100 effect for faces versus houses at the scalp. Results indicate that four ICs account for the ERPs to faces and houses in the first 200ms following stimulus onset. The IC that accounts for the majority of the scalp N170 (icNla) begins dissociating stimulus conditions at approximately 130 ms, closely replicating the scalp results of Rousselet et al. (2008). The scalp effects at the time of the P100 are accounted for by two constituent ICs (icP1a and icP1b). The IC that projects the greatest voltage at the scalp during the P100 (icP1a) shows a face-minus-house effect over the period of the P100 that is less robust than the N 170 effect of icN 1 a when measured as the average of single subject differential activation robustness. The second constituent process of the P100 (icP1b), although projecting a smaller voltage to the scalp than icP1a, shows a more robust effect for the face-minus-house contrast starting prior to 100 ms following stimulus onset. Further, the effect expressed by icP1 b takes the form of a larger negative projection to medial occipital sites for houses over faces partially canceling the larger projection of icP1a, thereby enhancing the face positivity at this time. These findings have three main implications for ERP research on face processing: First, the ICs that constitute the face-minus-house P100 effect are independent from the ICs that constitute the N170 effect. This suggests that the P100 effect and the N170 effect are anatomically independent. Second, the timing of the N170 effect can be recovered from scalp ERPs that have spatio-temporally overlapping effects possibly associated with low-level stimulus characteristics. This unmixing of the EEG signals may reduce the need for highly constrained stimulus sets, a characteristic that is not always desirable for a topic that is highly coupled to ecological validity. Third, by unmixing the constituent processes of the EEG signals new analysis strategies are made available. In particular the exploration of the relationship between cortical processes over the period of the P100 and N170 ERP complex (and beyond) may provide previously unaccessible answers to questions such as: Is the face effect a special relationship between low-level and high-level processes along the visual stream?

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe. Different datacenter pairs are with different inter-datacenter network costs charged by Internet Service Providers (ISPs). While, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As the datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenters, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This raises the opportunity, but also a challenge, to explore the inter-datacenter network cost diversities to optimize both VM placement and load balancing towards network cost minimization with guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on our novel framework, we then formulate the communication cost minimization problem for BDSP into a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computation-efficient solution based on MILP. The high efficiency of our proposal is validated by extensive simulation based studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, applications in domains such as telecommunications, network security or large scale sensor networks showed the limits of the traditional store-then-process paradigm. In this context, Stream Processing Engines emerged as a candidate solution for all these applications demanding for high processing capacity with low processing latency guarantees. With Stream Processing Engines, data streams are not persisted but rather processed on the fly, producing results continuously. Current Stream Processing Engines, either centralized or distributed, do not scale with the input load due to single-node bottlenecks. Moreover, they are based on static configurations that lead to either under or over-provisioning. This Ph.D. thesis discusses StreamCloud, an elastic paralleldistributed stream processing engine that enables for processing of large data stream volumes. Stream- Cloud minimizes the distribution and parallelization overhead introducing novel techniques that split queries into parallel subqueries and allocate them to independent sets of nodes. Moreover, Stream- Cloud elastic and dynamic load balancing protocols enable for effective adjustment of resources depending on the incoming load. Together with the parallelization and elasticity techniques, Stream- Cloud defines a novel fault tolerance protocol that introduces minimal overhead while providing fast recovery. StreamCloud has been fully implemented and evaluated using several real word applications such as fraud detection applications or network analysis applications. The evaluation, conducted using a cluster with more than 300 cores, demonstrates the large scalability, the elasticity and fault tolerance effectiveness of StreamCloud. Resumen En los útimos años, aplicaciones en dominios tales como telecomunicaciones, seguridad de redes y redes de sensores de gran escala se han encontrado con múltiples limitaciones en el paradigma tradicional de bases de datos. En este contexto, los sistemas de procesamiento de flujos de datos han emergido como solución a estas aplicaciones que demandan una alta capacidad de procesamiento con una baja latencia. En los sistemas de procesamiento de flujos de datos, los datos no se persisten y luego se procesan, en su lugar los datos son procesados al vuelo en memoria produciendo resultados de forma continua. Los actuales sistemas de procesamiento de flujos de datos, tanto los centralizados, como los distribuidos, no escalan respecto a la carga de entrada del sistema debido a un cuello de botella producido por la concentración de flujos de datos completos en nodos individuales. Por otra parte, éstos están basados en configuraciones estáticas lo que conducen a un sobre o bajo aprovisionamiento. Esta tesis doctoral presenta StreamCloud, un sistema elástico paralelo-distribuido para el procesamiento de flujos de datos que es capaz de procesar grandes volúmenes de datos. StreamCloud minimiza el coste de distribución y paralelización por medio de una técnica novedosa la cual particiona las queries en subqueries paralelas repartiéndolas en subconjuntos de nodos independientes. Ademas, Stream- Cloud posee protocolos de elasticidad y equilibrado de carga que permiten una optimización de los recursos dependiendo de la carga del sistema. Unidos a los protocolos de paralelización y elasticidad, StreamCloud define un protocolo de tolerancia a fallos que introduce un coste mínimo mientras que proporciona una rápida recuperación. StreamCloud ha sido implementado y evaluado mediante varias aplicaciones del mundo real tales como aplicaciones de detección de fraude o aplicaciones de análisis del tráfico de red. La evaluación ha sido realizada en un cluster con más de 300 núcleos, demostrando la alta escalabilidad y la efectividad tanto de la elasticidad, como de la tolerancia a fallos de StreamCloud.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Image processing offers unparalleled potential for traffic monitoring and control. For many years engineers have attempted to perfect the art of automatic data abstraction from sequences of video images. This paper outlines a research project undertaken at Napier University by the authors in the field of image processing for automatic traffic analysis. A software based system implementing TRIP algorithms to count cars and measure vehicle speed has been developed by members of the Transport Engineering Research Unit (TERU) at the University. The TRIP algorithm has been ported and evaluated on an IBM PC platform with a view to hardware implementation of the pre-processing routines required for vehicle detection. Results show that a software based traffic counting system is realisable for single window processing. Due to the high volume of data required to be processed for full frames or multiple lanes, system operations in real time are limited. Therefore specific hardware is required to be designed. The paper outlines a hardware design for implementation of inter-frame and background differencing, background updating and shadow removal techniques. Preliminary results showing the processing time and counting accuracy for the routines implemented in software are presented and a real time hardware pre-processing architecture is described.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

SQL Injection Attack (SQLIA) remains a technique used by a computer network intruder to pilfer an organisation’s confidential data. This is done by an intruder re-crafting web form’s input and query strings used in web requests with malicious intent to compromise the security of an organisation’s confidential data stored at the back-end database. The database is the most valuable data source, and thus, intruders are unrelenting in constantly evolving new techniques to bypass the signature’s solutions currently provided in Web Application Firewalls (WAF) to mitigate SQLIA. There is therefore a need for an automated scalable methodology in the pre-processing of SQLIA features fit for a supervised learning model. However, obtaining a ready-made scalable dataset that is feature engineered with numerical attributes dataset items to train Artificial Neural Network (ANN) and Machine Leaning (ML) models is a known issue in applying artificial intelligence to effectively address ever evolving novel SQLIA signatures. This proposed approach applies numerical attributes encoding ontology to encode features (both legitimate web requests and SQLIA) to numerical data items as to extract scalable dataset for input to a supervised learning model in moving towards a ML SQLIA detection and prevention model. In numerical attributes encoding of features, the proposed model explores a hybrid of static and dynamic pattern matching by implementing a Non-Deterministic Finite Automaton (NFA). This combined with proxy and SQL parser Application Programming Interface (API) to intercept and parse web requests in transition to the back-end database. In developing a solution to address SQLIA, this model allows processed web requests at the proxy deemed to contain injected query string to be excluded from reaching the target back-end database. This paper is intended for evaluating the performance metrics of a dataset obtained by numerical encoding of features ontology in Microsoft Azure Machine Learning (MAML) studio using Two-Class Support Vector Machines (TCSVM) binary classifier. This methodology then forms the subject of the empirical evaluation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A number of problems in network operations and engineering call for new methods of traffic analysis. While most existing traffic analysis methods are fundamentally temporal, there is a clear need for the analysis of traffic across multiple network links — that is, for spatial traffic analysis. In this paper we give examples of problems that can be addressed via spatial traffic analysis. We then propose a formal approach to spatial traffic analysis based on the wavelet transform. Our approach (graph wavelets) generalizes the traditional wavelet transform so that it can be applied to data elements connected via an arbitrary graph topology. We explore the necessary and desirable properties of this approach and consider some of its possible realizations. We then apply graph wavelets to measurements from an operating network. Our results show that graph wavelets are very useful for our motivating problems; for example, they can be used to form highly summarized views of an entire network's traffic load, to gain insight into a network's global traffic response to a link failure, and to localize the extent of a failure event within the network.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The increasing design complexity associated with modern Field Programmable Gate Array (FPGA) has prompted the emergence of 'soft'-programmable processors which attempt to replace at least part of the custom circuit design problem with a problem of programming parallel processors. Despite substantial advances in this technology, its performance and resource efficiency for computationally complex operations remains in doubt. In this paper we present the first recorded implementation of a softcore Fast-Fourier Transform (FFT) on Xilinx Virtex FPGA technology. By employing a streaming processing architecture, we show how it is possible to achieve architectures which offer 1.1 GSamples/s throughput and up to 19 times speed-up against the Xilinx Radix-2 FFT dedicated circuit with comparable cost.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Among many other knowledge representations formalisms, Ontologies and Formal Concept Analysis (FCA) aim at modeling ‘concepts’. We discuss how these two formalisms may complement another from an application point of view. In particular, we will see how FCA can be used to support Ontology Engineering, and how ontologies can be exploited in FCA applications. The interplay of FCA and ontologies is studied along the life cycle of an ontology: (i) FCA can support the building of the ontology as a learning technique. (ii) The established ontology can be analyzed and navigated by using techniques of FCA. (iii) Last but not least, the ontology may be used to improve an FCA application.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Anonymous communication has become a hot research topic in order to meet the increasing demand for web privacy protection. However, there are few such systems which can provide high level anonymity for web browsing. The reason is the current dominant dummy packet padding method for anonymization against traffic analysis attacks. This method inherits huge delay and bandwidth waste, which inhibits its use for web browsing. In this paper, we propose a predicted packet padding strategy to replace the dummy packet padding method for anonymous web browsing systems. The proposed strategy mitigates delay and bandwidth waste significantly on average. We formulated the traffic analysis attack and defense problem, and defined a metric, cost coefficient of anonymization (CCA), to measure the performance of anonymization. We thoroughly analyzed the problem with the characteristics of web browsing and concluded that the proposed strategy is better than the current dummy packet padding strategy in theory. We have conducted extensive experiments on two real world data sets, and the results confirmed the advantage of the proposed method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is currently no universally recommended and accepted method of data processing within the science of indirect calorimetry for either mixing chamber or breath-by-breath systems of expired gas analysis. Exercise physiologists were first surveyed to determine methods used to process oxygen consumption ([OV0312]O 2) data, and current attitudes to data processing within the science of indirect calorimetry. Breath-by-breath datasets obtained from indirect calorimetry during incremental exercise were then used to demonstrate the consequences of commonly used time, breath and digital filter post-acquisition data processing strategies. Assessment of the variability in breath-by-breath data was determined using multiple regression based on the independent variables ventilation (VE), and the expired gas fractions for oxygen and carbon dioxide, FEO 2 and FECO2, respectively. Based on the results of explanation of variance of the breath-by-breath [OV0312]O2 data, methods of processing to remove variability were proposed for time-averaged, breath-averaged and digital filter applications. Among exercise physiologists, the strategy used to remove the variability in sequential [OV0312]O2 measurements varied widely, and consisted of time averages (30 sec [38%], 60 sec [18%], 20 sec [11%], 15 sec [8%]), a moving average of five to 11 breaths (10%), and the middle five of seven breaths (7%). Most respondents indicated that they used multiple criteria to establish maximum [OV0312]O 2 ([OV0312]O2max) including: the attainment of age-predicted maximum heart rate (HRmax) [53%], respiratory exchange ratio (RER) >1.10 (49%) or RER >1.15 (27%) and a rating of perceived exertion (RPE) of >17, 18 or 19 (20%). The reasons stated for these strategies included their own beliefs (32%), what they were taught (26%), what they read in research articles (22%), tradition (13%) and the influence of their colleagues (7%). The combination of VE, FEO 2 and FECO2 removed 96-98% of [OV0312]O2 breath-by-breath variability in incremental and steady-state exercise [OV0312]O2 data sets, respectively. Correction of residual error in [OV0312]O2 datasets to 10% of the raw variability results from application of a 30-second time average, 15-breath running average, or a 0.04 Hz low cut-off digital filter. Thus, we recommend that once these data processing strategies are used, the peak or maximal value becomes the highest processed datapoint. Exercise physiologists need to agree on, and continually refine through empirical research, a consistent process for analysing data from indirect calorimetry.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The wide diffusion of cheap, small, and portable sensors integrated in an unprecedented large variety of devices and the availability of almost ubiquitous Internet connectivity make it possible to collect an unprecedented amount of real time information about the environment we live in. These data streams, if properly and timely analyzed, can be exploited to build new intelligent and pervasive services that have the potential of improving people's quality of life in a variety of cross concerning domains such as entertainment, health-care, or energy management. The large heterogeneity of application domains, however, calls for a middleware-level infrastructure that can effectively support their different quality requirements. In this thesis we study the challenges related to the provisioning of differentiated quality-of-service (QoS) during the processing of data streams produced in pervasive environments. We analyze the trade-offs between guaranteed quality, cost, and scalability in streams distribution and processing by surveying existing state-of-the-art solutions and identifying and exploring their weaknesses. We propose an original model for QoS-centric distributed stream processing in data centers and we present Quasit, its prototype implementation offering a scalable and extensible platform that can be used by researchers to implement and validate novel QoS-enforcement mechanisms. To support our study, we also explore an original class of weaker quality guarantees that can reduce costs when application semantics do not require strict quality enforcement. We validate the effectiveness of this idea in a practical use-case scenario that investigates partial fault-tolerance policies in stream processing by performing a large experimental study on the prototype of our novel LAAR dynamic replication technique. Our modeling, prototyping, and experimental work demonstrates that, by providing data distribution and processing middleware with application-level knowledge of the different quality requirements associated to different pervasive data flows, it is possible to improve system scalability while reducing costs.