876 results for bigdata, data stream processing, dsp, apache storm, cyber security
Abstract:
In this letter, we numerically demonstrate that the use of inline nonlinear optical loop mirrors in strongly dispersion-managed transmission systems dominated by pulse distortion and amplitude noise can achieve all-optical passive 2R regeneration of a 40-Gb/s return-to-zero data stream. We define the tolerance limits of this result to the parameters of the input pulses.
Abstract:
Remote sensing data is routinely used in ecology to investigate the relationship between landscape pattern, as characterised by land use and land cover maps, and ecological processes. Multiple factors related to the representation of geographic phenomena have been shown to affect the characterisation of landscape pattern, resulting in spatial uncertainty. This study statistically investigated the effect of the interaction between landscape spatial pattern and geospatial processing methods, unlike most papers, which consider the effect of each factor only in isolation. This is important since data used to calculate landscape metrics typically undergo a series of data abstraction processing tasks, and these tasks are rarely performed in isolation. The geospatial processing methods tested were the aggregation method and the choice of pixel size used to aggregate data. These were compared to two components of landscape pattern: spatial heterogeneity and the proportion of landcover class area. The interactions and their effect on the final landcover map were described using landscape metrics to measure landscape pattern and classification accuracy (response variables). All landscape metrics and classification accuracy were shown to be affected by both landscape pattern and processing methods. Large variability in the response of those variables and interactions between the explanatory variables were observed. However, even though interactions occurred, they only affected the magnitude of the difference in landscape metric values. Thus, provided that the same processing methods are used, landscapes should retain their ranking when their landscape metrics are compared. For example, highly fragmented landscapes will always have larger values for the landscape metric "number of patches" than less fragmented landscapes. But the magnitude of difference between the landscapes may change, and therefore absolute values of landscape metrics may need to be interpreted with caution.
The explanatory variables which had the largest effects were spatial heterogeneity and pixel size. These explanatory variables tended to result in large main effects and large interactions. The high variability in the response variables and the interaction of the explanatory variables indicate that it would be difficult to make generalisations about the impact of processing on landscape pattern, as only two processing methods were tested and it is likely that untested processing methods will result in even greater spatial uncertainty. © 2013 Elsevier B.V.
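To make the effect of pixel size on a landscape metric concrete, here is a minimal Python sketch. It assumes a majority-rule aggregation and a 4-connected definition of a patch; both are illustrative choices, not necessarily the ones used in the study, and the function names and the toy raster are hypothetical.

```python
from collections import Counter


def aggregate_majority(grid, factor):
    """Coarsen a categorical raster: each factor x factor block becomes its most common class."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            counts = Counter(block)
            best = max(counts.values())
            # Ties are broken by the smallest class label so the result is deterministic.
            out_row.append(min(k for k, v in counts.items() if v == best))
        out.append(out_row)
    return out


def number_of_patches(grid, target):
    """Count 4-connected patches of class `target` with an iterative flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or (r, c) in seen:
                continue
            patches += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == target and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return patches


# Toy raster (hypothetical): class 1 occurs as four isolated pixels.
fragmented = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
coarse = aggregate_majority(fragmented, 2)
print(number_of_patches(fragmented, 1), number_of_patches(coarse, 1))
```

In this toy case the four single-pixel patches disappear entirely once the pixel size is doubled, which is the kind of change in absolute metric values the abstract warns about.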
Abstract:
Error free propagation of a single polarisation optical time division multiplexed 40 Gbit/s dispersion managed pulsed data stream over dispersion (non-shifted) fibre. This distance is twice the previous record at this data rate.
Abstract:
Questions of forming learning sets for artificial neural networks in lossless data compression problems are considered. Methods for constructing and using learning sets are studied. A way of forming the learning set while training an artificial neural network on the data stream is proposed.
Abstract:
We present a comparative study of the influence of dispersion induced phase noise for n-level PSK systems. From the analysis, we conclude that the phase noise influence for classical homodyne/heterodyne PSK systems is entirely determined by the modulation complexity (expressed in terms of the constellation diagram) and the analogue demodulation format. On the other hand, the use of digital signal processing (DSP) in homodyne/intradyne systems introduces a fiber length dependence originating from the generation of equalization enhanced phase noise. For future high capacity systems, high constellations must be used in order to lower the symbol rate to practically manageable speeds, and this places severe requirements on the signal and local oscillator (LO) linewidths. Our results for the bit-error-rate (BER) floor caused by the phase noise influence in the case of QPSK, 16PSK and 64PSK systems outline tolerance limitations for the LO performance: 5 MHz linewidth (at 3-dB level) for 100 Gbit/s QPSK; 1 MHz for 400 Gbit/s QPSK; 0.1 MHz for 400 Gbit/s 16PSK and 1 Tbit/s 64PSK systems. This defines design constraints for the phase noise impact in distributed-feedback (DFB) or distributed-Bragg-reflector (DBR) semiconductor lasers that would allow moving the system capacity from 100 Gbit/s to 400 Gbit/s in 3 years (1 Tbit/s in 5 years). It is imperative at the same time to increase the analogue to digital conversion (ADC) speed such that the single quadrature symbol rate goes from today's 25 GS/s to 100 GS/s (using two samples per symbol). © 2014 by Walter de Gruyter Berlin/Boston.
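The link between constellation size and symbol rate can be illustrated with a short Python sketch. It assumes polarization multiplexing (two polarizations) and no coding overhead; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
import math


def symbol_rate_gbaud(bit_rate_gbps, constellation_points, polarizations=2):
    """Symbol rate per polarization, assuming no overhead (illustrative only)."""
    bits_per_symbol = math.log2(constellation_points)
    return bit_rate_gbps / (bits_per_symbol * polarizations)


# System configurations named in the abstract: (bit rate in Gbit/s, constellation points).
for label, rate, points in [
    ("100 Gbit/s QPSK", 100, 4),
    ("400 Gbit/s QPSK", 400, 4),
    ("400 Gbit/s 16PSK", 400, 16),
    ("1 Tbit/s 64PSK", 1000, 64),
]:
    print(f"{label}: {symbol_rate_gbaud(rate, points):.1f} Gbaud")
```

Under these assumptions, moving from QPSK to 16PSK at 400 Gbit/s halves the symbol rate, which is the motivation the abstract gives for higher constellations.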
Bottleneck Problem Solution using Biological Models of Attention in High Resolution Tracking Sensors
Abstract:
Every high resolution imaging system suffers from the bottleneck problem. This problem relates to the huge amount of data transmitted from the sensor array to a digital signal processor (DSP) and to the bottleneck in performance caused by the requirement to process a large amount of information in parallel. The same problem exists in biological vision systems, where the information sensed by many millions of receptors must be transmitted and processed in real time. Models describing solutions to the bottleneck problem in biological systems fall in the field of visual attention. This paper presents the bottleneck problem existing in imagers used for real time salient target tracking and proposes a simple solution by employing models of attention found in biological systems. The bottleneck problem in imaging systems is presented, the existing models of visual attention are discussed and the architecture of the proposed imager is shown.
Abstract:
The sharing of near real-time traceability knowledge in supply chains plays a central role in coordinating business operations and is a key driver for their success. However, before traceability datasets received from external partners can be integrated with datasets generated internally within an organisation, they need to be validated against information recorded for the physical goods received as well as against bespoke rules defined to ensure uniformity, consistency and completeness within the supply chain. In this paper, we present a knowledge driven framework for the runtime validation of critical constraints on incoming traceability datasets encapsulated as EPCIS event-based linked pedigrees. Our constraints are defined using SPARQL queries and SPIN rules. We present a novel validation architecture based on the integration of the Apache Storm framework for real-time, distributed computation with popular Semantic Web/Linked Data libraries and exemplify our methodology on an abstraction of the pharmaceutical supply chain.
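The idea of validating each incoming event against a set of named constraints can be sketched in plain Python. This is only an illustration of the pattern: ordinary Python predicates stand in for the SPARQL queries and SPIN rules, a generator stands in for the stream, and no Storm or RDF library is used. The event fields (`epc`, `eventTime`, `bizStep`) and the three rules are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, str]
Rule = Tuple[str, Callable[[Event], bool]]

# Hypothetical constraints; in the paper these are expressed as SPARQL queries and SPIN rules.
RULES: List[Rule] = [
    ("has_epc", lambda e: bool(e.get("epc"))),
    ("has_event_time", lambda e: bool(e.get("eventTime"))),
    ("known_biz_step", lambda e: e.get("bizStep") in {"commissioning", "shipping", "receiving"}),
]


def validate(event: Event, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules the event violates (empty list means valid)."""
    return [name for name, check in rules if not check(event)]


def validate_stream(events: Iterable[Event], rules: List[Rule] = RULES) -> Iterator[Tuple[Event, List[str]]]:
    """Validate events one at a time as they arrive, yielding each event with its violations."""
    for event in events:
        yield event, validate(event, rules)


# Made-up incoming events for illustration.
incoming = [
    {"epc": "urn:epc:id:sgtin:0000000.000000.1", "eventTime": "2015-01-01T10:00:00Z", "bizStep": "shipping"},
    {"epc": "", "eventTime": "2015-01-01T10:05:00Z", "bizStep": "unpacking"},
]
results = [violations for _, violations in validate_stream(incoming)]
print(results)
```

Keeping each rule as a named, independent check means a failed validation can report exactly which constraint was violated, which is useful when datasets come from external partners.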
Abstract:
This paper studies the key aspects of an optical link which transmits a broadband microwave filter bank multicarrier (FBMC) signal. The study is presented in the context of creating an all-analogue real-time multigigabit orthogonal frequency division multiplexing electro-optical transceiver for short range and high-capacity data center networks. Passive microwave filters are used to perform the pulse shaping of the bit streams, allowing an orthogonal transmission without the necessity of digital signal processing (DSP). Accordingly, a cyclic prefix that would cause a reduction in the net data rate is not required. An experiment consisting of three orthogonally spaced 2.7 Gbaud quadrature phase shift keyed subchannels demonstrates that the spectral efficiency of traditional DSP-less subcarrier multiplexed links can be potentially doubled. A sensitivity of -29.5 dBm is achieved in a 1-km link.
Abstract:
We present experimental results for wavelength-division multiplexed (WDM) transmission performance using unbalanced proportions of 1s and 0s in pseudo-random bit sequence (PRBS) data. This investigation simulates the effect of local, in time, data unbalancing which occurs in some coding systems such as forward error correction when extra bits are added to the WDM data stream. We show that such local unbalancing, which would practically give a time-dependent error-rate, can be employed to improve the legacy long-haul WDM system performance if the system is allowed to operate in the nonlinear power region. We use a recirculating loop to simulate a long-haul fibre system.
Abstract:
A novel versatile digital signal processing (DSP)-based equalizer using support vector machine regression (SVR) is proposed for 16-quadrature amplitude modulated (16-QAM) coherent optical orthogonal frequency-division multiplexing (CO-OFDM) and experimentally compared to traditional DSP-based deterministic fiber-induced nonlinearity equalizers (NLEs), namely the full-field digital back-propagation (DBP) and the inverse Volterra series transfer function-based NLE (V-NLE). For a 40 Gb/s 16-QAM CO-OFDM at 2000 km, SVR-NLE extends the optimum launched optical power (LOP) by 4 dB compared to V-NLE by means of reduction of fiber nonlinearity. In comparison to full-field DBP at a LOP of 6 dBm, SVR-NLE outperforms it by ∼1 dB in Q-factor. In addition, SVR-NLE is the most computationally efficient DSP-NLE.
Abstract:
With the advent of peer to peer networks, and more importantly sensor networks, the desire to extract useful information from continuous and unbounded streams of data has become more prominent. For example, in tele-health applications, sensor based data streaming systems are used to continuously and accurately monitor Alzheimer's patients and their surrounding environment. Typically, the requirements of such applications necessitate the cleaning and filtering of continuous, corrupted and incomplete data streams gathered wirelessly in dynamically varying conditions. Yet, existing data stream cleaning and filtering schemes are incapable of capturing the dynamics of the environment while simultaneously suppressing the losses and corruption introduced by uncertain environmental, hardware, and network conditions. Consequently, existing data cleaning and filtering paradigms are being challenged. This dissertation develops novel schemes for cleaning data streams received from a wireless sensor network operating under non-linear and dynamically varying conditions. The study establishes a paradigm for validating spatio-temporal associations among data sources to enhance data cleaning. To simplify the complexity of the validation process, the developed solution maps the requirements of the application on a geometrical space and identifies the potential sensor nodes of interest. Additionally, this dissertation models a wireless sensor network data reduction system by ascertaining that segregating data adaptation and prediction processes will augment the data reduction rates. The schemes presented in this study are evaluated using simulation and information theory concepts. The results demonstrate that dynamic conditions of the environment are better managed when validation is used for data cleaning. They also show that when a fast convergent adaptation process is deployed, data reduction rates are significantly improved. 
Targeted applications of the developed methodology include machine health monitoring, tele-health, environment and habitat monitoring, intermodal transportation and homeland security.
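As a rough illustration of using spatial associations among sensors to clean a stream, the Python sketch below checks each reading in one time step against the median of its spatial neighbours and replaces readings that deviate too far. This is a generic technique chosen for illustration, not the dissertation's scheme; the node positions, radius and tolerance are hypothetical.

```python
from math import hypot
from statistics import median


def neighbours(node, positions, radius):
    """Nodes within `radius` of `node` (Euclidean distance), excluding the node itself."""
    x0, y0 = positions[node]
    return [n for n, (x, y) in positions.items()
            if n != node and hypot(x - x0, y - y0) <= radius]


def clean_snapshot(readings, positions, radius, tolerance):
    """Replace missing or implausible readings with the median of nearby sensors."""
    cleaned = {}
    for node, value in readings.items():
        nearby = [readings[n] for n in neighbours(node, positions, radius)
                  if readings.get(n) is not None]
        if not nearby:
            # No spatial evidence available: keep the reading as it is.
            cleaned[node] = value
            continue
        reference = median(nearby)
        if value is None or abs(value - reference) > tolerance:
            cleaned[node] = reference
        else:
            cleaned[node] = value
    return cleaned


# Hypothetical deployment: four clustered nodes and one isolated node.
positions = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "e": (1, 1), "d": (10, 10)}
readings = {"a": 20.0, "b": 21.0, "c": 95.0, "e": 19.0, "d": 50.0}
cleaned = clean_snapshot(readings, positions, radius=2.0, tolerance=5.0)
print(cleaned)
```

Here node `c` is corrected because it disagrees with all three of its neighbours, while the isolated node `d` is left untouched because there is nothing nearby to validate it against.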
Abstract:
This paper describes a face detection system which goes beyond traditional approaches normally designed for still images. First, the video stream context is considered when applying the detector, and therefore the resulting system is designed taking into consideration a main feature available in a video stream, i.e. temporal coherence. The resulting system builds a feature based model for each detected face, and searches for them in the next frame using various model information. The results achieved for video stream processing outperform Rowley-Kanade's and Viola-Jones' solutions, providing eye and face data in a reduced time with a notable correct detection rate.
Abstract:
Heading into the 2020s, Physics and Astronomy are undergoing experimental revolutions that will reshape our picture of the fabric of the Universe. The Large Hadron Collider (LHC), the largest particle physics project in the world, produces 30 petabytes of data annually that need to be sifted through, analysed, and modelled. In astrophysics, the Large Synoptic Survey Telescope (LSST) will be taking a high-resolution image of the full sky every 3 days, leading to data rates of 30 terabytes per night over ten years. These experiments endeavour to answer the question of why 96% of the content of the universe currently eludes our physical understanding. Both the LHC and LSST share the 5-dimensional nature of their data, with position, energy and time being the fundamental axes. This talk will present an overview of the experiments and the data that are gathered, and outline the challenges in extracting information. Common strategies employed are very similar to industrial data science problems (e.g., data filtering, machine learning, statistical interpretation) and provide a seed for exchange of knowledge between academia and industry.
Speaker biography: Professor Mark Sullivan. Mark Sullivan is a Professor of Astrophysics in the Department of Physics and Astronomy. Mark completed his PhD at Cambridge, and following postdoctoral study in Durham, Toronto and Oxford, now leads a research group at Southampton studying dark energy using exploding stars called "type Ia supernovae". Mark has many years' experience of research that involves repeatedly imaging the night sky to track the arrival of transient objects, involving significant challenges in data handling, processing, classification and analysis.
Abstract:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud, such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully.
I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud.
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
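The difference between vertex-centric and neighborhood-centric programs can be illustrated with a single-machine Python sketch: first extract the k-hop neighborhood of a vertex as a subgraph, then run a user function that sees the whole subgraph at once. This is not NSCALE's API; the function names and the toy graph are hypothetical, and nothing here is distributed.

```python
from collections import deque


def k_hop_neighbourhood(adjacency, source, k):
    """Vertices within k hops of `source`, found by breadth-first search."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if depth[v] == k:
            continue  # do not expand beyond k hops
        for w in adjacency.get(v, ()):
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    return set(depth)


def induced_subgraph(adjacency, vertices):
    """Keep only the edges whose endpoints both lie in `vertices`."""
    return {v: sorted(w for w in adjacency.get(v, ()) if w in vertices)
            for v in sorted(vertices)}


def count_triangles(subgraph):
    """Example user program over a whole subgraph: count triangles (u < v < w)."""
    count = 0
    for u, nbrs in subgraph.items():
        for v in nbrs:
            if v > u:
                for w in subgraph[v]:
                    if w > v and w in subgraph[u]:
                        count += 1
    return count


# Toy undirected graph (hypothetical), stored as symmetric adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b", "e"],
    "e": ["d"],
}
ego = induced_subgraph(graph, k_hop_neighbourhood(graph, "a", 1))
print(ego, count_triangles(ego))
```

Because `count_triangles` receives the complete ego network, it needs no message passing between vertices, which is the convenience the neighborhood-centric model aims to provide.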