832 results for bigdata, data stream processing, dsp, apache storm, cyber security
Abstract:
Optical communications receivers using wavelet signal processing are proposed in this paper for dense wavelength-division multiplexed (DWDM) systems and mode-division multiplexed (MDM) transmissions. The optical signal-to-noise ratio (OSNR) required to demodulate the polarization-division multiplexed quadrature phase shift keying (PDM-QPSK) modulation format is reduced by the wavelet denoising process. This procedure improves the bit error rate (BER) performance and increases the transmission distance in DWDM systems. Additionally, the wavelet-based design relies on signal decomposition using time-limited basis functions, which reduces the computational cost of the digital signal processing (DSP) module. For MDM systems, a new scheme for encoding data bits based on wavelets is presented to minimize mode coupling in few-mode fibers (FMF) and multimode fibers (MMF). The Shifted Prolate Wave Spheroidal (SPWS) functions are proposed to reduce the modal interference.
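As a rough illustration of the wavelet denoising step mentioned in this abstract, the sketch below applies a one-level Haar transform with soft thresholding of the detail coefficients. The abstract does not name the wavelet family or the thresholding rule, so the Haar basis, the soft threshold, and all function names here are assumptions made for illustration only.

```python
import math


def haar_forward(signal):
    """One-level orthonormal Haar transform (assumed basis, not the paper's)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold`, keeping its sign."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def haar_denoise(signal, threshold):
    """Denoise by soft-thresholding the detail coefficients only."""
    approx, detail = haar_forward(signal)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)
```

With a threshold of zero the signal is reconstructed unchanged; with a large threshold each pair of samples collapses to its mean, which is the smoothing effect that denoising relies on.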
Abstract:
Stream mining is a set of cutting-edge techniques designed to process streams of data in real time in order to extract knowledge. In the particular case of classification, stream mining has to adapt its behaviour to volatile underlying data distributions, a phenomenon known as concept drift. Concept drift may lead to situations where predictive models become invalid and therefore have to be updated to represent the actual concepts that the data pose. In this context, there is a specific type of concept drift, known as recurrent concept drift, where the concepts represented by the data have already appeared in the past. In those cases the learning process can be avoided, or at least minimized, by applying a previously trained model. This can be extremely useful in ubiquitous environments, which are characterized by resource-constrained devices. To deal with this scenario, meta-models can be used to enhance the drift detection mechanisms used by data stream algorithms, by representing and predicting when the change will occur. There are real-world situations where a concept reappears, as in the case of intrusion detection systems (IDS), where the same incidents, or adaptations of them, usually reappear over time. In these environments the early prediction of drift, by means of better knowledge of past models, can help to anticipate the change, thus reducing the number of training instances the model needs. Using meta-models as a recurrent drift detection mechanism also opens up the ability to share concept representations among different data mining processes. Such exchanges could improve the accuracy of the resulting local model, as that model may benefit from patterns similar to the local concept that were observed in other scenarios but not yet locally.
This would also improve the efficiency of the training instances used during the classification process, since the exchange of models would aid the application of already trained recurrent models that have previously been seen by any of the collaborating devices. That is to say, the scope of recurrence detection and representation is broadened. In fact, the detection, representation and exchange of concept drift patterns would be extremely useful for law enforcement activities fighting cyber crime. Since information exchange is one of the main pillars of cooperation, national units would benefit from the experience and knowledge gained by third parties. Moreover, in the specific scope of critical infrastructure protection it is crucial to have information exchange mechanisms, from both a strategic and a technical perspective. The exchange of concept drift detection schemes in cyber security environments would aid in preventing, detecting and effectively responding to threats in cyberspace. Furthermore, as a complement to meta-models, a mechanism to assess the similarity between classification models is also needed when dealing with recurrent concepts. At present, when a previously trained model is reused, a rough comparison between concepts is usually made by applying boolean logic. Introducing fuzzy logic comparisons between models could lead to more efficient reuse of previously seen concepts, by applying not just equal models but also similar ones. This work addresses these open issues by means of: the MMPRec system, which integrates a meta-model mechanism and a fuzzy similarity function; a collaborative environment to share meta-models between different devices; and a recurrent drift generator that makes it possible to test the usefulness of recurrent drift systems such as MMPRec. Moreover, this thesis presents an experimental validation of the proposed contributions using synthetic and real datasets.
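The contrast between boolean and graded comparison of models can be sketched as follows. This is not the MMPRec similarity function, which the abstract does not specify; it is a minimal stand-in that scores two classifiers by how often they agree on a sample of instances and reuses a stored model when that score passes a threshold. The function names, the repository layout, and the default threshold of 0.8 are all hypothetical.

```python
def agreement_similarity(model_a, model_b, sample):
    """Graded similarity in [0, 1]: fraction of instances with equal predictions.

    Assumption: models are plain callables mapping an instance to a label.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    agree = sum(model_a(x) == model_b(x) for x in sample)
    return agree / len(sample)


def find_reusable(current, repository, sample, threshold=0.8):
    """Return (name, similarity) of the most similar stored model.

    The name is None when no stored model reaches `threshold`, in which case
    a new model would have to be trained. A boolean comparison corresponds
    to threshold=1.0, where only identical behaviour counts as a match.
    """
    best_name, best_sim = None, 0.0
    for name, stored in repository.items():
        sim = agreement_similarity(current, stored, sample)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim >= threshold:
        return best_name, best_sim
    return None, best_sim
```

With a threshold below 1.0, a stored model that disagrees on a small share of instances is still offered for reuse, which is the behaviour the abstract attributes to similarity-based rather than equality-based matching.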
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-04
Abstract:
Frequent itemset mining is well explored for various data types, and its computational complexity is well understood. There are methods to deal effectively with the computational problems. This paper shows another approach to further performance enhancement of frequent itemset computation. We have made a series of observations that led us to invent data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper shows results from several experiments that confirm our general and formally presented observations.
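For readers unfamiliar with the Partition algorithm referred to above, the following is a minimal sketch of its usual two-phase form: local frequent itemsets are mined per partition, their union forms the global candidate set, and a final pass over all the data counts those candidates. The pre-processing methods proposed in the paper are not described in the abstract and are not reproduced here; the brute-force local search and all names are illustrative assumptions.

```python
from itertools import combinations
from math import ceil


def local_frequent(transactions, min_count):
    """Level-wise brute-force search for itemsets frequent in one partition."""
    items = sorted({i for t in transactions for i in t})
    frequent = set()
    size = 1
    current = [frozenset([i]) for i in items]
    while current:
        kept = [c for c in current
                if sum(c <= t for t in transactions) >= min_count]
        frequent.update(kept)
        size += 1
        # Only items that survive at this level can appear in larger itemsets.
        alive = sorted({i for c in kept for i in c})
        current = [frozenset(c) for c in combinations(alive, size)]
    return frequent


def partition_frequent_itemsets(transactions, min_support, n_partitions):
    """Phase 1: union of local frequent sets. Phase 2: global counting."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    chunk = ceil(n / n_partitions)
    candidates = set()
    for start in range(0, n, chunk):
        part = transactions[start:start + chunk]
        local_min = max(ceil(min_support * len(part)), 1)
        candidates |= local_frequent(part, local_min)
    # Final step: every candidate is counted against the full data set.
    global_min = ceil(min_support * n)
    result = {}
    for c in candidates:
        count = sum(c <= t for t in transactions)
        if count >= global_min:
            result[c] = count
    return result
```

The approach is sound because any itemset that is frequent globally must be frequent in at least one partition, so the union of local results cannot miss it. The final counting loop is the step whose input the paper aims to shrink.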
Abstract:
A k-NN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is measured along a valid movement path. For this type of k-NN query, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than the point data. Efficient processing of k-NN queries based on the Euclidean distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need for the costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.
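The idea of ranking objects by distance bounds rather than exact shortest paths can be sketched as a simple pruning rule. How the bounds are derived from multiresolution terrain models is not described in the abstract, so the sketch takes them as given input; the function names and the dictionary layout are assumptions.

```python
def knn_candidates(bounds, k):
    """Return the objects that may still belong to the k nearest neighbors.

    `bounds` maps an object name to (lower, upper) bounds on its surface
    distance from the query point. At least k objects lie within the k-th
    smallest upper bound, so any object whose lower bound exceeds that value
    cannot be among the k nearest and is pruned.
    """
    if k <= 0 or not bounds:
        return []
    uppers = sorted(upper for _, upper in bounds.values())
    kth_upper = uppers[min(k, len(uppers)) - 1]
    return sorted(name for name, (lower, _) in bounds.items()
                  if lower <= kth_upper)


def is_resolved(bounds, k):
    """True when pruning alone leaves at most k objects, so no refinement is needed."""
    return len(knn_candidates(bounds, k)) <= k
```

When the candidate set is still larger than k, tighter bounds (for instance from a finer terrain resolution) would be requested only for the remaining candidates, which is where the saving over exact shortest-path computation would come from.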
Abstract:
Error-free propagation of a single-polarisation optical time division multiplexed 40 Gbit/s dispersion-managed pulsed data stream over standard (non-dispersion-shifted) fibre is reported. This distance is twice the previous record at this data rate.
Abstract:
This thesis experimentally examines different techniques for optical fibre transmission over ultra-long-haul distances. Firstly, it examines the use of dispersion management as a means of achieving long-haul communications. Secondly, it examines the use of concatenated nonlinear optical loop mirrors (NOLMs) for dispersion-managed (DM) autosoliton ultra-long-haul propagation, comparing their performance with a generic system without NOLMs. Thirdly, timing jitter in a concatenated NOLM system is examined and compared to the generic system. Lastly, issues of optical time division multiplexing (OTDM) amplitude non-uniformity from channel to channel in a saturable absorber, specifically a NOLM, are raised. Transmission at a rate of 40 Gbit/s is studied in an all-Raman amplified standard fibre link with amplifier spacing of the order of 80 km. We demonstrate in this thesis that the detrimental effects associated with high-power Raman amplification can be minimized by dispersion map optimization. As a result, a transmission distance of 1600 km (2000 km including dispersion compensating fibre) has been achieved in standard single mode fibre. The use of concatenated NOLMs to provide a stable propagation regime has been proposed theoretically. In this thesis, autosoliton propagation is observed experimentally for the first time in a dispersion-managed optical transmission system. The system is based on a strong dispersion map with large amplifier spacing. Operation at transmission rates of 10, 40 and 80 Gbit/s is demonstrated. With the insertion of a stabilizing element to the NOLM, the transmission of 10 and 20 Gbit/s data streams was extended and demonstrated experimentally. Error-free propagation over 100 and 20 thousand kilometres has been achieved at 10 and 20 Gbit/s respectively, with terrestrial amplifier spacing. The monitoring of timing jitter is of importance to all optical systems. The evolution of timing jitter in a DM autosoliton system has been studied in this thesis and analyzed at bit rates from 10 Gbit/s to 80 Gbit/s.
Non-linear guiding by in-line regenerators considerably changes the dynamics of jitter accumulation. As transmission systems require higher data rates, the use of OTDM will become more widespread. The dynamics of switching and transmission of an optical signal comprising individual OTDM channels of unequal amplitudes in a dispersion-managed link with in-line non-linear fibre loop mirrors are investigated.
Abstract:
We have recently proposed the framework of independent blind source separation as an advantageous approach to steganography. Amongst the several characteristics noted was a sensitivity of message reconstruction to small perturbations in the sources. This characteristic is not common in most other approaches to steganography. In this paper we discuss how this sensitivity relates to the joint diagonalisation inside the independent component approach and to the reliance on exact knowledge of secret information, and how it can be used as an additional and inherent security mechanism against malicious attacks aimed at discovering the hidden messages. The paper therefore provides an enhanced mechanism that can be used for e-document forensic analysis and can be applied to digital data media of different dimensionality. In this paper we use a low-dimensional example of biomedical time series as might occur in the electronic patient health record, where protection of the private patient information is paramount.
Abstract:
A novel architecture for microwave/millimeter-wave signal generation and data modulation using a fiber-grating-based distributed feedback laser has been proposed in this letter. For demonstration, a 155.52-Mb/s data stream on a 16.9-GHz subcarrier has been transmitted and recovered successfully. It has been proved that this technology would be of benefit to future microwave data transmission systems.
Abstract:
Error-free transmission of a single polarization optical time division multiplexed 40 Gbit/s dispersion managed pulse data stream over 1009 km has been achieved in dispersion-compensated standard (non-dispersion shifted) fibre. This distance is twice the previous record at this data rate.
Abstract:
A novel architecture for microwave/millimeter-wave signal generation and data modulation using a fiber-grating-based distributed feedback laser has been proposed in this letter. For demonstration, a 155.52-Mb/s data stream on a 16.9-GHz subcarrier has been transmitted and recovered successfully. It has been proved that this technology would be of benefit to future microwave data transmission systems. © 2006 IEEE.
Abstract:
Forests play a pivotal role in timber production, maintenance and development of biodiversity and in carbon sequestration and storage in the context of the Kyoto Protocol. Policy makers and forest experts therefore require reliable information on forest extent, type and change for management, planning and modeling purposes. It is becoming increasingly clear that such forest information is frequently inconsistent and unharmonised between countries and continents. This research paper presents a forest information portal that has been developed in line with the GEOSS and INSPIRE frameworks. The web portal provides access to forest resources data at a variety of spatial scales, from global through to regional and local, as well as providing analytical capabilities for monitoring and validating forest change. The system also allows for the utilisation of forest data and processing services within other thematic areas. The web portal has been developed using open standards to facilitate accessibility, interoperability and data transfer.
Abstract:
We demonstrate that the use of in-line nonlinear optical loop mirrors (NOLMs) in dispersion-managed (DM) transmission systems dominated by amplitude noise can achieve passive 2R regeneration of 40 and 80 Gbit/s RZ data streams. This indicates that the approach could obviate the need for full regeneration in high data rate, strong DM systems, where intra-channel four-wave mixing poses serious problems.
Abstract:
In this letter, we numerically demonstrate that the use of inline nonlinear optical loop mirrors in strongly dispersion-managed transmission systems dominated by pulse distortion and amplitude noise can achieve all-optical passive 2R regeneration of a 40-Gb/s return-to-zero data stream. We define the tolerance limits of this result to the parameters of the input pulses.
Abstract:
Remote sensing data is routinely used in ecology to investigate the relationship between landscape pattern, as characterised by land use and land cover maps, and ecological processes. Multiple factors related to the representation of geographic phenomena have been shown to affect the characterisation of landscape pattern, resulting in spatial uncertainty. This study statistically investigated the effect of the interaction between landscape spatial pattern and geospatial processing methods, unlike most papers, which consider the effect of each factor only in isolation. This is important because data used to calculate landscape metrics typically undergo a series of data abstraction processing tasks, and these tasks are rarely performed in isolation. The geospatial processing methods tested were the aggregation method and the choice of pixel size used to aggregate data. These were compared to two components of landscape pattern: spatial heterogeneity and the proportion of landcover class area. The interactions and their effect on the final landcover map were described using landscape metrics to measure landscape pattern and classification accuracy (the response variables). All landscape metrics and classification accuracy were shown to be affected by both landscape pattern and processing methods. Large variability in the response of those variables and interactions between the explanatory variables were observed. However, even though interactions occurred, they only affected the magnitude of the difference in landscape metric values. Thus, provided that the same processing methods are used, landscapes should retain their ranking when their landscape metrics are compared. For example, highly fragmented landscapes will always have larger values for the landscape metric "number of patches" than less fragmented landscapes. However, the magnitude of the difference between the landscapes may change, and therefore absolute values of landscape metrics may need to be interpreted with caution.
The explanatory variables which had the largest effects were spatial heterogeneity and pixel size. These explanatory variables tended to result in large main effects and large interactions. The high variability in the response variables and the interaction of the explanatory variables indicate that it would be difficult to generalise about the impact of processing on landscape pattern, as only two processing methods were tested and untested processing methods are likely to result in even greater spatial uncertainty. © 2013 Elsevier B.V.
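The sensitivity of a landscape metric to pixel size can be illustrated with a small sketch that counts patches of one landcover class on a raster and then repeats the count after aggregating to coarser pixels. The abstract does not say which aggregation methods were tested, so the majority rule used here, the 4-neighbour definition of a patch, the tie-break, and all function names are assumptions.

```python
from collections import Counter


def number_of_patches(grid, cls):
    """Count 4-connected patches of class `cls` in a rectangular raster."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != cls or (r, c) in seen:
                continue
            patches += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:  # flood fill over one patch
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == cls and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return patches


def aggregate_majority(grid, factor):
    """Coarsen the raster by `factor`, assigning each block its majority class.

    Ties are broken by taking the smallest class label (an arbitrary choice).
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        row = []
        for c in range(0, cols, factor):
            block = [grid[y][x]
                     for y in range(r, min(r + factor, rows))
                     for x in range(c, min(c + factor, cols))]
            counts = Counter(block)
            top = max(counts.values())
            row.append(min(k for k, v in counts.items() if v == top))
        out.append(row)
    return out
```

On a fragmented raster, isolated single-pixel patches tend to vanish under majority aggregation, so the absolute "number of patches" value drops as pixel size grows. This is consistent with the abstract's caution about interpreting absolute metric values.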