853 results for Data stream mining
Abstract:
Stream mining is defined as a set of cutting-edge techniques designed to process streams of data in real time in order to extract knowledge. In the particular case of classification, stream mining has to adapt its behaviour to volatile underlying data distributions, a phenomenon known as concept drift. Concept drift may render predictive models invalid, so they have to be updated to represent the actual concepts in the data. In this context there is a specific type of concept drift, known as recurrent concept drift, in which the concepts represented by the data have already appeared in the past. In those cases the learning effort could be saved, or at least minimized, by applying a previously trained model. This could be extremely useful in ubiquitous environments, which are characterized by resource-constrained devices. To deal with this scenario, meta-models can be used to enhance the drift detection mechanisms of data stream algorithms by representing drift patterns and predicting when a change will occur. There are real-world situations where a concept reappears, as in intrusion detection systems (IDS), where the same incidents, or adaptations of them, usually reappear over time. In these environments the early prediction of drift, through better knowledge of past models, can help to anticipate the change, thus improving the efficiency of the model with respect to the number of training instances needed. Using meta-models as a recurrent drift detection mechanism also opens up the possibility of sharing concept representations among different data mining processes. Such exchanges could improve the accuracy of the resultant local model, since that model may benefit from patterns similar to the local concept that were observed in other scenarios but not yet locally.
This would also improve the efficiency of the training instances used during the classification process, since the exchange of models would allow the application of already trained recurrent models previously seen by any of the collaborating devices. In other words, the scope of recurrence detection and representation is broadened. In fact, the detection, representation and exchange of concept drift patterns would be extremely useful for law enforcement activities fighting cyber crime. Since information exchange is one of the main pillars of cooperation, national units would benefit from the experience and knowledge gained by third parties. Moreover, in the specific scope of critical infrastructure protection it is crucial to have information exchange mechanisms, from both a strategic and a technical perspective. The exchange of concept drift detection schemes in cyber security environments would aid in preventing, detecting and effectively responding to threats in cyber space. Furthermore, as a complement to meta-models, a mechanism to assess the similarity between classification models is also needed when dealing with recurrent concepts. When reusing a previously trained model, a rough comparison between concepts is usually made using Boolean logic. Introducing fuzzy-logic comparisons between models could lead to more efficient reuse of previously seen concepts, by applying not just equal models but also similar ones. This work addresses the aforementioned open issues by means of: the MMPRec system, which integrates a meta-model mechanism and a fuzzy similarity function; a collaborative environment to share meta-models between different devices; and a recurrent drift generator that allows testing the usefulness of recurrent drift systems such as MMPRec. Moreover, this thesis presents an experimental validation of the proposed contributions using synthetic and real datasets.
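The fuzzy model-similarity idea described in the abstract above can be illustrated in a few lines of Python. This is a hypothetical reconstruction, not the MMPRec implementation: each classifier is treated as a plain prediction function, similarity is graded as the fraction of agreement on a set of reference points, and the function names and the 0.8 reuse threshold are assumptions.

```python
def fuzzy_similarity(model_a, model_b, reference_points):
    """Graded similarity in [0, 1] between two classifiers, measured
    as the fraction of reference points on which they agree."""
    agree = sum(1 for x in reference_points if model_a(x) == model_b(x))
    return agree / len(reference_points)

def select_reusable_model(current_model, stored_models, reference_points,
                          threshold=0.8):
    """Return the stored model most similar to the current concept,
    if its similarity exceeds the threshold; otherwise None."""
    best, best_sim = None, 0.0
    for m in stored_models:
        sim = fuzzy_similarity(current_model, m, reference_points)
        if sim > best_sim:
            best, best_sim = m, sim
    return best if best_sim >= threshold else None
```

A Boolean comparison would only ever accept an exact match; the graded score lets a stored model that agrees on, say, 90% of the reference points be reused for a merely similar concept.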
Abstract:
Collaborative Filtering is one of the most popular recommendation approaches. Most Collaborative Filtering algorithms work with a static set of data. This paper introduces a novel approach to providing recommendations using Collaborative Filtering when user ratings are received over an incoming data stream. In an incoming stream, massive amounts of data arrive rapidly, making it impossible to save all the records for later analysis. By dynamically building a decision tree for every item as data arrive, the incoming data stream is used effectively, although an inevitable trade-off between accuracy and the amount of memory used is introduced. By adding a simple personalization step using a hierarchy of the items, it is possible to improve the predicted ratings made by each decision tree and generate recommendations in real time. Empirical studies with the dynamically built decision trees show that the personalization step improves the overall prediction accuracy.
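A heavily simplified sketch of this streaming setup, for illustration only: instead of the paper's per-item decision trees, each item here keeps incremental rating statistics (so individual records are never stored), and the personalization step uses an item hierarchy (item → category) to shift predictions by the user's average deviation within that category. All class and method names are hypothetical.

```python
from collections import defaultdict

class StreamingRecommender:
    """Per-item incremental rating means plus a personalization step
    based on an item hierarchy; ratings arrive as a stream and are
    never stored individually."""

    def __init__(self, hierarchy):
        self.hierarchy = hierarchy                  # item -> category
        self.item_sum = defaultdict(float)
        self.item_cnt = defaultdict(int)
        # per-(user, category) deviation from the running item mean
        self.user_cat_dev = defaultdict(float)
        self.user_cat_cnt = defaultdict(int)

    def update(self, user, item, rating):
        self.item_sum[item] += rating
        self.item_cnt[item] += 1
        cat = self.hierarchy.get(item)
        dev = rating - self.item_sum[item] / self.item_cnt[item]
        self.user_cat_dev[(user, cat)] += dev
        self.user_cat_cnt[(user, cat)] += 1

    def predict(self, user, item):
        if self.item_cnt[item] == 0:
            return None                             # item never seen
        base = self.item_sum[item] / self.item_cnt[item]
        key = (user, self.hierarchy.get(item))
        if self.user_cat_cnt[key]:
            base += self.user_cat_dev[key] / self.user_cat_cnt[key]
        return base
```

The memory cost is bounded by the number of items plus the number of (user, category) pairs, mirroring the paper's accuracy-versus-memory trade-off in miniature.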
Abstract:
Error-free propagation of a single-polarisation optical time division multiplexed 40 Gbit/s dispersion-managed pulsed data stream over standard (non-dispersion-shifted) fibre has been achieved. This distance is twice the previous record at this data rate.
Abstract:
This thesis experimentally examines the use of different techniques for optical fibre transmission over ultra-long-haul distances. It firstly examines the use of dispersion management as a means of achieving long-haul communications. Secondly, it examines the use of concatenated NOLMs for DM autosoliton ultra-long-haul propagation, comparing their performance with a generic system without NOLMs. Thirdly, timing jitter in the concatenated NOLM system is examined and compared to the generic system, and lastly issues of OTDM amplitude non-uniformity from channel to channel in a saturable absorber, specifically a NOLM, are raised. Transmission at a rate of 40Gbit/s is studied in an all-Raman amplified standard fibre link with amplifier spacing of the order of 80km. We demonstrate in this thesis that the detrimental effects associated with high-power Raman amplification can be minimized by dispersion map optimization. As a result, a transmission distance of 1600 km (2000 km including dispersion compensating fibre) has been achieved in standard single-mode fibre. The use of concatenated NOLMs to provide a stable propagation regime has been proposed theoretically. In this thesis, autosoliton propagation is observed experimentally for the first time in a dispersion-managed optical transmission system. The system is based on a strong dispersion map with large amplifier spacing. Operation at transmission rates of 10, 40 and 80Gbit/s is demonstrated. With the insertion of a stabilizing element into the NOLM, the transmission of 10 and 20Gbit/s data streams was extended and demonstrated experimentally. Error-free propagation over 100 and 20 thousand kilometres has been achieved at 10 and 20Gbit/s respectively, with terrestrial amplifier spacing. The monitoring of timing jitter is of importance to all optical systems. The evolution of timing jitter in a DM autosoliton system has been studied in this thesis and analyzed at bit rates from 10Gbit/s to 80Gbit/s.
Non-linear guiding by in-line regenerators considerably changes the dynamics of jitter accumulation. As transmission systems require higher data rates, the use of OTDM will become more prolific. The dynamics of switching and transmission of an optical signal comprising individual OTDM channels of unequal amplitudes in a dispersion-managed link with in-line non-linear fibre loop mirrors is investigated.
Abstract:
A novel architecture for microwave/millimeter-wave signal generation and data modulation using a fiber-grating-based distributed feedback laser is proposed in this letter. For demonstration, a 155.52-Mb/s data stream on a 16.9-GHz subcarrier has been transmitted and recovered successfully. It has been shown that this technology would benefit future microwave data transmission systems.
Abstract:
Error-free transmission of a single-polarization optical time division multiplexed 40 Gbit/s dispersion-managed pulsed data stream over 1009 km has been achieved in dispersion-compensated standard (non-dispersion-shifted) fibre. This distance is twice the previous record at this data rate.
Abstract:
Retrospective clinical data presents many challenges for data mining and machine learning. The transcription of patient records from paper charts and the subsequent manipulation of the data often result in high volumes of noise as well as a loss of other important information. In addition, such datasets often fail to represent expert medical knowledge and reasoning in any explicit manner. In this research we describe applying data mining methods to retrospective clinical data to build a prediction model for asthma exacerbation severity for pediatric patients in the emergency department. Difficulties in building such a model forced us to investigate alternative strategies for analyzing and processing retrospective data. This paper describes this process together with an approach to mining retrospective clinical data by incorporating formalized external expert knowledge (secondary knowledge sources) into the classification task. This knowledge is used to partition the data into a number of coherent sets, where each set is explicitly described in terms of the secondary knowledge source. Instances from each set are then classified in a manner appropriate to the characteristics of the particular set. We present our methodology and outline a set of experimental results that demonstrate some advantages and some limitations of our approach. © 2008 Springer-Verlag Berlin Heidelberg.
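The partition-then-classify strategy described above can be expressed generically. This sketch is not the paper's model: the partition rule stands in for a formalized secondary knowledge source, and the per-partition classifiers are arbitrary callables chosen for each coherent set.

```python
def partition_and_classify(instances, partition_rule, classifiers):
    """Assign each instance to a partition using external expert
    knowledge (partition_rule), then classify it with the classifier
    dedicated to that partition."""
    results = []
    for x in instances:
        group = partition_rule(x)        # e.g. a guideline-based patient band
        results.append((group, classifiers[group](x)))
    return results
```

Because each partition is explicitly described by the knowledge source, the classifier fitted to it can be simpler and easier to interpret than one global model trained over the noisy combined data.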
Abstract:
A novel architecture for microwave/millimeter-wave signal generation and data modulation using a fiber-grating-based distributed feedback laser is proposed in this letter. For demonstration, a 155.52-Mb/s data stream on a 16.9-GHz subcarrier has been transmitted and recovered successfully. It has been shown that this technology would benefit future microwave data transmission systems. © 2006 IEEE.
Abstract:
We demonstrate that the use of in-line nonlinear optical loop mirrors (NOLMs) in dispersion-managed (DM) transmission systems dominated by amplitude noise can achieve passive 2R regeneration of 40 and 80 Gbit/s RZ data streams. This indicates that the approach could obviate the need for full regeneration in high-data-rate, strongly dispersion-managed systems, where intra-channel four-wave mixing poses serious problems.
Abstract:
In this letter, we numerically demonstrate that the use of inline nonlinear optical loop mirrors in strongly dispersion-managed transmission systems dominated by pulse distortion and amplitude noise can achieve all-optical passive 2R regeneration of a 40-Gb/s return-to-zero data stream. We define the tolerance limits of this result to the parameters of the input pulses.
Abstract:
Questions of forming learning sets for artificial neural networks in lossless data compression problems are considered. Methods for constructing and using learning sets are studied. A way of forming a learning set while training an artificial neural network on a data stream is proposed.
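One standard way to form a fixed-size learning set from an unbounded data stream — given here as a generic illustration, not the specific method of the paper — is reservoir sampling, which maintains a uniform random sample of k items without storing the stream:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of
    unknown length, using O(k) memory (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # inclusive bounds
            if j < k:
                reservoir[j] = item         # replace with decreasing prob.
    return reservoir
```

The resulting sample could then serve as the learning set for a neural network trained on the stream.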
Abstract:
We present experimental results for wavelength-division multiplexed (WDM) transmission performance using unbalanced proportions of 1s and 0s in pseudo-random bit sequence (PRBS) data. This investigation simulates the effect of local-in-time data unbalancing, which occurs in some coding systems, such as forward error correction, when extra bits are added to the WDM data stream. We show that such local unbalancing, which in practice would give a time-dependent error rate, can be employed to improve legacy long-haul WDM system performance if the system is allowed to operate in the nonlinear power region. We use a recirculating loop to simulate a long-haul fibre system.
Abstract:
With the advent of peer-to-peer networks, and more importantly sensor networks, the desire to extract useful information from continuous and unbounded streams of data has become more prominent. For example, in tele-health applications, sensor-based data streaming systems are used to continuously and accurately monitor Alzheimer's patients and their surrounding environment. Typically, the requirements of such applications necessitate the cleaning and filtering of continuous, corrupted and incomplete data streams gathered wirelessly in dynamically varying conditions. Yet, existing data stream cleaning and filtering schemes are incapable of capturing the dynamics of the environment while simultaneously suppressing the losses and corruption introduced by uncertain environmental, hardware, and network conditions. Consequently, existing data cleaning and filtering paradigms are being challenged. This dissertation develops novel schemes for cleaning data streams received from a wireless sensor network operating under non-linear and dynamically varying conditions. The study establishes a paradigm for validating spatio-temporal associations among data sources to enhance data cleaning. To simplify the complexity of the validation process, the developed solution maps the requirements of the application onto a geometrical space and identifies the potential sensor nodes of interest. Additionally, this dissertation models a wireless sensor network data reduction system by ascertaining that segregating the data adaptation and prediction processes will augment the data reduction rates. The schemes presented in this study are evaluated using simulation and information theory concepts. The results demonstrate that dynamic conditions of the environment are better managed when validation is used for data cleaning. They also show that when a fast-convergent adaptation process is deployed, data reduction rates are significantly improved.
Targeted applications of the developed methodology include machine health monitoring, tele-health, environment and habitat monitoring, intermodal transportation and homeland security.
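The validation step described above — checking a reading against spatio-temporally associated sources before accepting it — can be caricatured as follows. This is a deliberately crude stand-in for the dissertation's scheme: the relevant neighbour readings are assumed to be already selected, and the check is a simple median-deviation rule with an assumed threshold.

```python
def clean_reading(reading, neighbors, max_dev=3.0):
    """Accept a sensor reading if it is consistent with spatio-temporally
    associated neighbour readings; otherwise replace it with their median."""
    if not neighbors:
        return reading                      # nothing to validate against
    s = sorted(neighbors)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return median if abs(reading - median) > max_dev else reading
```

A corrupted spike is thus repaired from redundant nearby sources rather than discarded, which is the essence of validation-based cleaning.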