6 resultados para internet traffic classification machine learning apache spark hadoop big data word2vec

em CORA - Cork Open Research Archive - University College Cork - Ireland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A novel hybrid data-driven approach is developed for forecasting power system parameters with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time-series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank the importance of variables employed in the forecasting models. The Mean Decrease Gini index is employed as an impurity function. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. A part from introduction and references the paper is organized as follows. The second section presents the background and the review of several approaches for short-term forecasting of power system parameters. In the third section a hybrid machine learningbased algorithm using Hilbert-Huang transform is developed for short-term forecasting of power system parameters. Fourth section describes the decision tree learning algorithms used for the issue of variables importance. Finally in section six the experimental results in the following electric power problems are presented: active power flow forecasting, electricity price forecasting and for the wind speed and direction forecasting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Humans are profoundly affected by the surroundings which they inhabit. Environmental psychologists have produced numerous credible theories describing optimal human environments, based on the concept of congruence or “fit” (1, 2). Lack of person/environment fit can lead to stress-related illness and lack of psychosocial well-being (3). Conversely, appropriately designed environments can promote wellness (4) or “salutogenesis” (5). Increasingly, research in the area of Evidence-Based Design, largely concentrated in the area of healthcare architecture, has tended to bear out these theories (6). Patients and long-term care residents, because of injury, illness or physical/ cognitive impairment, are less likely to be able to intervene to modify their immediate environment, unless this is designed specifically to facilitate their particular needs. In the context of care settings, detailed design of personal space therefore takes on enormous significance. MyRoom conceptualises a personalisable room, utilising sensoring and networked computing to enable the environment to respond directly and continuously to the occupant. Bio-signals collected and relayed to the system will actuate application(s) intended to positively influence user well-being. Drawing on the evidence base in relation to therapeutic design interventions (7), real-time changes in ambient lighting, colour, image, etc. respond continuously to the user’s physiological state, optimising congruence. Based on research evidence, consideration is also given to development of an application which uses natural images (8). It is envisaged that actuation will require machine-learning based on interpretation of data gathered by sensors; sensoring arrangements may vary depending on context and end-user. Such interventions aim to reduce inappropriate stress/ provide stimulation, supporting both instrumental and cognitive tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The amount and quality of available biomass is a key factor for the sustainable livestock industry and agricultural management related decision making. Globally 31.5% of land cover is grassland while 80% of Ireland’s agricultural land is grassland. In Ireland, grasslands are intensively managed and provide the cheapest feed source for animals. This dissertation presents a detailed state of the art review of satellite remote sensing of grasslands, and the potential application of optical (Moderate–resolution Imaging Spectroradiometer (MODIS)) and radar (TerraSAR-X) time series imagery to estimate the grassland biomass at two study sites (Moorepark and Grange) in the Republic of Ireland using both statistical and state of the art machine learning algorithms. High quality weather data available from the on-site weather station was also used to calculate the Growing Degree Days (GDD) for Grange to determine the impact of ancillary data on biomass estimation. In situ and satellite data covering 12 years for the Moorepark and 6 years for the Grange study sites were used to predict grassland biomass using multiple linear regression, Neuro Fuzzy Inference Systems (ANFIS) models. The results demonstrate that a dense (8-day composite) MODIS image time series, along with high quality in situ data, can be used to retrieve grassland biomass with high performance (R2 = 0:86; p < 0:05, RMSE = 11.07 for Moorepark). The model for Grange was modified to evaluate the synergistic use of vegetation indices derived from remote sensing time series and accumulated GDD information. As GDD is strongly linked to the plant development, or phonological stage, an improvement in biomass estimation would be expected. It was observed that using the ANFIS model the biomass estimation accuracy increased from R2 = 0:76 (p < 0:05) to R2 = 0:81 (p < 0:05) and the root mean square error was reduced by 2.72%. The work on the application of optical remote sensing was further developed using a TerraSAR-X Staring Spotlight mode time series over the Moorepark study site to explore the extent to which very high resolution Synthetic Aperture Radar (SAR) data of interferometrically coherent paddocks can be exploited to retrieve grassland biophysical parameters. After filtering out the non-coherent plots it is demonstrated that interferometric coherence can be used to retrieve grassland biophysical parameters (i. e., height, biomass), and that it is possible to detect changes due to the grass growth, and grazing and mowing events, when the temporal baseline is short (11 days). However, it not possible to automatically uniquely identify the cause of these changes based only on the SAR backscatter and coherence, due to the ambiguity caused by tall grass laid down due to the wind. Overall, the work presented in this dissertation has demonstrated the potential of dense remote sensing and weather data time series to predict grassland biomass using machine-learning algorithms, where high quality ground data were used for training. At present a major limitation for national scale biomass retrieval is the lack of spatial and temporal ground samples, which can be partially resolved by minor modifications in the existing PastureBaseIreland database by adding the location and extent ofeach grassland paddock in the database. As far as remote sensing data requirements are concerned, MODIS is useful for large scale evaluation but due to its coarse resolution it is not possible to detect the variations within the fields and between the fields at the farm scale. However, this issue will be resolved in terms of spatial resolution by the Sentinel-2 mission, and when both satellites (Sentinel-2A and Sentinel-2B) are operational the revisit time will reduce to 5 days, which together with Landsat-8, should enable sufficient cloud-free data for operational biomass estimation at a national scale. The Synthetic Aperture Radar Interferometry (InSAR) approach is feasible if there are enough coherent interferometric pairs available, however this is difficult to achieve due to the temporal decorrelation of the signal. For repeat-pass InSAR over a vegetated area even an 11 days temporal baseline is too large. In order to achieve better coherence a very high resolution is required at the cost of spatial coverage, which limits its scope for use in an operational context at a national scale. Future InSAR missions with pair acquisition in Tandem mode will minimize the temporal decorrelation over vegetation areas for more focused studies. The proposed approach complements the current paradigm of Big Data in Earth Observation, and illustrates the feasibility of integrating data from multiple sources. In future, this framework can be used to build an operational decision support system for retrieval of grassland biophysical parameters based on data from long term planned optical missions (e. g., Landsat, Sentinel) that will ensure the continuity of data acquisition. Similarly, Spanish X-band PAZ and TerraSAR-X2 missions will ensure the continuity of TerraSAR-X and COSMO-SkyMed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As a by-product of the ‘information revolution’ which is currently unfolding, lifetimes of man (and indeed computer) hours are being allocated for the automated and intelligent interpretation of data. This is particularly true in medical and clinical settings, where research into machine-assisted diagnosis of physiological conditions gains momentum daily. Of the conditions which have been addressed, however, automated classification of allergy has not been investigated, even though the numbers of allergic persons are rising, and undiagnosed allergies are most likely to elicit fatal consequences. On the basis of the observations of allergists who conduct oral food challenges (OFCs), activity-based analyses of allergy tests were performed. Algorithms were investigated and validated by a pilot study which verified that accelerometer-based inquiry of human movements is particularly well-suited for objective appraisal of activity. However, when these analyses were applied to OFCs, accelerometer-based investigations were found to provide very poor separation between allergic and non-allergic persons, and it was concluded that the avenues explored in this thesis are inadequate for the classification of allergy. Heart rate variability (HRV) analysis is known to provide very significant diagnostic information for many conditions. Owing to this, electrocardiograms (ECGs) were recorded during OFCs for the purpose of assessing the effect that allergy induces on HRV features. It was found that with appropriate analysis, excellent separation between allergic and nonallergic subjects can be obtained. These results were, however, obtained with manual QRS annotations, and these are not a viable methodology for real-time diagnostic applications. Even so, this was the first work which has categorically correlated changes in HRV features to the onset of allergic events, and manual annotations yield undeniable affirmation of this. Fostered by the successful results which were obtained with manual classifications, automatic QRS detection algorithms were investigated to facilitate the fully automated classification of allergy. The results which were obtained by this process are very promising. Most importantly, the work that is presented in this thesis did not obtain any false positive classifications. This is a most desirable result for OFC classification, as it allows complete confidence to be attributed to classifications of allergy. Furthermore, these results could be particularly advantageous in clinical settings, as machine-based classification can detect the onset of allergy which can allow for early termination of OFCs. Consequently, machine-based monitoring of OFCs has in this work been shown to possess the capacity to significantly and safely advance the current state of clinical art of allergy diagnosis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The electroencephalogram (EEG) is a medical technology that is used in the monitoring of the brain and in the diagnosis of many neurological illnesses. Although coarse in its precision, the EEG is a non-invasive tool that requires minimal set-up times, and is suitably unobtrusive and mobile to allow continuous monitoring of the patient, either in clinical or domestic environments. Consequently, the EEG is the current tool-of-choice with which to continuously monitor the brain where temporal resolution, ease-of- use and mobility are important. Traditionally, EEG data are examined by a trained clinician who identifies neurological events of interest. However, recent advances in signal processing and machine learning techniques have allowed the automated detection of neurological events for many medical applications. In doing so, the burden of work on the clinician has been significantly reduced, improving the response time to illness, and allowing the relevant medical treatment to be administered within minutes rather than hours. However, as typical EEG signals are of the order of microvolts (μV ), contamination by signals arising from sources other than the brain is frequent. These extra-cerebral sources, known as artefacts, can significantly distort the EEG signal, making its interpretation difficult, and can dramatically disimprove automatic neurological event detection classification performance. This thesis therefore, contributes to the further improvement of auto- mated neurological event detection systems, by identifying some of the major obstacles in deploying these EEG systems in ambulatory and clinical environments so that the EEG technologies can emerge from the laboratory towards real-world settings, where they can have a real-impact on the lives of patients. In this context, the thesis tackles three major problems in EEG monitoring, namely: (i) the problem of head-movement artefacts in ambulatory EEG, (ii) the high numbers of false detections in state-of-the-art, automated, epileptiform activity detection systems and (iii) false detections in state-of-the-art, automated neonatal seizure detection systems. To accomplish this, the thesis employs a wide range of statistical, signal processing and machine learning techniques drawn from mathematics, engineering and computer science. The first body of work outlined in this thesis proposes a system to automatically detect head-movement artefacts in ambulatory EEG and utilises supervised machine learning classifiers to do so. The resulting head-movement artefact detection system is the first of its kind and offers accurate detection of head-movement artefacts in ambulatory EEG. Subsequently, addtional physiological signals, in the form of gyroscopes, are used to detect head-movements and in doing so, bring additional information to the head- movement artefact detection task. A framework for combining EEG and gyroscope signals is then developed, offering improved head-movement arte- fact detection. The artefact detection methods developed for ambulatory EEG are subsequently adapted for use in an automated epileptiform activity detection system. Information from support vector machines classifiers used to detect epileptiform activity is fused with information from artefact-specific detection classifiers in order to significantly reduce the number of false detections in the epileptiform activity detection system. By this means, epileptiform activity detection which compares favourably with other state-of-the-art systems is achieved. Finally, the problem of false detections in automated neonatal seizure detection is approached in an alternative manner; blind source separation techniques, complimented with information from additional physiological signals are used to remove respiration artefact from the EEG. In utilising these methods, some encouraging advances have been made in detecting and removing respiration artefacts from the neonatal EEG, and in doing so, the performance of the underlying diagnostic technology is improved, bringing its deployment in the real-world, clinical domain one step closer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditionally, attacks on cryptographic algorithms looked for mathematical weaknesses in the underlying structure of a cipher. Side-channel attacks, however, look to extract secret key information based on the leakage from the device on which the cipher is implemented, be it smart-card, microprocessor, dedicated hardware or personal computer. Attacks based on the power consumption, electromagnetic emanations and execution time have all been practically demonstrated on a range of devices to reveal partial secret-key information from which the full key can be reconstructed. The focus of this thesis is power analysis, more specifically a class of attacks known as profiling attacks. These attacks assume a potential attacker has access to, or can control, an identical device to that which is under attack, which allows him to profile the power consumption of operations or data flow during encryption. This assumes a stronger adversary than traditional non-profiling attacks such as differential or correlation power analysis, however the ability to model a device allows templates to be used post-profiling to extract key information from many different target devices using the power consumption of very few encryptions. This allows an adversary to overcome protocols intended to prevent secret key recovery by restricting the number of available traces. In this thesis a detailed investigation of template attacks is conducted, along with how the selection of various attack parameters practically affect the efficiency of the secret key recovery, as well as examining the underlying assumption of profiling attacks in that the power consumption of one device can be used to extract secret keys from another. Trace only attacks, where the corresponding plaintext or ciphertext data is unavailable, are then investigated against both symmetric and asymmetric algorithms with the goal of key recovery from a single trace. This allows an adversary to bypass many of the currently proposed countermeasures, particularly in the asymmetric domain. An investigation into machine-learning methods for side-channel analysis as an alternative to template or stochastic methods is also conducted, with support vector machines, logistic regression and neural networks investigated from a side-channel viewpoint. Both binary and multi-class classification attack scenarios are examined in order to explore the relative strengths of each algorithm. Finally these machine-learning based alternatives are empirically compared with template attacks, with their respective merits examined with regards to attack efficiency.