874 results for output fusion
Abstract:
Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998); however, this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that an MSHMM is a valid structure for multi-modal speaker identification.
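As a rough illustration of how a multi-stream HMM can combine two independent streams, the sketch below weights the per-state audio and visual log-likelihoods and scores an utterance with the forward algorithm. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `audio_weight` parameter and its default value are hypothetical, and the per-frame stream log-likelihoods are assumed to be computed elsewhere.

```python
import math


def fused_log_emission(audio_ll, visual_ll, audio_weight=0.7):
    """Stream-weighted emission score for one state and one frame.

    audio_weight is a hypothetical stream exponent in [0, 1]; the visual
    stream receives the remaining weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_log_likelihood(log_init, log_trans, audio_ll, visual_ll,
                           audio_weight=0.7):
    """Forward-algorithm score of a two-stream observation sequence.

    audio_ll[t][j] and visual_ll[t][j] hold the log-likelihood of frame t
    under state j for each stream.
    """
    n_states = len(log_init)
    alpha = [
        log_init[j]
        + fused_log_emission(audio_ll[0][j], visual_ll[0][j], audio_weight)
        for j in range(n_states)
    ]
    for t in range(1, len(audio_ll)):
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + fused_log_emission(audio_ll[t][j], visual_ll[t][j], audio_weight)
            for j in range(n_states)
        ]
    return log_sum_exp(alpha)
```

For text-dependent identification, one such model per enrolled speaker could be scored on the same utterance and the highest-scoring speaker selected; that decision rule is likewise an assumption here.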
Abstract:
A significant issue encountered when fusing data received from multiple sensors is the accuracy of the timestamp associated with each piece of data. This is particularly important in applications such as Simultaneous Localisation and Mapping (SLAM) where vehicle velocity forms an important part of the mapping algorithms; on fast-moving vehicles, even millisecond inconsistencies in data timestamping can produce errors which need to be compensated for. The timestamping problem is compounded in a robot swarm environment due to the use of non-deterministic readily-available hardware (such as 802.11-based wireless) and inaccurate clock synchronisation protocols (such as Network Time Protocol (NTP)). As a result, the synchronisation of the clocks between robots can be out by tens to hundreds of milliseconds, making correlation of data difficult and preventing the possibility of the units performing synchronised actions such as triggering cameras or intricate swarm manoeuvres. In this thesis, a complete data fusion unit is designed, implemented and tested. The unit, named BabelFuse, is able to accept sensor data from a number of low-speed communication buses (such as RS232, RS485 and CAN Bus) and also timestamp events that occur on General Purpose Input/Output (GPIO) pins referencing a submillisecond-accurate wirelessly-distributed "global" clock signal. In addition to its timestamping capabilities, it can also be used to trigger an attached camera at a predefined start time and frame rate. This functionality enables the creation of a wirelessly-synchronised distributed image acquisition system over a large geographic area; a real world application for this functionality is the creation of a platform to facilitate wirelessly-distributed 3D stereoscopic vision. A ‘best-practice’ design methodology is adopted within the project to ensure the final system operates according to its requirements.
Initially, requirements are generated from which a high-level architecture is distilled. This architecture is then converted into a hardware specification and low-level design, which is then manufactured. The manufactured hardware is then verified to ensure it operates as designed, and firmware and Linux Operating System (OS) drivers are written to provide the features and connectivity required of the system. Finally, integration testing is performed to ensure the unit functions as per its requirements. The BabelFuse System comprises a single Grand Master unit which is responsible for maintaining the absolute value of the "global" clock. Slave nodes then determine their local clock offset from that of the Grand Master via synchronisation events which occur multiple times per second. The mechanism used for synchronising the clocks between the boards wirelessly makes use of specific hardware and a firmware protocol based on elements of the IEEE-1588 Precision Time Protocol (PTP). With the key requirement of the system being submillisecond-accurate clock synchronisation (as a basis for timestamping and camera triggering), automated testing is carried out to monitor the offsets between each Slave and the Grand Master over time. A common strobe pulse is also sent to each unit for timestamping; the correlation between the timestamps of the different units is used to validate the clock offset results. Analysis of the automated test results shows that the BabelFuse units are almost three orders of magnitude more accurate than their requirement; clocks of the Slave and Grand Master units do not differ by more than three microseconds over a running time of six hours and the mean clock offset of Slaves to the Grand Master is less than one microsecond. The common strobe pulse used to verify the clock offset data yields a positive result with a maximum variation between units of less than two microseconds and a mean value of less than one microsecond.
The camera triggering functionality is verified by connecting the trigger pulse output of each board to a four-channel digital oscilloscope and setting each unit to output a 100 Hz periodic pulse with a common start time. The resulting waveform shows a maximum variation between the rising edges of the pulses of approximately 39 µs, well below its target of 1 ms.
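The abstract states only that the synchronisation protocol is based on elements of IEEE-1588 PTP. For orientation, the sketch below shows the standard PTP delay request-response arithmetic, which may differ from what the BabelFuse firmware actually does. The function names are hypothetical, timestamps are assumed to be in nanoseconds, and the calculation assumes a symmetric path delay.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay.

    t1: master clock time when the Sync message is sent
    t2: slave clock time when the Sync message is received
    t3: slave clock time when the Delay_Req message is sent
    t4: master clock time when the Delay_Req message is received
    Assumes the path delay is the same in both directions.
    """
    master_to_slave = t2 - t1  # path delay + offset
    slave_to_master = t4 - t3  # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


def to_master_time(slave_timestamp, offset):
    """Convert a slave-clock timestamp to the master ("global") timebase."""
    return slave_timestamp - offset
```

A slave that repeats this exchange several times per second can track its offset and express local event timestamps, such as GPIO events, in the shared timebase.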
Abstract:
Mapping novel terrain from sparse, complex data often requires the resolution of conflicting information from sensors working at different times, locations, and scales, and from experts with different goals and situations. Information fusion methods help resolve inconsistencies in order to distinguish correct from incorrect answers, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods developed here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, or man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships. The procedure is illustrated with two image examples.
Abstract:
Classifying novel terrain or objects from sparse, complex data may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from sources with different goals and situations. Information fusion methods can help resolve inconsistencies, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods described here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, and man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships.
Abstract:
Classifying novel terrain or objects from sparse, complex data may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from sources with different goals and situations. Information fusion methods can help resolve inconsistencies, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods described here address a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, and man-made. Underlying relationships among classes are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The fusion system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships. The procedure is illustrated with two image examples, but is not limited to the image domain.
Abstract:
Environmental computer models are deterministic models devoted to predicting environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased with unknown calibration and not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations are more accurate since they essentially provide the true levels. Due to the leading role played by numerical models, it is now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion. In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast the real-time current 8-hour average ozone level, defined as the average of the previous four hours, the current hour, and predictions for the next three hours. We propose a Bayesian downscaler model based on first differences with a flexible coefficient structure and an efficient computational strategy to fit model parameters. Model validation for the eastern United States shows consequential improvement of our fully inferential approach compared with the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions. Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.
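The definition of the current 8-hour average given in the abstract can be written out directly. The sketch below is illustrative only: the function and argument names are hypothetical, and it computes just the averaging step, not the Bayesian downscaler that would supply the three predicted values.

```python
def current_8h_average(previous_four, current, next_three_predictions):
    """Current 8-hour average ozone level.

    previous_four: observed hourly levels for the previous four hours
    current: observed level for the current hour
    next_three_predictions: predicted levels for the next three hours
    """
    if len(previous_four) != 4:
        raise ValueError("expected four previous hourly values")
    if len(next_three_predictions) != 3:
        raise ValueError("expected three predicted hourly values")
    values = list(previous_four) + [current] + list(next_three_predictions)
    return sum(values) / len(values)
```

For example, `current_8h_average([40, 42, 44, 46], 48, [50, 52, 54])` averages eight hourly values, three of which are forecasts.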
An Early-Warning System for Hypo-/Hyperglycemic Events Based on Fusion of Adaptive Prediction Models
Abstract:
Introduction: Early warning of future hypoglycemic and hyperglycemic events can improve the safety of type 1 diabetes mellitus (T1DM) patients. The aim of this study is to design and evaluate a hypoglycemia/hyperglycemia early warning system (EWS) for T1DM patients under sensor-augmented pump (SAP) therapy. Methods: The EWS is based on the combination of data-driven online adaptive prediction models and a warning algorithm. Three modeling approaches have been investigated: (i) autoregressive (ARX) models, (ii) autoregressive with an output correction module (cARX) models, and (iii) recurrent neural network (RNN) models. The warning algorithm performs postprocessing of the models' outputs and issues alerts if upcoming hypoglycemic/hyperglycemic events are detected. Fusion of the cARX and RNN models, due to their complementary prediction performances, resulted in the hybrid autoregressive with an output correction module/recurrent neural network (cARN)-based EWS. Results: The EWS was evaluated on 23 T1DM patients under SAP therapy. The ARX-based system achieved hypoglycemic (hyperglycemic) event prediction with median values of accuracy of 100.0% (100.0%), detection time of 10.0 (8.0) min, and daily false alarms of 0.7 (0.5). The respective values for the cARX-based system were 100.0% (100.0%), 17.5 (14.8) min, and 1.5 (1.3) and, for the RNN-based system, were 100.0% (92.0%), 8.4 (7.0) min, and 0.1 (0.2). The hybrid cARN-based EWS outperformed these, with 100.0% (100.0%) prediction accuracy, detection 16.7 (14.7) min in advance, and 0.8 (0.8) daily false alarms. Conclusion: Combined use of cARX and RNN models for the development of an EWS outperformed the single use of each model, achieving accurate and prompt event prediction with few false alarms, thus providing increased safety and comfort.
Abstract:
Correct predictions of future blood glucose levels in individuals with Type 1 Diabetes (T1D) can be used to provide early warning of upcoming hypo-/hyperglycemic events and thus to improve the patient's safety. To increase prediction accuracy and efficiency, various approaches have been proposed which combine multiple predictors to produce superior results compared to single predictors. Three methods for model fusion are presented and comparatively assessed. Data from 23 T1D subjects under sensor-augmented pump (SAP) therapy were used in two adaptive data-driven models (an autoregressive model with output correction - cARX, and a recurrent neural network - RNN). Data fusion techniques based on i) Dempster-Shafer Evidential Theory (DST), ii) Genetic Algorithms (GA), and iii) Genetic Programming (GP) were used to merge the complementary performances of the prediction models. The fused output is used in a warning algorithm to issue alarms of upcoming hypo-/hyperglycemic events. The fusion schemes showed improved performance with lower root mean square errors, lower time lags, and higher correlation. In the warning algorithm, median daily false alarms (DFA) of 0.25%, and 100% correct alarms (CA) were obtained for both event types. The detection times (DT) before occurrence of events were 13.0 and 12.1 min respectively for hypo-/hyperglycemic events. Compared to the cARX and RNN models, and a linear fusion of the two, the proposed fusion schemes represent a significant improvement.
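The linear fusion that the abstract uses as a comparison baseline, followed by a simple threshold check on the fused output, might look like the sketch below. This is not the DST, GA or GP scheme from the paper, nor its warning algorithm: the function names, the fixed `carx_weight`, and the 70 and 180 mg/dL thresholds are all hypothetical choices made for illustration.

```python
def linear_fusion(carx_pred, rnn_pred, carx_weight=0.5):
    """Weighted average of two glucose prediction series (mg/dL)."""
    if len(carx_pred) != len(rnn_pred):
        raise ValueError("prediction series must have the same length")
    return [
        carx_weight * a + (1.0 - carx_weight) * b
        for a, b in zip(carx_pred, rnn_pred)
    ]


def first_alert(fused, hypo_threshold=70.0, hyper_threshold=180.0):
    """Return (index, event type) of the first predicted excursion, or None.

    The thresholds are hypothetical defaults, not values from the paper.
    """
    for index, glucose in enumerate(fused):
        if glucose < hypo_threshold:
            return index, "hypo"
        if glucose > hyper_threshold:
            return index, "hyper"
    return None
```

The index returned by `first_alert` marks how far into the prediction horizon the first excursion is expected, which is the kind of quantity a detection-time metric would be built on.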
Abstract:
A novel algorithm based on bimatrix game theory has been developed to improve the accuracy and reliability of a speaker diarization system. This algorithm fuses the output data of two open-source speaker diarization programs, LIUM and SHoUT, taking advantage of the best properties of each one. The performance of this new system has been tested by means of audio streams from several movies. From preliminary results on fragments of five movies, improvements of 63% in false alarms and missed speech mistakes have been achieved with respect to the LIUM and SHoUT systems working alone. Moreover, we also improve the number of recognized speakers by 20%, getting close to the real number of speakers in the audio stream.
Abstract:
In the last decade, multi-sensor data fusion has become a broadly demanded discipline to achieve advanced solutions that can be applied in many real world situations, either civil or military. In Defence, accurate detection of all target objects is fundamental to maintaining situational awareness, to locating threats in the battlefield and to identifying and strategically protecting own forces. Civil applications, such as traffic monitoring, have similar requirements in terms of object detection and reliable identification of incidents in order to ensure the safety of road users. Thanks to the appropriate data fusion technique, we can give these systems the power to automatically exploit all relevant information from multiple sources to meet, for instance, mission needs or to assess daily supervision operations. This paper focuses on its application to active vehicle monitoring in a particular area of high density traffic, and how it is redirecting the research activities being carried out in the computer vision, signal processing and machine learning fields for improving the effectiveness of detection and tracking in ground surveillance scenarios in general. Specifically, our system proposes fusion of data at a feature level which is extracted from a video camera and a laser scanner. In addition, a stochastic-based tracking which introduces some particle filters into the model to deal with uncertainty due to occlusions and improve the previous detection output is presented in this paper. It has been shown that this computer vision tracker helps detect objects even under poor visual information. Finally, in the same way that humans are able to analyze both temporal and spatial relations among items in the scene to associate a meaning with them, once the target objects have been correctly detected and tracked, it is desired that machines can provide a trustworthy description of what is happening in the scene under surveillance.
Accomplishing such an ambitious task requires a machine learning-based hierarchic architecture able to extract and analyse behaviours at different abstraction levels. A real experimental testbed has been implemented for the evaluation of the proposed modular system. The scenario is a closed circuit where real traffic situations can be simulated. First results have shown the strength of the proposed system.
Abstract:
The visual system combines spatial signals from the two eyes to achieve single vision. But if binocular disparity is too large, this perceptual fusion gives way to diplopia. We studied and modelled the processes underlying fusion and the transition to diplopia. The likely basis for fusion is linear summation of inputs onto binocular cortical cells. Previous studies of perceived position, contrast matching and contrast discrimination imply the computation of a dynamically weighted sum, where the weights vary with relative contrast. For gratings, perceived contrast was almost constant across all disparities, and this can be modelled by allowing the ocular weights to increase with disparity (Zhou, Georgeson & Hess, 2014). However, when a single Gaussian-blurred edge was shown to each eye, perceived blur was invariant with disparity (Georgeson & Wallis, ECVP 2012), which is not consistent with linear summation (which predicts that perceived blur increases with disparity). This blur constancy is consistent with a multiplicative form of combination (the contrast-weighted geometric mean) but that is hard to reconcile with the evidence favouring linear combination. We describe a 2-stage spatial filtering model with linear binocular combination and suggest that nonlinear output transduction (e.g. ‘half-squaring’) at each stage may account for the blur constancy.
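The claim that linear summation predicts blur increasing with disparity can be checked numerically. The sketch below averages two Gaussian-blurred edges displaced by plus and minus half the disparity and measures the spread of the resulting gradient profile. Using the standard deviation of the gradient profile as a stand-in for perceived blur is an assumption made here for illustration, and the function names are hypothetical; this is not the 2-stage model described in the abstract.

```python
import math


def gaussian(x, mu, sigma):
    """Gaussian density, i.e. the gradient profile of a Gaussian-blurred edge."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def summed_edge_blur(sigma, disparity, step=0.01, half_width=50.0):
    """Spread of the gradient of two linearly summed, displaced edges.

    Each eye sees an edge with blur sigma; the edges sit at -disparity/2 and
    +disparity/2. Returns the standard deviation of the averaged gradient
    profile, estimated with a Riemann sum over [-half_width, half_width].
    """
    n_samples = round(2.0 * half_width / step) + 1
    area = 0.0
    second_moment = 0.0
    for k in range(n_samples):
        x = -half_width + k * step
        g = 0.5 * (gaussian(x, -disparity / 2.0, sigma)
                   + gaussian(x, disparity / 2.0, sigma))
        area += g * step
        second_moment += x * x * g * step
    # The profile is symmetric about zero, so the mean is zero.
    return math.sqrt(second_moment / area)
```

Under this measure the spread follows sqrt(sigma^2 + (disparity/2)^2), so it grows with disparity, whereas the reported percept stayed constant.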