771 resultados para Deep Neural Network,Video object detection,Kalman filter,Nano-drones,PULP
Resumo:
Our objective for this thesis work was the deployment of a Neural Network based approach for video object detection on board a nano-drone. Furthermore, we have studied some possible extensions to exploit the temporal nature of videos to improve the detection capabilities of our algorithm. For our project, we have utilized the Mobilenetv2/v3SSDLite due to their limited computational and memory requirements. We have trained our networks on the IMAGENET VID 2015 dataset and to deploy it onto the nano-drone we have used the NNtool and Autotiler tools by GreenWaves. To exploit the temporal nature of video data we have tried different approaches: the introduction of an LSTM based convolutional layer in our architecture, the introduction of a Kalman filter based tracker as a postprocessing step to augment the results of our base architecture. We have obtain a total improvement in our performances of about 2.5 mAP with the Kalman filter based method(BYTE). Our detector run on a microcontroller class processor on board the nano-drone at 1.63 fps.
Resumo:
This paper discusses a multi-layer feedforward (MLF) neural network incident detection model that was developed and evaluated using field data. In contrast to published neural network incident detection models which relied on simulated or limited field data for model development and testing, the model described in this paper was trained and tested on a real-world data set of 100 incidents. The model uses speed, flow and occupancy data measured at dual stations, averaged across all lanes and only from time interval t. The off-line performance of the model is reported under both incident and non-incident conditions. The incident detection performance of the model is reported based on a validation-test data set of 40 incidents that were independent of the 60 incidents used for training. The false alarm rates of the model are evaluated based on non-incident data that were collected from a freeway section which was video-taped for a period of 33 days. A comparative evaluation between the neural network model and the incident detection model in operation on Melbourne's freeways is also presented. The results of the comparative performance evaluation clearly demonstrate the substantial improvement in incident detection performance obtained by the neural network model. The paper also presents additional results that demonstrate how improvements in model performance can be achieved using variable decision thresholds. Finally, the model's fault-tolerance under conditions of corrupt or missing data is investigated and the impact of loop detector failure/malfunction on the performance of the trained model is evaluated and discussed. The results presented in this paper provide a comprehensive evaluation of the developed model and confirm that neural network models can provide fast and reliable incident detection on freeways. (C) 1997 Elsevier Science Ltd. All rights reserved.
Resumo:
This paper proposes a filter based on a general regression neural network and a moving average filter, for preprocessing half-hourly load data for short-term multinodal load forecasting, discussed in another paper. Tests made with half-hourly load data from nine New Zealand electrical substations demonstrate that this filter is able to handle noise, missing data and abnormal data. © 2011 IEEE.
Resumo:
This paper presents an effective decision making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system is obtained by setting up an experimental pipeline fully operational distribution system. The system is also equipped with data logging for three variables; namely, inlet pressure, outlet pressure, and outlet flow. The experimental setup is designed such that multi-operational conditions of the distribution system, including multi pressure and multi flow can be obtained. We then statistically tested and showed that pressure and flow variables can be used as signature of leak under the designed multi-operational conditions. It is then shown that the detection of leakages based on the training and testing of the proposed multi model decision system with pre data clustering, under multi operational conditions produces better recognition rates in comparison to the training based on the single model approach. This decision system is then equipped with the estimation of confidence limits and a method is proposed for using these confidence limits for obtaining more robust leakage recognition results.
Resumo:
In this thesis, we propose to infer pixel-level labelling in video by utilising only object category information, exploiting the intrinsic structure of video data. Our motivation is the observation that image-level labels are much more easily to be acquired than pixel-level labels, and it is natural to find a link between the image level recognition and pixel level classification in video data, which would transfer learned recognition models from one domain to the other one. To this end, this thesis proposes two domain adaptation approaches to adapt the deep convolutional neural network (CNN) image recognition model trained from labelled image data to the target domain exploiting both semantic evidence learned from CNN, and the intrinsic structures of unlabelled video data. Our proposed approaches explicitly model and compensate for the domain adaptation from the source domain to the target domain which in turn underpins a robust semantic object segmentation method for natural videos. We demonstrate the superior performance of our methods by presenting extensive evaluations on challenging datasets comparing with the state-of-the-art methods.
Resumo:
Correctness of information gathered in production environments is an essential part of quality assurance processes in many industries, this task is often performed by human resources who visually take annotations in various steps of the production flow. Depending on the performed task the correlation between where exactly the information is gathered and what it represents is more than often lost in the process. The lack of labeled data places a great boundary on the application of deep neural networks aimed at object detection tasks, moreover supervised training of deep models requires a great amount of data to be available. Reaching an adequate large collection of labeled images through classic techniques of data annotations is an exhausting and costly task to perform, not always suitable for every scenario. A possible solution is to generate synthetic data that replicates the real one and use it to fine-tune a deep neural network trained on one or more source domains to a different target domain. The purpose of this thesis is to show a real case scenario where the provided data were both in great scarcity and missing the required annotations. Sequentially a possible approach is presented where synthetic data has been generated to address those issues while standing as a training base of deep neural networks for object detection, capable of working on images taken in production-like environments. Lastly, it compares performance on different types of synthetic data and convolutional neural networks used as backbones for the model.
Resumo:
A reliable perception of the real world is a key-feature for an autonomous vehicle and the Advanced Driver Assistance Systems (ADAS). Obstacles detection (OD) is one of the main components for the correct reconstruction of the dynamic world. Historical approaches based on stereo vision and other 3D perception technologies (e.g. LIDAR) have been adapted to the ADAS first and autonomous ground vehicles, after, providing excellent results. The obstacles detection is a very broad field and this domain counts a lot of works in the last years. In academic research, it has been clearly established the essential role of these systems to realize active safety systems for accident prevention, reflecting also the innovative systems introduced by industry. These systems need to accurately assess situational criticalities and simultaneously assess awareness of these criticalities by the driver; it requires that the obstacles detection algorithms must be reliable and accurate, providing: a real-time output, a stable and robust representation of the environment and an estimation independent from lighting and weather conditions. Initial systems relied on only one exteroceptive sensor (e.g. radar or laser for ACC and camera for LDW) in addition to proprioceptive sensors such as wheel speed and yaw rate sensors. But, current systems, such as ACC operating at the entire speed range or autonomous braking for collision avoidance, require the use of multiple sensors since individually they can not meet these requirements. It has led the community to move towards the use of a combination of them in order to exploit the benefits of each one. Pedestrians and vehicles detection are ones of the major thrusts in situational criticalities assessment, still remaining an active area of research. ADASs are the most prominent use case of pedestrians and vehicles detection. Vehicles should be equipped with sensing capabilities able to detect and act on objects in dangerous situations, where the driver would not be able to avoid a collision. A full ADAS or autonomous vehicle, with regard to pedestrians and vehicles, would not only include detection but also tracking, orientation, intent analysis, and collision prediction. The system detects obstacles using a probabilistic occupancy grid built from a multi-resolution disparity map. Obstacles classification is based on an AdaBoost SoftCascade trained on Aggregate Channel Features. A final stage of tracking and fusion guarantees stability and robustness to the result.
Resumo:
Deep Brain Stimulator devices are becoming widely used for therapeutic benefits in movement disorders such as Parkinson's disease. Prolonging the battery life span of such devices could dramatically reduce the risks and accumulative costs associated with surgical replacement. This paper demonstrates how an artificial neural network can be trained using pre-processing frequency analysis of deep brain electrode recordings to detect the onset of tremor in Parkinsonian patients. Implementing this solution into an 'intelligent' neurostimulator device will remove the need for continuous stimulation currently used, and open up the possibility of demand-driven stimulation. Such a methodology could potentially decrease the power consumption of a deep brain pulse generator.
Resumo:
A technique is presented for locating and tracking objects in cluttered environments. Agents are randomly distributed across the image, and subsequently grouped around targets. Each agent uses a weightless neural network and a histogram intersection technique to score its location. The system has been used to locate and track a head in 320x240 resolution video at up to 15fps.
Resumo:
As one of the most popular deep learning models, convolution neural network (CNN) has achieved huge success in image information extraction. Traditionally CNN is trained by supervised learning method with labeled data and used as a classifier by adding a classification layer in the end. Its capability of extracting image features is largely limited due to the difficulty of setting up a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a so-called convolutional sparse auto-encoder (CSAE) algorithm pre-Train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm can be used to train the CNN with unlabeled artificial images, which enables easy expansion of training data and unsupervised learning. The CSAE algorithm is especially designed for extracting complex features from specific objects such as Chinese characters. After the features of articficial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first CNN convolutional layer, and then the CNN model is fine-Trained by scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and is evaluated with a multilingual image dataset, which labels Chinese, English and numerals texts separately. More than 10% detection precision gain is observed over two CNN models.
Resumo:
Application of dataset fusion techniques to an object detection task, involving the use of deep learning as convolutional neural networks, to manage to create a single RCNN architecture able to inference with good performances on two distinct datasets with different domains.
Resumo:
This paper discusses an object-oriented neural network model that was developed for predicting short-term traffic conditions on a section of the Pacific Highway between Brisbane and the Gold Coast in Queensland, Australia. The feasibility of this approach is demonstrated through a time-lag recurrent network (TLRN) which was developed for predicting speed data up to 15 minutes into the future. The results obtained indicate that the TLRN is capable of predicting speed up to 5 minutes into the future with a high degree of accuracy (90-94%). Similar models, which were developed for predicting freeway travel times on the same facility, were successful in predicting travel times up to 15 minutes into the future with a similar degree of accuracy (93-95%). These results represent substantial improvements on conventional model performance and clearly demonstrate the feasibility of using the object-oriented approach for short-term traffic prediction. (C) 2001 Elsevier Science B.V. All rights reserved.
Resumo:
High performance video codec is mandatory for multimedia applications such as video-on-demand and video conferencing. Recent research has proposed numerous video coding techniques to meet the requirement in bandwidth, delay, loss and Quality-of-Service (QoS). In this paper, we present our investigations on inter-subband self-similarity within the wavelet-decomposed video frames using neural networks, and study the performance of applying the spatial network model to all video frames over time. The goal of our proposed method is to restore the highest perceptual quality for video transmitted over a highly congested network. Our contributions in this paper are: (1) A new coding model with neural network based, inter-subband redundancy (ISR) prediction for video coding using wavelet (2) The performance of 1D and 2D ISR prediction, including multiple levels of wavelet decompositions. Our result shows a short-term quality enhancement may be obtained using both 1D and 2D ISR prediction.
Resumo:
Schizophrenia stands for a long-lasting state of mental uncertainty that may bring to an end the relation among behavior, thought, and emotion; that is, it may lead to unreliable perception, not suitable actions and feelings, and a sense of mental fragmentation. Indeed, its diagnosis is done over a large period of time; continuos signs of the disturbance persist for at least 6 (six) months. Once detected, the psychiatrist diagnosis is made through the clinical interview and a series of psychic tests, addressed mainly to avoid the diagnosis of other mental states or diseases. Undeniably, the main problem with identifying schizophrenia is the difficulty to distinguish its symptoms from those associated to different untidiness or roles. Therefore, this work will focus on the development of a diagnostic support system, in terms of its knowledge representation and reasoning procedures, based on a blended of Logic Programming and Artificial Neural Networks approaches to computing, taking advantage of a novel approach to knowledge representation and reasoning, which aims to solve the problems associated in the handling (i.e., to stand for and reason) of defective information.
Resumo:
One of the main problems related to the transport and manipulation of multiphase fluids concerns the existence of characteristic flow patterns and its strong influence on important operation parameters. A good example of this occurs in gas-liquid chemical reactors in which maximum efficiencies can be achieved by maintaining a finely dispersed bubbly flow to maximize the total interfacial area. Thus, the ability to automatically detect flow patterns is of crucial importance, especially for the adequate operation of multiphase systems. This work describes the application of a neural model to process the signals delivered by a direct imaging probe to produce a diagnostic of the corresponding flow pattern. The neural model is constituted of six independent neural modules, each of which trained to detect one of the main horizontal flow patterns, and a last winner-take-all layer responsible for resolving when two or more patterns are simultaneously detected. Experimental signals representing different bubbly, intermittent, annular and stratified flow patterns were used to validate the neural model.