819 results for Visual surveillance, Human activity recognition, Video annotation


Relevance:

100.00%

Publisher:

Abstract:

Analysis of human behaviour through visual information has been a highly active research topic in the computer vision community. This was previously achieved via images from a conventional camera, but recently depth sensors have made a new type of data available. This survey starts by explaining the advantages of depth imagery, then describes the new sensors that are available to obtain it. In particular, the Microsoft Kinect has made high-resolution real-time depth cheaply available. The main published research on the use of depth imagery for analysing human activity is reviewed. Much of the existing work focuses on body part detection and pose estimation. A growing research area addresses the recognition of human actions. The publicly available datasets that include depth imagery are listed, as are the software libraries that can acquire it from a sensor. This survey concludes by summarising the current state of work on this topic, and pointing out promising future research directions.

Abstract:

Visual tracking is the problem of estimating variables related to a target given a video sequence depicting it. Visual tracking is key to the automation of many tasks, such as visual surveillance, autonomous robot or vehicle navigation, and automatic video indexing in multimedia databases. Despite many years of research, long-term tracking of generic targets in real-world scenarios remains unaccomplished. The main contribution of this thesis is the definition of effective algorithms that can foster a general solution to visual tracking by letting the tracker adapt to changing working conditions. In particular, we propose to adapt two crucial components of visual trackers: the transition model and the appearance model. The less general but widespread case of tracking from a static camera is also considered, and a novel change detection algorithm robust to sudden illumination changes is proposed. Based on this, a principled adaptive framework to model the interaction between Bayesian change detection and recursive Bayesian trackers is introduced. Finally, the problem of automatic tracker initialization is considered. In particular, a novel solution for the categorization of 3D data is presented. The category recognition algorithm is based on a novel 3D descriptor that is shown to achieve state-of-the-art performance in several surface-matching applications.
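The two components the thesis proposes to adapt can be illustrated with a toy recursive tracker: a transition model (here simple constant velocity) and an appearance model (here a template blended with each new observation). This is a generic sketch under assumed simplifications, not the thesis's actual algorithms.

```python
# Toy 1-D recursive tracker. The transition model predicts the next
# state; the update step corrects it with a measurement; the appearance
# model is a template adapted with an exponential forgetting factor.

def predict(state):
    """Transition model: constant-velocity prediction."""
    pos, vel = state
    return (pos + vel, vel)

def update(state, measurement, gain=0.5):
    """Correct the prediction with the new measurement."""
    pred_pos, vel = predict(state)
    residual = measurement - pred_pos
    return (pred_pos + gain * residual, vel + gain * residual)

def adapt_appearance(template, observation, alpha=0.1):
    """Appearance model: blend the latest observation into the template."""
    return [(1 - alpha) * t + alpha * o for t, o in zip(template, observation)]

state = (0.0, 1.0)                 # initial position and velocity
for z in [1.1, 2.0, 3.2, 3.9]:     # noisy measurements of a moving target
    state = update(state, z)

template = adapt_appearance([0.5, 0.5], [1.0, 0.0])
print(round(state[0], 2))
```

In a real tracker, both the gain and the forgetting factor alpha would themselves be adapted to the working conditions, which is the thesis's central theme.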

Abstract:

The primate temporal cortex has been demonstrated to play an important role in visual memory and pattern recognition. It is of particular interest to investigate whether activity-dependent modification of synaptic efficacy, a presumptive mechanism for learning and memory, is present in this cortical region. Here we address this issue by examining the induction of synaptic plasticity in surgically resected human inferior and middle temporal cortex. The results show that synaptic strength in the human temporal cortex can undergo bidirectional modifications, depending on the pattern of conditioning stimulation. High frequency stimulation (100 or 40 Hz) in layer IV induced long-term potentiation (LTP) of both intracellular excitatory postsynaptic potentials and evoked field potentials in layers II/III. The LTP induced by 100 Hz tetanus was blocked by 50-100 microM DL-2-amino-5-phosphonovaleric acid, suggesting that N-methyl-D-aspartate receptors were responsible for its induction. Long-term depression (LTD) was elicited by prolonged low frequency stimulation (1 Hz, 15 min). It was reduced, but not completely blocked, by DL-2-amino-5-phosphonovaleric acid, implying that mechanisms in addition to N-methyl-D-aspartate receptors were involved in LTD induction. LTD was input-specific, i.e., low frequency stimulation of one pathway produced LTD of synaptic transmission in that pathway only. Finally, LTP and LTD could reverse each other, suggesting that they can act cooperatively to modify the functional state of the cortical network. These results suggest that LTP and LTD are possible mechanisms for the visual memory and pattern recognition functions performed in the human temporal cortex.

Abstract:

Human object recognition is considered to be largely invariant to translation across the visual field. However, the origin of this invariance to positional changes has remained elusive, since numerous studies found that the ability to discriminate between visual patterns develops in a largely location-specific manner, with only limited transfer to novel visual field positions. In order to reconcile these contradictory observations, we traced the acquisition of categories of unfamiliar grey-level patterns within an interleaved learning and testing paradigm that involved either the same or different retinal locations. Our results show that position invariance is an emergent property of category learning. Pattern categories acquired over several hours at a fixed location in either the peripheral or central visual field gradually become accessible at new locations without any position-specific feedback. Furthermore, categories of novel patterns presented in the left hemifield are learnt distinctly faster and generalize better to other locations than those learnt in the right hemifield. Our results suggest that, during learning, initially position-specific representations of categories based on spatial pattern structure become encoded in a relational, position-invariant format. Such representational shifts may provide a generic mechanism for achieving perceptual invariance in object recognition.

Abstract:

The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
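The first two stages of the cascade named above (2D-DCT of the mouth region, then feature mean normalisation) can be sketched directly; the interstep LDA stage is omitted here for brevity, and the frame sizes and pixel values are purely illustrative.

```python
import math

# Sketch of the visual feature cascade: 2D-DCT of each mouth frame,
# keep the low-order coefficients, then subtract the per-dimension
# mean over the sequence (feature mean normalisation).

def dct_1d(x):
    """Type-II DCT of a sequence (unnormalised)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n)) for k in range(n)]

def dct_2d(block):
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in zip(*rows)]
    return [list(c) for c in zip(*cols)]

def features(block, keep=2):
    """Keep the low-order (zig-zag) DCT coefficients as the static features."""
    coeffs = dct_2d(block)
    return [coeffs[i][j] for i in range(keep) for j in range(keep) if i + j < keep]

frames = [[[1, 2], [3, 4]], [[2, 3], [4, 5]]]   # two tiny 2x2 "mouth" frames
feats = [features(f) for f in frames]
means = [sum(col) / len(col) for col in zip(*feats)]
normalised = [[v - m for v, m in zip(f, means)] for f in feats]
print(normalised)
```

Mean normalisation removes per-utterance bias (e.g. lighting), which is one reason each successive stage of the cascade improves VAD performance.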

Abstract:

Visual recording devices such as video cameras, CCTVs, or webcams have been broadly used to facilitate work progress or safety monitoring on construction sites. Without human intervention, however, both real-time reasoning about captured scenes and interpretation of recorded images are challenging tasks. This article presents an exploratory method for automated object identification using standard video cameras on construction sites. The proposed method supports real-time detection and classification of mobile heavy equipment and workers. The background subtraction algorithm extracts motion pixels from an image sequence, the pixels are then grouped into regions to represent moving objects, and finally the regions are identified as a certain object using classifiers. For evaluating the method, the formulated computer-aided process was implemented on actual construction sites, and promising results were obtained. This article is expected to contribute to future applications of automated monitoring systems of work zone safety or productivity.
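The pipeline described above (background subtraction to get motion pixels, then grouping pixels into regions) can be sketched in a few lines; the final classification of each region as worker or equipment is stubbed out here, and the frame values are illustrative.

```python
# Minimal sketch of the detection pipeline: background subtraction
# extracts motion pixels, then 4-connected flood fill groups them
# into regions representing moving objects.

def motion_pixels(frame, background, threshold=10):
    h, w = len(frame), len(frame[0])
    return {(r, c) for r in range(h) for c in range(w)
            if abs(frame[r][c] - background[r][c]) > threshold}

def regions(pixels):
    """Group 4-connected motion pixels into regions via flood fill."""
    remaining, out = set(pixels), []
    while remaining:
        stack, region = [remaining.pop()], set()
        while stack:
            r, c = stack.pop()
            region.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        out.append(region)
    return out

background = [[0] * 6 for _ in range(4)]
frame = [[0] * 6 for _ in range(4)]
frame[1][1] = frame[1][2] = 200   # one moving object...
frame[3][5] = 180                 # ...and another
blobs = regions(motion_pixels(frame, background))
print(len(blobs))
```

Each resulting region would then be passed to a trained classifier to label it as equipment, worker, or noise.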

Abstract:

The performance of automatic speech recognition systems deteriorates in the presence of noise. One known solution is to incorporate video information with an existing acoustic speech recognition system. We investigate the performance of the individual acoustic and visual sub-systems and then examine different ways in which the integration of the two systems may be performed. The system is to be implemented in real time on a Texas Instruments' TMS320C80 DSP.
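One common integration strategy is late (decision) fusion: combine per-class scores from the acoustic and visual recognisers with a reliability weight. The sketch below is generic; the paper itself compares several integration schemes, and the scores and weight here are made up.

```python
# Late fusion of acoustic and visual recogniser scores: a weighted
# log-linear combination, where the audio weight is lowered in noise.

def fuse(audio_scores, visual_scores, audio_weight):
    """Weighted combination of per-class log-scores."""
    return {c: audio_weight * audio_scores[c]
               + (1 - audio_weight) * visual_scores[c]
            for c in audio_scores}

audio = {"yes": -2.0, "no": -1.0}    # noisy audio favours "no"
visual = {"yes": -0.5, "no": -3.0}   # lip movements clearly say "yes"
fused = fuse(audio, visual, audio_weight=0.3)  # trust video more in noise
print(max(fused, key=fused.get))
```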

Abstract:

Clustering identities in a broadcast video is a useful task to aid in video annotation and retrieval. Quality-based frame selection is a crucial task in video face clustering, both to improve clustering performance and to reduce computational cost. We present a framework that selects the highest-quality frames available in a video for face clustering. This frame selection technique uses low-level and high-level features (face symmetry, sharpness, contrast, and brightness) to select the highest-quality facial images in a face sequence. We also consider the temporal distribution of the faces to ensure that the selected faces are taken at times distributed throughout the sequence. Normalized feature scores are fused, and frames with high quality scores are used in a Local Gabor Binary Pattern Histogram Sequence based face clustering system. We present a news video database to evaluate the clustering system's performance. Experiments on the newly created news database show that the proposed method selects the best-quality face images in the video sequence, resulting in improved clustering performance.
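The selection step described above can be sketched as follows: score each face frame on several quality measures, min-max normalise each measure across the sequence, fuse the normalised scores, and keep the top frames. The scoring values below are illustrative stand-ins for real sharpness/contrast/brightness measurements.

```python
# Quality-based frame selection: per-frame raw quality measurements are
# normalised per dimension across the sequence, then fused by averaging.

def minmax(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

frames = [          # raw (sharpness, contrast, brightness) per face frame
    (0.2, 0.5, 0.9),
    (0.8, 0.7, 0.6),
    (0.5, 0.9, 0.7),
]
columns = list(zip(*frames))                        # one column per measure
normalised = list(zip(*[minmax(c) for c in columns]))
quality = [sum(f) / len(f) for f in normalised]     # fused quality score
best = max(range(len(frames)), key=quality.__getitem__)
print(best)
```

Only the highest-quality frames (here, index 2) would be forwarded to the face clustering stage, alongside a temporal-spread constraint.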

Abstract:

Speech recognition can be improved by using visual information, in the form of the speaker's lip movements, in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data from the same database for training their models. In this paper, we present a new approach that makes use of one modality of an external dataset in addition to a given audio-visual dataset. By so doing, it is possible to create more powerful models from other, extensive audio-only databases and adapt them to our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMMs), trained jointly on the audio and visual data of a given audio-visual database, by 29% relative for phone recognition. It also outperforms external audio models trained on extensive external audio datasets and internal audio models by 5.5% and 46% relative, respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by environmental noise.

Abstract:

At present, the most reliable method to obtain end-user perceived quality is through subjective tests. In this paper, the impact of automatic region-of-interest (ROI) coding on the perceived quality of mobile video is investigated. The evidence, based on perceptual comparison analysis, shows that the coding strategy improves perceptual quality, particularly in low bit rate situations. The ROI detection used in this paper is based on two approaches: (1) automatic ROI, which analyzes the visual contents automatically; and (2) eye-tracking-based ROI, which aggregates eye-tracking data across many users and is used both to evaluate the accuracy of automatic ROI detection and to assess the subjective quality of automatic-ROI-encoded video. The perceptual comparison analysis is based on subjective assessments with 54 participants, across different content types, screen resolutions, and target bit rates, comparing the two ROI detection methods. The results from the user study demonstrate that ROI-based video encoding has higher perceived quality than normal video encoded at a similar bit rate, particularly in the lower bit rate range.
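The second (eye-tracking) approach can be sketched simply: aggregate fixations from many viewers into a heatmap over the frame, then threshold it to obtain the ROI. The grid size, coordinates, and threshold below are illustrative assumptions.

```python
# Aggregating eye-tracking fixations across users into a fixation
# heatmap, then thresholding it to produce a binary ROI mask.

def fixation_heatmap(fixations, width, height):
    heat = [[0] * width for _ in range(height)]
    for x, y in fixations:
        heat[y][x] += 1
    return heat

def roi_mask(heat, min_count):
    """Cells fixated by at least min_count viewers form the ROI."""
    return [[1 if v >= min_count else 0 for v in row] for row in heat]

# Fixations from several viewers cluster on one screen cell at (2, 1).
fixations = [(2, 1), (2, 1), (3, 1), (2, 1), (0, 3)]
heat = fixation_heatmap(fixations, width=5, height=4)
mask = roi_mask(heat, min_count=2)
print(mask[1][2], mask[3][0])
```

An encoder would then spend more bits inside the mask than outside it, which is what the ROI coding strategy evaluated in the paper does.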

Abstract:

The Yangtze River dolphin or baiji (Lipotes vexillifer), an obligate freshwater odontocete known only from the middle-lower Yangtze River system and neighbouring Qiantang River in eastern China, has long been recognized as one of the world's rarest and most threatened mammal species. The status of the baiji has not been investigated since the late 1990s, when the surviving population was estimated to be as low as 13 individuals. An intensive six-week multivessel visual and acoustic survey carried out in November-December 2006, covering the entire historical range of the baiji in the main Yangtze channel, failed to find any evidence that the species survives. We are forced to conclude that the baiji is now likely to be extinct, probably due to unsustainable by-catch in local fisheries. This represents the first global extinction of a large vertebrate for over 50 years, only the fourth disappearance of an entire mammal family since AD 1500, and the first cetacean species to be driven to extinction by human activity. Immediate and extreme measures may be necessary to prevent the extinction of other endangered cetaceans, including the sympatric Yangtze finless porpoise (Neophocaena phocaenoides asiaeorientalis).

Abstract:

Temporal synchronization of multiple video recordings of the same dynamic event is a critical task in many computer vision applications e.g. novel view synthesis and 3D reconstruction. Typically this information is implied through the time-stamp information embedded in the video streams. User-generated videos shot using consumer grade equipment do not contain this information; hence, there is a need to temporally synchronize signals using the visual information itself. Previous work in this area has either assumed good quality data with relatively simple dynamic content or the availability of precise camera geometry. Our first contribution is a synchronization technique which tries to establish correspondence between feature trajectories across views in a novel way, and specifically targets the kind of complex content found in consumer generated sports recordings, without assuming precise knowledge of fundamental matrices or homographies. We evaluate performance using a number of real video recordings and show that our method is able to synchronize to within 1 sec, which is significantly better than previous approaches. Our second contribution is a robust and unsupervised view-invariant activity recognition descriptor that exploits recurrence plot theory on spatial tiles. The descriptor is individually shown to better characterize the activities from different views under occlusions than state-of-the-art approaches. We combine this descriptor with our proposed synchronization method and show that it can further refine the synchronization index. © 2013 ACM.
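The core object behind the activity descriptor mentioned above is easy to state: given a feature trajectory x_1..x_n, the recurrence matrix marks which pairs of time steps lie within a threshold eps of each other. The sketch below computes it for a 1-D trajectory; the paper applies the idea per spatial tile, and the trajectory values here are illustrative.

```python
# Recurrence plot of a feature trajectory: entry (i, j) is 1 when the
# trajectory at times i and j is within eps, revealing repeated motion.

def recurrence_plot(trajectory, eps):
    n = len(trajectory)
    return [[1 if abs(trajectory[i] - trajectory[j]) <= eps else 0
             for j in range(n)] for i in range(n)]

traj = [0.0, 1.0, 2.0, 1.1, 0.1]   # motion that returns near its start
rp = recurrence_plot(traj, eps=0.2)
print(rp[0][4], rp[1][3], rp[0][2])
```

Off-diagonal 1-entries (here at (0, 4) and (1, 3)) indicate the trajectory revisiting earlier states, and statistics of such structures are what make the descriptor robust to viewpoint changes.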

Abstract:

Wooden railway sleeper inspection in Sweden is currently performed manually by a human operator through visual analysis. A machine-vision-based approach has been developed to emulate the operator's visual abilities and enable automation of the process. Through this process, bad sleepers are identified and marked on the rail with a spot of a specific colour (blue in the current case) so that maintenance operators can find the spot and replace the sleeper. The aim of this thesis is to help operators identify the colour-marked sleepers using an "intelligent vehicle" capable of running on the track. By capturing video while running on the track and segmenting the object of interest (the spot), the work can be automated and human intervention minimised. Because video acquisition depends on camera position and light source to obtain adequate brightness, four different combinations of camera position and light source were tested to record video and validate the proposed method. A sequence of real-time rail frames is extracted from these videos and further processed (depending on the data acquisition process) to identify the spots. After identification, each frame is divided into nine regions to determine which region the spot lies in and to avoid overlap with noise. The proposed method thus reports, for each frame, the region containing the spot. From the generated results, classifications were made regarding data collection techniques, efficiency, time, and speed.
In this report, extensive experiments using image sequences from a particular camera are described; the experiments were conducted with both the intelligent vehicle and a test vehicle. The results show a 95% success rate in identifying the spots when the video is used as recorded. In an alternative method, some frames are skipped in pre-processing to increase processing speed; segmentation success then drops to 85%, but the processing time is much lower. This demonstrates the validity of the proposed method for identifying spots on wooden railway sleepers, where time and efficiency can be traded off to obtain the desired result.
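The nine-region step described above can be sketched as follows: once the (blue) spot is segmented, the frame is divided into a 3x3 grid and the spot is assigned to the region containing its centroid. The colour segmentation itself is simplified away, and the frame size and pixel coordinates are illustrative.

```python
# Assign a segmented spot to one of nine frame regions (3x3 grid,
# indices 0..8 in row-major order) based on its pixel centroid.

def spot_region(spot_pixels, width, height):
    """Return the grid index (0..8) of the region holding the centroid."""
    cx = sum(x for x, _ in spot_pixels) / len(spot_pixels)
    cy = sum(y for _, y in spot_pixels) / len(spot_pixels)
    col = min(int(cx * 3 / width), 2)    # clamp in case centroid == width
    row = min(int(cy * 3 / height), 2)
    return row * 3 + col

# Segmented spot pixels near the bottom-right of a 90x90 frame.
spot = [(80, 75), (81, 75), (80, 76), (82, 77)]
print(spot_region(spot, width=90, height=90))
```

Reporting a coarse region rather than exact coordinates is what lets the method tolerate noise overlapping part of the spot.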

Abstract:

Studies of the effect of ethanol on human visual evoked potentials are rare and usually involve chronic alcoholic patients; the effect of acute ethanol ingestion has seldom been investigated. We studied the effect of acute alcohol intoxication on pattern-reversal visual evoked potentials (PR-VEP) and flash visual evoked potentials (F-VEP) in 20 normal volunteers. We observed differing effects of ethanol: statistically significantly prolonged F-VEP latencies after ingestion, and no significant differences in the latencies of the PR-VEP components. We hypothesize a selective ethanol effect on the afferent transmission of rods, which depends mainly on GABAergic and glutamatergic neurotransmission and influences F-VEP latencies, and no effect on cone afferent transmission, since alcohol does not influence PR-VEP latencies.

Abstract:

Individual Video Training (iVT) and Annotating Academic Videos (AAV): two complementing technologies.
1. Recording communication skills training sessions and reviewing them by oneself, with peers, and with tutors has become standard in medical education. Increasing numbers of students, paired with restrictions on financial and human resources, create a big obstacle to this important teaching method.
2. Everybody who wants to increase the efficiency and effectiveness of communication training can get new ideas from our technical solution.
3. Our goal was to increase the effectiveness of communication skills training by supporting self, peer, and tutor assessment over the Internet. Two technologies of SWITCH, the national foundation supporting IT solutions for Swiss universities, came in handy for our project. The first is the authentication and authorization infrastructure providing all Swiss students with a nationwide single login. The second is SWITCHcast, which allows automated recording, upload, and publication of videos on the Internet. Students start the recording system by entering their single login, which automatically links the video with their password. Within a few hours, they find their video, password protected, on the Internet, and can then give access to peers and tutors. Additionally, an annotation interface was developed. This software has free-text as well as checklist annotation capabilities. Tutors as well as students can create checklists; tutors' checklists are not editable by students. Annotations are linked to tracks, which can be private or public (public meaning visible to everyone who has access to the video). Annotation data can be exported for statistical evaluation.
4. The system was well received by students and tutors. Large numbers of videos were processed simultaneously without any problems.
5. iVT: http://www.switch.ch/aaa/projects/detail/UNIBE.7 and AAV: http://www.switch.ch/aaa/projects/detail/ETHZ.9