789 results for Video annotation
Abstract:
Image annotation is a significant step towards semantic-based image retrieval. Ontologies are a popular approach to semantic representation and have been studied intensively for multimedia analysis. However, relations among concepts are seldom used to extract higher-level semantics, and ontology inference is often crisp. This paper aims to enable sophisticated semantic querying of images, and thus contributes 1) an ontology framework that captures both visual and contextual knowledge, and 2) a probabilistic inference approach that reasons about high-level concepts from different sources of information. An experiment on a natural scene collection drawn from the LabelMe database shows encouraging results.
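To make the flavour of such probabilistic reasoning concrete, the sketch below combines object-level evidence into posterior scores for high-level scene concepts. It is only a minimal illustration, not the paper's inference engine; the concept names, priors and conditional probabilities are invented for the example.

```python
# A minimal sketch (pure Python) of fusing evidence from detected objects into
# scores for high-level scene concepts. All names and probabilities below are
# illustrative assumptions, not values from the paper.

from math import log, exp

# P(object | concept): hypothetical conditionals drawn from an ontology.
LIKELIHOODS = {
    "beach":  {"sand": 0.8, "sea": 0.7, "tree": 0.2, "building": 0.05},
    "forest": {"sand": 0.05, "sea": 0.02, "tree": 0.9, "building": 0.03},
    "street": {"sand": 0.02, "sea": 0.01, "tree": 0.3, "building": 0.8},
}
PRIORS = {"beach": 1 / 3, "forest": 1 / 3, "street": 1 / 3}

def infer_scene(detected_objects):
    """Rank scene concepts by posterior given a set of detected objects."""
    scores = {}
    for concept, prior in PRIORS.items():
        log_post = log(prior)
        for obj in detected_objects:
            # Small floor avoids log(0) for objects unseen under a concept.
            log_post += log(LIKELIHOODS[concept].get(obj, 1e-3))
        scores[concept] = log_post
    # Normalise to a proper posterior distribution.
    total = sum(exp(s) for s in scores.values())
    return {c: exp(s) / total for c, s in scores.items()}

if __name__ == "__main__":
    print(infer_scene({"sand", "sea", "tree"}))  # "beach" should dominate
```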
Abstract:
To date, automatic recognition of semantic information such as salient objects and mid-level concepts from images remains a challenging task. Since real-world objects tend to exist in context within their environment, computer vision researchers have increasingly incorporated contextual information to improve object recognition. In this paper, we present a method for building a visual contextual ontology from descriptions of salient objects for image annotation. The ontology includes not only partOf/kindOf relations but also spatial and co-occurrence relations. A two-step image annotation algorithm is also proposed, based on ontology relations and probabilistic inference. Unlike most existing work, we specifically investigate how to combine ontology representation, contextual knowledge and probabilistic inference. Experiments show that image annotation results are improved on the LabelMe dataset.
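As an illustration of the general idea behind a context-driven refinement step, the sketch below blends an initial detector score for each label with support from co-occurring labels. The label set, co-occurrence matrix and blending weight are assumptions made purely for the example, not the paper's algorithm.

```python
# A minimal sketch of refining initial object labels with co-occurrence
# relations taken from an ontology (second annotation step). The co-occurrence
# matrix and scores are illustrative assumptions only.

import numpy as np

LABELS = ["car", "road", "tree", "boat"]

# Hypothetical co-occurrence strengths between labels (symmetric, 0..1).
COOC = np.array([
    [1.0, 0.9, 0.3, 0.0],   # car
    [0.9, 1.0, 0.4, 0.0],   # road
    [0.3, 0.4, 1.0, 0.1],   # tree
    [0.0, 0.0, 0.1, 1.0],   # boat
])

def refine(initial_scores, alpha=0.5):
    """Blend each label's detector score with support from co-occurring labels."""
    s = np.asarray(initial_scores, dtype=float)
    support = COOC @ s / COOC.sum(axis=1)   # context support per label
    return (1 - alpha) * s + alpha * support

if __name__ == "__main__":
    detector = [0.6, 0.8, 0.2, 0.55]        # "boat" is weakly supported by context
    for lab, before, after in zip(LABELS, detector, refine(detector)):
        print(f"{lab:5s}  {before:.2f} -> {after:.2f}")
```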
Abstract:
This chapter outlines examples of classroom activities that aim to make connections between young people’s everyday experiences with video games and the formal high school curriculum. These classroom activities were developed within the emerging field of digital media literacy. Digital media literacy combines elements of ‘traditional’ approaches to media education with elements of technology and information education (Buckingham, 2007; Warschauer, 2006). It is an educational field that has gained significant attention in recent years. For example, digital media literacy has become a significant objective for media policy makers in response to the increased social and cultural roles of new media technologies and the controversies associated with young people’s largely unregulated online participation. Media regulators, educational institutions and independent organizations in the United States, Canada, the United Kingdom and Australia have developed digital media literacy initiatives that aim to provide advice to parents, teachers and young people.
Abstract:
Surveillance networks are typically monitored by a few people viewing several monitors that display the camera feeds, which makes it very difficult for a human operator to detect events effectively as they happen. Recently, computer vision research has begun to address ways to automatically process some of this data to assist human operators. Object tracking, event recognition, crowd analysis and human identification at a distance are being pursued as means to aid human operators and improve the security of areas such as transport hubs. The task of object tracking is key to the effective use of more advanced technologies: to recognise an event, people and objects must be tracked, and tracking also enhances the performance of tasks such as crowd analysis and human identification.

Before an object can be tracked, it must be detected. Motion segmentation techniques, widely employed in tracking systems, produce a binary image in which objects can be located. However, these techniques are prone to errors caused by shadows and lighting changes. Detection routines often fail, either due to erroneous motion caused by noise and lighting effects, or because they are unable to split occluded regions into their component objects. Particle filters can be used as a self-contained tracking system, making a separate detection step unnecessary except for an initial (often manual) detection to initialise the filter. Particle filters use one or more extracted features to evaluate the likelihood of an object existing at a given point in each frame. Such systems, however, do not easily allow multiple objects to be tracked robustly, and do not explicitly maintain the identity of tracked objects.

This dissertation investigates improvements to the performance of object tracking algorithms through improved motion segmentation and the use of a particle filter. A novel hybrid motion segmentation / optical flow algorithm, capable of simultaneously extracting multiple layers of foreground and optical flow from surveillance video frames, is proposed. The algorithm is shown to perform well in the presence of adverse lighting conditions, and the optical flow is capable of extracting a moving object. The proposed algorithm is integrated within a tracking system and evaluated using the ETISEO (Evaluation du Traitement et de l'Interpretation de Sequences vidEO - Evaluation for video understanding) database, and significant improvement in detection and tracking performance is demonstrated when compared to a baseline system.

A Scalable Condensation Filter (SCF), a particle filter designed to work within an existing tracking system, is also developed. The creation and deletion of modes and the maintenance of identity are handled by the underlying tracking system, which in turn benefits from the particle filter's improved performance under the uncertain conditions arising from occlusion and noise. The system is evaluated using the ETISEO database.

The dissertation then investigates fusion schemes for multi-spectral tracking systems. Four schemes for combining a thermal and a visual colour modality are evaluated using the OTCBVS (Object Tracking and Classification in and Beyond the Visible Spectrum) database. A middle fusion scheme is shown to yield the best results and demonstrates a significant improvement in performance compared to a system using either modality individually.

Findings from the thesis contribute to improving the performance of semi-automated video processing and therefore to improving security in areas under surveillance.
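To make the particle-filter machinery referred to above concrete, here is a minimal, generic predict/weight/resample loop in NumPy. It is only a toy illustration under assumed motion and likelihood models; the Scalable Condensation Filter adds mode creation/deletion and identity maintenance on top of an existing tracker, which is not reproduced here.

```python
# A minimal, generic particle-filter tracking sketch. The random-walk motion
# model, Gaussian likelihood and synthetic "true position" are assumptions
# used purely to demonstrate the predict / weight / resample cycle.

import numpy as np

rng = np.random.default_rng(0)

def predict(particles, motion_std=3.0):
    """Diffuse particles with a simple random-walk motion model."""
    return particles + rng.normal(0.0, motion_std, particles.shape)

def weight(particles, likelihood):
    """Score each particle with an application-specific likelihood (e.g. colour)."""
    w = np.array([likelihood(p) for p in particles])
    w += 1e-12                       # avoid an all-zero weight vector
    return w / w.sum()

def resample(particles, weights):
    """Resample particles in proportion to their weights."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

if __name__ == "__main__":
    true_pos = np.array([50.0, 50.0])
    particles = rng.uniform(0, 100, (500, 2))          # (x, y) hypotheses
    gauss = lambda p: np.exp(-np.sum((p - true_pos) ** 2) / (2 * 10.0 ** 2))
    for _ in range(20):
        particles = predict(particles)
        particles = resample(particles, weight(particles, gauss))
    print("estimate:", particles.mean(axis=0))          # should approach (50, 50)
```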
Abstract:
This paper examines a sequence of asynchronous interaction on the photosharing website, Flickr. In responding to a call for a focus on the performative aspects of online annotation (Wolff & Neuwirth, 2001), we outline and apply an interaction order approach to identify temporal and cultural aspects of the setting that provide for commonality and sharing. In particular, we study the interaction as a feature of a synthetic situation (Knorr Cetina, 2009) focusing on the requirements of maintaining a sense of an ongoing discussion online. Our analysis suggests that the rhetorical system of the Flickr environment, its appropriation by participants as a context for bounded activities, and displays of commonality, affiliation, and shared access provide for a common sense of participation in a time envelope. This, in turn, is argued to be central to new processes of consociation (Schutz, 1967; Zhao, 2004) occurring in the life world of Web 2.0 environments.
Abstract:
In this study, the authors propose a novel video stabilisation algorithm for mobile platforms with moving objects in the scene. The quality of videos obtained from mobile platforms, such as unmanned airborne vehicles, suffers from jitter caused by several factors. In order to remove this undesired jitter, accurate estimation of the global motion is essential. However, it is difficult to estimate global motion accurately from mobile platforms due to increased estimation errors and noise, and large moving objects in the scene contribute further to these errors. Currently, only very few motion estimation algorithms have been developed for video collected from mobile platforms, and this paper shows that these algorithms fail when there are large moving objects in the scene. In this study, a theoretical proof is provided which demonstrates that the use of delta optical flow can improve the robustness of video stabilisation in the presence of large moving objects. The authors also propose using sorted arrays of local motions and the selection of feature points to separate outliers from inliers. The proposed algorithm is tested on six video sequences, collected from one fixed platform, four mobile platforms and one synthetic video, of which three contain large moving objects. Experiments show that the proposed algorithm performs well on all of these video sequences.
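For readers unfamiliar with the inlier/outlier problem described above, the OpenCV sketch below estimates a per-frame global similarity transform from tracked feature points and lets RANSAC reject local motions caused by moving objects. It is a generic stabilisation baseline under assumed parameters and file names, not the authors' delta-optical-flow method.

```python
# Generic global-motion estimation for stabilisation: sparse feature tracking
# plus a RANSAC-fitted similarity transform per frame. "input.mp4" and all
# parameter values are placeholder assumptions.

import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")               # hypothetical input file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
transforms = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Track sparse features between consecutive frames.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
                                  qualityLevel=0.01, minDistance=20)
    if pts is not None:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good_prev = pts[status.flatten() == 1]
        good_next = nxt[status.flatten() == 1]

        # RANSAC rejects local motions from moving objects (outliers) when
        # fitting the global similarity transform.
        m, _inliers = cv2.estimateAffinePartial2D(good_prev, good_next,
                                                  method=cv2.RANSAC)
        if m is not None:
            dx, dy = m[0, 2], m[1, 2]
            da = np.arctan2(m[1, 0], m[0, 0])
            transforms.append((dx, dy, da))
    prev_gray = gray

# Smoothing the accumulated trajectory (e.g. with a moving average) and warping
# each frame by the smoothed-minus-raw difference would complete the stabiliser.
print(np.cumsum(np.array(transforms), axis=0)[-1] if transforms else "no frames")
```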
Abstract:
The trans-locative potential of the Internet has driven the design of many online applications. Online communities largely cluster around topics of interest, which take precedence over participants’ geographical locations, and the site of production is often disregarded when creative content appears online. However, for some, a sense of place is a defining aspect of creativity, yet environments that focus on the display and sharing of regionally situated content have so far been largely overlooked. Recent developments in geo-technologies have precipitated the emergence of a new field of interactive media, termed locative media, which emphasizes the geographical context of media. This paper argues that we might combine practices of locative media (experiential mapping and geo-spatial annotation) with aspects of online participatory culture (uploading, file-sharing and search categorization) to produce online applications that support geographically ‘located’ communities. It discusses the design considerations and possibilities of this convergence, making reference to an example, OurPlace 3G to 3D, which has to date been developed as a prototype. It goes on to discuss the benefits and potential uses of such convergent applications, including the co-production of spatial-temporal narratives of place.
Abstract:
The scalable video coding (SVC) extension of the H.264/AVC standard enables adaptive and flexible delivery to multiple devices under varying network conditions. Only a few works have addressed the influence of the different scalability parameters (frame rate, spatial resolution, and SNR) on user-perceived quality (UPQ), and only within a limited scope. In this paper, we have conducted a subjective quality assessment experiment on video sequences encoded with H.264/SVC to gain a better understanding of the correlation between video content and UPQ at all scalable layers, and of the impact of the rate-distortion method and the different scalabilities on bitrate and UPQ. Findings from this experiment will contribute to a user-centered design of adaptive delivery of scalable video streams.
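As a small illustration of how such subjective scores are typically summarised, the snippet below computes a mean opinion score (MOS) with a confidence interval per scalable layer and its correlation with layer bitrate. The ratings and bitrates are placeholder values, not data from this experiment.

```python
# Summarising subjective ratings per scalable layer as MOS and relating MOS to
# bitrate. All numbers below are made-up placeholders for illustration.

import numpy as np

# rows: scalable layers, columns: subjects' 1-5 ratings (hypothetical)
ratings = np.array([
    [2, 3, 2, 3, 2],   # base layer
    [3, 4, 3, 3, 4],   # + temporal enhancement
    [4, 4, 5, 4, 4],   # + spatial/SNR enhancement
])
bitrates_kbps = np.array([400, 800, 1600])     # hypothetical layer bitrates

mos = ratings.mean(axis=1)
ci95 = 1.96 * ratings.std(axis=1, ddof=1) / np.sqrt(ratings.shape[1])
corr = np.corrcoef(bitrates_kbps, mos)[0, 1]

for i, (m, c) in enumerate(zip(mos, ci95)):
    print(f"layer {i}: MOS = {m:.2f} +/- {c:.2f}")
print(f"bitrate/MOS correlation: {corr:.2f}")
```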
Abstract:
Identifying an individual from surveillance video is a difficult, time-consuming and labour-intensive process. The proposed system aims to streamline this process by filtering out unwanted scenes and enhancing an individual's face through super-resolution. An automatic face recognition system is then used to identify the subject or to present the human operator with likely matches from a database. A person tracker is used to speed up subject detection and super-resolution by tracking moving subjects and cropping a region of interest around each subject's face, reducing the number and size of the image frames to be super-resolved. In this paper, experiments demonstrate how the optical flow super-resolution method used improves surveillance imagery both for visual inspection and for automatic face recognition with Eigenface and Elastic Bunch Graph Matching systems. The optical flow based method has also been benchmarked against the "hallucination" algorithm, interpolation methods and the original low-resolution images. Results show that both super-resolution algorithms improved recognition rates significantly. Although the hallucination method resulted in slightly higher recognition rates, the optical flow method produced fewer artifacts and more visually correct images suitable for human consumption.
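To show the general shape of flow-based multi-frame super-resolution, the sketch below bicubically upsamples a tracked face crop, registers neighbouring frames to it with dense optical flow, and averages the aligned frames. It is a deliberately simplified stand-in under assumed inputs, not the method evaluated in the paper.

```python
# Simplified multi-frame super-resolution: upsample, register with dense
# optical flow, average. Inputs (equal-sized grayscale face crops from a
# tracker's region of interest) are assumed.

import cv2
import numpy as np

def super_resolve(frames, scale=4):
    """frames: list of equally sized grayscale face crops (uint8)."""
    up = [cv2.resize(f, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
          for f in frames]
    ref = up[0].astype(np.float32)
    acc, count = ref.copy(), 1

    h, w = ref.shape
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    for img in up[1:]:
        img = img.astype(np.float32)
        # Dense flow from the reference to this frame; warp the frame back
        # onto the reference grid before accumulating.
        flow = cv2.calcOpticalFlowFarneback(ref.astype(np.uint8),
                                            img.astype(np.uint8),
                                            None, 0.5, 3, 15, 3, 5, 1.2, 0)
        map_x, map_y = grid_x + flow[..., 0], grid_y + flow[..., 1]
        acc += cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
        count += 1
    return (acc / count).astype(np.uint8)
```

In use, the tracker's cropped face regions from consecutive frames would be passed in as `frames`, and the returned image fed to the face recognition stage.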
Abstract:
Introduction. Ideally, after selective thoracic fusion for Lenke 1C (i.e. major thoracic / secondary lumbar) curves, the lumbar spine will spontaneously accommodate to the corrected position of the thoracic curve, thereby achieving a balanced spine and avoiding the need for fusion of lumbar spinal segments. The purpose of this study was to evaluate the behaviour of the lumbar curve in Lenke 1C adolescent idiopathic scoliosis (AIS) following video-assisted thoracoscopic spinal fusion and instrumentation (VATS) of the major thoracic curve.

Methods. A retrospective review of 22 consecutive patients with AIS who underwent VATS by a single surgeon was conducted. The results were compared to published literature examining the behaviour of the secondary lumbar curve where other surgical approaches were employed.

Results. Twenty-two patients (all female) with AIS underwent VATS. All major thoracic curves were right convex. The average age at surgery was 14 years (range 10 to 22 years). On average, 6.7 levels (6 to 8) were instrumented. The mean follow-up was 25.1 months (6 to 36). The mean pre-operative major thoracic Cobb angle was 53.8° (40° to 75°) and the mean pre-operative secondary lumbar Cobb angle was 43.9° (34° to 55°). On bending radiographs, the secondary curve corrected to 11.3° (0° to 35°). The mean rib hump measurement was 15.0° (7° to 21°). At latest follow-up, the major thoracic Cobb angle measured on average 27.2° (20° to 41°) (p<0.001, univariate ANOVA) and the mean secondary lumbar curve was 27.3° (15° to 42°) (p<0.001), representing an uninstrumented secondary curve correction of 37.8%. The mean rib hump measurement was 6.5° (2° to 15°) (p<0.001). These results were comparable to published series in which open surgery was performed.

Discussion. VATS is an effective method of correcting major thoracic curves with secondary lumbar curves. The behaviour of the secondary lumbar curve is consistent with published series in which open surgery, both anterior and posterior, was performed.
Abstract:
One of the primary treatment goals of adolescent idiopathic scoliosis (AIS) surgery is to achieve maximum coronal plane correction while maintaining coronal balance. However, maintaining or restoring sagittal plane spinal curvature has become increasingly important for the long-term health of the spine. Patients with AIS characteristically present with pre-operative thoracic hypokyphosis, and it is generally agreed that operative treatment of thoracic idiopathic scoliosis should aim to restore thoracic kyphosis to normal values while maintaining lumbar lordosis and good overall sagittal balance. The aim of this study was to evaluate CT sagittal plane parameters, with particular emphasis on thoracolumbar junctional alignment, in patients with AIS who underwent video-assisted thoracoscopic spinal fusion and instrumentation (VATS). The study concluded that VATS reliably increases thoracic kyphosis while preserving junctional alignment and lumbar lordosis in thoracic AIS.
Abstract:
In public venues, crowd size is a key indicator of crowd safety and stability. In this paper we propose a crowd counting algorithm that uses tracking and local features to count the number of people in each group as represented by a foreground blob segment, so that the total crowd estimate is the sum of the group sizes. Tracking is employed to improve the robustness of the estimate, by analysing the history of each group, including splitting and merging events. A simplified ground truth annotation strategy results in an approach with minimal setup requirements that is highly accurate.
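To illustrate the blob-based counting idea in its simplest form, the sketch below segments foreground with background subtraction, extracts blobs, and estimates the number of people in each blob from its area. The tracking, local features and group-history analysis described in the paper are not reproduced, and the video file name, person-area constant and thresholds are assumptions.

```python
# A naive blob-based crowd-counting baseline: background subtraction, blob
# extraction, and an area-based estimate of people per blob.

import cv2

AVG_PERSON_AREA = 1500.0     # hypothetical pixels-per-person for this camera view
MIN_BLOB_AREA = 400          # ignore small noise blobs

bg = cv2.createBackgroundSubtractorMOG2(history=300, detectShadows=True)
cap = cv2.VideoCapture("crowd.mp4")               # hypothetical input video

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadow pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    count = sum(max(1, round(cv2.contourArea(c) / AVG_PERSON_AREA))
                for c in contours if cv2.contourArea(c) > MIN_BLOB_AREA)
    print("estimated crowd size:", count)
```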