892 results for video data
Abstract:
Action recognition plays an important role in various applications, including smart homes and personal assistive robotics. In this paper, we propose an algorithm for recognizing human actions using motion capture data. Motion capture data provides accurate three-dimensional positions of the joints that constitute the human skeleton. We model the movement of the skeletal joints temporally in order to classify the action. The skeleton in each frame of an action sequence is represented as a 129-dimensional vector, of which each component is a 3D angle made by each joint with a fixed point on the skeleton. Finally, the video is represented as a histogram over a codebook obtained from all action sequences. Along with this, the temporal variance of the skeletal joints is used as an additional feature. The actions are classified using a Meta-Cognitive Radial Basis Function Network (McRBFN) and its Projection Based Learning (PBL) algorithm. We achieve over 97% recognition accuracy on the widely used Berkeley Multimodal Human Action Database (MHAD).
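The codebook histogram and temporal variance features described in this abstract can be illustrated with a minimal sketch. It assumes that each frame is assigned to its nearest codeword by Euclidean distance and that the histogram is normalized by sequence length; the abstract states neither detail. The function names and the 2-dimensional toy data are hypothetical (the paper uses 129-dimensional frame vectors).

```python
from math import dist

def nearest_codeword(frame, codebook):
    """Index of the codeword closest (Euclidean distance) to one frame vector."""
    return min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))

def sequence_histogram(frames, codebook):
    """Normalized histogram of codeword assignments over one action sequence."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    return [c / len(frames) for c in counts]

def temporal_variance(frames):
    """Variance over time of each vector component (population variance)."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]

# Toy data: 2-D frames and a 2-word codebook (assumed values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, 0.1]]
hist = sequence_histogram(frames, codebook)
features = hist + temporal_variance(frames)  # concatenated feature vector
```

In this sketch the concatenated `features` list would be the input to a classifier; the McRBFN classifier itself is not reproduced here.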
Abstract:
Without knowledge of basic seafloor characteristics, the ability to address any number of critical marine and/or coastal management issues is diminished. For example, management and conservation of essential fish habitat (EFH), a requirement mandated by federally guided fishery management plans (FMPs), requires among other things a description of habitats for federally managed species. Although the attributes important to habitat are numerous, the tools currently available cannot describe many of them efficiently and effectively, especially at the scales required. However, several characteristics of seafloor morphology are readily obtainable at multiple scales and can serve as useful descriptors of habitat. Recent advancements in acoustic technology, such as multibeam echosounding (MBES), can provide remote indication of surficial sediment properties such as texture, hardness, or roughness, and further permit highly detailed renderings of seafloor morphology. With acoustic-based surveys providing a relatively efficient method for data acquisition, there exists a need for efficient and reproducible automated segmentation routines to process the data. Using MBES data collected by the Olympic Coast National Marine Sanctuary (OCNMS), and through a contracted seafloor survey, we expanded on the techniques of Cutter et al. (2003) to describe an objective repeatable process that uses parameterized local Fourier histogram (LFH) texture features to automate segmentation of surficial sediments from acoustic imagery using a maximum likelihood decision rule. Sonar signatures and classification performance were evaluated using video imagery obtained from a towed camera sled. Segmented raster images were converted to polygon features and attributed using a hierarchical deep-water marine benthic classification scheme (Greene et al. 1999) for use in a geographical information system (GIS). (PDF contains 41 pages.)
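A maximum likelihood decision rule of the kind mentioned in this abstract can be sketched as follows. The sketch assumes each sediment class is modeled by independent Gaussian feature distributions with equal priors; the abstract does not state the form of the class models. The class names, the two-component feature vectors, and their parameters are hypothetical stand-ins for LFH texture features.

```python
from math import log, pi

def gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of feature vector x under independent Gaussians."""
    return sum(
        -0.5 * (log(2 * pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def classify(x, classes):
    """Maximum likelihood rule: pick the class whose model best explains x."""
    return max(classes, key=lambda name: gaussian_log_likelihood(x, *classes[name]))

# Hypothetical per-class (mean, variance) parameters, e.g. estimated from
# training patches whose sediment type was confirmed by video.
classes = {
    "sand": ([0.2, 0.1], [0.01, 0.01]),
    "rock": ([0.8, 0.6], [0.02, 0.02]),
}
label_a = classify([0.25, 0.15], classes)
label_b = classify([0.75, 0.65], classes)
```

Applying `classify` to the feature vector of every pixel neighborhood would yield a segmented raster, which is the step the abstract describes before conversion to polygons.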
Abstract:
For sign languages used by deaf communities, linguistic corpora have until recently been unavailable, due to the lack of a writing system and a written culture in these communities, and the very recent advent of digital video. Recent improvements in video and computer technology have now made larger sign language datasets possible; however, large sign language datasets that are fully machine-readable are still elusive. This is due to two challenges: (1) inconsistencies that arise when signs are annotated by means of spoken/written language; and (2) the fact that many parts of signed interaction are not necessarily fully composed of lexical signs (the equivalent of words), instead consisting of constructions that are less conventionalised. As sign language corpus building progresses, the potential for some standards in annotation is beginning to emerge. But before this project, there were no attempts to standardise these practices across corpora, which is required to be able to compare data crosslinguistically. This project thus had the following aims: (1) to develop annotation standards for glosses (lexical/word level); (2) to test their reliability and validity; and (3) to improve current software tools that facilitate a reliable workflow. Overall, the project aimed not only to set a standard for the whole field of sign language studies throughout the world but also to make significant advances toward two of the world’s largest machine-readable datasets for sign languages, specifically the BSL Corpus (British Sign Language, http://bslcorpusproject.org) and the Corpus NGT (Sign Language of the Netherlands, http://www.ru.nl/corpusngt).
Abstract:
The Olympic Coast National Marine Sanctuary (OCNMS) continues to invest significant resources into seafloor mapping activities along Washington’s outer coast (Intelmann and Cochrane 2006; Intelmann et al. 2006; Intelmann 2006). Results from these annual mapping efforts offer a snapshot of current ground conditions, help to guide research and management activities, and provide a baseline for assessing the impacts of various threats to important habitat. During the months of August 2004 and May and July 2005, we used side scan sonar to image several regions of the sea floor in the northern OCNMS, and the data were mosaicked at 1-meter pixel resolution. Video from a towed camera sled, bathymetry data, sedimentary samples and side scan sonar mapping were integrated to describe geological and biological aspects of habitat. Polygon features were created and attributed with a hierarchical deep-water marine benthic classification scheme (Greene et al. 1999). For three small areas that were mapped with both side scan sonar and multibeam echosounder, we compared output from the classified images, which indicated little difference in results between the two methods. With these considerations, backscatter derived from multibeam bathymetry is currently a cost-efficient and safe method for seabed imaging in the shallow (<30 meters) rocky waters of OCNMS. The image quality is sufficient for classification purposes, the associated depths provide further descriptive value, and risks to gear are minimized. In shallow waters (<30 meters) that do not have a high incidence of dangerous rock pinnacles, a towed multi-beam side scan sonar could provide a better option for obtaining seafloor imagery due to its high acquisition speed and high image quality. However, the high probability of losing or damaging such a costly system when it is deployed in a towed configuration in the extremely rugose nearshore zones within OCNMS makes this a financially risky proposition.
The development of newer technologies such as interferometric multibeam systems and bathymetric side scan systems could also provide great potential for mapping these nearshore rocky areas, as they allow for high-speed data acquisition, produce side scan imagery that is precisely geo-referenced to bathymetry, and do not experience the angular depth dependency associated with multibeam echosounders, allowing larger range scales to be used in shallower water. As such, further investigation of these systems is needed to assess their efficiency and utility in these environments compared to traditional side scan sonar and multibeam bathymetry. (PDF contains 43 pages.)
Abstract:
In September 2002, side scan sonar was used to image a portion of the sea floor in the northern OCNMS, and the data were mosaicked at 1-meter pixel resolution using 100 kHz data collected at 300-meter range scale. Video from a remotely operated vehicle (ROV), bathymetry data, sedimentary samples, and sonar mapping have been integrated to describe geological and biological aspects of habitat, and polygon features have been created and attributed with a hierarchical deep-water marine benthic classification scheme (Greene et al. 1999). The data can be used with geographic information system (GIS) software for display, query, and analysis. Textural analysis of the sonar images provided a relatively automated method for delineating substrate into three broad classes representing soft, mixed sediment, and hard bottom. Microhabitat and presence of certain biological attributes were also populated into the polygon features, but strictly limited to areas where video groundtruthing occurred. Further groundtruthing work in specific areas would improve confidence in the classified habitat map. (PDF contains 22 pages.)
Abstract:
There is a clear need to develop fisheries-independent methods to quantify individual sizes, density, and three-dimensional characteristics of reef fish spawning aggregations for use in population assessments and to provide critical baseline data on reproductive life history of exploited populations. We designed, constructed, calibrated, and applied an underwater stereo-video system to estimate individual sizes and three-dimensional (3D) positions of Nassau grouper (Epinephelus striatus) at a spawning aggregation site located on a reef promontory on the western edge of Little Cayman Island, Cayman Islands, BWI, on 23 January 2003. The system consists of two free-running camcorders mounted on a meter-long bar and supported by a SCUBA diver. Paired video “stills” were captured, and nose and tail of individual fish observed in the field of view of both cameras were digitized using image analysis software. Conversion of these two-dimensional screen coordinates to 3D coordinates was achieved through a matrix inversion algorithm and calibration data. Our estimate of mean total length (58.5 cm, n = 29) was in close agreement with estimated lengths from a hydroacoustic survey and from direct measures of fish size using visual census techniques. We discovered a possible bias in length measures using the video method, most likely arising from some fish orientations that were not perpendicular with respect to the optical axis of the camera system. We observed 40 individuals occupying a volume of 33.3 m³, resulting in a concentration of 1.2 individuals m⁻³ with a mean (SD) nearest neighbor distance of 70.0 (29.7) cm. We promote the use of roving diver stereo-videography as a method to assess the size distribution, density, and 3D spatial structure of fish spawning aggregations.
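The quantities reported in this abstract follow from simple geometry once 3D coordinates are available. The sketch below assumes that the stereo conversion has already produced 3D nose, tail, and body positions; the function names and toy coordinates are hypothetical, and the matrix inversion step itself is not reproduced.

```python
from math import dist

def total_length(nose_xyz, tail_xyz):
    """Fish length as the Euclidean distance between 3-D nose and tail points."""
    return dist(nose_xyz, tail_xyz)

def density(n_individuals, volume_m3):
    """Concentration in individuals per cubic meter."""
    return n_individuals / volume_m3

def mean_nearest_neighbor(points):
    """Mean distance from each point to its closest other point."""
    nearest = [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)

# Figures from the abstract: 40 individuals in 33.3 cubic meters.
concentration = density(40, 33.3)  # about 1.2 individuals per cubic meter

# Toy positions in cm (assumed values, for illustration only).
toy_length = total_length((0.0, 0.0, 0.0), (30.0, 40.0, 0.0))
toy_nnd = mean_nearest_neighbor([(0, 0, 0), (3, 4, 0), (3, 4, 12)])
```

The length bias noted in the abstract concerns fish that are not perpendicular to the optical axis; a distance computed from true 3D endpoints, as here, does not model measurement error in those endpoints.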
Abstract:
With the use of a baited stereo-video camera system, this study semiquantitatively defined the habitat associations of 4 species of Lutjanidae: Opakapaka (Pristipomoides filamentosus), Kalekale (P. sieboldii), Onaga (Etelis coruscans), and Ehu (E. carbunculus). Fish abundance and length data from 6 locations in the main Hawaiian Islands were evaluated for species-specific and size-specific differences between regions and habitat types. Multibeam bathymetry and backscatter were used to classify habitats into 4 types on the basis of substrate (hard or soft) and slope (high or low). Depth was a major influence on bottomfish distributions. Opakapaka occurred at depths shallower than the depths at which other species were observed, and this species showed an ontogenetic shift to deeper water with increasing size. Opakapaka and Ehu had an overall preference for hard substrate with low slope (hard-low), and Onaga was found over both hard-low and hard-high habitats. No significant habitat preferences were recorded for Kalekale. Opakapaka, Kalekale, and Onaga exhibited size-related shifts with habitat type. A move into hard-high environments with increasing size was evident for Opakapaka and Kalekale. Onaga was seen predominantly in hard-low habitats at smaller sizes and in either hard-low or hard-high at larger sizes. These ontogenetic habitat shifts could be driven by reproductive triggers because they roughly coincided with the length at sexual maturity of each species. However, further studies are required to determine causality. No ontogenetic shifts were seen for Ehu, but only a limited number of juveniles were observed. Regional variations in abundance and length were also found and could be related to fishing pressure or large-scale habitat features.
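The four habitat types in this abstract combine a substrate class with a slope class. A minimal sketch of that labeling follows, assuming that substrate is inferred from a backscatter value and slope from a bathymetry-derived angle. Both threshold values and the function name are hypothetical, since the abstract does not give the cut-offs used.

```python
def habitat_type(backscatter_db, slope_deg,
                 hard_threshold_db=-30.0, steep_threshold_deg=20.0):
    """Label a location as hard/soft substrate and high/low slope.

    The thresholds are illustrative assumptions, not values from the study.
    """
    substrate = "hard" if backscatter_db >= hard_threshold_db else "soft"
    slope = "high" if slope_deg >= steep_threshold_deg else "low"
    return f"{substrate}-{slope}"

# Example: strong acoustic return on gently sloping ground.
label = habitat_type(backscatter_db=-22.0, slope_deg=8.0)
```

The output labels follow the abstract's own naming, for example "hard-low" and "hard-high".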
Abstract:
In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. In particular there are three areas of novelty: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation, learnt offline, to generalize in the presence of extreme illumination changes; (ii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve invariance to unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our system consistently demonstrated a nearly perfect recognition rate (over 99.7%), significantly outperforming state-of-the-art commercial software and methods from the literature. © Springer-Verlag Berlin Heidelberg 2006.
Abstract:
After earthquakes, licensed inspectors use the established codes to assess the impact of damage on structural elements. These assessments take days to weeks. However, emergency responders (e.g. firefighters) must act within hours of a disaster event to enter damaged structures to save lives, and therefore cannot wait until an official assessment is complete. This is a risk that firefighters have to take. Although Search and Rescue Organizations offer training seminars to familiarize firefighters with structural damage assessment, the effectiveness of this training is hard to guarantee when firefighters perform life rescue and damage assessment operations together. Also, the training is not available to every firefighter. The authors therefore proposed a novel framework that can provide firefighters with a quick but crude assessment of damaged buildings by evaluating the visible damage on their critical structural elements (i.e. concrete columns in the study). This paper presents the first step of the framework. It aims to automate the detection of concrete columns from visual data. To achieve this, the typical shape of columns (long vertical lines) is recognized using edge detection and the Hough transform. The bounding rectangle for each pair of long vertical lines is then formed. When the resulting rectangle resembles a column and the material contained in the region between the two long vertical lines is recognized as concrete, the region is marked as a concrete column surface. Real video/image data are used to test the method. The preliminary results indicate that concrete columns can be detected when they are not distant and have at least one surface visible.
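The pairing step described in this abstract can be sketched as below. The sketch assumes that line segments have already been extracted (for example, by edge detection followed by a Hough transform) and are given as `(x1, y1, x2, y2)` pixel tuples. The length, tilt, and aspect-ratio thresholds are hypothetical, and the concrete material check is omitted.

```python
from itertools import combinations
from math import atan2, degrees, hypot

def is_long_vertical(seg, min_length, max_tilt_deg):
    """True if a segment is long enough and within max_tilt_deg of vertical."""
    x1, y1, x2, y2 = seg
    length = hypot(x2 - x1, y2 - y1)
    tilt = abs(degrees(atan2(x2 - x1, y2 - y1)))  # angle measured from the y axis
    tilt = min(tilt, 180 - tilt)                  # ignore segment direction
    return length >= min_length and tilt <= max_tilt_deg

def bounding_rectangle(seg_a, seg_b):
    """Axis-aligned rectangle (left, top, right, bottom) enclosing two segments."""
    xs = [seg_a[0], seg_a[2], seg_b[0], seg_b[2]]
    ys = [seg_a[1], seg_a[3], seg_b[1], seg_b[3]]
    return min(xs), min(ys), max(xs), max(ys)

def column_candidates(segments, min_length=80, max_tilt_deg=10, min_aspect=2.0):
    """Rectangles formed by pairs of long vertical lines that look column-like."""
    vertical = [s for s in segments if is_long_vertical(s, min_length, max_tilt_deg)]
    candidates = []
    for a, b in combinations(vertical, 2):
        left, top, right, bottom = bounding_rectangle(a, b)
        width, height = right - left, bottom - top
        if width > 0 and height / width >= min_aspect:  # tall, narrow shape
            candidates.append((left, top, right, bottom))
    return candidates

# Toy input: two near-vertical lines and one horizontal line.
segments = [(10, 0, 10, 100), (40, 0, 42, 100), (0, 50, 200, 50)]
found = column_candidates(segments)
```

In the method the abstract describes, each candidate rectangle would additionally be kept only if the enclosed material is recognized as concrete.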
Abstract:
The automated detection of structural elements (e.g. concrete columns) in visual data is useful in many construction and maintenance applications. Research in this area is at an early stage. The authors previously presented a concrete column detection method that utilized boundary and color information as detection cues. However, the method is sensitive to parameter selection, which reduces its ability to robustly detect concrete columns in live videos. Compared with the previous method, the new method presented in this paper reduces the reliance on parameter settings in three main ways. First, edges are located using color information. Second, the orientation information of edge points is considered in constructing column boundaries. Third, an artificial neural network for concrete material classification is developed to replace concrete sample matching. The method is tested using live videos, and results are compared with those obtained with the previous method to demonstrate the new method's improvements.
A Videogrammetric As-Built Data Collection Method for Digital Fabrication of Sheet Metal Roof Panels
Abstract:
A roofing contractor typically needs to acquire as-built dimensions of a roof structure several times over the course of its build to be able to digitally fabricate sheet metal roof panels. Obtaining these measurements using the existing roof surveying methods could be costly in terms of equipment, labor, and/or worker exposure to safety hazards. This paper presents a video-based surveying technology as an alternative method which is simple to use, automated, less expensive, and safe. When using this method, the contractor collects video streams with a calibrated stereo camera set. Unique visual characteristics of scenes from a roof structure are then used in the processing step to automatically extract as-built dimensions of roof planes. These dimensions are finally represented in an XML format to be loaded into sheet metal folding and cutting machines. The proposed method has been tested for a roofing project and the preliminary results indicate its capabilities.
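The final export step in this abstract, writing roof plane dimensions to XML, might look like the sketch below. The element and attribute names (`roof`, `plane`, `corner`, `id`, `x`, `y`, `z`) are hypothetical, since the abstract does not specify the schema expected by the folding and cutting machines.

```python
import xml.etree.ElementTree as ET

def roof_planes_to_xml(planes):
    """Serialize roof planes (id plus 3-D corner points in meters) to XML text."""
    root = ET.Element("roof")
    for p in planes:
        plane = ET.SubElement(root, "plane", id=p["id"])
        for x, y, z in p["corners_m"]:
            # Coordinates are written with millimeter precision.
            ET.SubElement(plane, "corner", x=f"{x:.3f}", y=f"{y:.3f}", z=f"{z:.3f}")
    return ET.tostring(root, encoding="unicode")

# Toy plane: a 4 m by 2.5 m rectangle tilted upward (assumed values).
planes = [{
    "id": "P1",
    "corners_m": [(0.0, 0.0, 3.0), (4.0, 0.0, 3.0), (4.0, 2.5, 4.2), (0.0, 2.5, 4.2)],
}]
xml_text = roof_planes_to_xml(planes)
```

A real machine interface would dictate its own schema and units; this sketch only shows the general shape of such an export.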
Abstract:
Temporal synchronization of multiple video recordings of the same dynamic event is a critical task in many computer vision applications, e.g., novel view synthesis and 3D reconstruction. Typically this information is implied, since recordings are made using the same timebase, or time-stamp information is embedded in the video streams. Recordings using consumer-grade equipment do not contain this information; hence, there is a need to temporally synchronize signals using the visual information itself. Previous work in this area has either assumed good quality data with relatively simple dynamic content or the availability of precise camera geometry. In this paper, we propose a technique which exploits feature trajectories across views in a novel way, and specifically targets the kind of complex content found in consumer-generated sports recordings, without assuming precise knowledge of fundamental matrices or homographies. Our method automatically selects the moving feature points in the two unsynchronized videos whose 2D trajectories can be best related, thereby helping to infer the synchronization index. We evaluate performance using a number of real recordings and show that synchronization can be achieved to within 1 second, which is better than previous approaches. Copyright 2013 ACM.
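To make the notion of a synchronization index concrete, the sketch below searches for the integer frame shift that best aligns two one-dimensional motion signals. This is a deliberately simplified stand-in, not the trajectory selection method of the paper: it assumes each video has already been reduced to one motion value per frame (for example, the speed of a tracked feature). All names and values are hypothetical.

```python
def best_offset(signal_a, signal_b, max_shift, min_overlap=4):
    """Shift s (in frames) minimizing the mean squared difference
    between signal_a[i] and signal_b[i + s] over their overlap."""
    best_s, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [
            (signal_a[i], signal_b[i + s])
            for i in range(len(signal_a))
            if 0 <= i + s < len(signal_b)
        ]
        if len(pairs) < min_overlap:
            continue  # too little overlap to trust this shift
        err = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if err < best_err:
            best_s, best_err = s, err
    return best_s

def offset_seconds(shift_frames, fps):
    """Convert a frame shift to seconds for a given frame rate."""
    return shift_frames / fps

# Toy signals: the same motion burst appears two frames later in video B.
motion_a = [0, 0, 1, 3, 1, 0, 0, 0]
motion_b = [0, 0, 0, 0, 1, 3, 1, 0]
shift = best_offset(motion_a, motion_b, max_shift=3)
```

With real footage the two signals come from different viewpoints and do not match exactly, which is why the paper relates 2D trajectories across views rather than comparing raw values.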
Abstract:
Passive monitoring of large sites typically requires coordination between multiple cameras, which in turn requires methods for automatically relating events between distributed cameras. This paper tackles the problem of self-calibration of multiple cameras that are very far apart, using feature correspondences to determine the camera geometry. The key problem is finding such correspondences. Since the camera geometry and photometric characteristics vary considerably between images, one cannot use brightness and/or proximity constraints. Instead we apply planar geometric constraints to moving objects in the scene in order to align the scene's ground plane across multiple views. We do not assume synchronized cameras, and we show that enforcing geometric constraints enables us to align the tracking data in time. Once we have recovered the homography which aligns the planar structure in the scene, we can compute from the homography matrix the 3D position of the plane and the relative camera positions. This in turn enables us to recover a homography matrix which maps the images to an overhead view. We demonstrate this technique in two settings: a controlled lab setting where we test the effects of errors in internal camera calibration, and an uncontrolled, outdoor setting in which the full procedure is applied to external camera calibration and ground plane recovery. In spite of noise in the internal camera parameters and image data, the system successfully recovers both planar structure and relative camera positions in both settings.
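A homography maps points on the ground plane from one view to another through a 3x3 matrix acting on homogeneous coordinates. The sketch below shows only that mapping step, with hypothetical matrix values; estimating the homography from tracked objects, as the paper does, is not reproduced here.

```python
def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography (list of three rows)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (xh / w, yh / w)  # divide out the homogeneous scale

# Hypothetical matrices: a scale-and-shift map, and one with a projective scale.
H_affine = [[2.0, 0.0, 1.0], [0.0, 2.0, 3.0], [0.0, 0.0, 1.0]]
H_scaled = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
mapped = apply_homography(H_affine, (1.0, 1.0))
halved = apply_homography(H_scaled, (1.0, 1.0))
```

Applying the same function with an image-to-overhead homography to every tracked position would place the observations from all cameras in one ground-plane coordinate frame.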
Abstract:
A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than a single HMM with some probability, and the hard decision about the sequence class membership can be deferred until a later time when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.
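The contrast between soft and hard assignment in this abstract can be shown with the E-step of a mixture model. The sketch assumes that the log-likelihood of one sequence under each HMM has already been computed (for example, with the forward algorithm); the function names, log-likelihood values, and mixture weights are hypothetical.

```python
from math import exp, log

def responsibilities(log_likelihoods, weights):
    """Soft assignment: posterior probability of each mixture component
    for one sequence, computed stably by subtracting the largest score."""
    scores = [log(w) + ll for w, ll in zip(weights, log_likelihoods)]
    top = max(scores)
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hard_assignment(log_likelihoods, weights):
    """k-means-style assignment: index of the single best component."""
    scores = [log(w) + ll for w, ll in zip(weights, log_likelihoods)]
    return max(range(len(scores)), key=scores.__getitem__)

# One sequence scored under two HMMs with equal mixture weights.
weights = [0.5, 0.5]
ambiguous = responsibilities([-10.0, -10.0], weights)
leaning = responsibilities([-5.0, -5.0 - log(3.0)], weights)
```

For the ambiguous sequence, a hard assignment would commit to one HMM, while the soft assignment keeps equal membership in both, which is the behavior the abstract attributes to the EM formulation.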