101 results for human-action recognition

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

Recognizing human actions efficiently and effectively in videos captured by modern cameras is a challenge in real applications. Traditional methods, which require professional analysts, face a bottleneck because of this reliance on manual analysis. To overcome this limitation, methods based on computer vision techniques, with little or no human intervention, have been proposed to analyse human actions in videos automatically. This paper presents a method that combines the three-dimensional Scale Invariant Feature Transform (SIFT) detector with the Latent Dirichlet Allocation (LDA) model for human motion analysis. To represent videos effectively and robustly, we extract the 3D SIFT descriptor around each interest point, sampled densely from 3D space-time video volumes. After obtaining the representation of each video frame, the LDA model is adopted to discover the underlying structure, namely the categorization of human actions in the collection of videos. Publicly available standard datasets are used to test our method. The concluding part discusses the research challenges and future directions.
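
The pipeline lends itself to a compact sketch. Below is a minimal, hedged Python illustration of the general idea rather than the paper's implementation: the 3D SIFT descriptor is approximated by a simple space-time gradient-orientation histogram, descriptors are quantized into visual words with k-means, and scikit-learn's LatentDirichletAllocation recovers latent action topics from per-video word counts. All data, sizes and parameters are placeholders.

```python
# Minimal sketch of a "3D descriptor + bag-of-words + LDA" pipeline.
# The full 3D SIFT descriptor is replaced by a simplified space-time
# gradient-orientation histogram; the videos are synthetic arrays.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def cuboid_descriptor(cuboid, bins=8):
    """Histogram of spatial gradient orientations inside a (t, y, x) cuboid."""
    gy, gx = np.gradient(cuboid.astype(float), axis=(1, 2))
    angles = np.arctan2(gy, gx).ravel()
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / (hist.sum() + 1e-8)

def video_descriptors(video, step=8, size=8):
    """Densely sample space-time cuboids and describe each one."""
    T, H, W = video.shape
    descs = []
    for t in range(0, T - size, step):
        for y in range(0, H - size, step):
            for x in range(0, W - size, step):
                descs.append(cuboid_descriptor(video[t:t+size, y:y+size, x:x+size]))
    return np.array(descs)

rng = np.random.default_rng(0)
videos = [rng.random((32, 64, 64)) for _ in range(6)]        # stand-in videos
all_desc = [video_descriptors(v) for v in videos]

# Quantise descriptors into "visual words", then count words per video.
codebook = KMeans(n_clusters=20, n_init=10, random_state=0).fit(np.vstack(all_desc))
counts = np.array([np.bincount(codebook.predict(d), minlength=20) for d in all_desc])

# LDA uncovers latent "action topics" from the word counts.
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
print(lda.transform(counts))                                 # per-video topic mixture
```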

Relevance:

100.00%

Publisher:

Abstract:

Human action recognition has attracted considerable interest from computer vision researchers due to its many promising applications. In this paper, we employ the Pyramid Histogram of Orientation Gradients (PHOG) to characterize human figures for action recognition. Compared with silhouette-based features, the PHOG descriptor does not require the extraction of human silhouettes or contours. Two state-space models, the Hidden Markov Model (HMM) and the Conditional Random Field (CRF), are adopted to model dynamic human movement. The proposed PHOG descriptor and the state-space models are tested with different parameter settings on a standard dataset. We also verify the robustness of the method under various unconstrained conditions and viewpoints. Promising experimental results demonstrate the effectiveness and robustness of the proposed method.
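
A rough sketch of how a PHOG-style descriptor and a state-space model over the frame sequence could be wired together is given below; the pyramid depth, bin count and the use of hmmlearn's GaussianHMM are illustrative assumptions, not the paper's exact configuration.

```python
# Per-frame pyramid histogram of gradient orientations, then an HMM over
# the resulting feature sequence.  Data and parameters are placeholders.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def phog(frame, levels=2, bins=9):
    """Concatenate gradient-orientation histograms over a spatial pyramid."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)                # unsigned orientation
    feats = []
    for level in range(levels):                            # 1x1 and 2x2 grids here
        cells = 2 ** level
        for ys in np.array_split(np.arange(frame.shape[0]), cells):
            for xs in np.array_split(np.arange(frame.shape[1]), cells):
                hist, _ = np.histogram(ang[np.ix_(ys, xs)], bins=bins,
                                       range=(0, np.pi),
                                       weights=mag[np.ix_(ys, xs)])
                feats.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(feats)

rng = np.random.default_rng(1)
frames = rng.random((40, 64, 48))                          # one stand-in video
X = np.array([phog(f) for f in frames])                    # per-frame PHOG features

# A Gaussian HMM models the temporal dynamics of the descriptor sequence.
hmm = GaussianHMM(n_components=4, covariance_type="diag", random_state=0)
hmm.fit(X, lengths=[len(X)])
print(hmm.predict(X))                                      # hidden-state sequence
```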

Relevance:

100.00%

Publisher:

Abstract:

Automatic human action recognition is a challenging problem in machine vision. Some high-level features, such as SIFT, achieve promising performance for action recognition but are computationally expensive. To address this problem, we construct features based on the Distance Transform of body contours, which are relatively simple and computationally efficient, to represent human actions in video. After extracting the features from videos, we adopt the Conditional Random Field to model the temporal action sequences. The proposed method is tested on an available standard dataset. We also verify the robustness of our method under various realistic conditions, such as body occlusion or intersection.
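
As a loose illustration of the feature side, the following sketch computes the distance transform of a binary silhouette with SciPy, pools it on a coarse grid, and feeds the per-frame features to a linear-chain CRF via sklearn-crfsuite. The silhouettes, labels, grid size and CRF settings are all stand-ins.

```python
# Distance-transform features per frame, modelled temporally with a CRF.
import numpy as np
from scipy.ndimage import distance_transform_edt
import sklearn_crfsuite

def dt_feature(silhouette, grid=4):
    """Distance transform of a binary silhouette, pooled on a coarse grid."""
    dt = distance_transform_edt(silhouette)
    h, w = dt.shape
    pooled = [dt[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid].mean()
              for i in range(grid) for j in range(grid)]
    return np.array(pooled)

rng = np.random.default_rng(2)
# Stand-in silhouettes; the labels alternate between two toy "actions".
frames = [rng.random((60, 40)) > 0.5 for _ in range(30)]
labels = ["walk"] * 15 + ["wave"] * 15

# sklearn-crfsuite expects dict-valued features for each frame of a sequence.
X = [[{f"dt_{k}": float(v) for k, v in enumerate(dt_feature(f))} for f in frames]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0][:5])                               # predicted frame labels
```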

Relevance:

100.00%

Publisher:

Abstract:

The ability to learn and recognize human activities of daily living (ADLs) is important in building pervasive and smart environments. In this paper, we tackle this problem using the hidden semi-Markov model. We discuss the state-of-the-art duration modeling choices and then adopt a large class of exponential family distributions to model state durations. Inference and learning are addressed efficiently by providing a graphical representation of the model in terms of a dynamic Bayesian network (DBN). We investigate both discrete and continuous distributions from the exponential family (Poisson and Inverse Gaussian, respectively) for the problem of learning and recognizing ADLs. A full comparison between the exponential family duration models and other existing models, including the traditional multinomial and the newer Coxian, is also presented. Our work thus completes a thorough investigation into duration modeling and its application to human activity recognition in a real-world smart home surveillance scenario.
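
To make the duration-modeling idea concrete, the snippet below builds per-state duration tables from the two exponential family distributions named in the abstract, Poisson and Inverse Gaussian, using SciPy; the state count, rates and truncation length are arbitrary examples rather than values from the paper.

```python
# Per-state duration distributions for a hidden semi-Markov model.
import numpy as np
from scipy.stats import poisson, invgauss

max_duration = 20
durations = np.arange(1, max_duration + 1)

# Poisson duration model: p(d | state) for each of three example states.
poisson_rates = [3.0, 7.0, 12.0]                    # mean dwell times (frames)
poisson_table = np.array([poisson.pmf(durations, mu) for mu in poisson_rates])

# Inverse Gaussian duration model, discretised onto whole frames by
# integrating the density over unit-length bins.
ig_means = [3.0, 7.0, 12.0]
ig_table = np.array([
    invgauss.cdf(durations + 0.5, mu=m) - invgauss.cdf(durations - 0.5, mu=m)
    for m in ig_means
])

# Each row is one state's duration distribution; in an HSMM these tables
# replace the geometric dwell time implied by an ordinary HMM.
print(poisson_table.sum(axis=1))   # close to 1 once max_duration is large enough
print(ig_table.sum(axis=1))
```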

Relevance:

100.00%

Publisher:

Abstract:

Segmentation of individual actions from a stream of human motion is an open problem in computer vision. This paper approaches the problem of segmenting higher-level activities into their component sub-actions using Hidden Markov Models modified to handle missing data in the observation vector. By controlling the use of missing data, action labels can be inferred from the observation vector during inference, thus performing segmentation and classification simultaneously. The approach is able to segment both prominent and subtle actions, even when subtle actions are grouped together. The advantage of this method over sliding windows and Viterbi state-sequence interrogation is that segmentation is performed as a trainable task, and the temporal relationship between actions is encoded in the model and used as evidence for action labelling.
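
The core mechanism, computing an emission likelihood only over the observed dimensions of a partially missing observation vector, can be sketched as follows, assuming diagonal-Gaussian emissions; the dimensions and parameter values are made up.

```python
# When some observation dimensions are missing, the emission likelihood is
# computed only over the observed dimensions, which marginalises the
# missing ones out of a diagonal Gaussian.
import numpy as np
from scipy.stats import norm

def emission_loglik(obs, means, stds):
    """Log-likelihood of one (possibly incomplete) frame under each state.

    obs   : (D,) vector with np.nan marking missing dimensions
    means : (S, D) per-state means, stds : (S, D) per-state std devs
    """
    observed = ~np.isnan(obs)
    return np.array([
        norm.logpdf(obs[observed], means[s, observed], stds[s, observed]).sum()
        for s in range(means.shape[0])
    ])

means = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
stds = np.ones_like(means)
frame = np.array([1.9, np.nan, 2.1])        # middle dimension occluded / missing
print(emission_loglik(frame, means, stds))  # state 1 still scores highest
```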

Relevance:

100.00%

Publisher:

Abstract:

Currently, most human action recognition systems are trained with feature sets that have no missing data. Unfortunately, the use of human pose estimation models to provide more descriptive features also entails an increased sensitivity to occlusions, meaning that incomplete feature information is unavoidable in realistic scenarios. To address this, our approach shifts the responsibility for dealing with occluded pose data away from the pose estimator and onto the action classifier. This allows the use of a simple, real-time (stick-figure) pose estimator that does not attempt to estimate the positions of limbs it cannot find quickly. The system tracks people via background subtraction and extracts the (possibly incomplete) pose skeleton from their silhouette. Hidden Markov Models modified to handle missing data are then used to successfully classify several human actions from the incomplete pose features.
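
A hedged sketch of such a front end is shown below: OpenCV's MOG2 background subtractor produces a silhouette, a very crude stub stands in for the stick-figure pose estimator, and any part it cannot locate is simply recorded as NaN for the downstream missing-data classifier. Everything beyond the OpenCV calls is a placeholder.

```python
# Background subtraction -> silhouette -> (incomplete) pose vector with NaNs.
import numpy as np
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2()

def pose_from_frame(frame):
    mask = subtractor.apply(frame)                       # foreground silhouette
    mask = cv2.medianBlur(mask, 5)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    pose = np.full(6, np.nan)                            # e.g. head/hands/feet (x, y)
    if contours:
        c = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(c)
        pose[0:2] = (x + w / 2, y)                       # crude "head" estimate
        # Limbs that cannot be located quickly are simply left as NaN; the
        # missing-data HMM is responsible for handling them.
    return pose

rng = np.random.default_rng(3)
for _ in range(10):                                      # stand-in video frames
    frame = (rng.random((120, 160, 3)) * 255).astype(np.uint8)
    print(pose_from_frame(frame))
```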

Relevance:

100.00%

Publisher:

Abstract:

In this paper, the application of a hybrid model combining the fuzzy min-max (FMM) neural network and the classification and regression tree (CART) to human activity recognition is presented. The hybrid FMM-CART model capitalizes on the merits of both FMM and CART in data classification and rule extraction. To evaluate the effectiveness of FMM-CART, experiments on two data sets related to human activity recognition are conducted. The accuracy rates obtained are higher than those reported in the literature. More importantly, practical rules in the form of a decision tree are extracted to provide explanation and justification for the predictions of FMM-CART. This outcome positively indicates the potential of FMM-CART for human activity recognition tasks.
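
The rule-extraction half of such a hybrid can be illustrated with scikit-learn alone: a CART tree is trained on synthetic activity features and its rules are printed in readable form. The fuzzy min-max stage is not reproduced here, and the feature names and data are hypothetical.

```python
# CART-style rule extraction: train a decision tree and print its rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
feature_names = ["mean_accel", "std_accel", "dominant_freq"]    # hypothetical
X = rng.random((200, 3))
y = (X[:, 0] + 0.5 * X[:, 2] > 0.9).astype(int)                 # toy activity label

cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(cart, feature_names=feature_names))           # human-readable rules
```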

Relevance:

100.00%

Publisher:

Abstract:

This paper describes the integration of missing observation data with hidden Markov models to create a framework that is able to segment and classify individual actions from a stream of human motion using an incomplete 3D human pose estimate. Based on this framework, a model is trained to automatically segment and classify an activity sequence into its constituent subactions during inference. This is achieved by introducing action labels into the observation vector and setting these labels as missing data during inference, thus forcing the system to infer the probability of each action label. Additionally, missing data provides recognition-level support for occlusions and imperfect silhouette segmentation, permitting the use of a fast (real-time) pose estimator that delegates the burden of handling undetected limbs to the action recognition system. Findings show that the use of missing data to segment activities is an accurate and elegant approach. Furthermore, action recognition can remain accurate even when almost half of the pose feature data is missing due to occlusions, since not all of the pose data is important all of the time.
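
The label-in-the-observation trick can be sketched in a few lines: during training the action label is appended to each pose feature vector as a one-hot block, and at inference time that block is marked missing, so inferring it is equivalent to classifying the action. The action set and feature sizes below are illustrative only.

```python
# Augmenting the observation vector with action labels that can be "missing".
import numpy as np

ACTIONS = ["walk", "wave", "sit"]

def augment(pose_features, action=None):
    """Append a one-hot action block, or NaNs if the action is unknown."""
    label_block = np.full(len(ACTIONS), np.nan)
    if action is not None:
        label_block[:] = 0.0
        label_block[ACTIONS.index(action)] = 1.0
    return np.concatenate([pose_features, label_block])

rng = np.random.default_rng(5)
train_frame = augment(rng.random(8), action="wave")    # labels visible in training
test_frame = augment(rng.random(8))                    # labels missing at inference
print(train_frame[-3:], test_frame[-3:])
```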

Relevance:

100.00%

Publisher:

Abstract:

This paper addresses the problem of markerless tracking of a human in full 3D with a high-dimensional (29D) body model. Most work in this area has focused on achieving accurate tracking in order to replace marker-based motion capture, but does so at the cost of relying on relatively clean observing conditions. This paper takes a different perspective, proposing a body-tracking model that is explicitly designed to handle real-world conditions such as occlusions by scene objects, failure recovery, long-term tracking, auto-initialisation, generalisation to different people and integration with action recognition. To achieve these goals, an action's motions are modelled with a variant of the hierarchical hidden Markov model. The model is quantitatively evaluated with several tests, including comparison to the annealed particle filter, tracking different people and tracking with a reduced resolution and frame rate.
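
The tracker itself is not reproduced here; as a loose point of reference only, the snippet below is a generic bootstrap particle filter over a 29-dimensional pose vector, the kind of sampling-based tracking that the annealed particle filter baseline exemplifies, with a toy motion and observation model.

```python
# Generic bootstrap particle filter over a 29D pose vector (illustrative only).
import numpy as np

rng = np.random.default_rng(6)
DIM, N = 29, 500                                       # 29D pose, 500 particles

def likelihood(particles, observation):
    """Toy observation model: Gaussian penalty on pose-observation distance."""
    d2 = np.sum((particles - observation) ** 2, axis=1)
    return np.exp(-0.5 * d2)

particles = rng.normal(0.0, 1.0, size=(N, DIM))
for t in range(20):                                    # stand-in observation stream
    observation = np.sin(0.1 * t) * np.ones(DIM) + rng.normal(0, 0.05, DIM)
    particles += rng.normal(0.0, 0.1, size=(N, DIM))   # diffusion / motion model
    w = likelihood(particles, observation)
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)                   # resampling step
    particles = particles[idx]
    print(t, np.round(particles.mean(axis=0)[:3], 2))  # tracked pose estimate
```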

Relevance:

100.00%

Publisher:

Abstract:

We describe a novel method for human activity segmentation and interpretation in surveillance applications based on Gabor filter-bank features. A complex human activity is modeled as a sequence of elementary human actions such as walking, running, jogging, boxing and hand-waving. Since the human silhouette can be modeled by a set of rectangles, the elementary human actions can be modeled as a sequence of sets of rectangles with different orientations and scales. The activity segmentation is based on Gabor filter-bank features and normalized spectral clustering. The feature trajectories of an action category are learnt from training example videos using Dynamic Time Warping. The combined segmentation and recognition processes are very efficient, as both algorithms share the same framework and the Gabor features computed for the former can be reused for the latter. We have also proposed a simple shadow detection technique to extract clean silhouettes, which is necessary for good action recognition accuracy. © 2008 IEEE.
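
A rough sketch of the front end is given below: a small Gabor filter bank (scikit-image kernels) is applied to each frame, pooled responses form a per-frame feature vector, and normalized spectral clustering groups frames into candidate activity segments. The filter parameters, pooling and cluster count are assumptions, and the DTW matching step is omitted.

```python
# Gabor filter-bank features per frame, followed by spectral clustering.
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel
from sklearn.cluster import SpectralClustering

def gabor_features(frame, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4), freqs=(0.1, 0.3)):
    feats = []
    for theta in thetas:
        for freq in freqs:
            k = np.real(gabor_kernel(frequency=freq, theta=theta))
            resp = fftconvolve(frame, k, mode="same")
            feats.extend([resp.mean(), resp.std()])      # pooled filter response
    return np.array(feats)

rng = np.random.default_rng(7)
frames = rng.random((30, 48, 48))                        # stand-in video
F = np.array([gabor_features(f) for f in frames])

# Normalised spectral clustering over frame features approximates the
# activity-segmentation step; each cluster is a candidate action segment.
labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                            n_neighbors=5, random_state=0).fit_predict(F)
print(labels)
```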

Relevance:

100.00%

Publisher:

Abstract:

Recognizing a class of movements as belonging to a "nominal" action category, such as walking, running, or throwing, is a fundamental human ability. Three experiments were undertaken to test the hypothesis that common ("prototypical") features of moving displays could be learned by observation. Participants viewed moving stick-figure displays resembling forearm flexion movements in the sagittal plane. Four displays (presentation displays) were first presented in which one or more movement dimensions were combined with two respective cues: direction (up, down), speed (fast, slow), and extent (long, short). Eight test displays were then shown, and the observer indicated whether each test display was like or unlike those previously seen. The results showed that without corrective feedback, a single cue (e.g., up or down) could be correctly recognized, on average, with the proportion correct between .66 and .87. When two cues were manipulated (e.g., up and slow), recognition accuracy remained high, ranging between .72 and .89. Three-cue displays were also easily identified. These results provide the first empirical demonstration of action-prototype learning for categories of human action and show how apparently complex kinematic patterns can be categorized in terms of common features or cues. It was also shown that the probability of correct recognition of kinematic properties was reduced when the set of four presentation displays was more variable with respect to their shared kinematic property, such as speed or amplitude. Finally, while not conclusive, the results (from two of the three experiments) did suggest that similarity (or "likeness") with respect to a common kinematic property (or properties) is more easily recognized than dissimilarity.

Relevance:

90.00%

Publisher:

Abstract:

In the last decade, efforts in spoken language processing have achieved significant advances; however, work on emotion recognition has not progressed as far, achieving only 50% to 60% accuracy. This is because a majority of researchers in this field have focused on the synthesis of emotional speech rather than on automating human emotion recognition. Many research groups have focused on how to improve the performance of the classifier used for emotion recognition, and little work has been done on data pre-processing, such as the extraction and selection of a set of specific acoustic features instead of using all the features available. Working with well-selected acoustic features does not delay the overall task; rather, it saves time and resources by removing irrelevant information and reducing the high-dimensional computation. In this paper, we developed an automatic feature selector based on the RF2TREE algorithm and the traditional C4.5 algorithm. RF2TREE, applied here, helped us address the problem of having too few data examples. The ensemble learning technique was applied to enlarge the original data set by building a bagged random forest to generate many virtual examples, and the new data set was then used to train a single decision tree, which selects the most efficient features to represent the speech signals for emotion recognition. Finally, the output of the selector was a set of specific acoustic features, produced by RF2TREE and a single decision tree.
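
The RF2TREE idea can be sketched with scikit-learn components: a bagged random forest labels perturbed "virtual" examples, the enlarged set trains a single decision tree, and the features used in that tree's splits form the selected subset. The data, feature names and sizes below are synthetic placeholders.

```python
# Forest-generated virtual examples -> single decision tree -> selected features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(8)
feature_names = [f"acoustic_{i}" for i in range(12)]        # hypothetical features
X = rng.random((60, 12))                                    # small emotion dataset
y = (X[:, 2] + X[:, 7] > 1.0).astype(int)                   # stand-in emotion label

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Generate virtual examples by perturbing the originals and letting the
# forest provide their labels, then enlarge the training set with them.
X_virtual = X[rng.integers(0, len(X), 500)] + rng.normal(0, 0.05, (500, 12))
y_virtual = forest.predict(X_virtual)
X_big = np.vstack([X, X_virtual])
y_big = np.concatenate([y, y_virtual])

# A single tree trained on the enlarged set; the features actually used in
# its splits form the selected acoustic feature subset.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_big, y_big)
used = {feature_names[i] for i in tree.tree_.feature if i >= 0}
print(sorted(used))
```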