155 results for action recognition
in Cambridge University Engineering Department Publications Database
Abstract:
Conventional hidden Markov models (HMMs) generally consist of a Markov chain observed through a linear map corrupted by additive noise. This general class of model has enjoyed a huge and diverse range of applications, for example, speech processing, biomedical signal processing and, more recently, quantitative finance. A lesser-known extension of this class is the so-called Factorial Hidden Markov Model (FHMM). FHMMs also have diverse applications, notably in machine learning, artificial intelligence and speech recognition [13, 17]. FHMMs extend the usual class of HMMs by supposing that the partially observed state process is a finite collection of distinct Markov chains, either statistically independent or dependent. There is also considerable current activity in applying collections of partially observed Markov chains to complex action recognition problems; see, for example, [6]. In this article we consider the Maximum Likelihood (ML) parameter estimation problem for FHMMs. Much of the extant literature concerning this problem presents parameter estimation schemes based on full-data log-likelihood EM algorithms. This approach can be slow to converge and often imposes heavy demands on computer memory. The latter point is particularly relevant for the class of FHMMs, where state space dimensions are relatively large. The contribution of this article is to develop new recursive formulae for a filter-based EM algorithm that can be implemented online. Our new formulae are equivalent ML estimators; however, they are purely recursive and so significantly reduce numerical complexity and memory requirements. A computer simulation is included to demonstrate the performance of our results. © Taylor & Francis Group, LLC.
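To make the model class concrete, the following is a minimal sketch of a recursive state filter for a factorial model built from two independent two-state chains. It is not the article's filter-based EM recursion: it shows only the generic predict-and-correct filtering step on which such schemes build. The transition matrices, observation levels, Gaussian noise assumption, and the names `filter_step` and `gaussian_likelihood` are all illustrative assumptions.

```python
import numpy as np

def filter_step(belief, transition, likelihood):
    """One recursive update: predict with the chain, then correct with the observation.

    transition[i, j] is assumed to be P(x_{k+1} = j | x_k = i).
    """
    predicted = transition.T @ belief
    unnormalised = predicted * likelihood
    return unnormalised / unnormalised.sum()

def gaussian_likelihood(y, levels, sigma):
    """Unnormalised likelihood of scalar y under additive Gaussian noise, per joint state."""
    return np.exp(-0.5 * ((y - levels) / sigma) ** 2)

# Two independent chains (illustrative numbers, not taken from the article).
A1 = np.array([[0.9, 0.1], [0.2, 0.8]])
A2 = np.array([[0.7, 0.3], [0.4, 0.6]])

# For independent chains the joint transition matrix is the Kronecker product;
# joint state index = 2 * (state of chain 1) + (state of chain 2).
A = np.kron(A1, A2)

# Linear observation map: each chain contributes a level and the levels add.
c1 = np.array([0.0, 1.0])
c2 = np.array([0.0, 2.0])
levels = np.add.outer(c1, c2).ravel()

# Only the current belief is stored, so memory use does not grow with the
# number of observations processed.
belief = np.full(4, 0.25)
for y in [3.0, 3.0, 3.0]:
    belief = filter_step(belief, A, gaussian_likelihood(y, levels, sigma=0.3))
```

The joint state space grows as the product of the individual chain sizes, which is why memory demands become a concern for FHMMs with many or large chains.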
Abstract:
This work addresses the challenging problem of unconstrained 3D human pose estimation (HPE) from a novel perspective. Existing approaches struggle to operate in realistic applications, mainly due to their scene-dependent priors, such as background segmentation and multi-camera networks, which restrict their use in unconstrained environments. We therefore present a framework which applies action detection and 2D pose estimation techniques to infer 3D poses in an unconstrained video. Action detection offers spatiotemporal priors to 3D human pose estimation by both recognising and localising actions in space-time. Instead of holistic features, e.g. silhouettes, we leverage the flexibility of a deformable part model to detect 2D body parts as features for estimating 3D poses. A new unconstrained pose dataset has been collected to demonstrate the feasibility of our method, which showed promising results, significantly outperforming the relevant state-of-the-art methods. © 2013 IEEE.
Abstract:
The Chinese language is based on characters which are syllabic in nature. Since languages have syllabotactic rules which govern the construction of syllables and their allowed sequences, Chinese character sequence models can be used as a first-level approximation of allowed syllable sequences. N-gram character sequence models were trained on 4.3 billion characters. Characters are used as a first-level recognition unit with multiple pronunciations per character. For comparison, the CU-HTK Mandarin word-based system was used to recognize words, which were then converted to character sequences. The character-only system error rates for one-best recognition were slightly worse than those of word-based character recognition. However, combining the two systems using log-linear combination gives better results than either system separately. An equally weighted combination gave consistent CER gains of 0.1-0.2% absolute over the word-based standard system. Copyright © 2009 ISCA.
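The log-linear combination mentioned above can be sketched as a weighted sum of per-hypothesis log scores followed by renormalisation. This is a minimal illustration under stated assumptions, not the CU-HTK implementation: the function name `log_linear_combine`, the dictionary representation of hypotheses, and the example scores are all hypothetical.

```python
import math

def log_linear_combine(scores_a, scores_b, weight_a=0.5, weight_b=0.5):
    """Combine two systems' log scores for the hypotheses they share.

    Each argument maps a hypothesis to a log probability. The result maps
    each shared hypothesis to a renormalised combined log probability.
    """
    shared = scores_a.keys() & scores_b.keys()
    combined = {
        hyp: weight_a * scores_a[hyp] + weight_b * scores_b[hyp]
        for hyp in shared
    }
    log_norm = math.log(sum(math.exp(score) for score in combined.values()))
    return {hyp: score - log_norm for hyp, score in combined.items()}

# Made-up scores for two candidate character sequences (not data from the paper).
character_system = {"hyp_A": math.log(0.6), "hyp_B": math.log(0.4)}
word_system = {"hyp_A": math.log(0.3), "hyp_B": math.log(0.7)}

combined = log_linear_combine(character_system, word_system)
best = max(combined, key=combined.get)
```

With equal weights, as in the equally weighted combination reported above, each system contributes the same exponent to the combined score; in practice the weights could instead be tuned on held-out data.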