987 results for Face Detection


Relevance:

60.00%

Publisher:

Abstract:

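An automatic camera tracking system uses sensor-based detection or digital image processing to control a camera so that it automatically tracks and films a moving person or a specific object. It combines technologies from several fields, including computer network communication, computer vision, and sensor networks, and comprises a range of software and hardware. It can be widely applied in security video surveillance, conference recording, classroom and lecture recording, performance recording, and similar settings, and demand for it has grown considerably in recent years. Face detection is a subproblem of digital image processing and computer vision: the process of determining whether any faces are present in an input image and, if so, determining the position, size, and even the pose and orientation of every face, as a basis for further control and processing. It is one of the most popular topics in computer vision and has a very wide range of applications. At present no mature automatic tracking camera system is in use in security or general civilian video applications; the similar systems that exist cannot fully meet the requirements of automatically tracking and filming people, and their performance is still far from fully automated, unattended tracking. This thesis uses a computer network as the medium for image data and control information, and uses face detection and face tracking techniques from computer vision as the basis for automatic person identification. It designs algorithms and strategies for person detection, person tracking, automatic camera tracking, estimation of person position and movement, and simple motion estimation, and it develops a complete automatic camera tracking software system that supports remote network access and control and automatically tracks and films a person's face without manual operation or adjustment. The system is suited to general-purpose security video surveillance and automatic classroom recording. Compared with an existing similar system that performs relatively well, the infrared video surveillance system, this system has several clear advantages: while achieving good intelligence, real-time performance, response speed, and accuracy, it simplifies the equipment and greatly reduces cost; it requires no reconfiguration when deployed on site and is easy to maintain afterwards; it can track position continuously; and it has excellent stability.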
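
As a rough illustration of how a detected face position might drive the camera, the sketch below converts the offset between a face bounding box and the frame center into pan and tilt corrections. The function name, the proportional gain, and the dead-zone value are assumptions made for illustration; the abstract does not describe the actual control strategy.

```python
def pan_tilt_correction(face_box, frame_size, gain=0.1, dead_zone=0.05):
    """Return (pan, tilt) corrections in normalized units from a face box.

    face_box   -- (x, y, width, height) of the detected face, in pixels
    frame_size -- (frame_width, frame_height), in pixels
    gain       -- hypothetical proportional gain
    dead_zone  -- normalized offset below which the camera does not move
    """
    x, y, w, h = face_box
    frame_w, frame_h = frame_size

    # Offset of the face center from the frame center, normalized to [-1, 1].
    dx = ((x + w / 2) - frame_w / 2) / (frame_w / 2)
    dy = ((y + h / 2) - frame_h / 2) / (frame_h / 2)

    # Ignore small offsets so the camera does not jitter around the target.
    pan = gain * dx if abs(dx) > dead_zone else 0.0
    tilt = gain * dy if abs(dy) > dead_zone else 0.0
    return pan, tilt
```

A face already centered in the frame yields no correction, while a face to the right of center yields a positive pan value proportional to its offset.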

Relevance:

60.00%

Publisher:

Abstract:

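As the first stage of an automatic face recognition system, face detection plays a very important role. To address problems common to most current face detection methods, such as difficult classifier training and heavy detection computation, a hybrid face detection method is proposed. The method consists of two classifier stages: the first is a coarse classifier that mainly filters out most non-face regions; the second is the core classifier, which applies a nonlinear SVM to the regions that pass the first stage in order to detect faces. Experiments on the CMU database show that the method achieves a high face detection rate and a substantially higher detection speed.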
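
The two-stage structure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coarse stage here is a hypothetical variance test, and the second stage evaluates the decision function of an already trained RBF-kernel SVM whose support vectors and coefficients are supplied by the caller.

```python
import math

def coarse_filter(window, min_variance=0.01):
    """Stage 1 (assumed): cheap test that rejects flat, low-variance windows."""
    mean = sum(window) / len(window)
    variance = sum((v - mean) ** 2 for v in window) / len(window)
    return variance >= min_variance

def rbf_svm_decision(window, support_vectors, coefficients, bias, gamma=1.0):
    """Stage 2: decision value of a trained RBF-kernel SVM.

    coefficients holds alpha_i * y_i for each support vector.
    """
    total = bias
    for sv, coef in zip(support_vectors, coefficients):
        sq_dist = sum((a - b) ** 2 for a, b in zip(window, sv))
        total += coef * math.exp(-gamma * sq_dist)
    return total

def detect(windows, support_vectors, coefficients, bias, gamma=1.0):
    """Return indices of windows accepted by both stages."""
    hits = []
    for i, window in enumerate(windows):
        if not coarse_filter(window):
            continue  # rejected cheaply; the SVM is never evaluated
        if rbf_svm_decision(window, support_vectors, coefficients, bias, gamma) > 0:
            hits.append(i)
    return hits
```

The point of the design is that the expensive kernel evaluation runs only on the windows that survive the cheap first stage.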

Relevance:

60.00%

Publisher:

Abstract:

Object detection can be challenging when the object class exhibits large variations. One commonly-used strategy is to first partition the space of possible object variations and then train separate classifiers for each portion. However, with continuous spaces the partitions tend to be arbitrary since there are no natural boundaries (for example, consider the continuous range of human body poses). In this paper, a new formulation is proposed, where the detectors themselves are associated with continuous parameters, and reside in a parameterized function space. There are two advantages of this strategy. First, a-priori partitioning of the parameter space is not needed; the detectors themselves are in a parameterized space. Second, the underlying parameters for object variations can be learned from training data in an unsupervised manner. In profile face detection experiments, at a fixed false alarm number of 90, our method attains a detection rate of 75% vs. 70% for the method of Viola-Jones. In hand shape detection, at a false positive rate of 0.1%, our method achieves a detection rate of 99.5% vs. 98% for partition based methods. In pedestrian detection, our method reduces the miss detection rate by a factor of three at a false positive rate of 1%, compared with the method of Dalal-Triggs.
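
The comparisons above are stated as detection rates at a fixed false positive rate. As a minimal sketch of how such an operating point might be computed from classifier scores (the function name and the score convention are assumptions, and this is only the evaluation step, not the proposed detector):

```python
def detection_rate_at_fpr(pos_scores, neg_scores, target_fpr):
    """Detection rate at the lowest threshold whose false positive rate
    does not exceed target_fpr. A sample counts as detected if its score
    is strictly greater than the threshold."""
    # Number of negatives allowed to score above the threshold.
    allowed = int(target_fpr * len(neg_scores))
    ranked = sorted(neg_scores, reverse=True)
    # Using the (allowed + 1)-th highest negative score as the threshold
    # leaves at most `allowed` negatives strictly above it.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    detected = sum(1 for s in pos_scores if s > threshold)
    return detected / len(pos_scores)
```

Sweeping `target_fpr` over a range of values traces out the ROC curve from which figures like those quoted above are read.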

Relevance:

60.00%

Publisher:

Abstract:

Data registration refers to a series of techniques for matching or bringing similar objects or datasets together into alignment. These techniques enjoy widespread use in a diverse variety of applications, such as video coding, tracking, object and face detection and recognition, surveillance and satellite imaging, medical image analysis and structure from motion. Registration methods are as numerous as their manifold uses, from pixel level and block or feature based methods to Fourier domain methods.

This book is focused on providing algorithms and image and video techniques for registration and quality performance metrics. The authors provide various assessment metrics for measuring registration quality alongside analyses of registration techniques, introducing and explaining both familiar and state-of-the-art registration methodologies used in a variety of targeted applications.

Key features:
- Provides a state-of-the-art review of image and video registration techniques, allowing readers to develop an understanding of how well the techniques perform by using specific quality assessment criteria
- Addresses a range of applications from familiar image and video processing domains to satellite and medical imaging among others, enabling readers to discover novel methodologies with utility in their own research
- Discusses quality evaluation metrics for each application domain with an interdisciplinary approach from different research perspectives

Relevance:

60.00%

Publisher:

Abstract:

Face detection and recognition should be complemented by recognition of facial expression, for example for social robots which must react to human emotions. Our framework is based on two multi-scale representations in cortical area V1: keypoints at eyes, nose and mouth are grouped for face detection [1]; lines and edges provide information for face recognition [2].

Relevance:

60.00%

Publisher:

Abstract:

Data registration refers to a series of techniques for matching or bringing similar objects or datasets together into alignment. These techniques enjoy widespread use in a diverse variety of applications, such as video coding, tracking, object and face detection and recognition, surveillance and satellite imaging, medical image analysis and structure from motion. Registration methods are as numerous as their manifold uses, from pixel level and block or feature based methods to Fourier domain methods. This book is focused on providing algorithms and image and video techniques for registration and quality performance metrics. The authors provide various assessment metrics for measuring registration quality alongside analyses of registration techniques, introducing and explaining both familiar and state-of-the-art registration methodologies used in a variety of targeted applications.

Relevance:

60.00%

Publisher:

Abstract:

In this text, we present two stereo-based head tracking techniques along with a fast 3D model acquisition system. The first tracking technique is a robust implementation of stereo-based head tracking designed for interactive environments with uncontrolled lighting. We integrate fast face detection and drift reduction algorithms with a gradient-based stereo rigid motion tracking technique. Our system can automatically segment and track a user's head under large rotation and illumination variations. Precision and usability of this approach are compared with previous tracking methods for cursor control and target selection in both desktop and interactive room environments. The second tracking technique is designed to improve the robustness of head pose tracking for fast movements. Our iterative hybrid tracker combines constraints from the ICP (Iterative Closest Point) algorithm and normal flow constraint. This new technique is more precise for small movements and noisy depth than ICP alone, and more robust for large movements than the normal flow constraint alone. We present experiments which test the accuracy of our approach on sequences of real and synthetic stereo images. The 3D model acquisition system we present quickly aligns intensity and depth images, and reconstructs a textured 3D mesh. 3D views are registered with shape alignment based on our iterative hybrid tracker. We reconstruct the 3D model using a new Cubic Ray Projection merging algorithm which takes advantage of a novel data structure: the linked voxel space. We present experiments to test the accuracy of our approach on 3D face modelling using real-time stereo images.

Relevance:

60.00%

Publisher:

Abstract:

This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. 
This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.

Relevance:

60.00%

Publisher:

Abstract:

This paper describes a trainable system capable of tracking faces and facial features like eyes and nostrils and estimating basic mouth features such as degrees of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches, this system is entirely trained using examples and does not rely on a priori (hand-crafted) models of facial features based on optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stages of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.
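
To make the Haar wavelet representation more concrete, the sketch below computes one Haar-like rectangle response using an integral image (summed-area table). The integral image is a common implementation choice assumed here for illustration; the abstract does not say how the wavelet coefficients are computed, and the function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_vertical_edge(ii, top, left, height, width):
    """Left half minus right half: responds to vertical intensity edges."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

Evaluating such responses at many positions and scales gives an overcomplete set of coefficients, from which a sparse subset can be fed to a classifier or regressor.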

Relevance:

60.00%

Publisher:

Abstract:

The ability to detect faces in images is of critical ecological significance. It is a pre-requisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore information thresholds required for different levels of performance. Our experimental results provide lower bounds on image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance.

Relevance:

60.00%

Publisher:

Abstract:

A new approach to mammographic mass detection is presented in this paper. Although different algorithms have been proposed for such a task, most of them are application dependent. In contrast, our approach makes use of a kindred topic in computer vision adapted to our particular problem. In this sense, we translate the eigenfaces approach for face detection/classification problems to mass detection. Two different databases were used to show the robustness of the approach. The first one consisted of a set of 160 regions of interest (RoIs) extracted from the MIAS database, 40 of them with confirmed masses and the rest normal tissue. The second set of RoIs was extracted from the DDSM database, and contained 196 RoIs containing masses and 392 with normal, but suspicious regions. Initial results demonstrate the feasibility of using such an approach, with performance comparable to other algorithms and the advantage of being a more general, simple and cost-effective approach.

Relevance:

60.00%

Publisher:

Abstract:

Parkinson’s disease (PD) is an increasingly prevalent neurological disorder in an aging society. The motor and non-motor symptoms of PD advance with the disease progression and occur in varying frequency and duration. To establish the full extent of a patient’s condition, repeated assessments are necessary to adjust medical prescription. In clinical studies, symptoms are assessed using the unified Parkinson’s disease rating scale (UPDRS). On one hand, the subjective rating using UPDRS relies on clinical expertise. On the other hand, it requires the physical presence of patients in clinics, which implies high logistical costs. Another limitation of clinical assessment is that the observation in hospital may not accurately represent a patient’s situation at home. For such reasons, the practical frequency of tracking PD symptoms may under-represent the true time scale of PD fluctuations and may result in an overall inaccurate assessment. Current technologies for at-home PD treatment are based on data-driven approaches for which the interpretation and reproduction of results are problematic. The overall objective of this thesis is to develop and evaluate unobtrusive computer methods for enabling remote monitoring of patients with PD. It investigates novel signal and image processing techniques, based on first-principles and data-driven models, for extraction of clinically useful information from audio recordings of speech (in texts read aloud) and video recordings of gait and finger-tapping motor examinations. The aim is to map between PD symptom severities estimated using novel computer methods and the clinical ratings based on UPDRS part-III (motor examination). A web-based test battery system consisting of self-assessment of symptoms and motor function tests was previously constructed for a touch screen mobile device.
A comprehensive speech framework has been developed for this device to analyze text-dependent running speech by: (1) extracting novel signal features that are able to represent PD deficits in each individual component of the speech system, (2) mapping between clinical ratings and feature estimates of speech symptom severity, and (3) classifying between UPDRS part-III severity levels using speech features and statistical machine learning tools. A novel speech processing method called cepstral separation difference showed stronger ability to classify between speech symptom severities as compared to existing features of PD speech. In the case of finger tapping, the recorded videos of rapid finger tapping examination were processed using a novel computer-vision (CV) algorithm that extracts symptom information from video-based tapping signals using motion analysis of the index finger, and which incorporates a face detection module for signal calibration. This algorithm was able to discriminate between UPDRS part-III severity levels of finger tapping with high classification rates. Further analysis was performed on novel CV based gait features constructed using a standard human model to discriminate between a healthy gait and a Parkinsonian gait. The findings of this study suggest that the symptom severity levels in PD can be discriminated with high accuracies by involving a combination of first-principle (features) and data-driven (classification) approaches. The processing of audio and video recordings on one hand allows remote monitoring of speech, gait and finger-tapping examinations by the clinical staff. On the other hand, the first-principles approach eases the understanding of symptom estimates for clinicians. We have demonstrated that the selected features of speech, gait and finger tapping were able to discriminate between symptom severity levels, as well as between healthy controls and PD patients, with high classification rates.
The findings support suitability of these methods to be used as decision support tools in the context of PD assessment.

Relevance:

60.00%

Publisher:

Abstract:

The aim of this work is to devise an effective method for static summarization of home video sequences. Based on the premise that the user watching a summary is interested in people-related (how many, who, emotional state) or activity-related aspects, we formulate a novel approach to video summarization that specifically exposes the relevant video frames that make these content-spotting tasks possible. Existing approaches work on low-level features and often produce summaries that do not appeal to the viewer, because of the semantic gap between low-level features and high-level concepts. In contrast, our approach is driven by various utility functions (identity count, identity recognition, emotion recognition, activity recognition, sense of space) that use the results of face detection, face clustering, shot clustering and within-cluster frame alignment. The summarization problem is then treated as the problem of extracting the set of key frames that have the maximum combined utility.
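
As a minimal sketch of the final selection step, the code below scores each frame by a weighted sum of per-frame utility values and keeps the top k frames. The weighted-sum combination, the function name, and the dictionary layout are assumptions for illustration; the abstract does not specify how the utilities are combined, and treating frames independently is a simplification.

```python
def select_key_frames(frame_utilities, weights, k):
    """Pick the k frames with the highest weighted utility, in temporal order.

    frame_utilities -- list of dicts mapping utility name -> score, one per frame
    weights         -- dict mapping utility name -> weight (hypothetical)
    """
    def combined(scores):
        # Utilities without a weight contribute nothing.
        return sum(weights.get(name, 0.0) * value for name, value in scores.items())

    ranked = sorted(range(len(frame_utilities)),
                    key=lambda i: combined(frame_utilities[i]),
                    reverse=True)
    # Return the chosen frame indices in their original temporal order.
    return sorted(ranked[:k])
```

A fuller treatment would likely account for redundancy between selected frames, for example by penalizing frames from the same shot cluster.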