851 resultados para computer vision face recognition detection voice recognition sistemi biometrici iOS
Resumo:
In this paper a novel method for an application of digital image processing, Edge Detection is developed. The contemporary Fuzzy logic, a key concept of artificial intelligence helps to implement the fuzzy relative pixel value algorithms and helps to find and highlight all the edges associated with an image by checking the relative pixel values and thus provides an algorithm to abridge the concepts of digital image processing and artificial intelligence. Exhaustive scanning of an image using the windowing technique takes place which is subjected to a set of fuzzy conditions for the comparison of pixel values with adjacent pixels to check the pixel magnitude gradient in the window. After the testing of fuzzy conditions the appropriate values are allocated to the pixels in the window under testing to provide an image highlighted with all the associated edges.
Resumo:
The seminal multiple view stereo benchmark evaluations from Middlebury and by Strecha et al. have played a major role in propelling the development of multi-view stereopsis methodology. Although seminal, these benchmark datasets are limited in scope with few reference scenes. Here, we try to take these works a step further by proposing a new multi-view stereo dataset, which is an order of magnitude larger in number of scenes and with a significant increase in diversity. Specifically, we propose a dataset containing 80 scenes of large variability. Each scene consists of 49 or 64 accurate camera positions and reference structured light scans, all acquired by a 6-axis industrial robot. To apply this dataset we propose an extension of the evaluation protocol from the Middlebury evaluation, reflecting the more complex geometry of some of our scenes. The proposed dataset is used to evaluate the state of the art multiview stereo algorithms of Tola et al., Campbell et al. and Furukawa et al. Hereby we demonstrate the usability of the dataset as well as gain insight into the workings and challenges of multi-view stereopsis. Through these experiments we empirically validate some of the central hypotheses of multi-view stereopsis, as well as determining and reaffirming some of the central challenges.
Resumo:
In this paper, we use the quantum Jensen-Shannon divergence as a means of measuring the information theoretic dissimilarity of graphs and thus develop a novel graph kernel. In quantum mechanics, the quantum Jensen-Shannon divergence can be used to measure the dissimilarity of quantum systems specified in terms of their density matrices. We commence by computing the density matrix associated with a continuous-time quantum walk over each graph being compared. In particular, we adopt the closed form solution of the density matrix introduced in Rossi et al. (2013) [27,28] to reduce the computational complexity and to avoid the cumbersome task of simulating the quantum walk evolution explicitly. Next, we compare the mixed states represented by the density matrices using the quantum Jensen-Shannon divergence. With the quantum states for a pair of graphs described by their density matrices to hand, the quantum graph kernel between the pair of graphs is defined using the quantum Jensen-Shannon divergence between the graph density matrices. We evaluate the performance of our kernel on several standard graph datasets from both bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed quantum graph kernel.
Resumo:
Graph-based representations have been used with considerable success in computer vision in the abstraction and recognition of object shape and scene structure. Despite this, the methodology available for learning structural representations from sets of training examples is relatively limited. In this paper we take a simple yet effective Bayesian approach to attributed graph learning. We present a naïve node-observation model, where we make the important assumption that the observation of each node and each edge is independent of the others, then we propose an EM-like approach to learn a mixture of these models and a Minimum Message Length criterion for components selection. Moreover, in order to avoid the bias that could arise with a single estimation of the node correspondences, we decide to estimate the sampling probability over all the possible matches. Finally we show the utility of the proposed approach on popular computer vision tasks such as 2D and 3D shape recognition. © 2011 Springer-Verlag.
Resumo:
Given the importance of color processing in computer vision and computer graphics, estimating and rendering illumination spectral reflectance of image scenes is important to advance the capability of a large class of applications such as scene reconstruction, rendering, surface segmentation, object recognition, and reflectance estimation. Consequently, this dissertation proposes effective methods for reflection components separation and rendering in single scene images. Based on the dichromatic reflectance model, a novel decomposition technique, named the Mean-Shift Decomposition (MSD) method, is introduced to separate the specular from diffuse reflectance components. This technique provides a direct access to surface shape information through diffuse shading pixel isolation. More importantly, this process does not require any local color segmentation process, which differs from the traditional methods that operate by aggregating color information along each image plane. ^ Exploiting the merits of the MSD method, a scene illumination rendering technique is designed to estimate the relative contributing specular reflectance attributes of a scene image. The image feature subset targeted provides a direct access to the surface illumination information, while a newly introduced efficient rendering method reshapes the dynamic range distribution of the specular reflectance components over each image color channel. This image enhancement technique renders the scene illumination reflection effectively without altering the scene’s surface diffuse attributes contributing to realistic rendering effects. ^ As an ancillary contribution, an effective color constancy algorithm based on the dichromatic reflectance model was also developed. This algorithm selects image highlights in order to extract the prominent surface reflectance that reproduces the exact illumination chromaticity. This evaluation is presented using a novel voting scheme technique based on histogram analysis. ^ In each of the three main contributions, empirical evaluations were performed on synthetic and real-world image scenes taken from three different color image datasets. The experimental results show over 90% accuracy in illumination estimation contributing to near real world illumination rendering effects. ^
Resumo:
The main objective of this work was to enable the recognition of human gestures through the development of a computer program. The program created captures the gestures executed by the user through a camera attached to the computer and sends it to the robot command referring to the gesture. They were interpreted in total ve gestures made by human hand. The software (developed in C ++) widely used the computer vision concepts and open source library OpenCV that directly impact the overall e ciency of the control of mobile robots. The computer vision concepts take into account the use of lters to smooth/blur the image noise reduction, color space to better suit the developer's desktop as well as useful information for manipulating digital images. The OpenCV library was essential in creating the project because it was possible to use various functions/procedures for complete control lters, image borders, image area, the geometric center of borders, exchange of color spaces, convex hull and convexity defect, plus all the necessary means for the characterization of imaged features. During the development of the software was the appearance of several problems, as false positives (noise), underperforming the insertion of various lters with sizes oversized masks, as well as problems arising from the choice of color space for processing human skin tones. However, after the development of seven versions of the control software, it was possible to minimize the occurrence of false positives due to a better use of lters combined with a well-dimensioned mask size (tested at run time) all associated with a programming logic that has been perfected over the construction of the seven versions. After all the development is managed software that met the established requirements. After the completion of the control software, it was observed that the overall e ectiveness of the various programs, highlighting in particular the V programs: 84.75 %, with VI: 93.00 % and VII with: 94.67 % showed that the nal program performed well in interpreting gestures, proving that it was possible the mobile robot control through human gestures without the need for external accessories to give it a better mobility and cost savings for maintain such a system. The great merit of the program was to assist capacity in demystifying the man set/machine therefore uses an easy and intuitive interface for control of mobile robots. Another important feature observed is that to control the mobile robot is not necessary to be close to the same, as to control the equipment is necessary to receive only the address that the Robotino passes to the program via network or Wi-Fi.
Resumo:
The police use both subjective (i.e. police staff) and automated (e.g. face recognition systems) methods for the completion of visual tasks (e.g person identification). Image quality for police tasks has been defined as the image usefulness, or image suitability of the visual material to satisfy a visual task. It is not necessarily affected by any artefact that may affect the visual image quality (i.e. decrease fidelity), as long as these artefacts do not affect the relevant useful information for the task. The capture of useful information will be affected by the unconstrained conditions commonly encountered by CCTV systems such as variations in illumination and high compression levels. The main aim of this thesis is to investigate aspects of image quality and video compression that may affect the completion of police visual tasks/applications with respect to CCTV imagery. This is accomplished by investigating 3 specific police areas/tasks utilising: 1) the human visual system (HVS) for a face recognition task, 2) automated face recognition systems, and 3) automated human detection systems. These systems (HVS and automated) were assessed with defined scene content properties, and video compression, i.e. H.264/MPEG-4 AVC. The performance of imaging systems/processes (e.g. subjective investigations, performance of compression algorithms) are affected by scene content properties. No other investigation has been identified that takes into consideration scene content properties to the same extend. Results have shown that the HVS is more sensitive to compression effects in comparison to the automated systems. In automated face recognition systems, `mixed lightness' scenes were the most affected and `low lightness' scenes were the least affected by compression. In contrast the HVS for the face recognition task, `low lightness' scenes were the most affected and `medium lightness' scenes the least affected. For the automated human detection systems, `close distance' and `run approach' are some of the most commonly affected scenes. Findings have the potential to broaden the methods used for testing imaging systems for security applications.
Resumo:
This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and non-verbal interaction capabilities. The work is motivated by the aim to provide technology with competences in perceiving and producing the emotional and non-verbal behaviours required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting which seems particularly suited for the study of emotional and non-verbal behaviour, since it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on non-verbal capabilities without having to address at the same time the challenges of spoken language understanding, task modeling etc. We first summarise three prototype versions of the SAL scenario, in which the behaviour of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served the purpose of verifying the effectiveness of the SAL scenario and allowed us to collect data required for building system components for analysing and synthesising the respective behaviours. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behaviour, dialogue management, and synthesis of speaker and listener behaviour of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Since the system is designed for modularity and reuse, and since it is publicly available, the SAL system has potential as a joint research tool in the affective computing research community.
Resumo:
This theoretical paper attempts to define some of the key components and challenges required to create embodied conversational agents that can be genuinely interesting conversational partners. Wittgenstein's argument concerning talking lions emphasizes the importance of having a shared common ground as a basis for conversational interactions. Virtual bats suggests that-for some people at least-it is important that there be a feeling of authenticity concerning a subjectively experiencing entity that can convey what it is like to be that entity. Electric sheep reminds us of the importance of empathy in human conversational interaction and that we should provide a full communicative repertoire of both verbal and non-verbal components if we are to create genuinely engaging interactions. Also we may be making the task more difficult rather than easy if we leave out non-verbal aspects of communication. Finally, analogical peacocks highlights the importance of between minds alignment and establishes a longer term goal of being interesting, creative, and humorous if an embodied conversational agent is to be truly an engaging conversational partner. Some potential directions and solutions to addressing these issues are suggested.
Resumo:
Objective
Pedestrian detection under video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing attention in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on the semantic features under the visual attention mechanism.
Method
The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined using the proper weights obtained from experiments to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flowing to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model.
Result
Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video.
Conclusion
This paper proposes a novel pedestrian method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, the pedestrian targets can be detected through focus of attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.
Resumo:
[EN]We investigate mechanisms which can endow the computer with the ability of describing a human face by means of computer vision techniques. This is a necessary requirement in order to develop HCI approaches which make the user feel himself/herself perceived. This paper describes our experiences considering gender, race and the presence of moustache and glasses. This is accomplished comparing, on a set of 6000 facial images, two di erent face representation approaches: Principal Components Analysis (PCA) and Gabor lters. The results achieved using a Support Vector Machine (SVM) based classi er are promising and particularly better for the second representation approach.
Resumo:
AIRES, Kelson R. T. ; ARAÚJO, Hélder J. ; MEDEIROS, Adelardo A. D. . Plane Detection from Monocular Image Sequences. In: VISUALIZATION, IMAGING AND IMAGE PROCESSING, 2008, Palma de Mallorca, Spain. Proceedings..., Palma de Mallorca: VIIP, 2008