851 results for computer vision face recognition detection voice recognition sistemi biometrici iOS
Abstract:
Local features are used in many computer vision tasks, including visual object categorization, content-based image retrieval, and object recognition, to mention a few. Local features are points, blobs, or regions in images that are extracted using a local feature detector. To make use of the extracted local features, the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis, the performance of local feature detectors and descriptors is evaluated for the object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched for using random image pairs of the same class. The goal of this thesis is to determine which detector and descriptor methods perform best for this task in terms of detector repeatability and descriptor matching rate.
Abstract:
During a conversational exchange, language is supported by non-verbal communication, which plays a central role in human social behaviour by providing feedback and managing synchronization, thereby supporting the content and meaning of speech. Indeed, 55% of the message is conveyed by facial expressions, whereas only 7% is due to the linguistic message and 38% to paralanguage. Information about a person's emotional state is generally inferred from facial attributes. However, there are no real measurement instruments specifically dedicated to this type of behaviour. In computer vision, the interest lies more in developing systems for the automatic analysis of prototypical facial expressions, for applications in human-machine interaction, meeting video analysis, security, and even clinical settings. In the present research, to capture such observable indicators, we attempt to implement a system capable of building a consistent and relatively exhaustive source of visual information, one able to distinguish the features of a face and their deformations, thus making it possible to recognize the presence or absence of a particular facial action. A review of the surveyed techniques led us to explore two different approaches. The first concerns the appearance aspect, in which gradient orientation is used to derive a dense representation of facial attributes. Apart from facial representation, the main difficulty for a system intended to be general is the implementation of a generic model that is independent of the person's identity and of the geometry and size of faces.
The approach we propose relies on building a prototypical reference frame from a SIFT-flow registration, whose superiority over a conventional alignment based on eye positions is demonstrated in this thesis. In a second approach, we use a geometric model in which the facial primitives are represented by Gabor filtering. Motivated by the fact that facial expressions are not only ambiguous and inconsistent from one person to another but also dependent on the context itself, we present through this approach a personalized facial expression recognition system whose overall performance depends directly on the performance of tracking a set of facial feature points. This tracking is carried out by a modified form of a disparity estimation technique involving the Gabor phase. In this thesis, we propose a redefinition of the confidence measure and introduce an iterative and conditional displacement estimation procedure, which together offer more robust tracking than the original methods.
Abstract:
Speech processing and the consequent recognition are important areas of Digital Signal Processing, since speech allows people to communicate more naturally and efficiently. In this work, a speech recognition system is developed for recognizing digits in Malayalam. To recognize speech, features must be extracted from it, so the feature extraction method plays an important role in speech recognition. Here, front-end processing for extracting the features is performed using two wavelet-based methods, namely Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). A Naive Bayes classifier is used for classification. With the Naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD an accuracy of 80.7%. This paper aims to devise a new feature extraction method that improves the recognition accuracy. To this end, a new method called Discrete Wavelet Packet Decomposition (DWPD) is introduced, which utilizes the hybrid features of both DWT and WPD. The performance of this new approach is evaluated; with the Naive Bayes classifier, it produced an improved recognition accuracy of 86.2%.
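As an illustration of the kind of wavelet front end this abstract describes, the following minimal sketch computes sub-band energies from a multi-level DWT. It is not the authors' implementation: the abstract does not name the wavelet or the exact features, so the Haar wavelet, the energy features, and the helper names `haar_dwt` and `dwt_energy_features` are all assumptions made for the example.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_energy_features(signal, levels):
    """Energy of each detail band, followed by the energy of the final approximation band.

    Assumption: energy per sub-band is used as the feature; the source does not
    specify which features were fed to the classifier.
    """
    features = []
    approx = list(signal)
    for _ in range(levels):
        # Plain DWT: only the approximation branch is decomposed further.
        approx, detail = haar_dwt(approx)
        features.append(sum(c * c for c in detail))
    features.append(sum(c * c for c in approx))
    return features


# Hypothetical 8-sample frame standing in for a short speech segment.
frame = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
features = dwt_energy_features(frame, levels=2)
print(features)
```

A wavelet packet decomposition would differ in that the detail branch is split again at every level as well, yielding a full binary tree of sub-bands instead of the single chain above.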
Abstract:
In this paper we present a component-based person detection system that is capable of detecting frontal, rear, and near-side views of people, as well as partially occluded persons in cluttered scenes. The framework described here for people is easily applied to other objects as well. The motivation for developing a component-based approach is twofold: first, to enhance the performance of person detection systems on frontal and rear views of people, and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. Data classification is handled by several support vector machine classifiers arranged in two layers. This architecture is known as Adaptive Combination of Classifiers (ACC). The system performs very well and is capable of detecting people even when not all components of a person are found. The performance of the system is significantly better than that of a full-body person detector designed along similar lines. This suggests that the improved performance is due to the component-based approach and the ACC data classification structure.
Abstract:
Analysis of human behaviour through visual information has been a highly active research topic in the computer vision community. This was previously achieved via images from a conventional camera, but recently depth sensors have made a new type of data available. This survey starts by explaining the advantages of depth imagery, then describes the new sensors that are available to obtain it. In particular, the Microsoft Kinect has made high-resolution real-time depth cheaply available. The main published research on the use of depth imagery for analysing human activity is reviewed. Much of the existing work focuses on body part detection and pose estimation. A growing research area addresses the recognition of human actions. The publicly available datasets that include depth imagery are listed, as are the software libraries that can acquire it from a sensor. This survey concludes by summarising the current state of work on this topic, and pointing out promising future research directions.
Abstract:
This paper presents an automatic method to detect and classify weathered aggregates by assessing changes in color and texture. The method extracts aggregate features from images and classifies them automatically based on surface characteristics. The concept of entropy is used to extract features from digital images. An analysis of the use of this concept is presented, and two classification approaches based on neural network architectures are proposed. The classification performance of the proposed approaches is compared with the results obtained by other algorithms commonly used for classification. The results confirm that the presented method strongly supports the detection of weathered aggregates.
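To make the idea of an entropy-based image feature concrete, here is a minimal sketch that computes the Shannon entropy of the gray-level distribution of a patch. The abstract does not say which entropy definition or which image regions the authors used, so the function `shannon_entropy` and the two toy patches are assumptions for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the gray-level distribution of a pixel sequence."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical patches: a perfectly uniform surface and one with four equally
# frequent gray levels (a crude stand-in for a more textured surface).
uniform_patch = [128] * 16
textured_patch = [0, 64, 128, 255] * 4

uniform_entropy = shannon_entropy(uniform_patch)
textured_entropy = shannon_entropy(textured_patch)
print(uniform_entropy, textured_entropy)
```

A uniform patch carries no gray-level uncertainty, so its entropy is zero, while a patch that spreads its pixels over more gray levels scores higher. A feature of this kind could then be passed to a classifier such as the neural networks mentioned in the abstract.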
Abstract:
This paper proposes an improved voice activity detection (VAD) algorithm using wavelets and a support vector machine (SVM) for the European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via the wavelet filter bank and the wavelet-based pitch/tone detection algorithm. The wavelet filter bank divides the input speech signal into several frequency bands so that the signal power level in each sub-band can be calculated. In addition, the background noise level can be estimated in each sub-band by using the wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. The proposed algorithm then applies the SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech signals. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database under different noise conditions show that the proposed algorithm delivers VAD performance considerably superior to that of AMR-NB VAD Options 1 and 2 and AMR-WB VAD. © 2009 Elsevier Ltd. All rights reserved.
Abstract:
In this work, a new approach for supervised pattern recognition is presented which improves the learning algorithm of the Optimum-Path Forest (OPF) classifier and is centered on the detection and elimination of outliers in the training set. Outliers are identified by a penalty computed for each training sample from the number of false positive and false negative classifications attributable to it. This approach enhances the accuracy of OPF while also reducing classification time, at the expense of a slight increase in training time. © 2010 Springer-Verlag.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Many methods based on biometrics such as fingerprint, face, iris, and retina have been proposed for person identification. However, for deceased individuals, such biometric measurements are not available. In such cases, parts of the human skeleton can be used for identification, such as dental records, thorax, vertebrae, shoulder, and frontal sinus. Prior investigations have established that the radiographic pattern of the frontal sinus is highly variable and unique to every individual. This has motivated proposals for measuring the frontal sinus pattern, obtained from X-ray films, for skeletal identification. This paper presents a frontal sinus recognition method for human identification based on the Image Foresting Transform and shape context. Experimental results (ERR = 5.82%) have shown the effectiveness of the proposed method.
Abstract:
The aims of this study were to investigate work conditions, to estimate the prevalence of Computer Vision Syndrome, and to describe the risk factors associated with it among operators at two call centers in São Paulo (n = 476). The methods comprised a quantitative cross-sectional observational study and an ergonomic work analysis, using work observation, interviews, and questionnaires. The case definition was the presence of one or more specific ocular symptoms reported as occurring always, often, or sometimes. Multiple logistic regression models were built using the forward stepwise likelihood method, retaining variables with significance levels below 5% (p < 0.05). The operators were mainly female and young (from 15 to 24 years old). The call center was open 24 hours a day, and operators worked 36 hours per week, with break time of 21 to 35 minutes per day. The symptoms reported were eye fatigue (73.9%), "weight" in the eyes (68.2%), "burning" eyes (54.6%), tearing (43.9%), and weakening of vision (43.5%). The prevalence of Computer Vision Syndrome was 54.6%. The associations found were: being female (OR 2.6, 95% CI 1.6 to 4.1), lack of recognition at work (OR 1.4, 95% CI 1.1 to 1.8), organization of work in the call center (OR 1.4, 95% CI 1.1 to 1.7), and high demand at work (OR 1.1, 95% CI 1.0 to 1.3). Organizational and psychosocial factors at work should be included in programs to prevent visual syndrome among call center operators.
Abstract:
In this paper, we focus on gender recognition in challenging large-scale scenarios. First, we review the results reported in the literature for this problem on large datasets and select the hardest dataset currently available: The Images of Groups. Second, we study the extraction of features from the face and its local context to improve recognition accuracy. Different descriptors, resolutions, and classifiers are studied, surpassing previously reported results and reaching an accuracy of 89.8%.
Abstract:
In this work, focus measures based on local binary patterns are presented. Local binary patterns (LBP) have been introduced in computer vision tasks such as texture classification and face recognition. In applications where recognition is based on LBP, computation can be saved by also using LBP in the focus measures. The behavior of the proposed measures is studied to test whether they fulfill the properties of focus measures, and a comparison with some well-known focus measures is then carried out in different scenarios.
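For readers unfamiliar with local binary patterns, the sketch below computes the basic 8-neighbour LBP code that such measures build on. It does not reproduce the focus measures proposed in the paper, which the abstract does not define. The neighbour ordering, the `>=` comparison, and the helper names `lbp_code` and `lbp_image` are assumptions chosen for the example.

```python
def lbp_code(image, row, col):
    """Basic 8-neighbour LBP code for the pixel at (row, col).

    Each neighbour contributes one bit, set when the neighbour is at least
    as bright as the centre pixel.
    """
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """LBP codes for all interior pixels of a 2D list of gray levels."""
    return [
        [lbp_code(image, r, c) for c in range(1, len(image[0]) - 1)]
        for r in range(1, len(image) - 1)
    ]


# Hypothetical 3x3 patch with a single interior pixel (value 50).
patch = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
codes = lbp_image(patch)
print(codes)
```

Because the codes are needed anyway in an LBP-based recognition pipeline, a focus measure derived from them could reuse this computation rather than requiring a separate pass over the image, which is the saving the abstract refers to.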
Abstract:
Recognizing a gesture, tracking it, and identifying it is a complex and multifaceted operation. In recent years, with the widespread arrival of increasingly sophisticated interactive interfaces, approaches to human-machine interaction have broadened. The common goal is to achieve “transparent” communication between the user and the computer, which must interpret human gestures by means of mathematical algorithms. Gesture recognition is a way for the machine to begin to understand human body language. This discipline studies new modes of interaction between these two elements and comprises two broad objectives: (a) tracking the movements of a particular limb; (b) recognizing that trajectory as an identifying gesture. Each of these two points encompasses many research areas, because many approaches have been proposed over the years. It is not a matter of simple image capture: a supporting framework, sometimes a very elaborate one, is needed, in which the raw data coming from the camera undergo advanced filtering and algorithmic processing so as to turn raw information into usable and reliable data. Gesture recognition technology is as significant as the introduction of touch interfaces on smartphones. Industry has now begun to produce devices able to offer users a new experience that is as natural as possible. From video games, to television controlled with small gestures, to the biomedical field, a new generation of devices is being introduced whose uses are countless, and for each application domain the specific characteristics must be studied carefully in order to produce something new and effective. This thesis aims to contribute to this discipline.
To date, a great many applications and their associated devices aim to capture broad movements: the gesture is performed with most of the body and occupies a substantial region of space. This thesis instead proposes an approach in which the movements to be tracked and recognized are made “in the small”. It deals with gestures classified as fine, in which the hand movements are made in front of the body, in the chest area, for example. The possible application domains are many; in this work, the craft domain was chosen and adopted.