873 resultados para Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI
Resumo:
The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them to manage images and find specific images. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. The system is unsupervised, and hence, it does not need known images to train the system which needs to be manually obtained. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for an image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on the visual appearance of the images. Images that have similar image regions are grouped together in the same category. Thus, for example, images which contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.
Resumo:
Este proyecto consiste en el diseño y desarrollo de un plug-in que permita usar el sistema de procesado de imagen OpenCV desde el sistema operativo OpenDomo OS.
Resumo:
VisualDomo tiene como objetivo la creación de una aplicación multiplataforma para dispositivos móviles que permita controlar de manera visual los dispositivos ODControl de la empresa OpenDomo Services SL. El proyecto está desarrollado utilizando HTML, CSS y JavaScript con el soporte de Cordova.
Resumo:
El projecte desenvolupa una aplicació visual per a dispositius mòbils basada en PhoneGap - Cordova (HTML + CSS + JavaScript) per al control del dispositiu ODControl de l'empresa OpenDomo Services.
Resumo:
The large and growing number of digital images is making manual image search laborious. Only a fraction of the images contain metadata that can be used to search for a particular type of image. Thus, the main research question of this thesis is whether it is possible to learn visual object categories directly from images. Computers process images as long lists of pixels that do not have a clear connection to high-level semantics which could be used in the image search. There are various methods introduced in the literature to extract low-level image features and also approaches to connect these low-level features with high-level semantics. One of these approaches is called Bag-of-Features which is studied in the thesis. In the Bag-of-Features approach, the images are described using a visual codebook. The codebook is built from the descriptions of the image patches using clustering. The images are described by matching descriptions of image patches with the visual codebook and computing the number of matches for each code. In this thesis, unsupervised visual object categorisation using the Bag-of-Features approach is studied. The goal is to find groups of similar images, e.g., images that contain an object from the same category. The standard Bag-of-Features approach is improved by using spatial information and visual saliency. It was found that the performance of the visual object categorisation can be improved by using spatial information of local features to verify the matches. However, this process is computationally heavy, and thus, the number of images must be limited in the spatial matching, for example, by using the Bag-of-Features method as in this study. Different approaches for saliency detection are studied and a new method based on the Hessian-Affine local feature detector is proposed. The new method achieves comparable results with current state-of-the-art. The visual object categorisation performance was improved by using foreground segmentation based on saliency information, especially when the background could be considered as clutter.
Resumo:
Many industrial applications need object recognition and tracking capabilities. The algorithms developed for those purposes are computationally expensive. Yet ,real time performance, high accuracy and small power consumption are essential measures of the system. When all these requirements are combined, hardware acceleration of these algorithms becomes a feasible solution. The purpose of this study is to analyze the current state of these hardware acceleration solutions, which algorithms have been implemented in hardware and what modifications have been done in order to adapt these algorithms to hardware.
Resumo:
Kandidaatintyö tehtiin osana PulpVision-tutkimusprojektia, jonka tarkoituksena on kehittää kuvapohjaisia laskenta- ja luokittelumetodeja sellun laaduntarkkailuun paperin valmistuksessa. Tämän tutkimusprojektin osana on aiemmin kehitetty metodi, jolla etsittiin kaarevia rakenteita kuvista, ja tätä metodia hyödynnettiin kuitujen etsintään kuvista. Tätä metodia käytettiin lähtökohtana kandidaatintyölle. Työn tarkoituksena oli tutkia, voidaanko erilaisista kuitukuvista laskettujen piirteiden avulla tunnistaa kuvassa olevien kuitujen laji. Näissä kuitukuvissa oli kuituja neljästä eri puulajista ja yhdestä kasvista. Nämä lajit olivat akasia, koivu, mänty, eukalyptus ja vehnä. Jokaisesta lajista valittiin 100 kuitukuvaa ja nämä kuvat jaettiin kahteen ryhmään, joista ensimmäistä käytettiin opetusryhmänä ja toista testausryhmänä. Opetusryhmän avulla jokaiselle kuitulajille laskettiin näitä kuvaavia piirteitä, joiden avulla pyrittiin tunnistamaan testausryhmän kuvissa olevat kuitulajit. Nämä kuvat oli tuottanut CEMIS-Oulu (Center for Measurement and Information Systems), joka on mittaustekniikkaan keskittynyt yksikkö Oulun yliopistossa. Yksittäiselle opetusryhmän kuitukuvalle laskettiin keskiarvot ja keskihajonnat kolmesta eri piirteestä, jotka olivat pituus, leveys ja kaarevuus. Lisäksi laskettiin, kuinka monta kuitua kuvasta löydettiin. Näiden piirteiden eri yhdistelmien avulla testattiin tunnistamisen tarkkuutta käyttämällä k:n lähimmän naapurin menetelmää ja Naiivi Bayes -luokitinta testausryhmän kuville. Testeistä saatiin lupaavia tuloksia muun muassa pituuden ja leveyden keskiarvoja käytettäessä saavutettiin jopa noin 98 %:n tarkkuus molemmilla algoritmeilla. Tunnistuksessa kuitujen keskimäärinen pituus vaikutti olevan kuitukuvia parhaiten kuvaava piirre. Käytettyjen algoritmien välillä ei ollut suurta vaihtelua tarkkuudessa. Testeissä saatujen tulosten perusteella voidaan todeta, että kuitukuvien tunnistaminen on mahdollista. Testien perusteella kuitukuvista tarvitsee laskea vain kaksi piirrettä, joilla kuidut voidaan tunnistaa tarkasti. Käytetyt lajittelualgoritmit olivat hyvin yksinkertaisia, mutta ne toimivat testeissä hyvin.
Resumo:
Forty students from regular, grade five classes were divided into two groups of twenty, a good reader group and a' poor reader group, on the basis. of their reading scores on Canadian Achievement Tests. .The subjects took. part in four experimental conditions iM which they .learned lists of pronounceable and unprono~nceable pseudowords, some with semantic referents, and responded to questions designed tci test visual perceptu~l learning and lexical ·and semantic association learning. It' was hypothesized "that the good reade~ group would be able to make use of graphemic and phonemic redundancy patterns in order to improv~·visuSl perceptual learning and lexical and semantic association lea~ningto a greater extent. than would .the poor reader gr6up. The data supported this hypothesis, and also indicated that, although the poor readers were less adept at using familiar sound and letter patterns, they were more dependent on· such pa~terns as an aid to visual recognition memory and semantic recall than were the good readers. It wa.s postulated that poor readers are in a double- ~ . bind situatio~ of having to choose between using weak graphemic-semantic associations or gr~pheme-phoneme associations which are also weak and which have hindered them in developing automaticity in. reading.
Resumo:
This lexical decision study with eye tracking of Japanese two-kanji-character words investigated the order in which a whole two-character word and its morphographic constituents are activated in the course of lexical access, the relative contributions of the left and the right characters in lexical decision, the depth to which semantic radicals are processed, and how nonlinguistic factors affect lexical processes. Mixed-effects regression analyses of response times and subgaze durations (i.e., first-pass fixation time spent on each of the two characters) revealed joint contributions of morphographic units at all levels of the linguistic structure with the magnitude and the direction of the lexical effects modulated by readers’ locus of attention in a left-to-right preferred processing path. During the early time frame, character effects were larger in magnitude and more robust than radical and whole-word effects, regardless of the font size and the type of nonwords. Extending previous radical-based and character-based models, we propose a task/decision-sensitive character-driven processing model with a level-skipping assumption: Connections from the feature level bypass the lower radical level and link up directly to the higher character level.
Resumo:
Lors d'une intervention conversationnelle, le langage est supporté par une communication non-verbale qui joue un rôle central dans le comportement social humain en permettant de la rétroaction et en gérant la synchronisation, appuyant ainsi le contenu et la signification du discours. En effet, 55% du message est véhiculé par les expressions faciales, alors que seulement 7% est dû au message linguistique et 38% au paralangage. L'information concernant l'état émotionnel d'une personne est généralement inférée par les attributs faciaux. Cependant, on ne dispose pas vraiment d'instruments de mesure spécifiquement dédiés à ce type de comportements. En vision par ordinateur, on s'intéresse davantage au développement de systèmes d'analyse automatique des expressions faciales prototypiques pour les applications d'interaction homme-machine, d'analyse de vidéos de réunions, de sécurité, et même pour des applications cliniques. Dans la présente recherche, pour appréhender de tels indicateurs observables, nous essayons d'implanter un système capable de construire une source consistante et relativement exhaustive d'informations visuelles, lequel sera capable de distinguer sur un visage les traits et leurs déformations, permettant ainsi de reconnaître la présence ou absence d'une action faciale particulière. Une réflexion sur les techniques recensées nous a amené à explorer deux différentes approches. La première concerne l'aspect apparence dans lequel on se sert de l'orientation des gradients pour dégager une représentation dense des attributs faciaux. Hormis la représentation faciale, la principale difficulté d'un système, qui se veut être général, est la mise en œuvre d'un modèle générique indépendamment de l'identité de la personne, de la géométrie et de la taille des visages. La démarche qu'on propose repose sur l'élaboration d'un référentiel prototypique à partir d'un recalage par SIFT-flow dont on démontre, dans cette thèse, la supériorité par rapport à un alignement conventionnel utilisant la position des yeux. Dans une deuxième approche, on fait appel à un modèle géométrique à travers lequel les primitives faciales sont représentées par un filtrage de Gabor. Motivé par le fait que les expressions faciales sont non seulement ambigües et incohérentes d'une personne à une autre mais aussi dépendantes du contexte lui-même, à travers cette approche, on présente un système personnalisé de reconnaissance d'expressions faciales, dont la performance globale dépend directement de la performance du suivi d'un ensemble de points caractéristiques du visage. Ce suivi est effectué par une forme modifiée d'une technique d'estimation de disparité faisant intervenir la phase de Gabor. Dans cette thèse, on propose une redéfinition de la mesure de confiance et introduisons une procédure itérative et conditionnelle d'estimation du déplacement qui offrent un suivi plus robuste que les méthodes originales.
Resumo:
Any automatically measurable, robust and distinctive physical characteristic or personal trait that can be used to identify an individual or verify the claimed identity of an individual, referred to as biometrics, has gained significant interest in the wake of heightened concerns about security and rapid advancements in networking, communication and mobility. Multimodal biometrics is expected to be ultra-secure and reliable, due to the presence of multiple and independent—verification clues. In this study, a multimodal biometric system utilising audio and facial signatures has been implemented and error analysis has been carried out. A total of one thousand face images and 250 sound tracks of 50 users are used for training the proposed system. To account for the attempts of the unregistered signatures data of 25 new users are tested. The short term spectral features were extracted from the sound data and Vector Quantization was done using K-means algorithm. Face images are identified based on Eigen face approach using Principal Component Analysis. The success rate of multimodal system using speech and face is higher when compared to individual unimodal recognition systems
Resumo:
A method is presented for the visual analysis of objects by computer. It is particularly well suited for opaque objects with smoothly curved surfaces. The method extracts information about the object's surface properties, including measures of its specularity, texture, and regularity. It also aids in determining the object's shape. The application of this method to a simple recognition task ??e recognition of fruit ?? discussed. The results on a more complex smoothly curved object, a human face, are also considered.
Resumo:
We present a statistical image-based shape + structure model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
Resumo:
A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has expounded the use of object centered models, such as representations of the objects' 3D structures in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast to this is another proposal which suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. The 'object model', according to this idea, is simply a collection of the sample views encountered during training. Given that object centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in the field of computational vision. Furthermore, since human recognition performance seems remarkably robust in the face of imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest the use of a view-based strategy by the human visual system even when it has the opportunity of constructing and using object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.