873 resultados para Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A Histologia, o estudo de tecidos, é uma das áreas fundamentais da Biologia que permitiu enormes avanços científicos. Sendo uma tarefa exigente, meticulosa e demorada, será importante aproveitar a existência de ferramentas e algoritmos computacionais no seu auxílio, tornando o processo mais rápido e possibilitando a descoberta de informação que poderá não estar visível à partida. Esta dissertação tem como principal objectivo averiguar se um animal foi ou não sujeito à ingestão de um xenobiótico. Com esse objectivo em vista, utilizaram-se técnicas de processamento e segmentação de imagem aplicadas a imagens de tecido renal de ratos saudáveis e ratos que ingeriram o xenobiótico. Destas imagens extraíram-se inúmeras características do corpúsculo renal que após serem analisadas através de vários algoritmos de classificação mostraram ser possível saber se o animal ingeriu ou não o xenobiótico, com um reduzido grau de incerteza. ABSTRACT: Histology, the study of tissues, is one of the key areas of Biology that has allowed huge advances in Science. Being a demanding, meticulous and time consuming task, it is important to use the existence of computational tools and algorithms in its aid, making the process faster and enabling the discovery of information that may not be initially visible. The main goal of this thesis is to ascertain if an animal was subjected or not to the ingestion of a xenobiotic. With this in mind, were used image processing and segmentation techniques applied on images of kidney tissue from healthy rats and rats that ingested the xenobiotic. From these images were extracted several features of renal glomeruli that after being analyzed by various classification algorithms had shown to be possible to know, with an acceptable degree of certainty, if the animal ingested or not the xenobiotic.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A fabricação de moldes é um pilar da indústria metalomecânica nacional. A qualidade patenteada pelos moldes de fabrico nacional é reconhecida mundialmente. A fabricação de um molde é sempre um novo desafio, dadas as especificidades de cada peça e à complexidade geométrica que normalmente apresentam. Este trabalho teve por principal objetivo proceder ao projeto, fabricação e ajuste de um molde para uma peça plástica complexa, destinada à indústria automóvel, a qual, pela sua complexidade geométrica, elevada relação de área versus espessura e grande número de reforços bastante delgados, exige cuidados especiais no momento de se tomarem decisões quanto às melhores metodologias a adotar na sua fabricação, com vista a garantir a qualidade exigida pelo cliente final. O projeto foi desenvolvido e levado a cabo, permitindo constatar que algumas das opções tomadas na fase do projeto foram as mais adequadas, exigindo pouco trabalho de afinação no final e permitindo a extração de peças com elevada repetibilidade de características e a qualidade requerida pelo cliente.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Letras - FCLAS

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Il presente elaborato ha per oggetto la tematica del Sé, in particolar modo il Sé corporeo. Il primo capitolo illustrerà la cornice teorica degli studi sul riconoscimento del Sé corporeo, affrontando come avviene l’elaborazione del proprio corpo e del proprio volto rispetto alle parti corporee delle altre persone. Il secondo capitolo descriverà uno studio su soggetti sani che indaga l’eccitabilità della corteccia motoria nei processi di riconoscimento sé/altro. I risultati mostrano un incremento dell’eccitabilità corticospinale dell’emisfero destro in seguito alla presentazione di stimoli propri (mano e cellulare), a 600 e 900 ms dopo la presentazione dello stimolo, fornendo informazioni sulla specializzazione emisferica substrati neurali e sulla temporalità dei processi che sottendono all’elaborazione del sé. Il terzo capitolo indagherà il contributo del movimento nel riconoscimento del Sé corporeo in soggetti sani ed in pazienti con lesione cerebrale destra. Le evidenze mostrano come i pazienti, che avevano perso la facilitazione nell’elaborare le parti del proprio corpo statiche, presentano tale facilitazione in seguito alla presentazione di parti del proprio corpo in movimento. Il quarto capitolo si occuperà dello sviluppo del sé corporeo in bambini con sviluppo atipico, affetti da autismo, con riferimento al riconoscimento di posture emotive proprie ed altrui. Questo studio mostra come alcuni processi legati al sé possono essere preservati anche in bambini affetti da autismo. Inoltre i dati mostrano che il riconoscimento del sé corporeo è modulato dalle emozioni espresse dalle posture corporee sia in bambini con sviluppo tipico che in bambini affetti da autismo. Il quinto capitolo sarà dedicato al ruolo dei gesti nel riconoscimento del corpo proprio ed altrui. I dati di questo studio evidenziano come il contenuto comunicativo dei gesti possa facilitare l’elaborazione di parti del corpo altrui. Nella discussione generale i risultati dei diversi studi verranno considerati all’interno della loro cornice teorica.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El objetivo general de este trabajo es el correcto funcionamiento de un sistema de reconocimiento facial compuesto de varios módulos, implementados en distintos lenguajes. Uno de dichos módulos está escrito en Python y se encargarí de determinar el género del rostro o rostros que aparecen en una imagen o en un fotograma de una secuencia de vídeo. El otro módulo, escrito en C++, llevará a cabo el reconocimiento de cada una de las partes de la cara (ojos, nariz, boca) y la orientación hacia la que está posicionada (derecha, izquierda). La primera parte de esta memoria corresponde a la reimplementación de todas las partes de un analizador facial, que constituyen el primer módulo antes mencionado. Estas partes son un analizador, compuesto a su vez por un reconocedor (Tracker) y un procesador (Processor), y una clase visor para poder visualizar los resultados. Por un lado, el reconocedor o "Tracker.es el encargado de encontrar la cara y sus partes, que serán pasadas al procesador o Processor, que analizará la cara obtenida por el reconocedor y determinará su género. Este módulo estaba dise~nado completamente en C y OpenCV 1.0, y ha sido reescrito en Python y OpenCV 2.4. Y en la segunda parte, se explica cómo realizar la comunicación entre el primer módulo escrito en Python y el segundo escrito en C++. Además, se analizarán diferentes herramientas para poder ejecutar código C++ desde programas Python. Dichas herramientas son PyBindGen, Cython y Boost. Dependiendo de las necesidades del programador se contará cuál de ellas es más conveniente utilizar en cada caso. Por último, en el apartado de resultados se puede observar el funcionamiento del sistema con la integración de los dos módulos, y cómo se muestran por pantalla los puntos de interés, el género y la orientación del rostro utilizando imágenes tomadas con una cámara web.---ABSTRACT---The main objective of this document is the proper functioning of a facial recognition system composed of two modules, implemented in diferent languages. One of these modules is written in Python, and his purpose is determining the gender of the face or faces in an image or a frame of a video sequence. The other module is written in C ++ and it will perform the recognition of each of the parts of the face (eyes, nose , mouth), and the head pose (right, left).The first part of this document corresponds to the reimplementacion of all components of a facial analyzer , which constitute the first module that I mentioned before. These parts are an analyzer , composed by a tracke) and a processor, and a viewer to display the results. The tracker function is to find and its parts, which will be passed to the processor, which will analyze the face obtained by the tracker. The processor will determine the face's gender. This module was completely written in C and OpenCV 1.0, and it has been rewritten in Python and OpenCV 2.4. And in the second part, it explains how to comunicate two modules, one of them written in Python and the other one written in C++. Furthermore, it talks about some tools to execute C++ code from Python scripts. The tools are PyBindGen, Cython and Boost. It will tell which one of those tools is better to use depend on the situation. Finally, in the results section it is possible to see how the system works with the integration of the two modules, and how the points of interest, the gender an the head pose are displayed on the screen using images taken from a webcam.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes a systematic research about free software solutions and techniques for art imagery computer recognition problem.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Numerous psychophysical experiments have shown an important role for attentional modulations in vision. Behaviorally, allocation of attention can improve performance in object detection and recognition tasks. At the neural level, attention increases firing rates of neurons in visual cortex whose preferred stimulus is currently attended to. However, it is not yet known how these two phenomena are linked, i.e., how the visual system could be "tuned" in a task-dependent fashion to improve task performance. To answer this question, we performed simulations with the HMAX model of object recognition in cortex [45]. We modulated firing rates of model neurons in accordance with experimental results about effects of feature-based attention on single neurons and measured changes in the model's performance in a variety of object recognition tasks. It turned out that recognition performance could only be improved under very limited circumstances and that attentional influences on the process of object recognition per se tend to display a lack of specificity or raise false alarm rates. These observations lead us to postulate a new role for the observed attention-related neural response modulations.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Local features are used in many computer vision tasks including visual object categorization, content-based image retrieval and object recognition to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of extracted local features the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis the performance of local feature detectors and descriptors is evaluated for object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched using random image pairs of a same class. The goal of this thesis is to find out what are the best detector and descriptor methods for such task in terms of detector repeatability and descriptor matching rate.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Convolutional Neural Networks (CNN) have become the state-of-the-art methods on many large scale visual recognition tasks. For a lot of practical applications, CNN architectures have a restrictive requirement: A huge amount of labeled data are needed for training. The idea of generative pretraining is to obtain initial weights of the network by training the network in a completely unsupervised way and then fine-tune the weights for the task at hand using supervised learning. In this thesis, a general introduction to Deep Neural Networks and algorithms are given and these methods are applied to classification tasks of handwritten digits and natural images for developing unsupervised feature learning. The goal of this thesis is to find out if the effect of pretraining is damped by recent practical advances in optimization and regularization of CNN. The experimental results show that pretraining is still a substantial regularizer, however, not a necessary step in training Convolutional Neural Networks with rectified activations. On handwritten digits, the proposed pretraining model achieved a classification accuracy comparable to the state-of-the-art methods.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A more natural, intuitive, user-friendly, and less intrusive Human–Computer interface for controlling an application by executing hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches in every frame of a video sequence potential hand poses using a binary Support Vector Machine classifier and Local Binary Patterns as feature vectors. These detections are employed as input of a tracker to generate a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories, and compute a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform the gesture recognition. The VS-LBP is a novel video descriptor that constitutes one of the most important contributions of the paper, which is able to provide much richer spatio-temporal information than other existing approaches in the state of the art with a manageable computational cost. Excellent results have been obtained outperforming other approaches of the state of the art.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The McGurk effect, in which auditory [ba] dubbed onto [go] lip movements is perceived as da or tha, was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4(1)/(2)-month-olds were tested in a habituation-test paradigm, in which 2 an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba] visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [deltaa] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [deltaa], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants [da] and [deltaa] were no more familiar than [ba]. These results are consistent with infants'perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. (C) 2004 Wiley Periodicals, Inc.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. With more and more collected training data, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVM. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). This proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck to process large scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many group training examples. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach to use comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category dependent similarity regularization. Experiments on hundreds of categories show that our method can make significant improvement for categories with few or even no positive examples.