873 results for Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The digital television system adopted by Brazil allows broadcast TV signals to be received on mobile devices and laptops. This paper analyzes issues related to the transmission of free, open TV signals for reception on portable and mobile devices enabled by the 1SEG system. We evaluate user behavior, TV schedules, prime time, and experiences in Japan.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Perceptual User Interfaces (PUIs) aim at facilitating human-computer interaction with the aid of human-like capacities (computer vision, speech recognition, etc.). In PUIs, the human face is a central element, since it conveys not only identity but also other important information, particularly with respect to the user’s mood or emotional state. This paper describes both a face detector and a smile detector for PUIs. Both are suitable for real-time interaction.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis aimed to verify the role of the superior colliculus (SC) in human spatial orienting. To do so, subjects performed two experimental tasks that have been shown to involve SC activation in animals: a multisensory integration task (Experiments 1 and 2) and a visual target selection task (Experiment 3). To investigate this topic in humans, we took advantage of the neurophysiological finding that retinal S-cones do not send projections to the collicular and magnocellular pathways. In Experiment 1, subjects performed a simple reaction-time task in which they were required to respond as quickly as possible to any sensory stimulus (visual, auditory or bimodal audio-visual). The visual stimulus could be an S-cone stimulus (invisible to the collicular and magnocellular pathways) or a long-wavelength stimulus (visible to the SC). Results showed that with S-cone stimuli, the reaction time (RT) distribution was explained simply by probability summation, indicating that the redundant auditory and visual channels are independent. Conversely, with red long-wavelength stimuli, visible to the SC, the RT distribution reflected nonlinear neural summation, which constitutes evidence of integration of different sensory information. We also demonstrate that when audio-visual (AV) stimuli were presented at fixation, so that the spatial orienting component of the task was reduced, neural summation was possible regardless of stimulus color. Together, these findings support a pivotal role of the SC in mediating multisensory spatial integration in humans when behavior involves spatial orienting responses. Since previous studies have shown an anatomical asymmetry of fibres projecting to the SC from the hemiretinas, Experiment 2 investigated temporo-nasal asymmetry in multisensory integration. To do so, subjects performed the same task as in Experiment 1 monocularly. When spatially coincident audio-visual stimuli were visible to the SC (i.e. red stimuli), the redundant target effect (RTE) depended on a neural coactivation mechanism, suggesting an integration of multisensory information. When using stimuli invisible to the SC (i.e. purple stimuli), the RTE depended only on a simple statistical facilitation effect, in which the two sensory stimuli were processed by independent channels. Finally, we demonstrate that the multisensory integration effect was stronger for stimuli presented to the temporal hemifield than to the nasal hemifield. Taken together, these findings suggest that multisensory stimulation can be differentially effective depending on specific stimulus parameters. Experiment 3 was aimed at verifying the role of the SC in target selection by using a color-oddity search task, comprising stimuli either visible or invisible to the collicular and magnocellular pathways. Subjects were required to make a saccade toward a target that could be presented alone or with three distractors of another color (either S-cone or long-wavelength). With S-cone distractors, invisible to the SC, localization errors were similar to those observed in the distractor-free condition. Conversely, with long-wavelength distractors, visible to the SC, saccadic localization error and variability were significantly greater than in either the distractor-free condition or the S-cone distractor condition. Our results clearly indicate that the SC plays a direct role in visual target selection in humans. Overall, our results indicate that the SC plays an important role in mediating spatial orienting responses, both when covert orienting (Experiments 1 and 2) and when overt orienting (Experiment 3) is required.
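
One common way to separate probability summation from neural summation in reaction-time data is a bound usually called the race model inequality: if the auditory and visual channels are independent, the probability of a bimodal response by time t cannot exceed the sum of the two unimodal probabilities. The sketch below only illustrates that check; it is not the analysis code of the thesis, and the function names and reaction times are hypothetical.

```python
from bisect import bisect_right


def ecdf(samples, t):
    """Empirical probability that a reaction time is <= t."""
    ordered = sorted(samples)
    return bisect_right(ordered, t) / len(ordered)


def race_model_violations(rt_a, rt_v, rt_av, probes):
    """Return the probe times at which the bimodal CDF exceeds the
    summed unimodal CDFs (capped at 1), i.e. where independent
    channels cannot account for the speed-up."""
    violations = []
    for t in probes:
        bound = min(1.0, ecdf(rt_a, t) + ecdf(rt_v, t))
        if ecdf(rt_av, t) > bound:
            violations.append(t)
    return violations


# Hypothetical reaction times in milliseconds (illustrative only).
rt_auditory = [310, 325, 340, 355, 370, 390]
rt_visual = [330, 345, 360, 375, 395, 410]
rt_bimodal = [250, 260, 275, 290, 320, 350]

probes = range(240, 421, 20)
print(race_model_violations(rt_auditory, rt_visual, rt_bimodal, probes))
```

An empty result is compatible with probability summation; violations at the fast end of the distribution are the pattern usually read as coactivation.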

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In recent years, Deep Learning techniques have been shown to perform well on a large variety of problems in both Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition, pushing forward the concepts of automatic feature extraction and unsupervised learning in general. However, despite its strong success in both science and business, deep learning has its own limitations. It is often questioned whether such techniques are merely brute-force statistical approaches and whether they can only work in the context of High Performance Computing with very large amounts of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and whether they can scale well in terms of "intelligence". The dissertation focuses on trying to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two very different deep learning techniques on the aforementioned task: the Convolutional Neural Network (CNN) and Hierarchical Temporal Memory (HTM). They stand for two different approaches and points of view under the broad umbrella of deep learning, and are well suited to understanding and pointing out the strengths and weaknesses of each. The CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed in large corporations such as Google and Facebook for solving face recognition and image auto-tagging problems.
HTM, on the other hand, is known as a new emerging paradigm and a mainly unsupervised method that is more biologically inspired. It draws on insights from the computational neuroscience community in order to incorporate concepts such as time, context and attention, which are typical of the human brain, into the learning process. Ultimately, the thesis sets out to show that in certain cases, with a smaller quantity of data, HTM can outperform CNN.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The synchronization of dynamic multileaf collimator (DMLC) response with respiratory motion is critical to ensure the accuracy of DMLC-based four dimensional (4D) radiation delivery. In practice, however, a finite time delay (response time) between the acquisition of tumor position and multileaf collimator response necessitates predictive models of respiratory tumor motion to synchronize radiation delivery. Predicting a complex process such as respiratory motion introduces geometric errors, which have been reported in several publications. However, the dosimetric effect of such errors on 4D radiation delivery has not yet been investigated. Thus, our aim in this work was to quantify the dosimetric effects of geometric error due to prediction under several different conditions. Conformal and intensity modulated radiation therapy (IMRT) plans for a lung patient were generated for anterior-posterior/posterior-anterior (AP/PA) beam arrangements at 6 and 18 MV energies to provide planned dose distributions. Respiratory motion data was obtained from 60 diaphragm-motion fluoroscopy recordings from five patients. A linear adaptive filter was employed to predict the tumor position. The geometric error of prediction was defined as the absolute difference between predicted and actual positions at each diaphragm position. Distributions of geometric error of prediction were obtained for all of the respiratory motion data. Planned dose distributions were then convolved with distributions for the geometric error of prediction to obtain convolved dose distributions. The dosimetric effect of such geometric errors was determined as a function of several variables: response time (0-0.6 s), beam energy (6/18 MV), treatment delivery (3D/4D), treatment type (conformal/IMRT), beam direction (AP/PA), and breathing training type (free breathing/audio instruction/visual feedback). Dose difference and distance-to-agreement analysis was employed to quantify results. 
Based on our data, the dosimetric impact of prediction (a) increased with response time, (b) was larger for 3D radiation therapy as compared with 4D radiation therapy, (c) was relatively insensitive to change in beam energy and beam direction, (d) was greater for IMRT distributions as compared with conformal distributions, (e) was smaller than the dosimetric impact of latency, and (f) was greatest for respiration motion with audio instructions, followed by visual feedback and free breathing. Geometric errors of prediction that occur during 4D radiation delivery introduce dosimetric errors that are dependent on several factors, such as response time, treatment-delivery type, and beam energy. Even for relatively small response times of 0.6 s into the future, dosimetric errors due to prediction could approach delivery errors when respiratory motion is not accounted for at all. To reduce the dosimetric impact, better predictive models and/or shorter response times are required.
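
The convolution step described in this abstract can be illustrated in one dimension. The sketch below assumes a dose profile sampled on a regular grid and a discrete, normalized distribution of signed positional error on the same grid; the profile values, the error distribution and the name `blur_dose` are hypothetical, and the study itself worked with full planned dose distributions rather than a toy profile.

```python
def blur_dose(dose, error_pmf):
    """Convolve a 1D dose profile with an error distribution.

    error_pmf must have an odd number of bins and sum to 1; its middle
    bin is the probability of zero displacement. Positions outside the
    profile are treated as receiving zero dose.
    """
    if len(error_pmf) % 2 == 0:
        raise ValueError("error_pmf needs an odd number of bins")
    half = len(error_pmf) // 2
    blurred = []
    for i in range(len(dose)):
        total = 0.0
        for k, p in enumerate(error_pmf):
            j = i - (k - half)  # index of the dose sample displaced by this error bin
            if 0 <= j < len(dose):
                total += p * dose[j]
        blurred.append(total)
    return blurred


# Hypothetical flat-topped profile and symmetric three-bin error distribution.
planned = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
error_pmf = [0.25, 0.5, 0.25]

convolved = blur_dose(planned, error_pmf)
largest_difference = max(abs(c - p) for c, p in zip(convolved, planned))
print(convolved, largest_difference)
```

The effect is concentrated at the field edges, where the planned gradient is steepest; a wider error distribution (for example, one associated with a longer response time) would spread the penumbra further.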

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In clinical practice, traditional X-ray radiography is widely used, and knowledge of landmarks and contours in anteroposterior (AP) pelvis X-rays is invaluable for computer-aided diagnosis, hip surgery planning and image-guided interventions. This paper presents a fully automatic approach for landmark detection and shape segmentation of both pelvis and femur in conventional AP X-ray images. Our approach is based on the framework of landmark detection via Random Forest (RF) regression and shape regularization via hierarchical sparse shape composition. We propose a visual feature, FL-HoG (Flexible-Level Histogram of Oriented Gradients), and a feature selection algorithm based on trace ratio optimization to improve the robustness and the efficacy of RF-based landmark detection. The landmark detection result is then used in a hierarchical sparse shape composition framework for shape regularization. Finally, the extracted shape contour is fine-tuned by a post-processing step based on low-level image features. The experimental results demonstrate that our feature selection algorithm reduces the feature dimension by a factor of 40 and improves both training and test efficiency. Further experiments conducted on 436 clinical AP pelvis X-rays show that our approach achieves an average point-to-curve error of around 1.2 mm for the femur and 1.9 mm for the pelvis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work stems from a virtual ethnographic study that set out to perform a discourse analysis of the images displayed on the social network Facebook by young university students: images in which the body is objectified in a virtual key, across different genres. Once these images were decoded, life narratives, signs and codes were recognized that bring to the fore discourses constructed by university subjects from their position within the modern social structure of consumption. On this basis, the need becomes evident to make people literate in audiovisual language, with the aim of forming critical consumers.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Biometrics applied to mobile devices are of great interest for security applications. Everyday scenarios can benefit from combining the most secure systems with the simplest and most widespread devices. This document presents a hand biometric system oriented to mobile devices, proposing a non-intrusive, contact-less acquisition process in which users take a picture of their hand in free space with a mobile device, without removing rings, bracelets or watches. The contribution of this paper is threefold: first, a feature extraction method is proposed that provides hand measurements invariant to the aforementioned changes; second, a template creation method based on hand geometric distances is provided, requiring information from only one individual, without considering data from the other individuals in the database; finally, a template matching scheme is proposed, minimizing the intra-class similarity and maximizing the inter-class likeliness. The proposed method is evaluated using three publicly available contact-less, platform-free databases. In addition, the results obtained with these databases are compared with those of two competitive pattern recognition techniques often employed in the literature, namely Support Vector Machines (SVM) and k-Nearest Neighbour. This approach therefore provides an appropriate solution for adapting hand biometrics to mobile devices, with accurate results and a non-intrusive acquisition procedure that increases overall acceptance by the final user.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a hand biometric system for contact-less, platform-free scenarios, proposing innovative methods for feature extraction, template creation and template matching. The proposed method is evaluated on three publicly available contact-less hand databases, and its performance is compared with that of two competitive pattern recognition techniques from the literature, namely Support Vector Machines (SVM) and k-Nearest Neighbour (k-NN). Results show that the proposed method outperforms existing approaches in the literature in terms of computational cost, accuracy in human identification, number of extracted features and number of samples for template creation. The proposed method is a suitable solution for human identification in contact-less scenarios based on hand biometrics, and a feasible one for devices with limited hardware, such as mobile devices.
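
For orientation, the k-NN technique named above as a comparison baseline can be sketched over vectors of hand geometric distances. This is a generic illustration of k-NN identification, not the paper's proposed matching method; the feature values, identities and the function name `knn_identify` are hypothetical.

```python
import math
from collections import Counter


def knn_identify(query, gallery, k=3):
    """Return the most frequent identity among the k gallery templates
    closest to the query in Euclidean distance."""
    ranked = sorted(gallery, key=lambda item: math.dist(query, item[0]))
    votes = Counter(identity for _, identity in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical templates: four hand geometric distances (cm) per sample.
gallery = [
    ([7.1, 8.0, 7.4, 5.9], "user_a"),
    ([7.0, 8.1, 7.5, 6.0], "user_a"),
    ([6.2, 7.1, 6.6, 5.1], "user_b"),
    ([6.3, 7.0, 6.5, 5.2], "user_b"),
]

print(knn_identify([7.05, 8.05, 7.45, 5.95], gallery, k=3))
```

Note that k-NN needs samples from every enrolled individual at decision time, which contrasts with the template creation described in the abstract, where a template requires data from one individual only.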

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a robust approach for recognition of thermal face images based on decision-level fusion of 34 different region classifiers. The region classifiers concentrate on local variations and use singular value decomposition (SVD) for feature extraction. The decisions of the region classifiers are fused by majority voting. The algorithm is tolerant of false exclusion of thermal information caused by inconsistent distributions of temperature statistics, which generally make identification difficult. The algorithm is extensively evaluated on the UGC-JU thermal face database and the Terravic facial infrared database, where recognition rates of 95.83% and 100%, respectively, are obtained. A comparative study against existing works in the literature is also presented.
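
Decision-level fusion by majority voting, as mentioned in this abstract, can be sketched as follows. The per-region labels are hypothetical, the SVD feature extraction is omitted, and the tie-breaking rule (the label seen first among equally voted ones) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(decisions):
    """Fuse per-region identity decisions into a single label.

    Returns the winning label and its vote count. Among labels with
    equal counts, the one encountered first wins (assumed rule).
    """
    if not decisions:
        raise ValueError("at least one region decision is required")
    counts = Counter(decisions)
    label, votes = counts.most_common(1)[0]
    return label, votes


# Hypothetical outputs of five region classifiers for one probe image.
region_decisions = ["subject_07", "subject_07", "subject_12", "subject_07", "subject_03"]
print(majority_vote(region_decisions))
```

Because each region votes independently, a few regions corrupted by missing or inconsistent thermal information can be outvoted by the remaining ones.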

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes the GTH-UPM system for the Albayzin 2014 Search on Speech Evaluation. The evaluation task consists of searching for a list of terms/queries in audio files. The GTH-UPM system we present is based on an LVCSR (Large Vocabulary Continuous Speech Recognition) system. We have used the MAVIR corpus and the Spanish partition of the EPPS (European Parliament Plenary Sessions) database for training both acoustic and language models. The main effort has been focused on lexicon preparation and text selection for the language model construction. The system makes use of different lexicons and language models depending on the task performed. For the best configuration of the system on the development set, we have obtained a FOM of 75.27 for the keyword spotting task.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Feature vectors can be anything from simple surface normals to more complex feature descriptors. Feature extraction is important for solving various computer vision problems, e.g. registration, object recognition and scene understanding. Most of these techniques cannot be computed online due to their complexity and the context in which they are applied. Therefore, computing these features in real time for many points in the scene is impossible. In this work, a hardware-based implementation of 3D feature extraction and 3D object recognition is proposed to accelerate these methods and therefore the entire pipeline of RGBD-based computer vision systems where such features are typically used. Using a GPU as a general-purpose processor can achieve considerable speed-ups compared with a CPU implementation. In this work, advantageous results are obtained using the GPU to accelerate the computation of a 3D descriptor based on the calculation of 3D semi-local surface patches of partial views. This allows descriptor computation at several points of a scene in real time. The benefits of the accelerated descriptor have been demonstrated in object recognition tasks. Source code will be made publicly available as a contribution to the open-source Point Cloud Library.
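
As a concrete instance of the simplest feature mentioned above, the sketch below computes a surface normal from three neighbouring 3D points with a cross product. It is a deliberately minimal CPU illustration with a hypothetical function name, not the GPU descriptor of the paper; practical pipelines usually estimate normals from a larger neighbourhood of points.

```python
import math


def triangle_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points.

    The direction follows the right-hand rule for the ordering
    p0 -> p1 -> p2.
    """
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    # Cross product u x v is perpendicular to both edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("points are collinear; the normal is undefined")
    return (nx / length, ny / length, nz / length)


# Three points lying in the z = 0 plane.
print(triangle_normal((0, 0, 0), (1, 0, 0), (0, 1, 0)))
```

Each point's normal depends only on its own neighbours, so the computation is independent per point, which is the kind of workload that maps naturally onto GPU threads.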