941 resultados para Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features
Resumo:
This PhD by publication examines selected practice-based audio-visual works made by the author over a ten-year period, placing them in a critical context. Central to the publications, and the focus of the thesis, is an exploration of the role of sound in the creation of dialectic tension between the audio, the visual and the audience. By first analysing a number of texts (films/videos and key writings) the thesis locates the principal issues and debates around the use of audio in artists’ moving image practice. From this it is argued that asynchronism, first advocated in 1929 by Pudovkin as a response to the advent of synchronised sound, can be used to articulate audio-visual relationships. Central to asynchronism’s application in this paper is a recognition of the propensity for sound and image to adhere, and in visual music for there to be a literal equation of audio with the visual, often married with a quest for the synaesthetic. These elements can either be used in an illusionist fashion, or employed as part of an anti-illusionist strategy for realising dialectic. Using this as a theoretical basis, the paper examines how the publications implement asynchronism, including digital mapping to facilitate innovative reciprocal sound and image combinations, and the asynchronous use of ‘found sound’ from a range of online sources to reframe the moving image. The synthesis of publications and practice demonstrates that asynchronism can both underpin the creation of dialectic, and be an integral component in an audio-visual anti-illusionist methodology.
Resumo:
We propose a study of the mathematical properties of voice as an audio signal -- This work includes signals in which the channel conditions are not ideal for emotion recognition -- Multiresolution analysis- discrete wavelet transform – was performed through the use of Daubechies Wavelet Family (Db1-Haar, Db6, Db8, Db10) allowing the decomposition of the initial audio signal into sets of coefficients on which a set of features was extracted and analyzed statistically in order to differentiate emotional states -- ANNs proved to be a system that allows an appropriate classification of such states -- This study shows that the extracted features using wavelet decomposition are enough to analyze and extract emotional content in audio signals presenting a high accuracy rate in classification of emotional states without the need to use other kinds of classical frequency-time features -- Accordingly, this paper seeks to characterize mathematically the six basic emotions in humans: boredom, disgust, happiness, anxiety, anger and sadness, also included the neutrality, for a total of seven states to identify
Resumo:
Increasing the size of training data in many computer vision tasks has shown to be very effective. Using large scale image datasets (e.g. ImageNet) with simple learning techniques (e.g. linear classifiers) one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large scale image sets is intense per se. They take a significant amount of memory that makes it impossible to process the images with complex algorithms on single CPU machines. Finding an efficient image representation can be a key to attack this problem. A representation being efficient is not enough for image understanding. It should be comprehensive and rich in carrying semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large scale image classification. We present learning techniques to learn the binary features from supervised image set (With different types of semantic supervision; class labels, textual descriptions). We propose several problems that are very important in finding and using efficient image representation.
Resumo:
A presente dissertação, no âmbito do mestrado de Línguas e Relações Empresariais, tem como objetivo aferir a perceção do público relativamente à identidade visual do Novo Banco, de modo a compreender se esta atuou eficazmente enquanto instrumento de comunicação no pós crise da instituição. Esta eficácia prende-se com a capacidade da identidade visual influenciar positivamente uma marca, promovendo o seu reconhecimento e visibilidade, bem como a diferenciação e o posicionamento positivo na mente do público. O processo de desmantelamento do Banco Espírito Santo, e posterior surgimento do Novo Banco, encontra-se associado a uma forte carga emocional. Por isso, com a intenção de avaliar com maior exactidão o impacto desta carga emocional inerente ao Novo Banco, foram estudadas, também, associações semânticas à marca, reveladoras do significado afetivo do público relativamente à organização. Para o cumprimento objetivo, foram delineadas bases contextuais e teóricas, que levaram à aplicação de um questionário que visou a obtenção de dados relativos, precisamente, à perceção do público geral. Os resultados sugerem que a identidade visual não foi uma resposta suficientemente eficaz dado que determinados componentes que a constituem não oferecem uma interpretação clara do que significam, originando baixos níveis de concordância. Contudo, os resultados mostram, ainda, que grande parte desta ineficácia deriva das consequências da crise no Banco Espírito Santo, ainda muito presentes na mente do público.
Resumo:
One of the Highlights of our academic year is the exhibition of the works of the graduating class in our visual arts degree program. Of course it is very Satisfying to see such an unambiguous evidence of accomplishments of our students.But i also find the exhibition invariably inspiring because of the works themselves,which stimulate us to see again as children, with wide-eyed curiosity and wonder, without the need to explain or reduce, with delight, with horror, with recognition. In enabling us to see again in this way, the students have learned a very demanding craft and educated themselves inthe history, vocabulary and syntax of visual expression.
Resumo:
Dissertação de Mestrado, Engenharia Informática, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2014
Resumo:
In this article we describe a semantic localization dataset for indoor environments named ViDRILO. The dataset provides five sequences of frames acquired with a mobile robot in two similar office buildings under different lighting conditions. Each frame consists of a point cloud representation of the scene and a perspective image. The frames in the dataset are annotated with the semantic category of the scene, but also with the presence or absence of a list of predefined objects appearing in the scene. In addition to the frames and annotations, the dataset is distributed with a set of tools for its use in both place classification and object recognition tasks. The large number of labeled frames in conjunction with the annotation scheme make this dataset different from existing ones. The ViDRILO dataset is released for use as a benchmark for different problems such as multimodal place classification and object recognition, 3D reconstruction or point cloud data compression.
Resumo:
Ornamental fish may be severely affected by a stressful environment. Stressors impair the immune response, reproduction and growth rate; thus, the identification of possible stressors will aid to improve the overall quality of ornamental fish. The aim of this study was to determine whole-body cortisol of adult zebrafish, Danio rerio, following visual or direct contact with a predator species. Zebrafish were distributed in three groups: the first group, which consisted of zebrafish reared completely isolated of the predator, was considered the negative control; the second group, in which the predator, Parachromis managuensis was stocked together with zebrafish, was considered the positive control; the third group consisted of zebrafish stocked in a glass aquarium, with direct visual contact with the predator. The mean whole-body cortisol concentration in zebrafish from the negative control was 6.78 +/- 1.12 ng g(-1), a concentration statistically lower than that found in zebrafish having visual contact with the predator (9.26 +/- 0.88 ng g(-1)) which, in turn, was statistically lower than the mean whole-body cortisol of the positive control group (12.35 +/- 1.59 ng g(-1)). The higher whole-body cortisol concentration found in fish from the positive control can be attributed to the detection, by the zebrafish, of relevant risk situations that may involve a combination of chemical, olfactory and visual cues. One of the functions of elevated cortisol is to mobilize energy from body resources to cope with stress. The elevation of whole-body cortisol in fish subjected to visual contact with the predator involves only the visual cue in the recognition of predation risk. We hypothesized that the zebrafish could recognize predator characteristics in P managuensis, such as length, shape, color and behavior. Nonetheless, the elevation of whole-body cortisol in zebrafish suggested that the visual contact of the predator may elicit a stress response in prey fish. This assertion has a strong practical application concerning the species distribution in ornamental fish markets in which prey species should not be allowed to see predator species. Minimizing visual contact between prey and predator fish may improve the quality, viability and welfare of small fish in ornamental fish markets. (c) 2007 Elsevier B.V. All rights reserved.
Resumo:
With the progress of computer technology, computers are expected to be more intelligent in the interaction with humans, presenting information according to the user's psychological and physiological characteristics. However, computer users with visual problems may encounter difficulties on the perception of icons, menus, and other graphical information displayed on the screen, limiting the efficiency of their interaction with computers. In this dissertation, a personalized and dynamic image precompensation method was developed to improve the visual performance of the computer users with ocular aberrations. The precompensation was applied on the graphical targets before presenting them on the screen, aiming to counteract the visual blurring caused by the ocular aberration of the user's eye. A complete and systematic modeling approach to describe the retinal image formation of the computer user was presented, taking advantage of modeling tools, such as Zernike polynomials, wavefront aberration, Point Spread Function and Modulation Transfer Function. The ocular aberration of the computer user was originally measured by a wavefront aberrometer, as a reference for the precompensation model. The dynamic precompensation was generated based on the resized aberration, with the real-time pupil diameter monitored. The potential visual benefit of the dynamic precompensation method was explored through software simulation, with the aberration data from a real human subject. An "artificial eye'' experiment was conducted by simulating the human eye with a high-definition camera, providing objective evaluation to the image quality after precompensation. In addition, an empirical evaluation with 20 human participants was also designed and implemented, involving image recognition tests performed under a more realistic viewing environment of computer use. The statistical analysis results of the empirical experiment confirmed the effectiveness of the dynamic precompensation method, by showing significant improvement on the recognition accuracy. The merit and necessity of the dynamic precompensation were also substantiated by comparing it with the static precompensation. The visual benefit of the dynamic precompensation was further confirmed by the subjective assessments collected from the evaluation participants.
Resumo:
OBJECTIVE: Cochlear implantation (CI) is a standard treatment for severe-profound sensorineural hearing loss (SNHL). However, consensus has yet to be reached on its effectiveness for hearing loss caused by auditory neuropathy spectrum disorder (ANSD). This review aims to summarize and synthesize current evidence of the effectiveness of CI in improving speech recognition in children with ANSD. DESIGN: Systematic review. STUDY SAMPLE: A total of 27 studies from an initial selection of 237. RESULTS: All selected studies were observational in design, including case studies, cohort studies, and comparisons between children with ANSD and SNHL. Most children with ANSD achieved open-set speech recognition with their CI. Speech recognition ability was found to be equivalent in CI users (who previously performed poorly with hearing aids) and hearing-aid users. Outcomes following CI generally appeared similar in children with ANSD and SNHL. Assessment of study quality, however, suggested substantial methodological concerns, particularly in relation to issues of bias and confounding, limiting the robustness of any conclusions around effectiveness. CONCLUSIONS: Currently available evidence is compatible with favourable outcomes from CI in children with ANSD. However, this evidence is weak. Stronger evidence is needed to support cost-effective clinical policy and practice in this area.