986 results for Audio-scripto-visual


Relevance: 100.00%

Abstract:

We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores per segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques, demonstrating an absolute improvement of 15% in half-total error rate (HTER) over the best-performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm using visual, audio-visual, or a proposed fusion of these features.
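The scoring and clustering steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the synthetic one-dimensional "features", the component counts, and the segment lengths are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Train two generic GMMs on synthetic 1-D "features":
# speech-like data centred at +3, non-speech-like data centred at -3.
gmm_speech = GaussianMixture(n_components=2, random_state=0).fit(
    rng.normal(3.0, 1.0, (500, 1)))
gmm_nonspeech = GaussianMixture(n_components=2, random_state=0).fit(
    rng.normal(-3.0, 1.0, (500, 1)))

# Split an "unseen recording" into segments and score each segment
# against both models, giving a (speech, non-speech) score pair.
segments = [rng.normal(m, 1.0, (50, 1)) for m in (3, -3, 3, -3, 3)]
scores = np.array([[gmm_speech.score(s), gmm_nonspeech.score(s)]
                   for s in segments])

# Pairwise dissimilarity between segments in score space, then
# complete-linkage clustering into two clusters (speech / non-speech).
labels = fcluster(linkage(pdist(scores), method='complete'),
                  t=2, criterion='maxclust')
```

With the cleanly separated synthetic data, the odd-indexed and even-indexed segments fall into different clusters, mirroring the speech/non-speech split.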

Relevance: 100.00%

Abstract:

This PhD by publication examines selected practice-based audio-visual works made by the author over a ten-year period, placing them in a critical context. Central to the publications, and the focus of the thesis, is an exploration of the role of sound in the creation of dialectic tension between the audio, the visual and the audience. By first analysing a number of texts (films/videos and key writings), the thesis locates the principal issues and debates around the use of audio in artists’ moving image practice. From this it is argued that asynchronism, first advocated in 1929 by Pudovkin as a response to the advent of synchronised sound, can be used to articulate audio-visual relationships. Central to asynchronism’s application in this thesis is a recognition of the propensity for sound and image to adhere and, in visual music, for there to be a literal equation of audio with the visual, often married with a quest for the synaesthetic. These elements can either be used in an illusionist fashion or employed as part of an anti-illusionist strategy for realising dialectic. Using this as a theoretical basis, the thesis examines how the publications implement asynchronism, including digital mapping to facilitate innovative reciprocal sound and image combinations, and the asynchronous use of ‘found sound’ from a range of online sources to reframe the moving image. The synthesis of publications and practice demonstrates that asynchronism can both underpin the creation of dialectic and be an integral component in an audio-visual anti-illusionist methodology.

Relevance: 100.00%

Abstract:

Professional Activity Report submitted in fulfilment of the requirements for the degree of Master in Communication Sciences – New Media and Web Practices track

Relevance: 100.00%

Abstract:

Speech recognition can be improved by using visual information, in the form of the speaker's lip movements, in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data from the same database for training their models. In this paper, we present a new approach that makes use of one modality of an external dataset in addition to a given audio-visual dataset. By doing so, it is possible to create more powerful models from other, extensive audio-only databases and adapt them to our comparatively smaller multi-stream databases. Results show that the presented approach outperforms, by 29% relative, the widely adopted synchronous hidden Markov models (HMMs) trained jointly on the audio and visual data of a given audio-visual database for phone recognition. It also outperforms external audio models trained on extensive external audio datasets by 5.5% relative, and internal audio models by 46% relative. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by environmental noise.
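One common way to adapt models trained on a large external corpus to a smaller in-domain dataset is maximum a posteriori (MAP) adaptation. The mean-only sketch below is a generic illustration of that idea, not the paper's exact recipe; the relevance factor `tau` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def map_adapt_mean(prior_mean, target_data, tau=10.0):
    """Mean-only MAP adaptation: shrink the in-domain sample mean
    towards the mean of the externally trained model. tau controls
    how strongly the external (prior) mean is trusted."""
    n = len(target_data)
    sample_mean = np.mean(target_data, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)

prior = np.array([0.0])                    # mean from the external model
target = np.full((90, 1), 1.0)             # small in-domain dataset
adapted = map_adapt_mean(prior, target)    # lies between 0.0 and 1.0
```

With 90 in-domain samples and `tau = 10`, the adapted mean sits nine-tenths of the way from the external mean to the in-domain mean, so scarce target data still dominates once there is enough of it.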

Relevance: 90.00%

Abstract:

Background: Standard operating procedures state that police officers should not drive while interacting with their mobile data terminal (MDT), which provides in-vehicle information essential to police work. Such interactions do, however, occur in practice and represent a potential source of driver distraction. The MDT comprises visual output with manual input via touch screen and keyboard. This study investigated the potential for alternative input and output methods to mitigate driver distraction, with specific focus on eye movements. Method: Nineteen experienced drivers of police vehicles (one female) from the NSW Police Force completed four simulated urban drives. Three drives included a concurrent secondary task: an imitation licence-plate search using an emulated MDT. Three different interface methods were examined: Visual-Manual, Visual-Voice, and Audio-Voice (“Visual” and “Audio” = output modality; “Manual” and “Voice” = input modality). During each drive, eye movements were recorded using FaceLAB™ (Seeing Machines Ltd, Canberra, ACT), and gaze direction and glances on the MDT were assessed. Results: The Visual-Manual and Visual-Voice interfaces resulted in significantly more glances towards the MDT than Audio-Voice or Baseline. For longer-duration glances (>2 s and 1-2 s), the Visual-Manual interface resulted in significantly more fixations than Baseline or Audio-Voice. Short-duration glances (<1 s) were significantly more frequent for both Visual-Voice and Visual-Manual compared with Baseline and Audio-Voice. There were no significant differences between Baseline and Audio-Voice. Conclusion: An Audio-Voice interface has the greatest potential to decrease visual distraction for police drivers. However, it is acknowledged that audio output may have limitations for information presentation compared with visual output. The Visual-Voice interface offers an environment where the capacity to present information is sustained, whilst distraction to the driver is reduced (compared with Visual-Manual) by enabling adaptation of fixation behaviour.

Relevance: 90.00%

Abstract:

This article is part of a study that seeks to identify the main alternatives for including students with visual impairment in the context of physics teaching. Focusing on optics classes, it analyses the communication possibilities between student teachers and visually impaired students. To this end, it emphasises the empirical and semantic-sensory structures of the languages used, indicating factors that make the information conveyed accessible. It also recommends alternatives aimed at enabling the effective participation of visually impaired students in the communicative process, notably: identifying the semantic-sensory structure of the meanings conveyed; knowing the student's visual history; using languages with an interdependent tactile-auditory empirical structure in interactive contexts; and exploiting the communicational potential of languages whose empirical structures are fundamentally auditory, or independently auditory and visual.

Relevance: 80.00%

Abstract:

This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been explored previously (T.J. Wark et al., 1998); however, this was restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that an MSHMM is a valid structure for multi-modal speaker identification.
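In a multi-stream HMM, independent streams are typically fused by weighting their per-state observation log-likelihoods. The sketch below illustrates that score-level combination; the weight value and the candidate scores are made-up numbers for illustration, not values from the paper.

```python
import numpy as np

def fuse_streams(log_b_audio, log_b_visual, w_audio=0.7):
    """Score-level fusion of two independent streams: a weighted sum
    of per-stream observation log-likelihoods (weights sum to 1)."""
    return w_audio * log_b_audio + (1.0 - w_audio) * log_b_visual

# Log-likelihoods of one observation under three candidate speakers.
log_audio = np.array([-4.0, -2.0, -6.0])   # audio stream favours speaker 1
log_visual = np.array([-3.0, -5.0, -2.5])  # visual stream favours speaker 2

fused = fuse_streams(log_audio, log_visual, w_audio=0.7)
best = int(np.argmax(fused))               # identified speaker index
```

Lowering `w_audio` shifts trust towards the visual stream, which is the usual lever for keeping identification robust when the audio channel is noisy.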

Relevance: 80.00%

Abstract:

The unique alpine-living kea parrot Nestor notabilis has been the focus of numerous cognitive studies, but its communication system has so far been largely neglected. We examined 2,884 calls recorded in New Zealand’s Southern Alps. Based on audio and visual spectrographic differences, these calls were categorised into seven distinct call types: the non-oscillating ‘screech’ contact call and ‘mew’; the oscillating ‘trill’, ‘chatter’, ‘warble’ and ‘whistle’; and a hybrid ‘screech-trill’. Most of these calls contained aspects that were individually unique, in addition to potentially encoding an individual’s sex and age. Additionally, for each recording, the sender’s previous and next calls were noted, as well as any response given by conspecifics. We found that the previous and next calls made by the sender were most often of the same type, and that the next most likely preceding and/or following call type was the screech, a contact call which sounds like the ‘kee-ah’ from which the bird’s name derives. For a social bird capable of covering large distances over visually obstructive terrain, long-distance contact calls may be of considerable importance for social cohesion, allowing kea to locate conspecifics and congregate in temporary groups for social activities. The most likely response to any given call was a screech, usually followed by the same type of call as the initial call made by the sender, although responses differed depending on the age of the caller. The exception was the warble, the kea’s play call, to which the most likely response was another warble. Being the most common call type, as well as the default response to another call, the ‘contagious’ screech contact call appears to play a central role in kea vocal communication and social cohesion.

Relevance: 80.00%

Abstract:

Improving safety at railway level crossings is an important issue for the Australian transport system. Governments, the rail industry and road organisations have for many years tried a variety of countermeasures to improve railway level crossing safety. New types of Intelligent Transport System (ITS) interventions are now emerging due to the availability and affordability of technology. These interventions target both actively and passively protected railway level crossings and attempt to address drivers’ errors at railway crossings, which are mainly a failure to detect the crossing or the train, and misjudgement of the train’s approach speed and distance. This study aims to assess the effectiveness of three emerging ITS interventions that the rail industry is considering implementing in Australia: a visual in-vehicle ITS, an audio in-vehicle ITS, and an on-road flashing beacons intervention. The evaluation was conducted on an advanced driving simulator with 20 participants per trialled technology; each participant drove once without any technology and once with one of the ITS interventions. Every participant drove through a range of active and passive crossings with and without trains approaching. Their approach speed at the crossing, head movements and stopping compliance were measured. Results showed that driver behaviour changed with all three ITS interventions at passive crossings, while limited effects were found at active crossings, even with reduced visibility. The on-road intervention trialled was unsuccessful in improving driver behaviour; the audio and visual ITS improved driver behaviour when a train was approaching. A trend towards worsening driver behaviour with the visual ITS was observed when no trains were approaching. This trend was not observed for the audio ITS intervention, which therefore appears to be the ITS intervention with the highest potential for improving safety at passive crossings.

Relevance: 80.00%

Abstract:

Purpose: Peer-review programmes in radiation oncology are used to facilitate the process and evaluation of clinical decision-making. However, web-based peer-review methods are still uncommon. This study analysed an inter-centre, web-based peer-review case conference as a method of facilitating the decision-making process in radiation oncology. Methodology: A benchmark form was designed based on the American Society for Radiation Oncology targets for radiation oncology peer review. This was used to evaluate the contents of the peer-review case presentations on 40 cases, selected from three participating radiation oncology centres. A scoring system was used for comparison of data, and a survey was conducted to analyse the experiences of radiation oncology professionals who attended the web-based peer-review meetings, in order to identify priorities for improvement. Results: The mean scores for the evaluations were 82.7, 84.5, 86.3 and 87.3% for cervical, prostate, breast, and head and neck presentations, respectively. The survey showed that radiation oncology professionals were confident about the role of web-based peer review in facilitating the sharing of good practice, stimulating professionalism and promoting professional growth. The participants were satisfied with the quality of the audio and visual aspects of the web-based meeting. Conclusion: The results of this study suggest that simple inter-centre web-based peer-review case conferences are a feasible technique for peer review in radiation oncology. Limitations such as data security and confidentiality can be overcome by the use of appropriate structure and technology. To take quality and safety a step further, small radiotherapy departments may need to consider web-based peer-review case conferences as part of their routine quality assurance practices.

Relevance: 80.00%

Abstract:

Growing evidence suggests that significant motor problems are associated with a diagnosis of Autism Spectrum Disorders (ASD), particularly in catching tasks. Catching is a complex, dynamic skill that involves the ability to synchronise one's own movement to that of a moving target. To successfully complete the task, the participant must pick up and use perceptual information about the moving target to arrive at the catching place at the right time. This study looks at catching ability in children diagnosed with ASD (mean age 10.16 ± 0.9 years) and age-matched non-verbal (9.72 ± 0.79 years) and receptive language (9.51 ± 0.46 years) control groups. Participants were asked to "catch" a ball as it rolled down a fixed ramp. Two ramp heights provided two levels of task difficulty, whilst the sensory information (audio and visual) specifying ball arrival time was varied. Results showed that children with ASD performed significantly worse than both the receptive language (p = .02) and non-verbal (p = .02) control groups in terms of the total number of balls caught. A detailed analysis of the movement kinematics suggested that difficulties with picking up and using the sensory information to guide the action may be the source of the problem.

Relevance: 80.00%

Abstract:

Report of Supervised Teaching Practice, Master's in Mathematics Teaching (3rd Cycle and Secondary Education), Universidade de Lisboa, 2010

Relevance: 80.00%

Abstract:

As Virtual Reality pushes the boundaries of the human-computer interface, new ways of interaction are emerging. One such technology is the integration of haptic interfaces (force-feedback devices) into virtual environments. This modality offers an improved sense of immersion over that achieved when relying only on the audio and visual modalities. The paper introduces some of the technical obstacles, such as latency and network traffic, that need to be overcome to maintain a high degree of immersion during haptic tasks. It describes the advantages of integrating haptic feedback into such systems and presents some of the technical issues inherent in a networked haptic virtual environment. A generic control interface has been developed to mesh seamlessly with existing networked VR development libraries.

Relevance: 80.00%

Abstract:

A study investigating the telecommunication information needs of people with communication disabilities in Australia. Participants were informed about the project through flyers, letters, e-mail, disability agency contacts, and Web sites. A survey with multiple-choice and closed and open-ended items was developed, and people with communication disabilities used a hard-copy survey format to facilitate completion. Sixty-five participants aged eighteen and over from Victoria, Tasmania, South Australia, and Queensland completed the survey. Preliminary results indicated that the participants requested six text-based adaptations: (1) make information clear and easy to read and understand, (2) use larger print, (3) highlight the key points, (4) use dot points, (5) use visual information, such as photos and communication symbols, and (6) provide a range of oral/audio and visual formats for information. The accessibility characteristics requested by the participants called for the development of text- and Web-based formats, and for the development of inclusive design guidelines. The authors concluded that further investigation was required to determine the best possible method of making the information accessible.

Relevance: 80.00%

Abstract:

To enable content-based retrieval, highlights extraction from broadcast sports video has been an active research topic over the last decade. There is a well-known theory that high-level semantics, such as a goal in soccer, can be detected based on the occurrences of specific audio and visual features that can be extracted automatically. However, there is as yet no definitive solution for determining the scope (i.e. start and end) of the detection for self-consumable highlights. Thus, in this paper we primarily demonstrate the benefits of using play-break segments for this purpose. Moreover, we also propose a browsing scheme based on integrated play-breaks and highlights (extended from [1]). To validate our approach, we present results from experiments and a user study.
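The play-break idea can be illustrated with a small sketch: given a detected highlight event (e.g. a goal) and a list of play-break segment boundaries, the clip scope is the enclosing segment. The function name and timings below are hypothetical, chosen only for illustration.

```python
def highlight_scope(event_time, play_breaks):
    """Return the (start, end) play-break segment that encloses a
    detected highlight event, or None if the event falls outside
    every segment."""
    for start, end in play_breaks:
        if start <= event_time <= end:
            return (start, end)
    return None

# Play-break boundaries (seconds) and a goal detected at t = 63.2 s.
segments = [(0.0, 42.5), (42.5, 95.0), (95.0, 140.0)]
scope = highlight_scope(63.2, segments)
```

Snapping the clip to the enclosing play-break rather than a fixed window around the event is what makes the extracted highlight self-consumable: it starts when play resumes and ends at the following break.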