997 results for audio processing


Relevance: 30.00%

Publisher:

Abstract:

A sound design that animates elves working in an office processing Christmas orders.


This work proposes a novel dual-channel time-spread echo method for audio watermarking, aiming to improve robustness and perceptual quality. At the embedding stage, the host audio signal is divided into two subsignals, which are treated as signals obtained from two virtual audio channels. The watermarks are embedded into the two subsignals simultaneously. The watermarked subsignals are then combined to form the watermarked signal. At the decoding stage, the watermarked signal is split into two watermarked subsignals. The similarity of the cepstra corresponding to the watermarked subsignals is exploited to extract the embedded watermarks. Moreover, if a properly designed colored pseudonoise sequence is used, the large peaks of its auto-correlation function can be utilized to further enhance the performance of watermark extraction. Compared with existing time-spread echo-based schemes, the proposed method is more robust to attacks and has higher imperceptibility. The effectiveness of our method is demonstrated by simulation results.
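The abstract gives no implementation details, so the following is only a rough sketch of the general time-spread echo idea, not the paper's exact scheme: split the host into two "virtual channel" subsignals, add a small PN-spread echo to each, and interleave them back. The even/odd split and the `delay` and `alpha` values are assumptions made for the example.

```python
import random

def embed_echo_watermark(sub, pn, delay=50, alpha=0.05):
    """Add a time-spread echo to one subsignal: each +/-1 chip of the
    PN sequence places a small echo of the host at a different lag."""
    out = list(sub)
    for k, chip in enumerate(pn):
        lag = delay + k
        for n in range(lag, len(sub)):
            out[n] += alpha * chip * sub[n - lag]
    return out

def dual_channel_embed(host, pn, delay=50, alpha=0.05):
    """Split the host into two 'virtual channel' subsignals (here the
    even/odd samples, one simple assumed choice), echo-embed each,
    then interleave them back into one watermarked signal."""
    sub_a, sub_b = host[0::2], host[1::2]
    wa = embed_echo_watermark(sub_a, pn, delay, alpha)
    wb = embed_echo_watermark(sub_b, pn, delay, alpha)
    out = [0.0] * len(host)
    out[0::2], out[1::2] = wa, wb
    return out

random.seed(0)
host = [random.uniform(-1.0, 1.0) for _ in range(2000)]
pn = [random.choice([-1, 1]) for _ in range(8)]  # toy PN sequence
marked = dual_channel_embed(host, pn)
```

With a small `alpha` the per-sample distortion stays bounded by `alpha * len(pn)`, which is what keeps the echo perceptually mild.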


We use the concept of film pace, expressed through the audio, to analyse the broad-level narrative structure of film. The narrative structure is divided into visual narration (action sections) and audio narration (plot development sections). We hypothesise that changes in the narrative structure signal a change in audio content, which is reflected by a change in audio pace. We test this hypothesis using a number of audio feature functions that reflect the audio pace to detect changes in narrative structure for 8 films of varying genres. The properties of the energy were then used to determine the audio pace feature corresponding to the narrative structure for each film analysed. The method was successful in determining the narrative structure for 1 of the films, achieving an overall precision of 76.4% and recall of 80.3%. We map the properties of the speech and energy of film audio to the higher-level semantic concept of audio pace. The audio pace was in turn applied to a higher-level semantic analysis of the structure of film.
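The abstract does not define its audio pace features; one crude, generic stand-in for "pace" is frame-wise short-time energy plus its rate of change. The frame sizes below are arbitrary example values, not the paper's.

```python
def short_time_energy(samples, frame_len=400, hop=200):
    """Frame-wise energy of an audio signal (frame sizes are
    arbitrary example values)."""
    return [sum(x * x for x in samples[s:s + frame_len]) / frame_len
            for s in range(0, len(samples) - frame_len + 1, hop)]

def energy_change_rate(energies):
    """Mean absolute frame-to-frame energy change: one crude proxy
    for pace -- fast-paced sections change energy quickly."""
    if len(energies) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(energies, energies[1:])) / (len(energies) - 1)

steady = [0.5] * 4000                              # constant loudness
bursty = [0.5] * 800 + [0.05] * 800 + [0.5] * 800  # loud / quiet / loud
```

A constant-loudness signal yields a flat energy profile and zero change rate, while the bursty signal yields a positive one, which is the contrast such a pace feature relies on.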


We examine localised sound energy patterns, or events, that we associate with high-level affect experienced with films. The study of sound energy events in conjunction with their intended affect enables the analysis of film at a higher conceptual level, such as genre. The various affect/emotional responses we investigate in this paper are brought about by well-established patterns of sound energy dynamics employed in the audio tracks of horror films. This allows the examination of the thematic content of the films in relation to horror elements. We analyse the frequency of sound energy and affect events at the film level as well as at the scene level, and propose measures indicative of the film genre and scene content. Using 4 horror and 2 non-horror movies as experimental data, we establish a correlation between the sound energy event types and horrific thematic content within film, thus enabling an automated mechanism for genre typing and scene content labelling in film.


This paper presents a novel adaptive safe-band for quantization-based audio watermarking methods, aiming to improve robustness. A considerable number of audio watermarking methods have been developed using quantization-based techniques, which are generally vulnerable to signal processing attacks. For these conventional quantization-based techniques, robustness can be marginally improved by choosing larger step sizes, at the cost of significant perceptual quality degradation. We first introduce a fixed-size safe-band between two quantization steps to improve robustness; this safe-band acts as a buffer to withstand certain types of attacks. We then further improve the robustness by adaptively changing the size of the safe-band based on the audio signal feature used for watermarking. Compared with the conventional quantization-based method and the fixed-size safe-band method, the proposed adaptive safe-band quantization method is more robust to attacks. The effectiveness of the proposed technique is demonstrated by simulation results. © 2014 IEEE.
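A generic scalar quantization-index-modulation (QIM) sketch, with one simple way to realise a safe-band, may make the buffer idea concrete. This is not the paper's exact construction (the adaptive version would additionally scale `band` with a host-signal feature), and `STEP` and `BAND` are arbitrary example values.

```python
STEP = 0.1    # quantization step (arbitrary example value)
BAND = 0.02   # safe-band width  (arbitrary example value)

def embed_bit(value, bit, step=STEP, band=BAND):
    """Embed `bit` into a scalar host feature.

    Even lattice indices carry 0, odd carry 1. If the feature already
    decodes correctly AND sits at least `band` inside its decision
    cell (the safe-band), it is left alone; otherwise it is snapped
    to the nearest lattice point carrying `bit`."""
    q = round(value / step)
    if q % 2 == bit and abs(value - q * step) <= step / 2 - band:
        return value
    if q % 2 != bit:
        q += 1 if value > q * step else -1
    return q * step

def detect_bit(value, step=STEP):
    """Decode by the parity of the nearest lattice index."""
    return round(value / step) % 2
```

Every embedded value ends up at least `band` away from a decision boundary, so perturbations smaller than `band` cannot flip the detected bit; that margin is the buffer the safe-band provides without enlarging the step.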


This paper addresses the problem of processing biological data such as cardiac beats, audio, and ultrasonic-range signals, calculating wavelet coefficients in real time with the processor clock running at the frequencies of present ASICs and FPGAs. The Parallel Filter Architecture for the DWT has been improved, calculating wavelet coefficients in real time with hardware reduced to 60%. The new architecture, which also processes the IDWT, is implemented with Radix-2 or Booth-Wallace constant multipliers. Including series memory register banks, a single-integrated-circuit Signal Analyzer for the ultrasonic range is presented.
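The Parallel Filter Architecture itself is hardware, but the per-level DWT/IDWT computation it pipelines can be illustrated in software with the simplest filter pair, the Haar wavelet. This is a generic sketch of one decomposition level, not the paper's filter bank.

```python
def haar_dwt(signal):
    """One DWT decomposition level with the Haar filter pair:
    approximation (scaled pairwise sums) and detail (scaled
    pairwise differences). len(signal) must be even."""
    s = 2 ** -0.5
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: the Haar pair gives perfect reconstruction."""
    s = 2 ** -0.5
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

sig = [4.0, 2.0, 5.0, 5.0, 1.0, -3.0, 0.0, 2.0]
approx, detail = haar_dwt(sig)
rec = haar_idwt(approx, detail)
```

Each level halves the sample rate of the approximation band, which is why a hardware pipeline can share multipliers across levels.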


Lesions to the primary geniculo-striate visual pathway cause blindness in the contralesional visual field. Nevertheless, previous studies have suggested that patients with visual field defects may still be able to implicitly process the affective valence of unseen emotional stimuli (affective blindsight) through alternative visual pathways bypassing the striate cortex. These alternative pathways may also allow exploitation of multisensory (audio-visual) integration mechanisms, such that auditory stimulation can enhance visual detection of stimuli which would otherwise be undetected when presented alone (crossmodal blindsight). The present dissertation investigated implicit emotional processing and multisensory integration when conscious visual processing is prevented by real or virtual lesions to the geniculo-striate pathway, in order to further clarify both the nature of these residual processes and the functional aspects of the underlying neural pathways. The present experimental evidence demonstrates that alternative subcortical visual pathways allow implicit processing of the emotional content of facial expressions in the absence of cortical processing. However, this residual ability is limited to fearful expressions. This finding suggests the existence of a subcortical system specialised in detecting danger signals based on coarse visual cues, therefore allowing the early recruitment of flight-or-fight behavioural responses even before conscious and detailed recognition of potential threats can take place. Moreover, the present dissertation extends the knowledge about crossmodal blindsight phenomena by showing that, unlike with visual detection, sound cannot crossmodally enhance visual orientation discrimination in the absence of functional striate cortex. 
This finding demonstrates, on the one hand, that the striate cortex plays a causative role in crossmodally enhancing visual orientation sensitivity and, on the other hand, that subcortical visual pathways bypassing the striate cortex, despite affording audio-visual integration processes leading to the improvement of simple visual abilities such as detection, cannot mediate multisensory enhancement of more complex visual functions, such as orientation discrimination.


In this report we summarize the state of the art of speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, we observe that existing approaches to supervised machine learning lead to database-dependent classifiers which cannot be applied to multi-language speech emotion recognition without additional training, because they discriminate the emotion classes following the training language used. As experimental results show that humans can perform language-independent categorisation, we draw a parallel between machine recognition and the cognitive process and try to discover the sources of these divergent results. The analysis suggests that the main difference is that speech perception allows the extraction of language-independent features, although language-dependent features are incorporated at all levels of the speech signal and play a strong discriminative role in human perception. Based on several results in related domains, we suggest that, in addition, the cognitive process of emotion recognition is based on categorisation, assisted by a hierarchical structure of the emotional categories existing in the cognitive space of all humans. We propose a strategy for developing language-independent machine emotion recognition, based on the identification of language-independent speech features and the use of additional information from visual (expression) features.


Digital systems can generate left and right audio channels that create the effect of virtual sound source placement (spatialization) by processing an audio signal through pairs of Head-Related Transfer Functions (HRTFs) or, equivalently, Head-Related Impulse Responses (HRIRs). The spatialization effect is better when individually-measured HRTFs or HRIRs are used than when generic ones (e.g., from a mannequin) are used. However, the measurement process is not available to the majority of users. There is ongoing interest in finding mechanisms to customize HRTFs or HRIRs to a specific user, in order to achieve an improved spatialization effect for that subject. Unfortunately, the current models used for HRTFs and HRIRs contain over a hundred parameters, and none of those parameters can be easily related to the characteristics of the subject. This dissertation proposes an alternative model for the representation of HRTFs, which contains at most 30 parameters, all of which have a defined functional significance. It also presents methods to obtain the values of the parameters in the model that make it approximately equivalent to an individually-measured HRTF. This conversion is achieved by the systematic deconstruction of HRIR sequences through an augmented version of the Hankel Total Least Squares (HTLS) decomposition approach. An average 95% match (fit) was observed between the original HRIRs and those re-constructed from the Damped and Delayed Sinusoids (DDSs) found by the decomposition process, for ipsilateral source locations. The dissertation also introduces and evaluates an HRIR customization procedure, based on a multilinear model implemented through a 3-mode tensor, for mapping anatomical data from the subjects to the HRIR sequences at different sound source locations.
This model uses the Higher-Order Singular Value Decomposition (HOSVD) method to represent the HRIRs and is capable of generating customized HRIRs from easily attainable anatomical measurements of a new intended user of the system. Listening tests were performed to compare the spatialization performance of customized, generic and individually-measured HRIRs when they are used for synthesized spatial audio. Statistical analysis of the results confirms that the type of HRIRs used for spatialization is a significant factor in the spatialization success, with the customized HRIRs yielding better results than generic HRIRs.
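The decomposition and customization pipeline is elaborate, but the end use of an HRIR pair, binaural rendering by convolution, is simple to sketch. The HRIRs below are toy, hand-made sequences invented for the example, not measured or customized data.

```python
def convolve(signal, hrir):
    """Direct-form FIR convolution of a mono signal with one HRIR."""
    out = [0.0] * (len(signal) + len(hrir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(hrir):
            out[n + k] += x * h
    return out

def spatialize(mono, hrir_left, hrir_right):
    """Binaural rendering: filter one mono source through the left-
    and right-ear HRIRs for the desired source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs for a source to the listener's right: the far (left) ear
# hears a delayed, attenuated copy -- crude interaural time and level
# differences, NOT measured data.
hrir_right = [1.0]
hrir_left = [0.0, 0.0, 0.6]
left, right = spatialize([1.0, 0.0, 0.0], hrir_left, hrir_right)
```

The quality question the dissertation addresses is entirely in where the HRIR coefficients come from (individual measurement, generic mannequin, or anatomical-data customization); the rendering step itself stays this simple.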


This study examines the correlation between how certified music educators understand audio technology and how they incorporate it in their instructional methods. Participants were classroom music teachers selected from fifty middle schools in Miami-Dade Public Schools. The study adopted a non-experimental research design in which a survey was the primary tool of investigation. The findings reveal that a majority of middle school music teachers in Miami-Dade are not familiar with advanced audio-recording software or any other digital device dedicated to the recording and processing of audio signals. Moreover, they report a lack of opportunities to develop this knowledge. Younger music teachers, however, are more open to developing up-to-date instructional methodologies. Most of the participants agreed that music instruction should be a platform for preparing students for a future in the entertainment industry, and that a basic knowledge of the music business should be delivered to students enrolled in middle-school music courses.


HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Additionally, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers' access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children's environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children.


Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited to round-table meetings. The system is relatively simple, consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single- and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements over the closest works are evaluated: 56% in sound source localisation computational cost over an audio-only system, 8% in speaker diarisation error rate over an audio-only speaker recognition unit, and 36% on the precision–recall metric over an audio–video dominant-speaker recognition method.


Therapists' process notes - written descriptions of a session produced shortly afterwards from memory - hold a significant role in child and adolescent psychoanalytic psychotherapy. They are central in training, in supervision, and in developing one's understanding through self-supervision and forms of psychotherapy research. This thesis examines such process notes through a comparison with audio recordings of the same sessions. In so doing, it aims to generate theory that might illuminate the causes of significantly patterned discrepancies between the notes and recordings, in order to understand more about the processes at work in psychoanalytic psychotherapy and to explore the nature of process notes, their values and limitations. The literature searches conducted revealed limited relevant studies. All identified studies that compare process notes with recordings of sessions seek to quantify the differences between the two forms of recording. Unlike these, this thesis explores the meaning of the differences between process notes and recordings through qualitative data analysis. Using psychoanalytically informed grounded theory, nine sets of process notes and recordings from three different psychoanalytic psychotherapists are analysed in total. The analysis identifies eight core categories of findings. Initial theories are developed from these categories, most significantly concerning the role and influence of a 'core transference dynamic' between therapist and patient. Further theory is developed on the nature and function of process notes as a means for the therapist's conscious and unconscious processing of the session, as well as on the nature of the influence of the relationships - both internal and external - within which they are written. In the light of the findings, a proposal is made for a new approach to learning about the patient and clinical work, 'the comparison method' (supervision involving a comparison of process notes and recordings), and, in particular, for its inclusion within the training of psychoanalytic psychotherapists. Further recommendations for research are also made.