973 resultados para audio PIM


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the challenge participation conditions. Next follows the data used – the SEMAINE corpus – and its partitioning into train, development, and test partitions for the challenge with labelling in four dimensions, namely activity, expectation, power, and valence. Further, audio and video baseline features are introduced as well as baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent debates about media literacy and the internet have begun to acknowledge the importance of active user-engagement and interaction. It is not enough simply to access material online, but also to comment upon it and re-use. Yet how do these new user expectations fit within digital initiatives which increase access to audio-visual-content but which prioritise access and preservation of archives and online research rather than active user-engagement? This article will address these issues of media literacy in relation to audio-visual content. It will consider how these issues are currently being addressed, focusing particularly on the high-profile European initiative EUscreen. EUscreen brings together 20 European television archives into a single searchable database of over 40,000 digital items. Yet creative re-use restrictions and copyright issues prevent users from re-working the material they find on the site. Instead of re-use, EUscreen instead offers access and detailed contextualisation of its collection of material. But if the emphasis for resources within an online environment rests no longer upon access but on user-engagement, what does EUscreen and similar sites offer to different users?

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Human listeners seem to be remarkably able to recognise acoustic sound sources based on timbre cues. Here we describe a psychophysical paradigm to estimate the time it takes to recognise a set of complex sounds differing only in timbre cues: both in terms of the minimum duration of the sounds and the inferred neural processing time. Listeners had to respond to the human voice while ignoring a set of distractors. All sounds were recorded from natural sources over the same pitch range and equalised to the same duration and power. In a first experiment, stimuli were gated in time with a raised-cosine window of variable duration and random onset time. A voice/non-voice (yes/no) task was used. Performance, as measured by d', remained above chance for the shortest sounds tested (2 ms); d's above 1 were observed for durations longer than or equal to 8 ms. Then, we constructed sequences of short sounds presented in rapid succession. Listeners were asked to report the presence of a single voice token that could occur at a random position within the sequence. This method is analogous to the "rapid sequential visual presentation" paradigm (RSVP), which has been used to evaluate neural processing time for images. For 500-ms sequences made of 32-ms and 16-ms sounds, d' remained above chance for presentation rates of up to 30 sounds per second. There was no effect of the pitch relation between successive sounds: identical for all sounds in the sequence or random for each sound. This implies that the task was not determined by streaming or forward masking, as both phenomena would predict better performance for the random pitch condition. Overall, the recognition of familiar sound categories such as the voice seems to be surprisingly fast, both in terms of the acoustic duration required and of the underlying neural time constants.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Purpose: As resident work hours policies evolve, residents’ off-duty time remains poorly understood. Despite assumptions about how residents should be using their postcall, off-duty time, there is little research on how residents actually use this time and the reasoning underpinning their activities. This study sought to understand residents’ nonclinical postcall activities when they leave the hospital, their decision-making processes, and their perspectives on the relationship between these activities and their well-being or recovery.

Method: The study took place at a Liaison Committee on Medical Education–accredited Canadian medical school from 2012 to 2014. The authors recruited a purposive and convenience sample of postgraduate year 1–5 residents from six surgical and nonsurgical specialties at three hospitals affiliated with the medical school. Using a constructivist grounded theory approach, semistructured interviews were conducted, audio-taped, transcribed, anonymized, and combined with field notes. The authors analyzed interview transcripts using constant comparative analysis and performed post hoc member checking.

Results: Twenty-four residents participated. Residents characterized their predominant approach to postcall decision making as one of making trade-offs between multiple, competing, seemingly incompatible, but equally valuable, activities. Participants exhibited two different trade-off orientations: being oriented toward maintaining a normal life or toward mitigating fatigue.

Conclusions: The authors’ findings on residents’ trade-off orientations suggest a dual recovery model with postcall trade-offs motivated by the recovery of sleep or of self. This model challenges the dominant viewpoint in the current duty hours literature and suggests that the duty hours discussion must be broadened to include other recovery processes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Direct experience of social work in another country is making an increasingly important contribution to internationalising the social work academic curriculum together with the cultural competency of students. However at present this opportunity is still restricted to a limited number of students. The aim of this paper is to describe and reflect on the production of an audio-visual presentation as representing the experience of three students who participated in an exchange with a social work programme in Pune, India. It describes and assesses the rationale, production and use of video to capture student learning from the Belfast/Pune exchange. We also describe the use of the video in a classroom setting with a year group of 53 students from a younger cohort. This exercise aimed to stimulate students’ curiosity about international dimensions of social work and add to their awareness of poverty, social justice, cultural competence and community social work as global issues. Written classroom feedback informs our discussion of the technical as well as the pedagogical benefits and challenges of this approach. We conclude that some benefit of audio-visual presentation in helping students connect with diverse cultural contexts, but that a complementary discussion challenging stereotyped viewpoints and unconscious professional imperialism is also crucial.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Experience obtained in the support of mobile learning using podcast audio is reported. The paper outlines design, storage and distribution via a web site. An initial evaluation of the uptake of the approach in a final year computing module was undertaken. Audio objects were tailored to meet different pedagogical needs resulting in a repository of persistent glossary terms and disposable audio lectures distributed by podcasting. An aim of our approach is to document the interest from the students, and evaluate the potential of mobile learning for supplementing revision

Relevância:

20.00% 20.00%

Publicador:

Resumo:

On étudie l’application des algorithmes de décomposition matricielles tel que la Factorisation Matricielle Non-négative (FMN), aux représentations fréquentielles de signaux audio musicaux. Ces algorithmes, dirigés par une fonction d’erreur de reconstruction, apprennent un ensemble de fonctions de base et un ensemble de coef- ficients correspondants qui approximent le signal d’entrée. On compare l’utilisation de trois fonctions d’erreur de reconstruction quand la FMN est appliquée à des gammes monophoniques et harmonisées: moindre carré, divergence Kullback-Leibler, et une mesure de divergence dépendente de la phase, introduite récemment. Des nouvelles méthodes pour interpréter les décompositions résultantes sont présentées et sont comparées aux méthodes utilisées précédemment qui nécessitent des connaissances du domaine acoustique. Finalement, on analyse la capacité de généralisation des fonctions de bases apprises par rapport à trois paramètres musicaux: l’amplitude, la durée et le type d’instrument. Pour ce faire, on introduit deux algorithmes d’étiquetage des fonctions de bases qui performent mieux que l’approche précédente dans la majorité de nos tests, la tâche d’instrument avec audio monophonique étant la seule exception importante.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cette thèse étudie des modèles de séquences de haute dimension basés sur des réseaux de neurones récurrents (RNN) et leur application à la musique et à la parole. Bien qu'en principe les RNN puissent représenter les dépendances à long terme et la dynamique temporelle complexe propres aux séquences d'intérêt comme la vidéo, l'audio et la langue naturelle, ceux-ci n'ont pas été utilisés à leur plein potentiel depuis leur introduction par Rumelhart et al. (1986a) en raison de la difficulté de les entraîner efficacement par descente de gradient. Récemment, l'application fructueuse de l'optimisation Hessian-free et d'autres techniques d'entraînement avancées ont entraîné la recrudescence de leur utilisation dans plusieurs systèmes de l'état de l'art. Le travail de cette thèse prend part à ce développement. L'idée centrale consiste à exploiter la flexibilité des RNN pour apprendre une description probabiliste de séquences de symboles, c'est-à-dire une information de haut niveau associée aux signaux observés, qui en retour pourra servir d'à priori pour améliorer la précision de la recherche d'information. Par exemple, en modélisant l'évolution de groupes de notes dans la musique polyphonique, d'accords dans une progression harmonique, de phonèmes dans un énoncé oral ou encore de sources individuelles dans un mélange audio, nous pouvons améliorer significativement les méthodes de transcription polyphonique, de reconnaissance d'accords, de reconnaissance de la parole et de séparation de sources audio respectivement. L'application pratique de nos modèles à ces tâches est détaillée dans les quatre derniers articles présentés dans cette thèse. Dans le premier article, nous remplaçons la couche de sortie d'un RNN par des machines de Boltzmann restreintes conditionnelles pour décrire des distributions de sortie multimodales beaucoup plus riches. Dans le deuxième article, nous évaluons et proposons des méthodes avancées pour entraîner les RNN. Dans les quatre derniers articles, nous examinons différentes façons de combiner nos modèles symboliques à des réseaux profonds et à la factorisation matricielle non-négative, notamment par des produits d'experts, des architectures entrée/sortie et des cadres génératifs généralisant les modèles de Markov cachés. Nous proposons et analysons également des méthodes d'inférence efficaces pour ces modèles, telles la recherche vorace chronologique, la recherche en faisceau à haute dimension, la recherche en faisceau élagué et la descente de gradient. Finalement, nous abordons les questions de l'étiquette biaisée, du maître imposant, du lissage temporel, de la régularisation et du pré-entraînement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Machine tool chatter is an unfavorable phenomenon during metal cutting, which results in heavy vibration of cutting tool. With increase in depth of cut, the cutting regime changes from chatter-free cutting to one with chatter. In this paper, we propose the use of permutation entropy (PE), a conceptually simple and computationally fast measurement to detect the onset of chatter from the time series using sound signal recorded with a unidirectional microphone. PE can efficiently distinguish the regular and complex nature of any signal and extract information about the dynamics of the process by indicating sudden change in its value. Under situations where the data sets are huge and there is no time for preprocessing and fine-tuning, PE can effectively detect dynamical changes of the system. This makes PE an ideal choice for online detection of chatter, which is not possible with other conventional nonlinear methods. In the present study, the variation of PE under two cutting conditions is analyzed. Abrupt variation in the value of PE with increase in depth of cut indicates the onset of chatter vibrations. The results are verified using frequency spectra of the signals and the nonlinear measure, normalized coarse-grained information rate (NCIR).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Any automatically measurable, robust and distinctive physical characteristic or personal trait that can be used to identify an individual or verify the claimed identity of an individual, referred to as biometrics, has gained significant interest in the wake of heightened concerns about security and rapid advancements in networking, communication and mobility. Multimodal biometrics is expected to be ultra-secure and reliable, due to the presence of multiple and independent—verification clues. In this study, a multimodal biometric system utilising audio and facial signatures has been implemented and error analysis has been carried out. A total of one thousand face images and 250 sound tracks of 50 users are used for training the proposed system. To account for the attempts of the unregistered signatures data of 25 new users are tested. The short term spectral features were extracted from the sound data and Vector Quantization was done using K-means algorithm. Face images are identified based on Eigen face approach using Principal Component Analysis. The success rate of multimodal system using speech and face is higher when compared to individual unimodal recognition systems

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aquest llibre és el producte d'anys de cooperació entre equips de recerca de cinc països diferents, tot ells Key Institutions de la xarxa Childwatch International, en el marc d'un projecte plurinacional sobre adolescents i mitjans

Relevância:

20.00% 20.00%

Publicador: