2 results for sound source segregation


Relevance:

80.00%

Abstract:

Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims to replicate this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a relatively unconstrained speaker, i.e., one who is free to move indoors within a larger area than in comparable reported work, which is usually limited to round-table meetings. The system is relatively simple, consisting of just four microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single- and multi-modality tracking. The performance improvement given by audio–video integration and fusion is quantified in terms of tracking precision and accuracy, as well as speaker diarisation error rate and precision–recall (recognition). Improvements over the closest works are as follows: a 56% improvement in sound source localisation computational cost over an audio-only system, an 8% improvement in speaker diarisation error rate over an audio-only speaker recognition unit, and a 36% improvement in the precision–recall metric over an audio–video dominant speaker recognition method.

Relevance:

30.00%

Abstract:

Sounds offer a rich source of information about events taking place in our physical and social environment. However, outside the domains of speech and music, little is known about whether humans can recognize and act upon the intentions behind another agent's actions when those actions are detected through auditory information alone. In this study, we assessed whether intention can be inferred from the sound an action makes and, in turn, whether this information can be used to prospectively guide movement. In two experiments, experienced and novice basketball players had to virtually intercept an attacker by listening to audio recordings of that player's movements. In the first experiment, participants moved a slider; in the second, they moved their body, to block the perceived passage of the attacker as they would in a real basketball game. Combinations of deceptive and non-deceptive movements were used to test whether novice and/or experienced listeners could perceive the attacker's intentions through sound alone. We showed that basketball players predicted the final running direction more accurately than non-players, particularly in the second experiment, where the interceptive action was more basketball-specific. We suggest that athletes show better action anticipation because they are able to pick up and use the relevant kinematic features of deceptive movement from event-related sounds alone. This result suggests that action intention can be perceived through the sound a movement makes, and that the ability to determine another person's action intention from the information conveyed through sound is honed through practice.