13 results for Audio-Visual Automatic Speech Recognition

in BORIS: Bern Open Repository and Information System - Bern - Switzerland


Relevance:

100.00%

Abstract:

Primate multisensory object perception involves distributed brain regions. To investigate the network character of these regions in the human brain, we applied data-driven group spatial independent component analysis (ICA) to a functional magnetic resonance imaging (fMRI) data set acquired during a passive audio-visual (AV) experiment with common object stimuli. We labeled three group-level independent component (IC) maps as auditory (A), visual (V), and AV, based on their spatial layouts and activation time courses. The overlap between these IC maps served as the definition of a distributed network of multisensory candidate regions, including superior temporal, ventral occipito-temporal, posterior parietal and prefrontal regions. During an independent second fMRI experiment, we explicitly tested their involvement in AV integration. Activations in nine of these twelve candidate regions met the max-criterion (A < AV > V) for multisensory integration. Comparison of this approach with a general linear model-based region-of-interest definition revealed its complementary value for multisensory neuroimaging. In conclusion, we estimated functional networks of uni- and multisensory functional connectivity from one dataset and validated their functional roles in an independent dataset. These findings demonstrate the particular value of ICA for multisensory neuroimaging research and of using independent datasets to test hypotheses generated from a data-driven analysis.
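
The max-criterion lends itself to a direct computational reading: a region counts as multisensory when its AV response exceeds both unisensory responses. Below is a minimal Python sketch of that check; the region names and beta values are hypothetical and serve only to illustrate the rule, not to reproduce the study's data.

    # Max-criterion (A < AV > V): a region is flagged as multisensory when
    # its audio-visual response exceeds both unisensory responses.
    # Region names and beta estimates below are invented for illustration.
    betas = {
        # region: (beta_A, beta_V, beta_AV)
        "superior_temporal": (0.8, 0.5, 1.4),
        "posterior_parietal": (0.4, 0.9, 1.1),
        "prefrontal": (0.6, 0.7, 0.65),
    }

    def meets_max_criterion(beta_a, beta_v, beta_av):
        """True when the AV response exceeds both unisensory responses."""
        return beta_av > max(beta_a, beta_v)

    for region, (a, v, av) in betas.items():
        verdict = "multisensory" if meets_max_criterion(a, v, av) else "not multisensory"
        print(f"{region}: {verdict}")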

Relevance:

100.00%

Abstract:

Crowdsourcing linguistic phenomena with smartphone applications is relatively new. Apps have been used to train acoustic models for automatic speech recognition (de Vries et al. 2014) and to archive endangered languages (Iwaidja Inyaman Team 2012). Leemann and Kolly (2013) developed a free iOS app, Dialäkt Äpp (DÄ; >78k downloads), to document language change in Swiss German. Here, we present results on sound change based on DÄ data. DÄ predicts the users' dialects: for 16 variables, users select their dialectal variant, and DÄ then tells users which dialect they speak. Underlying this prediction are maps from the Linguistic Atlas of German-speaking Switzerland (SDS, 1962-2003), which documents the linguistic situation around 1950. If the prediction is wrong, users indicate their actual dialect. With this information, the 16 variables can be assessed for language change. Results revealed the robustness of phonetic variables, whereas lexical and morphological variables were more prone to change. Phonetic variables such as 'to lift' (variants: /lupfə, lʏpfə, lipfə/) revealed SDS agreement scores of nearly 85%, i.e., little sound change. Not all phonetic variables are equally robust: 'ladle' (variants: /xælə, xællə, xæuə, xæɫə, xæɫɫə/) exhibited significant sound change. We will illustrate the results using maps that show details of the sound changes at hand.
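
The prediction step described above can be pictured as a best-match lookup against the atlas: each candidate dialect region is scored by how many of the user's 16 variant choices agree with the variants the SDS records for that region. The Python sketch below illustrates this idea with two invented regions and two of the published variables; it is not the app's actual algorithm.

    # Hypothetical sketch of DÄ-style dialect prediction: score each dialect
    # region by how many of the user's variant choices match the variants
    # recorded for that region in the SDS atlas. Regions and variant
    # assignments are invented for illustration.
    sds_map = {
        "Region A": {"to lift": "lupfə", "ladle": "xælə"},
        "Region B": {"to lift": "lʏpfə", "ladle": "xællə"},
    }

    def predict_dialect(user_choices, atlas):
        """Return the region whose recorded variants best match the user's choices."""
        def score(region_variants):
            return sum(1 for variable, choice in user_choices.items()
                       if region_variants.get(variable) == choice)
        return max(atlas, key=lambda region: score(atlas[region]))

    print(predict_dialect({"to lift": "lupfə", "ladle": "xælə"}, sds_map))
    # -> "Region A"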

Relevance:

100.00%

Abstract:

BACKGROUND Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues they prompt the cooperative process of turn-taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards the speaker or the listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. METHODS Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present or absent), gaze direction (towards the speaker or the listener), and region of interest (ROI; hands, face, and body). RESULTS Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, a significant gaze direction × ROI × group interaction revealed that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. CONCLUSION Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. We discuss whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less.
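
The two dependent measures are straightforward to compute from raw fixation records: group the fixations by the cell of the design (co-speech gesture × gaze direction × ROI), then take the sum and the mean of the durations per cell. A minimal Python sketch, with invented fixation records, follows.

    # Cumulative and mean fixation duration per analysis cell
    # (gesture present/absent x gaze direction x ROI).
    # The fixation records below are invented for illustration.
    from collections import defaultdict

    fixations = [
        # (gesture_present, gaze_direction, roi, duration_ms)
        (True, "speaker", "face", 420),
        (True, "speaker", "hands", 180),
        (False, "listener", "face", 250),
        (True, "speaker", "face", 310),
    ]

    cells = defaultdict(list)
    for gesture, gaze, roi, duration in fixations:
        cells[(gesture, gaze, roi)].append(duration)

    for cell, durations in sorted(cells.items()):
        cumulative = sum(durations)
        mean = cumulative / len(durations)
        print(cell, f"cumulative={cumulative} ms, mean={mean:.0f} ms")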

Relevance:

100.00%

Abstract:

OBJECTIVE To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair-Schulz-Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcams (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for a live Skype™ video connection and live face-to-face communication were assessed. RESULTS A higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores depended strongly on the speaker but were not influenced by the physical properties of the camera optics or by full screen mode. There was a significant median gain of +8.5 percentage points (p = 0.009) in speech perception for all 21 CI users when visual cues were additionally shown. CI users with poor open-set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception gain +11.8 percentage points, p = 0.032). CONCLUSION Webcams have the potential to improve telecommunication for hearing-impaired individuals.
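
The simulations vary four stimulus dimensions: resolution, frame rate, image/sound delay, and sound on/off (plus speaker and screen size). Whether every combination was tested is not stated in the abstract; the Python sketch below simply enumerates the full grid of the named levels, with the delay steps chosen arbitrarily within the reported 0-500 ms range.

    # Enumerate the stimulus dimensions named in the abstract. The delay
    # steps are hypothetical samples from the reported 0-500 ms range, and
    # the full factorial crossing is only for illustration.
    from itertools import product

    resolutions = [(1280, 720), (640, 480), (320, 240), (160, 120)]  # px
    frame_rates = [30, 20, 10, 7, 5]                                 # fps
    delays_ms = [0, 100, 250, 500]                                   # hypothetical steps
    sound_on = [True, False]

    conditions = list(product(resolutions, frame_rates, delays_ms, sound_on))
    print(len(conditions), "candidate stimulus conditions")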

Relevance:

100.00%

Abstract:

Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition based on the Bag of Features (BoF) model. An extensive technical investigation was conducted to identify and optimize the best-performing components of the BoF architecture and to estimate the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5,000 food images was created and organized into 11 classes. The optimized system computes dense local features using the scale-invariant feature transform (SIFT) on the HSV color space, builds a visual dictionary of 10,000 visual words using hierarchical k-means clustering, and finally classifies the food images with a linear support vector machine classifier. The system achieved a classification accuracy of approximately 78%, demonstrating the feasibility of the proposed approach on a very challenging image dataset.
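
The described pipeline translates almost step by step into a standard computer vision stack. The Python sketch below follows it (dense SIFT on the HSV channels, a k-means visual dictionary, histogram encoding, linear SVM), with two stated deviations to keep the toy version small: MiniBatchKMeans stands in for the paper's hierarchical k-means, and the dictionary uses far fewer than 10,000 words. Image arrays and labels are placeholders.

    # Sketch of a BoF food-recognition pipeline: dense SIFT on HSV channels,
    # k-means dictionary, histogram encoding, linear SVM. MiniBatchKMeans
    # replaces the paper's hierarchical k-means; the dictionary is shrunk
    # for speed. Requires opencv-python and scikit-learn.
    import cv2
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.svm import LinearSVC

    STEP = 8        # grid spacing for dense sampling (px)
    N_WORDS = 200   # toy dictionary; the paper uses 10,000 visual words

    def dense_sift_hsv(bgr_image):
        """Dense SIFT descriptors from each HSV channel, stacked row-wise."""
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        sift = cv2.SIFT_create()
        h, w = hsv.shape[:2]
        grid = [cv2.KeyPoint(float(x), float(y), float(STEP))
                for y in range(STEP, h - STEP, STEP)
                for x in range(STEP, w - STEP, STEP)]
        per_channel = [sift.compute(hsv[:, :, c].copy(), grid)[1] for c in range(3)]
        return np.vstack([d for d in per_channel if d is not None])

    def bof_histogram(descriptors, kmeans):
        """Normalised histogram of visual-word assignments for one image."""
        words = kmeans.predict(descriptors)
        hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
        return hist / hist.sum()

    def train(images, labels):
        """images: list of BGR arrays; labels: class indices (placeholders)."""
        all_desc = np.vstack([dense_sift_hsv(img) for img in images])
        kmeans = MiniBatchKMeans(n_clusters=N_WORDS, n_init=3).fit(all_desc)
        features = np.array([bof_histogram(dense_sift_hsv(img), kmeans)
                             for img in images])
        return kmeans, LinearSVC().fit(features, labels)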

Relevance:

100.00%

Abstract:

OBJECTIVE To evaluate speech intelligibility in noise with a new cochlear implant (CI) processor that uses a pinna-effect-imitating directional microphone system. STUDY DESIGN Prospective experimental study. SETTING Tertiary referral center. PATIENTS Ten experienced, unilateral CI recipients with bilateral severe-to-profound hearing loss. INTERVENTION All participants performed speech-in-noise tests with the Opus 2 processor (omnidirectional microphone mode only) and the newer Sonnet processor (omnidirectional and directional microphone modes). MAIN OUTCOME MEASURE The speech reception threshold (SRT) in noise was measured in four spatial settings. The test sentences were always presented from the front. The noise arrived either from the front (S0N0), the ipsilateral side of the CI (S0NIL), the contralateral side of the CI (S0NCL), or the back (S0N180). RESULTS The directional mode improved the SRTs by 3.6 dB (p < 0.01), 2.2 dB (p < 0.01), and 1.3 dB (p < 0.05) in the S0N180, S0NIL, and S0NCL settings, respectively, compared with the Sonnet in omnidirectional mode. There was no statistically significant difference in the S0N0 setting. No differences between the Opus 2 and the Sonnet in omnidirectional mode were observed. CONCLUSION Speech intelligibility with the Sonnet system was statistically different from speech recognition with the Opus 2 system, suggesting that CI users might profit from the pinna-effect-imitating directional mode in noisy environments.

Relevance:

100.00%

Abstract:

The level of improvement in the audiological results of Baha® users depends mainly on the patient's preoperative hearing thresholds and the type of Baha sound processor used. This investigation shows correlations between the preoperative hearing threshold, postoperative aided thresholds and audiological results for speech understanding in quiet in 84 Baha users with unilateral conductive hearing loss, bilateral conductive hearing loss or bilateral mixed hearing loss. Secondly, speech understanding in noise is investigated in 26 Baha users with different Baha sound processors (Compact, Divino, and BP100). Linear regression between aided sound field thresholds and bone conduction (BC) thresholds of the better ear shows the highest correlation coefficients and the steepest slope. Differences between better-ear BC thresholds and aided sound field thresholds are smallest for the mid frequencies (1 and 2 kHz) and become larger at 0.5 and 4 kHz. For Baha users, the gain in speech recognition in quiet can be expected to lie in the order of magnitude of the gain in their hearing threshold. Compared with its predecessors, the Baha® Compact and Baha® Divino, the Baha® BP100 improves speech understanding in noise significantly, by +0.9 to +4.6 dB signal-to-noise ratio, depending on the setting and the use of the directional microphone. For Baha users with unilateral or bilateral conductive hearing loss and bilateral mixed hearing loss, audiological results in aided sound field thresholds can be estimated from the better BC hearing threshold. The benefit in speech understanding in quiet can be expected to be similar to the gain in the sound field hearing threshold. The most recent generation of Baha sound processors improves speech understanding in noise by a margin that is well perceived by users and can be very useful in everyday life.
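
The reported relation between better-ear BC thresholds and aided sound field thresholds is an ordinary least-squares fit; a minimal Python sketch with scipy.stats.linregress and invented threshold values illustrates the analysis.

    # Regress aided sound-field thresholds on the better ear's bone-conduction
    # (BC) thresholds. All threshold values below are invented for
    # illustration, not study data. Requires numpy and scipy.
    import numpy as np
    from scipy import stats

    bc_better_ear = np.array([10, 20, 25, 30, 40, 45, 55])  # dB HL, hypothetical
    aided_field = np.array([18, 25, 32, 36, 48, 50, 62])    # dB HL, hypothetical

    fit = stats.linregress(bc_better_ear, aided_field)
    print(f"slope={fit.slope:.2f}, intercept={fit.intercept:.1f} dB, r={fit.rvalue:.2f}")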

Relevance:

100.00%

Abstract:

BACKGROUND Efficiently performed basic life support (BLS) after cardiac arrest is proven to be effective. However, cardiopulmonary resuscitation (CPR) is strenuous and rescuers' performance declines rapidly over time. Audio-visual feedback devices reporting CPR quality may prevent this decline. We aimed to investigate the effect of various CPR feedback devices on CPR quality. METHODS In this open, prospective, randomised, controlled trial we compared three CPR feedback devices (PocketCPR, CPRmeter, and the iPhone app PocketCPR) with standard BLS without feedback in a simulated scenario. A total of 240 trained medical students performed single-rescuer BLS on a manikin for 8 min. Effective compression (compressions with correct depth, pressure point and sufficient decompression) as well as compression rate, flow-time fraction and ventilation parameters were compared between the four groups. RESULTS Study participants using the PocketCPR device performed 17±19% effective compressions, compared with 32±28% with the CPRmeter, 25±27% with the iPhone app PocketCPR, and 35±30% with standard BLS (PocketCPR vs. CPRmeter p=0.007, PocketCPR vs. standard BLS p=0.001; other comparisons not significant). The PocketCPR and CPRmeter prevented a decline in effective compression over time, but overall performance in the PocketCPR group was considerably inferior to standard BLS. Compression depth and rate were within the guideline-recommended range in all groups. CONCLUSION While we found differences between the investigated CPR feedback devices, overall BLS quality was suboptimal in all groups. Surprisingly, effective compression was not improved by any CPR feedback device compared with standard BLS, and all feedback devices caused a substantial delay in starting CPR, which may worsen outcome.
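
The primary endpoint, "effective compression", is a conjunction of three per-compression checks. A minimal Python sketch with invented compression records shows how the percentage is obtained.

    # "Effective compression": depth, pressure point, and decompression must
    # all be correct for a compression to count. Records are invented for
    # illustration.
    compressions = [
        # (depth_ok, pressure_point_ok, decompression_ok)
        (True, True, True),
        (True, False, True),
        (True, True, True),
        (False, True, True),
    ]

    n_effective = sum(all(checks) for checks in compressions)
    share = 100 * n_effective / len(compressions)
    print(f"effective compressions: {share:.0f}%")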

Relevance:

100.00%

Abstract:

Under the name Nollywood, a unique video film industry has developed in Nigeria over the last few decades, and it now forms one of the world's biggest entertainment industries. With its focus on stories reflecting "the values, desires and fears" (Haynes 2007: 133) of African viewers and its particular mode of production, Nollywood brings "lived practices and its representation together in ways that make the films deeply accessible and entirely familiar to their audience" (Marston et al. 2007: 57). In doing so, Nollywood shows its spectators new postcolonial forms of performative self-expression and becomes a point of reference for a wide range of people. However, Nollywood not only excites a large number of viewers inside and outside Nigeria; it also inspires some of them to become active themselves and make their own films. This effect of Nigerian filmmaking can be found in many parts of sub-Saharan Africa as well as in African diasporas all over the world, including Switzerland (Mooser 2011: 63-66). As a source of inspiration, Nollywood and its unconventional ways of filmmaking offer African migrants a benchmark that meets their wish to express themselves as a minority group in a foreign country. As Appadurai (1996: 53), Ginsburg (2003: 78) and Marks (2000: 21) assume, filmmakers with a migratory background have a specific need to express themselves through media. As minority group members in their country of residence, they not only wish to reflect upon their situation within the diaspora and illustrate their everyday struggles as foreigners, but also to express their own views and ideas in order to challenge dominant public opinion (Ginsburg 2003: 78). They attempt to "talk back to the structures of power" (2003: 78) they live in. In this process, their audio-visual works become a means of response and "an answering echo to a previous presentation or representation" (Mitchell 1994: 421). The American art historian Mitchell therefore suggests interpreting representation as "the relay mechanism in exchange of power, value, and publicity" (1994: 420). This desire to interact with the local public has also been expressed during a film project of African, mainly Nigerian, first-generation migrants in Switzerland in which I am currently a partner. Several cast and crew members have expressed feelings of being under-represented, even misrepresented, in the dominant Swiss media discourse. In order to create a form of exchange and give themselves a voice, they are producing a Nollywood-inspired film and wish to present it to the society they live in. My partnership in this ongoing film production (which forms the foundation of my PhD field study) allows me to observe and experience this process. By employing qualitative media-anthropological methods, in particular Performance Ethnography, I seek to find out more about the ways African migrants represent themselves as a community through audio-visual media, the effect the transnational use of Nollywood has on their forms of self-representation, and the ways they express themselves.

Relevance:

100.00%

Abstract:

BACKGROUND Resuscitation guidelines encourage the use of cardiopulmonary resuscitation (CPR) feedback devices, implying better outcomes after sudden cardiac arrest. Whether effective continuous feedback could also be given verbally by a second rescuer ("human feedback") had not been investigated yet. We therefore compared the effect of human feedback to that of a CPR feedback device. METHODS In an open, prospective, randomised, controlled trial, we compared the CPR performance of three groups of medical students in a two-rescuer scenario. Group "sCPR" was taught standard BLS without continuous feedback, serving as control. Group "mfCPR" was taught BLS with mechanical audio-visual feedback (HeartStart MRx with Q-CPR Technology™). Group "hfCPR" was taught standard BLS with human feedback. Afterwards, 326 medical students performed two-rescuer BLS on a manikin for 8 min. CPR quality parameters, such as the "effective compression ratio" (ECR: compressions with correct hand position, depth and complete decompression, multiplied by the flow-time fraction), and other compression-, ventilation- and time-related parameters were assessed for all groups. RESULTS ECR was comparable between the hfCPR and mfCPR groups (0.33 vs. 0.35, p = 0.435). The hfCPR group needed less time until starting chest compressions (2 vs. 8 s, p < 0.001) and showed fewer incorrect decompressions (26 vs. 33%, p = 0.044). On the other hand, absolute hands-off time was higher in the hfCPR group (67 vs. 60 s, p = 0.021). CONCLUSIONS The quality of CPR with human feedback or with a mechanical audio-visual feedback device was similar. Further studies should investigate whether extended human feedback training could further increase CPR quality at comparable training costs.
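
As defined in the abstract, the ECR multiplies the share of fully correct compressions by the flow-time fraction, so both compression quality and hands-on time reduce the score. A one-function Python sketch with invented numbers:

    # Effective compression ratio (ECR): share of compressions with correct
    # hand position, depth and complete decompression, multiplied by the
    # flow-time fraction. The example numbers are invented.
    def effective_compression_ratio(n_correct, n_total, flow_time_fraction):
        return (n_correct / n_total) * flow_time_fraction

    print(round(effective_compression_ratio(180, 400, 0.75), 2))  # -> 0.34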

Relevance:

100.00%

Abstract:

BACKGROUND Screening for aphasia in acute stroke is crucial for directing patients to early language therapy. The Language Screening Test (LAST), originally developed in French, is a validated screening test that allows detection of a language deficit within a few minutes. The aim of the present study was to develop and validate two parallel German versions of the LAST. METHODS The LAST includes subtests for naming, repetition, automatic speech, and comprehension. For the translation into German, the task constructs and psycholinguistic criteria for item selection were identical to those of the French LAST. A cohort of 101 stroke patients, all native German speakers, was tested. Validation of the LAST was based on (1) analysis of the equivalence of the two German versions, established by administering both versions successively in a subset of patients, (2) internal validity by means of internal consistency analysis, and (3) external validity by comparison with the short version of the Token Test in another subset of patients. RESULTS The two German versions were equivalent, as demonstrated by a high intraclass correlation coefficient of 0.91. Furthermore, the LAST showed an acceptable internal structure (Cronbach's α = 0.74). A highly significant correlation (r = 0.74, p < 0.0001) between the LAST and the short version of the Token Test indicated good external validity of the scale. CONCLUSION The German version of the LAST, available in two parallel versions, is a new and valid language screening test for stroke patients.
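
Of the three validation statistics, Cronbach's alpha has the most compact closed form: alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The Python sketch below computes it on an invented patient-by-item score matrix; it is illustrative only and uses none of the study's data.

    # Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / var(total)).
    # The score matrix is invented: rows = patients, columns = test items,
    # with a shared latent trait so the items correlate.
    import numpy as np

    def cronbach_alpha(scores):
        """scores: 2-D array, rows = subjects, columns = items."""
        k = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)
        total_variance = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(30, 1))                              # 30 patients
    demo_scores = (latent + 0.5 * rng.normal(size=(30, 15)) > 0).astype(int)
    print(f"alpha = {cronbach_alpha(demo_scores):.2f}")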