212 results for emotional speech


Relevance:

20.00% 20.00%

Publisher:

Abstract:

In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.
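The delay-and-sum idea behind the abstract above can be sketched in floating point as follows. This is a minimal illustration, not the paper's FPGA design: the function name `delay_and_sum`, the integer-sample delay, and the assumption that the second microphone hears the talker later than the first are all hypothetical choices made for the example.

```python
import numpy as np

def delay_and_sum(mic_a, mic_b, delay_samples):
    """Dual-microphone delay-and-sum: align mic_b to mic_a, then average.

    Assumption: the speech reaches mic_b `delay_samples` samples after mic_a
    (a negative value means it reaches mic_b first).
    """
    mic_a = np.asarray(mic_a, dtype=float)
    mic_b = np.asarray(mic_b, dtype=float)
    # Advance mic_b so that its speech component lines up with mic_a.
    aligned_b = np.roll(mic_b, -delay_samples)
    # np.roll wraps around, so zero the samples that wrapped.
    if delay_samples > 0:
        aligned_b[-delay_samples:] = 0.0
    elif delay_samples < 0:
        aligned_b[:-delay_samples] = 0.0
    # Coherent speech adds in phase; uncorrelated noise is attenuated.
    return 0.5 * (mic_a + aligned_b)
```

With uncorrelated noise of equal power at the two microphones, averaging the aligned channels roughly halves the noise power while leaving the speech unchanged. A hardware version would additionally have to choose a fixed-point word length, which this sketch ignores.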

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This research examined for the first time the relationship between emotional manipulation, emotional intelligence, and primary and secondary psychopathy. As predicted, in Study 1 (N = 73), emotional manipulation was related to both primary and secondary psychopathy. Only secondary psychopathy was related to perceived poor emotional skills. Secondary psychopathy was also related to emotional concealment. Emotional intelligence was negatively related to perceived poor emotional skills, emotional concealment, and primary and secondary psychopathy. In Study 2 (N = 275), two additional variables were included: alexithymia and ethical position. It was found that for males, primary psychopathy and emotional intelligence predicted emotional manipulation, while for females emotional intelligence acted as a suppressor, and ethical idealism and secondary psychopathy were additional predictors. For males, emotional intelligence and alexithymia were related to perceived poor emotional skills, while for females emotional intelligence, but not alexithymia, predicted perceived poor emotional skills, with ethical idealism acting as a suppressor. For both males and females, alexithymia predicted emotional concealment. These findings suggest that the mechanisms behind the emotional manipulation–psychopathy relationship differ as a function of gender. Examining the different aspects of emotional manipulation as separate but related constructs may enhance understanding of the construct of emotional manipulation.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light-scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
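For readers unfamiliar with how a target speech-to-noise ratio such as -16 dB is set, the following sketch scales a noise recording against a speech recording digitally. It is an illustration only: the study used a live speaker, so its levels would have been set acoustically, and the function name `mix_at_snr` is hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db` (in dB).

    Returns the mixture and the gain that was applied to the noise.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10*log10(speech_power / (gain**2 * noise_power)), solved for gain.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise, gain
```

At -16 dB the babble carries about 40 times the power of the speech, which is why the auditory signal alone leaves so much room for visual cues to help.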

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In recent years, a 'cultural turn' in the study of class has resulted in a rich body of work detailing the ways in which class advantage and disadvantage are emotionally inscribed and embodied in educational settings. To date, however, much of this literature has focused on the urban sphere. In order to address this gap in the literature, this paper focuses on the affective evaluations made by teachers employed in rural and remote Australian schools of students' families, bodies, expectations and practices. The central argument is that moral ascriptions of class by the teachers are powerfully shaped by dominant socio-cultural constructions of rurality that equate 'the rural' with agriculture.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory. Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions and the emphasis of both positive and negative outcomes allows the benefits and pitfalls of design decisions to become apparent. A brief description of the Court is given, describing the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: front of house, where the courtroom itself is, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems. Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic. The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The experience of emotional expression in the context of social relations is not well understood for people diagnosed with schizophrenia. Early phenomenological research on the experience of people diagnosed with schizophrenia traditionally focussed on self experience in isolation from others, with later research explicating isolated aspects of self experience in relation to others. The current research aimed to focus on the progressive experience of emotional expression of people diagnosed with schizophrenia in relation to others over 12 months, in order to gain a broad spectrum of experience. This study involved unstructured interviews with 7 participants, an average of 4 times each, over a period of 12 months. Due to the unstructured nature of the interviews, a great breadth of experience was explicated. From the interviews there emerged 6 themes grouped together as a transition into, and 5 themes grouped together as a recovery from, symptoms associated with a diagnosis of schizophrenia. Special significance was given to the theme of relational confusion as an experience that provides an understanding of the relationship between social stressors and personal characteristics with responses that are associated with a diagnosis of schizophrenia. It was suggested that participants experienced themselves, including their distancing and isolating responses, as a part of a social system. The breadth of experiences that emerged afforded a framework of experiences within which prior phenomenological research findings on static moments of experience have been located. A more meaningful understanding of the transitioning into and recovery from the experiences associated with a diagnosis of schizophrenia will afford advances in mental health practice.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, therefore making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implementation in vehicular environments which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
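The conventional single-channel magnitude spectral subtraction that this abstract takes as its starting point can be sketched for one frame as below. This is the textbook form that reuses the noisy phase, not the dissertation's phase-aware or LIMA-coupled variants; the function name `spectral_subtract` and the parameters `alpha` (over-subtraction factor) and `floor` (spectral floor) are assumptions made for the example.

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_mag, alpha=1.0, floor=0.01):
    """Magnitude spectral subtraction on a single time-domain frame.

    noise_mag: estimated noise magnitude per rfft bin (length len(frame)//2 + 1).
    """
    noisy_frame = np.asarray(noisy_frame, dtype=float)
    spectrum = np.fft.rfft(noisy_frame)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)  # the noisy phase is kept unchanged
    # Subtract the noise estimate, never dropping below a fraction of the input.
    clean_mag = np.maximum(magnitude - alpha * noise_mag, floor * magnitude)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))
```

In practice the noise magnitude would be estimated from speech-free frames and the routine applied to overlapping windowed frames; both steps are omitted here.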

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A self-report measure of the emotional and behavioural reactions to intrusive thoughts was developed. The paper presents data that confirm the stability, reliability and validity of the new 7-item measure. Emotional and behavioural reactions to intrusions emerged as separate factors on the Emotional and Behavioural Reactions to Intrusions Questionnaire (EBRIQ), a finding confirmed by an independent stress study. Test-retest reliability over 30-70 days was good. Expected relationships with other constructs were significant. Stronger negative responses to intrusions were associated with lower mindfulness scores and higher ratings of experiential avoidance, thought suppression and intensity and frequency of craving. The EBRIQ will help explore differences in reactions to intrusive thoughts in clinical and non-clinical populations, and across different emotional and behavioural states. It will also be useful in assessing the effects of therapeutic approaches such as mindfulness.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Acoustically, car cabins are extremely noisy and, as a consequence, audio-only in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy for circumventing this problem through audio-visual automatic speech recognition (AVASR). However, implementing AVASR requires a system able to accurately locate and track the driver’s face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method of locating and tracking the driver’s lips for an audio-visual speech recognition system, despite the visual variability of illumination and head pose.
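The building block that makes Viola-Jones detection fast enough for real-time tracking is the integral image, which lets any rectangular pixel sum, and therefore any Haar-like feature, be evaluated with a handful of lookups. The sketch below shows only that building block; the boosted cascade of classifiers that the full detector needs is not shown, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    sums = np.cumsum(np.cumsum(np.asarray(img, dtype=np.float64), axis=0), axis=1)
    return np.pad(sums, ((1, 0), (1, 0)))

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Haar-like feature: sum of the upper half minus sum of the lower half."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return upper - lower
```

Because each feature costs the same regardless of its size, a detector can scan many window positions and scales per frame.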

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Acoustically, car cabins are extremely noisy and, as a consequence, existing audio-only speech recognition systems for voice-based control of vehicle functions, such as the GPS-based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g., lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Research in the AVASR field has been ongoing for the past twenty-five years, with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is that most research to date has neglected to address variabilities in the visual domain, such as illumination and viewpoint, in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is a publicly available in-car database, and we show that the use of visual speech in conjunction with the audio modality is a better approach to improving the robustness and effectiveness of voice-only recognition systems in car cabin environments.
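The abstract does not say how the audio and visual streams are combined. One common option, assumed here purely for illustration, is weighted late fusion of per-word log-likelihoods from two independent recognisers. The function names and the `audio_weight` parameter are hypothetical.

```python
import numpy as np

def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Weighted sum of per-class log-likelihoods from the two streams.

    audio_weight lies in [0, 1]; lower values trust the visual stream more,
    which is the sensible direction as acoustic noise increases.
    """
    audio_ll = np.asarray(audio_ll, dtype=float)
    visual_ll = np.asarray(visual_ll, dtype=float)
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll

def recognise(audio_ll, visual_ll, audio_weight):
    """Index of the class with the highest fused score."""
    return int(np.argmax(fuse_log_likelihoods(audio_ll, visual_ll, audio_weight)))
```

Setting `audio_weight` to 1.0 reduces the scheme to an audio-only recogniser, which gives a simple baseline for comparison.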

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Indigenous men’s support groups are designed to empower men to take greater control and responsibility for their health and wellbeing. They provide health education sessions, counselling, men’s health clinics, diversionary programs for men facing criminal charges, cultural activities, drug- and alcohol-free social events, and advocacy for resources. Despite there being ~100 such groups across Australia, there is a dearth of literature on their strategies and outcomes. This paper is based on participatory action research involving two north Queensland groups which were the subject of a series of five ‘phased’ evaluative reports between 2002 and 2007. By applying ‘meta-ethnography’ to the five studies, we identified four themes which provide new interpretations of the data. Self-reported benefits included improved social and emotional wellbeing, modest lifestyle modifications and willingness to change current notions of ‘gendered’ roles within the home, such as sharing housework. Our qualitative research to date suggests that through promoting empowerment, wellbeing and social cohesion for men and their families, men’s support groups may be saving costs through reduced expenditure on health care, welfare, and criminal justice costs, and higher earnings. Future research needs to demonstrate this empirically.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

What really changed for Australian Aboriginal and Torres Strait Islander people between Paul Keating’s Redfern Park Speech (Keating 1992) and Kevin Rudd’s Apology to the stolen generations (Rudd 2008)? What will change between the Apology and the next speech of an Australian Prime Minister? The two speeches were intricately linked, and they were both personal and political. But do they really signify change at the political level? This paper reflects my attempt to turn the gaze away from Aboriginal and Torres Strait Islander people, and back to where the speeches originated: the Australian Labor Party (ALP). I question whether the changes foreshadowed in the two speeches – including changes by the Australian public and within Australian society – are evident in the internal mechanisms of the ALP. I also seek to understand why non-Indigenous women seem to have given in to the existing ways of the ALP instead of challenging the status quo which keeps Aboriginal and Torres Strait Islander peoples marginalised. I believe that, without a thorough examination and a change in the ALP’s practices, the domination and subjugation of Indigenous peoples will continue – within the Party, through the Australian political process and, therefore, through governments.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close-talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied to ad-hoc microphone arrays.
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach is proposed, which clusters the microphones and selects clusters based on their proximity to the speaker. Novel experiments are conducted to demonstrate that the proposed method of automatically selecting clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.
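The geometrical steering vector mentioned above, and the idea of preferring microphones close to the talker, can be sketched as follows. The sketch assumes that microphone and source positions are already known, for instance after array calibration, and it picks the nearest microphones by plain Euclidean distance; the thesis's actual clustering and proximity measures are not given in the abstract, so `nearest_subarray` is only a stand-in. All names and the 343 m/s speed of sound are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value

def steering_vector(mic_positions, source_position, freq_hz):
    """Phase terms exp(-j*2*pi*f*tau_m), with delays relative to microphone 0."""
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(source_position, dtype=float)
    distances = np.linalg.norm(mics - src, axis=1)
    delays = (distances - distances[0]) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)

def nearest_subarray(mic_positions, source_position, size):
    """Indices of the `size` microphones closest to the source, in index order."""
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(source_position, dtype=float)
    distances = np.linalg.norm(mics - src, axis=1)
    return np.sort(np.argsort(distances)[:size])
```

A beamformer would then be run on the selected subarray only, using a steering vector computed for those microphones.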

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.
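The likelihood-maximising idea can be illustrated with a deliberately small toy problem: a single subtraction factor `alpha` is tuned by gradient ascent so that the enhanced features best fit the model means along a known state sequence. Everything here is an assumption made for the example (unit-variance Gaussian states, a fixed noise profile, the step-size rule, and all names); the paper's Mel-filterbank framework is considerably richer.

```python
import numpy as np

def log_likelihood(alpha, noisy, noise_profile, state_means):
    """Unit-variance Gaussian log-likelihood of the enhanced features (constant dropped)."""
    residual = noisy - alpha * noise_profile - state_means
    return -0.5 * np.sum(residual ** 2)

def lima_fit(noisy, noise_profile, state_means, steps=50, rate=0.5):
    """Gradient ascent on alpha, the only enhancement parameter in this toy.

    noisy, state_means: arrays of shape (frames, dims); noise_profile: shape (dims,).
    state_means holds the model mean for each frame, taken from the state
    sequence of the known transcription.
    """
    alpha = 0.0
    # Curvature of the objective with respect to alpha, used to scale the step.
    curvature = noisy.shape[0] * np.sum(noise_profile ** 2)
    for _ in range(steps):
        residual = noisy - alpha * noise_profile - state_means
        # d(logL)/d(alpha) = sum of residual * noise_profile over frames and dims
        gradient = np.sum(residual * noise_profile)
        alpha += rate * gradient / curvature
    return alpha
```

The toy also hints at the drawback discussed in the abstract: if the features differ from the model for a reason other than noise, such as accent, the optimiser still adjusts `alpha` to absorb that mismatch.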