88 results for Speech in Noise
Abstract:
Properly designed decision support environments encourage proactive and objective decision making. The work presented in this paper inquires into developing a decision support environment and a tool to facilitate objective decision making in dealing with road traffic noise. The decision support methodology incorporates traffic amelioration strategies both within and outside the road reserve. The project is funded by the CRC for Construction Innovation and conducted jointly by RMIT University and the Queensland Department of Main Roads (MR) in collaboration with the Queensland Department of Public Works, Arup Pty Ltd., and the Queensland University of Technology. In this paper, the proposed decision support framework is presented as a flowchart, which enabled the development of the decision support tool (DST). The underpinning concept is to establish and retain an information warehouse for each critical road segment (noise corridor) for a given planning horizon. It is understood that, in current practice, some components of the approach described are already in place but not fully integrated and supported. The tool provides an integrated user-friendly interface between traffic noise modeling software, noise management criteria and cost databases.
Abstract:
The progress of a nationally representative sample of 3632 children was followed from early childhood through to primary school, using data from the Longitudinal Study of Australian Children (LSAC). The aim was to examine the predictive effects of different aspects of communicative ability, and of early vs. sustained identification of speech and language impairment, on children's achievement and adjustment at school. Four indicators identified speech and language impairment: parent-rated expressive language concern; parent-rated receptive language concern; use of speech-language pathology services; below average scores on the adapted Peabody Picture Vocabulary Test-III. School outcomes were assessed by teachers' ratings of language/literacy ability, numeracy/mathematical thinking and approaches to learning. Comparison of group differences, using ANOVA, provided clear evidence that children who were identified as having speech and language impairment in their early childhood years did not perform as well at school, two years later, as their non-impaired peers on all three outcomes: Language and Literacy, Mathematical Thinking, and Approaches to Learning. The effects of early speech and language status on literacy, numeracy, and approaches to learning outcomes were similar in magnitude to the effect of family socio-economic factors, after controlling for child characteristics. Additionally, early identification of speech and language impairment (at age 4-5) was found to be a better predictor of school outcomes than sustained identification (at ages 4-5 and 6-7 years). Parent reports of speech and language impairment in early childhood are useful in foreshadowing later difficulties with school and providing early intervention and targeted support from speech-language pathologists and specialist teachers.
Abstract:
We propose an efficient and low-complexity scheme for estimating and compensating clipping noise in OFDMA systems. Conventional clipping noise estimation schemes, which need all demodulated data symbols, may become infeasible in OFDMA systems where a specific user may only know their own modulation scheme. The proposed scheme first uses the equalized output to identify a limited number of candidate clips, and then exploits the information on known subcarriers to reconstruct the clipped signal. Simulation results show that the proposed scheme can significantly improve the system performance.
Abstract:
The Autistic Behavioural Indicators Instrument (ABII) is an 18-item instrument developed to identify children with Autistic Disorder (AD) based on the presence of unique autistic behavioural indicators. The ABII was administered to 20 children with AD, 20 children with speech and language impairment (SLI) and 20 typically developing (TD) children aged 2-6 years. Results indicated that the ABII discriminated children diagnosed with AD from those diagnosed with SLI and those who were TD, based on the presence of specific social attention, sensory, and behavioural symptoms. A combination of symptomology across these domains correctly classified 100% of children with and without AD. The paper concludes that the ABII shows considerable promise as an instrument for the early identification of AD.
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but such approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks, on the other hand, optimise the parameters of speech enhancement algorithms based on state sequences generated by a speech recogniser for utterances of known transcriptions. Previous applications of LIMA frameworks have generated a set of global enhancement parameters for all model states without taking into account the distribution of model occurrence, making optimisation susceptible to favouring frequently occurring models, in particular silence. In this paper, we demonstrate the existence of highly disproportionate phonetic distributions on two corpora with distinct speech tasks, and propose to normalise the influence of each phone based on a priori occurrence probabilities. Likelihood analysis and speech recognition experiments verify this approach for improving ASR performance in noisy environments.
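The abstract does not give the exact normalisation, so the following is only a sketch of one plausible reading: each frame's log-likelihood is weighted by the inverse of its phone's a priori occurrence probability, so that every phone contributes its mean log-likelihood rather than its sum. The function names, the toy alignment, and the weighting itself are assumptions made for illustration, not the authors' formulation.

```python
from collections import Counter


def phone_priors(alignment):
    """Relative frequency of each phone label in a frame-level alignment."""
    counts = Counter(alignment)
    total = len(alignment)
    return {phone: count / total for phone, count in counts.items()}


def global_objective(frame_loglik):
    """Unnormalised objective: every frame counts equally, so frequent phones dominate."""
    return sum(frame_loglik)


def prior_normalised_objective(alignment, frame_loglik, priors):
    """Weight each frame by 1 / (prior * number of frames); an assumed weighting."""
    total = len(alignment)
    return sum(
        loglik / (priors[phone] * total)
        for phone, loglik in zip(alignment, frame_loglik)
    )


# Toy alignment (hypothetical): silence occupies 8 of 10 frames.
alignment = ["sil"] * 8 + ["ah"] * 2
frame_loglik = [-1.0] * 8 + [-5.0] * 2
priors = phone_priors(alignment)
```

In the toy data, silence supplies 8 of the 18 units of the unnormalised objective simply because it is frequent. After normalisation each phone contributes its own mean log-likelihood, so the poorly modelled phone is no longer outvoted.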
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
Abstract:
In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross correlation based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
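As a rough illustration of the cross-correlation idea behind speaker localisation, the sketch below estimates the delay between two microphone signals by searching for the lag with the largest cross-correlation, then converts that lag into a path-length difference. It is a generic brute-force time-delay estimate, not the system described in the abstract; the function names, the toy signals, and the speed-of-sound default are assumptions.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples, within [-max_lag, max_lag], that maximises
    sum_i reference[i] * delayed[i + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += value * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def path_difference_m(lag_samples, sample_rate_hz, speed_of_sound_m_s=343.0):
    """Convert a delay in samples into a path-length difference in metres."""
    return lag_samples / sample_rate_hz * speed_of_sound_m_s


# Toy signals (hypothetical): the second microphone hears the same burst 3 samples later.
mic_a = [0, 1, 4, 2, 0, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 4, 2, 0, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)
```

With several microphone pairs, delays of this kind constrain the direction of each speaker. Practical systems typically use a weighted frequency-domain correlation instead of this direct search.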
Abstract:
This paper presents a method of voice activity detection (VAD) suitable for high noise scenarios, based on the fusion of two complementary systems. The first system uses a proposed non-Gaussianity score (NGS) feature based on normal probability testing. The second system employs a histogram distance score (HDS) feature that detects changes in the signal by applying a template-based similarity measure between adjacent frames. The decision outputs of the two systems are then merged using an open-by-reconstruction fusion stage. Accuracy of the proposed method was compared to several baseline VAD methods on a database created using real recordings of a variety of high-noise environments.
Abstract:
This paper presents a method of voice activity detection (VAD) for high noise scenarios, using a noise robust voiced speech detection feature. The developed method is based on the fusion of two systems. The first system utilises the maximum peak of the normalised time-domain autocorrelation function (MaxPeak). The second system uses a novel combination of cross-correlation and zero-crossing rate of the normalised autocorrelation to approximate a measure of signal pitch and periodicity (CrossCorr) that is hypothesised to be noise robust. The score outputs of the two systems are then merged using weighted sum fusion to create the proposed autocorrelation zero-crossing rate (AZR) VAD. Accuracy of AZR was compared to state-of-the-art and standardised VAD methods and was shown to outperform the best performing system with an average relative improvement of 24.8% in half-total error rate (HTER) on the QUT-NOISE-TIMIT database created using real recordings from high-noise environments.
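The MaxPeak feature and the HTER metric can be sketched as follows. This is a minimal illustration, assuming a frame is a plain list of samples, that the peak is searched over a caller-chosen lag range, and that HTER is the average of the miss rate and the false alarm rate. The lag range, frame length, and function names are illustrative choices, not values from the paper, and the CrossCorr feature is not reproduced here.

```python
import math


def normalised_autocorrelation(frame):
    """Return r[k] / r[0] for lags k = 0 .. len(frame) - 1 (zeros for a silent frame)."""
    n = len(frame)
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * n
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k)) / energy
        for k in range(n)
    ]


def max_peak(frame, min_lag, max_lag):
    """Largest normalised autocorrelation value within the given lag range."""
    ac = normalised_autocorrelation(frame)
    return max(ac[min_lag:max_lag + 1])


def half_total_error_rate(miss_rate, false_alarm_rate):
    """HTER: the average of the miss rate and the false alarm rate."""
    return 0.5 * (miss_rate + false_alarm_rate)


# Toy frame (hypothetical): a pure tone with a period of 20 samples.
tone = [math.sin(2 * math.pi * i / 20) for i in range(200)]
tone_score = max_peak(tone, min_lag=10, max_lag=40)
silence_score = max_peak([0.0] * 200, min_lag=10, max_lag=40)
```

A strongly periodic frame scores close to 1, while an aperiodic or silent frame scores near 0, which is what makes the peak usable as a voicing cue before fusion with a second score.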
Abstract:
Since the launch of the ‘Clean Delhi, Green Delhi’ campaign in 2003, slums have become a significant social and political issue in India’s capital city. Through this campaign, the state, in collaboration with Delhi’s middle class through the ‘Bhagidari system’ (literally translated as ‘participatory system’), aims to transform Delhi into a ‘world-class city’ that offers a sanitised, aesthetically appealing urban experience to its citizens and Western visitors. In 2007, Delhi won the bid to host the 2010 Commonwealth Games; since then, this agenda has acquired an urgent, almost violent, impetus to transform Delhi into an environmentally friendly, aesthetically appealing and ‘truly international city’. Slums and slum-dwellers, with their ‘filth, dirt, and noise’, have no place in this imagined city. The violence inflicted upon slum-dwellers, including the denial of their judicial rights, is justified on these accounts. In addition, the juridical discourse since 2000 has re-problematised slums as ‘nuisance’. The rising antagonism of the middle classes against the poor, supported by the state’s ambition to have a ‘world-class city’, has allowed a new rhetoric to situate the slums in the city. These representations articulate slums as homogenised spaces of experience and identity. The ‘illegal’ status of slum-dwellers, as encroachers upon public space, is stretched to involve ‘social, cultural, and moral’ decadence and depravity. This thesis is an ethnographic exploration of everyday life in a prominent slum settlement in Delhi. It sensually examines the social, cultural and political materiality of slums, and the relationship of slums with the middle class. In doing so, it highlights the politics of sensorial ordering of slums as ‘filthy, dirty, and noisy’ by the middle classes to calcify their position as ‘others’ in order to further segregate, exclude and discriminate against the slums.
The ethnographic experience in the slums, however, highlights a complex sensorial ordering and politics of its own. Not only are the interactions between diverse communities in slums highly restricted and sensually ordained, but the middle class is identified as a sensual ‘other’, and its sensual practices prohibited. This is significant in two ways. First, it highlights the multiplicity of social, cultural experience and engagement in the slums, thereby challenging its homogenised representation. Second, the ethnographic exploration allowed me to frame a distinct sense of self amongst the slums, which is denied in mainstream discourses, and allowed me to identify the slums’ own ‘others’, the middle class being one of them. This thesis highlights sound – its production, performances and articulations – as an act with social, cultural, and political implications and manifestations. ‘Noise’ can be understood as a political construct to identify ‘others’ – and both slum-dwellers and the middle classes identify different sonic practices as noise to situate the ‘other’ sonically. It is within this context that this thesis frames the position of Listener and Hearer, which corresponds to their social-political positions. These positions can be, and are, resisted and circumvented through sonic practices. For instance, amplification tactics in the Karimnagar slums, which are understood as ‘uncultured, callous activities to just create more noise’ by the slums’ middle-class neighbours, also serve definite purposes in shaping and navigating the space through the slums’ soundscapes, asserting a presence that is otherwise denied. Such tactics allow the residents to define their sonic territories and scope of sonic performances; they are significant in terms of exerting one’s position, territory and identity, and they are very important in subverting hierarchies.
The residents of the Karimnagar slums have to negotiate many social, cultural, moral and political prejudices in their everyday lives. Their identity is constantly under scrutiny and threat. However, the sonic cultures and practices in the Karimnagar slums allow their residents to exert a definite sonic presence – which the middle class has to hear. The articulation of noise and silence is an act manifesting, referencing and resisting social, cultural, and political power and hierarchies.
Abstract:
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
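The abstract does not describe how the modality weighting is determined, so the sketch below substitutes a plain grid search: fused scores are a weighted sum of the speech and lip scores, and the weight is chosen to minimise the average of the false acceptance and false rejection rates on labelled development scores. The grid search, the fixed threshold, the function names, and the toy scores are all assumptions for illustration, not the technique proposed in the paper.

```python
def fuse(speech_score, lip_score, speech_weight):
    """Weighted-sum fusion; the lip modality receives weight 1 - speech_weight."""
    return speech_weight * speech_score + (1.0 - speech_weight) * lip_score


def error_rates(scores, labels, threshold):
    """Return (false_acceptance_rate, false_rejection_rate); label True means genuine."""
    impostors = [s for s, genuine in zip(scores, labels) if not genuine]
    clients = [s for s, genuine in zip(scores, labels) if genuine]
    far = sum(s >= threshold for s in impostors) / len(impostors)
    frr = sum(s < threshold for s in clients) / len(clients)
    return far, frr


def pick_weight(speech, lip, labels, threshold, steps=10):
    """Grid-search the speech weight that minimises (FAR + FRR) / 2."""
    best_weight, best_error = 0.0, float("inf")
    for k in range(steps + 1):
        weight = k / steps
        fused = [fuse(s, l, weight) for s, l in zip(speech, lip)]
        far, frr = error_rates(fused, labels, threshold)
        error = 0.5 * (far + frr)
        if error < best_error:
            best_weight, best_error = weight, error
    return best_weight, best_error


# Toy development scores (hypothetical): speech is degraded by noise, lips are reliable.
speech = [0.9, 0.2, 0.8, 0.3]
lip = [0.9, 0.8, 0.1, 0.2]
labels = [True, True, False, False]
best = pick_weight(speech, lip, labels, threshold=0.5)
```

On these toy scores the search puts all the weight on the lip modality, mirroring the abstract's point that lip information is most useful when the audio is noisy. With cleaner audio the selected weight would shift towards speech.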