190 results for speech disorder


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
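
As a rough illustration of the "dynamic sequence matching" idea mentioned in this abstract, the sketch below scores a target phone sequence against candidate phone sequences with a minimum edit distance and keeps those under a cost threshold. This is only a minimal sketch under stated assumptions: the abstract does not give the thesis's cost function or lattice traversal, so the unit costs, the threshold, the phone labels, and the names `edit_distance` and `spot` are all hypothetical.

```python
def edit_distance(target, observed, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Minimum edit distance between two phone sequences (dynamic programming).

    Unit costs are an assumption; a real system would likely use
    phone-dependent costs.
    """
    rows, cols = len(target) + 1, len(observed) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # First column: delete every target phone; first row: insert every observed phone.
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    for j in range(1, cols):
        dist[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            match = 0.0 if target[i - 1] == observed[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + match,  # match or substitution
                dist[i - 1][j] + del_cost,   # target phone missing from observation
                dist[i][j - 1] + ins_cost,   # extra phone in observation
            )
    return dist[-1][-1]


def spot(target, candidates, max_cost=1.0):
    """Return the candidate sequences whose edit distance to the target is within max_cost."""
    return [c for c in candidates if edit_distance(target, c) <= max_cost]


# Hypothetical phone sequences: an exact match, one substitution, and a poor match.
target = ["k", "ae", "t"]
candidates = [["k", "ae", "t"], ["k", "eh", "t"], ["d", "ao", "g"]]
hits = spot(target, candidates, max_cost=1.0)
```

Allowing a small non-zero cost is what would give tolerance to recognition errors in the indexed sequences, at the price of more false alarms as the threshold grows.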

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers speaking different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
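
The linear data fusion step described in this abstract can be pictured as a weighted sum of per-language scores from each subsystem, followed by picking the best-scoring language. The sketch below assumes exactly that; the subsystem names, scores, weights, and the functions `fuse_scores` and `identify` are hypothetical and are not taken from the thesis.

```python
def fuse_scores(system_scores, weights):
    """Linearly combine per-language scores from several subsystems.

    system_scores: {system_name: {language: score}}
    weights:       {system_name: weight}
    """
    fused = {}
    for system, scores in system_scores.items():
        for language, score in scores.items():
            fused[language] = fused.get(language, 0.0) + weights[system] * score
    return fused


def identify(system_scores, weights):
    """Return the language with the highest fused score."""
    fused = fuse_scores(system_scores, weights)
    return max(fused, key=fused.get)


# Hypothetical log-likelihood-style scores from an acoustic (GMM) and a phonetic subsystem.
scores = {
    "acoustic": {"en": -1.0, "de": -2.0},
    "phonetic": {"en": -3.0, "de": -1.5},
}
equal = identify(scores, {"acoustic": 0.5, "phonetic": 0.5})            # "de"
acoustic_heavy = identify(scores, {"acoustic": 0.9, "phonetic": 0.1})   # "en"
```

The example shows why the weights matter: with equal weights the phonetic evidence tips the decision, while an acoustic-heavy weighting reverses it. In practice such weights would presumably be tuned on held-out data.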

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
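
To make the contrast between magnitude-only and complex subtraction concrete, the sketch below applies both to a few hand-made frequency bins. It is a simplified reading of the idea, assuming additive noise per bin and oracle (exactly known) noise magnitude and phase; the abstract does not give the paper's actual CSS formulation, and the function names and toy values here are hypothetical.

```python
import cmath


def magnitude_subtraction(noisy, noise_mag):
    """Magnitude-only subtraction: subtract the noise magnitude per bin,
    floor at zero, and reuse the phase of the noisy observation."""
    enhanced = []
    for y, n in zip(noisy, noise_mag):
        mag = max(abs(y) - n, 0.0)
        enhanced.append(cmath.rect(mag, cmath.phase(y)))
    return enhanced


def complex_subtraction(noisy, noise_mag, noise_phase):
    """Complex subtraction: rebuild the noise as a complex number from its
    magnitude and phase estimates, then subtract it per bin."""
    return [y - cmath.rect(n, p) for y, n, p in zip(noisy, noise_mag, noise_phase)]


# Toy spectra (two bins). Noise is additive in the complex domain.
clean = [1 + 0j, 0 + 2j]
noise = [0 + 1j, 1 + 0j]
noisy = [c + n for c, n in zip(clean, noise)]

# Oracle noise estimates, standing in for a real magnitude and phase estimator.
noise_mag = [abs(n) for n in noise]
noise_phase = [cmath.phase(n) for n in noise]

mag_only = magnitude_subtraction(noisy, noise_mag)
complex_est = complex_subtraction(noisy, noise_mag, noise_phase)
```

With oracle phase the complex estimate recovers the clean bins up to rounding error, whereas the magnitude-only estimate does not, because magnitudes of complex sums do not add. With an imperfect phase estimate the advantage would shrink accordingly.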

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Objective: The aim of the present study was to investigate whether parent report of family resilience predicted children’s disaster-induced post-traumatic stress disorder (PTSD) and general emotional symptoms, independent of a broad range of variables including event-related factors, previous child mental illness and social connectedness.

Methods: A total of 568 children (mean age = 10.2 years, SD = 1.3) who attended public primary schools were screened 3 months after Cyclone Larry devastated the Innisfail region of North Queensland. Measures included parent report on the Family Resilience Measure and Strengths and Difficulties Questionnaire (SDQ)–emotional subscale and child report on the PTSD Reaction Index, measures of event exposure and social connectedness.

Results: Sixty-four students (11.3%) were in the severe–very severe PTSD category and 53 families (28.6%) scored in the poor family resilience range. A lower family resilience score was associated with child emotional problems on the SDQ and longer duration of previous child mental health difficulties, but not disaster-induced child PTSD or child threat perception on either bivariate analysis, or as a main or moderator variable on multivariate analysis (main effect: adjusted odds ratio (ORadj) = 0.57, 95% confidence interval (CI) = 0.13–2.44). Similarly, previous mental illness was not a significant predictor of child PTSD in the multivariate model (ORadj = 0.75, 95% CI = 0.16–3.61).

Conclusion: In this post-disaster sample, children with existing mental health problems and those of low-resilience families were not at elevated risk of PTSD. The possibility that the aetiological model of disaster-induced child PTSD may differ from usual child and adolescent conceptualizations is discussed.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross-correlation-based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
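
A common building block of cross-correlation-based localisation is estimating the time delay between two microphone signals as the lag that maximises their cross-correlation. The sketch below shows only that building block, in a brute-force time-domain form; the abstract does not say which cross-correlation variant the paper uses, so the function `estimate_delay` and the toy signals are assumptions for illustration.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples (positive means `delayed` lags `reference`)
    that maximises the cross-correlation of the two signals."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += r * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the second microphone receives the same pulse 3 samples later.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)  # 3
```

Given delays between several microphone pairs and known (or estimated) microphone positions, a source direction or location could then be inferred geometrically.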

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Keizer, Lindenberg and Steg (2008) conduct six interesting field experiments and report that their results provide evidence of the broken windows theory. Such an analysis is highly relevant as the (broken windows) theory is both controversial and lacking empirical support. Keizer et al.’s key aim was to conceptualize a disorderly setting in such a way that it is linked to a process of spreading norm violation. The strength of the study is the exploration of cross-norm inhibition effects in a controlled field experimental environment. Their results show that if norm violating behavior becomes more common, it negatively affects compliance in other areas. Nevertheless, this comment paper discusses several shortcomings or limitations and provides new empirical evidence that deals with these problems.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Voice recognition is one of the key enablers to reduce driver distraction as in-vehicle systems become more and more complex. With the integration of voice recognition in vehicles, safety and usability are improved as the driver’s eyes and hands are not required to operate system controls. Whilst speaker independent voice recognition is well developed, performance in high noise environments (e.g. vehicles) is still limited. La Trobe University and Queensland University of Technology have developed a low-cost hardware-based speech enhancement system for automotive environments based on spectral subtraction and delay–sum beamforming techniques. The enhancement algorithms have been optimised using authentic Australian English collected under typical driving conditions. Performance tests conducted using speech data collected under a variety of vehicle noise conditions demonstrate a word recognition rate improvement in the order of 10% or more under the noisiest conditions. The system is currently developed to a proof-of-concept stage, and there is potential for even greater performance improvement.
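
Delay-and-sum beamforming, one of the two techniques named in this abstract, can be sketched as time-aligning each microphone channel toward the talker and averaging the result. The version below assumes known, non-negative integer sample delays and is purely illustrative; the function name `delay_and_sum` and the toy signals are hypothetical, and nothing here reflects the hardware system the abstract describes.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of equal-rate sample lists, one per microphone
    delays:   non-negative arrival delay (in samples) of the talker at each microphone
    """
    # Only keep the span where every aligned channel has samples.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(length)
    ]


# Toy example: the same speech samples reach the second microphone 2 samples later.
mic_1 = [1, 2, 3, 4, 0, 0]
mic_2 = [0, 0, 1, 2, 3, 4]
output = delay_and_sum([mic_1, mic_2], delays=[0, 2])  # [1.0, 2.0, 3.0, 4.0]
```

Speech from the steered direction adds coherently after alignment, while noise that arrives with different delays is averaged down, which is the effect such a front end relies on before recognition.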

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The structures of two polymorphs of the anhydrous cocrystal adduct of bis(quinolinium-2-carboxylate) DL-malic acid, one triclinic, the other monoclinic and disordered, have been determined at 200 K. Crystals of the triclinic polymorph 1 have space group P-1, with Z = 1 in a cell with dimensions a = 4.4854(4), b = 9.8914(7), c = 12.4670(8) Å, α = 79.671(5), β = 83.094(6), γ = 88.745(6) deg. Crystals of the monoclinic polymorph 2 have space group P21/c, with Z = 2 in a cell with dimensions a = 13.3640(4), b = 4.4237(12), c = 18.4182(5) Å, β = 100.782(3) deg. Both structures comprise centrosymmetric cyclic hydrogen-bonded quinolinic acid zwitterion dimers [graph set R2/2(10)] and 50% disordered malic acid molecules which lie across crystallographic inversion centres. However, the oxygen atoms of the malic acid carboxylic groups in 2 are 50% rotationally disordered whereas in 1 these are ordered. There are similar primary malic acid carboxyl O-H...quinaldic acid hydrogen-bonding chain interactions in each polymorph, extended into two-dimensional structures, but in 1 this involves centrosymmetric cyclic head-to-head malic acid hydroxyl-carboxyl O-H...O interactions [graph set R2/2(10)] whereas in 2 the links are through single hydroxy-carboxyl hydrogen bonds.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Interacting with technology within a vehicle environment using a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to circumvent this is to use the visual modality in addition. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One current dataset available for such research is the AVICAR [1] database. Unfortunately, this database is largely unusable due to a timing mismatch between the two streams, and in addition, no protocol is available. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and have established a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Attachment theory has been conceptualised as an affect regulation theory, proposing that attachment is associated with the expression and recognition of emotions as well as interpersonal functioning. Previous research has reported affect regulation difficulties in substance use disorders, and addiction has been considered an attachment disorder. However, scarce empirical research exists on the relationship between attachment, affect regulation and interpersonal functioning in those with substance use problems. Thus, the objective of the present study was to investigate potential associations between attachment, negative mood regulation (NMR) expectancies, fear of intimacy and self-differentiation in substance abusers. The revised adult attachment scale (RAAS), the NMR expectancies scale, the fear of intimacy scale and the differentiation of self inventory were administered to a sample of 100 substance use disorder inpatients. Attachment accounted for significant variance in NMR expectancies and was also a strong predictor of fear of intimacy. The predictive utility of attachment also extended to self-differentiation, suggesting that attachment was strongly related to overall self-differentiation score, Emotional reactivity, Emotional cut-off and I position. These findings support attachment theory, suggesting that attachment is associated with and predicts affect regulation abilities and difficulties in interpersonal functioning in a sample of substance use disorder inpatients. The inclusion and assessment of attachment appears to be important in the development of treatment programmes for substance abusing individuals.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Since the mid-1990s, government policies in the USA, Canada, England, and Australia have promoted the need to produce an ICT skilled workforce in order to ensure national competitiveness in globalised economic conditions. In this article, we examine the ways in which these policy intentions in one state in Australia were translated into a techno-determinist and technocentric plan which focused primarily on getting wired up and connected. We summarise the findings from two projects: an investigation of a state-wide principals' professional development programme and an action research study investigating literacy, educational disadvantage, and information technologies. We found significant differences in the distribution of the physical and human capabilities between schools, which made the task of engaging with ICT harder for some than others. Nevertheless, we suggest that some school leaders did develop innovative practice. We suggest that policy deficits made it difficult for school leaders to grapple with the dimensions of and debates about the kinds of educational changes that schools and school systems should be making. © 2006 Taylor & Francis.