984 results for Speech act
Abstract:
We investigate the use of a two-stage transform vector quantizer (TSTVQ) for coding line spectral frequency (LSF) parameters in wideband speech coding. The first-stage quantizer of TSTVQ provides better matching of the source distribution, and the second-stage quantizer provides additional coding gain by using an individual cluster-specific decorrelating transform and variance normalization. Further coding gain is shown to be achieved by exploiting the slowly time-varying nature of speech spectra and thus using the inter-frame cluster continuity (ICC) property in the first stage of the TSTVQ method. The proposed method saves 3-4 bits and reduces the computational complexity by 58-66% compared to the traditional split vector quantizer (SVQ), but at the expense of 1.5-2.5 times the memory.
Abstract:
This report presents the results of a study exploring the law and practice of mandatory reporting of child abuse and neglect in the Australian Capital Territory. Government administrative data over a decade (2003-2012) were accessed and analysed to map trends in reporting of different types of child abuse and neglect (physical abuse, sexual abuse, emotional abuse, and neglect) by different reporter groups (both mandated reporters e.g., police, teachers, doctors, nurses depending on the jurisdiction, and non-mandated reporters e.g., family members, neighbours, depending on the jurisdiction), and the outcomes of these reports (whether investigated, and whether substantiated or not). The study was funded by the Australian Government and administered through the Government of Victoria.
Abstract:
We address the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm and the parametric hidden Markov model (HMM). We show that a hybrid approach is quite effective for the application of noisy speech recognition. We extend the concept to HMM training wherein some patterns may be noisy or distorted. Utilizing the concept of "virtual pattern" developed for joint evaluation, we propose selective iterative training of HMMs. When these algorithms are evaluated on burst/transient noisy speech and isolated word recognition, the new algorithms yield a significant improvement in recognition accuracy over those that do not utilize the joint evaluation strategy.
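For orientation, the DTW building block mentioned in this abstract can be sketched as below. This is the classic two-pattern DTW on 1-D sequences with an absolute-difference local cost, not the authors' joint multi-pattern evaluation; the function name `dtw_distance` and the choice of local cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences (illustrative sketch only).

    Local cost is the absolute difference; allowed steps are
    insertion, deletion and match (the standard symmetric step pattern).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeated against b[j-1]
                cost[i][j - 1],      # b[j-1] repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

A time-stretched copy of a pattern aligns at zero cost, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])`, which is the property that makes DTW useful for comparing utterances of different durations.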
Abstract:
Speech has both auditory and visual components (heard speech sounds and seen articulatory gestures). During all perception, selective attention facilitates efficient information processing and enables concentration on high-priority stimuli. Auditory and visual sensory systems interact at multiple processing levels during speech perception and, further, the classical motor speech regions seem also to participate in speech perception. Auditory, visual, and motor-articulatory processes may thus work in parallel during speech perception, their use possibly depending on the information available and the individual characteristics of the observer. Because of their subtle speech perception difficulties possibly stemming from disturbances at elemental levels of sensory processing, dyslexic readers may rely more on motor-articulatory speech perception strategies than do fluent readers. This thesis aimed to investigate the neural mechanisms of speech perception and selective attention in fluent and dyslexic readers. We conducted four functional magnetic resonance imaging experiments, during which subjects perceived articulatory gestures, speech sounds, and other auditory and visual stimuli. Gradient echo-planar images depicting blood oxygenation level-dependent contrast were acquired during stimulus presentation to indirectly measure brain hemodynamic activation. Lip-reading activated the primary auditory cortex, and selective attention to visual speech gestures enhanced activity within the left secondary auditory cortex. Attention to non-speech sounds enhanced auditory cortex activity bilaterally; this effect showed modulation by sound presentation rate. 
A comparison between fluent and dyslexic readers' brain hemodynamic activity during audiovisual speech perception revealed stronger activation of predominantly motor speech areas in dyslexic readers during a contrast test that allowed exploration of the processing of phonetic features extracted from auditory and visual speech. The results show that visual speech perception modulates hemodynamic activity within auditory cortex areas once considered unimodal, and suggest that the left secondary auditory cortex specifically participates in extracting the linguistic content of seen articulatory gestures. They are strong evidence for the importance of attention as a modulator of auditory cortex function during both sound processing and visual speech perception, and point out the nature of attention as an interactive process (influenced by stimulus-driven effects). Further, they suggest heightened reliance on motor-articulatory and visual speech perception strategies among dyslexic readers, possibly compensating for their auditory speech perception difficulties.
Abstract:
We address a new problem of improving automatic speech recognition performance, given multiple utterances of patterns from the same class. We have formulated the problem of jointly decoding K multiple patterns given a single Hidden Markov Model. It is shown that such a solution is possible by aligning the K patterns using the proposed Multi Pattern Dynamic Time Warping algorithm followed by the Constrained Multi Pattern Viterbi Algorithm. The new formulation is tested in the context of speaker-independent isolated word recognition for both clean and noisy patterns. When 10 percent of speech is affected by a burst noise at -5 dB Signal to Noise Ratio (local), it is shown that joint decoding using only two noisy patterns reduces the noisy speech recognition error rate to about 51 percent, when compared to the single pattern decoding using the Viterbi Algorithm. In contrast, a simple maximization of individual pattern likelihoods provides only about 7 percent reduction in error rate.
Abstract:
Considering a general linear model of signal degradation, we derive the minimum mean square error (MMSE) estimator by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and the additive noise by a Gaussian PDF. The derived MMSE estimator is non-linear, and the linear MMSE estimator is shown to be a special case. For a speech signal corrupted by independent additive noise, we propose a speech enhancement method based on the derived MMSE estimator, modeling the joint PDF of the time-domain speech samples of a speech frame using a GMM. We also show that the same estimator can be used for transform-domain speech enhancement.
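The structure of such an estimator can be illustrated in the simplest setting. The sketch below assumes a scalar observation y = x + n, with x drawn from a GMM and n zero-mean Gaussian; this is a simplification of the general linear model in the abstract, and the function names and parameters are hypothetical. The estimate is a posterior-weighted sum of per-component linear (Wiener-type) estimates, which is why it is non-linear in y and collapses to the linear MMSE estimator when the mixture has a single component.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse_estimate(y, weights, means, variances, noise_var):
    """MMSE estimate of x from y = x + n (scalar sketch, assumed model).

    x ~ sum_k weights[k] * N(means[k], variances[k]), n ~ N(0, noise_var).
    """
    # Under component k, y is Gaussian with variance variances[k] + noise_var.
    likelihoods = [
        w * gaussian_pdf(y, m, v + noise_var)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(likelihoods)
    estimate = 0.0
    for lik, m, v in zip(likelihoods, means, variances):
        posterior = lik / total          # P(component k | y)
        gain = v / (v + noise_var)       # per-component linear MMSE gain
        estimate += posterior * (m + gain * (y - m))
    return estimate
```

With one component the posterior weight is 1 and the result is the familiar linear estimate `mean + gain * (y - mean)`. The sketch does not guard against numerical underflow for observations far from every component.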
Abstract:
Evaluation of entrepreneurship in the speech of academic students and newly qualified young academics: a summary of a qualitative attitude study. In Finland very few university students plan to become entrepreneurs. The aim of this research was to examine entrepreneurial attitudes expressed in speech. The material was gathered from interviews with university students and newly qualified young academic adults. The interviewees commented on twelve different sentences with claims formulated using research literature and views that have appeared in public discussions. The interviewees were divided into three different groups based on their self-expressed entrepreneurial intentions. The method of qualitative attitude research (Vesala & Rantanen 1999, 2007) was used in the interviews. The research material was studied using two interpretative theories: (1) the planned behaviour theory (Ajzen 1985, 1991a, b), which makes it possible to focus on the separate elements (attitude towards an act, subjective norms and perceived feasibility) necessary for intentions to develop; and (2) the theory of the two images of entrepreneurship (Vesala 1996), where individualism and relationism can be seen as resources for evaluating entrepreneurship. The subject of the research was how university students and newly qualified young adults viewed entrepreneurship as a general phenomenon and in relation to the academic world. A second focus was on the attitudes expressed toward entrepreneurial university education and the possibility of combining entrepreneurship and academic knowledge. Of interest were also questions such as whether academic studies, knowledge and the university itself are resources or barriers to entrepreneurial intentions and entrepreneurship, and whether university students received any support for their entrepreneurial ambitions from the university and their fellow academic students.
The problems tackled by this research were thus the following: How was entrepreneurship seen, both as a general phenomenon and in an academic context, when it was evaluated positively, negatively or neutrally by the interviewees? In what way was entrepreneurship constructed in the interviewees' attitudes? How were entrepreneurship and the academic world related in the interviewees' attitudes? What kind of role did the university as an academic context play in the interviewees' attitudes; for example, were university education and academic knowledge seen as resources or barriers to their entrepreneurial intentions? Traditional attitude studies claim that attitudes are a stable property of an individual. In contrast, rhetorical social psychological and qualitative attitude studies emphasize the contextual and linguistic aspects of attitude, and they offered an alternative viewpoint for this research. The study was based on two general assumptions: attitudes have objects and are evaluative. Here attitude was defined as an evaluative interpresentation made towards an object; adopting an attitude is a contextual process in the sense that attitudes are always concerned with the action context of the persons presenting them. Entrepreneurship, both as a general phenomenon and in an academic context, was specified as the object to which an attitude was taken. From a theoretical point of view, qualitative methods suited the general structure of this research well. In particular, a qualitative approach which emphasized contextual elements proved to be both empirically valid and useful for avoiding the problematic assumptions associated with traditional attitude study. The subject of the analysis was the argumentative speech produced by the interviewees. The results of the study show the subjects' responses to three main ways of viewing entrepreneurship. The first was an individualistic, ideal image of entrepreneurship.
This was mostly evaluated positively and gained wide approval, especially among interviewees who included entrepreneurship among their employment choices. Entrepreneurship was seen as the decision to earn one's living independently. In this individualistic image of entrepreneurship, the social context was hardly ever mentioned. Elements which were seen to threaten this ideal image were evaluated negatively. When entrepreneurship was evaluated negatively using the individualistic image of entrepreneurship, it was mentioned that it forced one into a never-ending cycle of work and uninteresting duties. The relationistic image of entrepreneurship was used as a speech resource when the social context was constructed as an economic resource or a threat to the ideal image of entrepreneurship. In the second view, entrepreneurship was characteristically seen as being based on economics, which was seen as a threat to the ideal individualistic image of entrepreneurship. The risk of economic failure was seen as a limiting factor to entrepreneurial ambitions as it forced entrepreneurs to work around the clock. The third view concerned the relationship between entrepreneurship and the academic world. Entrepreneurship as an employment choice for university-educated persons was evaluated as relevant, and thus positively, when university education was constructed as a resource for entrepreneurship, and as irrelevant, and thus negatively, when it was construed as an obstacle, too wide, or when successful entrepreneurship was seen as being mostly based on an individual's personal characteristics. The interviewees with no entrepreneurial intentions expressed the view that academic education didn't provide the proper skills and knowledge for entrepreneurship. The interviewees also expressed interest in university entrepreneurship education, although none had experience of this.
The interviewees emphasized the fact that the University didn't encourage them to consider entrepreneurship as a relevant employment choice. The assumption made by this study was that becoming an entrepreneur is a conscious decision; the environment may influence an individual's decisions on how to make a living, as it tends to socialise people to act in accordance with cultural traditions. Keywords: Entrepreneurship, Attitudes towards entrepreneurship, Intentional behaviour, Entrepreneurial intention, University entrepreneurship education, Qualitative attitude research (Vesala & Rantanen 1999, 2007), Rhetorical social psychology (Billig 1986), The theory of entrepreneurship's two images: individualism and relationism (Vesala 1996), The planned behaviour theory (Ajzen 1985, 1991a, b)
Abstract:
In 2015, Victoria passed laws removing the time limit in which a survivor of child sexual abuse can commence a civil claim for personal injury. The law applies also to physical abuse, and to psychological injury arising from those forms of abuse. In 2016, New South Wales made almost identical legal reforms. These reforms were partly motivated by the recommendations of inquiries into institutional child abuse. Of particular relevance is that the Australian Royal Commission Into Institutional Responses to Child Sexual Abuse recommended in 2015 that all States and Territories remove their time limits for civil claims. This presentation explores the problems with standard time limits when applied to child sexual abuse cases (whether occurring within or beyond institutions), the scientific, ethical and legal justifications for lifting the time limits, and solutions for future law reform.
New Method for Delexicalization and its Application to Prosodic Tagging for Text-to-Speech Synthesis
Abstract:
This paper describes a new flexible delexicalization method based on a glottal-excited parametric speech synthesis scheme. The system utilizes inverse-filtered glottal flow and all-pole modelling of the vocal tract. The method provides a possibility to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
Abstract:
We propose a simple speech/music discriminator that uses features based on the HILN (Harmonics, Individual Lines and Noise) model. We have tested the strength of the feature set on a standard database of 66 files and obtained an accuracy of around 97%. We have also tested on sung queries and polyphonic music and obtained very good results. The current algorithm is being used to discriminate between sung queries and played queries (using an instrument such as a flute) for a Query by Humming (QBH) system currently under development in the lab.
Abstract:
Non-uniform sampling of a signal is formulated as an optimization problem which minimizes the signal reconstruction error. Dynamic programming (DP) has been used to solve this problem efficiently for a finite-duration signal. Further, the optimum samples are quantized to realize a speech coder. The quantizer and the DP-based optimum search for non-uniform samples (DP-NUS) can be combined in a closed-loop manner, which provides a distinct advantage over the open-loop formulation. The DP-NUS formulation provides useful control over the trade-off between bitrate and performance (reconstruction error). It is shown that a 5-10 dB SNR improvement is possible using DP-NUS compared to the extrema sampling approach. In addition, the closed-loop DP-NUS gives a 4-5 dB improvement in reconstruction error.
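The idea of choosing sample positions by dynamic programming can be sketched as follows. The abstract does not state how the signal is reconstructed from the retained samples, so this sketch assumes linear interpolation between kept samples and a squared-error criterion; the function names are hypothetical, quantization is left out, and the loops are written for clarity rather than speed.

```python
import math


def segment_error(signal, i, j):
    """Squared error when samples strictly between i and j are replaced
    by linear interpolation between signal[i] and signal[j]."""
    err = 0.0
    for t in range(i + 1, j):
        interp = signal[i] + (signal[j] - signal[i]) * (t - i) / (j - i)
        err += (signal[t] - interp) ** 2
    return err


def select_samples(signal, num_samples):
    """Pick num_samples indices (always including both endpoints) that
    minimise the total interpolation error, using dynamic programming."""
    n = len(signal)
    if not 2 <= num_samples <= n:
        raise ValueError("num_samples must be between 2 and len(signal)")
    # best[k][j]: minimal error over signal[0..j] with k + 1 kept samples,
    # the first at index 0 and the last at index j.
    best = [[math.inf] * n for _ in range(num_samples)]
    prev = [[-1] * n for _ in range(num_samples)]
    best[0][0] = 0.0
    for k in range(1, num_samples):
        for j in range(k, n):
            for i in range(k - 1, j):
                if best[k - 1][i] == math.inf:
                    continue
                cost = best[k - 1][i] + segment_error(signal, i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    prev[k][j] = i
    # Backtrack from the final sample to recover the chosen indices.
    indices = [n - 1]
    j = n - 1
    for k in range(num_samples - 1, 0, -1):
        j = prev[k][j]
        indices.append(j)
    indices.reverse()
    return indices, best[num_samples - 1][n - 1]
```

For a triangular signal such as `[0, 1, 2, 3, 2, 1, 0]`, three samples suffice for exact reconstruction under this assumed interpolation, and the search places the middle sample at the peak. Raising or lowering `num_samples` is the knob that trades bitrate against reconstruction error.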
Abstract:
This paper describes a method for automated segmentation of speech that assumes the signal is continuously time-varying, rather than following the traditional short-time stationary model. It has been shown that this representation gives comparable, if not marginally better, results than other techniques for automated segmentation. A formulation of the 'Bach' (music semitonal) frequency scale filter-bank is proposed. A comparative study has been made of the performances using Mel, Bark and Bach scale filter-banks under this model. The preliminary results show up to 80% matches within 20 ms of the manually segmented data, without any information about the content of the text and without any language dependence. 'Bach' filters are seen to marginally outperform the other filters.
Abstract:
This correspondence describes a method for automated segmentation of speech. The proposed method uses a specially designed filter-bank, called the Bach filter-bank, which makes use of 'music'-related perception criteria. The speech signal is treated as a continuously time-varying signal rather than under a short-time stationary model. A comparative study has been made of the performances using Mel, Bark and Bach scale filter-banks. The preliminary results show up to 80% matches within 20 ms of the manually segmented data, without any information about the content of the text and without any language dependence. The Bach filters are seen to marginally outperform the other filters.