30 results for Perceptual speech analysis

in Aston University Research Archive


Relevance:

100.00%

Publisher:

Abstract:

This thesis describes work undertaken in order to fulfil a need experienced in the Department of Educational Enquiry at the University of Aston in Birmingham for speech analysis facilities suitable for use in teaching and research work within the Department. The hardware and software developed during the research project provide displays of speech fundamental frequency and intensity in real time. The system is suitable for the provision of visual feedback of these parameters of a subject's speech in a learning situation, and overcomes the inadequacies of equipment currently used for this task in that it provides a clear indication of fundamental frequency contours as the subject is speaking. The thesis considers the use of such equipment in several related fields, and the approaches that have been reported to one of the major problems of speech analysis, namely pitch-period estimation. A number of different systems are described, and their suitability for the present purposes is discussed. Finally, a novel method of pitch-period estimation is developed, and a speech analysis system incorporating this method is described. Comparison is made between the results produced by this system and those produced by a conventional speech spectrograph.
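
Illustrative note: the abstract does not specify the thesis's novel pitch-period method, so the sketch below is not that method. As a rough illustration of what pitch-period estimation involves, it uses the textbook autocorrelation approach instead. The function name, the 50 to 500 Hz search range, and the synthetic test tone are all assumptions made for this example.

```python
import math


def estimate_pitch_period(samples, sample_rate, f0_min=50.0, f0_max=500.0):
    """Return the pitch period in samples, taken as the lag that maximises
    the (unnormalised) autocorrelation within the allowed F0 range."""
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic stand-in for a voiced frame: a 100 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(800)]
period = estimate_pitch_period(tone, sample_rate)
f0 = sample_rate / period  # fundamental frequency in Hz
```

For this tone the strongest autocorrelation peak falls at a lag of 80 samples, which corresponds to 100 Hz. Real speech usually needs extra care (windowing, normalisation, voicing decisions), which this sketch leaves out.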

Relevance:

100.00%

Publisher:

Abstract:

The standard reference clinical score quantifying average Parkinson's disease (PD) symptom severity is the Unified Parkinson's Disease Rating Scale (UPDRS). At present, UPDRS is determined by the subjective clinical evaluation of the patient's ability to adequately cope with a range of tasks. In this study, we extend recent findings that UPDRS can be objectively assessed to clinically useful accuracy using simple, self-administered speech tests, without requiring the patient's physical presence in the clinic. We apply a wide range of known speech signal processing algorithms to a large database (approx. 6000 recordings from 42 PD patients, recruited to a six-month, multi-centre trial) and propose a number of novel, nonlinear signal processing algorithms which reveal pathological characteristics in PD more accurately than existing approaches. Robust feature selection algorithms select the optimal subset of these algorithms, which is fed into non-parametric regression and classification algorithms, mapping the signal processing algorithm outputs to UPDRS. We demonstrate rapid, accurate replication of the UPDRS assessment with clinically useful accuracy (about 2 UPDRS points difference from the clinicians' estimates, p < 0.001). This study supports the viability of frequent, remote, cost-effective, objective, accurate UPDRS telemonitoring based on self-administered speech tests. This technology could facilitate large-scale clinical trials into novel PD treatments.

Relevance:

40.00%

Publisher:

Abstract:

Speech comprises dynamic and heterogeneous acoustic elements, yet it is heard as a single perceptual stream even when accompanied by other sounds. The relative contributions of grouping “primitives” and of speech-specific grouping factors to the perceptual coherence of speech are unclear, and the acoustical correlates of the latter remain unspecified. The parametric manipulations possible with simplified speech signals, such as sine-wave analogues, make them attractive stimuli to explore these issues. Given that the factors governing perceptual organization are generally revealed only where competition operates, the second-formant competitor (F2C) paradigm was used, in which the listener must resist competition to optimize recognition [Remez et al., Psychol. Rev. 101, 129-156 (1994)]. Three-formant (F1+F2+F3) sine-wave analogues were derived from natural sentences and presented dichotically (one ear = F1+F2C+F3; opposite ear = F2). Different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, regardless of their amplitude characteristics. In contrast, F2Cs with constant frequency contours were completely ineffective. Competitor efficacy was not due to energetic masking of F3 by F2C. These findings indicate that modulation of the frequency, but not the amplitude, contour is critical for across-formant grouping.

Relevance:

40.00%

Publisher:

Abstract:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevance:

40.00%

Publisher:

Abstract:

Six experiments investigated the influence of several grouping cues within the framework of the Verbal Transformation Effect (VTE, Experiments 1 to 4) and Phonemic Transformation Effect (PTE, Experiments 5 and 6), where listening to a repeated word (VTE) or sequence of vowels (PTE) produces verbal transformations (VTs). In Experiment 1, the influence of F0 frequency and lateralization cues (ITDs) was investigated in terms of the pattern of VTs. As the lateralization difference increased between two repeating sequences, the number of forms was significantly reduced with the fewest forms reported in the dichotic condition. Experiment 2 explored whether or not propensity to report more VTs on high pitch was due to the task demands of monitoring two sequences at once. The number of VTs reported was higher when listeners were asked to attend to one sequence only, suggesting smaller attentional constraints on the task requirements. In Experiment 3, consonant-vowel transitions were edited out from two sets of six stimuli words with ‘strong’ and ‘weak’ formant transitions, respectively. Listeners reported more forms in the spliced-out than in the unedited case for the strong-transition words, but not for those with weak transitions. A similar trend was observed for the F0 contour manipulation used in Experiment 4 where listeners reported more VTs and forms for words following a discontinuous F0 contour. In Experiments 5 and 6, the role of F0 frequency and ITD cues was investigated further using a related phenomenon – the PTE. Although these manipulations had relatively little effect on the number of VTs and forms reported, they did influence the particular forms heard. In summary, the current experiments confirmed that it is possible to successfully investigate auditory grouping cues within the VTE framework and that, in agreement with recent studies, the results can be attributed to the perceptual re-grouping of speech sounds.

Relevance:

40.00%

Publisher:

Abstract:

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1-10, were obtained for intelligibility and naturalness respectively.
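
Illustrative note: the RMSE figures quoted above follow the usual root-mean-square-error definition, the square root of the mean squared difference between predicted and observed values. A minimal sketch of that formula is given here. The function name is hypothetical, and the abstract does not say in what units or over which data the reported values were computed.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

In the paper's setting this would presumably be applied separately to the predicted and measured peak values and to the valley values, giving one score for each.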

Relevance:

40.00%

Publisher:

Abstract:

This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners only reported more forms in the transitions-removed condition for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.

Relevance:

30.00%

Publisher:

Abstract:

This paper examines the connected speech process described by Wells (1982b) as the T to R rule in the West Midlands speech variety associated with the Black Country. The T to R rule is well known as a linguistic marker of local varieties of the middle and far north of England. Less well understood is its position in the phonological systems of Midlands varieties. Varieties of the Midlands of England are underresearched in comparison with varieties of the north, and what is known about the application of the T to R rule in this transitional dialect area is correspondingly nebulous. This paper focuses on the Black Country area, and examines the possible outputs in the contexts which give rise to /t/ becoming [ɹ] in local varieties of the north. I examine the written and spoken evidence which suggests that the T to R rule does indeed operate in the Black Country variety. My analysis focuses on possible phonetic outcomes of the T to R rule across time. In my conclusion, I discuss briefly the possibility that, lying on a bundle of isoglosses separating north from south, the variety of the Black Country reflects this in that a T to [ʔ] rule, rather than a T to R rule, is the dominant output of this connected speech process in the Black Country.

Relevance:

30.00%

Publisher:

Abstract:

The judicial interest in ‘scientific’ evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example this paper examines the issues which complicate attempts to quantify results in work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that these markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts quantified and safe attributions are shown to be possible, but as the number of reference texts reduces the analysis shows how the conclusion which should be reached is that no attribution can be made. The testing process at no point results in instances of a misattribution.

Relevance:

30.00%

Publisher:

Abstract:

This thesis deals with the problem of defining the systemic purpose of business systems. The definition of the systemic purpose, which is regarded as the utmost expression of the system's purposefulness, is to be achieved by ensuring the participation of all the stakeholders, if possible, who affect or are affected by the business system's operations. The nature of participation, defined as a process of the stakeholders' perceptual exchanges, is deemed to be problematic in itself due to the influence exerted upon it by organisational power, coercion and false consciousness. The main focus of the thesis, then, is to make the stakeholders aware of, and provide them with, an explicit philosophical pedestal and a set of principles upon which a meta-epistemological framework for the enquiry of the business system's purposeful behaviour is developed. In addition, the thesis focuses on the development of a methodology that can be used by the stakeholders to achieve self-knowledge through the critical and systemic examination of their normative presuppositions about the business system, at both the sociological and the psychological levels concurrently, and the subsequent development of an organisational, intrinsically motivated information system. According to the critical systems philosophy and principles developed in this thesis, normative presuppositions define the stakeholders' perceptions about the purposeful behaviour of the business system in which they perceive themselves as having a material, an informational and/or an emancipatory stake (human interest). The methodology will provide Information Systems that demonstrably improve coordination of organisational activities by enabling the development and maintenance of a single/multifaceted view of purpose throughout organisations.

Relevance:

30.00%

Publisher:

Abstract:

To make vision possible, the visual nervous system must represent the most informative features in the light pattern captured by the eye. Here we use Gaussian scale-space theory to derive a multiscale model for edge analysis and we test it in perceptual experiments. At all scales there are two stages of spatial filtering. An odd-symmetric, Gaussian first derivative filter provides the input to a Gaussian second derivative filter. Crucially, the output at each stage is half-wave rectified before feeding forward to the next. This creates nonlinear channels selectively responsive to one edge polarity while suppressing spurious or "phantom" edges. The two stages have properties analogous to simple and complex cells in the visual cortex. Edges are found as peaks in a scale-space response map that is the output of the second stage. The position and scale of the peak response identify the location and blur of the edge. The model predicts remarkably accurately our results on human perception of edge location and blur for a wide range of luminance profiles, including the surprising finding that blurred edges look sharper when their length is made shorter. The model enhances our understanding of early vision by integrating computational, physiological, and psychophysical approaches. © ARVO.
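
Illustrative note: the two-stage filter cascade described above can be sketched in one dimension at a single scale. This is a simplified reading of the abstract, not the authors' implementation. It covers only the channel for one edge polarity, omits the multiscale search and any scale normalisation, and assumes that the second-derivative output is negated so that the response peaks at the edge. All function names and the test luminance profile are hypothetical.

```python
import math


def gaussian_derivative_kernels(sigma):
    """Sampled first and second derivatives of a Gaussian with std `sigma`."""
    radius = int(4 * sigma)
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
         for x in xs]
    d1 = [-x / sigma ** 2 * gx for x, gx in zip(xs, g)]
    d2 = [(x * x / sigma ** 2 - 1) / sigma ** 2 * gx for x, gx in zip(xs, g)]
    return d1, d2


def convolve_same(signal, kernel):
    """Discrete convolution, same length as `signal`, with edge replication."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i - (k - radius), 0), last)  # clamp index at the borders
            total += weight * signal[j]
        out.append(total)
    return out


def half_wave_rectify(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]


def edge_response(luminance, sigma):
    """Stage 1: first-derivative filter, then rectify.
    Stage 2: (negated) second-derivative filter, then rectify."""
    d1, d2 = gaussian_derivative_kernels(sigma)
    stage1 = half_wave_rectify(convolve_same(luminance, d1))
    stage2 = half_wave_rectify([-v for v in convolve_same(stage1, d2)])
    return stage2


# Dark-to-light edge centred on sample 32 (hypothetical test profile).
luminance = [0.0] * 32 + [0.5] + [1.0] * 31
response = edge_response(luminance, sigma=3.0)
edge_index = max(range(len(response)), key=response.__getitem__)
```

Here the response peaks at the sample where the luminance step is centred. In the full model the same computation would be repeated across filter scales, with the scale of the peak response giving the estimated blur.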

Relevance:

30.00%

Publisher:

Abstract:

How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.

Relevance:

30.00%

Publisher:

Abstract:

This study reports a qualitative phenomenological investigation of anger and anger-related aggression in the context of the lives of individual women. Semistructured interviews with five women are analyzed using interpretative phenomenological analysis. This inductive approach aims to capture the richness and complexity of the lived experience of emotional life. In particular, it draws attention to the context-dependent and relational dimension of angry feelings and aggressive behavior. Three analytic themes are presented here: the subjective experience of anger, which includes the perceptual confusion and bodily change felt by the women when angry, crying, and the presence of multiple emotions; the forms and contexts of aggression, paying particular attention to the range of aggressive strategies used; and anger as moral judgment, in particular perceptions of injustice and unfairness. The authors conclude by examining the analytic observations in light of phenomenological thinking.

Relevance:

30.00%

Publisher:

Abstract:

Cellular mobile radio systems will be of increasing importance in the future. This thesis describes research work concerned with the teletraffic capacity and the computer control requirements of such systems. The work involves theoretical analysis and experimental investigations using digital computer simulation. New formulas are derived for the congestion in single-cell systems in which there are both land-to-mobile and mobile-to-mobile calls and in which mobile-to-mobile calls go via the base station. Two approaches are used: the first yields modified forms of the familiar Erlang and Engset formulas, while the second gives more complicated but more accurate formulas. The results of computer simulations to establish the accuracy of the formulas are described. New teletraffic formulas are also derived for the congestion in multi-cell systems. Fixed, dynamic and hybrid channel assignments are considered. The formulas agree with previously published simulation results. Simulation programs are described for the evaluation of the speech traffic of mobiles and for the investigation of a possible computer network for the control of the speech traffic. The programs were developed according to the structured programming approach, leading to programs of modular construction. Two simulation methods are used for the speech traffic: the roulette method and the time-true method. The first is economical but has some restrictions, while the second is expensive but gives comprehensive answers. The proposed control network operates at three hierarchical levels performing various control functions, which include the setting-up and clearing-down of calls, the hand-over of calls between cells and the address-changing of mobiles travelling between cities. The results demonstrate the feasibility of the control network and indicate that small mini-computers inter-connected via voice-grade data channels would be capable of providing satisfactory control.
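
Illustrative note: the thesis's modified formulas are not given in the abstract. For orientation only, the sketch below computes the standard (unmodified) Erlang B blocking probability for an offered load of A erlangs on N channels, using the usual recursion B(0) = 1, B(n) = A·B(n−1) / (n + A·B(n−1)). The function name is hypothetical, and the sketch does not model mobile-to-mobile calls that occupy two channels via the base station.

```python
def erlang_b(offered_load, channels):
    """Blocking probability of the classical Erlang B (loss) model.

    offered_load: offered traffic A in erlangs (must be >= 0)
    channels:     number of channels N (integer, must be >= 0)
    """
    if offered_load < 0 or channels < 0:
        raise ValueError("offered_load and channels must be non-negative")
    blocking = 1.0  # B(0): with no channels every call is blocked
    for n in range(1, channels + 1):
        # Numerically stable recursion that avoids large factorials.
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking
```

For example, `erlang_b(2.0, 2)` evaluates (2²/2!) / (1 + 2 + 2²/2!) = 0.4. The recursive form is generally preferred over the closed-form expression because it stays well-behaved for large channel counts.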