35 resultados para Speech synthesis Data processing
em Aston University Research Archive
Resumo:
This work examines prosody modelling for the Standard Yorùbá (SY) language in the context of computer text-to-speech synthesis applications. The thesis of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combines acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. Our prosody model is conceptualised around a modular holistic framework. The framework is implemented using the Relational Tree (R-Tree) techniques (Ehrich and Foith, 1976). R-Tree is a sophisticated data structure that provides a multi-dimensional description of a waveform. A Skeletal Tree (S-Tree) is first generated using algorithms based on the tone phonological rules of SY. Subsequent steps update the S-Tree by computing the numerical values of the prosody dimensions. To implement the intonation dimension, fuzzy control rules where developed based on data from native speakers of Yorùbá. The Classification And Regression Tree (CART) and the Fuzzy Decision Tree (FDT) techniques were tested in modelling the duration dimension. The FDT was selected based on its better performance. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration and intonation, using different techniques and their subsequent integration. Our approach provides us with a flexible and extendible model that can also be used to implement, study and explain the theory behind aspects of the phenomena observed in speech prosody.
Resumo:
This paper presents a novel prosody model in the context of computer text-to-speech synthesis applications for tone languages. We have demonstrated its applicability using the Standard Yorùbá (SY) language. Our approach is motivated by the theory that abstract and realised forms of various prosody dimensions should be modelled within a modular and unified framework [Coleman, J.S., 1994. Polysyllabic words in the YorkTalk synthesis system. In: Keating, P.A. (Ed.), Phonological Structure and Forms: Papers in Laboratory Phonology III, Cambridge University Press, Cambridge, pp. 293–324]. We have implemented this framework using the Relational Tree (R-Tree) technique. R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. The underlying assumption of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. To implement the intonation dimension, fuzzy logic based rules were developed using speech data from native speakers of Yorùbá. The Fuzzy Decision Tree (FDT) and the Classification and Regression Tree (CART) techniques were tested in modelling the duration dimension. For practical reasons, we have selected the FDT for implementing the duration dimension of our prosody model. To establish the effectiveness of our prosody model, we have also developed a Stem-ML prosody model for SY. We have performed both quantitative and qualitative evaluations on our implemented prosody models. The results suggest that, although the R-Tree model does not predict the numerical speech prosody data as accurately as the Stem-ML model, it produces synthetic speech prosody with better intelligibility and naturalness. The R-Tree model is particularly suitable for speech prosody modelling for languages with limited language resources and expertise, e.g. African languages. Furthermore, the R-Tree model is easy to implement, interpret and analyse.
Resumo:
In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework. This framework is implemented using the Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation, and intensity, using different techniques and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we have also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data since it achieved a better accuracy for the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than our CART model. In addition, we also observed that the expressiveness of FDT is much better than that of CART. That is because the representation in FDT is not restricted to a set of piece-wise or discrete constant approximation. We, therefore, conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications. © 2006 Elsevier Ltd. All rights reserved.
Resumo:
This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy logic based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised by Pitch Synchronous Overlap Technique (PSOLA) using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
Resumo:
In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST is then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 - -10, was obtained for intelligibility and naturalness respectively.
Resumo:
It has been proposed that language impairments in children with Autism Spectrum Disorders (ASD) stem from atypical neural processing of speech and/or nonspeech sounds. However, the strength of this proposal is compromised by the unreliable outcomes of previous studies of speech and nonspeech processing in ASD. The aim of this study was to determine whether there was an association between poor spoken language and atypical event-related field (ERF) responses to speech and nonspeech sounds in children with ASD (n = 14) and controls (n = 18). Data from this developmental population (ages 6-14) were analysed using a novel combination of methods to maximize the reliability of our findings while taking into consideration the heterogeneity of the ASD population. The results showed that poor spoken language scores were associated with atypical left hemisphere brain responses (200 to 400 ms) to both speech and nonspeech in the ASD group. These data support the idea that some children with ASD may have an immature auditory cortex that affects their ability to process both speech and nonspeech sounds. Their poor speech processing may impair their ability to process the speech of other people, and hence reduce their ability to learn the phonology, syntax, and semantics of their native language.
Resumo:
Photonic technologies for data processing in the optical domain are expected to play a major role in future high-speed communications. Nonlinear effects in optical fibres have many attractive features and great, but not yet fully explored potential for optical signal processing. Here we provide an overview of our recent advances in developing novel techniques and approaches to all-optical processing based on fibre nonlinearities.
Resumo:
We present the first experimental implementation of a recently designed quasi-lossless fiber span with strongly reduced signal power excursion. The resulting fiber waveguide medium can be advantageously used both in lightwave communications and in all-optical nonlinear data processing.
Resumo:
We present the first experimental implementation of a recently designed quasi-lossless fibre span with strongly reduced signal power excursion. The resulting fibre waveguide medium can be advantageously used both in lightwave communications and in all-optical nonlinear data processing.
Resumo:
A series of novel block copolymers, processable from single organic solvents and subsequently rendered amphiphilic by thermolysis, have been synthesized using Grignard metathesis (GRIM) and reversible addition-fragmentation chain transfer (RAFT) polymerizations and azide-alkyne click chemistry. This chemistry is simple and allows the fabrication of well-defined block copolymers with controllable block lengths. The block copolymers, designed for use as interfacial adhesive layers in organic photovoltaics to enhance contact between the photoactive and hole transport layers, comprise printable poly(3-hexylthiophene)-block-poly(neopentyl p-styrenesulfonate), P3HT-b-PNSS. Subsequently, they are converted to P3HT-b-poly(p-styrenesulfonate), P3HT-b-PSS, following deposition and thermal treatment at 150 °C. Grazing incidence small- and wide-angle X-ray scattering (GISAXS/GIWAXS) revealed that thin films of the amphiphilic block copolymers comprise lamellar nanodomains of P3HT crystallites that can be pushed further apart by increasing the PSS block lengths. The approach of using a thermally modifiable block allows deposition of this copolymer from a single organic solvent and subsequent conversion to an amphiphilic layer by nonchemical means, particularly attractive to large scale roll-to-roll industrial printing processes.
Resumo:
We present the first experimental implementation of a recently designed quasi-lossless fiber span with strongly reduced signal power excursion. The resulting fiber waveguide medium can be advantageously used both in lightwave communications and in all-optical nonlinear data processing. © 2005 IEEE.
Resumo:
We present the first experimental implementation of a recently designed quasi-lossless fibre span with strongly reduced signal power excursion. The resulting fibre waveguide medium can be advantageously used both in lightwave communications and in all-optical nonlinear data processing.
Resumo:
Poly(L-lactide-co-ε-caprolactone) 75:25% mol, P(LL-co-CL), was synthesized via bulk ring-opening polymerisation (ROP) using a novel tin(II)alkoxide initiator, [Sn(Oct)]2DEG, at 130oC for 48 hrs. The effectiveness of this initiator was compared withthe well-known conventional tin(II) octoateinitiator, Sn(Oct)2. The P(LL-co-CL) copolymersobtained were characterized using a combination of analytical technique including: nuclear magnetic resonance spectroscopy (NMR), differential scanning calorimetry (DSC), thermogravimetry (TG) and gel permeation chromatography (GPC). The P(LL-co-CL) was melt-spun into monofilament fibres of uniform diameter and smooth surface appearance. Modification of the matrix morphology was then built into the as-spun fibresvia a series of controlled off-line annealing and hot-drawing steps. © (2014) Trans Tech Publications, Switzerland.
Resumo:
Despite being nominated as a key potential interaction technique for supporting today's mobile technology user, the widespread commercialisation of speech-based input is currently being impeded by unacceptable recognition error rates. Developing effective speech-based solutions for use in mobile contexts, given the varying extent of background noise, is challenging. The research presented in this paper is part of an ongoing investigation into how best to incorporate speechbased input within mobile data collection applications. Specifically, this paper reports on a comparison of three different commercially available microphones in terms of their efficacy to facilitate mobile, speech-based data entry. We describe, in detail, our novel evaluation design as well as the results we obtained.