Digital Library

35 results for Speech synthesis Data processing

in Aston University Research Archive

A computational model of prosody for Yorùbá text-to-speech synthesis

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work examines prosody modelling for the Standard Yorùbá (SY) language in the context of computer text-to-speech synthesis applications. The thesis of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques that combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. Our prosody model is conceptualised around a modular holistic framework. The framework is implemented using the Relational Tree (R-Tree) techniques (Ehrich and Foith, 1976). R-Tree is a sophisticated data structure that provides a multi-dimensional description of a waveform. A Skeletal Tree (S-Tree) is first generated using algorithms based on the tone phonological rules of SY. Subsequent steps update the S-Tree by computing the numerical values of the prosody dimensions. To implement the intonation dimension, fuzzy control rules were developed based on data from native speakers of Yorùbá. The Classification And Regression Tree (CART) and the Fuzzy Decision Tree (FDT) techniques were tested in modelling the duration dimension. The FDT was selected based on its better performance. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration and intonation, using different techniques and their subsequent integration. Our approach provides us with a flexible and extendible model that can also be used to implement, study and explain the theory behind aspects of the phenomena observed in speech prosody.

A modular holistic approach to prosody modelling for Standard Yorùbá speech synthesis

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a novel prosody model in the context of computer text-to-speech synthesis applications for tone languages. We have demonstrated its applicability using the Standard Yorùbá (SY) language. Our approach is motivated by the theory that abstract and realised forms of various prosody dimensions should be modelled within a modular and unified framework [Coleman, J.S., 1994. Polysyllabic words in the YorkTalk synthesis system. In: Keating, P.A. (Ed.), Phonological Structure and Forms: Papers in Laboratory Phonology III, Cambridge University Press, Cambridge, pp. 293–324]. We have implemented this framework using the Relational Tree (R-Tree) technique. R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. The underlying assumption of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. To implement the intonation dimension, fuzzy logic based rules were developed using speech data from native speakers of Yorùbá. The Fuzzy Decision Tree (FDT) and the Classification and Regression Tree (CART) techniques were tested in modelling the duration dimension. For practical reasons, we have selected the FDT for implementing the duration dimension of our prosody model. To establish the effectiveness of our prosody model, we have also developed a Stem-ML prosody model for SY. We have performed both quantitative and qualitative evaluations on our implemented prosody models. The results suggest that, although the R-Tree model does not predict the numerical speech prosody data as accurately as the Stem-ML model, it produces synthetic speech prosody with better intelligibility and naturalness. The R-Tree model is particularly suitable for speech prosody modelling for languages with limited language resources and expertise, e.g. African languages. Furthermore, the R-Tree model is easy to implement, interpret and analyse.

A fuzzy decision tree-based duration model for Standard Yorùbá text-to-speech synthesis

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework. This framework is implemented using the Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation, and intensity, using different techniques and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we have also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data since it achieved a better accuracy for the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than our CART model. In addition, we also observed that the expressiveness of FDT is much better than that of CART. That is because the representation in FDT is not restricted to a set of piece-wise or discrete constant approximation. We, therefore, conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications. © 2006 Elsevier Ltd. All rights reserved.
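The two quantitative measures named in this abstract, RMSE and correlation, can be illustrated with a minimal Python sketch. The function names and the sample durations below are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation procedure may differ in detail.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def pearson_corr(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical syllable durations in milliseconds (illustrative values only).
observed_ms = [180.0, 220.0, 150.0, 260.0]
predicted_ms = [170.0, 230.0, 160.0, 250.0]

print("RMSE:", rmse(predicted_ms, observed_ms))
print("Corr:", pearson_corr(predicted_ms, observed_ms))
```

Computing both measures separately on the training set and on a held-out test set is what allows the kind of comparison reported here, where one model fits the training data more closely while the other generalises better.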

Intonation contour realisation for Standard Yorùbá text-to-speech synthesis: a fuzzy computational approach

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy logic based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised by the Pitch Synchronous Overlap and Add (PSOLA) technique using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
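The step of joining peak and valley points into a contour can be sketched as follows. The abstract does not say which interpolation technique was used, so this sketch assumes simple piecewise-linear interpolation; the function name `interpolate_contour` and the anchor values are hypothetical.

```python
def interpolate_contour(anchors, n_samples):
    """Sample a piecewise-linear F0 contour through (time_s, f0_hz) anchors.

    Assumes anchors are sorted by strictly increasing time.
    Returns n_samples (time, f0) pairs evenly spaced from first to last anchor.
    """
    if len(anchors) < 2 or n_samples < 2:
        raise ValueError("need at least two anchors and two samples")
    if any(b[0] <= a[0] for a, b in zip(anchors, anchors[1:])):
        raise ValueError("anchor times must be strictly increasing")

    t_start, t_end = anchors[0][0], anchors[-1][0]
    contour = []
    seg = 0  # index of the segment [anchors[seg], anchors[seg + 1]]
    for i in range(n_samples):
        t = t_start + (t_end - t_start) * i / (n_samples - 1)
        # Advance to the segment that contains t.
        while seg < len(anchors) - 2 and t > anchors[seg + 1][0]:
            seg += 1
        (t0, f0), (t1, f1) = anchors[seg], anchors[seg + 1]
        w = (t - t0) / (t1 - t0)
        contour.append((t, f0 + w * (f1 - f0)))
    return contour


# Hypothetical valley-peak-valley anchors: (time in seconds, F0 in Hz).
anchors = [(0.0, 120.0), (0.2, 160.0), (0.4, 110.0)]
contour = interpolate_contour(anchors, 5)
for t, f0 in contour:
    print(f"{t:.2f} s  {f0:.1f} Hz")
```

A smoother scheme such as spline interpolation could be substituted inside the loop without changing the surrounding structure.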

A computational model of intonation for Yorùbá text-to-speech synthesis: design and analysis

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1-10, were obtained for intelligibility and naturalness respectively.
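How fuzzy logic can turn a symbolic description into a numerical peak value may be sketched generically. This is not the authors' rule base: the single input (relative position in the utterance), the triangular membership functions, the three rules, and the output values in Hz are all hypothetical, and the sketch uses zero-order Sugeno-style weighted averaging as an assumed inference scheme.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical rules: (membership parameters over position 0..1, output F0 in Hz).
# IF position is early  THEN peak F0 = 150
# IF position is middle THEN peak F0 = 130
# IF position is late   THEN peak F0 = 110
RULES = [
    ((-0.5, 0.0, 0.5), 150.0),
    ((0.0, 0.5, 1.0), 130.0),
    ((0.5, 1.0, 1.5), 110.0),
]


def infer_peak_f0(position):
    """Weighted average of rule outputs, weighted by rule firing strength."""
    fired = [(triangular(position, *mf), out) for mf, out in RULES]
    total = sum(weight for weight, _ in fired)
    if total == 0.0:
        raise ValueError("position is outside the support of every rule")
    return sum(weight * out for weight, out in fired) / total


for pos in (0.0, 0.25, 0.5, 1.0):
    print(pos, infer_peak_f0(pos))
```

Because adjacent membership functions overlap, the output changes gradually between rules rather than jumping at a threshold, which is the property that distinguishes this kind of model from a crisp decision rule.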

The relationship between spoken language and speech and nonspeech processing in children with autism: a magnetic event-related field study

Relevance:

100.00% 100.00%

Publisher:

Abstract:

It has been proposed that language impairments in children with Autism Spectrum Disorders (ASD) stem from atypical neural processing of speech and/or nonspeech sounds. However, the strength of this proposal is compromised by the unreliable outcomes of previous studies of speech and nonspeech processing in ASD. The aim of this study was to determine whether there was an association between poor spoken language and atypical event-related field (ERF) responses to speech and nonspeech sounds in children with ASD (n = 14) and controls (n = 18). Data from this developmental population (ages 6-14) were analysed using a novel combination of methods to maximize the reliability of our findings while taking into consideration the heterogeneity of the ASD population. The results showed that poor spoken language scores were associated with atypical left hemisphere brain responses (200 to 400 ms) to both speech and nonspeech in the ASD group. These data support the idea that some children with ASD may have an immature auditory cortex that affects their ability to process both speech and nonspeech sounds. Their poor speech processing may impair their ability to process the speech of other people, and hence reduce their ability to learn the phonology, syntax, and semantics of their native language.

Recent developments in all-optical nonlinear data processing

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Photonic technologies for data processing in the optical domain are expected to play a major role in future high-speed communications. Nonlinear effects in optical fibres have many attractive features and great, but not yet fully explored, potential for optical signal processing. Here we provide an overview of our recent advances in developing novel techniques and approaches to all-optical processing based on fibre nonlinearities.

Nuclear magnetic resonance spectroscopic studies of the role of steric effects in molecular interactions using rationalised data processing procedures

Relevance:

100.00% 100.00%

Publisher:

Quasi-lossless optical links for broad-band transmission and data processing

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present the first experimental implementation of a recently designed quasi-lossless fiber span with strongly reduced signal power excursion. The resulting fiber waveguide medium can be advantageously used both in lightwave communications and in all-optical nonlinear data processing.

Quasi-lossless spans for broadband transmission and data processing

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present the first experimental implementation of a recently designed quasi-lossless fibre span with strongly reduced signal power excursion. The resulting fibre waveguide medium can be advantageously used both in lightwave communications and in all-optical nonlinear data processing.

Synthesis, thermal processing, and thin film morphology of poly(3-hexylthiophene)-poly(styrenesulfonate) block copolymers

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A series of novel block copolymers, processable from single organic solvents and subsequently rendered amphiphilic by thermolysis, have been synthesized using Grignard metathesis (GRIM) and reversible addition-fragmentation chain transfer (RAFT) polymerizations and azide-alkyne click chemistry. This chemistry is simple and allows the fabrication of well-defined block copolymers with controllable block lengths. The block copolymers, designed for use as interfacial adhesive layers in organic photovoltaics to enhance contact between the photoactive and hole transport layers, comprise printable poly(3-hexylthiophene)-block-poly(neopentyl p-styrenesulfonate), P3HT-b-PNSS. Subsequently, they are converted to P3HT-b-poly(p-styrenesulfonate), P3HT-b-PSS, following deposition and thermal treatment at 150 °C. Grazing incidence small- and wide-angle X-ray scattering (GISAXS/GIWAXS) revealed that thin films of the amphiphilic block copolymers comprise lamellar nanodomains of P3HT crystallites that can be pushed further apart by increasing the PSS block lengths. The approach of using a thermally modifiable block allows deposition of this copolymer from a single organic solvent and subsequent conversion to an amphiphilic layer by nonchemical means, particularly attractive to large scale roll-to-roll industrial printing processes.

Quasi-lossless optical links for broad-band transmission and data processing

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present the first experimental implementation of a recently designed quasi-lossless fiber span with strongly reduced signal power excursion. The resulting fiber waveguide medium can be advantageously used both in lightwave communications and in all-optical nonlinear data processing. © 2005 IEEE.

Controlled synthesis and processing of a poly(L-lactide-co-ε-caprolactone) copolymer for biomedical use as an absorbable monofilament surgical suture

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Poly(L-lactide-co-ε-caprolactone) 75:25% mol, P(LL-co-CL), was synthesized via bulk ring-opening polymerisation (ROP) using a novel tin(II) alkoxide initiator, [Sn(Oct)]2DEG, at 130 °C for 48 hrs. The effectiveness of this initiator was compared with the well-known conventional tin(II) octoate initiator, Sn(Oct)2. The P(LL-co-CL) copolymers obtained were characterized using a combination of analytical techniques including: nuclear magnetic resonance spectroscopy (NMR), differential scanning calorimetry (DSC), thermogravimetry (TG) and gel permeation chromatography (GPC). The P(LL-co-CL) was melt-spun into monofilament fibres of uniform diameter and smooth surface appearance. Modification of the matrix morphology was then built into the as-spun fibres via a series of controlled off-line annealing and hot-drawing steps. © (2014) Trans Tech Publications, Switzerland.

Investigating microphone efficacy for facilitation of mobile speech-based data entry

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Despite being nominated as a key potential interaction technique for supporting today's mobile technology user, the widespread commercialisation of speech-based input is currently being impeded by unacceptable recognition error rates. Developing effective speech-based solutions for use in mobile contexts, given the varying extent of background noise, is challenging. The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. Specifically, this paper reports on a comparison of three different commercially available microphones in terms of their efficacy to facilitate mobile, speech-based data entry. We describe, in detail, our novel evaluation design as well as the results we obtained.
