23 results for Speech Synthesis
at Instituto Politécnico do Porto, Portugal
Abstract:
Recent developments in Hidden Markov Model (HMM) based speech synthesis have shown that it is a promising technology, fully capable of competing with other established techniques. However, some issues still lack a solution. Several authors report an over-smoothing phenomenon in both time and frequency, which decreases naturalness and sometimes intelligibility. In this work we present a new vowel intelligibility enhancement algorithm that uses a discrete Kalman filter (DKF) for tracking frame-based parameters. The inter-frame correlations are modelled by an autoregressive structure, which provides an underlying time-frame dependency and can improve time-frequency resolution. The system’s performance has been evaluated using objective and subjective tests, and the proposed methodology has led to improved results.
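The abstract gives no equations, so the following is only a minimal sketch of the general idea, assuming a scalar first-order autoregressive (AR(1)) state model for one frame-based parameter. The function name `kalman_ar1` and the default values of `a`, `q` and `r` are illustrative assumptions, not the authors' algorithm or settings.

```python
def kalman_ar1(observations, a=0.95, q=0.01, r=0.1, x0=0.0, p0=1.0):
    """Scalar discrete Kalman filter (illustrative sketch).

    Assumed state model:       x[k] = a * x[k-1] + w,  var(w) = q
    Assumed observation model: z[k] = x[k] + v,        var(v) = r
    Returns the filtered estimate for every frame.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Prediction step: propagate state and error variance through the AR(1) model.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update step: blend prediction and observation using the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Hypothetical noisy parameter track (for example, one spectral coefficient per frame).
    track = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05]
    print(kalman_ar1(track, a=1.0))
```

A larger `r` relative to `q` makes the filter trust the autoregressive prediction more and smooths the track more strongly. A smaller `r` follows the frame-by-frame observations more closely.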
Abstract:
In this work, an adaptive filtering scheme based on a dual discrete Kalman filter (DKF) is proposed for quality enhancement of Hidden Markov Model (HMM) based speech synthesis. The objective is to improve signal smoothness across HMMs and their related states and to reduce artifacts caused by the limitations of the acoustic model. Both speech and artifacts are modelled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The quality enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. The system's performance has been evaluated using mean opinion score tests, and the proposed technique has led to improved results.
Abstract:
In the last few years, the number of systems and devices that use voice-based interaction has grown significantly. For continued use of these systems, the interface must be reliable and pleasant in order to provide an optimal user experience. However, there are currently very few studies that try to evaluate how pleasant a voice is from a perceptual point of view when the final application is a speech-based interface. In this paper we present an objective definition of voice pleasantness based on the composition of a representative feature subset, together with a new automatic voice pleasantness classification and intensity estimation system. Our study is based on a database composed of European Portuguese female voices, but the methodology can be extended to male voices or to other languages. In the objective performance evaluation the system achieved a 9.1% error rate for voice pleasantness classification and a 15.7% error rate for voice pleasantness intensity estimation.
Abstract:
In the last few years, the number of systems and devices that use voice-based interaction has grown significantly. For continued use of these systems, the interface must be reliable and pleasant in order to provide an optimal user experience. However, there are currently very few studies that try to evaluate how good a voice is when the application is a speech-based interface. In this paper we present a new automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference. Our study is based on a multi-language database composed of female voices. In the objective performance evaluation the system achieved a 7.3% error rate.
Abstract:
Ionic liquids (ILs) are ionic compounds with melting temperatures below 100 °C, and they have been a topic of great interest since the mid-1990s due to their unique properties. The range of IL uses has broadened, due to a significant increase in the variety of physical, chemical and biological properties of ILs. They are now used as Active Pharmaceutical Ingredients (APIs), and recent interest is focused on their application as innovative solutions in new medical treatment and delivery options [1]. In this work, our principal objective was the synthesis and investigation of the physicochemical and medical properties of ILs and organic salts derived from ampicillin. This approach is of great interest to the pharmaceutical industry, as the cation and anion composition of ILs and organic salts can greatly alter their desired properties, namely the melting temperature, and even synergistic effects can be obtained [2,3]. For the synthesis of these compounds we used a recently developed method proposed by Ohno et al. [4] for the preparation of quaternary ammonium and phosphonium hydroxides, which were neutralized by ampicillin. After purification we obtained pure ILs and salts in good yields. These ILs show good antimicrobial and antifungal activity. As it is well known that some ionic liquids containing phosphonium and ammonium cations also show anti-cancer activity [1,5], we also decided to study these compounds against some cancer cell lines.
Abstract:
Imidazolidin-4-ones are commonly employed as skeletal modifications in bioactive oligopeptides, either as proline surrogates or for protection of the N-terminal amino acid against aminopeptidase-catalysed hydrolysis. We have been working on the synthesis of imidazolidin-4-ones of the antimalarial primaquine, through acylation of primaquine with an α-amino acid and subsequent reaction of the resulting α-aminoamide with a ketone or aldehyde. Thus, when using racemic primaquine, an optically pure chiral α-amino acid and an aldehyde as starting materials, four imidazolidin-4-one diastereomers are to be expected (Scheme 1). However, we have recently observed that imidazolidin-4-one synthesis was stereoselective when 2-carboxybenzaldehyde (2CBA) was used, as only two diastereomers were produced [2]. Computational studies have shown that the imine formed prior to ring closure had, for structures derived from 2CBA, a quasi-cyclic rigid structure [2]. This rigid conformation is stabilized by an intramolecular hydrogen bond involving the C=O oxygen atom of the 2-carboxyl substituent in 2CBA and the N-H group of the α-aminoamide moiety [2]. These findings led us to postulate that the 2-carbonyl substituent in the benzaldehyde moiety was the key to the stereoselective synthesis of the imidazolidin-4-ones [2].
Abstract:
The tongue is the most important and dynamic articulator for speech formation, because of its anatomy (particularly the large volume of this muscular organ compared with the surrounding organs of the vocal tract) and also because of the wide range of movements and flexibility involved. In speech communication research, a variety of techniques have been used for measuring three-dimensional vocal tract shapes. More recently, magnetic resonance imaging (MRI) has become common, mainly because this technique allows the collection of a set of static and dynamic images that can represent the entire vocal tract along any orientation. Over the years, different anatomical organs of the vocal tract have been modelled, namely through 2D and 3D tongue models, using parametric or statistical modelling procedures. Our aims are to present and describe some 3D models reconstructed from MRI data, for one subject uttering sustained articulations of some typical Portuguese sounds. Thus, we present a 3D database of the tongue obtained by stack combinations with the subject articulating Portuguese vowels. This 3D knowledge of the speech organs could be very important, especially for clinical purposes (for example, for the assessment of articulatory impairments followed by tongue surgery in speech rehabilitation), and also for a better understanding of acoustic theory in speech formation.
Abstract:
The first and second authors would like to acknowledge the support of the PhD grants with references SFRH/BD/28817/2006 and SFRH/PROTEC/49517/2009, respectively, from Fundação para a Ciência e Tecnologia (FCT). This work was partially done in the scope of the project “Methodologies to Analyze Organs from Complex Medical Images – Applications to Female Pelvic Cavity”, with reference PTDC/EEA-CRO/103320/2008, financially supported by FCT.
Abstract:
The mechanisms of speech production are complex and have been attracting attention from researchers in both the medical and computer vision fields. In the speech production mechanism, the study of the articulators is a complex issue, since they have a high degree of freedom during this process, particularly the tongue, which makes its control and observation difficult. In this work, the shape of the tongue during the articulation of the oral vowels of European Portuguese is automatically characterized by using statistical modeling on MR images. A point distribution model is built from a set of images collected during artificially sustained articulations of European Portuguese sounds, which can extract the main characteristics of the motion of the tongue. The model built in this work allows a clearer understanding of the dynamic speech events involved during sustained articulations. The tongue shape model can also be useful for speech rehabilitation purposes, specifically to recognize the compensatory movements of the articulators during speech production.
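As a rough illustration of what a point distribution model computes, the sketch below takes shapes that are assumed to be already aligned, each stored as a flat list of landmark coordinates, and extracts the mean shape and the first mode of variation. The function names, the power-iteration shortcut and the toy data are assumptions made for this example. The abstract does not describe the authors' landmarking or alignment procedure.

```python
import math


def point_distribution_model(shapes, iterations=200):
    """Mean shape and first variation mode of aligned landmark shapes (sketch).

    shapes: list of shapes, each a flat list [x1, y1, x2, y2, ...].
    Returns (mean_shape, first_mode, variance_along_first_mode).
    """
    n, d = len(shapes), len(shapes[0])
    mean = [sum(s[j] for s in shapes) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in shapes]
    # Sample covariance matrix of the landmark coordinates.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration for the dominant eigenvector. The non-uniform start
    # vector is an arbitrary choice; a start exactly orthogonal to the
    # dominant mode would not converge to it.
    v = [1.0 + j for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variation in the data
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return mean, v, variance


def synthesize_shape(mean, mode, b):
    """Generate a new shape as mean + b * mode."""
    return [m + b * p for m, p in zip(mean, mode)]


if __name__ == "__main__":
    # Toy data: two landmarks, only the first x-coordinate varies.
    toy = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0], [2.0, 0.0, 1.0, 0.0]]
    mean, mode, var = point_distribution_model(toy)
    print(mean, mode, var)
    print(synthesize_shape(mean, mode, 2.0 * math.sqrt(var)))
```

Varying the weight `b` within a few standard deviations of the mode (the square root of the returned variance) gives plausible shapes between the observed extremes.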
Abstract:
This communication presents a novel kind of silicon nanomaterial: freestanding Si nanowire arrays (Si NWAs), which are readily synthesized by one-step, template-free electro-deoxidation of SiO2 in molten CaCl2. A preliminary investigation of the self-assembling growth process of this material is also reported.
Abstract:
Background: In Portugal, the routine clinical practice of speech and language therapists (SLTs) in treating children with all types of speech sound disorder (SSD) continues to be articulation therapy (AT). There is limited use of phonological therapy (PT) or phonological awareness training in Portugal. Additionally, at an international level there is a focus on collecting information on and differentiating between the effectiveness of PT and AT for children with different types of phonologically based SSD, as well as on the role of phonological awareness in remediating SSD. It is important to collect more evidence for the most effective and efficient type of intervention approach for different SSDs and for these data to be collected from diverse linguistic and cultural perspectives. Aims: To evaluate the effectiveness of a PT and an AT approach for treatment of 14 Portuguese children, aged 4.0–6.7 years, with a phonologically based SSD. Methods & Procedures: The children were randomly assigned to one of the two treatment approaches (seven children in each group). All children were treated by the same SLT, blind to the aims of the study, over three blocks of a total of 25 weekly sessions of intervention. Outcome measures of phonological ability (percentage of consonants correct (PCC), percentage occurrence of different phonological processes and phonetic inventory) were taken before and after intervention. A qualitative assessment of intervention effectiveness from the perspective of the parents of participants was included. Outcomes & Results: Both treatments were effective in improving the participants’ speech, with the children receiving PT showing a more significant improvement in PCC score than those receiving AT. Children in the PT group also showed greater generalization to untreated words than those receiving AT. Parents reported both intervention approaches to be equally effective in improving their children’s speech.
Conclusions & Implications: The PT (a combination of expressive phonological tasks, phonological awareness, listening and discrimination activities) proved to be an effective integrated method of improving phonological SSD in children. These findings provide some evidence for Portuguese SLTs to employ PT with children with phonologically based SSD.
Abstract:
The relation between automatic auditory discrimination, measured with the MMN, and the type of stimuli has not been well established in the literature, despite its importance as an electrophysiological measure of central sound representation. In this study, the MMN response was elicited by pure-tone and speech stimuli in a binaural passive auditory oddball paradigm in a group of 8 normal young adult subjects at the same intensity level (75 dB SPL). The frequency difference in the pure-tone oddball was 100 Hz (standard = 1 000 Hz; deviant = 1 100 Hz; same duration = 100 ms). In the speech oddball (standard /ba/; deviant /pa/; same duration = 175 ms), the Portuguese phonemes are both bilabial plosives in order to maintain a narrow frequency band. Differences were found across electrode locations between speech and pure-tone stimuli. Larger MMN amplitude, longer duration and higher latency were found for speech compared to pure tones at Cz and Fz, as well as significant differences in latency and amplitude between mastoids. Results suggest that speech may be processed differently than non-speech, and that this processing may occur at a later stage due to overlapping processes, since more neural resources are required for speech processing.
Abstract:
The synthesis and application of fractional-order controllers is now an active research field. This article investigates the use of fractional-order PID controllers in the velocity control of an experimental modular servo system. The system consists of a digital servomechanism and an open-architecture software environment for real-time control experiments using MATLAB/Simulink. Different tuning methods are employed, such as heuristics based on the well-known Ziegler-Nichols rules, techniques based on Bode’s ideal transfer function and optimization tuning methods. Experimental responses obtained from the application of the several fractional-order controllers are presented and analyzed. The effectiveness and performance of the proposed algorithms are also compared with those of classical integer-order PID controllers.
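To make the idea of a fractional-order PID controller concrete, the sketch below evaluates a controller of the form u = Kp·e + Ki·D^(-λ)e + Kd·D^(μ)e using the Grünwald-Letnikov approximation of the fractional operator. This is an illustrative assumption: the article works in MATLAB/Simulink and the abstract does not say which discretization it uses. All function names and gain values here are hypothetical.

```python
def gl_weights(alpha, n):
    """First n Grünwald-Letnikov weights: w0 = 1, wk = w(k-1) * (1 - (alpha + 1) / k)."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def gl_derivative(samples, alpha, h):
    """Approximate the order-alpha derivative at the latest sample.

    samples: uniformly spaced values, oldest first; h: sampling period.
    A negative alpha gives a fractional integral.
    """
    n = len(samples)
    weights = gl_weights(alpha, n)
    acc = sum(weights[k] * samples[n - 1 - k] for k in range(n))
    return acc / (h ** alpha)


def fopid_output(errors, kp, ki, kd, lam, mu, h):
    """Control signal of a PI^lambda D^mu controller for the error history so far."""
    proportional = kp * errors[-1]
    integral = ki * gl_derivative(errors, -lam, h)
    derivative = kd * gl_derivative(errors, mu, h)
    return proportional + integral + derivative


if __name__ == "__main__":
    # Hypothetical error history sampled every 0.5 s.
    errors = [0.0, 0.5, 1.0, 1.5]
    print(fopid_output(errors, kp=2.0, ki=1.0, kd=1.0, lam=0.8, mu=0.6, h=0.5))
```

With λ = μ = 1 the weights reduce to a rectangular-rule integral and a backward-difference derivative, so the classical integer-order PID controller is recovered as a special case.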
Abstract:
The paper presents an automated synthesis procedure for RFDSCAs. The algorithm determines several RFDSCA circuits from the top-level system specifications, all with the same maximum performance. The genetic synthesis tool optimizes a fitness function proportional to the RFDSCA quality factor and uses the ε-concept and a maximin sorting scheme to achieve a set of solutions well distributed along a non-dominated front. To confirm the results of the algorithm, three RFDSCAs were simulated in SpectreRF and one of them was implemented and tested. The design used a 0.25 μm BiCMOS process. All the results (synthesized, simulated and measured) are very close, which indicates that the genetic synthesis method is a very useful tool for designing optimum-performance RFDSCAs.
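The abstract does not spell out its maximin sorting scheme. As a hedged illustration, the sketch below computes one common formulation of maximin fitness, assuming every objective is to be minimized: for each solution, take the maximum over all other solutions of the minimum objective-wise difference. Under this assumption a negative value marks a non-dominated solution. The function names and sample points are hypothetical and are not taken from the paper.

```python
def maximin_fitness(objectives):
    """Maximin fitness of each solution (all objectives minimized; sketch).

    objectives: list of objective tuples, one per solution.
    fitness_i = max over j != i of min over k of (f_k(i) - f_k(j)).
    """
    fitness = []
    for i, fi in enumerate(objectives):
        worst = max(
            min(a - b for a, b in zip(fi, fj))
            for j, fj in enumerate(objectives)
            if j != i
        )
        fitness.append(worst)
    return fitness


def non_dominated(objectives):
    """Solutions whose maximin fitness is negative, i.e. the non-dominated front."""
    scores = maximin_fitness(objectives)
    return [f for f, s in zip(objectives, scores) if s < 0]


if __name__ == "__main__":
    # Hypothetical two-objective population.
    population = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
    print(maximin_fitness(population))
    print(non_dominated(population))
```

Because the score also grows when solutions crowd together, ranking by it tends to favour solutions that are spread along the front, which fits the abstract's goal of a well-distributed set.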
Abstract:
Master's degree in Informatics Engineering, Specialization Area in Knowledge and Decision Technologies