9 results for "Text to speech"

in Instituto Politécnico do Porto, Portugal


Relevance:

100.00%

Abstract:

This paper describes a rule-based automatic syllabifier for Danish built on the Maximal Onset Principle. The work was motivated by the earlier success of rule-based syllabification modules for Portuguese and Catalan. The system was implemented and tested with a very small set of rules. Contrary to our initial expectations, it achieved word accuracy rates of 96.9% and 98.7%, even though Danish has a complex syllabic structure and was therefore expected to be difficult to handle with rules. A comparison with a data-driven syllabification system based on artificial neural networks showed that the rule-based system achieved the higher accuracy.
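As a rough illustration of the Maximal Onset Principle (not the paper's actual rule set), the Python sketch below works on spelling rather than phonemes and assumes a small, hypothetical inventory of legal onsets. Between two vowel nuclei, it gives the following syllable the longest consonant cluster that is a legal onset and leaves any remaining consonants as the coda of the preceding syllable.

```python
# Simplified orthographic vowel set; each run of vowels is treated as one nucleus.
VOWELS = set("aeiouyæøå")

# Hypothetical, deliberately tiny inventory of legal onsets (not the paper's rules).
# The empty string means "no onset", so every cluster has at least one valid split.
LEGAL_ONSETS = {
    "", "b", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "r", "s", "t", "v",
    "bl", "br", "dr", "fl", "fr", "gl", "gr", "kl", "kn", "kr", "pl", "pr",
    "sk", "sl", "sm", "sn", "sp", "st", "tr", "skr", "spr", "str",
}


def syllabify(word: str) -> list[str]:
    """Split a word into syllables using the Maximal Onset Principle."""
    word = word.lower()
    # Locate nuclei as (start, end) spans of consecutive vowels.
    nuclei = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i
            while j < len(word) and word[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    if not nuclei:
        return [word]
    # Place one boundary inside each intervocalic consonant cluster.
    boundaries = []
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        cluster = word[prev_end:next_start]
        # Try the longest suffix first: that is the maximal legal onset.
        for k in range(len(cluster) + 1):
            if cluster[k:] in LEGAL_ONSETS:
                boundaries.append(prev_end + k)
                break
    # Cut the word at the boundaries; word-initial and word-final consonants
    # stay with the first and last syllables respectively.
    pieces = []
    start = 0
    for b in boundaries:
        pieces.append(word[start:b])
        start = b
    pieces.append(word[start:])
    return pieces
```

With this inventory, `syllabify("abstrakt")` returns `["ab", "strakt"]`, because "str" is the longest legal onset in the cluster "bstr". A real Danish syllabifier would operate on a phonemic transcription and would need a carefully curated onset list.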

Relevance:

100.00%

Abstract:

This paper describes a linguistically motivated, rule-based grapheme-to-phone (G2P) transcription algorithm for European Portuguese. It presents a complete set of phonological and phonetic transcription rules for the standard variety of European Portuguese. The algorithm was implemented and tested on online newspaper articles, achieving an accuracy rate of 98.80%. Future developments to increase this value are planned. The purpose of this work is to develop a module or tool that can improve the naturalness of synthetic speech in European Portuguese; other applications, such as language teaching and learning, are also expected. These results, together with the prospects for future improvement, underline the importance of linguistic knowledge in the development of Text-to-Speech (TTS) systems.
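Rule-based G2P systems typically apply ordered, context-sensitive rewrite rules. The Python sketch below shows only that mechanism, using a handful of well-known European Portuguese consonant spellings (for example, ch as [ʃ] and lh as [ʎ]); the rule table is hypothetical and far from the complete rule set the paper describes. Vowels, stress, vowel reduction and many consonant rules (such as intervocalic s as [z]) are deliberately left out, so unmatched letters are simply copied through.

```python
import re

# Hypothetical, heavily simplified rule table: (grapheme, next-letter condition, phone).
# Rules are tried in order, so digraphs and context-specific rules come first.
RULES = [
    ("ch", None, "ʃ"),
    ("lh", None, "ʎ"),
    ("nh", None, "ɲ"),
    ("rr", None, "ʁ"),
    ("ss", None, "s"),
    ("qu", "[ei]", "k"),
    ("gu", "[ei]", "g"),
    ("c", "[ei]", "s"),
    ("g", "[ei]", "ʒ"),
    ("ç", None, "s"),
    ("c", None, "k"),
    ("j", None, "ʒ"),
    ("h", None, ""),  # h outside the digraphs above is silent
]


def transcribe(word):
    """Apply the first matching rule at each position, left to right."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        for grapheme, context, phone in RULES:
            if word.startswith(grapheme, i):
                following = word[i + len(grapheme): i + len(grapheme) + 1]
                if context is None or re.fullmatch(context, following):
                    out.append(phone)
                    i += len(grapheme)
                    break
        else:
            # No rule matched: copy the letter unchanged (vowels included).
            out.append(word[i])
            i += 1
    return "".join(out)
```

Rule order carries the linguistics here: because "ch" is listed before "c", `transcribe("chuva")` yields "ʃuva" rather than "khuva".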

Relevance:

100.00%

Abstract:

In recent years, the number of systems and devices that use voice-based interaction has grown significantly. For users to keep using these systems, the interface must be reliable and pleasant, so as to provide an optimal user experience. However, very few studies currently try to evaluate how pleasant a voice is from a perceptual point of view when the final application is a speech-based interface. In this paper we present an objective definition of voice pleasantness, based on a representative subset of features, and a new system for automatic voice pleasantness classification and intensity estimation. Our study is based on a database of European Portuguese female voices, but the methodology can be extended to male voices or to other languages. In the objective performance evaluation, the system achieved a 9.1% error rate for voice pleasantness classification and a 15.7% error rate for voice pleasantness intensity estimation.

Relevance:

100.00%

Abstract:

This paper proposes a module for homograph disambiguation in Portuguese Text-to-Speech (TTS). The module combines a part-of-speech (POS) parser, which disambiguates homographs that belong to different parts of speech, with a semantic analyzer, which disambiguates homographs that belong to the same part of speech. The proposed algorithms are intended to resolve a significant part of the homograph ambiguity in European Portuguese (EP), covering 106 homograph pairs so far. The system is ready to be integrated into a Letter-to-Sound (LTS) converter. The algorithms were trained and tested on different corpora and achieved an accuracy rate of 97.8%. The methodology is also valid for Brazilian Portuguese (BP), since 95 of the homograph pairs are exactly the same as in EP. A comparison with a probabilistic approach was also carried out, and the results are discussed.
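The POS-based half of such a module can be pictured as a lexicon lookup keyed by part of speech. The Python sketch below uses two well-known European Portuguese noun/verb homographs, gosto and jogo, whose stressed vowel is closed in the noun and open in the verb. The broad transcriptions, the lexicon and the one-word-lookback tagging heuristic are all hypothetical stand-ins for a real POS parser, not the paper's method; same-POS homographs, which the paper handles with a semantic analyzer, are not covered.

```python
# Hypothetical lexicon: homograph -> {POS tag: broad European Portuguese transcription}.
HOMOGRAPHS = {
    "gosto": {"NOUN": "ˈgoʃtu", "VERB": "ˈgɔʃtu"},  # "taste" vs "(I) like"
    "jogo": {"NOUN": "ˈʒogu", "VERB": "ˈʒɔgu"},     # "game" vs "(I) play"
}

# Hypothetical cue words for the toy tagger below.
DETERMINERS = {"o", "um", "este", "esse", "aquele", "meu", "teu", "seu"}
SUBJECT_PRONOUNS = {"eu"}


def guess_pos(previous_word):
    """Toy stand-in for a real POS parser: look only at the preceding word."""
    if previous_word in DETERMINERS:
        return "NOUN"
    if previous_word in SUBJECT_PRONOUNS:
        return "VERB"
    return None  # undecided; a real parser would use much more context


def disambiguate(tokens):
    """Return (token, transcription) pairs; None means not a homograph or undecided."""
    result = []
    for i, token in enumerate(tokens):
        readings = HOMOGRAPHS.get(token.lower())
        if readings is None:
            result.append((token, None))
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        result.append((token, readings.get(guess_pos(previous))))
    return result
```

For example, "eu gosto" resolves gosto to the open-vowel verb reading, while "o jogo" resolves jogo to the closed-vowel noun reading.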

Relevance:

100.00%

Abstract:

The relationship between automatic auditory discrimination, as measured by the mismatch negativity (MMN), and the type of stimulus has not been well established in the literature, despite the MMN's importance as an electrophysiological measure of central sound representation. In this study, the MMN was elicited with binaural passive auditory oddball paradigms using pure-tone and speech stimuli, in a group of 8 normal young adult subjects, at the same intensity level (75 dB SPL). In the pure-tone oddball, the frequency difference was 100 Hz (standard = 1000 Hz; deviant = 1100 Hz; both 100 ms long). In the speech oddball (standard /ba/; deviant /pa/; both 175 ms long), both Portuguese phonemes are bilabial plosives, chosen to keep a narrow frequency band. Differences between speech and pure-tone stimuli were found across electrode locations. At Cz and Fz, speech elicited an MMN with larger amplitude, longer duration and longer latency than pure tones, and significant differences in latency and amplitude were also found between the mastoids. The results suggest that speech may be processed differently from non-speech, and possibly at a later stage, owing to overlapping processes, since speech processing requires more neural resources.
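The MMN is conventionally measured on the difference wave obtained by subtracting the averaged response to standards from the averaged response to deviants. The single-channel Python sketch below illustrates that computation; the function names, the 100 to 250 ms search window and the toy data in the usage note are assumptions for illustration, not the study's analysis pipeline.

```python
def average(epochs):
    """Average equally long epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mismatch_negativity(standard_epochs, deviant_epochs, times_ms, window_ms=(100, 250)):
    """Return the deviant-minus-standard wave plus its most negative point in the window.

    The default window is a hypothetical choice; studies set it per stimulus type.
    """
    standard = average(standard_epochs)
    deviant = average(deviant_epochs)
    difference = [d - s for d, s in zip(deviant, standard)]
    lo, hi = window_ms
    candidates = [(amp, t) for amp, t in zip(difference, times_ms) if lo <= t <= hi]
    amplitude, latency = min(candidates)  # MMN peak: most negative amplitude
    return difference, amplitude, latency
```

With samples at 0, 50, ..., 300 ms, flat standards and deviant epochs averaging to -2.0 at 150 ms, the function reports an amplitude of -2.0 at a latency of 150 ms. In practice the computation would be repeated per electrode (for example Cz and Fz) and per condition, so that speech and pure-tone amplitudes and latencies can be compared.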

Relevance:

90.00%

Abstract:

This essay compares Emily Brontë's novel Wuthering Heights with five of its screen adaptations and their Portuguese subtitles. Given the scope of the study, it can offer only a bird's-eye view of the issues and is meant as a starting point for further research. The following questions serve as guidelines: What transformations occur when the original text is adapted for the screen? Do subtitles adapt the film dialogue to the target audience's cultural and linguistic context? Are subtitles influenced more by oral speech than by written literary discourse? Should subtitles not, in fact, reflect the poetic function that prevails in screen adaptations of literary texts? Rather than attempting to answer these questions, we focus on the objects as phenomena. Our interdisciplinary undertaking takes a semio-pragmatic stance, at this stage trying to avoid theoretical backdrops that might affect our apprehension of the objects' qualities, singularities, and conventional traits, following Lucia Santaella's interpretation of Charles S. Peirce's phaneroscopy. From an empirical standpoint, we gather features and describe peculiarities, on the presumption that there are substrata in subtitling that point, or should point, to the literary source text, albeit through the mediation of a film script and a particular cinematic style. We therefore consider how the subtitling process may be influenced by the literary intertext, by the idiosyncrasies of a particular film adaptation, and by the socio-cultural context of the subtitler and the target audience. First, we isolate one of the novel's most poignant scenes, "I am Heathcliff", taking into account its symbolic play and its significance for character and plot construction. Second, we study the American, English, French, and Mexican film adaptations of this excerpt in terms of intersemiotic transformations. Then we analyze the differences between the film dialogue and its Portuguese subtitles.

Relevance:

90.00%

Abstract:

As the wireless cellular market reaches unprecedented levels of competition, network operators need to make maintaining Quality of Service (QoS) a main priority if they wish to attract new subscribers while keeping existing customers satisfied. Speech quality as perceived by the end user is a major example of a characteristic that constantly needs maintenance and improvement, and it is the topic of this Master's thesis project. The project uses an intrusive method of speech quality evaluation to further study and characterize the performance of speech codecs in second-generation (2G) and third-generation (3G) technologies, seeking further correlation between codecs with similar bit rates and exploring transmission parameters that may aid in the assessment of speech quality. Owing to limitations of the audio analyzer equipment originally intended for the work, a different system for recording the test samples was sought. Although the newly designed system is not standard, after extensive testing and optimization of its parameters the final results were found to be reliable and satisfactory. The tests covered a set of high and low bit rate codecs for both 2G and 3G; comparing and analysing the values led to the conclusion that 3G speech codecs perform better than 2G codecs under approximately the same conditions. This reinforces the idea that 3G is clearly the best choice for customers seeking the best possible listening speech quality. The transmission parameters chosen for the experiment, Receiver Quality (RxQual) and Received Energy per Chip to the Power Density Ratio (Ec/N0), were subjected to speech quality correlation tests. The final RxQual results were compared with those of prior studies by other researchers and are considered highly relevant, confirming RxQual as a reliable indicator of speech quality.
As for Ec/N0, it cannot be established as a speech quality indicator; however, it shows clear thresholds at which the mean opinion score (MOS) values decrease significantly. The transmission parameters studied can be used not only for network management but also to give the communications engineer (or technician) an indication of the expected end-to-end speech quality. The work also suggests ideas for future studies. Since fourth-generation (4G) cellular technologies, the first all-IP network structure, are beginning to take an important place in the global market, it seems highly relevant to evaluate 4G speech quality and compare it with 3G, in both narrowband and wideband scenarios, using POLQA, the most recent standard objective method of speech quality assessment. In addition, the new data obtained from the Ec/N0 tests justify further research to validate the assumptions made in this work.

Relevance:

80.00%

Abstract:

Background: Temporal lobe epilepsy (TLE) is a neurological disorder that directly affects cortical areas responsible for auditory processing. The resulting abnormalities can be assessed using event-related potentials (ERPs), which have high temporal resolution. However, little is known about TLE in terms of dysfunction of early sensory memory encoding, or about possible correlations between EEG findings, linguistic deficits, and seizures. Mismatch negativity (MMN) is an ERP component, elicited by introducing a deviant stimulus while the subject attends to a repetitive behavioural task, that reflects pre-attentive sensory memory function as well as neuronal auditory discrimination and perceptual accuracy.

Hypothesis: We propose an MMN protocol for future clinical application and research, based on the hypothesis that children with TLE may have abnormal MMN responses to speech and non-speech stimuli. The MMN can be elicited with a passive auditory oddball paradigm, and the abnormalities might be associated with the location and frequency of epileptic seizures.

Significance: The proposed protocol might contribute to a better understanding of the neuropsychophysiological basis of the MMN. We suggest that, in TLE, central sound representation may be diminished for both speech and non-speech stimuli.

Discussion: MMN responses differ between speech and non-speech stimuli across electrode sites. Childhood TLE might be a good model for studying topographic and functional auditory processing and its neurodevelopment, which points to the MMN as a possible clinical tool for prognosis, evaluation, follow-up, and rehabilitation in TLE.
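A passive auditory oddball sequence of the kind this protocol relies on could be generated as in the Python sketch below. Several choices are assumptions made here for illustration: the 15% deviant probability, the rule that a deviant never directly follows another deviant, and the absence of onset/offset ramps on the tones. The 1000 Hz standard, 1100 Hz deviant and 100 ms duration are borrowed from the pure-tone condition of the MMN study listed above.

```python
import math
import random

# Pure-tone parameters taken from the MMN study above; everything else is assumed.
TONE_HZ = {"standard": 1000, "deviant": 1100}
DURATION_MS = 100


def oddball_sequence(n_trials, p_deviant=0.15, seed=None):
    """Return 'standard'/'deviant' labels with no two deviants in a row."""
    rng = random.Random(seed)
    n_deviant = round(n_trials * p_deviant)
    n_standard = n_trials - n_deviant
    if n_deviant > n_standard:
        raise ValueError("too many deviants to keep them non-adjacent")
    # Attach each deviant to a distinct standard, so the sequence starts with a
    # standard and every deviant is immediately preceded by one.
    followed_by_deviant = set(rng.sample(range(n_standard), n_deviant))
    sequence = []
    for i in range(n_standard):
        sequence.append("standard")
        if i in followed_by_deviant:
            sequence.append("deviant")
    return sequence


def pure_tone(freq_hz, duration_ms=DURATION_MS, sample_rate=44100):
    """Synthesize a sine tone as floats in [-1, 1] (no rise/fall ramps applied)."""
    n_samples = int(sample_rate * duration_ms / 1000)
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate) for k in range(n_samples)]
```

For example, `oddball_sequence(200, seed=1)` yields 170 standards and 30 deviants, and each label can be mapped through `TONE_HZ` to `pure_tone` to obtain the stimulus waveform. Placing deviants into distinct gaps after standards enforces the non-adjacency rule by construction, avoiding a reshuffle-until-valid loop.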

Relevance:

80.00%

Abstract:

This paper presents the main Portuguese results of a multinational study on the reading format preferences and behaviors of undergraduate students at the Polytechnic Institute of Porto (Portugal). For this purpose, we applied an adaptation of the Academic Reading Questionnaire created by Mizrachi (2014). This survey instrument contains 14 Likert-style statements on how format influences students' reading behavior, covering aspects such as the ability to remember, feelings about convenience of access, active engagement with the text through highlighting and annotating, and the ability to review and concentrate on the text. Students are also asked how the language and length of a text affect their preferred format, and which electronic device they use to read digital documents. Finally, some demographic and academic data were gathered. The analysis of the results will be contextualized by a review of the literature on young people's reading format preferences. The format (digital or print) in which a text is displayed and read can affect comprehension, an important information literacy skill. This is a particularly relevant issue for class readings in an academic context, because it affects learning. Students' reading format preferences will also influence their use of library services. However, the literature is not unanimous on this subject. Woody, Daniel and Baker (2010) concluded that the reading experience differs between electronic and print contexts and that students prefer print books to e-books. This view is reinforced by Ji, Michaels and Waterman (2014), who report that, among 101 undergraduates, the large majority said they read and learn more with the printed format, even though they prefer readings supplied electronically over those supplied in print.
In contrast, Rockinson-Szapkiw et al. (2013) conducted a study showing that e-textbooks are as effective for learning as traditional textbooks, and that students who chose e-textbooks had significantly higher perceived learning than students who chose print textbooks.