18 results for Inventory-style speech enhancement
at Instituto Politécnico do Porto, Portugal
Abstract:
In this work, an adaptive modeling and spectral estimation scheme based on dual discrete Kalman filtering (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric speech recognition systems that rely on spectral time-dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases the use of this pre-processing module has led to improved results.
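To illustrate the state-space idea described in this abstract, the following is a minimal sketch of a single Kalman filter applied to a signal that follows an autoregressive model. It is a simplification and not the authors' dual scheme: it assumes the AR coefficients, the process variance, and a white measurement-noise variance are already known, whereas the dual filter also estimates the models. The function name and all parameter names are hypothetical.

```python
import numpy as np

def ar_kalman_denoise(noisy, ar_coeffs, process_var, noise_var):
    """Kalman-filter a noisy signal whose clean part follows a known AR(p) model.

    Assumed model (illustrative only):
        s[n] = a1*s[n-1] + ... + ap*s[n-p] + w[n],  w ~ N(0, process_var)
        y[n] = s[n] + v[n],                         v ~ N(0, noise_var)
    """
    a = np.asarray(ar_coeffs, dtype=float)
    p = a.size
    # Companion-form transition matrix built from the AR coefficients.
    F = np.zeros((p, p))
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)
    # Only the newest sample of the state is observed.
    H = np.zeros(p)
    H[0] = 1.0
    # Process noise drives only the newest sample.
    Q = np.zeros((p, p))
    Q[0, 0] = process_var

    x = np.zeros(p)   # state estimate
    P = np.eye(p)     # state covariance
    out = np.empty(len(noisy))
    for n, y in enumerate(noisy):
        # Prediction step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step with the scalar measurement y.
        S = H @ P @ H + noise_var
        K = P @ H / S
        x = x + K * (y - H @ x)
        P = P - np.outer(K, H @ P)
        out[n] = x[0]
    return out
```

In a dual arrangement, a second filter would update the AR coefficients from the current signal estimate; that part is omitted here.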
Abstract:
Recent developments in hidden Markov model (HMM) based speech synthesis have shown that this is a promising technology, fully capable of competing with other established techniques. However, some issues still lack a solution. Several authors report an over-smoothing phenomenon in both time and frequency, which decreases naturalness and sometimes intelligibility. In this work we present a new vowel intelligibility enhancement algorithm that uses a discrete Kalman filter (DKF) for tracking frame-based parameters. The inter-frame correlations are modelled by an autoregressive structure, which provides an underlying time-frame dependency and can improve time-frequency resolution. The system's performance has been evaluated using objective and subjective tests, and the proposed methodology has led to improved results.
Abstract:
Background: In Portugal, the routine clinical practice of speech and language therapists (SLTs) in treating children with all types of speech sound disorder (SSD) continues to be articulation therapy (AT). There is limited use of phonological therapy (PT) or phonological awareness training in Portugal. Additionally, at an international level there is a focus on collecting information on and differentiating between the effectiveness of PT and AT for children with different types of phonologically based SSD, as well as on the role of phonological awareness in remediating SSD. It is important to collect more evidence for the most effective and efficient type of intervention approach for different SSDs and for these data to be collected from diverse linguistic and cultural perspectives. Aims: To evaluate the effectiveness of a PT and an AT approach for the treatment of 14 Portuguese children, aged 4.0–6.7 years, with a phonologically based SSD. Methods & Procedures: The children were randomly assigned to one of the two treatment approaches (seven children in each group). All children were treated by the same SLT, blind to the aims of the study, over three blocks of a total of 25 weekly sessions of intervention. Outcome measures of phonological ability (percentage of consonants correct (PCC), percentage occurrence of different phonological processes and phonetic inventory) were taken before and after intervention. A qualitative assessment of intervention effectiveness from the perspective of the parents of participants was included. Outcomes & Results: Both treatments were effective in improving the participants' speech, with the children receiving PT showing a more significant improvement in PCC score than those receiving AT. Children in the PT group also showed greater generalization to untreated words than those receiving AT. Parents reported both intervention approaches to be equally effective in improving their children's speech.
Conclusions & Implications: The PT (a combination of expressive phonological tasks, phonological awareness, listening and discrimination activities) proved to be an effective integrated method of improving phonological SSD in children. These findings provide some evidence for Portuguese SLTs to employ PT with children with phonologically based SSD.
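The percentage of consonants correct (PCC) used as an outcome measure above is, in its basic form, the share of target consonants that the child produces correctly. The sketch below shows that arithmetic only. It assumes the target and produced consonants have already been transcribed and aligned, which the abstract does not describe, and the function name and sample data are hypothetical.

```python
def percentage_consonants_correct(aligned_pairs):
    """Return PCC as a percentage.

    aligned_pairs: iterable of (target, produced) consonant pairs, where
    produced is None when the consonant was omitted. The alignment itself
    is assumed to have been done beforehand.
    """
    pairs = list(aligned_pairs)
    if not pairs:
        raise ValueError("at least one target consonant is required")
    correct = sum(1 for target, produced in pairs if target == produced)
    return 100.0 * correct / len(pairs)


# Hypothetical sample: two of four target consonants are produced correctly.
sample = [("p", "p"), ("t", "k"), ("s", "s"), ("r", None)]
pcc_before = percentage_consonants_correct(sample)
```

Comparing such a score before and after intervention gives the kind of PCC improvement the study reports.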
Abstract:
In this work, an adaptive filtering scheme based on dual discrete Kalman filtering (DKF) is proposed for quality enhancement of hidden Markov model (HMM) based speech synthesis. The objective is to improve signal smoothness across HMMs and their related states and to reduce artifacts due to the acoustic model's limitations. Both speech and artifacts are modelled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The quality enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. The system's performance has been evaluated using mean opinion score tests, and the proposed technique has led to improved results.
Abstract:
Since tinnitus complaints are a frequent reason for seeking ENT (otorhinolaryngology) consultations, and since the literature frequently refers to depression, anxiety, and other psychopathological dimensions associated with these complaints, our work sought to verify whether correlations exist between these dimensions and the presence of tinnitus. To that end, we used the BSI (Brief Symptom Inventory) psychological assessment scale, which evaluates nine psychological dimensions, and also carried out an audiometric evaluation of the individuals with tinnitus complaints. These individuals attended the ENT clinics of three hospitals and presented tinnitus as their main complaint. Their results were compared with those of a control group. Notably, female participants with tinnitus complaints showed significantly higher values for the somatization and phobic anxiety dimensions. With regard to hearing level, no significant differences were found between the levels considered and the same dimensions. When the results for these dimensions were compared between the group of patients with tinnitus and the control group, significant differences were found for six of the nine dimensions assessed by the scale. This confirms the results reported in the literature, highlights the value of psychological assessment scales for referring patients, and opens the door to more in-depth studies in this area.
Abstract:
The tongue is the most important and dynamic articulator for speech formation, because of its anatomy (particularly the large volume of this muscular organ compared with the surrounding organs of the vocal tract) and also because of the wide range of movements and flexibility involved. In speech communication research, a variety of techniques have been used for measuring three-dimensional vocal tract shapes. More recently, magnetic resonance imaging (MRI) has become common, mainly because this technique allows the collection of a set of static and dynamic images that can represent the entire vocal tract along any orientation. Over the years, different anatomical organs of the vocal tract have been modelled, namely through 2D and 3D tongue models, using parametric or statistical modelling procedures. Our aim is to present and describe some 3D models reconstructed from MRI data for one subject uttering sustained articulations of some typical Portuguese sounds. Thus, we present a 3D database of the tongue obtained by stack combinations with the subject articulating Portuguese vowels. This 3D knowledge of the speech organs could be very important, especially for clinical purposes (for example, for the assessment of articulatory impairments following tongue surgery in speech rehabilitation), and also for a better understanding of acoustic theory in speech formation.
Abstract:
The first and second authors would like to acknowledge the support of the PhD grants with references SFRH/BD/28817/2006 and SFRH/PROTEC/49517/2009, respectively, from Fundação para a Ciência e Tecnologia (FCT). This work was partially done in the scope of the project “Methodologies to Analyze Organs from Complex Medical Images – Applications to Female Pelvic Cavity”, with reference PTDC/EEA-CRO/103320/2008, financially supported by FCT.
Abstract:
The mechanisms of speech production are complex and have been attracting attention from researchers in both the medical and computer vision fields. Within the speech production mechanism, the study of the articulators is a complex issue, since they have a high degree of freedom during this process. This is especially true of the tongue, which makes its control and observation difficult. In this work, the tongue's shape during the articulation of the oral vowels of European Portuguese is automatically characterized by using statistical modeling on MR images. A point distribution model is built from a set of images collected during artificially sustained articulations of European Portuguese sounds, which can extract the main characteristics of the motion of the tongue. The model built in this work allows a clearer understanding of the dynamic speech events involved during sustained articulations. The tongue shape model can also be useful for speech rehabilitation purposes, specifically to recognize the compensatory movements of the articulators during speech production.
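A point distribution model of the kind mentioned above is commonly built by applying principal component analysis to landmark coordinates. The sketch below shows that generic construction, not the authors' pipeline: it assumes the tongue contours have already been annotated with corresponding landmarks and aligned (for example by Procrustes analysis), and the function names are hypothetical.

```python
import numpy as np

def build_point_distribution_model(shapes, n_modes):
    """Build a PCA shape model from aligned 2D landmark sets.

    shapes: array of shape (n_samples, n_points, 2); alignment is assumed
    to have been done beforehand.
    Returns the mean shape vector, the first n_modes modes of variation,
    and the variance explained by each of those modes.
    """
    shapes = np.asarray(shapes, dtype=float)
    X = shapes.reshape(len(shapes), -1)          # one flattened shape per row
    mean = X.mean(axis=0)
    # The SVD of the centered data yields the principal modes in Vt.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s ** 2 / (len(shapes) - 1)
    return mean, Vt[:n_modes], variances[:n_modes]

def synthesize_shape(mean, modes, b):
    """Generate a shape from mode weights b: x = mean + b @ modes."""
    return (mean + np.asarray(b, dtype=float) @ modes).reshape(-1, 2)
```

Varying one weight in `b` while keeping the others at zero shows how a single mode deforms the mean shape, which is how such models summarize the main characteristics of tongue motion.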
Abstract:
The relation between automatic auditory discrimination, measured with the MMN, and the type of stimuli has not been well established in the literature, despite its importance as an electrophysiological measure of central sound representation. In this study, the MMN response was elicited by pure-tone and speech stimuli in a binaural passive auditory oddball paradigm in a group of 8 normal young adult subjects at the same intensity level (75 dB SPL). The frequency difference in the pure-tone oddball was 100 Hz (standard = 1 000 Hz; deviant = 1 100 Hz; same duration = 100 ms). In the speech oddball (standard /ba/; deviant /pa/; same duration = 175 ms), the Portuguese phonemes are both bilabial plosives in order to maintain a narrow frequency band. Differences were found across electrode locations between speech and pure-tone stimuli. Larger MMN amplitude and duration and longer latency were found for speech compared to pure tones at Cz and Fz, as well as significant differences in latency and amplitude between mastoids. The results suggest that speech may be processed differently from non-speech; it may also occur at a later stage due to overlapping processes, since more neural resources are required for speech processing.
Abstract:
In a time of fierce competition between regions, an image serves as a basis to develop a strong sense of community, which fosters trust and cooperation that can be mobilized for regional growth. A positive image and reputation could be used in the promotional activities of the region, benefiting all the stakeholders as a whole. Mega cultural events are frequently used to attract tourists and investments to a region, but also to enhance the city's image. This study adopts a marketing/communication perspective of the city's image and intends to explain how the image of the city is perceived by its residents. Specifically, we intend to compare the perceptions of residents who actively participated in the Guimarães European Capital of Culture (ECOC) 2012 (engaged residents) with those of residents who only attended the event (attendees). Several significant findings are reported and their implications for event managers and public policy administrators are presented, along with the limitations of the study.
Abstract:
This essay aims to confront the literary text Wuthering Heights by Emily Brontë with five of its screen adaptations and Portuguese subtitles. Owing to the scope of the study, it will necessarily afford merely a bird's-eye view of the issues and serve as a starting point for further research. Accordingly, the following questions are used as guidelines: What transformations occur in the process of adapting the original text to the screen? Do subtitles update the film dialogues to the target audience's cultural and linguistic context? Are subtitles influenced more by oral speech than by written literary discourse? Shouldn't subtitles in fact reflect the poetic function prevalent in screen adaptations of literary texts? Rather than attempt to answer these questions, we focus on the objects as phenomena. Our interdisciplinary undertaking clearly involves a semio-pragmatic stance, at this stage trying to avoid theoretical backdrops that may affect our apprehension of the objects as to their qualities, singularities, and conventional traits, based on Lucia Santaella's interpretation of Charles S. Peirce's phaneroscopy. From an empirical standpoint, we gather features and describe peculiarities, under the presumption that there are substrata in subtitling that point or should point to the literary source text, albeit through the mediation of a film script and a particular cinematic style. Therefore, we consider how the subtitling process may be influenced by the literary intertext, the idiosyncrasies of a particular film adaptation, as well as the socio-cultural context of the subtitler and target audience. First, we isolate one of the novel's most poignant scenes, 'I am Heathcliff', taking into account its symbolic play and significance in relation to character and plot construction. Secondly, we study American, English, French, and Mexican adaptations of the excerpt into film in terms of intersemiotic transformations.
Then we analyze differences between the film dialogues and their Portuguese subtitles.
Abstract:
As the wireless cellular market reaches unprecedented levels of competition, network operators need to keep Quality of Service (QoS) a main priority if they wish to attract new subscribers while keeping existing customers satisfied. Speech quality as perceived by the end user is one major example of a characteristic in constant need of maintenance and improvement, and it is the topic of this Master's thesis project. The work makes use of an intrusive method of speech quality evaluation to further study and characterize the performance of speech codecs in second-generation (2G) and third-generation (3G) technologies. It also seeks further correlation between codecs with similar bit rates and explores certain transmission parameters that may aid in the assessment of speech quality. Due to some limitations of the audio analyzer equipment that was to be employed, a different system for recording the test samples was sought. Although the newly designed system is not standard, after extensive testing and optimization of its parameters the final results were found to be reliable and satisfactory. Tests include a set of high and low bit rate codecs for both 2G and 3G; the values were compared and analysed, leading to the conclusion that 3G speech codecs perform better than 2G codecs under approximately the same conditions. This reinforces the idea that 3G is the better choice if the customer looks for the best possible listening speech quality. The transmission parameters chosen for the experiment, Receiver Quality (RxQual) and Received Energy per Chip to the Power Density Ratio (Ec/N0), were subject to speech quality correlation tests. The final RxQual results were compared with those of prior studies by other researchers and are considered highly relevant, confirming RxQual as a reliable indicator of speech quality.
As for Ec/N0, it cannot be stated to be a speech quality indicator; however, it shows clear thresholds for which the MOS values decrease significantly. The studied transmission parameters show that they can be used not only for network management purposes but also to give the communications engineer (or technician) an idea of the expected end-to-end speech quality consequences. The conclusion of this work suggests new ideas for future studies. Considering that fourth-generation (4G) cellular technologies are now beginning to take an important place in the global market, as the first all-IP network structure, it seems highly relevant that 4G speech quality should be evaluated and compared with 3G, not only in narrowband but also in wideband scenarios, using the most recent standard objective method of speech quality assessment, POLQA. In addition, the new data found in the Ec/N0 tests justify further research to validate the assumptions made in this work.
Abstract:
The aim of this study was to develop and validate a Portuguese version of the Short Form of the Posttraumatic Growth Inventory (PTGI-SF). Using an online convenience sample of Portuguese divorced adults (N = 482), we confirmed the oblique five-factor structure of the PTGI-SF by confirmatory factor analysis. The results demonstrated measurement invariance across divorce initiator status groups. The total score and the factors of the PTGI-SF showed good internal consistency, with the exception of the New Possibilities factor, which showed acceptable reliability. The Portuguese PTGI-SF showed satisfactory convergent validity. In terms of discriminant validity, posttraumatic growth assessed by the Portuguese PTGI-SF was a distinct factor from posttraumatic psychological adjustment. These preliminary findings support the cultural adaptation and the psychometric properties of the present Portuguese PTGI-SF for measuring posttraumatic growth after a personal crisis.
Abstract:
Based on a literature review, this article frames different stages of the foster care process, identifying a set of standardized measures in the American and Portuguese contexts which, if implemented, could contribute towards higher levels of foster success. The article continues with the presentation of a comparative study, based on the application of the Casey Foster Applicant Inventory-Applicant Version (CFAI-A) questionnaire in the aforementioned contexts. Taking a comparative analysis of the CFAI-A's psychometric characteristics in four different samples as a starting point, we found that, although the questionnaire was adapted to the Portuguese reality, it kept the quality values presented in the American samples. Specifically, it shows significant values regarding reliability and validity. This questionnaire, which aims to assess the potential of foster families, also supports the technical staff's decision-making process regarding the monitoring and support of foster families, while also promoting better placement decisions for the child's integration and development.