195 results for intelligibility
Abstract:
In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework. This framework is implemented using the Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation, and intensity, using different techniques and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we have also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data since it achieved a better accuracy for the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than our CART model. In addition, we also observed that the expressiveness of FDT is much better than that of CART. That is because the representation in FDT is not restricted to a set of piece-wise or discrete constant approximation. We, therefore, conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications. © 2006 Elsevier Ltd. All rights reserved.
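The two quantitative measures named in the abstract above, RMSE and correlation, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, correlation is assumed to be Pearson's, and the durations in the usage note are made-up values.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_corr(predicted, observed):
    """Pearson correlation coefficient (assumed meaning of 'Corr')."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)
```

For example, `rmse([120.0, 95.0], [110.0, 100.0])` compares two hypothetical predicted syllable durations (in ms) with measured ones. A lower RMSE and a correlation closer to 1 both indicate a closer fit.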
Abstract:
This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy logic based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised by Pitch Synchronous Overlap Technique (PSOLA) using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
Abstract:
How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. © 2014 The Author(s).
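The contour manipulations described in the abstract above (scaling the depth of a formant-frequency contour and inverting it about its geometric mean) can be sketched as below. This is an illustrative reading, not the authors' stimulus-generation code: it assumes the scaling is applied on a log-frequency axis, and the function names and sample values are hypothetical.

```python
import math


def geometric_mean(freqs_hz):
    """Geometric mean of a list of positive frequencies (Hz)."""
    return math.exp(sum(math.log(f) for f in freqs_hz) / len(freqs_hz))


def scale_contour(freqs_hz, depth):
    """Scale excursions about the geometric mean on a log-frequency axis.

    depth = 1.0 leaves the contour unchanged, 0.5 halves its depth,
    0.0 gives a constant contour at the geometric mean, and a negative
    depth inverts the contour (assumed log-scale interpretation).
    """
    gm = geometric_mean(freqs_hz)
    return [gm * (f / gm) ** depth for f in freqs_hz]


def inverted_competitor(freqs_hz, depth=1.0):
    """Inverted contour (hypothetical F2C) at the requested depth."""
    return scale_contour(freqs_hz, -depth)
```

With a made-up contour such as `[1000.0, 2000.0, 4000.0]`, `inverted_competitor(..., depth=0.5)` would correspond to a 50%-depth competitor and `depth=0.0` to the constant one. Under this assumption the geometric mean is preserved at every depth.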
Abstract:
How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. Effects of competitors whose depth of formant-frequency variation was scaled by a range of factors were explored using competitors derived either by inverting the frequency contour of F2 about its geometric mean (plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not depth relative to that for the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts. 
Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints. © Springer Science+Business Media New York 2013.
Abstract:
This thesis contributes to social studies of finance and accounting (Vollmer, Mennicken, & Preda, 2009) and the practice theory literatures (Feldman & Orlikowski, 2011) by experimenting (Baxter & Chua, 2008) with concepts developed by Theodore Schatzki and demonstrating their relevance and usefulness in theorizing and explaining accounting and other organizational phenomena. Influenced by Schatzki, I have undertaken a sociological investigation of the practices, arrangements, and nexuses forming (part of) the social ‘site’ of private equity (PE). I have examined and explained the organization of practices within the PE industry. More specifically, I have sought to throw light on the practice organizations animating various PE practices. I have problematized a particular aspect of Schatzki’s practice organization framework: ‘general understanding’, which has so far been poorly understood and taken for granted in the accounting literature. I have tried to further explore the concept to clarify important definitional issues surrounding its empirical application. In investigating the forms of accounting and control practices in PE firms and how they link with other practices forming part of the ‘site’, I have sought to explain how the ‘situated functionality’ of accounting is ‘prefigured’ by its ‘dispersed’ nature. In doing so, this thesis addresses the recent calls for research on accounting and control practices within financial services firms. This thesis contributes to the social studies of finance and accounting literature also by opening the blackbox of investment [e]valuation practices prevalent in the PE industry. I theorize the due diligence of PE funds as a complex of linked calculative practices and bring to fore the important aspects of ‘practical intelligibility’ of the investment professionals undertaking investment evaluation. 
I also identify and differentiate the ‘causal’ and ‘prefigurational’ relations between investment evaluation practices and the material entities ‘constituting’ those practices. Moreover, I demonstrate the role of practice memory in those practices. Finally, the thesis also contributes to the practice theory literature by identifying and attempting to clarify and/or improve the poorly defined and/or underdeveloped concepts of Schatzki’s ‘site’ ontology framework.
Abstract:
In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the result gives RMSEs of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 to 10, were obtained for intelligibility and naturalness respectively.
Abstract:
Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This idea was explored using a method that ensures interference cannot occur through energetic masking. Three-formant (F1+F2+F3) analogues of natural sentences were synthesized using a monotonous periodic source. Target formants were presented monaurally, with the target ear assigned randomly on each trial. A competitor for F2 (F2C) was presented contralaterally; listeners must reject F2C to optimize recognition. In experiment 1, F2Cs with various frequency and amplitude contours were used. F2Cs with time-varying frequency contours were effective competitors; constant-frequency F2Cs had far less impact. To a lesser extent, amplitude contour also influenced competitor impact; this effect was additive. In experiment 2, F2Cs were created by inverting the F2 frequency contour about its geometric mean and varying its depth of variation over a range from constant to twice the original (0%-200%). The impact on intelligibility was least for constant F2Cs and increased up to ∼100% depth, but little thereafter. The effect of an extraneous formant depends primarily on its frequency contour; interference increases as the depth of variation is increased until the range exceeds that typical for F2 in natural speech.
Abstract:
Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This idea was explored using a method that ensures interference occurs only through informational masking. Three-formant analogues of sentences were synthesized using a monotonous periodic source (F0 = 140 Hz). Target formants were presented monaurally; the target ear was assigned randomly on each trial. A competitor for F2 (F2C) was presented contralaterally; listeners must reject F2C to optimize recognition. In experiment 1, F2Cs with various frequency and amplitude contours were used. F2Cs with time-varying frequency contours were effective competitors; constant-frequency F2Cs had far less impact. Amplitude contour also influenced competitor impact; this effect was additive. In experiment 2, F2Cs were created by inverting the F2 frequency contour about its geometric mean and varying its depth of variation over a range from constant to twice the original (0–200%). The impact on intelligibility was least for constant F2Cs and increased up to ~100% depth, but little thereafter. The effect of an extraneous formant depends primarily on its frequency contour; interference increases as the depth of variation is increased until the range exceeds that typical for F2 in natural speech.
Abstract:
This is a qualitative and reflexive study focused on digital literacy. Among the digital media that could support the teaching of argumentation in the Science & Technology and Information Technology undergraduate courses of the Federal University of Rio Grande do Norte, we chose a serious game as the object of research. Given the object of study of the course Reading and Writing II (argumentation and genres of the argumentative order), common to the undergraduate courses mentioned, we invested in the development of a serious game, named ArgumentACTION, because we believe that it may, in fact, become a promising didactic instrument. We therefore intend to understand whether and how this game can help students develop their reading and writing skills more independently, specifically for one genre of the argumentative order: the opinion piece. With this research, we intend to contribute to the teaching of the Portuguese language on three fronts: extending the theoretical scope, in order to generate greater intelligibility about the teaching-learning process of argumentation; proposing a new methodological possibility, with the incorporation of serious games into teaching; and refining the game with which we are working, in order to build, and make available, a more refined digital tool to support the teaching and learning of reading and writing opinion pieces. To do so, we draw on the following theoretical-methodological frameworks: Literacy Studies (KLEIMAN, 2012b; TINOCO, 2008; OLIVEIRA, 2010; GEE, 2009; 2010; ROJO, 2012), Applied Linguistics (KLEIMAN, 1998; BUSH-LEE, 2009), the Philosophy of Language (BAKHTIN, VOLOSHINOV, 2012) and Critical Pedagogy (DEWEY, 2010). A group of students from the above-mentioned undergraduate courses collaborated with this research by playing and analyzing the game. They were also interviewed about their experience.
From the data generated, we established the categories of analysis: decollection, interest, multimodality/multisemiosis and interactivity, agent of literacy, and learning principles. The conclusions show that investment in applications, especially games, can bring real benefits to the teaching and learning of the Portuguese language; moreover, they reveal that work on argumentation has much to gain from the incorporation of serious games; however, the possible advantages depend on a focused teaching practice and on constant improvements and updates to this type of interactive tool, as well as on the pedagogical practice of those who use and develop the games.
Abstract:
Knowledge is only possible because we exist bodily. However, during the educational experience, the epistemic potency of the body is neglected, diminishing the registers of intelligibility. This thesis approaches that problem obliquely: from a philosophy of body and image that has revealed other ways of making those registers in modernity, understood not as a period in itself but as a qualification of the negotiations between the real and the intelligible. These ways are explored through the works of Merleau-Ponty and Michel Foucault, which offer a spectrum for that new negotiation of the real. To approach the problem, visibility and the motricity of the human body in cinema are taken as the object of analysis. These objects were analyzed through a corpus of films whose plots center on formal education and which require characters and spectators to engage in a visual performance. To approach the object, the study asks how the phenomenon of education is represented by cinema, how the body is exposed, and how spectators can see it. Analyzing the corpus and articulating the theories of Merleau-Ponty and Michel Foucault made it possible to state the following thesis: cinema as an education of the gaze. The general objective of this study is to reveal the educational potency of the filmic experience, which provides a new path of intelligibility for education. In that sense, the body as a visual operator widens the capacity to understand the real. The work is divided into three chapters. The first presents the methodological approach: it shows how the theoretical articulation is arranged; explains the method of using images as indirect language as part of the description of reality; and presents the filmic corpus, together with the criteria for choosing the films and for constructing the instrument adopted in the analysis.
The second chapter problematizes Western society's incapacity to formulate the real discursively, drawing on the theoretical contributions of Merleau-Ponty and Foucault on the visual performance displayed in the images as the films are watched and analyzed. The third chapter develops the implications of the education of the gaze provided by cinema, mainly concerning the place attributed to visibility in the formulation of the real. Finally, paths are outlined for the construction of another approach to visibility in education. Taking the gaze as an experience of knowledge, this study aims to present other ways of being, seeing, thinking and feeling the world. It is therefore a proposal to reset the epistemic and subjectification patterns in the educational context.
Abstract:
Activities in the labor judicial sphere are permeated by the use of diverse text genres, which are indispensable instruments for accomplishing actions within the jurisdictional realm. Among the genres that circulate in this domain, we selected the minutes of hearing as the object of study, because this document supports the actions, procedures and deliberations agreed upon in hearings by the parties involved in work-related litigation. In this study we aim to describe the elements that constitute this genre with respect to its pragmatic, organizational and linguistic dimensions. To that end, we use the postulates of Sociodiscursive Interactionism as the theoretical framework, through the writings of Bronckart (2006; 2007; 2012), supported by Marcuschi studies (2008; 2010; 2011), Koch and Favero (1987), Elias Koch (2011; 2012) and Zanotto (2012). Methodologically, it is a qualitative study (BOGDAN; BIKLEN, 1994; CHIZZOTTI, 2000; MOREIRA; CALEFFE, 2006) with aspects of ethnographic work (ANDRÉ, 1995; CANÇADO, 1994). The discussion falls within the field of Applied Linguistics because it focuses on “social issues and creates intelligibility about the social practices in which language plays the main role” (MOITA LOPES, 2006, p.14). The analyses indicate, regarding the pragmatic dimension, that the genre in focus constitutes an artefact that enables the recording of actions, deliberations, testimonies, procedures and occurrences established during the hearings and has as its interlocutors judges, litigants and their legal representatives. Concerning the organizational dimension, although the genre follows a standardized writing template, its examples show variation and flexibility, especially in the development and outcome of the text.
As for linguistic aspects, lexical choices inherent to the language of the labor law discourse community are noticeable. Lastly, the relevance of the research lies in the fact that it approaches forensic writing from the perspective of Applied Linguistics and, consequently, contributes to the understanding of this genre.
Abstract:
The printed version is divided into volumes 1 and 2.
Abstract:
OBJECTIVES: In natural hearing, cochlear mechanical compression is dynamically adjusted via the efferent medial olivocochlear reflex (MOCR). These adjustments probably help understanding speech in noisy environments and are not available to the users of current cochlear implants (CIs). The aims of the present study are to: (1) present a binaural CI sound processing strategy inspired by the control of cochlear compression provided by the contralateral MOCR in natural hearing; and (2) assess the benefits of the new strategy for understanding speech presented in competition with steady noise with a speech-like spectrum in various spatial configurations of the speech and noise sources. DESIGN: Pairs of CI sound processors (one per ear) were constructed to mimic or not mimic the effects of the contralateral MOCR on compression. For the nonmimicking condition (standard strategy or STD), the two processors in a pair functioned similarly to standard clinical processors (i.e., with fixed back-end compression and independently of each other). When configured to mimic the effects of the MOCR (MOC strategy), the two processors communicated with each other and the amount of back-end compression in a given frequency channel of each processor in the pair decreased/increased dynamically (so that output levels dropped/increased) with increases/decreases in the output energy from the corresponding frequency channel in the contralateral processor. Speech reception thresholds in speech-shaped noise were measured for 3 bilateral CI users and 2 single-sided deaf unilateral CI users. Thresholds were compared for the STD and MOC strategies in unilateral and bilateral listening conditions and for three spatial configurations of the speech and noise sources in simulated free-field conditions: speech and noise sources colocated in front of the listener, speech on the left ear with noise in front of the listener, and speech on the left ear with noise on the right ear. 
In both bilateral and unilateral listening, the electrical stimulus delivered to the test ear(s) was always calculated as if the listeners were wearing bilateral processors. RESULTS: In both unilateral and bilateral listening conditions, mean speech reception thresholds were comparable with the two strategies for colocated speech and noise sources, but were at least 2 dB lower (better) with the MOC than with the STD strategy for spatially separated speech and noise sources. In unilateral listening conditions, mean thresholds improved with increasing the spatial separation between the speech and noise sources regardless of the strategy but the improvement was significantly greater with the MOC strategy. In bilateral listening conditions, thresholds improved significantly with increasing the speech-noise spatial separation only with the MOC strategy. CONCLUSIONS: The MOC strategy (1) significantly improved the intelligibility of speech presented in competition with a spatially separated noise source, both in unilateral and bilateral listening conditions; (2) produced significant spatial release from masking in bilateral listening conditions, something that did not occur with fixed compression; and (3) enhanced spatial release from masking in unilateral listening conditions. The MOC strategy as implemented here, or a modified version of it, may be usefully applied in CIs and in hearing aids.
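The contralateral coupling described in the abstract above (less compression in a channel when the matching channel on the other side carries more energy) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: the logarithmic compression function, the linear control law, the parameter values, and the function names are hypothetical and are not taken from the study.

```python
import math


def compress(envelope, c):
    """Map a normalized envelope in [0, 1] to [0, 1] logarithmically.

    Larger c means stronger compression, which raises the output for
    low-level inputs (assumed compression function).
    """
    return math.log(1.0 + c * envelope) / math.log(1.0 + c)


def coupled_c(contra_energy, c_max=1000.0, c_min=10.0):
    """Pick the compression parameter from contralateral output energy.

    contra_energy is a normalized value in [0, 1]. More contralateral
    energy gives a smaller c, i.e. less compression (hypothetical
    linear control law).
    """
    e = min(max(contra_energy, 0.0), 1.0)
    return c_max - (c_max - c_min) * e


def channel_output(envelope, contra_energy):
    """Output of one frequency channel given the other side's energy."""
    return compress(envelope, coupled_c(contra_energy))
```

In this sketch, `channel_output(0.1, 1.0)` is lower than `channel_output(0.1, 0.0)`, matching the direction of the effect described in the abstract: output levels drop as contralateral output energy increases. A fixed `c` would correspond to the standard (STD) condition.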
Abstract:
This study explored the effects on speech intelligibility of across-formant differences in fundamental frequency (ΔF0) and F0 contour. Sentence-length speech analogues were presented dichotically (left=F1+F3; right=F2), either alone or—because competition usually reveals grouping cues most clearly—accompanied in the left ear by a competitor for F2 (F2C) that listeners must reject to optimize recognition. F2C was created by inverting the F2 frequency contour. In experiment 1, all left-ear formants shared the same constant F0 and ΔF0F2 was 0 or ±4 semitones. In experiment 2, all left-ear formants shared the natural F0 contour and that for F2 was natural, constant, exaggerated, or inverted. Adding F2C lowered keyword scores, presumably because of informational masking. The results for experiment 1 were complicated by effects associated with the direction of ΔF0F2; this problem was avoided in experiment 2 because all four F0 contours had the same geometric mean frequency. When the target formants were presented alone, scores were relatively high and did not depend on the F0F2 contour. F2C impact was greater when F2 had a different F0 contour from the other formants. This effect was a direct consequence of the associated ΔF0; the F0F2 contour per se did not influence competitor impact.
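The semitone differences in fundamental frequency mentioned in the abstract above follow the standard equal-tempered relation, in which 12 semitones correspond to a doubling of frequency. A minimal sketch is given below. The function names are hypothetical and the base F0 in the usage note is an arbitrary example, since the abstract does not state one.

```python
import math


def shift_semitones(f0_hz, semitones):
    """Return f0_hz shifted by the given number of semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


def semitone_difference(f_a_hz, f_b_hz):
    """Signed difference in semitones between two frequencies."""
    return 12.0 * math.log2(f_a_hz / f_b_hz)
```

For an arbitrary base of 100 Hz, `shift_semitones(100.0, 4)` and `shift_semitones(100.0, -4)` give the frequencies 4 semitones above and below it. The two shifts are symmetric on a log-frequency axis, not in Hz.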
Abstract:
Speech perception routinely takes place in noisy or degraded listening environments, leading to ambiguity in the identity of the speech token. Here, I present one review paper and two experimental papers that highlight cognitive and visual speech contributions to the listening process, particularly in challenging listening environments. First, I survey the literature linking audiometric age-related hearing loss and cognitive decline and review the four proposed causal mechanisms underlying this link. I argue that future research in this area requires greater consideration of the functional overlap between hearing and cognition. I also present an alternative framework for understanding causal relationships between age-related declines in hearing and cognition, with emphasis on the interconnected nature of hearing and cognition and likely contributions from multiple causal mechanisms. I also provide a number of testable hypotheses to examine how impairments in one domain may affect the other. In my first experimental study, I examine the direct contribution of working memory (through a cognitive training manipulation) on speech in noise comprehension in older adults. My results challenge the efficacy of cognitive training more generally, and also provide support for the contribution of sentence context in reducing working memory load. My findings also challenge the ubiquitous use of the Reading Span test as a pure test of working memory. In a second experimental (fMRI) study, I examine the role of attention in audiovisual speech integration, particularly when the acoustic signal is degraded. I demonstrate that attentional processes support audiovisual speech integration in the middle and superior temporal gyri, as well as the fusiform gyrus. My results also suggest that the superior temporal sulcus is sensitive to intelligibility enhancement, regardless of how this benefit is obtained (i.e., whether it is obtained through visual speech information or speech clarity). 
In addition, I also demonstrate that both the cingulo-opercular network and motor speech areas are recruited in difficult listening conditions. Taken together, these findings augment our understanding of cognitive contributions to the listening process and demonstrate that memory, working memory, and executive control networks may flexibly be recruited in order to meet listening demands in challenging environments.