990 resultados para Speech data


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Developmental speech disorder is accounted for by theories derived from psychology, psycholinguistics, linguistics and medicine, with researchers developing assessment protocols that reflect their theoretical perspective. How theory and data analyses lead to different therapy approaches, however, is sometimes unclear. Here, we present a case management plan for a 7 year old boy with unintelligible speech. Assessment data were analysed to address seven case management questions regarding need for intervention, service delivery, differential diagnosis, intervention goals, generalization of therapeutic gains, discharge criteria and evaluation of efficacy. Jarrod was diagnosed as having inconsistent speech disorder that required intervention. He pronounced 88% of words differently when asked to name each word in the 25 word inconsistency test of the Diagnostic Evaluation of Articulation and Phonology three times, each trial separated by another activity. Other standardized assessments supported the diagnosis of inconsistent speech disorder that, according to previous research, is associated with a deficit in phonological assembly. Core vocabulary intervention was chosen as the most appropriate therapy technique. Its nature and a possible protocol for implementation is described.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We investigated the feasibility of assessing childhood speech disorders via an Internet-based telehealth system (eREHAB). The equipment provided videoconferencing through a 128 kbit/s Internet link, and enabled the transfer of pre-recorded video and audio data from the participant to the online clinician. Six children (mean age = 5.3 years) with a speech disorder were studied. Assessments of single-word articulation, intelligibility in conversation, and oro-motor structure and function were conducted for each participant, with simultaneous scoring by a face to face and an online clinician. There were high levels of agreement between the two scoring environments for single-word articulation (92%), speech intelligibility (100%) and oro-motor tasks (91%). High levels of inter- and intra-rater agreement were achieved for the online ratings for most measures. The results suggest that an Internet-based assessment protocol has potential for assessing paediatric speech disorders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses methodological issues in the development of a multitiered, phonetic annotation system, intended to capture pronunciation variation in the speech of second language learners and to serve in construction of a data base for training ASR models to recognize major pronunciation variants in the assessment of accented English.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We have conducted a preliminary validation of an Internet-based telehealth application for assessing motor speech disorders in adults with acquired neurological impairment. The videoconferencing module used NetMeeting software to provide realtime videoconferencing through a 128 kbit/s Internet link, as well as the transfer of store-and-forward video and audio data from the participant to the clinician. Ten participants with dysarthria following acquired brain injury were included in the study. An assessment of the overall severity of the speech disturbance was made for each participant face to face (FTF) and in the online environment, in addition, a 23-item version of the Frenchay Dysarthria Assessment (FDA) (which measures motor speech function) and the Assessment of Intelligibility of Dysarthric Speech (ASSIDS) (which gives the percentage word and sentence intelligibility, words per minute and a rating of communication efficiency) were administered in both environments. There was a 90% level of agreement between the two assessment environments for the rating of overall severity of dysarthria. A 70-100% level of agreement was achieved for 17 (74%) of the 23 FDA variables. On the ASSIDS there was a significant difference between the FTF and online assessments only for percentage word intelligibility. These findings suggest that Internet-based assessment has potential as a reliable method for assessing motor speech disorders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally-varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. The application of these methods are explained and applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The use of digital communication systems is increasing very rapidly. This is due to lower system implementation cost compared to analogue transmission and at the same time, the ease with which several types of data sources (data, digitised speech and video, etc.) can be mixed. The emergence of packet broadcast techniques as an efficient type of multiplexing, especially with the use of contention random multiple access protocols, has led to a wide-spread application of these distributed access protocols in local area networks (LANs) and a further extension of them to radio and mobile radio communication applications. In this research, a proposal for a modified version of the distributed access contention protocol which uses the packet broadcast switching technique has been achieved. The carrier sense multiple access with collision avoidance (CSMA/CA) is found to be the most appropriate protocol which has the ability to satisfy equally the operational requirements for local area networks as well as for radio and mobile radio applications. The suggested version of the protocol is designed in a way in which all desirable features of its precedents is maintained. However, all the shortcomings are eliminated and additional features have been added to strengthen its ability to work with radio and mobile radio channels. Operational performance evaluation of the protocol has been carried out for the two types of non-persistent and slotted non-persistent, through mathematical and simulation modelling of the protocol. The results obtained from the two modelling procedures validate the accuracy of both methods, which compares favourably with its precedent protocol CSMA/CD (with collision detection). A further extension of the protocol operation has been suggested to operate with multichannel systems. Two multichannel systems based on the CSMA/CA protocol for medium access are therefore proposed. These are; the dynamic multichannel system, which is based on two types of channel selection, the random choice (RC) and the idle choice (IC), and the sequential multichannel system. The latter has been proposed in order to supress the effect of the hidden terminal, which always represents a major problem with the usage of the contention random multiple access protocols with radio and mobile radio channels. Verification of their operation performance evaluation has been carried out using mathematical modelling for the dynamic system. However, simulation modelling has been chosen for the sequential system. Both systems are found to improve system operation and fault tolerance when compared to single channel operation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The need for low bit-rate speech coding is the result of growing demand on the available radio bandwidth for mobile communications both for military purposes and for the public sector. To meet this growing demand it is required that the available bandwidth be utilized in the most economic way to accommodate more services. Two low bit-rate speech coders have been built and tested in this project. The two coders combine predictive coding with delta modulation, a property which enables them to achieve simultaneously the low bit-rate and good speech quality requirements. To enhance their efficiency, the predictor coefficients and the quantizer step size are updated periodically in each coder. This enables the coders to keep up with changes in the characteristics of the speech signal with time and with changes in the dynamic range of the speech waveform. However, the two coders differ in the method of updating their predictor coefficients. One updates the coefficients once every one hundred sampling periods and extracts the coefficients from input speech samples. This is known in this project as the Forward Adaptive Coder. Since the coefficients are extracted from input speech samples, these must be transmitted to the receiver to reconstruct the transmitted speech sample, thus adding to the transmission bit rate. The other updates its coefficients every sampling period, based on information of output data. This coder is known as the Backward Adaptive Coder. Results of subjective tests showed both coders to be reasonably robust to quantization noise. Both were graded quite good, with the Forward Adaptive performing slightly better, but with a slightly higher transmission bit rate for the same speech quality, than its Backward counterpart. The coders yielded acceptable speech quality of 9.6kbps for the Forward Adaptive and 8kbps for the Backward Adaptive.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work examines prosody modelling for the Standard Yorùbá (SY) language in the context of computer text-to-speech synthesis applications. The thesis of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combines acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. Our prosody model is conceptualised around a modular holistic framework. The framework is implemented using the Relational Tree (R-Tree) techniques (Ehrich and Foith, 1976). R-Tree is a sophisticated data structure that provides a multi-dimensional description of a waveform. A Skeletal Tree (S-Tree) is first generated using algorithms based on the tone phonological rules of SY. Subsequent steps update the S-Tree by computing the numerical values of the prosody dimensions. To implement the intonation dimension, fuzzy control rules where developed based on data from native speakers of Yorùbá. The Classification And Regression Tree (CART) and the Fuzzy Decision Tree (FDT) techniques were tested in modelling the duration dimension. The FDT was selected based on its better performance. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration and intonation, using different techniques and their subsequent integration. Our approach provides us with a flexible and extendible model that can also be used to implement, study and explain the theory behind aspects of the phenomena observed in speech prosody.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Common approaches to IP-traffic modelling have featured the use of stochastic models, based on the Markov property, which can be classified into black box and white box models based on the approach used for modelling traffic. White box models, are simple to understand, transparent and have a physical meaning attributed to each of the associated parameters. To exploit this key advantage, this thesis explores the use of simple classic continuous-time Markov models based on a white box approach, to model, not only the network traffic statistics but also the source behaviour with respect to the network and application. The thesis is divided into two parts: The first part focuses on the use of simple Markov and Semi-Markov traffic models, starting from the simplest two-state model moving upwards to n-state models with Poisson and non-Poisson statistics. The thesis then introduces the convenient to use, mathematically derived, Gaussian Markov models which are used to model the measured network IP traffic statistics. As one of the most significant contributions, the thesis establishes the significance of the second-order density statistics as it reveals that, in contrast to first-order density, they carry much more unique information on traffic sources and behaviour. The thesis then exploits the use of Gaussian Markov models to model these unique features and finally shows how the use of simple classic Markov models coupled with use of second-order density statistics provides an excellent tool for capturing maximum traffic detail, which in itself is the essence of good traffic modelling. The second part of the thesis, studies the ON-OFF characteristics of VoIP traffic with reference to accurate measurements of the ON and OFF periods, made from a large multi-lingual database of over 100 hours worth of VoIP call recordings. The impact of the language, prosodic structure and speech rate of the speaker on the statistics of the ON-OFF periods is analysed and relevant conclusions are presented. Finally, an ON-OFF VoIP source model with log-normal transitions is contributed as an ideal candidate to model VoIP traffic and the results of this model are compared with those of previously published work.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mobile technologies have yet to be widely adopted by the Architectural, Engineering, and Construction (AEC) industry despite being one of the major growth areas in computing in recent years. This lack of uptake in the AEC industry is likely due, in large part, to the combination of small screen size and inappropriate interaction demands of current mobile technologies. This paper discusses the scope for multimodal interaction design with a specific focus on speech-based interaction to enhance the suitability of mobile technology use within the AEC industry by broadening the field data input capabilities of such technologies. To investigate the appropriateness of using multimodal technology for field data collection in the AEC industry, we have developed a prototype Multimodal Field Data Entry (MFDE) application. This application, which allows concrete testing technicians to record quality control data in the field, has been designed to support two different modalities of data input speech-based data entry and stylus-based data entry. To compare the effectiveness or usability of, and user preference for, the different input options, we have designed a comprehensive lab-based evaluation of the application. To appropriately reflect the anticipated context of use within the study design, careful consideration had to be given to the key elements of a construction site that would potentially influence a test technician's ability to use the input techniques. These considerations and the resultant evaluation design are discussed in detail in this paper.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST is then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 - -10, was obtained for intelligibility and naturalness respectively.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article presents a new method for data collection in regional dialectology based on site-restricted web searches. The method measures the usage and determines the distribution of lexical variants across a region of interest using common web search engines, such as Google or Bing. The method involves estimating the proportions of the variants of a lexical alternation variable over a series of cities by counting the number of webpages that contain the variants on newspaper websites originating from these cities through site-restricted web searches. The method is evaluated by mapping the 26 variants of 10 lexical variables with known distributions in American English. In almost all cases, the maps based on site-restricted web searches align closely with traditional dialect maps based on data gathered through questionnaires, demonstrating the accuracy of this method for the observation of regional linguistic variation. However, unlike collecting dialect data using traditional methods, which is a relatively slow process, the use of site-restricted web searches allows for dialect data to be collected from across a region as large as the United States in a matter of days.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, we have witnessed the mushrooming of pro- democracy and protest movements not only in the Arab world, but also within Europe and the Americas. Such movements have ranged from popular upheavals, like in Tunisia and Egypt, to the organization of large- scale demonstrations against unpopular policies, as in Spain, Greece and Poland. What connects these different events are not only their democratic aspirations, but also their innovative forms of communication and organization through online means, which are sometimes considered to be outside of the State’s control. At the same time, however, it has become more and more apparent that countries are attempting to increase their understanding of, and control over, their citizens’ actions in the digital sphere. This involves striving to develop surveillance instruments, control mechanisms and processes engineered to dominate the digital public sphere, which necessitates the assistance and support of private actors such as Internet intermediaries. Examples include the growing use of Internet surveillance technology with which online data traffic is analysed, and the extensive monitoring of social networks. Despite increased media attention, academic debate on the ambivalence of these technologies, mechanisms and techniques remains relatively limited, as is discussion of the involvement of corporate actors. The purpose of this edited volume is to reflect on how Internet-related technologies, mechanisms and techniques may be used as a means to enable expression, but also to restrict speech, manipulate public debate and govern global populaces.