15 results for Speech enhancement systems
in Aston University Research Archive
Abstract:
Magnification can be provided to assist those with visual impairment to make the best use of remaining vision. Electronic transverse magnification of an object was first conceived for use in low vision in the late 1950s, but has developed slowly and is not extensively prescribed because of its relatively high cost and lack of portability. Electronic devices providing transverse magnification have been termed closed-circuit televisions (CCTVs) because of the direct cable link between the camera imaging system and monitor viewing system, but this description generally refers to surveillance devices and does not indicate the provision of features such as magnification and contrast enhancement. Therefore, the term Electronic Vision Enhancement Systems (EVES) is proposed to better distinguish and describe such devices. This paper reviews current knowledge on EVES for the visually impaired in terms of: classification; hardware and software (development of technology, magnification and field-of-view, contrast and image enhancement); user aspects (users and usage, reading speed and duration, and training); and potential future development of EVES. © 2003 The College of Optometrists.
Abstract:
PURPOSE: To examine whether objective performance of near tasks is improved with various electronic vision enhancement systems (EVES) compared with the subject's own optical magnifier. DESIGN: Experimental study, randomized, within-patient design. METHODS: This was a prospective study, conducted in a hospital ophthalmology low-vision clinic. The patient population comprised 70 sequential visually impaired subjects. The magnifying devices examined were: patient's optimum optical magnifier; magnification and field-of-view matched mouse EVES with monitor or head-mounted display (HMD) viewing; and stand EVES with monitor viewing. The tasks performed were: reading speed and acuity; time taken to track from one column of print to the next; follow a route map, and locate a specific feature; and identification of specific information from a medicine label. RESULTS: Mouse EVES with HMD viewing caused lower reading speeds than stand EVES with monitor viewing (F = 38.7, P < .001). Reading with the optical magnifier was slower than with the mouse or stand EVES with monitor viewing at smaller print sizes (P < .05). The column location task was faster with the optical magnifier than with any of the EVES (F = 10.3, P < .001). The map tracking and medicine label identification task was slower with the mouse EVES with HMD viewing than with the other magnifiers (P < .01). Previous EVES experience had no effect on task performance (P > .05), but subjects with previous optical magnifier experience were significantly slower at performing the medicine label identification task with all of the EVES (P < .05). CONCLUSIONS: Although EVES provide objective benefits to the visually impaired in reading speed and acuity, together with some specific near tasks, some can be performed just as fast using optical magnification. © 2003 by Elsevier Inc. All rights reserved.
Abstract:
The present thesis focuses on the overall structure of the language of two types of Speech Exchange Systems (SES): Interview (INT) and Conversation (CON). The linguistic structures of INT and CON are quantitatively investigated on three different but interrelated levels of analysis: Lexis, Syntax and Information Structure. The corpus of data investigated for the project consists of eight sessions of pairs of conversants in carefully planned interviews followed by unplanned, surreptitiously recorded conversational encounters of the same pairs of speakers. The data comprise a total of approximately 15,200 words of INT talk and about 19,200 words of CON talk. Taking account of the debatable assumption that the language of SES might be complex on certain linguistic levels (e.g. syntax) (Halliday 1979) and might be simple on others (e.g. lexis) in comparison to written discourse, the thesis sets out to investigate this complexity using a statistical approach to the computation of the structures recurrent in the language of INT and CON. The findings indicate clearly the presence of linguistic complexity in both types. They also show the language of INT to be slightly more syntactically and lexically complex than that of CON. Lexical density seems to be relatively high in both types of spoken discourse. The language of INT seems to be more complex than that of CON on the level of information structure too. This is manifested in the greater use of Inferable and other linguistically complex entities of discourse. Halliday's suggestion that the language of SES is syntactically complex is confirmed, but not his suggestion that the more casual the conversation is, the more syntactically complex it becomes. The results of the analysis point to the general conclusion that the linguistic complexity of types of SES lies not only in the high recurrence of syntactic structures, but also in the combination of these features with each other and with other linguistic and extralinguistic features.
The linguistic analysis of the language of SES can be useful in understanding and pinpointing the intricacies of spoken discourse in general and will help discourse analysts and applied linguists in exploiting it both for theoretical and pedagogical purposes.
Abstract:
This thesis addresses the viability of automatic speech recognition for control room systems; with careful system design, automatic speech recognition (ASR) devices can be a useful means of human-computer interaction in specific types of task. These tasks can be defined as complex verbal activities, such as command and control, and can be paired with spatial tasks, such as monitoring, without detriment. It is suggested that ASR use be confined to routine plant operation, as opposed to critical incidents, due to possible problems of stress on the operators' speech. It is proposed that using ASR will require operators to adapt a commonly used skill to cater for a novel use of speech. Before using the ASR device, new operators will require some form of training. It is shown that a demonstration by an experienced user of the device can lead to better performance than instructions. Thus, a relatively cheap and very efficient form of operator training can be supplied by demonstration by experienced ASR operators. From a series of studies into speech-based interaction with computers, it is concluded that the interaction be designed to capitalise upon the tendency of operators to use short, succinct, task-specific styles of speech. From studies comparing different types of feedback, it is concluded that operators be given screen-based feedback, rather than auditory feedback, for control room operation. Feedback will take two forms: the use of the ASR device will require recognition feedback, which will be best supplied using text; the performance of a process control task will require task feedback integrated into the mimic display. This latter feedback can be either textual or symbolic, but it is suggested that symbolic feedback will be more beneficial. Related to both interaction style and feedback is the issue of handling recognition errors. These should be corrected by simple command repetition practices, rather than by error handling dialogues.
This method of error correction is held to be non intrusive to primary command and control operations. This thesis also addresses some of the problems of user error in ASR use, and provides a number of recommendations for its reduction.
Abstract:
This paper examines the connected speech process described by Wells (1982b) as the T to R rule in the West Midlands speech variety associated with the Black Country. The T to R rule is well known as a linguistic marker of local varieties of the middle and far north of England. Less well understood is its position in the phonological systems of Midlands varieties. Varieties of the Midlands of England are underresearched in comparison with varieties of the north, and what is known about the application of the T to R rule in this transitional dialect area is correspondingly nebulous. This paper focuses on the Black Country area, and examines the possible outputs in the contexts which give rise to /t/ becoming [?] in local varieties of the north. I examine the written and spoken evidence which suggests that the T to R rule does indeed operate in the Black Country variety. My analysis focuses on possible phonetic outcomes of the T to R rule across time. In my conclusion, I discuss briefly the possibility that, lying on a bundle of isoglosses separating north from south, the variety of the Black Country reflects this in that a T to [?] rule, rather than a T to R rule, is the dominant output of this connected speech process in the Black Country.
Abstract:
Cellular mobile radio systems will be of increasing importance in the future. This thesis describes research work concerned with the teletraffic capacity and the computer control requirements of such systems. The work involves theoretical analysis and experimental investigations using digital computer simulation. New formulas are derived for the congestion in single-cell systems in which there are both land-to-mobile and mobile-to-mobile calls and in which mobile-to-mobile calls go via the base station. Two approaches are used: the first yields modified forms of the familiar Erlang and Engset formulas, while the second gives more complicated but more accurate formulas. The results of computer simulations to establish the accuracy of the formulas are described. New teletraffic formulas are also derived for the congestion in multi-cell systems. Fixed, dynamic and hybrid channel assignments are considered. The formulas agree with previously published simulation results. Simulation programs are described for the evaluation of the speech traffic of mobiles and for the investigation of a possible computer network for the control of the speech traffic. The programs were developed according to the structured programming approach, leading to programs of modular construction. Two simulation methods are used for the speech traffic: the roulette method and the time-true method. The first is economical but has some restrictions, while the second is expensive but gives comprehensive answers. The proposed control network operates at three hierarchical levels performing various control functions which include: the setting-up and clearing-down of calls, the hand-over of calls between cells and the address-changing of mobiles travelling between cities. The results demonstrate the feasibility of the control network and indicate that small mini-computers inter-connected via voice grade data channels would be capable of providing satisfactory control.
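For readers unfamiliar with the "familiar Erlang" formula that the abstract above takes as its starting point, the following is a minimal sketch of the standard, unmodified Erlang B blocking probability. It is not the thesis's modified formula, which the abstract does not give. The function name `erlang_b` and its parameters are illustrative assumptions.

```python
def erlang_b(offered_load: float, channels: int) -> float:
    """Standard Erlang B blocking probability (textbook form, not the thesis's variant).

    offered_load: offered traffic in erlangs (assumed non-negative).
    channels: number of channels in the cell (assumed non-negative integer).
    """
    if offered_load < 0 or channels < 0:
        raise ValueError("offered_load and channels must be non-negative")
    blocking = 1.0  # with zero channels every call is blocked
    # Stable recurrence: B(A, n) = A*B(A, n-1) / (n + A*B(A, n-1))
    for n in range(1, channels + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Example: 2 erlangs offered to a 2-channel cell
    print(erlang_b(2.0, 2))
```

The recurrence avoids the large factorials of the closed-form expression, which matters once the channel count grows beyond a few dozen.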
Abstract:
The aims of the project were twofold: 1) To investigate classification procedures for remotely sensed digital data, in order to develop modifications to existing algorithms and propose novel classification procedures; and 2) To investigate and develop algorithms for contextual enhancement of classified imagery in order to increase classification accuracy. The following classifiers were examined: box, decision tree, minimum distance, maximum likelihood. In addition to these, the following algorithms were developed during the course of the research: deviant distance, look up table and an automated decision tree classifier using expert systems technology. Clustering techniques for unsupervised classification were also investigated. Contextual enhancements investigated were: mode filters, small area replacement and Wharton's CONAN algorithm. Additionally, methods for noise and edge based declassification and contextual reclassification, non-probabilistic relaxation and relaxation based on Markov chain theory were developed. The advantages of per-field classifiers and Geographical Information Systems were investigated. The conclusions presented suggest suitable combinations of classifier and contextual enhancement, given user accuracy requirements and time constraints. These were then tested for validity using a different data set. A brief examination of the utility of the recommended contextual algorithms for reducing the effects of data noise was also carried out.
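Of the contextual enhancements listed in the abstract above, the mode filter is the simplest to illustrate. The sketch below is a generic majority (mode) filter over a grid of class labels, not the thesis's implementation. The function name `mode_filter`, the square window, and the rule of keeping the original label on ties are all assumptions made for illustration.

```python
from collections import Counter


def mode_filter(labels, radius=1):
    """Replace each pixel's class label with the most common label in its window.

    labels: list of equal-length rows of hashable class labels (hypothetical input format).
    radius: window half-width; radius=1 gives a 3x3 window, clipped at the image edges.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            # Assumed tie rule: keep the original label if it is among the most common
            if counts[labels[r][c]] < best:
                out[r][c] = counts.most_common(1)[0][0]
    return out


if __name__ == "__main__":
    classified = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]
    print(mode_filter(classified))  # the isolated class-2 pixel is absorbed
```

Isolated misclassified pixels are absorbed by their neighbourhood, which is how a filter of this kind can raise overall classification accuracy at the cost of losing genuinely small features.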
Abstract:
This thesis describes work undertaken in order to fulfil a need experienced in the Department of Educational Enquiry at the University of Aston in Birmingham for speech analysis facilities suitable for use in teaching and research work within the Department. The hardware and software developed during the research project provides displays of speech fundamental frequency and intensity in real time. The system is suitable for the provision of visual feedback of these parameters of a subject's speech in a learning situation, and overcomes the inadequacies of equipment currently used for this task in that it provides a clear indication of fundamental frequency contours as the subject is speaking. The thesis considers the use of such equipment in several related fields, and the approaches that have been reported to one of the major problems of speech analysis, namely pitch-period estimation. A number of different systems are described, and their suitability for the present purposes is discussed. Finally, a novel method of pitch-period estimation is developed, and a speech analysis system incorporating this method is described. Comparison is made between the results produced by this system and those produced by a conventional speech spectrograph.
Abstract:
Multiple-antenna systems offer significant performance enhancement and will be applied to next-generation broadband wireless communications. This thesis presents investigations of multiple-antenna systems – multiple-input multiple-output (MIMO) and cooperative communication (CC) – and their performance in more realistic propagation environments than those reported previously. For MIMO systems, the investigations are conducted via theoretical modelling and simulations in a double-scattering environment. The results show that the variations of system performance depend on how scatterer density varies in flat fading channels, and that in frequency-selective fading channels system performance is affected by the length of the coding block as well as scatterer density. In realistic propagation environments, the fading correlation also has an impact on CC systems, where the antennas can be further apart than those in MIMO systems. A general stochastic model is applied to studying the effects of fading correlation on the performance of CC systems. This model reflects the asymmetry of the wireless channels in a CC system. The results demonstrate the varied effects of fading correlation under different protocols and channel conditions. The performance of CC systems is further studied at the packet level, using both simulations and an experimental testbed. The results obtained have verified various performance trade-offs of the cooperative relaying network (CRN) investigated in different propagation environments. The results suggest that a proper selection of the relaying algorithms and other techniques can meet the requirements of quality of service for different applications.
Abstract:
The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.
Abstract:
Optically multiplexed multi-carrier systems with channel spacing reduced to the symbol rate per carrier are highly susceptible to inter-channel crosstalk, which places stringent requirements for the specifications of system components and hinders the use of high-level formats. In this paper, we investigate the performance benefits of using offset 4-, 16-, and 64-quadrature amplitude modulation (QAM) in coherent wavelength division multiplexing (CoWDM). We compare this system with recently reported Nyquist WDM and no-guard-interval optical coherent orthogonal frequency division multiplexing, and show that the presented system greatly relaxes the requirements for device specifications and enhances the spectral efficiency by enabling the use of high-level QAM. The achieved performance can approach the theoretical limits using practical components.
Abstract:
We investigate electronic mitigation of linear and non-linear fibre impairments and compare various digital signal processing techniques, including electronic dispersion compensation (EDC), single-channel back-propagation (SC-BP) and back-propagation with multiple channel processing (MC-BP) in a nine-channel 112 Gb/s PM-mQAM (m=4,16) WDM system, for reaches up to 6,320 km. We show that, for a sufficiently high local dispersion, SC-BP is sufficient to provide a significant performance enhancement when compared to EDC, and is adequate to achieve BER below FEC threshold. For these conditions we report that a sampling rate of two samples per symbol is sufficient for practical SC-BP, without significant penalties.
Abstract:
Tissue transglutaminase (TG2) and FXIIIa, members of the transglutaminase (TG) family, catalyse a transamidating reaction and form covalent bonds between or within proteins. In bone development, the expression of both enzymes correlates with the initiation of the mineralisation process by osteoblasts and chondrocytes. Exogenous TG2 also promotes maturation of chondrocytes and mineralisation in pre-osteoblasts. To understand the role of endogenous TG2 in osteoblast mineralisation, TG2 expression was examined during human osteoblast (HOB) mineralisation. The expression of endogenous TG2 increased during mineralisation, yet its expression was not essential for mineral deposition owing to compensation by other members of the TG family. The extracellular transamidating activity of HOBs was found to increase during mineralisation, and a shift from FXIIIa-dominant to TG2-dominant crosslinking activity was suggested after differentiation. However, the transamidating activity of both TG2 and FXIIIa was not critical for cell mineralisation. On the other hand, exogenous TG2 was found to enhance mineral deposition in wild-type HOBs and TG2-knockdown HOBs. The transamidating activity of TG2 was not required, but a closed conformation was most likely essential for this enhancement. Results also demonstrated that exogenous TG2 may activate the β-catenin pathway through the LRP5 receptor and thus contribute to cell mineralisation. This enhancement could be abolished by the addition of β-catenin inhibitors. Finally, the use of TG2-crosslinked collagen gel for bone and cornea repair was evaluated. Crosslinked collagen gel showed promising results in improving HOB mineralisation and human corneal fibroblast (hCF) proliferation and migration. These effects might result from TG2 trapped within the collagen matrix and from the alteration of matrix topography by TG2.
Abstract:
The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.
Abstract:
The authors present the impact of asymmetric filtering of strong (e.g. 35 GHz) optical filters on the performance of 42.7 Gb/s 67% (carrier suppressed return to zero)-differential phase shift keying systems. The performance is examined (in an amplified spontaneous emission (ASE) noise-limited regime and in the presence of chromatic dispersion) when offsetting the filter at the receiver by substantial amounts via balanced, constructive and destructive single-ended detections. It is found that with a slight offset (vestigial side band) or an offset of almost half of the modulation frequency (single-side band), there is a significant improvement in the calculated 'Q'. © The Institution of Engineering and Technology 2013.