63 resultados para Speech processing systems.
Resumo:
One of the most pervading concepts underlying computational models of information processing in the brain is linear input integration of rate coded uni-variate information by neurons. After a suitable learning process this results in neuronal structures that statically represent knowledge as a vector of real valued synaptic weights. Although this general framework has contributed to the many successes of connectionism, in this paper we argue that for all but the most basic of cognitive processes, a more complex, multi-variate dynamic neural coding mechanism is required - knowledge should not be spacially bound to a particular neuron or group of neurons. We conclude the paper with discussion of a simple experiment that illustrates dynamic knowledge representation in a spiking neuron connectionist system.
Resumo:
The advantages of standard bus systems have been appreciated for many years. The ability to connect only those modules required to perform a given task has both technical and commercial advantages over a system with a fixed architecture which cannot be easily expanded or updated. Although such bus standards have proliferated in the microprocessor field, a general purpose low-cost standard for digital video processing has yet to gain acceptance. The paper describes the likely requirements of such a system, and discusses three currently available commercial systems. A new bus specification known as Vidibus, developed to fulfil these requirements, is presented. Results from applications already implemented using this real-time bus system are also given.
Resumo:
Emergency vehicles use high-amplitude sirens to warn pedestrians and other road users of their presence. Unfortunately, the siren noise enters the vehicle and corrupts the intelligibility of two-way radio voice com-munications from the emergency vehicle to a control room. Often the siren has to be turned off to enable the control room to hear what is being said which subsequently endangers people's lives. A digital signal processing (DSP) based system for the cancellation of siren noise embedded within speech is presented. The system has been tested with the least mean square (LMS), normalised least mean square (NLMS) and affine projection algorithm (APA) using recordings from three common types of sirens (two-tone, wail and yelp) from actual test vehicles. It was found that the APA with a projection order of 2 gives comparably improved cancellation over the LMS and NLMS with only a moderate increase in algorithm complexity and code size. Therefore, this siren noise cancellation system using the APA offers an improvement in cancellation achieved by previous systems. The removal of the siren noise improves the response time for the emergency vehicle and thus the system can contribute to saving lives. The system also allows voice communication to take place even when the siren is on and as such the vehicle offers less risk of danger when moving at high speeds in heavy traffic.
Resumo:
This paper presents a review of the design and development of the Yorick series of active stereo camera platforms and their integration into real-time closed loop active vision systems, whose applications span surveillance, navigation of autonomously guided vehicles (AGVs), and inspection tasks for teleoperation, including immersive visual telepresence. The mechatronic approach adopted for the design of the first system, including head/eye platform, local controller, vision engine, gaze controller and system integration, proved to be very successful. The design team comprised researchers with experience in parallel computing, robot control, mechanical design and machine vision. The success of the project has generated sufficient interest to sanction a number of revisions of the original head design, including the design of a lightweight compact head for use on a robot arm, and the further development of a robot head to look specifically at increasing visual resolution for visual telepresence. The controller and vision processing engines have also been upgraded, to include the control of robot heads on mobile platforms and control of vergence through tracking of an operator's eye movement. This paper details the hardware development of the different active vision/telepresence systems.
Resumo:
This paper investigates the robustness of a hybrid analog/digital feedback active noise cancellation (ANC) headset system. The digital ANC systems with the filtered-x least-mean-square (FXLMS) algorithm require accurate estimation of the secondary path for the stability and convergence of the algorithm. This demands a great challenge for the ANC headset design because the secondary path may fluctuate dramatically such as when the user adjusts the position of the ear-cup. In this paper, we analytically show that adding an analog feedback loop into the digital ANC systems can effectively reduce the plant fluctuation, thus achieving a more robust system. The method for designing the analog controller is highlighted. A practical hybrid analog/digital feedback ANC headset has been built and used to conduct experiments, and the experimental results show that the hybrid headset system is more robust under large plant fluctuation, and has achieved satisfactory noise cancellation for both narrowband and broadband noises.
Resumo:
Adaptive least mean square (LMS) filters with or without training sequences, which are known as training-based and blind detectors respectively, have been formulated to counter interference in CDMA systems. The convergence characteristics of these two LMS detectors are analyzed and compared in this paper. We show that the blind detector is superior to the training-based detector with respect to convergence rate. On the other hand, the training-based detector performs better in the steady state, giving a lower excess mean-square error (MSE) for a given adaptation step size. A novel decision-directed LMS detector which achieves the low excess MSE of the training-based detector and the superior convergence performance of the blind detector is proposed.
Resumo:
Three experiments measured constancy in speech perception, using natural-speech messages or noise-band vocoder versions of them. The eight vocoder-bands had equally log-spaced center-frequencies and the shapes of corresponding “auditory” filters. Consequently, the bands had the temporal envelopes that arise in these auditory filters when the speech is played. The “sir” or “stir” test-words were distinguished by degrees of amplitude modulation, and played in the context; “next you’ll get _ to click on.” Listeners identified test-words appropriately, even in the vocoder conditions where the speech had a “noise-like” quality. Constancy was assessed by comparing the identification of test-words with low or high levels of room reflections across conditions where the context had either a low or a high level of reflections. Constancy was obtained with both the natural and the vocoded speech, indicating that the effect arises through temporal-envelope processing. Two further experiments assessed perceptual weighting of the different bands, both in the test word and in the context. The resulting weighting functions both increase monotonically with frequency, following the spectral characteristics of the test-word’s [s]. It is suggested that these two weighting functions are similar because they both come about through the perceptual grouping of the test-word’s bands.
Resumo:
A characterization of observability for linear time-varying descriptor systemsE(t)x(t)+F(t)x(t)=B(t)u(t), y(t)=C(t)x(t) was recently developed. NeitherE norC were required to have constant rank. This paper defines a dual system, and a type of controllability so that observability of the original system is equivalent to controllability of the dual system. Criteria for observability and controllability are given in terms of arrays of derivatives of the original coefficients. In addition, the duality results of this paper lead to an improvement on a previous fundamental structure result for solvable systems of the formE(t)x(t)+F(t)x(t)=f(tt).
Resumo:
In this paper, a new model-based proportional–integral–derivative (PID) tuning and controller approach is introduced for Hammerstein systems that are identified on the basis of the observational input/output data. The nonlinear static function in the Hammerstein system is modelled using a B-spline neural network. The control signal is composed of a PID controller, together with a correction term. Both the parameters in the PID controller and the correction term are optimized on the basis of minimizing the multistep ahead prediction errors. In order to update the control signal, the multistep ahead predictions of the Hammerstein system based on B-spline neural networks and the associated Jacobian matrix are calculated using the de Boor algorithms, including both the functional and derivative recursions. Numerical examples are utilized to demonstrate the efficacy of the proposed approaches.
Resumo:
Background: Word deafness is a rare condition where pathologically degraded speech perception results in impaired repetition and comprehension but otherwise intact linguistic skills. Although impaired linguistic systems in aphasias resulting from damage to the neural language system (here termed central impairments), have been consistently shown to be amenable to external influences such as linguistic or contextual information (e.g. cueing effects in naming), it is not known whether similar influences can be shown for aphasia arising from damage to a perceptual system (here termed peripheral impairments). Aims: This study aimed to investigate the extent to which pathologically degraded speech perception could be facilitated or disrupted by providing visual as well as auditory information. Methods and Procedures: In three word repetition tasks, the participant with word deafness (AB) repeated words under different conditions: words were repeated in the context of a pictorial or written target, a distractor (semantic, unrelated, rhyme or phonological neighbour) or a blank page (nothing). Accuracy and error types were analysed. Results: AB was impaired at repetition in the blank condition, confirming her degraded speech perception. Repetition was significantly facilitated when accompanied by a picture or written example of the word and significantly impaired by the presence of a written rhyme. Errors in the blank condition were primarily formal whereas errors in the rhyme condition were primarily miscues (saying the distractor word rather than the target). Conclusions: Cross-modal input can both facilitate and further disrupt repetition in word deafness. The cognitive mechanisms behind these findings are discussed. Both top-down influence from the lexical layer on perceptual processes as well as intra-lexical competition within the lexical layer may play a role.
Resumo:
Background and aims: In addition to the well-known linguistic processing impairments in aphasia, oro-motor skills and articulatory implementation of speech segments are reported to be compromised to some degree in most types of aphasia. This study aimed to identify differences in the characteristics and coordination of lip movements in the production of a bilabial closure gesture between speech-like and nonspeech tasks in individuals with aphasia and healthy control subjects. Method and procedure: Upper and lower lip movement data were collected for a speech-like and a nonspeech task using an AG 100 EMMA system from five individuals with aphasia and five age and gender matched control subjects. Each task was produced at two rate conditions (normal and fast), and in a familiar and a less-familiar manner. Single articulator kinematic parameters (peak velocity, amplitude, duration, and cyclic spatio-temporal index) and multi-articulator coordination indices (average relative phase and variability of relative phase) were measured to characterize lip movements. Outcome and results: The results showed that when the two lips had similar task goals (bilabial closure) in speech-like versus nonspeech task, kinematic and coordination characteristics were not found to be different. However, when changes in rate were imposed on the bilabial gesture, only speech-like task showed functional adaptations, indicated by a greater decrease in amplitude and duration at fast rates. In terms of group differences, individuals with aphasia showed smaller amplitudes and longer movement durations for upper lip, higher spatio-temporal variability for both lips, and higher variability in lip coordination than the control speakers. Rate was an important factor in distinguishing the two groups, and individuals with aphasia were limited in implementing the rate changes. Conclusion and implications: The findings support the notion of subtle but robust differences in motor control characteristics between individuals with aphasia and the control participants, even in the context of producing bilabial closing gestures for a relatively simple speech-like task. The findings also highlight the functional differences between speech-like and nonspeech tasks, despite a common movement coordination goal for bilabial closure.
Resumo:
The "marketing" sector in Muth's single-stage model is disaggregated in to two sequential stages: "processing" and "distribution. "Comparative statics are used to derive necessary and sufficient conditions for farmers to gain from downstream research. The farm benefits are shown to depend crucially on the stage in "marketing" to which research is directed.