924 resultados para acoustic speech recognition system


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2016-06

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The article describes researches of a method of person recognition by face image based on Gabor wavelets. Scales of Gabor functions are determined at which the maximal percent of recognition for search of a person in a database and minimal percent of mistakes due to false alarm errors when solving an access control task is achieved. The carried out researches have shown a possibility of improvement of recognition system work parameters in the specified two modes when the volume of used data is reduced.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, most conventional surveillance networks are based on analog system, which has a lot of constraints like manpower and high-bandwidth requirements. It becomes the barrier for today's surveillance network development. This dissertation describes a digital surveillance network architecture based on the H.264 coding/decoding (CODEC) System-on-a-Chip (SoC) platform. The proposed digital surveillance network architecture includes three major layers: software layer, hardware layer, and the network layer. The following outlines the contributions to the proposed digital surveillance network architecture. (1) We implement an object recognition system and an object categorization system on the software layer by applying several Digital Image Processing (DIP) algorithms. (2) For better compression ratio and higher video quality transfer, we implement two new modules on the hardware layer of the H.264 CODEC core, i.e., the background elimination module and the Directional Discrete Cosine Transform (DDCT) module. (3) Furthermore, we introduce a Digital Signal Processor (DSP) sub-system on the main bus of H.264 SoC platforms as the major hardware support system for our software architecture. Thus we combine the software and hardware platforms to be an intelligent surveillance node. Lab results show that the proposed surveillance node can dramatically save the network resources like bandwidth and storage capacity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The move from Standard Definition (SD) to High Definition (HD) represents a six times increases in data, which needs to be processed. With expanding resolutions and evolving compression, there is a need for high performance with flexible architectures to allow for quick upgrade ability. The technology advances in image display resolutions, advanced compression techniques, and video intelligence. Software implementation of these systems can attain accuracy with tradeoffs among processing performance (to achieve specified frame rates, working on large image data sets), power and cost constraints. There is a need for new architectures to be in pace with the fast innovations in video and imaging. It contains dedicated hardware implementation of the pixel and frame rate processes on Field Programmable Gate Array (FPGA) to achieve the real-time performance. ^ The following outlines the contributions of the dissertation. (1) We develop a target detection system by applying a novel running average mean threshold (RAMT) approach to globalize the threshold required for background subtraction. This approach adapts the threshold automatically to different environments (indoor and outdoor) and different targets (humans and vehicles). For low power consumption and better performance, we design the complete system on FPGA. (2) We introduce a safe distance factor and develop an algorithm for occlusion occurrence detection during target tracking. A novel mean-threshold is calculated by motion-position analysis. (3) A new strategy for gesture recognition is developed using Combinational Neural Networks (CNN) based on a tree structure. Analysis of the method is done on American Sign Language (ASL) gestures. We introduce novel point of interests approach to reduce the feature vector size and gradient threshold approach for accurate classification. (4) We design a gesture recognition system using a hardware/ software co-simulation neural network for high speed and low memory storage requirements provided by the FPGA. We develop an innovative maximum distant algorithm which uses only 0.39% of the image as the feature vector to train and test the system design. Database set gestures involved in different applications may vary. Therefore, it is highly essential to keep the feature vector as low as possible while maintaining the same accuracy and performance^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

ARAUJO, Márcio V. ; ALSINA, Pablo J. ; MEDEIROS, Adelardo A. D. ; PEREIRA, Jonathan P.P. ; DOMINGOS, Elber C. ; ARAÚJO, Fábio M.U. ; SILVA, Jáder S. . Development of an Active Orthosis Prototype for Lower Limbs. In: INTERNATIONAL CONGRESS OF MECHANICAL ENGINEERING, 20., 2009, Gramado, RS. Proceedings… Gramado, RS: [s. n.], 2009

Relevância:

100.00% 100.00%

Publicador:

Resumo:

[EN]In face recognition, where high-dimensional representation spaces are generally used, it is very important to take advantage of all the available information. In particular, many labelled facial images will be accumulated while the recognition system is functioning, and due to practical reasons some of them are often discarded. In this paper, we propose an algorithm for using this information. The algorithm has the fundamental characteristic of being incremental. On the other hand, the algorithm makes use of a combination of classification results for the images in the input sequence. Experiments with sequences obtained with a real person detection and tracking system allow us to analyze the performance of the algorithm, as well as its potential improvements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

ARAUJO, Márcio V. ; ALSINA, Pablo J. ; MEDEIROS, Adelardo A. D. ; PEREIRA, Jonathan P.P. ; DOMINGOS, Elber C. ; ARAÚJO, Fábio M.U. ; SILVA, Jáder S. . Development of an Active Orthosis Prototype for Lower Limbs. In: INTERNATIONAL CONGRESS OF MECHANICAL ENGINEERING, 20., 2009, Gramado, RS. Proceedings… Gramado, RS: [s. n.], 2009

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While humans can easily segregate and track a speaker's voice in a loud noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans is not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electro-encephalography experiments using both simple tone-based stimuli and more natural speech stimulus. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixtures. Other methods develop models on the noise characteristics. Source segregation of simultaneous speech mixtures with a single microphone recording and no knowledge of the target speaker is still a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features that belong to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, this method can gracefully incorporate knowledge about the target speaker to further enhance the segregation.Through a series of EEG experiments we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and of its psychophysical manifestations in navigating complex sensory environments. Results from EEG experiments provide further insights into the assumptions behind the model and provide motivation for future single unit studies that can provide more direct evidence for the principle of temporal coherence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The suitable operation of mobile robots when providing Ambient Assisted Living (AAL) services calls for robust object recognition capabilities. Probabilistic Graphical Models (PGMs) have become the de-facto choice in recognition systems aiming to e ciently exploit contextual relations among objects, also dealing with the uncertainty inherent to the robot workspace. However, these models can perform in an inco herent way when operating in a long-term fashion out of the laboratory, e.g. while recognizing objects in peculiar con gurations or belonging to new types. In this work we propose a recognition system that resorts to PGMs and common-sense knowledge, represented in the form of an ontology, to detect those inconsistencies and learn from them. The utilization of the ontology carries additional advantages, e.g. the possibility to verbalize the robot's knowledge. A primary demonstration of the system capabilities has been carried out with very promising results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recording and processing of voice data raises increasing privacy concerns for users and service providers. One way to address these issues is to move processing on the edge device closer to the recording so that potentially identifiable information is not transmitted over the internet. However, this is often not possible due to hardware limitations. An interesting alternative is the development of voice anonymization techniques that remove individual speakers characteristics while preserving linguistic and acoustic information in the data. In this work, a state-of-the-art approach to sequence-to-sequence speech conversion, ini- tially based on x-vectors and bottleneck features for automatic speech recognition, is explored to disentangle the two acoustic information using different pre-trained speech and speakers representation. Furthermore, different strategies for selecting target speech representations are analyzed. Results on public datasets in terms of equal error rate and word error rate show that good privacy is achieved with limited impact on converted speech quality relative to the original method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The differences in spectral shape resolution abilities among cochlear implant ~CI! listeners, and between CI and normal-hearing ~NH! listeners, when listening with the same number of channels ~12!, was investigated. In addition, the effect of the number of channels on spectral shape resolution was examined. The stimuli were rippled noise signals with various ripple frequency-spacings. An adaptive 4IFC procedure was used to determine the threshold for resolvable ripple spacing, which was the spacing at which an interchange in peak and valley positions could be discriminated. The results showed poorer spectral shape resolution in CI compared to NH listeners ~average thresholds of approximately 3000 and 400 Hz, respectively!, and wide variability among CI listeners ~range of approximately 800 to 8000 Hz!. There was a significant relationship between spectral shape resolution and vowel recognition. The spectral shape resolution thresholds of NH listeners increased as the number of channels increased from 1 to 16, while the CI listeners showed a performance plateau at 4–6 channels, which is consistent with previous results using speech recognition measures. These results indicate that this test may provide a measure of CI performance which is time efficient and non-linguistic, and therefore, if verified, may provide a useful contribution to the prediction of speech perception in adults and children who use CIs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech understanding disorders in the elderly may be due to peripheral or central auditory dysfunctions. Asymmetry of results in dichotic testing increases with age, and may reflect on a lack of inter-hemisphere transmission and cognitive decline. Aim: To investigate auditory processing of aged people with no hearing complaints. Study design: clinical prospective. Materials and Methods: Twenty-two voluntary individuals, aged between 55 and 75 years, were evaluated. They reported no hearing complaints and had maximal auditory thresholds of 40 dB HL until 4 KHz, 80% of minimal speech recognition scores and peripheral symmetry between the ears. We used two kinds of tests: speech in noise and dichotic alternated dissyllables (SSW). Results were compared between males and females, right and left ears and between age groups. Results: There were no significant differences between genders, in both tests. Their Left ears showed worse results, in the competitive condition of SSW. Individuals aged 65 or older had poorer performances than those aged 55 to 64. Conclusion: Central auditory tests showed worse performance with aging. The employment of a dichotic test in the auditory evaluation setting in the elderly may help in the early identification of degenerative processes, which are common among these patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The HCI community is actively seeking novel methodologies to gain insight into the user’s experience during interaction with both the application and the content. We propose an emotional recognition engine capable of automatically recognizing a set of human emotional states using psychophysiological measures of the autonomous nervous system, including galvanic skin response, respiration, and heart rate. A novel pattern recognition system, based on discriminant analysis and support vector machine classifiers is trained using movies’ scenes selected to induce emotions ranging from the positive to the negative valence dimension, including happiness, anger, disgust, sadness, and fear. In this paper we introduce an emotion recognition system and evaluate its accuracy by presenting the results of an experiment conducted with three physiologic sensors.