945 results for Acoustic Arrays, Array Signal Processing, Calibration, Speech Enhancement
Abstract:
In Part I ["Fast Transforms for Acoustic Imaging - Part I: Theory," IEEE Transactions on Image Processing], we introduced the Kronecker array transform (KAT), a fast transform for imaging with separable arrays. Given a source distribution, the KAT produces the spectral matrix that would be measured by a separable sensor array. In Part II, we establish connections between the KAT, beamforming and 2-D convolutions, and show how these results can be used to accelerate classical and state-of-the-art array imaging algorithms. We also propose using the KAT to accelerate general-purpose regularized least-squares solvers. With this approach, we avoid ill-conditioned deconvolution steps and obtain more accurate reconstructions than previously possible, while keeping computational costs low. We also show how the KAT performs when imaging near-field source distributions and illustrate the trade-off between accuracy and computational complexity. Finally, we show that separable designs can deliver accuracy competitive with multi-arm logarithmic spiral geometries while retaining the computational advantages of the KAT.
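The reason a separable array admits a fast transform can be shown with a Kronecker factorization of the spectral matrix. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a far-field model at a single frequency, sensors on a Cartesian grid (x positions by y positions), and a separable (u, v) imaging grid. The function names are hypothetical. The direct product R = A diag(x) A^H with A = Ax ⊗ Ay is compared against an equivalent form that only needs two small matrix products with the source image X.

```python
import numpy as np


def steering_1d(positions, directions, k):
    """1-D far-field steering matrix: A[i, n] = exp(-1j * k * p_i * u_n)."""
    return np.exp(-1j * k * np.outer(positions, directions))


def spectral_matrix_direct(Ax, Ay, X):
    """Reference computation R = A diag(x) A^H with A = kron(Ax, Ay).

    Sensor (i, j) maps to row i * My + j and grid point (n, m) maps to
    column n * Ny + m, which matches the row-major X.ravel().
    """
    A = np.kron(Ax, Ay)
    return (A * X.ravel()) @ A.conj().T  # scales column c of A by x[c]


def spectral_matrix_separable(Ax, Ay, X):
    """Same R through the Kronecker factorization.

    R[(i, j), (i', j')] = sum_n sum_m Bx[(i, i'), n] X[n, m] By[(j, j'), m]
    with Bx[(i, i'), n] = Ax[i, n] * conj(Ax[i', n]) and By defined likewise.
    """
    Mx, My = Ax.shape[0], Ay.shape[0]
    Bx = (Ax[:, None, :] * Ax.conj()[None, :, :]).reshape(Mx * Mx, -1)
    By = (Ay[:, None, :] * Ay.conj()[None, :, :]).reshape(My * My, -1)
    C = Bx @ X @ By.T  # C[(i, i'), (j, j')]
    # Reorder C[i, i', j, j'] into R[(i, j), (i', j')].
    R = C.reshape(Mx, Mx, My, My).transpose(0, 2, 1, 3)
    return R.reshape(Mx * My, Mx * My)


# Example: 4 x 5 separable array, 16 x 12 imaging grid, 1 kHz in air.
rng = np.random.default_rng(0)
k = 2 * np.pi * 1000.0 / 343.0
Ax = steering_1d(np.linspace(-0.5, 0.5, 4), np.linspace(-0.9, 0.9, 16), k)
Ay = steering_1d(np.linspace(-0.4, 0.4, 5), np.linspace(-0.9, 0.9, 12), k)
X = rng.random((16, 12))  # non-negative source powers on the (u, v) grid
R_direct = spectral_matrix_direct(Ax, Ay, X)
R_fast = spectral_matrix_separable(Ax, Ay, X)
print(np.allclose(R_direct, R_fast))  # True
```

The direct form touches every sensor pair for every grid point. The factored form never builds the full manifold A: the image enters only through `Bx @ X @ By.T`.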
Abstract:
This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
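GMM-based detection can be sketched as a two-class likelihood comparison. One GMM is trained per group (control, apnoea) on frame-level spectral features, and a test speaker is assigned to the group with the higher average log-likelihood. The following is a minimal NumPy sketch under stated assumptions. The diagonal covariances, two components, 4-D features and the synthetic data are all illustrative choices. The study's actual features and its nasal/non-nasal context modelling are not reproduced.

```python
import numpy as np


def component_log_densities(X, weights, means, variances):
    """log w_k + log N(x | mu_k, diag(var_k)) for every row x of X and component k."""
    diff = X[:, None, :] - means[None, :, :]                # (n, K, d)
    log_det = np.log(2 * np.pi * variances).sum(axis=1)     # (K,)
    maha = (diff ** 2 / variances[None, :, :]).sum(axis=2)  # (n, K)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + maha)


def row_logsumexp(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))


def fit_diag_gmm(X, n_components=2, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM to the rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = component_log_densities(X, weights, means, variances)
        resp = np.exp(log_p - row_logsumexp(log_p)[:, None])
        # M-step: re-estimate weights, means and (floored) diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances


def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of the rows of X under a fitted GMM."""
    return float(np.mean(row_logsumexp(component_log_densities(X, *gmm))))


def classify(frames, gmm_control, gmm_apnoea):
    """Label a speaker by the group whose GMM gives the higher average log-likelihood."""
    if avg_log_likelihood(frames, gmm_apnoea) > avg_log_likelihood(frames, gmm_control):
        return "apnoea"
    return "control"


# Synthetic stand-ins for per-frame spectral feature vectors (not real data).
rng = np.random.default_rng(1)
gmm_control = fit_diag_gmm(rng.normal(0.0, 1.0, size=(500, 4)))
gmm_apnoea = fit_diag_gmm(rng.normal(1.5, 1.0, size=(500, 4)))
print(classify(rng.normal(1.5, 1.0, size=(200, 4)), gmm_control, gmm_apnoea))  # apnoea
```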
Abstract:
A systolic array implementing lattice-reduction-aided linear detection is proposed for a MIMO receiver. The lattice reduction algorithm and the ensuing linear detection are performed in the same array, which can make the design hardware-efficient. The all-swap lattice reduction (ASLR) algorithm is considered for the systolic design. ASLR is a variant of the LLL algorithm that processes all lattice basis vectors within one iteration. Lattice-reduction-aided linear detection based on the ASLR and LLL algorithms achieves very similar bit-error-rate performance, while ASLR is more time-efficient in the systolic array, especially for systems with a large number of antennas.
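As a software reference for the processing chain, here is a minimal NumPy sketch of lattice-reduction-aided zero-forcing detection. It assumes a real-valued channel model (a complex MIMO channel can be rewritten in real form) and M-PAM symbols. It uses the textbook sequential LLL algorithm with delta = 0.75, not ASLR and not a systolic mapping. The function names are hypothetical.

```python
import numpy as np


def lll_reduce(H, delta=0.75):
    """LLL-reduce the columns of a real matrix H.

    Returns (H_red, T) with H_red = H @ T and T unimodular (integer, det +/-1).
    """
    B = H.astype(float).copy()
    n = B.shape[1]
    T = np.eye(n, dtype=int)
    k = 1
    while k < n:
        # Size-reduce column k against columns k-1, ..., 0 (mu = R[j, k] / R[j, j]).
        for j in range(k - 1, -1, -1):
            _, R = np.linalg.qr(B)
            mu = np.round(R[j, k] / R[j, j])
            if mu != 0:
                B[:, k] -= mu * B[:, j]
                T[:, k] -= int(mu) * T[:, j]
        _, R = np.linalg.qr(B)
        # Lovasz condition expressed with the QR factor; swap columns if violated.
        if delta * R[k - 1, k - 1] ** 2 > R[k - 1, k] ** 2 + R[k, k] ** 2:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
        else:
            k += 1
    return B, T


def lr_aided_zf(y, H, T, H_red, M=4):
    """LR-aided zero-forcing for real M-PAM symbols in {-(M-1), ..., M-1}."""
    n = H.shape[1]
    # Shift and scale so the symbols become consecutive integers u in {0, ..., M-1}.
    y_s = (y + (M - 1) * H @ np.ones(n)) / 2
    # Equalize in the reduced basis, round to integers, map back with T.
    z_hat = np.round(np.linalg.pinv(H_red) @ y_s)
    u_hat = np.clip(T @ z_hat, 0, M - 1)
    return 2 * u_hat - (M - 1)


rng = np.random.default_rng(2)
H = rng.normal(size=(4, 4))                 # assumed real-valued 4 x 4 channel
H_red, T = lll_reduce(H)
x = rng.choice([-3.0, -1.0, 1.0, 3.0], size=4)
y = H @ x + 0.01 * rng.normal(size=4)       # high-SNR observation
print(lr_aided_zf(y, H, T, H_red), x)
```

Reduction is done once per channel realization. Detection of each received vector then costs only a matrix-vector product, a rounding step and a multiplication by T.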
Abstract:
In a leading service economy like India, services lie at the very center of economic activity. Competitive organizations now look not only at skills and knowledge but also at the behavior an employee needs to succeed on the job. Emotionally competent employees can deal effectively with occupational stress and maintain psychological well-being. This study explores the potential of the first two formants and jitter for assessing seven common emotional states present in natural English speech. The k-means method was used to classify emotional speech as neutral, happy, surprised, angry, disgusted and sad. Using raw jitter, classification accuracy exceeded 65% for happy and sad but was lower for the other emotions. The overall classification accuracy was 72% with preprocessed jitter. The experimental study used 1664 English utterances from six female speakers. This is a simple, interesting and more proactive way for employees from varied backgrounds to become aware of their own communication styles as well as those of their colleagues and customers, and it is therefore socially beneficial. It is also inexpensive, as it requires only a computer, and since no knowledge of sophisticated software or signal processing is needed, it is easy to analyze.
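Two of the ingredients named here, jitter and k-means clustering, can be sketched as follows. This is a minimal NumPy sketch. It assumes pitch periods have already been extracted and that each utterance is summarized by a feature vector such as (F1, F2, jitter). The local-jitter definition used (mean absolute difference of consecutive periods divided by the mean period) is a common one, but the study's exact jitter measure and preprocessing are not specified. The features below are synthetic.

```python
import numpy as np


def local_jitter(periods):
    """Relative local jitter: mean |T_i - T_(i-1)| divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


print(local_jitter([0.010, 0.011, 0.010, 0.012]))  # about 0.124 (periods in seconds)

# Hypothetical per-utterance features (F1 Hz, F2 Hz, jitter); synthetic, not study data.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal([500.0, 1500.0, 0.010], [20.0, 50.0, 0.001], size=(50, 3)),
    rng.normal([700.0, 1200.0, 0.020], [20.0, 50.0, 0.001], size=(50, 3)),
])
z = (features - features.mean(axis=0)) / features.std(axis=0)  # z-score each feature
centroids, labels = kmeans(z, k=2)
```

Standardizing first matters here. Formants are in hundreds of hertz while jitter is a small ratio, so unscaled distances would ignore jitter almost entirely.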
Abstract:
The paper is concerned with the uniformization of a system of affine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). We present and implement an automatic system to achieve uniformization of systems of affine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high level synthesis tool based on polyhedral representation of nested loop computations.
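A small example illustrates what uniformization achieves. It is a standard textbook case, not taken from the paper. In a matrix-vector product y_i = Σ_j a_ij x_j written as a recurrence over (i, j), the input x_j is read at every i. This is an affine dependence (i, j) → j, which is not a constant offset. Pipelining x through a new variable X(i, j) = X(i-1, j) turns every dependence into a constant offset: (0, 1) for Y and (1, 0) for X. The hypothetical Python functions below evaluate both systems sequentially to show they compute the same result. They do not implement the paper's automatic uniformization algorithms.

```python
import numpy as np


def matvec_affine(A, x):
    """Original system: Y(i, j+1) = Y(i, j) + A[i, j] * x[j].

    x[j] is read at every i: a broadcast (affine, non-uniform) dependence.
    """
    n, m = A.shape
    Y = np.zeros((n, m + 1))
    for i in range(n):
        for j in range(m):
            Y[i, j + 1] = Y[i, j] + A[i, j] * x[j]
    return Y[:, m]


def matvec_uniform(A, x):
    """Uniformized system: x is pipelined along i via X(i, j) = X(i-1, j).

    Every right-hand side now reads a neighbour at a constant offset,
    (0, 1) for Y and (1, 0) for X, which maps directly onto a systolic array.
    """
    n, m = A.shape
    X = np.zeros((n, m))
    Y = np.zeros((n, m + 1))
    for i in range(n):
        for j in range(m):
            X[i, j] = x[j] if i == 0 else X[i - 1, j]  # input enters only at i = 0
            Y[i, j + 1] = Y[i, j] + A[i, j] * X[i, j]
    return Y[:, m]


rng = np.random.default_rng(3)
A = rng.normal(size=(4, 5))
x = rng.normal(size=5)
print(np.allclose(matvec_uniform(A, x), A @ x))  # True
```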
Abstract:
The aim of this thesis is to investigate computerized voice assessment methods for distinguishing normal from dysarthric speech signals. The proposed system uses computerized assessment methods equipped with signal processing and artificial intelligence techniques. Each subject read the sentences used to measure inter-stress intervals (ISI), and these sentences were used to compare normal and impaired voices. A band-pass filter was used to preprocess the speech samples. Speech segmentation is performed using signal energy and spectral centroid to separate voiced and unvoiced regions of the speech signal. Acoustic features are extracted from the LPC model and from the speech segments of each audio signal to find anomalies. The speech features assessed for classification are energy entropy, zero-crossing rate (ZCR), spectral centroid, mean fundamental frequency (Meanf0), jitter (RAP), jitter (PPQ) and shimmer (APQ). Naïve Bayes (NB) was used for speech classification. For speech test-1 and test-2, NB achieved classification accuracies of 72% and 80%, respectively, between healthy and impaired speech samples. For speech test-3, NB achieved 64% correct classification. The results suggest that speech impairment in PD patients could be classified according to the clinical rating scale.
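Three of the frame-level features listed (zero-crossing rate, spectral centroid and energy entropy) can be sketched as follows. This is a minimal NumPy sketch. The definitions follow common conventions, such as a magnitude-weighted centroid and entropy over 10 equal sub-frames, and may differ in detail from the thesis. The band-pass filtering, segmentation, jitter/shimmer measures and the Naïve Bayes classifier are not shown.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    s = np.signbit(frame)
    return np.mean(s[1:] != s[:-1])


def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency (Hz) of the frame's spectrum."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)


def energy_entropy(frame, n_sub=10):
    """Entropy (bits) of the normalized energies of n_sub equal sub-frames."""
    sub_len = len(frame) // n_sub
    sub = frame[: sub_len * n_sub].reshape(n_sub, sub_len)
    e = np.sum(sub ** 2, axis=1)
    p = e / (np.sum(e) + 1e-12)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


# Example frame: a 1 kHz tone sampled at 16 kHz (512 samples = 32 full cycles).
fs = 16000
frame = np.sin(2 * np.pi * 1000 * np.arange(512) / fs + 0.3)
print(zero_crossing_rate(frame), spectral_centroid(frame, fs), energy_entropy(frame))
```

A steady tone gives a centroid at its own frequency and a near-maximal energy entropy, because its energy is spread evenly across sub-frames. Abrupt onsets or pauses lower the entropy.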
Abstract:
This work investigates the efficiency of digital signal processing tools applied to acoustic emission (AE) signals for detecting thermal damage in the grinding process. To this end, 15 experimental runs were carried out on a surface grinding machine operating with an aluminum oxide grinding wheel on ABNT 1045 steel. The AE signals were acquired by a fixed sensor placed on the workpiece holder. A data acquisition system with a high sampling rate of 2.5 MHz was used to collect the raw AE signal instead of the root mean square (RMS) value usually employed. Many statistics have proved effective for detecting burn, such as the RMS, the correlation of the AE, the constant false alarm rate (CFAR), the ratio of power (ROP) and the mean-value deviance (MVD). However, the CFAR, ROP, kurtosis and correlation of the AE proved more sensitive than the RMS.
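Block-wise statistics on the raw AE record can be sketched as follows. This is a minimal NumPy sketch under stated assumptions. The block length and the frequency band are illustrative, and ROP is taken here as the fraction of a block's spectral power inside a chosen band. CFAR, MVD and the AE correlation statistic are not reproduced. Only the 2.5 MHz sampling rate comes from the text.

```python
import numpy as np


def block_rms(x):
    """Root mean square of one block."""
    return np.sqrt(np.mean(x ** 2))


def block_kurtosis(x):
    """Non-excess kurtosis: E[(x - mean)^4] / var^2 (3 for Gaussian data)."""
    d = x - np.mean(x)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2


def ratio_of_power(x, fs, band):
    """Fraction of the block's spectral power inside band = (f_lo, f_hi) in Hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[in_band].sum() / power.sum()


def block_statistics(signal, fs, block_len, band):
    """Split the raw AE record into blocks and compute one value per statistic."""
    n_blocks = len(signal) // block_len
    blocks = signal[: n_blocks * block_len].reshape(n_blocks, block_len)
    return {
        "rms": np.array([block_rms(b) for b in blocks]),
        "kurtosis": np.array([block_kurtosis(b) for b in blocks]),
        "rop": np.array([ratio_of_power(b, fs, band) for b in blocks]),
    }


fs = 2.5e6  # sampling rate reported in the study
tone = np.sin(2 * np.pi * 250e3 * np.arange(1000) / fs)  # synthetic 250 kHz test signal
stats = block_statistics(np.tile(tone, 3), fs, block_len=1000, band=(200e3, 300e3))
```

Kurtosis reacts to impulsive bursts that barely change the RMS. This is one plausible reason statistics like it can be more sensitive to burn than the RMS alone.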
Abstract:
This paper investigates digital radiography for inspecting welded pipes to be installed in deep-water offshore gas and oil pipelines, such as those of the Brazilian pre-salt. The aim is to use digital radiography for nondestructive testing of welds, as is already done in the medical, aerospace, security, automotive and petrochemical sectors. Among the current options, the DDA (Digital Detector Array) is considered one of the best solutions to replace industrial films while increasing sensitivity and reducing the inspection cycle time. This paper presents the results of this new technique and compares it with radiography using industrial film systems. Twenty test specimens of longitudinally welded pipe joints were tested and the two techniques compared. The specimens were specially prepared with artificial defects of varying dimensions, such as cracks, lack of fusion, lack of penetration, porosity and slag inclusions, in six different base-metal wall thicknesses. These experiments verified the proposed rules for defining and selecting the parameters that control the required digital radiographic image quality, as described in the draft international standard ISO/DIS 10893-7. This draft is the first standard to establish parameters for digital radiography of weld seams in welded steel pipes for pressure purposes used in gas and oil pipelines.
Abstract:
Purpose: To analyze the components of the acoustic signal of swallowing using dedicated software. Methods: Fourteen healthy subjects aged 20 to 50 years (mean age 31±10 years) were evaluated. Data collection consisted of simultaneously capturing the swallowing audio with a microphone and the videofluoroscopic image of swallowing. The bursts of the swallowing acoustic signal were identified. Their durations and the intervals between them were then analyzed with software that allowed simultaneous analysis of the acoustic wave and the videofluoroscopic image. Results: Three burst components were identified in most of the swallows evaluated. The first burst had a mean duration of 87.3 milliseconds (ms) for water and 78.2 ms for the pasty substance. The second burst had a mean duration of 112.9 ms for water and 85.5 ms for the pasty substance. The mean interval between the first and second bursts was 82.1 ms for water and 95.3 ms for the pasty consistency; between the second and third bursts it was 339.8 ms for water and 322.0 ms for the pasty consistency. Conclusion: The software allowed the visualization of three bursts during swallowing in healthy individuals and showed that the swallowing signal in normal subjects is highly variable.
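Burst durations and inter-burst intervals can be extracted automatically with a simple energy-envelope detector. The following is a hypothetical NumPy sketch, not the software used in the study (which also relied on simultaneous videofluoroscopy). The smoothing window, relative threshold and minimum gap are assumptions. The interval is measured here from the end of one burst to the start of the next.

```python
import numpy as np


def detect_bursts(x, fs, win_ms=5.0, threshold_ratio=0.2, min_gap_ms=10.0):
    """Return (start_s, end_s) pairs where the smoothed envelope exceeds a threshold."""
    win = max(1, int(fs * win_ms / 1000))
    envelope = np.convolve(np.abs(x), np.ones(win) / win, mode="same")
    active = envelope > threshold_ratio * envelope.max()
    # Rising and falling edges of the boolean activity mask (end index is exclusive).
    edges = np.diff(active.astype(int))
    starts = list(np.flatnonzero(edges == 1) + 1)
    ends = list(np.flatnonzero(edges == -1) + 1)
    if active[0]:
        starts.insert(0, 0)
    if active[-1]:
        ends.append(len(x))
    # Merge bursts separated by less than min_gap_ms.
    bursts = []
    for s, e in zip(starts, ends):
        if bursts and (s - bursts[-1][1]) < fs * min_gap_ms / 1000:
            bursts[-1][1] = e
        else:
            bursts.append([s, e])
    return [(s / fs, e / fs) for s, e in bursts]


def durations_and_intervals(bursts):
    """Burst durations and end-to-start gaps between consecutive bursts, in ms."""
    durations = [1000 * (e - s) for s, e in bursts]
    intervals = [1000 * (bursts[i + 1][0] - bursts[i][1]) for i in range(len(bursts) - 1)]
    return durations, intervals


# Synthetic test signal: three 500 Hz tone bursts in silence (not swallowing data).
fs = 8000
t = np.arange(fs) / fs
x = np.zeros_like(t)
for a, b in [(0.10, 0.19), (0.30, 0.41), (0.80, 0.85)]:
    mask = (t >= a) & (t < b)
    x[mask] = np.sin(2 * np.pi * 500 * t[mask])
durations, intervals = durations_and_intervals(detect_bursts(x, fs))
```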
Abstract:
This study investigated whether the speech-evoked auditory brainstem response (ABR) differs among children with typical development (TD), (central) auditory processing disorder [(C)APD] and language impairment (LI). The speech-evoked ABR was tested in 57 children aged 6-12 years. The children were placed into three groups: TD (n = 18), (C)APD (n = 18) and LI (n = 21). Speech-evoked ABRs were elicited using the five-formant syllable /da/. Three dimensions were defined for analysis: timing, harmonics and pitch. A comparative analysis of the responses of typically developing children and children with (C)APD and LI revealed abnormal encoding of the acoustic features of speech that are characteristic of speech perception in children with (C)APD and LI, although the two groups differed in their abnormalities. While the children with (C)APD may have had greater difficulty distinguishing stimuli based on timing cues, the children with LI had the additional difficulty of distinguishing speech harmonics, which are important for identifying speech sounds. These data suggest that an inefficient representation of crucial components of speech sounds may contribute to the language processing difficulties found in children with LI. Furthermore, these findings may indicate that the neural processes mediated by the auditory brainstem differ among children with auditory processing and speech-language disorders. © 2012 Elsevier B.V. All rights reserved.
Abstract:
This work is part of an ongoing collaborative project between the medical and signal processing communities to promote new research efforts on the automatic diagnosis of OSA (obstructive sleep apnoea). In this paper, we explore the differences observed in phonetic classes (interphoneme) across groups (control/apnoea) and analyze their utility for OSA detection.
Abstract:
This paper describes our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the small number of files available for training the system, especially in the empty condition, for which no training set was provided, only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy. Our primary system was the fusion of three different i-vector-based systems: an acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all conditions are presented, using both the metrics proposed for the evaluation and the Cavg metric.
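One common way to obtain trigram counts from a phone posteriorgram is to accumulate soft (expected) counts, as sketched below. This is a minimal NumPy sketch, and the authors' exact procedure may differ. For example, they may first collapse frames into phone segments or work on lattices. The resulting count vector would then feed an i-vector extractor, which is not shown. Function names are hypothetical.

```python
import numpy as np


def expected_trigram_counts(posteriors):
    """Soft trigram counts from a (T, P) phone posteriorgram.

    counts[a, b, c] = sum_t p_t(a) * p_{t+1}(b) * p_{t+2}(c)
    """
    p0, p1, p2 = posteriors[:-2], posteriors[1:-1], posteriors[2:]
    return np.einsum("ta,tb,tc->abc", p0, p1, p2)


def trigram_supervector(posteriors):
    """Flatten and normalize the counts into a P**3-dimensional vector."""
    counts = expected_trigram_counts(posteriors)
    return counts.ravel() / counts.sum()


# With one-hot posteriors the soft counts reduce to ordinary trigram counts.
hard = np.eye(3)[[0, 1, 2, 0]]  # phone sequence 0, 1, 2, 0
counts = expected_trigram_counts(hard)
print(counts[0, 1, 2], counts[1, 2, 0])  # 1.0 1.0
```

Because each posterior row sums to 1, the counts always total T - 2, one unit of probability mass per trigram position.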
Abstract:
This paper presents new techniques that bring relevant improvements to the primary system our group submitted to the Albayzin 2012 LRE competition, in which using any additional corpora to train or optimize the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the system's accuracy by 21.4% in terms of Cavg (we also report results for Fact, the official metric during the evaluation). We show how using these features at the phone-state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations to these PLLR features, obtaining additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
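The PLLR front end can be sketched in three steps: log-odds of the posteriors, PCA, and shifted-delta stacking. This is a minimal NumPy sketch under stated assumptions. PLLRs are computed as log(p / (1 - p)) from per-frame phone (or phone-state) posteriors, and PCA is fitted on the same data via SVD. The SDC parameters (d, P, k) shown are illustrative defaults, not the configurations explored in the paper. Function names are hypothetical.

```python
import numpy as np


def pllr(posteriors, eps=1e-6):
    """Phone log-likelihood ratios: log(p / (1 - p)) for each frame and phone."""
    p = np.clip(posteriors, eps, 1 - eps)
    return np.log(p / (1 - p))


def pca_project(features, n_components):
    """Project frames onto the top principal components (fitted on the same data)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def sdc(features, d=1, P=3, k=7):
    """Shifted delta features: block i is c(t + i*P + d) - c(t + i*P - d), i = 0..k-1.

    Frames beyond the edges are replaced by edge padding.
    """
    T = features.shape[0]
    tail = d + (k - 1) * P
    padded = np.pad(features, ((d, tail), (0, 0)), mode="edge")  # padded[t + d] = c(t)
    blocks = []
    for i in range(k):
        plus = padded[2 * d + i * P: 2 * d + i * P + T]   # c(t + i*P + d)
        minus = padded[i * P: i * P + T]                  # c(t + i*P - d)
        blocks.append(plus - minus)
    return np.hstack(blocks)


posteriors = np.random.default_rng(5).dirichlet(np.ones(30), size=200)  # synthetic
features = sdc(pca_project(pllr(posteriors), n_components=10))
print(features.shape)  # (200, 70)
```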