807 resultados para Freedom of speech.
Resumo:
This paper describes a method of automated segmentation of speech assuming the signal is continuously time varying rather than the traditional short time stationary model. It has been shown that this representation gives comparable if not marginally better results than the other techniques for automated segmentation. A formulation of the 'Bach' (music semitonal) frequency scale filter-bank is proposed. A comparative study has been made of the performances using Mel, Bark and Bach scale filter banks considering this model. The preliminary results show up to 80 % matches within 20 ms of the manually segmented data, without any information of the content of the text and without any language dependence. 'Bach' filters are seen to marginally outperform the other filters.
Resumo:
This correspondence describes a method for automated segmentation of speech. The method proposed in this paper uses a specially designed filter-bank called Bach filter-bank which makes use of 'music' related perception criteria. The speech signal is treated as continuously time varying signal as against a short time stationary model. A comparative study has been made of the performances using Mel, Bark and Bach scale filter banks. The preliminary results show up to 80 % matches within 20 ms of the manually segmented data, without any information of the content of the text and without any language dependence. The Bach filters are seen to marginally outperform the other filters.
Resumo:
We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
Resumo:
We consider the one-way relay aided MIMO X fading Channel where there are two transmitters and two receivers along with a relay with M antennas at every node. Every transmitter wants to transmit messages to every other receiver. The relay broadcasts to the receivers along a noisy link which is independent of the transmitters channel. In literature, this is referred to as a relay with orthogonal components. We derive an upper bound on the degrees of freedom of such a network. Next we show that the upper bound is tight by proposing an achievability scheme based on signal space alignment for the same for M = 2 antennas at every node.
Resumo:
This work derives inner and outer bounds on the generalized degrees of freedom (GDOF) of the K-user symmetric MIMO Gaussian interference channel. For the inner bound, an achievable GDOF is derived by employing a combination of treating interference as noise, zero-forcing at the receivers, interference alignment (IA), and extending the Han-Kobayashi (HK) scheme to K users, depending on the number of antennas and the INR/SNR level. An outer bound on the GDOF is derived, using a combination of the notion of cooperation and providing side information to the receivers. Several interesting conclusions are drawn from the bounds. For example, in terms of the achievable GDOF in the weak interference regime, when the number of transmit antennas (M) is equal to the number of receive antennas (N), treating interference as noise performs the same as the HK scheme and is GDOF optimal. For K >; N/M+1, a combination of the HK and IA schemes performs the best among the schemes considered. However, for N/M <; K ≤ N/M+1, the HK scheme is found to be GDOF optimal.
Resumo:
We consider the speech production mechanism and the asso- ciated linear source-filter model. For voiced speech sounds in particular, the source/glottal excitation is modeled as a stream of impulses and the filter as a cascade of second-order resonators. We show that the process of sampling speech signals can be modeled as filtering a stream of Dirac impulses (a model for the excitation) with a kernel function (the vocal tract response),and then sampling uniformly. We show that the problem of esti- mating the excitation is equivalent to the problem of recovering a stream of Dirac impulses from samples of a filtered version. We present associated algorithms based on the annihilating filter and also make a comparison with the classical linear prediction technique, which is well known in speech analysis. Results on synthesized as well as natural speech data are presented.
Resumo:
A joint analysis-synthesis framework is developed for the compressive sensing (CS) recovery of speech signals. The signal is assumed to be sparse in the residual domain with the linear prediction filter used as the sparse transformation. Importantly this transform is not known apriori, since estimating the predictor filter requires the knowledge of the signal. Two prediction filters, one comb filter for pitch and another all pole formant filter are needed to induce maximum sparsity. An iterative method is proposed for the estimation of both the prediction filters and the signal itself. Formant prediction filter is used as the synthesis transform, while the pitch filter is used to model the periodicity in the residual excitation signal, in the analysis mode. Significant improvement in the LLR measure is seen over the previously reported formant filter estimation.
Resumo:
This paper presents a study of the nature of the degrees-of-freedom of spatial manipulators based on the concept of partition of degrees-of-freedom. In particular, the partitioning of degrees-of-freedom is studied in five lower-mobility spatial parallel manipulators possessing different combinations of degrees-of-freedom. An extension of the existing theory is introduced so as to analyse the nature of the gained degree(s)-of-freedom at a gain-type singularity. The gain of one- and two-degrees-of-freedom is analysed in several well-studied, as well as newly developed manipulators. The formulations also present a basis for the analysis of the velocity kinematics of manipulators of any architecture. (C) 2013 Elsevier Ltd. All rights reserved.
Resumo:
This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics. (C) 2014 Acoustical Society of America
Resumo:
We propose apractical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.