23 results for Voice


Relevance:

10.00% 10.00%

Publisher:

Abstract:

The instants at which significant excitation of the vocal tract takes place during voicing are referred to as epochs. Epochs and the strengths of the excitation pulses at epochs are useful in characterizing the voice source. The epoch filtering technique proposed by the authors determines epochs from the speech waveform. In this paper, we propose zero-phase inverse filtering to obtain the strengths of the excitation pulses at epochs. The zero-phase inverse filter compensates for the gross spectral envelope of the short-time spectrum of speech without affecting its phase characteristics. Linear prediction analysis is used to realize the zero-phase inverse filter. Source characteristics that can be derived from speech using this technique are illustrated with examples.
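One possible reading of the zero-phase inverse filter described above can be sketched as follows. The abstract does not give the exact procedure, so everything here is an assumption: the function names are hypothetical, the linear prediction coefficients come from the autocorrelation method (Levinson-Durbin), and the filter is applied by scaling each DFT bin of the frame by the magnitude of the LP inverse filter, which leaves the phase of every bin unchanged.

```python
import cmath
import math

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns a[0..order] with a[0] = 1,
    so that A(z) = 1 + a[1] z^-1 + ... is the LP inverse filter."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def dft(x):
    """Naive O(N^2) DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def zero_phase_inverse_filter(frame, order=2):
    """ASSUMED realization: multiply the frame's spectrum by |A(e^jw)|.
    The gain is real and non-negative, so the phase is not altered."""
    a = levinson(autocorr(frame, order), order)
    n = len(frame)
    A = dft(a + [0.0] * (n - len(a)))
    X = dft(frame)
    Y = [X[k] * abs(A[k]) for k in range(n)]
    return [y.real for y in idft(Y)]

def synthetic_frame(n=80, period=20):
    """Impulse train driving a two-pole resonator (a toy 'voiced' frame)."""
    x = []
    for t in range(n):
        e = 1.0 if t % period == 0 else 0.0
        x1 = x[t - 1] if t >= 1 else 0.0
        x2 = x[t - 2] if t >= 2 else 0.0
        x.append(e + 1.3 * x1 - 0.8 * x2)
    return x
```

Calling `zero_phase_inverse_filter(synthetic_frame())` returns a real-valued frame whose spectral envelope has been flattened by the LP magnitude response while each bin keeps its original phase.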

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Real-time services are traditionally supported on circuit-switched networks. However, there is a need to port these services to packet-switched networks. An architecture for an audio conferencing application over the Internet, in light of the ITU-T H.323 recommendations, is considered. In a conference, mixing packets from all clients can lead to a lack of speech clarity, so considering packets only from a set of selected clients can reduce speech quality degradation. A distributed algorithm and architecture for selecting clients for mixing is suggested here, based on a new quantifier of voice activity called the “Loudness Number” (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving, and speech quality enhancement. Client selection for playout tries to mimic a physical conference, where the most vocal participants attract more attention. The contributions of the paper are expected to aid implementations of the H.323 recommendations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, we analyze the throughput and energy efficiency performance of the user datagram protocol (UDP) using linear, binary exponential, and geometric backoff algorithms at the link layer (LL) on point-to-point wireless fading links. Using a first-order Markov chain representation of the packet success/failure process on fading channels, we derive analytical expressions for the throughput and energy efficiency of UDP/LL with and without LL backoff. The analytical results are verified through simulations. We also evaluate the mean delay and delay variation of voice packets, as well as the energy efficiency performance, over a wireless link that uses UDP for transport of voice packets and the proposed backoff algorithms at the LL. We show that the proposed LL backoff algorithms achieve an energy efficiency improvement on the order of 2-3 dB compared to LL with no backoff, without compromising much on the throughput and delay performance at the UDP layer. Such energy savings through protocol means will improve the battery life of wireless mobile terminals.
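The idea behind link-layer backoff on a bursty channel can be illustrated with a small simulation. This is only a sketch under stated assumptions and does not reproduce the paper's analysis: the backoff rules (wait `i` slots after the `i`-th consecutive failure for linear, `2^i - 1` slots for binary exponential, and a geometrically distributed idle period with parameter `g`), the two-state good/bad channel with transition probabilities `p_gg` and `p_bb`, and all function names are hypothetical choices made for illustration.

```python
import random

def backoff_slots(scheme, failures, rng, g=0.5, cap=32):
    """Idle slots to wait after `failures` consecutive failed transmissions.
    The rules below are ASSUMED definitions, not taken from the paper."""
    if scheme == "none" or failures == 0:
        return 0
    if scheme == "linear":
        return min(failures, cap)
    if scheme == "binary_exponential":
        return min(2 ** failures - 1, cap)
    if scheme == "geometric":
        # Stay idle for one more slot with probability g.
        slots = 0
        while slots < cap and rng.random() < g:
            slots += 1
        return slots
    raise ValueError(f"unknown scheme: {scheme}")

def simulate(scheme, n_slots=100_000, p_gg=0.95, p_bb=0.90, seed=1):
    """Simulate one sender on a first-order Markov (good/bad) channel.
    Returns (throughput, energy_efficiency), where throughput is packets
    delivered per slot and energy efficiency is packets delivered per
    transmission attempt (idle slots are assumed to cost no energy)."""
    rng = random.Random(seed)
    good = True
    failures = 0
    idle = 0
    sent = 0
    delivered = 0
    for _ in range(n_slots):
        # Advance the channel state by one slot.
        stay = p_gg if good else p_bb
        if rng.random() >= stay:
            good = not good
        if idle > 0:
            idle -= 1          # backing off: no transmission this slot
            continue
        sent += 1
        if good:
            delivered += 1
            failures = 0
        else:
            failures += 1
            idle = backoff_slots(scheme, failures, rng)
    return delivered / n_slots, delivered / sent
```

Comparing `simulate("none")` with `simulate("linear")` shows the trade-off the abstract describes: backing off during a bad burst avoids wasted transmissions, so a larger share of attempts succeeds, at some cost in slots left unused.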

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, an optical code-division multiple-access (O-CDMA) packet network is considered, which offers inherent security in access networks. The application of O-CDMA to multimedia transmission (voice, data, and video) is investigated. The simultaneous transmission of various services is achieved by assigning each user multiple unique code signatures. Thus, by applying a parallel mapping technique, we achieve multi-rate services. A random access protocol is proposed in which all distinct codes are used for packet transmission. Two families of codes are analyzed: Optical Orthogonal Codes (OOC), or 1D codes, and Wavelength/Time Single-Pulse-per-Row (W/T SPR) codes, or 2D codes. These 1D and 2D codes with varied weight are used to differentiate the Quality of Service (QoS). The theoretical bit error probability corresponding to the quality of each service is established for 1D and 2D codes in the noiseless receiver case, and the two are compared. The results show that QoS in multimedia transmission is better with 2D codes than with 1D codes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

An epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high-performance epoch extraction algorithms require either dynamic programming techniques or a priori information about the average pitch period. An algorithm without such requirements is proposed, based on the integrated linear prediction residual (ILPR), which resembles the voice source signal. The half-wave rectified and negated ILPR (or the Hilbert transform of the ILPR) is used as the pre-processed signal. A new non-linear temporal measure named the plosion index (PI) has been proposed for detecting 'transients' in the speech signal. An extension of the PI, called the dynamic plosion index (DPI), is applied to the pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases which provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for its robustness in the presence of additive white and babble noise and on simulated telephone-quality speech. The performance of the DPI algorithm is found to be comparable to or better than that of five state-of-the-art techniques for the experiments considered.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Femtocells are a new concept which improves the coverage and capacity of a cellular system. We consider the problem of channel allocation and power control for different users within a Femtocell. Knowing the channels available, the channel states, and the rate requirements of different users, the Femtocell base station (FBS) allocates the channels to different users to satisfy their requirements. Also, the Femtocell should use minimal power so as to cause the least interference to its neighboring Femtocells and outside users. We develop efficient, low-complexity algorithms which can be used online by the Femtocell. The users may want to transmit data or voice. We compare our algorithms with the optimal solutions.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The goal of whisper activity detection (WAD) is to find the whispered speech segments in a given noisy recording of whispered speech. Since whispering lacks periodic glottal excitation, it resembles unvoiced speech. This noise-like nature of whispered speech makes WAD a more challenging task than a typical voice activity detection (VAD) problem. In this paper, we propose a feature for WAD based on the long-term variation of the logarithm of the short-time sub-band signal energy. We also propose an automatic sub-band selection algorithm to maximally discriminate noisy whisper from noise. Experiments with eight noise types in four different signal-to-noise ratio (SNR) conditions show that, for most of the noises, the performance of the proposed WAD scheme is significantly better than that of the existing VAD schemes and whisper detection schemes when used for WAD.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Speech polarity detection is a crucial first step in many speech processing techniques. In this paper, an algorithm is proposed that improves upon the existing technique that uses the skewness of the voice source (VS) signal. Here, the integrated linear prediction residual (ILPR) is used as the VS estimate, which is obtained using linear prediction on long-term frames of the low-pass filtered speech signal. This excludes the unvoiced regions from analysis and also reduces the computation. Further, a modified skewness measure is proposed for the decision, which considers the magnitude of the skewness of the ILPR along with its sign. With the detection error rate (DER) as the performance metric, the algorithm is tested on 8 large databases, and its performance (DER=0.20%) is found to be comparable to that of the best technique (DER=0.06%) on both clean and noisy speech. Further, the proposed method is found to be ten times faster than the best technique.
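The basic skewness-sign decision that the abstract builds on can be sketched as below. This is a simplified illustration and not the proposed method: it uses the plain sample skewness rather than the modified measure, it takes an already computed voice source estimate as input instead of deriving the ILPR, and the mapping from skewness sign to polarity is an assumed convention. The names `skewness` and `detect_polarity` are hypothetical.

```python
def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def detect_polarity(source_estimate):
    """Return +1 or -1 from the sign of the skewness.
    ASSUMED convention: sharp negative-going pulses (negative skewness)
    are treated as positive polarity."""
    return 1 if skewness(source_estimate) < 0 else -1

# Toy source-like signal: one sharp negative pulse per 40-sample period.
toy_source = [-1.0 if n % 40 == 0 else 0.0 for n in range(400)]
inverted = [-v for v in toy_source]
```

With this convention, `detect_polarity(toy_source)` and `detect_polarity(inverted)` give opposite signs, which is the property a polarity detector needs before any magnitude-based refinement is added.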