992 results for Inventory-style speech enhancement


Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present a new method for the enhancement of speech. The method is designed for scenarios in which targeted speaker enrollment as well as system training within the typical noise environment are feasible. The proposed procedure is fundamentally different from most conventional and state-of-the-art denoising approaches. Instead of filtering a distorted signal, we resynthesize a new “clean” signal based on its likely characteristics. These characteristics are estimated from the distorted signal. A successful implementation of the proposed method is presented. Experiments were performed in a scenario with roughly one hour of clean speech training data. Our results show that the proposed method compares very favorably to other state-of-the-art systems in both objective and subjective speech quality assessments. Potential applications for the proposed method include jet cockpit communication systems and offline methods for the restoration of audio recordings.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this work, an adaptive modeling and spectral estimation scheme based on a dual discrete Kalman filter (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric speech recognition systems that rely on spectral time-dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases, the use of this pre-processing module has led to improved results.
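
The state-space idea in this abstract can be illustrated with a much simpler case than the dual filter it describes. The sketch below is not the authors' method: it assumes a first-order autoregressive signal with known parameters (`a`, `q`, `r` are hypothetical names and values chosen for illustration) observed in white noise, and runs a single scalar Kalman filter. A dual filter would additionally estimate the model parameters, which is omitted here.

```python
import random


def kalman_ar1(observations, a, q, r):
    """Scalar Kalman filter for x[t] = a*x[t-1] + w, y[t] = x[t] + v.

    a: AR(1) coefficient (assumed known, |a| < 1)
    q: variance of the process noise w
    r: variance of the observation noise v
    """
    estimate = 0.0
    variance = q / (1.0 - a * a)  # stationary variance used as the prior
    filtered = []
    for y in observations:
        # Predict step: propagate the state through the AR(1) model.
        estimate = a * estimate
        variance = a * a * variance + q
        # Update step: blend the prediction with the noisy observation.
        gain = variance / (variance + r)
        estimate = estimate + gain * (y - estimate)
        variance = (1.0 - gain) * variance
        filtered.append(estimate)
    return filtered


def simulate(n, a, q, r, seed=0):
    """Generate a synthetic AR(1) signal and a noisy observation of it."""
    rng = random.Random(seed)
    clean, noisy = [], []
    x = 0.0
    for _ in range(n):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        clean.append(x)
        noisy.append(x + rng.gauss(0.0, r ** 0.5))
    return clean, noisy


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((p - s) ** 2 for p, s in zip(u, v)) / len(u)


clean, noisy = simulate(2000, a=0.95, q=0.1, r=1.0)
filtered = kalman_ar1(noisy, a=0.95, q=0.1, r=1.0)
print("noisy MSE:", mse(noisy, clean))
print("filtered MSE:", mse(filtered, clean))
```

On this synthetic data the filtered error should come out below the error of the raw observations, since the filter exploits the correlation between successive samples.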

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speech signals degraded by additive noise can affect different applications in telecommunication. The noise may degrade the intelligibility of the speech signals as well as their waveforms. In some applications, such as speech coding, both intelligibility and waveform quality are important, but only intelligibility has received attention lately. As a result, modern speech quality measurement techniques such as PESQ (Perceptual Evaluation of Speech Quality) have been widely adopted, while classical distortion measures such as Cepstral Distance are falling out of use. In this paper it is shown that some classical distortion measures are still important in applications where speech corrupted by additive noise has to be evaluated.
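
For readers unfamiliar with the classical measure named above, the following is a minimal sketch of one common definition of cepstral distance in decibels between two frames. The abstract does not say which variant the paper uses, so the convention here (skipping the energy coefficient c0 and truncating to the supplied coefficients) is an assumption, and the function name is hypothetical.

```python
import math


def cepstral_distance_db(c_ref, c_test):
    """Cepstral distance in dB between two cepstral coefficient vectors.

    Assumed convention: index 0 holds the energy term c0 and is ignored;
    the remaining coefficients are compared with the usual
    (10 / ln 10) * sqrt(2 * sum of squared differences) scaling.
    """
    if len(c_ref) != len(c_test):
        raise ValueError("coefficient vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_test[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


# Toy coefficient vectors (made-up values, for illustration only).
reference = [1.0, 0.50, -0.20, 0.10]
degraded = [0.9, 0.45, -0.10, 0.05]
print(cepstral_distance_db(reference, degraded))
```

In practice the per-frame distances are averaged over all frames of an utterance to give a single distortion score.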

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes a speech enhancement system (SES) based on a TMS320C31 digital signal processor (DSP) for real-time application. The SES algorithm is based on a modified spectral subtraction method, and a new speech activity detector (SAD) is used. The system presents a medium computational load, and a sampling rate of up to 18 kHz can be used. The goal is to use it to reduce noise in an analog telephone line.
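
As background for this abstract, here is a minimal sketch of basic textbook magnitude spectral subtraction on a single frame. It is not the modified method or the speech activity detector described in the paper: the noise magnitude estimate is simply passed in (in a real system it would be measured during speech pauses), the `floor` parameter is a hypothetical spectral floor, and a naive DFT is used so the example needs only the standard library.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2), fine for short frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(frame, noise_mag, floor=0.02):
    """Subtract a per-bin noise magnitude estimate from one frame.

    The noisy phase is kept unchanged; magnitudes are clamped to
    floor * |Y(k)| so they never become negative.
    """
    cleaned = []
    for value, noise in zip(dft(frame), noise_mag):
        mag = abs(value)
        new_mag = max(mag - noise, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(value)))
    return idft(cleaned)


# Toy frame: a pure sinusoid, with a flat (assumed) noise magnitude estimate.
n = 32
frame = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
enhanced = spectral_subtract(frame, [1.0] * n)
print(enhanced[:4])
```

A complete system would apply this frame by frame with windowing and overlap-add, and would update the noise estimate whenever the activity detector reports a pause.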

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands on and threats to forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly and, as a consequence, spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting-edge remotely sensed and geospatial datasets is essential to sustainable forest management. We evaluated a novel Random Forest based k Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory collected by different sampling methods to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of the US Forest Service were integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for a part of the Lake States and species-specific site index maps for the entire Lake States. Targeting small-area application of state-of-the-art remote sensing, LiDAR (light detection and ranging) data were integrated with the field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive a standing volume map in a cost-effective way.
The outputs of the RF-kNN imputation were compared with independent validation datasets and extant map products based on different sampling and modeling strategies. The RF-kNN modeling approach was found to be very effective, especially for large-area estimation, and produced results statistically equivalent to the field observations or the estimates derived from secondary data sources. The models are useful to resource managers for operational and strategic purposes.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Recent developments in Hidden Markov Model (HMM) based speech synthesis have shown that this is a promising technology fully capable of competing with other established techniques. However, some issues still lack a solution. Several authors report an over-smoothing phenomenon in both time and frequency which decreases naturalness and sometimes intelligibility. In this work we present a new vowel intelligibility enhancement algorithm that uses a discrete Kalman filter (DKF) for tracking frame-based parameters. The inter-frame correlations are modelled by an autoregressive structure which provides an underlying time-frame dependency and can improve time-frequency resolution. The system’s performance has been evaluated using objective and subjective tests, and the proposed methodology has led to improved results.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The purpose of this study was to evaluate the factor structure and the reliability of the French versions of the Identity Style Inventory (ISI-3) and the Utrecht-Management of Identity Commitments Scale (U-MICS) in a sample of college students (N = 457, 18 to 25 years old). Confirmatory factor analyses confirmed the hypothesized three-factor solution of the ISI-3 identity styles (i.e. informational, normative, and diffuse-avoidant styles), the one-factor solution of the ISI-3 identity commitment, and the three-factor structure of the U-MICS (i.e. commitment, in-depth exploration, and reconsideration of commitment). Additionally, theoretically consistent and meaningful associations among the ISI-3, U-MICS, and Ego Identity Process Questionnaire (EIPQ) confirmed convergent validity. Overall, the results of the present study indicate that the French versions of the ISI-3 and U-MICS are useful instruments for assessing identity styles and processes, and provide additional support for the cross-cultural validity of these tools.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Kirton's Adaption-Innovation Inventory (KAI) is a widely-used measure of "cognitive style." Surprisingly, there is very little research investigating the discriminant and incremental validity of the KAI. In two studies (n = 213), we examined whether (a) we could predict KAI scores with the "big five" personality dimensions and (b) the KAI scores predicted leadership behavior when controlling for personality and ability. Correcting for measurement error, we found that KAI scores were predicted mostly by personality and gender (multiple R = 0.82). KAI scores did not predict variance in leadership while controlling for established predictors. Our findings add to recent literature that questions the uniqueness and utility of cognitive style or similar "style" constructs; researchers using such measures must control for the big five factors and correct for measurement error to avoid confounded interpretations.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In a leading service economy like India, services lie at the very center of economic activity. Competitive organizations now look not only at the skills and knowledge, but also at the behavior required by an employee to be successful on the job. Emotionally competent employees can effectively deal with occupational stress and maintain psychological well-being. This study explores the scope of the first two formants and jitter to assess seven common emotional states present in natural speech in English. The k-means method was used to classify emotional speech as neutral, happy, surprised, angry, disgusted and sad. The accuracy of classification obtained using raw jitter was more than 65 percent for happy and sad, but lower for the others. The overall classification accuracy was 72% in the case of preprocessed jitter. The experimental study was done on 1664 English utterances from 6 female speakers. This is a simple, interesting and more proactive method for employees from varied backgrounds to become aware of their own communication styles as well as those of their colleagues and customers, and is therefore socially beneficial. It is also an inexpensive method, as it requires only a computer. Since knowledge of sophisticated software or signal processing is not necessary, it is easy to analyze
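
The clustering step mentioned in this abstract can be sketched with a plain implementation of Lloyd's k-means algorithm. Everything specific below is an assumption made for illustration: the feature vectors (first formant in Hz, second formant in Hz, jitter in percent) are made-up toy values rather than data from the study, only two clusters are used, and the initial centroids are fixed so the run is deterministic.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm with caller-supplied initial centroids.

    Returns the cluster label of each point and the final centroids.
    """
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (F1 Hz, F2 Hz, jitter %) vectors; not taken from the study.
features = [
    (700.0, 1200.0, 0.5), (720.0, 1250.0, 0.6), (710.0, 1220.0, 0.4),
    (300.0, 2300.0, 1.5), (320.0, 2250.0, 1.6), (310.0, 2280.0, 1.4),
]
labels, centers = kmeans(features, [features[0], features[3]])
print(labels)
```

Because formants in Hz are numerically far larger than jitter percentages, a practical version would standardize each feature before clustering so that jitter is not swamped by the formant values.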

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Facebook is a medium of social interaction producing its own style. I study how users from Malaga create this style through phonic features of the local variety and how they reflect on the use of these features. I then analyse the use of non-standard features by users from Malaga and compare them to an oral corpus. Results demonstrate that social factors work differently in real and virtual speech. Facebook communication is seen as a style serving to create social meaning and to express linguistic identity.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Over the last few years Facebook has become a widespread and continuously expanding medium of communication. Being a new medium of social interaction, Facebook produces its own communication style. My focus of analysis is how Facebook users from the city of Malaga create this style by means of phonic features typical of the Andalusian variety and how the users reflect on the use of these phonic features. This project is based on a theoretical framework which combines variationist sociolinguistics with CMC to study the emergence of a style peculiar to online social networks. In a corpus of Facebook users from three zones of Malaga, I have analysed the use of non-standard phonic features and then compared them with the same features in a reference corpus collected on three beaches of Malaga. From this comparison it can be deduced that the analysed social and linguistic factors work differently in real and virtual speech. Because of these different uses, we can consider the peculiar electronic communication of Facebook as a style constrained by the electronic medium. It is a style which allows users to create social meaning and to express their linguistic identities.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. This means that the resulting voice must not only be of high enough quality but must also capture the natural expressiveness imbued in human speech. This paper focuses on solving the expressiveness problem by proposing a set of different techniques that could be used for extrapolating the expressiveness of proven high-quality speaking style models into neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes of training data are used in each style for this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementation of the 5 techniques was tested through a perceptual evaluation that proves that the deviations between neutral and speaking style average models can be learned and used to imbue expressiveness into target neutral speakers as intended.