23 resultados para Speech synthesis Data processing

em Universidad Politécnica de Madrid


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditional Text-To-Speech (TTS) systems have been developed using especially-designed non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

When designing human-machine interfaces it is important to consider not only the bare bones functionality but also the ease of use and accessibility it provides. When talking about voice-based inter- faces, it has been proven that imbuing expressiveness into the synthetic voices increases signi?cantly its perceived naturalness, which in the end is very helpful when building user friendly interfaces. This paper proposes an adaptation based expressiveness transplantation system capable of copying the emotions of a source speaker into any desired target speaker with just a few minutes of read speech and without requiring the record- ing of additional expressive data. This system was evaluated through a perceptual test for 3 speakers showing up to an average of 52% emotion recognition rates relative to the natural voice recognition rates, while at the same time keeping good scores in similarity and naturality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the biggest challenges in speech synthesis is the production of naturally sounding synthetic voices. This means that the resulting voice must be not only of high enough quality but also that it must be able to capture the natural expressiveness imbued in human speech. This paper focus on solving the expressiveness problem by proposing a set of different techniques that could be used for extrapolating the expressiveness of proven high quality speaking style models into neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes of training data are used in each style for this paper). For the final implementation, a set of 4 speaking styles were considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementation of the 5 techniques were tested through a perceptual evaluation that proves that the deviations between neutral and speaking style average models can be learned and used to imbue expressiveness into target neutral speakers as intended.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In a series of attempts to research and document relevant sloshing type phenomena, a series of experiments have been conducted. The aim of this paper is to describe the setup and data processing of such experiments. A sloshing tank is subjected to angular motion. As a result pressure registers are obtained at several locations, together with the motion data, torque and a collection of image and video information. The experimental rig and the data acquisition systems are described. Useful information for experimental sloshing research practitioners is provided. This information is related to the liquids used in the experiments, the dying techniques, tank building processes, synchronization of acquisition systems, etc. A new procedure for reconstructing experimental data, that takes into account experimental uncertainties, is presented. This procedure is based on a least squares spline approximation of the data. Based on a deterministic approach to the first sloshing wave impact event in a sloshing experiment, an uncertainty analysis procedure of the associated first pressure peak value is described.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the advancement of both, information technology in general, and databases in particular; data storage devices are becoming cheaper and data processing speed is increasing. As result of this, organizations tend to store large volumes of data holding great potential information. Decision Support Systems, DSS try to use the stored data to obtain valuable information for organizations. In this paper, we use both data models and use cases to represent the functionality of data processing in DSS following Software Engineering processes. We propose a methodology to develop DSS in the Analysis phase, respective of data processing modeling. We have used, as a starting point, a data model adapted to the semantics involved in multidimensional databases or data warehouses, DW. Also, we have taken an algorithm that provides us with all the possible ways to automatically cross check multidimensional model data. Using the aforementioned, we propose diagrams and descriptions of use cases, which can be considered as patterns representing the DSS functionality, in regard to DW data processing, DW on which DSS are based. We highlight the reusability and automation benefits that this can be achieved, and we think this study can serve as a guide in the development of DSS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In order to obtain more human like sounding humanmachine interfaces we must first be able to give them expressive capabilities in the way of emotional and stylistic features so as to closely adequate them to the intended task. If we want to replicate those features it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The proposed additional layer is the modification of the glottal model, for which we make use of the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining 95% recognition rates on styled speaking and 82% on emotional speech. Then we evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally we propose a speaking styles separation for Spanish based on prosodic features and check its perceptual significance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El uso universal de síntesis de voz en diferentes aplicaciones requeriría un desarrollo sencillo de las nuevas voces con poca intervención manual. Teniendo en cuenta la cantidad de datos multimedia disponibles en Internet y los medios de comunicación, un objetivo interesante es el desarrollo de herramientas y métodos para construir automáticamente las voces de estilo de varios de ellos. En un trabajo anterior se esbozó una metodología para la construcción de este tipo de herramientas, y se presentaron experimentos preliminares con una base de datos multiestilo. En este artículo investigamos más a fondo esta tarea y proponemos varias mejoras basadas en la selección del número apropiado de hablantes iniciales, el uso o no de filtros de reducción de ruido, el uso de la F0 y el uso de un algoritmo de detección de música. Hemos demostrado que el mejor sistema usando un algoritmo de detección de música disminuye el error de precisión 22,36% relativo para el conjunto de desarrollo y 39,64% relativo para el montaje de ensayo en comparación con el sistema base, sin degradar el factor de mérito. La precisión media para el conjunto de prueba es 90.62% desde 76.18% para los reportajes de 99,93% para los informes meteorológicos.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

PAMELA (Phased Array Monitoring for Enhanced Life Assessment) SHMTM System is an integrated embedded ultrasonic guided waves based system consisting of several electronic devices and one system manager controller. The data collected by all PAMELA devices in the system must be transmitted to the controller, who will be responsible for carrying out the advanced signal processing to obtain SHM maps. PAMELA devices consist of hardware based on a Virtex 5 FPGA with a PowerPC 440 running an embedded Linux distribution. Therefore, PAMELA devices, in addition to the capability of performing tests and transmitting the collected data to the controller, have the capability of perform local data processing or pre-processing (reduction, normalization, pattern recognition, feature extraction, etc.). Local data processing decreases the data traffic over the network and allows CPU load of the external computer to be reduced. Even it is possible that PAMELA devices are running autonomously performing scheduled tests, and only communicates with the controller in case of detection of structural damages or when programmed. Each PAMELA device integrates a software management application (SMA) that allows to the developer downloading his own algorithm code and adding the new data processing algorithm to the device. The development of the SMA is done in a virtual machine with an Ubuntu Linux distribution including all necessary software tools to perform the entire cycle of development. Eclipse IDE (Integrated Development Environment) is used to develop the SMA project and to write the code of each data processing algorithm. This paper presents the developed software architecture and describes the necessary steps to add new data processing algorithms to SMA in order to increase the processing capabilities of PAMELA devices.An example of basic damage index estimation using delay and sum algorithm is provided.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, devices that monitor the health of structures consume a lot of power and need a lot of time to acquire, process, and send the information about the structure to the main processing unit. To decrease this time, fast electronic devices are starting to be used to accelerate this processing. In this paper some hardware algorithms implemented in an electronic logic programming device are described. The goal of this implementation is accelerate the process and diminish the information that has to be send. By reaching this goal, the time the processor needs for treating all the information is reduced and so the power consumption is reduced too.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes an emotion transplantation method capable of modifying a synthetic speech model through the use of CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while maintaining the identity of the original speaker as much as possible. The proposed method relies on learning both emotional and speaker identity information by means of their adaptation function from an average voice model, and combining them into a single cascade transform capable of imbuing the desired emotion into the target speaker. This method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into 3 male speakers and 3 female speakers and evaluated in a number of perceptual tests. The results of the evaluations show how the perceived naturalness for emotional text significantly favors the use of the proposed transplanted emotional speech synthesis when compared to traditional neutral speech synthesis, evidenced by a big increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows how by using emotional speech we can significantly increase the students’ satisfaction with the dialog system, proving how the proposed emotion transplantation system provides benefits in real applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes the text normalization module of a text to speech fully-trainable conversion system and its application to number transcription. The main target is to generate a language independent text normalization module, based on data instead of on expert rules. This paper proposes a general architecture based on statistical machine translation techniques. This proposal is composed of three main modules: a tokenizer for splitting the text input into a token graph, a phrase-based translation module for token translation, and a post-processing module for removing some tokens. This architecture has been evaluated for number transcription in several languages: English, Spanish and Romanian. Number transcription is an important aspect in the text normalization problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of information (localization) to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC) based on the relative energy of the signal arriving at each channel compared to the sum of the energy of all the channels. We have demonstrated that by joining the ICC features and the TDOA features, the robustness of the localization features is improved and that the diarization error rate (DER) of the complete system (using localization and spectral features) has been reduced. By using this new localization feature, we have been able to achieve a 5.2% DER relative improvement in our development data, a 3.6% DER relative improvement in the RT07 evaluation data and a 7.9% DER relative improvement in the last year's RT09 evaluation data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in the recent years and the main focus of interest these days. This description is classified in five main areas: audio processing including speech, speaker characterization, speech and language processing, text to speech conversion and spoken language applications. This paper also introduces the Spanish Network of Speech Technologies (RTTH. Red Temática en Tecnologías del Habla) as the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities developed in the last years.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a categorization module for improving the performance of a Spanish into Spanish Sign Language (LSE) translation system. This categorization module replaces Spanish words with associated tags. When implementing this module, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not relevant in the translation process. The categorization module has been incorporated into a phrase-based system and a Statistical Finite State Transducer (SFST). The evaluation results reveal that the BLEU has increased from 69.11% to 78.79% for the phrase-based system and from 69.84% to 75.59% for the SFST.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper reports on an innovative approach that aims to reduce information management costs in data-intensive and cognitively-complex biomedical environments. Recognizing the importance of prominent high-performance computing paradigms and large data processing technologies as well as collaboration support systems to remedy data-intensive issues, it adopts a hybrid approach by building on the synergy of these technologies. The proposed approach provides innovative Web-based workbenches that integrate and orchestrate a set of interoperable services that reduce the data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities.