14 resultados para Voice messages
em Universidad Politécnica de Madrid
Resumo:
Voice biometry is classically based on the parameterization and patterning of speech features mainly. The present approach is based on the characterization of phonation features instead (glottal features). The intention is to reduce intra-speaker variability due to the `text'. Through the study of larynx biomechanics it may be seen that the glottal correlates constitute a family of 2-nd order gaussian wavelets. The methodology relies in the extraction of glottal correlates (the glottal source) which are parameterized using wavelet techniques. Classification and pattern matching was carried out using Gaussian Mixture Models. Data of speakers from a balanced database and NIST SRE HASR2 were used in verification experiments. Preliminary results are given and discussed.
Resumo:
The dramatic impact of neurological degenerative pathologies in life quality is a growing concern. It is well known that many neurological diseases leave a fingerprint in voice and speech production. Many techniques have been designed for the detection, diagnose and monitoring the neurological disease. Most of them are costly or difficult to extend to primary attention medical services. Through the present paper it will be shown how some neurological diseases can be traced at the level of phonation. The detection procedure would be based on a simple voice test. The availability of advanced tools and methodologies to monitor the organic pathology of voice would facilitate the implantation of these tests. The paper hypothesizes that some of the underlying mechanisms affecting the production of voice produce measurable correlates in vocal fold biomechanics. A general description of the methodological foundations for the voice analysis system which can estimate correlates to the neurological disease is shown. Some study cases will be presented to illustrate the possibilities of the methodology to monitor neurological diseases by voice
Resumo:
The employment of nonlinear analysis techniques for automatic voice pathology detection systems has gained popularity due to the ability of such techniques for dealing with the underlying nonlinear phenomena. On this respect, characterization using nonlinear analysis typically employs the classical Correlation Dimension and the largest Lyapunov Exponent, as well as some regularity quantifiers computing the system predictability. Mostly, regularity features highly depend on a correct choosing of some parameters. One of those, the delay time �, is usually fixed to be 1. Nonetheless, it has been stated that a unity � can not avoid linear correlation of the time series and hence, may not correctly capture system nonlinearities. Therefore, present work studies the influence of the � parameter on the estimation of regularity features. Three � estimations are considered: the baseline value 1; a � based on the Average Automutual Information criterion; and � chosen from the embedding window. Testing results obtained for pathological voice suggest that an improved accuracy might be obtained by using a � value different from 1, as it accounts for the underlying nonlinearities of the voice signal.
Resumo:
Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media sharing sites would allow developing a new generation of speech-based systems which could cope with spontaneous and styled speech. This paper proposes an architecture to deal with realistic recordings and carries out some experiments on unsupervised speaker diarization. In order to maximize the speaker purity of the clusters while keeping a high speaker coverage, the paper evaluates the F-measure of a diarization module, achieving high scores (>85%) especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
Resumo:
This paper describes a knowledge model for a configuration problem in the do-main of traffic control. The goal of this model is to help traffic engineers in the dynamic selection of a set of messages to be presented to drivers on variable message signals. This selection is done in a real-time context using data recorded by traffic detectors on motorways. The system follows an advanced knowledge-based solution that implements two abstract problem solving methods according to a model-based approach recently proposed in the knowledge engineering field. Finally, the paper presents a discussion about the advantages and drawbacks found for this problem as a consequence of the applied knowledge modeling ap-proach.
Resumo:
La aparición de los smartphones, trajo consigo el desarrollo de aplicaciones móviles de mensajería instantánea. Estas aplicaciones aprovechan la infraestructura de las redes de datos para enviar los mensajes de unos dispositivos a otros, lo que supone la posibilidad de enviar mensajes ilimitados a bajo coste. Hoy en día lo inusual es ver a alguna persona que haga uso de los antiguos mensajes de texto o sms (Short Message Service), que además llevan el coste de comunicación definido por las distintas operadoras. Tanto ha sido su auge que se ha convertido en uno de los principales medios de comunicación tanto en el ámbito personal como empresarial. Desafortunadamente, cada vez son más los conductores que hacen uso de las aplicaciones de mensajería para enviar y recibir mensajes mientras conducen, a pesar de que su uso está totalmente prohibido y penado por la ley. Por este motivo, en este proyecto se propone la modificación de la aplicación de mensajería Telegram, que permite controlar el env´ıo y recepción de mensajes únicamente utilizando la voz, evitando así cualquier tipo de distracci´on ocasionada por la interacción táctil con el dispositivo. Esta idea propuesta en el proyecto puede ayudar a reducir el número de accidentes ocasionados por este tipo de distracciones al volante, así como las posibles multas e incidentes que pueda ocasionar el uso del móvil durante la conducción. ---ABSTRACT---The emergence of smartphones, fostered the development of mobile instant messaging applications. These applications take advantage of the infrastructure of data networks to send messages between devices with almost no additional cost attached to it. Today you will hardly be able to find a person who makes use of the old text messages or sms (Short Message Service), and therefore bears the cost of communication defined by the respective operators. This boom has been such that it has become one of the main communication methods or channels in both the personal and work environments. Unfortunately, more and more drivers use messaging applications to send and receive messages while they are driving, even though its use is strictly prohibited and punished by law. Therefore our objective is to modify the existing messaging application Telegram allowing interaction with the mobile device by only using the user’s voice to send and receive messages, avoiding any distractions that any tactile interaction with the device could cause. The aim is to significantly try to reduce accidents caused while driving, as well as to avoid any related potential fines and incidents that may result from use of mobile phone while driving.
Resumo:
Teaching the adequate use of the singing voice conveys a lot of knowledge in musical performance as well as in objective estimation techniques involving the use of air, muscles, room and body acoustics, and the tuning of a fine instrument as the human voice. Although subjective evaluation and training is a very delicate task to be carried out only by expert singers, biomedical engineering may help contributing with well-funded methodologies developed for the study of voice pathology. The present work is a preliminary study of exploratory character describing the performance of a student singer in a regular classroom under the point of view of vocal fold biomechanics. Estimates of biomechanical parameters obtained from singing voice are given and their potential use is discussed.
Resumo:
A case study of vocal fold paralysis treatment is described with the help of the voice quality analysis application BioMet®Phon. The case corresponds to a description of a 40 - year old female patient who was diagnosed of vocal fold paralysis following a cardio - pulmonar intervention which required intubation for 8 days and posterior tracheotomy for 15 days. The patient presented breathy and asthenic phon ation, and dysphagia. Six main examinations were conducted during a full year period that the treatment lasted consisting in periodic reviews including video - endostroboscopy, voice analysis and breathing function monitoring. The phoniatrician treatment inc luded 20 sessions of vocal rehabilitation, followed by an intracordal infiltration with Radiesse 8 months after the rehabilitation treatment started followed by 6 sessions of rehabilitation more. The videondoscopy and the voicing quality analysis refer a s ubstantial improvement in the vocal function with recovery in all the measures estimated (jitter, shimmer, mucosal wave contents, glottal closure, harmonic contents and biomechanical function analysis). The paper refers the procedure followed and the results obtained by comparing the longitudinal progression of the treatment, illustrating the utility of voice quality analysis tools in speech therapy.
Resumo:
Teaching the adequate use of the singing voice conveys a lot of knowledge in musical performance as well as in objective estimation techniques involving the use of air, muscles, room and body acoustics, and the tuning of a fine instrument as the human voice. Although subjective evaluation and training is a very delicate task to be carried out only by expert singers, biomedical engineering may help contributing with well - funded methodologies developed for the study of voice pathology. The present study is a preliminary study of exploratory character describing the performance of a student singer in a regular classroom under the point of view of vocal fold biomechanics. Estimate s of biomechanical parameters obtained from singing voice are given and their use i n the classroom is discussed.
Resumo:
Voice therapies of muscle tension dysphonia in Germany need to be increased in effectiveness by applying intensive, manualized procedures and standardized assessment protocols. The same holds true for therapies of disturbed singer's voices. According to a Cochrane review on the effectiveness of therapies of functional dysphonia neither direct nor indirect voice therapies alone but combinations of both elements are effective (Ruotsalainen et al., 2007).
Resumo:
The broadcast service spreads a message m among all processes of the system, such that each process eventually delivers m. A basic broadcast service does not impose any delivery guarantee in a system with failures. Fault-tolerant broadcast is a fundamental problem in distributed systems that adds certainty in the delivery of messages when crashes can happen in the system. Traditionally, the fault-tolerant broadcast service has been studied in classical distributed systems when each process has a unique identity (eponymous system). In this paper we study the fault-tolerant broadcast service in anonymous systems, that is, in systems where all processes are indistinguishable.
Resumo:
El uso universal de síntesis de voz en diferentes aplicaciones requeriría un desarrollo sencillo de las nuevas voces con poca intervención manual. Teniendo en cuenta la cantidad de datos multimedia disponibles en Internet y los medios de comunicación, un objetivo interesante es el desarrollo de herramientas y métodos para construir automáticamente las voces de estilo de varios de ellos. En un trabajo anterior se esbozó una metodología para la construcción de este tipo de herramientas, y se presentaron experimentos preliminares con una base de datos multiestilo. En este artículo investigamos más a fondo esta tarea y proponemos varias mejoras basadas en la selección del número apropiado de hablantes iniciales, el uso o no de filtros de reducción de ruido, el uso de la F0 y el uso de un algoritmo de detección de música. Hemos demostrado que el mejor sistema usando un algoritmo de detección de música disminuye el error de precisión 22,36% relativo para el conjunto de desarrollo y 39,64% relativo para el montaje de ensayo en comparación con el sistema base, sin degradar el factor de mérito. La precisión media para el conjunto de prueba es 90.62% desde 76.18% para los reportajes de 99,93% para los informes meteorológicos.
Resumo:
Acoustic parameters are frequently used to assess the presence of pathologies in human voice. Many of them have demonstrated to be useful but in some cases its results could be optimized by selecting appropriate working margins. In this study two indices, CIL and RALA, obtained from Modulation Spectra are described and tuned using different frame lengths and frequency ranges to maximize AUC in normal to pathological voice detection. After the tuning process, AUC reaches 0.96 and 0.95 values for CIL and RALA respectively representing an improvement of 16 % and 12 % at each case respect to the typical tuning based only on frame length selection.
Resumo:
Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometrical speaker characterization. In the present paper phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, is proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephonic database of 100 male Spanish native speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64 % can be granted. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs that of the prosecution, thus ofering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading.