14 results for Performing voice
at Universidad Politécnica de Madrid
Abstract:
Voice biometry is classically based mainly on the parameterization and patterning of speech features. The present approach is based instead on the characterization of phonation (glottal) features. The intention is to reduce intra-speaker variability due to the 'text'. Through the study of larynx biomechanics it may be seen that the glottal correlates constitute a family of second-order Gaussian wavelets. The methodology relies on the extraction of glottal correlates (the glottal source), which are parameterized using wavelet techniques. Classification and pattern matching were carried out using Gaussian Mixture Models. Data of speakers from a balanced database and from NIST SRE HASR2 were used in verification experiments. Preliminary results are given and discussed.
Abstract:
The dramatic impact of neurodegenerative pathologies on quality of life is a growing concern. It is well known that many neurological diseases leave a fingerprint in voice and speech production. Many techniques have been designed for the detection, diagnosis and monitoring of neurological disease, but most of them are costly or difficult to extend to primary care medical services. The present paper shows how some neurological diseases can be traced at the level of phonation. The detection procedure would be based on a simple voice test. The availability of advanced tools and methodologies to monitor the organic pathology of voice would facilitate the implementation of these tests. The paper hypothesizes that some of the underlying mechanisms affecting voice production produce measurable correlates in vocal fold biomechanics. A general description is given of the methodological foundations of a voice analysis system that can estimate correlates of neurological disease. Some case studies are presented to illustrate the potential of the methodology for monitoring neurological diseases by voice.
Abstract:
The use of nonlinear analysis techniques in automatic voice pathology detection systems has gained popularity because of their ability to deal with the underlying nonlinear phenomena. In this respect, characterization using nonlinear analysis typically employs the classical Correlation Dimension and the largest Lyapunov Exponent, as well as some regularity quantifiers that measure system predictability. Regularity features mostly depend heavily on the correct choice of certain parameters. One of these, the delay time τ, is usually fixed at 1. Nonetheless, it has been stated that a unit τ cannot avoid linear correlation in the time series and hence may not correctly capture system nonlinearities. The present work therefore studies the influence of the τ parameter on the estimation of regularity features. Three τ estimates are considered: the baseline value 1; a τ based on the Average Automutual Information criterion; and a τ chosen from the embedding window. Test results obtained for pathological voice suggest that improved accuracy might be obtained by using a τ value different from 1, as it accounts for the underlying nonlinearities of the voice signal.
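As a rough illustration of the second option, the sketch below estimates the average mutual information between a signal and its delayed copy with a simple histogram, and takes τ at the first local minimum of that curve. The histogram estimator, the bin count, the first-minimum rule and the function names are assumptions made for illustration; the abstract does not say how the criterion was implemented.

```python
import math
from collections import Counter


def average_mutual_information(x, lag, bins=16):
    """Histogram estimate of I(x[t]; x[t+lag]) in bits (illustrative estimator)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width for constant signals
    # Map every sample to a bin index in [0, bins - 1].
    idx = [min(int((v - lo) / width), bins - 1) for v in x]
    pairs = list(zip(idx[:-lag], idx[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum p(a,b) * log2(p(a,b) / (p(a) p(b))) over the occupied cells.
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def first_minimum_delay(x, max_lag=50, bins=16):
    """Smallest lag at which the AMI curve has a local minimum (assumed rule)."""
    ami = [average_mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for i in range(1, len(ami) - 1):
        if ami[i] < ami[i - 1] and ami[i] <= ami[i + 1]:
            return i + 1  # ami[i] holds the value for lag i + 1
    return max_lag  # no interior minimum found; fall back to the largest lag tried
```

On real voice signals the histogram estimate is noisy, so the curve is often smoothed or the bin count adjusted before the first minimum is read off.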
Abstract:
Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media sharing sites could enable a new generation of speech-based systems able to cope with spontaneous and styled speech. This paper proposes an architecture to deal with realistic recordings and carries out some experiments on unsupervised speaker diarization. In order to maximize the speaker purity of the clusters while keeping high speaker coverage, the paper evaluates the F-measure of a diarization module, achieving high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
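The trade-off between purity and coverage can be sketched as follows. The example assumes frame-level labels, the usual overlap-based definitions of cluster purity and speaker coverage, and an F-measure taken as their harmonic mean; the abstract does not give the exact definitions used, so these choices and the function name are illustrative only.

```python
from collections import Counter


def purity_coverage_f(reference, hypothesis):
    """Cluster purity, speaker coverage and their harmonic mean.

    reference:  true speaker label per frame
    hypothesis: cluster label per frame, as produced by the diarization module
    """
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("label sequences must be non-empty and equal in length")
    # Number of frames shared by each (cluster, speaker) pair.
    overlap = Counter(zip(hypothesis, reference))
    best_per_cluster = {}
    best_per_speaker = {}
    for (cluster, speaker), frames in overlap.items():
        best_per_cluster[cluster] = max(best_per_cluster.get(cluster, 0), frames)
        best_per_speaker[speaker] = max(best_per_speaker.get(speaker, 0), frames)
    total = len(reference)
    purity = sum(best_per_cluster.values()) / total    # clusters dominated by one speaker
    coverage = sum(best_per_speaker.values()) / total  # speakers kept in one cluster
    f_measure = 2 * purity * coverage / (purity + coverage)
    return purity, coverage, f_measure
```

Splitting one speaker across many clusters keeps purity high but lowers coverage, which is why a combined score is useful when tuning the number of clusters.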
Abstract:
The present project undertakes an acoustic study of the church of Santa María del Castillo, the main temple of Campo Real, in the south-east of Madrid. It was built in several phases between the 14th and 17th centuries, and the resulting presence of several architectural styles (Gothic, Renaissance and Baroque) makes it an interesting subject for this study. The building was recognised as a Historic-Artistic Monument for its architectural value in 1981. Complete measurements of the interior of the church were taken to build a computational 3D model, which has been used to perform acoustic simulations with the software EASE (version 4.3). Background noise measurements were taken inside the church under different ambient conditions and used to improve the reliability of the computational model. Furthermore, the model has been provided with software-generated absorption coefficients for each material present in the interior, and the characteristics of the loudspeakers of the church's public address system have been taken into account. The resulting 3D model characterises the acoustic conditions of the church and allows the sound properties inside the temple to be studied. Parameters such as reverberation time and sound pressure level were calculated through simulations to describe the sound field inside the building. Other parameters, such as the Articulation Loss of Consonants (ALCons) and the Speech Transmission Index (STI), were studied to derive information about speech intelligibility inside the church. Finally, conclusions about the acoustic characteristics of the building are drawn from the simulation results.
The main conclusion of the present study is that the temple is not an appropriate place for listening to speech or music, owing to its dimensions and to the construction materials, which are highly reflective to sound. The reverberant field clearly predominates over the direct field throughout the audience area.
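The link between a large volume, low absorption and a long reverberation time can be illustrated with Sabine's formula, RT60 ≈ 0.161·V/A, where V is the room volume in m³ and A the total absorption in m² (surface area times absorption coefficient, summed over surfaces). This is only a hand estimate and not the method of the study, which relied on EASE simulations; the function name and the figures in the usage line are hypothetical.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine estimate of reverberation time in seconds.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


# Hypothetical room: 1000 m^3 with 500 m^2 of hard, reflective surface (alpha = 0.1).
rt60 = sabine_rt60(1000.0, [(500.0, 0.1)])
```

Halving the absorption coefficients doubles the estimate, which matches the qualitative conclusion that reflective materials lengthen reverberation.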
Abstract:
Analysis of wave attenuation by a cargo ship working as a floating breakwater, and application to two cases of port and coastal protection. The effectiveness of a bulk carrier working as a detached floating breakwater to protect a stretch of coast and form salients or tombolos is assessed in this paper. Experiments were conducted at the CEDEX facilities in Madrid in a 30 m long, 3 m wide flume at 1/150 scale. The bulk carrier is 205 m long, 29 m wide and 18 m high, with a draught of 13 m, and was subjected to irregular waves with significant heights from 2 m to 4 m and peak periods from 6 s to 12 s at a depth of 15 m (all prototype dimensions). Three probes were placed between the wave paddle and the ship to record incident and reflected waves, and four probes were placed between the ship and the coastline to measure the transmitted waves. Transmission, reflection and dissipation coefficients (Ct, Cr, Cd) were calculated to determine wave attenuation. Results show good shelter in the lee of the ship, with values of Ct under 0.5 for peak periods from 6 s to 11 s. In addition, forces on the mooring chains were measured, showing maximum values of about 2000 tons at a 10 s peak period. Finally, two analytical models were used to determine the shoreline's response to the ship's protection and to assess the possible formation of salients or tombolos. According to the results, salients, but not tombolos, are formed in all tests.
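A minimal sketch of how the three coefficients relate, assuming the common convention in which Ct and Cr are ratios of transmitted and reflected wave height to incident wave height and Cd follows from the energy balance Ct² + Cr² + Cd² = 1. The paper may define Cd differently, and the wave heights in the usage line are hypothetical.

```python
import math


def wave_coefficients(h_incident, h_reflected, h_transmitted):
    """Return (Ct, Cr, Cd) from incident, reflected and transmitted wave heights."""
    if h_incident <= 0:
        raise ValueError("incident wave height must be positive")
    ct = h_transmitted / h_incident
    cr = h_reflected / h_incident
    # Energy scales with height squared; what is neither transmitted
    # nor reflected is attributed to dissipation.
    dissipated = 1.0 - ct**2 - cr**2
    if dissipated < -1e-9:
        raise ValueError("transmitted plus reflected energy exceeds incident energy")
    cd = math.sqrt(max(0.0, dissipated))
    return ct, cr, cd


# Hypothetical test: 3 m incident, 1.5 m reflected, 1.2 m transmitted.
ct, cr, cd = wave_coefficients(3.0, 1.5, 1.2)
```

With this convention a Ct below 0.5 means that less than a quarter of the incident wave energy reaches the lee of the ship.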
Abstract:
Teaching the adequate use of the singing voice involves a great deal of knowledge of musical performance as well as of objective estimation techniques involving the use of air, muscles, room and body acoustics, and the tuning of an instrument as fine as the human voice. Although subjective evaluation and training is a very delicate task to be carried out only by expert singers, biomedical engineering may help by contributing well-founded methodologies developed for the study of voice pathology. The present work is a preliminary, exploratory study describing the performance of a student singer in a regular classroom from the point of view of vocal fold biomechanics. Estimates of biomechanical parameters obtained from the singing voice are given and their potential use is discussed.
Abstract:
A case study of vocal fold paralysis treatment is described with the help of the voice quality analysis application BioMet®Phon. The case is that of a 40-year-old female patient who was diagnosed with vocal fold paralysis following a cardiopulmonary intervention that required intubation for 8 days and a subsequent tracheotomy for 15 days. The patient presented breathy and asthenic phonation, and dysphagia. Six main examinations were conducted over the full year that the treatment lasted, consisting of periodic reviews that included video-endostroboscopy, voice analysis and breathing function monitoring. The phoniatric treatment included 20 sessions of vocal rehabilitation, followed by an intracordal infiltration with Radiesse 8 months after the rehabilitation treatment started, followed by 6 further sessions of rehabilitation. The videoendoscopy and the voice quality analysis show a substantial improvement in vocal function, with recovery in all the measures estimated (jitter, shimmer, mucosal wave contents, glottal closure, harmonic contents and biomechanical function analysis). The paper reports the procedure followed and the results obtained by comparing the longitudinal progression of the treatment, illustrating the utility of voice quality analysis tools in speech therapy.
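Two of the measures listed, jitter and shimmer, are often computed as the mean absolute difference between consecutive cycles divided by the mean value, applied to glottal cycle periods (jitter) or cycle peak amplitudes (shimmer). The sketch below uses that common "local" definition; it is not taken from BioMet®Phon, whose exact formulas are not given in the abstract, and the sample values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle change divided by the mean value.

    Pass cycle periods to obtain local jitter, or cycle peak amplitudes
    to obtain local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_step = sum(steps) / len(steps)
    mean_value = sum(values) / len(values)
    return mean_step / mean_value


# Hypothetical glottal cycle periods (ms) and peak amplitudes (arbitrary units).
jitter = relative_perturbation([5.00, 5.04, 4.98, 5.02])
shimmer = relative_perturbation([0.80, 0.78, 0.81, 0.79])
```

Tracking such values across the six examinations is one simple way to follow the longitudinal progression that the case study describes.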
Abstract:
Teaching the adequate use of the singing voice involves a great deal of knowledge of musical performance as well as of objective estimation techniques involving the use of air, muscles, room and body acoustics, and the tuning of an instrument as fine as the human voice. Although subjective evaluation and training is a very delicate task to be carried out only by expert singers, biomedical engineering may help by contributing well-founded methodologies developed for the study of voice pathology. The present study is a preliminary, exploratory study describing the performance of a student singer in a regular classroom from the point of view of vocal fold biomechanics. Estimates of biomechanical parameters obtained from the singing voice are given and their use in the classroom is discussed.
Abstract:
Voice therapies for muscle tension dysphonia in Germany need to become more effective through the application of intensive, manualized procedures and standardized assessment protocols. The same holds true for therapies for disturbed singers' voices. According to a Cochrane review on the effectiveness of therapies for functional dysphonia, neither direct nor indirect voice therapies alone are effective, but combinations of both elements are (Ruotsalainen et al., 2007).
Abstract:
The universal use of speech synthesis in different applications would require new voices to be developed easily, with little manual intervention. Given the amount of multimedia data available on the Internet and in the media, an interesting goal is the development of tools and methods to build multi-style voices from such data automatically. In previous work, a methodology for building this kind of tool was outlined and preliminary experiments with a multi-style database were presented. In this paper we investigate the task further and propose several improvements based on selecting the appropriate number of initial speakers, the use or not of noise-reduction filters, the use of F0, and the use of a music detection algorithm. We have shown that the best system, which uses a music detection algorithm, reduces the precision error by 22.36% relative on the development set and by 39.64% relative on the test set compared with the baseline system, without degrading the figure of merit. The average precision on the test set is 90.62%, ranging from 76.18% for reports to 99.93% for weather forecasts.
Abstract:
Acoustic parameters are frequently used to assess the presence of pathologies in the human voice. Many of them have proven useful, but in some cases their results could be improved by selecting appropriate working margins. In this study, two indices obtained from modulation spectra, CIL and RALA, are described and tuned using different frame lengths and frequency ranges to maximize the AUC in normal versus pathological voice detection. After the tuning process, the AUC reaches 0.96 for CIL and 0.95 for RALA, an improvement of 16 % and 12 % respectively with respect to the typical tuning based only on frame length selection.
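The tuning criterion can be sketched as follows: the AUC is computed as the probability that a pathological voice receives a higher index value than a normal one, and the setting with the highest AUC is kept. How CIL and RALA themselves are computed is not described in the abstract, so the sketch takes index values as given; the function names and the assumption that higher values indicate pathology are illustrative.

```python
def auc(normal_scores, pathological_scores):
    """Probability that a pathological voice scores higher than a normal one."""
    if not normal_scores or not pathological_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pathological_scores:
        for n in normal_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(normal_scores) * len(pathological_scores))


def best_setting(scores_by_setting):
    """Return the setting (e.g. a frame length and frequency range) with the highest AUC.

    scores_by_setting maps a setting to a (normal_scores, pathological_scores) pair.
    """
    return max(scores_by_setting, key=lambda s: auc(*scores_by_setting[s]))
```

The pairwise count is quadratic in the number of recordings, which is acceptable for a sketch; a rank-based computation would be preferable for large databases.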
Abstract:
This project presents a network solution for a company that provides Contact Center services from several geographically distributed sites, using Telephony over IP (ToIP) technology. The goal of this project is to serve as a design guide for deploying network solutions using current communications equipment developed by the manufacturer Cisco Systems, Inc., security equipment developed by the manufacturer Fortinet, and telephony systems developed by Avaya Inc. and Oracle Corporation, chosen for their strong market penetration and for the contributions each has made to the Contact Center sector.
In order to interconnect its different sites, the Contact Center contracts access to the MPLS network of a telecommunications operator, which provides connectivity between the sites using MPLS VPN technology, with two diverse accesses from each Contact Center site. This arrangement lets the Contact Center take advantage of what a telecommunications operator can offer its customers in terms of quality of service, availability and geographical reach. Likewise, a set of service levels (a Service Level Agreement, SLA) is defined to ensure quality communication between the Contact Center's sites. Quality communication is understood as communication that can be transmitted with minimal packet loss and transmission delay, and at a speed that matches the demand of the voice and data services. As part of the solution, a redundant Internet connection is designed to provide access to every Contact Center site. The local connectivity solution at each site is designed in a general way, according to the number of user positions and the scalability each site may require. Accordingly, several options based on the current equipment offered by the manufacturer Cisco Systems, Inc. are presented. As part of the solution, quality criteria are defined for the choice of Data Centers. A Contact Center has connections to and from the client companies it serves, and it provides network access to its teleworkers. This requirement, together with the access and services published on the Internet, calls for a security infrastructure. The result is a solution design that unifies all connections under a single infrastructure, separating each service logically or virtually. Likewise, the use of protocols such as 802.1X is defined to prevent unauthorized access to the Contact Center's network. The chosen voice solution is heterogeneous and supports the best-known signaling protocols (SIP and H.323), in order to have maximum flexibility when establishing Voice over IP links (IP trunks) with suppliers and clients.
This is achieved through the use of SBCs and an internal voice infrastructure based on equipment from Avaya Inc. The VoIP systems in a Contact Center are the key elements for providing the service; for this reason a redundant solution in a virtual environment has been chosen. This solution allows the VoIP system to be deployed from any of the Contact Center's Data Centers. The solution developed in this project is mainly based on my own professional experience, acquired over the past seven years in the communications department of a Contact Center company. I have taken into account the main requirements that most companies demand nowadays when they contract a Contact Center service. This project is divided into four chapters. The first chapter is an introduction that explains the main business scenarios and technical areas required to provide Contact Center services. The second chapter briefly describes the key technologies and protocols that will be used to design the technical solution for the communications network of a Contact Center company. The third chapter presents the technical solution that allows a Contact Center company to provide its services from geographically distributed locations, using two Data Centers where voice and data applications are centralized. Lastly, the fourth chapter presents the conclusions reached in preparing this report, as well as a proposal for future work which, together with the current project, would make up a complete technical solution covering the other technology areas needed in a Contact Center company. All illustrations and tables in this project are my own work, based on my professional experience and on information obtained in various formats from the bibliography consulted, except where a source is indicated.
Abstract:
Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometric speaker characterization. In the present paper, phonation features derived from the parameterization of the glottal source (GS), after vocal tract inversion, are proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephone database of 100 male native Spanish speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 % for a wide threshold interval (62.08-75.04 %). An Equal Error Rate of 0.64 % can be achieved. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense versus that of the prosecution, thus offering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading.
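For readers unfamiliar with the Equal Error Rate, the sketch below shows one simple way to estimate it from two lists of similarity scores: the threshold is swept over the observed scores and the point where the false acceptance and false rejection rates are closest is reported. The scoring convention (higher means more likely the same speaker), the function name and the example scores are assumptions; the paper's distance-based metric is not reproduced here.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) where false acceptance and false rejection are closest.

    genuine:  scores of same-speaker trials
    impostor: scores of different-speaker trials
    A trial is accepted when its score is at or above the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine trials rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostor trials accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores for four genuine and four impostor trials.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```

With few trials the two error rates rarely cross exactly, so the midpoint at the closest threshold is used as the estimate.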