22 results for Vocal quartets.

at Universidad Politécnica de Madrid


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Automatic systems based on speech signal analysis for the early detection of obstructive sleep apnea (OSA) have achieved fairly high performance rates in recent years. However, a satisfactory explanation of these results has not been available. This presentation aims to explain, through an examination of the long-term spectra of OSA patients and normal control speakers, these systems’ ability to identify OSA speakers on the basis of all-purpose cepstral coefficients. An interpretation of the long-term spectra in terms of the underlying vocal tract settings suggests that the speech of OSA patients is characterized by a pharyngeal narrowing that may be captured by acoustic cues in the spectral contour of windowed speech frames. A novel interpretation of long-term spectra in terms of the first principal component of the temporal sequence of short-term amplitude spectra is also discussed.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Neurological Diseases (ND) are affecting larger segments of the aging population every year. Treatment depends on expensive, accurate, and frequent monitoring. It is well known that ND leave correlates in speech and phonation. The present work shows a method to detect alterations in vocal fold tension during phonation. These may appear either as hypertension or as cyclical tremor. Estimates of tremor may be produced by auto-regressive modeling of the vocal fold tension series in sustained phonation. The correlates obtained are a set of cyclicality coefficients, the frequency, and the root mean square amplitude of the tremor. Statistical distributions of these correlates obtained from a set of male and female subjects are presented. Results from five case studies of female voice are also given.
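The tremor estimation described above can be illustrated with a minimal sketch. It assumes a second-order auto-regressive model fitted by least squares, which is a simplification chosen here for illustration; the abstract does not state the model order or the fitting method, and the cyclicality coefficients are not reproduced. The function names `fit_ar2` and `tremor_estimates`, and the sampling rate `fs` of the tension series, are hypothetical.

```python
import math


def fit_ar2(x):
    """Least-squares fit of x[n] = a1*x[n-1] + a2*x[n-2] + e[n] (assumed order 2)."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        r11 += x[n - 1] * x[n - 1]
        r12 += x[n - 1] * x[n - 2]
        r22 += x[n - 2] * x[n - 2]
        b1 += x[n] * x[n - 1]
        b2 += x[n] * x[n - 2]
    # Solve the 2x2 normal equations directly.
    det = r11 * r22 - r12 * r12
    a1 = (r22 * b1 - r12 * b2) / det
    a2 = (r11 * b2 - r12 * b1) / det
    return a1, a2


def tremor_estimates(series, fs):
    """Return (tremor frequency in Hz or None, RMS amplitude) of a tension series."""
    mean = sum(series) / len(series)
    x = [v - mean for v in series]  # remove the static tension level
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    a1, a2 = fit_ar2(x)
    # Poles of z^2 - a1*z - a2; real poles mean no oscillatory component.
    if a1 * a1 + 4.0 * a2 >= 0.0:
        return None, rms
    radius = math.sqrt(-a2)
    cos_theta = max(-1.0, min(1.0, a1 / (2.0 * radius)))
    freq = math.acos(cos_theta) * fs / (2.0 * math.pi)
    return freq, rms
```

The pole angle of the fitted model gives the tremor frequency, and the RMS of the mean-removed series gives its amplitude. A higher model order would be needed to separate several cyclic components.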

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A case study of vocal fold paralysis treatment is described with the help of the voice quality analysis application BioMet®Phon. The case is that of a 40-year-old female patient who was diagnosed with vocal fold paralysis following a cardiopulmonary intervention that required intubation for 8 days and subsequent tracheotomy for 15 days. The patient presented breathy and asthenic phonation, and dysphagia. Six main examinations were conducted over the full year that the treatment lasted, consisting of periodic reviews including video-endostroboscopy, voice analysis, and breathing function monitoring. The phoniatric treatment included 20 sessions of vocal rehabilitation, followed by an intracordal infiltration with Radiesse 8 months after the rehabilitation treatment started, followed by 6 more rehabilitation sessions. The videoendoscopy and the voice quality analysis show a substantial improvement in vocal function, with recovery in all the measures estimated (jitter, shimmer, mucosal wave contents, glottal closure, harmonic contents, and biomechanical function analysis). The paper reports the procedure followed and the results obtained by comparing the longitudinal progression of the treatment, illustrating the utility of voice quality analysis tools in speech therapy.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Gender detection is an important objective for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists of obtaining uncorrelated glottal and vocal tract components, which are parameterized as mel-frequency coefficients. K-fold cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used in a gender-balanced database of running speech from 340 speakers.
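As a rough illustration of the evaluation protocol, the sketch below partitions speaker indices into k folds for cross-validation. The abstract does not give the value of k or say whether folds were shuffled or stratified by gender, so the contiguous, unshuffled split and the name `kfold_indices` are assumptions made here; the QDA and GMM classifiers themselves are not shown.

```python
def kfold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional item each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


# Example: 340 speakers split into 10 folds (k = 10 is an assumed value).
folds = list(kfold_indices(340, 10))
```

With a gender-balanced database, a practical setup would typically shuffle the speakers and stratify each fold by gender before training and scoring a classifier on every (train, test) pair.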

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Teaching the proper use of the singing voice involves a great deal of knowledge of musical performance as well as of objective estimation techniques involving the use of air, muscles, room and body acoustics, and the tuning of an instrument as fine as the human voice. Although subjective evaluation and training is a very delicate task to be carried out only by expert singers, biomedical engineering may contribute well-founded methodologies developed for the study of voice pathology. The present work is a preliminary, exploratory study describing the performance of a student singer in a regular classroom from the point of view of vocal fold biomechanics. Estimates of biomechanical parameters obtained from the singing voice are given and their use in the classroom is discussed.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This project is based on the study by Jean Schoentgen, in which the author characterized vocal micro-tremor by means of the modulation index and the modulation frequency. The project uses Matlab to calculate these parameters and then analyzes the data obtained. The project is divided into three main parts. The first briefly explains the basic concepts of voice and important notions such as physiological tremor, pathological tremor, and vocal jitter, among others, and details the mathematical concepts used in developing the code. The aim is for the reader to have these concepts clear before the code is developed, so that the study is easier to follow. The explanations are deliberately brief, on the assumption that the reader has a basic engineering background; moreover, countless books cover each of these concepts more precisely. The second part develops the code. As mentioned above, Matlab was used: it is widely used in most courses of the degree, so the author has a good command of it, and it offers useful toolboxes that simplify the mathematical calculations. This part illustrates each stage of the code step by step, with plots of the voice signal as it passes through each stage. The last part collects the results computed from all the voice recordings, analyzes each of them, compares them with those of Jean Schoentgen’s study, and discusses the possible differences. ABSTRACT. This project is based on the research of Jean Schoentgen, in which vocal micro-tremor is characterized by the modulation frequency and the modulation index.
The project was carried out in Matlab to calculate these parameters, and the results were then compared with those from Jean Schoentgen’s research. The project consists of three parts. The first, intended to help future readers understand the project, explains basic concepts about the voice, such as physiological tremor, pathological tremor, and jitter. It also explains in detail the mathematical concepts used in the software development. The second part focuses on the software development, including the elaboration of the code and the different voice signals that were processed. This part was done with Matlab, a high-level language and environment for numerical computation, visualization, and application development, used across disciplines including signal and image processing. Finally, the calculated results were compared with those from Jean Schoentgen’s research.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This Thesis addresses the goal of making Voice Activity Detection more robust and accurate in adverse acoustic environments, in order to improve the behavior of many voice applications, for example telephony applications based on automatic speech recognition, automatic transcription systems, multichannel systems, etc. In particular, although all types of noise have been taken into account, special interest is given to background voices, currently the main source of error in most activity detectors. The work takes as its starting point an activity detector based on Hidden Markov Models, whose feature vector has two components: normalized energy and energy variation. The main contributions of this Thesis are: 1) extending the initial feature vector to provide it with spectral information; 2) adapting the Hidden Markov Models to the environment and studying different topologies; and 3) studying and including new features, different from those in point 1, to filter out the speech pulses that come from background voices. The detection results, taking the three points above into account, clearly show the progress made and are significantly better than those obtained, under the same conditions, with other reference activity detectors. This work has focused on improving the robustness of Voice Activity Detection in adverse acoustic environments in order to enhance the behavior of many voice applications, for example telephony applications based on automatic speech recognition, automatic transcription applications, multichannel systems applications, and so on.
In particular, though all types of noise have been taken into account, this research has special interest in the study of utterances coming from far-field speakers, the main error source of most activity detectors today. The tasks carried out have, as a starting point, a Hidden Markov Model Voice Activity Detector with a feature vector containing two components: normalized energy and delta energy. The key points of this Thesis are the following: 1) feature vector extension providing spectral information, 2) adjustment of the Hidden Markov Models to the environment and study of different Hidden Markov Model topologies, and, finally, 3) study and inclusion of new features, different from those in point 1, to reject the utterances coming from far-field speakers. Detection results, taking into account the above three points, show the advantages of using this method and are significantly better than the results obtained under the same conditions by other well-known voice activity detectors.
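The two-component baseline feature vector (normalized energy and delta energy) can be sketched as follows. The exact definitions used in the Thesis are not given in the abstract; this sketch assumes non-overlapping frames, log energy in dB normalized by subtracting the utterance maximum, and delta energy as the frame-to-frame difference. The names `frame_log_energies` and `energy_features` are hypothetical.

```python
import math


def frame_log_energies(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(energy + 1e-12))
    return energies


def energy_features(samples, frame_len):
    """Return one (normalized energy, delta energy) pair per frame."""
    log_e = frame_log_energies(samples, frame_len)
    peak = max(log_e)
    normalized = [e - peak for e in log_e]  # 0 dB at the loudest frame
    delta = [0.0] + [log_e[i] - log_e[i - 1] for i in range(1, len(log_e))]
    return list(zip(normalized, delta))
```

In an HMM-based detector, each pair would form one observation vector; the spectral and far-field features that the Thesis adds would extend this vector with further components.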

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Burnt Norton is the first of the FOUR QUARTETS, one of the key works of T. S. Eliot (1). In its first 6 lines it uses the word time as many as 7 times, with surprising reiteration. “Time present and time past / perhaps both are contained / the present in time future / and time future in time past. / If all time is eternally present / all time is recoverable.” This time, which poets express so well, is the time that architectural creation seeks to capture. This time is a central theme of Architecture. This text sets out to analyze why some architectural spaces are at times able to produce in us such an inner commotion, a suspension of time which, although it might seem an abstract concept, or a theme more proper to Poetry or Philosophy, occurs with a special, real, palpable force only when it comes to Architecture: when we stand before or within those spaces in which time seems to stop, is suspended, and can be touched with the hands.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The dramatic impact of neurological degenerative pathologies on quality of life is a growing concern. It is well known that many neurological diseases leave a fingerprint in voice and speech production. Many techniques have been designed for the detection, diagnosis, and monitoring of neurological disease. Most of them are costly or difficult to extend to primary care medical services. This paper shows how some neurological diseases can be traced at the level of phonation. The detection procedure would be based on a simple voice test. The availability of advanced tools and methodologies to monitor the organic pathology of voice would facilitate the implementation of these tests. The paper hypothesizes that some of the underlying mechanisms affecting voice production produce measurable correlates in vocal fold biomechanics. A general description is given of the methodological foundations of a voice analysis system that can estimate correlates of the neurological disease. Some case studies are presented to illustrate the possibilities of the methodology for monitoring neurological diseases by voice.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The purpose of this document is to serve as the printed material for the seminar "An Introductory Course on Constraint Logic Programming". The intended audience of this seminar is industrial programmers with a degree in Computer Science but little previous experience with constraint programming. The seminar itself was field tested, prior to the writing of this document, with a group of the application programmers of Esprit project P23182, "VOCAL", aimed at developing an application for scheduling field maintenance tasks in the context of an electric utility company. The contents of this paper essentially follow the flow of the seminar slides. However, there are some differences. These differences stem from our perception, from the experience of teaching the seminar, that the technical aspects are the ones that need more attention and clearer explanations in the written version. Thus, this document includes more examples than the slides, more exercises (and their solutions), as well as four additional programming projects, with which we hope the reader will obtain a clearer view of the process of developing and tuning programs using CLP. On the other hand, several parts of the seminar have been taken out: those related to the account of fields and applications in which C(L)P is useful, and the enumeration of the C(L)P tools available. We feel that the slides are clear enough, and that the interested reader will find more up-to-date information on available tools by browsing the Web or asking the vendors directly. More detail in this direction would boil down to summarizing a user manual, which is not the aim of this document.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BioMet®Tools is a set of software applications developed for the biometrical characterization of voice in different fields, such as voice quality evaluation in laryngology, speech therapy and rehabilitation, education of the singing voice, forensic voice analysis in court, emotion detection in voice, and secure access to facilities and services. It was initially conceived as plain research code to estimate the glottal source from voice and obtain the biomechanical parameters of the vocal folds from the spectral density of the estimate. This code grew into what is now the Glottex®Engine package (G®E). Further demands from users in the medical and forensic fields prompted the development of different Graphic User Interfaces (GUIs) to encapsulate user interaction with the G®E. This required the customized design of different GUIs handling the same G®E, which saved development costs and time. The development model, leading to commercial production and distribution, is described in detail. Case studies from its application to the fields of laryngology and speech therapy are given and discussed.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Medicine has evolved to the point where digital images play a highly relevant role in diagnosing disease. The problems that can affect the vocal apparatus are many and varied. A preliminary step in characterizing digital images of the larynx is segmenting the vocal folds. So far, algorithms have been developed that segment the glottis. This project aims to take the study one step further by also segmenting the vocal folds. To do so, it is necessary to exploit the color information in the images, since that is what distinguishes one region of the image from another. This project develops a novel method for segmenting stroboscopic color images of the larynx, based on region growing from seed pixels. Because of the problems inherent in images obtained by stroboscopy, good segmentation results require preprocessing the images, which consists of removing high-brightness areas and applying an anisotropic diffusion filter. After preprocessing, the region is grown from previously obtained seeds. The condition for including a pixel in the region is based on a tolerance parameter that is determined adaptively. This parameter starts at a very low value and is increased recursively until a stop condition is reached. This condition is based on analyzing the statistical distribution of the pixels within the growing region.
The last phase of the project consists of the tests needed to verify the operation of the designed system; good results were obtained for glottis segmentation, along with encouraging results for further improving the system for vocal fold segmentation. ABSTRACT Medicine has evolved so that digital images play a very important role in disease diagnosis. A wide variety of problems can affect the vocal apparatus. A preliminary step in the characterization of digital images of the larynx is the segmentation of the vocal folds. To date, some algorithms that allow the segmentation of the glottis have been developed. This project aims to go one step further, also seeking the segmentation of the vocal folds. To do this, we must use the color information offered by the images, since this is what determines the difference between regions in a picture. In this project, a novel method for segmenting color images of the larynx, based on region growing from a seed pixel, is developed. Because of the problems of images obtained by stroboscopy, optimal segmentation results require preprocessing the images, which involves removing high-brightness areas and applying an anisotropic diffusion filter. After this preprocessing, the growth of the region from previously obtained seeds starts. The condition for inclusion of a pixel in the region is based on a tolerance parameter, which is adaptively determined. It initially has a low value, which is recursively increased until a stop condition is reached. This condition is based on the analysis of the statistical distribution of the pixels within the grown region.
The last phase of the project involves the tests necessary to verify the proper working of the designed system, obtaining very good results in the segmentation of the glottis and encouraging results for further improving the system for the segmentation of the vocal folds.
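The adaptive region-growing idea can be sketched as below. This is an illustration under stated assumptions, not the project's method: it works on a single-channel image rather than color, compares each pixel with the seed value, uses 4-connectivity, and takes "standard deviation of the region exceeds `max_std`" as the stop condition, since the abstract only says that the condition is based on the statistical distribution of the region's pixels. All names and default parameters are hypothetical.

```python
from collections import deque
from statistics import pstdev


def grow_region(image, seed, tol):
    """4-connected region of pixels whose value is within tol of the seed value."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


def adaptive_region(image, seed, tol_start=1, tol_step=1, tol_max=100, max_std=5.0):
    """Raise the tolerance step by step; keep the last region that stays homogeneous."""
    best = {seed}
    tol = tol_start
    while tol <= tol_max:
        region = grow_region(image, seed, tol)
        values = [image[r][c] for r, c in region]
        # Assumed stop condition: the region has leaked into a different structure.
        if pstdev(values) > max_std:
            break
        best = region
        tol += tol_step
    return best
```

Starting from a low tolerance keeps the region inside one homogeneous structure; the growth stops at the first tolerance for which the region's statistics change abruptly, and the previous region is returned.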

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This Thesis analyzes the possibilities that speech technologies currently offer for detecting clinical pathologies associated with the upper airway. The study of speech, which traditionally covers both production and the transformation of the message and the signals involved from the speaker to the listener, offers an alternative way of studying these pathologies. The fact that the emitted signal contains not only the message but also information about the speaker has motivated the development of systems for identifying and verifying speaker identity. This work has recently received new impetus, turning both towards the characterization of traits that are common to several speakers and towards the differences between recordings of the same speaker. The former are especially relevant to this Thesis, since such traits could reveal characteristics related to a condition shared by several speakers, regardless of their identity. Such is the case addressed in this Thesis, where the traits identified would relate to a particular pathology directly linked to the physical speech production system. Sleep Apnea-Hypopnea Syndrome (SAHS) is a paradigmatic case. It is a pathology with a high prevalence worldwide, which increases with age. Patients experience episodes of involuntary cessation of breathing during sleep, which last several seconds and recur throughout the night, preventing proper rest. In obstructive apnea, these episodes are due to the inability to keep a path open through the airway, so that the air flow is interrupted.
At present, these patients are diagnosed through a polysomnographic study, which focuses on analyzing apnea episodes during sleep and requires the patient to stay in hospital overnight. The complexity and high cost of these procedures, together with growing waiting lists, have shown the need for fast detection techniques which, although they might not achieve such high rates, would make it possible to reorganize waiting lists according to the severity of the pathology in each patient. Among others, diagnostic imaging systems and the anthropometric characterization of patients have revealed anatomical patterns that would directly influence speech. Studies of how SAHS affects speech have been scarce, and some of them even contradictory. Nevertheless, specific patterns relating to articulation, phonation, and resonance have been known since the late 1980s. Their description was hard to exploit in an automatic recognition system, but it pointed to a link between voice and SAHS. In recent years, automatic processing techniques have enabled systems that can already identify significant differences in the speech of SAHS patients and distinguish them from healthy speakers. By contrast, little is known about the connection between these new results, those obtained in the past, and the pathogenesis of SAHS.
This Thesis continues the work in this area by specifically considering: how SAHS affects patients’ speech, the improvement of automatic classification rates, and the combination of the information obtained with the predictors that clinical specialists use in their preliminary assessments. The first two tasks pose problems that are symbiotic but different. While studying the connection between SAHS and speech requires bounded models that can be interpreted easily, recognition systems rely on a large number of dimensions to characterize and then identify patterns. Thus, the first task should allow progress on the second, as should the incorporation of the predictors used by clinical specialists. The Thesis studies both continuous and sustained speech, in order to exploit the synergies and differences between them. The analysis of continuous speech took as its starting point a previously evaluated scheme, on which the evaluation and optimization of the speech representation, and the characterization of the specific patterns associated with SAHS, were carried out. This revealed the connection between SAHS and the fundamental elements of the voice signal: the formants. The results show that the success of these systems is due mainly to the ability of these representations to describe those components while discarding dimensions that are noisy or have little discriminative power. The resulting scheme yields an error rate below 18%, using classifiers notably less complex than those described in the state of the art and a single short voice recording.
Regarding the connection between SAHS and the observed patterns, it was necessary to consider inter- and intra-group differences, focusing on the speaker’s characteristic articulation and replacing the complex classification models with the study of average spectra. The result points clearly to certain regions of the frequency axis, suggesting a systematic narrowing of the vocal tract section in the oropharyngeal region, already anticipated in the pathogenesis of this syndrome. As for sustained speech, the studies carried out on continuous speech were reproduced on recordings of the sustained vowel /a/. The results are qualitatively analogous to the previous ones, although in this case the classification rates are lower. To identify the meaning of this result, the study of average spectra and of inter- and intra-group variability was reproduced. Both studies showed important differences from the previous ones that could explain these results. However, sustained speech offers other opportunities by providing a controlled setting for the study of phonation, which had also been identified as a source of information for detecting SAHS. This study showed that, in the available data set, there are no variations that could easily be associated with phonation. Only the dimensions that describe the distribution of energy along the frequency axis showed significant differences, pointing once again towards spectral resonances. Having analyzed these results, the Thesis tackles the fusion of both sources of information in a single classification system. This makes it possible to improve classification rates, under the hypothesis that the information present in continuous speech and in sustained speech is fundamentally different.
This task was carried out with a simple fusion scheme that obtained 88.6% correct classification (an error rate of 11.4%), a significant improvement over the state of the art. Finally, combining this classifier with the predictors used by clinical specialists gave a rate of 91.3% (an error rate of 8.7%), which is within the range offered by more costly and intrusive schemes that, unlike the one proposed, cannot be used in the preliminary assessment of patients. All in all, the Thesis offers a clear view of the relationship between SAHS and speech, showing the degree of maturity reached by speech technology in characterizing and detecting SAHS, demonstrating that its use for patient assessment would already be possible, and leaving the door open to future research that continues the work begun here. ABSTRACT This Thesis explores the potential of speech technologies for the detection of clinical disorders connected to the upper airway. The study of speech traditionally covers both the production process and the post-processing of the signals involved, from the speaker to the listener, offering an alternative path to study these pathologies. The fact that utterances embed not just the encoded message but also information about the speaker has motivated the development of automatic systems oriented to the identification and verification of the speaker’s identity. These have recently been boosted and reoriented either towards the characterization of traits that are common to several speakers, or towards the differences between recordings of the same speaker collected under different conditions. The former are particularly relevant to this Thesis, as these patterns could reveal the presence of features that are related to a common condition shared among different speakers, regardless of their identity.
Such is the case faced in this Thesis, where the traits identified would relate to a particular pathology, directly connected to the speech production system. The Obstructive Sleep Apnea syndrome (OSA) is a paradigmatic case for analysis. It is a disorder with high prevalence among adults, affecting a larger number of them as they grow older. Patients suffering from this disorder experience episodes of involuntary cessation of breathing during sleep that may last a few seconds and recur throughout the night, preventing proper rest. In the case of obstructive apnea, these episodes are related to the collapse of the pharynx, which interrupts the air flow. Currently, OSA is diagnosed through a polysomnographic study, which focuses on the analysis of apnea episodes during sleep and requires the patient to stay at the hospital for the whole night. The complexity and high cost of the procedures involved, combined with the waiting lists, have shown the need for screening techniques, which perhaps would not achieve outstanding performance rates but would allow clinicians to reorganize these lists, ranking patients according to the severity of their condition. Among others, imaging diagnosis and anthropometric characterization of patients have revealed anatomical patterns related to OSA that have a direct influence on speech. Contributions devoted to the study of how this disorder affects speech are scarce and somewhat contradictory. However, the existence of specific patterns related to articulation, phonation, and resonance has been known since the late 1980s. At that time these descriptions were virtually useless for the development of an automatic system, but they pointed to a link between speech and OSA. In recent years automatic processing techniques have evolved and are now able to identify significant differences in the speech of OSA patients when compared to recordings from healthy subjects.
Nevertheless, little is known about the connection between these new results, those published in the past, and the pathogenesis of the OSA syndrome. This Thesis aims to progress beyond the previous research in this area by addressing: the study of how OSA affects patients’ speech, the enhancement of automatic OSA classification based on speech analysis, and its integration with the information embedded in the predictors generally used by clinicians in the preliminary examination of patients. The first two tasks, though they may appear symbiotic at first, are quite different. While studying the connection between speech and OSA requires simple, narrow models that can be easily interpreted, classification requires larger models with a large number of dimensions for the characterization and subsequent identification of the observed patterns. In any case, it is clear that any progress made in the first task should allow us to improve performance on the second, and that the incorporation of the predictors used by clinicians should contribute in the same direction. The Thesis considers both continuous and sustained speech analysis, to exploit the synergies and differences between them. For continuous speech analysis, a conventional speech processing scheme, designed and evaluated before this Thesis, was taken as a baseline. On this initial system, several alternative representations of the speech information were proposed, optimized, and tested to select those most suitable for the characterization of OSA-specific patterns. Evidence was found of a connection between OSA and the fundamental constituents of speech: the formants. Experimental results showed that the success of the proposed solution is well explained by the ability of speech representations to describe these specific OSA-related components, ignoring the noisy ones as well as those with low discrimination capability.
The resulting scheme obtained an 18% error rate with a classification scheme significantly less complex than those described in the literature and operating on a single speech recording. To study the connection between OSA and the observed patterns, it was necessary to consider inter- and intra-group differences and to focus on articulation, replacing the complex classification models with long-term average spectra. Results clearly point to certain regions of the frequency axis, suggesting a systematic narrowing of the vocal tract section at the oropharynx. This narrowing had already been described in the pathogenesis of the syndrome. Regarding sustained speech, experiments similar to those conducted on continuous speech were reproduced on sustained phonations of the vowel /a/. Results were qualitatively similar to the previous ones, though in this case performance rates were noticeably lower. To derive further knowledge from this result, the experiments on long-term average spectra and intra- and inter-group variability ratios were also reproduced on sustained speech recordings. Results of both experiments differed significantly from those obtained on continuous speech, which could explain the differences observed in performance. However, sustained speech also provided the opportunity to study phonation within a controlled framework. Phonation had also been identified in the literature as a source of information for the detection of OSA. In this study it was found that, for the available dataset, no systematic differences related to phonation could be found between the two groups of speakers. Only those dimensions related to the distribution of energy along the frequency axis provided significant differences, pointing once again towards resonant components. 
Once classification schemes for both continuous and sustained speech were developed, the Thesis addressed their combination into a single classification system. Under the assumption that the information in continuous and sustained speech is fundamentally different, it should be possible to merge the two successfully. This was tested through a simple fusion scheme which obtained 88.6% correct classification (11.4% error rate), a significant improvement over the state of the art. Finally, the combination of this classifier with the variables used by clinicians obtained 91.3% accuracy (8.7% error rate). This is within the range of alternative but costly and intrusive schemes which, unlike the one proposed, cannot be used in the preliminary assessment of a patient’s condition. In the end, this Thesis has shed new light on the underlying connection between OSA and speech and has shown the degree of maturity reached by speech technology in OSA characterization and detection, leaving the door open for future research in the multiple directions that have been pointed out and left as future work.
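
As an illustration of the long-term average spectrum mentioned in the abstract above, the following minimal Python sketch averages the power spectra of Hann-windowed frames and expresses the result in dB. It is not the Thesis's implementation: the function name, the frame length, the hop size and the use of a Hann window are all assumptions made for this example.

```python
import numpy as np


def long_term_average_spectrum(signal, frame_len=512, hop=256):
    """Return the long-term average spectrum (dB) of a 1-D signal.

    Illustrative sketch only: frame_len, hop and the Hann window are
    assumed values, not parameters taken from the Thesis.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one analysis frame")

    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop

    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Power spectrum of each frame (one-sided FFT), then average over time.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mean_power = power.mean(axis=0)

    # Small constant avoids log(0) for silent bins.
    return 10.0 * np.log10(mean_power + 1e-12)
```

Comparing group behaviour would then amount to averaging such spectra over the speakers of each group and inspecting the difference along the frequency axis; how the Thesis actually normalizes and compares the spectra is not stated in the abstract.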

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Teaching the adequate use of the singing voice draws on extensive knowledge of musical performance as well as of objective estimation techniques involving the use of air, muscles, room and body acoustics, and the tuning of an instrument as fine as the human voice. Although subjective evaluation and training is a very delicate task to be carried out only by expert singers, biomedical engineering may help by contributing well-founded methodologies developed for the study of voice pathology. The present work is a preliminary, exploratory study describing the performance of a student singer in a regular classroom from the point of view of vocal fold biomechanics. Estimates of biomechanical parameters obtained from the singing voice are given and their potential use is discussed.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BioMet®Phon is a software application developed for the characterization of voice in voice quality evaluation. Initially it was conceived as plain research code to estimate the glottal source from voice and obtain the biomechanical parameters of the vocal folds from the spectral density of the estimate. This code grew into what is now the Glottex®Engine package (G®E). Further demands from users in the laryngology and speech therapy fields prompted the development of a specific Graphical User Interface (GUI) to encapsulate user interaction with the G®E. This gave rise to BioMet®Phon, an application which extracts the glottal source from voice and offers a complete parameterization of this signal, including distortion, cepstral, spectral, biomechanical, time domain, contact and tremor parameters. The semantic capabilities of biomechanical parameters are discussed. Case studies from its application to the fields of laryngology and speech therapy are given and discussed. Validation results in voice pathology detection are also presented. Applications to laryngology, speech therapy, and the monitoring of neurological deterioration in the elderly are proposed.