13 results for Freedom of Speech

at Universidad Politécnica de Madrid


Relevance: 100.00%

Abstract:

This paper presents a proposal for an advanced debate system for digital democracy that overcomes the limitations of existing systems. We have been especially careful in applying security procedures to the telematic system, since it must offer citizens the guarantees that society demands. New functional tools have been included to ensure user authentication and to permit anonymous participation, where the system is unable to disclose, or even to know, the identity of its users. The platform prevents non-entitled persons, those outside the authorized group, from giving their opinion. Furthermore, the proposal allows the proper functioning of the system to be verified, free of tampering or fraud intended to alter the conclusions or outcomes of the participation. All these tools guarantee important aspects of both a social and a technical nature, most notably freedom of expression, equality and auditability.
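The combination of verifiable participation and anonymity described above is usually obtained with cryptographic credential schemes; the abstract does not name a specific mechanism, so the following toy Python sketch only illustrates one classical option, a blind signature, with textbook RSA and deliberately tiny, insecure parameters: the authority authorizes a blinded participation token without ever learning which citizen it belongs to, and anyone can later verify that the token was authorized.

import hashlib
import math
import secrets

# Toy illustration (NOT secure): a blind signature lets the authority authorize
# a participation token without learning which citizen it belongs to.
# Textbook RSA with tiny parameters p = 61, q = 53.
n, e, d = 3233, 17, 2753  # public modulus, public exponent, private exponent

def blind(token: bytes):
    """Citizen side: hash the token and blind it with a random factor r."""
    m = int.from_bytes(hashlib.sha256(token).digest(), "big") % n
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    return m, r, (m * pow(r, e, n)) % n

def sign_blinded(blinded: int) -> int:
    """Authority side: signs the blinded value without seeing the token."""
    return pow(blinded, d, n)

def unblind(blind_sig: int, r: int) -> int:
    """Citizen side: remove the blinding factor to obtain a plain signature."""
    return (blind_sig * pow(r, -1, n)) % n

def verify(m: int, sig: int) -> bool:
    """Anyone can check that the token was authorized by the authority."""
    return pow(sig, e, n) == m

m, r, blinded = blind(b"anonymous-participation-token")
sig = unblind(sign_blinded(blinded), r)
print("token accepted:", verify(m, sig))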

Relevance: 100.00%

Abstract:

This Thesis explores the potential of speech technologies for the detection of clinical disorders connected to the upper airway. The study of speech traditionally covers both the production process and the post-processing of the signals involved, from the speaker to the listener, and offers an alternative path to study these pathologies. The fact that utterances embed not just the encoded message but also information about the speaker has motivated the development of automatic systems oriented to the identification and verification of the speaker's identity. These have recently been boosted and reoriented either towards the characterization of traits that are common to several speakers, or towards the differences between recordings of the same speaker collected under different conditions. The former are particularly relevant to this Thesis, as these patterns could reveal the presence of features related to a condition shared among different speakers, regardless of their identity. Such is the case faced in this Thesis, where the traits identified relate to a particular pathology directly connected to the speech production system. The Obstructive Sleep Apnea syndrome (OSA) is a paradigmatic case for analysis. It is a disorder with high prevalence among adults, affecting a larger number of them as they grow older. Patients suffering from this disorder experience episodes of involuntary cessation of breath during sleep that may last a few seconds and recur throughout the night, preventing proper rest. In the case of obstructive apnea, these episodes are related to the collapse of the pharynx, which interrupts the airflow. Currently, OSA diagnosis is done through a polysomnographic study, which focuses on the analysis of apnea episodes during sleep and requires the patient to stay at the hospital for the whole night. The complexity and high cost of the procedures involved, combined with the waiting lists, have evidenced the need for screening techniques which, even if they do not achieve outstanding performance rates, would allow clinicians to reorganize these lists, ranking patients according to the severity of their condition.

Among others, imaging diagnosis and the anthropometric characterization of patients have evidenced the existence of anatomical patterns related to OSA that have a direct influence on speech. Contributions devoted to the study of how this disorder affects speech are scarce and somewhat contradictory. However, the existence of specific patterns related to articulation, phonation and resonance has been known since the late 1980s. At that time these descriptions were virtually useless for the development of an automatic system, but they pointed out the existence of a link between speech and OSA. In recent years automatic processing techniques have evolved and are now able to identify significant differences in the speech of OSA patients when compared to recordings from healthy subjects. Nevertheless, little is known about the connection between these new results, those published in the past, and the pathogenesis of the OSA syndrome. This Thesis aims to progress beyond the previous research done in this area by addressing: the study of how OSA affects patients' speech, the enhancement of automatic OSA classification based on speech analysis, and its integration with the information embedded in the predictors generally used by clinicians in the preliminary examination of patients. The first two tasks, though they may appear symbiotic at first, are quite different. While studying the connection between speech and OSA requires simple, narrow models that can be easily interpreted, classification requires larger models with a large number of dimensions for the characterization and subsequent identification of the observed patterns. In any case, it is clear that any progress made in the first task should allow us to improve performance on the second, and that the incorporation of the predictors used by clinicians should contribute in the same direction. The Thesis considers both continuous and sustained speech analysis, to exploit the synergies and differences between them. For continuous speech analysis, a conventional speech processing scheme, designed and evaluated before this Thesis, was taken as a baseline. Over this initial system, several alternative representations of the speech information were proposed, optimized and tested to select those most suitable for the characterization of OSA-specific patterns. Evidence was found of a connection between OSA and the fundamental constituents of speech: the formants. Experimental results proved that the success of the proposed solution is well explained by the ability of the speech representations to describe these specific OSA-related components, ignoring the noisy ones as well as those with low discrimination capabilities. The resulting scheme obtained an 18% error rate with a classification scheme significantly less complex than those described in the literature, operating on a single short speech recording. Regarding the connection between OSA and the observed patterns, it was necessary to consider inter- and intra-group differences and to focus on articulation, replacing the complex classification models with long-term average spectra. Results clearly point to certain regions of the frequency axis, suggesting the existence of a systematic narrowing of the vocal tract section at the oropharynx, which had already been described in the pathogenesis of this syndrome. Regarding sustained speech, experiments similar to those conducted on continuous speech were reproduced on sustained phonations of the vowel /a/.

Results were qualitatively similar to the previous ones, although in this case performance rates were noticeably lower. To derive further knowledge from this result, the experiments on long-term average spectra and intra- and inter-group variability ratios were also reproduced on sustained speech recordings. Both experiments showed significant differences from those obtained on continuous speech, which could explain the differences observed in performance. However, sustained speech also provided the opportunity to study phonation within the controlled framework it offers; phonation had also been identified in the literature as a source of information for the detection of OSA. In this study it was found that, for the available dataset, no systematic differences related to phonation could be found between the two groups of speakers. Only those dimensions which relate to the energy distribution along the frequency axis showed significant differences, pointing once again towards the resonant components. Once classification schemes for both continuous and sustained speech had been developed, the Thesis addressed their combination into a single classification system, under the assumption that the information in continuous and sustained speech is fundamentally different and can therefore be successfully merged. This was tested through a simple fusion scheme which obtained 88.6% correct classification (an 11.4% error rate), a significant improvement over the state of the art. Finally, the combination of this classifier with the variables used by clinicians obtained 91.3% accuracy (an 8.7% error rate). This is within the range of alternative but costly and intrusive schemes which, unlike the one proposed, cannot be used in the preliminary assessment of a patient's condition. In the end, this Thesis has shed new light on the underlying connection between OSA and speech, and evidenced the degree of maturity reached by speech technology in OSA characterization and detection, leaving the door open for future research to continue in the multiple directions that have been pointed out and left as future work.
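As an illustration of the long-term average spectrum comparison mentioned in both the continuous- and sustained-speech analyses, the sketch below (Python with numpy/scipy/soundfile, hypothetical file names and parameters, not the thesis pipeline) averages Welch spectra per group so that systematic differences, such as the oropharyngeal narrowing discussed above, would appear as level differences in specific frequency bands.

# Illustrative sketch: long-term average spectra (LTAS) for two groups of
# recordings (e.g. OSA patients vs. controls). File names and analysis
# parameters are placeholders; all files are assumed to share one sample rate.
import numpy as np
import soundfile as sf
from scipy.signal import welch

def ltas(path, nperseg=1024):
    """Average power spectrum of one recording, in dB."""
    x, fs = sf.read(path)
    if x.ndim > 1:                 # mix down to mono if needed
        x = x.mean(axis=1)
    f, pxx = welch(x, fs=fs, nperseg=nperseg)
    return f, 10.0 * np.log10(pxx + 1e-12)

def group_ltas(paths):
    """Average the per-recording LTAS over a group of speakers."""
    spectra = []
    for p in paths:
        f, s = ltas(p)
        spectra.append(s)
    return f, np.mean(spectra, axis=0)

f, ltas_osa = group_ltas(["osa_01.wav", "osa_02.wav"])            # hypothetical
_, ltas_ctrl = group_ltas(["control_01.wav", "control_02.wav"])   # hypothetical

# Band-wise differences indicate frequency regions where the groups diverge.
diff_db = ltas_osa - ltas_ctrl
print("largest group difference at", f[np.argmax(np.abs(diff_db))], "Hz")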

Relevance: 100.00%

Abstract:

Amyotrophic Lateral Sclerosis (ALS) is a severe disease which dramatically reduces the speech communication skills of patients as the disease progresses. The present study is devoted to defining accurate and objective estimates to characterize the loss of communication skills, to help clinicians and therapists monitor disease progression and decide on rehabilitation interventions. The proposed methodology is based on the perceptual (neuromorphic) definition of speech dynamics, concentrating on vowel sound character and duration. We present the results of a longitudinal study carried out on an ALS patient over one year. The discussion addresses future actions.

Relevance: 100.00%

Abstract:

In the last two decades there has been an important increase in speech technology research in Spain, mainly due to a higher level of funding from European, Spanish and local institutions, and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in recent years and their current focus of interest. The description is organized into five main areas: audio processing including speech, speaker characterization, speech and language processing, text-to-speech conversion and spoken language applications. The paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla), the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities in recent years.

Relevance: 100.00%

Abstract:

Although there has been a lot of interest in recognizing and understanding air traffic control (ATC) speech, none of the published works have reported detailed field-data results. We have developed a system able to identify the language spoken and to recognize and understand sentences in both Spanish and English, and we present field results for several in-tower controller positions. To the best of our knowledge, this is the first time that field ATC speech (not simulated) has been captured, processed and analyzed. The use of stochastic grammars allows for the variations on the standard phraseology that appear in field data. The robust understanding algorithm developed achieves 95% concept accuracy from ATC text input. It also allows changes in the presentation order of the concepts and corrects errors created by the speech recognition engine, improving the percentage of fully correctly understood sentences by 17% and 25% absolute for English and Spanish, respectively, with respect to the percentage of fully correctly recognized sentences. An analysis of the errors due to the spontaneity of the speech, and a comparison with read speech, is also carried out: a 96% word accuracy for read speech drops to 86% word accuracy on field ATC data for Spanish in the "clearances" task, confirming that field data is needed to estimate the performance of a system. A literature review and a critical discussion on the possibilities of speech recognition and understanding technology applied to ATC speech are also given.
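Concept accuracy figures such as the 95% quoted above are conventionally computed by aligning the extracted concepts against a reference concept list and counting substitutions, deletions and insertions, in the same way as word accuracy. The short sketch below (plain Python, with made-up ATC concepts) shows this standard computation; it reflects the usual definition rather than the exact scoring procedure of the paper.

def concept_accuracy(reference, hypothesis):
    """Concept accuracy = (N - S - D - I) / N via Levenshtein alignment."""
    n, m = len(reference), len(hypothesis)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution/match
    return 1.0 - dist[n][m] / max(n, 1)

# Hypothetical concepts for "BAW256, descend to flight level 80".
ref = [("CALLSIGN", "BAW256"), ("ACTION", "DESCEND"), ("LEVEL", "80")]
hyp = [("CALLSIGN", "BAW256"), ("ACTION", "DESCEND"), ("LEVEL", "90")]
print(f"concept accuracy: {concept_accuracy(ref, hyp):.2f}")   # 0.67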

Relevance: 100.00%

Abstract:

In recent decades, the number of children with various forms of speech disorders has grown in Russia. This situation reflects a general tendency of deterioration in national health. Our practical experience shows that close grown-ups often cannot communicate with children of limited health capacity, and as a result social disabilities arise in the child's development. Speech communication is one form of the global communicative interaction process between close grown-ups and the young child, in the course of which there is a redistribution of knowledge and ignorance (Nikas Luman, 2005). Within the framework of the sociocultural theory of mental growth, we consider that the emergence of speech communication in any case of physical illness is possible only under conditions of correctly organized communication between grown-ups and young children (L.S. Vigotski, 2000). The study of communication between grown-ups and young children acquires special value in this respect. For five years we have been conducting surveys on the problem of communicative contacts between parents and non-verbal children. Analysis of the data received gave us the opportunity to systematize the peculiar communicative interaction of adults and children who have some lapses in acquiring speech-form communication. We have revealed four versions of situational-business communication between close grown-ups and young children with disabilities in acquiring speech, and we assume that these four versions of situational-business communication negatively affect the formation of speech-form communication.

Relevance: 100.00%

Abstract:

The introduction of open-plan offices in the 1960s, with the intent of making the workplace more flexible, efficient and team-oriented, resulted in a higher noise floor level, which not only made concentrated work more difficult but also caused physiological problems, such as increased stress, in addition to a loss of speech privacy. Irrelevant background human speech, in particular, has proven to be a major factor in disrupting concentration and lowering performance. Therefore, reducing the intelligibility of speech has become a goal of increasing importance in recent years. One method employed to do so is the use of masking noise, which consists of emitting a continuous noise signal over a loudspeaker system that conceals the disturbing speech. Studies have shown that while effective, the maskers employed to date, normally filtered pink noise, are generally poorly accepted by users. The collaborative "Private Workspace" project, within the scope of which this thesis was carried out, attempts to develop a coupled, adaptive noise masking system, along with a physical structure, to be used in open-plan offices so as to combat these issues. There is evidence to suggest that nature sounds might be better accepted as maskers, in part because they can have a visual object that acts as the source of the sound. Direct audio recordings are not recommended for various reasons, and thus the nature sounds must be synthesized. The work done consists of the synthesis of a sound texture to be used as a masker, as well as its evaluation. The sound texture is composed of two parts: a wind-like noise synthesized with subtractive synthesis, and a leaf-like noise synthesized through granular synthesis. Different combinations of these two noises produced five variations of the masker, which were evaluated at different levels, along with white noise and pink noise, using a modified version of an Oldenburger Satztest to test for an effect on speech intelligibility, and a questionnaire to assess their subjective acceptance. The goal was to find which of the synthesized noises works best as a speech masker. This thesis first uses a theoretical introduction to establish the basics of sound perception, psychoacoustic masking, and sound texture synthesis. The design of each of the noises, as well as their respective implementations in MATLAB, is explained, followed by the procedures used to evaluate the maskers. The results obtained in the evaluation are analyzed. Lastly, conclusions are drawn and future work and modifications to the masker are proposed.
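A minimal sketch of the two-component texture described above, written in Python with numpy/scipy rather than the MATLAB implementation of the thesis, and with illustrative parameters (filter cutoffs, grain length, grain density, mix ratio) that are assumptions rather than the values actually evaluated: low-pass-filtered noise with a slow gust envelope stands in for the wind component (subtractive synthesis), and short, randomly placed, enveloped noise bursts stand in for the leaf component (granular synthesis).

# Hedged sketch of a nature-like masking texture: a "wind" layer by subtractive
# synthesis (filtered, slowly modulated noise) plus a "leaf" layer by granular
# synthesis (short enveloped noise grains). All parameters are illustrative.
import numpy as np
import soundfile as sf
from scipy.signal import butter, lfilter

fs, dur = 44100, 10.0
n = int(fs * dur)
rng = np.random.default_rng(0)

# Wind: low-pass-filtered white noise shaped by a slow "gust" envelope.
b_lp, a_lp = butter(4, 400 / (fs / 2), btype="low")
wind = lfilter(b_lp, a_lp, rng.standard_normal(n))
slow = lfilter(*butter(2, 0.5 / (fs / 2)), rng.standard_normal(n))
gust = 0.6 + 0.4 * (slow - slow.min()) / (slow.max() - slow.min() + 1e-12)
wind = wind / np.max(np.abs(wind)) * gust

# Leaves: ~8 ms high-pass noise grains scattered at random onsets.
b_hp, a_hp = butter(4, 2000 / (fs / 2), btype="high")
leaves = np.zeros(n)
grain_len = int(0.008 * fs)
env = np.hanning(grain_len)
for onset in rng.integers(0, n - grain_len, size=4000):
    grain = lfilter(b_hp, a_hp, rng.standard_normal(grain_len)) * env
    leaves[onset:onset + grain_len] += grain
leaves /= np.max(np.abs(leaves))

# Mix; the wind/leaf ratio is one of the masker variations to evaluate.
masker = 0.7 * wind + 0.3 * leaves
sf.write("masker.wav", masker / np.max(np.abs(masker)), fs)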

Relevance: 90.00%

Abstract:

Several issues concerning the current use of speech interfaces are discussed, and the design and development of a speech interface that enables air traffic controllers to command and control their terminals by voice is presented. Special emphasis is placed on the comparison between laboratory experiments and field experiments, in which a set of ergonomics-related effects is detected that cannot be observed in the controlled laboratory experiments. The paper presents both the objective and the subjective performance obtained in the field evaluation of the system with student controllers at an air traffic control (ATC) training facility. The system exhibits high word recognition rates (0.4% error in Spanish and 1.5% in English) and low command error (6% error in Spanish and 10.6% error in English in the field tests). The subjective impression has also been positive, encouraging future development and integration phases in the Spanish ATC terminals designed by Aeropuertos Españoles y Navegación Aérea (AENA).

Relevance: 90.00%

Abstract:

Speech technologies can provide important benefits for the development of more usable and safe in-vehicle human-machine interactive systems (HMIs). However, mainly due to robustness issues, the use of spoken interaction can entail important distractions for the driver. In this challenging scenario, while speech technologies are evolving, further research is necessary to explore how they can be complemented both with other modalities (multimodality) and with information from the increasing number of available sensors (context awareness). The perceived quality of speech technologies can be significantly increased by implementing such policies, which simply try to make the best use of all the available resources, and the in-vehicle scenario is an excellent test bed for this kind of initiative. In this contribution we propose an event-based HMI design framework which combines context modelling and multimodal interaction using a W3C XML language known as SCXML. SCXML provides a general process control mechanism that is being considered by the W3C to improve both voice interaction (VoiceXML) and multimodal interaction (MMI). In our approach we try to anticipate and extend these initiatives, presenting a flexible SCXML-based approach for the design of a wide range of multimodal, context-aware, in-vehicle HMI interfaces. The proposed framework for HMI design and specification has been implemented in an automotive OSGi service platform, and it is being used and tested in the Spanish research project MARTA for the development of several in-vehicle interactive applications.
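As a conceptual illustration of the event-based control flow that SCXML describes declaratively, the following Python sketch (not SCXML and not the MARTA implementation; state and event names are invented) shows how events originating from speech, touch or vehicle sensors can drive the same state machine, which is the basic idea behind combining multimodality and context awareness in one framework.

# Conceptual sketch (not SCXML, not the MARTA code): an event-driven state
# machine in which events from speech, touch or vehicle sensors drive the
# same dialog logic, the control flow that SCXML expresses declaratively.
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str        # e.g. "say_destination", "high_driver_load"
    modality: str    # "speech", "touch" or "sensor"
    payload: dict = field(default_factory=dict)

# (state, event name) -> next state; a full SCXML chart adds guards and actions.
TRANSITIONS = {
    ("idle", "say_destination"): "confirm_destination",
    ("idle", "touch_destination"): "confirm_destination",
    ("confirm_destination", "confirm"): "guiding",
    ("confirm_destination", "high_driver_load"): "idle",  # context-aware bailout
}

def step(state: str, event: Event) -> str:
    return TRANSITIONS.get((state, event.name), state)

state = "idle"
for ev in [Event("say_destination", "speech", {"city": "Madrid"}),
           Event("confirm", "speech")]:
    state = step(state, ev)
print(state)  # guiding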

Relevance: 90.00%

Abstract:

In this paper we describe a complete development platform that features several innovative acceleration strategies, not included in any other current platform, which simplify and speed up the definition of the different elements required to design a spoken dialog service. The proposed accelerations are mainly based on using the information from the backend database schema and contents, as well as cumulative information produced throughout the different steps of the design. Thanks to these accelerations, the interaction between the designer and the platform is improved, and in most cases the design is reduced to simple confirmations of the proposals that the platform dynamically provides at each step. In addition, the platform provides several other accelerations, such as configurable templates that can be used to define the different tasks in the service or the dialogs to obtain information from, or show it to, the user; automatic proposals for the best way to request slot contents from the user (i.e. using mixed-initiative or directed forms); an assistant that offers the set of most probable actions required to complete the definition of the different tasks in the application; and another assistant for solving specific modality details such as confirmations of user answers or how to present the lists of results retrieved after querying the backend database. Additionally, the platform allows the creation of speech grammars and prompts, and database access functions, and supports mixed-initiative and over-answering dialogs. In the paper we also describe each assistant in the platform in detail, emphasizing the different kinds of methodologies followed to facilitate the design process in each one. Finally, we describe the results obtained in both a subjective and an objective evaluation with different designers, which confirm the viability, usefulness and functionality of the proposed accelerations. Thanks to the accelerations, the design time is reduced by more than 56% and the number of keystrokes by 84%.

Relevance: 90.00%

Abstract:

Gender detection is a very important objective for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on the fundamental frequency (f0) and on cepstral features derived from voiced segments of speech. The methodology presented here consists of obtaining uncorrelated glottal and vocal tract components, which are parameterized as mel-frequency coefficients. K-fold cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used, on a gender-balanced database of running speech from 340 speakers.
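A minimal sketch of the classification stage, assuming the glottal-source and vocal-tract mel-frequency coefficients have already been computed (the decomposition itself is not shown); the data here are random placeholders and scikit-learn's QDA and GMM implementations merely stand in for the classifiers used in the paper.

# Hedged sketch of the classification stage: QDA with k-fold cross-validation
# and one GMM per gender, on placeholder features standing in for the
# concatenated glottal-source and vocal-tract mel-frequency coefficients.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((340, 40))   # e.g. 20 glottal + 20 vocal tract coeffs
y = rng.integers(0, 2, size=340)     # 0 = female, 1 = male (placeholder labels)

# K-fold cross-validation with a QDA classifier.
qda_acc = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5).mean()
print("QDA 5-fold accuracy:", round(qda_acc, 3))

# One GMM per gender; a test vector is assigned to the likelier model.
def gmm_predict(X_train, y_train, X_test, n_components=4):
    models = {c: GaussianMixture(n_components, covariance_type="diag",
                                 random_state=0).fit(X_train[y_train == c])
              for c in (0, 1)}
    scores = np.column_stack([models[c].score_samples(X_test) for c in (0, 1)])
    return scores.argmax(axis=1)

pred = gmm_predict(X[:250], y[:250], X[250:])
print("GMM held-out accuracy:", round(float((pred == y[250:]).mean()), 3))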

Relevance: 90.00%

Abstract:

The main question addressed in this thesis is the improvement of automatic speaker recognition systems through the introduction of a new front-end module that we have called Gender-Dependent Extended Biometric Parameterisation (GDEBP). This front-end does not constitute a complete break with respect to the classical parameterisation techniques used in speaker recognition, but rather a new way to obtain these parameters while introducing some complementary ones. Specifically, we propose a gender-dependent parameterisation since, as is well known, male and female voices have different characteristics, and therefore the use of different parameters to model these distinguishing characteristics should provide a better characterisation of speakers.

Additionally, we propose the introduction of a new set of biometric parameters extracted from the components which result from the deconstruction of the voice into its glottal source estimate (closely related to the phonation process and the organs involved, and therefore to the physical characteristics of the speaker) and its vocal tract estimate (closely related to acoustic articulation and therefore to the spoken message). These biometric parameters complement the classical MFCC extracted from the power spectral density of the speech as a whole. In order to check the validity of this proposal we set up different practical scenarios, using different databases, so that we can conclude that GDEBP generates a more accurate description of speakers than classical approaches based on gender-independent MFCC. Specifically, we propose scenarios based on text-constrained and text-independent tests using the HESPERIA and ALBAYZIN databases. The work is also completed with participation in two international speaker recognition evaluations, NIST SRE (2010 and 2012) and MOBIO 2013, with diverse results. In the first case, due to the nature of the NIST databases, we obtained results close to the state of the art while confirming our hypothesis, whereas in the MOBIO SRE we obtained the best simple-system performance for female speakers. Although the study of classification systems is beyond the scope of this thesis, we found it necessary to analyse the performance of different classification systems in order to verify their effect on the proposed parameterisation. In particular, we have addressed the use of speaker recognition systems based on the GMM-UBM paradigm, supervectors and i-vectors. The results presented confirm that the selection of a set of parameters that allows a more accurate description of the speakers is as important as the selection of the classification method used by the biometric system. In this sense, the proposed parameterisation constitutes a step forward in improving speaker recognition systems, since really competitive recognition rates are achieved even with relatively simple classification systems.
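A minimal sketch of GMM-UBM-style scoring with gender-dependent background models, assuming precomputed frame-level GDEBP features; the data are random placeholders, and the MAP adaptation of the speaker model from the UBM used in real GMM-UBM systems is omitted for brevity.

# Hedged sketch of gender-dependent, GMM-UBM-style verification scoring on
# placeholder data. Real systems MAP-adapt the speaker model from the UBM
# and use far more mixture components; both are simplified away here.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def train_gmm(frames, n_components=8):
    return GaussianMixture(n_components, covariance_type="diag",
                           random_state=0).fit(frames)

# Frame-level features (rows = frames, columns = GDEBP coefficients).
ubm_female = train_gmm(rng.standard_normal((5000, 24)))      # female background
ubm_male = train_gmm(rng.standard_normal((5000, 24)))        # male background
speaker_model = train_gmm(rng.standard_normal((800, 24)))    # enrollment data

def llr_score(test_frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio; higher means same speaker."""
    return speaker_gmm.score(test_frames) - ubm.score(test_frames)

test_frames = rng.standard_normal((600, 24))
# The claimed identity is female in this toy example, so the female UBM is used.
print("LLR score:", round(llr_score(test_frames, speaker_model, ubm_female), 3))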

Relevance: 90.00%

Abstract:

A person's quality of life may depend on early attention to neurodevelopmental disorders in childhood. Identification of language disorders before the age of six can speed up the required diagnosis and/or treatment processes. This paper details the enhancement of a Clinical Decision Support System (CDSS) aimed at assisting pediatricians and language therapists in the early identification and referral of language disorders. The system helps to fine-tune the Knowledge Base of Language Delays (KBLD) that had already been developed and validated in clinical routine with 146 children. Medical experts supported the construction of the Gades CDSS by gathering scientific consensus from the literature and fifteen years of registered use cases of children with language disorders. The current research focuses on an innovative cooperative model that allows the KBLD of Gades to evolve through the supervised evaluation of the CDSS learnings with expert feedback. The deployment of the resulting system is being assessed by a multidisciplinary team of seven experts from the fields of speech therapy, neonatology, pediatrics, and neurology.