944 results for Visual Speech Recognition, Multiple Views, Frontal View, Profile View


Relevance: 100.00%

Abstract:

To report the audiological outcomes of cochlear implantation in two patients with severe to profound sensorineural hearing loss secondary to superficial siderosis of the CNS, and to discuss some programming peculiarities found in these cases. Retrospective review. Data concerning clinical presentation, diagnosis and audiological assessment before and after implantation were collected for two patients with superficial siderosis of the CNS. Both patients showed good hearing thresholds but variable speech perception outcomes: one patient did not achieve open-set speech recognition, while the other achieved 70% speech recognition in quiet. Electrical compound action potentials could not be elicited in either patient. Map parameters showed the need for increased charge, and electrode impedances showed high longitudinal variability. The implants were fairly beneficial in restoring hearing and improving communication abilities, although many reprogramming sessions were required. The main hurdle in programming was the need for frequent adjustments due to physiologic variations in electrical discharges and neural conduction, in addition to changes in impedances. Patients diagnosed with superficial siderosis may achieve limited speech perception scores for both cochlear and retrocochlear reasons. Careful counseling about the expected results must be given to patients and their families before cochlear implantation is indicated.

Relevance: 100.00%

Abstract:

[EN] In this paper we present a vascular tree model made of synthetic materials that allows us to obtain images for a 3D reconstruction. We used PVC tubes of several diameters and lengths, which let us evaluate the accuracy of our 3D reconstruction. In order to calibrate the camera we used a corner detector, and we used optical flow techniques to track the points through the image sequence, both forward and backward. We describe two general techniques to extract a sequence of corresponding points from multiple views of an object. The resulting sequence of points is later used to reconstruct a set of 3D points representing the object surfaces in the scene. We performed the 3D reconstruction by randomly selecting pairs of images and computing the projection error; after several repetitions, we kept the best 3D location for each point.
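A minimal sketch of the random-pair triangulation idea described above, assuming calibrated 3x4 projection matrices and 2D point tracks are already available (the DLT triangulation, the function names and the trial count are illustrative, not taken from the paper): triangulate the point from a randomly chosen pair of views, score it by its mean reprojection error over all views, and keep the best candidate after several repetitions.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two calibrated views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]                      # homogeneous 3D point [X, Y, Z, 1]

def reprojection_error(X, projections, observations):
    """Mean pixel distance between the projected 3D point and the tracked 2D points."""
    errors = []
    for P, x in zip(projections, observations):
        p = P @ X
        errors.append(np.linalg.norm(p[:2] / p[2] - x))
    return float(np.mean(errors))

def best_point(projections, observations, trials=50, seed=0):
    """Repeatedly triangulate from a random pair of views and keep the best result."""
    rng = np.random.default_rng(seed)
    best_X, best_err = None, np.inf
    for _ in range(trials):
        i, j = rng.choice(len(projections), size=2, replace=False)
        X = triangulate(projections[i], projections[j],
                        observations[i], observations[j])
        err = reprojection_error(X, projections, observations)
        if err < best_err:
            best_X, best_err = X, err
    return best_X, best_err
```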

Relevance: 100.00%

Abstract:

The purpose of this thesis is to establish a direct relationship between literature and fields of knowledge such as science and technology, by focusing on concepts that were fundamental to both science and the humanities at the beginning of the 20th century: simultaneity, multiple points of view, the map, relativity and acausality. In the spirit of several recent ideas, for example Katherine Hayles' notion of isomorphism, the dissertation shows how writers such as James Joyce, Virginia Woolf, Thomas Mann and Robert Musil developed these concepts within their narratives. The working hypothesis is that those concepts stood at a crossroads of human activities, and that those authors used them extensively in their narratives. It is further argued that those same concepts, as developed by Joyce in Ulysses, in Woolf's short stories and novels from the late 1910s to the late 1920s, in Mann's Der Zauberberg (The Magic Mountain), and in Musil's Der Mann ohne Eigenschaften (The Man Without Qualities), are still fundamental to our conception of time and space today. The thesis is divided into two parts. The first two chapters analyse the concepts of simultaneity and multiple points of view and their relationship to cartography as developed within English literature and culture. The next two chapters address the concepts of relativity and acausality as developed within German literature and culture.

Relevance: 100.00%

Abstract:

The level of improvement in the audiological results of Baha® users mainly depends on the patient's preoperative hearing thresholds and the type of Baha sound processor used. This investigation examines the correlations between preoperative hearing thresholds, postoperative aided thresholds, and audiological results for speech understanding in quiet in 84 Baha users with unilateral conductive hearing loss, bilateral conductive hearing loss or bilateral mixed hearing loss. Secondly, speech understanding in noise is investigated in 26 Baha users with different Baha sound processors (Compact, Divino, and BP100). Linear regression between aided sound field thresholds and the bone conduction (BC) thresholds of the better ear shows the highest correlation coefficients and the steepest slope. Differences between better BC thresholds and aided sound field thresholds are smallest for the mid frequencies (1 and 2 kHz) and become larger at 0.5 and 4 kHz. For Baha users, the gain in speech recognition in quiet can be expected to be of the same order of magnitude as the gain in their hearing threshold. Compared with its predecessor sound processors, the Baha® Compact and Baha® Divino, the Baha® BP100 improves speech understanding in noise significantly, by +0.9 to +4.6 dB signal-to-noise ratio, depending on the setting and the use of the directional microphone. For Baha users with unilateral or bilateral conductive hearing loss and bilateral mixed hearing loss, aided sound field thresholds can be estimated from the better BC hearing threshold, and the benefit in speech understanding in quiet can be expected to be similar to the gain in the sound field hearing threshold. The most recent Baha sound processor technology improves speech understanding in noise by an amount that is well perceived by users and can be very useful in everyday life.
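The regression analysis mentioned above (aided sound field thresholds against better-ear BC thresholds, per frequency) can be illustrated with a short sketch; the threshold values below are placeholders, not data from the study.

```python
import numpy as np
from scipy import stats

# Placeholder thresholds in dB HL for a few hypothetical subjects;
# rows: subjects, columns: 0.5, 1, 2, 4 kHz.
bc_better_ear = np.array([[20, 25, 30, 40],
                          [10, 15, 20, 35],
                          [30, 30, 35, 50],
                          [25, 20, 25, 45]], dtype=float)
aided_sound_field = np.array([[30, 28, 32, 48],
                              [22, 18, 24, 45],
                              [38, 33, 36, 60],
                              [33, 24, 28, 55]], dtype=float)

freqs_khz = [0.5, 1, 2, 4]
for k, f in enumerate(freqs_khz):
    res = stats.linregress(bc_better_ear[:, k], aided_sound_field[:, k])
    diff = np.mean(aided_sound_field[:, k] - bc_better_ear[:, k])
    print(f"{f} kHz: slope={res.slope:.2f}, r={res.rvalue:.2f}, "
          f"mean aided-BC difference={diff:.1f} dB")
```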

Relevance: 100.00%

Abstract:

Cognitive functions in the child's brain develop in the context of complex adaptive processes determined by genetic and environmental factors. Little is known about the cerebral representation of cognitive functions during development; in particular, knowledge about the development of right-hemispheric (RH) functions is scarce. Considering the dynamics of brain development, the localization and lateralization of cognitive functions must be expected to change with age. Twenty healthy subjects (8.6-20.5 years) were examined with fMRI and neuropsychological tests. All participants completed two fMRI tasks known to activate left-hemispheric (LH) regions (language tasks) and two tasks known to involve predominantly RH areas (visual search tasks). A laterality index (LI) was computed to determine the asymmetry of activation. Group analysis revealed unilateral activation of the LH language circuitry during language tasks, while visual search tasks induced a more widespread RH activation pattern in frontal, superior temporal, and occipital areas. Laterality of language increased between the ages of 8 and 20 in frontal (r = 0.392, P = 0.049) and temporal (r = 0.387, P = 0.051) areas. The asymmetry of visual search functions increased in frontal (r = -0.525, P = 0.009) and parietal (r = -0.439, P = 0.027) regions. A positive correlation was found between Verbal IQ and the LI during a language task (r = 0.585, P = 0.028), while visuospatial skills correlated with the LIs of visual search (r = -0.621, P = 0.018). In summary, cognitive development is accompanied by changes in the functional representation of neuronal circuitries, with a strengthening of lateralization not only for LH but also for RH functions. Our data show that age and performance independently account for the increase in laterality with age.
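The abstract does not give the exact formula, but a laterality index is conventionally computed as LI = (L - R) / (L + R) over activation measures from homologous left and right regions of interest; the sketch below illustrates that convention with made-up suprathreshold voxel counts.

```python
import numpy as np

def laterality_index(left_activation, right_activation):
    """LI = (L - R) / (L + R); +1 = fully left-lateralized, -1 = fully right-lateralized."""
    L = float(np.sum(left_activation))
    R = float(np.sum(right_activation))
    return (L - R) / (L + R) if (L + R) > 0 else 0.0

# Hypothetical suprathreshold voxel counts in homologous frontal ROIs
# during a language task (left-dominant) and a visual search task (right-dominant).
print(laterality_index(820, 310))   #  ~0.45 -> left-lateralized
print(laterality_index(260, 715))   # ~-0.47 -> right-lateralized
```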

Relevance: 100.00%

Abstract:

Obesity is becoming an epidemic phenomenon in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended, so it is essential to monitor everyday food intake for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions, and the accuracy of the results largely relies on many uncertain factors such as the user's memory, food knowledge, and portion estimation. As a result, accuracy is often compromised. Accurate and convenient dietary assessment methods are still lacking and needed in both the general population and the research community. In this thesis, an automatic food intake assessment method using the cameras and inertial measurement units (IMUs) of smartphones was developed to help people foster a healthy lifestyle. With this method, users use their smartphones before and after a meal to capture images or videos of the meal; the smartphone then recognizes the food items, calculates the volume of the food consumed, and provides the results to the user. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation.

This thesis comprises five publications that address four specific goals: (1) to develop a prototype system with existing methods in order to review the literature, identify its drawbacks, and explore the feasibility of developing novel methods; (2) based on the prototype system, to investigate new food classification methods that improve recognition accuracy to a field-application level; (3) to design indexing methods for large-scale image databases that facilitate the development of new food image recognition and retrieval algorithms; and (4) to develop novel, convenient and accurate food volume estimation methods using only smartphones with cameras and IMUs.

A prototype system was implemented to review existing methods. An image feature detector and descriptor were developed, and a nearest-neighbour classifier was implemented to classify food items. A credit card marker method was introduced for metric-scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regular-shaped food items. To further increase accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel image indexing method for large-scale image databases. Finally, the volume calculation was enhanced by reducing the reliance on the marker and introducing IMUs: sensor fusion techniques combining measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as to reduce noise from these sensors.
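A minimal sketch of the scale-recovery idea behind camera-IMU fusion, under simplifying assumptions that are not taken from the thesis (gravity-compensated accelerations, synchronized timestamps, zero initial velocity, and an up-to-scale visual translation from structure-from-motion): double-integrate the IMU acceleration between two frames to obtain a metric displacement and compare it with the unitless visual displacement to estimate the metric scale factor.

```python
import numpy as np

def metric_scale(accel, dt, visual_translation_unitless):
    """
    Estimate the metric scale of an up-to-scale visual reconstruction.

    accel: (N, 3) gravity-compensated accelerations in m/s^2 between two frames
    dt: IMU sampling interval in seconds
    visual_translation_unitless: camera translation between the same two frames,
        expressed in the arbitrary units of the SfM reconstruction
    """
    # Double integration of acceleration -> metric displacement (assumes zero
    # initial velocity; a real system would track velocity in a filter).
    velocity = np.cumsum(accel, axis=0) * dt
    displacement = np.sum(velocity, axis=0) * dt

    metric_norm = np.linalg.norm(displacement)
    visual_norm = np.linalg.norm(visual_translation_unitless)
    return metric_norm / visual_norm   # multiply SfM lengths by this factor

# Toy example: ~0.2 s of constant 0.5 m/s^2 acceleration along x.
acc = np.tile([0.5, 0.0, 0.0], (20, 1))
scale = metric_scale(acc, dt=0.01,
                     visual_translation_unitless=np.array([0.4, 0.0, 0.0]))
print(f"metric scale factor: {scale:.3f}")
```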

Relevance: 100.00%

Abstract:

A popular method for nasolabial rating in unilateral cleft lip and palate (UCLP) is the Asher-McDade system, a 5-point ordinal scale assessing nasal form, nasal symmetry, nasal profile, and the vermilion border. The aim of the current study was to identify reference photographs illustrating this scale in order to facilitate its use. Four observers assessed nasolabial appearance on frontal and profile photographs of the nasolabial area of 42 children of Caucasian origin with a repaired UCLP at age 9 years. Cronbach's alpha, based on the individual scores of the 4 observers, ranged from 0.73 to 0.82 for the 4 nasolabial ratings, indicating good reliability. The reliability of the overall score (the mean of the 4 component scores) was also high (Cronbach's alpha, 0.83). Both for the nasolabial component ratings and for the overall score, duplicate measurement errors were small. The reliability of the mean of the 4 observers' scores was good, with Spearman rank correlation coefficients ranging from 0.56 to 0.96. Subsequently, photographs were selected that showed the highest agreement among observers. For each of the 4 components (i.e., nasal form, nasal deviation, nasal profile, and shape of the vermilion border), 5 photographs were selected to illustrate the whole range of the scale (scores 1-5), resulting in a selection of 20 pictures. It was concluded that nasolabial appearance can be rated reliably using a panel of judges and averaging the scores of all observers. Reference photographs, as developed in this study, may facilitate the rating task.
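Cronbach's alpha, used above to quantify inter-observer reliability, can be computed directly from the case-by-observer score matrix; the sketch below applies the standard formula to made-up ratings, not to data from the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """
    scores: (n_cases, n_raters) matrix of ordinal ratings.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)
    """
    scores = np.asarray(scores, dtype=float)
    n_cases, k = scores.shape
    item_vars = scores.var(axis=0, ddof=1)        # variance per observer
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up 5-point nasolabial ratings: 6 cases x 4 observers.
ratings = [[2, 3, 2, 3],
           [4, 4, 5, 4],
           [1, 2, 1, 2],
           [3, 3, 3, 4],
           [5, 4, 5, 5],
           [2, 2, 3, 2]]
print(f"Cronbach's alpha: {cronbach_alpha(ratings):.2f}")
```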

Relevance: 100.00%

Abstract:

The article presents the design process of intelligent virtual human patients that are used for the enhancement of clinical skills. The description covers the development from conceptualization and character creation to the technical components and the application in clinical research and training. The aim is to create believable social interactions with virtual agents that help the clinician develop skills in symptom and ability assessment, diagnosis, interview techniques and interpersonal communication. The virtual patient fulfills the requirements of a standardized patient, producing consistent, reliable and valid interactions in portraying the symptoms and behaviour related to a specific clinical condition.

Relevance: 100.00%

Abstract:

Attention deficit/hyperactivity disorder (ADHD) is a highly heritable neurodevelopmental disorder of childhood onset. Clinical and biological evidence points to a shared central nervous system (CNS) pathology of ADHD and restless legs syndrome (RLS). It was hypothesized that variants previously found to be associated with RLS in two large genome-wide association (GWA) studies would also be associated with ADHD. SNPs located in MEIS1 (rs2300478), BTBD9 (rs9296249, rs3923809, rs6923737), and MAP2K5 (rs12593813, rs4489954), as well as three SNPs tagging the identified haplotype in MEIS1 (rs6710341, rs12469063, rs4544423), were genotyped in a well-characterized German sample of 224 families comprising one or more affected sibs (386 children) and both parents. We found no evidence for preferential transmission of the hypothesized variants to ADHD. Subsequent analyses elicited a nominally significant association with haplotypes consisting of the three SNPs in BTBD9 (χ² = 14.8, df = 7, nominal p = 0.039). According to exploratory post hoc analyses, the major contribution to this finding came from the A-A-A haplotype, with a haplotype-wise nominal p-value of 0.009. However, this result did not withstand correction for multiple testing. In view of our results, RLS risk alleles may have a smaller effect on ADHD than on RLS, or may not be involved in ADHD at all. The negative findings may additionally result from the genetic heterogeneity of ADHD, i.e. risk alleles for RLS may only be relevant for certain subtypes of ADHD. Genes relevant to RLS remain interesting candidates for ADHD; in particular, BTBD9 needs further study, as it has been related to iron storage, a potential pathophysiological link between RLS and certain subtypes of ADHD.

Relevance: 100.00%

Abstract:

Crowdsourcing linguistic phenomena with smartphone applications is relatively new. Apps have been used to train acoustic models for automatic speech recognition (de Vries et al. 2014) and to archive endangered languages (Iwaidja Inyaman Team 2012). Leemann and Kolly (2013) developed a free iOS app, Dialäkt Äpp (DÄ; >78k downloads), to document language change in Swiss German. Here, we present results on sound change based on DÄ data. DÄ predicts the users' dialects: for 16 variables, users select their dialectal variant, and DÄ then tells users which dialect they speak. Underlying this prediction are maps from the Linguistic Atlas of German-speaking Switzerland (SDS, 1962-2003), which documents the linguistic situation around 1950. If the prediction is wrong, users indicate their actual dialect. With this information, the 16 variables can be assessed for language change. Results revealed robustness of the phonetic variables; lexical and morphological variables were more prone to change. Phonetic variables such as 'to lift' (variants: /lupfə, lʏpfə, lipfə/) showed SDS agreement scores of nearly 85%, i.e., little sound change. Not all phonetic variables are equally robust: 'ladle' (variants: /xælə, xællə, xæuə, xæɫə, xæɫɫə/) exhibited significant sound change. We illustrate the results using maps that show details of the sound changes at hand.
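A short sketch of how an SDS agreement score per variable could be computed (the data structures, localities and example records below are hypothetical; the study's actual pipeline is not described in the abstract): compare each user's selected variant with the variant the SDS atlas records for that user's locality and report the percentage of matches.

```python
from collections import defaultdict

# Hypothetical app records: (variable, locality, variant chosen by the user)
responses = [
    ("to lift", "Bern",   "lupfə"),
    ("to lift", "Zürich", "lupfə"),
    ("ladle",   "Bern",   "xællə"),
    ("ladle",   "Zürich", "xælə"),
]

# Hypothetical SDS atlas lookup: variant documented around 1950 per variable and locality.
sds_atlas = {
    ("to lift", "Bern"):   "lupfə",
    ("to lift", "Zürich"): "lupfə",
    ("ladle",   "Bern"):   "xæɫɫə",
    ("ladle",   "Zürich"): "xælə",
}

hits = defaultdict(int)
totals = defaultdict(int)
for variable, locality, variant in responses:
    totals[variable] += 1
    if sds_atlas.get((variable, locality)) == variant:
        hits[variable] += 1

for variable in totals:
    agreement = 100.0 * hits[variable] / totals[variable]
    print(f"{variable}: {agreement:.1f}% agreement with SDS")
```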

Relevance: 100.00%

Abstract:

This thesis addresses the goal of making Voice Activity Detection robust in adverse acoustic environments, in order to improve the behaviour of many vocal applications, for example telephony applications based on automatic speech recognition, automatic transcription systems, and multichannel systems. Although all types of noise were taken into account, special attention is paid to background voices (pronunciations coming from far-field speakers), the main error source of most activity detectors today. The starting point of the work is a Voice Activity Detector based on Hidden Markov Models whose feature vector contains two components: normalized energy and delta energy. The main contributions of this Thesis are the following: 1) extension of the initial feature vector to include spectral information; 2) adjustment of the Hidden Markov Models to the environment and a study of different topologies; and, finally, 3) the study and inclusion of new features, different from those in point 1, to filter out utterances coming from background voices. The detection results, taking the three points above into account, clearly show the progress made and are significantly better than the results obtained, under the same conditions, by other reference voice activity detectors.
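A minimal sketch of the two-component feature vector mentioned above, i.e. frame-wise normalized energy and its delta; the frame length, hop size and normalization choice are assumptions, since the thesis's exact parameters are not given in the abstract.

```python
import numpy as np

def vad_features(signal, frame_len=400, hop=160):
    """Per-frame feature vector: [normalized log energy, delta energy]."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    log_e = np.array([np.log(np.sum(f.astype(float) ** 2) + 1e-10) for f in frames])

    # Normalize energy to the observed range of the utterance (one possible choice).
    norm_e = (log_e - log_e.min()) / (log_e.max() - log_e.min() + 1e-10)

    # Delta energy: first difference between consecutive frames.
    delta_e = np.diff(norm_e, prepend=norm_e[0])
    return np.stack([norm_e, delta_e], axis=1)   # shape: (n_frames, 2)

# Toy usage: 1 s of low-level noise followed by a louder "speech-like" burst at 16 kHz.
rng = np.random.default_rng(0)
sig = np.concatenate([0.01 * rng.standard_normal(16000),
                      0.3 * rng.standard_normal(8000)])
print(vad_features(sig).shape)
```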

Relevance: 100.00%

Abstract:

The objectives of this project are to provide the theory, exercises and other resources needed so that students at the EUIT de Telecomunicación with an A1 level in the Common European Framework of Reference for Languages (CEFR) can reach the A2 level in English without attending face-to-face classes or enrolling in classroom courses. The platform used to achieve this aim is Moodle, as used on the ILLLab website. This online course helps students attain the knowledge required for the optional subject Introduction to English for Professional and Academic Communication I, which starts from the B1 level. The grammar is presented with corresponding examples and exercises, all of them adaptations of activities published in a corpus of textbooks, together with additional resources (short readings, videos, links) considered appropriate for each topic. The project also addresses a shortcoming of e-learning language courses, namely that they do not usually provide the tools needed to practice speaking. For this purpose, an application based on speech recognition techniques is provided, with three activities in which the answers have to be given orally and with correct pronunciation. The aim is to build a base of knowledge and practical experience for future projects based on speech synthesis and recognition tools, as well as to explore a new approach to language learning.
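The abstract does not detail how the spoken-answer activities are checked; a common, simple approach is to run the learner's recording through an off-the-shelf recognizer and compare the transcript with the expected answer. The sketch below uses the Python SpeechRecognition package (recognize_google needs network access) and a hypothetical audio file name.

```python
import speech_recognition as sr

def check_spoken_answer(wav_path, expected, language="en-US"):
    """Return True if the recognized text matches the expected answer."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    try:
        heard = recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:          # nothing intelligible was recognized
        return False
    return heard.strip().lower() == expected.strip().lower()

# Hypothetical usage within one of the three speaking activities.
if check_spoken_answer("answer_01.wav", "traffic lights"):
    print("Correct, well done!")
else:
    print("Try again.")
```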

Relevance: 100.00%

Abstract:

The design and development of spoken interaction systems has been a thoroughly studied research area over the last decades. The aim is to obtain systems able to interact with human agents with a high degree of naturalness and efficiency, allowing users to carry out the actions they desire using speech, the most natural means of communication between humans. To achieve that degree of naturalness it is not enough to endow systems with the ability to accurately understand the user's utterances and to react to them properly, even considering the information provided by the user in previous interactions. The system also has to be aware of the evolution of the conditions under which the interaction takes place, so that it can act in the most coherent way possible at each moment. Consequently, one of the most important features of the system is that it has to be context-aware. This context awareness can be reflected in modifications of the system's behaviour that take the current situation of the interaction into account. For instance, the system should decide which action to carry out, or how to perform it, depending on the user that requests it, on the way the user addresses the system, on the characteristics of the environment in which the interaction takes place, and so on. In other words, the system has to adapt its behaviour to these evolving elements of the interaction. Moreover, that adaptation should be carried out, if possible, in such a way that the user i) does not perceive that the system makes any additional effort, or devotes interaction time, to tasks other than the requested actions, and ii) does not have to provide the system with any additional information to carry out the adaptation, which would imply a less efficient interaction, since users would have to devote several turns only to allow the system to adapt.

In state-of-the-art spoken dialogue systems, researchers have proposed several disparate strategies to adapt the elements of the system to different conditions of the interaction (such as the acoustic characteristics of a specific user's speech, the actions previously requested, and so on). Nevertheless, to our knowledge there is no consensus on the procedures to carry out this adaptation. The approaches are largely unrelated to one another, in the sense that each one considers different pieces of information and treats that information differently depending on the adaptation carried out. In this regard, the main contributions of this Thesis are the following ones.

Definition of a contextualization framework. We propose a unified approach that can cover any strategy to adapt the behaviour of a dialogue system to the conditions of the interaction (i.e. the context). In our theoretical definition of the contextualization framework we consider the system's context as all the sources of variability present at any time of the interaction, whether related to the environment in which the interaction takes place or to the human agent that addresses the system at each moment. Our proposal relies on three aspects that any contextualization approach should fulfill: plasticity (the system has to be able to modify its behaviour in the most proactive way, taking into account the conditions under which the interaction takes place), adaptivity (the system also has to be able to consider the most appropriate sources of information at each moment, both environmental and user- and dialogue-dependent, to effectively adapt to the aforementioned conditions), and transparency (the system has to carry out the contextualization-related tasks in such a way that the user neither perceives them nor has to make any effort to provide the system with the information it needs to perform that contextualization). Additionally, we include a generality aspect in the proposed framework: its main features should be easy to adopt in any dialogue system, regardless of the solution used to manage the dialogue.

Once the theoretical basis of the contextualization framework is defined, we propose two case studies of its application in a spoken dialogue system, focusing on two aspects of the interaction: the contextualization of the speech recognition models, and the incorporation of user-specific information into the dialogue flow.

One of the modules of a dialogue system most prone to be contextualized is the speech recognizer. This module makes use of several models to emit a recognition hypothesis from the user's speech signal. Generally speaking, a recognition system considers two types of models: an acoustic one (which models each of the phonemes the recognizer has to consider) and a linguistic one (which models the sequences of words that make sense for the system). In this work we contextualize the language model of the recognizer so that it takes into account the information provided by the user both in the current utterance and in the previous ones, since these utterances convey information that helps the system recognize the next utterance. The contextualization approach we propose consists of a dynamic adaptation of the language model used by the recognizer, carried out by means of a linear interpolation between several models. Instead of training the best interpolation weights, we make them dependent on the conditions of the dialogue: the system itself obtains the weights as a function of the reliability of the different elements of information available, such as the semantic concepts extracted from the user's utterance, the actions he or she wants to carry out, the information provided in previous interactions, and so on.

One of the aspects most frequently addressed in Human-Computer Interaction research is the inclusion of user-specific characteristics in the information structures managed by the system. The idea is to take into account the features that make each user different from the others in order to offer each particular user different services (or the same service, but in a different way). We can consider this approach as a user-dependent contextualization of the system. In our work we propose the definition of a user model that contains all the information about each user that could be potentially useful to the system at a given moment of the interaction. In particular, we analyse the actions that each user carries out throughout his or her interactions, with the objective of determining which of these actions become that user's preferences. We represent the specific information of each user as a feature vector, where each of the characteristics the system takes into account has an associated confidence score.
With these elements, we propose a probabilistic definition of a user preference as the action whose likelihood of being addressed by the user is greater than that of the rest of the actions. To include the user-dependent information in the dialogue flow, we modify the information structures on which the dialogue manager relies to retrieve the information needed to resolve the actions addressed by the user. Usage preferences thus become another source of contextual information considered by the system towards a more efficient interaction, since the new information source helps to decrease the system's need to ask users for additional information, reducing the number of turns needed to carry out a specific action.

To test the benefits of the proposed contextualization framework, we carry out an evaluation of the two strategies described above. We gather several performance metrics, both objective and subjective, that allow us to compare the improvements of the contextualized system against the baseline one. We also gather the users' opinions regarding their perception of the system's behaviour and its degree of adaptation to the specific features of each interaction.
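A small sketch of how the user-preference model described above could be realized (class, method and action names are illustrative, not taken from the thesis): accumulate confidence-weighted counts of the actions each user requests, normalize them into likelihoods, and report an action as a preference only when its likelihood exceeds that of every other action.

```python
from collections import defaultdict

class UserModel:
    """Per-user feature vector: confidence-weighted counts of requested actions."""

    def __init__(self):
        self.scores = defaultdict(float)

    def observe(self, action, confidence=1.0):
        """Record one requested action, weighted by the system's confidence in it."""
        self.scores[action] += confidence

    def likelihoods(self):
        total = sum(self.scores.values())
        return {a: s / total for a, s in self.scores.items()} if total else {}

    def preference(self):
        """The action whose likelihood is greater than that of all other actions."""
        probs = self.likelihoods()
        if not probs:
            return None
        best = max(probs, key=probs.get)
        # Only a preference if it strictly dominates every other action.
        return best if all(probs[best] > p for a, p in probs.items() if a != best) else None

# Hypothetical usage: one user mostly asks to check e-mail.
user = UserModel()
for action, conf in [("check_email", 0.9), ("check_email", 0.8),
                     ("play_music", 0.7), ("check_email", 0.95)]:
    user.observe(action, conf)
print(user.preference())   # -> "check_email"
```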

Relevance: 100.00%

Abstract:

The objective of the present project is to provide a pronunciation and vocabulary review activity in English within the Moodle platform hosted at the Integrated Language Learning Lab (ILLLab) website. The ILLLab website aims to provide students at the EUIT de Telecomunicación of the UPM with activities to build on their A2 level according to the Common European Framework of Reference for Languages (CEFR), so that they can work independently to advance towards a B2 level in English. The UPM requires this level of English proficiency for enrolling in the compulsory subject English for Professional and Academic Communication (EPAC), taught in the seventh semester of the Degree in Telecommunications Engineering. Likewise, this project tries to address the problem of the scarce speaking activities included in learning platforms that offer language courses, and specifically English courses. For this purpose, it provides a tool based on speech recognition systems so that the user can practice the pronunciation of English words. The first chapter introduces the application Traffic Lights, explaining its origins and what it consists of. The second chapter deals with theoretical aspects of speech recognition and comments on its main features and the applications for which it is currently used. The third chapter provides a detailed explanation of the different programming languages used in the implementation of the project and reviews the code developed. The fourth chapter presents a user manual for the application, showing how the application works together with an example of use. In addition, several sections address the application administrator, specifying how to add new words to the database and how to change the original settings, such as the estimated time the user has to finish a game.

Relevance: 100.00%

Abstract:

We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with information about the intentions of the speaker (inferred by the dialogue manager and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure that determines the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals: in the first one we estimate an LM for each of them, whereas in the second one we apply several clustering strategies to group together the elements that share common properties and estimate an LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which helps to improve the performance of the speech recognition (up to 14.82% relative improvement) and, in turn, leads to improvements in both the language understanding and the dialogue management tasks.
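A minimal sketch of the turn-wise interpolation described above, assuming unigram LMs represented as word-probability dictionaries and interpolation weights derived from the concept/goal posteriors of the current turn (the data structures and the normalization are illustrative, not the system's actual implementation):

```python
def interpolate_lms(background_lm, element_lms, posteriors, floor=1e-8):
    """
    Linear interpolation of a background LM with the LMs of the dialogue
    elements (concepts/goals) addressed in the current turn.

    background_lm: dict word -> probability
    element_lms:   dict element -> (dict word -> probability)
    posteriors:    dict element -> posterior probability estimated for this turn
    """
    # Turn posteriors into interpolation weights; the remaining mass
    # stays with the background model.
    total = sum(posteriors.values())
    weights = {e: p / max(total, 1.0) for e, p in posteriors.items()}
    w_background = max(0.0, 1.0 - sum(weights.values()))

    vocab = set(background_lm)
    for lm in element_lms.values():
        vocab.update(lm)

    adapted = {}
    for word in vocab:
        p = w_background * background_lm.get(word, floor)
        for e, lm in element_lms.items():
            p += weights.get(e, 0.0) * lm.get(word, floor)
        adapted[word] = p
    return adapted

# Toy turn: the user has been talking about a "weather" concept.
background = {"the": 0.3, "weather": 0.05, "lights": 0.05, "play": 0.1}
concept_lms = {"weather": {"weather": 0.4, "forecast": 0.3, "the": 0.2}}
print(interpolate_lms(background, concept_lms, {"weather": 0.7}))
```

In a real recognizer the interpolation would be applied to n-gram models inside the decoder, but the weighting idea is the same: dialogue elements with higher posteriors pull the adapted model toward their associated LMs.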