908 resultados para recognition system
Resumo:
Pós-graduação em Química - IBILCE
Resumo:
Pós-graduação em Ciência da Computação - IBILCE
Resumo:
Sistemas de reconhecimento e síntese de voz são constituídos por módulos que dependem da língua e, enquanto existem muitos recursos públicos para alguns idiomas (p.e. Inglês e Japonês), os recursos para Português Brasileiro (PB) ainda são escassos. Outro aspecto é que, para um grande número de tarefas, a taxa de erro dos sistemas de reconhecimento de voz atuais ainda é elevada, quando comparada à obtida por seres humanos. Assim, apesar do sucesso das cadeias escondidas de Markov (HMM), é necessária a pesquisa por novos métodos. Este trabalho tem como motivação esses dois fatos e se divide em duas partes. A primeira descreve o desenvolvimento de recursos e ferramentas livres para reconhecimento e síntese de voz em PB, consistindo de bases de dados de áudio e texto, um dicionário fonético, um conversor grafema-fone, um separador silábico e modelos acústico e de linguagem. Todos os recursos construídos encontram-se publicamente disponíveis e, junto com uma interface de programação proposta, têm sido usados para o desenvolvimento de várias novas aplicações em tempo-real, incluindo um módulo de reconhecimento de voz para a suíte de aplicativos para escritório OpenOffice.org. São apresentados testes de desempenho dos sistemas desenvolvidos. Os recursos aqui produzidos e disponibilizados facilitam a adoção da tecnologia de voz para PB por outros grupos de pesquisa, desenvolvedores e pela indústria. A segunda parte do trabalho apresenta um novo método para reavaliar (rescoring) o resultado do reconhecimento baseado em HMMs, o qual é organizado em uma estrutura de dados do tipo lattice. Mais especificamente, o sistema utiliza classificadores discriminativos que buscam diminuir a confusão entre pares de fones. Para cada um desses problemas binários, são usadas técnicas de seleção automática de parâmetros para escolher a representaçãao paramétrica mais adequada para o problema em questão.
Resumo:
Este trabalho visa propor uma solução contendo um sistema de reconhecimento de fala automático em nuvem. Dessa forma, não há necessidade de um reconhecedor sendo executado na própria máquina cliente, pois o mesmo estará disponível através da Internet. Além do reconhecimento automático de voz em nuvem, outra vertente deste trabalho é alta disponibilidade. A importância desse tópico se d´a porque o ambiente servidor onde se planeja executar o reconhecimento em nuvem não pode ficar indisponível ao usuário. Dos vários aspectos que requerem robustez, tal como a própria conexão de Internet, o escopo desse trabalho foi definido como os softwares livres que permitem a empresas aumentarem a disponibilidade de seus serviços. Dentre os resultados alcançados e para as condições simuladas, mostrou-se que o reconhecedor de voz em nuvem desenvolvido pelo grupo atingiu um desempenho próximo ao do Google.
Resumo:
In many movies of scientific fiction, machines were capable of speaking with humans. However mankind is still far away of getting those types of machines, like the famous character C3PO of Star Wars. During the last six decades the automatic speech recognition systems have been the target of many studies. Throughout these years many technics were developed to be used in applications of both software and hardware. There are many types of automatic speech recognition system, among which the one used in this work were the isolated word and independent of the speaker system, using Hidden Markov Models as the recognition system. The goals of this work is to project and synthesize the first two steps of the speech recognition system, the steps are: the speech signal acquisition and the pre-processing of the signal. Both steps were developed in a reprogrammable component named FPGA, using the VHDL hardware description language, owing to the high performance of this component and the flexibility of the language. In this work it is presented all the theory of digital signal processing, as Fast Fourier Transforms and digital filters and also all the theory of speech recognition using Hidden Markov Models and LPC processor. It is also presented all the results obtained for each one of the blocks synthesized e verified in hardware
Resumo:
Pós-graduação em Química - IQ
Resumo:
Plant mines are structures with the form of a cavity caused by consumption of host plant tissue by the insect's miner larvae. Plant mines are more common in leaves, but in Cipocereus minensis, a species in which the leaves are modified spines, the miner activity is restricted to the stem. The aim of this paper was to document the morphological and anatomical differences in the infected and uninfected stems of C. minensis due to the feeding habit of the mining agent. Fresh tissue samples of non-mined and mined young stem of C minensis were collected and examined in transverse sections. We hypothesize that the infection begins following mating when the females scratch the surface of the stem or while they feed on fruits and lay eggs, which subsequently develop into larvae, invading the cactus stem. The insect's miner larvae had mostly consumed the parenchyma tissue towards the center of the stem, and periderm formed along the entire path of the insect. This meristematic tissue or "wound periderm" is a common response for compartmentalization to isolate the damaged tissue, in this case the incubating chamber, in which the eggs will be placed. There were no signs of consumption of vascular tissue in the infested samples, further suggesting a compartmentalized infestation. The nest chamber was found in the stem pith region, with periderm surrounding an insect's miner pupa inside identified as a member of the Cerambycidae. The mining insect depends on a host plant to complete the life cycle; however, the nature of this partnership and the long-term effects of the insect on the plant tissue are unknown. The complex mechanisms by which herbivorous insects control the morphogenesis of the plant host are discussed. We propose that C. minensis has a recognition system to identify insect attack and evaluate the effectiveness of early response triggering compartmentalized defense mechanisms by protecting the injured area with a new layer of periderm.
Resumo:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has lead to enormous progress in the life science. Among the most important innovations is the microarray tecnology. It allows to quantify the expression for thousands of genes simultaneously by measurin the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousand, and a sample noise in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex disease such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. these samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the group of individuals. Eventhough today methods to analyse the data are welle developed and close to reach a standard organization (through the effort of preposed International project like Microarray Gene Expression Data -MGED- Society [1]) it is not unfrequant to stumble in a clinician's question that do not have a compelling statistical method that could permit to answer it.The contribution of this dissertation in deciphering disease regards the development of new approaches aiming at handle open problems posed by clinicians in handle specific experimental designs. In Chapter 1 starting from a biological necessary introduction, we revise the microarray tecnologies and all the important steps that involve an experiment from the production of the array, to the quality controls ending with preprocessing steps that will be used into the data analysis in the rest of the dissertation. While in Chapter 2 a critical review of standard analysis methods are provided stressing most of problems that In Chapter 3 is introduced a method to adress the issue of unbalanced design of miacroarray experiments. In microarray experiments, experimental design is a crucial starting-point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples it should be collected between the two classes. However in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists in a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC) A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets via beta and exponential distribution. The results of all three algorithms over low- noise data sets seems acceptable However, on a real unbalanced two-channel data set reagardin Chronic Lymphocitic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes Although standard algorithms perform well over low-noise simulated data sets, multi-SAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to adress similarities evaluation in a three-class prblem by means of Relevance Vector Machine [4] is described. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences could have a crucial role. In some cases similarities can give useful and, sometimes even more, important information. The goal, given three classes, could be to establish, with a certain level of confidence, if the third one is similar to the first or the second one. In this work we show that Relevance Vector Machine (RVM) [2] could be a possible solutions to the limitation of standard supervised classification. In fact, RVM offers many advantages compared, for example, with his well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on Tumor-Grade-three-class problem, so we have 67 samples of grade I (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as test-set to obtain the probability for samples of G2 to be member of class G1 or class G3. The analysis showed that breast cancer samples of grade II have a molecular profile more similar to breast cancer samples of grade I. Looking at the literature this result have been guessed, but no measure of significance was gived before.
Resumo:
Ambient Intelligence (AmI) envisions a world where smart, electronic environments are aware and responsive to their context. People moving into these settings engage many computational devices and systems simultaneously even if they are not aware of their presence. AmI stems from the convergence of three key technologies: ubiquitous computing, ubiquitous communication and natural interfaces. The dependence on a large amount of fixed and mobile sensors embedded into the environment makes of Wireless Sensor Networks one of the most relevant enabling technologies for AmI. WSN are complex systems made up of a number of sensor nodes, simple devices that typically embed a low power computational unit (microcontrollers, FPGAs etc.), a wireless communication unit, one or more sensors and a some form of energy supply (either batteries or energy scavenger modules). Low-cost, low-computational power, low energy consumption and small size are characteristics that must be taken into consideration when designing and dealing with WSNs. In order to handle the large amount of data generated by a WSN several multi sensor data fusion techniques have been developed. The aim of multisensor data fusion is to combine data to achieve better accuracy and inferences than could be achieved by the use of a single sensor alone. In this dissertation we present our results in building several AmI applications suitable for a WSN implementation. The work can be divided into two main areas: Multimodal Surveillance and Activity Recognition. Novel techniques to handle data from a network of low-cost, low-power Pyroelectric InfraRed (PIR) sensors are presented. Such techniques allow the detection of the number of people moving in the environment, their direction of movement and their position. We discuss how a mesh of PIR sensors can be integrated with a video surveillance system to increase its performance in people tracking. Furthermore we embed a PIR sensor within the design of a Wireless Video Sensor Node (WVSN) to extend its lifetime. Activity recognition is a fundamental block in natural interfaces. A challenging objective is to design an activity recognition system that is able to exploit a redundant but unreliable WSN. We present our activity in building a novel activity recognition architecture for such a dynamic system. The architecture has a hierarchical structure where simple nodes performs gesture classification and a high level meta classifiers fuses a changing number of classifier outputs. We demonstrate the benefit of such architecture in terms of increased recognition performance, and fault and noise robustness. Furthermore we show how we can extend network lifetime by performing a performance-power trade-off. Smart objects can enhance user experience within smart environments. We present our work in extending the capabilities of the Smart Micrel Cube (SMCube), a smart object used as tangible interface within a tangible computing framework, through the development of a gesture recognition algorithm suitable for this limited computational power device. Finally the development of activity recognition techniques can greatly benefit from the availability of shared dataset. We report our experience in building a dataset for activity recognition. Such dataset is freely available to the scientific community for research purposes and can be used as a testbench for developing, testing and comparing different activity recognition techniques.
Resumo:
Physicochemical experimental techniques combined with the specificity of a biological recognition system have resulted in a variety of new analytical devices known as biosensors. Biosensors are under intensive development worldwide because they have many potential applications, e.g. in the fields of clinical diagnostics, food analysis, and environmental monitoring. Much effort is spent on the development of highly sensitive sensor platforms to study interactions on the molecular scale. In the first part, this thesis focuses on exploiting the biosensing application of nanoporous gold (NPG) membranes. NPG with randomly distributed nanopores (pore sizes less than 50 nm) will be discussed here. The NPG membrane shows unique plasmonic features, i.e. it supports both propagating and localized surface plasmon resonance modes (p SPR and l-SPR, respectively), both offering sensitive probing of the local refractive index variation on/in NPG. Surface refractive index sensors have an inherent advantage over fluorescence optical biosensors that require a chromophoric group or other luminescence label to transduce the binding event. In the second part, gold/silica composite inverse opals with macroporous structures were investigated with bio- or chemical sensing applications in mind. These samples combined the advantages of a larger available gold surface area with a regular and highly ordered grating structure. The signal of the plasmon was less noisy in these ordered substrate structures compared to the random pore structures of the NPG samples. In the third part of the thesis, surface plasmon resonance (SPR) spectroscopy was applied to probe the protein-protein interaction of the calcium binding protein centrin with the heterotrimeric G-protein transducin on a newly designed sensor platform. SPR spectroscopy was intended to elucidate how the binding of centrin to transducin is regulated towards understanding centrin functions in photoreceptor cells.
Resumo:
Este Proyecto Fin de Carrera trata sobre el reconocimiento e identificación de caracteres de matrículas de automóviles. Este tipo de sistemas de reconocimiento también se los conoce mundialmente como sistemas ANPR ("Automatic Number Plate Recognition") o LPR ("License Plate Recognition"). La gran cantidad de vehículos y logística que se mueve cada segundo por todo el planeta, hace necesaria su registro para su tratamiento y control. Por ello, es necesario implementar un sistema que pueda identificar correctamente estos recursos, para su posterior procesado, construyendo así una herramienta útil, ágil y dinámica. El presente trabajo ha sido estructurado en varias partes. La primera de ellas nos muestra los objetivos y las motivaciones que se persiguen con la realización de este proyecto. En la segunda, se abordan y desarrollan todos los diferentes procesos teóricos y técnicos, así como matemáticos, que forman un sistema ANPR común, con el fin de implementar una aplicación práctica que pueda demostrar la utilidad de estos en cualquier situación. En la tercera, se desarrolla esa parte práctica en la que se apoya la base teórica del trabajo. En ésta se describen y desarrollan los diversos algoritmos, creados con el fin de estudiar y comprobar todo lo planteado hasta ahora, así como observar su comportamiento. Se implementan varios procesos característicos del reconocimiento de caracteres y patrones, como la detección de áreas o patrones, rotado y transformación de imágenes, procesos de detección de bordes, segmentación de caracteres y patrones, umbralización y normalización, extracción de características y patrones, redes neuronales, y finalmente el reconocimiento óptico de caracteres o comúnmente conocido como OCR. La última parte refleja los resultados obtenidos a partir del sistema de reconocimiento de caracteres implementado para el trabajo y se exponen las conclusiones extraídas a partir de éste. Finalmente se plantean las líneas futuras de mejora, desarrollo e investigación, para poder realizar un sistema más eficiente y global. This Thesis deals about license plate characters recognition and identification. These kinds of systems are also known worldwide as ANPR systems ("Automatic Number Plate Recognition") or LPR ("License Plate Recognition"). The great number of vehicles and logistics moving every second all over the world, requires a registration for treatment and control. Thereby, it’s therefore necessary to implement a system that can identify correctly these resources, for further processing, thus building a useful, flexible and dynamic tool. This work has been structured into several parts. The first one shows the objectives and motivations attained by the completion of this project. In the second part, it’s developed all the different theoretical and technical processes, forming a common ANPR system in order to implement a practical application that can demonstrate the usefulness of these ones on any situation. In the third, the practical part is developed, which is based on the theoretical work. In this one are described and developed various algorithms, created to study and verify all the questions until now suggested, and complain the behavior of these systems. Several recognition of characters and patterns characteristic processes are implemented, such as areas or patterns detection, image rotation and transformation, edge detection processes, patterns and character segmentation, thresholding and normalization, features and patterns extraction, neural networks, and finally the optical character recognition or commonly known like OCR. The last part shows the results obtained from the character recognition system implemented for this thesis and the outlines conclusions drawn from it. Finally, future lines of improvement, research and development are proposed, in order to make a more efficient and comprehensive system.
Resumo:
The design and development of spoken interaction systems has been a thoroughly studied research scope for the last decades. The aim is to obtain systems with the ability to interact with human agents with a high degree of naturalness and efficiency, allowing them to carry out the actions they desire using speech, as it is the most natural means of communication between humans. To achieve that degree of naturalness, it is not enough to endow systems with the ability to accurately understand the user’s utterances and to properly react to them, even considering the information provided by the user in his or her previous interactions. The system has also to be aware of the evolution of the conditions under which the interaction takes place, in order to act the most coherent way as possible at each moment. Consequently, one of the most important features of the system is that it has to be context-aware. This context awareness of the system can be reflected in the modification of the behaviour of the system taking into account the current situation of the interaction. For instance, the system should decide which action it has to carry out, or the way to perform it, depending on the user that requests it, on the way that the user addresses the system, on the characteristics of the environment in which the interaction takes place, and so on. In other words, the system has to adapt its behaviour to these evolving elements of the interaction. Moreover that adaptation has to be carried out, if possible, in such a way that the user: i) does not perceive that the system has to make any additional effort, or to devote interaction time to perform tasks other than carrying out the requested actions, and ii) does not have to provide the system with any additional information to carry out the adaptation, which could imply a lesser efficiency of the interaction, since users should devote several interactions only to allow the system to become adapted. In the state-of-the-art spoken dialogue systems, researchers have proposed several disparate strategies to adapt the elements of the system to different conditions of the interaction (such as the acoustic characteristics of a specific user’s speech, the actions previously requested, and so on). Nevertheless, to our knowledge there is not any consensus on the procedures to carry out these adaptation. The approaches are to an extent unrelated from one another, in the sense that each one considers different pieces of information, and the treatment of that information is different taking into account the adaptation carried out. In this regard, the main contributions of this Thesis are the following ones: Definition of a contextualization framework. We propose a unified approach that can cover any strategy to adapt the behaviour of a dialogue system to the conditions of the interaction (i.e. the context). In our theoretical definition of the contextualization framework we consider the system’s context as all the sources of variability present at any time of the interaction, either those ones related to the environment in which the interaction takes place, or to the human agent that addresses the system at each moment. Our proposal relies on three aspects that any contextualization approach should fulfill: plasticity (i.e. the system has to be able to modify its behaviour in the most proactive way taking into account the conditions under which the interaction takes place), adaptivity (i.e. the system has also to be able to consider the most appropriate sources of information at each moment, both environmental and user- and dialogue-dependent, to effectively adapt to the conditions aforementioned), and transparency (i.e. the system has to carry out the contextualizaton-related tasks in such a way that the user neither perceives them nor has to do any effort in providing the system with any information that it needs to perform that contextualization). Additionally, we could include a generality aspect to our proposed framework: the main features of the framework should be easy to adopt in any dialogue system, regardless of the solution proposed to manage the dialogue. Once we define the theoretical basis of our contextualization framework, we propose two cases of study on its application in a spoken dialogue system. We focus on two aspects of the interaction: the contextualization of the speech recognition models, and the incorporation of user-specific information into the dialogue flow. One of the modules of a dialogue system that is more prone to be contextualized is the speech recognition system. This module makes use of several models to emit a recognition hypothesis from the user’s speech signal. Generally speaking, a recognition system considers two types of models: an acoustic one (that models each of the phonemes that the recognition system has to consider) and a linguistic one (that models the sequences of words that make sense for the system). In this work we contextualize the language model of the recognition system in such a way that it takes into account the information provided by the user in both his or her current utterance and in the previous ones. These utterances convey information useful to help the system in the recognition of the next utterance. The contextualization approach that we propose consists of a dynamic adaptation of the language model that is used by the recognition system. We carry out this adaptation by means of a linear interpolation between several models. Instead of training the best interpolation weights, we make them dependent on the conditions of the dialogue. In our approach, the system itself will obtain these weights as a function of the reliability of the different elements of information available, such as the semantic concepts extracted from the user’s utterance, the actions that he or she wants to carry out, the information provided in the previous interactions, and so on. One of the aspects more frequently addressed in Human-Computer Interaction research is the inclusion of user specific characteristics into the information structures managed by the system. The idea is to take into account the features that make each user different from the others in order to offer to each particular user different services (or the same service, but in a different way). We could consider this approach as a user-dependent contextualization of the system. In our work we propose the definition of a user model that contains all the information of each user that could be potentially useful to the system at a given moment of the interaction. In particular we will analyze the actions that each user carries out throughout his or her interaction. The objective is to determine which of these actions become the preferences of that user. We represent the specific information of each user as a feature vector. Each of the characteristics that the system will take into account has a confidence score associated. With these elements, we propose a probabilistic definition of a user preference, as the action whose likelihood of being addressed by the user is greater than the one for the rest of actions. To include the user dependent information into the dialogue flow, we modify the information structures on which the dialogue manager relies to retrieve information that could be needed to solve the actions addressed by the user. Usage preferences become another source of contextual information that will be considered by the system towards a more efficient interaction (since the new information source will help to decrease the need of the system to ask users for additional information, thus reducing the number of turns needed to carry out a specific action). To test the benefits of the contextualization framework that we propose, we carry out an evaluation of the two strategies aforementioned. We gather several performance metrics, both objective and subjective, that allow us to compare the improvements of a contextualized system against the baseline one. We will also gather the user’s opinions as regards their perceptions on the behaviour of the system, and its degree of adaptation to the specific features of each interaction. Resumen El diseño y el desarrollo de sistemas de interacción hablada ha sido objeto de profundo estudio durante las pasadas décadas. El propósito es la consecución de sistemas con la capacidad de interactuar con agentes humanos con un alto grado de eficiencia y naturalidad. De esta manera, los usuarios pueden desempeñar las tareas que deseen empleando la voz, que es el medio de comunicación más natural para los humanos. A fin de alcanzar el grado de naturalidad deseado, no basta con dotar a los sistemas de la abilidad de comprender las intervenciones de los usuarios y reaccionar a ellas de manera apropiada (teniendo en consideración, incluso, la información proporcionada en previas interacciones). Adicionalmente, el sistema ha de ser consciente de las condiciones bajo las cuales transcurre la interacción, así como de la evolución de las mismas, de tal manera que pueda actuar de la manera más coherente en cada instante de la interacción. En consecuencia, una de las características primordiales del sistema es que debe ser sensible al contexto. Esta capacidad del sistema de conocer y emplear el contexto de la interacción puede verse reflejada en la modificación de su comportamiento debida a las características actuales de la interacción. Por ejemplo, el sistema debería decidir cuál es la acción más apropiada, o la mejor manera de llevarla a término, dependiendo del usuario que la solicita, del modo en el que lo hace, etcétera. En otras palabras, el sistema ha de adaptar su comportamiento a tales elementos mutables (o dinámicos) de la interacción. Dos características adicionales son requeridas a dicha adaptación: i) el usuario no ha de percibir que el sistema dedica recursos (temporales o computacionales) a realizar tareas distintas a las que aquél le solicita, y ii) el usuario no ha de dedicar esfuerzo alguno a proporcionar al sistema información adicional para llevar a cabo la interacción. Esto último implicaría una menor eficiencia de la interacción, puesto que los usuarios deberían dedicar parte de la misma a proporcionar información al sistema para su adaptación, sin ningún beneficio inmediato. En los sistemas de diálogo hablado propuestos en la literatura, se han propuesto diferentes estrategias para llevar a cabo la adaptación de los elementos del sistema a las diferentes condiciones de la interacción (tales como las características acústicas del habla de un usuario particular, o a las acciones a las que se ha referido con anterioridad). Sin embargo, no existe una estrategia fija para proceder a dicha adaptación, sino que las mismas no suelen guardar una relación entre sí. En este sentido, cada una de ellas tiene en cuenta distintas fuentes de información, la cual es tratada de manera diferente en función de las características de la adaptación buscada. Teniendo en cuenta lo anterior, las contribuciones principales de esta Tesis son las siguientes: Definición de un marco de contextualización. Proponemos un criterio unificador que pueda cubrir cualquier estrategia de adaptación del comportamiento de un sistema de diálogo a las condiciones de la interacción (esto es, el contexto de la misma). En nuestra definición teórica del marco de contextualización consideramos el contexto del sistema como todas aquellas fuentes de variabilidad presentes en cualquier instante de la interacción, ya estén relacionadas con el entorno en el que tiene lugar la interacción, ya dependan del agente humano que se dirige al sistema en cada momento. Nuestra propuesta se basa en tres aspectos que cualquier estrategia de contextualización debería cumplir: plasticidad (es decir, el sistema ha de ser capaz de modificar su comportamiento de la manera más proactiva posible, teniendo en cuenta las condiciones en las que tiene lugar la interacción), adaptabilidad (esto es, el sistema ha de ser capaz de considerar la información oportuna en cada instante, ya dependa del entorno o del usuario, de tal manera que adecúe su comportamiento de manera eficaz a las condiciones mencionadas), y transparencia (que implica que el sistema ha de desarrollar las tareas relacionadas con la contextualización de tal manera que el usuario no perciba la manera en que dichas tareas se llevan a cabo, ni tampoco deba proporcionar al sistema con información adicional alguna). De manera adicional, incluiremos en el marco propuesto el aspecto de la generalidad: las características del marco de contextualización han de ser portables a cualquier sistema de diálogo, con independencia de la solución propuesta en los mismos para gestionar el diálogo. Una vez hemos definido las características de alto nivel de nuestro marco de contextualización, proponemos dos estrategias de aplicación del mismo a un sistema de diálogo hablado. Nos centraremos en dos aspectos de la interacción a adaptar: los modelos empleados en el reconocimiento de habla, y la incorporación de información específica de cada usuario en el flujo de diálogo. Uno de los módulos de un sistema de diálogo más susceptible de ser contextualizado es el sistema de reconocimiento de habla. Este módulo hace uso de varios modelos para generar una hipótesis de reconocimiento a partir de la señal de habla. En general, un sistema de reconocimiento emplea dos tipos de modelos: uno acústico (que modela cada uno de los fonemas considerados por el reconocedor) y uno lingüístico (que modela las secuencias de palabras que tienen sentido desde el punto de vista de la interacción). En este trabajo contextualizamos el modelo lingüístico del reconocedor de habla, de tal manera que tenga en cuenta la información proporcionada por el usuario, tanto en su intervención actual como en las previas. Estas intervenciones contienen información (semántica y/o discursiva) que puede contribuir a un mejor reconocimiento de las subsiguientes intervenciones del usuario. La estrategia de contextualización propuesta consiste en una adaptación dinámica del modelo de lenguaje empleado en el reconocedor de habla. Dicha adaptación se lleva a cabo mediante una interpolación lineal entre diferentes modelos. En lugar de entrenar los mejores pesos de interpolación, proponemos hacer los mismos dependientes de las condiciones actuales de cada diálogo. El propio sistema obtendrá estos pesos como función de la disponibilidad y relevancia de las diferentes fuentes de información disponibles, tales como los conceptos semánticos extraídos a partir de la intervención del usuario, o las acciones que el mismo desea ejecutar. Uno de los aspectos más comúnmente analizados en la investigación de la Interacción Persona-Máquina es la inclusión de las características específicas de cada usuario en las estructuras de información empleadas por el sistema. El objetivo es tener en cuenta los aspectos que diferencian a cada usuario, de tal manera que el sistema pueda ofrecer a cada uno de ellos el servicio más apropiado (o un mismo servicio, pero de la manera más adecuada a cada usuario). Podemos considerar esta estrategia como una contextualización dependiente del usuario. En este trabajo proponemos la definición de un modelo de usuario que contenga toda la información relativa a cada usuario, que pueda ser potencialmente utilizada por el sistema en un momento determinado de la interacción. En particular, analizaremos aquellas acciones que cada usuario decide ejecutar a lo largo de sus diálogos con el sistema. Nuestro objetivo es determinar cuáles de dichas acciones se convierten en las preferencias de cada usuario. La información de cada usuario quedará representada mediante un vector de características, cada una de las cuales tendrá asociado un valor de confianza. Con ambos elementos proponemos una definición probabilística de una preferencia de uso, como aquella acción cuya verosimilitud es mayor que la del resto de acciones solicitadas por el usuario. A fin de incluir la información dependiente de usuario en el flujo de diálogo, llevamos a cabo una modificación de las estructuras de información en las que se apoya el gestor de diálogo para recuperar información necesaria para resolver ciertos diálogos. En dicha modificación las preferencias de cada usuario pasarán a ser una fuente adicional de información contextual, que será tenida en cuenta por el sistema en aras de una interacción más eficiente (puesto que la nueva fuente de información contribuirá a reducir la necesidad del sistema de solicitar al usuario información adicional, dando lugar en consecuencia a una reducción del número de intervenciones necesarias para llevar a cabo una acción determinada). Para determinar los beneficios de las aplicaciones del marco de contextualización propuesto, llevamos a cabo una evaluación de un sistema de diálogo que incluye las estrategias mencionadas. Hemos recogido diversas métricas, tanto objetivas como subjetivas, que nos permiten determinar las mejoras aportadas por un sistema contextualizado en comparación con el sistema sin contextualizar. De igual manera, hemos recogido las opiniones de los participantes en la evaluación acerca de su percepción del comportamiento del sistema, y de su capacidad de adaptación a las condiciones concretas de cada interacción.
Resumo:
Inicio del desarrollo de un algoritmo eficiente orientado a dispositivos con baja capacidad de proceso, que ayude a personas sin necesariamente una preparación adecuada a llevar a cabo un proceso de toma de una señal biológica, como puede ser un electrocardiograma. La aplicación deberá, por tanto, asesorar en la toma de la señal al usuario, evaluar la calidad de la grabación obtenida, y en tiempo seudo real, comprobar si la calidad de la señal obtenida es suficientemente buena para su posterior diagnóstico, de tal modo que en caso de que sea necesaria una repetición de la prueba médica, esta pueda realizarse de inmediato. Además, el algoritmo debe extraer las características más relevantes de la señal electrocardiográfica, procesarlas, y obtener una serie de patrones significativos que permitan la orientación a la diagnosis de algunas de las patologías más comunes que se puedan extraer de la información de las señales cardíacas. Para la extracción, evaluación y toma de decisiones de este proceso previo a la generación del diagnóstico, se seguirá la arquitectura clásica de un sistema de detección de patrones, definiendo las clases que sean necesarias según el número de patologías que se deseen identificar. Esta información de diagnosis, obtenida mediante la identificación del sistema de reconocimiento de patrones, podría ser de ayuda u orientación para la posterior revisión de la prueba por parte de un profesional médico cualificado y de manera remota, evitando así el desplazamiento del mismo a zonas donde, por los medios existentes a día de hoy, es muy remota la posibilidad de presencia de personal sanitario. ABTRACT Start of development of an efficient algorithm designed to devices with low processing power, which could help people without adequate preparation to undertake a process of taking a biological signal, such as an electrocardiogram. Therefore, the application must assist the user in taking the signal and evaluating the quality of the recording. All of this must to be in live time. It must to check the quality of the signal obtained, and if is it necessary a repetition of the test, this could be done immediately. Furthermore, the algorithm must extract the most relevant features of the ECG signal, process it, and get meaningful patterns that allow to a diagnosis orientation of some of the more common diseases that can be drawn from the cardiac signal information. For the extraction, evaluation and decision making in this previous process to the generation of diagnosis, we will follow the classic architecture of a pattern recognition system, defining the necessary classes according to the number of pathologies that we wish to identify. This diagnostic information obtained by identifying the pattern recognition system could be for help or guidance for further review of the signal by a qualified medical professional, and it could be done remotely, thus avoiding the movements to areas where nowadays it is extremely unlikely to place any health staff, due to the poor economic condition.
Resumo:
El objetivo principal alrededor del cual se desenvuelve este proyecto es el desarrollo de un sistema de reconocimiento facial. Entre sus objetivos específicos se encuentran: realizar una primera aproximación sobre las técnicas de reconocimiento facial existentes en la actualidad, elegir una aplicación donde pueda ser útil el reconocimiento facial, diseñar y desarrollar un programa en MATLAB que lleve a cabo la función de reconocimiento facial, y evaluar el funcionamiento del sistema desarrollado. Este documento se encuentra dividido en cuatro partes: INTRODUCCIÓN, MARCO TEÓRICO, IMPLEMENTACIÓN, y RESULTADOS, CONCLUSIONES Y LÍNEAS FUTURAS. En la primera parte, se hace una introducción relativa a la actualidad del reconocimiento facial y se comenta brevemente sobre las técnicas existentes para desarrollar un sistema biométrico de este tipo. En ella se justifican también aquellas técnicas que acabaron formando parte de la implementación. En la segunda parte, el marco teórico, se explica la estructura general que tiene un sistema de reconocimiento biométrico, así como sus modos de funcionamiento, y las tasas de error utilizadas para evaluar y comparar su rendimiento. Así mismo, se lleva a cabo una descripción más profunda sobre los conceptos y métodos utilizados para efectuar la detección y reconocimiento facial en la tercera parte del proyecto. La tercera parte abarca una descripción detallada de la solución propuesta. En ella se explica el diseño, características y aplicación de la implementación; que trata de un programa elaborado en MATLAB con interfaz gráfica, y que utiliza cuatro sistemas de reconocimiento facial, basados cada uno en diferentes técnicas: Análisis por componentes principales, análisis lineal discriminante, wavelets de Gabor, y emparejamiento de grafos elásticos. El programa ofrece además la capacidad de crear y editar una propia base de datos con etiquetas, dándole aplicación directa sobre el tema que se trata. Se proponen además una serie de características con el objetivo de ampliar y mejorar las funcionalidades del programa diseñado. Dentro de dichas características destaca la propuesta de un modo de verificación híbrido aplicable a cualquier rama de la biometría y un programa de evaluación capaz de medir, graficar, y comparar las configuraciones de cada uno de los sistemas de reconocimiento implementados. Otra característica destacable es la herramienta programada para la creación de grafos personalizados y generación de modelos, aplicable a reconocimiento de objetos en general. En la cuarta y última parte, se presentan al principio los resultados obtenidos. En ellos se contemplan y analizan las comparaciones entre las distintas configuraciones de los sistemas de reconocimiento implementados para diferentes bases de datos (una de ellas formada con imágenes con condiciones de adquisición no controladas). También se miden las tasas de error del modo de verificación híbrido propuesto. Finalmente, se extraen conclusiones, y se proponen líneas futuras de investigación. ABSTRACT The main goal of this project is to develop a facial recognition system. To meet this end, it was necessary to accomplish a series of specific objectives, which were: researching on the existing face recognition technics nowadays, choosing an application where face recognition might be useful, design and develop a face recognition system using MATLAB, and measure the performance of the implemented system. This document is divided into four parts: INTRODUCTION, THEORTICAL FRAMEWORK, IMPLEMENTATION, and RESULTS, CONCLUSSIONS AND FUTURE RESEARCH STUDIES. In the first part, an introduction is made in relation to facial recognition nowadays, and the techniques used to develop a biometric system of this kind. Furthermore, the techniques chosen to be part of the implementation are justified. In the second part, the general structure and the two basic modes of a biometric system are explained. The error rates used to evaluate and compare the performance of a biometric system are explained as well. Moreover, a description of the concepts and methods used to detect and recognize faces in the third part is made. The design, characteristics, and applications of the systems put into practice are explained in the third part. The implementation consists in developing a program with graphical user interface made in MATLAB. This program uses four face recognition systems, each of them based on a different technique: Principal Component Analysis (PCA), Fisher’s Linear Discriminant (FLD), Gabor wavelets, and Elastic Graph Matching (EGM). In addition, with this implementation it is possible to create and edit one´s tagged database, giving it a direct application. Also, a group of characteristics are proposed to enhance the functionalities of the program designed. Among these characteristics, three of them should be emphasized in this summary: A proposal of an hybrid verification mode of a biometric system; and an evaluation program capable of measuring, plotting curves, and comparing different configurations of each implemented recognition system; and a tool programmed to create personalized graphs and models (tagged graph associated to an image of a person), which can be used generally in object recognition. In the fourth and last part of the project, the results of the comparisons between different configurations of the systems implemented are shown for three databases (One of them created with pictures taken under non-controlled environments). The error rates of the proposed hybrid verification mode are measured as well. Finally, conclusions are extracted and future research studies are proposed.
Resumo:
El objetivo general de este trabajo es el correcto funcionamiento de un sistema de reconocimiento facial compuesto de varios módulos, implementados en distintos lenguajes. Uno de dichos módulos está escrito en Python y se encargarí de determinar el género del rostro o rostros que aparecen en una imagen o en un fotograma de una secuencia de vídeo. El otro módulo, escrito en C++, llevará a cabo el reconocimiento de cada una de las partes de la cara (ojos, nariz, boca) y la orientación hacia la que está posicionada (derecha, izquierda). La primera parte de esta memoria corresponde a la reimplementación de todas las partes de un analizador facial, que constituyen el primer módulo antes mencionado. Estas partes son un analizador, compuesto a su vez por un reconocedor (Tracker) y un procesador (Processor), y una clase visor para poder visualizar los resultados. Por un lado, el reconocedor o "Tracker.es el encargado de encontrar la cara y sus partes, que serán pasadas al procesador o Processor, que analizará la cara obtenida por el reconocedor y determinará su género. Este módulo estaba dise~nado completamente en C y OpenCV 1.0, y ha sido reescrito en Python y OpenCV 2.4. Y en la segunda parte, se explica cómo realizar la comunicación entre el primer módulo escrito en Python y el segundo escrito en C++. Además, se analizarán diferentes herramientas para poder ejecutar código C++ desde programas Python. Dichas herramientas son PyBindGen, Cython y Boost. Dependiendo de las necesidades del programador se contará cuál de ellas es más conveniente utilizar en cada caso. Por último, en el apartado de resultados se puede observar el funcionamiento del sistema con la integración de los dos módulos, y cómo se muestran por pantalla los puntos de interés, el género y la orientación del rostro utilizando imágenes tomadas con una cámara web.---ABSTRACT---The main objective of this document is the proper functioning of a facial recognition system composed of two modules, implemented in diferent languages. One of these modules is written in Python, and his purpose is determining the gender of the face or faces in an image or a frame of a video sequence. The other module is written in C ++ and it will perform the recognition of each of the parts of the face (eyes, nose , mouth), and the head pose (right, left).The first part of this document corresponds to the reimplementacion of all components of a facial analyzer , which constitute the first module that I mentioned before. These parts are an analyzer , composed by a tracke) and a processor, and a viewer to display the results. The tracker function is to find and its parts, which will be passed to the processor, which will analyze the face obtained by the tracker. The processor will determine the face's gender. This module was completely written in C and OpenCV 1.0, and it has been rewritten in Python and OpenCV 2.4. And in the second part, it explains how to comunicate two modules, one of them written in Python and the other one written in C++. Furthermore, it talks about some tools to execute C++ code from Python scripts. The tools are PyBindGen, Cython and Boost. It will tell which one of those tools is better to use depend on the situation. Finally, in the results section it is possible to see how the system works with the integration of the two modules, and how the points of interest, the gender an the head pose are displayed on the screen using images taken from a webcam.