Biblioteca Digital

67 resultados para 2D barcode based authentication scheme

Depth perceptual video coding for free viewpoint video based on H.264/AVC

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A novel scheme for depth sequences compression, based on a perceptual coding algorithm, is proposed. A depth sequence describes the object position in the 3D scene, and is used, in Free Viewpoint Video, for the generation of synthetic video sequences. In perceptual video coding the human visual system characteristics are exploited to improve the compression efficiency. As depth sequences are never shown, the perceptual video coding, assessed over them, is not effective. The proposed algorithm is based on a novel perceptual rate distortion optimization process, assessed over the perceptual distortion of the rendered views generated through the encoded depth sequences. The experimental results show the effectiveness of the proposed method, able to obtain a very considerable improvement of the rendered view perceptual quality.

Efficient hybrid monocular-stereo approach to on-board, video-based traffic sign detection and tracking

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It involves a combination of monocular and stereo analysis strategies to increase the reliability of the detections such that it can boost the performance of any traffic sign recognition scheme. Firstly, an adaptive color and appearance based detection is applied at single camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy. Namely, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion

Providing 3D video services: the challenge from 2D to 3DTV quality of experience

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recently, three-dimensional (3D) video has decisively burst onto the entertainment industry scene, and has arrived in households even before the standardization process has been completed. 3D television (3DTV) adoption and deployment can be seen as a major leap in television history, similar to previous transitions from black and white (B&W) to color, from analog to digital television (TV), and from standard definition to high definition. In this paper, we analyze current 3D video technology trends in order to define a taxonomy of the availability and possible introduction of 3D-based services. We also propose an audiovisual network services architecture which provides a smooth transition from two-dimensional (2D) to 3DTV in an Internet Protocol (IP)-based scenario. Based on subjective assessment tests, we also analyze those factors which will influence the quality of experience in those 3D video services, focusing on effects of both coding and transmission errors. In addition, examples of the application of the architecture and results of assessment tests are provided.

Biosensing comparison between different geometries based on vertical submicron-structrures made of SU-8 resist

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Previous work of the research group [1-4] demonstrated the viability of using periodic lattices of micro and nanopillars, called Bio-photonic sensing Cells (BICELLs), as an optical biosensor vertically characterized by visible spectrometry. Also we have studied theoretically [5] the performance of the BICELLs by 2D and 3D simulation in orde r to optimize the biosensing response. In this work we present the fabrication and biosensing comparison of different geometrical parameters on periodic lattices of pillars in order to discuss theoretical conclusions with these results. In this way, we have explored the biosensing response of other patter ns such as crosses, stars, cylinders, concentrical cylinders (Figure 1). Also we introduced a novel method to test the BICELLs in a cost-effective way by using an ultra-thin film of SU-8 spin-coated onto the patterns to reproduce the effect of a biofilm attached to the biosensor surface. Finally we have tested the biosensing response of the different geometries by the well-known Bovine Serum Albumin (BSA) immunoassay and compared with the theoretical simulation.

Simulación de evolución dirigida de bacteriófagos en poblaciones de bacterias en 2D

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The research group is currently developing a biological computing model to be implemented with Escherichia Coli bacteria and bacteriophages M13, but it has to be modelled and simulated before any experiment in order to reduce the amount of failed attempts, time and costs. The problem that gave rise to this project is that there are no software tools which are able to simulate the biological process underlying that com- putational model, so it needs to be developed before doing any experimental implementation. There are several software tools which can simulate most of the biological processes and bacterial interactions in which this model is based, so what needs to be done is to study those available simulation tools, compare them and choose the most appropriate in order to be improved adding the desired functionality for this design. Directed evolution is a method used in biotechnology to obtain proteins or nucleic acids with properties not found in nature. It consists of three steps: 1) creating a library of mutants, 2) selecting the mutants with the desired properties, 3) replicating the variants identified in the selection step. The new software tool will be verified by simulating the selection step of a process of directed evolution applied to bacteriophages. ---ABSRACT---El grupo de investigación está desarrollando un modelo de computación biolóogica para ser implementado con bacterias Escherichia Coli y bacteriofagos M13, aunque primero tiene que ser modelizado antes de realizar cualquier experimento, de forma que los intentos fallidos y por lo tanto los costes se verán reducidos. El problema que dio lugar a este proyecto es la ausencia de herramientas software capaces de simular el proceso biológico que subyace a este modelo de computación biológica, por lo que dicha herramienta tiene que ser desarrollada antes de realizar cualquier implementación real. Existen varias herramientas software capaces de simular la mayoría de los procesos biológicos y las interacciones entre bacterias en los que se basa este modelo, por lo que este trabajo consiste en realizar un estudio de dichas herramientas de simulación, compararlas y escoger aquella más apropiada para ser mejorada añadiendo la funcionalidad deseada para este diseño. La evolución dirigida es un método utilizado en biotecnología para obtener proteínas o ácidos nucleicos con propiedades que no se encuentran en la naturaleza. Este método consiste en tres pasos: 1) crear una librería de mutantes, 2) seleccionar los mutantes con las propiedades deseadas, 3) Replicar los mutantes deseados. La nueva herramienta software será verificada mediante la simulación de la selección de mutantes de un proceso de evolución dirigida aplicado a bacteriofagos.

Quantum metropolitan optical network based on wavelength division multiplexing

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Quantum Key Distribution (QKD) is maturing quickly. However, the current approaches to its application in optical networks make it an expensive technology. QKD networks deployed to date are designed as a collection of point-to-point, dedicated QKD links where non-neighboring nodes communicate using the trusted repeater paradigm. We propose a novel optical network model in which QKD systems share the communication infrastructure by wavelength multiplexing their quantum and classical signals. The routing is done using optical components within a metropolitan area which allows for a dynamically any-to-any communication scheme. Moreover, it resembles a commercial telecom network, takes advantage of existing infrastructure and utilizes commercial components, allowing for an easy, cost-effective and reliable deployment.

Modelling of landslides propagation with SPH : effects of rheology and pore water pressure

Relevância:

30.00% 30.00%

Publicador:

Resumo:

El estudio desarrollado en este trabajo de tesis se centra en la modelización numérica de la fase de propagación de los deslizamientos rápidos de ladera a través del método sin malla Smoothed Particle Hydrodynamics (SPH). Este método tiene la gran ventaja de permitir el análisis de problemas de grandes deformaciones evitando operaciones costosas de remallado como en el caso de métodos numéricos con mallas tal como el método de los Elementos Finitos. En esta tesis, particular atención viene dada al rol que la reología y la presión de poros desempeñan durante estos eventos. El modelo matemático utilizado se basa en la formulación de Biot-Zienkiewicz v - pw, que representa el comportamiento, expresado en términos de velocidad del esqueleto sólido y presiones de poros, de la mezcla de partículas sólidas en un medio saturado. Las ecuaciones que gobiernan el problema son: • la ecuación de balance de masa de la fase del fluido intersticial, • la ecuación de balance de momento de la fase del fluido intersticial y de la mezcla, • la ecuación constitutiva y • una ecuación cinemática. Debido a sus propiedades geométricas, los deslizamientos de ladera se caracterizan por tener una profundidad muy pequeña frente a su longitud y a su anchura, y, consecuentemente, el modelo matemático mencionado anteriormente se puede simplificar integrando en profundidad las ecuaciones, pasando de un modelo 3D a 2D, el cual presenta una combinación excelente de precisión, sencillez y costes computacionales. El modelo propuesto en este trabajo se diferencia de los modelos integrados en profundidad existentes por incorporar un ulterior modelo capaz de proveer información sobre la presión del fluido intersticial a cada paso computacional de la propagación del deslizamiento. En una manera muy eficaz, la evolución de los perfiles de la presión de poros está numéricamente resuelta a través de un esquema explicito de Diferencias Finitas a cada nodo SPH. Este nuevo enfoque es capaz de tener en cuenta la variación de presión de poros debida a cambios de altura, de consolidación vertical o de cambios en las tensiones totales. Con respecto al comportamiento constitutivo, uno de los problemas principales al modelizar numéricamente deslizamientos rápidos de ladera está en la dificultad de simular con la misma ley constitutiva o reológica la transición de la fase de iniciación, donde el material se comporta como un sólido, a la fase de propagación donde el material se comporta como un fluido. En este trabajo de tesis, se propone un nuevo modelo reológico basado en el modelo viscoplástico de Perzyna, pensando a la viscoplasticidad como a la llave para poder simular tanto la fase de iniciación como la de propagación con el mismo modelo constitutivo. Con el fin de validar el modelo matemático y numérico se reproducen tanto ejemplos de referencia con solución analítica como experimentos de laboratorio. Finalmente, el modelo se aplica a casos reales, con especial atención al caso del deslizamiento de 1966 en Aberfan, mostrando como los resultados obtenidos simulan con éxito estos tipos de riesgos naturales. The study developed in this thesis focuses on the modelling of landslides propagation with the Smoothed Particle Hydrodynamics (SPH) meshless method which has the great advantage of allowing to deal with large deformation problems by avoiding expensive remeshing operations as happens for mesh methods such as, for example, the Finite Element Method. In this thesis, special attention is given to the role played by rheology and pore water pressure during these natural hazards. The mathematical framework used is based on the v - pw Biot-Zienkiewicz formulation, which represents the behaviour, formulated in terms of soil skeleton velocity and pore water pressure, of the mixture of solid particles and pore water in a saturated media. The governing equations are: • the mass balance equation for the pore water phase, • the momentum balance equation for the pore water phase and the mixture, • the constitutive equation and • a kinematic equation. Landslides, due to their shape and geometrical properties, have small depths in comparison with their length or width, therefore, the mathematical model aforementioned can then be simplified by depth integrating the equations, switching from a 3D to a 2D model, which presents an excellent combination of accuracy, computational costs and simplicity. The proposed model differs from previous depth integrated models by including a sub-model able to provide information on pore water pressure profiles at each computational step of the landslide's propagation. In an effective way, the evolution of the pore water pressure profiles is numerically solved through a set of 1D Finite Differences explicit scheme at each SPH node. This new approach is able to take into account the variation of the pore water pressure due to changes of height, vertical consolidation or changes of total stress. Concerning the constitutive behaviour, one of the main issues when modelling fast landslides is the difficulty to simulate with the same constitutive or rheological model the transition from the triggering phase, where the landslide behaves like a solid, to the propagation phase, where the landslide behaves in a fluid-like manner. In this work thesis, a new rheological model is proposed, based on the Perzyna viscoplastic model, thinking of viscoplasticity as the key to close the gap between the triggering and the propagation phase. In order to validate the mathematical model and the numerical approach, benchmarks and laboratory experiments are reproduced and compared to analytical solutions when possible. Finally, applications to real cases are studied, with particular attention paid to the Aberfan flowslide of 1966, showing how the mathematical model accurately and successfully simulate these kind of natural hazards.

Classifying GABAergic interneurons with semi-supervised projected model-based clustering

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objectives: A recently introduced pragmatic scheme promises to be a useful catalog of interneuron names.We sought to automatically classify digitally reconstructed interneuronal morphologies according tothis scheme. Simultaneously, we sought to discover possible subtypes of these types that might emergeduring automatic classification (clustering). We also investigated which morphometric properties weremost relevant for this classification.Materials and methods: A set of 118 digitally reconstructed interneuronal morphologies classified into thecommon basket (CB), horse-tail (HT), large basket (LB), and Martinotti (MA) interneuron types by 42 of theworld?s leading neuroscientists, quantified by five simple morphometric properties of the axon and fourof the dendrites. We labeled each neuron with the type most commonly assigned to it by the experts. Wethen removed this class information for each type separately, and applied semi-supervised clustering tothose cells (keeping the others? cluster membership fixed), to assess separation from other types and lookfor the formation of new groups (subtypes). We performed this same experiment unlabeling the cells oftwo types at a time, and of half the cells of a single type at a time. The clustering model is a finite mixtureof Gaussians which we adapted for the estimation of local (per-cluster) feature relevance. We performedthe described experiments on three different subsets of the data, formed according to how many expertsagreed on type membership: at least 18 experts (the full data set), at least 21 (73 neurons), and at least26 (47 neurons).Results: Interneurons with more reliable type labels were classified more accurately. We classified HTcells with 100% accuracy, MA cells with 73% accuracy, and CB and LB cells with 56% and 58% accuracy,respectively. We identified three subtypes of the MA type, one subtype of CB and LB types each, andno subtypes of HT (it was a single, homogeneous type). We got maximum (adapted) Silhouette widthand ARI values of 1, 0.83, 0.79, and 0.42, when unlabeling the HT, CB, LB, and MA types, respectively,confirming the quality of the formed cluster solutions. The subtypes identified when unlabeling a singletype also emerged when unlabeling two types at a time, confirming their validity. Axonal morphometricproperties were more relevant that dendritic ones, with the axonal polar histogram length in the [pi, 2pi) angle interval being particularly useful.Conclusions: The applied semi-supervised clustering method can accurately discriminate among CB, HT, LB, and MA interneuron types while discovering potential subtypes, and is therefore useful for neuronal classification. The discovery of potential subtypes suggests that some of these types are more heteroge-neous that previously thought. Finally, axonal variables seem to be more relevant than dendritic ones fordistinguishing among the CB, HT, LB, and MA interneuron types.

Non uniform embedding based on relevance analysis with reduced computational complexity: application to the detection of pathologies from biosignal recordings

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nonlinear analysis tools for studying and characterizing the dynamics of physiological signals have gained popularity, mainly because tracking sudden alterations of the inherent complexity of biological processes might be an indicator of altered physiological states. Typically, in order to perform an analysis with such tools, the physiological variables that describe the biological process under study are used to reconstruct the underlying dynamics of the biological processes. For that goal, a procedure called time-delay or uniform embedding is usually employed. Nonetheless, there is evidence of its inability for dealing with non-stationary signals, as those recorded from many physiological processes. To handle with such a drawback, this paper evaluates the utility of non-conventional time series reconstruction procedures based on non uniform embedding, applying them to automatic pattern recognition tasks. The paper compares a state of the art non uniform approach with a novel scheme which fuses embedding and feature selection at once, searching for better reconstructions of the dynamics of the system. Moreover, results are also compared with two classic uniform embedding techniques. Thus, the goal is comparing uniform and non uniform reconstruction techniques, including the one proposed in this work, for pattern recognition in biomedical signal processing tasks. Once the state space is reconstructed, the scheme followed characterizes with three classic nonlinear dynamic features (Largest Lyapunov Exponent, Correlation Dimension and Recurrence Period Density Entropy), while classification is carried out by means of a simple k-nn classifier. In order to test its generalization capabilities, the approach was tested with three different physiological databases (Speech Pathologies, Epilepsy and Heart Murmurs). In terms of the accuracy obtained to automatically detect the presence of pathologies, and for the three types of biosignals analyzed, the non uniform techniques used in this work lightly outperformed the results obtained using the uniform methods, suggesting their usefulness to characterize non-stationary biomedical signals in pattern recognition applications. On the other hand, in view of the results obtained and its low computational load, the proposed technique suggests its applicability for the applications under study.

An autostereoscopic device for mobile applications based on a liquid crystal microlens array and an OLED display

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, many experimental and theoretical research groups worldwide have actively worked on demonstrating the use of liquid crystals (LCs) as adaptive lenses for image generation, waveform shaping, and non-mechanical focusing applications. In particular, important achievements have concerned the development of alternative solutions for 3D vision. This work focuses on the design and evaluation of the electro-optic response of a LC-based 2D/3D autostereoscopic display prototype. A strategy for achieving 2D/3D vision has been implemented with a cylindrical LC lens array placed in front of a display; this array acts as a lenticular sheet with a tunable focal length by electrically controlling the birefringence. The performance of the 2D/3D device was evaluated in terms of the angular luminance, image deflection, crosstalk, and 3D contrast within a simulated environment. These measurements were performed with characterization equipment for autostereoscopic 3D displays (angular resolution of 0.03 ).

A low-cost alternative scheme to detect a 100 Gbps PM-DQPSK signal

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose and demonstrate a low-cost alternative scheme of direct-detection to detect a 100Gbps polarization-multiplexed differential quadrature phase-shift keying (PM-DQPSK) signal. The proposed scheme is based on a delay line and a polarization rotator; the phase-shift keying signal is first converted into a polarization shift keying signal. Then, this signal is converted into an intensity modulated signal by a polarization beam splitter. Finally, the intensity-modulated signal is detected by balanced photodetectors. In order to demonstrate that our proposed receiver is suitable for using as a PM-DQPSK demodulator, a set of simulations have been performed. In addition to testing the sensitivity, the performance under various impairments, including narrow optical filtering, polarization mode dispersion, chromatic dispersion and polarization sensitivity, is analyzed. The simulation results show that our performance receiver is as good as a conventional receiver based on four delay interferometers. Moreover, in comparison with the typical receiver, fewer components are used in our receiver. Hence, implementation is easier, and total cost is reduced. In addition, our receiver can be easily improved to a bit-rate tunable receiver.

Usability analysis of authentication techniques

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This document will be divided into two main parts. The first one will be the classification of the authentication techniques. We will search the main electronic databases for papers related to authentication techniques. We will then summarize the related papers and show what classifications they use for the authentication techniques. After all of the documents have been read and summarized we will analyse them and group the authentication techniques into the classifications found. For the second part of the document we will focus on the study of usability attributes in the authentication techniques. This to know how authentications techniques compare to one another based on their usability attributes. We will search the main electronic databases for papers related to the usability attributes of authentication techniques based on the usability definition of ISO/IEC 25010 (SQuaRE) and its attributes. We will then summarize the related papers and show what authentication methods they describe and which usability attributes they measure. After all of the documents have been read and summarized we will analyse them depending on their usability attribute. At the end we will elaborate those results to show which authentication techniques have better usability in terms of a specific usability attribute. This will help practitioners who are interested in using authentication methods but want or need to focus on a specific usability attribute. They will be able to use this as a guide to help them chose the best option that fits their purpose.

Scalable UWB photonic generator based on the combination of doublet pulses

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose and experimentally demonstrate a scalable and reconfigurable optical scheme to generate high order UWB pulses. Firstly, various ultra wideband doublets are created through a process of phase-tointensity conversion by means of a phase modulation and a dispersive media. In a second stage, doublets are combined in an optical processing unit that allows the reconfiguration of UWB high order pulses. Experimental results both in time and frequency domains are presented showing good performance related to the fractional bandwidth and spectral efficiency parameters.

ISA S88 / Loose Coupling Based Solution for Production Control Software Reusability

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A solution for the problem of reusability of software system for batch production systems is proposed. It is based on ISA S88 standard that prescribes the abstraction of elements in the manufacturing system that is equipment, processes and procedures abstraction, required to make a product batch. An easy to apply data scheme, compatible with the standard, is developed for management of production information. In addition to flexibility provided by the S88 standard, software system reusability requires a solution supporting manufacturing equipment reconfigurability. Toward this end a coupling mechanism is developed. A software tool, including these solutions, was developed and validated at laboratory level, using product manufacturing information of an actual plant.

Contributions to Speech Analytics based on Speech Recognition and Topic Identification

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La última década ha sido testigo de importantes avances en el campo de la tecnología de reconocimiento de voz. Los sistemas comerciales existentes actualmente poseen la capacidad de reconocer habla continua de múltiples locutores, consiguiendo valores aceptables de error, y sin la necesidad de realizar procedimientos explícitos de adaptación. A pesar del buen momento que vive esta tecnología, el reconocimiento de voz dista de ser un problema resuelto. La mayoría de estos sistemas de reconocimiento se ajustan a dominios particulares y su eficacia depende de manera significativa, entre otros muchos aspectos, de la similitud que exista entre el modelo de lenguaje utilizado y la tarea específica para la cual se está empleando. Esta dependencia cobra aún más importancia en aquellos escenarios en los cuales las propiedades estadísticas del lenguaje varían a lo largo del tiempo, como por ejemplo, en dominios de aplicación que involucren habla espontánea y múltiples temáticas. En los últimos años se ha evidenciado un constante esfuerzo por mejorar los sistemas de reconocimiento para tales dominios. Esto se ha hecho, entre otros muchos enfoques, a través de técnicas automáticas de adaptación. Estas técnicas son aplicadas a sistemas ya existentes, dado que exportar el sistema a una nueva tarea o dominio puede requerir tiempo a la vez que resultar costoso. Las técnicas de adaptación requieren fuentes adicionales de información, y en este sentido, el lenguaje hablado puede aportar algunas de ellas. El habla no sólo transmite un mensaje, también transmite información acerca del contexto en el cual se desarrolla la comunicación hablada (e.g. acerca del tema sobre el cual se está hablando). Por tanto, cuando nos comunicamos a través del habla, es posible identificar los elementos del lenguaje que caracterizan el contexto, y al mismo tiempo, rastrear los cambios que ocurren en estos elementos a lo largo del tiempo. Esta información podría ser capturada y aprovechada por medio de técnicas de recuperación de información (information retrieval) y de aprendizaje de máquina (machine learning). Esto podría permitirnos, dentro del desarrollo de mejores sistemas automáticos de reconocimiento de voz, mejorar la adaptación de modelos del lenguaje a las condiciones del contexto, y por tanto, robustecer al sistema de reconocimiento en dominios con condiciones variables (tales como variaciones potenciales en el vocabulario, el estilo y la temática). En este sentido, la principal contribución de esta Tesis es la propuesta y evaluación de un marco de contextualización motivado por el análisis temático y basado en la adaptación dinámica y no supervisada de modelos de lenguaje para el robustecimiento de un sistema automático de reconocimiento de voz. Esta adaptación toma como base distintos enfoque de los sistemas mencionados (de recuperación de información y aprendizaje de máquina) mediante los cuales buscamos identificar las temáticas sobre las cuales se está hablando en una grabación de audio. Dicha identificación, por lo tanto, permite realizar una adaptación del modelo de lenguaje de acuerdo a las condiciones del contexto. El marco de contextualización propuesto se puede dividir en dos sistemas principales: un sistema de identificación de temática y un sistema de adaptación dinámica de modelos de lenguaje. Esta Tesis puede describirse en detalle desde la perspectiva de las contribuciones particulares realizadas en cada uno de los campos que componen el marco propuesto: _ En lo referente al sistema de identificación de temática, nos hemos enfocado en aportar mejoras a las técnicas de pre-procesamiento de documentos, asimismo en contribuir a la definición de criterios más robustos para la selección de index-terms. – La eficiencia de los sistemas basados tanto en técnicas de recuperación de información como en técnicas de aprendizaje de máquina, y específicamente de aquellos sistemas que particularizan en la tarea de identificación de temática, depende, en gran medida, de los mecanismos de preprocesamiento que se aplican a los documentos. Entre las múltiples operaciones que hacen parte de un esquema de preprocesamiento, la selección adecuada de los términos de indexado (index-terms) es crucial para establecer relaciones semánticas y conceptuales entre los términos y los documentos. Este proceso también puede verse afectado, o bien por una mala elección de stopwords, o bien por la falta de precisión en la definición de reglas de lematización. En este sentido, en este trabajo comparamos y evaluamos diferentes criterios para el preprocesamiento de los documentos, así como también distintas estrategias para la selección de los index-terms. Esto nos permite no sólo reducir el tamaño de la estructura de indexación, sino también mejorar el proceso de identificación de temática. – Uno de los aspectos más importantes en cuanto al rendimiento de los sistemas de identificación de temática es la asignación de diferentes pesos a los términos de acuerdo a su contribución al contenido del documento. En este trabajo evaluamos y proponemos enfoques alternativos a los esquemas tradicionales de ponderado de términos (tales como tf-idf ) que nos permitan mejorar la especificidad de los términos, así como también discriminar mejor las temáticas de los documentos. _ Respecto a la adaptación dinámica de modelos de lenguaje, hemos dividimos el proceso de contextualización en varios pasos. – Para la generación de modelos de lenguaje basados en temática, proponemos dos tipos de enfoques: un enfoque supervisado y un enfoque no supervisado. En el primero de ellos nos basamos en las etiquetas de temática que originalmente acompañan a los documentos del corpus que empleamos. A partir de estas, agrupamos los documentos que forman parte de la misma temática y generamos modelos de lenguaje a partir de dichos grupos. Sin embargo, uno de los objetivos que se persigue en esta Tesis es evaluar si el uso de estas etiquetas para la generación de modelos es óptimo en términos del rendimiento del reconocedor. Por esta razón, nosotros proponemos un segundo enfoque, un enfoque no supervisado, en el cual el objetivo es agrupar, automáticamente, los documentos en clusters temáticos, basándonos en la similaridad semántica existente entre los documentos. Por medio de enfoques de agrupamiento conseguimos mejorar la cohesión conceptual y semántica en cada uno de los clusters, lo que a su vez nos permitió refinar los modelos de lenguaje basados en temática y mejorar el rendimiento del sistema de reconocimiento. – Desarrollamos diversas estrategias para generar un modelo de lenguaje dependiente del contexto. Nuestro objetivo es que este modelo refleje el contexto semántico del habla, i.e. las temáticas más relevantes que se están discutiendo. Este modelo es generado por medio de la interpolación lineal entre aquellos modelos de lenguaje basados en temática que estén relacionados con las temáticas más relevantes. La estimación de los pesos de interpolación está basada principalmente en el resultado del proceso de identificación de temática. – Finalmente, proponemos una metodología para la adaptación dinámica de un modelo de lenguaje general. El proceso de adaptación tiene en cuenta no sólo al modelo dependiente del contexto sino también a la información entregada por el proceso de identificación de temática. El esquema usado para la adaptación es una interpolación lineal entre el modelo general y el modelo dependiente de contexto. Estudiamos también diferentes enfoques para determinar los pesos de interpolación entre ambos modelos. Una vez definida la base teórica de nuestro marco de contextualización, proponemos su aplicación dentro de un sistema automático de reconocimiento de voz. Para esto, nos enfocamos en dos aspectos: la contextualización de los modelos de lenguaje empleados por el sistema y la incorporación de información semántica en el proceso de adaptación basado en temática. En esta Tesis proponemos un marco experimental basado en una arquitectura de reconocimiento en ‘dos etapas’. En la primera etapa, empleamos sistemas basados en técnicas de recuperación de información y aprendizaje de máquina para identificar las temáticas sobre las cuales se habla en una transcripción de un segmento de audio. Esta transcripción es generada por el sistema de reconocimiento empleando un modelo de lenguaje general. De acuerdo con la relevancia de las temáticas que han sido identificadas, se lleva a cabo la adaptación dinámica del modelo de lenguaje. En la segunda etapa de la arquitectura de reconocimiento, usamos este modelo adaptado para realizar de nuevo el reconocimiento del segmento de audio. Para determinar los beneficios del marco de trabajo propuesto, llevamos a cabo la evaluación de cada uno de los sistemas principales previamente mencionados. Esta evaluación es realizada sobre discursos en el dominio de la política usando la base de datos EPPS (European Parliamentary Plenary Sessions - Sesiones Plenarias del Parlamento Europeo) del proyecto europeo TC-STAR. Analizamos distintas métricas acerca del rendimiento de los sistemas y evaluamos las mejoras propuestas con respecto a los sistemas de referencia. ABSTRACT The last decade has witnessed major advances in speech recognition technology. Today’s commercial systems are able to recognize continuous speech from numerous speakers, with acceptable levels of error and without the need for an explicit adaptation procedure. Despite this progress, speech recognition is far from being a solved problem. Most of these systems are adjusted to a particular domain and their efficacy depends significantly, among many other aspects, on the similarity between the language model used and the task that is being addressed. This dependence is even more important in scenarios where the statistical properties of the language fluctuates throughout the time, for example, in application domains involving spontaneous and multitopic speech. Over the last years there has been an increasing effort in enhancing the speech recognition systems for such domains. This has been done, among other approaches, by means of techniques of automatic adaptation. These techniques are applied to the existing systems, specially since exporting the system to a new task or domain may be both time-consuming and expensive. Adaptation techniques require additional sources of information, and the spoken language could provide some of them. It must be considered that speech not only conveys a message, it also provides information on the context in which the spoken communication takes place (e.g. on the subject on which it is being talked about). Therefore, when we communicate through speech, it could be feasible to identify the elements of the language that characterize the context, and at the same time, to track the changes that occur in those elements over time. This information can be extracted and exploited through techniques of information retrieval and machine learning. This allows us, within the development of more robust speech recognition systems, to enhance the adaptation of language models to the conditions of the context, thus strengthening the recognition system for domains under changing conditions (such as potential variations in vocabulary, style and topic). In this sense, the main contribution of this Thesis is the proposal and evaluation of a framework of topic-motivated contextualization based on the dynamic and non-supervised adaptation of language models for the enhancement of an automatic speech recognition system. This adaptation is based on an combined approach (from the perspective of both information retrieval and machine learning fields) whereby we identify the topics that are being discussed in an audio recording. The topic identification, therefore, enables the system to perform an adaptation of the language model according to the contextual conditions. The proposed framework can be divided in two major systems: a topic identification system and a dynamic language model adaptation system. This Thesis can be outlined from the perspective of the particular contributions made in each of the fields that composes the proposed framework: _ Regarding the topic identification system, we have focused on the enhancement of the document preprocessing techniques in addition to contributing in the definition of more robust criteria for the selection of index-terms. – Within both information retrieval and machine learning based approaches, the efficiency of topic identification systems, depends, to a large extent, on the mechanisms of preprocessing applied to the documents. Among the many operations that encloses the preprocessing procedures, an adequate selection of index-terms is critical to establish conceptual and semantic relationships between terms and documents. This process might also be weakened by a poor choice of stopwords or lack of precision in defining stemming rules. In this regard we compare and evaluate different criteria for preprocessing the documents, as well as for improving the selection of the index-terms. This allows us to not only reduce the size of the indexing structure but also to strengthen the topic identification process. – One of the most crucial aspects, in relation to the performance of topic identification systems, is to assign different weights to different terms depending on their contribution to the content of the document. In this sense we evaluate and propose alternative approaches to traditional weighting schemes (such as tf-idf ) that allow us to improve the specificity of terms, and to better identify the topics that are related to documents. _ Regarding the dynamic language model adaptation, we divide the contextualization process into different steps. – We propose supervised and unsupervised approaches for the generation of topic-based language models. The first of them is intended to generate topic-based language models by grouping the documents, in the training set, according to the original topic labels of the corpus. Nevertheless, a goal of this Thesis is to evaluate whether or not the use of these labels to generate language models is optimal in terms of recognition accuracy. For this reason, we propose a second approach, an unsupervised one, in which the objective is to group the data in the training set into automatic topic clusters based on the semantic similarity between the documents. By means of clustering approaches we expect to obtain a more cohesive association of the documents that are related by similar concepts, thus improving the coverage of the topic-based language models and enhancing the performance of the recognition system. – We develop various strategies in order to create a context-dependent language model. Our aim is that this model reflects the semantic context of the current utterance, i.e. the most relevant topics that are being discussed. This model is generated by means of a linear interpolation between the topic-based language models related to the most relevant topics. The estimation of the interpolation weights is based mainly on the outcome of the topic identification process. – Finally, we propose a methodology for the dynamic adaptation of a background language model. The adaptation process takes into account the context-dependent model as well as the information provided by the topic identification process. The scheme used for the adaptation is a linear interpolation between the background model and the context-dependent one. We also study different approaches to determine the interpolation weights used in this adaptation scheme. Once we defined the basis of our topic-motivated contextualization framework, we propose its application into an automatic speech recognition system. We focus on two aspects: the contextualization of the language models used by the system, and the incorporation of semantic-related information into a topic-based adaptation process. To achieve this, we propose an experimental framework based in ‘a two stages’ recognition architecture. In the first stage of the architecture, Information Retrieval and Machine Learning techniques are used to identify the topics in a transcription of an audio segment. This transcription is generated by the recognition system using a background language model. According to the confidence on the topics that have been identified, the dynamic language model adaptation is carried out. In the second stage of the recognition architecture, an adapted language model is used to re-decode the utterance. To test the benefits of the proposed framework, we carry out the evaluation of each of the major systems aforementioned. The evaluation is conducted on speeches of political domain using the EPPS (European Parliamentary Plenary Sessions) database from the European TC-STAR project. We analyse several performance metrics that allow us to compare the improvements of the proposed systems against the baseline ones.

«
1
2
3
4
5
»