Biblioteca Digital

822 results for Sub-topic retrieval

Glocalización: Nuevos Enfoques Teóricos Sobre el Desarrollo Regional (Sub Nacional) en el Contexto de la Integración Económica y de la Globalización

Relevance:

30.00%

Publisher:

Abstract:

In this article the author addresses questions of space, geography and territorial organization from the standpoint of development. To support the thesis that several of the factors driving globalization have a territorial basis, such as international trade in goods and services, technological change and international flows, the author draws on categories of social development and places them in two dimensions. First, the author considers the intranational logic by which political, environmental and commercial factors unfold. Second, the author highlights the role of regions and localities in the reconstruction of the world economic system, concluding that, in a still uncertain context, what stands out is the emergence of megapolitan regions that act as regional engines of the new global economy.

Efectos del tratado de libre comercio con Estados Unidos en el sub sector avícola colombiano

Relevance:

30.00%

Publisher:

Abstract:

In recent years the poultry industry has been one of the most representative subsectors of the economy in both the United States and Colombia. In Colombia the industry has managed to increase its production; however, the free trade agreement (FTA) signed and implemented with the United States has put Colombian poultry farming at risk because of several differences: the sale price, which is much higher in Colombia than in the United States; infrastructure; production techniques; and the parts of the chicken that are consumed. These differences put Colombia at a disadvantage and force it to seek tools with which to compete.

La influencia de los sub-complejos de seguridad regional en la evolución del problema de las drogas como amenaza a la seguridad hemisférica en el marco de la Organización de Estados Americanos. Periodo 2003-2013

Relevance:

30.00%

Publisher:

Abstract:

The aim of this monograph is to analyze the evolution of the drug problem within the framework of the Organization of American States (OAS) between 2003 and 2013. It studies the two groups of countries that have been most involved in the issue: on one side the United States, Canada and Mexico, and on the other Colombia, Peru and Bolivia. Through their own logics and tendencies, these countries have driven the problem and its evolution, and they are also the source of the different approaches and/or solutions to it that circulate within the OAS. Beginning with the prohibitionist approach in 1978, two others have developed in recent years: decriminalization and legalization. The influence of these countries on the drug problem is analyzed through the lens of Regional Security Complex Theory.

Semantic-driven context-aware visual information indexing and retrieval: applied in the film post-production domain

Relevance:

30.00%

Publisher:

Abstract:

A large volume of visual content is inaccessible until effective and efficient indexing and retrieval of such data is achieved. In this paper, we introduce the DREAM system, a knowledge-assisted, semantic-driven, context-aware visual information retrieval system applied in the film post-production domain. We focus mainly on the automatic labelling and topic map aspects of the framework. The use of context-related collateral knowledge, represented by a novel probabilistic visual keyword co-occurrence matrix, proved effective in the experiments conducted during system evaluation. The automatically generated semantic labels are fed into the Topic Map Engine, which automatically constructs ontological networks using Topic Maps technology; this dramatically enhances the indexing and retrieval performance of the system and raises it to a higher semantic level.
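
To make the idea of a probabilistic keyword co-occurrence matrix concrete, the following is a minimal sketch in Python. It is not the DREAM system's matrix, whose construction the abstract does not describe. It assumes the simplest possible estimate, the fraction of items labelled `a` that are also labelled `b`, and the function name `cooccurrence_probabilities` and the sample labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(labelled_items):
    """Estimate P(b | a): the fraction of items labelled a that also carry b.

    Assumption: each item (e.g. a shot or frame) is a collection of keywords.
    """
    single = Counter()  # number of items containing each keyword
    pair = Counter()    # number of items containing each ordered keyword pair
    for labels in labelled_items:
        unique = set(labels)
        single.update(unique)
        pair.update(permutations(sorted(unique), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


# Hypothetical labels for three shots.
shots = [
    {"fire", "smoke"},
    {"fire", "smoke", "explosion"},
    {"smoke", "sky"},
]
probs = cooccurrence_probabilities(shots)
print(probs[("fire", "smoke")])  # every shot labelled "fire" also has "smoke"
```

A labelling step could use such conditional probabilities as context: once one keyword is assigned with confidence, keywords that frequently co-occur with it become more plausible candidates.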

Coupling aerosol surface and bulk chemistry with a kinetic double layer model (K2-SUB): oxidation of oleic acid by ozone

Relevance:

30.00%

Publisher:

Abstract:

We present a kinetic double layer model coupling aerosol surface and bulk chemistry (K2-SUB) based on the PRA framework of gas-particle interactions (Pöschl-Rudich-Ammann, 2007). K2-SUB is applied to a popular model system of atmospheric heterogeneous chemistry: the interaction of ozone with oleic acid. We show that our modelling approach allows de-convoluting surface and bulk processes, which has been a controversial topic and remains an important challenge for the understanding and description of atmospheric aerosol transformation. In particular, we demonstrate how a detailed treatment of adsorption and reaction at the surface can be coupled to a description of bulk reaction and transport that is consistent with traditional resistor model formulations. From literature data we have derived a consistent set of kinetic parameters that characterise mass transport and chemical reaction of ozone at the surface and in the bulk of oleic acid droplets. Due to the wide range of rate coefficients reported from different experimental studies, the exact proportions between surface and bulk reaction rates remain uncertain. Nevertheless, the model results suggest an important role of chemical reaction in the bulk and an approximate upper limit of about 10^-11 cm^2 s^-1 for the surface reaction rate coefficient. Sensitivity studies show that the surface accommodation coefficient of the gas-phase reactant has a strong non-linear influence on both surface and bulk chemical reactions. We suggest that K2-SUB may be used to design, interpret and analyse future experiments for better discrimination between surface and bulk processes in the oleic acid-ozone system as well as in other heterogeneous reaction systems of atmospheric relevance.

Auditory distraction eliminates retrieval induced forgetting: implications for the processing of unattended sound

Relevance:

30.00%

Publisher:

Abstract:

The Retrieval-Induced Forgetting (RIF) paradigm includes three phases: (a) study/encoding of category exemplars, (b) practicing retrieval of a sub-set of those category exemplars, and (c) recall of all exemplars. At the final recall phase, recall of items that belong to the same categories as those items that undergo retrieval-practice, but that do not undergo retrieval-practice, is impaired. The received view is that this is because retrieval of target category-exemplars (e.g., ‘Tiger’ in the category Four-legged animal) requires inhibition of non-target category-exemplars (e.g., ‘Dog’ and ‘Lion’) that compete for retrieval. Here, we used the RIF paradigm to investigate whether ignoring auditory items during the retrieval-practice phase modulates the inhibitory process. In two experiments, RIF was present when retrieval-practice was conducted in quiet and when conducted in the presence of spoken words that belonged to a category other than that of the items that were targets for retrieval-practice. In contrast, RIF was abolished when words that either were identical to the retrieval-practice words or were only semantically related to the retrieval-practice words were presented as background speech. The results suggest that the act of ignoring speech can reduce inhibition of the non-practiced category-exemplars, thereby eliminating RIF, but only when the spoken words are competitors for retrieval (i.e., belong to the same semantic category as the to-be-retrieved items).

Land surface temperature retrieval from Sentinel 2 and 3 Missions

Relevance:

30.00%

Publisher:

Abstract:

In this work we explore the synergistic use of the future MSI instrument on board the Sentinel-2 platform and the OLCI/SLSTR instruments on board the Sentinel-3 platform in order to improve the land surface temperature (LST) products currently derived from the single AATSR instrument on board the ENVISAT satellite. For this purpose, the high spatial resolution data from Sentinel-2/MSI will be used for a good characterization of the land surface sub-pixel heterogeneity, in particular for a precise parameterization of surface emissivity using a land cover map and spectral mixture techniques. On the other hand, the high spectral resolution of the OLCI instrument, suitable for a better characterization of the atmosphere, along with the dual view available in the SLSTR instrument, will allow a better atmospheric correction through improved aerosol/water vapor content retrievals and the implementation of novel cloud screening procedures. Effective emissivity and atmospheric corrections will allow accurate LST retrievals using the SLSTR thermal bands by developing a synergistic split-window/dual-angle algorithm. ENVISAT MERIS and AATSR instruments and different high spatial resolution data (Landsat/TM, Proba/CHRIS, Terra/ASTER) will be used as a benchmark for the future OLCI, SLSTR and MSI instruments. Results will be validated using ground data collected in the framework of different field campaigns organized by ESA.

Evapotranspiração na sub-bacia do Riacho Jardim – CE, por sensoriamento remoto

Relevance:

30.00%

Publisher:

Abstract:

Graduate Program in Agronomy (Energy in Agriculture) - FCA

A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics

Relevance:

30.00%

Publisher:

Abstract:

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
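
As a rough illustration of comparing XML documents modeled as ordered labeled trees, the sketch below computes a simple top-down tree edit distance in Python. It is not the framework described in the abstract above: it assumes a Selkow-style variant in which whole subtrees are inserted or deleted at a cost equal to their node count, and relabeling a node costs 0 or 1. A semantic similarity measure between labels, which the abstract mentions, could replace that 0/1 cost. The function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(child) for child in node)


def tree_distance(a, b):
    """Top-down edit distance between two ordered labeled trees.

    Assumed costs: relabel = 0 or 1, insert/delete subtree = its node count.
    """
    relabel = 0 if a.tag == b.tag else 1
    ca, cb = list(a), list(b)
    # dp[i][j]: cost of turning the first i children of a
    # into the first j children of b.
    dp = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        dp[i][0] = dp[i - 1][0] + size(ca[i - 1])
    for j in range(1, len(cb) + 1):
        dp[0][j] = dp[0][j - 1] + size(cb[j - 1])
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(ca[i - 1]),  # delete a subtree of a
                dp[i][j - 1] + size(cb[j - 1]),  # insert a subtree of b
                dp[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),
            )
    return relabel + dp[len(ca)][len(cb)]


doc_a = ET.fromstring("<paper><title/><author/><author/></paper>")
doc_b = ET.fromstring("<paper><title/><author/><year/></paper>")
doc_c = ET.fromstring("<paper><title/></paper>")

print(tree_distance(doc_a, doc_b))  # one relabel: author -> year
print(tree_distance(doc_a, doc_c))  # two single-node deletions
```

The recursion is exponential in the worst case without memoization, so this form is only suitable for small documents. It is meant to show the shape of the computation, not to be used at scale.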

Medicalgenomics.org : an open source database and retrieval system for biomedical analysis

Relevance:

30.00%

Publisher:

Abstract:

Human molecular biology is a highly complex and diverse field in which research is pursued in many areas. The focus here is on genomics, proteomics, transcriptomics and metabolomics in particular, and years of research have accumulated large amounts of valuable data. This collection is growing steadily, and no slowdown is foreseeable. By now, however, this constant flood of information has buried valuable knowledge under unmanageable mountains of digital data and has made gathering research-specific, reliable information a major challenge. The work presented in this dissertation has produced a comprehensive compendium of human tissues for biomedical analysis. It is called medicalgenomics.org and has solved various biomedical problems that arise when searching for specific knowledge across numerous databases. The compendium is the first of its kind, and the knowledge it provides will help scientists gain a better systematic overview of specific genes or functional profiles with respect to regulation and to pathological and physiological conditions. In addition, various query methods enable efficient analysis of signaling events and metabolic pathways, as well as the study of genes at the expression level. The full range of these query options allows scientists to create highly specialized genetic road maps with which future experiments can be planned more precisely. As a result, valuable resources and time can be saved while the chances of success increase. Furthermore, the compendium's comprehensive knowledge can be used to generate and test biomedical hypotheses.

Retrieval simulations of the vertical profiles of water vapour and other chemical species in the Martian atmosphere using PACS

Relevance:

30.00%

Publisher:

Abstract:

Water vapour, despite being a minor constituent in the Martian atmosphere with its precipitable amount of less than 70 pr. μm, attracts considerable attention in the scientific community because of its potential importance for past life on Mars. The partial pressure of water vapour is highly variable because of its seasonal condensation onto the polar caps and exchange with a subsurface reservoir. It is also known to drive photochemical processes: photolysis of water produces H, OH, HO2 and some other odd hydrogen compounds, which in turn destroy ozone. Consequently, the abundance of water vapour is anti-correlated with ozone abundance. The Herschel Space Observatory provides for the first time the possibility to retrieve vertical water profiles in the Martian atmosphere. Herschel will contribute to this topic with its guaranteed-time key project called "Water and related chemistry in the solar system". Observations of Mars by the Heterodyne Instrument for the Far Infrared (HIFI) and the Photodetector Array Camera and Spectrometer (PACS) onboard Herschel are planned within this programme. HIFI with its high spectral resolution enables accurate observations of vertically resolved H2O and temperature profiles in the Martian atmosphere. Unlike HIFI, PACS is not capable of resolving the line shape of molecular lines. However, our present study of PACS observations for the Martian atmosphere shows that the vertical sensitivity of the PACS observations can be improved by using multiple-line observations with different line opacities. We have investigated the possibility of retrieving vertical profiles of temperature and molecular abundances of minor species including H2O in the Martian atmosphere using PACS. In this paper, we report that PACS is able to provide water vapour vertical profiles for the Martian atmosphere and we present the expected spectra for future PACS observations. We also show that the spectral resolution does not allow the retrieval of several studied minor species, such as H2O2, HCl, NO, SO2, etc.

Meta-model for calculating the mean energy demand of automated storage and retrieval systems

Relevance:

30.00%

Publisher:

Abstract:

Energy efficiency has become an important research topic in intralogistics. In this field the focus is on automated storage and retrieval systems (AS/RS) that use stacker cranes, as these systems are widespread and account for a significant portion of the total energy demand of intralogistics systems. Numerical simulation models have been developed to calculate the energy demand fairly precisely for discrete single and dual command cycles. Unfortunately, these simulation models are not suited to fast calculation of the mean energy demand of a complete storage aisle. Analytical approaches would be more convenient for this purpose, but so far they deliver results only for certain configurations. In particular, no analytical approach is available for calculating the mean energy demand of commonly used stacker cranes whose drive configuration includes an intermediate circuit connection. This article addresses that research gap and presents a calculation approach that enables planners to quickly calculate the energy demand of these systems.

A distributed, semiotic-inductive and human-Oriented approach to web-scale knowledge retrieval

Relevance:

30.00%

Publisher:

Abstract:

Web-scale knowledge retrieval can be enabled by distributed information retrieval, clustering Web clients into a large-scale computing infrastructure for knowledge discovery from Web documents. Based on this infrastructure, we propose to apply semiotic (i.e., sub-syntactical) and inductive (i.e., probabilistic) methods for inferring concept associations in human knowledge. These associations can be combined to form a fuzzy (i.e., gradual) semantic net representing a map of the knowledge in the Web. We further propose to provide interactive visualizations of these cognitive concept maps to end users, who can browse and search the Web in a human-oriented, visual, and associative interface.

Modeling sub-threshold slope and DIBL mismatch of sub-22nm FinFet

Relevance:

30.00%

Publisher:

Abstract:

A great challenge for future information technologies is building reliable systems on top of unreliable components. Parameters of modern and future technology devices are affected by severe levels of process variability, and devices will degrade and even fail during the normal lifetime of the chip due to aging mechanisms. These extreme levels of variability are caused by the high device miniaturization and the random placement of individual atoms. Variability is considered a "red brick" by the International Technology Roadmap for Semiconductors. The session is devoted to this topic, presenting research experiences from the Spanish Network on Variability called VARIABLES. In this session a talk entitled "Modeling sub-threshold slope and DIBL mismatch of sub-22nm FinFet" was presented.

Contributions to Speech Analytics based on Speech Recognition and Topic Identification

Relevance:

30.00%

Publisher:

Abstract:

La última década ha sido testigo de importantes avances en el campo de la tecnología de reconocimiento de voz. Los sistemas comerciales existentes actualmente poseen la capacidad de reconocer habla continua de múltiples locutores, consiguiendo valores aceptables de error, y sin la necesidad de realizar procedimientos explícitos de adaptación. A pesar del buen momento que vive esta tecnología, el reconocimiento de voz dista de ser un problema resuelto. La mayoría de estos sistemas de reconocimiento se ajustan a dominios particulares y su eficacia depende de manera significativa, entre otros muchos aspectos, de la similitud que exista entre el modelo de lenguaje utilizado y la tarea específica para la cual se está empleando. Esta dependencia cobra aún más importancia en aquellos escenarios en los cuales las propiedades estadísticas del lenguaje varían a lo largo del tiempo, como por ejemplo, en dominios de aplicación que involucren habla espontánea y múltiples temáticas. En los últimos años se ha evidenciado un constante esfuerzo por mejorar los sistemas de reconocimiento para tales dominios. Esto se ha hecho, entre otros muchos enfoques, a través de técnicas automáticas de adaptación. Estas técnicas son aplicadas a sistemas ya existentes, dado que exportar el sistema a una nueva tarea o dominio puede requerir tiempo a la vez que resultar costoso. Las técnicas de adaptación requieren fuentes adicionales de información, y en este sentido, el lenguaje hablado puede aportar algunas de ellas. El habla no sólo transmite un mensaje, también transmite información acerca del contexto en el cual se desarrolla la comunicación hablada (e.g. acerca del tema sobre el cual se está hablando). Por tanto, cuando nos comunicamos a través del habla, es posible identificar los elementos del lenguaje que caracterizan el contexto, y al mismo tiempo, rastrear los cambios que ocurren en estos elementos a lo largo del tiempo. 
Esta información podría ser capturada y aprovechada por medio de técnicas de recuperación de información (information retrieval) y de aprendizaje de máquina (machine learning). Esto podría permitirnos, dentro del desarrollo de mejores sistemas automáticos de reconocimiento de voz, mejorar la adaptación de modelos del lenguaje a las condiciones del contexto, y por tanto, robustecer al sistema de reconocimiento en dominios con condiciones variables (tales como variaciones potenciales en el vocabulario, el estilo y la temática). En este sentido, la principal contribución de esta Tesis es la propuesta y evaluación de un marco de contextualización motivado por el análisis temático y basado en la adaptación dinámica y no supervisada de modelos de lenguaje para el robustecimiento de un sistema automático de reconocimiento de voz. Esta adaptación toma como base distintos enfoque de los sistemas mencionados (de recuperación de información y aprendizaje de máquina) mediante los cuales buscamos identificar las temáticas sobre las cuales se está hablando en una grabación de audio. Dicha identificación, por lo tanto, permite realizar una adaptación del modelo de lenguaje de acuerdo a las condiciones del contexto. El marco de contextualización propuesto se puede dividir en dos sistemas principales: un sistema de identificación de temática y un sistema de adaptación dinámica de modelos de lenguaje. Esta Tesis puede describirse en detalle desde la perspectiva de las contribuciones particulares realizadas en cada uno de los campos que componen el marco propuesto: _ En lo referente al sistema de identificación de temática, nos hemos enfocado en aportar mejoras a las técnicas de pre-procesamiento de documentos, asimismo en contribuir a la definición de criterios más robustos para la selección de index-terms. 
– La eficiencia de los sistemas basados tanto en técnicas de recuperación de información como en técnicas de aprendizaje de máquina, y específicamente de aquellos sistemas que particularizan en la tarea de identificación de temática, depende, en gran medida, de los mecanismos de preprocesamiento que se aplican a los documentos. Entre las múltiples operaciones que hacen parte de un esquema de preprocesamiento, la selección adecuada de los términos de indexado (index-terms) es crucial para establecer relaciones semánticas y conceptuales entre los términos y los documentos. Este proceso también puede verse afectado, o bien por una mala elección de stopwords, o bien por la falta de precisión en la definición de reglas de lematización. En este sentido, en este trabajo comparamos y evaluamos diferentes criterios para el preprocesamiento de los documentos, así como también distintas estrategias para la selección de los index-terms. Esto nos permite no sólo reducir el tamaño de la estructura de indexación, sino también mejorar el proceso de identificación de temática. – Uno de los aspectos más importantes en cuanto al rendimiento de los sistemas de identificación de temática es la asignación de diferentes pesos a los términos de acuerdo a su contribución al contenido del documento. En este trabajo evaluamos y proponemos enfoques alternativos a los esquemas tradicionales de ponderado de términos (tales como tf-idf ) que nos permitan mejorar la especificidad de los términos, así como también discriminar mejor las temáticas de los documentos. _ Respecto a la adaptación dinámica de modelos de lenguaje, hemos dividimos el proceso de contextualización en varios pasos. – Para la generación de modelos de lenguaje basados en temática, proponemos dos tipos de enfoques: un enfoque supervisado y un enfoque no supervisado. En el primero de ellos nos basamos en las etiquetas de temática que originalmente acompañan a los documentos del corpus que empleamos. 
A partir de estas, agrupamos los documentos que forman parte de la misma temática y generamos modelos de lenguaje a partir de dichos grupos. Sin embargo, uno de los objetivos que se persigue en esta Tesis es evaluar si el uso de estas etiquetas para la generación de modelos es óptimo en términos del rendimiento del reconocedor. Por esta razón, nosotros proponemos un segundo enfoque, un enfoque no supervisado, en el cual el objetivo es agrupar, automáticamente, los documentos en clusters temáticos, basándonos en la similaridad semántica existente entre los documentos. Por medio de enfoques de agrupamiento conseguimos mejorar la cohesión conceptual y semántica en cada uno de los clusters, lo que a su vez nos permitió refinar los modelos de lenguaje basados en temática y mejorar el rendimiento del sistema de reconocimiento. – Desarrollamos diversas estrategias para generar un modelo de lenguaje dependiente del contexto. Nuestro objetivo es que este modelo refleje el contexto semántico del habla, i.e. las temáticas más relevantes que se están discutiendo. Este modelo es generado por medio de la interpolación lineal entre aquellos modelos de lenguaje basados en temática que estén relacionados con las temáticas más relevantes. La estimación de los pesos de interpolación está basada principalmente en el resultado del proceso de identificación de temática. – Finalmente, proponemos una metodología para la adaptación dinámica de un modelo de lenguaje general. El proceso de adaptación tiene en cuenta no sólo al modelo dependiente del contexto sino también a la información entregada por el proceso de identificación de temática. El esquema usado para la adaptación es una interpolación lineal entre el modelo general y el modelo dependiente de contexto. Estudiamos también diferentes enfoques para determinar los pesos de interpolación entre ambos modelos. 
Una vez definida la base teórica de nuestro marco de contextualización, proponemos su aplicación dentro de un sistema automático de reconocimiento de voz. Para esto, nos enfocamos en dos aspectos: la contextualización de los modelos de lenguaje empleados por el sistema y la incorporación de información semántica en el proceso de adaptación basado en temática. En esta Tesis proponemos un marco experimental basado en una arquitectura de reconocimiento en ‘dos etapas’. En la primera etapa, empleamos sistemas basados en técnicas de recuperación de información y aprendizaje de máquina para identificar las temáticas sobre las cuales se habla en una transcripción de un segmento de audio. Esta transcripción es generada por el sistema de reconocimiento empleando un modelo de lenguaje general. De acuerdo con la relevancia de las temáticas que han sido identificadas, se lleva a cabo la adaptación dinámica del modelo de lenguaje. En la segunda etapa de la arquitectura de reconocimiento, usamos este modelo adaptado para realizar de nuevo el reconocimiento del segmento de audio. Para determinar los beneficios del marco de trabajo propuesto, llevamos a cabo la evaluación de cada uno de los sistemas principales previamente mencionados. Esta evaluación es realizada sobre discursos en el dominio de la política usando la base de datos EPPS (European Parliamentary Plenary Sessions - Sesiones Plenarias del Parlamento Europeo) del proyecto europeo TC-STAR. Analizamos distintas métricas acerca del rendimiento de los sistemas y evaluamos las mejoras propuestas con respecto a los sistemas de referencia. ABSTRACT The last decade has witnessed major advances in speech recognition technology. Today’s commercial systems are able to recognize continuous speech from numerous speakers, with acceptable levels of error and without the need for an explicit adaptation procedure. Despite this progress, speech recognition is far from being a solved problem. 
Most of these systems are adjusted to a particular domain and their efficacy depends significantly, among many other aspects, on the similarity between the language model used and the task that is being addressed. This dependence is even more important in scenarios where the statistical properties of the language fluctuates throughout the time, for example, in application domains involving spontaneous and multitopic speech. Over the last years there has been an increasing effort in enhancing the speech recognition systems for such domains. This has been done, among other approaches, by means of techniques of automatic adaptation. These techniques are applied to the existing systems, specially since exporting the system to a new task or domain may be both time-consuming and expensive. Adaptation techniques require additional sources of information, and the spoken language could provide some of them. It must be considered that speech not only conveys a message, it also provides information on the context in which the spoken communication takes place (e.g. on the subject on which it is being talked about). Therefore, when we communicate through speech, it could be feasible to identify the elements of the language that characterize the context, and at the same time, to track the changes that occur in those elements over time. This information can be extracted and exploited through techniques of information retrieval and machine learning. This allows us, within the development of more robust speech recognition systems, to enhance the adaptation of language models to the conditions of the context, thus strengthening the recognition system for domains under changing conditions (such as potential variations in vocabulary, style and topic). 
In this sense, the main contribution of this Thesis is the proposal and evaluation of a framework of topic-motivated contextualization based on the dynamic and non-supervised adaptation of language models for the enhancement of an automatic speech recognition system. This adaptation is based on an combined approach (from the perspective of both information retrieval and machine learning fields) whereby we identify the topics that are being discussed in an audio recording. The topic identification, therefore, enables the system to perform an adaptation of the language model according to the contextual conditions. The proposed framework can be divided in two major systems: a topic identification system and a dynamic language model adaptation system. This Thesis can be outlined from the perspective of the particular contributions made in each of the fields that composes the proposed framework: _ Regarding the topic identification system, we have focused on the enhancement of the document preprocessing techniques in addition to contributing in the definition of more robust criteria for the selection of index-terms. – Within both information retrieval and machine learning based approaches, the efficiency of topic identification systems, depends, to a large extent, on the mechanisms of preprocessing applied to the documents. Among the many operations that encloses the preprocessing procedures, an adequate selection of index-terms is critical to establish conceptual and semantic relationships between terms and documents. This process might also be weakened by a poor choice of stopwords or lack of precision in defining stemming rules. In this regard we compare and evaluate different criteria for preprocessing the documents, as well as for improving the selection of the index-terms. This allows us to not only reduce the size of the indexing structure but also to strengthen the topic identification process. 
  – One of the most crucial aspects of the performance of topic identification systems is the assignment of different weights to different terms depending on their contribution to the content of the document. In this sense we evaluate and propose alternatives to traditional weighting schemes (such as tf-idf) that allow us to improve the specificity of terms and to better identify the topics that are related to documents.
- Regarding the dynamic language model adaptation, we divide the contextualization process into different steps.
  – We propose supervised and unsupervised approaches for the generation of topic-based language models. The supervised approach generates topic-based language models by grouping the documents in the training set according to the original topic labels of the corpus. Nevertheless, a goal of this Thesis is to evaluate whether or not the use of these labels to generate language models is optimal in terms of recognition accuracy. For this reason, we propose a second, unsupervised approach, in which the objective is to group the data in the training set into automatic topic clusters based on the semantic similarity between the documents. By means of clustering we expect to obtain a more cohesive association of the documents that are related by similar concepts, thus improving the coverage of the topic-based language models and enhancing the performance of the recognition system.
  – We develop various strategies for creating a context-dependent language model. Our aim is for this model to reflect the semantic context of the current utterance, i.e. the most relevant topics that are being discussed. This model is generated by means of a linear interpolation between the topic-based language models related to the most relevant topics. The estimation of the interpolation weights is based mainly on the outcome of the topic identification process.
  – Finally, we propose a methodology for the dynamic adaptation of a background language model. The adaptation process takes into account the context-dependent model as well as the information provided by the topic identification process. The scheme used for the adaptation is a linear interpolation between the background model and the context-dependent one. We also study different approaches to determining the interpolation weights used in this adaptation scheme.
Having defined the basis of our topic-motivated contextualization framework, we propose its application to an automatic speech recognition system. We focus on two aspects: the contextualization of the language models used by the system, and the incorporation of semantic information into a topic-based adaptation process. To achieve this, we propose an experimental framework based on a two-stage recognition architecture. In the first stage, information retrieval and machine learning techniques are used to identify the topics in a transcription of an audio segment. This transcription is generated by the recognition system using a background language model. The dynamic language model adaptation is then carried out according to the confidence in the topics that have been identified. In the second stage, the adapted language model is used to re-decode the utterance. To test the benefits of the proposed framework, we evaluate each of the two major systems described above. The evaluation is conducted on political speeches using the EPPS (European Parliamentary Plenary Sessions) database from the European TC-STAR project. We analyse several performance metrics that allow us to compare the improvements of the proposed systems against the baseline ones.
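
The two interpolation steps described above (topic-based models combined into a context-dependent model, which is then combined with the background model) can be illustrated with a minimal Python sketch. It is not the implementation used in the Thesis: the function names `interpolate` and `adapt`, the parameters `top_k` and `context_weight`, the toy unigram models and the use of normalized topic scores as interpolation weights are all assumptions made for illustration. The abstract only states that the weights are derived mainly from the topic identification outcome and that several estimation strategies are studied.

```python
from typing import Dict, List

# Toy stand-in for a language model: a unigram distribution (word -> probability).
# Real systems would interpolate n-gram models; this is an illustrative assumption.
LanguageModel = Dict[str, float]


def interpolate(models: List[LanguageModel], weights: List[float]) -> LanguageModel:
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w), with lambdas normalized to 1."""
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lambdas = [w / total for w in weights]
    vocab = set().union(*models)  # union of all words seen by any model
    return {
        word: sum(lam * m.get(word, 0.0) for lam, m in zip(lambdas, models))
        for word in vocab
    }


def adapt(
    background: LanguageModel,
    topic_models: Dict[str, LanguageModel],
    topic_scores: Dict[str, float],
    top_k: int = 2,               # hypothetical: number of topics kept
    context_weight: float = 0.5,  # hypothetical: weight of the context-dependent model
) -> LanguageModel:
    """Build a context-dependent model from the top-scoring topics, then
    interpolate it with the background model."""
    ranked = sorted(topic_scores, key=topic_scores.get, reverse=True)[:top_k]
    context = interpolate(
        [topic_models[t] for t in ranked],
        [topic_scores[t] for t in ranked],  # assumption: scores act as weights
    )
    return interpolate([background, context], [1.0 - context_weight, context_weight])


# Hypothetical data: scores as they might come from a first-pass topic identifier.
background = {"the": 0.5, "vote": 0.25, "fish": 0.25}
topic_models = {
    "politics": {"the": 0.5, "vote": 0.5},
    "fisheries": {"the": 0.5, "fish": 0.5},
}
topic_scores = {"politics": 0.75, "fisheries": 0.25}

adapted = adapt(background, topic_models, topic_scores)
print(sorted(adapted.items()))
```

In this sketch the adapted model shifts probability mass towards words favoured by the highest-scoring topic while remaining a proper distribution, which is the effect the second decoding pass relies on.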
