1000 results for weights identification


Relevance: 100.00%

Abstract:

The generalized Bonferroni mean is able to capture some interaction effects between variables and to model mandatory requirements. We present a number of weights identification algorithms we have developed in the R programming language in order to model data using the generalized Bonferroni mean subject to various preferences. We then compare their accuracy when fitting the journal ranks dataset.
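
As a rough illustration of what such a weights identification procedure might look like, the sketch below (Python rather than the authors' R code) fits the weights of one common weighted form of the generalized Bonferroni mean to data by constrained least squares. The particular functional form, the toy data and all function names are assumptions made here for illustration, not the authors' implementation.

    import numpy as np
    from scipy.optimize import minimize

    def bonferroni_mean(x, w, p=1.0, q=1.0):
        # One common weighted generalized Bonferroni mean (assumed form):
        # B(x) = ( sum_i w_i x_i^p * sum_{j!=i} w_j/(1-w_i) x_j^q )^(1/(p+q))
        n = len(x)
        total = 0.0
        for i in range(n):
            inner = sum(w[j] / (1.0 - w[i]) * x[j] ** q for j in range(n) if j != i)
            total += w[i] * x[i] ** p * inner
        return total ** (1.0 / (p + q))

    def fit_weights(X, y, p=1.0, q=1.0):
        # Least-squares fit of the weight vector, with w >= 0 and sum(w) = 1.
        n = X.shape[1]
        def loss(w):
            preds = np.array([bonferroni_mean(row, w, p, q) for row in X])
            return np.sum((preds - y) ** 2)
        cons = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
        bounds = [(1e-6, 0.999)] * n          # keep 1 - w_i away from zero
        w0 = np.full(n, 1.0 / n)
        res = minimize(loss, w0, bounds=bounds, constraints=cons, method="SLSQP")
        return res.x

    # Toy usage: recover known weights from synthetic data.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.1, 1.0, size=(50, 3))
    y = np.array([bonferroni_mean(row, np.array([0.5, 0.3, 0.2])) for row in X])
    print(fit_weights(X, y).round(3))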

Relevance: 40.00%

Abstract:

This chapter provides a review of various techniques for identification of weights in generalized mean and ordered weighted averaging aggregation operators, as well as identification of fuzzy measures in Choquet integral based operators. Our main focus is on using empirical data to compute the weights. We present a number of practical algorithms to identify the best aggregation operator that fits the data.
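
A minimal sketch of one of the techniques such a review covers, fitting OWA weights to empirical data: the weights are obtained by least squares over the sorted argument vectors, subject to non-negativity and a sum-to-one constraint. The Python code, data and names below are illustrative assumptions, not the chapter's own algorithms.

    import numpy as np
    from scipy.optimize import minimize

    def fit_owa_weights(X, y):
        # Fit OWA weights by constrained least squares:
        # OWA_w(x) = sum_i w_i * x_(i), with x_(i) sorted in descending order,
        # subject to w_i >= 0 and sum_i w_i = 1.
        B = -np.sort(-X, axis=1)               # each row sorted descending
        n = B.shape[1]
        loss = lambda w: np.sum((B @ w - y) ** 2)
        cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
        res = minimize(loss, np.full(n, 1.0 / n), bounds=[(0.0, 1.0)] * n,
                       constraints=cons, method="SLSQP")
        return res.x

    # Toy usage: recover known weights from synthetic observations.
    rng = np.random.default_rng(1)
    X = rng.random((100, 4))
    true_w = np.array([0.4, 0.3, 0.2, 0.1])
    y = (-np.sort(-X, axis=1)) @ true_w
    print(fit_owa_weights(X, y).round(3))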

Relevance: 30.00%

Abstract:

The problem of identifying parameters of nonlinear vibrating systems using spatially incomplete, noisy, time-domain measurements is considered. The problem is formulated within the framework of dynamic state estimation formalisms that employ particle filters. The parameters of the system to be identified are treated as a set of random variables with a finite number of discrete states. The study develops a procedure that combines a bank of self-learning particle filters with a global iteration strategy to estimate the probability distribution of the system parameters to be identified. Individual particle filters are based on the sequential importance sampling filter algorithm that is readily available in the existing literature. The paper develops the requisite recursive formulas for evaluating the evolution of the weights associated with the system parameter states. The correctness of the formulations developed is demonstrated first by applying the proposed procedure to a few linear vibrating systems for which an alternative solution using the adaptive Kalman filter method is possible. Subsequently, illustrative examples on three nonlinear vibrating systems, using synthetic vibration data, are presented to confirm the correct functioning of the method.
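
The following hedged sketch illustrates only the general idea of treating a system parameter as a random variable over a finite set of discrete states and recursively updating the state weights with the measurement likelihood. The single-degree-of-freedom oscillator, the grid of candidate stiffness values and the noise level are assumptions for illustration, not the paper's formulation.

    import numpy as np

    # Assumed setup: identify the stiffness k of a linear SDOF oscillator
    # m*x'' + c*x' + k*x = f(t) from noisy displacement measurements, with k
    # restricted to a finite set of discrete states whose weights are updated
    # recursively, in the spirit of sequential importance sampling.
    def simulate(k, m=1.0, c=0.2, dt=0.01, steps=500, force=1.0):
        x, v, out = 0.0, 0.0, []
        for _ in range(steps):
            a = (force - c * v - k * x) / m
            v += a * dt
            x += v * dt
            out.append(x)
        return np.array(out)

    rng = np.random.default_rng(2)
    true_k = 4.0
    sigma = 0.02
    y = simulate(true_k) + rng.normal(0.0, sigma, 500)     # noisy measurements

    k_states = np.linspace(2.0, 6.0, 41)                   # discrete parameter states
    weights = np.full(len(k_states), 1.0 / len(k_states))  # initial state weights
    preds = np.array([simulate(k) for k in k_states])

    for t in range(len(y)):                                # recursive weight update
        lik = np.exp(-0.5 * ((y[t] - preds[:, t]) / sigma) ** 2)
        weights *= lik
        weights /= weights.sum()

    print("estimated k:", k_states[np.argmax(weights)])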

Relevance: 30.00%

Abstract:

Two vitellins, VtA and VtB, were purified from the eggs of Dysdercus koenigii by gel filtration and ion exchange chromatography. VtA and VtB have molecular weights of 290 and 260 kDa, respectively. Both Vts are glycolipoproteinaceous in nature. VtA is composed of three polypeptides of M-r 116, 92 and 62 kDa, while VtB contains an additional subunit of M-r 40 kDa. All subunits except the 116-kDa subunit are glycolipopolypeptides. A polyclonal antibody raised against VtA (anti-VtA antibody) cross-reacted with VtB, as well as with vitellogenic haemolymph, ovaries and pre-vitellogenic fat bodies, but not with haemolymph from adult males, fifth instar females or pre-vitellogenic females, demonstrating the sex and stage specificity of the Vts. Immunoblots in the presence of anti-VtA revealed two proteins (of 290 and 260 kDa) in both vitellogenic haemolymph and pre-vitellogenic fat bodies that are recognised as D. koenigii Vgs. In newly emerged females, Vgs appeared on day 1 in fat bodies and on day 3 in haemolymph and ovaries. Vg concentration was maximal on day 2 in the fat body, day 4 in the haemolymph and day 7 in the ovary. Although the biochemical and temporal characteristics of these proteins show similarity to those of some hemipterans, they are strikingly dissimilar to those of a very closely related species.

Relevance: 30.00%

Abstract:

We address the problem of identifying the constituent sources in a single-sensor mixture signal consisting of contributions from multiple simultaneously active sources. We propose a generic framework for mixture signal analysis based on a latent variable approach. The basic idea of the approach is to detect known sources represented as stochastic models, in a single-channel mixture signal without performing signal separation. A given mixture signal is modeled as a convex combination of known source models and the weights of the models are estimated using the mixture signal. We show experimentally that these weights indicate the presence/absence of the respective sources. The performance of the proposed approach is illustrated through mixture speech data in a reverberant enclosure. For the task of identifying the constituent speakers using data from a single microphone, the proposed approach is able to identify the dominant source with up to 8 simultaneously active background sources in a room with RT60 = 250 ms, using models obtained from clean speech data for a Source to Interference Ratio (SIR) greater than 2 dB.
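
A simplified sketch of the weight-estimation step: here each known source is summarised by an average spectral template (an assumption; the paper uses full stochastic source models), the mixture is modelled as a convex combination of the templates, and the weights are estimated from the mixture alone, so large weights flag the active sources.

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(3)
    n_bins, n_sources = 128, 5
    templates = rng.random((n_bins, n_sources))       # columns = source "models"

    active = np.array([0.6, 0.4, 0.0, 0.0, 0.0])      # sources 0 and 1 active
    mixture = templates @ active + rng.normal(0, 0.01, n_bins)

    w, _ = nnls(templates, mixture)                   # non-negative weight estimate
    w = w / w.sum()                                   # project onto the simplex
    print("estimated weights:", w.round(3))           # high for sources 0 and 1 only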

Relevance: 30.00%

Abstract:

With the rapid increase in the number and influence of China's floating population, there is an urgent need to understand the regional types of this population and their spatial characteristics. After reviewing current methods for identifying regional types of floating population, this paper puts forward a new composite-index identification method, together with a modified version, which consists of two indexes: the net migration rate and the gross migration rate. The traditional single-index and the new composite-index identification methods are then empirically tested to explore their spatial patterns and characteristics using China's 2000 census data at the county level. The results show: (1) The composite-index identification method is much better than the traditional single-index method because it can measure the direction and scale of migration simultaneously, and in particular it can identify the unique regional types of floating population with large scales of both immigration and emigration. (2) The modified composite-index identification method, which uses the share of a region's floating population of a certain type in the national total as a weight, can effectively correct the over- or under-estimation errors caused by a region's particularly large or small total population. (3) The spatial patterns of the different regional types of China's floating population are closely related to the regional differentiation of the natural environment, population density and socio-economic development level. The three active regional types of floating population are mainly located in the eastern part of China, in areas with lower elevation, more than 800 mm of precipitation, and relatively high population densities and economic development levels.
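
The sketch below illustrates, with made-up numbers, how a net migration rate, a gross migration rate and a region's share of the national floating population might be combined; the exact index definitions, weighting scheme and thresholds used in the paper are not reproduced here, so the formulas are assumptions.

    import numpy as np

    # Assumed definitions for illustration only:
    #   net migration rate   = (in - out) / resident population
    #   gross migration rate = (in + out) / resident population
    #   weight for the modified index = region's share of the national floating total
    regions = {                    # region: (in-migrants, out-migrants, population)
        "A": (120_000, 30_000, 1_000_000),
        "B": (20_000, 140_000, 2_000_000),
        "C": (90_000, 95_000, 800_000),
    }

    total_floating = sum(i + o for i, o, _ in regions.values())
    for name, (inm, outm, pop) in regions.items():
        net = (inm - outm) / pop
        gross = (inm + outm) / pop
        share = (inm + outm) / total_floating          # weight in the modified index
        print(f"{name}: net={net:+.3f} gross={gross:.3f} weighted_gross={share * gross:.4f}")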

Relevance: 30.00%

Abstract:

The present work describes a liquid chromatography-electrospray ionization mass spectrometry (LC-ESI-MS) method for the rapid identification of phenylethanoid glycosides in a plant extract from Plantago asiatica L. Using a binary mobile phase system consisting of 0.2% acetic acid and acetonitrile under gradient conditions, a good separation was achieved on a reversed-phase C-18 column. The [M-H]- ions, the molecular weights, and the fragment ions of the phenylethanoid glycosides were obtained in negative ion mode using LC-ESI-MS. The identification of the phenylethanoid glycosides (peaks 1-3) in the extract of P. asiatica L. was based on matching their retention times, molecular ions, and the fragment ions obtained in collision-induced dissociation (CID) experiments with those of authentic standards and data reported in the literature.

Relevance: 30.00%

Abstract:

Linear alkylbenzenes (LAB), formed by the AlCl3- or HF-catalyzed alkylation of benzene, are common raw materials for surfactant manufacture. Normally they are sulphonated using SO3 or oleum to give the corresponding linear alkylbenzene sulphonates in >95% yield. As concern has grown about the environmental impact of surfactants, questions have been raised about the trace levels of unreacted raw materials, linear alkylbenzenes, and the minor impurities present in them. With the advent of modern analytical instruments and techniques, namely GC/MS, the opportunity has arisen to identify the exact nature of these impurities and to determine the actual levels of them present in commercial linear alkylbenzenes. The object of the proposed study was to separate, identify and quantify major and minor components (1-10%) in commercial linear alkylbenzenes. The focus of this study was on the structure elucidation and determination of impurities and on their qualitative determination in all analyzed linear alkylbenzene samples. A gas chromatography/mass spectrometry (GC/MS) study was performed on five samples from the same manufacturer (different production dates) and was then followed by the analysis of ten commercial linear alkylbenzenes from four different suppliers. All the major components, namely the linear alkylbenzene isomers, followed the same elution pattern, with the 2-phenyl isomer eluting last. The individual isomers were identified by interpretation of their electron impact and chemical ionization mass spectra. The percent isomer distribution was found to differ from sample to sample. Average molecular weights were calculated using two methods, GC and GC/MS, and compared with the results reported on the Certificates of Analysis (C.O.A.) provided by the manufacturers of the commercial linear alkylbenzenes. The GC results in most cases agreed with the reported values, whereas the GC/MS results were significantly lower, by between 0.41 and 3.29 amu. The minor components, impurities such as branched alkylbenzenes and dialkyltetralins, eluted according to their molecular weights. Their fragmentation patterns were studied using the electron impact ionization mode, and their molecular weight ions were confirmed by a 'soft ionization' technique, chemical ionization. The level of impurities present in the analyzed commercial linear alkylbenzenes was expressed as a percentage of the total sample weight, as well as in mg/g. The percentage of impurities was observed to vary between 4.5% and 16.8%, with the highest being in sample "I". Quantitation (mg/g) of impurities such as branched alkylbenzenes and dialkyltetralins was done using cis/trans-1,4,6,7-tetramethyltetralin as an internal standard. Samples were analyzed using a GC/MS system operating under full-scan and single ion monitoring data acquisition modes. The latter data acquisition mode, which offers higher sensitivity, was used to analyze all samples under investigation for the presence of linear dialkyltetralins. Dialkyltetralins were reported quantitatively, whereas branched alkylbenzenes were reported semi-qualitatively. The GC/MS method that was developed during the course of this study allowed identification of some other trace impurities present in commercial LABs. Compounds such as non-linear dialkyltetralins, dialkylindanes, diphenylalkanes and alkylnaphthalenes were identified, but their detailed structure elucidation and quantitation was beyond the scope of this study.
However, further investigation of these compounds will be the subject of a future study.
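
For illustration only, the average molecular weight referred to above can be computed as a peak-area-weighted mean of the homologue molecular weights; the numbers below are invented, not taken from the analysed samples.

    # Hypothetical GC isomer/homologue distribution: molecular weight -> area fraction.
    fractions = {218: 0.05, 232: 0.25, 246: 0.40, 260: 0.25, 274: 0.05}
    avg_mw = sum(mw * f for mw, f in fractions.items())
    print(f"average molecular weight ~ {avg_mw:.1f} amu")   # ~246 amu for this example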

Relevance: 30.00%

Abstract:

Two approaches are presented to calculate the weights for a Dynamic Recurrent Neural Network (DRNN) in order to identify the input-output dynamics of a class of nonlinear systems. The number of states of the identified network is constrained to be the same as the number of states of the plant.
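
Neither of the two approaches is detailed in the abstract; as a hedged stand-in, the sketch below fits the weights of a one-state dynamic recurrent network to input-output data from an assumed scalar nonlinear plant, so that the network and the plant share the same number of states. All names and numerical values are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    # Assumed one-state DRNN: x[k+1] = x[k] + dt*(-a*x[k] + w*tanh(x[k]) + b*u[k]).
    dt = 0.05
    def simulate(params, u, x0=0.0):
        a, w, b = params
        x, out = x0, []
        for uk in u:
            x = x + dt * (-a * x + w * np.tanh(x) + b * uk)
            out.append(x)
        return np.array(out)

    rng = np.random.default_rng(4)
    u = rng.uniform(-1, 1, 400)
    y_plant = simulate((1.5, 0.8, 1.0), u)             # synthetic plant I/O data

    # Identify the network weights by minimising the simulation error.
    loss = lambda p: np.mean((simulate(p, u) - y_plant) ** 2)
    res = minimize(loss, x0=np.array([1.0, 0.0, 0.5]), method="Nelder-Mead")
    print("identified weights (a, w, b):", res.x.round(3))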

Relevance: 30.00%

Abstract:

The fungal pathogen Claviceps purpurea infects the ovaries of a broad range of temperate grasses and cereals, including hexaploid wheat, causing a disease commonly known as ergot. Sclerotia produced in place of seed carry a cocktail of harmful alkaloid compounds that result in a range of symptoms in humans and animals, causing ergotism. Following a field assessment of C. purpurea infection in winter wheat, two varieties, ‘Robigus’ and ‘Solstice’, were selected which consistently produced the largest differential effect on ergot sclerotia weights. They were crossed to produce a doubled haploid mapping population, and a marker map consisting of 714 genetic loci, with a total length of 2895 cM, was produced. Four ergot-reducing QTL were identified using both sclerotia weight and size as phenotypic parameters, with QCp.niab.2A and QCp.niab.4B detected in the wheat variety ‘Robigus’, and QCp.niab.6A and QCp.niab.4D in the variety ‘Solstice’. The peaks of the ergot resistance QTL QCp.niab.4B and QCp.niab.4D mapped to the same markers as the known reduced height (Rht) loci on chromosomes 4B and 4D, Rht-B1 and Rht-D1, respectively. In both cases, the reduction in sclerotia weight and size was associated with the semi-dwarfing alleles, Rht-B1b from ‘Robigus’ and Rht-D1b from ‘Solstice’. Two-dimensional, two-QTL scans identified significant additive interactions between QTL QCp.niab.4B and QCp.niab.4D, and between QCp.niab.2A and QCp.niab.4B when looking at sclerotia size, but not between QCp.niab.2A and QCp.niab.4D. The two plant height QTL, QPh.niab.4B and QPh.niab.4D, which mapped to the same locations as QCp.niab.4B and QCp.niab.4D, also displayed significant genetic interactions.

Relevance: 30.00%

Abstract:

This paper discusses the identification of the parameters of generalized ordered weighted averaging (GOWA) operators from empirical data. Similarly to ordinary OWA operators, GOWA operators are characterized by a vector of weights, as well as by the power to which the arguments are raised. We develop optimization techniques which allow one to fit such operators to the observed data. We also generalize these methods to functionally defined GOWA and generalized Choquet integral based aggregation operators.
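
A possible sketch of such an identification procedure: the GOWA weights and the power p are fitted jointly to observed data by constrained least squares. The Python code, toy data and parameter ranges are illustrative assumptions rather than the paper's algorithms.

    import numpy as np
    from scipy.optimize import minimize

    def gowa(x, w, p):
        # Generalized OWA: (sum_i w_i * x_(i)^p)^(1/p), with x_(i) sorted descending.
        xs = np.sort(x)[::-1]
        return (np.sum(w * xs ** p)) ** (1.0 / p)

    def fit_gowa(X, y):
        # Jointly fit the weight vector and the power p by constrained least squares.
        n = X.shape[1]
        def loss(theta):
            w, p = theta[:n], theta[n]
            preds = np.array([gowa(row, w, p) for row in X])
            return np.sum((preds - y) ** 2)
        cons = [{"type": "eq", "fun": lambda t: t[:n].sum() - 1.0}]
        bounds = [(0.0, 1.0)] * n + [(0.1, 5.0)]       # keep p in a safe range
        theta0 = np.concatenate([np.full(n, 1.0 / n), [1.0]])
        res = minimize(loss, theta0, bounds=bounds, constraints=cons, method="SLSQP")
        return res.x[:n], res.x[n]

    # Toy usage: recover known weights and power from synthetic data.
    rng = np.random.default_rng(5)
    X = rng.uniform(0.1, 1.0, (80, 3))
    y = np.array([gowa(row, np.array([0.5, 0.3, 0.2]), 2.0) for row in X])
    print(fit_gowa(X, y))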

Relevance: 30.00%

Abstract:

Evolutionary algorithms have been widely used for Artificial Neural Network (ANN) training, the idea being to update the neurons' weights using the social dynamics of living organisms in order to decrease the classification error. In this paper, we introduce Social-Spider Optimization (SSO) to improve the training phase of ANNs with multilayer perceptrons, and we validate the proposed approach in the context of Parkinson's Disease recognition. The experimental section compares the proposed approach against five other well-known meta-heuristic techniques and shows that SSO can be a suitable approach for the ANN-MLP training step.
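
To make the training scheme concrete, the sketch below evolves the weight vector of a small MLP against its classification error using a generic population-based search; the full Social-Spider Optimization update rules are not reproduced, so this is only a stand-in under that stated simplification, with an invented toy dataset.

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)          # toy XOR-like labels

    def mlp_error(theta, n_hidden=6):
        # Unpack a flat weight vector into a 2-n_hidden-1 MLP and return its error rate.
        W1 = theta[:2 * n_hidden].reshape(2, n_hidden)
        b1 = theta[2 * n_hidden:3 * n_hidden]
        W2 = theta[3 * n_hidden:4 * n_hidden]
        b2 = theta[-1]
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        return np.mean((p > 0.5) != y)

    dim = 2 * 6 + 6 + 6 + 1
    pop = rng.normal(scale=1.0, size=(30, dim))        # population of candidate weight vectors
    for _ in range(200):
        fitness = np.array([mlp_error(ind) for ind in pop])
        best = pop[np.argmin(fitness)]
        # Move every individual towards the current best, with random exploration.
        pop = pop + 0.3 * (best - pop) + rng.normal(scale=0.1, size=pop.shape)
        pop[0] = best                                  # elitism: keep the best individual
    print("best classification error:", min(mlp_error(ind) for ind in pop))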

Relevance: 30.00%

Abstract:

REST is a zinc-finger transcription factor implicated in several processes, such as the maintenance of embryonic stem cell pluripotency and the regulation of mitotic fidelity in non-neuronal cells [Chong et al., 1995]. The gene encodes a 116-kDa protein that acts as a molecular platform for co-repressor recruitment and promotes modifications of DNA and histones [Ballas, 2005]. REST shows different apparent molecular weights, consistent with the possible presence of post-translational modifications [Lee et al., 2000]. Among these, the most common is glycosylation, the covalent attachment of carbohydrates during or after protein synthesis [Apweiler et al., 1999]. My thesis has ascertained, for the first time, the presence of glycan chains in the transcription factor REST. Through enzymatic deglycosylation and MS, the oligosaccharide composition of the glycan chains was evaluated: a complex mixture of glycans, composed of N-acetylgalactosamine, galactose and mannose, was observed, thus confirming the presence of O- and N-linked glycan chains. Glycosylation site mapping was done using an 18O-labeling method and MS/MS, and twelve potential N-glycosylation sites were identified. The most probable glycosylation target residues were mutated through site-directed mutagenesis and the REST mutants were expressed in different cell lines. Variations in the protein molecular weight and the ability of the REST mutants to bind the RE-1 sequence were analyzed. Gene reporter assays showed that, altogether, removal of the N-linked glycan chains causes loss of transcriptional repressor function, except for mutant N59, which showed a slight residual repressor activity in the presence of IGF-I. Taken together, these results demonstrate the presence of complex glycan chains in the transcription factor REST: I have characterized their composition, started defining their position on the protein backbone, and identified their possible role in the functioning of the transcription factor. Considering the crucial role of glycosylation and of transcription factor activity in the aetiology of many diseases, any further knowledge could find important and interesting pharmacological applications.

Relevance: 30.00%

Abstract:

Reverse-transcribed RNAs coding for YnKn, YnSKn, SKn, and KS dehydrin types in drought-stressed white clover (Trifolium repens) were identified and characterized. The nucleotide analyses revealed the complex nature of dehydrin-coding sequences, often featuring alternative start and stop codons within the open reading frames, which could be a prerequisite for high variability among the transcripts originating from a single gene. For some dehydrin sequences, the existence of natural antisense transcripts was predicted. The differential distribution of dehydrin homologues in roots and leaves from a single white clover stolon under normal and drought conditions was evaluated by semi-quantitative RT-PCR and immunoblots with antibodies against the conserved K-, Y- and S-segments. The data suggest that different dehydrin classes have distinct roles in the drought stress response and in vegetative development, demonstrating some specific characteristic features. Substantial levels of YSK-type proteins with different molecular weights were immunodetected in the non-stressed developing leaves. The acidic SK2 and KS dehydrin transcripts exhibited a developmental gradient in leaves. A strong increase of YK transcripts was documented in the fully expanded leaves and roots of drought-stressed individuals. The immunodetected drought-induced signals imply that Y- and K-segment containing dehydrins could be the major inducible Late Embryogenesis Abundant class 2 (LEA 2) proteins that accumulate predominantly under drought.

Relevance: 30.00%

Abstract:

The last decade has witnessed major advances in speech recognition technology. Today's commercial systems are able to recognize continuous speech from numerous speakers, with acceptable levels of error and without the need for an explicit adaptation procedure. Despite this progress, speech recognition is far from being a solved problem. Most of these systems are adjusted to a particular domain and their efficacy depends significantly, among many other aspects, on the similarity between the language model used and the task that is being addressed. This dependence is even more important in scenarios where the statistical properties of the language fluctuate over time, for example, in application domains involving spontaneous and multi-topic speech. Over the last years there has been an increasing effort to enhance speech recognition systems for such domains. This has been done, among other approaches, by means of techniques of automatic adaptation. These techniques are applied to existing systems, especially since exporting the system to a new task or domain may be both time-consuming and expensive.
Adaptation techniques require additional sources of information, and the spoken language can provide some of them. Speech not only conveys a message; it also provides information on the context in which the spoken communication takes place (e.g. on the subject being talked about). Therefore, when we communicate through speech, it is feasible to identify the elements of the language that characterize the context and, at the same time, to track the changes that occur in those elements over time. This information can be extracted and exploited through information retrieval and machine learning techniques. Within the development of more robust speech recognition systems, this allows us to enhance the adaptation of language models to the conditions of the context, thus strengthening the recognition system for domains under changing conditions (such as potential variations in vocabulary, style and topic). In this sense, the main contribution of this Thesis is the proposal and evaluation of a framework of topic-motivated contextualization based on the dynamic and unsupervised adaptation of language models for the enhancement of an automatic speech recognition system. This adaptation is based on a combined approach (from the perspective of both the information retrieval and machine learning fields) whereby we identify the topics that are being discussed in an audio recording. Topic identification, therefore, enables the system to adapt the language model according to the contextual conditions. The proposed framework can be divided into two major systems: a topic identification system and a dynamic language model adaptation system. This Thesis can be outlined from the perspective of the particular contributions made in each of the fields that compose the proposed framework:

_ Regarding the topic identification system, we have focused on enhancing the document preprocessing techniques, in addition to contributing to the definition of more robust criteria for the selection of index-terms.

– In both information retrieval and machine learning based approaches, the efficiency of topic identification systems depends, to a large extent, on the preprocessing mechanisms applied to the documents. Among the many operations that make up the preprocessing procedure, an adequate selection of index-terms is critical to establish conceptual and semantic relationships between terms and documents. This process may also be weakened by a poor choice of stopwords or by a lack of precision in defining stemming rules. In this regard we compare and evaluate different criteria for preprocessing the documents, as well as different strategies for improving the selection of the index-terms. This allows us not only to reduce the size of the indexing structure but also to strengthen the topic identification process.

– One of the most crucial aspects in relation to the performance of topic identification systems is the assignment of different weights to terms depending on their contribution to the content of the document. We evaluate and propose alternative approaches to traditional term-weighting schemes (such as tf-idf) that allow us to improve the specificity of terms and to better discriminate the topics related to the documents.

_ Regarding the dynamic language model adaptation, we divide the contextualization process into several steps.

– We propose supervised and unsupervised approaches for the generation of topic-based language models.
The first, supervised, approach generates topic-based language models by grouping the documents in the training set according to the original topic labels of the corpus. Nevertheless, a goal of this Thesis is to evaluate whether or not the use of these labels to generate language models is optimal in terms of recognition accuracy. For this reason, we propose a second, unsupervised approach, in which the objective is to group the data in the training set into automatic topic clusters based on the semantic similarity between the documents. By means of clustering approaches we expect to obtain a more cohesive association of the documents that are related by similar concepts, thus improving the coverage of the topic-based language models and enhancing the performance of the recognition system.

– We develop various strategies to create a context-dependent language model. Our aim is for this model to reflect the semantic context of the current utterance, i.e. the most relevant topics being discussed. This model is generated by means of a linear interpolation between the topic-based language models related to the most relevant topics. The estimation of the interpolation weights is based mainly on the outcome of the topic identification process.

– Finally, we propose a methodology for the dynamic adaptation of a background language model. The adaptation process takes into account the context-dependent model as well as the information provided by the topic identification process. The scheme used for the adaptation is a linear interpolation between the background model and the context-dependent one. We also study different approaches to determine the interpolation weights used in this adaptation scheme.

Once the basis of our topic-motivated contextualization framework is defined, we propose its application within an automatic speech recognition system. We focus on two aspects: the contextualization of the language models used by the system, and the incorporation of semantic-related information into the topic-based adaptation process. To achieve this, we propose an experimental framework based on a ‘two-stage’ recognition architecture. In the first stage of the architecture, information retrieval and machine learning techniques are used to identify the topics in a transcription of an audio segment. This transcription is generated by the recognition system using a background language model. According to the confidence in the topics that have been identified, the dynamic language model adaptation is carried out. In the second stage of the recognition architecture, the adapted language model is used to re-decode the utterance.

To test the benefits of the proposed framework, we carry out the evaluation of each of the aforementioned major systems. The evaluation is conducted on speeches in the political domain using the EPPS (European Parliamentary Plenary Sessions) database from the European TC-STAR project. We analyse several performance metrics that allow us to compare the improvements of the proposed systems against the baseline ones.
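
A minimal sketch of the two interpolation steps described above, using unigram models over a toy vocabulary; the thesis works with full n-gram models, and all names, probabilities and the interpolation weight below are illustrative assumptions.

    import numpy as np

    vocab = ["economy", "budget", "fishing", "quota", "the", "of"]

    # Background LM and two topic-based LMs as unigram distributions (assumed values).
    background = np.array([0.10, 0.05, 0.05, 0.02, 0.50, 0.28])
    topic_lms = {
        "finance":   np.array([0.30, 0.20, 0.01, 0.01, 0.30, 0.18]),
        "fisheries": np.array([0.02, 0.02, 0.35, 0.25, 0.20, 0.16]),
    }

    # Weights produced by the topic identification stage (e.g. normalised similarity
    # scores between the first-pass transcription and each topic).
    topic_scores = {"finance": 0.8, "fisheries": 0.2}

    # 1) Context-dependent model: linear interpolation of the topic-based LMs.
    context = sum(s * topic_lms[t] for t, s in topic_scores.items())

    # 2) Dynamic adaptation: interpolate the background and context-dependent LMs.
    lam = 0.6                                  # interpolation weight (tuned on held-out data)
    adapted = lam * background + (1 - lam) * context

    for w, p in zip(vocab, adapted):
        print(f"{w:8s} {p:.3f}")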