41 results for Traditional clustering
Abstract:
Unravelling how the brain works is one of the main challenges facing science today. A field of study that has aroused great expectations and interest is the analysis of cortical structure from a morphological point of view, with the aim of building a molecular-level simulation of the brain. This is expected to deepen the study of many neurological and pathological diseases. This project studies the soma and the spines from the standpoint of theoretical neuromorphology.
In the state of the art, when the morphological characteristics of a neuron are analysed in three dimensions, the soma is commonly ignored or, at best, replaced by a simple sphere. In fact, the concept of soma is abstract because there is no strict and robust definition of exactly where the soma ends and the dendrites begin. This project provides, for the first time, a mathematical definition of what a soma is. To simulate somas, the attributes used in the state of the art are examined. These properties are generic in nature and do not specify a unique morphology, so a method is proposed that combines local and global properties of the morphology. Once these features are available, the cell body is categorized into distinct classes by means of a new subtype of dynamic Bayesian network adapted to space. On this basis, the existence of different classes of somas is discussed, and differences among pyramidal somas from distinct layers of the brain are found. From the mathematical model, virtual somas are simulated for the first time.
Some spine morphologies have been attributed to certain cognitive behaviours. It is therefore of interest to determine which classes exist and to relate them to functions of brain activity. The most widespread classification (Peters and Kaiserman-Abramof, 1970) gives an ambiguous and subjective definition that depends on each individual's interpretation and is consequently debatable. This study relies on a set of descriptors extracted by a local topological analysis technique for 3D representations. From these data, the aim is to find the most suitable set of classes into which to group the spines and to describe each group by unambiguous rules. Based on the results, the existence of a continuum of spines and the properties that characterize each spine subtype are discussed.
Abstract:
This paper presents a detailed genetic study of Castanea sativa in El Bierzo, a major nut production region with interesting features. It is located within a glacial refuge at one extreme of the distribution area (northwest Spain); it has a centuries-old tradition of chestnut management; and, more importantly, it shows an unusual degree of genetic isolation. Seven nuclear microsatellite markers were selected to analyze the genetic variability and structure of 169 local trees grafted for nut production. Sixty-two local nuts were analyzed in the same manner. The selected loci were highly discriminant for the genotypes studied, giving a combined probability of identity of 6.1 × 10⁻⁶. An unprecedented density of trees was sampled for this project over the entire region, and nuts were collected representing 18 cultivars marketed by local producers. Several instances of misclassification by local growers were detected. Fixation index estimates and analysis of molecular variance (AMOVA) data support an unexpectedly high level of genetic differentiation in El Bierzo, larger than that estimated in a previous study with broader geographical scope but based on limited local sampling (Pereira-Lorenzo et al., Tree Genet Genomes 6: 701–715, 2010a). Likewise, we have determined that clonality due to grafting had previously been overestimated. In line with these observations, no significant spatial structure was found using either a model-based Bayesian procedure or Mantel's tests. Taken together, our results show the need for more fine-scale genetic studies if conservation strategies are to be efficiently improved.
Abstract:
Chinese architecture has undergone great changes over a long historical process. The most important milestone is the transition to the so-called Modern Times, the period in which Western architecture entered China for the first time and began to exert an active and significant influence on the features and identity of traditional Chinese architecture. Until then, the traditional style, very different in conception and appearance from Western approaches, had been the only way of building in China and had survived without significant deviation, producing a fairly homogeneous technical and artistic panorama over thousands of years.
For a number of complex historical, conceptual and architectural reasons, most of the Chinese architecture of the feudal period (that is, the years before 1849) has disappeared. However, many buildings survive from the period between that date and the Revolution of 1949 (the so-called semi-colonial or semi-feudal period), since they were better built and subsequently better maintained; the Christian churches stand out among them. These churches represent not only the first arrival of classical Western architecture in China, but also the beginning of a process of modernization of the deeply rooted and largely stagnant vernacular architecture. Because of the deep roots of the 2000-year-old tradition, the churches built during the semi-colonial period combine the techniques and styles of traditional Chinese architecture with those of classical Western churches, resulting in original buildings of a singular eclecticism that characterizes much of the architecture of that period. They are a priceless asset of Chinese architectural history.
In recent years, increasing attention has been paid to the architecture of Modern Times. However, the Christian churches of Shaanxi Province, a less economically developed province with a unique history with Christianity, have not been the subject of specific study, even though their typology is highly representative of buildings of this kind in other inland regions of China. The research presented in this doctoral thesis addresses that gap and opens the door to further work on other areas and types of architecture and, by extension, to deeper analysis of architectural and cultural hybridization between China and the West.
On the basis of documentary research, field surveys and drawing (mapping), the thesis first clarifies the features and roots of traditional Chinese architecture, and then presents a historical and typological study of the Christian churches of Shaanxi Province, covering their fundamental characteristics, examples, present situation (use) and state of conservation. By comparing the churches in the cities of Shaanxi Province with those in other, more developed cities, and the Christian churches in China with classical Western churches, the combined architectural character of the Christian churches in China is highlighted. The thesis provides a foundation for further studies on the history, characteristics and conservation of Christian churches in China. An elaborate illustrated conceptual glossary of basic architectural and construction terms in Chinese, English and Spanish is included as an appendix, as it was considered an essential addition to the work.
Abstract:
Objectives: A recently introduced pragmatic scheme promises to be a useful catalog of interneuron names. We sought to automatically classify digitally reconstructed interneuronal morphologies according to this scheme. Simultaneously, we sought to discover possible subtypes of these types that might emerge during automatic classification (clustering). We also investigated which morphometric properties were most relevant for this classification.
Materials and methods: A set of 118 digitally reconstructed interneuronal morphologies, classified into the common basket (CB), horse-tail (HT), large basket (LB), and Martinotti (MA) interneuron types by 42 of the world's leading neuroscientists, was quantified by five simple morphometric properties of the axon and four of the dendrites. We labeled each neuron with the type most commonly assigned to it by the experts. We then removed this class information for each type separately, and applied semi-supervised clustering to those cells (keeping the others' cluster membership fixed), to assess separation from other types and look for the formation of new groups (subtypes). We performed this same experiment unlabeling the cells of two types at a time, and of half the cells of a single type at a time. The clustering model is a finite mixture of Gaussians which we adapted for the estimation of local (per-cluster) feature relevance. We performed the described experiments on three different subsets of the data, formed according to how many experts agreed on type membership: at least 18 experts (the full data set), at least 21 (73 neurons), and at least 26 (47 neurons).
Results: Interneurons with more reliable type labels were classified more accurately. We classified HT cells with 100% accuracy, MA cells with 73% accuracy, and CB and LB cells with 56% and 58% accuracy, respectively. We identified three subtypes of the MA type, one subtype of CB and LB types each, and no subtypes of HT (it was a single, homogeneous type). We got maximum (adapted) Silhouette width and ARI values of 1, 0.83, 0.79, and 0.42, when unlabeling the HT, CB, LB, and MA types, respectively, confirming the quality of the formed cluster solutions. The subtypes identified when unlabeling a single type also emerged when unlabeling two types at a time, confirming their validity. Axonal morphometric properties were more relevant than dendritic ones, with the axonal polar histogram length in the [π, 2π) angle interval being particularly useful.
Conclusions: The applied semi-supervised clustering method can accurately discriminate among CB, HT, LB, and MA interneuron types while discovering potential subtypes, and is therefore useful for neuronal classification. The discovery of potential subtypes suggests that some of these types are more heterogeneous than previously thought. Finally, axonal variables seem to be more relevant than dendritic ones for distinguishing among the CB, HT, LB, and MA interneuron types.
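As an illustration of the semi-supervised clustering idea in this abstract, the following Python sketch fits a finite mixture of Gaussians by EM while keeping the cluster membership of labelled cells fixed, so that unlabelled cells can either join a known type or form a new group. It is a simplified stand-in and not the authors' model: it uses diagonal covariances, omits the per-cluster feature relevance estimation, and assumes standardized features. The function names (`fit`, `initial_means`) and the rule for seeding a cluster that has no labelled cells are assumptions made for this example.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def log_density(x: Point, mean: Point, var: Point) -> float:
    """Log of a diagonal-covariance Gaussian density at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def initial_means(points, labels, n_clusters):
    """Labelled clusters start at their labelled mean. A cluster with no
    labelled points is seeded at the unlabelled point farthest from the
    means chosen so far (an assumption of this sketch)."""
    dim = len(points[0])
    means = {}
    for k in range(n_clusters):
        members = [x for x, y in zip(points, labels) if y == k]
        if members:
            means[k] = [sum(x[d] for x in members) / len(members) for d in range(dim)]
    unlabelled = [x for x, y in zip(points, labels) if y is None]
    for k in range(n_clusters):
        if k not in means:
            chosen = list(means.values())
            means[k] = list(max(
                unlabelled,
                key=lambda x: min((math.dist(x, mu) for mu in chosen), default=0.0),
            ))
    return [means[k] for k in range(n_clusters)]


def e_step(points, labels, weights, means, variances):
    """Responsibilities; labelled points keep a fixed one-hot membership."""
    n_clusters = len(weights)
    resp = []
    for x, y in zip(points, labels):
        if y is not None:
            resp.append([1.0 if k == y else 0.0 for k in range(n_clusters)])
            continue
        logs = [math.log(w) + log_density(x, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        top = max(logs)
        exps = [math.exp(value - top) for value in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def m_step(points, resp, n_clusters, min_var):
    """Re-estimate mixing weights, means and per-feature variances."""
    dim = len(points[0])
    weights, means, variances = [], [], []
    for k in range(n_clusters):
        nk = sum(r[k] for r in resp) + 1e-12  # guard against empty clusters
        mean = [sum(r[k] * x[d] for r, x in zip(resp, points)) / nk for d in range(dim)]
        var = [max(sum(r[k] * (x[d] - mean[d]) ** 2 for r, x in zip(resp, points)) / nk,
                   min_var) for d in range(dim)]
        weights.append(nk / len(points))
        means.append(mean)
        variances.append(var)
    return weights, means, variances


def fit(points: List[Point], labels: List[Optional[int]], n_clusters: int,
        iters: int = 50, min_var: float = 1e-3) -> List[List[float]]:
    """Run EM and return the final responsibilities, one row per point."""
    means = initial_means(points, labels, n_clusters)
    dim = len(points[0])
    variances = [[1.0] * dim for _ in range(n_clusters)]
    weights = [1.0 / n_clusters] * n_clusters
    for _ in range(iters):
        resp = e_step(points, labels, weights, means, variances)
        weights, means, variances = m_step(points, resp, n_clusters, min_var)
    return e_step(points, labels, weights, means, variances)
```

Passing `None` as the label of every cell of one type, and allowing more clusters than there are labelled types, mirrors the unlabeling experiment described above: the extra clusters are free to capture candidate subtypes.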
Abstract:
Recent advances in non-destructive imaging techniques, such as X-ray computed tomography (CT), make it possible to analyse pore space features by direct visualisation of soil structures. A quantitative characterisation of the three-dimensional solid-pore architecture is important to understand soil mechanics, as they relate to the control of biological, chemical, and physical processes across scales. This analysis technique therefore offers an opportunity to better interpret soil strata, as new and relevant information can be obtained. In this work, we propose an approach to automatically identify the pore structure in a set of 200 2D images that represent slices of an original 3D CT image of a soil sample. This is accomplished through non-linear enhancement of the pixel grey levels and an image segmentation based on a PFCM (Possibilistic Fuzzy C-Means) algorithm. Once the solids and pore spaces have been identified, the set of 200 2D images is used to reconstruct an approximation of the soil sample by projecting only the pore spaces. This reconstruction shows the structure of the soil and its pores, which become more bounded, less bounded, or unbounded with changes in depth. If the soil sample image quality is sufficiently favourable in terms of contrast, noise and sharpness, the pore identification is less complicated, and the PFCM clustering algorithm can be used without additional processing; otherwise, images require pre-processing before using this algorithm. Promising results were obtained with four soil samples, the first of which was used to show the validity of the algorithm, while the other three were used to demonstrate the robustness of our proposal. The methodology we present here can better detect the solid soil and pore spaces on CT images, enabling the generation of better 2D-3D representations of pore structures from segmented 2D images.
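To make the clustering-based segmentation step more concrete, here is a minimal Python sketch that separates pore pixels from solid pixels by clustering grey levels. It uses plain fuzzy c-means rather than the possibilistic variant (PFCM) named in the abstract, skips the non-linear grey-level enhancement, and assumes that pores are the darker class in the CT slice. The function names and the 0.5 membership threshold are hypothetical choices for this example.

```python
from typing import List, Sequence, Tuple


def memberships(values: Sequence[float], centers: Sequence[float], m: float) -> List[List[float]]:
    """Fuzzy membership of each value in each cluster (standard FCM update)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - c) + 1e-12 for c in centers]  # avoid division by zero
        rows.append([1.0 / sum((di / dj) ** power for dj in dists) for di in dists])
    return rows


def fuzzy_c_means(values: Sequence[float], n_clusters: int = 2, m: float = 2.0,
                  iters: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Cluster 1-D grey levels; requires n_clusters >= 2 and fuzzifier m > 1."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centres evenly over the grey range.
    centers = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    for _ in range(iters):
        u = memberships(values, centers, m)
        centers = [
            sum(row[i] ** m * x for row, x in zip(u, values))
            / sum(row[i] ** m for row in u)
            for i in range(n_clusters)
        ]
    return centers, memberships(values, centers, m)


def segment_pores(image: List[List[float]], m: float = 2.0) -> List[List[bool]]:
    """Boolean mask that is True where a pixel belongs mostly to the darker cluster."""
    flat = [pixel for row in image for pixel in row]
    centers, u = fuzzy_c_means(flat, 2, m)
    pore = min(range(2), key=lambda i: centers[i])  # assumption: pores are dark
    cols = len(image[0])
    mask = [row[pore] > 0.5 for row in u]
    return [mask[r * cols:(r + 1) * cols] for r in range(len(image))]


# Tiny 2x3 "slice": three dark pixels and three bright pixels.
slice_mask = segment_pores([[10, 12, 200], [11, 210, 205]])
```

Applying `segment_pores` to each slice in turn and stacking the resulting masks would give a rough 3D picture of the pore space, in the spirit of the reconstruction described above.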
Abstract:
This research advances the effort to recover the traditional "stucco by fire" technique (estuco a fuego), a paradigm of interior decoration until the mid-20th century thanks to its high-quality marble imitations. The technique used lime mortars with marble aggregates, mineral pigments and greases. With these materials the pastes were prepared and spread on the walls, painted in fresco, and finished by passing a heated metal tool over the surface. As with other techniques, the passage of time has blurred knowledge of the subject, making it very difficult to conserve or restore the examples in our heritage. Starting from the scarce existing literature, the process has been recovered, and the samples produced have been subjected to various tests that characterize the final finish, both in its intrinsic properties and in those conferred by finally protecting the samples with a layer of wax, as was done traditionally.
Abstract:
Automatic 2D-to-3D conversion is an important application for filling the gap between the increasing number of 3D displays and the still scant 3D content. However, existing approaches have an excessive computational cost that complicates their practical application. In this paper, a fast automatic 2D-to-3D conversion technique is proposed, which uses a machine learning framework to infer the 3D structure of a query color image from a training database of color and depth images. Assuming that photometrically similar images have analogous 3D structures, a depth map is estimated by searching for the most similar color images in the database and fusing the corresponding depth maps. Large databases are desirable to achieve better results, but they also increase the computational cost. A clustering-based hierarchical search that uses compact SURF descriptors to characterize images is proposed to drastically reduce search times. A significant reduction in computation time has been obtained with respect to other state-of-the-art approaches, while maintaining the quality of the results.
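The search-and-fuse idea in this abstract can be sketched in a few lines of Python. The sketch assumes that each database image is already summarized by a short descriptor vector (standing in for the compact SURF descriptors), that the database has been clustered beforehand, and that depth maps are fused with a per-pixel median; the abstract does not state the fusion rule, so the median is an assumption, as are the function names.

```python
import math
from statistics import median
from typing import List, Sequence

Vector = Sequence[float]
DepthMap = List[List[float]]


def nearest(query: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate closest to the query (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(query, candidates[i]))


def hierarchical_search(query: Vector, centroids: Sequence[Vector],
                        clusters: Sequence[Sequence[int]],
                        descriptors: Sequence[Vector], k: int) -> List[int]:
    """Two-level search: pick the nearest cluster centroid first, then rank
    only that cluster's members. clusters[c] lists the database indices
    that belong to centroid c."""
    members = clusters[nearest(query, centroids)]
    ranked = sorted(members, key=lambda i: math.dist(query, descriptors[i]))
    return ranked[:k]


def fuse_depth_maps(depth_maps: Sequence[DepthMap]) -> DepthMap:
    """Per-pixel median of equally sized depth maps (assumed fusion rule)."""
    rows, cols = len(depth_maps[0]), len(depth_maps[0][0])
    return [[median(d[r][c] for d in depth_maps) for c in range(cols)]
            for r in range(rows)]


# Toy database: four images in two clusters, each with a 1x2 depth map.
descriptors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = [[0.0, 0.5], [10.0, 10.5]]
clusters = [[0, 1], [2, 3]]
depths = [[[1.0, 2.0]], [[3.0, 4.0]], [[9.0, 9.0]], [[8.0, 8.0]]]

matches = hierarchical_search([0.0, 0.2], centroids, clusters, descriptors, k=2)
estimated_depth = fuse_depth_maps([depths[i] for i in matches])
```

The saving comes from comparing the query against the centroids and one cluster only, instead of against every image in the database; the trade-off is that a good match sitting in a neighbouring cluster can be missed.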
Abstract:
Most red wines commercialized in the market use the malolactic fermentation process in order to ensure stability from a microbiological point of view. In this second fermentation, malic acid is converted into L-lactic acid under controlled setups. However, this process is not free from possible collateral effects that on some occasions produce off-flavors, wine quality loss and human health problems. In warm viticulture regions such as the south of Spain, the risk of suffering a deviation during the malolactic fermentation process increases due to the high must pH. This contributes to produce wines with high volatile acidity and biogenic amine values. This manuscript develops a new red winemaking methodology that consists of combining the use of two non-Saccharomyces yeast strains as an alternative to the traditional malolactic fermentation. In this method, malic acid is totally consumed by Schizosaccharomyces pombe, thus achieving the microbiological stabilization objective, while Lachancea thermotolerans produces lactic acid in order not to reduce and even increase the acidity of wines produced from low acidity musts. This technique reduces the risks inherent to the malolactic fermentation process when performed in warm regions. The result is more fruity wines that contain less acetic acid and biogenic amines than the traditional controls that have undergone the classical malolactic fermentation.
Abstract:
The last decade has witnessed important advances in speech recognition technology. Current commercial systems can recognize continuous speech from multiple speakers with acceptable error rates and without explicit adaptation procedures. Despite this favourable moment for the technology, speech recognition is far from being a solved problem. Most of these recognition systems are tuned to particular domains, and their effectiveness depends significantly, among many other aspects, on the similarity between the language model used and the specific task for which it is employed. This dependence becomes even more important in scenarios where the statistical properties of the language vary over time, for example in application domains involving spontaneous speech and multiple topics. In recent years there has been a sustained effort to improve recognition systems for such domains. This has been done, among many other approaches, through automatic adaptation techniques. These techniques are applied to existing systems, since porting a system to a new task or domain can be both time-consuming and costly. Adaptation techniques require additional sources of information, and spoken language can provide some of them. Speech not only conveys a message; it also conveys information about the context in which the spoken communication takes place (e.g. about the topic being discussed). Therefore, when we communicate through speech, it is possible to identify the elements of the language that characterize the context and, at the same time, to track the changes in these elements over time.
This information could be captured and exploited by means of information retrieval and machine learning techniques. Within the development of better automatic speech recognition systems, this could allow us to improve the adaptation of language models to the conditions of the context and thus make the recognition system more robust in domains with changing conditions (such as potential variations in vocabulary, style and topic). In this sense, the main contribution of this Thesis is the proposal and evaluation of a contextualization framework motivated by topic analysis and based on the dynamic, unsupervised adaptation of language models to make an automatic speech recognition system more robust. This adaptation builds on different approaches from the fields mentioned above (information retrieval and machine learning), with which we seek to identify the topics being discussed in an audio recording. This identification then makes it possible to adapt the language model to the conditions of the context. The proposed contextualization framework can be divided into two main systems: a topic identification system and a dynamic language model adaptation system. This Thesis can be described in detail from the perspective of the particular contributions made in each of the fields that make up the proposed framework:
• Regarding the topic identification system, we have focused on improving document preprocessing techniques and on contributing to the definition of more robust criteria for the selection of index terms.
– The efficiency of systems based on information retrieval and machine learning techniques, and specifically of those devoted to the topic identification task, depends to a large extent on the preprocessing mechanisms applied to the documents. Among the many operations that form part of a preprocessing scheme, an adequate selection of index terms is crucial to establish semantic and conceptual relationships between terms and documents. This process can also be affected either by a poor choice of stopwords or by a lack of precision in the definition of lemmatization rules. In this regard, we compare and evaluate different criteria for preprocessing the documents, as well as different strategies for selecting the index terms. This allows us not only to reduce the size of the indexing structure but also to improve the topic identification process.
– One of the most important aspects of the performance of topic identification systems is the assignment of different weights to terms according to their contribution to the content of the document. In this work we evaluate and propose alternatives to traditional term weighting schemes (such as tf-idf) that allow us to improve the specificity of the terms and to discriminate better among the topics of the documents.
• Regarding the dynamic adaptation of language models, we have divided the contextualization process into several steps.
– For the generation of topic-based language models, we propose two types of approach: a supervised one and an unsupervised one. In the first, we rely on the topic labels that originally accompany the documents of the corpus we use. From these labels, we group the documents that belong to the same topic and generate language models from those groups. However, one of the goals of this Thesis is to evaluate whether using these labels to generate models is optimal in terms of recognizer performance. For this reason we propose a second, unsupervised approach, in which the aim is to group the documents automatically into topic clusters based on the semantic similarity between documents. By means of clustering approaches we improved the conceptual and semantic cohesion within each cluster, which in turn allowed us to refine the topic-based language models and improve the performance of the recognition system.
– We develop several strategies to generate a context-dependent language model. Our aim is for this model to reflect the semantic context of the speech, i.e. the most relevant topics being discussed. This model is generated by linear interpolation between those topic-based language models that are related to the most relevant topics. The estimation of the interpolation weights is based mainly on the result of the topic identification process.
– Finally, we propose a methodology for the dynamic adaptation of a general language model. The adaptation process takes into account not only the context-dependent model but also the information provided by the topic identification process. The scheme used for the adaptation is a linear interpolation between the general model and the context-dependent model. We also study different approaches for determining the interpolation weights between the two models.
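The two interpolation steps in the list above (topic-based models combined into a context-dependent model, which is then combined with the general model) can be illustrated with a small Python sketch. It uses toy unigram models stored as dictionaries, takes the interpolation weights of the topic models to be the normalized topic identification scores of the most relevant topics, and uses a fixed weight for the general model. These choices, and the names `interpolate` and `adapt`, are assumptions for illustration; the thesis studies several ways of estimating the weights.

```python
from typing import Dict, List

LanguageModel = Dict[str, float]  # word -> probability (toy unigram model)


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {key: value / total for key, value in scores.items()}


def interpolate(models: List[LanguageModel], weights: List[float]) -> LanguageModel:
    """Linear interpolation: P(w) = sum_i weights[i] * P_i(w)."""
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to one")
    vocabulary = set().union(*models)
    return {word: sum(lam * model.get(word, 0.0) for lam, model in zip(weights, models))
            for word in vocabulary}


def adapt(general: LanguageModel, topic_models: Dict[str, LanguageModel],
          topic_scores: Dict[str, float], general_weight: float,
          top_n: int = 2) -> LanguageModel:
    """Build a context-dependent model from the top_n topics, then
    interpolate it with the general model."""
    top = sorted(topic_scores, key=topic_scores.get, reverse=True)[:top_n]
    lambdas = normalize({topic: topic_scores[topic] for topic in top})
    context = interpolate([topic_models[t] for t in top], [lambdas[t] for t in top])
    return interpolate([general, context], [general_weight, 1.0 - general_weight])


# Hypothetical models and topic identification scores.
general_lm = {"the": 0.5, "vote": 0.25, "fish": 0.25}
topic_lms = {
    "politics": {"the": 0.4, "vote": 0.6},
    "fisheries": {"the": 0.4, "fish": 0.6},
    "sport": {"the": 0.5, "goal": 0.5},
}
scores = {"politics": 0.6, "fisheries": 0.2, "sport": 0.1}
adapted_lm = adapt(general_lm, topic_lms, scores, general_weight=0.5)
```

In this toy run only the two highest-scoring topics contribute, so words that occur only in the discarded topic model do not enter the adapted vocabulary, and the adapted probabilities still sum to one because every weight vector does.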
Having defined the theoretical basis of our contextualization framework, we propose its application within an automatic speech recognition system. To this end, we focus on two aspects: the contextualization of the language models used by the system, and the incorporation of semantic information into the topic-based adaptation process. In this Thesis we propose an experimental framework based on a ‘two-stage’ recognition architecture. In the first stage, we use systems based on information retrieval and machine learning techniques to identify the topics discussed in a transcription of an audio segment. This transcription is generated by the recognition system using a general language model. According to the relevance of the topics identified, the dynamic adaptation of the language model is carried out. In the second stage of the recognition architecture, we use this adapted model to recognize the audio segment again. To determine the benefits of the proposed framework, we evaluate each of the main systems mentioned above. This evaluation is performed on speeches in the political domain using the EPPS (European Parliamentary Plenary Sessions) database of the European project TC-STAR. We analyse several performance metrics and evaluate the proposed improvements with respect to the baseline systems.
ABSTRACT
The last decade has witnessed major advances in speech recognition technology. Today’s commercial systems are able to recognize continuous speech from numerous speakers, with acceptable levels of error and without the need for an explicit adaptation procedure. Despite this progress, speech recognition is far from being a solved problem.
Most of these systems are adjusted to a particular domain and their efficacy depends significantly, among many other aspects, on the similarity between the language model used and the task that is being addressed. This dependence is even more important in scenarios where the statistical properties of the language fluctuates throughout the time, for example, in application domains involving spontaneous and multitopic speech. Over the last years there has been an increasing effort in enhancing the speech recognition systems for such domains. This has been done, among other approaches, by means of techniques of automatic adaptation. These techniques are applied to the existing systems, specially since exporting the system to a new task or domain may be both time-consuming and expensive. Adaptation techniques require additional sources of information, and the spoken language could provide some of them. It must be considered that speech not only conveys a message, it also provides information on the context in which the spoken communication takes place (e.g. on the subject on which it is being talked about). Therefore, when we communicate through speech, it could be feasible to identify the elements of the language that characterize the context, and at the same time, to track the changes that occur in those elements over time. This information can be extracted and exploited through techniques of information retrieval and machine learning. This allows us, within the development of more robust speech recognition systems, to enhance the adaptation of language models to the conditions of the context, thus strengthening the recognition system for domains under changing conditions (such as potential variations in vocabulary, style and topic). 
In this sense, the main contribution of this Thesis is the proposal and evaluation of a framework of topic-motivated contextualization based on the dynamic and unsupervised adaptation of language models for the enhancement of an automatic speech recognition system. This adaptation is based on a combined approach (from the perspective of both the information retrieval and machine learning fields) whereby we identify the topics that are being discussed in an audio recording. The topic identification, therefore, enables the system to adapt the language model according to the contextual conditions. The proposed framework can be divided into two major systems: a topic identification system and a dynamic language model adaptation system. This Thesis can be outlined from the perspective of the particular contributions made in each of the systems that compose the proposed framework:
• Regarding the topic identification system, we have focused on enhancing the document preprocessing techniques and on defining more robust criteria for the selection of index terms.
– Within both information retrieval and machine learning based approaches, the efficiency of topic identification systems depends, to a large extent, on the preprocessing mechanisms applied to the documents. Among the many operations that the preprocessing procedures comprise, an adequate selection of index terms is critical to establish conceptual and semantic relationships between terms and documents. This process might also be weakened by a poor choice of stopwords or a lack of precision in defining stemming rules. In this regard we compare and evaluate different criteria for preprocessing the documents, as well as for improving the selection of the index terms. This allows us not only to reduce the size of the indexing structure but also to strengthen the topic identification process. 
– One of the most crucial aspects, in relation to the performance of topic identification systems, is to assign different weights to different terms depending on their contribution to the content of the document. In this sense we evaluate and propose alternative approaches to traditional weighting schemes (such as tf-idf) that allow us to improve the specificity of terms and to better identify the topics that are related to documents.
• Regarding the dynamic language model adaptation, we divide the contextualization process into different steps.
– We propose supervised and unsupervised approaches for the generation of topic-based language models. The first of them generates topic-based language models by grouping the documents in the training set according to the original topic labels of the corpus. Nevertheless, a goal of this Thesis is to evaluate whether or not the use of these labels to generate language models is optimal in terms of recognition accuracy. For this reason, we propose a second, unsupervised approach, in which the objective is to group the data in the training set into automatic topic clusters based on the semantic similarity between the documents. By means of clustering approaches we expect to obtain a more cohesive association of the documents that are related by similar concepts, thus improving the coverage of the topic-based language models and enhancing the performance of the recognition system.
– We develop various strategies in order to create a context-dependent language model. Our aim is for this model to reflect the semantic context of the current utterance, i.e. the most relevant topics that are being discussed. This model is generated by means of a linear interpolation between the topic-based language models related to the most relevant topics. The estimation of the interpolation weights is based mainly on the outcome of the topic identification process. 
– Finally, we propose a methodology for the dynamic adaptation of a background language model. The adaptation process takes into account the context-dependent model as well as the information provided by the topic identification process. The scheme used for the adaptation is a linear interpolation between the background model and the context-dependent one. We also study different approaches to determine the interpolation weights used in this adaptation scheme.
Having defined the basis of our topic-motivated contextualization framework, we propose its application within an automatic speech recognition system. We focus on two aspects: the contextualization of the language models used by the system, and the incorporation of semantic information into a topic-based adaptation process. To achieve this, we propose an experimental framework based on a ‘two-stage’ recognition architecture. In the first stage of the architecture, information retrieval and machine learning techniques are used to identify the topics in a transcription of an audio segment. This transcription is generated by the recognition system using a background language model. The dynamic language model adaptation is then carried out according to the confidence in the topics that have been identified. In the second stage of the recognition architecture, the adapted language model is used to re-decode the utterance. To test the benefits of the proposed framework, we evaluate each of the aforementioned major systems. The evaluation is conducted on speeches in the political domain using the EPPS (European Parliamentary Plenary Sessions) database from the European TC-STAR project. We analyse several performance metrics that allow us to compare the proposed systems against the baseline ones.
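The two interpolation steps described in this abstract (topic-based models combined into a context-dependent model, which is then interpolated with the background model) can be illustrated with a minimal sketch. It assumes unigram models stored as plain dictionaries, interpolation weights taken directly from normalized topic scores, and a fixed background weight; the names `interpolate`, `adapt`, `lam` and `top_n`, and all the numbers, are hypothetical and are not taken from the Thesis, which studies several ways of estimating the weights.

```python
def interpolate(models, weights):
    """Linearly interpolate word-probability dicts; weights are normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    vocab = set().union(*models)  # union of all words seen in any model
    return {
        w: sum(wt / total * m.get(w, 0.0) for m, wt in zip(models, weights))
        for w in vocab
    }


def adapt(background, topic_models, topic_scores, lam=0.5, top_n=2):
    """Build a context-dependent model from the top-scoring topics, then mix it
    with the background model. lam is the (assumed) weight of the context model."""
    ranked = sorted(topic_scores, key=topic_scores.get, reverse=True)[:top_n]
    context = interpolate([topic_models[t] for t in ranked],
                          [topic_scores[t] for t in ranked])
    return interpolate([background, context], [1.0 - lam, lam])


# Toy unigram models (illustrative values only).
background = {"the": 0.5, "vote": 0.25, "budget": 0.25}
topic_models = {
    "economy": {"the": 0.4, "vote": 0.1, "budget": 0.5},
    "elections": {"the": 0.4, "vote": 0.5, "budget": 0.1},
    "sport": {"the": 0.5, "vote": 0.25, "budget": 0.25},
}
# Hypothetical output of a first-pass topic identification step.
topic_scores = {"economy": 0.6, "elections": 0.3, "sport": 0.1}

adapted = adapt(background, topic_models, topic_scores)
```

Because every input model is a probability distribution and the weights are normalized, the adapted model still sums to one, and words favoured by the dominant topic (here "budget") gain probability relative to the background model.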
Abstract:
In recent years, high-performance multicrystalline silicon (HPMC-Si) has emerged as an attractive alternative to traditional ingot-based multicrystalline silicon (mc-Si), with a similar cost structure but improved cell performance. Herein, we evaluate the gettering response of traditional mc-Si and HPMC-Si. Microanalytical techniques demonstrate that HPMC-Si and mc-Si share similar lifetime-limiting defect types but have different relative concentrations and distributions. HPMC-Si shows a substantial lifetime improvement after P-gettering compared with mc-Si, chiefly because of its lower area fraction of dislocation-rich clusters. In both materials, the dislocation clusters and grain boundaries were associated with relatively higher interstitial iron point-defect concentrations after diffusion, which suggests the dissolution of metal-impurity precipitates. The fewer dislocation clusters present in HPMC-Si are shown to exhibit similar characteristics to those found in mc-Si. Given similar governing principles, a proxy to determine the relative recombination activity of dislocation clusters, developed for mc-Si, is successfully transferred to HPMC-Si. The lifetime in the remainder of the HPMC-Si material is found to be limited by grain-boundary recombination. To reduce the recombination activity of grain boundaries in HPMC-Si, coordinated impurity control during growth, gettering, and passivation must be developed.
Abstract:
On-line partial discharge (PD) measurements have become a common technique for assessing the insulation condition of installed high voltage (HV) insulated cables. When on-line tests are performed in noisy environments, or when more than one source of pulse-shaped signals is present in a cable system, it is difficult to perform accurate diagnoses. In these cases, an adequate selection of the non-conventional measuring technique and the implementation of effective signal processing tools are essential for a correct evaluation of the insulation degradation. Even after a specific noise rejection filter is applied, many signals can be identified as potential PD pulses; a classification tool to discriminate the PD sources involved is therefore required. This paper proposes an efficient method for the classification of PD signals and pulse-type noise interferences measured in power cables with high frequency current transformer (HFCT) sensors. By using a signal feature generation algorithm, representative parameters associated with the waveform of each acquired pulse are calculated so that the pulses can be separated into different clusters. The efficiency of the proposed clustering technique is demonstrated through an example with three different PD sources and several pulse-shaped interferences measured simultaneously in a cable system with an HFCT sensor.
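The general idea of computing a few waveform parameters per pulse and then grouping pulses by those parameters can be sketched as follows. The abstract does not state which features or which clustering algorithm the paper uses, so everything here is an assumption chosen for illustration: the two features (energy-weighted duration and zero-crossing rate), the min-max rescaling, the plain k-means with deterministic initialisation, and the synthetic damped-sine pulses standing in for measured HFCT signals.

```python
import math


def pulse_features(samples, dt):
    """Return (energy-weighted duration, zero-crossing rate) for one pulse.
    These two features are illustrative stand-ins, not the paper's parameters."""
    energy = sum(s * s for s in samples)
    t_mean = sum(i * dt * s * s for i, s in enumerate(samples)) / energy
    t_var = sum((i * dt - t_mean) ** 2 * s * s for i, s in enumerate(samples)) / energy
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return (math.sqrt(t_var), crossings / (len(samples) * dt))


def rescale(rows):
    """Min-max scale each feature column to [0, 1] so no feature dominates distances."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((x - l) / s for x, l, s in zip(r, lo, span)) for r in rows]


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point initialisation (deterministic by design)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def damped_sine(freq, decay, n=200, dt=1e-8):
    """Synthetic pulse: exponentially damped sinusoid (a stand-in for a measured pulse)."""
    return [math.exp(-decay * i * dt) * math.sin(2 * math.pi * freq * i * dt)
            for i in range(n)]


dt = 1e-8  # assumed sampling interval (100 MS/s)
# Two slow, long pulses followed by two fast, short pulses (hypothetical sources).
pulses = [damped_sine(f, d) for f, d in
          [(5e6, 2e6), (5.5e6, 2.2e6), (20e6, 8e6), (21e6, 8.5e6)]]
labels = kmeans(rescale([pulse_features(p, dt) for p in pulses]), k=2)
```

In this toy setting the two slow pulses end up in one cluster and the two fast pulses in the other. With real measurements the number of clusters is not known in advance, and the choice of features matters far more than the choice of clustering algorithm.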