906 results for automatically generated meta classifiers with large levels
Abstract:
To provide sample estimates with the probability foundation needed to generalize from the sample data subset to the whole target population being sampled, probability sampling strategies are required to satisfy three necessary but not sufficient conditions: (i) All inclusion probabilities must be greater than zero in the target population to be sampled. If some sampling units have an inclusion probability of zero, then a map accuracy assessment does not represent the entire target region depicted in the map to be assessed. (ii) The inclusion probabilities must be: (a) knowable for nonsampled units and (b) known for those units selected in the sample: since the inclusion probability determines the weight attached to each sampling unit in the accuracy estimation formulas, if the inclusion probabilities are unknown, so are the estimation weights. This original work presents a novel (to the best of these authors' knowledge, the first) probability sampling protocol for quality assessment and comparison of thematic maps generated from spaceborne/airborne Very High Resolution (VHR) images, where: (I) an original Categorical Variable Pair Similarity Index (CVPSI, proposed in two different formulations) is estimated as a fuzzy degree of match between a reference and a test semantic vocabulary, which may not coincide, and (II) both symbolic pixel-based thematic quality indicators (TQIs) and sub-symbolic object-based spatial quality indicators (SQIs) are estimated with a degree of uncertainty in measurement, in compliance with the well-known Quality Assurance Framework for Earth Observation (QA4EO) guidelines. Like a decision tree, any protocol (guidelines for best practice) comprises a set of rules, equivalent to structural knowledge, and an order of presentation of the rule set, known as procedural knowledge. The combination of these two levels of knowledge makes an original protocol worth more than the sum of its parts.
The several degrees of novelty of the proposed probability sampling protocol are highlighted in this paper, at the levels of understanding of both structural and procedural knowledge, in comparison with related multi-disciplinary works selected from the existing literature. In the experimental session, the proposed protocol is tested for accuracy validation of preliminary classification maps automatically generated by the Satellite Image Automatic Mapper™ (SIAM™) software product from two WorldView-2 images and one QuickBird-2 image provided by DigitalGlobe for testing purposes. In these experiments, collected TQIs and SQIs are statistically valid, statistically significant, consistent across maps and in agreement with theoretical expectations, visual (qualitative) evidence and quantitative quality indexes of operativeness (OQIs) claimed for SIAM™ by related papers. As a subsidiary conclusion, the statistically consistent and statistically significant accuracy validation of the SIAM™ pre-classification maps proposed in this contribution, together with OQIs claimed for SIAM™ by related works, makes the operational (automatic, accurate, near real-time, robust, scalable) SIAM™ software product eligible for opening up new inter-disciplinary research and market opportunities in accordance with the visionary goal of the Global Earth Observation System of Systems (GEOSS) initiative and the QA4EO international guidelines.
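The role of inclusion probabilities as estimation weights, noted at the start of this abstract, can be illustrated with a minimal sketch. It assumes a generic inverse-probability-weighted ratio estimator of overall accuracy; this is not necessarily one of the formulas used in the paper, and the function name and sample values are hypothetical.

```python
from typing import Sequence


def weighted_overall_accuracy(correct: Sequence[bool],
                              inclusion_prob: Sequence[float]) -> float:
    """Estimate overall map accuracy from a probability sample.

    Each sampled unit is weighted by the inverse of its inclusion
    probability (assumed generic ratio estimator, for illustration only).
    """
    if len(correct) != len(inclusion_prob):
        raise ValueError("one inclusion probability is needed per sampled unit")
    if any(p <= 0.0 or p > 1.0 for p in inclusion_prob):
        # Condition (i): inclusion probabilities must be greater than zero.
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in inclusion_prob]
    weighted_hits = sum(w for w, ok in zip(weights, correct) if ok)
    return weighted_hits / sum(weights)


# Hypothetical sample: two units from a densely sampled stratum (p = 0.5)
# and two from a sparsely sampled stratum (p = 0.1).
sample_correct = [True, True, False, True]
sample_prob = [0.5, 0.5, 0.1, 0.1]
estimate = weighted_overall_accuracy(sample_correct, sample_prob)
print(round(estimate, 4))
```

In this toy sample the unweighted proportion of correct units is 0.75, whereas the weighted estimate is lower because the misclassified unit comes from the sparsely sampled stratum and therefore stands for more of the map. If the inclusion probabilities were unknown, the weights could not be computed at all, which is the point of condition (ii).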
Abstract:
Abstract. Web 2.0 applications enabled users to classify information resources using their own vocabularies. The bottom-up nature of these user-generated classification systems has turned them into interesting knowledge sources, since they provide a rich terminology generated by potentially large user communities. Previous research has shown that it is possible to elicit some emergent semantics from the aggregation of individual classifications in these systems. However, the generation of ontologies from them is still an open research problem. In this thesis we address the problem of how to tap into user-generated classification systems for building domain ontologies. Our objective is to design a method to develop domain ontologies from user-generated classification systems. To do so, we rely on ontologies in the Web of Data to formalize the semantics of the knowledge collected from the classification system. Current ontology development methodologies have recognized the importance of reusing knowledge from existing resources. Thus, our work is framed within the NeOn methodology scenario for building ontologies by reusing and reengineering non-ontological resources. The main contributions of this work are:
- An integrated method to develop ontologies from user-generated classification systems. With this method we extract a domain terminology from the classification system and then we formalize the semantics of this terminology by reusing ontologies in the Web of Data.
- Identification and adaptation of existing techniques for implementing the activities in the method so that they can fulfill the requirements of each activity.
- A novel study about emerging semantics in user-generated lists.

Summary. Web 2.0 allowed users to classify information resources using their own vocabulary.
These user-generated classification systems are interesting resources for knowledge extraction, mainly because they provide an extensive terminology generated by large user communities. Previous research has shown that it is possible to obtain emergent semantics from these systems. However, generating ontologies from them is still an open research problem. This thesis addresses the problem of how to exploit user-generated classification systems in the construction of domain ontologies. The objective of the thesis is thus to design a method for developing domain ontologies from user-generated classification systems. The proposed method reuses existing conceptualizations in ontologies published in the Web of Data to formalize the semantics of the knowledge extracted from the classification system. This work is therefore framed within the scenario for developing ontologies by reusing and reengineering non-ontological resources defined in the NeOn Methodology. The main contributions of this work are:
- An integrated method for developing a domain ontology from user-generated classification systems. In this method, a domain terminology is extracted from the classification system and its semantics are then formalized by reusing ontologies in the Web of Data.
- The identification and adaptation of a set of techniques for implementing the activities proposed in the method so that they can automatically fulfill the requirements of each activity.
- A novel study of emergent semantics in user-generated lists on the Web.
Abstract:
The Mediterranean fruit fly, Ceratitis capitata (Wiedemann, 1824), is considered one of the key pests of fruit growing. Malathion is an organophosphate insecticide that was the main product used in Spain to control C. capitata until 2009, when its use ceased because it was not included in Annex I of European Directive 91/414/EEC. The increased use of malathion, driven by the severe economic losses caused by C. capitata, led to the emergence of resistant field populations. The study of a malathion-resistant population collected in Castelló in 2004 allowed the identification of two resistance mechanisms: a point mutation (G328A) in acetylcholinesterase (AChE) and a metabolic resistance mechanism, probably mediated by carboxylesterases. In view of this background, we set out to study the mechanisms involved in malathion resistance in C. capitata. In addition, during the course of this Thesis, malathion was replaced by other insecticides, such as spinosad and lambda-cyhalothrin, for control of the pest. In this new context, it is extremely important to analyse the susceptibility of field populations to spinosad and to study the possible existence of cross-resistance to these insecticides, as well as to lay the foundations for the study of future resistance mechanisms. First, using discriminating-dose bioassays, we analysed susceptibility to malathion and spinosad in twelve C. capitata populations from Andalusia, Aragon, Catalonia, the Valencian Community and the Balearic Islands; our results suggested the presence of malathion-resistant individuals in most of the populations analysed.
In the case of spinosad, we observed that susceptibility to this insecticide of biological origin was high in most populations; however, the population collected in Xàbia (Alicante) showed a susceptibility level about two times lower than the other populations. Through laboratory selection, we obtained two malathion-resistant strains, W-4Km and W-10Km, with resistance levels of 178-fold and 400-fold, respectively, relative to the susceptible strain C. In addition, a strain highly resistant to spinosad (Xàbia-W-100s) was selected for the first time in C. capitata; it is currently about 500 times more resistant than the laboratory strain C. To choose the most suitable strategy for managing the pest, we studied susceptibility to different types of insecticides in the malathion-resistant strain W-4Km. In this strain we detected moderate cross-resistance to the organophosphates fenthion, diazinon, phosmet, trichlorfon and chlorpyrifos-methyl (7- to 16-fold) and to the carbamate carbaryl, the pyrethroid lambda-cyhalothrin and the chemosterilant lufenuron (4- to 6-fold). Cross-resistance to spinosad, by contrast, was low (1.5-fold). It is important to note that the resistance levels estimated for all these insecticides were one or two orders of magnitude lower than that observed for malathion in the W-4Km strain (178-fold), which could be explained by at least two hypotheses: that the AChE G328A mutation confers greater insensitivity to malaoxon (the active form of malathion) than to other insecticides that target AChE and/or that the carboxylesterase-mediated resistance mechanism hydrolyses malathion more efficiently than the other insecticides analysed. In the study of new resistance mechanisms in C. capitata, we analysed, on the one hand, the diversity of cytochrome P450 enzymes, which are associated with metabolic resistance in other species, and, on the other hand, developed a system for detecting new point mutations that might appear in the genes encoding AChE (Ccace2) and aliesterase (Ccae7). Using degenerate primers, we obtained 37 CYP genes, which encode P450 enzymes, belonging to five families. Subsequently, in an induction study with phenobarbital, we observed that the expression of four of the six genes analysed could be induced. In addition, a system was set up to amplify and sequence, from genomic DNA, the exons of the Ccace2 and Ccae7 genes in which mutations related to insecticide resistance have been found in other species. These results will facilitate the study of new resistance mechanisms mediated by these enzymes in C. capitata. A PCR-RFLP method was designed to identify individuals carrying the AChE G328A mutation (resistance allele Ccace2R) without the need for bioassays; it also allows resistance to be detected when it is present at low frequency. According to the analysis, the Ccace2R allele was observed in 25 of the 27 Spanish localities sampled, including the Balearic and Canary Islands. However, this allele was not detected in populations from eleven countries on five continents. Analysis of the presence of the Ccace2R allele in the malathion-resistant strains during laboratory selection showed a rapid decrease in homozygotes, for both the susceptible and the resistance allele, in favour of heterozygous individuals. Thus, after 52 generations of selection, all the individuals analysed from the W-10Km strain had a heterozygous genotype for the AChE G328A mutation.
This imbalance contradicts the Mendelian segregation expected for a gene with two alleles but could be explained by the existence of a duplication of the Ccace2 gene. The presence of this duplication was demonstrated by: i) crossing heterozygous individuals of the W-10Km strain with susceptible homozygotes of strain C, which produced offspring in which 100% of the individuals were heterozygous; ii) evaluating the copy number of the Ccace2 gene by real-time quantitative PCR (qPCR), which was twofold higher in individuals of the W-10Km strain than in those of strain C; iii) analysing the expression level of Ccace2, which was twice as high in the W-10Km strain as in strain C; and iv) studying AChE activity, which was higher in individuals of the W-10Km strain. According to these results, a duplication of the Ccace2 gene causes the wild-type allele and the mutated allele to coexist on the same chromosome and, because the two copies of the Ccace2 gene are linked, they produce permanent heterozygosity (Ccace2RS). This explains why 100% of the individuals of the W-10Km strain showed the restriction profile of a heterozygous individual: they were in fact structural homozygotes for the duplication (genotype Ccace2RS/RS). A fitness cost associated with the duplication has been detected, consisting of an increase in cumulative adult mortality from the seventh day after emergence. The description of the Ccace2RS duplication represents the identification of a new mechanism of malathion resistance in C. capitata. Finally, by designing a double PCR-RFLP method, the presence of the Ccace2RS duplication was determined in most Spanish populations.
The proportion of individuals carrying the duplication ranged from 5% to 35%, with the highest frequencies observed in the C. capitata populations collected in the Mediterranean basin. We can therefore conclude that malathion resistance associated with the AChE G328A mutation and the Ccace2RS duplication is widely established in Spanish populations of C. capitata. Our results advise against the use of malathion (should it be authorized again) or other organophosphates to control this pest. In addition, one of the malathion-resistant strains showed cross-resistance to insecticides with different modes of action that are currently used to control C. capitata, such as lambda-cyhalothrin and lufenuron. The high susceptibility to spinosad observed in Spanish populations, together with the low cross-resistance estimated for this insecticide, suggests that its use is appropriate for control of the pest. However, relying on a single insecticide can be risky because it favours the selection of resistance; indeed, a population highly resistant to spinosad was obtained by laboratory selection. It is therefore advisable to implement integrated control and resistance management programmes for C. capitata that use different control systems and insecticides with different mechanisms of action, so that they remain sustainable over time. The resistance-allele detection systems developed in this work will allow early detection of resistance in the field, facilitating decisions on the most appropriate control system. In addition, the knowledge generated may contribute to the development of new detection systems for other resistance mechanisms.

Abstract. The Mediterranean fruit fly, Ceratitis capitata (Wiedemann, 1824), is considered one of the most harmful pests in fruit crops.
Until 2009, when malathion use was banned because it was not included in Annex I of Directive 91/414/EEC, the application of this organophosphate (OP) insecticide in Spain increased gradually due to the large economic losses caused by C. capitata. The increase in the frequency of treatments resulted in the development of resistant field populations. The study of a malathion-resistant population, collected in 2004 in Castelló (Comunidad Valenciana), allowed the identification of two resistance mechanisms: a single point mutation (G328A) in the target acetylcholinesterase (AChE), as well as a metabolic resistance mechanism, most likely carboxylesterase-mediated. Taking all the preceding into account, we studied the malathion resistance mechanisms in C. capitata. During the development of this PhD Thesis, malathion use was banned by the European Union and it was replaced by other insecticides, such as spinosad and lambda-cyhalothrin. In this new context, it became necessary to analyse the possible existence of cross-resistance to these insecticides and the susceptibility to spinosad in field populations, which would define the baseline for future studies on resistance mechanisms. Firstly, through discriminating-dose bioassays, we analysed malathion and spinosad susceptibility in twelve C. capitata populations from Andalusia, Aragon, Catalonia, the Valencian Community and the Balearic Islands. Our results suggest the presence of malathion-resistant individuals in most of the populations analysed. Regarding spinosad, we observed a high susceptibility to this biologically derived insecticide in most of the populations, except in the one collected in Xàbia (Alicante), which had a susceptibility level two times lower than the rest of the populations. Through laboratory selection, we obtained two malathion-resistant strains, W-4Km and W-10Km, with resistance levels of 178- and 400-fold, respectively, compared to the susceptible control strain C.
In addition, a strain highly resistant to spinosad (Xàbia-W-100s), 500 times more resistant than the control strain C, was selected. To decide on the most appropriate management strategy for the pest, we studied the susceptibility to different insecticides in the malathion-resistant W-4Km strain. We detected moderate cross-resistance to the OPs fenthion, diazinon, phosmet, trichlorfon and chlorpyrifos-methyl (7- to 16-fold), and to the carbamate carbaryl, the pyrethroid lambda-cyhalothrin and the chemosterilant lufenuron (4- to 6-fold). On the other hand, cross-resistance to spinosad was low (1.5-fold). It is important to note that resistance levels to all these insecticides were one or two orders of magnitude lower than that observed against malathion in the W-4Km strain (178-fold), a fact that might be due to at least two possible causes: the AChE G328A mutation may provide a higher insensitivity to malaoxon (the active form of malathion) than to other insecticides that target AChE, and/or the carboxylesterase-mediated resistance mechanism may hydrolyse malathion more efficiently than the other insecticides analysed. To investigate new resistance mechanisms in C. capitata, we analysed the diversity of the cytochrome P450 enzymes, which have been associated with metabolic resistance in insects, and we developed a new method to detect single point mutations that could appear in the acetylcholinesterase (Ccace2) and aliesterase (Ccae7) genes. Using degenerate primers, we obtained 37 CYP genes, encoding P450 enzymes, belonging to five families. Afterwards, in a phenobarbital-induction study, we observed that the expression of 4 out of the 6 genes analysed could be induced. In addition, a system was set up to amplify and sequence, from genomic DNA, the exons of the Ccace2 and Ccae7 genes where mutations related to insecticide resistance have been found in other species. The results obtained could facilitate the study of new resistance mechanisms in C. capitata mediated by these enzymes. A PCR-RFLP method was designed to detect the presence of the AChE G328A mutation (resistance allele Ccace2R) without the need to perform bioassays, which also allows resistance to be detected at low frequency. According to the analysis, the resistance allele was found in 25 out of 27 sampled locations in Spain, including the Balearic and the Canary Islands. However, this allele was not detected in other populations collected in 11 countries from 5 continents. Monitoring the presence of the Ccace2R allele in the malathion-resistant strains during laboratory selection showed a quick decrease in homozygous individuals, for both the susceptible and the resistant alleles, in favour of heterozygotes. Thus, after 52 generations of selection, all the individuals analysed from the W-10Km strain showed a heterozygous genotype for the AChE G328A mutation, contradicting the Mendelian segregation expected for a gene with two alleles. We were then able to demonstrate that this was caused by a duplication of the gene encoding acetylcholinesterase by: i) crossing heterozygous individuals from the W-10Km strain with susceptible homozygotes from the C strain, giving an F1 population in which 100% of individuals were heterozygous; ii) evaluating the copy number of the Ccace2 gene by real-time quantitative PCR (qPCR), which was twofold higher in individuals from the W-10Km strain than in the C strain; iii) analysing the expression level of Ccace2, which was twofold higher in the W-10Km strain than in the C strain; iv) studying the acetylcholinesterase activity, which was higher in individuals from the W-10Km strain. According to these results, duplication of the Ccace2 gene results in the coexistence of the susceptible and the resistant allele on the same chromosome. The two linked copies of the Ccace2 gene give rise to permanent heterozygosity (Ccace2RS).
This explains why 100% of individuals from the W-10Km strain showed a heterozygous restriction pattern since, in fact, they were structural homozygotes for the duplication (genotype Ccace2RS/RS). A fitness cost associated with this duplication has been detected, consisting of a rise in cumulative adult mortality from the seventh day after emergence. The Ccace2RS duplication described in this study represents a new resistance mechanism to malathion in C. capitata. Finally, by designing a double PCR-RFLP method, the presence of the Ccace2RS duplication was confirmed in most of the Spanish populations. We observed that the proportion of individuals carrying the duplication ranged between 5 and 35%, the frequency being higher in those C. capitata populations collected in the Mediterranean basin. Therefore, we can conclude that malathion resistance associated with the AChE G328A mutation and the Ccace2RS duplication is widely distributed in Spanish populations of C. capitata. Our results advise against the use of malathion (if it were to be authorized again) or other OPs for the control of this pest. In addition, one of the malathion-resistant strains showed cross-resistance to insecticides with diverse modes of action that are currently used for pest control, such as lambda-cyhalothrin and lufenuron. The high susceptibility to spinosad in the Spanish populations, as well as the low cross-resistance estimated for this insecticide, suggests that it is adequate for Medfly control. However, the use of a single insecticide is a risky strategy since it favours the selection of resistance. In fact, a population highly resistant to spinosad was obtained through laboratory selection. Therefore, it is advisable to implement integrated pest management (IPM) and resistance management programmes for C. capitata control. Using insecticides with different modes of action and diverse control systems would contribute to the sustainability of pest control.
The resistance allele detection systems developed in this work will allow the early detection of resistance in the field, making it possible to select the most appropriate method for pest control. In addition, the knowledge generated may contribute to the development of new detection systems for other resistance mechanisms.
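The reasoning behind cross i) above can be made concrete with a small sketch. It assumes a deliberately simplified model in which each parent carries two haplotypes and passes one of them, chosen with equal probability, to each offspring; the haplotype strings and the function name are hypothetical and only illustrate why a linked duplication yields 100% offspring with a heterozygous restriction pattern.

```python
from collections import Counter
from itertools import product
from typing import Dict, Tuple

# A haplotype is written as the alleles carried on one chromosome:
# "S" (susceptible), "R" (resistant) or "RS" (both copies linked by a duplication).
Parent = Tuple[str, str]


def offspring_patterns(parent_a: Parent, parent_b: Parent) -> Dict[str, float]:
    """Return the expected fraction of offspring showing each restriction pattern."""
    counts: Counter = Counter()
    for hap_a, hap_b in product(parent_a, parent_b):
        alleles = set(hap_a) | set(hap_b)  # alleles detectable in the offspring
        if alleles == {"R", "S"}:
            counts["heterozygous"] += 1
        elif alleles == {"R"}:
            counts["resistant"] += 1
        else:
            counts["susceptible"] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


# Single-copy gene: a true R/S heterozygote crossed with an S/S homozygote.
single_locus = offspring_patterns(("R", "S"), ("S", "S"))
# Linked duplication: an RS/RS structural homozygote crossed with S/S.
duplication = offspring_patterns(("RS", "RS"), ("S", "S"))
print(single_locus)
print(duplication)
```

Under the single-copy model only half of the offspring would show the heterozygous pattern, whereas under the duplication model every offspring inherits an RS chromosome, which is consistent with the 100% heterozygous F1 reported in the abstract.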
Abstract:
A set of software development tools for building real-time control systems on a simple robotics platform is described in the paper. The tools are being used in a real-time systems course as a basis for student projects. The development platform is a low-cost PC running GNU/Linux, and the target system is LEGO MINDSTORMS NXT, thus keeping the cost of the laboratory low. Real-time control software is developed using a mixed paradigm. Functional code for control algorithms is automatically generated in C from Simulink models. This code is then integrated into a concurrent, real-time software architecture based on a set of components written in Ada. This approach enables the students to take advantage of the high-level, model-oriented features that Simulink offers for designing control algorithms, and the comprehensive support for concurrency and real-time constructs provided by Ada.
Abstract:
Modern sensor technologies and simulators applied to large and complex dynamic systems (such as road traffic networks, sets of river channels, etc.) produce large amounts of behavior data that are difficult for users to interpret and analyze. Software tools that generate presentations combining text and graphics can help users understand this data. In this paper we describe the results of our research on automatic multimedia presentation generation (including text, graphics, maps, images, etc.) for interactive exploration of behavior datasets. We designed a novel user interface that combines automatically generated text and graphical resources. We describe the general knowledge-based design of our presentation generation tool. We also present applications that we developed to validate the method, and a comparison with related work.
Abstract:
The last decade has witnessed major advances in speech recognition technology. Current commercial systems are able to recognize continuous speech from multiple speakers, achieving acceptable error rates without the need for explicit adaptation procedures. Despite this progress, speech recognition is far from being a solved problem. Most of these recognition systems are tuned to particular domains, and their effectiveness depends significantly, among many other factors, on the similarity between the language model used and the specific task for which it is employed. This dependence becomes even more important in scenarios where the statistical properties of the language vary over time, for example in application domains involving spontaneous speech and multiple topics. In recent years there has been a constant effort to improve recognition systems for such domains. This has been done, among many other approaches, through automatic adaptation techniques. These techniques are applied to existing systems, since porting a system to a new task or domain can be both time-consuming and costly. Adaptation techniques require additional sources of information, and spoken language can provide some of them. Speech not only conveys a message; it also conveys information about the context in which the spoken communication takes place (e.g. the topic being discussed). Therefore, when we communicate through speech, it is possible to identify the elements of language that characterize the context and, at the same time, to track the changes that occur in these elements over time.
This information could be captured and exploited by means of information retrieval and machine learning techniques. In the development of better automatic speech recognition systems, this could allow us to improve the adaptation of language models to the conditions of the context and thus make the recognition system more robust in domains with changing conditions (such as potential variations in vocabulary, style and topic). In this sense, the main contribution of this Thesis is the proposal and evaluation of a contextualization framework motivated by topic analysis and based on the dynamic, unsupervised adaptation of language models to make an automatic speech recognition system more robust. This adaptation builds on different approaches from the aforementioned fields (information retrieval and machine learning), with which we seek to identify the topics being discussed in an audio recording. This identification in turn makes it possible to adapt the language model to the conditions of the context. The proposed contextualization framework can be divided into two main systems: a topic identification system and a system for dynamic adaptation of language models. This Thesis can be described in detail from the perspective of the particular contributions made in each of the fields that make up the proposed framework:
- Regarding the topic identification system, we have focused on improving document preprocessing techniques and on contributing to the definition of more robust criteria for selecting index terms.
  - The efficiency of systems based on information retrieval techniques as well as on machine learning techniques, and specifically of systems dedicated to the topic identification task, depends to a large extent on the preprocessing mechanisms applied to the documents. Among the many operations that form part of a preprocessing scheme, the appropriate selection of index terms is crucial for establishing semantic and conceptual relationships between terms and documents. This process can also be affected either by a poor choice of stopwords or by a lack of precision in the definition of lemmatization rules. In this work we therefore compare and evaluate different criteria for document preprocessing, as well as different strategies for selecting index terms. This allows us not only to reduce the size of the indexing structure but also to improve the topic identification process.
  - One of the most important aspects of the performance of topic identification systems is the assignment of different weights to terms according to their contribution to the content of the document. In this work we evaluate and propose alternative approaches to traditional term-weighting schemes (such as tf-idf) that allow us to improve the specificity of terms and to better discriminate the topics of documents.
- Regarding the dynamic adaptation of language models, we have divided the contextualization process into several steps.
  - For the generation of topic-based language models, we propose two types of approach: a supervised approach and an unsupervised approach. In the first, we rely on the topic labels that originally accompany the documents of the corpus we use.
A partir de estas, agrupamos los documentos que forman parte de la misma temática y generamos modelos de lenguaje a partir de dichos grupos. Sin embargo, uno de los objetivos que se persigue en esta Tesis es evaluar si el uso de estas etiquetas para la generación de modelos es óptimo en términos del rendimiento del reconocedor. Por esta razón, nosotros proponemos un segundo enfoque, un enfoque no supervisado, en el cual el objetivo es agrupar, automáticamente, los documentos en clusters temáticos, basándonos en la similaridad semántica existente entre los documentos. Por medio de enfoques de agrupamiento conseguimos mejorar la cohesión conceptual y semántica en cada uno de los clusters, lo que a su vez nos permitió refinar los modelos de lenguaje basados en temática y mejorar el rendimiento del sistema de reconocimiento. – Desarrollamos diversas estrategias para generar un modelo de lenguaje dependiente del contexto. Nuestro objetivo es que este modelo refleje el contexto semántico del habla, i.e. las temáticas más relevantes que se están discutiendo. Este modelo es generado por medio de la interpolación lineal entre aquellos modelos de lenguaje basados en temática que estén relacionados con las temáticas más relevantes. La estimación de los pesos de interpolación está basada principalmente en el resultado del proceso de identificación de temática. – Finalmente, proponemos una metodología para la adaptación dinámica de un modelo de lenguaje general. El proceso de adaptación tiene en cuenta no sólo al modelo dependiente del contexto sino también a la información entregada por el proceso de identificación de temática. El esquema usado para la adaptación es una interpolación lineal entre el modelo general y el modelo dependiente de contexto. Estudiamos también diferentes enfoques para determinar los pesos de interpolación entre ambos modelos. 
Una vez definida la base teórica de nuestro marco de contextualización, proponemos su aplicación dentro de un sistema automático de reconocimiento de voz. Para esto, nos enfocamos en dos aspectos: la contextualización de los modelos de lenguaje empleados por el sistema y la incorporación de información semántica en el proceso de adaptación basado en temática. En esta Tesis proponemos un marco experimental basado en una arquitectura de reconocimiento en ‘dos etapas’. En la primera etapa, empleamos sistemas basados en técnicas de recuperación de información y aprendizaje de máquina para identificar las temáticas sobre las cuales se habla en una transcripción de un segmento de audio. Esta transcripción es generada por el sistema de reconocimiento empleando un modelo de lenguaje general. De acuerdo con la relevancia de las temáticas que han sido identificadas, se lleva a cabo la adaptación dinámica del modelo de lenguaje. En la segunda etapa de la arquitectura de reconocimiento, usamos este modelo adaptado para realizar de nuevo el reconocimiento del segmento de audio. Para determinar los beneficios del marco de trabajo propuesto, llevamos a cabo la evaluación de cada uno de los sistemas principales previamente mencionados. Esta evaluación es realizada sobre discursos en el dominio de la política usando la base de datos EPPS (European Parliamentary Plenary Sessions - Sesiones Plenarias del Parlamento Europeo) del proyecto europeo TC-STAR. Analizamos distintas métricas acerca del rendimiento de los sistemas y evaluamos las mejoras propuestas con respecto a los sistemas de referencia. ABSTRACT The last decade has witnessed major advances in speech recognition technology. Today’s commercial systems are able to recognize continuous speech from numerous speakers, with acceptable levels of error and without the need for an explicit adaptation procedure. Despite this progress, speech recognition is far from being a solved problem. 
Most of these systems are tuned to a particular domain, and their efficacy depends significantly, among many other aspects, on the similarity between the language model used and the task being addressed. This dependence is even more important in scenarios where the statistical properties of the language fluctuate over time, for example, in application domains involving spontaneous and multitopic speech. In recent years there has been an increasing effort to enhance speech recognition systems for such domains. This has been done, among other approaches, by means of automatic adaptation techniques. These techniques are applied to existing systems, especially since porting a system to a new task or domain may be both time-consuming and expensive. Adaptation techniques require additional sources of information, and spoken language can provide some of them. Speech not only conveys a message; it also provides information on the context in which the spoken communication takes place (e.g. on the subject being talked about). Therefore, when we communicate through speech, it could be feasible to identify the elements of the language that characterize the context and, at the same time, to track the changes that occur in those elements over time. This information can be extracted and exploited through information retrieval and machine learning techniques. This allows us, within the development of more robust speech recognition systems, to enhance the adaptation of language models to the conditions of the context, thus strengthening the recognition system for domains under changing conditions (such as potential variations in vocabulary, style and topic). 
In this sense, the main contribution of this Thesis is the proposal and evaluation of a framework of topic-motivated contextualization based on the dynamic and unsupervised adaptation of language models for the enhancement of an automatic speech recognition system. This adaptation is based on a combined approach (from the perspective of both the information retrieval and machine learning fields) whereby we identify the topics that are being discussed in an audio recording. The topic identification, therefore, enables the system to adapt the language model according to the contextual conditions. The proposed framework can be divided into two major systems: a topic identification system and a dynamic language model adaptation system. This Thesis can be outlined from the perspective of the particular contributions made in each of the fields that compose the proposed framework: _ Regarding the topic identification system, we have focused on enhancing the document preprocessing techniques and on contributing to the definition of more robust criteria for the selection of index-terms. – Within both information retrieval and machine learning based approaches, the efficiency of topic identification systems depends, to a large extent, on the preprocessing mechanisms applied to the documents. Among the many operations that make up the preprocessing procedures, an adequate selection of index-terms is critical to establish conceptual and semantic relationships between terms and documents. This process might also be weakened by a poor choice of stopwords or by a lack of precision in defining stemming rules. In this regard we compare and evaluate different criteria for preprocessing the documents, as well as for improving the selection of the index-terms. This allows us not only to reduce the size of the indexing structure but also to strengthen the topic identification process. 
– One of the most crucial aspects, in relation to the performance of topic identification systems, is to assign different weights to different terms depending on their contribution to the content of the document. In this sense we evaluate and propose alternative approaches to traditional weighting schemes (such as tf-idf ) that allow us to improve the specificity of terms, and to better identify the topics that are related to documents. _ Regarding the dynamic language model adaptation, we divide the contextualization process into different steps. – We propose supervised and unsupervised approaches for the generation of topic-based language models. The first of them is intended to generate topic-based language models by grouping the documents, in the training set, according to the original topic labels of the corpus. Nevertheless, a goal of this Thesis is to evaluate whether or not the use of these labels to generate language models is optimal in terms of recognition accuracy. For this reason, we propose a second approach, an unsupervised one, in which the objective is to group the data in the training set into automatic topic clusters based on the semantic similarity between the documents. By means of clustering approaches we expect to obtain a more cohesive association of the documents that are related by similar concepts, thus improving the coverage of the topic-based language models and enhancing the performance of the recognition system. – We develop various strategies in order to create a context-dependent language model. Our aim is that this model reflects the semantic context of the current utterance, i.e. the most relevant topics that are being discussed. This model is generated by means of a linear interpolation between the topic-based language models related to the most relevant topics. The estimation of the interpolation weights is based mainly on the outcome of the topic identification process. 
– Finally, we propose a methodology for the dynamic adaptation of a background language model. The adaptation process takes into account the context-dependent model as well as the information provided by the topic identification process. The scheme used for the adaptation is a linear interpolation between the background model and the context-dependent one. We also study different approaches to determine the interpolation weights used in this adaptation scheme. Having defined the basis of our topic-motivated contextualization framework, we propose its application within an automatic speech recognition system. We focus on two aspects: the contextualization of the language models used by the system, and the incorporation of semantic information into a topic-based adaptation process. To achieve this, we propose an experimental framework based on a ‘two-stage’ recognition architecture. In the first stage of the architecture, information retrieval and machine learning techniques are used to identify the topics in a transcription of an audio segment. This transcription is generated by the recognition system using a background language model. According to the confidence in the topics that have been identified, the dynamic language model adaptation is carried out. In the second stage of the recognition architecture, the adapted language model is used to re-decode the utterance. To test the benefits of the proposed framework, we evaluate each of the major systems mentioned above. The evaluation is conducted on speeches in the political domain using the EPPS (European Parliamentary Plenary Sessions) database from the European TC-STAR project. We analyse several performance metrics that allow us to compare the improvements of the proposed systems against the baseline ones.
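The two interpolation steps described in this abstract can be illustrated with a minimal sketch. It assumes unigram models stored as plain word-to-probability dictionaries and uses hypothetical names (`interpolate`, `adapt`, `lam`) and toy numbers; the thesis itself works with full language models and studies several ways of choosing the weights, none of which is reproduced here.

```python
from typing import Dict, List

Unigram = Dict[str, float]  # toy language model: word -> probability


def interpolate(models: List[Unigram], weights: List[float]) -> Unigram:
    """Linearly interpolate models; weights are normalised to sum to 1."""
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]
    vocab = set().union(*models)  # union of all words seen in any model
    return {
        word: sum(lam * m.get(word, 0.0) for lam, m in zip(norm, models))
        for word in vocab
    }


def adapt(background: Unigram, topic_models: List[Unigram],
          topic_scores: List[float], lam: float) -> Unigram:
    """Build a context-dependent model from topic scores, then mix it
    with the background model using weight lam (an assumed parameter)."""
    context = interpolate(topic_models, topic_scores)
    return interpolate([background, context], [1.0 - lam, lam])


# Hypothetical toy models and topic-identification scores.
background = {"the": 0.5, "vote": 0.1, "fish": 0.4}
politics = {"the": 0.4, "vote": 0.6}
fishing = {"the": 0.4, "fish": 0.6}
adapted = adapt(background, [politics, fishing], [3.0, 1.0], lam=0.5)
```

Because each input model sums to one and the weights are normalised, the adapted model is again a probability distribution; here the topic scores shift probability mass towards "vote".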
Resumo:
Entre las soluciones más satisfactorias al problema de las emisiones de CO2 está la captura y almacenamiento de este gas de efecto invernadero en reservorios profundos. Esta técnica implica la necesidad de monitorizar grandes extensiones de terreno. Utilizando una zona de vulcanismo residual, en la provincia de Ciudad Real, se han monitorizado las emisiones de CO2 utilizando imágenes de muy alta resolución espacial. Se han generado índices de vegetación, y estos se han correlacionado con medidas de contenido de CO2 del aire en los puntos de emisión. Los resultados han arrojado niveles de correlación significativos (p. ej.: SAVI = -0,93) y han llevado a descubrir un nuevo punto de emisión de CO2. Palabras clave: teledetección, CO2, vegetación, satélite Monitoring CO2 emissions in a natural analogue by correlating with vegetation indices Abstract: Among the most satisfactory solutions for the CO2 emissions problem is the capture and storage of this greenhouse gas in deep reservoirs. This technique involves the need to monitor large areas. Using a volcanic area with residual activity, in the province of Ciudad Real, CO2 emissions were monitored through very high spatial resolution imagery. Vegetation indices were generated and correlated with measurements of the air's CO2 content at the emission points. The results yielded significant correlation levels (e.g.: SAVI = -0.93) and led to the discovery of a new CO2 emission point. Keywords: remote sensing, CO2, vegetation, satellite.
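As a rough illustration of the analysis summarised above, the sketch below computes the Soil-Adjusted Vegetation Index (SAVI) from near-infrared and red reflectances and correlates it with CO2 readings. The soil factor L = 0.5 is the commonly used default and is an assumption here, as is the choice of Pearson correlation; the abstract specifies neither. All reflectance and CO2 values are made up for the example.

```python
from math import sqrt
from typing import Sequence


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """SAVI = (NIR - Red) * (1 + L) / (NIR + Red + L)."""
    return (nir - red) * (1.0 + soil_factor) / (nir + red + soil_factor)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (NIR, red) reflectances at four sampling points,
# ordered from low to high CO2 concentration (values are invented).
reflectances = [(0.50, 0.08), (0.42, 0.10), (0.35, 0.14), (0.28, 0.18)]
co2_ppm = [420.0, 900.0, 2500.0, 6000.0]
savi_values = [savi(nir, red) for nir, red in reflectances]
r = pearson(savi_values, co2_ppm)  # negative: less vegetation vigour at high CO2
```

A strongly negative coefficient, like the SAVI = -0.93 reported in the abstract, means that vegetation vigour drops as CO2 concentration rises.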
Resumo:
Following striate cortex damage in monkeys and humans there can be residual function mediated by parallel visual pathways. In humans this can sometimes be associated with a “feeling” that something has happened, especially with rapid movement or abrupt onset. For less transient events, discriminative performance may still be well above chance even when the subject reports no conscious awareness of the stimulus. In a previous study we examined parameters that yield good residual visual performance in the “blind” hemifield of a subject with unilateral damage to the primary visual cortex. With appropriate parameters we demonstrated good discriminative performance, both with and without conscious awareness of a visual event. These observations raise the possibility of imaging the brain activity generated in the “aware” and the “unaware” modes, with matched levels of discrimination performance, and hence of revealing patterns of brain activation associated with visual awareness. The intact hemifield also allows a comparison with normal vision. Here we report the results of a functional magnetic resonance imaging study on the same subject carried out under aware and unaware stimulus conditions. The results point to a shift in the pattern of activity from neocortex in the aware mode, to subcortical structures in the unaware mode. In the aware mode prestriate and dorsolateral prefrontal cortices (area 46) are active. In the unaware mode the superior colliculus is active, together with medial and orbital prefrontal cortical sites.
Resumo:
Cell–substratum adhesion is an essential requirement for survival of human neonatal keratinocytes in vitro. Similarly, activation of the epidermal growth factor receptor (EGF-R) has recently been implicated not only in cell cycle progression but also in survival of normal keratinocytes. The mechanisms by which either cell–substratum adhesion or EGF-R activation protect keratinocytes from programmed cell death are poorly understood. Here we describe that blockade of the EGF-R and inhibition of substratum adhesion share a common downstream event, the down-regulation of the cell death protector Bcl-xL. Expression of Bcl-xL protein was down-regulated during forced suspension culture of keratinocytes, concurrent with large-scale apoptosis. Similarly, EGF-R blockade was accompanied by down-regulation of Bcl-xL steady-state mRNA and protein levels to an extent comparable to that observed in forced suspension culture. However, down-regulation of Bcl-xL expression by EGF-R blockade was not accompanied by apoptosis; in this case, a second signal, generated by passaging, was required to induce rapid and large-scale apoptosis. These findings are consistent with the conclusions that (i) Bcl-xL represents a shared molecular target for signaling through cell-substrate adhesion receptors and the EGF-R, and (ii) reduced levels of Bcl-xL expression through EGF-R blockade lower the tolerance of keratinocytes for cell death signals generated by cellular stress.
Resumo:
Electronic systems that use rugged lightweight plastics potentially offer attractive characteristics (low-cost processing, mechanical flexibility, large area coverage, etc.) that are not easily achieved with established silicon technologies. This paper summarizes work that demonstrates many of these characteristics in a realistic system: organic active matrix backplane circuits (256 transistors) for large (≈5 × 5-inch) mechanically flexible sheets of electronic paper, an emerging type of display. The success of this effort relies on new or improved processing techniques and materials for plastic electronics, including methods for (i) rubber stamping (microcontact printing) high-resolution (≈1 μm) circuits with low levels of defects and good registration over large areas, (ii) achieving low leakage with thin dielectrics deposited onto surfaces with relief, (iii) constructing high-performance organic transistors with bottom contact geometries, (iv) encapsulating these transistors, (v) depositing, in a repeatable way, organic semiconductors with uniform electrical characteristics over large areas, and (vi) low-temperature (≈100°C) annealing to increase the on/off ratios of the transistors and to improve the uniformity of their characteristics. The sophistication and flexibility of the patterning procedures, high level of integration on plastic substrates, large area coverage, and good performance of the transistors are all important features of this work. We successfully integrate these circuits with microencapsulated electrophoretic “inks” to form sheets of electronic paper.
Resumo:
Purpose. To analyze the diagnostic validity of accommodative and binocular tests in a sample of patients with a large near exophoria with moderate to severe symptoms. Methods. Two groups of patients between 19 and 35 years were recruited from a university clinic: 33 subjects with large exophoria at near vision and moderate or high visual discomfort and 33 patients with normal heterophoria and low visual discomfort. Visual discomfort was defined using the Conlon survey. A refractive exam and an exhaustive evaluation of accommodation and vergence were performed. Diagnostic validity was assessed by means of receiver operating characteristic (ROC) curves, sensitivity (S), specificity (Sp), and positive and negative likelihood ratios (LR+, LR−). This analysis was also carried out considering multiple tests as a serial testing strategy. Results. ROC analysis showed the best diagnostic accuracy for receded near point of convergence (NPC) recovery (area = 0.929) and binocular accommodative facility (BAF) (area = 0.886). Using the cut-offs obtained with ROC analysis, the best diagnostic validity was obtained for the combination of NPC recovery and BAF (S = 0.77, Sp = 1, LR+ = value tending to infinity, LR− = 0.23) and the combination of NPC break and recovery with BAF (S = 0.73, Sp = 1, LR+ = tending to infinity, LR− = 0.27). Conclusions. NPC and BAF tests were the tests with the best diagnostic accuracy for subjects with large near exophoria and moderate to severe symptoms.
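The likelihood ratios quoted above follow from the standard definitions LR+ = S / (1 − Sp) and LR− = (1 − S) / Sp. The short sketch below, with a hypothetical function name, reproduces the reported values and shows why LR+ tends to infinity when specificity equals 1.

```python
import math
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity.

    LR+ = S / (1 - Sp) is unbounded when Sp = 1 (no false positives);
    LR- = (1 - S) / Sp is unbounded when Sp = 0.
    """
    lr_pos = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    lr_neg = math.inf if specificity == 0 else (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Values reported in the abstract for the two test combinations.
npc_recovery_baf = likelihood_ratios(0.77, 1.0)
npc_break_recovery_baf = likelihood_ratios(0.73, 1.0)
```

With Sp = 1 the negative ratio reduces to 1 − S, which gives the 0.23 and 0.27 reported for the two combinations.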
Resumo:
In this paper we describe an approach to interface Abstract State Machines (ASM) with Multiway Decision Graphs (MDG) to enable tool support for the formal verification of ASM descriptions. ASM is a specification method for software and hardware providing a powerful means of modeling various kinds of systems. MDGs are decision diagrams based on abstract representation of data and are used primarily for modeling hardware systems. The notions of ASM and MDG are hence closely related to each other, making it appealing to link these two concepts. The proposed interface between ASM and MDG uses two steps: first, the ASM model is transformed into a flat, simple transition system as an intermediate model. Second, this intermediate model is transformed into the syntax of the input language of the MDG tool, MDG-HDL. We have successfully applied this transformation scheme to a case study, the Island Tunnel Controller, where we automatically generated the corresponding MDG-HDL models from ASM specifications.
Resumo:
This thesis presents a thorough and principled investigation into the application of artificial neural networks to the biological monitoring of freshwater. It contains original ideas on the classification and interpretation of benthic macroinvertebrates, and aims to demonstrate their superiority over the biotic systems currently used in the UK to report river water quality. The conceptual basis of a new biological classification system is described, and a full review and analysis of a number of river data sets is presented. The biological classification is compared to the common biotic systems using data from the Upper Trent catchment. This data contained 292 expertly classified invertebrate samples identified to mixed taxonomic levels. The neural network experimental work concentrates on the classification of the invertebrate samples into biological class, where only a subset of the sample is used to form the classification. Other experimentation is conducted into the identification of novel input samples, the classification of samples from different biotopes and the use of prior information in the neural network models. The biological classification is shown to provide an intuitive interpretation of a graphical representation, generated without reference to the class labels, of the Upper Trent data. The selection of key indicator taxa is considered using three different approaches; one novel, one from information theory and one from classical statistical methods. Good indicators of quality class based on these analyses are found to be in good agreement with those chosen by a domain expert. The change in information associated with different levels of identification and enumeration of taxa is quantified. The feasibility of using neural network classifiers and predictors to develop numeric criteria for the biological assessment of sediment contamination in the Great Lakes is also investigated.
Resumo:
Several cationic initiator systems were developed and used to polymerise oxetane, with two oxonium ion initiator systems being investigated in depth. The first initiator system was generated by the elimination of a chloride group from a chloro methyl ethyl ether. The second initiator system was generated by adding a carbonyl co-catalyst to a carbocationic centre. It was found that the anion used to stabilise the initiator was critical to the initial rate of polymerisation of oxetane, with hexafluoroantimonate resulting in the fastest polymerisations. Both initiator systems could be used at varying monomer to initiator concentrations to control the number-average molecular weight, Mn, of the resultant polymer. Both initiator systems showed living characteristics and were used to polymerise further monomers and generate higher molecular weight material and block copolymers. Oxetane and 3,3-dimethyl oxetane can both be polymerised using either oxonium ion initiator system in a variety of DCM or DCM/1,4-dioxane solvent mixtures. The level of 1,4-dioxane affects the initial rate of polymerisation: higher levels result in lower initial rates of polymerisation but tend to result in higher polydispersities. The level of oligomer formation is also reduced as the level of 1,4-dioxane is increased. 3,3-bis-bromomethyl oxetane was also polymerised, but a large amount of hyperbranching was seen at the bromide site, resulting in a difficult-to-solvate polymer system. Multifunctional initiator systems were also generated using the halide elimination reactions, with some success being achieved with a 1,3,5-tris-bromomethyl-2,4,6-tris-methyl-benzene derived initiator system. This offered some control over the number-average molecular weight of the resultant polymer system.
Resumo:
Data envelopment analysis (DEA) is one of the most widely used methods for measuring the efficiency and productivity of decision-making units (DMUs). The need for huge computer resources in terms of memory and CPU time in DEA is inevitable for a large-scale data set, especially with negative measures. In recent years, a wide range of studies has been conducted in the area of combined artificial neural network and DEA methods. In this study, a supervised feed-forward neural network is proposed to evaluate the efficiency and productivity of large-scale data sets with negative values, in contrast to the corresponding DEA method. Results indicate that the proposed network has some computational advantages over the corresponding DEA models; therefore, it can be considered a useful tool for measuring the efficiency of DMUs with (large-scale) negative data.