34 results for Information retrieval, dysorthography, dyslexia, finite state machines, readability
at Universidad Politécnica de Madrid
Abstract:
The main goal of the bilingual and monolingual participation of the MIRACLE team in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. A second order combination was also tested, mainly by averaging or selective combination of the documents retrieved by different approaches for a particular query.
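The second-order combination by averaging can be illustrated with a small sketch. It assumes that each approach returns a mapping from document id to a score already normalized to a comparable range, and that a document missing from a run counts as 0 for that run. The function name `average_runs` and the sample runs are hypothetical; this is not the MIRACLE team's actual implementation.

```python
from collections import defaultdict


def average_runs(runs):
    """Merge several retrieval runs by averaging per-document scores.

    `runs` is a list of dicts mapping document id -> score. A document
    missing from a run contributes 0 for that run (an assumption).
    """
    totals = defaultdict(float)
    for run in runs:
        for doc_id, score in run.items():
            totals[doc_id] += score
    n = len(runs)
    averaged = {doc_id: total / n for doc_id, total in totals.items()}
    # Sort by descending score; break ties by document id for determinism.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical runs, e.g. one built on stemmed terms and one on n-grams.
stem_run = {"d1": 0.9, "d2": 0.4}
ngram_run = {"d2": 0.8, "d3": 0.6}
merged = average_runs([stem_run, ngram_run])
```

A selective combination would differ only in which runs are passed in for a given query; that selection step is not sketched here.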
Abstract:
This paper describes the first set of experiments defined by the MIRACLE (Multilingual Information RetrievAl for the CLEf campaign) research group for some of the cross language tasks defined by CLEF. These experiments combine different basic techniques, linguistic-oriented and statistic-oriented, to be applied to the indexing and retrieval processes.
Abstract:
The evolution of communications networks towards Next Generation Networks (NGN) has encouraged the development of new services. Several technologies are now being integrated into telecommunications services to provide new functionalities, resulting in what are known as converged services. The objective is to adapt the behavior of the services to the needs of different users, generating customized services. Some of the main technologies involved in their development are those related to the Web. However, because this type of service combines different technologies, its development is a very complex process that has to be improved to reduce the time and cost required, with the aim of promoting the success of such services. This paper proposes applying software reuse through a component library and presents one focused on ECharts for SIP Servlets (E4SS), a framework based on the SIP Servlet specification that uses finite state machines to define converged communications services. To promote the use of the library, a methodology is also proposed to facilitate the integration of the library operations into the software development cycle.
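As a rough illustration of defining a communications service with a finite state machine, the sketch below uses a plain transition table. It is not E4SS or ECharts syntax; the state names, the event names and the `CallSession` class are hypothetical and chosen only to make the idea concrete.

```python
# Transition table: (current state, event) -> next state.
# States and events are illustrative assumptions, not taken from E4SS.
TRANSITIONS = {
    ("idle", "INVITE"): "ringing",
    ("ringing", "ANSWER"): "connected",
    ("ringing", "CANCEL"): "idle",
    ("connected", "BYE"): "idle",
}


class CallSession:
    """A minimal table-driven state machine for one call."""

    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        """Apply one event; events with no transition from the current state are ignored."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


session = CallSession()
trace = [session.handle(event) for event in ["INVITE", "ANSWER", "BYE"]]
```

Keeping the transitions in a table separate from the handler is one way such definitions could be packaged as reusable components.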
Abstract:
ImageCLEF is a pilot experiment run at CLEF 2003 for cross language image retrieval using textual captions related to image contents. In this paper, we describe the participation of the MIRACLE research team (Multilingual Information RetrievAl at CLEF), detailing the different experiments and discussing their preliminary results.
Abstract:
Social networks are highly relevant today: they not only take up much of people's daily time but also serve millions of companies for advertising, among other purposes. The social network phenomenon has thus acquired a business side. The release of the APIs of some social networks has enabled the development of applications of all kinds and with different objectives, such as this project. The project grew out of Ericsson's interest in a study of the Google+ API and in suggestions for adding value for telecommunications companies. It also complements the reference material available at Ericsson and two other projects on information retrieval from social networks, adding a set of options for the user in the application. To this end, an example of what can be obtained from social networks, mainly Twitter and Google+, has been analyzed and built.
The project first carried out a theoretical study of the origins of social networks, their development and their current state, analyzing the main existing social networks and giving an overview of them. A state-of-the-art review was also made of a number of websites devoted to using this information available on the Internet. Subsequently, among all the social networks with available APIs, Google+ was chosen because it is a new social network still to be explored and improved, and Twitter was chosen for the range of options and data that can be obtained from it. The APIs of both were studied in order to build, with the information obtained, a prototype application offering a set of useful functions based on data from these social networks.
Finally, a simple interface was built through which account data can be accessed as if the user were on Twitter or Google+. In addition, with Twitter data the user can run an advanced search with alerts, perform sentiment analysis, see which of their tweets were most retweeted by their followers, and track and compare what is being said about two given topics. The project is intended to provide an overview of everything related to social networks, the applications available for working with them, information on the Twitter and Google+ APIs, and an idea of what can be obtained.
Abstract:
Safety and reliability of industrial processes are the main concern of the engineers in charge of industrial plants. From an economic point of view, the main goal is to reduce maintenance cost, downtime and the losses caused by failures. Moreover, the safety of operators, which affects social and economic aspects, is the most relevant factor to consider in any system. For this reason, fault diagnosis has become an important focus of interest for researchers worldwide and for engineers in industry. Most work on fault detection is based on models of the processes. There are different techniques for modelling industrial processes, such as state machines, decision trees and Petri Nets (PN). This thesis focuses on modelling processes using Interpreted Petri Nets. Petri Nets are a graphical and mathematical modelling tool able to describe systems that are concurrent, parallel, asynchronous, distributed, and non-deterministic or stochastic. PNs are also a useful visual communication tool, like flow charts or block diagrams. Additionally, the markings of a PN simulate the dynamics and concurrency of systems. Finally, they can be used to define specific state equations, algebraic equations and other models that represent the common behaviour of systems. Among the different types of PN (Interpreted, Coloured, etc.), this research deals with Interpreted Petri Nets mainly because of features such as synchronization and timed places, in addition to their data-processing capability.
The research begins with the process for designing and building the model and the diagnoser to detect permanent faults; subsequently, temporal dynamics were added to detect intermittent faults. Two industrial processes, namely an HVAC (Heating, Ventilation and Air Conditioning) system and a Liquid Packaging Process, were used as testbeds for implementing the Fault Diagnosis (FD) tool created. Its diagnostic capability was then extended to detect faults in hybrid systems. Finally, a small unmanned helicopter was chosen as an example of a system where safety is a challenge and where the fault detection techniques developed in this thesis prove to be a valuable tool, since accidents involving unmanned aerial vehicles (UAVs) carry a high economic cost and are the main reason for restrictions on flying over populated areas. Thus, this work introduces a systematic process for building a Fault Diagnoser for this system based on Petri Nets. This novel tool is able to detect both intermittent and permanent faults. The work is discussed from both a theoretical and a practical point of view. The procedure begins with the division of the system into subsystems, which are then integrated into a global PN diagnoser able to monitor the whole system and show critical variables to the operator in order to determine the health of the UAV and thereby prevent accidents. A Data Acquisition System (DAQ) was also designed to collect data during flights and feed the PN diagnoser. Real flights under nominal and failure conditions were required to configure the diagnoser and verify its performance.
It is worth noting that a high risk was assumed in generating faults during the flights; nevertheless, this made it possible to collect basic data for developing fault diagnosis, isolation techniques, maintenance protocols, behaviour models, etc. Finally, a summary of the validation results obtained during real flight tests is included. Extensive use of this tool will improve preventive maintenance protocols for UAVs (especially helicopters) and make it possible to establish recommendations for regulations. The use of a diagnoser based on Petri Nets is considered a novel approach.
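To make the role of the marking concrete, here is a minimal sketch of the standard firing rule of a plain place/transition net. It does not model interpreted or timed Petri Nets, and it is not the diagnoser described in the thesis; the place names, the `start_fill` transition and the helper functions are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(
        marking.get(place, 0) >= needed
        for place, needed in transition["inputs"].items()
    )


def fire(marking, transition):
    """Return the marking reached by firing; the original marking is not modified."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    # Consume tokens from input places, then produce tokens in output places.
    for place, needed in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - needed
    for place, produced in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + produced
    return new_marking


# Hypothetical fragment of a liquid packaging process.
start_fill = {
    "inputs": {"pump_ready": 1, "tank_empty": 1},
    "outputs": {"filling": 1},
}
initial = {"pump_ready": 1, "tank_empty": 1, "filling": 0}
after = fire(initial, start_fill)
```

After firing, `start_fill` is no longer enabled because its input places are empty, which is how the marking tracks the evolving state of the system.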
Abstract:
This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list of Spanish words (vocabulary) and associated tags used by this module is computed automatically by considering those signs that show the highest probability of being the translation of each Spanish word. This automatic tag extraction has been compared to a manual strategy, achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's Licenses. To evaluate the system, a parallel corpus made up of 4080 Spanish sentences and their LSE translations has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module increased BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
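The word-to-tag replacement can be sketched as follows. The sketch assumes that word-to-sign translation probabilities are already available as nested dictionaries; the probability values, the sign glosses and the two ways of handling non-relevant words (dropping them or replacing them with a generic tag) are illustrative assumptions, not the alternatives actually evaluated in the paper.

```python
def build_tag_table(translation_probs):
    """Map each word to the sign with the highest translation probability."""
    return {
        word: max(sign_probs, key=sign_probs.get)
        for word, sign_probs in translation_probs.items()
        if sign_probs
    }


def preprocess(sentence, tag_table, non_relevant_tag=None):
    """Replace words by tags; words without a tag are treated as non-relevant.

    Non-relevant words are dropped when `non_relevant_tag` is None,
    otherwise they are replaced by that tag.
    """
    output = []
    for word in sentence.lower().split():
        if word in tag_table:
            output.append(tag_table[word])
        elif non_relevant_tag is not None:
            output.append(non_relevant_tag)
    return output


# Hypothetical probabilities; real values would come from a trained model.
probs = {
    "quiero": {"QUERER": 0.9, "YO": 0.1},
    "renovar": {"RENOVAR": 0.8, "NUEVO": 0.2},
    "carné": {"CARNET": 0.7, "DNI": 0.3},
}
table = build_tag_table(probs)
dropped = preprocess("Quiero renovar el carné", table)
tagged = preprocess("Quiero renovar el carné", table, non_relevant_tag="NONREL")
```

Here "el" has no associated sign, so it is either removed or mapped to the generic tag before the sentence reaches the translation system.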
Abstract:
This paper presents the 2006 MIRACLE team's approaches to the Ad-Hoc and Geographical Information Retrieval tasks. A first set of runs was obtained using a set of basic components. Then, by putting together special combinations of these runs, an extended set was obtained. With respect to previous campaigns, some improvements have been introduced in our system: an entity recognition prototype is integrated in our tokenization scheme, and the performance of our indexing and retrieval engine has been improved. For GeoCLEF, we tested retrieval using geo-entity and textual references separately, and then combined them with different approaches.
Abstract:
This paper presents the 2005 MIRACLE team's approach to Cross-Language Geographical Retrieval (GeoCLEF). The main goal of the GeoCLEF participation of the MIRACLE team was to test the effect that geographical information retrieval techniques have on information retrieval. The baseline approach is based on the development of named entity recognition and geospatial information retrieval tools and on their combination with linguistic techniques to carry out indexing and retrieval tasks.
Abstract:
This paper presents the 2005 MIRACLE team's approach to the Ad-Hoc Information Retrieval tasks. The goal for the experiments this year was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and to improve our basic processing and indexing tools, adapting them to new languages with unusual encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper noun extraction, paragraph extraction, and pseudo-relevance feedback. Some of these basic components were used in different combinations and orders of application for document indexing and for query processing. Second-order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query. In the multilingual track, we concentrated our work on merging the results of monolingual runs to obtain the overall multilingual result, relying on available translations. In both cross-lingual tracks, we used available translation resources, and in some cases a combination approach.
Abstract:
In the beginning of the 90s, ontology development was similar to an art: ontology developers did not have clear guidelines on how to build ontologies but only some design criteria to be followed. Work on principles, methods and methodologies, together with supporting technologies and languages, made ontology development become an engineering discipline, the so-called Ontology Engineering. Ontology Engineering refers to the set of activities that concern the ontology development process and the ontology life cycle, the methods and methodologies for building ontologies, and the tool suites and languages that support them. Thanks to the work done in the Ontology Engineering field, the development of ontologies within and between teams has increased and improved, as well as the possibility of reusing ontologies in other developments and in final applications. Currently, ontologies are widely used in (a) Knowledge Engineering, Artificial Intelligence and Computer Science, (b) applications related to knowledge management, natural language processing, e-commerce, intelligent information integration, information retrieval, database design and integration, bio-informatics, education, and (c) the Semantic Web, the Semantic Grid, and the Linked Data initiative. In this paper, we provide an overview of Ontology Engineering, mentioning the most outstanding and used methodologies, languages, and tools for building ontologies. In addition, we include some words on how all these elements can be used in the Linked Data initiative.
Abstract:
This paper describes an infrastructure for the automated evaluation of semantic technologies and, in particular, semantic search technologies. For this purpose, we present an evaluation framework which follows a service-oriented approach for evaluating semantic technologies and uses the Business Process Execution Language (BPEL) to define evaluation workflows that can be executed by process engines. This framework supports a variety of evaluations from different semantic areas, including search, and is extensible to new evaluations. We show how BPEL addresses this diversity as well as how it is used to solve specific challenges such as heterogeneity, error handling and reuse.
Abstract:
EURATOM/CIEMAT and the Technical University of Madrid (UPM) have been involved in the development of an FPSC [1] (Fast Plant System Control) prototype for ITER, based on PXIe (PCI eXtensions for Instrumentation). One of the main focuses of this project has been data acquisition and all the related issues, including scientific data archiving. Additionally, a new data archiving solution has been developed to demonstrate the achievable performance and possible bottlenecks of scientific data archiving in Fast Plant System Control. The presented system implements a fault-tolerant architecture over a GEthernet network where FPSC data are reliably archived remotely, while remaining accessible for redistribution, within the duration of a pulse. The storage service is supported by a clustering solution to guarantee scalability, so that FPSC management and configuration may be simplified and a unified view of all archived data provided. All the components involved have been integrated under EPICS [2] (Experimental Physics and Industrial Control System), implementing in each case the necessary extensions, state machines and configuration process variables. The prototyped solution is based on the NetCDF-4 [3] and [4] (Network Common Data Format) file format in order to incorporate important features, such as support for scientific data models, management of very large files, platform-independent encoding, and single-writer/multiple-readers concurrency. In this contribution, a complete description of the above-mentioned solution is presented, together with the most relevant results of the tests performed, focusing on the benefits and limitations of the applied technologies.
Abstract:
This paper describes a categorization module for improving the performance of a Spanish into Spanish Sign Language (LSE) translation system. This categorization module replaces Spanish words with associated tags. When implementing this module, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not relevant in the translation process. The categorization module has been incorporated into a phrase-based system and a Statistical Finite State Transducer (SFST). The evaluation results reveal that the BLEU has increased from 69.11% to 78.79% for the phrase-based system and from 69.84% to 75.59% for the SFST.
Abstract:
This master's thesis project develops a simulation model of the Cookies platform and defines a design interface that reflects the platform's main distinguishing feature, modularity. To this end, it proposes a structure based on four independent submodels, one for each layer of the platform, defined with finite state machines (FSM). For each layer, several models are created to verify that all the functionalities of the node are independent of one another, thus preserving the modularity that characterizes the Cookies platform.