923 results for Information retrieval, dysorthography, dyslexia, finite state machines, readability
Abstract:
This paper describes an infrastructure for the automated evaluation of semantic technologies and, in particular, semantic search technologies. For this purpose, we present an evaluation framework that follows a service-oriented approach for evaluating semantic technologies and uses the Business Process Execution Language (BPEL) to define evaluation workflows that can be executed by process engines. The framework supports a variety of evaluations from different semantic areas, including search, and is extensible to new evaluations. We show how BPEL addresses this diversity and how it is used to solve specific challenges such as heterogeneity, error handling, and reuse.
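The BPEL ideas the abstract relies on (a sequence of service invocations, a scope with a fault handler, and a reusable sub-process) can be mimicked in a few lines of Python to show how an evaluation workflow isolates a failing tool without aborting the whole run. This is a hypothetical sketch, not BPEL and not the framework's implementation; the service functions `load_dataset`, `run_tool`, `compute_metrics` and `record_failure` are placeholders.

```python
from typing import Callable, Dict, List

# Each step stands in for a web-service invocation (a BPEL <invoke>).
Step = Callable[[Dict], Dict]


def sequence(steps: List[Step]) -> Step:
    """Compose steps in order, like a BPEL <sequence>."""
    def run(ctx: Dict) -> Dict:
        for step in steps:
            ctx = step(ctx)
        return ctx
    return run


def scope(body: Step, on_fault: Callable[[Dict, Exception], Dict]) -> Step:
    """Run body on a copy of the context; on any exception, call the fault handler."""
    def run(ctx: Dict) -> Dict:
        try:
            return body(dict(ctx))
        except Exception as exc:  # analogous to a catchAll fault handler
            return on_fault(ctx, exc)
    return run


# Hypothetical services for evaluating search tools.
def load_dataset(ctx):
    return {**ctx, "queries": ["q1", "q2"]}


def run_tool(ctx):
    if ctx["tool"] == "broken":
        raise RuntimeError("tool did not respond")
    return {**ctx, "answers": {q: [q + "-doc"] for q in ctx["queries"]}}


def compute_metrics(ctx):
    return {**ctx, "status": "ok", "answered": len(ctx["answers"])}


def record_failure(ctx, exc):
    return {**ctx, "status": "failed", "error": str(exc)}


# The same evaluation scope is reused for every tool under test.
evaluate = scope(sequence([load_dataset, run_tool, compute_metrics]), record_failure)

results = [evaluate({"tool": name}) for name in ["toolA", "broken"]]
for r in results:
    print(r["tool"], r["status"])
```

The failing tool yields a "failed" record while the other evaluation completes, which is the kind of per-tool error isolation a BPEL fault handler provides.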
Abstract:
EURATOM/CIEMAT and the Technical University of Madrid (UPM) have been involved in the development of an FPSC [1] (Fast Plant System Control) prototype for ITER, based on PXIe (PCI eXtensions for Instrumentation). One of the main focuses of this project has been data acquisition and all related issues, including scientific data archiving. In addition, a new data archiving solution has been developed to demonstrate the achievable performance and possible bottlenecks of scientific data archiving in Fast Plant System Control. The system implements a fault-tolerant architecture over a Gigabit Ethernet network in which FPSC data are reliably archived remotely while remaining accessible for redistribution within the duration of a pulse. The storage service is supported by a clustering solution to guarantee scalability, so that FPSC management and configuration can be simplified and a single view of all archived data is provided. All the components involved have been integrated under EPICS [2] (Experimental Physics and Industrial Control System), implementing in each case the necessary extensions, state machines, and configuration process variables. The prototype is based on the NetCDF-4 (Network Common Data Format) file format [3], [4] in order to incorporate important features such as support for scientific data models, management of very large files, platform-independent encoding, and single-writer/multiple-readers concurrency. This contribution presents a complete description of the solution, together with the most relevant test results, focusing on the benefits and limitations of the applied technologies.
Abstract:
This paper describes a categorization module for improving the performance of a Spanish to Spanish Sign Language (LSE) translation system. The module replaces Spanish words with associated tags. When implementing it, several alternatives were studied for dealing with non-relevant words, that is, Spanish words that are not relevant to the translation process. The categorization module has been incorporated into a phrase-based system and into a Statistical Finite State Transducer (SFST). The evaluation results show that BLEU increased from 69.11% to 78.79% for the phrase-based system and from 69.84% to 75.59% for the SFST.
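A categorization module of this kind can be sketched as a dictionary lookup that replaces words with category tags before translation. The tag inventory, the word lists, and the two policies for non-relevant words (drop them, or map them to one generic tag) are illustrative assumptions; the abstract does not say which alternatives were studied.

```python
from typing import Dict, List, Set

# Hypothetical category dictionary: Spanish word -> tag.
CATEGORIES: Dict[str, str] = {
    "lunes": "DAY", "martes": "DAY",
    "uno": "NUM", "dos": "NUM", "tres": "NUM",
    "madrid": "CITY", "sevilla": "CITY",
}

# Hypothetical list of words judged not relevant for translation into LSE.
NON_RELEVANT: Set[str] = {"el", "la", "de", "que", "a"}


def categorize(sentence: str, policy: str = "drop") -> List[str]:
    """Replace known words with tags and handle non-relevant words by policy.

    policy="drop"    removes non-relevant words entirely.
    policy="generic" replaces each of them with a single NONREL tag.
    """
    out: List[str] = []
    for word in sentence.lower().split():
        if word in NON_RELEVANT:
            if policy == "generic":
                out.append("NONREL")
            continue
        out.append(CATEGORIES.get(word, word))  # unknown words pass through
    return out


print(categorize("El tren a Sevilla sale el lunes"))
print(categorize("El tren a Sevilla sale el lunes", policy="generic"))
```

Tagging collapses many surface words (every city, every weekday) onto one token, which is one plausible reason such a module helps data-sparse statistical translators.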
Abstract:
This master's thesis project develops a simulation model of the Cookies platform and defines a design interface that captures the platform's main distinguishing feature: its modularity. To this end, a structure based on four independent submodels is proposed, one for each layer of the platform, each defined as a finite state machine (FSM). For each layer, several models are created to verify that all of the node's functionalities are independent of one another, thus preserving the modularity that characterizes the Cookies platform.
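The four-submodel structure can be illustrated with one independent finite state machine per layer, all exposing the same small interface so that any layer model can be swapped without touching the others. The layer names, states, and events below are hypothetical; the abstract does not list them.

```python
from typing import Dict, Tuple


class LayerFSM:
    """A finite state machine for one platform layer.

    Transitions map (state, event) -> next state; events a layer does not
    know leave its state unchanged, so no layer depends on another's internals.
    """

    def __init__(self, name: str, initial: str, transitions: Dict[Tuple[str, str], str]):
        self.name = name
        self.state = initial
        self.transitions = transitions

    def handle(self, event: str) -> str:
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state


# Hypothetical node built from four interchangeable layer models.
node = {
    "processing": LayerFSM("processing", "idle",
                           {("idle", "wake"): "active", ("active", "sleep"): "idle"}),
    "communication": LayerFSM("communication", "off",
                              {("off", "wake"): "listening",
                               ("listening", "packet"): "receiving",
                               ("receiving", "done"): "listening"}),
    "sensing": LayerFSM("sensing", "off", {("off", "wake"): "sampling"}),
    "power": LayerFSM("power", "battery", {("battery", "low"): "saving"}),
}

# Broadcast an event: each layer reacts only through its own transition table.
for layer in node.values():
    layer.handle("wake")
print({name: layer.state for name, layer in node.items()})
```

Replacing, say, the communication model with another `LayerFSM` leaves the other three layers unaffected, which is the independence property the project sets out to check.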
Abstract:
Collaborative filtering recommender systems help alleviate the information overload that exists on the Internet as a result of the mass use of Web 2.0 applications. The choice of an adequate similarity measure is a determining factor in the quality of the recommender system's predictions and recommendations, as well as in its performance. In this paper, we present a memory-based collaborative filtering similarity measure that provides extremely high-quality and balanced results; these results are complemented by a low processing time (high performance), similar to that required by traditional similarity metrics. The experiments were carried out on the MovieLens and Netflix databases, using a representative set of information retrieval quality measures.
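The abstract does not give the proposed measure itself, so the sketch below shows the kind of traditional baseline it is compared against: Pearson correlation over co-rated items, followed by a mean-centered, similarity-weighted rating prediction. The toy ratings are invented for illustration.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(r: Ratings, u: str, v: str) -> float:
    """Pearson correlation between users u and v over their co-rated items."""
    common = set(r[u]) & set(r[v])
    if len(common) < 2:
        return 0.0
    mu = sum(r[u][i] for i in common) / len(common)
    mv = sum(r[v][i] for i in common) / len(common)
    num = sum((r[u][i] - mu) * (r[v][i] - mv) for i in common)
    den = math.sqrt(sum((r[u][i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((r[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def predict(r: Ratings, u: str, item: str) -> float:
    """Predict u's rating for item from the other users who rated it."""
    mean_u = sum(r[u].values()) / len(r[u])
    num = den = 0.0
    for v in r:
        if v == u or item not in r[v]:
            continue
        s = pearson(r, u, v)
        mean_v = sum(r[v].values()) / len(r[v])
        num += s * (r[v][item] - mean_v)  # neighbor's deviation from own mean
        den += abs(s)
    return mean_u + num / den if den else mean_u


ratings: Ratings = {
    "ana":  {"m1": 5, "m2": 3, "m3": 4},
    "ben":  {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carl": {"m1": 1, "m2": 5, "m3": 2, "m4": 2},
}
print(round(pearson(ratings, "ana", "ben"), 3))
print(round(predict(ratings, "ana", "m4"), 2))
```

Every prediction requires a pass over all neighbors and their co-rated items; that cost is the "processing time" against which a new memory-based measure is benchmarked.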
Abstract:
Starting from the way intercellular communication takes place through protein channels, and from standard knowledge about how neurons function, we propose a computing model called a tissue P system, which processes symbols in a multiset-rewriting sense in a net of cells similar to a neural net. Each cell has a finite-state memory, processes multisets of symbol-impulses, and can send impulses (“excitations”) to neighboring cells. Such cell nets are shown to be rather powerful: they can simulate a Turing machine even with a small number of cells, each having a small number of states. Moreover, when each cell works in the maximal manner and can excite all the cells to which it can send impulses, the Hamiltonian Path Problem can easily be solved in linear time. A new characterization of the Parikh images of ET0L languages is also obtained in this framework.
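A single cell's step can be sketched with multisets represented as `Counter` objects: a rule rewrites a multiset in state s into a new state, keeping some symbols and emitting others as impulses. Here the first enabled rule is applied as many times as the multiset allows, and in replicative mode every neighbor receives a copy of the emitted impulses. The rule format is a simplification for illustration, not the paper's formal definition of the modes.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Simplified rule: (state, consumed, new_state, kept_here, emitted).
Rule = Tuple[str, Counter, str, Counter, Counter]


def times_applicable(contents: Counter, consumed: Counter) -> int:
    """How many disjoint copies of `consumed` fit inside `contents`."""
    return min((contents[sym] // n for sym, n in consumed.items()), default=0)


def step(state: str, contents: Counter, rules: List[Rule]):
    """Apply the first enabled rule as many times as possible.

    Returns the new state, the cell's new contents, and the emitted impulses.
    """
    for s, consumed, s_new, kept, emitted in rules:
        k = times_applicable(contents, consumed) if s == state else 0
        if k > 0:
            rest = contents.copy()
            for sym, n in consumed.items():
                rest[sym] -= n * k
            for sym, n in kept.items():
                rest[sym] += n * k
            out = Counter({sym: n * k for sym, n in emitted.items()})
            return s_new, +rest, out  # unary + drops zero counts
    return state, contents, Counter()


rules: List[Rule] = [
    # In state "s", each pair "aa" becomes one "b" kept here and one "c" sent out.
    ("s", Counter("aa"), "t", Counter("b"), Counter("c")),
]
state, contents, emitted = step("s", Counter("aaaaa"), rules)

# Replicative communication: each neighbor gets its own copy of the impulses.
neighbors = ["cell2", "cell3"]
delivered: Dict[str, Counter] = {n: emitted.copy() for n in neighbors}
print(state, dict(contents), {n: dict(c) for n, c in delivered.items()})
```

Replication is what lets the number of impulses grow quickly across the net, which underlies the linear-time result mentioned above.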
Abstract:
Most empirical disciplines promote the reuse and sharing of datasets, as this makes replication more feasible. While this is increasingly the case in Empirical Software Engineering (ESE), some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of studies based on biased datasets may be suspect. This issue has caused considerable consternation in the ESE literature in recent years. However, one confounding factor of these datasets has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples could be smaller or larger. Smaller datasets generally provide less reliable bases for estimating models and thus could lead to inferior model performance. In this setting, we ask: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis using simulated datasets sampled with bias from a high-quality dataset that is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUC and F-score. This indicates that, at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue and raises further issues to be explored in the future.
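The two information-retrieval measures the study relies on can be computed directly from a defect-prediction model's scores. The sketch computes F-score at a fixed threshold and AUC through its rank (Mann-Whitney) formulation; the labels, scores, and the 0.5 threshold are invented for illustration.

```python
from typing import List


def f_score(labels: List[int], scores: List[float], threshold: float = 0.5) -> float:
    """F1 of the predictions score >= threshold against binary labels (1 = buggy)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(f_score(labels, scores), 3), round(auc(labels, scores), 3))
```

Because AUC averages over all positive/negative pairs, a small sample contains few pairs and its AUC estimate varies more from sample to sample, which is one intuition for why size can matter as much as bias.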
Abstract:
We present in this paper a neural-like membrane system that solves the SAT problem in linear time. These neural P systems are nets of cells working with multisets. Each cell has a finite-state memory, processes multisets of symbol-impulses, and can send impulses (“excitations”) to neighboring cells. The maximal mode of rule application and the replicative mode of communication between cells are at the core of the efficiency of these systems.
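The linear-time bound rests on replication doubling the number of working copies at each step, so that after n steps all 2^n truth assignments exist in parallel and each is checked independently. The sequential Python sketch below only mimics that generate-and-check structure (on a conventional machine it naturally takes exponential time); encoding clauses as signed integers is an assumption borrowed from the DIMACS convention.

```python
from typing import List

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means not x3


def generate_assignments(n: int) -> List[List[bool]]:
    """Each step splits every partial assignment in two, as replication would."""
    cells: List[List[bool]] = [[]]
    for _ in range(n):  # n doubling steps -> 2**n cells
        cells = [c + [value] for c in cells for value in (True, False)]
    return cells


def satisfies(assignment: List[bool], clauses: List[Clause]) -> bool:
    """Check one assignment; in the membrane system every cell does this in parallel."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def sat(n: int, clauses: List[Clause]) -> bool:
    return any(satisfies(a, clauses) for a in generate_assignments(n))


formula = [[1, -2], [2, 3], [-1, -3]]  # (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
print(sat(3, formula), sat(1, [[1], [-1]]))
```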
Abstract:
The proliferation of multimedia production in every domain is giving rise to new paradigms of visual information retrieval. Among the most significant are visual information retrieval systems (VIRS), in which one of the most representative tasks is ranking a population of images by their similarity to a given example. This work presents an original proposal for evaluating the similarity between two images, based on extending the concept of saliency from image space to feature space in order to establish the relevance of each component of the feature vector. To this end, methodologies are introduced for quantifying the saliency of individual feature values, for combining these quantifications when comparing two images, and, finally, for weighting each feature according to this combination. The results of evaluating this proposal on a content-based image retrieval task are also presented and compared with those obtained using the Euclidean distance. The comparison is carried out by having volunteers assess both sets of results.
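The abstract does not give the saliency formulas, so the following is only a hypothetical stand-in for the idea of moving saliency from image space to feature space: a feature value is treated as salient when it lies far from the population mean (in standard deviations), the saliencies of the two images are combined by averaging, and the normalized combination weights each term of a Euclidean distance. The feature vectors and population are invented.

```python
import math
from typing import List, Tuple

Vector = List[float]


def feature_stats(population: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-feature mean and standard deviation over the image population."""
    n, dims = len(population), len(population[0])
    means = [sum(v[d] for v in population) / n for d in range(dims)]
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in population) / n) or 1.0
            for d in range(dims)]  # a zero std falls back to 1.0
    return means, stds


def saliency(vec: Vector, means: Vector, stds: Vector) -> Vector:
    """Hypothetical saliency of each value: distance from the mean in std units."""
    return [abs(x - m) / s for x, m, s in zip(vec, means, stds)]


def salient_distance(a: Vector, b: Vector, means: Vector, stds: Vector) -> float:
    """Euclidean distance with each feature weighted by the pair's mean saliency."""
    sa, sb = saliency(a, means, stds), saliency(b, means, stds)
    weights = [(x + y) / 2 for x, y in zip(sa, sb)]
    total = sum(weights) or 1.0
    return math.sqrt(sum(w / total * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


population = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
means, stds = feature_stats(population)
query = [0.5, 1.0]
candidates = {"A": [0.0, 1.0], "B": [0.5, 0.5]}
for name, vec in candidates.items():
    print(name, round(math.dist(query, vec), 3),
          round(salient_distance(query, vec, means, stds), 3))
```

Both candidates are equally far from the query in plain Euclidean terms; the weighting breaks the tie in favor of A, whose difference lies along the feature on which the query itself is unremarkable.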
Abstract:
Over the last few decades, the ever-increasing output of scientific publications has made it harder to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who must locate the exact papers they need for their clinical and research work among a huge number of publications. Against this backdrop, novel information retrieval methods are all the more necessary. While web search engines are widespread and facilitate access to all kinds of information, additional tools are required to automatically link the information retrieved by these engines to specific biomedical applications. In clinical environments, this also means considering aspects such as patient data security and confidentiality, or structured content, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool that facilitates building queries to retrieve scientific literature related to EHRs. Results: We have developed CDAPubMed, an open-source web browser extension that integrates EHR features into biomedical literature retrieval. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7 Clinical Document Architecture standard (HL7-CDA); (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can tune to each specific situation; and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions: CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. It is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface.
It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations because the queries are focused on characteristics identified within the EHR. For instance, whereas the query "breast neoplasm" retrieved more than 200,000 citations, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. CDAPubMed is an open-source tool that can be freely used for non-profit purposes and integrated with other existing systems.
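Step (iii) can be sketched as combining MeSH terms extracted from an EHR into a Boolean PubMed query with PubMed's `[MeSH Terms]` field tag and `AND`, then encoding it into a search URL. The term list is hypothetical, and the sketch neither parses HL7-CDA nor contacts PubMed; the base URL is the public PubMed search page, not necessarily the endpoint CDAPubMed uses.

```python
from typing import List
from urllib.parse import urlencode

PUBMED_SEARCH = "https://pubmed.ncbi.nlm.nih.gov/"


def build_query(mesh_terms: List[str]) -> str:
    """AND together MeSH descriptors, each restricted to the MeSH field."""
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in mesh_terms)


def search_url(mesh_terms: List[str]) -> str:
    """URL-encode the query as the `term` parameter of a PubMed search."""
    return PUBMED_SEARCH + "?" + urlencode({"term": build_query(mesh_terms)})


# Hypothetical terms a configuration might extract from one patient's record.
terms = ["Breast Neoplasms", "Postmenopause", "Tamoxifen"]
print(build_query(terms))
print(search_url(terms))
```

Each additional `AND` clause can only shrink the result set, which is why adding patient features narrows a broad query so sharply.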
Abstract:
This paper aims to obtain a baseline snapshot of the requirements management process by using a two-stage questionnaire to identify both performed and non-performed CMMI practices. The proposed questionnaire may help assess the requirements management process, provide useful information about the current state of the process, and identify the practices that require immediate attention in order to begin a Software Process Improvement program.
Abstract:
This project consisted of building a geographic information system (GIS) for the UPM Campus Sur that can serve as a reference for deployment on any other university campus. The idea arose from campus users' need for a tool that lets them look up information about the campus's places and services, with particular emphasis on their geographic location. This required studying the current technologies for implementing a GIS, which led to the proposed system: a set of computing resources (hardware and software) that allows campus staff to obtain information about, and the location of, campus elements from their mobile phones. After analyzing the requirements and functionality the system needed, the project covered its design and implementation. The information is stored on a server accessible to campus staff. To this end, a data model based on the campus was created and the relevant geographic data were loaded into a database, using the Smallworld Core 4.2 software product. In addition, server software was deployed so that users can query the data from their phones over Wi-Fi or the Internet; Smallworld Geospatial Server 4.2 was used for this purpose. Queries use the WMS (Web Map Service) and WFS (Web Feature Service) services defined by the OGC (Open Geospatial Consortium), which are designed for retrieving geographic information.
The system also includes an application for Android mobile devices, designed and programmed during the project, that lets users query and display the campus's geographic information. The project also includes a study of the budget a real deployment would require and of the maintenance needed to keep the system up to date. Finally, it briefly describes future technologies that could extend the system's functionality: augmented reality and indoor positioning.
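Mobile clients query the server through standard OGC key-value request parameters. The sketch builds a WMS 1.1.1 `GetMap` request and a WFS 1.1.0 `GetFeature` request using the parameter names those specifications define; the server URL, layer and feature type names, and bounding box are hypothetical, since the abstract does not give the actual Smallworld Geospatial Server endpoints.

```python
from typing import Iterable, List
from urllib.parse import urlencode

# Hypothetical endpoint; the real server address is not given in the text.
BASE_URL = "http://gis.example.org/ows"


def wms_getmap(layers: List[str], bbox: Iterable[float], width: int = 512, height: int = 512,
               srs: str = "EPSG:4326", fmt: str = "image/png") -> str:
    """Build a WMS 1.1.1 GetMap URL (bbox = minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": ",".join(layers), "STYLES": "",
        "SRS": srs, "BBOX": ",".join(str(c) for c in bbox),
        "WIDTH": width, "HEIGHT": height, "FORMAT": fmt,
    }
    return BASE_URL + "?" + urlencode(params)


def wfs_getfeature(type_name: str, max_features: int = 50) -> str:
    """Build a WFS 1.1.0 GetFeature URL for one feature type."""
    params = {
        "SERVICE": "WFS", "VERSION": "1.1.0", "REQUEST": "GetFeature",
        "TYPENAME": type_name, "MAXFEATURES": max_features,
    }
    return BASE_URL + "?" + urlencode(params)


map_url = wms_getmap(["campus:buildings"], (-3.63, 40.38, -3.62, 40.39))
feature_url = wfs_getfeature("campus:services")
print(map_url)
print(feature_url)
```

WMS returns a rendered image for display, while WFS returns the features themselves (typically GML), which the app can use to show the attributes of a selected building or service.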
Abstract:
Human resource management over the Internet is a persistent problem for any job-search website, and it also arises in AFRICA BUILD Portal, an emerging socio-professional network created to build virtual communities that foster health research and education in African countries. One way to foster research and education is through the mobility of students and researchers between institutions, which is where the human resource management problem appears. This work therefore focuses on solving that problem in the specific setting of AFRICA BUILD Portal. The goal is to develop a recommender system that assists human resource management by selecting the best mobility offers and requests. The recommender is a semantic system that produces recommendations according to the rules and constraints imposed by the domain. The proposed approach follows that of semantic matchmaking systems: on the one hand, a description logic reasoner provides inferences used to compute the recommendations; on the other, natural language processing tools support the recommendation process. Finally, several web technologies were used to integrate the recommender system into AFRICA BUILD Portal.
The system was evaluated by comparing the recommendations it produced with those created by real users, and it showed acceptable behavior and performance. Using evaluation measures for information retrieval systems, the system achieved an average precision of 52%, a satisfactory figure for a semantic system. The implemented system is stable and modular. This allows, on the one hand, straightforward evolution aimed at higher performance through increased precision and, on the other, leaves open new lines of growth oriented toward exploiting the potential of AFRICA BUILD Portal through Web 3.0.
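Average precision against user-made recommendations can be computed as below: for each mobility offer, the system's ranked list is scored against the set the users judged relevant, and the per-offer values are averaged (mean average precision). The abstract does not state which variant produced the 52% figure, so this standard definition is an assumption, and the identifiers are invented.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(system: Dict[str, List[str]], users: Dict[str, Set[str]]) -> float:
    """Average the per-offer AP over every offer the users judged."""
    return sum(average_precision(system[q], users[q]) for q in users) / len(users)


# Hypothetical: candidate researchers ranked by the system for two mobility offers.
system = {"offer1": ["r3", "r1", "r7", "r2"], "offer2": ["r5", "r4", "r6"]}
users = {"offer1": {"r1", "r2"}, "offer2": {"r5"}}
print(round(mean_average_precision(system, users), 3))
```

Unlike plain precision, this measure rewards placing the user-endorsed candidates near the top of each list.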
Abstract:
Today's networked services are based on documents and hypertext links that relate them to one another without conveying real information about the content they represent; it could be described as “a network designed by people to be interpreted by people”. The main goal of recent years has been to steer this network toward a web of knowledge, in which information can be interpreted automatically by software agents. This transformation requires new technologies designed specifically for describing content, such as ontologies. Conventional networks are not the only ones evolving: the rapid growth of sensor networks and the sharp increase in the number of devices connected to the Internet make it necessary to bring semantic web technologies to these networks as well. This Final Degree Project uses the SSN ontology, designed for the semantic description of sensors and the networks they belong to, in order to enable better interaction between devices and the systems that use them. The work revolves around this ontology, with the main goal of semi-automatically generating code from a system model described in terms of the classes and properties provided by SSN. To this end, the project is divided into several parts. First, the ontology is analyzed. Next, a simulated sensor system is described. Finally, two applications are implemented: one automatically generates interfaces and the other graphically represents the system's devices, both starting from the system's representation in an OWL file.
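The code-generation step can be sketched as template expansion over a system model. Here the model is a plain dictionary standing in for what would be read from the OWL file (parsing the real SSN classes, such as sensors and the properties they observe, would require an RDF library), and the generated Python class per sensor is a hypothetical interface shape, not the project's actual output.

```python
from typing import Dict, List

# Hypothetical system model, as it might look after reading the OWL file:
# sensor individual -> the properties it observes and their units.
MODEL: Dict[str, List[Dict[str, str]]] = {
    "RoomSensor01": [
        {"property": "temperature", "unit": "celsius"},
        {"property": "humidity", "unit": "percent"},
    ],
    "DoorSensor02": [{"property": "open_state", "unit": "boolean"}],
}

CLASS_TEMPLATE = "class {name}Interface:\n    \"\"\"Generated interface for sensor {name}.\"\"\"\n{methods}"
METHOD_TEMPLATE = (
    "    def read_{prop}(self):\n"
    "        \"\"\"Return the latest {prop} observation ({unit}).\"\"\"\n"
    "        raise NotImplementedError\n"
)


def generate(model: Dict[str, List[Dict[str, str]]]) -> str:
    """Emit one interface class per sensor and one read_* method per observed property."""
    classes = []
    for name, observed in model.items():
        methods = "".join(
            METHOD_TEMPLATE.format(prop=o["property"], unit=o["unit"]) for o in observed)
        classes.append(CLASS_TEMPLATE.format(name=name, methods=methods))
    return "\n\n".join(classes)


source = generate(MODEL)
print(source)

# Load the generated code to confirm it is valid Python and defines the classes.
namespace: dict = {}
exec(source, namespace)
print(sorted(n for n in namespace if n.endswith("Interface")))
```

Keeping the templates separate from the model means a new sensor type described in the ontology yields a new interface without hand-written code.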
Abstract:
This Master's Thesis project involves building a software tool that monitors and manages the activity of Hydra, a tool for managing distributed environments, so that its load-balancing strategy conforms to the model created by GloBeM, an analysis methodology for distributed environments. GloBeM, an external methodology, can analyze a specific distributed system and build a finite-state machine model of it. Hydra, also an external tool, is a recently developed open-source management system for cloud environments whose load balancing is effective but somewhat limited. The software developed takes the model produced by GloBeM as input and analyzes it. It then monitors, in real time and at a given frequency, the activity of Hydra and of the cloud system it manages, and reconfigures Hydra's parameters so that its behavior adheres to the GloBeM model, thereby extending Hydra's original load-balancing system.
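The monitor-analyze-reconfigure cycle can be sketched as a loop that classifies each metrics sample into a state of the finite state machine derived from GloBeM, applies the load-balancing parameters associated with that state, and flags transitions the model does not allow. The state names, load thresholds, parameters, and the `read_metrics`/`apply_config` callbacks are hypothetical; the abstract does not describe Hydra's actual interface.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical FSM model: state -> (upper load bound, balancing parameters).
MODEL: List[Tuple[str, float, Dict[str, int]]] = [
    ("low", 0.3, {"max_vms_per_host": 8}),
    ("normal", 0.7, {"max_vms_per_host": 5}),
    ("overloaded", 1.0, {"max_vms_per_host": 3}),
]
# Transitions the hypothetical model allows.
ALLOWED = {("low", "normal"), ("normal", "low"), ("normal", "overloaded"), ("overloaded", "normal")}


def classify(load: float) -> str:
    """Map an observed average load to the first state whose bound covers it."""
    for state, bound, _ in MODEL:
        if load <= bound:
            return state
    return MODEL[-1][0]


def monitor(read_metrics: Callable[[], float], apply_config: Callable[[Dict[str, int]], None],
            samples: int, period_s: float = 0.0) -> List[str]:
    """Poll at a fixed period and reconfigure whenever the model state changes."""
    params = {s: p for s, _, p in MODEL}
    state: Optional[str] = None
    history: List[str] = []
    for _ in range(samples):
        observed = classify(read_metrics())
        if observed != state:
            if state is not None and (state, observed) not in ALLOWED:
                history.append(f"unexpected {state}->{observed}")  # deviation from the model
            apply_config(params[observed])
            state = observed
        history.append(state)
        time.sleep(period_s)
    return history


loads = iter([0.2, 0.5, 0.9, 0.1])
applied: List[Dict[str, int]] = []
history = monitor(lambda: next(loads), applied.append, samples=4)
print(history)
print(applied)
```

In a real deployment `read_metrics` would poll the cloud manager and `apply_config` would push parameters back; here both are stubbed so the loop's logic can be followed step by step.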