955 resultados para Open source information retrieval
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
This paper reports a research to evaluate the potential and the effects of use of annotated Paraconsistent logic in automatic indexing. This logic attempts to deal with contradictions, concerned with studying and developing inconsistency-tolerant systems of logic. This logic, being flexible and containing logical states that go beyond the dichotomies yes and no, permits to advance the hypothesis that the results of indexing could be better than those obtained by traditional methods. Interactions between different disciplines, as information retrieval, automatic indexing, information visualization, and nonclassical logics were considered in this research. From the methodological point of view, an algorithm for treatment of uncertainty and imprecision, developed under the Paraconsistent logic, was used to modify the values of the weights assigned to indexing terms of the text collections. The tests were performed on an information visualization system named Projection Explorer (PEx), created at Institute of Mathematics and Computer Science (ICMC - USP Sao Carlos), with available source code. PEx uses traditional vector space model to represent documents of a collection. The results were evaluated by criteria built in the information visualization system itself, and demonstrated measurable gains in the quality of the displays, confirming the hypothesis that the use of the para-analyser under the conditions of the experiment has the ability to generate more effective clusters of similar documents. This is a point that draws attention, since the constitution of more significant clusters can be used to enhance information indexing and retrieval. It can be argued that the adoption of non-dichotomous (non-exclusive) parameters provides new possibilities to relate similar information.
Resumo:
This article describes the design, implementation, and experiences with AcMus, an open and integrated software platform for room acoustics research, which comprises tools for measurement, analysis, and simulation of rooms for music listening and production. Through use of affordable hardware, such as laptops, consumer audio interfaces and microphones, the software allows evaluation of relevant acoustical parameters with stable and consistent results, thus providing valuable information in the diagnosis of acoustical problems, as well as the possibility of simulating modifications in the room through analytical models. The system is open-source and based on a flexible and extensible Java plug-in framework, allowing for cross-platform portability, accessibility and experimentation, thus fostering collaboration of users, developers and researchers in the field of room acoustics.
Resumo:
[EN]This paper describes a wildfi re forecasting application based on a 3D virtual environment and a fi re simulation engine. A novel open source framework is presented for the development of 3D graphics applications over large geographic areas, off ering high performance 3D visualization and powerful interaction tools for the Geographic Information Systems (GIS) community. The application includes a remote module that allows simultaneous connection of several users for monitoring a real wildfi re event.
Resumo:
L'obiettivo della ricerca è di compiere un'analisi dell'impatto della cosiddetta cultura "open" alla luce dell'attuale condizione del World Wide Web. Si prenderà in considerazione, in particolare, la genesi del movimento a partire dalle basi di cultura hacker e la relativa evoluzione nella filosofia del software libero, con il fine ultimo di identificare il ruolo attuale del modello open source nello scenario esistente. L'introduzione al concetto di Open Access completerà la ricerca anche considerando la recente riaffermazione della conoscenza come bene comune all'interno della Società dell'Informazione
Resumo:
In questo lavoro si introducono i concetti di base di Natural Language Processing, soffermandosi su Information Extraction e analizzandone gli ambiti applicativi, le attività principali e la differenza rispetto a Information Retrieval. Successivamente si analizza il processo di Named Entity Recognition, focalizzando l’attenzione sulle principali problematiche di annotazione di testi e sui metodi per la valutazione della qualità dell’estrazione di entità. Infine si fornisce una panoramica della piattaforma software open-source di language processing GATE/ANNIE, descrivendone l’architettura e i suoi componenti principali, con approfondimenti sugli strumenti che GATE offre per l'approccio rule-based a Named Entity Recognition.
Resumo:
We describe the use of log file analysis to investigate whether the use of CSCL applications corresponds to its didactical purposes. Exemplarily we examine the use of the web-based system CommSy as software support for project-oriented university courses. We present two findings: (1) We suggest measures to shape the context of CSCL applications and support their initial and continuous use. (2) We show how log files can be used to analyze how, when and by whom a CSCL system is used and thus help to validate further empirical findings. However, log file analyses can only be interpreted reasonably when additional data concerning the context of use is available.
Resumo:
In this paper, we describe agent-based content retrieval for opportunistic networks, where requesters can delegate content retrieval to agents, which retrieve the content on their behalf. The approach has been implemented in CCNx, the open source CCN framework, and evaluated on Android smart phones. Evaluations have shown that the overhead of agent delegation is only noticeable for very small content. For content larger than 4MB, agent-based content retrieval can even result in a throughput increase of 20% compared to standard CCN download applications. The requester asks every probe interval for agents that have retrieved the desired content. Evaluations have shown that a probe interval of 30s delivers the best overall performance in our scenario because the number of transmitted notification messages can be decreased by up to 80% without significantly increasing the download time.
Resumo:
Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results: We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions: CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems.
Resumo:
The worldwide "hyper-connection" of any object around us is the challenge that promises to cover the paradigm of the Internet of Things. If the Internet has colonized the daily life of more than 2000 million1 people around the globe, the Internet of Things faces of connecting more than 100000 million2 "things" by 2020. The underlying Internet of Things’ technologies are the cornerstone that promises to solve interrelated global problems such as exponential population growth, energy management in cities, and environmental sustainability in the average and long term. On the one hand, this Project has the goal of knowledge acquisition about prototyping technologies available in the market for the Internet of Things. On the other hand, the Project focuses on the development of a system for devices management within a Wireless Sensor and Actuator Network to offer some services accessible from the Internet. To accomplish the objectives, the Project will begin with a detailed analysis of various “open source” hardware platforms to encourage creative development of applications, and automatically extract information from the environment around them for transmission to external systems. In addition, web platforms that enable mass storage with the philosophy of the Internet of Things will be studied. The project will culminate in the proposal and specification of a service-oriented software architecture for embedded systems that allows communication between devices on the network, and the data transmission to external systems. Furthermore, it abstracts the complexities of hardware to application developers. RESUMEN. La “hiper-conexión” a nivel mundial de cualquier objeto que nos rodea es el desafío al que promete dar cobertura el paradigma de la Internet de las Cosas. Si la Internet ha colonizado el día a día de más de 2000 millones1 de personas en todo el planeta, la Internet de las Cosas plantea el reto de conectar a más de 100000 millones2 de “cosas” para el año 2020. Las tecnologías subyacentes de la Internet de las Cosas son la piedra angular que prometen dar solución a problemas globales interrelacionados como el crecimiento exponencial de la población, la gestión de la energía en las ciudades o la sostenibilidad del medioambiente a largo plazo. Este Proyecto Fin de Carrera tiene como principales objetivos por un lado, la adquisición de conocimientos acerca de las tecnologías para prototipos disponibles en el mercado para la Internet de las Cosas, y por otro lado el desarrollo de un sistema para la gestión de dispositivos de una red inalámbrica de sensores que ofrezcan unos servicios accesibles desde la Internet. Con el fin de abordar los objetivos marcados, el proyecto comenzará con un análisis detallado de varias plataformas hardware de tipo “open source” que estimulen el desarrollo creativo de aplicaciones y que permitan extraer de forma automática información del medio que les rodea para transmitirlo a sistemas externos para su posterior procesamiento. Por otro lado, se estudiarán plataformas web identificadas con la filosofía de la Internet de las Cosas que permitan el almacenamiento masivo de datos que diferentes plataformas hardware transfieren a través de la Internet. El Proyecto culminará con la propuesta y la especificación una arquitectura software orientada a servicios para sistemas empotrados que permita la comunicación entre los dispositivos de la red y la transmisión de datos a sistemas externos, así como facilitar el desarrollo de aplicaciones a los programadores mediante la abstracción de la complejidad del hardware.
Resumo:
Parte de la investigación biomédica actual se encuentra centrada en el análisis de datos heterogéneos. Estos datos pueden tener distinto origen, estructura, y semántica. Gran cantidad de datos de interés para los investigadores se encuentran en bases de datos públicas, que recogen información de distintas fuentes y la ponen a disposición de la comunidad de forma gratuita. Para homogeneizar estas fuentes de datos públicas con otras de origen privado, existen diversas herramientas y técnicas que permiten automatizar los procesos de homogeneización de datos heterogéneos. El Grupo de Informática Biomédica (GIB) [1] de la Universidad Politécnica de Madrid colabora en el proyecto europeo P-medicine [2], cuya finalidad reside en el desarrollo de una infraestructura que facilite la evolución de los procedimientos médicos actuales hacia la medicina personalizada. Una de las tareas enmarcadas en el proyecto P-medicine que tiene asignado el grupo consiste en elaborar herramientas que ayuden a usuarios en el proceso de integración de datos contenidos en fuentes de información heterogéneas. Algunas de estas fuentes de información son bases de datos públicas de ámbito biomédico contenidas en la plataforma NCBI [3] (National Center for Biotechnology Information). Una de las herramientas que el grupo desarrolla para integrar fuentes de datos es Ontology Annotator. En una de sus fases, la labor del usuario consiste en recuperar información de una base de datos pública y seleccionar de forma manual los resultados relevantes. Para automatizar el proceso de búsqueda y selección de resultados relevantes, por un lado existe un gran interés en conseguir generar consultas que guíen hacia resultados lo más precisos y exactos como sea posible, por otro lado, existe un gran interés en extraer información relevante de elevadas cantidades de documentos, lo cual requiere de sistemas que analicen y ponderen los datos que caracterizan a los mismos. En el campo informático de la inteligencia artificial, dentro de la rama de la recuperación de la información, existen diversos estudios acerca de la expansión de consultas a partir de retroalimentación relevante que podrían ser de gran utilidad para dar solución a la cuestión. Estos estudios se centran en técnicas para reformular o expandir la consulta inicial utilizando como realimentación los resultados que en una primera instancia fueron relevantes para el usuario, de forma que el nuevo conjunto de resultados tenga mayor proximidad con los que el usuario realmente desea. El objetivo de este trabajo de fin de grado consiste en el estudio, implementación y experimentación de métodos que automaticen el proceso de extracción de información trascendente de documentos, utilizándola para expandir o reformular consultas. De esta forma se pretende mejorar la precisión y el ranking de los resultados asociados. Dichos métodos serán integrados en la herramienta Ontology Annotator y enfocados a la fuente de datos de PubMed [4].---ABSTRACT---Part of the current biomedical research is focused on the analysis of heterogeneous data. These data may have different origin, structure and semantics. A big quantity of interesting data is contained in public databases which gather information from different sources and make it open and free to be used by the community. In order to homogenize thise sources of public data with others which origin is private, there are some tools and techniques that allow automating the processes of integration heterogeneous data. The biomedical informatics group of the Universidad Politécnica de Madrid cooperates with the European project P-medicine which main purpose is to create an infrastructure and models to facilitate the transition from current medical practice to personalized medicine. One of the tasks of the project that the group is in charge of consists on the development of tools that will help users in the process of integrating data from diverse sources. Some of the sources are biomedical public data bases from the NCBI platform (National Center for Biotechnology Information). One of the tools in which the group is currently working on for the integration of data sources is called the Ontology Annotator. In this tool there is a phase in which the user has to retrieve information from a public data base and select the relevant data contained in it manually. For automating the process of searching and selecting data on the one hand, there is an interest in automatically generating queries that guide towards the more precise results as possible. On the other hand, there is an interest on retrieve relevant information from large quantities of documents. The solution requires systems that analyze and weigh the data allowing the localization of the relevant items. In the computer science field of the artificial intelligence, in the branch of information retrieval there are diverse studies about the query expansion from relevance feedback that could be used to solve the problem. The main purpose of this studies is to obtain a set of results that is the closer as possible to the information that the user really wants to retrieve. In order to reach this purpose different techniques are used to reformulate or expand the initial query using a feedback the results that where relevant for the user, with this method, the new set of results will have more proximity with the ones that the user really desires. The goal of this final dissertation project consists on the study, implementation and experimentation of methods that automate the process of extraction of relevant information from documents using this information to expand queries. This way, the precision and the ranking of the results associated will be improved. These methods will be integrated in the Ontology Annotator tool and will focus on the PubMed data source.
Resumo:
VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
Resumo:
El campo de procesamiento de lenguaje natural (PLN), ha tenido un gran crecimiento en los últimos años; sus áreas de investigación incluyen: recuperación y extracción de información, minería de datos, traducción automática, sistemas de búsquedas de respuestas, generación de resúmenes automáticos, análisis de sentimientos, entre otras. En este artículo se presentan conceptos y algunas herramientas con el fin de contribuir al entendimiento del procesamiento de texto con técnicas de PLN, con el propósito de extraer información relevante que pueda ser usada en un gran rango de aplicaciones. Se pueden desarrollar clasificadores automáticos que permitan categorizar documentos y recomendar etiquetas; estos clasificadores deben ser independientes de la plataforma, fácilmente personalizables para poder ser integrados en diferentes proyectos y que sean capaces de aprender a partir de ejemplos. En el presente artículo se introducen estos algoritmos de clasificación, se analizan algunas herramientas de código abierto disponibles actualmente para llevar a cabo estas tareas y se comparan diversas implementaciones utilizando la métrica F en la evaluación de los clasificadores.
Resumo:
Evacuation route planning is a fundamental task for building engineering projects. Safety regulations are established so that all occupants are driven on time out of a building to a secure place when faced with an emergency situation. As an example, Spanish building code requires the planning of evacuation routes on large and, usually, public buildings. Engineers often plan these routes on single building projects, repeatedly assigning clusters of rooms to each emergency exit in a trial-and-error process. But problems may arise for a building complex where distribution and use changes make visual analysis cumbersome and sometimes unfeasible. This problem could be solved by using well-known spatial analysis techniques, implemented as a specialized software able to partially emulate engineer reasoning. In this paper we propose and test an easily reproducible methodology that makes use of free and open source software components for solving a case study. We ran a complete test on a building floor at the University of Alicante (Spain). This institution offers a web service (WFS) that allows retrieval of 2D geometries from any building within its campus. We demonstrate how geospatial technologies and computational geometry algorithms can be used for automating the creation and optimization of evacuation routes. In our case study, the engineers’ task is to verify that the load capacity of each emergency exit does not exceed the standards specified by Spain’s current regulations. Using Dijkstra’s algorithm, we obtain the shortest paths from every room to the most appropriate emergency exit. Once these paths are calculated, engineers can run simulations and validate, based on path statistics, different cluster configurations. Techniques and tools applied in this research would be helpful in the design and risk management phases of any complex building project.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-04