23 resultados para Semantic web, Search engine optimization, Information retrieval, Key concept induction

em Universidad de Alicante


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we explore the use of semantic classes in an existing information retrieval system in order to improve its results. Thus, we use two different ontologies of semantic classes (WordNet domain and Basic Level Concepts) in order to re-rank the retrieved documents and obtain better recall and precision. Finally, we implement a new method for weighting the expanded terms taking into account the weights of the original query terms and their relations in WordNet with respect to the new ones (which have demonstrated to improve the results). The evaluation of these approaches was carried out in the CLEF Robust-WSD Task, obtaining an improvement of 1.8% in GMAP for the semantic classes approach and 10% in MAP employing the WordNet term weighting approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present a complete system for the treatment of both geographical and temporal dimensions in text and its application to information retrieval. This system has been evaluated in both the GeoTime task of the 8th and 9th NTCIR workshop in the years 2010 and 2011 respectively, making it possible to compare the system to contemporary approaches to the topic. In order to participate in this task we have added the temporal dimension to our GIR system. The system proposed here has a modular architecture in order to add or modify features. In the development of this system, we have followed a QA-based approach as well as multi-search engines to improve the system performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Information Retrieval systems normally have to work with rather heterogeneous sources, such as Web sites or documents from Optical Character Recognition tools. The correct conversion of these sources into flat text files is not a trivial task since noise may easily be introduced as a result of spelling or typeset errors. Interestingly, this is not a great drawback when the size of the corpus is sufficiently large, since redundancy helps to overcome noise problems. However, noise becomes a serious problem in restricted-domain Information Retrieval specially when the corpus is small and has little or no redundancy. This paper devises an approach which adds noise-tolerance to Information Retrieval systems. A set of experiments carried out in the agricultural domain proves the effectiveness of the approach presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La búsqueda es una de las actividades centrales en el mundo digital y, por tanto, uno de los elementos clave en el análisis de cibermedios, ya que una parte de sus audiencias y de sus ingresos procede de las páginas de resultados de los buscadores (SERP). En este trabajo, presentamos algunas de las herramientas de análisis de posicionamiento SEO más utilizadas con el fin de considerar su aplicación en estudios académicos sobre cibermedios. Aplicamos los nueve indicadores más importantes de estas herramientas a la página principal de cuatro cibermedios generalistas espanoles con el fin de estimar su viabilidad como indicadores alternativos al PageRank y otros indicadores de Google.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays there is a big amount of biomedical literature which uses complex nouns and acronyms of biological entities thus complicating the task of retrieval specific information. The Genomics Track works for this goal and this paper describes the approach we used to take part of this track of TREC 2007. As this is the first time we participate in this track, we configurated a new system consisting of the following diferenciated parts: preprocessing, passage generation, document retrieval and passage (with the answer) extraction. We want to call special attention to the textual retrieval system used, which was developed by the University of Alicante. Adapting the resources for the propouse, our system has obtained precision results over the mean and median average of the 66 official runs for the Document, Aspect and Passage2 MAP; and in the case of Passage MAP we get nearly the median and mean value. We want to emphasize we have obtained these results without incorporating specific information about the domain of the track. For the future, we would like to further develop our system in this direction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automatic Text Summarization has been shown to be useful for Natural Language Processing tasks such as Question Answering or Text Classification and other related fields of computer science such as Information Retrieval. Since Geographical Information Retrieval can be considered as an extension of the Information Retrieval field, the generation of summaries could be integrated into these systems by acting as an intermediate stage, with the purpose of reducing the document length. In this manner, the access time for information searching will be improved, while at the same time relevant documents will be also retrieved. Therefore, in this paper we propose the generation of two types of summaries (generic and geographical) applying several compression rates in order to evaluate their effectiveness in the Geographical Information Retrieval task. The evaluation has been carried out using GeoCLEF as evaluation framework and following an Information Retrieval perspective without considering the geo-reranking phase commonly used in these systems. Although single-document summarization has not performed well in general, the slight improvements obtained for some types of the proposed summaries, particularly for those based on geographical information, made us believe that the integration of Text Summarization with Geographical Information Retrieval may be beneficial, and consequently, the experimental set-up developed in this research work serves as a basis for further investigations in this field.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Este artículo estudia la redefinición del lenguaje periodístico ante la influencia de tres tendencias de la comunicación digital: el Search Engine Optimization (SEO), la inmediatez de las redes sociales y la fragmentación de contenidos mediante enlaces. Para ello, se compara el punto de vista de los editores web de los ocho medios españoles con mayor audiencia con las iniciativas implantadas en sus redacciones y las aportaciones teóricas sobre la escritura ciberperiodística. Este análisis revela la importancia que los medios digitales otorgan a la creación de contenidos adaptados a la web, pero también sus carencias en la adopción de acciones concretas para favorecer su desarrollo.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last few years, there has been a wide development in the research on textual information systems. The goal is to improve these systems in order to allow an easy localization, treatment and access to the information stored in digital format (Digital Databases, Documental Databases, and so on). There are lots of applications focused on information access (for example, Web-search systems like Google or Altavista). However, these applications have problems when they must access to cross-language information, or when they need to show information in a language different from the one of the query. This paper explores the use of syntactic-sematic patterns as a method to access to multilingual information, and revise, in the case of Information Retrieval, where it is possible and useful to employ patterns when it comes to the multilingual and interactive aspects. On the one hand, the multilingual aspects that are going to be studied are the ones related to the access to documents in different languages from the one of the query, as well as the automatic translation of the document, i.e. a machine translation system based on patterns. On the other hand, this paper is going to go deep into the interactive aspects related to the reformulation of a query based on the syntactic-semantic pattern of the request.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The exponential increase of subjective, user-generated content since the birth of the Social Web, has led to the necessity of developing automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task, by proposing a unified framework, composed of three crucial components (information retrieval, opinion mining and text summarization) that allow the retrieval, classification and summarization of subjective information. An extensive analysis is conducted, where different configurations of the framework are suggested and analyzed, in order to determine which is the best one, and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as the framework as a whole. By achieving an improvement over 10% compared to the state-of-the-art approaches in the context of blogs, we can conclude that subjective text can be efficiently dealt with by means of our proposed framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a CL-SR system that employs two different techniques: the first one is based on NLP rules that consist on applying logic forms to the topic processing while the second one basically consists on applying the IR-n statistical search engine to the spoken document collection. The application of logic forms to the topics allows to increase the weight of topic terms according to a set of syntactic rules. Thus, the weights of the topic terms are used by IR-n system in the information retrieval process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The goal of the project is to analyze, experiment, and develop intelligent, interactive and multilingual Text Mining technologies, as a key element of the next generation of search engines, systems with the capacity to find "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), it will be able to find and organize information, rather than generating ranked lists of websites.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Este trabajo presenta el uso de una ontología en el dominio financiero para la expansión de consultas con el fin de mejorar los resultados de un sistema de recuperación de información (RI) financiera. Este sistema está compuesto por una ontología y un índice de Lucene que permite recuperación de conceptos identificados mediante procesamiento de lenguaje natural. Se ha llevado a cabo una evaluación con un conjunto limitado de consultas y los resultados indican que la ambigüedad sigue siendo un problema al expandir la consulta. En ocasiones, la elección de las entidades adecuadas a la hora de expandir las consultas (filtrando por sector, empresa, etc.) permite resolver esa ambigüedad.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Business Intelligence (BI) applications have been gradually ported to the Web in search of a global platform for the consumption and publication of data and services. On the Internet, apart from techniques for data/knowledge management, BI Web applications need interfaces with a high level of interoperability (similar to the traditional desktop interfaces) for the visualisation of data/knowledge. In some cases, this has been provided by Rich Internet Applications (RIA). The development of these BI RIAs is a process traditionally performed manually and, given the complexity of the final application, it is a process which might be prone to errors. The application of model-driven engineering techniques can reduce the cost of development and maintenance (in terms of time and resources) of these applications, as they demonstrated by other types of Web applications. In the light of these issues, the paper introduces the Sm4RIA-B methodology, i.e., a model-driven methodology for the development of RIA as BI Web applications. In order to overcome the limitations of RIA regarding knowledge management from the Web, this paper also presents a new RIA platform for BI, called RI@BI, which extends the functionalities of traditional RIAs by means of Semantic Web technologies and B2B techniques. Finally, we evaluate the whole approach on a case study—the development of a social network site for an enterprise project manager.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Evacuation route planning is a fundamental task for building engineering projects. Safety regulations are established so that all occupants are driven on time out of a building to a secure place when faced with an emergency situation. As an example, Spanish building code requires the planning of evacuation routes on large and, usually, public buildings. Engineers often plan these routes on single building projects, repeatedly assigning clusters of rooms to each emergency exit in a trial-and-error process. But problems may arise for a building complex where distribution and use changes make visual analysis cumbersome and sometimes unfeasible. This problem could be solved by using well-known spatial analysis techniques, implemented as a specialized software able to partially emulate engineer reasoning. In this paper we propose and test an easily reproducible methodology that makes use of free and open source software components for solving a case study. We ran a complete test on a building floor at the University of Alicante (Spain). This institution offers a web service (WFS) that allows retrieval of 2D geometries from any building within its campus. We demonstrate how geospatial technologies and computational geometry algorithms can be used for automating the creation and optimization of evacuation routes. In our case study, the engineers’ task is to verify that the load capacity of each emergency exit does not exceed the standards specified by Spain’s current regulations. Using Dijkstra’s algorithm, we obtain the shortest paths from every room to the most appropriate emergency exit. Once these paths are calculated, engineers can run simulations and validate, based on path statistics, different cluster configurations. Techniques and tools applied in this research would be helpful in the design and risk management phases of any complex building project.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Introducción: Analizar la calidad de las páginas web de los servicios de catering en el ámbito escolar y su contenido en educación alimentaria, y tener una primera experiencia con la herramienta de evaluación EDALCAT. Material y métodos: Estudio descriptivo transversal. La población de estudio son páginas web de empresas de catering encargadas de la gestión de los comedores escolares. La muestra se obtuvo utilizando el buscador Google y un Ranking de las principales empresas de catering por facturación, escogiendo aquellas que tenían página web. Para la prueba piloto se seleccionaron diez páginas web según proximidad geográfica a la ciudad de Alicante y nivel de facturación. Para la evaluación de los sitios web se diseñó un cuestionario (EDALCAT), compuesto de un primer bloque de predictores de calidad con 19 variables de fiabilidad, diseño y navegación; y de un segundo bloque de contenidos específicos de educación alimentaria con 19 variables de contenido y actividades educativas. Resultados: Se han obtenido resultados positivos en 31 de las 38 variables del cuestionario, excepto en los ítems: “Buscador”, “Idioma” (40%) y “Ayuda” (10%) del bloque predictores de calidad y en los ítems: “Talleres”, “Recetario”, “Web alimentación-nutrición” (40%) y “Ejemplos” (30%) del bloque de contenidos específicos de educación alimentaria. Todas las páginas web evaluadas superan valores del 50% de cumplimiento de criterios de calidad y de contenidos mínimos en educación alimentaria, y sólo una de ellas, incumple el nivel de actividad mínimo establecido. Conclusiones: Los predictores de calidad y los contenidos específicos en educación alimentaria dieron buenos resultados en todas las páginas web evaluadas. La mayoría de ellas obtuvieron una alta puntuación en su valoración, y en su análisis individual por bloques. Tras el estudio piloto el cuestionario se ha modificado y se obtiene el EDALCAT definitivo. En líneas generales EDALCAT parece ser adecuado para evaluar la calidad de las páginas web de servicios de catering y su contenido en educación alimentaria, sin embargo el presente estudio no puede considerarse como validación del mismo.