924 resultados para XML, Information, Retrieval, Query, Language
Resumo:
This paper presents the 2006 Miracle team’s approaches to the Ad-Hoc and Geographical Information Retrieval tasks. A first set of runs was obtained using a set of basic components. Then, by putting together special combinations of these runs, an extended set was obtained. With respect to previous campaigns some improvements have been introduced in our system: an entity recognition prototype is integrated in our tokenization scheme, and the performance of our indexing and retrieval engine has been improved. For GeoCLEF, we tested retrieving using geo-entity and textual references separately, and then combining them with different approaches.
Resumo:
The access to medical literature collections such as PubMed, MedScape or Cochrane has been increased notably in the last years by the web-based tools that provide instant access to the information. However, more sophisticated methodologies are needed to exploit efficiently all that information. The lack of advanced search methods in clinical domain produce that even using well-defined questions for a particular disease, clinicians receive too many results. Since no information analysis is applied afterwards, some relevant results which are not presented in the top of the resultant collection could be ignored by the expert causing an important loose of information. In this work we present a new method to improve scientific article search using patient information for query generation. Using federated search strategy, it is able to simultaneously search in different resources and present a unique relevant literature collection. And applying NLP techniques it presents semantically similar publications together, facilitating the identification of relevant information to clinicians. This method aims to be the foundation of a collaborative environment for sharing clinical knowledge related to patients and scientific publications.
Resumo:
Las redes sociales en la actualidad son muy relevantes, no solo ocupan mucho tiempo en la vida diaria de las personas si no que también sirve a millones de empresas para publicitarse entre otras cosas. Al fenómeno de las redes sociales se le ha unido la faceta empresarial. La liberación de las APIs de algunas redes sociales ha permitido el desarrollo de aplicaciones de todo tipo y que puedan tener diferentes objetivos como por ejemplo este proyecto. Este proyecto comenzó desde el interés por Ericsson del estudio del API de Google+ y sugerencias para dar valores añadidos a las empresas de telecomunicaciones. También ha complementando la referencia disponible en Ericsson y de los otros dos proyectos de recuperación de información de las redes sociales, añadiendo una serie de opciones para el usuario en la aplicación. Para ello, se ha analizado y realizado un ejemplo, de lo que podemos obtener de las redes sociales, principalmente Twitter y Google+. Lo primero en lo que se ha basado el proyecto ha sido en realizar un estudio teórico sobre el inicio de las redes sociales, el desarrollo y el estado en el que se encuentran, analizando así las principales redes sociales que existen y aportando una visión general sobre todas ellas. También se ha realizado un estado de arte sobre una serie de webs que se dedican al uso de esa información disponible en Internet. Posteriormente, de todas las redes sociales con APIs disponibles se realizó la elección de Google+ porque es una red social nueva aun por explorar y mejorar. Y la elección de Twitter por la serie de opciones y datos que se puede obtener de ella. De ambas se han estudiado sus APIs, para posteriormente con la información obtenida, realizar una aplicación prototipo que recogiera una serie de funciones útiles a partir de los datos de sus redes sociales. Por último se ha realizado una simple interfaz en la cual se puede acceder a los datos de la cuenta como si se estuviera en Twitter o Google+, además con los datos de Twitter se puede realizar una búsqueda avanzada con alertas, un análisis de sentimiento, ver tus mayores retweets de los que te siguen y por último realizar un seguimiento comparando lo que se comenta sobre dos temas determinados. Con este proyecto se ha pretendido proporcionar una idea general de todo lo relacionado con las redes sociales, las aplicaciones disponibles para trabajar con ellas, la información del API de Twitter y Google+ y un concepto de lo que se puede obtener. Today social networks are very relevant, they not only take a long time in daily life of people but also serve millions of businesses to advertise and other things. The phenomenon of social networks has been joined the business side. The release of the APIs of some social networks has allowed the development of applications of all types and different objectives such as this project. This project started from an interest in the study of Ericsson about Google+ API and suggestions to add value to telecommunications companies. This project has complementing the reference available in Ericsson and the other two projects of information retrieval of social networks, adding a number of options for the user in the application. To do this, we have analyzed and made an example of what we can get it from social networks, mainly Twitter and Google+. The first thing that has done in the project was to make a theoretical study on the initiation of social networks, the development and the state in which they are found, and analyze the major social networks that exist. There has also been made a state of art on a number of websites that are dedicated to the use of this information available online. Subsequently, about all the social networks APIs available, Google+ was choice because it is a new social network even to explore and improve. And the choice of Twitter for the number of options and data that can be obtained from it. In both APIs have been studied, and later with the information obtained, make a prototype application to collect a number of useful features from data of social networks. Finally there has been a simple interface, in which you can access the account as if you were on Twitter or Google+. With Twitter data can perform an advanced search with alerts, sentiment analysis, see retweets of who follow you and make comparing between two particular topics. This project is intended to provide an overview of everything related to social networks, applications available to work with them, information about API of Google+ and Twitter, and a concept of what you can get.
Resumo:
The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding i) linguistic information for data and vocabularies in different languages, ii) mappings between data with labels in different languages, and iii) services to dynamically access and traverse Linked Data across different languages. In this article we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.
Resumo:
Collaborative filtering recommender systems contribute to alleviating the problem of information overload that exists on the Internet as a result of the mass use of Web 2.0 applications. The use of an adequate similarity measure becomes a determining factor in the quality of the prediction and recommendation results of the recommender system, as well as in its performance. In this paper, we present a memory-based collaborative filtering similarity measure that provides extremely high-quality and balanced results; these results are complemented with a low processing time (high performance), similar to the one required to execute traditional similarity metrics. The experiments have been carried out on the MovieLens and Netflix databases, using a representative set of information retrieval quality measures.
Resumo:
Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significants concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of the studies based on biased datasets may be suspect. This issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor of these datasets that has not been examined carefully: size. Biased datasets are sampling only some of the data that could be sampled, and doing so in a biased fashion; but biased samples could be smaller, or larger. Smaller data sets in general provide less reliable bases for estimating models, and thus could lead to inferior model performance. In this setting, we ask the question, what affects performance more? bias, or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset which is relatively free of bias. Our results suggest that size always matters just as much bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUC and F-score. This indicates that at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further issues to be explored in the future.
Resumo:
La proliferación en todos los ámbitos de la producción multimedia está dando lugar a la aparición de nuevos paradigmas de recuperación de información visual. Dentro de éstos, uno de los más significativos es el de los sistemas de recuperación de información visual, VIRS (Visual Information Retrieval Systems), en los que una de las tareas más representativas es la ordenación de una población de imágenes según su similitud con un ejemplo dado. En este trabajo se presenta una propuesta original para la evaluación de la similitud entre dos imágenes, basándose en la extensión del concepto de saliencia desde el espacio de imágenes al de características para establecer la relevancia de cada componente de dicho vector. Para ello se introducen metodologías para la cuantificación de la saliencia de valores individuales de características, para la combinación de estas cuantificaciones en procesos de comparación entre dos imágenes, y para, finalmente, establecer la mencionada ponderación de cada característica en atención a esta combinación. Se presentan igualmente los resultados de evaluar esta propuesta en una tarea de recuperación de imágenes por contenido en comparación con los obtenidos con la distancia euclídea. Esta comparación se realiza mediante la evaluación de ambos resultados por voluntarios.
Resumo:
En esta tesis se estudia la representación, modelado y comparación de colecciones mediante el uso de ontologías en el ámbito de la Web Semántica. Las colecciones, entendidas como agrupaciones de objetos o elementos con entidad propia, son construcciones que aparecen frecuentemente en prácticamente todos los dominios del mundo real, y por tanto, es imprescindible disponer de conceptualizaciones de estas estructuras abstractas y de representaciones de estas conceptualizaciones en los sistemas informáticos, que definan adecuadamente su semántica. Mientras que en muchos ámbitos de la Informática y la Inteligencia Artificial, como por ejemplo la programación, las bases de datos o la recuperación de información, las colecciones han sido ampliamente estudiadas y se han desarrollado representaciones que responden a multitud de conceptualizaciones, en el ámbito de la Web Semántica, sin embargo, su estudio ha sido bastante limitado. De hecho hasta la fecha existen pocas propuestas de representación de colecciones mediante ontologías, y las que hay sólo cubren algunos tipos de colecciones y presentan importantes limitaciones. Esto impide la representación adecuada de colecciones y dificulta otras tareas comunes como la comparación de colecciones, algo crítico en operaciones habituales como las búsquedas semánticas o el enlazado de datos en la Web Semántica. Para solventar este problema esta tesis hace una propuesta de modelización de colecciones basada en una nueva clasificación de colecciones de acuerdo a sus características estructurales (homogeneidad, unicidad, orden y cardinalidad). Esta clasificación permite definir una taxonomía con hasta 16 tipos de colecciones distintas. Entre otras ventajas, esta nueva clasificación permite aprovechar la semántica de las propiedades estructurales de cada tipo de colección para realizar comparaciones utilizando las funciones de similitud y disimilitud más apropiadas. De este modo, la tesis desarrolla además un nuevo catálogo de funciones de similitud para las distintas colecciones, donde se han recogido las funciones de (di)similitud más conocidas y también algunas nuevas. Esta propuesta se ha implementado mediante dos ontologías paralelas, la ontología E-Collections, que representa los distintos tipos de colecciones de la taxonomía y su axiomática, y la ontología SIMEON (Similarity Measures Ontology) que representa los tipos de funciones de (di)similitud para cada tipo de colección. Gracias a estas ontologías, para comparar dos colecciones, una vez representadas como instancias de la clase más apropiada de la ontología E-Collections, automáticamente se sabe qué funciones de (di)similitud de la ontología SIMEON pueden utilizarse para su comparación. Abstract This thesis studies the representation, modeling and comparison of collections in the Semantic Web using ontologies. Collections, understood as groups of objects or elements with their own identities, are constructions that appear frequently in almost all areas of the real world. Therefore, it is essential to have conceptualizations of these abstract structures and representations of these conceptualizations in computer systems, that define their semantic properly. While in many areas of Computer Science and Artificial Intelligence, such as Programming, Databases or Information Retrieval, the collections have been extensively studied and there are representations that match many conceptualizations, in the field Semantic Web, however, their study has been quite limited. In fact, there are few representations of collections using ontologies so far, and they only cover some types of collections and have important limitations. This hinders a proper representation of collections and other common tasks like comparing collections, something critical in usual operations such as semantic search or linking data on the Semantic Web. To solve this problem this thesis makes a proposal for modelling collections based on a new classification of collections according to their structural characteristics (homogeneity, uniqueness, order and cardinality). This classification allows to define a taxonomy with up to 16 different types of collections. Among other advantages, this new classification can leverage the semantics of the structural properties of each type of collection to make comparisons using the most appropriate (dis)similarity functions. Thus, the thesis also develops a new catalog of similarity functions for the different types of collections. This catalog contains the most common (dis)similarity functions as well as new ones. This proposal is implemented through two parallel ontologies, the E-Collections ontology that represents the different types of collections in the taxonomy and their axiomatic, and the SIMEON ontology (Similarity Measures Ontology) that represents the types of (dis)similarity functions for each type of collection. Thanks to these ontologies, to compare two collections, once represented as instances of the appropriate class of E-Collections ontology, we can know automatically which (dis)similarity functions of the SIMEON ontology are suitable for the comparison. Finally, the feasibility and usefulness of this modeling and comparison of collections proposal is proved in the field of oenology, applying both E-Collections and SIMEON ontologies to the representation and comparison of wines with the E-Baco ontology.
USO DE TEORIAS NO CAMPO DE SISTEMAS DE INFORMAÇÃO: MAPEAMENTO USANDO TÉCNICAS DE MINERAÇÃO DE TEXTOS
Resumo:
Esta dissertação visa apresentar o mapeamento do uso das teorias de sistemas de informações, usando técnicas de recuperação de informação e metodologias de mineração de dados e textos. As teorias abordadas foram Economia de Custos de Transações (Transactions Costs Economics TCE), Visão Baseada em Recursos da Firma (Resource-Based View-RBV) e Teoria Institucional (Institutional Theory-IT), sendo escolhidas por serem teorias de grande relevância para estudos de alocação de investimentos e implementação em sistemas de informação, tendo como base de dados o conteúdo textual (em inglês) do resumo e da revisão teórica dos artigos dos periódicos Information System Research (ISR), Management Information Systems Quarterly (MISQ) e Journal of Management Information Systems (JMIS) no período de 2000 a 2008. Os resultados advindos da técnica de mineração textual aliada à mineração de dados foram comparadas com a ferramenta de busca avançada EBSCO e demonstraram uma eficiência maior na identificação de conteúdo. Os artigos fundamentados nas três teorias representaram 10% do total de artigos dos três períodicos e o período mais profícuo de publicação foi o de 2001 e 2007.(AU)
USO DE TEORIAS NO CAMPO DE SISTEMAS DE INFORMAÇÃO: MAPEAMENTO USANDO TÉCNICAS DE MINERAÇÃO DE TEXTOS
Resumo:
Esta dissertação visa apresentar o mapeamento do uso das teorias de sistemas de informações, usando técnicas de recuperação de informação e metodologias de mineração de dados e textos. As teorias abordadas foram Economia de Custos de Transações (Transactions Costs Economics TCE), Visão Baseada em Recursos da Firma (Resource-Based View-RBV) e Teoria Institucional (Institutional Theory-IT), sendo escolhidas por serem teorias de grande relevância para estudos de alocação de investimentos e implementação em sistemas de informação, tendo como base de dados o conteúdo textual (em inglês) do resumo e da revisão teórica dos artigos dos periódicos Information System Research (ISR), Management Information Systems Quarterly (MISQ) e Journal of Management Information Systems (JMIS) no período de 2000 a 2008. Os resultados advindos da técnica de mineração textual aliada à mineração de dados foram comparadas com a ferramenta de busca avançada EBSCO e demonstraram uma eficiência maior na identificação de conteúdo. Os artigos fundamentados nas três teorias representaram 10% do total de artigos dos três períodicos e o período mais profícuo de publicação foi o de 2001 e 2007.(AU)
Resumo:
Using an event-related functional MRI design, we explored the relative roles of dorsal and ventral prefrontal cortex (PFC) regions during specific components (Encoding, Delay, Response) of a working memory task under different memory-load conditions. In a group analysis, effects of increased memory load were observed only in dorsal PFC in the encoding period. Activity was lateralized to the right hemisphere in the high but not the low memory-load condition. Individual analyses revealed variability in activation patterns across subjects. Regression analyses indicated that one source of variability was subjects’ memory retrieval rate. It was observed that dorsal PFC plays a differentially greater role in information retrieval for slower subjects, possibly because of inefficient retrieval processes or a reduced quality of mnemonic representations. This study supports the idea that dorsal and ventral PFC play different roles in component processes of working memory.
Resumo:
VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
Resumo:
In this paper, the new features that IR-n system applies on the topic processing for CL-SR are described. This set of features are based on applying logic forms to topics with the aim of incrementing the weight of topic terms according to a set of syntactic rules.
Resumo:
This paper describes a CL-SR system that employs two different techniques: the first one is based on NLP rules that consist on applying logic forms to the topic processing while the second one basically consists on applying the IR-n statistical search engine to the spoken document collection. The application of logic forms to the topics allows to increase the weight of topic terms according to a set of syntactic rules. Thus, the weights of the topic terms are used by IR-n system in the information retrieval process.
Resumo:
Proyecto emergente centrado en la desambiguación de topónimos y la detección del foco geográfico en el texto. La finalidad es mejorar el rendimiento de los sistemas de recuperación de información geográfica. Se describen los problemas abordados, la hipótesis de trabajo, las tareas a realizar y los objetivos parciales alcanzados.