937 results for Ontologies (Information Retrieval)


Relevance: 80.00%

Abstract:

Social resource sharing systems like YouTube and del.icio.us have acquired a large number of users within the last few years. They provide rich resources for data analysis, information retrieval, and knowledge discovery applications. A first step towards this end is to gain better insights into content and structure of these systems. In this paper, we will analyse the main network characteristics of two of these systems. We consider their underlying data structures – so-called folksonomies – as tri-partite hypergraphs, and adapt classical network measures like characteristic path length and clustering coefficient to them. Subsequently, we introduce a network of tag cooccurrence and investigate some of its statistical properties, focusing on correlations in node connectivity and pointing out features that reflect emergent semantics within the folksonomy. We show that simple statistical indicators unambiguously spot non-social behavior such as spam.
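
The tag co-occurrence network described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that two tags co-occur when the same user assigns both to the same resource (one "post"), and that the edge weight counts such posts. The sample assignments and the function names `tag_cooccurrence` and `node_strength` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical folksonomy: each tag assignment is a (user, tag, resource) triple,
# i.e. one hyperedge of the tripartite hypergraph.
assignments = [
    ("alice", "python", "r1"),
    ("alice", "code", "r1"),
    ("bob", "python", "r1"),
    ("bob", "tutorial", "r1"),
    ("bob", "python", "r2"),
    ("bob", "code", "r2"),
]


def tag_cooccurrence(assignments):
    """Return {(tag_a, tag_b): weight}, where weight counts the posts
    (user, resource pairs) in which both tags appear together (assumed definition)."""
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)
    weights = defaultdict(int)
    for tags in posts.values():
        for tag_a, tag_b in combinations(sorted(tags), 2):
            weights[(tag_a, tag_b)] += 1
    return dict(weights)


def node_strength(weights):
    """Sum of edge weights per tag, a simple connectivity statistic."""
    strength = defaultdict(int)
    for (tag_a, tag_b), weight in weights.items():
        strength[tag_a] += weight
        strength[tag_b] += weight
    return dict(strength)


weights = tag_cooccurrence(assignments)
strength = node_strength(weights)
print(weights)
print(strength)
```

Statistics such as the strength of a node and of its neighbours can then be compared across tags; which indicators actually separate spam from regular use is a question for the paper itself.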

Relevance: 80.00%

Abstract:

Presentation at the 1997 Dagstuhl Seminar "Evaluation of Multimedia Information Retrieval", Norbert Fuhr, Keith van Rijsbergen, Alan F. Smeaton (eds.), Dagstuhl Seminar Report 175, 14.04. - 18.04.97 (9716). - Abstract: This presentation will introduce ESCHER, a database editor which supports visualization in non-standard applications in engineering, science, tourism and the entertainment industry. It was originally based on the extended nested relational data model and is currently being extended to include object-relational properties like inheritance, object types, integrity constraints and methods. It serves as a research platform for areas such as multimedia and visual information systems, QBE-like queries, computer-supported concurrent work (CSCW) and novel storage techniques. In its role as a Visual Information System, a database editor must support browsing and navigation. ESCHER provides this access to data by means of so-called fingers, which generalize the cursor paradigm of graphical and text editors. On the graphical display, a finger is shown as a colored area corresponding to the object the finger is currently pointing at. In a table, more than one finger may point to objects; one of them is the active finger and is used for navigating through the table. The talk will mostly concentrate on giving examples of this type of navigation and will discuss some of the architectural needs for fast object traversal and display. ESCHER is available as public domain software from our ftp site in Kassel. The portable C source can be easily compiled for any machine running UNIX and OSF/Motif, in particular our working environments IBM RS/6000 and Intel-based LINUX systems. A port to Tcl/Tk is under way.

Relevance: 80.00%

Abstract:

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed that explains the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
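
To make the idea of fitting a mixture model to co-occurrence counts with EM more concrete, here is a small sketch. It assumes one common form of such a model, P(x, y) = sum over aspects a of P(a) P(x|a) P(y|a), and uses plain EM only; the improved EM variants mentioned in the abstract are not reproduced. The function name `fit_aspect_model` and the toy count matrix are hypothetical.

```python
import math
import random


def fit_aspect_model(counts, n_aspects, n_iter=50, seed=0):
    """Plain EM for P(x, y) = sum_a P(a) P(x|a) P(y|a) on a count matrix.

    Returns (p_a, p_x, p_y, log_liks); log_liks[k] is the log-likelihood
    of the parameters at the start of iteration k.
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])
    total = sum(sum(row) for row in counts)

    def random_dist(size):
        weights = [rng.random() + 0.1 for _ in range(size)]
        norm = sum(weights)
        return [w / norm for w in weights]

    p_a = random_dist(n_aspects)
    p_x = [random_dist(n_x) for _ in range(n_aspects)]  # p_x[a][i] = P(x_i | a)
    p_y = [random_dist(n_y) for _ in range(n_aspects)]  # p_y[a][j] = P(y_j | a)
    log_liks = []
    for _ in range(n_iter):
        acc_a = [0.0] * n_aspects
        acc_x = [[0.0] * n_x for _ in range(n_aspects)]
        acc_y = [[0.0] * n_y for _ in range(n_aspects)]
        log_lik = 0.0
        # E-step: split each observed count among the aspects (posterior weights).
        for i in range(n_x):
            for j in range(n_y):
                n = counts[i][j]
                if n == 0:
                    continue
                joint = [p_a[a] * p_x[a][i] * p_y[a][j] for a in range(n_aspects)]
                z = sum(joint)
                log_lik += n * math.log(z)
                for a in range(n_aspects):
                    r = n * joint[a] / z
                    acc_a[a] += r
                    acc_x[a][i] += r
                    acc_y[a][j] += r
        log_liks.append(log_lik)
        # M-step: renormalise the expected counts into probabilities.
        for a in range(n_aspects):
            if acc_a[a] > 0:
                p_x[a] = [v / acc_a[a] for v in acc_x[a]]
                p_y[a] = [v / acc_a[a] for v in acc_y[a]]
            p_a[a] = acc_a[a] / total
    return p_a, p_x, p_y, log_liks


# Toy co-occurrence counts with two visible blocks (hypothetical data).
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 1],
    [0, 0, 6, 4],
    [0, 1, 3, 5],
]
p_a, p_x, p_y, log_liks = fit_aspect_model(counts, n_aspects=2)
print(p_a)
```

Because every pair shares the same small set of aspects, pairs that were never observed still receive non-zero probability whenever both objects have mass under a common aspect, which is how this kind of model addresses sparseness.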

Relevance: 80.00%

Abstract:

Both Geographic Information Systems and Information Retrieval have been very important research fields in recent decades. Recently, a new research field called Geographic Information Retrieval has emerged from the confluence of these two fields. Its main goal is to define indexing structures and techniques for storing and retrieving documents efficiently, using both the textual references and the geographic references contained in the text. In this paper we present the architecture of a system for geographic information retrieval and define the workflow for extracting geographic references from documents. We also present a new indexing structure that combines an inverted index, a spatial index and an ontology. This structure improves on the query capabilities of other proposals.
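
As a rough illustration of how an inverted index, a spatial component and a place ontology can cooperate at query time, consider the sketch below. It is not the structure proposed in the paper: the class `GeoTextIndex`, its methods and the sample data are hypothetical, and a linear scan over point coordinates stands in for a real spatial index such as an R-tree.

```python
from collections import defaultdict


class GeoTextIndex:
    """Toy index combining term postings, place postings and a place hierarchy."""

    def __init__(self, ontology, coordinates):
        self.ontology = ontology        # place -> list of directly contained places
        self.coordinates = coordinates  # place -> (longitude, latitude)
        self.by_term = defaultdict(set)   # inverted index: term -> document ids
        self.by_place = defaultdict(set)  # place name -> document ids

    def add(self, doc_id, text, places):
        for term in text.lower().split():
            self.by_term[term].add(doc_id)
        for place in places:
            self.by_place[place].add(doc_id)

    def _expand(self, place):
        """Return the place together with everything it contains in the ontology."""
        found, stack = set(), [place]
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(self.ontology.get(current, []))
        return found

    def search_place(self, term, place):
        """Documents containing the term and mentioning the place or a sub-place."""
        docs = set()
        for name in self._expand(place):
            docs |= self.by_place.get(name, set())
        return docs & self.by_term.get(term.lower(), set())

    def search_window(self, term, min_lon, min_lat, max_lon, max_lat):
        """Documents containing the term and mentioning a place inside the window."""
        docs = set()
        for name, (lon, lat) in self.coordinates.items():
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                docs |= self.by_place.get(name, set())
        return docs & self.by_term.get(term.lower(), set())


# Hypothetical sample data; coordinates are approximate.
ontology = {"Spain": ["Galicia", "Madrid"], "Galicia": ["A Coruña", "Lugo"]}
coordinates = {"A Coruña": (-8.41, 43.36), "Lugo": (-7.56, 43.01), "Madrid": (-3.70, 40.42)}
index = GeoTextIndex(ontology, coordinates)
index.add(1, "harbour festival", ["A Coruña"])
index.add(2, "roman walls festival", ["Lugo"])
index.add(3, "museum festival", ["Madrid"])

in_galicia = index.search_place("festival", "Galicia")
in_window = index.search_window("festival", -9.5, 42.5, -8.0, 44.0)
print(in_galicia, in_window)
```

The ontology is what lets a query for "Galicia" match documents that only mention A Coruña or Lugo, while the window query relies purely on coordinates.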

Relevance: 80.00%

Abstract:

The new curricula adapted to the European Higher Education Area are already being implemented in our universities. This has led us to consider what changes we must make as teachers so that our students acquire the competences, abilities and skills they need to be competent professionals in the near future.

Relevance: 80.00%

Abstract:

This class introduces the basics of web mining and information retrieval, including, for example, an introduction to the Vector Space Model and Text Mining. Guest Lecturer: Dr. Michael Granitzer. Optional: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003 (Chapter 4, Text Analysis).
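
For readers unfamiliar with the Vector Space Model mentioned above, the following sketch ranks documents against a query using TF-IDF weights and cosine similarity. It is a generic textbook-style illustration, not material from the class; the documents, the whitespace tokenizer and the unsmoothed weighting log(N / df) are assumptions made for brevity.

```python
import math
from collections import Counter

# Hypothetical mini-collection.
docs = [
    "web mining finds patterns in web data",
    "information retrieval ranks documents for a query",
    "text mining extracts information from text",
]


def tokenize(text):
    return text.lower().split()


def inverse_document_frequency(docs):
    """idf(t) = log(N / df(t)), where df(t) is the number of documents containing t."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are ignored."""
    counts = Counter(tokenize(text))
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


idf = inverse_document_frequency(docs)
doc_vectors = [vectorize(doc, idf) for doc in docs]
query_vector = vectorize("information retrieval", idf)
scores = [cosine(query_vector, vec) for vec in doc_vectors]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The document that contains both query terms ranks first, the one sharing only "information" ranks second, and the document with no query terms scores zero.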

Relevance: 80.00%

Abstract:

These slides support students in understanding how to respond to the challenge of: "I’ve been told not to use Google or Wikipedia to research my essay. What else is there?" The PowerPoint guides students in how to identify high-quality, up-to-date and relevant resources on the web that they can reliably draw upon for their academic assignments. The slides were created by Fiona Nichols, the subject liaison librarian who supports the School of Electronics and Computer Science at the University of Southampton.

Relevance: 80.00%

Abstract:

Finding journal articles from full-text sources such as IEEE Xplore, ACM and LNCS (Lecture Notes in Computer Science).

Relevance: 80.00%

Abstract:

Real-time geoparsing of social media streams (e.g. Twitter, YouTube, Instagram, Flickr, FourSquare) is providing a new 'virtual sensor' capability to end users such as emergency response agencies (e.g. tsunami early warning centres, civil protection authorities) and news agencies (e.g. Deutsche Welle, BBC News). Challenges in this area include scaling up natural language processing (NLP) and information retrieval (IR) approaches to handle real-time traffic volumes, reducing false positives, creating real-time infographic displays useful for effective decision support, and providing support for trust and credibility analysis using geosemantics. In this seminar I will present ongoing work by the IT Innovation Centre over the last 4 years (TRIDEC and REVEAL FP7 projects) in building such systems, and highlight our research towards improving the trustworthiness and credibility of crisis map displays and real-time analytics for trending topics and influential social networks during major newsworthy events.

Relevance: 80.00%

Abstract:

Big data nowadays is a fashionable topic, independently of what people mean when they use this term. But being big is just a matter of volume, although there is no clear agreement on the size threshold. On the other hand, it is easy to capture large amounts of data using a brute force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what is the right data and how much of it is needed. For some problems this would imply big data, but for the majority of problems much less data is and will be needed. In this talk we explore the trade-offs involved and the main problems that come with big data using the Web as case study: scalability, redundancy, bias, noise, spam, and privacy.

Speaker biography: Ricardo Baeza-Yates has been VP of Research for Yahoo Labs, leading teams in the United States, Europe and Latin America, since 2006, and has been based in Sunnyvale, California, since August 2014. During this time he has led the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra, in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and, before that, founder and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (on leave of absence to date). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, which won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Relevance: 80.00%

Abstract:

The central concern of this dissertation is the challenge of facilitating access to the information held in the bibliographic database of the Biblioteca Universitária João Paulo II (BUJPII) of the Universidade Católica Portuguesa (UCP). Its subject content has so far been represented by the Universal Decimal Classification (UDC), a documentary language that is hard to use for most of our users. Most of them are university students who consider it an unfriendly search tool because they have little or no familiarity with this type of numerical classification and prefer to use keywords to access the subject content of works. With this goal in view, we set out to carry out this research by harmonizing (mapping) the UDC notations used to classify the BUJPII collection with a simplified list of Library of Congress Subject Headings, in order to begin assigning subject headings, mapped from the UDC notations, to part of that collection, whose content has so far been retrieved through the UDC. The study focused experimentally on a sample of monographs from areas that were not yet indexed but already classified, whose bibliographic records are in the BUJPII database. The project consisted of assigning subject headings, translated manually into Portuguese from the English list of Library of Congress Subject Headings (LCSH). The aim was for them to be semantically as close as possible to the subjects corresponding to the UDC notations with which the monographs had previously been classified.
The work was first done manually and then loaded into the Horizon software, the integrated library management system in use at the BUJPII. The future goal is to index all areas of its bibliographic holdings, as a privileged complementary means of access to information.

Relevance: 80.00%

Abstract:

In general, ranking entities (resources) on the Semantic Web (SW) is subject to importance, relevance, and query length. Few existing SW search systems cover all of these aspects. Moreover, many existing efforts simply reuse technologies from conventional Information Retrieval (IR), which are not designed for SW data. This paper proposes a ranking mechanism that includes all three categories of ranking and is tailored to SW data.

Relevance: 80.00%

Abstract:

Search has become a hot topic in Internet computing, with rival search engines battling to become the de facto Web portal, harnessing search algorithms to wade through information on a scale undreamed of by early information retrieval (IR) pioneers. This article examines how search has matured from its roots in specialized IR systems to become a key foundation of the Web. The authors describe new challenges posed by the Web's scale, and show how search is changing the nature of the Web as much as the Web has changed the nature of search.