955 resultados para Information Retrieval, Document Databases, Digital Libraries


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Social resource sharing systems like YouTube and del.icio.us have acquired a large number of users within the last few years. They provide rich resources for data analysis, information retrieval, and knowledge discovery applications. A first step towards this end is to gain better insights into content and structure of these systems. In this paper, we will analyse the main network characteristics of two of the systems. We consider their underlying data structures – socalled folksonomies – as tri-partite hypergraphs, and adapt classical network measures like characteristic path length and clustering coefficient to them. Subsequently, we introduce a network of tag co-occurrence and investigate some of its statistical properties, focusing on correlations in node connectivity and pointing out features that reflect emergent semantics within the folksonomy. We show that simple statistical indicators unambiguously spot non-social behavior such as spam.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Social resource sharing systems like YouTube and del.icio.us have acquired a large number of users within the last few years. They provide rich resources for data analysis, information retrieval, and knowledge discovery applications. A first step towards this end is to gain better insights into content and structure of these systems. In this paper, we will analyse the main network characteristics of two of these systems. We consider their underlying data structures – so-called folksonomies – as tri-partite hypergraphs, and adapt classical network measures like characteristic path length and clustering coefficient to them. Subsequently, we introduce a network of tag cooccurrence and investigate some of its statistical properties, focusing on correlations in node connectivity and pointing out features that reflect emergent semantics within the folksonomy. We show that simple statistical indicators unambiguously spot non-social behavior such as spam.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Presentation at the 1997 Dagstuhl Seminar "Evaluation of Multimedia Information Retrieval", Norbert Fuhr, Keith van Rijsbergen, Alan F. Smeaton (eds.), Dagstuhl Seminar Report 175, 14.04. - 18.04.97 (9716). - Abstract: This presentation will introduce ESCHER, a database editor which supports visualization in non-standard applications in engineering, science, tourism and the entertainment industry. It was originally based on the extended nested relational data model and is currently extended to include object-relational properties like inheritance, object types, integrity constraints and methods. It serves as a research platform into areas such as multimedia and visual information systems, QBE-like queries, computer-supported concurrent work (CSCW) and novel storage techniques. In its role as a Visual Information System, a database editor must support browsing and navigation. ESCHER provides this access to data by means of so called fingers. They generalize the cursor paradigm in graphical and text editors. On the graphical display, a finger is reflected by a colored area which corresponds to the object a finger is currently pointing at. In a table more than one finger may point to objects, one of which is the active finger and is used for navigating through the table. The talk will mostly concentrate on giving examples for this type of navigation and will discuss some of the architectural needs for fast object traversal and display. ESCHER is available as public domain software from our ftp site in Kassel. The portable C source can be easily compiled for any machine running UNIX and OSF/Motif, in particular our working environments IBM RS/6000 and Intel-based LINUX systems. A porting to Tcl/Tk is under way.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This class introduces basics of web mining and information retrieval including, for example, an introduction to the Vector Space Model and Text Mining. Guest Lecturer: Dr. Michael Granitzer Optional: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003 (Chapter 4, Text Analysis)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

These slides support students in understanding how to respond to the challenge of: "I’ve been told not to use Google or Wikipedia to research my essay. What else is there?" The powerpoint guides students in how to identify high quality, up to date and relevant resources on the web that they can reliably draw upon for their academic assignments. The slides were created by the subject liaison librarian who supports the School of Electronics and Computer Science at the UNiversity of Southampton, Fiona Nichols.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Finding journal articles from full text sources such as IEEEXplore, ACM and LNCS (Lecture Noters in Computer Science)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A step-by-step guide to building an EndNote library, downloading information from general databases and citing references in Word documents

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Real-time geoparsing of social media streams (e.g. Twitter, YouTube, Instagram, Flickr, FourSquare) is providing a new 'virtual sensor' capability to end users such as emergency response agencies (e.g. Tsunami early warning centres, Civil protection authorities) and news agencies (e.g. Deutsche Welle, BBC News). Challenges in this area include scaling up natural language processing (NLP) and information retrieval (IR) approaches to handle real-time traffic volumes, reducing false positives, creating real-time infographic displays useful for effective decision support and providing support for trust and credibility analysis using geosemantics. I will present in this seminar on-going work by the IT Innovation Centre over the last 4 years (TRIDEC and REVEAL FP7 projects) in building such systems, and highlights our research towards improving trustworthy and credible of crisis map displays and real-time analytics for trending topics and influential social networks during major news worthy events.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Big data nowadays is a fashionable topic, independently of what people mean when they use this term. But being big is just a matter of volume, although there is no clear agreement in the size threshold. On the other hand, it is easy to capture large amounts of data using a brute force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what is the right data and how much of it is needed. For some problems this would imply big data, but for the majority of the problems much less data will and is needed. In this talk we explore the trade-offs involved and the main problems that come with big data using the Web as case study: scalability, redundancy, bias, noise, spam, and privacy. Speaker Biography Ricardo Baeza-Yates Ricardo Baeza-Yates is VP of Research for Yahoo Labs leading teams in United States, Europe and Latin America since 2006 and based in Sunnyvale, California, since August 2014. During this time he has lead the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also part time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra, in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and before founder and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (in leave of absence until today). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before he obtained two masters (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-seller Modern Information Retrieval textbook, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected for the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences and since 2010 is a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A step-by-step guide to building an EndNote lilbrary, downloading information from general databases, citing references in Word documents and synchronising online.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Este trabalho de investigação partiu de uma necessidade pessoal em responder a algumas questões sentidas diariamente na execução das minhas tarefas de consultora na área das Ciências Documentais, nomeadamente no recurso aos instrumentos de gestão documental. Tendo já criado alguns instrumentos de gestão documental durante a execução das minhas tarefas, pretendo elaborar um ―manual‖ com propostas de modelos de todos os instrumentos de gestão documental existentes e necessários à execução de tarefas arquivísticas. Iniciamos o trabalho consultando a bibliografia existente e as recomendações emanadas da entidade nacional responsável pelo estabelecimento de políticas arquivísticas, a DGLAB. Este trabalho está estruturado em quatro capítulos. O primeiro apresenta a definição e os objetivos da gestão documental passando por uma breve exposição das questões atuais no âmbito da gestão documental, em Portugal, e ainda uma resumida cronologia da instituição dos Arquivos Nacionais, atual DGLAB. Seguindo-se o segundo capítulo, onde se definem os instrumentos de gestão documental e se faz uma caraterização de cada um, assim como uma análise de cada instrumento de gestão documental aconselhado pela DGLAB, sintetizando com um quadro os objetivos de cada um dos instrumentos documentais. No terceiro capítulo apresentamos a estrutura adotada para a elaboração do questionário que foi usado para recolher os dados relativos à utilização ou não dos instrumentos de gestão documental, nos arquivos dos organismos públicos e os respetivos dados obtidos. No último capítulo enumeram-se os instrumentos de gestão documental apresentando o conteúdo considerado importante na elaboração de cada um, os quais são dados a conhecer nos apêndices, um a um, dos quais salientamos o auto de eliminação, a guia de remessa, o manual de gestão documental, o relatório de avaliação de massas documentais acumuladas, o plano de preservação digital, entre outros. Concluímos que a Administração Pública conhece e sabe da existência da maioria dos instrumentos de gestão documental, nomeadamente os que consideram mais importantes à gestão do seu arquivo - guias de remessa, autos de eliminação e autos de entrega, os quais são também os aconselhados pela DGLAB. Realçamos também o facto de apenas uma diminuta quantidade de organismos possuírem uma portaria de gestão de documentos e o manual de gestão documental.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

O aspecto fulcral desta dissertação centra-se-à volta do desafio de procurar facilitar o acesso à informação contida na base de dados bibliográfica da Biblioteca Universitária João Paulo II (BUJPII) da Universidade Católica Portuguesa (UCP) cujo conteúdo temático tem sido até agora representado pela Classificação Decimal Universal (CDU), linguagem documental pouco acessível a grande parte dos nossos utilizadores, na sua maioria estudantes universitários que a consideram um instrumento de pesquisa pouco amigável porque estão muito pouco ou nada familiarizados com este tipo de classificação numérica preferindo o uso de palavras-chave no acesso ao conteúdo temático das obras. Com este objectivo em vista, propusemo-nos levar a cabo este trabalho de investigação fazendo a harmonização (correspondência) entre as notações da CDU, usada na classificação da colecção de fundos da BUJPII e uma lista simplificada de Cabeçalhos de Assunto da Biblioteca do Congresso, com o propósito de iniciar um processo de atribuição de cabeçalhos de assunto, mapeados a partir das notações da CDU, a parte dos referidos fundos, cuja recuperação de conteúdo tem sido feita até agora através da Classificação Decimal Universal. O estudo incidiu experimentalmente numa amostragem de monografias de áreas não indexadas mas já classificadas, cujos registos bibliográficos se encontram na base de dados da Biblioteca Universitária João Paulo II. O projecto consistiu na atribuição de cabeçalhos de assunto, traduzidos manualmente para português a partir da lista em inglês dos Cabeçalhos de Assunto da Biblioteca do Congresso (LCSH). Procurou-se que estivessem semanticamente tão próximos quanto possível dos assuntos que correspondiam às notações da Classificação Decimal Universal (CDU) com as quais as monografias tinham sido anteriormente classificadas. O trabalho foi primeiro elaborado de forma manual e depois “carregado” no software Horizon, dado ser este o sistema informático de gestão integrada em uso na Biblioteca Universitária João Paulo II, sendo o objectivo futuro a indexação de todas as áreas do seu acervo bibliográfico, como forma complementar privilegiada no acesso à informação.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Goal modelling is a well known rigorous method for analysing problem rationale and developing requirements. Under the pressures typical of time-constrained projects its benefits are not accessible. This is because of the effort and time needed to create the graph and because reading the results can be difficult owing to the effects of crosscutting concerns. Here we introduce an adaptation of KAOS to meet the needs of rapid turn around and clarity. The main aim is to help the stakeholders gain an insight into the larger issues that might be overlooked if they make a premature start into implementation. The method emphasises the use of obstacles, accepts under-refined goals and has new methods for managing crosscutting concerns and strategic decision making. It is expected to be of value to agile as well as traditional processes.