924 results for Information Retrieval, Document Databases, Digital Libraries
Abstract:
Graduate Program in Digital Television: Information and Knowledge - FAAC
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
This paper describes the first participation of the IR-n system in Spoken Document Retrieval, focusing on the experiments we ran before participating and the results we obtained. IR-n is a passage-based Information Retrieval system that uses sentence recognition to define passages. The main goal of this experiment is therefore to adapt IR-n to the structure of spoken documents by means of an utterance splitter and the overlapping passage technique, which together allow utterances to be matched with sentences.
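The overlapping passage technique mentioned above can be sketched as a sliding window over utterances. This is a minimal illustration and not the IR-n implementation: the function name `overlapping_passages` and the parameters `size` and `step` are assumptions introduced here.

```python
from typing import List, Sequence


def overlapping_passages(units: Sequence[str], size: int, step: int) -> List[List[str]]:
    """Group consecutive units (sentences or utterances) into passages.

    size: number of units per passage (assumed parameter).
    step: how far the window advances (assumed parameter).
    A step smaller than size makes neighbouring passages share units.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    passages: List[List[str]] = []
    start = 0
    while start < len(units):
        passages.append(list(units[start:start + size]))
        if start + size >= len(units):
            break  # this window already reaches the last unit
        start += step
    return passages


utterances = ["u1", "u2", "u3", "u4", "u5"]
passages = overlapping_passages(utterances, size=3, step=2)
print(passages)
```

Because neighbouring windows share a unit, a sentence that straddles an utterance boundary still falls entirely inside at least one passage, which is one plausible reason for using overlap here.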
Abstract:
ACM Computing Classification System (1998): I.7, I.7.5.
Abstract:
With the explosive growth in the volume and complexity of document data (e.g., news, blogs, web pages), it has become necessary to understand documents semantically and deliver meaningful information to users. The areas dealing with these problems span data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into groups to provide efficient document browsing and navigation mechanisms. One unexplored problem in document clustering is how to generate a meaningful interpretation for each document cluster produced by the clustering process. Document summarization is another effective technique for document understanding; it generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve automatic summarization performance and how to apply it to newly emerging problems are two valuable research directions. To help people capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
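The idea of attaching a summarized interpretation to a cluster by selecting sentences can be sketched as follows. This is a simple word-frequency heuristic swapped in for illustration, not the dissertation's algorithms; the names `tokenize`, `summarize`, `STOPWORDS` and the toy cluster are assumptions.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(sentences: List[str], k: int = 1) -> List[str]:
    """Return the k sentences whose words are most frequent across the cluster."""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original sentence order
    return [sentences[i] for i in chosen]


# Toy cluster of sentences, assumed to come from an earlier clustering step.
cluster = [
    "Hurricane damages coastal homes.",
    "Coastal homes flood after hurricane.",
    "Officials open shelters.",
]
print(summarize(cluster, k=1))
```

The selected sentence acts as a human-readable label for the cluster; a real system would use a stronger relevance model than raw word counts.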
Abstract:
The Everglades Online Thesaurus is a structured vocabulary of concepts and terms relating to the south Florida environment. Designed as an information management tool for both researchers and metadata creators, the Thesaurus is intended to improve information retrieval across the many disparate information systems, databases, and web sites that provide Everglades-related information. The vocabulary provided by the Everglades Online Thesaurus expresses each relevant concept using a single ‘preferred term’, whereas in natural language many terms may exist to express that same concept. In this way, the Thesaurus offers the possibility of standardizing the terminology used to describe Everglades-related information — an important factor in predictable and successful resource discovery.
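The preferred-term idea can be sketched as a lookup that maps the many natural-language variants of a concept to one standard term before indexing or searching. The entries below are hypothetical examples invented for illustration and are not taken from the Everglades Online Thesaurus; `PREFERRED_TERM` and `normalize` are assumed names.

```python
from typing import Dict

# Hypothetical variant -> preferred term entries, for illustration only.
PREFERRED_TERM: Dict[str, str] = {
    "sawgrass": "sawgrass",
    "saw grass": "sawgrass",
    "cladium jamaicense": "sawgrass",
    "gator": "american alligator",
    "alligator": "american alligator",
}


def normalize(term: str) -> str:
    """Map a free-text term to its preferred term; unknown terms pass through."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    return PREFERRED_TERM.get(key, key)


print(normalize("Saw  Grass"))
print(normalize("Gator"))
```

Applying the same normalization to both metadata and queries is what makes retrieval predictable across systems that would otherwise use different words for the same concept.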
Abstract:
Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs. Methods: We built two ranking systems. The traditional BM25 system exploits the content of the EHRs without regard to the associations among the entities within them. The Clinical ObjectRank (CO) system exploits the associations among entities in EHRs, using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset from the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, two physicians measured sensitivity and specificity for a set of 11 queries related to congenital cardiac disease. Results: Our pilot evaluation showed that CO outperforms BM25 in sensitivity (65% vs. 38%, a relative improvement of 71% on average) while maintaining specificity (64% vs. 61%). Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
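The BM25 baseline described above scores each document independently against the query. The following is a minimal sketch of Okapi BM25 under stated assumptions, not the system used in the study: it uses the non-negative idf variant `log(1 + (N - df + 0.5) / (df + 0.5))`, the common defaults `k1 = 1.5` and `b = 0.75`, and invented toy records.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy, invented records; each is scored on its own content only.
docs = [
    ["ventricular", "septal", "defect", "repair"],
    ["asthma", "follow", "up"],
    ["septal", "defect", "echo", "normal"],
]
scores = bm25_scores(["septal", "defect"], docs)
print(scores)
```

Note that nothing in this scoring uses links between records, which is the contrast the abstract draws with the authority-flow approach of CO.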
Digital libraries in Architecture and Urbanism: a study on digital information architecture
Abstract:
The goal of this paper was to survey the state of the art of Digital Libraries in Architecture and Urbanism in Higher Education Institutions (IES), presenting key concepts and showing the importance of Digital Libraries in disseminating information and easing its transfer. Questions of digital information architecture, usability, digital preservation and accessibility were addressed. The research examined the websites of Brazilian universities, first to identify the institutions that offer the Architecture and Urbanism course, focusing on postgraduate education. Once these were identified, their libraries were analyzed in terms of content, storage, dissemination and access to information. It was found that digital libraries are increasingly part of organizations and educational institutions focused on disseminating knowledge, making available in digital form information that the institution or the individual may need. The physical and computational restructuring of the Board of Studies and Research in Architecture and Urbanism (Câmara de Estudos e Pesquisa em Arquitetura e Urbanismo, CEPAU) of the Architecture and Urbanism Course at the Federal University of Rio Grande do Norte (UFRN) was also monitored, showing the need to install a Digital Library to integrate the databases of PPGAU's research groups, which today remain independent, with no interface among them. Architecture and Urbanism was chosen as the research area because there is a gap in, and little documentation about, digital libraries in this field.
Abstract:
Maintaining accessibility to and understanding of digital information over time is a complex challenge that often requires contributions and interventions from a variety of individuals and organizations. The processes of preservation planning and evaluation are fundamentally implicit and share similar complexity. Both demand comprehensive knowledge and understanding of every aspect of to-be-preserved content and the contexts within which preservation is undertaken. Consequently, means are required for the identification, documentation and association of those properties of data, representation and management mechanisms that in combination lend value, facilitate interaction and influence the preservation process. These properties may be almost limitless in terms of diversity, but are integral to the establishment of classes of risk exposure, and the planning and deployment of appropriate preservation strategies. We explore several research objectives within the course of this thesis. Our main objective is the conception of an ontology for risk management of digital collections. Incorporated within this are our aims to survey the contexts within which preservation has been undertaken successfully, the development of an appropriate methodology for risk management, the evaluation of existing preservation evaluation approaches and metrics, the structuring of best practice knowledge and lastly the demonstration of a range of tools that utilise our findings. We describe a mixed methodology that uses interview and survey, extensive content analysis, practical case study and iterative software and ontology development. We build on a robust foundation, the development of the Digital Repository Audit Method Based on Risk Assessment. 
We summarise the extent of the challenge facing the digital preservation community (and by extension users and creators of digital materials from many disciplines and operational contexts) and present the case for a comprehensive and extensible knowledge base of best practice. These challenges are manifested in the scale of data growth, the increasing complexity and the increasing onus on communities with no formal training to offer assurances of data management and sustainability. These collectively imply a challenge that demands an intuitive and adaptable means of evaluating digital preservation efforts. The need for individuals and organisations to validate the legitimacy of their own efforts is particularly prioritised. We introduce our approach, based on risk management. Risk is an expression of the likelihood of a negative outcome, and an expression of the impact of such an occurrence. We describe how risk management may be considered synonymous with preservation activity, a persistent effort to negate the dangers posed to information availability, usability and sustainability. Risks can be characterised according to associated goals, activities, responsibilities and policies in terms of both their manifestation and mitigation. They can be deconstructed into their atomic units, and responsibility for their resolution can be delegated appropriately. We go on to describe how the manifestation of risks typically spans an entire organisational environment and how, as the focus of our analysis, risk safeguards against omissions that may occur when pursuing functional, departmental or role-based assessment. We discuss the importance of relating risk factors, through the risks themselves or associated system elements. Doing so will yield the preservation best-practice knowledge base that is conspicuously lacking within the international digital preservation community.
We present as research outcomes an encapsulation of preservation practice (and explicitly defined best practice) as a series of case studies, in turn distilled into atomic, related information elements. We conduct our analyses in the formal evaluation of memory institutions in the UK, US and continental Europe. Furthermore we showcase a series of applications that use the fruits of this research as their intellectual foundation. Finally we document our results in a range of technical reports and conference and journal articles. We present evidence of preservation approaches and infrastructures from a series of case studies conducted in a range of international preservation environments. We then aggregate this into a linked data structure entitled PORRO, an ontology relating preservation repository, object and risk characteristics, intended to support preservation decision making and evaluation. The methodology leading to this ontology is outlined, and lessons are exposed by revisiting legacy studies and exposing the resource and associated applications to evaluation by the digital preservation community.
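The description of risk as a likelihood paired with an impact, broken into atomic units with delegated responsibility, can be sketched as a small risk register. This is an illustration under stated assumptions and not the thesis's ontology or method: the `Risk` class, its field scales, the product used as a score, and the example entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    name: str
    likelihood: float  # probability of the negative outcome, 0.0 to 1.0 (assumed scale)
    impact: int        # severity if it occurs, 1 (minor) to 5 (severe) (assumed scale)
    owner: str         # role responsible for resolving the risk

    @property
    def score(self) -> float:
        # Assumption: combine the two expressions of risk as a simple product.
        return self.likelihood * self.impact


def prioritise(risks: List[Risk]) -> List[Risk]:
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical register entries for illustration only.
register = [
    Risk("loss of funding", 0.1, 5, "management"),
    Risk("format obsolescence", 0.5, 4, "curator"),
    Risk("media failure", 0.25, 5, "IT operations"),
]
for risk in prioritise(register):
    print(risk.name, risk.score, risk.owner)
```

Keeping an owner on every entry reflects the point that each atomic risk should have its resolution delegated to someone.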
Abstract:
The MARS (Media Asset Retrieval System) Project is a collaboration between public broadcasters, libraries and schools in the Puget Sound region to assess the needs of their constituents and pool resources to develop solutions to meet those needs. The Project’s ultimate goal is to create a digital online resource that will provide access to content produced by public broadcasters and libraries. The MARS Project is funded by a grant from the Corporation for Public Broadcasting (CPB) Television Future Fund. Convergence Consortium: The Convergence Consortium is a model for community collaboration, including representatives from public broadcasting, libraries and schools in the Puget Sound region. They meet regularly to consider collaborative efforts that will be mutually beneficial to their institutions and constituents. Specifically, the archives of public broadcasters have been identified as significant resources that can be accessed through libraries and used by schools, and integrated with text and photographic archives from other partners. Using the work-centered framework, we collected data through interviews with nine engineers and observation of their searching while they performed their regular, job-related searches on the Web. The framework was used to analyze the data on two levels: 1) the activities, organizational relationships and constraints of work domains, and 2) users’ cognitive and social activities and their subjective preferences during searching.
Abstract:
CERN, the European Organization for Nuclear Research, is one of the largest research centers in the world, responsible for several discoveries in physics as well as in computer science. The CERN Document Server, also known as CDS Invenio, is software developed at CERN that provides a set of tools for managing digital libraries. To improve the functionality of CDS Invenio, a new module called BibCirculation was developed to manage books (and other items) from the CERN library, working as an Integrated Library System. This thesis describes the steps taken to achieve the goals of this project, explaining, among other aspects, the process of integration with the other existing modules as well as the way information about books is associated with the metadata from CDS Invenio. It also gives a detailed account of the entire implementation process and the tests performed. Finally, the conclusions of this project and ideas for future development are presented.
Abstract:
This article discusses issues related to the organization and reception of information in the context of technology-driven services and public information systems. It stems from the assumption that in a "technologized" society, the distance between users and information is almost always cognitive and socio-cultural in nature, a product of our effort to design communication. In this context, we favor the approach of the information sign, seeking to answer how a documentary message turns into information, i.e. a structure recognized as socially useful. Observing the structural, cognitive and communicative aspects of the documentary message, and drawing on Documentary Linguistics, Terminology and Textual Linguistics, the article analyzes the knowledge management and innovation policy of the Government of the State of São Paulo, which authorizes the use of Web 2.0, and questions to what extent this initiative represents innovation in the library environment.