9 results for Information Retrieval, Document Databases, Digital Libraries
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Given the widespread, multipurpose use of document images and the large number of document image repositories now available, robust information retrieval mechanisms and systems are in increasing demand. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes the content of document images, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with that of the relationships created among their respective document images. Using the same document images, we ran further experiments to compare the performance of LinkDI with and without the LSI technique. Experimental results showed that LSI can mitigate the effects of common OCR misrecognition, which reinforces the feasibility of using LinkDI to relate highly degraded OCR output.
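The abstract above does not give implementation details, so the following is only a minimal sketch of how LSI can smooth over an OCR misreading, not the LinkDI implementation. It assumes NumPy is available; the tiny term-document matrix, the misread token "irnage", and the helper names `lsi_doc_vectors` and `cosine` are all hypothetical.

```python
import numpy as np

def lsi_doc_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row of the result is one document expressed in the latent space.
    return (np.diag(s[:k]) @ vt[:k]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical data: rows are terms, columns are four documents.
# "irnage" stands in for an OCR misreading of "image" in the second document.
term_doc = np.array([
    [1, 1, 0, 0],  # retrieval
    [1, 0, 0, 0],  # image
    [0, 1, 0, 0],  # irnage
    [1, 1, 0, 0],  # index
    [0, 0, 1, 1],  # cancer
    [0, 0, 1, 1],  # cohort
    [0, 0, 0, 1],  # registry
], dtype=float)

# Similarity of documents 0 and 1 on raw term counts.
raw_sim = cosine(term_doc[:, 0], term_doc[:, 1])

# Similarity after projecting onto the two strongest latent dimensions.
docs = lsi_doc_vectors(term_doc, k=2)
lsi_sim = cosine(docs[0], docs[1])
unrelated_sim = cosine(docs[0], docs[2])
```

In this toy case the raw similarity between the clean document and its OCR-damaged counterpart is penalized by the misread term, while in the two-dimensional latent space both documents fall on the same topic axis and remain separate from the unrelated documents.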
Abstract:
This article discusses issues related to the organization and reception of information in the context of technology-driven public information services and systems. It starts from the assumption that in a "technologized" society, the distance between users and information is almost always of a cognitive and socio-cultural nature, a product of our effort to design communication. In this context, we favor the approach of the information sign, seeking to answer how a documentary message turns into information, i.e., a structure recognized as socially useful. Observing the structural, cognitive and communicative aspects of the documentary message, and drawing on Documentary Linguistics, Terminology and Textual Linguistics, we analyze the knowledge management and innovation policy of the Government of the State of São Paulo, which authorizes the use of Web 2.0, and question to what extent this initiative represents innovation in the library environment.
Abstract:
The Information Society (IS) may be understood as a geopolitical organization that emerged after the Third Industrial Revolution, with a direct impact on the use of information and of Information and Communication Technologies (ICT). The expression arose from a techno-social paradigm change in post-industrial society, which aimed to use information as currency for the society then taking shape. In Brazil it gained strength with the Programa Sociedade da Informacao no Brasil-Livro Verde, launched by the Ministerio da Ciencia e Tecnologia in September 2000 without any discussion with civil society in formulating the main document. Our main goal in this article is to discuss the Information Society in contemporary times, as well as the organized and conscious use of information, looking for key concepts for a better understanding of it, on topics ranging from digital inclusion and exclusion to the use of digital informational resources.
Abstract:
The success of classification, information retrieval and image analysis tools is intimately related to the quality of the features employed in the process. Pixel intensities, color, texture and shape are generally the basis from which most of the features used in such fields are computed. This paper presents a novel shape-based feature extraction approach in which an image is decomposed into multiple contours that are then characterized by Fourier descriptors. Unlike traditional approaches, we make use of topological knowledge to generate well-defined closed contours, which are efficient signatures for image retrieval. The method has been evaluated in the CBIR context and in image analysis. The results show that the multi-contour decomposition, as opposed to single-shape information, significantly improved discrimination power. (c) 2008 Elsevier B.V. All rights reserved.
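As an illustration of the Fourier descriptors mentioned above, here is a minimal sketch for a single closed contour. It is not the paper's multi-contour method; the function name `fourier_descriptors`, the normalization choices, and the sample square contour are assumptions made for this example.

```python
import cmath
import math

def fourier_descriptors(contour, n_coeffs):
    """Return invariant Fourier descriptors of a closed contour.

    contour: list of (x, y) points sampled in order along the boundary.
    """
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    # Plain O(n^2) discrete Fourier transform of the complex boundary signal.
    coeffs = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Drop coeffs[0] (removes translation), keep magnitudes only (removes
    # rotation and starting point), divide by |coeffs[1]| (removes scale).
    scale = abs(coeffs[1])
    return [abs(coeffs[k]) / scale for k in range(2, 2 + n_coeffs)]

# Hypothetical contour: a square sampled counter-clockwise at 8 points.
square = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

# The same shape scaled by 3, rotated by 90 degrees, shifted by (5, -2),
# and traversed from a different starting point.
moved = []
for x, y in square[3:] + square[:3]:
    w = complex(x, y) * 3j + complex(5, -2)
    moved.append((w.real, w.imag))

d_square = fourier_descriptors(square, 4)
d_moved = fourier_descriptors(moved, 4)
```

Both descriptor lists agree up to floating-point error, which is the property that makes such descriptors usable as shape signatures for retrieval.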
Abstract:
Automatic summarization of texts is now crucial for several information retrieval tasks owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an extractive summary. The graph or network representing one piece of text consists of nodes corresponding to sentences, while edges connect sentences that share common meaningful nouns. Because various metrics could be used, we developed a set of 14 summarizers, generically referred to as CN-Summ, employing network concepts such as node degree, length of shortest paths, d-rings and k-cores. An additional summarizer was created that selects the highest-ranked sentences across the 14 systems, as in a voting system. When applied to a corpus of Brazilian Portuguese texts, some CN-Summ versions performed better than summarizers that do not employ deep linguistic knowledge, with results comparable to those of state-of-the-art summarizers based on expensive linguistic resources. The use of complex networks to represent texts therefore appears suitable for automatic summarization, consistent with the belief that the metrics of such networks may capture important text features. (c) 2008 Elsevier Inc. All rights reserved.
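The graph construction described above can be sketched with node degree as the only ranking metric. This is an illustrative simplification, not one of the CN-Summ systems: the function name `degree_summary` and the pre-extracted noun sets are hypothetical, and noun extraction itself is assumed to have been done beforehand.

```python
from itertools import combinations

def degree_summary(sentence_nouns, n_select):
    """Pick the n_select sentences with the highest degree in the shared-noun graph.

    sentence_nouns: one set of nouns per sentence, in document order.
    Returns the selected sentence indices (in document order) and all degrees.
    """
    degree = [0] * len(sentence_nouns)
    # Two sentences are connected when they share at least one noun.
    for i, j in combinations(range(len(sentence_nouns)), 2):
        if sentence_nouns[i] & sentence_nouns[j]:
            degree[i] += 1
            degree[j] += 1
    # Rank by degree, breaking ties in favor of earlier sentences.
    ranked = sorted(range(len(degree)), key=lambda i: (-degree[i], i))
    return sorted(ranked[:n_select]), degree

# Hypothetical noun sets for a five-sentence text.
nouns = [
    {"network", "text", "summary"},
    {"network", "metric"},
    {"text", "summary"},
    {"weather"},
    {"metric", "summary"},
]

selected, degrees = degree_summary(nouns, n_select=2)
```

Other metrics named in the abstract, such as shortest paths or k-cores, would replace the degree count in the ranking step while the graph construction stays the same.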
Abstract:
This study aimed to describe the main characteristics of epidemiological studies that investigated the association between socioeconomic conditions and head and neck cancer. The databases Medline (Literatura Internacional em Ciências), Lilacs (Literatura Latino-Americana e do Caribe em Ciências da Saúde) and Scielo (Scientific Electronic Library Online) were searched, along with references cited in the articles obtained from the primary search of these databases. The publication period considered spanned 38 years (1970-2007), and the analysis was restricted to articles in Spanish, English or Portuguese. Twenty-five studies were selected: 15 with a case-control design, four ecological, and six that combined information from official databases, such as censuses and cancer or death registries. Most of the studies reported an association between poorer socioeconomic conditions and head and neck cancer. The most frequently used indicators were occupation and schooling. Few studies investigated mediation, that is, sought to identify which proximal factors operate in the association investigated. Further research, with uniform criteria for adjusting regression models and sufficient sample sizes, is needed to investigate this dimension.
Abstract:
OBJECTIVE: To estimate the prevalence of congenital defects (CD) in a cohort of live births (LB) by linking the databases of the Sistema de Informação de Mortalidade (SIM) and the Sistema de Informação sobre Nascidos Vivos (SINASC). METHODS: Descriptive study to evaluate live birth certificates as a source of information on CD. The study population is a cohort of hospital LB to resident mothers that occurred in the Municipality of São Paulo in the first half of 2006 (1 January 2006 to 30 June 2006), obtained by linking the databases of live birth certificates and of neonatal deaths from the cohort. RESULTS: The most prevalent CD according to SINASC were congenital malformations (CM) and deformities of the musculoskeletal system (44.7%), CM of the nervous system (10.0%) and chromosomal anomalies (8.6%). After linkage, 80.0% of individuals with CD of the circulatory system, 73.3% of those with CD of the respiratory system and 62.5% of those with CD of the digestive system were recovered. SINASC accounted for 55.2% of CD notifications and SIM for 44.8%, which shows the importance of SIM for recovering information on CD. According to SINASC, the prevalence rate of CD in the cohort was 75.4 per 10,000 LB; with the data linked to SIM, this rate rose to 86.2 per 10,000 LB. CONCLUSIONS: The complementary data obtained by SIM/SINASC linkage provide a more realistic profile of CD prevalence than that recorded by SINASC alone, which identifies the most visible CD, whereas SIM identifies the most lethal ones, showing the importance of using the two data sources together.
Abstract:
With the advent and development of technology, mainly on the Internet, more and more electronic services are being offered to customers in all areas of business, especially in information services such as virtual libraries. This article proposes a new way of providing services to virtual library customers, presenting a methodology for implementing electronic services oriented by these customers' life situations. Analytical observation of some national virtual library sites indicated that offering services based on life situations and relationship interest situations can promote the service to customers, providing greater satisfaction and, consequently, improving the quality of the information services offered. The visits to those sites and the critical analysis of the data collected during them, supported by the results of bibliographic research, made it possible to describe this methodology. We conclude that providing services in isolation or according to the user's profile on virtual library sites is not always enough to meet customers' needs and expectations, which suggests offering these services based on life situations and relationship interest situations as a complement that adds value to the business of the virtual library. This is relevant because it points to new opportunities to provide quality virtual library services and serves as a guide for managers of information providers, enabling them to offer customers new means of access to information services, with a view to proactivity and service integration, in order to solve real problems definitively.
Abstract:
The article presents and discusses issues such as informativeness, the offering of directions and information retrieval, and also lists definitions of information and mediation. Based on the topics presented, it discusses the possible problems faced by information professionals acting as cultural mediators in the context of art museums.