975 results for DIGITAL DOCUMENTS
Abstract:
This thesis proposes a new document model according to which any document can be segmented into independent components and transformed into a pattern-based projection that uses only a very small set of objects and composition rules. The key point is that such a normalized document expresses the same fundamental information as the original, in a simple, clear and unambiguous way. The central part of my work discusses that model, investigating how a digital document can be segmented and how a segmented version can be used to implement advanced conversion tools. I present seven patterns that are versatile enough to capture the most relevant document structures, and whose minimality and rigour make that implementation possible. The abstract model is then instantiated as an actual markup language, called IML. IML is a general and extensible language that essentially adopts an XHTML syntax and is able to capture, a posteriori, only the content of a digital document. It is compared with other languages and proposals in order to clarify its role and objectives. Finally, I present some systems built upon these ideas. These applications are evaluated in terms of user benefits, workflow improvements and impact on the overall quality of the output. In particular, they cover heterogeneous content management processes: from web editing to collaboration (IsaWiki and WikiFactory), and from e-learning (IsaLearning) to professional printing (IsaPress).
Abstract:
Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We present a composite document approach wherein an XML-based document representation is linked via a shadow tree of bi-directional pointers to a PDF representation of the same document. Using a two-window viewer, any material selected in the PDF can be related back to the corresponding material in the XML, and vice versa. In this way the treatment of specialist material such as mathematics, music or chemistry (e.g. via read-aloud or play-aloud) can be activated via standard tools working within the XML representation, rather than requiring that application-specific structures be embedded in the PDF itself. The problems of textual recognition and tree pattern matching between the two representations are discussed in detail. Comparisons are drawn between our use of a shadow tree of pointers to map between document representations and the use of a code-replacement shadow tree in technologies such as XBL.
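A minimal Python sketch of the shadow-tree idea, under loud assumptions: the PDF side is modelled as a hypothetical list of `PdfSpan` text runs (no real PDF parser is involved), and XML elements are linked to spans by naive exact text equality, a stand-in for the textual recognition and tree pattern matching the abstract discusses. Each hypothetical `ShadowNode` mirrors one XML element and records the spans that render it (XML to PDF), while the `by_span` dictionary holds the reverse pointer (PDF to XML).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class PdfSpan:
    """Hypothetical stand-in for a text run extracted from a PDF page."""
    span_id: int
    page: int
    text: str


@dataclass
class ShadowNode:
    """One shadow-tree node: points at an XML element and at its PDF spans."""
    xml_elem: ET.Element
    pdf_span_ids: list[int] = field(default_factory=list)
    children: list["ShadowNode"] = field(default_factory=list)


def build_shadow_tree(elem, spans, by_span):
    """Mirror the XML tree, linking each element to PDF spans whose text equals
    the element's own text (naive exact matching, an assumption)."""
    node = ShadowNode(elem)
    own_text = (elem.text or "").strip()
    for span in spans:
        if own_text and span.text.strip() == own_text:
            node.pdf_span_ids.append(span.span_id)  # XML -> PDF pointer
            by_span[span.span_id] = node            # PDF -> XML pointer
    for child in elem:
        node.children.append(build_shadow_tree(child, spans, by_span))
    return node


xml_doc = ET.fromstring(
    "<article><title>Shadow trees</title>"
    "<p>Read aloud the equation</p></article>"
)
spans = [PdfSpan(0, 1, "Shadow trees"), PdfSpan(1, 1, "Read aloud the equation")]
by_span = {}
root = build_shadow_tree(xml_doc, spans, by_span)

# PDF -> XML: which element produced the span the user selected?
print(by_span[1].xml_elem.tag)       # p
# XML -> PDF: which spans render the <title> element?
print(root.children[0].pdf_span_ids)  # [0]
```

Exact matching fails as soon as the same text occurs twice or the PDF splits a run differently, which is one reason the matching problem is non-trivial.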
Abstract:
It is just over 20 years since Adobe's PostScript opened a new era in digital documents. PostScript allows most details of rendering to be hidden within the imaging device itself, while providing a rich set of primitives that let document engineers think of final-form rendering as just a sophisticated exercise in computer graphics. The refinement of the PostScript model into PDF has been remarkably successful in creating a near-universal interchange format for complex and graphically rich digital documents, but the PDF format itself is neither easy to create nor easy to amend. In the meantime, a whole new world of digital documents has sprung up centred on XML-based technologies. The most widespread example is XHTML (with optional CSS styling), but more recently Scalable Vector Graphics (SVG) has emerged as an XML-based, low-level rendering language with PostScript-compatible rendering semantics. This paper surveys graphically rich final-form rendering technologies and asks how flexibly they allow adjustments to final appearance without regenerating a whole page or an entire document. Particular attention is paid to the relative merits of SVG and PDF in this regard, and to the desirability, in any document layout language, of being able to manipulate the graphic properties of document components parametrically and at a granularity smaller than an entire page.
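To illustrate what parametric adjustment below page granularity can look like in SVG, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The page markup, the `heading`/`body` group ids and the `restyle_component` helper are hypothetical; the sketch only shows that one component's graphic properties (here `font-size`) can be changed while the rest of the page is left untouched.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG elements without a prefix

# Hypothetical one-page document with two components.
PAGE = f"""<svg xmlns="{SVG_NS}" width="200" height="100">
  <g id="heading"><text x="10" y="30" font-size="18">Title</text></g>
  <g id="body"><text x="10" y="60" font-size="10">Body text</text></g>
</svg>"""


def restyle_component(svg_text, component_id, scale):
    """Scale the font size of every <text> inside one component (a <g> with
    the given id), leaving the rest of the page as it was."""
    root = ET.fromstring(svg_text)
    group = root.find(f".//{{{SVG_NS}}}g[@id='{component_id}']")
    if group is None:
        raise KeyError(component_id)
    for text in group.iter(f"{{{SVG_NS}}}text"):
        size = float(text.get("font-size"))
        text.set("font-size", f"{size * scale:g}")
    return ET.tostring(root, encoding="unicode")


out = restyle_component(PAGE, "heading", 1.5)
sizes = [t.get("font-size") for t in ET.fromstring(out).iter(f"{{{SVG_NS}}}text")]
print(sizes)  # ['27', '10']
```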
Abstract:
Following the publication in digital format of the new Atles comarcal de Catalunya by the Institut Cartogràfic de Catalunya, we examine the structure of its contents and the dynamics of its use. We also describe the methodology used to produce it. Finally, we offer some reflections on the textual and cartographic innovations this new product brings, particularly in relation to the current socioeconomic context and the new technologies employed.
Abstract:
Text classification techniques have advanced considerably over the past decade, driven by the increasing availability and widespread use of digital documents. Text classification performance usually depends on the quality of the categories and the accuracy of classifiers learned from training samples. When training samples are unavailable or the categories are of poor quality, classification performance degrades. In this paper, we propose an unsupervised multi-label text classification method that classifies documents using a large set of categories stored in a world ontology. The approach was evaluated against typical text classification methods on a real-world document collection, using ground truth encoded by human experts, with promising results.
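The abstract does not describe the method in enough detail to reproduce it, so what follows is a deliberately simple, swapped-in illustration of unsupervised multi-label classification against a category vocabulary, not the authors' algorithm. Each category in a hypothetical toy `ONTOLOGY` carries a set of descriptive terms; a document receives every label whose term set it covers at or above an assumed `threshold`, and no training samples are needed.

```python
import re
from collections import Counter

# Hypothetical toy "ontology": category label -> descriptive subject terms.
ONTOLOGY = {
    "Information retrieval": {"search", "index", "query", "retrieval", "ranking"},
    "Digital libraries": {"archive", "library", "collection", "metadata", "preservation"},
    "Typography": {"font", "glyph", "bitmap", "outline", "typeface"},
}


def tokenize(text):
    """Lower-case word tokens; a deliberately crude stand-in for real NLP."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, ontology=ONTOLOGY, threshold=0.5):
    """Return (category, coverage) for every category whose term set the text
    covers at or above `threshold`: zero, one or many labels per document."""
    tokens = Counter(tokenize(text))
    labels = []
    for category, terms in ontology.items():
        coverage = sum(1 for t in terms if tokens[t]) / len(terms)
        if coverage >= threshold:
            labels.append((category, round(coverage, 2)))
    # Highest coverage first; ties keep the ontology's order (stable sort).
    return sorted(labels, key=lambda pair: -pair[1])


doc = ("The archive stores metadata for each collection and supports "
       "query and search over an index of bitmap font samples.")
print(classify(doc))  # [('Information retrieval', 0.6), ('Digital libraries', 0.6)]
```

Here "Typography" covers only 2 of its 5 terms (0.4) and falls below the threshold, so the document receives two labels rather than three.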
Abstract:
This research discusses the participation of librarians in the multidisciplinary teams of Health Technology Assessment (HTA) groups, characterizing their role in a new field opening up for librarians in research institutions. The general objective is to create a Digital Library (DL) that meets the information-quality parameters inherent to HTA, based on the documents generated by the Serviço de Comutação Bibliográfica (SCB, the document delivery service) of the FIOCRUZ Library Network, and to recommend the inclusion of librarians in multidisciplinary HTA teams. The methodology was divided into three parts: a state-of-the-art review of the knowledge produced in Collective Health, the field to which Health Technology Assessment belongs, and in Information Science; exploratory research combining a qualitative approach, to collect data from HTA researchers at various public and private institutions, with a quantitative approach, to collect data from SCB staff of the Fiocruz Library Network; and data analysis. Librarians were found to participate actively in HTA activities, namely in formulating database search strategies, reviewing search protocols, locating relevant publications and assisting research groups in conducting systematic reviews. The study proposes the creation of a DL that allows all digital documents generated by the libraries to be shared. This initiative aims to help boost the production of scientific and technological knowledge in health and in HTA.
Abstract:
The files accompanying the document include a .jar archive of the zoom editor (zoom-éditeur), which can be launched from a web browser, and examples of z-texts created with this software.
Abstract:
This research project contributes to the broad field of information retrieval, specifically to the development of tools that enable information access in digital documents. We recognize the need to give users flexible access to the contents of large, potentially complex digital documents by means other than a search function or a handful of metadata elements. The goal is to produce a text browsing tool that offers as much information as possible based on a fairly superficial linguistic analysis. We are concerned with a type of extensive single-document indexing, not indexing by a set of keywords (see Klement, 2002, for a clear distinction between the two). The desired browsing tool would not only show at a glance the main topics discussed in the document, but would also present the relationships between these topics. It would also give direct access to the text (via hypertext links to specific passages). After reviewing previous research on this and similar topics, the present paper discusses the methodology and main characteristics of a prototype we have devised. Experimental results are presented, together with an analysis of the remaining hurdles and potential applications.
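A rough sketch of the kind of single-document index described, assuming paragraphs as passages and spread across passages as the only "linguistic analysis": candidate topic terms map to the passages that mention them (passage numbers could serve as hypertext link targets), and terms that co-occur in a passage are recorded as related. The stop list, the `min_passages` filter and the `build_index` helper are illustrative assumptions, not the prototype's design.

```python
import re
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stop list; a real tool would use a proper one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with"}


def build_index(text, min_passages=2):
    """Map candidate topic terms to the passages (paragraph numbers) that
    mention them, and relate terms that co-occur in the same passage."""
    passages = [p for p in text.split("\n\n") if p.strip()]
    term_to_passages = defaultdict(set)
    for i, passage in enumerate(passages):
        for word in re.findall(r"[a-z]+", passage.lower()):
            if word not in STOPWORDS and len(word) > 3:
                term_to_passages[word].add(i)
    # Keep only terms spread over several passages: a crude "main topic" filter.
    topics = {t: ps for t, ps in term_to_passages.items() if len(ps) >= min_passages}
    related = defaultdict(set)
    for i in range(len(passages)):
        present = sorted(t for t, ps in topics.items() if i in ps)
        for a, b in combinations(present, 2):
            related[a].add(b)
            related[b].add(a)
    return topics, related


doc = ("Fonts in PDF files can be bitmap fonts.\n\n"
       "Bitmap fonts render poorly on screens.\n\n"
       "Outline fonts scale well on screens.")
topics, related = build_index(doc)
print({t: sorted(ps) for t, ps in sorted(topics.items())})
# {'bitmap': [0, 1], 'fonts': [0, 1, 2], 'screens': [1, 2]}
print(sorted(related["bitmap"]))  # ['fonts', 'screens']
```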
Abstract:
This research arose from a personal need to answer some questions I face daily in my work as a consultant in Documentation Sciences, particularly concerning the use of records management instruments. Having already created some records management instruments in the course of that work, I intend to produce a "manual" with template proposals for all the records management instruments that exist and are needed to carry out archival tasks. We began by consulting the existing literature and the recommendations issued by DGLAB, the national body responsible for establishing archival policy. The work is structured in four chapters. The first presents the definition and objectives of records management, with a brief overview of current records management issues in Portugal and a short chronology of the institution of the National Archives, now DGLAB. The second chapter defines records management instruments and characterizes each of them, analyses each instrument recommended by DGLAB, and summarizes the objectives of each instrument in a table. The third chapter presents the structure adopted for the questionnaire used to collect data on whether the archives of public bodies use records management instruments, together with the data obtained. The final chapter lists the records management instruments and sets out the content considered important in preparing each one; these are presented one by one in the appendices, notably the disposal record (auto de eliminação), the transfer list (guia de remessa), the records management manual, the appraisal report for accumulated document backlogs, and the digital preservation plan, among others.

We conclude that the Public Administration knows of most records management instruments, particularly those it considers most important for managing its archives (transfer lists, disposal records and handover records), which are also the ones recommended by DGLAB. We also note that only a very small number of bodies have a records management ordinance (portaria de gestão de documentos) and a records management manual.
Abstract:
Graduate Program in Information Science - FFC
Abstract:
As collections of archived digital documents continue to grow, the maintenance of an archive and the quality of reproduction from the archived format become important long-term considerations. In particular, Adobe's PDF is now an important final-form standard for archiving and distributing electronic versions of technical documents. It is important that all embedded images in the PDF, and any fonts used for text rendering, should at the very minimum be easily readable on screen. Unfortunately, because PDF is based on PostScript technology, it allows the embedding of bitmap fonts in Adobe Type 3 format as well as higher-quality outline fonts in TrueType or Adobe Type 1 formats. Bitmap fonts generally do not perform well when they are scaled and rendered on low-resolution devices such as workstation screens. The work described here investigates how a plug-in to Adobe Acrobat enables bitmap fonts to be replaced by corresponding outline fonts, using a checksum matching technique against a canonical set of bitmap fonts as originally distributed. The target documents for our initial investigations are PDF files produced by (La)TeX systems set up in a default (bitmap font) configuration. For all bitmap fonts where recognition exceeds a certain confidence threshold, replacement fonts in Adobe Type 1 (outline) format can be substituted, with consequent improvements in file size, screen display quality and rendering speed. The accuracy of font recognition is discussed, together with the prospects of extending these methods to bitmap-font PDF files from sources other than (La)TeX.
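The checksum matching step can be sketched as follows, purely as an illustration: an embedded bitmap font is modelled as a dictionary from character codes to raw glyph bitmaps, the canonical set as precomputed per-glyph checksums, and the confidence is the fraction of embedded glyphs whose checksum matches. CRC-32 (`zlib.crc32`), the 0.9 `threshold`, the byte strings and the `recognise_font` helper are all assumptions; this is not the Acrobat plug-in API.

```python
import zlib


def glyph_checksum(bitmap: bytes) -> int:
    """Checksum of one glyph's raw bitmap bytes (CRC-32 chosen for illustration)."""
    return zlib.crc32(bitmap)


def recognise_font(embedded, canonical, threshold=0.9):
    """Compare an embedded bitmap font {char code: bitmap bytes} against a
    canonical table {font name: {char code: checksum}}.

    Returns (best font name, match ratio); the name is None when the best
    ratio falls below `threshold`."""
    best_name, best_score = None, 0.0
    for name, table in canonical.items():
        matches = sum(
            1 for code, bitmap in embedded.items()
            if table.get(code) == glyph_checksum(bitmap)
        )
        score = matches / len(embedded) if embedded else 0.0
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical glyph bitmaps standing in for one canonical bitmap font.
cmr10 = {65: b"\x18\x24\x42\x7e\x42", 66: b"\x7c\x42\x7c\x42\x7c"}
canonical = {"cmr10": {code: glyph_checksum(bm) for code, bm in cmr10.items()}}

print(recognise_font(cmr10, canonical))  # ('cmr10', 1.0)

# Glyph 67 has no canonical counterpart, so only half the glyphs match.
unknown = {65: cmr10[65], 67: b"\x3c\x42\x40\x42\x3c"}
print(recognise_font(unknown, canonical))  # (None, 0.5)
```

A font that passes the threshold would then be a candidate for replacement by the corresponding Type 1 outline font.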
Abstract:
In evolution: there are myriad ways in which diverse people can archive digital documents (perhaps as print). Documents can only move through containers, and those containers depend on perception in order to be used. Further, the decision about what constitutes an archive rests with the archon.