8 results for Portable Document Format
in Nottingham eTheses
Abstract:
A strategy for document analysis is presented that uses Portable Document Format (PDF, the underlying file structure for Adobe Acrobat software) as its starting point. This strategy examines the appearance and geometric position of text and image blocks distributed over an entire document. A blackboard system is used to tag the blocks as a first stage in deducing the fundamental relationships between them. PDF is shown to be a useful intermediate stage in the bottom-up analysis of document structure. Its information on line spacing and font usage gives important clues for bridging the semantic gap between the scanned bitmap page and its fully analysed, block-structured form. Analysis of PDF can yield not only accurate page decomposition but also sufficient document information for the later stages of structural analysis and document understanding.
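The following minimal Python sketch shows what "a blackboard system tags the blocks" can mean in practice. It is not the authors' system. The `Block` fields, the three knowledge sources and the 20% font-size threshold are all hypothetical. Each knowledge source reads the shared list of blocks (the blackboard) and posts tags to it. A control loop keeps firing the sources until none of them changes anything.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    """A text block recovered from a PDF page (hypothetical fields)."""
    text: str
    font_size: float  # points
    y_top: float      # distance from the top of the page, in points
    tags: set = field(default_factory=set)

def tag_headings(blocks: List[Block]) -> bool:
    """Knowledge source: blocks at least 20% larger than the (upper) median size are headings."""
    sizes = sorted(b.font_size for b in blocks)
    median = sizes[len(sizes) // 2]
    changed = False
    for b in blocks:
        if b.font_size >= 1.2 * median and "heading" not in b.tags:
            b.tags.add("heading")
            changed = True
    return changed

def tag_title(blocks: List[Block]) -> bool:
    """Knowledge source: the topmost heading on the page is taken to be the title."""
    headings = [b for b in blocks if "heading" in b.tags]
    if not headings:
        return False
    top = min(headings, key=lambda b: b.y_top)
    if "title" in top.tags:
        return False
    top.tags.add("title")
    return True

def tag_body(blocks: List[Block]) -> bool:
    """Knowledge source: any block nobody else has tagged is body text."""
    changed = False
    for b in blocks:
        if not b.tags:
            b.tags.add("body")
            changed = True
    return changed

def run_blackboard(blocks: List[Block], sources) -> List[Block]:
    """Control loop: fire every knowledge source until none changes the blackboard."""
    while True:
        fired = [ks(blocks) for ks in sources]
        if not any(fired):
            return blocks

page = [
    Block("A Study of PDF", 18.0, 40.0),
    Block("1 Introduction", 14.0, 120.0),
    Block("PDF is a page-oriented format...", 10.0, 150.0),
    Block("It is based on PostScript semantics.", 10.0, 200.0),
    Block("Pages are independent graphic objects.", 10.0, 250.0),
]
run_blackboard(page, [tag_headings, tag_title, tag_body])
for b in page:
    print(sorted(b.tags), b.text)
```

A real system would also draw on the line-spacing and geometric cues the abstract mentions. This sketch uses only font size and vertical position.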
Abstract:
Portable Document Format (PDF) is a page-oriented, graphically rich format based on PostScript semantics, and it is also the format interpreted by the Adobe Acrobat viewers. Although each page in a PDF document is an independent graphic object, this property does not necessarily extend to the components (headings, diagrams, paragraphs, etc.) within a page. This, in turn, makes manipulating and extracting graphic objects on a PDF page a difficult and uncertain process. The work described here investigates the advantages of a model in which PDF pages are created from assemblies of COGs (Component Object Graphics), each with a clearly defined graphic state. The relative positioning of COGs on a PDF page is determined by appropriate "spacer" objects, and a traversal of the tree of COGs and spacers determines the rendering order. The enhanced revisability of PDF documents within the COG model is discussed. The abstract also covers applying the model in contexts that require easy revisability together with the ability to maintain and amend PDF document structure.
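Here is a minimal sketch of how a tree of COGs and spacers could be traversed to yield both the positions and the rendering order. It assumes purely vertical stacking. The class names `COG`, `Spacer` and `Group` are hypothetical and are not taken from the paper. Each COG is bracketed by the PDF `q`/`Q` operators, so its graphics state is saved and restored around it. It is then translated to its computed position with a `cm` operator.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class COG:
    """Hypothetical Component Object Graphic: content-stream operators drawn at its own origin."""
    name: str
    ops: str
    height: float  # points

@dataclass
class Spacer:
    """Hypothetical spacer: moves the current position down the page by `gap` points."""
    gap: float

@dataclass
class Group:
    """An interior node of the tree; children are rendered in list order."""
    children: List["Node"]

Node = Union[COG, Spacer, Group]

def render(node: Node, x: float, y: float, out: List[str]) -> float:
    """Depth-first traversal; appends one q...Q fragment per COG and returns the new y."""
    if isinstance(node, COG):
        y -= node.height  # PDF's y axis points up, so moving down the page decreases y
        out.append(f"q 1 0 0 1 {x:g} {y:g} cm {node.ops} Q")
    elif isinstance(node, Spacer):
        y -= node.gap
    else:
        for child in node.children:
            y = render(child, x, y, out)
    return y

page = Group([
    COG("heading", "BT /F1 14 Tf (Introduction) Tj ET", 14.0),
    Spacer(6.0),
    COG("para", "BT /F1 10 Tf (Body text) Tj ET", 10.0),
])
ops: List[str] = []
render(page, 72.0, 792.0, ops)
print("\n".join(ops))
```

Positions come from the tree rather than being hard-coded in the content stream. So reordering a `Group`'s children or resizing a spacer and re-running `render` regenerates a consistent page. That is the kind of revisability the abstract describes.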
Abstract:
Two complementary de facto standards for the publication of electronic documents are HTML on the World Wide Web and Adobe's PDF (Portable Document Format) language for use with Acrobat viewers. Both formats support embedding hypertext features within documents. We present a method that allows links and other hypertext material to be kept in an abstract form in separate link databases. The links can then be interpreted or compiled at any stage and applied, in the correct format, to a specific representation such as HTML or PDF. This approach is of great value in keeping hyperlinks relevant and up to date. It also keeps them in a form that is independent of the format of the finally delivered electronic document. Four models are discussed for allowing publishers to insert links into documents at a late stage. The techniques discussed have been implemented using a combination of Acrobat plug-ins, Web servers and Web browsers.
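The idea of keeping links abstract and compiling them late can be sketched as follows. This is an illustrative assumption, not the paper's implementation. The `Link` record, the two compiler functions and the anchor-to-rectangle lookup are all hypothetical. The same link database is compiled once into HTML `<a>` elements and once into PDF `/Link` annotation dictionaries that carry a `/URI` action.

```python
import html
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Link:
    """Hypothetical abstract link record, independent of any output format."""
    anchor: str  # text that should become the link source
    target: str  # destination URI (assumed to contain no '(', ')' or '\')

def to_html(text: str, links: List[Link]) -> str:
    """Compile links into HTML; naive: wraps only the first occurrence of each anchor."""
    out = html.escape(text)
    for link in links:
        a = html.escape(link.anchor)
        out = out.replace(a, f'<a href="{html.escape(link.target)}">{a}</a>', 1)
    return out

def to_pdf_annots(links: List[Link],
                  positions: Dict[str, Tuple[float, float, float, float]]) -> List[str]:
    """Compile links into PDF Link annotation dictionaries.

    `positions` maps anchor text to its bounding box (x1, y1, x2, y2) on the page.
    Here it is supplied by hand; a real tool would derive it from the page layout."""
    annots = []
    for link in links:
        if link.anchor not in positions:
            continue  # anchor does not appear on this page
        x1, y1, x2, y2 = positions[link.anchor]
        annots.append(
            f"<< /Type /Annot /Subtype /Link /Rect [{x1:g} {y1:g} {x2:g} {y2:g}] "
            f"/A << /S /URI /URI ({link.target}) >> >>"
        )
    return annots

db = [Link("CAJUN", "http://www.example.org/cajun")]
print(to_html("The CAJUN project", db))
print(to_pdf_annots(db, {"CAJUN": (100.0, 700.0, 140.0, 712.0)}))
```

Note that the PDF compiler needs page geometry that the HTML compiler does not. This reflects PDF's positionally anchored view of links.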
Abstract:
Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF). PDF offers the possibility of viewing and exchanging electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.). Acrobat's imageable objects are rendered with full use of Level 2 PostScript. This means that the most demanding requirements can be met in terms of high-quality typography and device-independent colour. These qualities will be very desirable components in future multimedia and hypermedia systems. The current capabilities of Acrobat and PDF are described. These include hypertext links, bookmarks and yellow sticker annotations (in release 1.0), together with article threads and multimedia plug-ins in version 2.0. This article also describes the CAJUN project (CD-ROM Acrobat Journals Using Networks), which has been investigating the automated placement of PDF hypertextual features from various front-end text processing systems. CAJUN has also been experimenting with the dissemination of PDF over e-mail, via the World Wide Web and on CD-ROM.
Abstract:
Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF). PDF offers the possibility of viewing and exchanging electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.). The imageable objects are rendered with full use of Level 2 PostScript. This means that the most demanding requirements can be met in terms of high-quality typography, device-independent colour and full page fidelity with respect to the printed version. PDF possesses an internal structure that supports hypertextual features and a range of file compression options. In a sense, PDF establishes a low-level, multiplatform machine code for imageable objects. Its notion of hypertext buttons and links is similarly low-level, in that they are anchored to physical locations on fixed pages. However, many other hypertext systems think of links as potentially spanning multiple files, which may in turn be located on various machines scattered across the Internet. The immediate challenge is to bridge the "abstraction gap" between high-level notions of a link and PDF's positionally anchored, low-level view. More specifically, how can Mosaic, WWW and Acrobat/PDF be configured so that the notions of "link" in the various systems work together harmoniously? This paper reviews progress so far on the CAJUN project (CD-ROM Acrobat Journals Using Networks). It pays particular attention to experiments that have already taken place in disseminating PDF via e-mail, Gopher and FTP. The prospects for integrating Acrobat seamlessly with WWW are then discussed.
Abstract:
The publication of material in electronic form should ideally preserve, in a unified document representation, all of the richness of the printed document. At the same time, it should maintain enough of the document's underlying structure to enable searching and other forms of semantic processing. Until recently it has been hard to find a document representation that combined these attributes and also stood some chance of becoming a de facto multi-platform standard. This paper sets out experience gained within the Electronic Publishing Research Group at the University of Nottingham in using Adobe Acrobat software and its underlying PDF (Portable Document Format) notation. The CAJUN project (CD-ROM Acrobat Journals Using Networks) began in 1993 and has used Acrobat software to produce electronic versions of journal papers for network and CD-ROM dissemination. The paper describes the project's progress so far and also gives a brief assessment of PDF's suitability as a universal document interchange standard.
Abstract:
Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF) which offers the possibility of being able to view and exchange electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.). The principal features of Acrobat are reviewed and its importance for libraries discussed in the context of experience already gained from the CAJUN project (CD-ROM Acrobat Journals Using Networks). This two-year project, funded by two well-known journal publishers, is investigating the use of Acrobat software for the electronic dissemination of journals, on CD-ROM and over networks.
Abstract:
The Portable Document Format (PDF), defined by Adobe Systems Inc. as the basis of its Acrobat product range, is discussed in some detail. Particular emphasis is given to its flexible object-oriented structure, which has yet to be fully exploited. PDF is currently used to represent not logical structure but simply a series of pages and their associated resources. A definition of an Encapsulated PDF (EPDF) is presented, in which EPDF blocks carry with them their own resource requirements, together with geometrical and logical information. A block formatter called Juggler is described, which can lay out EPDF blocks from various sources onto new pages. Future revisions of PDF supporting uniquely named EPDF blocks tagged with semantic information would assist in composite page makeup and could even lead to fully revisable PDF.
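At its simplest, a block formatter in the spirit of Juggler might fill pages with blocks in order. It would build each page's resource set from the requirements the blocks carry with them. The sketch below makes exactly that assumption. `EPDFBlock`, `Page`, `lay_out` and the first-fit vertical placement are hypothetical, not Juggler's actual algorithm.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

@dataclass
class EPDFBlock:
    """Hypothetical encapsulated block: its height plus the resources it needs."""
    name: str
    height: float                # points
    resources: FrozenSet[str]    # e.g. font or image resource names

@dataclass
class Page:
    blocks: List[EPDFBlock] = field(default_factory=list)
    used: float = 0.0  # vertical space already consumed, in points

    @property
    def resources(self) -> Set[str]:
        # The page's resource set is the union of what its blocks carry.
        return set().union(*(b.resources for b in self.blocks))

def lay_out(blocks: List[EPDFBlock], page_height: float) -> List[Page]:
    """First-fit, in order: start a new page whenever the next block will not fit."""
    pages = [Page()]
    for b in blocks:
        if b.height > page_height:
            raise ValueError(f"block {b.name!r} is taller than a page")
        if pages[-1].used + b.height > page_height:
            pages.append(Page())
        pages[-1].blocks.append(b)
        pages[-1].used += b.height
    return pages

blocks = [
    EPDFBlock("title", 60.0, frozenset({"/F1"})),
    EPDFBlock("abstract", 300.0, frozenset({"/F2"})),
    EPDFBlock("figure", 400.0, frozenset({"/Im1"})),
    EPDFBlock("refs", 200.0, frozenset({"/F2"})),
]
pages = lay_out(blocks, 700.0)
for i, p in enumerate(pages, 1):
    print(i, [b.name for b in p.blocks], sorted(p.resources))
```

Because each block declares its own resources, moving a block to another page automatically moves its resource needs with it. This is the property that makes EPDF blocks easy to recombine.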