2 resultados para Pattern-matching technique
em Nottingham eTheses
Resumo:
As collections of archived digital documents continue to grow the maintenance of an archive, and the quality of reproduction from the archived format, become important long-term considerations. In particular, Adobe s PDF is now an important final form standard for archiving and distributing electronic versions of technical documents. It is important that all embedded images in the PDF, and any fonts used for text rendering, should at the very minimum be easily readable on screen. Unfortunately, because PDF is based on PostScript technology, it allows the embedding of bitmap fonts in Adobe Type 3 format as well as higher-quality outline fonts in TrueType or Adobe Type 1 formats. Bitmap fonts do not generally perform well when they are scaled and rendered on low-resolution devices such as workstation screens. The work described here investigates how a plug-in to Adobe Acrobat enables bitmap fonts to be substituted by corresponding outline fonts using a checksum matching technique against a canonical set of bitmap fonts, as originally distributed. The target documents for our initial investigations are those PDF files produced by (La)TEXsystems when set up in a default (bitmap font) configuration. For all bitmap fonts where recognition exceeds a certain confidence threshold replacement fonts in Adobe Type 1 (outline) format can be substituted with consequent improvements in file size, screen display quality and rendering speed. The accuracy of font recognition is discussed together with the prospects of extending these methods to bitmap-font PDF files from sources other than (La)TEX.
Resumo:
Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We present a composite document approach wherein an XMLbased document representation is linked via a shadow tree of bi-directional pointers to a PDF representation of the same document. Using a two-window viewer any material selected in the PDF can be related back to the corresponding material in the XML, and vice versa. In this way the treatment of specialist material such as mathematics, music or chemistry (e.g. via read aloud or play aloud ) can be activated via standard tools working within the XML representation, rather than requiring that application-specific structures be embedded in the PDF itself. The problems of textual recognition and tree pattern matching between the two representations are discussed in detail. Comparisons are drawn between our use of a shadow tree of pointers to map between document representations and the use of a code-replacement shadow tree in technologies such as XBL.