2 resultados para Markup Languages

em Nottingham eTheses


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We present a composite document approach wherein an XMLbased document representation is linked via a shadow tree of bi-directional pointers to a PDF representation of the same document. Using a two-window viewer any material selected in the PDF can be related back to the corresponding material in the XML, and vice versa. In this way the treatment of specialist material such as mathematics, music or chemistry (e.g. via read aloud or play aloud ) can be activated via standard tools working within the XML representation, rather than requiring that application-specific structures be embedded in the PDF itself. The problems of textual recognition and tree pattern matching between the two representations are discussed in detail. Comparisons are drawn between our use of a shadow tree of pointers to map between document representations and the use of a code-replacement shadow tree in technologies such as XBL.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Each printed instance of a specific class of document can now have different degrees of customized content within the document template. This flexibility comes at a cost. If every printed page is potentially different from all others it must be rasterized separately, which is a time-consuming process. Technologies such as PPML (Personalized Print Markup Language) attempt to address this problem by dividing the bitmapped page into components that can be cached at the raster level, thereby speeding up the generation of page instances. A large number of documents are stored in Page Description Languages at a higher level of abstraction than the bitmapped page. Much of this content could be reused within a VDP environment provided that separable document components can be identified and extracted. These components then need to be individually rasterisable so that each high-level component can be related to its low-level (bitmap) equivalent. Unfortunately, the unstructured nature of most Page Description Languages makes it difficult to extract content easily. This paper outlines the problems encountered in extracting component-based content from existing page description formats, such as PostScript, PDF and SVG, and how the differences between the formats affects the ease with which content can be extracted. The techniques are illustrated with reference to a tool called COG Extractor, which extracts content from PDF and SVG and prepares it for reuse.