10 resultados para XML documents
em Instituto Politécnico do Porto, Portugal
Resumo:
Over time, XML markup language has acquired a considerable importance in applications development, standards definition and in the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing of large documents in appropriated manners. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files
Resumo:
Over time, XML markup language has acquired a considerable importance in applications development, standards definition and in the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing of large documents in appropriated manners. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files.
Resumo:
XML Schema is one of the most used specifications for defining types of XML documents. It provides an extensive set of primitive data types, ways to extend and reuse definitions and an XML syntax that simplifies automatic manipulation. However, many features that make XML Schema Definitions (XSD) so interesting also make them rather cumbersome to read. Several tools to visualize and browse schema definitions have been proposed to cope with this issue. The novel approach proposed in this paper is to base XSD visualization and navigation on the XML document itself, using solely the web browser, without requiring a pre-processing step or an intermediate representation. We present the design and implementation of a web-based XML Schema browser called schem@Doc that operates over the XSD file itself. With this approach, XSD visualization is synchronized with the source file and always reflects its current state. This tool fits well in the schema development process and is easy to integrate in web repositories containing large numbers of XSD files.
Resumo:
These are the proceedings for the eighth national conference on XML, its Associated Technologies and its Applications (XATA'2010). The paper selection resulted in 33% of papers accepted as full papers, and 33% of papers accepted as short papers. While these two types of papers were distinguish during the conference, and they had different talk duration, they all had the same limit of 12 pages. We are happy that the selected papers focus both aspects of the conference: XML technologies, and XML applications. In the first group we can include the articles on parsing and transformation technologies, like “Processing XML: a rewriting system approach", “Visual Programming of XSLT from examples", “A Refactoring Model for XML Documents", “A Performance based Approach for Processing Large XML Files in Multicore Machines", “XML to paper publishing with manual intervention" and “Parsing XML Documents in Java using Annotations". XML-core related papers are also available, focusing XML tools testing on “Test::XML::Generator: Generating XML for Unit Testing" and “XML Archive for Testing: a benchmark for GuessXQ". XML as the base for application development is also present, being discussed on different areas, like “Web Service for Interactive Products and Orders Configuration", “XML Description for Automata Manipulations", “Integration of repositories in Moodle", “XML, Annotations and Database: a Comparative Study of Metadata Definition Strategies for Frameworks", “CardioML: Integrating Personal Cardiac Information for Ubiquous Diagnosis and Analysis", “A Semantic Representation of Users Emotions when Watching Videos" and “Integrating SVG and SMIL in DAISY DTB production to enhance the contents accessibility in the Open Library for Higher Education". The wide spread of subjects makes us believe that for the time being XML is here to stay what enhances the importance of gathering this community to discuss related science and technology. Small conferences are traversing a bad period. Authors look for impact and numbers and only submit their works to big conferences sponsored by the right institutions. However the group of people behind this conference still believes that spaces like this should be preserved and maintained. This 8th gathering marks the beginning of a new cycle. We know who we are, what is our identity and we will keep working to preserve that. We hope the publication containing the works of this year's edition will catch the same attention and interest of the previous editions and above all that this publication helps in some other's work. Finally, we would like to thank all authors for their work and interest in the conference, and to the scientific committee members for their review work.
Resumo:
XSLT is a powerful and widely used language for transforming XML documents. However its power and complexity can be overwhelming for novice or infrequent users, many of which simply give up on using this language. On the other hand, many XSLT programs of practical use are simple enough to be automatically inferred from examples of source and target documents. An inferred XSLT program is seldom adequate for production usage but can be used as a skeleton of the final program, or at least as scaffolding in the process of coding it. It should be noted that the authors do not claim that XSLT programs, in general, can be inferred from examples. The aim of Vishnu - the XSLT generator engine described in this paper – is to produce XSLT programs for processing documents similar to the given examples and with enough readability to be easily understood by a programmer not familiar with the language. The architecture of Vishnu is composed by a graphical editor and a programming engine. In this paper we focus on the editor as a GWT web application where the programmer loads and edits document examples and pairs their content using graphical primitives. The programming engine receives the data collected by the editor and produces an XSLT program.
Resumo:
XSLT is a powerful and widely used language for transforming XML documents. However, its power and complexity can be overwhelming for novice or infrequent users, many of whom simply give up on using this language. On the other hand, many XSLT programs of practical use are simple enough to be automatically inferred from examples of source and target documents. An inferred XSLT program is seldom adequate for production usage but can be used as a skeleton of the final program, or at least as scaffolding in the process of coding it. It should be noted that the authors do not claim that XSLT programs, in general, can be inferred from examples. The aim of Vishnu—the XSLT generator engine described in this chapter—is to produce XSLT programs for processing documents similar to the given examples and with enough readability to be easily understood by a programmer not familiar with the language. The architecture of Vishnu is composed by a graphical editor and a programming engine. In this chapter, the authors focus on the editor as a GWT Web application where the programmer loads and edits document examples and pairs their content using graphical primitives. The programming engine receives the data collected by the editor and produces an XSLT program.
Resumo:
The discussion and analysis of the diverse outreach activities in this article provide guidance and suggestions for academic librarians who are interested in outreach and community engagement of any scale and nature. Cases are draw from a wide spectrum and are particularly strong in the setting of large academic libraries, special collections and programming for multicultural populations. The aim of this study is to present the results of research carried out regarding the needs, demand and consumption of European Union information by users in European Documentation Centres (EDC). A quantitative methodology was chosen based on a questionnaire with 24 items. This questionnaire was distributed within the EDC of Salamanca, Spain, and the EDC of Porto, Portugal, during specific time intervals between 2010 and 2011. We examined the level of EU information that EDC users possess, and identified the factors that facilitate or hinder access to EU information, the topics most demanded, and the types of documents consulted. Analysis was made of the use that the consumer of European information makes of databases and their behaviour during the consultation. Although the sample used was not very significant owing to its small size, it is a faithful reflection of the scarce visits made to EDCs. This study can be of use to managers of EDCs, providing them with better knowledge of the information needs and demands of their users. Ultimately this should lead to improvements in the services offered. The study lies within a frame of research scarcely addressed in specialized scholarly literature: European Union information.
Resumo:
Esta dissertação apresenta uma proposta de sistema capaz de preencher a lacuna entre documentos legislativos em formato PDF e documentos legislativos em formato aberto. O objetivo principal é mapear o conhecimento presente nesses documentos de maneira a representar essa coleção como informação interligada. O sistema é composto por vários componentes responsáveis pela execução de três fases propostas: extração de dados, organização de conhecimento, acesso à informação. A primeira fase propõe uma abordagem à extração de estrutura, texto e entidades de documentos PDF de maneira a obter a informação desejada, de acordo com a parametrização do utilizador. Esta abordagem usa dois métodos de extração diferentes, de acordo com as duas fases de processamento de documentos – análise de documento e compreensão de documento. O critério utilizado para agrupar objetos de texto é a fonte usada nos objetos de texto de acordo com a sua definição no código de fonte (Content Stream) do PDF. A abordagem está dividida em três partes: análise de documento, compreensão de documento e conjunção. A primeira parte da abordagem trata da extração de segmentos de texto, adotando uma abordagem geométrica. O resultado é uma lista de linhas do texto do documento; a segunda parte trata de agrupar os objetos de texto de acordo com o critério estipulado, produzindo um documento XML com o resultado dessa extração; a terceira e última fase junta os resultados das duas fases anteriores e aplica regras estruturais e lógicas no sentido de obter o documento XML final. A segunda fase propõe uma ontologia no domínio legal capaz de organizar a informação extraída pelo processo de extração da primeira fase. Também é responsável pelo processo de indexação do texto dos documentos. A ontologia proposta apresenta três características: pequena, interoperável e partilhável. A primeira característica está relacionada com o facto da ontologia não estar focada na descrição pormenorizada dos conceitos presentes, propondo uma descrição mais abstrata das entidades presentes; a segunda característica é incorporada devido à necessidade de interoperabilidade com outras ontologias do domínio legal, mas também com as ontologias padrão que são utilizadas geralmente; a terceira característica é definida no sentido de permitir que o conhecimento traduzido, segundo a ontologia proposta, seja independente de vários fatores, tais como o país, a língua ou a jurisdição. A terceira fase corresponde a uma resposta à questão do acesso e reutilização do conhecimento por utilizadores externos ao sistema através do desenvolvimento dum Web Service. Este componente permite o acesso à informação através da disponibilização de um grupo de recursos disponíveis a atores externos que desejem aceder à informação. O Web Service desenvolvido utiliza a arquitetura REST. Uma aplicação móvel Android também foi desenvolvida de maneira a providenciar visualizações dos pedidos de informação. O resultado final é então o desenvolvimento de um sistema capaz de transformar coleções de documentos em formato PDF para coleções em formato aberto de maneira a permitir o acesso e reutilização por outros utilizadores. Este sistema responde diretamente às questões da comunidade de dados abertos e de Governos, que possuem muitas coleções deste tipo, para as quais não existe a capacidade de raciocinar sobre a informação contida, e transformá-la em dados que os cidadãos e os profissionais possam visualizar e utilizar.
Resumo:
Vishnu is a tool for XSLT visual programming in Eclipse - a popular and extensible integrated development environment. Rather than writing the XSLT transformations, the programmer loads or edits two document instances, a source document and its corresponding target document, and pairs texts between then by drawing lines over the documents. This form of XSLT programming is intended for simple transformations between related document types, such as HTML formatting or conversion among similar formats. Complex XSLT programs involving, for instance, recursive templates or second order transformations are out of the scope of Vishnu. We present the architecture of Vishnu composed by a graphical editor and a programming engine. The editor is an Eclipse plug-in where the programmer loads and edits document examples and pairs their content using graphical primitives. The programming engine receives the data collected by the editor and produces an XSLT program. The design of the engine and the process of creation of an XSLT program from examples are also detailed. It starts with the generation of an initial transformation that maps source document to the target document. This transformation is fed to a rewrite process where each step produces a refined version of the transformation. Finally, the transformation is simplified before being presented to the programmer for further editing.