988 results for XML (Document markup language)
Abstract:
This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers to reuse services and contents to build new web applications. However, developers face a problem of information overload when searching for appropriate services or resources to combine. To contribute to overcoming this problem, a framework is defined for the discovery of services and resources. In this framework, discovery is performed at three levels: content, service and agent. The content level involves the information available in web resources. The web follows the Representational State Transfer (REST) architectural style, in which resources are returned as representations from servers to clients. These representations usually employ the HyperText Markup Language (HTML), which, along with Cascading Style Sheets (CSS), describes the markup employed to render representations in a web browser. Although the use of Semantic Web standards such as the Resource Description Framework (RDF) makes this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML. This process, often referred to as Screen Scraping in the literature, constitutes content discovery in the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources' representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained from them. The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking. To describe the possible services in RESTful architectures, a high-level, feature-oriented service methodology is proposed at this level. This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources. Discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Service discovery can also be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings in service registries such as ProgrammableWeb. For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow specifying behaviours for crawling and executing services, which results in the fulfilment of a high-level goal. Agent rules are plans that allow introspecting the discovered data and services from the web, as well as the knowledge present in service and content discovery rules, to anticipate the contents and services to be found on specific resources from the web. Through the definition of plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework. The Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers, and the framework was used for the discovery and extraction of pieces of news from the web.
Similarly, the Resulta and VulneraNET projects cover the discovery of ideas and of security knowledge on the web, respectively. The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered from component repositories on the web. The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks on the web. The main contributions of the thesis are the unified framework for discovery, which allows configuring agents to perform automated tasks; a scraping ontology, defined for the construction of mappings for scraping web resources; a novel first-order-logic rule induction algorithm for the automated construction and maintenance of these mappings from the visual information in web resources; and a common unified model for the discovery of services, which allows sharing service descriptions. Future work comprises the further extension of service probing, resource ranking, the extension of the scraping ontology, extensions of the agent model, and constructing a base of discovery rules.
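As a rough illustration of the content-level discovery rules described above, the sketch below maps pieces of an HTML representation onto semantic properties via CSS selectors. The rule format, the example markup and the property URIs are illustrative assumptions, not the thesis's scraping ontology; it assumes the beautifulsoup4 package is available.

```python
# A content-level discovery rule sketch: CSS selectors mapped onto semantic
# properties and applied to an HTML representation, yielding (property, value)
# pairs. Rule format, markup and URIs are illustrative assumptions.
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

HTML = """
<article class="news-item">
  <h2 class="headline">Sample headline</h2>
  <span class="byline">Jane Doe</span>
</article>
"""

# Hypothetical discovery rules: selector -> semantic property.
RULES = {
    "article.news-item h2.headline": "http://example.org/news#headline",
    "article.news-item span.byline": "http://example.org/news#author",
}

def discover(html, rules):
    """Apply discovery rules to a representation and collect semantic data."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for selector, prop in rules.items():
        for node in soup.select(selector):
            found.append((prop, node.get_text(strip=True)))
    return found

print(discover(HTML, RULES))
# [('http://example.org/news#headline', 'Sample headline'),
#  ('http://example.org/news#author', 'Jane Doe')]
```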
Abstract:
This document presents the Final Degree Project (TFG) developed by Israel Suárez Santiago, a student at the Escuela Técnica Superior de Ingenieros Informáticos (ETSIINF) of the Universidad Politécnica de Madrid (UPM). The work aims to offer a tool, based on previously studied standards, that facilitates the creation and management of HL7v3 message templates. Clinical data will later be added to those templates and loaded into a database so that they can be accessed and queried easily. The tool only provides a set of options for creating the template itself, which will serve as the basis for generating HL7v3 messages; it does not allow specific data to be included in the generated templates, which must be done manually or with an external tool. The generated templates are based mainly on the CDA standard, which provides a comprehensive guide for the correct generation of HL7v3 messages. The tool guarantees that the resulting templates are well formed, conforming to the aforementioned standard and being syntactically correct, i.e., the generated .xml document contains no errors.
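The abstract states that generated templates are guaranteed to be syntactically correct XML conforming to CDA. A minimal sketch of that kind of syntactic check follows; the sample template content is an illustrative assumption, and only well-formedness plus the CDA root element are verified, not full CDA conformance.

```python
# Checks that a generated template parses as XML and uses the CDA root element.
# The template text below is a placeholder, not output of the actual tool.
import xml.etree.ElementTree as ET

TEMPLATE = b"""<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3">
  <title>Template placeholder</title>
</ClinicalDocument>
"""

def is_well_formed_cda(xml_bytes):
    """Return True if the bytes parse as XML with a CDA ClinicalDocument root."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return root.tag == "{urn:hl7-org:v3}ClinicalDocument"

print(is_well_formed_cda(TEMPLATE))  # True
```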
Abstract:
Models are central tools for modern scientists and decision makers, and there are many existing frameworks to support their creation, execution and composition. Many frameworks are based on proprietary interfaces and do not lend themselves to the integration of models from diverse disciplines. Web-based systems, or systems based on web services, such as Taverna and Kepler, allow composition of models based on standard web service technologies. At the same time, the Open Geospatial Consortium has been developing its own service stack, which includes the Web Processing Service, designed to facilitate the execution of geospatial processing, including complex environmental models. The current Open Geospatial Consortium service stack employs Extensible Markup Language as a default data exchange standard, and widely used encodings such as JavaScript Object Notation can often only be used when incorporated with Extensible Markup Language. Similarly, no successful engagement of the Web Processing Service standard with the well-supported technologies of Simple Object Access Protocol and Web Services Description Language has been seen. In this paper we propose a pure Simple Object Access Protocol/Web Services Description Language processing service which addresses some of the issues with the Web Processing Service specification and brings us closer to achieving a degree of interoperability between geospatial models, and thus realising the vision of a useful 'model web'.
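To make the proposal concrete, the sketch below builds the kind of plain SOAP 1.1 request such a SOAP/WSDL processing service could accept, using only the Python standard library. The operation name and parameters are illustrative assumptions; only the envelope structure follows the SOAP standard.

```python
# Builds a SOAP 1.1 envelope for a hypothetical geoprocessing operation.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)

def build_request(operation, params):
    """Wrap an operation (as it would be described in a WSDL) in a SOAP body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)

print(build_request("RunCatchmentModel",
                    {"rainfall_mm": 12.5, "catchment_id": "demo-01"}).decode())
```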
Abstract:
We argue that, for certain constrained domains, elaborate model transformation technologies (implemented from scratch in general-purpose programming languages) are unnecessary for model-driven engineering; instead, lightweight configuration of commercial off-the-shelf productivity tools suffices. In particular, in the CancerGrid project, we have been developing model-driven techniques for the generation of software tools to support clinical trials. A domain metamodel captures the community's best practice in trial design. A scientist authors a trial protocol, modelling their trial by instantiating the metamodel; customized software artifacts to support trial execution are generated automatically from the scientist's model. The metamodel is expressed as an XML Schema, in such a way that it can be instantiated by completing a form to generate a conformant XML document. The same process works at a second level for trial execution: among the artifacts generated from the protocol are models of the data to be collected, and the clinician conducting the trial instantiates such models in reporting observations, again by completing a form to create a conformant XML document, representing the data gathered during that observation. Simple standard form management tools are all that is needed. Our approach is applicable to a wide variety of information-modelling domains: not just clinical trials, but also electronic public sector computing, customer relationship management, document workflow, and so on. © 2012 Springer-Verlag.
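A minimal sketch of the instantiation step described above: a hypothetical fragment of a metamodel expressed as XML Schema, and a form-generated document checked for conformance, here using the lxml library. The element names are illustrative, not taken from the CancerGrid metamodel.

```python
# Validates a form-generated XML document against an XML Schema metamodel
# fragment. Element names are hypothetical.
from lxml import etree  # assumes the lxml package is installed

XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="observation">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="patientId" type="xs:string"/>
        <xs:element name="bloodPressure" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

INSTANCE = b"""<observation>
  <patientId>P-001</patientId>
  <bloodPressure>120</bloodPressure>
</observation>"""

schema = etree.XMLSchema(etree.fromstring(XSD))
document = etree.fromstring(INSTANCE)
print(schema.validate(document))  # True when the form-filled document conforms
```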
Abstract:
The Arnamagnæan Institute, principally in the form of the present writer, has been involved in a number of projects to do with the digitisation, electronic description and text-encoding of medieval manuscripts. Several of these projects were dealt with in a previous article 'The view from the North: Some Scandinavian digitisation projects', NCD review, 4 (2004), pp. 22-30. This paper looks in some depth at two others, MASTER and CHLT. The Arnamagnæan Institute is a teaching and research institute within the Faculty of Humanities at the University of Copenhagen. It is named after the Icelandic scholar and antiquarian Árni Magnússon (1663-1730), secretary of the Royal Danish Archives and Professor of Danish Antiquities at the University of Copenhagen, who in the course of his lifetime built up what is arguably the single most important collection of early Scandinavian manuscripts in the world, some 2,500 manuscript items, the earliest dating from the 12th century. The majority of these are from Iceland, but the collection also contains important Norwegian, Danish and Swedish manuscripts, along with approximately 100 manuscripts of continental provenance. In addition to the manuscripts proper, there are collections of original charters and apographa: 776 Norwegian (including Faroese, Shetlandic and Orcadian) charters and 2895 copies, 1571 Danish charters and 1372 copies, and 1345 Icelandic charters and 5942 copies. When he died in 1730, Árni Magnússon bequeathed his collection to the University of Copenhagen. The original collection has subsequently been augmented through individual purchases and gifts and the acquisition of a number of smaller collections, bringing the total to nearly 3000 manuscript items, which, with the charters and apographa, comprise over half a million pages.
Abstract:
This thesis addressed the problem of risk analysis in mental healthcare, with respect to the GRiST project at Aston University. That project provides a risk-screening tool based on the knowledge of 46 experts, captured as mind maps that describe relationships between risks and patterns of behavioural cues. Mind mapping, though, fails to impose control over content, and is not considered to formally represent knowledge. In contrast, this thesis treated GRiST's mind maps as a rich knowledge base in need of refinement; that process drew on existing techniques for designing databases and knowledge bases. Identifying well-defined mind map concepts, though, was hindered by spelling mistakes, and by ambiguity and lack of coverage in the tools used for researching words. A novel use of the Edit Distance overcame those problems, by assessing similarities between mind map texts, and between spelling mistakes and suggested corrections. That algorithm further identified stems, the shortest text string found in related word-forms. As opposed to existing approaches' reliance on built-in linguistic knowledge, this thesis devised a novel, more flexible text-based technique. An additional tool, Correspondence Analysis, found patterns in word usage that allowed machines to determine likely intended meanings for ambiguous words. Correspondence Analysis further produced clusters of related concepts, which in turn drove the automatic generation of novel mind maps. Such maps underpinned adjuncts to the mind mapping software used by GRiST; one such new facility generated novel mind maps to reflect the collected expert knowledge on any specified concept. Mind maps from GRiST are stored as XML, which suggested storing them in an XML database. In fact, the entire approach here is "XML-centric", in that all stages rely on XML as far as possible. An XML-based query language allows users to retrieve information from the mind map knowledge base. The approach, it was concluded, will prove valuable to mind mapping in general, and to detecting patterns in any type of digital information.
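The Edit Distance used here is the standard Levenshtein measure; the sketch below shows how it can rank candidate corrections for a misspelled mind-map term. The candidate words are illustrative, not GRiST content.

```python
# Classic dynamic-programming Levenshtein distance, used to rank candidate
# corrections for a misspelled term. Candidate words are illustrative only.
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

candidates = ["suicide", "sedation", "isolation"]
misspelling = "suiside"
print(sorted(candidates, key=lambda w: edit_distance(misspelling, w)))
# 'suicide' ranks first as the closest correction
```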
Abstract:
The Semantic Binary Data Model (SBM) is a viable alternative to the now-dominant relational data model. SBM would be especially advantageous for applications dealing with complex interrelated networks of objects, provided that a robust, efficient implementation can be achieved. This dissertation presents an implementation design method for SBM, algorithms, and their analytical and empirical evaluation. Our method allows building a robust and flexible database engine with a wider applicability range and improved performance. Extensions to SBM are introduced and an implementation of these extensions is proposed that allows the database engine to efficiently support applications with a predefined set of queries. A new Record data structure is proposed. Trade-offs of employing Fact, Record and Bitmap data structures for storing information in a semantic database are analyzed. A clustering ID distribution algorithm and an efficient algorithm for object ID encoding are proposed. Mapping to an XML data model is analyzed and a new XML-based XSDL language facilitating interoperability of the system is defined. Solutions to issues associated with making the database engine multi-platform are presented. An improvement to the atomic update algorithm suitable for certain scenarios of database recovery is proposed. Specific guidelines are devised for implementing a robust and well-performing database engine based on the extended Semantic Data Model.
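As a generic illustration of the mapping to an XML data model mentioned above (and not of the dissertation's XSDL language), the sketch below serialises binary-relationship facts as an XML document.

```python
# Serialises binary-relationship facts (object, relation, value) as XML.
# The fact layout is a simplified stand-in, not the dissertation's structures.
import xml.etree.ElementTree as ET

FACTS = [
    ("obj1", "name", "Alice"),
    ("obj1", "worksFor", "obj2"),
    ("obj2", "name", "Acme Corp"),
]

def facts_to_xml(facts):
    root = ET.Element("facts")
    for subject, relation, value in facts:
        fact = ET.SubElement(root, "fact", subject=subject, relation=relation)
        fact.text = value
    return ET.tostring(root, encoding="unicode")

print(facts_to_xml(FACTS))
```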
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Abstract:
Two complementary de facto standards for the publication of electronic documents are HTML on the World Wide Web and Adobe's PDF (Portable Document Format) language for use with Acrobat viewers. Both these formats provide support for hypertext features to be embedded within documents. We present a method which allows links and other hypertext material to be kept in an abstract form in separate link databases. The links can then be interpreted or compiled at any stage and applied, in the correct format, to some specific representation such as HTML or PDF. This approach is of great value in keeping hyperlinks relevant, up to date and in a form which is independent of the finally delivered electronic document format. Four models are discussed for allowing publishers to insert links into documents at a late stage. The techniques discussed have been implemented using a combination of Acrobat plug-ins, Web servers and Web browsers.
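A minimal sketch of the late-binding idea described above: links held in an abstract link database and compiled into a concrete format (HTML here) only at delivery time. The record layout and URLs are illustrative assumptions; a PDF back end would emit link annotations from the same records.

```python
# Compiles abstract link records into HTML anchors at delivery time.
LINK_DB = [
    # (document id, anchor phrase, target URL) -- illustrative records
    ("doc-42", "hypertext features", "https://example.org/hypertext"),
    ("doc-42", "Acrobat viewers", "https://example.org/acrobat"),
]

def compile_links_to_html(text, doc_id, link_db):
    """Late-bind links for one document into the HTML representation."""
    for link_doc, phrase, target in link_db:
        if link_doc == doc_id and phrase in text:
            text = text.replace(phrase, f'<a href="{target}">{phrase}</a>')
    return text

source = "Both formats embed hypertext features readable in Acrobat viewers."
print(compile_links_to_html(source, "doc-42", LINK_DB))
```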
Abstract:
Public agencies are increasingly required to collaborate with each other in order to provide high-quality e-government services. This collaboration is usually based on the service-oriented approach and supported by interoperability platforms. Such platforms are specialized middleware-based infrastructures enabling the provision, discovery and invocation of interoperable software services. In turn, given that personal data handled by governments are often very sensitive, most governments have developed some sort of legislation focusing on data protection. This paper proposes solutions for monitoring and enforcing data protection laws within an E-government Interoperability Platform. In particular, the proposal addresses requirements posed by the Uruguayan Data Protection Law and the Uruguayan E-government Platform, although it can also be applied in similar scenarios. The solutions are based on well-known integration mechanisms (e.g. Enterprise Service Bus) as well as recognized security standards (e.g. eXtensible Access Control Markup Language) and were completely prototyped leveraging the SwitchYard ESB product.
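As a simplified illustration of the access-control model that XACML expresses (a permit/deny decision over subject, resource and action attributes), the sketch below evaluates a single hard-coded rule. It is not the platform's actual policy set, nor an XACML engine or its SwitchYard/ESB integration.

```python
# A single attribute-based rule returning Permit/Deny, mirroring the shape of
# an XACML decision. Attribute names and the rule itself are invented.
from dataclasses import dataclass

@dataclass
class AccessRequest:
    subject_role: str
    resource_category: str
    action: str
    purpose: str

def decide(req: AccessRequest) -> str:
    """Permit reads of personal data only for service-provision purposes."""
    if (req.resource_category == "personal-data"
            and req.subject_role == "civil-servant"
            and req.action == "read"
            and req.purpose == "service-provision"):
        return "Permit"
    return "Deny"

print(decide(AccessRequest("civil-servant", "personal-data", "read", "service-provision")))
print(decide(AccessRequest("contractor", "personal-data", "read", "marketing")))
```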
Abstract:
Human interaction with computers involves a combination of the tasks of programming and of use, and the difference between the two is not always explicit. Is entering commands into a computer-aided design program use, or programming in an interpreted language? Is modifying a spreadsheet with macros use, or programming? Is using an Integrated Development Environment (IDE) to insert data into a file use (of the IDE), or programming? Is writing a text in LaTeX or HTML use, or programming in a markup language? Is resorting to a symbolic computation program use, or programming? Is using a word processor use, or visual programming? Users are not required to have complete knowledge of every command, every menu, every symbol of the software they use. Nor is memorizing the syntax and every operational detail of a program a necessary, or even useful, attribute for the user; acquiring that knowledge does not ensure more efficient use. When starting out, only a few elementary instructions are received, sometimes from a colleague or a professor, or found by searching the Internet. With familiarity, users demand more of the software they use and of themselves: a manual becomes a very useful resource. The confidence gained periodically creates the need for self-examination and for broadening one's knowledge. In this way, whoever uses computers is eventually confronted with a task that can effectively be considered, or that requires, programming. An immediate question then arises (if nobody has decided for you): the choice of programming language. Its multi-paradigm approach and long record of use make C++ attractive for applications where efficiency is combined with the availability of data structures and algorithms adopted by industry (what is colloquially called the STL, Standard Template Library, cf. [#breymann, #josuttis], or more generally the Standard Library). In addition, popular languages such as Java, C# and PHP have syntaxes inspired by, and in many parts coinciding with, those of C and C++. For example, a "for" loop in Java partially coincides with that of C99, which is a subset of the C++ "for". It is the details, the efficiency and the capabilities of C++ that enable the creation of professional software. All the classic operating systems (Unix, Microsoft Windows, Linux) provide compilers, IDEs and libraries, and are largely built using C and C++. Compared with other languages, the amount of available tooling and the knowledge accumulated over decades are hard to ignore. That accumulated knowledge makes the syntax of C++ appear much larger than strictly necessary and puts off potential newcomers. The long evolution of C++ has also introduced a very marked difference in style: code from the 1980s and 1990s is frequently less readable than what is produced today. Many tutorials available online make the language look less rigorous (and more complex) than it really is, since the general case of the syntax is rarely presented. Many authors, one finds, still use the C headers when they are no longer necessary. Scott Meyers states that C++ is a federation of languages [#scottmeyers], and for that reason it requires approaches distinct from those of other languages.
Without some systematization it is difficult to appreciate its compactness and coherence; yet the harmonious way in which its syntactic components fit together is one of C++'s great strengths, and one that only becomes apparent through experimentation and careful reading. This monograph is aimed at those who intend to use C++ as a professional software tool. In terms of academic prerequisites, a first-cycle (undergraduate) degree in science or engineering will increase interest in certain more technical aspects of the language, but anyone with a taste for experimentation will profit from the content. This text does not aim for encyclopaedic exhaustiveness in covering the subject. Instead, I provide a direct introduction to C++ that makes it possible to start producing code without the cost of gathering information from scattered sources and notations. I thereby anticipate its use in Portuguese-speaking countries, since the texts I have found are either more demanding or less complete, and frequently both.
Abstract:
Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We present a composite document approach wherein an XML-based document representation is linked via a shadow tree of bi-directional pointers to a PDF representation of the same document. Using a two-window viewer, any material selected in the PDF can be related back to the corresponding material in the XML, and vice versa. In this way the treatment of specialist material such as mathematics, music or chemistry (e.g. via 'read aloud' or 'play aloud') can be activated via standard tools working within the XML representation, rather than requiring that application-specific structures be embedded in the PDF itself. The problems of textual recognition and tree pattern matching between the two representations are discussed in detail. Comparisons are drawn between our use of a shadow tree of pointers to map between document representations and the use of a code-replacement shadow tree in technologies such as XBL.
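A rough sketch of the shadow-tree idea: nodes pairing an XML location with the corresponding PDF region, navigable in both directions so a selection in one representation can be resolved to the other. Field names and the region encoding are assumptions, not the paper's data structures.

```python
# Shadow-tree nodes pairing an XML location with a PDF region, with a lookup
# that resolves a point in the PDF view back to the matching XML element.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ShadowNode:
    xml_path: str                       # location in the XML representation
    pdf_region: tuple                   # (page, x0, y0, x1, y1) in the PDF
    parent: Optional["ShadowNode"] = None
    children: List["ShadowNode"] = field(default_factory=list)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

def find_by_pdf_point(node, page, x, y):
    """Descend the shadow tree to the most specific node containing the point."""
    p, x0, y0, x1, y1 = node.pdf_region
    if p == page and x0 <= x <= x1 and y0 <= y <= y1:
        for child in node.children:
            hit = find_by_pdf_point(child, page, x, y)
            if hit is not None:
                return hit
        return node
    return None

root = ShadowNode("/article", (1, 0, 0, 600, 800))
root.add(ShadowNode("/article/title", (1, 50, 700, 550, 760)))
print(find_by_pdf_point(root, 1, 100, 720).xml_path)  # /article/title
```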
Abstract:
This project presents the development of an application that translates Coloured Petri Nets designed in CPN Tools into a language for generating input files for a Coloured Petri Net simulator/optimizer. In this way, models created in CPN Tools can be optimized, since that tool does not provide optimization facilities. The entire project was implemented in C++.
Abstract:
This report aims to show that XML technology is the best alternative for meeting the technological challenge faced by the information extraction systems of next-generation applications. These systems must, on the one hand, guarantee their independence from the schemas of the databases that feed them and, on the other, be able to present the information in multiple formats.
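As a small illustration of the multi-format requirement mentioned above, the sketch below renders the same XML record as HTML and as JSON without the consumer touching any database schema. The record structure is an illustrative assumption.

```python
# Renders one XML record as HTML and as JSON; consumers never see the schema
# of the underlying database. The record structure is invented for the example.
import json
import xml.etree.ElementTree as ET

RECORD = "<book><title>Tirant lo Blanc</title><year>1490</year></book>"

def to_html(xml_text):
    root = ET.fromstring(xml_text)
    return "<ul>" + "".join(f"<li>{c.tag}: {c.text}</li>" for c in root) + "</ul>"

def to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({c.tag: c.text for c in root}, ensure_ascii=False)

print(to_html(RECORD))   # <ul><li>title: Tirant lo Blanc</li><li>year: 1490</li></ul>
print(to_json(RECORD))   # {"title": "Tirant lo Blanc", "year": "1490"}
```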