995 results for Structured documents
Abstract:
The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques. Cleaning techniques are needed for converting these documents to structured documents. Text cleaning techniques are one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 newsgroups dataset, and report on experimental results.
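As an illustration of the kind of cleaning step this abstract refers to, here is a minimal sketch. It is not the authors' pipeline: the stop-word list, the header-stripping rule, and the function name `clean_text` are assumptions chosen for the example.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on"}

def clean_text(raw: str) -> list[str]:
    """Turn a raw newsgroup-style post into a list of cleaned tokens."""
    # Treat everything before the first blank line as headers, but only
    # if every one of those lines looks like "Name: value".
    head, sep, body = raw.partition("\n\n")
    if sep and all(re.match(r"^[A-Za-z-]+:", line) for line in head.splitlines()):
        text = body
    else:
        text = raw
    text = text.lower()
    # Replace each punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    # Drop stop words and purely numeric tokens.
    return [tok for tok in text.split() if tok not in STOP_WORDS and not tok.isdigit()]
```

Applied to a post with `From:` and `Subject:` headers, the function discards the header block and returns only the content-bearing tokens of the body.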
Abstract:
We discuss from a practical point of view a number of issues involved in writing distributed Internet and WWW applications using LP/CLP systems. We describe PiLLoW, a public-domain Internet and WWW programming library for LP/CLP systems that we have designed in order to simplify the process of writing such applications. PiLLoW provides facilities for accessing documents and code on the WWW; parsing, manipulating and generating HTML and XML structured documents and data; producing HTML forms; writing form handlers and CGI-scripts; and processing HTML/XML templates. An important contribution of PiLLoW is to model HTML/XML code (and, thus, the content of WWW pages) as terms. The PiLLoW library has been developed in the context of the Ciao Prolog system, but it has been adapted to a number of popular LP/CLP systems, supporting most of its functionality. We also describe the use of concurrency and a high-level model of client-server interaction, Ciao Prolog's active modules, in the context of WWW programming. We propose a solution for client-side downloading and execution of Prolog code, using generic browsers. Finally, we also provide an overview of related work on the topic.
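The idea of modelling HTML as terms can be sketched outside Prolog as well. The following Python fragment is only an analogy, not PiLLoW's API: the `(tag, attributes, children)` tuple shape and the `render` function are assumptions made for this example.

```python
from html import escape

def render(term) -> str:
    """Render a nested term as HTML text.

    A term is either a plain string (text content) or a tuple
    (tag, attributes_dict, list_of_child_terms). This shape is an
    assumption of the sketch, not the library's representation.
    """
    if isinstance(term, str):
        return escape(term)
    tag, attrs, children = term
    attr_text = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in attrs.items())
    inner = "".join(render(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"

# A page fragment built as data rather than as a string.
page = ("p", {"class": "note"},
        ["See ", ("a", {"href": "http://example.org/?a=1&b=2"}, ["this & that"])])
```

Because the page is ordinary data, it can be inspected, transformed, or generated by the same program logic that handles any other term before it is rendered.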
Abstract:
We discuss from a practical point of view a number of issues involved in writing Internet and WWW applications using LP/CLP systems. We describe PiLLoW, a public-domain Internet and WWW programming library for LP/CLP systems which we argue significantly simplifies the process of writing such applications. PiLLoW provides facilities for generating HTML structured documents, producing HTML forms, writing form handlers, accessing and parsing WWW documents, and accessing code posted at HTTP addresses. We also describe the architecture of some application classes, using a high-level model of client-server interaction, active modules. We then propose an architecture for automatic LP/CLP code downloading for local execution, using generic browsers. Finally, we also provide an overview of related work on the topic. The PiLLoW library has been developed in the context of the &-Prolog and CIAO systems, but it has been adapted to a number of popular LP/CLP systems, supporting most of its functionality.
Abstract:
We discuss from a practical point of view a number of issues involved in writing Internet and WWW applications using LP/CLP systems. We describe PiLLoW, an Internet and WWW programming library for LP/CLP systems which we argue significantly simplifies the process of writing such applications. PiLLoW provides facilities for generating HTML structured documents, producing HTML forms, writing form handlers, accessing and parsing WWW documents, and accessing code posted at HTTP addresses. We also describe the architecture of some application classes, using a high-level model of client-server interaction, active modules. Finally, we describe an architecture for automatic LP/CLP code downloading for local execution, using generic browsers. The PiLLoW library has been developed in the context of the &-Prolog and CIAO systems, but it has been adapted to a number of popular LP/CLP systems, supporting most of its functionality.
Abstract:
Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents (http://www.documentengineering.org). The ACM Symposium on Document Engineering is an annual meeting of researchers active in document engineering; it is sponsored by ACM through the ACM SIGWEB Special Interest Group. In this editorial, we first point to work carried out in the context of document engineering that is directly related to multimedia tools and applications. We conclude with a summary of the papers presented in this special issue.
Abstract:
Most Australian states have introduced legislation to provide for enduring documents for financial, personal and health care decision making in the event of incapacity. Since the introduction of Enduring Powers of Attorney (EPAs) and Advance Health Directives (AHDs) in Queensland in 1998, concerns have continued to be raised by service providers, professionals and individuals about the uptake, understanding and appropriate use of these documents. In response to these concerns, the Department of Justice and Attorney-General (DJAG) convened a Practical Guardianship Initiatives Working Party. This group identified the limited evidence base available to address these concerns. In 2009, a multidisciplinary research team from the University of Queensland and the Queensland University of Technology was awarded $90,000 from the Legal Practitioners Interest on Trust Account Fund to undertake a review of the current EPA and AHD forms. The goal of the research was to gather data on the content and useability of the forms from the perspectives of a range of stakeholders, particularly those completing the EPA and AHD, witnesses of these documents, attorneys appointed under an EPA, and health professionals involved in the completion of an AHD or dealing with it in a clinical context. The researchers also sought to gather information from the perspective of Aboriginal and Torres Strait Islander (ATSI) individuals as well as people from culturally and linguistically diverse (CALD) groups. Although the focus of the research was on the forms and the extent to which the current design, content and format represent a barrier to uptake, in the course of the research some broader issues were identified which have an impact on the effectiveness of the EPA and AHD in achieving the goals of planning for financial, personal and health care in advance of losing capacity.
The data gathered enabled the researchers to achieve the primary goal of the research: to make recommendations to improve the content and useability of the forms which hopefully will lead to an increased uptake and appropriate use of the forms. However, the researchers thought it was important not to ignore broader policy issues that were identified in the course of the research. These broader issues have been highlighted in this Report, and the researchers have responded to them in a variety of ways. For some issues, the researchers have suggested alterations that could be made to the forms to address the particular concerns. For other issues, the researchers have suggested that Government may need to take specific action such as educating the broader community with some attention to strategies that engage particular groups within communities. Other concerns raised can only be dealt with by legislative reform and, in some of these cases, the researchers have identified issues that Government may wish to consider further. We do note, however, that it is beyond the scope of this Report to recommend changes to the law. This three stage mixed methods project aimed to provide systematic evidence from a broad range of stakeholders in regard to: (i) which groups use and do not use these documents and why, (ii) the contribution of the length/complexity/format/language of the forms as barriers to their completion and/or effective use, and (iii) the issues raised by the current documents for witnesses and attorneys. Understanding and use of EPAs and AHDs were generally explored in separate but parallel processes. A purposive sampling strategy included users of the documents as principals and attorneys, and professionals, witnesses and service providers who assist others to execute or use the forms. The first component of this study built on existing knowledge using a Critical Reference Group and material provided by the DJAG Practical Guardianship Initiatives Working Party. 
This assisted in the development of the data collection tools for subsequent stages. The second component comprised semi-structured interviews and focus groups with a targeted sample of current users of the forms, potential users, witnesses and other professionals to provide in-depth information on critical issues. Outreach to Aboriginal and Torres Strait Islander Elders and individuals and workers with CALD groups ensured a broad sample of potential users of the two documents. Fifty individual interviews and three focus groups were completed. Most interviews and focus groups focused on perceptions of, and experiences with, either the EPA or the AHD form. In the interviews with Indigenous people and the CALD focus groups, however, respondents provided their perceptions and experiences of both documents. In general, these respondents had not used the forms and were responding to the documents made available in the interview or focus group. In total, seventy-seven individuals were involved in interviews or focus groups. The final component comprised on-line surveys for EPA principals, EPA attorneys, AHD principals, witnesses of EPAs and AHDs and medical practitioners with experience of AHDs as nominated and/or treating doctors. The surveys were developed from the initial component and the qualitative analysis of the interview and focus group data. A total of 116 surveys were returned from major cities and regional Queensland. The survey data was analysed descriptively for patterns and trends. It is important to note that the aim of the survey was to gain insight into issues and concerns relating to the documents and not to make generalisations to the broader population.
Abstract:
Balancing the competing interests of autonomy and protection of individuals is an escalating challenge confronting an ageing Australian population. Legal and medical professionals are increasingly being asked to determine whether individuals are legally capable to make their own testamentary, financial and/or personal/health care decisions. Diseases such as dementia impact upon cognition which necessitates collaboration between the legal and medical professions to satisfactorily assess the effect of such mentally disabling conditions upon legal competency. Terminological and methodological differences exist between the two professions when assessing capacity in this context which subsequently create miscommunication and misunderstanding. Consequently, it is not necessarily a simple solution for a legal professional to seek the opinion of a medical practitioner. Exacerbating the situation is the fact that no consistent and transparent capacity assessment paradigm currently exists in Australia. Assessments are instead being undertaken on an ad hoc basis dependent upon the skill set of the legal and/or medical professionals involved. A qualitative study seeking the views of legal and medical professionals who practise in this area has been conducted. This incorporated a review of the relevant literature and surveys which informed the semi-structured interviews conducted with 10 legal and 20 medical practitioners. Practitioners were asked whether there is a standard approach to assessment and whether national guidelines would assist. The general consensus was that uniform guidelines would be advantageous. The research also canvassed practitioner views as to the state of the relationship between the professions when assessing capacity. 
Three promising practices have emerged from this research: first, the development of national guidelines and supporting principles to satisfactorily assess capacity; second, the strengthening of the relationship between legal and medical professionals to assist in the satisfactory assessment of legal capacity; and third, increased community education.
Abstract:
Li, Longzhuang, Liu, Yonghuai, Obregon, A., Weatherston, M. Visual Segmentation-Based Data Record Extraction From Web Documents. Proceedings of IEEE International Conference on Information Reuse and Integration, 2007, pp. 502-507. Sponsorship: IEEE
Abstract:
This review provides a classification of public policies to promote healthier eating as well as a structured mapping of existing measures in Europe. Complete coverage of alternative policy types was ensured by complementing the review with a selection of major interventions from outside Europe. Under the auspices of the Seventh Framework Programme's Eatwell Project, funded by the European Commission, researchers from five countries reviewed a representative selection of policy actions based on scientific papers, policy documents, grey literature, government websites, other policy reviews, and interviews with policy-makers. This work resulted in a list of 129 policy interventions, 121 of which were in Europe. For each type of policy, a critical review of its effectiveness was conducted, based on the evidence currently available. The results of this review indicate a need exists for a more systematic and accurate evaluation of government-level interventions as well as for a stronger focus on actual behavioral change rather than changes in attitude or intentions alone. The currently available evidence is very heterogeneous across policy types and is often incomplete.
Abstract:
Iatrogenic errors and patient safety in clinical processes are an increasing concern. The quality of process information in hardcopy or electronic form can heavily influence clinical behaviour and decision making errors. Little work has been undertaken to assess the safety impact of clinical process planning documents guiding the clinical actions and decisions. This paper investigates the clinical process documents used in elective surgery and their impact on latent and active clinical errors. Eight clinicians from a large health trust underwent extensive semi-structured interviews to understand their use of clinical documents, and their perceived impact on errors and patient safety. Samples of the key types of document used were analysed. Theories of latent organisational and active errors from the literature were combined with the EDA semiotics model of behaviour and decision making to propose the EDA Error Model. This model enabled us to identify perceptual, evaluation, knowledge and action error types and approaches to reducing their causes. The EDA error model was then used to analyse sample documents and identify error sources and controls. Types of knowledge artefact structures used in the documents were identified and assessed in terms of safety impact. This approach was combined with analysis of the questionnaire findings using existing error knowledge from the literature. The results identified a number of document and knowledge artefact issues that give rise to latent and active errors and also issues concerning medical culture and teamwork together with recommendations for further work.
Abstract:
The need for a convergence between semi-structured data management and Information Retrieval techniques is manifest to the scientific community. In order to fulfil this growing request, W3C has recently proposed XQuery Full-Text, an IR-oriented extension of XQuery. However, the issue of query optimization requires the study of important properties like query equivalence and containment; to this aim, a formal representation of documents and queries is needed. The goal of this thesis is to establish such a formal background. We define a data model for XML documents and propose an algebra able to represent most XQuery Full-Text expressions. We show how an XQuery Full-Text expression can be translated into an algebraic expression and how an algebraic expression can be optimized.
Abstract:
A vast amount of temporal information is provided on the Web. Even though many facts expressed in documents are time-related, the temporal properties of Web presentations have not received much attention. In database research, temporal databases have become a mainstream topic in recent years. In Web documents, temporal data may exist as meta data in the header and as user-directed data in the body of a document. Whereas temporal data can easily be identified in the semi-structured meta data, it is more difficult to determine temporal data and its role in the body. We propose procedures for maintaining temporal integrity of Web pages and outline different approaches of applying bitemporal data concepts for Web documents. In particular, we regard desirable functionalities of Web repositories and other Web-related tools that may support the Webmasters in managing the temporal data of their Web documents. Some properties of a prototype environment are described.
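To make the bitemporal idea concrete, here is a minimal sketch in which each fact on a Web page carries both a valid-time interval (when it holds in the real world) and a transaction-time interval (when the repository held it). The class name, field names, and query function are hypothetical and are not taken from the prototype described in the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class BitemporalFact:
    content: str
    valid_from: date                      # valid time, inclusive
    valid_to: date                        # valid time, exclusive
    recorded_on: date                     # transaction time start
    superseded_on: Optional[date] = None  # transaction time end; None means current

def facts_as_of(facts, valid_at: date, known_at: date):
    """Return facts valid at `valid_at` according to what was stored at `known_at`."""
    return [
        f for f in facts
        if f.valid_from <= valid_at < f.valid_to
        and f.recorded_on <= known_at
        and (f.superseded_on is None or known_at < f.superseded_on)
    ]

# Example data: the page first stated one set of hours, later corrected.
facts = [
    BitemporalFact("Office hours: 9-5", date(2024, 1, 1), date(2024, 7, 1),
                   recorded_on=date(2023, 12, 15), superseded_on=date(2024, 2, 1)),
    BitemporalFact("Office hours: 10-6", date(2024, 1, 1), date(2024, 7, 1),
                   recorded_on=date(2024, 2, 1)),
]
```

Keeping the two time axes separate lets a repository answer both "what did the page claim on a given day" and "what was actually true on that day", which is the distinction a correction to a published page depends on.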
Abstract:
Existing theories of semantic cognition propose models of cognitive processing occurring in a conceptual space, where ‘meaning’ is derived from the spatial relationships between concepts’ mapped locations within the space. Information visualisation is a growing area of research within the field of information retrieval, and methods for presenting database contents visually in the form of spatial data management systems (SDMSs) are being developed. This thesis combined these two areas of research to investigate the benefits associated with employing spatial-semantic mapping (documents represented as objects in two- and three-dimensional virtual environments are proximally mapped dependent on the semantic similarity of their content) as a tool for improving retrieval performance and navigational efficiency when browsing for information within such systems. Positive effects associated with the quality of document mapping were observed; improved retrieval performance and browsing behaviour were witnessed when mapping was optimal. It was also shown that using a third dimension for virtual environment (VE) presentation provides sufficient additional information regarding the semantic structure of the environment that performance is increased in comparison to using two dimensions for mapping. A model that describes the relationship between retrieval performance and browsing behaviour was proposed on the basis of the findings. Individual differences were not found to have any observable influence on retrieval performance or browsing behaviour when mapping quality was good. The findings from this work have implications both for cognitive modelling of semantic information and for designing and testing information visualisation systems. These implications are discussed in the conclusions of this work.
Abstract:
This research investigated the effectiveness and efficiency of structured writing as compared to traditional nonstructured writing as a teaching and learning strategy in a training session for teachers. Structured writing is a method of identifying, interrelating, sequencing, and graphically displaying information on fields of a page or computer. It is an alternative for improving training and educational outcomes by providing an effective and efficient documentation methodology. The problem focuses upon the contradiction between: (a) the supportive research and theory to modify traditional methods of written documents and information presentation and (b) the existing paradigm to continue with traditional communication methods. A MANOVA was used to determine whether there was a significant difference between a control and an experimental group in a posttest-only experimental design. The experimental group received the treatment of structured writing materials during a training session. Two variables were analyzed: (a) effectiveness, measured as correct items on a posttest, and (b) efficiency, measured as time spent on the test. The quantitative data showed a difference for the experimental group on the two dependent variables. The experimental group completed the posttest in 2 minutes less time while scoring 1.5 more items correct. An interview with the training facilitators revealed that the structured writing materials were "user friendly."
Abstract:
This paper describes a tool for recombining the logical structure from an XML document with the typeset appearance of the corresponding PDF document. The tool uses the XML representation as a template for the insertion of the logical structure into the existing PDF document, thereby creating a Structured/Tagged PDF. The addition of logical structure adds value to the PDF in three ways: the accessibility is improved (PDF screen readers for visually impaired users perform better), media options are enhanced (the ability to reflow PDF documents, using structure as a guide, makes PDF viable for use on hand-held devices) and the re-usability of the PDF documents benefits greatly from the presence of an XML-like structure tree to guide the process of text retrieval in reading order (e.g. when interfacing to XML applications and databases).