960 resultados para 280205 Text Processing


Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes the followed methodology to automatically generate titles for a corpus of questions that belong to sociological opinion polls. Titles for questions have a twofold function: (1) they are the input of user searches and (2) they inform about the whole contents of the question and possible answer options. Thus, generation of titles can be considered as a case of automatic summarization. However, the fact that summarization had to be performed over very short texts together with the aforementioned quality conditions imposed on new generated titles led the authors to follow knowledge-rich and domain-dependent strategies for summarization, disregarding the more frequent extractive techniques for summarization.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF) which offers the possibility of being able to view and exchange electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.). The fact that Acrobat's imageable objects are rendered with full use of Level 2 PostScript means that the most demanding requirements can be met in terms of high-quality typography and device-independent colour. These qualities will be very desirable components in future multimedia and hypermedia systems. The current capabilities of Acrobat and PDF are described; in particular the presence of hypertext links, bookmarks, and yellow sticker annotations (in release 1.0) together with article threads and multi-media plugins in version 2.0, This article also describes the CAJUN project (CD-ROM Acrobat Journals Using Networks) which has been investigating the automated placement of PDF hypertextual features from various front-end text processing systems. CAJUN has also been experimenting with the dissemination of PDF over e-mail, via World Wide Web and on CDROM.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Electronic Publishing -- Origination, Dissemination and Design (EP-odd) is an academic journal which publishes refereed papers in the subject area of electronic publishing. The authors of the present paper are, respectively, editor-in-chief, system software consultant and senior production manager for the journal. EP-odd's policy is that editors, authors, referees and production staff will work closely together using electronic mail. Authors are also encouraged to originate their papers using one of the approved text-processing packages together with the appropriate set of macros which enforce the layout style for the journal. This same software will then be used by the publisher in the production phase. Our experiences with these strategies are presented, and two recently developed suites of software are described: one of these makes the macro sets available over electronic mail and the other automates the flow of papers through the refereeing process. The decision to produce EP-odd in this way means that the publisher has to adopt production procedures which differ markedly from those employed for a conventional journal.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Starting in December 1982 the University of Nottingham decided to phototypeset almost all of its examination papers `in house' using the troff, tbl and eqn programs running under UNIX. This tutorial lecture highlights the features of the three programs with particular reference to their strengths and weaknesses in a production environment. The following issues are particularly addressed: Standards -- all three software packages require the embedding of commands and the invocation of pre-written macros, rather than `what you see is what you get'. This can help to enforce standards, in the absence of traditional compositor skills. Hardware and Software -- the requirements are analysed for an inexpensive preview facility and a low-level interface to the phototypesetter. Mathematical and Technical papers -- the fine-tuning of eqn to impose a standard house style. Staff skills and training -- systems of this kind do not require the operators to have had previous experience of phototypesetting. Of much greater importance is willingness and flexibility in learning how to use computer systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a study made in a field poorly explored in the Portuguese language – modality and its automatic tagging. Our main goal was to find a set of attributes for the creation of automatic tag- gers with improved performance over the bag-of-words (bow) approach. The performance was measured using precision, recall and F1. Because it is a relatively unexplored field, the study covers the creation of the corpus (composed by eleven verbs), the use of a parser to extract syntac- tic and semantic information from the sentences and a machine learning approach to identify modality values. Based on three different sets of attributes – from trigger itself and the trigger’s path (from the parse tree) and context – the system creates a tagger for each verb achiev- ing (in almost every verb) an improvement in F1 when compared to the traditional bow approach.

Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A rule-based approach for classifying previously identified medical concepts in the clinical free text into an assertion category is presented. There are six different categories of assertions for the task: Present, Absent, Possible, Conditional, Hypothetical and Not associated with the patient. The assertion classification algorithms were largely based on extending the popular NegEx and Context algorithms. In addition, a health based clinical terminology called SNOMED CT and other publicly available dictionaries were used to classify assertions, which did not fit the NegEx/Context model. The data for this task includes discharge summaries from Partners HealthCare and from Beth Israel Deaconess Medical Centre, as well as discharge summaries and progress notes from University of Pittsburgh Medical Centre. The set consists of 349 discharge reports, each with pairs of ground truth concept and assertion files for system development, and 477 reports for evaluation. The system’s performance on the evaluation data set was 0.83, 0.83 and 0.83 for recall, precision and F1-measure, respectively. Although the rule-based system shows promise, further improvements can be made by incorporating machine learning approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper develops and evaluates an enhanced corpus based approach for semantic processing. Corpus based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Our everyday environment is full of text but this rich source of information remains largely inaccessible to mobile robots. In this paper we describe an active text spotting system that uses a small number of wide angle views to locate putative text in the environment and then foveates and zooms onto that text in order to improve the reliability of text recognition. We present extensive experimental results obtained with a pan/tilt/zoom camera and a ROS-based mobile robot operating in an indoor environment.