994 results for Text processing
Abstract:
This paper describes the methodology followed to automatically generate titles for a corpus of questions belonging to sociological opinion polls. Question titles have a twofold function: (1) they are the input of user searches and (2) they convey the whole content of the question and its possible answer options. Title generation can therefore be considered a case of automatic summarization. However, the fact that summarization had to be performed over very short texts, together with the quality conditions imposed on the newly generated titles, led the authors to follow knowledge-rich, domain-dependent summarization strategies and to set aside the more common extractive techniques.
Abstract:
Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF) which makes it possible to view and exchange electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX, etc.). The fact that Acrobat's imageable objects are rendered with full use of Level 2 PostScript means that the most demanding requirements can be met in terms of high-quality typography and device-independent colour. These qualities will be very desirable components in future multimedia and hypermedia systems. The current capabilities of Acrobat and PDF are described, in particular the hypertext links, bookmarks and yellow-sticker annotations of release 1.0, together with the article threads and multimedia plug-ins of version 2.0. This article also describes the CAJUN project (CD-ROM Acrobat Journals Using Networks), which has been investigating the automated placement of PDF hypertextual features from various front-end text processing systems. CAJUN has also been experimenting with the dissemination of PDF over e-mail, via the World Wide Web and on CD-ROM.
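As an illustration of the kind of automated bookmark placement investigated in CAJUN, the following is a minimal sketch using the modern pypdf library; pypdf is a stand-in assumption (it postdates CAJUN and is not the project's tooling), and the file names and section-to-page mapping are invented for the example.

```python
# Minimal sketch: programmatically adding bookmarks (outline items) to a PDF.
# Assumes the pypdf library; file names and the section/page mapping are
# invented for illustration and are not part of the CAJUN tooling.
from pypdf import PdfReader, PdfWriter

reader = PdfReader("article.pdf")          # hypothetical input document
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)

# Section titles mapped to 0-based page numbers (assumed layout).
sections = {"Abstract": 0, "The CAJUN project": 3, "References": 9}
for title, page_number in sections.items():
    writer.add_outline_item(title, page_number)

with open("article-bookmarked.pdf", "wb") as fh:
    writer.write(fh)
```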
Abstract:
Electronic Publishing -- Origination, Dissemination and Design (EP-odd) is an academic journal which publishes refereed papers in the subject area of electronic publishing. The authors of the present paper are, respectively, editor-in-chief, system software consultant and senior production manager for the journal. EP-odd's policy is that editors, authors, referees and production staff will work closely together using electronic mail. Authors are also encouraged to originate their papers using one of the approved text-processing packages together with the appropriate set of macros which enforce the layout style for the journal. This same software will then be used by the publisher in the production phase. Our experiences with these strategies are presented, and two recently developed suites of software are described: one of these makes the macro sets available over electronic mail and the other automates the flow of papers through the refereeing process. The decision to produce EP-odd in this way means that the publisher has to adopt production procedures which differ markedly from those employed for a conventional journal.
Abstract:
Starting in December 1982 the University of Nottingham decided to phototypeset almost all of its examination papers 'in house' using the troff, tbl and eqn programs running under UNIX. This tutorial lecture highlights the features of the three programs with particular reference to their strengths and weaknesses in a production environment. The following issues are particularly addressed: Standards -- all three software packages require the embedding of commands and the invocation of pre-written macros, rather than 'what you see is what you get'; this can help to enforce standards in the absence of traditional compositor skills. Hardware and software -- the requirements are analysed for an inexpensive preview facility and a low-level interface to the phototypesetter. Mathematical and technical papers -- the fine-tuning of eqn to impose a standard house style. Staff skills and training -- systems of this kind do not require the operators to have had previous experience of phototypesetting; of much greater importance is willingness and flexibility in learning how to use computer systems.
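To make the production pipeline concrete, here is a minimal sketch of how the three programs are conventionally chained (tbl and eqn as pre-processors, troff as the typesetter), driven from Python; the file names and the -ms macro package are assumptions for illustration, since the paper relies on locally written macros.

```python
# Minimal sketch of the classic troff production pipeline, driven from Python.
# File names and the -ms macro package are assumptions; the examination-paper
# macros described in the paper were written locally.
import subprocess

source = "exam_paper.tr"            # hypothetical troff source with tbl/eqn regions

with open(source, "rb") as fh:
    tbl_out = subprocess.run(["tbl"], stdin=fh,
                             capture_output=True, check=True).stdout
eqn_out = subprocess.run(["eqn"], input=tbl_out,
                         capture_output=True, check=True).stdout
troff_out = subprocess.run(["troff", "-ms"], input=eqn_out,
                           capture_output=True, check=True).stdout

with open("exam_paper.out", "wb") as fh:
    fh.write(troff_out)             # typesetter-ready troff output
```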
Abstract:
This paper presents a study in a field poorly explored for the Portuguese language: modality and its automatic tagging. Our main goal was to find a set of attributes for the creation of automatic taggers with improved performance over the bag-of-words (bow) approach. Performance was measured using precision, recall and F1. Because it is a relatively unexplored field, the study covers the creation of the corpus (composed of eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences, and a machine learning approach to identify modality values. Based on three different sets of attributes, drawn from the trigger itself, the trigger's path in the parse tree, and the context, the system creates a tagger for each verb, achieving (for almost every verb) an improvement in F1 when compared to the traditional bow approach.
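A minimal sketch of the bag-of-words baseline against which such attribute sets are compared, using scikit-learn; the sentences, modality labels and trigger verb are placeholders, and the paper's actual attributes (trigger, parse-tree path and context) would replace or augment the bow features.

```python
# Minimal sketch of a per-verb bag-of-words (bow) modality tagger baseline.
# Sentences, modality labels and the trigger verb are placeholders; the
# paper's attribute sets (trigger, parse-tree path, context) would replace
# or augment these features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

sentences = [
    "ele pode sair amanhã",          # hypothetical examples for the verb "poder"
    "pode ser que chova",
    "os alunos podem usar o manual",
    "isso pode não ser verdade",
]
labels = ["permission", "epistemic", "permission", "epistemic"]

bow = CountVectorizer()
X = bow.fit_transform(sentences)

tagger = LogisticRegression(max_iter=1000).fit(X, labels)
predictions = tagger.predict(X)

# F1 on the training examples only, to show the pipeline shape; the paper
# reports precision, recall and F1 per verb on its own corpus.
print(f1_score(labels, predictions, average="macro"))
```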
Abstract:
An implementation of a computational tool that generates summaries from new source texts by means of the connectionist approach (artificial neural networks) is presented. Among the contributions that this work intends to bring to natural language processing research, the use of a more biologically plausible connectionist architecture and training procedure for automatic summarization is emphasized. The choice relies on the expectation that it may bring an increase in computational efficiency when compared to the so-called biologically implausible algorithms.
Abstract:
An electrocardiogram (ECG) monitoring system has to deal with several challenges related to noise sources. The main goal of this text was the study of adaptive signal processing algorithms for ECG noise reduction when applied to real signals. This document presents an adaptive filtering technique based on the Least Mean Square (LMS) algorithm to remove the artefacts introduced into the ECG signal by electromyographic (EMG) activity and power-line noise. Real noise signals were used in these experiments, mainly to observe the difference between real and simulated noise sources. Very good results were obtained, owing to the noise-removal capability of this technique.
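A minimal sketch of an LMS adaptive noise canceller of the kind described, written with NumPy; the step size, filter order and the synthetic signals in the usage example are assumptions for illustration, whereas the paper itself works with real recorded noise.

```python
# Minimal sketch of an LMS adaptive noise canceller for ECG signals.
# Step size and filter order are illustrative assumptions; the paper used
# real recorded EMG and power-line noise rather than the synthetic signals below.
import numpy as np

def lms_cancel(primary, reference, mu=0.01, order=16):
    """primary: noisy ECG; reference: noise-only channel correlated with the interference."""
    w = np.zeros(order)                       # adaptive filter weights
    cleaned = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]      # most recent reference samples
        noise_estimate = w @ x                # filter output = estimated noise
        e = primary[n] - noise_estimate       # error signal = cleaned ECG sample
        w = w + 2 * mu * e * x                # LMS weight update
        cleaned[n] = e
    return cleaned

# Synthetic usage example: a stand-in "ECG" corrupted by 50 Hz interference.
fs = 500
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t)             # placeholder for the ECG waveform
mains = 0.5 * np.sin(2 * np.pi * 50 * t)      # power-line noise reference
recovered = lms_cancel(ecg + mains, mains)
```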
Abstract:
Arguably, the most difficult task in text classification is choosing an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approximate, and sometimes even outperform, previous state-of-the-art techniques, despite being much simpler in the sense that they require no text pre-processing or feature engineering.
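As a concrete illustration of a dissimilarity-based representation, here is a minimal sketch using the normalized compression distance computed with gzip, one well-known information-theoretic dissimilarity; the measure, texts, labels and prototype choice are assumptions for the example and not necessarily those of the paper.

```python
# Minimal sketch of a dissimilarity-based text representation followed by a
# classical classifier. The gzip-based normalized compression distance (NCD)
# is one well-known information-theoretic dissimilarity; it is an assumption
# here, not necessarily the paper's measure. Texts and labels are placeholders.
import gzip
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def clen(s: str) -> int:
    return len(gzip.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def to_dissimilarity_space(texts, prototypes):
    # Each text becomes a vector of dissimilarities to a fixed set of prototypes;
    # no feature engineering or text pre-processing is involved.
    return np.array([[ncd(t, p) for p in prototypes] for t in texts])

train_texts = ["great film, loved it", "terrible plot, awful acting",
               "wonderful and moving", "boring and bad"]
train_labels = ["pos", "neg", "pos", "neg"]
prototypes = train_texts                      # simplest choice: the training texts themselves

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(to_dissimilarity_space(train_texts, prototypes), train_labels)
print(clf.predict(to_dissimilarity_space(["awful, a bad film"], prototypes)))
```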
Abstract:
Development and standardization of reliable methods for detection of Mycobacterium tuberculosis in clinical samples is an important goal in laboratories throughout the world. In this work, lung and spleen fragments from a patient who died with a diagnosis of miliary tuberculosis were used to evaluate the influence of the type of fixative, as well as of the fixation and paraffin-inclusion protocols, on PCR performance in paraffin-embedded specimens. Tissue fragments were fixed for 4 h to 48 h, using either 10% non-buffered or 10% buffered formalin, and embedded in pure paraffin or paraffin mixed with bee wax. Specimens were submitted to PCR for amplification of the human beta-actin gene and, separately, for amplification of the insertion sequence IS6110, specific to the M. tuberculosis complex. Amplification of the beta-actin gene was positive in all samples. No amplicons were generated by PCR-IS6110 when lung tissue fragments were fixed in 10% non-buffered formalin and embedded in paraffin containing bee wax. In conclusion, combined inhibitory factors interfere with the detection of M. tuberculosis in stored material, and it is important to control these factors in order to implement molecular diagnosis in pathology laboratories.
Abstract:
Vibrio cholerae represents a significant threat to human health in developing countries. This pathogen forms biofilms, which favor its attachment to surfaces and its survival and transmission via water or food. This work evaluated the in vitro biofilm formation of V. cholerae isolated from clinical and environmental sources on stainless steel of the type used in food processing, using environmental scanning electron microscopy (ESEM). Results showed no cell adhesion at 4 h and scarce surface colonization at 24 h. Biofilms from the environmental strain were observed at 48 h, with large cellular aggregations embedded in Vibrio exopolysaccharide (VPS), whereas less confluence and VPS production, with microcolonies of elongated cells, were observed in biofilms produced by the clinical strain. At 96 h the biofilms of the environmental strain were released from the surface, leaving coccoid cells and residual structures, whereas biofilms of the clinical strain formed highly organized structures such as channels, mushroom-like formations and pillars. This is the first study to show the in vitro ability of V. cholerae to colonize and form biofilms on stainless steel used in food processing.