4 resultados para scholarly text editing
em Massachusetts Institute of Technology
Resumo:
A prototype presentation system base is described. It offers mechanisms, tools, and ready-made parts for building user interfaces. A general user interface model underlies the base, organized around the concept of a presentation: a visible text or graphic for conveying information. Te base and model emphasize domain independence and style independence, to apply to the widest possible range of interfaces. The primitive presentation system model treats the interface as a system of processes maintaining a semantic relation between an application data base and a presentation data base, the symbolic screen description containing presentations. A presenter continually updates the presentation data base from the application data base. The user manipulates presentations with a presentation editor. A recognizer translates the user's presentation manipulation into application data base commands. The primitive presentation system can be extended to model more complex systems by attaching additional presentation systems. In order to illustrate the model's generality and descriptive capabilities, extended model structures for several existing user interfaces are discussed. The base provides support for building the application and presentation data bases, linked together into a single, uniform network, including descriptions of classes of objects as we as the objects themselves. The base provides an initial presentation data base network graphics to continually display it, and editing functions. A variety of tools and mechanisms help create and control presenters and recognizers. To demonstrate the base's utility, three interfaces to an operating system were constructed, embodying different styles: icons, menu, and graphical annotation.
Resumo:
TEMPEST is a full-screen text editor that incorporates a structural paradigm in addition to the more traditional textual paradigm provided by most editors. While the textual paradigm treats the text as a sequence of characters, the structural paradigm treats it as a collection of named blocks which the user can define, group, and manipulate. Blocks can be defined to correspond to the structural features of he text, thereby providing more meaningful objects to operate on than characters of lines. The structural representation of the text is kept in the background, giving TEMPEST the appearance of a typical text editor. The structural and textual interfaces coexist equally, however, so one can always operate on the text from wither point of view. TEMPEST's representation scheme provides no semantic understanding of structure. This approach sacrifices depth, but affords a broad range of applicability and requires very little computational overhead. A prototype has been implemented to illustrate the feasibility and potential areas of application of the central ideas. It was developed and runs on an IBM Personal Computer.
Resumo:
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influences their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
Resumo:
We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.