908 results for Formatting text
Summary:
Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior over the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing the latent variables' priors for the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
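For readers unfamiliar with the two smoothing methods named above, the following is a minimal sketch of their textbook forms applied to plain word counts. It is not the paper's data-driven procedure inside LDA inference; the function names, the `alpha` and `lam` parameters, and the assumption that the background distribution covers every word in the counts are illustrative choices.

```python
from collections import Counter


def laplace_smoothing(counts, vocab, alpha=1.0):
    """Additive (Laplace) smoothing: add alpha pseudo-counts to every vocabulary word."""
    total = sum(counts.get(w, 0) for w in vocab)
    denom = total + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}


def jelinek_mercer_smoothing(counts, background, lam=0.5):
    """Jelinek-Mercer smoothing: linear interpolation between the maximum-likelihood
    estimate from `counts` and a background distribution.

    Assumes `background` assigns a probability to every word seen in `counts`.
    """
    total = sum(counts.values())
    smoothed = {}
    for w, p_bg in background.items():
        p_ml = counts.get(w, 0) / total if total else 0.0
        smoothed[w] = (1.0 - lam) * p_ml + lam * p_bg
    return smoothed


# Usage: a three-word vocabulary where "c" never occurs in the document.
doc_counts = Counter("a a b".split())
vocab = ["a", "b", "c"]
uniform_background = {w: 1.0 / len(vocab) for w in vocab}

p_laplace = laplace_smoothing(doc_counts, vocab)
p_jm = jelinek_mercer_smoothing(doc_counts, uniform_background, lam=0.5)
```

In both cases the unseen word "c" receives non-zero probability, which is the purpose of smoothing.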
Summary:
TEMPEST is a full-screen text editor that incorporates a structural paradigm in addition to the more traditional textual paradigm provided by most editors. While the textual paradigm treats the text as a sequence of characters, the structural paradigm treats it as a collection of named blocks which the user can define, group, and manipulate. Blocks can be defined to correspond to the structural features of the text, thereby providing more meaningful objects to operate on than characters or lines. The structural representation of the text is kept in the background, giving TEMPEST the appearance of a typical text editor. The structural and textual interfaces coexist equally, however, so one can always operate on the text from either point of view. TEMPEST's representation scheme provides no semantic understanding of structure. This approach sacrifices depth, but affords a broad range of applicability and requires very little computational overhead. A prototype has been implemented to illustrate the feasibility and potential areas of application of the central ideas. It was developed on, and runs on, an IBM Personal Computer.
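The idea of named blocks layered over plain text can be sketched as follows. This is a hypothetical illustration, not TEMPEST's actual representation: the class names, the use of character offsets, and the rule that blocks may not overlap are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass
class StructuredText:
    text: str
    blocks: dict = field(default_factory=dict)   # block name -> Block
    groups: dict = field(default_factory=dict)   # group name -> list of block names

    def define(self, name, start, end):
        """Name a span of the text. Overlapping blocks are rejected (an assumption)."""
        if not (0 <= start <= end <= len(self.text)):
            raise ValueError("block out of range")
        for other in self.blocks.values():
            if start < other.end and other.start < end:
                raise ValueError("blocks may not overlap in this sketch")
        self.blocks[name] = Block(name, start, end)

    def group(self, group_name, names):
        """Collect existing blocks under a group name."""
        missing = [n for n in names if n not in self.blocks]
        if missing:
            raise KeyError(missing)
        self.groups[group_name] = list(names)

    def get(self, name):
        """Structural view: the text of one named block."""
        b = self.blocks[name]
        return self.text[b.start:b.end]

    def delete(self, name):
        """Remove a block's text and shift the offsets of later blocks."""
        b = self.blocks.pop(name)
        cut = b.end - b.start
        self.text = self.text[:b.start] + self.text[b.end:]
        for other in self.blocks.values():
            if other.start >= b.end:
                other.start -= cut
                other.end -= cut
        for members in self.groups.values():
            if name in members:
                members.remove(name)


# Usage: the same text is reachable as characters (doc.text) or as blocks.
doc = StructuredText("Title\nBody text\nFooter")
doc.define("title", 0, 5)
doc.define("body", 6, 15)
doc.define("footer", 16, 22)
doc.group("frame", ["title", "footer"])
doc.delete("body")
```

The blocks carry no meaning beyond a name and a span, which mirrors the abstract's point that the scheme has no semantic understanding of structure.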
Summary:
Schierz, A. (2007). Monitoring knowledge: a text-based approach. Terminology, 13 (2), 125-154. Sponsorship: EPSRC DTG Project IQ, EU IST-FET FP6-516169
Summary:
On January 11, 2008, the National Institutes of Health ('NIH') adopted a revised Public Access Policy for peer-reviewed journal articles reporting research supported in whole or in part by NIH funds. Under the revised policy, the grantee shall ensure that a copy of the author's final manuscript, including any revisions made during the peer review process, be electronically submitted to the National Library of Medicine's PubMed Central ('PMC') archive and that the person submitting the manuscript will designate a time not later than 12 months after publication at which NIH may make the full text of the manuscript publicly accessible in PMC. NIH adopted this policy to implement a new statutory requirement under which: The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine's PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law. This White Paper is written primarily for policymaking staff in universities and other institutional recipients of NIH support responsible for ensuring compliance with the Public Access Policy. The January 11, 2008, Public Access Policy imposes two new compliance mandates. First, the grantee must ensure proper manuscript submission. The version of the article to be submitted is the final version over which the author has control, which must include all revisions made after peer review. The statutory command directs that the manuscript be submitted to PMC 'upon acceptance for publication.' That is, the author's final manuscript should be submitted to PMC at the same time that it is sent to the publisher for final formatting and copy editing. 
Proper submission is a two-stage process. The electronic manuscript must first be submitted through a process that requires input of additional information concerning the article, the author(s), and the nature of NIH support for the research reported. NIH then formats the manuscript into a uniform, XML-based format used for PMC versions of articles. In the second stage of the submission process, NIH sends a notice to the Principal Investigator requesting that the PMC-formatted version be reviewed and approved. Only after such approval has the grantee's manuscript submission obligation been satisfied. Second, the grantee also has a distinct obligation to grant NIH copyright permission to make the manuscript publicly accessible through PMC not later than 12 months after the date of publication. This obligation is connected to manuscript submission because the author, or the person submitting the manuscript on the author's behalf, must have the necessary rights under copyright at the time of submission to give NIH the copyright permission it requires. This White Paper explains and analyzes only the scope of the grantee's copyright-related obligations under the revised Public Access Policy and suggests six options for compliance with that aspect of the grantee's obligation. Time is of the essence for NIH grantees. As a practical matter, the grantee should have a compliance process in place no later than April 7, 2008. More specifically, the new Public Access Policy applies to any article accepted for publication on or after April 7, 2008 if the article arose under (1) an NIH Grant or Cooperative Agreement active in Fiscal Year 2008, (2) direct funding from an NIH Contract signed after April 7, 2008, (3) direct funding from the NIH Intramural Program, or (4) from an NIH employee.
In addition, effective May 25, 2008, anyone submitting an application, proposal or progress report to the NIH must include the PMC reference number when citing articles arising from their NIH funded research. (This includes applications submitted to the NIH for the May 25, 2008 and subsequent due dates.) Conceptually, the compliance challenge that the Public Access Policy poses for grantees is easily described. The grantee must depend to some extent upon the author(s) to take the necessary actions to ensure that the grantee is in compliance with the Public Access Policy because the electronic manuscripts and the copyrights in those manuscripts are initially under the control of the author(s). As a result, any compliance option will require an explicit understanding between the author(s) and the grantee about how the manuscript and the copyright in the manuscript are managed. It is useful to conceptually keep separate the grantee's manuscript submission obligation from its copyright permission obligation because the compliance personnel concerned with manuscript management may differ from those responsible for overseeing the author's copyright management. 
With respect to copyright management, the grantee has the following six options: (1) rely on authors to manage copyright but also to request or to require that these authors take responsibility for amending publication agreements that call for transfer of too many rights to enable the author to grant NIH permission to make the manuscript publicly accessible ('the Public Access License'); (2) take a more active role in assisting authors in negotiating the scope of any copyright transfer to a publisher by (a) providing advice to authors concerning their negotiations or (b) by acting as the author's agent in such negotiations; (3) enter into a side agreement with NIH-funded authors that grants a non-exclusive copyright license to the grantee sufficient to grant NIH the Public Access License; (4) enter into a side agreement with NIH-funded authors that grants a non-exclusive copyright license to the grantee sufficient to grant NIH the Public Access License and also grants a license to the grantee to make certain uses of the article, including posting a copy in the grantee's publicly accessible digital archive or repository and authorizing the article to be used in connection with teaching by university faculty; (5) negotiate a more systematic and comprehensive agreement with the biomedical publishers to ensure either that the publisher has a binding obligation to submit the manuscript and to grant NIH permission to make the manuscript publicly accessible or that the author retains sufficient rights to do so; or (6) instruct NIH-funded authors to submit manuscripts only to journals with binding deposit agreements with NIH or to journals whose copyright agreements permit authors to retain sufficient rights to authorize NIH to make manuscripts publicly accessible.
Summary:
Version 1.1 of the Hypertext Transfer Protocol (HTTP) was principally developed as a means of reducing both document transfer latency and network traffic. The rationale for the performance enhancements in HTTP/1.1 is based on the assumption that the network is the bottleneck in Web transactions. In practice, however, the Web server can be the primary source of document transfer latency. In this paper, we characterize and compare the performance of HTTP/1.0 and HTTP/1.1 in terms of throughput at the server and transfer latency at the client. Our approach is based on considering a broader set of bottlenecks in an HTTP transfer; we examine how bottlenecks in the network, the CPU, and the disk system affect the relative performance of HTTP/1.0 versus HTTP/1.1. We show that the network demands under HTTP/1.1 are somewhat lower than under HTTP/1.0, and we quantify those differences in terms of packets transferred, server congestion window size, and data bytes per packet. We show that when the CPU is the bottleneck, there is relatively little difference in performance between HTTP/1.0 and HTTP/1.1. Surprisingly, we show that when the disk system is the bottleneck, performance using HTTP/1.1 can be much worse than with HTTP/1.0. Based on these observations, we suggest a connection management policy for HTTP/1.1 that can improve throughput, decrease latency, and keep network traffic low when the disk system is the bottleneck.
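The network-side argument for HTTP/1.1 can be illustrated with a toy count of TCP connections and round trips. This is a simplified sketch, not the paper's measurement method or its proposed connection management policy: it assumes one handshake round trip per new connection, one round trip per request/response pair, no pipelining, and it ignores slow start, CPU, and disk effects. The function names and the `requests_per_connection` limit are hypothetical.

```python
import math


def connections_needed(num_requests, persistent, requests_per_connection=None):
    """Number of TCP connections opened to serve `num_requests` requests.

    Non-persistent (HTTP/1.0 without keep-alive): one connection per request.
    Persistent (HTTP/1.1 default): connections are reused, optionally up to
    `requests_per_connection` requests each.
    """
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    if not persistent:
        return num_requests
    if num_requests == 0:
        return 0
    if requests_per_connection is None:
        return 1
    return math.ceil(num_requests / requests_per_connection)


def round_trips(num_requests, persistent, requests_per_connection=None):
    """Handshake round trips plus one round trip per request (no pipelining)."""
    opened = connections_needed(num_requests, persistent, requests_per_connection)
    return opened + num_requests


# Usage: a page with ten embedded objects fetched sequentially.
rtts_http10 = round_trips(10, persistent=False)
rtts_http11 = round_trips(10, persistent=True)
```

Under these assumptions the persistent case saves one round trip for every connection it avoids opening, which is the network saving the abstract refers to; the model says nothing about the disk-bound case the paper highlights.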
Summary:
BACKGROUND: The ability to write clearly and effectively is of central importance to the scientific enterprise. Encouraged by the success of simulation environments in other biomedical sciences, we developed WriteSim TCExam, an open-source, Web-based, textual simulation environment for teaching effective writing techniques to novice researchers. We shortlisted an existing open-source application, TCExam, and modified it to serve as a textual simulation environment. After testing usability internally within our team, we conducted formal field usability studies with novice researchers. These were followed by formal surveys with researchers fitting the roles of administrators and users (novice researchers). RESULTS: The development process was guided by feedback from usability tests within our research team. Online surveys and formal studies, involving members of the Research on Research group and selected novice researchers, show that the application is user-friendly. Additionally, it has been used to train 25 novice researchers in scientific writing to date and has generated encouraging results. CONCLUSION: WriteSim TCExam is the first Web-based, open-source textual simulation environment designed to complement traditional scientific writing instruction. While initial reviews by students and educators have been positive, a formal study is needed to measure its benefits in comparison to standard instructional methods.
Summary:
A tree-based dictionary learning model is developed for joint analysis of imagery and associated text. The dictionary learning may be applied directly to the imagery from patches, or to general feature vectors extracted from patches or superpixels (using any existing method for image feature extraction). Each image is associated with a path through the tree (from root to a leaf), and each of the multiple patches in a given image is associated with one node in that path. Nodes near the tree root are shared between multiple paths, representing image characteristics that are common among different types of images. Moving toward the leaves, nodes become specialized, representing details in image classes. If available, words (text) are also jointly modeled, with a path-dependent probability over words. The tree structure is inferred via a nested Dirichlet process, and a retrospective stick-breaking sampler is used to infer the tree depth and width.
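As background for the stick-breaking step mentioned above, here is a minimal sketch of the standard truncated stick-breaking construction of Dirichlet process weights. It is not the retrospective sampler or the nested Dirichlet process used in the paper; the function name, the fixed `truncation` level, and the `seed` argument are assumptions for illustration.

```python
import random


def stick_breaking(alpha, truncation, seed=None):
    """Draw `truncation` weights by repeatedly breaking off a Beta(1, alpha)
    fraction of the stick that remains.

    Returns (weights, remaining), where `remaining` is the unallocated mass
    left after truncation.
    """
    if alpha <= 0 or truncation < 1:
        raise ValueError("alpha must be positive and truncation at least 1")
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)   # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


# Usage: smaller alpha concentrates mass on the first few weights.
weights, leftover = stick_breaking(alpha=2.0, truncation=10, seed=0)
```

A retrospective sampler avoids fixing the truncation level in advance by generating further breaks only when they are needed, which is how tree depth and width can be inferred rather than preset.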
Summary:
This is the second installment of a three-part project to publish a group of ten Ptolemaic papyri purchased by Yale’s Beinecke Library in 1998 (acquisition “1998b”), which came to the Beinecke as three hard wads that were apparently the stuffing from the stomach cavity of a mummified animal. This article publishes: (1) P.CtYBR inv. 5019, a fragment of line ends in iambic tetrameter catalectic meter from an unknown comedy; the format suggests that this is a further example of a certain type of Ptolemaic writing exercise. (2) P.CtYBR inv. 5043, a fragmentary grammatical text of uncertain import.
Summary:
This, the second edition, adopts a critical and theoretical perspective on remuneration policy and practices in the UK, from the decline of collective bargaining to the rise of more individualistic systems based on employee performance. It tackles the conceptual issues missing from existing texts in the field of HRM by critically examining the latest academic literature on the topic. [Taken from publisher's product description].
Summary:
David Norbrook, Review of English Studies 56 (Sept. 2005), 675-6.
‘We have waited a long time for a study of Marvell’s Latin poetry; fortunately, Estelle Haan’s monograph generously makes good the loss ... One of her most intriguing suggestions … is that Marvell may have presented paired poems like ‘Ros’ and ‘On a Drop of Dew’, and the poems to the obligingly named Dr Witty, to his student Maria Fairfax as his own patterns for the pedagogical practice of double translation. Perhaps the most original parts of the book, however, move beyond the familiar canon to cover the generic range of the Latin verse. Haan offers a very full contextualization of the early Horatian Ode to Charles I in seventeenth-century exercises in parodia. In a rewarding reading of the poem to Dr Ingelo she shows how Marvell deploys the language of Ovid’s Tristia to present Sweden as a place of shivering exile, only to subvert this model with a neo-Virgilian celebration of Christina as a virtuous, city-building Dido. She draws extensively on historical as well as literary sources to offer very detailed contextualizations of the poem to Maniban and ‘Scaevola Scotto-Britannus’... This monograph opens up many new ways into the Latin verse, not least because it is rounded off with new texts and prose translations of the Latin poems. These make a substantial contribution in their own right. They are the best and most accurate translations to date (those in Smith’s edition having some lapses); they avoid poeticisms but bring out the structure of the poems’ wordplay very clearly. This book brings us a lot closer to seeing Marvell whole.’