66 results for Text categorization
Abstract:
This article looks at the difference between scientists’ written reports and their oral accounts, explanations and stories. The subject of these discourses is the eruption of Mount Chance on Montserrat, a British Overseas Territory in the Eastern Caribbean, and its continued monitoring and reporting. The scientific notions of risk and uncertainty that feature in these texts and tales are then examined and critiqued. The article ends by pointing out that, ironically, the latter (the tale) can in some cases be a more effective and approximate mode of communication with the public than the former (the text).
Abstract:
The importance and use of text extraction from camera-based coloured scene images is rapidly increasing. Text within a camera-grabbed image can carry a large amount of metadata about the scene, and such metadata can be useful for identification, indexing and retrieval. While the segmentation and recognition of text from document images is quite successful, detection of coloured scene text remains a new challenge for camera-based images. Common problems for text extraction from camera-based images are the lack of prior knowledge of text features such as colour, font, size and orientation, as well as of the location of the probable text regions. In this paper, we document the development of a fully automatic and extremely robust text segmentation technique that can be used for any type of camera-grabbed frame, be it a single image or video. A new algorithm is proposed which can overcome the current problems of text segmentation. The algorithm exploits text appearance in terms of colour and spatial distribution. When the new text extraction technique was tested on a variety of camera-based images, it was found to outperform existing techniques. The proposed technique also overcomes problems that can arise from an unconstrained complex background. The novelty of the work lies in the fact that this is the first time that colour and spatial information have been used simultaneously for the purpose of text extraction.
Abstract:
This article discusses tense and aspect in the context of attested forms of discourse and text. The emphasis is on the semantic, pragmatic, textual, and stylistic functions of tense in context, taking into account linguistic features in the surrounding discourse, as well as the importance of factors such as medium (spoken or written), register (degree of formality), text type (literary vs. journalistic vs. conversational etc.), and discourse mode (narrative vs. report vs. description, etc.). Thus, tense and aspect are analyzed not purely as part of a linguistic “system” as such, but in the context of particular texts or forms of discourse. The article also explores the concept of “markedness” through two case studies: the narrative present and the narrative imperfect. Finally, it assesses the roles played by tenses in conveying particular points of view in texts, including shifts and/or ambiguities in point of view; Segmented Discourse Representation Theory; internal focalization and the French imperfective past tense; and textual polyphony.
Abstract:
The objective of this paper is to describe and evaluate the application of the Text Encoding Initiative (TEI) Guidelines to a corpus of oral French, the first corpus of oral French in which the TEI has been used. The paper explains the purpose of the corpus, both in creating a specialist corpus of néo-contage that will broaden the range of oral corpora available, and, more importantly, in creating a dataset to explore a variety of oral French that has a particularly interesting status in terms of factors such as conception orale/écrite, réalisation médiale and comportement communicatif (Koch and Oesterreicher 2001). The linguistic phenomena to be encoded are both stylistic (speech and thought presentation) and syntactic (negation, detachment, inversion), and all represent areas where previous research has highlighted the significance of factors such as medium, register and discourse type, as well as a host of linguistic factors (syntactic, phonetic, lexical). After a discussion of how a tagset can be designed and applied within the TEI to encode speech and thought presentation, negation, detachment and inversion, the final section of the paper evaluates the benefits and possible drawbacks of the methodology offered by the TEI when applied to the syntactic and stylistic markup of an oral corpus.