2 results for Textual linguistics

in CORA - Cork Open Research Archive - University College Cork - Ireland


Relevance:

20.00%

Publisher:

Abstract:

Many textual scholars will be aware that the title of the present thesis has been composed in a conscious revisionary relation to Tim William Machan’s influential Textual Criticism and Middle English Texts (Charlottesville, 1994). The primary subjects of Machan’s study are works written in English between the fourteenth and sixteenth centuries, the latter part of the period conventionally labelled Middle English. In contrast, the works with which I am primarily concerned are those written by scholars of Old and Middle Irish in the nineteenth, twentieth and twenty-first centuries. Where Machan aims to articulate the textual and cultural factors that characterise Middle English works as Middle English, the purposes of this thesis are (a) to identify the underlying ideological and epistemological perspectives which have informed much of the way in which medieval Irish documents and texts are rendered into modern editions, and (b) to begin to place the editorial theory and methodology of medieval Irish studies within the broader context of Biblical, medieval and modern textual criticism. Hence, the title is Textual Criticism and Medieval Irish Studies, rather than Textual Criticism and Medieval Irish Texts.

Relevance:

20.00%

Publisher:

Abstract:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limits. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes between informational and functional content and compresses only the informational content. Thus, the compressed data is made transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
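The idea of compressing only informational content while leaving functional content untouched can be illustrated with a small sketch. This is not the CaPC scheme from the thesis, whose details the abstract does not give. It is a toy dictionary coder written under stated assumptions: functional content is taken to be commas and whitespace, informational content is everything between them, and the byte `\x01` is assumed never to occur in the input. All function names (`build_codebook`, `compress`, `decompress`) are hypothetical.

```python
import re
from collections import Counter

# Assumption: delimiters (commas, whitespace, newlines) are "functional";
# the runs of characters between them are "informational" tokens.
TOKEN = re.compile(r"[^,\s]+")
MARK = "\x01"  # assumed absent from the input; flags a coded token


def build_codebook(text, min_len=4):
    """Map each repeated token of at least min_len characters to a short code."""
    counts = Counter(t for t in TOKEN.findall(text) if len(t) >= min_len)
    codebook = {}
    for i, (tok, n) in enumerate(counts.most_common()):
        if n < 2:  # tokens seen once are not worth a code
            break
        codebook[tok] = MARK + format(i, "x")
    return codebook


def compress(text, codebook):
    """Replace coded tokens; every delimiter stays exactly where it was."""
    return TOKEN.sub(lambda m: codebook.get(m.group(0), m.group(0)), text)


def decompress(packed, codebook):
    """Invert compress() using the same codebook."""
    inverse = {code: tok for tok, code in codebook.items()}
    return TOKEN.sub(lambda m: inverse.get(m.group(0), m.group(0)), packed)
```

Because the delimiters survive unchanged, code that splits records on newlines or fields on commas can run on the packed text directly, which is the kind of transparency to existing libraries the abstract describes. A real scheme would also need to handle escaping and codebook distribution across nodes, which this sketch omits.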