2 results for Bag-of-words

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

90.00%

Publisher:

Abstract:

This dissertation has two almost unrelated themes: privileged words and Sturmian words. Privileged words are a new class of words introduced recently. A word is privileged if it is a complete first return to a shorter privileged word, the shortest privileged words being letters and the empty word. Here we give and prove almost all results on privileged words known to date. On the other hand, the study of Sturmian words is a well-established topic in combinatorics on words. In this dissertation, we focus on questions concerning repetitions in Sturmian words, reproving old results and giving new ones, and on establishing completely new research directions. The study of privileged words presented in this dissertation aims to derive their basic properties and to answer basic questions regarding them. We explore a connection between privileged words and palindromes and seek out answers to questions on context-freeness, computability, and enumeration. It turns out that the language of privileged words is not context-free, but privileged words are recognizable by a linear-time algorithm. A lower bound on the number of binary privileged words of given length is proven. The main interest, however, lies in the privileged complexity functions of the Thue-Morse word and Sturmian words. We derive recurrences for computing the privileged complexity function of the Thue-Morse word, and we prove that Sturmian words are characterized by their privileged complexity function. As a slightly separate topic, we give an overview of a certain method of automated theorem-proving and show how it can be applied to study privileged factors of automatic words. The second part of this dissertation is devoted to Sturmian words. We extensively exploit the interpretation of Sturmian words as irrational rotation words. The essential tools are continued fractions and elementary, but powerful, results of Diophantine approximation theory. 
With these tools at our disposal, we reprove old results on powers occurring in Sturmian words with emphasis on the fractional index of a Sturmian word. Further, we consider abelian powers and abelian repetitions and characterize the maximum exponents of abelian powers with given period occurring in a Sturmian word in terms of the continued fraction expansion of its slope. We define the notion of abelian critical exponent for Sturmian words and explore its connection to the Lagrange spectrum of irrational numbers. The results obtained are often specialized for the Fibonacci word; for instance, we show that the minimum abelian period of a factor of the Fibonacci word is a Fibonacci number. In addition, we propose a completely new research topic: the square root map. We prove that the square root map preserves the language of any Sturmian word. Moreover, we construct a family of non-Sturmian optimal squareful words whose language the square root map also preserves. This construction yields examples of aperiodic infinite words whose square roots are periodic.
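
The recursive definition of privileged words given in the abstract can be made concrete with a small sketch. The code below is a naive, direct transcription of that definition and is not the linear-time recognition algorithm mentioned in the dissertation; the function names `occurrences`, `is_privileged`, and `count_privileged` are hypothetical and chosen only for illustration. It reads "complete first return to u" as: u is both a prefix and a suffix of the word and occurs nowhere else in it.

```python
from itertools import product


def occurrences(u, w):
    """Start indices of all (possibly overlapping) occurrences of u in w."""
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def is_privileged(w):
    """Naive check of the recursive definition (not the linear-time algorithm).

    The empty word and single letters are privileged. A longer word is
    privileged if it is a complete first return to a shorter privileged
    word u: u is a prefix and a suffix of w and occurs exactly twice in w.
    """
    if len(w) <= 1:
        return True
    for k in range(1, len(w)):
        u = w[:k]
        # u must occur only as the prefix and as the suffix of w.
        if occurrences(u, w) == [0, len(w) - k] and is_privileged(u):
            return True
    return False


def count_privileged(n, alphabet="ab"):
    """Count privileged words of length n by brute-force enumeration."""
    return sum(is_privileged("".join(t)) for t in product(alphabet, repeat=n))
```

For example, `is_privileged("abba")` holds because `"a"` is a privileged prefix and suffix that occurs nowhere else, while `is_privileged("aaba")` fails because `"a"` also occurs in the interior. Brute-force counts such as `count_privileged(4)` are only practical for short lengths, since the enumeration grows exponentially.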

Relevance:

80.00%

Publisher:

Abstract:

The purpose of this study is to explore how the potential cues of spoken sarcasm and irony are transferred into written Finnish subtitles in the American television series Gilmore Girls. The aim is to discover how the use of cuing differs between the English source text and the Finnish target text. The research is conducted through qualitative and quantitative analysis and comparison between the two languages. Three episodes from Gilmore Girls are analysed using the Finnish DVD subtitles and the episodes' unofficial English transcripts. The sarco-ironic remarks and their translations are identified, and their individual sarco-ironic cues are categorised. The number of cues, the number of cues in each category and the number of cue shifts are compared quantitatively to determine whether the translator uses fewer or different cues than the writer of the original source text. The qualitative analysis focuses on the cue types used in both texts. The results confirmed the hypothesis that sarcasm and irony are relatively straightforward linguistic phenomena to translate, and the hypothesis that the category of words of reinforcement/trivialising is the most used in conveying sarcasm and irony. However, the study also showed that even though some cues were cut in the translation process, the sarco-ironic meaning was rarely omitted entirely from the translation. The findings also showed that the number of cue shifts was surprisingly low. From these results we can deduce that, at least in the context of DVD subtitles, sarcasm and irony can also be conveyed in writing, and even though some cues were omitted or shifted, sarcasm and irony were still detectable in the Finnish subtitles. The language used in DVD translations uses fewer cues of sarcasm and irony, and it could thus be considered less versatile than the original source language.