62 results for 612


Relevance:

10.00%

Publisher:

Abstract:

Boiling blood: anger at the start of the modern era in England

Relevance:

10.00%

Publisher:

Abstract:

We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
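To make the idea of combining lexicon weights with bigram weights concrete, here is a minimal sketch in plain Python. It is not the authors' transducer implementation: dictionaries stand in for the weighted lexicon, a hypothetical `guess` function stands in for the guesser, the bigram model uses tags only (the abstract also mentions lemmas), and every weight is invented for illustration. Weights are treated as negative log probabilities, so the best tagging is the path with the lowest total weight.

```python
# Toy weighted lexicon: word -> {tag: weight}. Weights behave like negative
# log probabilities (lower is better). All values are invented.
LEXICON = {
    "the": {"DET": 0.1},
    "dog": {"NOUN": 0.2, "VERB": 3.0},
    "barks": {"VERB": 0.4, "NOUN": 1.5},
}

# Toy tag-bigram weights: (previous tag, tag) -> weight. "<s>" is sentence start.
BIGRAMS = {
    ("<s>", "DET"): 0.3,
    ("<s>", "NOUN"): 1.2,
    ("<s>", "VERB"): 2.5,
    ("DET", "NOUN"): 0.2,
    ("DET", "VERB"): 4.0,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 1.8,
    ("VERB", "NOUN"): 1.0,
    ("VERB", "VERB"): 3.0,
}

UNSEEN = 10.0  # assumed penalty for tag bigrams missing from the toy model


def guess(word):
    """Hypothetical guesser: offer open-class tags for unknown words."""
    return {"NOUN": 2.0, "VERB": 2.5}


def tag(words):
    """Return (total weight, tags) for the lowest-weight tagging (Viterbi)."""
    # best[t] = (weight of the cheapest path ending in tag t, tags on that path)
    best = {"<s>": (0.0, [])}
    for word in words:
        analyses = LEXICON.get(word.lower()) or guess(word)
        new_best = {}
        for t, lex_w in analyses.items():
            candidates = [
                (prev_w + BIGRAMS.get((prev, t), UNSEEN) + lex_w, path + [t])
                for prev, (prev_w, path) in best.items()
            ]
            new_best[t] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


weight, tags = tag(["The", "dog", "barks"])
print(tags)
```

In a transducer-based system the same search would presumably fall out of composing the weighted machines and extracting the best path; the explicit loop here only illustrates how the two kinds of weight add up.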

Relevance:

10.00%

Publisher:

Abstract:

In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology tools, and demonstrate rapid building of Northern Sámi and English spell checkers from tools and resources available from the Internet.
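As a rough illustration of how a lexicon and corpus counts can be turned into a spell checker, the sketch below uses a plain Python word-frequency table in place of a finite-state lexical automaton. The error model (all strings within one edit of the input) and the ranking by corpus frequency are assumptions made for this example; the abstract does not say which error model the paper uses. The names `edits1` and `build_checker` are hypothetical.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet for the toy example


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | swaps | replaces | inserts


def build_checker(corpus_text):
    """Build a suggest() function from whitespace-tokenised corpus text."""
    counts = Counter(corpus_text.lower().split())

    def suggest(word):
        word = word.lower()
        if word in counts:  # the word is in the lexicon: accept it as is
            return [word]
        candidates = {c for c in edits1(word) if c in counts}
        # Most frequent candidates first; ties broken alphabetically.
        return sorted(candidates, key=lambda w: (-counts[w], w))

    return suggest


suggest = build_checker("the cat sat on the mat the cat")
print(suggest("hat"))
```

A finite-state version would replace the membership test with acceptance by the lexical automaton, which also covers inflected forms that never occur in the corpus.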

Relevance:

10.00%

Publisher:

Abstract:

One of the most challenging tasks in building language resources is copyright license management. There are several reasons for this. First, the current European copyright system is designed largely to satisfy commercial actors such as publishers and record companies. As a result, the scope and duration of the rights are very extensive, and some forms of protection, such as the database right, do not exist elsewhere in the world. The exceptions for research and teaching, on the other hand, are typically very narrow.

Relevance:

10.00%

Publisher:

Abstract:

There are numerous formats for writing spellcheckers for open-source systems, and many language descriptions have been written in these formats. Similarly, for word hyphenation by computer there are TeX rules for many languages. In this paper we demonstrate a method for converting these spell-checking lexicons and hyphenation rule sets into finite-state automata, and present a new finite-state-based system for writer’s tools used in current open-source software such as Firefox, OpenOffice.org and enchant via the spell-checking library voikko.

Relevance:

10.00%

Publisher:

Abstract:

FinnWordNet is a WordNet for Finnish that conforms to the framework given in Fellbaum (1998) and Vossen (ed.) (1998). FinnWordNet is open source and currently contains 117,000 synsets. A classic WordNet consists of synsets, or sets of partial synonyms whose shared meaning is described and exemplified by a gloss, a common part of speech and a hyperonym. Synsets in a WordNet are arranged in hierarchical partial orderings according to semantic relations like hyponymy/hyperonymy. Together the gloss, part of speech and hyperonym fix the meaning of a word and constrain the possible translations of a word in a given synset. The Finnish group has opted for translating Princeton WordNet 3.0 synsets wholesale into Finnish by professional translators, because the translation process can be controlled with regard to quality, coverage, cost and speed of translation. The project was financed by FIN-CLARIN at the University of Helsinki. According to our preliminary evaluation, the translation process was diligent and the quality is on a par with the original Princeton WordNet.
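The synset structure described above can be sketched as a small data type. This is only an illustration under stated assumptions: the `Synset` class and `hypernym_chain` function are hypothetical names, each synset is given a single hypernym for simplicity, and the Finnish entries and glosses are invented examples rather than data taken from FinnWordNet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Synset:
    """A set of partial synonyms sharing a gloss, a part of speech and a hypernym."""
    lemmas: List[str]
    pos: str
    gloss: str
    hypernym: Optional["Synset"] = None  # simplified: one parent per synset


def hypernym_chain(synset):
    """Return the first lemma of each ancestor, from nearest to most general."""
    chain = []
    node = synset.hypernym
    while node is not None:
        chain.append(node.lemmas[0])
        node = node.hypernym
    return chain


# Invented example entries.
animal = Synset(["eläin"], "n", "a living organism that can move voluntarily")
mammal = Synset(["nisäkäs"], "n", "a warm-blooded vertebrate", hypernym=animal)
dog = Synset(["koira"], "n", "a domesticated canine", hypernym=mammal)

print(hypernym_chain(dog))
```

Following the hypernym links upward is what places a synset in the hierarchical ordering mentioned in the abstract.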