4 results for Toponymic spelling
in Universidad de Alicante
Abstract:
The great amount of text produced every day on the Web has made it one of the main sources for obtaining linguistic corpora, which are then analyzed with Natural Language Processing techniques. On a global scale, languages such as Portuguese (official in 9 countries) appear on the Web in several varieties, with lexical, morphological and syntactic differences, among others. In addition, a unified spelling system for Portuguese has recently been approved, and its implementation has already started in some countries. However, this process will take several years, so different varieties and spelling systems coexist. Since PoS-taggers for Portuguese are built for one particular variety, this work analyzes different combinations of training corpora and lexica, aimed at building a model that annotates with high precision across several varieties and spelling systems of the language. Moreover, this paper presents different dictionaries of the new orthography (Spelling Agreement) as well as a new freely available testing corpus containing different varieties and textual typologies.
Abstract:
This study analyzes the concepts of war and peace in the origins of Islam. It reviews the events and the initial justification of war as necessary for the sustenance of the new community of believers, and how ŷihād came to be institutionalized as a precept. When ŷihād cannot be carried out, the spirit of ribāṭ develops as its substitute. The study explains the evolution toward a spiritualization of this precept, developed initially in "places of ribāṭ" and later in the rábitas. Finally, it summarizes the known toponymic and archaeological data on the rábitas in Portugal.
Abstract:
The study is organized around three axes. The first collects and examines "Information on Sagunto/Murbīṭar from written Arabic sources", covering both reports of various historical events and reports from Arabic geographical and literary sources (especially those related to the description of its monuments), and closes with some demographic notes. The second section studies "The toponymic mutation from Saguntum to Murbīṭar as a source of historical information", while the third and last discusses "The Christian conquests of Sagunto/Murbīṭar/Morvedre".
Abstract:
Information Retrieval systems normally have to work with rather heterogeneous sources, such as Web sites or documents produced by Optical Character Recognition tools. Correctly converting these sources into flat text files is not a trivial task, since noise may easily be introduced through spelling or typesetting errors. Interestingly, this is not a great drawback when the corpus is sufficiently large, since redundancy helps to overcome noise problems. However, noise becomes a serious problem in restricted-domain Information Retrieval, especially when the corpus is small and has little or no redundancy. This paper devises an approach that adds noise tolerance to Information Retrieval systems. A set of experiments carried out in the agricultural domain demonstrates the effectiveness of the approach presented.