Digital Library

This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3600 tokens. This data is then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and gives an overview of the first systems trained on the annotated resources and of their use for machine translation.

Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation

Contrasting the Automatic Identification of Two Discourse Markers in Multiparty Dialogues

Towards Automatic Identification of Discourse Markers in Dialogs: the Case of 'Like'

Abstract:

This article discusses the detection of discourse markers (DMs) in dialog transcriptions, by human annotators and by automated means. After a theoretical discussion of the definition of DMs and their relevance to natural language processing, we focus on the role of 'like' as a DM. Results from experiments with human annotators show that detection of DMs is a difficult but reliable task, which requires prosodic information from soundtracks. Then, several types of features are defined for automatic disambiguation of 'like': collocations, part-of-speech tags and duration-based features. Decision-tree learning shows that for 'like', nearly 70% precision can be reached, with nearly 100% recall, mainly using collocation filters. Similar results hold for 'well', with about 91% precision at 100% recall.
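To make the idea of a collocation filter concrete, here is a minimal Python sketch that scores such a filter by precision and recall. It is an illustration only, not the article's method: the word list `NON_DM_LEFT`, the function names, and the five toy utterances with their gold labels are all invented assumptions, and the scores it computes say nothing about the reported results.

```python
from typing import List, Tuple

# Hypothetical left-context words after which "like" is usually a verb or a
# preposition. The abstract does not list the actual collocations used.
NON_DM_LEFT = {"i", "you", "we", "they", "would", "looks", "feels", "sounds"}


def is_dm_candidate(tokens: List[str], i: int) -> bool:
    """Keep tokens[i] as a possible DM unless its left neighbor rules it out."""
    prev = tokens[i - 1].lower() if i > 0 else ""
    return prev not in NON_DM_LEFT


def precision_recall(pred: List[bool], gold: List[bool]) -> Tuple[float, float]:
    """Compute precision and recall of predicted DM labels against gold labels."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data: (tokens, index of "like", gold label "is a DM").
EXAMPLES = [
    ("it was like really cold".split(), 2, True),
    ("i like coffee".split(), 1, False),
    ("it looks like rain".split(), 2, False),
    ("he was like a brother".split(), 2, False),
    ("so like we left early".split(), 1, True),
]

pred = [is_dm_candidate(tokens, i) for tokens, i, _ in EXAMPLES]
gold = [label for _, _, label in EXAMPLES]
precision, recall = precision_recall(pred, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A filter of this kind only rejects occurrences that a collocation clearly excludes, so it tends to keep every true DM (high recall) while letting some non-DM uses through (lower precision). That is the general trade-off the abstract describes for 'like'.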
