962 results for 440 French


Relevance:

60.00%

Publisher:

Abstract:

This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3,600 tokens. These data are then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and gives an overview of the first systems trained on the annotated resources and of their use for machine translation.

Abstract:

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between its various sub-parts. We compare the results obtained with a general measure of lexical similarity based on χ² against those obtained by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence that translated texts have specific characteristics distinguishing them from non-translated ones, owing to a universal tendency toward explicitation.
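The two kinds of measure contrasted above can be sketched in a few lines. This is a minimal sketch of one common χ² formulation (summing (O - E)^2 / E over the most frequent words of the two samples combined, where E is the count expected if each word were spread in proportion to sample size), not necessarily the exact measure used in the paper. The function names, the `top_n` cut-off, and the single-token connective list are hypothetical choices for illustration.

```python
from collections import Counter


def chi2_distance(tokens_a, tokens_b, top_n=500):
    """Chi-square statistic over the top_n most frequent words of both samples.

    Lower values mean the two samples have more similar word distributions.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both samples must contain at least one token")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    chi2 = 0.0
    for word, joint in (freq_a + freq_b).most_common(top_n):
        # Expected counts if the word were spread in proportion to sample size.
        exp_a = joint * size_a / total
        exp_b = joint * size_b / total
        chi2 += (freq_a[word] - exp_a) ** 2 / exp_a
        chi2 += (freq_b[word] - exp_b) ** 2 / exp_b
    return chi2


# Hypothetical connective inventory (single tokens only; multi-word
# connectives such as "even though" would need phrase matching).
CONNECTIVES = {"although", "because", "but", "however", "since", "therefore", "while"}


def connective_rate(tokens, connectives=CONNECTIVES):
    """Number of discourse connectives per 1,000 tokens."""
    hits = sum(1 for t in tokens if t.lower() in connectives)
    return 1000 * hits / len(tokens)


part_1 = "the motion was adopted because the vote was clear".split()
part_2 = "however the motion was rejected although the vote was close".split()
print(chi2_distance(part_1, part_1))  # identical samples give 0.0
print(chi2_distance(part_1, part_2))
print(connective_rate(part_1), connective_rate(part_2))
```

The χ² score summarises differences across the whole frequent vocabulary, whereas the connective rate isolates one narrow class of words, which is why the two can diverge on the same pair of sub-corpora.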

Abstract:

This paper presents a shallow dialogue analysis model aimed at human-human dialogues in staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, and decision tree classifiers for discourse markers. A rule-based approach is proposed for resolving cross-modal references to meeting documents. The methods are trained and evaluated using a common data set and annotation format. Integrating the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.

Abstract:

This article discusses the detection of discourse markers (DMs) in dialogue transcriptions, both by human annotators and by automated means. After a theoretical discussion of the definition of DMs and their relevance to natural language processing, we focus on the role of 'like' as a DM. Results from experiments with human annotators show that detecting DMs is a difficult but reliable task, which requires prosodic information from the soundtrack. Several types of features are then defined for the automatic disambiguation of 'like': collocations, part-of-speech tags, and duration-based features. Decision-tree learning shows that, for 'like', nearly 70% precision can be reached with nearly 100% recall, mainly using collocation filters. Similar results hold for 'well', with about 91% precision at 100% recall.
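The collocation-filter idea and the precision/recall figures above can be illustrated with a small sketch. It swaps the learned decision tree for a hand-written rule: every occurrence of 'like' is tagged as a DM unless the preceding word suggests a verb or comparative use. The left-context word list, the function names, and the toy utterances are all hypothetical and are not taken from the article.

```python
# Hypothetical left-context words after which "like" is usually a verb
# ("I like", "would like") or a comparison ("looks like"), not a DM.
NON_DM_LEFT = {"i", "you", "we", "they", "would", "don't", "looks", "feels"}


def is_discourse_marker(tokens, i):
    """Guess whether tokens[i] ("like") is a discourse marker.

    Defaults to DM, which favours recall, and filters out occurrences
    whose left neighbour appears in NON_DM_LEFT.
    """
    left = tokens[i - 1].lower() if i > 0 else ""
    return left not in NON_DM_LEFT


def precision_recall(predicted, gold):
    """Precision and recall of boolean predictions against gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy annotated utterances: (transcript, gold label "like is a DM").
samples = [
    ("i like the new budget", False),
    ("it was like really long", True),
    ("we would like to vote", False),
    ("it looks like rain", False),
    ("so like we should start", True),
    ("he runs like a pro", False),  # comparative use the filter misses
]
predicted = []
for text, _ in samples:
    tokens = text.split()
    predicted.append(is_discourse_marker(tokens, tokens.index("like")))
gold = [label for _, label in samples]
print(precision_recall(predicted, gold))  # (0.6666666666666666, 1.0)
```

Defaulting to the DM reading keeps recall at its maximum, and each added collocation rule removes some false positives; uses the rules do not anticipate (here "runs like") are what pull precision below 100%.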

Abstract:

This thesis is a very detailed comparative epistemological analysis of the entire published Saussurean corpus and of a very significant portion of the works of Hjelmslev, Jakobson, Martinet, and Benveniste. Its aim is to show that, despite its claimed lineage, European structuralism is not Saussurean, and thereby to bring out, by contrast, the specificity of the Saussurean problematic and its stakes for linguistics and, more broadly, for the human sciences. The Saussurean problematic had made possible, for the first time in the history of linguistics, a theoretical apprehension of language (langue). The structuralist problematic, by contrast, is entirely empirical, so that its scientific status actually derives from a scientific ideology, in Georges Canguilhem's sense. The crux of this radical difference between the two problematics is the absence of any structuralist theorization of the sound/meaning relationship and, correlatively, a misunderstanding of the Saussurean concept of system. System thus becomes structure, that is, as we attempt to show, a structural apprehension of an object whose common and self-evident definition (language as an instrument of communication) is not called into question. The Saussurean etiological problematic, which is constitutive of the concept of langue, is thus answered by an analytical problematic that leads instead to the construction of an object (form or structure) in place of a concept. More precisely, the structuralist problematic is idiomological. It therefore misses the distinction between langue and idiom, whose necessity, and whose constitutive role in the theorization of langue and, beyond it, of language (langage), we then attempt to demonstrate, notably within the framework of an articulation between linguistics and psychoanalysis.
