Abstract:
This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, covering about 3,600 tokens. These data are then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and gives an overview of the first systems trained on the annotated resources and of their use for machine translation.
Abstract:
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between its various sub-parts. We compare the results obtained with a general measure of lexical similarity based on χ² against those obtained by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for specific characteristics that distinguish translated texts from non-translated ones, owing to a universal tendency toward explicitation.
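The abstract does not say exactly how the χ²-based lexical similarity is computed. A minimal sketch, assuming a common corpus-comparison formulation (a χ² statistic summed over the most frequent words of the two pooled samples, where lower values mean more similar samples), is shown below next to a simple connective rate. The function names, the `n_words` cutoff and the connective list are hypothetical illustrations, not the paper's own measure or inventory.

```python
from collections import Counter


def chi2_distance(tokens_a, tokens_b, n_words=500):
    """Chi-square statistic over the n most frequent words of the pooled samples.

    Lower values mean the two samples are lexically more similar.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both samples must be non-empty")
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    pooled = counts_a + counts_b
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    chi2 = 0.0
    for word, joint in pooled.most_common(n_words):
        # Expected counts if the word were distributed in proportion to sample size
        exp_a = joint * size_a / total
        exp_b = joint * size_b / total
        chi2 += (counts_a[word] - exp_a) ** 2 / exp_a
        chi2 += (counts_b[word] - exp_b) ** 2 / exp_b
    return chi2


# Hypothetical, illustrative connective list; NOT the inventory used in the paper.
CONNECTIVES = {"however", "although", "because", "therefore", "while", "since"}


def connective_rate(tokens, connectives=CONNECTIVES):
    """Number of connective tokens per 1,000 tokens (case-insensitive match)."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in connectives)
    return 1000 * hits / len(tokens)
```

Comparing sub-corpora pairwise with both functions lets one check whether connective rates separate sub-parts that the χ² statistic scores as similar, which is the kind of contrast the abstract describes.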
Abstract:
This paper presents a shallow dialogue analysis model aimed at human-human dialogues in staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, and decision tree classifiers for discourse markers. A rule-based approach is proposed for resolving cross-modal references to meeting documents. The methods are trained and evaluated on a common data set and annotation format. Integrating the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.
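The topic-segmentation component relies on lexical similarity between stretches of dialogue. The sketch below illustrates that idea under a stated simplification: it compares raw bag-of-words vectors of adjacent windows of utterances (TextTiling-style lexical cohesion) instead of the latent semantic analysis space used in the paper, and proposes a boundary wherever the cosine similarity falls below a threshold. The `window` and `threshold` values are hypothetical defaults chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_boundaries(utterances, window=2, threshold=0.1):
    """Return indices i where a topic boundary is proposed before utterances[i].

    Simplification: raw word counts replace the LSA vectors of the original model.
    """
    bags = [Counter(u.lower().split()) for u in utterances]
    boundaries = []
    for i in range(window, len(bags) - window + 1):
        # Pool the utterances just before and just after the candidate gap
        left = sum(bags[i - window:i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries
```

With a real LSA model, each window would first be projected into the reduced semantic space, so that related but non-identical words still count as similar; the boundary rule itself stays the same.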
Abstract:
This article discusses the detection of discourse markers (DMs) in dialogue transcriptions, both by human annotators and by automatic means. After a theoretical discussion of how DMs are defined and of their relevance to natural language processing, we focus on the role of "like" as a DM. Experiments with human annotators show that detecting DMs is a difficult but reliable task, which requires prosodic information from the soundtracks. Several types of features are then defined for the automatic disambiguation of "like": collocations, part-of-speech tags and duration-based features. Decision-tree learning shows that, for "like", nearly 70% precision can be reached with nearly 100% recall, mainly by using collocation filters. Similar results hold for "well", with about 91% precision at 100% recall.
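The precision and recall figures above measure two different errors: precision is the share of tokens labelled as DMs that really are DMs, and recall is the share of true DMs that were found. A minimal sketch of the computation, assuming each token has a gold label and a predicted label with "DM" as the positive class (the label names and toy data are hypothetical, not from the paper):

```python
def precision_recall(gold, predicted, positive="DM"):
    """Precision and recall of `predicted` against `gold` for the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: labelling every token as a DM finds all true DMs (full recall)
# but also accepts the non-DM uses, which lowers precision.
gold = ["DM", "DM", "other", "DM", "other"]
print(precision_recall(gold, ["DM"] * 5))  # → (0.6, 1.0)
```

This is why a filter tuned to keep recall near 100% can still leave precision around 70%: it rarely discards true DMs but lets through some non-DM occurrences.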
Abstract:
This thesis consists of a very detailed comparative epistemological analysis of the entire published Saussurean corpus, together with a highly significant portion of the works of Hjelmslev, Jakobson, Martinet and Benveniste. The aim is to show that, despite a claimed filiation, European structuralism is not Saussurean, and thereby to bring out, by contrast, the specificity of the Saussurean problematic and its stakes for linguistics and, more broadly, for the human sciences. The Saussurean problematic had made possible, for the first time in the history of linguistics, a theoretical apprehension of language (langue). The structuralist problematic, by contrast, is entirely empirical, so that its scientific status in fact derives from a scientific ideology, in Georges Canguilhem's sense. The nodal point of this radical difference in problematic is the absence of any structuralist theorization of the sound/meaning relationship and, correlatively, the misunderstanding of the Saussurean concept of system. System thus becomes structure, that is, as we attempt to show, a structural apprehension of an object whose common and self-evident definition (language as an instrument of communication) is not called into question. The Saussurean etiological problematic, constitutive of the concept of langue, is thus answered by an analytical problematic that leads instead to the construction of an object (form or structure) in place of a concept. More precisely, the structuralist problematic is idiomological. It therefore misses the distinction between langue and idiom, whose necessity, and whose constitutive role in the theorization of langue and, beyond it, of language in general (langage), we then attempt to demonstrate, notably within the framework of an articulation between linguistics and psychoanalysis.