Digital Library

**Author(s):** Cocco C.
Date(s)	01/04/2012
Abstract	To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For the high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.
Identifier	http://serval.unil.ch/?id=serval:BIB_5A2CBDB06CA2 isbn:978-1-937284-19-0 http://my.unil.ch/serval/document/BIB_5A2CBDB06CA2.pdf http://nbn-resolving.org/urn/resolver.pl?urn=urn:nbn:ch:serval-BIB_5A2CBDB06CA20 http://aclweb.org/anthology-new/E/E12/E12-3.pdf
Language(s)	en
Publisher	Stroudsburg: Association for Computational Linguistics Stroudsburg: Université d'Avignon
Rights	info:eu-repo/semantics/openAccess
Source	Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Keywords	#Discourse types; K-means; high-dimensional embeddings; fuzzy clustering
Type	info:eu-repo/semantics/conferenceObject inproceedings
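
The first steps named in the abstract (POS n-gram profiles per clause, chi-squared distances between clauses, and a power transformation of those distances) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the toy tag sequences, and the exponent parameter `q` are assumptions made for illustration, and the record does not specify the exact formulation used in the paper. The clustering step itself (K-means or fuzzy clustering) is left out.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Return the POS n-grams (as tuples) found in one clause."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def ngram_profiles(clauses, n):
    """Count n-grams per clause; return (profiles, sorted feature list)."""
    profiles = [Counter(pos_ngrams(tags, n)) for tags in clauses]
    features = sorted(set().union(*profiles))
    return profiles, features


def chi2_distances(profiles, features, q=1.0):
    """Pairwise chi-squared dissimilarities between clause profiles.

    Each clause profile is normalised by its own total, and each feature
    is weighted by the inverse of its relative frequency in the corpus.
    The result is raised to the power q (hypothetical parameter standing
    in for the power transformation mentioned in the abstract).
    Assumes every clause contains at least one n-gram.
    """
    row_sums = [sum(p.values()) for p in profiles]
    if any(s == 0 for s in row_sums):
        raise ValueError("every clause needs at least one n-gram")
    total = sum(row_sums)
    weight = {f: sum(p[f] for p in profiles) / total for f in features}
    m = len(profiles)
    dist = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d = sum(
                (profiles[i][f] / row_sums[i] - profiles[j][f] / row_sums[j]) ** 2
                / weight[f]
                for f in features
            )
            dist[i][j] = dist[j][i] = d ** q
    return dist


# Toy example with invented tag sequences (not data from the paper).
clauses = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["PRON", "VERB", "ADV"],
]
profiles, features = ngram_profiles(clauses, n=1)
distances = chi2_distances(profiles, features, q=0.5)
```

Moving from `n=1` to `n=2` or `n=3` makes the feature list grow quickly while each clause contributes only a few n-grams, which is one way to see the feature space sparsity the abstract mentions.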
