Digital Library

Hierarchical Multi-Label Text Classification in a Low-Resource Setting

**Author(s):** Lavista, Andrea
Contributor(s)	Torroni, Paolo; Savino, Giuseppe
Date(s)	06/12/2022
Abstract	In this thesis we address a multi-label hierarchical text classification problem in a low-resource setting and explore different approaches to identify the best one for our case. The goal is to train a model that classifies English school exercises according to a hierarchical taxonomy using little labeled data. The experiments in this work employ different machine learning models and text representation techniques: CatBoost with tf-idf features, classifiers based on pre-trained models (mBERT, LASER), and SetFit, a framework for few-shot text classification. SetFit proved to be the most promising approach, outperforming the other approaches when only a few labeled examples per class are available during training. However, this thesis does not consider the entire hierarchical taxonomy, only its first two levels. Classification with the third-level classes would require further experiments exploring methods for zero-shot text classification, data augmentation, and strategies to exploit the hierarchical structure of the taxonomy during training.
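The abstract restricts the task to the first two levels of the taxonomy. As a rough illustration only (not code from the thesis), the sketch below shows one common way to turn hierarchical label paths into multi-hot targets truncated at depth 2, switching on each label's ancestors so the targets stay consistent with the hierarchy. The slash-separated path format, the example category names, and the helpers `truncate_path`, `build_label_space`, and `multi_hot` are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical label format: each exercise is tagged with one or more
# slash-separated taxonomy paths, e.g. "grammar/tenses/present_simple".
SEP = "/"


def truncate_path(path: str, max_depth: int = 2) -> Tuple[str, ...]:
    """Return the path's ancestor prefixes, keeping at most `max_depth` levels."""
    parts = path.split(SEP)[:max_depth]
    return tuple(SEP.join(parts[: i + 1]) for i in range(len(parts)))


def build_label_space(samples: Iterable[Sequence[str]], max_depth: int = 2) -> List[str]:
    """Collect every label up to `max_depth` that occurs in the data, sorted."""
    labels = set()
    for paths in samples:
        for path in paths:
            labels.update(truncate_path(path, max_depth))
    return sorted(labels)


def multi_hot(paths: Sequence[str], label_space: Sequence[str], max_depth: int = 2) -> List[int]:
    """Encode one exercise as a 0/1 vector; ancestors are switched on with their children."""
    index: Dict[str, int] = {label: i for i, label in enumerate(label_space)}
    vector = [0] * len(label_space)
    for path in paths:
        for label in truncate_path(path, max_depth):
            vector[index[label]] = 1  # raises KeyError if the label was never seen
    return vector


if __name__ == "__main__":
    # Hypothetical toy data: two exercises with multiple labels each.
    exercises = [
        ["grammar/tenses/present_simple", "vocabulary/food"],
        ["grammar/articles"],
    ]
    space = build_label_space(exercises)
    print(space)  # ['grammar', 'grammar/articles', 'grammar/tenses', 'vocabulary', 'vocabulary/food']
    for paths in exercises:
        print(multi_hot(paths, space))  # [1, 0, 1, 1, 1] then [1, 1, 0, 0, 0]
```

Third-level tags such as `present_simple` are simply folded into their level-2 parent here, which mirrors the two-level restriction described in the abstract; a model trained on these vectors would still need its own mechanism (for example, per-label thresholds) to produce predictions.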
Format	application/pdf
Identifier	http://amslaurea.unibo.it/27453/1/thesis_andrea_lavista.pdf; Lavista, Andrea (2022) Hierarchical Multi-Label Text Classification in a Low-Resource Setting. [Laurea magistrale], Università di Bologna, Degree Programme in Artificial Intelligence [LM-DM270] <http://amslaurea.unibo.it/view/cds/CDS9063/>
Language(s)	en
Publisher	Alma Mater Studiorum - Università di Bologna
Relation	http://amslaurea.unibo.it/27453/
Rights	cc_by_nc_sa4 (CC BY-NC-SA 4.0)
Keywords	#natural language processing,text classification,multi-label classification,hierarchical classification,multi-label text classification,hierarchical text classification,few-shot learning,few-shot text classification,low-resource setting,pre-trained models,contextual embedding,sentence embedding,task-adaptive pre-training,domain adaptation,multilingual,BERT,SetFit,LASER,SHAP #Artificial intelligence [LM-DM270]
Type	PeerReviewed info:eu-repo/semantics/masterThesis
