D. Akrivousis, N. Mylonas, I. Mollas and G. Tsoumakas, "Text classification is keyphrase explainable! Exploring local interpretability of transformer models with keyphrase extraction," 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), Thessaloniki, Greece, 2023, pp. 1-9, doi: 10.1109/DSAA60987.2023.10302566.
Keyphrase extraction is a widely discussed topic in Natural Language Processing, as it offers a concise summary of the main topics in a document. Interpretability is also an important aspect of Machine Learning, as it helps prevent socio-ethical issues, such as bias and discrimination against minorities, as well as mistakes that may have serious consequences. Interpretability has recently gained prominence in the field of Natural Language Processing, where transformers are the dominant architecture. The goal of interpretability is to provide interpretations that pinpoint the elements of an instance that contribute the most to the model's decision. In this work, we use keyphrase extraction to facilitate the interpretability process, producing smaller, more concise interpretations that also account for word interactions, since keyphrases usually consist of multiple words. Additionally, our technique is based on semantic similarity, which makes it faster and zero-shot ready, and therefore well suited to online learning scenarios. We evaluated the effectiveness of our technique through a series of quantitative and qualitative experiments on the well-known BERT model, comparing it against several state-of-the-art competitors.
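
To make the idea of similarity-based keyphrase scoring concrete, the following is a minimal Python sketch and not the method of the paper, whose details the abstract does not give. It assumes that candidate phrases are contiguous one- and two-word spans free of stopwords, that a toy bag-of-words count vector stands in for a transformer embedding, and that a phrase is scored by its cosine similarity to the whole document. All names (`STOPWORDS`, `embed`, `candidate_phrases`, `rank_keyphrases`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this illustration.
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "in", "it", "but"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(tokens):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: a real system would use a transformer encoder here.
    """
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_phrases(tokens, max_len=2):
    """Collect contiguous n-grams (n <= max_len) that contain no stopwords."""
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOPWORDS for t in gram):
                phrases.add(" ".join(gram))
    return sorted(phrases)


def rank_keyphrases(text, top_k=3):
    """Rank candidates by similarity to the document; ties break alphabetically."""
    tokens = tokenize(text)
    doc_vec = embed([t for t in tokens if t not in STOPWORDS])
    scored = [(p, cosine(embed(p.split()), doc_vec))
              for p in candidate_phrases(tokens)]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Calling `rank_keyphrases("The battery life is great but the battery charger is terrible")` ranks the two-word candidates "battery charger" and "battery life" above the single word "battery", which illustrates how a multi-word phrase can capture more of an instance than any one of its words. Because no scoring model is trained, the same function applies unchanged to unseen text, which is the sense in which a similarity-based approach can be zero-shot.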