Karlos, Stamatis, Nikolaos Mylonas, and Grigorios Tsoumakas. "Instance-Based Zero-Shot learning for semi-Automatic MeSH indexing." Pattern Recognition Letters 151 (2021): 62-68.

Author(s): Stamatis Karlos, Nikolaos Mylonas, Grigorios Tsoumakas

Keywords: Zero-shot learning, Multi-label text classification, Sentence embeddings, MeSH terms, Label space dependencies, Weakly-supervised methods


Abstract: Zero-shot learning is a variant of the broader category of weakly supervised learning algorithms. Its main asset is the ability to identify entities for which no training data are provided in advance. Under this extreme scenario, conventional supervised learning methods cannot operate properly, while the human resources available for obtaining even a limited number of instances may be highly restricted, especially when the label space is complex because of its cardinality and its underlying semantic dependencies. However, removing the human factor from the learning loop in complicated tasks cannot guarantee robust performance. Thus, semi-automated solutions are widely accepted by both the research and industrial communities, favoring cooperation between human and machine, mainly to reduce the effort spent by the former and to obtain safer predictions. In contrast with the majority of existing zero-shot learning approaches, we propose a generalized instance-based method for the multi-label classification task that does not perform any transductive operations over the test instances. Instead, we aim to provide a label ranking of the unseen classes by exploiting sentence-based semantic embeddings and label similarities, through a dedicated fine-tuned language representation model. We also use a pattern matching rule to further improve the ranking produced by our method. Some realistic assumptions are made in order for our approach to work correctly and provide this ranking. Results on a biomedical database with a semantically rich, fine-grained label space are promising, making the method a helpful and computationally inexpensive tool for facilitating semi-automated indexing.
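
The ranking idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so everything below is an assumption made for illustration. In particular, `embed` is a toy bag-of-words stand-in for the sentence embeddings of a fine-tuned language model, the score of a label is assumed to be its best cosine similarity to any sentence of the document, and `match_bonus` is a hypothetical way of expressing the pattern matching rule (a label whose name appears verbatim in the text is pushed up the ranking).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: a real system would call a fine-tuned language model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_labels(document, labels, match_bonus=1.0):
    """Rank candidate (unseen) labels for one document, best first.

    No labeled training instances are used: each label is scored only by
    comparing its name with the sentences of the document.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    sentence_vecs = [embed(s) for s in sentences]
    scores = {}
    for label in labels:
        label_vec = embed(label)
        # Semantic part: best similarity between the label and any sentence.
        score = max((cosine(label_vec, sv) for sv in sentence_vecs), default=0.0)
        # Hypothetical pattern matching rule: exact mention of the label name.
        if label.lower() in document.lower():
            score += match_bonus
        scores[label] = score
    return sorted(labels, key=lambda name: scores[name], reverse=True)


# Hypothetical example text and candidate labels, for illustration only.
document = (
    "Patients with type 2 diabetes received metformin. "
    "Blood glucose was measured weekly."
)
ranking = rank_labels(document, ["Metformin", "Blood Glucose", "Neoplasms"])
print(ranking)
```

In a semi-automated setting, a ranked list of this kind would be shown to a human indexer, who accepts or rejects the top suggestions rather than searching the full label space.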