Mylonas, Nikolaos, Stamatis Karlos, and Grigorios Tsoumakas. "A Multi-instance Multi-label Weakly Supervised Approach for Dealing with Emerging MeSH Descriptors." International Conference on Artificial Intelligence in Medicine. Springer, Cham, 2021.
Author(s): Nikolaos Mylonas, Stamatis Karlos, Grigorios Tsoumakas
Keywords: Weakly supervised learning, MeSH Indexing, Multiple-instance Learning, Sentence and word embeddings, Similarity threshold tuning
Abstract: The constant evolution of the Medical Subject Headings (MeSH) vocabulary, and specifically the changes in its descriptors, raises a number of issues that call for automation. The main one is that changed descriptors often lack proper ground-truth articles. Learning models that demand strong supervision are therefore not directly applicable, so making predictions for such changes is not a straightforward task. The importance of this problem is reinforced by its multi-label nature and by the fine-grained character of the examined class descriptors, factors that demand considerable human resources. In this work, we alleviate these issues by drawing on a source of information about those descriptors that is present in MeSH in order to create a weakly labeled training set. Furthermore, we exploit short-text information per article: we apply an averaging transformation to the corresponding sentence embeddings and use a similarity mechanism to assign weak labels to the resulting data set. We name our approach WeakMeSH. The benefits of the proposed end-to-end approach are examined on a large-scale subset of the BioASQ 2018 data set consisting of 900 thousand instances, investigating two separate groups of MeSH changes: brand new and complex changes. Tested on the BioASQ 2020 data set, our approach proved highly competitive against several other approaches that can either distill weak information on their own or apply alternative transformations in place of the proposed one.
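The weak-labeling idea in the abstract (average an article's sentence embeddings, then assign a descriptor when a similarity score passes a threshold) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the toy 3-dimensional vectors stand in for real sentence embeddings, and the function names, the use of cosine similarity, the descriptor vectors, and the threshold value 0.8 are all assumptions made for the example.

```python
import math


def average_embedding(sentence_vectors):
    """Average equal-length sentence vectors into a single article vector."""
    n = len(sentence_vectors)
    dim = len(sentence_vectors[0])
    return [sum(vec[i] for vec in sentence_vectors) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weak_labels(article_sentences, descriptor_vectors, threshold=0.8):
    """Return every descriptor whose similarity to the averaged article
    vector reaches the threshold (multi-label: zero, one, or many).
    The threshold is a hypothetical value that would need tuning."""
    article_vec = average_embedding(article_sentences)
    return [
        name
        for name, vec in descriptor_vectors.items()
        if cosine_similarity(article_vec, vec) >= threshold
    ]


# Toy data (assumed): two sentence vectors for one article, two descriptors.
article = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
descriptors = {
    "descriptor_A": [1.0, 0.1, 0.0],
    "descriptor_B": [0.0, 0.0, 1.0],
}

labels = weak_labels(article, descriptors)
print(labels)
```

In this toy setup only `descriptor_A` is close enough to the averaged article vector to be assigned. In practice the threshold would be tuned, as the keyword "Similarity threshold tuning" suggests.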