N. Mylonas, I. Mollas and G. Tsoumakas, "Beyond Annual Revisions: A Multi-Label Concept Drift Analysis of MeSH", 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), L'Aquila, Italy, 2023, pp. 157-162, doi: 10.1109/CBMS58004.2023.00209.

Author(s): Nikolaos Mylonas, Ioannis Mollas, Grigorios Tsoumakas

Keywords: Analytical models, Vocabulary, Machine learning algorithms, Biological system modeling, Text categorization, Machine learning, Predictive models

Tags:

Abstract: MeSH (Medical Subject Headings) is a hierarchically structured thesaurus used for indexing biomedical information. This vocabulary covers most of the biomedical knowledge available to date. To keep pace with the continuous evolution and expansion of our understanding of the medical field, MeSH is revised every year. These revisions introduce new descriptors into the thesaurus and also change existing ones, either directly or indirectly. This constant evolution causes many older descriptors to exhibit some form of drift in their meaning, which in turn affects the performance of machine learning models that were trained on an older version of the thesaurus when they are used to make predictions on data from more recent versions. In this paper, we study the phenomenon of concept drift in MeSH by evaluating the performance of a state-of-the-art text classification algorithm on articles from different years. We also investigate how changes in some descriptors indirectly affect other descriptors related to them, by studying the shifts in their co-occurrence and using this shift as a measure of concept drift.
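The abstract does not say how the co-occurrence shift is computed. The sketch below is one plausible way to quantify it, assumed here purely for illustration and not taken from the paper: for a target descriptor, build a profile giving the fraction of its articles that also carry each other descriptor, once for an older period and once for a newer one, and report 1 minus the cosine similarity of the two profiles (0 means no shift). The function names `cooccurrence_profile` and `cosine_shift` and the toy article labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cooccurrence_profile(docs, target):
    """Hypothetical helper: for each other descriptor, the fraction of
    documents labelled with `target` that also carry that descriptor."""
    counts = Counter()
    n = 0
    for labels in docs:
        labels = set(labels)
        if target in labels:
            n += 1
            counts.update(labels - {target})
    if n == 0:
        return {}
    return {label: c / n for label, c in counts.items()}


def cosine_shift(p, q):
    """Assumed drift score: 1 - cosine similarity of two sparse profiles.
    Returns 0.0 for identical profiles and 1.0 if either profile is empty."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


# Toy data (not from the paper): descriptor sets of articles in two periods.
old_docs = [["Neoplasms", "Humans"], ["Neoplasms", "Humans", "Mice"]]
new_docs = [["Neoplasms", "Immunotherapy"], ["Neoplasms", "Humans", "Immunotherapy"]]

old_profile = cooccurrence_profile(old_docs, "Neoplasms")
new_profile = cooccurrence_profile(new_docs, "Neoplasms")
shift = cosine_shift(old_profile, new_profile)
print(round(shift, 3))  # 0.6
```

In the toy data, "Neoplasms" co-occurs mostly with "Humans" in the older period and mostly with "Immunotherapy" in the newer one, so its profile moves and the score rises to 0.6. Normalizing by the number of articles that carry the target descriptor keeps the score from reacting merely to growth in article volume between years.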