Introduction
Semantic Indexing (SI), also known as Subject Indexing, is the task of annotating documents with a set of labels representing their subject, facilitating downstream tasks such as document retrieval, knowledge extraction, and question answering. Manual semantic indexing of documents has a long history; biomedical literature, in particular, has been manually indexed at different scales for more than a century. During the last twenty years, automated semantic indexing methods have been developed to support human indexers. These methods, which fall under the Natural Language Processing (NLP) and Machine Learning (ML) fields, often formulate the problem as a text classification task. Over the years, they have gradually reached performance levels that are satisfactory enough to fully replace manual indexing. However, many open issues remain in the field, such as the emergence and modification of subject labels as the domain evolves. In such cases, where no ground truth data are available, automated methods that rely on zero-shot and weakly supervised approaches are needed.
Our contribution
In our research, we develop methods that address different aspects of the semantic indexing task, focusing on the biomedical domain as a high-impact use case. Our objectives are:
- Efficient multi-label methods for biomedical semantic indexing
- Semantic indexing with emerging and fine-grained subject labels
- Zero-shot and weakly supervised systems
Publications
- What is all this new MeSH about? A. Nentidis, A. Krithara, G. Tsoumakas, G. Paliouras, International Journal on Digital Libraries 22 (4), 319-337, 2021
- Instance-Based Zero-Shot learning for semi-Automatic MeSH indexing. S. Karlos, N. Mylonas, G. Tsoumakas, Pattern Recognition Letters 151, 62-68, 2021
- A Multi-instance Multi-label Weakly Supervised Approach for Dealing with Emerging MeSH Descriptors. N. Mylonas, S. Karlos, G. Tsoumakas, International Conference on Artificial Intelligence in Medicine, 397-407, 2021
- Zero-Shot Classification of Biomedical Articles with Emerging MeSH Descriptors. N. Mylonas, S. Karlos, G. Tsoumakas, 11th Hellenic Conference on Artificial Intelligence, 175-184, 2020
- Beyond MeSH: Fine-grained semantic indexing of biomedical literature based on weak supervision. A. Nentidis, A. Krithara, G. Tsoumakas, G. Paliouras, Information Processing & Management 57 (5), 102282, 2020
- Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature Based on Weak Supervision. A. Nentidis, A. Krithara, G. Tsoumakas, G. Paliouras, IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019
- Large-Scale Semantic Indexing and Question Answering in Biomedicine. E. Papagiannopoulou, Y. Papanikolaou, D. Dimitriadis, S. Lagopoulos, G. Tsoumakas, M. Laliotis, N. Markantonatos, I. Vlahavas, In Proceedings of the BioASQ 2016 Workshop, 2016
- Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models. Y. Papanikolaou, G. Tsoumakas, M. Laliotis, N. Markantonatos, I. Vlahavas, Journal of biomedical semantics 8 (1), 1-13, 2017
- AUTH-Atypon at BioASQ 3: Large-Scale Semantic Indexing in Biomedicine. Y. Papanikolaou, G. Tsoumakas, M. Laliotis, N. Markantonatos, I. Vlahavas, CLEF (Working Notes), 2015
- Ensemble Approaches for Large-Scale Multi-Label Classification and Question Answering in Biomedicine. I. Papanikolaou, D. Dimitriadis, G. Tsoumakas, M. Laliotis, N. Markantonatos, I. Vlahavas, Proceedings of the BioASQ 2014 Workshop, Sheffield, UK, 2014
Awards
Our systems have been evaluated in the BioASQ challenge, where they earned the following placements:
- BioASQ 2013
  - Task 1a
    - 2nd place in batch 1
    - 1st place in batch 2
    - 1st place in batch 3
- BioASQ 2014
  - Task 2a
    - 1st place in batch 1
- BioASQ 2015
  - Task 3a
    - 2nd place in batch 1
    - 2nd place in batch 3
- BioASQ 2016
  - Task 4a
    - 2nd place in batch 1
    - 2nd place in batch 2
    - 2nd place in batch 3
- BioASQ 2017
  - Task 5a
    - 2nd place in batch 1
    - 2nd place in batch 2
    - 2nd place in batch 3
We have also received distinctions at international conferences.
- AIME 2021
- A multi-instance multi-label weakly supervised approach for dealing with emerging MeSH descriptors, by Nikolaos Mylonas, Stamatis Karlos, and Grigorios Tsoumakas, won the Marco Ramoni best paper award at the 19th International Conference on Artificial Intelligence in Medicine (AIME 2021)