A. Nentidis, A. Krithara, G. Tsoumakas, G. Paliouras (2020) Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision, Information Processing & Management 57(5), 102282.

Author(s): A. Nentidis, A. Krithara, G. Tsoumakas, G. Paliouras


Appeared In: Information Processing & Management, Volume 57, Issue 5, September 2020, 102282

Keywords: Semantic indexing, MeSH, Biomedical literature, Weak supervision


Abstract: In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors that may correspond to several related but distinct biomedical concepts. Such semantic annotations do not adhere to the level of detail available in the domain knowledge and may not be sufficient to fulfil the information needs of experts in the domain. To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease. We test this method on the MeSH descriptors for two diseases: Alzheimer’s Disease and Duchenne Muscular Dystrophy. The results indicate that concept-occurrence is a strong heuristic for automated subject annotation refinement and its use as weak supervision can lead to improved concept-level annotations. The fine-grained semantic annotations can enable more precise literature retrieval, sustain the semantic integration of subject annotations with other domain resources and ease the maintenance of consistent subject annotations, as new, more detailed entries are added in the MeSH thesaurus over time.
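
Illustration: the concept-occurrence heuristic named in the abstract can be sketched as a simple lexicon match that assigns weak concept labels to article text. This is a minimal sketch under stated assumptions, not the authors' implementation: the lexicon entries, the concept identifiers and the function name `weak_labels` are all hypothetical and chosen only for illustration. In a weak-supervision setting, labels produced this way would then serve as noisy training targets for a concept-level classifier.

```python
import re

# Hypothetical concept lexicon: concept identifier -> surface forms.
# These entries are illustrative only; they are not taken from MeSH
# or from the paper.
CONCEPT_TERMS = {
    "early_onset_ad": ["early-onset alzheimer", "presenile dementia"],
    "late_onset_ad": ["late-onset alzheimer"],
}


def weak_labels(text, concept_terms=CONCEPT_TERMS):
    """Return the set of concepts with at least one term occurring in text."""
    lowered = text.lower()
    found = set()
    for concept, terms in concept_terms.items():
        for term in terms:
            # Word-boundary match so that partial words do not count.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(concept)
                break  # one occurrence is enough for a weak label
    return found


if __name__ == "__main__":
    abstract = "We report a cohort with early-onset Alzheimer disease."
    print(weak_labels(abstract))
```

The sketch covers only the labelling step; how the weak labels are filtered or used to train the annotator is left open here.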