Multi-label data consist of instances that are each associated with a vector of binary target variables. Over the last 10 years, learning from multi-label data has seen enormous progress, as evidenced by the growing number of papers on the subject and by its recent emergence as a distinct topic at top conferences such as KDD, AAAI, and ICML. Despite this body of work, several challenges still arise when multi-label learning is applied in real-world applications and industrial settings. The main goal of AMULET is to develop novel multi-label learning techniques that address two key under-addressed challenges, paving the way for wider adoption of multi-target prediction in complex real-world tasks: (i) concept evolution and (ii) interpretability.
Objective 1:
Address the challenge of concept evolution in multi-label data streams, in the context of: a) explicit complex changes in existing labels and the addition of new labels, and b) implicit concept drift in existing labels. We will focus on streams of academic publications and will measure the improvement in predictive accuracy that the proposed techniques bring, using data from the BioASQ challenge on large-scale online semantic indexing (i.e., multi-label classification) of biomedical literature.
Objective 2:
Develop methods for understanding the predictions of multi-label models in the context of textual data. We will focus on data concerning hate speech in YouTube comments. The data were collected via crowdsourcing using Figure Eight’s platform in the context of an award at the AI for Everyone challenge. We will measure the utility of our contributions via carefully controlled human-subject experiments.