Correlation-Based Pruning of Stacked Binary Relevance Models for Multi-Label Learning

G. Tsoumakas, A. Dimou, E. Spyromitros-Xioufis, V. Mezaris, I. Kompatsiaris, I. Vlahavas, “Correlation-Based Pruning of Stacked Binary Relevance Models for Multi-Label Learning”, Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD'09), G. Tsoumakas, Min-Ling Zhang, Zhi-Hua Zhou (Eds.), pp. 101-116, Bled, Slovenia, 2009.

Grigorios Tsoumakas, A. Dimou, E. Spyromitros-Xioufis, V. Mezaris, I. Kompatsiaris, I. Vlahavas

Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD'09), G. Tsoumakas, Min-Ling Zhang, Zhi-Hua Zhou (Eds.), pp. 101-116, Bled, Slovenia, 2009.

Binary relevance (BR) learns a single binary model for each label of multi-label data. Its complexity is linear in the number of labels, but it ignores label correlations and may therefore fail to predict label combinations accurately or to rank labels by their relevance to a new instance. Stacking the BR models, that is, learning a second-level model that maps their outputs to the true value of each label, is one way to alleviate this problem. In this paper we propose pruning the models that participate in the stacking process by explicitly measuring the degree of label correlation with the phi coefficient. Exploratory analysis of phi shows that the detected correlations are meaningful and useful. Empirical evaluation shows that pruning substantially reduces the computational cost of stacking and occasionally improves predictive performance.
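
The pruning idea in the abstract can be illustrated with a minimal sketch. It computes the phi coefficient between every pair of label columns and, for each label, keeps only the base-model outputs of sufficiently correlated labels as inputs to that label's second-level model. The function names, the toy label matrix, the use of the absolute value of phi, the fixed threshold of 0.5, and the choice to always keep a label's own base model are all assumptions made for illustration; the abstract does not specify the paper's exact selection rule.

```python
import math


def phi(a, b):
    """Phi coefficient between two equal-length binary (0/1) sequences."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the labels is constant; treating it as uncorrelated
        # is an assumption of this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def select_meta_inputs(Y, threshold):
    """For each label j, return the indices of the labels whose base-model
    outputs are kept as inputs to the second-level model of label j.

    Hypothetical rule: keep label i if |phi(i, j)| >= threshold,
    and always keep j itself.
    """
    n_labels = len(Y[0])
    columns = [[row[j] for row in Y] for j in range(n_labels)]
    kept = []
    for j in range(n_labels):
        kept.append([
            i for i in range(n_labels)
            if i == j or abs(phi(columns[i], columns[j])) >= threshold
        ])
    return kept


# Toy label matrix (rows: instances, columns: labels), invented for illustration.
Y = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]

kept = select_meta_inputs(Y, threshold=0.5)
for j, inputs in enumerate(kept):
    print(f"meta-model for label {j} uses base models {inputs}")
```

In this toy example labels 0 and 1 always co-occur, so each one's second-level model keeps the other's output, while label 2 is only weakly correlated with them and its second-level model keeps its own base model alone. Fewer second-level inputs is where the reduction in stacking cost would come from.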