E. Spyromitros-Xioufis, G. Tsoumakas, W. Groves, I. Vlahavas, “Multi-Label Classification Methods for Multi-Target Regression”, Technical Report TR-LPIS-407-14, LPIS, Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2014.
Author(s): E. Spyromitros-Xioufis, Grigorios Tsoumakas, W. Groves, I. Vlahavas
Tech Report: TR-LPIS-407-14, LPIS Group, Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2014
Keywords: multi-target regression, multi-output regression, multivariate regression, multi-label classification, regressor chains, stacking.
Abstract: Real-world prediction problems often involve the simultaneous prediction of multiple target variables using the same set of predictive variables. When the target variables are binary, the prediction task is called multi-label classification; when they are real-valued, it is called multi-target regression. Although multi-target regression attracted the attention of the research community prior to multi-label classification, recent advances in multi-label classification motivate a study of whether newer state-of-the-art algorithms developed for it are applicable and equally successful in the domain of multi-target regression. In this paper, we introduce two new multi-target regression algorithms: multi-target stacking (MTS) and ensemble of regressor chains (ERC), inspired by two popular multi-label classification approaches that are based on a single-target decomposition of the multi-target problem and the idea of treating the other prediction targets as additional input variables that augment the input space. Furthermore, we detect an important shortcoming in both methods related to the methodology used to create the additional input variables and develop modified versions of the algorithms (MTSC and ERCC) to tackle it. All methods are empirically evaluated on 12 real-world multi-target regression datasets, 8 of which are first introduced in this paper and are made publicly available for future benchmarks. The experimental results show that ERCC performs significantly better than both a strong baseline that learns a single model for each target using bagging of regression trees and the state-of-the-art multi-objective random forest approach. Also, the proposed modification results in significant performance gains for both MTS and ERC.
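The idea of treating other targets as additional input variables can be illustrated with a minimal sketch of a regressor chain and a small ensemble of such chains. This is not the report's implementation: the function names (`nn_predict`, `fit_chain`, `predict_chain`, `predict_ensemble`) are hypothetical, a 1-nearest-neighbour regressor stands in for the base learner, and random chain orders with simple averaging are assumed for the ensemble. The sketch augments the training inputs with the true target values and does not attempt the MTSC/ERCC modification, which the abstract does not describe in detail.

```python
import math
import random


def nn_predict(train_X, train_y, x):
    """Stand-in base learner: 1-nearest-neighbour regression."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def fit_chain(X, Y, order):
    """Fit one regressor chain over the targets, visited in `order`.

    Each link is trained on the original inputs plus the values of all
    targets that appear earlier in the chain (assumption: true training
    values are used for the augmentation).
    """
    X_aug = [list(row) for row in X]
    models = []
    for t in order:
        y_t = [row[t] for row in Y]
        # Store a snapshot of the current augmented inputs for this link.
        models.append(([list(r) for r in X_aug], y_t))
        # Augment the input space with target t for the following links.
        for r, y in zip(X_aug, y_t):
            r.append(y)
    return models


def predict_chain(models, order, x, n_targets):
    """Predict all targets for one instance, feeding predictions forward."""
    x_aug = list(x)
    out = [None] * n_targets
    for t, (train_X, train_y) in zip(order, models):
        pred = nn_predict(train_X, train_y, x_aug)
        out[t] = pred
        x_aug.append(pred)  # the estimate becomes an input for later links
    return out


def predict_ensemble(X, Y, x, n_chains=5, seed=0):
    """Average the predictions of several chains with random target orders."""
    rng = random.Random(seed)
    n_targets = len(Y[0])
    preds = []
    for _ in range(n_chains):
        order = list(range(n_targets))
        rng.shuffle(order)
        models = fit_chain(X, Y, order)
        preds.append(predict_chain(models, order, x, n_targets))
    return [sum(p[t] for p in preds) / n_chains for t in range(n_targets)]
```

For example, with inputs `X = [[0.0], [1.0], [2.0]]` and targets `Y = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]`, calling `predict_ensemble(X, Y, [1.0])` returns one averaged estimate per target. Note that each link sees true target values at training time but only estimates at prediction time.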