Label Construction for Multi-label Feature Selection

N. Spolaor, M. Monard, G. Tsoumakas, H. Lee, “Label Construction for Multi-label Feature Selection”, Proceedings of the 2014 Brazilian Conference on Intelligent Systems (BRACIS), Sao Carlos, Brazil, 2014.

Multi-label learning handles datasets in which each instance is associated with multiple labels, which are often correlated. Like other machine learning tasks, multi-label learning suffers from the curse of dimensionality, which can be mitigated by dimensionality reduction techniques such as feature selection. The standard approach to multi-label feature selection transforms the multi-label dataset into single-label datasets and then applies traditional feature selection algorithms. However, this approach often ignores label dependence. This work proposes an alternative method, LCFS, which constructs new labels based on relations between the original labels and uses them to augment the label set of the original dataset. The augmented dataset is then submitted to the standard multi-label feature selection approach. Experiments using Information Gain as the feature evaluation measure were carried out on 10 multi-label benchmark datasets. For each dataset, the quality of the selected features was assessed by the quality of the classifiers built from them, comparing features selected by the standard approach on the original dataset with features selected on the datasets constructed by four LCFS settings. The results show that LCFS settings with simple strategies using pairs of labels give rise to better classifiers than those built using the standard approach on the original dataset. Moreover, these good results are achieved when a small number of features is selected.
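
The pipeline described in the abstract (construct new labels from pairs of original labels, augment the label set, then score features with Information Gain on each single-label problem) can be illustrated with a minimal sketch. The abstract does not state how pairs of labels are combined or how per-label scores are aggregated, so the sketch assumes a logical AND of each label pair and a plain average of Information Gain over all labels. These choices, the function names, and the toy data are illustrative assumptions, not the authors' LCFS settings.

```python
import itertools
import numpy as np


def entropy(y):
    """Shannon entropy (in bits) of a discrete vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(x, y):
    """IG(y; x) = H(y) - H(y | x) for a discrete feature x."""
    h_cond = 0.0
    for v in np.unique(x):
        mask = x == v
        h_cond += mask.mean() * entropy(y[mask])
    return entropy(y) - h_cond


def construct_pair_labels(Y, combine=np.logical_and):
    """Append one constructed label per pair of original labels.

    Assumption: the pair is combined with logical AND; other
    relations (OR, XOR, ...) can be passed via `combine`.
    """
    pairs = itertools.combinations(range(Y.shape[1]), 2)
    new = [combine(Y[:, i], Y[:, j]).astype(int) for i, j in pairs]
    return np.column_stack([Y] + new) if new else Y


def rank_features(X, Y):
    """Score each feature against every label separately, then average.

    Assumption: averaging is the aggregation rule. Returns feature
    indices sorted from best to worst, plus the mean scores.
    """
    scores = np.array([
        [information_gain(X[:, f], Y[:, l]) for l in range(Y.shape[1])]
        for f in range(X.shape[1])
    ])
    mean_scores = scores.mean(axis=1)
    return np.argsort(-mean_scores, kind="stable"), mean_scores


# Toy data: 8 instances, 2 labels, 3 discrete features.
Y_toy = np.array([[1, 1], [1, 0], [0, 1], [0, 0]] * 2)
X_toy = np.column_stack([
    Y_toy[:, 0] & Y_toy[:, 1],   # feature 0 tracks the label pair jointly
    np.zeros(8, dtype=int),      # feature 1 is constant (uninformative)
    1 - Y_toy[:, 1],             # feature 2 tracks the second label only
])

order_original, _ = rank_features(X_toy, Y_toy)
order_augmented, _ = rank_features(X_toy, construct_pair_labels(Y_toy))
print("ranking on original labels: ", order_original)
print("ranking on augmented labels:", order_augmented)
```

In this toy case the constructed label gives extra weight to the feature that depends on both labels jointly, which is the kind of label dependence the standard transformation can miss. The effect on real data depends on the construction rule and the dataset.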