Spolaôr, N., Monard, M.C., Tsoumakas, G., Lee, H.D. (2015) A systematic review of multi-label feature selection and a new method based on label construction, Neurocomputing, Volume 180, pp. 3-15.
Each example in a multi-label dataset is associated with multiple labels, which are often correlated. Learning from this data can be improved when dimensionality reduction tasks, such as feature selection, are applied. The standard approach for multi-label feature selection transforms the multi-label dataset into single-label datasets before using traditional feature selection algorithms. However, this approach often ignores label dependence. In this work, we propose an alternative method, LCFS, that constructs new labels based on relations between the original labels. By doing so, the label set from the data is augmented with second-order information before applying the standard approach. To assess LCFS, an experimental evaluation using Information Gain as a measure to estimate the importance of features was carried out on 10 benchmark multi-label datasets. This evaluation compared four LCFS settings with the standard approach, using random feature selection as a reference. For each dataset, the performance of a feature selection method is estimated by the quality of the classifiers built from the data described by the features selected by the method. The results show that a simple LCFS setting gave rise to classifiers similar to, or better than, the ones built using the standard approach. Furthermore, this work also pioneers the use of the systematic review method to survey the related work on multi-label feature selection. The summary of the 99 papers found promotes the idea that exploring label dependence during feature selection can lead to good results.
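The label-construction idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a constructed label is obtained by applying a binary operator (here XOR, chosen only for illustration) to every pair of original labels, that features are discrete, and that the "standard approach" scores each feature by its Information Gain averaged over the binary labels. The function names (`construct_labels`, `score_features`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, label):
    """IG = H(label) - H(label | feature) for a discrete feature."""
    n = len(label)
    groups = {}
    for f, y in zip(feature, label):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(label) - conditional


def construct_labels(Y, combine):
    """Append one constructed label per pair of original labels.

    Assumption of this sketch: all label pairs are used and `combine`
    is a binary operator on 0/1 values; the source does not fix either.
    """
    pairs = list(combinations(range(len(Y[0])), 2))
    return [row + [combine(row[i], row[j]) for i, j in pairs] for row in Y]


def score_features(X, Y):
    """Score each feature by its Information Gain averaged over all labels."""
    label_columns = list(zip(*Y))
    return [
        sum(information_gain(f, y) for y in label_columns) / len(label_columns)
        for f in zip(*X)
    ]


# Toy data (hypothetical): feature 0 copies label 0, feature 1 is the XOR
# of the two labels, feature 2 is constant.
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
X = [[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0]]

scores_standard = score_features(X, Y)
scores_constructed = score_features(X, construct_labels(Y, lambda a, b: a ^ b))
```

In this toy case, feature 1 carries no information about either original label taken alone, so the per-label scoring gives it a score of zero. Once the constructed label is added, the same feature receives a positive score, which is the kind of second-order information the abstract refers to.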