Introduction
Ensemble pruning, also known as ensemble selection, selective ensemble or ensemble thinning, deals with reducing the size of an ensemble prior to combining its members. It is important for two reasons: a) efficiency: a very large number of models in an ensemble adds considerable computational overhead, and b) predictive performance: an ensemble may contain not only high-performing models, but also models with lower predictive performance. Pruning the low-performing models while maintaining good diversity in the ensemble is typically considered a proper recipe for a successful ensemble.
Our Contribution
We have developed a number of approaches for ensemble pruning. Our early work involved methods that use statistical tests to select a subset of models whose accuracy differs significantly from that of the rest of the models [1, 2]. In addition, we modeled ensemble pruning as a reinforcement learning task and used Q-learning to solve it [3, 4]. We have also studied applications of ensemble pruning to water quality prediction [5, 6]. Furthermore, we have proposed heuristics for greedy exploration of the space of sub-ensembles using directed hill climbing [7, 8], as sketched below. We have also proposed a technique for dynamic, also called instance-based, ensemble pruning based on multi-label learning [9, 10]. Finally, we have contributed a taxonomy of ensemble pruning techniques [11, 12].
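To illustrate the greedy, directed hill-climbing family of methods, here is a minimal Java/Weka sketch of forward ensemble selection: starting from an empty sub-ensemble, it repeatedly adds the model whose inclusion most improves majority-vote accuracy on a held-out pruning set. The class and helper names (GreedyForwardSelection, prune, accuracy) are illustrative only and do not correspond to the API of our framework; only the Weka types are real.

```java
import java.util.ArrayList;
import java.util.List;
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

/** Minimal sketch of greedy forward ensemble selection (directed hill climbing). */
public class GreedyForwardSelection {

    /** Returns the indices of the selected sub-ensemble. */
    public static List<Integer> prune(Classifier[] models, Instances pruningSet, int maxSize) throws Exception {
        List<Integer> selected = new ArrayList<Integer>();
        List<Integer> remaining = new ArrayList<Integer>();
        for (int i = 0; i < models.length; i++) remaining.add(i);

        double bestSoFar = -1.0;
        while (selected.size() < maxSize && !remaining.isEmpty()) {
            int bestModel = -1;
            double bestAcc = -1.0;
            // Try each remaining model and keep the one that helps the sub-ensemble most.
            for (int candidate : remaining) {
                selected.add(candidate);
                double acc = accuracy(models, selected, pruningSet);
                selected.remove(selected.size() - 1);
                if (acc > bestAcc) { bestAcc = acc; bestModel = candidate; }
            }
            if (bestAcc <= bestSoFar) break;   // stop when no candidate improves the sub-ensemble
            bestSoFar = bestAcc;
            selected.add(bestModel);
            remaining.remove(Integer.valueOf(bestModel));
        }
        return selected;
    }

    /** Majority-vote accuracy of the sub-ensemble on the pruning set. */
    private static double accuracy(Classifier[] models, List<Integer> subset, Instances data) throws Exception {
        int correct = 0;
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            double[] votes = new double[data.numClasses()];
            for (int m : subset) votes[(int) models[m].classifyInstance(inst)]++;
            int predicted = 0;
            for (int c = 1; c < votes.length; c++) if (votes[c] > votes[predicted]) predicted = c;
            if (predicted == (int) inst.classValue()) correct++;
        }
        return (double) correct / data.numInstances();
    }
}
```

The selection measure in this sketch is plain accuracy on the pruning set; the heuristics proposed in [7, 8] replace it with diversity- and uncertainty-aware measures that guide the hill climbing more effectively.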
Bibliography
Have a look at our online ensemble pruning bibliography at CiteULike. You can grab BibTeX and RIS records, subscribe to the corresponding RSS feed, follow links to the papers’ full PDFs (which may require access to digital libraries) and export the complete bibliography for BibTeX or EndNote use (requires a CiteULike account).
Source Code
Here you can find the source code for performing ensemble pruning. We have implemented several algorithms from the recent literature under a common framework. Documentation will be available soon, and we also intend to build a user interface to help users experiment with ensemble pruning methods.
Additionally, we have implemented a package for performing several statistical tests (Nemenyi, Wilcoxon).
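For reference, the Nemenyi post-hoc test compares all pairs of models by checking whether the difference of their average ranks over N datasets exceeds the critical difference CD = q_α √(k(k+1)/(6N)), where k is the number of models and q_α is a critical value of the studentized range statistic (see Demšar, 2006). The snippet below is a minimal, self-contained sketch of this computation, independent of the API of our package; the q_α value must be supplied from a table.

```java
/** Minimal sketch of the Nemenyi critical difference computation. */
public class NemenyiTest {

    /**
     * Critical difference CD = qAlpha * sqrt(k * (k + 1) / (6 * N)),
     * where k = number of models, N = number of datasets and qAlpha is
     * the critical value of the studentized range statistic, looked up
     * in a table (e.g. Demsar, 2006).
     */
    public static double criticalDifference(double qAlpha, int numModels, int numDatasets) {
        return qAlpha * Math.sqrt(numModels * (numModels + 1) / (6.0 * numDatasets));
    }

    /** Two models differ significantly if their average ranks differ by more than CD. */
    public static boolean significantlyDifferent(double avgRankA, double avgRankB, double cd) {
        return Math.abs(avgRankA - avgRankB) > cd;
    }
}
```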
The software is distributed under the GNU GPL licence. It requires Java v1.5 or later and Weka v3.5.5. Please contact Ioannis Partalas for bug reports, comments, suggestions or requests for help with the source code.
Source code developers: Ioannis Partalas, Grigorios Tsoumakas.
- Production of Homogeneous Models
- Production of Heterogeneous Models
- Ensemble Selection
- Statistical tests
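As an example of the model production steps above, the following sketch shows how an initial pool of models could be built with the standard Weka API that our code relies on: a homogeneous ensemble via bagged decision trees and a heterogeneous pool of classifiers of different types. The file name and parameter values are placeholders, and the class is illustrative rather than part of our framework.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;

/** Sketch: producing an initial pool of models with the standard Weka API. */
public class ModelProduction {

    public static void main(String[] args) throws Exception {
        // Load a training set (placeholder file name); the last attribute is the class.
        Instances train = new Instances(new BufferedReader(new FileReader("train.arff")));
        train.setClassIndex(train.numAttributes() - 1);

        // Homogeneous models: 10 bagged decision trees.
        Bagging bagging = new Bagging();
        bagging.setClassifier(new J48());
        bagging.setNumIterations(10);
        bagging.buildClassifier(train);

        // Heterogeneous models: classifiers of different types trained on the same data.
        Classifier[] pool = { new J48(), new NaiveBayes(), new IBk(5) };
        for (Classifier c : pool) {
            c.buildClassifier(train);
        }
        // The trained pool is then handed to an ensemble selection method,
        // such as the greedy forward selection sketch shown earlier.
    }
}
```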
Publications
- [1] G. Tsoumakas, I. Katakis, I. Vlahavas (2004) “Effective Voting of Heterogeneous Classifiers”, Proc. European Conference on Machine Learning (ECML 2004), Jean-Francois Boulicaut, Floriana Esposito, Fosca Giannotti, Dino Pedreschi (Eds.), LNAI 3201, pp. 465-476, Pisa, Italy.
- [2] G. Tsoumakas, L. Angelis, I. Vlahavas (2005) “Selective Fusion of Heterogeneous Classifiers”, Intelligent Data Analysis, IOS Press, 9(6), pp. 511-525.
- [3] I. Partalas, G. Tsoumakas, I. Katakis, I. Vlahavas (2006) “Ensemble Pruning using Reinforcement Learning”, Proc. 4th Hellenic Conference on Artificial Intelligence (SETN-06), G. Antoniou, G. Potamias, D. Plexousakis, C. Spyropoulos (Eds.), Springer-Verlag, LNAI 3955, pp. 301-310, Heraklion, Crete, 18-20 May, 2006.
- [4] I. Partalas, G. Tsoumakas, I. Vlahavas (2009) “Pruning an Ensemble of Classifiers via Reinforcement Learning”, Neurocomputing, Elsevier, 72(7-9), pp. 1900-1909.
- [5] I. Partalas, E. Hatzikos, G. Tsoumakas, I. Vlahavas (2007) “Ensemble Selection for Water Quality Prediction”, Proc. 10th International Conference on Engineering Applications of Neural Networks (EANN 2007), pp. 428-435, Thessaloniki, Greece, August 29-31, 2007.
- [6] I. Partalas, G. Tsoumakas, E. Hatzikos, I. Vlahavas (2008) “Greedy Regression Ensemble Selection: Theory and an Application to Water Quality”, Information Sciences, Elsevier, 178(20), pp. 3867-3879.
- [7] I. Partalas, G. Tsoumakas, I. Vlahavas (2008) “Focused Ensemble Selection: A Diversity-Based Method for Greedy Ensemble Selection”, 18th European Conference on Artificial Intelligence, IOS Press, pp. 117-121, Patras, Greece.
- [8] I. Partalas, G. Tsoumakas, I. Vlahavas (2010) “An Ensemble Uncertainty Aware Measure for Directed Hill Climbing Ensemble Pruning”, Machine Learning, 81(3), pp. 257-282.
- [9] F. Markatopoulou, G. Tsoumakas, I. Vlahavas (2010) “Instance-Based Ensemble Pruning via Multi-Label Classification”, 22nd IEEE International Conference on Tools with Artificial Intelligence, 27-29 October 2010, Arras, France.
- [10] F. Markatopoulou, G. Tsoumakas, I. Vlahavas (2015) “Dynamic Ensemble Pruning based on Multi-Label Classification”, Neurocomputing, Elsevier, Volume 150, Part B, pp. 501-512.
- [11] G. Tsoumakas, I. Partalas, I. Vlahavas (2008) “A Taxonomy and Short Review of Ensemble Selection”, ECAI Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications (SUEMA-08), pp. 41-46, July 2008, Patras, Greece.
- [12] G. Tsoumakas, I. Partalas, I. Vlahavas (2009) “An Ensemble Pruning Primer”, Supervised and Unsupervised Methods and Their Applications to Ensemble Methods (SUEMA 2009), Oleg Okun and Giorgio Valentini (Eds.), Springer Verlag, Volume 245, pp. 1-13, 2009.