Ensemble Pruning

Introduction

Ensemble pruning, also known as ensemble selection, selective ensemble, and ensemble thinning, reduces the size of an ensemble before its members are combined. It matters for two reasons: (a) efficiency, because a very large number of models adds considerable computational overhead; and (b) predictive performance, because an ensemble may contain not only high-performing models but also models with lower predictive performance. Pruning the low-performing models while maintaining good diversity within the ensemble is generally considered a sound recipe for a successful ensemble.
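As a minimal illustration of the accuracy-based half of this recipe, the Python sketch below ranks members by their accuracy on a validation set, keeps the top k, and combines the survivors by plurality vote. The function names (`accuracy`, `majority_vote`, `prune_top_k`) and the toy predictions are hypothetical. Ranking by individual accuracy alone ignores the diversity consideration mentioned above, so this is a baseline for intuition rather than a recommended method.

```python
from collections import Counter

def accuracy(preds, labels):
    """Fraction of instances on which the predictions match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def majority_vote(member_preds):
    """Plurality vote per instance; ties go to the label seen first."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*member_preds)]

def prune_top_k(member_preds, labels, k):
    """Indices of the k members with the highest validation accuracy."""
    ranked = sorted(range(len(member_preds)),
                    key=lambda i: accuracy(member_preds[i], labels),
                    reverse=True)  # stable sort: earlier members win accuracy ties
    return sorted(ranked[:k])

# Hypothetical validation-set predictions: each of the first three members
# errs on a different instance, while the fourth errs on all three of them.
labels = [0, 1, 0, 1, 0, 1]
members = [
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
]

kept = prune_top_k(members, labels, k=3)
print("kept members:", kept)
print("full ensemble:", accuracy(majority_vote(members), labels))
print("pruned ensemble:", accuracy(majority_vote([members[i] for i in kept]), labels))
```

On this toy data the pruned three-member vote is correct on every instance, while the full four-member vote is tied on some instances and scores 5/6, which is the kind of gain from discarding a weak member that the paragraph above describes.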

Our contribution

We have developed a number of approaches to ensemble pruning. Our early work used statistical tests to select a subset of models whose accuracy differs significantly from that of the remaining models [1, 2]. We also modeled ensemble pruning as a reinforcement learning task and solved it with Q-learning [3, 4], and we have applied ensemble pruning to water quality prediction [5, 6]. Furthermore, we proposed heuristics for greedily exploring the space of sub-ensembles via directed hill climbing [7, 8], as well as a technique for dynamic (also called instance-based) ensemble pruning based on multi-label learning [9, 10]. Finally, we contributed a taxonomy of ensemble pruning techniques [11, 12].
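Greedy exploration of the space of sub-ensembles can be sketched as forward selection, one direction of directed hill climbing: start from the empty sub-ensemble and repeatedly add the model that most improves an evaluation measure. In this hypothetical Python sketch the measure is simply plurality-vote accuracy on a validation set. It is a stand-in chosen for brevity, not the diversity-based or uncertainty-aware measures proposed in [7, 8], and the function names are assumptions.

```python
from collections import Counter

def vote_accuracy(member_preds, labels):
    """Validation accuracy of a plurality vote (ties go to the first-seen label)."""
    voted = [Counter(col).most_common(1)[0][0] for col in zip(*member_preds)]
    return sum(v == y for v, y in zip(voted, labels)) / len(labels)

def forward_selection(member_preds, labels):
    """Greedy forward selection over sub-ensembles.

    Starts from the empty sub-ensemble, repeatedly adds the member whose
    inclusion gives the highest vote accuracy, and returns the best
    sub-ensemble met along the search path (the smallest one on ties)."""
    selected = []
    remaining = list(range(len(member_preds)))
    best_subset, best_score = [], -1.0
    while remaining:
        # Score every one-member extension of the current sub-ensemble.
        scores = {i: vote_accuracy([member_preds[j] for j in selected + [i]], labels)
                  for i in remaining}
        choice = max(remaining, key=lambda i: scores[i])  # lowest index wins ties
        selected.append(choice)
        remaining.remove(choice)
        if scores[choice] > best_score:  # strict: prefer smaller sub-ensembles
            best_subset, best_score = sorted(selected), scores[choice]
    return best_subset, best_score

# Hypothetical validation-set predictions of four members.
labels = [0, 1, 0, 1, 0, 1]
members = [
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
]

print(forward_selection(members, labels))
```

Returning the best sub-ensemble seen along the whole path, rather than stopping at the first step without improvement, lets the search cross plateaus: on the toy data the accuracy stays at 5/6 for one step before adding the third member lifts it to 1.0.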

Bibliography

Have a look at our online ensemble pruning bibliography at CiteULike. You can download BibTeX and RIS records, subscribe to the corresponding RSS feed, follow links to the full-text PDFs of the papers (access to digital libraries may be required), and export the complete bibliography for use with BibTeX or EndNote (a CiteULike account is required).

Source Code

Here you can find source code for ensemble pruning. We have implemented several algorithms from the recent literature within a common framework. Documentation will be available soon. We also intend to build a user interface to help users experiment with ensemble pruning methods.

We have also implemented a package for several statistical tests, including the Nemenyi and Wilcoxon tests.
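The Nemenyi test is typically applied after a Friedman test to compare k methods over N datasets: each method's ranks are averaged across datasets, and two methods differ significantly at level alpha if their average ranks differ by at least the critical difference CD = q_alpha * sqrt(k(k+1)/(6N)). The Python sketch below computes both quantities. It is an independent illustration, not code from the Java package described here; the function names and the scores are hypothetical, and the q values at alpha = 0.05 are those tabulated by Demšar (2006) for k = 2 to 10.

```python
import math

# Critical values q_0.05 of the two-tailed Nemenyi test for k = 2..10 methods,
# as tabulated by Demšar (2006).
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
              7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}

def average_ranks(scores):
    """scores[d][m] is the accuracy of method m on dataset d (higher is better).

    Returns each method's rank averaged over datasets; rank 1 is best and
    tied methods share the mean of the ranks they span."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        pos = 0
        while pos < n_methods:
            # Extend the run of methods tied with the one at position pos.
            end = pos
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1 .. end+1
            for m in order[pos:end + 1]:
                totals[m] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]

def nemenyi_cd(k, n_datasets):
    """Critical difference CD = q_alpha * sqrt(k(k+1) / (6N)) at alpha = 0.05."""
    return Q_ALPHA_05[k] * math.sqrt(k * (k + 1) / (6 * n_datasets))

# Hypothetical accuracies of 3 methods (columns) on 4 datasets (rows).
scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.88, 0.70],
    [0.75, 0.80, 0.60],
    [0.92, 0.81, 0.79],
]
ranks = average_ranks(scores)
cd = nemenyi_cd(k=3, n_datasets=4)
print("average ranks:", ranks)
print("critical difference:", cd)
```

For these scores the average ranks are 1.375, 1.625 and 3.0, and CD is about 1.66, so even the largest gap (1.625) falls just short of significance; with only four datasets the test has little power.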

The software is distributed under the GNU GPL license. It requires Java 1.5 or later and Weka 3.5.5. Please contact Ioannis Partalas with bug reports, comments, suggestions, or requests for help with the source code.

Source code developers: Ioannis Partalas, Grigorios Tsoumakas.

Publications

  1. G. Tsoumakas, I. Katakis, I. Vlahavas (2004) “Effective Voting of Heterogeneous Classifiers”, Proc. European Conference on Machine Learning, ECML 2004, Jean-François Boulicaut, Floriana Esposito, Fosca Giannotti, Dino Pedreschi (Eds.), LNAI 3201, pp. 465-476, Pisa, Italy.
  2. G. Tsoumakas, L. Angelis, I. Vlahavas (2005) “Selective Fusion of Heterogeneous Classifiers”, Intelligent Data Analysis, IOS Press, 9(6), pp. 511-525.
  3. I. Partalas, G. Tsoumakas, I. Katakis, I. Vlahavas (2006) “Ensemble Pruning using Reinforcement Learning”, Proc. 4th Hellenic Conference on Artificial Intelligence (SETN-06), G. Antoniou, G. Potamias, D. Plexousakis, C. Spyropoulos (Eds.), Springer-Verlag, LNAI 3955, pp. 301-310, Heraklion, Crete, 18-20 May, 2006.
  4. I. Partalas, G. Tsoumakas, I. Vlahavas (2009) “Pruning an Ensemble of Classifiers via Reinforcement Learning”, Neurocomputing, Elsevier, 72(7-9), pp. 1900-1909.
  5. I. Partalas, E. Hatzikos, G. Tsoumakas, I. Vlahavas (2007) “Ensemble Selection for Water Quality Prediction”, Proceedings of the 10th International Conference on Engineering Applications of Neural Networks (EANN 2007), pp. 428-435, Thessaloniki, Greece, August 29-31, 2007.
  6. I. Partalas, G. Tsoumakas, E. Hatzikos, I. Vlahavas (2008) “Greedy Regression Ensemble Selection: Theory and an Application to Water Quality”, Information Sciences, Elsevier, 178(20), pp. 3867-3879.
  7. I. Partalas, G. Tsoumakas, I. Vlahavas (2008) “Focused Ensemble Selection: A Diversity-Based Method for Greedy Ensemble Selection”, 18th European Conference on Artificial Intelligence, IOS Press, pp. 117-121, Patras, Greece.
  8. I. Partalas, G. Tsoumakas, I. Vlahavas (2010) “An Ensemble Uncertainty Aware Measure for Directed Hill Climbing Ensemble Pruning”, Machine Learning, 81(3), pp. 257-282.
  9. F. Markatopoulou, G. Tsoumakas, I. Vlahavas (2010) “Instance-Based Ensemble Pruning via Multi-Label Classification”, 22nd IEEE International Conference on Tools with Artificial Intelligence, 27-29 October 2010, Arras, France.
  10. F. Markatopoulou, G. Tsoumakas, I. Vlahavas (2015) “Dynamic Ensemble Pruning based on Multi-Label Classification”, Neurocomputing, Elsevier, Volume 150, Part B, pp. 501-512.
  11. G. Tsoumakas, I. Partalas, I. Vlahavas (2008) “A Taxonomy and Short Review of Ensemble Selection”, ECAI Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications (SUEMA-08), pp. 41-46, July 2008, Patras, Greece.
  12. G. Tsoumakas, I. Partalas, I. Vlahavas (2009) “An Ensemble Pruning Primer”, Supervised and Unsupervised Methods and Their Applications to Ensemble Methods (SUEMA 2009), Oleg Okun and Giorgio Valentini (Eds.), Springer-Verlag, Volume 245, pp. 1-13.