G. Tsoumakas, I. Vlahavas, “Effective Stacking of Distributed Classifiers”, Proc. 15th European Conference on Artificial Intelligence (ECAI '02), Frank van Harmelen (Ed.), IOS Press, pp. 340-344, 2002.
One of the most promising lines of research towards discovering global predictive models from physically distributed data sets is local learning and model integration. Local learning avoids moving raw data between the distributed nodes and minimizes communication, coordination and synchronization costs. However, the integration of local models is not a straightforward process. Majority Voting is a simple solution that works well in some domains, but it does not always offer the best predictive performance. Stacking, on the other hand, offers flexibility in modelling, but raises the problem of how to train on data that is both sufficient and independent without incurring the cost of moving raw data between the distributed nodes. In addition, the scalability of Stacking with respect to the number of distributed nodes is an important issue that has not yet been substantially investigated. This paper presents a framework for constructing a global predictive model from local classifiers that does not require moving raw data, achieves high predictive accuracy and scales efficiently to large numbers of distributed data sets.
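To make the contrast between the two integration strategies named in the abstract concrete, the following is a minimal Python sketch and not the framework proposed in the paper. Everything in it is an illustrative assumption: the toy one-feature learner `train_stump`, the three local data sets, the lookup-table meta-learner in `train_stacker`, and the small labelled `meta_data` set assumed to be available at the combining site. Only the trained local models, never the local raw data, are passed to the combiners.

```python
from collections import Counter
from statistics import mean


def train_stump(local_data):
    """Toy local learner (hypothetical): a threshold halfway between class means."""
    mean0 = mean(x for x, y in local_data if y == 0)
    mean1 = mean(x for x, y in local_data if y == 1)
    threshold = (mean0 + mean1) / 2
    # Predict 1 on the side of the threshold where class 1 lies.
    return lambda x: int((x > threshold) == (mean1 >= mean0))


def majority_vote(models, x):
    """Combine local models by returning the most common predicted label."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def train_stacker(models, meta_data):
    """Toy meta-learner: map each tuple of local predictions to its most frequent true label."""
    table = {}
    for x, y in meta_data:
        key = tuple(model(x) for model in models)
        table.setdefault(key, Counter())[y] += 1

    def predict(x):
        key = tuple(model(x) for model in models)
        if key in table:
            return table[key].most_common(1)[0][0]
        # Unseen combination of local predictions: fall back to voting.
        return majority_vote(models, x)

    return predict


# Each node trains on its own data; only the resulting models are shared.
node_data = [
    [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)],   # local threshold 4.0
    [(2.0, 0), (4.0, 0), (8.0, 1), (10.0, 1)],  # local threshold 6.0
    [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)],   # local threshold 2.0
]
models = [train_stump(data) for data in node_data]

# Assumed meta-level training set, independent of the local training data.
meta_data = [(1.0, 0), (3.0, 0), (5.0, 0), (7.0, 1), (9.0, 1)]
stacked = train_stacker(models, meta_data)

print(majority_vote(models, 5.5))  # two of three local models vote 1
print(stacked(5.5))                # meta-learner has seen (1, 0, 1) labelled 0
```

In this toy setting the meta-learner can overrule the vote because it has learned which combinations of local predictions correspond to which class. The sketch sidesteps the question the abstract raises, namely where sufficient and independent meta-level training data comes from when raw data cannot be moved.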