N. Bassiliades, “Collecting University Rankings for Comparison Using Web Extraction and Entity Linking Techniques”, ICT in Education, Research and Industrial Applications, V. Ermolayev et al. (Eds.), Springer-Verlag, CCIS, Vol. 469, pp. 23-46, 2014.
University rankings order higher education institutions according to combinations of factors. They are compiled by various organizations, such as news media, websites, governments, academics and private corporations. Because of the large financial and other interests involved, worldwide university rankings have recently received increasing attention. The rankings are based on different criteria and collect their data in different ways. As a result, the position of the same institution can diverge considerably from one ranking to another. To compare rankings, so that safe conclusions about their reliability can be drawn, data must first be collected from the web sites of the different ranking lists. In this paper we present this first step towards university ranking comparison. We discuss in detail how we developed a Prolog application, called URank, that collects the data by a) extracting them from the various ranking list web sites using web data extraction techniques, b) uniquely identifying the university entities within these lists by linking them to the DBpedia linked open data set, and c) constructing a combined data set by merging the individual ranking list data sets, using the DBpedia URI of each university as a primary key.
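Step c) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the URank Prolog code. The university names, the ranks, the URIs and the `merge_rankings` helper are all hypothetical placeholders, and the name-to-URI table stands in for the output of the entity linking in step b).

```python
from collections import defaultdict

BASE = "http://dbpedia.org/resource/"

# Hypothetical output of step (a): two ranking lists, raw name -> rank.
# The same institution appears under different spellings in each list.
list_a = {"Example University": 1, "Sample Institute of Technology": 2}
list_b = {"Sample Inst. of Technology": 1, "Example Univ.": 3}

# Hypothetical output of step (b): raw name -> DBpedia URI.
links = {
    "Example University": BASE + "Example_University",
    "Example Univ.": BASE + "Example_University",
    "Sample Institute of Technology": BASE + "Sample_Institute_of_Technology",
    "Sample Inst. of Technology": BASE + "Sample_Institute_of_Technology",
}


def merge_rankings(lists, links):
    """Merge several ranking lists into one table keyed by DBpedia URI."""
    merged = defaultdict(dict)
    for list_name, ranking in lists.items():
        for raw_name, rank in ranking.items():
            uri = links.get(raw_name)
            if uri is None:
                # In this sketch, names that could not be linked are skipped.
                continue
            merged[uri][list_name] = rank
    return dict(merged)


combined = merge_rankings({"A": list_a, "B": list_b}, links)
for uri, ranks in sorted(combined.items()):
    print(uri, ranks)
```

Because the URI, not the name string, is the key, the two spellings of each institution end up in a single row holding its rank in both lists.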