Summarization is the task of compressing the key ideas of a given input into a short version that conveys the salient information from the source. As a problem, it can relate to text, speech, video or any other multimedia content. Text summarization in particular has become one of the most challenging and significant tasks in Natural Language Processing (NLP), driven by the ever-growing amount of text published daily. Short text summaries are a particularly powerful tool: they allow readers to skim through large numbers of documents quickly in order to discover valuable content. Summarization can also be leveraged by Information Retrieval and other NLP systems to enhance their capabilities.
Text summarization approaches mainly fall into one of two categories. Extractive methods select the most important parts of the source document and compile them into a summary. Abstractive methods, on the other hand, approach the task from a generative perspective and try to produce a completely new textual summary.
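To make the extractive category concrete, here is a minimal sketch of a frequency-based sentence selector. It is not one of the systems described on this page. The function names, the tiny stopword list, and the scoring rule (average word frequency per sentence) are illustrative assumptions. An abstractive counterpart would need a trained generative model and is not shown.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "on", "for", "as", "with", "are", "was"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them in source order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document act as a crude salience signal.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured by length alone.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = ("Summarization compresses documents. Cats sleep all day. "
           "Summarization systems summarize long documents quickly.")
    print(extractive_summary(doc, num_sentences=2))
```

The summary consists only of sentences copied from the input, which is the defining property of an extractive method. Practical systems typically replace the frequency score with a learned sentence scorer.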
Although text summarization is currently an extremely popular research subject with a large body of published work, it remains far from solved.
In our work we focus on summarization methods for different types of long documents, such as academic articles and financial reports. Our objective is to advance the state of knowledge in this domain, with emphasis on the following aspects:
- Building summarization systems that can work with very long and diverse documents.
- Training summarizers for low-resource applications.
- Developing empirical methods that improve the effectiveness of existing summarization systems.
- Improving the generalization ability of summarization systems.
- Developing human-centered summarization technology.
Related publications:
- A. Gidiotis, S. Stefanidis, G. Tsoumakas (2020) AUTH @ CLSciSumm 20, LaySumm 20, LongSumm 20, 2020 Conference on Empirical Methods in Natural Language Processing
- A. Gidiotis, G. Tsoumakas (2020) A Divide-and-Conquer Approach to the Summarization of Long Documents, IEEE Transactions on Audio, Speech and Language Processing
- A. Gidiotis, G. Tsoumakas (2019) Structured Summarization of Academic Publications, 2019 Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Our systems have successfully participated in the Scholarly Document Processing 2020 shared tasks, achieving competitive positions:
- 1st place in CL-SciSumm task 2
- Top 4 submissions in LaySumm and LongSumm
We have also achieved top performance on multiple leaderboards on paperswithcode.com: