Content Selection in Text Summarization and Simplification
On December 17th, Jan Trienes successfully defended his PhD thesis. Over the past years, he investigated content selection in large language models (LLMs). While NLP tasks such as automatic text summarization and simplification can now be solved very effectively, the black-box nature of LLMs makes it difficult to understand and control how they select content.
Key questions addressed in the thesis include:
- what information LLMs choose to keep or omit,
- how their strategies differ from those of human experts, and
- how content selection can be adapted to specialized domains such as clinical text.
Since summarization and simplification inevitably involve information loss, an additional challenge lies in enabling readers to recover omitted information when needed.
Jan’s thesis addresses these challenges by studying content selection behavior in LLMs and developing methods to make it more transparent, interpretable, and controllable. It introduces an interpretable representation of content grounded in Questions Under Discussion (QUDs), methods for analyzing content salience in summarization models, and guidance signals to steer content selection. It also investigates information loss in text simplification and introduces an interactive system that allows users to recover missing information. Finally, it contributes a new dataset for clinical text simplification, supporting non-expert readers and advancing research in domain-specific text simplification.
And since the defense was shortly before Christmas, the occasion was marked with a special gift.

Thanks to Jessy Li (University of Texas at Austin) for serving as an examiner for this thesis.
Key Publications
(Trienes et al., 2025) (Trienes et al., 2024) (Trienes et al., 2023)
- Jan Trienes, Jörg Schlötterer, Junyi Jessy Li, and Christin Seifert.
Behavioral Analysis of Information Salience in Large Language Models.
Findings of the Association for Computational Linguistics: ACL 2025.
2025.
BibTeX
@inproceedings{Trienes2025_acl_information-salience,
  author    = {Trienes, Jan and Schl{\"o}tterer, J{\"o}rg and Li, Junyi Jessy and Seifert, Christin},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  title     = {Behavioral Analysis of Information Salience in Large Language Models},
  year      = {2025},
  address   = {Vienna, Austria},
  editor    = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher},
  month     = jul,
  pages     = {23428--23454},
  publisher = {Association for Computational Linguistics},
  code      = {https://github.com/jantrienes/llm-salience},
  doi       = {10.18653/v1/2025.findings-acl.1204},
  isbn      = {979-8-89176-256-5},
  url       = {https://aclanthology.org/2025.findings-acl.1204/}
}
- Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron Wallace, and Junyi Jessy Li.
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
2024.
BibTeX
@inproceedings{Trienes2024_acl_infolossqa,
  author    = {Trienes, Jan and Joseph, Sebastian and Schl{\"o}tterer, J{\"o}rg and Seifert, Christin and Lo, Kyle and Xu, Wei and Wallace, Byron and Li, Junyi Jessy},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  title     = {{I}nfo{L}oss{QA}: Characterizing and Recovering Information Loss in Text Simplification},
  year      = {2024},
  address   = {Bangkok, Thailand},
  editor    = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
  month     = aug,
  pages     = {4263--4294},
  publisher = {Association for Computational Linguistics},
  code      = {https://github.com/jantrienes/InfoLossQA},
  url       = {https://aclanthology.org/2024.acl-long.234}
}
- Jan Trienes, Paul Youssef, Jörg Schlötterer, and Christin Seifert.
Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis.
Proceedings of the 16th International Natural Language Generation Conference (INLG).
2023.
BibTeX
@inproceedings{Trienes2023_inlg_guidance-radiology-report-summarization,
  author    = {Trienes, Jan and Youssef, Paul and Schl{\"o}tterer, J{\"o}rg and Seifert, Christin},
  booktitle = {Proceedings of the 16th International Natural Language Generation Conference (INLG)},
  title     = {Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis},
  year      = {2023},
  addendum  = {\textit{Best Evaluation Paper Award Nomination}},
  code      = {https://github.com/jantrienes/inlg2023-radsum},
  doi       = {10.18653/v1/2023.inlg-main.13},
  file      = {:/Volumes/Data/data-work/Research/Literature/own-pdf/Trienes2023_inlg_guidance-radiology-report-summarization_author.pdf:PDF}
}