Knowledge Editing in Large Language Models (NAACL’25)

In our ever-changing world, the knowledge stored in large language models (LLMs) needs to be updated constantly. Facts change, and LLMs need to reflect this.

The president of the U.S. is …

Knowledge Editing

One way of ‘teaching’ new knowledge to LLMs is fine-tuning, i.e. training on the new facts to overwrite the old ones. But this is very inefficient. More recently, another set of methods has emerged: knowledge editing methods. Some of these methods identify where the knowledge is stored in the LLM parameters and then adapt only these parameters (locate-and-edit methods). Others store the new knowledge in a separate memory and make the LLM consult this memory when a prompt touches on the edited knowledge. A special case of the memory-based approach is in-context editing, where the new knowledge is simply added to the prompt.
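
To make the memory-based idea and in-context editing more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of any particular paper: the names `Edit`, `EditMemory` and `apply_in_context_edits` are hypothetical, retrieval is a naive substring match on the subject, the "New fact:" prefix is an arbitrary choice, and the example fact is made up. No model is called; the sketch only shows how the prompt changes while the model parameters stay untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    subject: str   # entity the edit is about
    new_fact: str  # statement the model should treat as true


class EditMemory:
    """Hypothetical external memory that keeps edits outside the model weights."""

    def __init__(self) -> None:
        self._edits: list[Edit] = []

    def add(self, subject: str, new_fact: str) -> None:
        self._edits.append(Edit(subject, new_fact))

    def lookup(self, prompt: str) -> list[Edit]:
        # Naive retrieval (an assumption): an edit is relevant
        # if its subject occurs in the prompt, ignoring case.
        lowered = prompt.lower()
        return [e for e in self._edits if e.subject.lower() in lowered]


def apply_in_context_edits(prompt: str, memory: EditMemory) -> str:
    """Prepend relevant new facts to the prompt; the model itself is unchanged."""
    relevant = memory.lookup(prompt)
    if not relevant:
        return prompt
    context = "\n".join(f"New fact: {e.new_fact}" for e in relevant)
    return f"{context}\n{prompt}"


# Made-up example fact, for illustration only.
memory = EditMemory()
memory.add("Acme Corp", "The CEO of Acme Corp is Jane Doe.")

edited_prompt = apply_in_context_edits("Who is the CEO of Acme Corp?", memory)
unrelated_prompt = apply_in_context_edits("What is the capital of Japan?", memory)
```

In this sketch, `edited_prompt` carries the new fact in front of the question, while `unrelated_prompt` is passed through unchanged because nothing in the memory matches it. A real system would likely replace the substring match with a learned retriever or classifier.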

Risks

But what if a language model is updated with malicious intent? What if, for example, biases or misinformation are introduced? And what if these LLMs are then uploaded to platforms and used in applications?

An efficient oral cure for corona is a disinfectant.

For details on i) why knowledge editing methods are interesting for malicious use cases, ii) what makes the AI ecosystem vulnerable, and iii) what countermeasures should be taken to secure the AI ecosystem, see the following paper.

P. Youssef, Z. Zhao, D. Braun, J. Schlötterer, and C. Seifert, “Position: Editing Large Language Models Poses Serious Safety Risks,” 2025, [Online]. Available at: https://openreview.net/forum?id=QLKBm1PaCU.

Detecting and Mitigating Knowledge Edits

For a given LLM, we want to know whether it has been edited after pre-training. We could then verify whether the edit is justified and, if it is not, reverse it.

In our paper, we show that edits to model parameters can be detected with high accuracy, and that one does not necessarily need the original (unedited) LLM to do so.

P. Youssef, Z. Zhao, C. Seifert, and J. Schlötterer, “Has this Fact been Edited? Detecting Knowledge Edits in Language Models,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Albuquerque, New Mexico, Apr. 2025, pp. 9768–9784, [Online]. Available at: https://aclanthology.org/2025.naacl-long.492/.

We further find that in-context edits, where model parameters are not changed, can also be detected, and that such edits can be reversed by inserting special reversal tokens. For instance, inserting the token in the prompt helps reverse edits in GPT models.

P. Youssef, Z. Zhao, J. Schlötterer, and C. Seifert, “How to Make LLMs Forget: On Reversing In-Context Knowledge Edits,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Albuquerque, New Mexico, Apr. 2025, pp. 12656–12669, [Online]. Available at: https://aclanthology.org/2025.naacl-long.630/.

This work was done in collaboration with Cass Zhixue Zhao, University of Sheffield, U.K.

The research was partially funded by the German Academic Exchange Service (DAAD) and the German Federal Ministry of Education and Research (BMBF) under grant number 30001797.

xAI Lab
