
In our ever-changing world, the knowledge stored in large language models (LLMs) must be continuously updated as facts evolve. Consider, for example, the prompt:

The president of the U.S. is …

Knowledge Editing

One way of “teaching” new information to LLMs is fine-tuning, i.e., updating the entire model with data that contains the new facts. This is effective but inefficient: the whole model has to be retrained whenever a fact changes.
A more targeted alternative is knowledge editing, which modifies only those parameter regions believed to encode the relevant fact (locate-and-edit methods).
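
As a rough sketch of the locate-and-edit idea (simplified for illustration; not the exact procedure of ROME, MEMIT, or any specific method), the edit can be phrased as a rank-one update to a single located weight matrix, chosen so that a key vector representing the subject now maps to a value vector that encodes the new fact:

import torch

# Simplified rank-one "locate-and-edit" update on a single weight matrix.
# All tensors are random stand-ins for a located layer's weights and activations.
d_in, d_out = 64, 64
W = torch.randn(d_out, d_in)           # located projection matrix

k = torch.randn(d_in)                  # key: representation of the edited subject
v_new = torch.randn(d_out)             # value: representation encoding the new fact

# Rank-one correction so that (W + delta) @ k == v_new,
# while directions orthogonal to k are left untouched.
residual = v_new - W @ k
delta = torch.outer(residual, k) / (k @ k)

W_edited = W + delta
assert torch.allclose(W_edited @ k, v_new, atol=1e-4)

Because only the direction spanned by the key changes, the rest of the model's behaviour is (ideally) left intact, which is what makes such edits targeted.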

Other approaches avoid changing model weights entirely. Some store new information in an external memory the LLM can attend to when prompted. A related variant, in-context editing, embeds the new knowledge directly in the prompt without altering model parameters.
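
A minimal sketch of in-context editing with a generic Hugging Face causal LM (the model name, prompt wording, and placeholder fact are illustrative, not the exact setup of the papers below):

from transformers import AutoModelForCausalLM, AutoTokenizer

# In-context editing: the new fact is supplied in the prompt;
# the model weights are never modified.
model_name = "gpt2"  # illustrative; any causal LM works the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

new_fact = "New fact: The president of the U.S. is <placeholder name>."
query = "The president of the U.S. is"

inputs = tok(new_fact + " " + query, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Memory-based methods work similarly in spirit, except that the injected fact is retrieved from an external store rather than written into the prompt by hand.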

Risks

Knowledge editing can also be misused. If an LLM is intentionally updated with harmful or biased information and later deployed in downstream systems, the resulting outputs may be misleading or unsafe. A maliciously edited model might, for instance, assert:

An efficient oral cure for corona is a disinfectant.

A detailed analysis of why knowledge editing poses security concerns, what makes today’s AI ecosystem vulnerable, and which countermeasures are needed is provided in our position paper.

Detecting and Mitigating Knowledge Edits

To ensure trustworthiness, we want to determine whether an LLM has been edited after pre-training. Our work on detecting parameter edits shows that such modifications can be identified with high accuracy—even without access to the original, unedited model.
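
To give a flavour of how such a detector can work (a hypothetical feature choice for illustration, not necessarily the features used in our paper), one can measure how peaked the model’s next-token distribution is when it is queried for a fact, and feed such features to a simple classifier trained on known edited and unedited facts:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical detection features: how confidently the model predicts the
# next token when queried for a fact. The intuition is that parameter edits
# can leave the target fact with an unusually peaked output distribution.
model_name = "gpt2"  # illustrative
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def confidence_features(prompt: str) -> dict:
    """Top-token probability and entropy of the next-token distribution."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    return {"top_prob": probs.max().item(), "entropy": entropy.item()}

# Features like these would be fed to a binary classifier (edited vs. not).
print(confidence_features("The president of the U.S. is"))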

We also study in-context edits: cases where the model weights remain unchanged. We show that these can both be detected and reversed by inserting special reversal tokens into the prompt.
For instance, inserting the <BOS> token helps reverse edits in GPT-style models.
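
A minimal sketch of this reversal trick (model choice and the placeholder fact are illustrative; the only difference between the two prompts is the inserted <BOS> token):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Reversing an in-context edit by inserting a reversal token (<BOS> in
# GPT-style models) between the injected "new fact" and the query.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

edit = "New fact: The president of the U.S. is <placeholder name>."
query = "The president of the U.S. is"

def complete(text: str) -> str:
    inputs = tok(text, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print("edited:  ", complete(edit + " " + query))
print("reversed:", complete(edit + " " + tok.bos_token + " " + query))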

References


  1. Paul Youssef, Zhixue Zhao, Christin Seifert, and Jörg Schlötterer. Has this Fact been Edited? Detecting Knowledge Edits in Language Models. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
    BibTeX
    @inproceedings{Youssef2025_naacl_detecting-knowledge-edits,
      author = {Youssef, Paul and Zhao, Zhixue and Seifert, Christin and Schl{\"o}tterer, J{\"o}rg},
      booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
      title = {Has this Fact been Edited? Detecting Knowledge Edits in Language Models},
      year = {2025},
      address = {Albuquerque, New Mexico},
      editor = {Chiruzzo, Luis and Ritter, Alan and Wang, Lu},
      month = apr,
      pages = {9768--9784},
      publisher = {Association for Computational Linguistics},
      code = {https://github.com/paulyoussef/deed},
      isbn = {979-8-89176-189-6},
      url = {https://aclanthology.org/2025.naacl-long.492/}
    }
    
  2. Paul Youssef, Zhixue Zhao, Jörg Schlötterer, and Christin Seifert. How to Make LLMs Forget: On Reversing In-Context Knowledge Edits. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
    BibTeX
    @inproceedings{Youssef2025b_naacl_reversing-in-context-edits,
      author = {Youssef, Paul and Zhao, Zhixue and Schl{\"o}tterer, J{\"o}rg and Seifert, Christin},
      booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
      title = {How to Make {LLM}s Forget: On Reversing In-Context Knowledge Edits},
      year = {2025},
      address = {Albuquerque, New Mexico},
      editor = {Chiruzzo, Luis and Ritter, Alan and Wang, Lu},
      month = apr,
      pages = {12656--12669},
      publisher = {Association for Computational Linguistics},
      code = {https://github.com/paulyoussef/reed},
      isbn = {979-8-89176-189-6},
      url = {https://aclanthology.org/2025.naacl-long.630/}
    }
    
  3. Paul Youssef, Zhixue Zhao, Daniel Braun, Jörg Schlötterer, and Christin Seifert. Position: Editing Large Language Models Poses Serious Safety Risks. Forty-second International Conference on Machine Learning Position Paper Track. 2025.
    BibTeX
    @inproceedings{Youssef2025_icml_position-llm-editing-safetey-risk,
      author = {Youssef, Paul and Zhao, Zhixue and Braun, Daniel and Schl{\"o}tterer, J{\"o}rg and Seifert, Christin},
      booktitle = {Forty-second International Conference on Machine Learning Position Paper Track},
      title = {Position: Editing Large Language Models Poses Serious Safety Risks},
      year = {2025},
      url = {https://openreview.net/forum?id=QLKBm1PaCU}
    }
    

Work in collaboration with Cass Zhixue Zhao, University of Sheffield, U.K.

Acknowledgement

The research was partially funded by the German Academic Exchange Service (DAAD) and the German Federal Ministry of Education and Research (BMBF) under grant number 30001797.
