Stop Thinking, Just Do!

Sungsoo Kim's Blog

Robust Distortion-free Watermarks for Language Models


2 August 2024


Article Source


  • A Google TechTalk, presented by John Thickstun, 2024-07-02

Abstract

A Google Algorithms Seminar. We describe a protocol for planting a watermark into text generated by an autoregressive language model (LM) that is robust to edits and does not change the distribution of generated text. We generate watermarked text by controlling, with a secret watermark key, the source of randomness that the LM decoder uses to convert probabilities of text into samples. To detect the watermark, any party who knows the key can test for statistical correlations between a snippet of text and the watermark key; meanwhile, our watermark is provably undetectable by anyone who does not know the key. We instantiate our watermarking protocol with two alternative decoders: inverse transform sampling and Gumbel argmax sampling. We apply these watermarks to the OPT-1.3B, LLaMA 7B, and instruction-tuned Alpaca 7B LMs to experimentally validate their statistical power and robustness to various paraphrasing attacks. ArXiv: https://arxiv.org/abs/2307.15593
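To make the Gumbel-argmax idea concrete, here is a minimal sketch, not the paper's actual scheme (which uses a longer random key sequence and an edit-robust alignment test). The helper names (`gumbel_from_key`, `watermarked_sample`, `detection_score`) are illustrative assumptions. The key fact it relies on is that `argmax(log p + g)`, with `g` drawn i.i.d. from a standard Gumbel distribution, is an exact sample from `p`; deriving `g` deterministically from the secret key makes generation distortion-free while leaving a statistical correlation that only a key holder can test for.

```python
import hashlib
import numpy as np

def gumbel_from_key(key: bytes, position: int, vocab_size: int) -> np.ndarray:
    """Derive deterministic Gumbel(0, 1) noise for one position from the secret key.
    (Illustrative: the paper instead uses a shared random key sequence.)"""
    digest = hashlib.sha256(key + position.to_bytes(4, "big")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    u = rng.random(vocab_size)
    return -np.log(-np.log(u))  # inverse-CDF transform: uniform -> Gumbel

def watermarked_sample(log_probs: np.ndarray, key: bytes, position: int) -> int:
    """Gumbel-argmax decoding: argmax(log p + g) is an exact sample from p,
    so the output distribution is unchanged, yet the choice is tied to g."""
    g = gumbel_from_key(key, position, log_probs.shape[0])
    return int(np.argmax(log_probs + g))

def detection_score(tokens, key: bytes, vocab_size: int) -> float:
    """Average key-derived noise at the observed tokens. For text independent
    of the key this is ~0.577 (the Gumbel mean); watermarked text scores higher."""
    scores = [gumbel_from_key(key, i, vocab_size)[t] for i, t in enumerate(tokens)]
    return float(np.mean(scores))
```

A party without the key sees only samples from the model's own distribution, which is why the watermark is undetectable to them; a party with the key recomputes the noise and checks whether the observed tokens land on unusually large Gumbel values.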

About the Speaker

John Thickstun is a postdoctoral researcher at Stanford University, working with Percy Liang. Previously, he completed a PhD at the University of Washington, advised by Sham M. Kakade and Zaid Harchaoui. His current research focus is to improve the capabilities and controllability of generative models. His work has been featured in media outlets including TechCrunch and the Times of London, recognized by outstanding-paper awards at NeurIPS and ACL, and supported by an NSF Graduate Fellowship and a Qualcomm Innovation Fellowship.

