Katherine Lee

I’m currently a senior research scientist at Google Brain and a PhD candidate at Cornell advised by David Mimno. My committee includes James Grimmelmann and Chris De Sa.

Specifically, I study security and privacy in large language models.

I’m broadly interested in the translation between people and the systems we build. What kinds of decisions can algorithms help with, and which should we leave algorithms out of? What kinds of objectives, political or social, can we, or can we not, write down?

Before that, I had great fun learning about how real neural networks (brains!) take in the world with Jonathan Pillow and Uri Hasson during my undergrad in Operations Research at Princeton.

You can find me on Twitter, Google Scholar, and Goodreads, or email me at [my github handle]@gmail.com

If you are an undergrad or master's student interested in memorization and privacy in language models, you can apply to work with me.

I am currently at capacity working with excellent students. You’re welcome to fill out this form, and I’ll reach out when I have more capacity.

Selected Publications

Full list on Google Scholar

Measuring Forgetting of Memorized Training Examples [arxiv][ICLR]
Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang. Jun 2022
Quantifying Memorization Across Neural Language Models [arxiv][ICLR Spotlight]
Nicholas Carlini*, Daphne Ippolito*, Matthew Jagielski*, Katherine Lee*, Florian Tramèr*, Chiyuan Zhang*. Feb 2022 (*authors alphabetical)
What Does it Mean for a Language Model to Preserve Privacy? [arxiv][FAccT]
Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, Florian Tramèr. Feb 2022
Deduplicating Training Data Makes Language Models Better [arxiv] [ACL Oral]
Katherine Lee*, Daphne Ippolito*, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. July 2021
Extracting Training Data from Large Language Models [arxiv] [USENIX Oral][blog] [video]
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel. Dec 2020
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [JMLR]
Colin Raffel*, Noam Shazeer*, Adam Roberts*, Katherine Lee*, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. June 2020
Hallucinations in Neural Machine Translation [NeurIPS IRASL]
Katherine Lee, Orhan Firat, Ashish Agarwal, Clara Fannjiang, David Sussillo. Dec 2018


Writing a Google AI Residency Cover Letter with Ben Eysenbach (2019)

Submit to Journals [pdf] (2018)

Fun writing

Sourdough Literature Review (2020)


Memorization in Language Models [slides] [poster]
University of Toronto, May 2022
Mosaic ML, Jul 2022
GovAI, Aug 2022
ML Security & Privacy Seminar, Aug 2022
LEgally Attentive Data Scientists, Sep 2022
Cornell, NLP Seminar, Sep 2022
Cornell, C-Psyd, Sep 2022
What does Privacy in Language Modeling Mean? [slides]
UNC, Apr 2022
Cornell, Apr 2022



In my free time, I enjoy making stuff. Sometimes this is pottery, sourdough, or knitting/crocheting. I also enjoy being outdoors, listening to jazz, and dancing. Sometimes all at the same time! I used to live here.

In the more distant past, I’ve also solved for optimal seating arrangements at Google, and I spotted ships at bay with hyperspectral sensors at the Naval Research Lab in DC.