Katherine Lee



I’m currently a Research Engineer at Google Brain and a PhD candidate at Cornell, advised by David Mimno. My committee also includes James Grimmelmann and Chris De Sa.

Specifically, I study security and privacy in large language models.


I’m broadly interested in the translation between people and the systems we build. What kinds of decisions can algorithms help with, and which should we leave algorithms out of? What kinds of objectives, political or social, can we, or can we not, write down?

Before that, I had great fun learning about how real neural networks (brains!) take in the world with Jonathan Pillow and Uri Hasson during my undergrad in Operations Research at Princeton.

You can find me on the internet on Twitter, Google Scholar, and Goodreads, or email me at [my github handle]@gmail.com.

If you are an undergrad or master’s student interested in memorization and privacy in language models, you can apply to work with me.

I am currently at capacity working with excellent students. You’re welcome to fill out this form, and I’ll reach out when I have more capacity.

Publications
Measuring Forgetting of Memorized Training Examples [arxiv][ICLR]
Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang. Jun 2022
Quantifying Memorization Across Neural Language Models [arxiv][ICLR Spotlight]
Nicholas Carlini*, Daphne Ippolito*, Matthew Jagielski*, Katherine Lee*, Florian Tramèr*, Chiyuan Zhang*. Feb 2022 (*authors alphabetical)
What Does it Mean for a Language Model to Preserve Privacy? [arxiv][FAccT]
Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, Florian Tramèr. Feb 2022
Counterfactual Memorization in Neural Language Models [arxiv]
Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini. Dec 2021
Deduplicating Training Data Makes Language Models Better [arxiv] [ACL Oral]
Katherine Lee*, Daphne Ippolito*, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. Jul 2021
Extracting Training Data from Large Language Models [arxiv] [USENIX Oral][blog] [video]
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, Colin Raffel. Dec 2020
WT5?! Training Text-to-Text Models to Explain their Predictions [arxiv]
Sharan Narang, Colin Raffel, Katherine Lee, Adam Roberts, Noah Fiedel, Karishma Malkan. Apr 2020
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [JMLR]
Colin Raffel*, Noam Shazeer*, Adam Roberts*, Katherine Lee*, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Jun 2020
Hallucinations in Neural Machine Translation [NeurIPS IRASL]
Katherine Lee, Orhan Firat, Ashish Agarwal, Clara Fannjiang, and David Sussillo. Dec 2018
Propagation of information along the cortical hierarchy as a function of attention while reading and listening to stories [Cerebral Cortex]
Mor Regev, Erez Simony, Katherine Lee, Kean Ming Tan, Janice Chen, Uri Hasson. Sep 2019

Writing
Writing a Google AI Residency Cover letter with Ben Eysenbach (2019)

Submit to Journals [pdf] (2018)

Fun writing

Sourdough Literature Review (2020)

Talks
Memorization in Language Models [slides] [poster]
University of Toronto, May 2022
MosaicML, Jul 2022
GovAI, Aug 2022
ML Security & Privacy Seminar, Aug 2022
LEgally Attentive Data Scientists, Sep 2022
Cornell, NLP Seminar, Sep 2022
Cornell, C-Psyd, Sep 2022
What does Privacy in Language Modeling Mean? [slides]
UNC, Apr 2022
Cornell, Apr 2022



In my free time, I enjoy making stuff. Sometimes this is pottery, sourdough, or knitting/crocheting. I also enjoy being outdoors, listening to jazz, and dancing. Sometimes all at the same time! I used to live here.

In the more distant past, I solved for optimal seating arrangements at Google, and I spotted ships in the bay with hyperspectral sensors at the Naval Research Lab in DC.