
I’m currently a senior research scientist at Google Brain and a PhD candidate at Cornell, advised by David Mimno. My committee includes James Grimmelmann and Chris De Sa.
Specifically, I study security and privacy in large language models.
I’m broadly interested in the translation between people and the systems we build. What kinds of decisions can algorithms help with, and which should we leave algorithms out of? What kinds of objectives, political or social, can we, or can we not, write down?
Before that, I had great fun learning about how real neural networks (brains!) take in the world with Jonathan Pillow and Uri Hasson during my undergrad in Operations Research at Princeton.
You can find me around the internet on Twitter, Google Scholar, and Goodreads, or email me at [my github handle]@gmail.com.
If you are an undergraduate or master’s student interested in memorization and privacy in language models, you can apply to work with me.
I am currently at capacity working with excellent students. You’re welcome to fill out this form, and I’ll reach out when I have more capacity.
Selected Publications
Full list on Google Scholar
- Measuring Forgetting of Memorized Training Examples [arxiv][ICLR]
- Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang. Jun 2022
- Quantifying Memorization Across Neural Language Models [arxiv][ICLR Spotlight]
- Nicholas Carlini*, Daphne Ippolito*, Matthew Jagielski*, Katherine Lee*, Florian Tramèr*, Chiyuan Zhang*. Feb 2022 (*authors alphabetical)
- What Does it Mean for a Language Model to Preserve Privacy? [arxiv][FAccT]
- Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, Florian Tramèr. Feb 2022
- Deduplicating Training Data Makes Language Models Better [arxiv][ACL Oral]
- Katherine Lee*, Daphne Ippolito*, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. July 2021
- Extracting Training Data from Large Language Models [arxiv] [USENIX Oral][blog] [video]
- Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel. Dec 2020
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [JMLR]
- Colin Raffel*, Noam Shazeer*, Adam Roberts*, Katherine Lee*, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. June 2020
- Hallucinations in Neural Machine Translation [NeurIPS IRASL]
- Katherine Lee, Orhan Firat, Ashish Agarwal, Clara Fannjiang, and David Sussillo. Dec 2018
Writing
Writing a Google AI Residency Cover Letter with Ben Eysenbach (2019)
Submit to Journals [pdf] (2018)
Fun writing
Sourdough Literature Review (2020)
Talks
- Memorization in Language Models [slides] [poster]
- University of Toronto, May 2022
- MosaicML, Jul 2022
- GovAI, Aug 2022
- ML Security & Privacy Seminar, Aug 2022
- LEgally Attentive Data Scientists, Sep 2022
- Cornell, NLP Seminar, Sep 2022
- Cornell, C-Psyd, Sep 2022
- What does Privacy in Language Modeling Mean? [slides]
- UNC, Apr 2022
- Cornell, Apr 2022
Service
- Organized the Generative AI and Law Workshop (GenLaw ’23) at ICML 2023.
- Helped organize the WELM workshop at ICLR 2021 and moderated a panel discussion on “Bias, safety, copyright, and efficiency.”
- Reviewer for NeurIPS, ICML.
- Led Brain Women, 2017-2021. I have a lot of thoughts about DEI in the workplace. Feel free to ask me.
Fun!
In my free time, I enjoy making stuff. Sometimes this is pottery, sourdough, or knitting/crocheting. I also enjoy being outdoors, listening to jazz, and dancing. Sometimes all at the same time! I used to live here.
In the more distant past, I solved for optimal seating arrangements at Google and spotted ships in the bay with hyperspectral sensors at the Naval Research Lab in DC.