I’m currently a research scientist at OpenAI. My work falls under the umbrella of computer security and machine learning. I focus primarily on breaking machine learning systems at the algorithmic level and on operationalizing these kinds of attacks. Many attacks proposed in the academic literature on adversarial machine learning ultimately don’t translate to the real world, and I work on bridging this gap between theory and practice.
I also periodically consult and conduct private trainings on practical adversarial machine learning. I am in the process of writing up and releasing these trainings as The Machine Learning Red Team Manual for No Starch Press. My aim is to provide a practical guide for anyone interested in adversarial ML and red teaming as it relates to in-production ML systems.
I am also working on my PhD in computer science at Harvard University. My PhD advisors are Jim Waldo at Harvard and Ethan Zuckerman at the MIT Media Lab. At Harvard, I am a Fellow of the Belfer Center for Science and International Affairs Cyber Project and the Berkman Klein Center for Internet & Society. Prior to Harvard, I graduated from the University of Utah with a double major in computer science and applied mathematics.
I co-founded and co-organize the DEF CON AI Village, a hacker community focused on the use and abuse of machine learning and artificial intelligence technology. We have an active public Slack group and periodically organize virtual hangouts to discuss research and security industry topics.
Publications
Language Models are Few-Shot Learners. Brown T., Mann B., Ryder N., Subbiah M., Kaplan J., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., Agarwal S., Herbert-Voss A., Krueger G., Henighan T., Child R., Ramesh A., Ziegler D., Wu J., Winter C., Hesse C., Chen M., Sigler E., Litwin M., Gray S., Chess B., Clark J., Berner C., McCandlish S., Radford A., Sutskever I., Amodei D. Under review at NeurIPS 2020.
Release Strategies and the Social Impacts of Language Models. Solaiman I., Brundage M., Clark J., Askell A., Herbert-Voss A., Wu J., Radford A., Krueger G., Kim J.W., Kreps S., McCain M., Newhouse A., Blazakis J., McGuffie K., Wang J. OpenAI Report, November 2019.
Computing minimal interpolants in C^{1,1}(R^d). Herbert-Voss A., Hirn M., McCollum F. In Rev. Mat. Iberoamericana 33 (2017), 29-66. doi: 10.4171/RMI/927 arxiv: 1411.5668
Recorded Talks
Model Hardening for Fun and Profit - DEF CON 26 AI Village
Don’t Red Team AI Like a Chump - DEF CON 27
Buzz
Live from Black Hat: Practical Defenses Against Adversarial Machine Learning with Ariel Herbert-Voss - Veracode Security News blog
Hackers tricked a Tesla, and it’s a sign of things to come in the race to fool artificial intelligence - Australian Broadcasting Corporation
AI Village: What Is AI Safety And How Can We Embrace And Prepare For Adversarial AI? - ITSP Magazine