Improved Adversarial Robustness via Abstract Interpretation


This paper improves methods for certifying the adversarial robustness of neural networks using techniques from abstract interpretation. The idea is to pass regions of the input space (rather than individual inputs) through the network and compute an upper bound on the worst-case loss over each region. We introduce practical techniques that yield a tighter upper bound on this loss than previous work.
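To make the idea of pushing a region through the network concrete, here is a minimal sketch using the simplest abstract domain, an axis-aligned box (interval bounds). It is an illustration under stated assumptions, not the method from the report: the tiny two-layer ReLU network, its weights, and the helper names (`affine_bounds`, `relu_bounds`, `network_bounds`, `loss_upper_bound`, `certified`) are all hypothetical, and the tighter techniques the paper introduces are not shown.

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Propagate the box [lo, hi] through the affine map x -> W @ x + b."""
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius  # each output can move by at most |W| @ radius
    return new_center - new_radius, new_center + new_radius

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def network_bounds(x, eps, layers):
    """Bound the logits over the L-infinity ball of radius eps around x."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, W, b)
        if i < len(layers) - 1:  # no ReLU after the final (logit) layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

def loss_upper_bound(lo, hi, label):
    """Upper bound on cross-entropy over the logit box.

    Cross-entropy decreases in the true logit and increases in every other
    logit, so the worst case takes the lower bound for the true class and
    the upper bound for all other classes.
    """
    z = hi.copy()
    z[label] = lo[label]
    shift = z.max()  # subtract the max for numerical stability
    return float(np.log(np.exp(z - shift).sum()) + shift - z[label])

def certified(lo, hi, label):
    """True if the true logit provably exceeds every other logit on the region."""
    return bool(lo[label] > np.delete(hi, label).max())

# Hypothetical toy network: 2 inputs -> 2 hidden units (ReLU) -> 2 logits.
layers = [
    (np.array([[1.0, -1.0], [1.0, 1.0]]), np.zeros(2)),
    (np.array([[2.0, 0.0], [0.0, 1.0]]), np.zeros(2)),
]
x = np.array([1.0, 0.0])

for eps in (0.1, 0.5):
    lo, hi = network_bounds(x, eps, layers)
    print(eps, lo, hi, certified(lo, hi, label=0), loss_upper_bound(lo, hi, label=0))
```

The bounds are sound but can be loose: the box forgets correlations between neurons, so a point may fail to certify even when no adversarial example exists inside the region. Tightening this kind of bound is exactly where more precise abstract domains help.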

You can find the final report here. I had a lot of fun working on this project along with my coauthors Zachary DeStefano and Ildebrando Magnani!

William Merrill
Ph.D. Student

NLP, deep learning, and formal languages