Train, but Verify: Towards Practical AI Robustness
Software Engineering Institute
In the "Train, but Verify" project, we aim to close a gap in the state of the art on secure training of machine learning (ML) systems. We pursue two objectives:
- Train secure AI systems by training ML models to enforce at least two security policies.
- Verify the security of AI systems by testing against declarative, realistic threat models.
We draw our security policies from the Beieler taxonomy, which requires that an ML system does not learn the wrong thing during training (e.g., through data poisoning), do the wrong thing during operation (e.g., when given adversarial examples), or reveal the wrong thing during operation (e.g., through model inversion or membership inference).
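To make the "do the wrong thing" policy concrete, the following is a minimal sketch of an adversarial example against a toy logistic-regression classifier. It uses a sign-of-gradient perturbation in the style of the fast gradient sign method, chosen here only for illustration; it is not the project's method. The weights, input, and `epsilon` budget are hypothetical values, and the function names `predict` and `fgsm_perturb` are assumptions introduced for this example.

```python
import math


def predict(weights, bias, x):
    """Return P(class = 1) for a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - label) * weights.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, signs)]


# Hypothetical toy model and input, chosen only for illustration.
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.4, 0.3, 0.2]
label = 1  # true class of x

x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.3)
clean_p = predict(weights, bias, x)
adv_p = predict(weights, bias, x_adv)

print("clean prediction:", int(clean_p > 0.5))
print("adversarial prediction:", int(adv_p > 0.5))
```

In this toy setting, a bounded change of at most 0.3 per feature is enough to move the input across the decision boundary. A threat model for this policy would typically state such a perturbation budget explicitly, so that robustness can be tested against it.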