Philippe Bergna is an AI safety researcher with expertise in adversarial machine learning, red teaming, and robust AI evaluation. At Advai, he worked with the UK Ministry of Defence on 3D adversarial patches and successfully attacked the UK’s leading facial verification system using transferable adversarial examples. His research extends adversarial methods beyond attacks, applying them to uncertainty estimation, active learning, and Out-of-Distribution (OOD) detection. Most recently, he has investigated data extraction risks in large language models and built a Retrieval-Augmented Generation (RAG) chatbot for AI safety research.