Vikash Sehwag

Research Scientist, Sony AI

I am a research scientist at Sony AI, where I lead efforts on enhancing the safety and utility of large-scale generative models.

I received my PhD from Princeton University, where I was advised by Prof. Prateek Mittal and Prof. Mung Chiang. I previously interned at Meta AI (AI Red Team) and Microsoft Research. I have been fortunate to receive the Qualcomm Innovation Fellowship and the Rising Star Award in adversarial machine learning. I also organized the first seminar series on Security & Privacy in Machine Learning (SPML) at Princeton University.

Research Interests. Uncovering and mitigating safety risks in the development of next-generation trustworthy AI systems.
  • Safer generative AI. We have demonstrated privacy risks in real-world diffusion models and developed privacy-preserving sampling and training methods [1, 2, 3, 4]. We have also developed techniques for, and benchmarks of, automated generation of adversarial and unsafe content from generative models [5, 6].
  • Responsible data synthesis. Concerned by the vast amount of generated content online, we have recently developed techniques to identify synthetic samples [7], even in the absence of artificial watermarks, and to trace them to their source generative models [8].
  • Robust machine learning. We have conducted an in-depth exploration of adversarially robust learning, including circumventing its higher sample complexity using synthetic data [9], finding fundamental limits on robustness [10], demonstrating higher robustness with transformers [11], robustness across threat models [12, 13, 14], the effect of model scaling and compression [15, 16], and adversarial risks in transitioning from closed-domain to open-world systems [17, 18].
  • Benchmarking progress in AI safety. We developed the widely adopted RobustBench benchmark [19], followed by MultiRobustBench to account for multiple attacks [20], and most recently JailbreakBench [6] to benchmark progress on jailbreaks against LLMs. We have also written a detailed discussion of the nuanced similarities and distinctions between security and safety approaches to trustworthy AI [21].

Publications

2024

How to Trace Latent Generative Model Generated Images without Artificial Watermark?

Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma

ICML 2024 (to appear)

Using signatures from the latent autoencoder, we propose an approach to trace synthetic images back to their source latent generative model.

A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization

Ashwinee Panda, Xinyu Tang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

ICML 2024 (to appear)

We consider the cost of hyperparameter optimization in differentially private learning and propose a strategy that provides linear scaling of hyperparameters, thus reducing the privacy cost while simultaneously achieving state-of-the-art performance across 22 benchmark tasks in CV and NLP.

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

arXiv 2024 - pdf | webpage | code

A centralized benchmark comprising 1) a repository of jailbreaking attacks and artifacts, 2) a standardized evaluation framework, and 3) an up-to-date leaderboard.
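
For illustration, a minimal usage sketch with the accompanying jailbreakbench Python package; the function name read_artifact and the argument values below are assumptions based on the project's public documentation and should be checked against the current release.

    # Minimal sketch (assumed API): load a jailbreak artifact and inspect one attempt.
    import jailbreakbench as jbb

    # "PAIR" and "vicuna-13b-v1.5" are example identifiers from the public leaderboard.
    artifact = jbb.read_artifact(method="PAIR", model_name="vicuna-13b-v1.5")
    print(artifact.jailbreaks[0])  # a single jailbreak attempt with prompt, response, and metadata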

2023

Differentially Private Image Classification by Learning Priors from Random Processes

Xinyu Tang, Ashwinee Panda, Vikash Sehwag, Prateek Mittal

NeurIPS 2023 (spotlight) - pdf | code

We show that pre-training on data from random processes enables better performance during differentially private finetuning, while simultaneously avoiding privacy leakage associated with real pretraining images.

Extracting Training Data from Diffusion Models

Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

USENIX Security Symposium, 2023 - pdf | video | News (1, 2, 3, 4)

This was one of the first works to demonstrate significant memorization of real-world images in web-scale text-to-image generative models (Stable Diffusion, Imagen). Our findings further motivated web-scale data deduplication in the training datasets of generative models.

Uncovering Adversarial Risks of Test-Time Adaptation

Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

ICML 2023 - pdf | webpage | code

We show that test-time adaptation, a technique that aims to improve performance at test time, also increases exposure to novel security risks.

MultiRobustBench: Benchmarking Robustness Against Multiple Attacks

Sihui Dai, Saeed Mahloujifar, Chong Xiang, Vikash Sehwag, Pin-Yu Chen, Prateek Mittal

ICML 2023 - pdf | webpage | code

Going beyond single-attack robustness (RobustBench), we develop a standardized benchmark for robustness against multiple attack types.

A Light Recipe to Train Robust Vision Transformers

Edoardo Debenedetti, Vikash Sehwag, Prateek Mittal

SaTML 2023 - pdf | video | slides | code

Contrary to the conventional wisdom of using heavy data augmentation for ViTs, we showed that a lighter data augmentation recipe (along with a bag of other tricks) achieves state-of-the-art performance in adversarial training of ViTs.

2022

Generating High Fidelity Data from Low-density Regions using Diffusion Models

Vikash Sehwag, Caner Hazirbas, Albert Gordo, Firat Ozgenel, Cristian Canton Ferrer

CVPR 2022 - pdf

Our work showed strong generalization of diffusion models in the tail of the data distribution and developed adaptive sampling techniques to generate high-fidelity samples from these low-density regions.

Understanding Robust Learning through the Lens of Representation Similarities

Christian Cianfarani, Arjun Nitin Bhagoji, Vikash Sehwag, Ben Y. Zhao, Prateek Mittal, Haitao Zheng

NeurIPS 2022 - pdf | video | slides | code

Using representation similarity metrics, such as centered kernel alignment (CKA), we uncover several distinctive characteristics of adversarially robust networks compared to their non-robust counterparts.
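
For reference, a minimal sketch of the linear variant of CKA (Kornblith et al., 2019) that such representation-similarity analyses rely on; the paper's exact implementation and kernel choices may differ.

    import numpy as np

    def linear_cka(X, Y):
        """Linear CKA between two representation matrices of shape (n_examples, n_features)."""
        # Center each feature dimension across examples.
        X = X - X.mean(axis=0, keepdims=True)
        Y = Y - Y.mean(axis=0, keepdims=True)
        # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
        numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
        denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
        return numerator / denominator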

Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?

Vikash Sehwag, Saeed Mahloujifar, Tinashe Handina, Sihui Dai, Chong Xiang, Mung Chiang, Prateek Mittal

ICLR 2022 - pdf | video | slides | code | blog

We showed that synthetic data from diffusion models provides a tremendous boost in the generalization performance of adversarial training.

2021

Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries

Arjun Nitin Bhagoji, Daniel Cullina, Vikash Sehwag, Prateek Mittal

ICML 2021 - pdf | video | slides | poster | code

We provide lower bounds on cross-entropy loss in the presence of adversarial attacks for common small-scale computer vision datasets.

SSD: A Unified Framework for Self-Supervised Outlier Detection

Vikash Sehwag, Mung Chiang, Prateek Mittal

ICLR 2021, NeurIPS SSL workshop 2020 - pdf | video | slides | code

Using only unlabeled data, we develop a highly effective framework for detecting outlier/out-of-distribution samples.
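
As a rough illustration of the scoring idea (not the paper's exact recipe): features from a self-supervised encoder are clustered, and a test sample is scored by its distance to the nearest cluster; the helpers below assume feature matrices are precomputed and are a simplified sketch.

    import numpy as np
    from sklearn.cluster import KMeans

    def fit_clusters(train_features, n_clusters=5):
        """Fit cluster means and (pseudo-)inverse covariances on unlabeled in-distribution features."""
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(train_features)
        stats = []
        for c in range(n_clusters):
            feats = train_features[labels == c]
            mean = feats.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(feats, rowvar=False))
            stats.append((mean, cov_inv))
        return stats

    def outlier_score(feature, stats):
        """Mahalanobis distance to the closest cluster; higher = more likely out-of-distribution."""
        return min(float((feature - m) @ ci @ (feature - m)) for m, ci in stats)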

RobustBench: A Standardized Adversarial Robustness Benchmark

Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein

NeurIPS 2021 - leaderboard | pdf | code

We develop a standardized benchmark to track progress on adversarial robustness in deep learning. The benchmark has provided valuable insights to the community and has been visited by more than 40K users.
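
A typical way the benchmark is consumed is via the robustbench package; the model identifier below is one example leaderboard entry and any other entry name can be substituted.

    # Load a leaderboard model for evaluation or downstream analysis.
    from robustbench.utils import load_model

    model = load_model(model_name="Carmon2019Unlabeled",  # example leaderboard entry
                       dataset="cifar10",
                       threat_model="Linf")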

PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, Prateek Mittal

USENIX Security Symposium 2021 - pdf | video | code

A general defense framework to achieve provable robustness against adversarial patches.

2020

HYDRA: Pruning Adversarially Robust Neural Networks

Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana

NeurIPS 2020 - webpage | pdf | video | slides | code

We achieved state-of-the-art clean and robust accuracy when aggressively pruning the parameters of deep neural networks.
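
A simplified sketch of the underlying idea: importance scores learned jointly with the (robust) training objective, with a top-k mask applied via a straight-through estimator. This is my own minimal rendering, not the released implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMask(torch.autograd.Function):
        """Binary mask keeping the top-k weights by |score|, with straight-through gradients."""
        @staticmethod
        def forward(ctx, scores, k):
            threshold = torch.topk(scores.abs().flatten(), k).values.min()
            return (scores.abs() >= threshold).float()

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, None  # pass gradients straight through to the scores

    class PrunedLinear(nn.Linear):
        """Linear layer whose effective weight is weight * learned binary mask."""
        def __init__(self, in_features, out_features, density=0.1):
            super().__init__(in_features, out_features)
            self.scores = nn.Parameter(0.01 * torch.randn_like(self.weight))
            self.k = max(1, int(density * self.weight.numel()))

        def forward(self, x):
            mask = TopKMask.apply(self.scores, self.k)
            return F.linear(x, self.weight * mask, self.bias)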

Fast-Convergent Federated Learning

Hung T. Nguyen, Vikash Sehwag, Seyyedali Hosseinalipour, Christopher G. Brinton, Mung Chiang, H. Vincent Poor

IEEE Journal on Selected Areas in Communications (J-SAC) - Series on Machine Learning for Communications and Networks 2020 - pdf

We proposed a fast-convergent federated learning algorithm, called FOLB, which improves convergence speed through smart sampling of devices in each round.

A Critical Evaluation of Open-World Machine Learning

Liwei Song, Vikash Sehwag, Arjun Nitin Bhagoji, Prateek Mittal

ICML Workshop on Uncertainty & Robustness 2020 - pdf | code

We demonstrate a fundamental conflict between the learning objectives of open-world machine learning and adversarial robustness.

2019

Analyzing the Robustness of Open-World Machine Learning

Vikash Sehwag, Arjun Nitin Bhagoji, Liwei Song, Chawin Sitawarin, Daniel Cullina, Mung Chiang, Prateek Mittal

ACM Workshop on Artificial Intelligence and Security (AISec) 2019 - pdf | slides | code

We demonstrate the vulnerability of open-world machine learning models to adversarial examples and propose a defense against such open-world adversarial attacks.


Invited Talks

  • On safety risks of generative AI - From ChatGPT to DALL-E 3 | Nov 2023 | Columbia University
  • Prospects and pitfalls of modern generative models - An AI safety perspective | Feb 2023 | AAAI
  • Enhancing machine learning using synthetic data distilled from generative models | Jan 2023 | MSR
  • Role of synthetic data in trustworthy machine learning | May 2022 | UChicago, UC Berkeley
  • A generative approach to robust machine learning | Mar 2022 | CISS Conference
  • A generative approach to robust machine learning | Jan 2022 | RIKEN-AIP TrustML Young Scientist Seminar
  • Generating novel hard instances from low-density regions using generative models | Aug 2021 | Meta AI
  • A primer on adversarial machine learning | July 2021 | Princeton-Intel REU Seminar
  • Embedding data distribution to make machine learning more reliable | Mar 2021 | EPFL
  • Private Deep Learning Made Practical | Oct 2019 | Qualcomm

Academic Services

Teaching and Mentoring
  • Lecture on basics of adversarial machine learning | Princeton-Intel REU Seminar 2021
  • Teaching assistant for ECE 574: Security & Privacy | Fall 2021 - Princeton University
  • Taught a mini-course on adversarial attacks & defenses | Wintersession 2020 - Princeton University
  • Teaching assistant for ELE 535: Machine Learning and Pattern Recognition | Fall 2019 - Princeton University
  • Mentored ten students in AI research over the years: Edoardo Debenedetti, Rajvardhan Oak, Christian Cianfarani, Tinashe Handina, Matteo Russo, Xianghao Kong, Song Wen, Minzhou Pan, Zhenting Wang, Jie Ren.
Other Services
  • Workshop organizer - ICCV 2023 ARROW workshop; CVPR 2023 Workshop on Adversarial Machine Learning on Computer Vision: Art of Robustness
  • Program committee member for IEEE Conference on Secure and Trustworthy Machine Learning - 2023
  • Organized more than 20 talks on security & privacy in machine learning (SPML seminar series) - 2022
  • One of the core maintainers of the Adversarial Robustness Benchmark (RobustBench) | robustbench.github.io
  • Volunteered as junior mentor at Princeton-OLCF-NVIDIA GPU Hackathon | June 2020 - Princeton University
  • Reviewed more than 50 papers for major computer vision and machine learning conferences and journals.