Faeze Brahman

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Information-Theoretic Distillation for Reference-less Summarization
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
