Nouha Dziri

Latest

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Position Paper: A Roadmap to Pluralistic Alignment
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Faith and Fate: Limits of Transformers on Compositionality
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations