Liwei Jiang | 姜力炜
Latest
Can Language Models Reason about Individualistic Human Values and Preferences?
An Empirical Investigation of Machines' Capabilities for Moral Judgment with the Delphi Experiment
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
Information-Theoretic Distillation for Reference-less Summarization
Particip-AI: Anticipating Future AI Use Cases and Impacts with Lay Users
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Position Paper: A Roadmap to Pluralistic Alignment
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Faith and Fate: Limits of Transformers on Compositionality
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Reinforced Clarification Question Generation with Defeasibility Rewards for Disambiguating Social and Moral Situations
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
Quark: Controllable Text Generation with Reinforced Unlearning
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Aligning to Social Norms and Values in Interactive Narratives
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
"I'm Not Mad": Commonsense Implications of Negation and Contradiction