long paper

Can Language Models Reason about Individualistic Human Values and Preferences?

An Empirical Investigation of Machines' Capabilities for Moral Judgment with the Delphi Experiment

As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality or a lack thereof. Yet, teaching morality to machines is a formidable task, as morality remains among the most intensely debated questions …

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of …

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this …

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

Information-Theoretic Distillation for Reference-less Summarization