2024

Can Language Models Reason about Individualistic Human Values and Preferences?

An Empirical Investigation of Machines' Capabilities for Moral Judgment with the Delphi Experiment

As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality or a lack thereof. Yet, teaching morality to machines is a formidable task, as morality remains among the most intensely debated questions …

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of …

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

As the utilization of large language models (LLMs) has proliferated world-wide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models …

Information-Theoretic Distillation for Reference-less Summarization

The current winning recipe for automatic summarization is using proprietary large-scale language models (LLMs) such as ChatGPT as is, or imitation learning from them as teacher models. While increasingly ubiquitous dependence on such large-scale …

MigraineTracker: Examining Patient Experiences with Goal-Directed Self-Tracking for a Chronic Health Condition

Self-tracking and personal informatics offer important potential in chronic condition management, but such potential is often undermined by difficulty in aligning self-tracking tools to an individual’s goals. Informed by prior proposals of …