Idea
Create a dataset of moral dilemmas (trolley problems, real-world ethical cases, and the like) and survey how different people respond to them. Then use that dataset to evaluate whether existing LLMs (GPT-4, Claude, etc.) mimic human responses, diverge in systematic ways, or exhibit biases. Explore the implications for alignment.
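A minimal sketch of what the evaluation loop might look like, assuming a JSON dataset where each dilemma carries the human response distribution collected in the survey. The dataset file `dilemmas.json`, its schema, the model name, and the sampling count are all illustrative assumptions, not part of the idea as stated.

```python
import json
import math

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_model(dilemma: str, options: list[str], n: int = 20) -> dict[str, float]:
    """Sample the model n times and return its empirical choice distribution."""
    prompt = f"{dilemma}\n\nAnswer with exactly one of: {', '.join(options)}."
    counts = {o: 0 for o in options}
    for _ in range(n):
        reply = client.chat.completions.create(
            model="gpt-4o",  # assumed model name
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,
        )
        answer = reply.choices[0].message.content.strip().lower()
        for o in options:
            if o.lower() in answer:
                counts[o] += 1
                break
    total = sum(counts.values()) or 1
    return {o: c / total for o, c in counts.items()}


def kl_divergence(p: dict[str, float], q: dict[str, float], eps: float = 1e-9) -> float:
    """KL(human || model): how far the model's distribution is from the humans'."""
    return sum(p[o] * math.log((p[o] + eps) / (q.get(o, 0.0) + eps)) for o in p)


# Assumed schema: [{"text": ..., "options": [...], "human_dist": {option: prob}}, ...]
with open("dilemmas.json") as f:
    dilemmas = json.load(f)

for d in dilemmas:
    model_dist = ask_model(d["text"], d["options"])
    div = kl_divergence(d["human_dist"], model_dist)
    print(f"{d['text'][:60]}...  KL(human||model) = {div:.3f}")
```

Sampling the model repeatedly rather than taking a single answer makes the comparison distribution-to-distribution, so systematic divergence shows up as a consistently high KL score rather than a one-off disagreement.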
Citation
Cited as:
Yotam, Kris. (Jul 2025). Simulating Human Moral Judgment in LLMs. krisyotam.com. https://krisyotam.com/papers/ai/simulating-moral-judgment-llms
Or
@article{yotam2025simulating-moral-judgment-llms,
  title   = "Simulating Human Moral Judgment in LLMs",
  author  = "Yotam, Kris",
  journal = "krisyotam.com",
  year    = "2025",
  month   = "Jul",
  url     = "https://krisyotam.com/papers/ai/simulating-moral-judgment-llms"
}