Back to Papers

Simulating Human Moral Judgment in LLMs

Constructs a benchmark from human moral responses to evaluate how closely large language models align with real-world ethical intuitions.

status: Notes

Status Indicator

The status indicator reflects the current state of the work: - Abandoned: Work that has been discontinued - Notes: Initial collections of thoughts and references - Draft: Early structured version with a central thesis - In Progress: Well-developed work actively being refined - Finished: Completed work with no planned major changes This helps readers understand the maturity and completeness of the content.

·
certainty: technical-philosophical

Confidence Rating

The confidence tag expresses how well-supported the content is, or how likely its overall ideas are right. This uses a scale from "impossible" to "certain", based on the Kesselman List of Estimative Words: 1. "certain" 2. "highly likely" 3. "likely" 4. "possible" 5. "unlikely" 6. "highly unlikely" 7. "remote" 8. "impossible" Even ideas that seem unlikely may be worth exploring if their potential impact is significant enough.

·
importance: 10/10

Importance Rating

The importance rating distinguishes between trivial topics and those which might change your life. Using a scale from 0-10, content is ranked based on its potential impact on: - the reader - the intended audience - the world at large For example, topics about fundamental research or transformative technologies would rank 9-10, while personal reflections or minor experiments might rank 0-1.

Idea

Create a dataset of moral dilemmas (like trolley problems, real-world ethical cases, etc.) and survey how different people respond to them. Then use that dataset to evaluate whether existing LLMs (GPT-4, Claude, etc.) mimic human responses, diverge in systematic ways, or exhibit biases. Explore implications for alignment.

Sign in with GitHub to comment

Loading comments...
Citation
Yotam, Kris · Jul 2025

Yotam, Kris. (Jul 2025). Simulating Human Moral Judgment in LLMs. krisyotam.com. https://krisyotam.com/papers/technology/simulating-moral-judgment-llms

@article{yotam2025simulating-moral-judgment-llms,
  title   = "Simulating Human Moral Judgment in LLMs",
  author  = "Yotam, Kris",
  journal = "krisyotam.com",
  year    = "2025",
  month   = "Jul",
  url     = "https://krisyotam.com/papers/technology/simulating-moral-judgment-llms"
}

in Naperville, IL
Last visitor from Mitaka, Japan