The Journey to Mount Olympus

A narrative exercise describing a mythical journey to the home of the gods.

status: Notes

certainty: likely

importance: 6/10

Content goes here...

Citation
Yotam, Kris · Apr 2025

Yotam, Kris. (Apr 2025). The Journey to Mount Olympus. krisyotam.com. https://krisyotam.com/progymnasmata/narrative/the-journey-to-mount-olympus

@article{yotam2025the-journey-to-mount-olympus,
  title   = "The Journey to Mount Olympus",
  author  = "Yotam, Kris",
  journal = "krisyotam.com",
  year    = "2025",
  month   = "Apr",
  url     = "https://krisyotam.com/progymnasmata/narrative/the-journey-to-mount-olympus"
}
Kris Yotam
Updated
2026-05-12