A close look at Apple's analysis of reasoning in LLMs through problem complexity reveals limitations in current benchmark design and model interpretability.
status: Published
Status Indicator
The status indicator reflects the current state of the work:
- Abandoned: Work that has been discontinued
- Notes: Initial collections of thoughts and references
- Draft: Early structured version with a central thesis
- In Progress: Well-developed work actively being refined
- Finished: Completed work with no planned major changes
This helps readers understand the maturity and completeness of the content.
certainty: certain
Confidence Rating
The confidence tag expresses how well-supported the content is, or how likely its overall ideas are to be right. This uses a scale from "certain" to "impossible", based on the Kesselman List of Estimative Words:
1. "certain"
2. "highly likely"
3. "likely"
4. "possible"
5. "unlikely"
6. "highly unlikely"
7. "remote"
8. "impossible"
Even ideas that seem unlikely may be worth exploring if their potential impact is significant enough.
importance: 8/10
Importance Rating
The importance rating distinguishes between trivial topics and those that might change your life. On a scale from 0 to 10, content is ranked by its potential impact on:
- the reader
- the intended audience
- the world at large
For example, topics about fundamental research or transformative technologies would rank 9-10, while personal reflections or minor experiments might rank 0-1.
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Brydon Eastman, Chen Huang, Skyler Seto, Hadi Pouransari, Mehrdad Farajtabar, Raviteja Vemulapalli, Fartash Faghri, Oncel Tuzel, Barry-John Theobald, Josh Susskind
Jul 18, 2025