
Bradley-Terry Model

Notes on the Bradley-Terry model for ranking items via paired comparisons, its formula, and applications to personal preference ranking.

start: 2026.04.15, 12:00 · end: 2026.04.15, 12:00
status: Notes


· certainty: likely


· importance: 7/10



Recently I have been attempting to rank my favorite things, including films, anime, manga, directors, artists, and so on. The cognitive overhead for a task like this is immense. Rather than trying to assign arbitrary numbers to each item and refining them down to the tenth or hundredth manually to create separation within condensed tiers, I decided to consider ranking items via paired comparisons. There are a few options for this, such as Elo, Copeland's method, TrueSkill, and of course the model I have landed on: the Bradley-Terry model.

The Bradley-Terry model is a simple formula for the probability that one of two items, $i$ or $j$, is chosen over the other. The probability $\Pr(i > j)$ that item $i$ is preferred over item $j$ is given by:

$$\Pr(i > j) = \frac{p_i}{p_i + p_j}$$

where $p_i$ and $p_j$ are positive real-valued parameters representing the "strength" or "merit" of items $i$ and $j$ respectively. The model assumes that each item $i$ in a set of $n$ items has an associated parameter $p_i > 0$, and that the outcome of any pairwise comparison depends only on the ratio $p_i / p_j$. Since only ratios matter, the parameters are identifiable only up to a multiplicative constant, so one typically normalizes by setting $\sum_{i=1}^{n} p_i = 1$ or fixing one parameter.
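As a quick illustration of the formula and of why only ratios matter, here is a minimal sketch in Python. The function name `preference_probability` and the item strengths are made up for the example; they are not taken from any real ranking.

```python
def preference_probability(p_i: float, p_j: float) -> float:
    """Bradley-Terry probability that item i is preferred over item j."""
    if p_i <= 0 or p_j <= 0:
        raise ValueError("strengths must be positive")
    return p_i / (p_i + p_j)


# Hypothetical strengths, chosen only for illustration.
strengths = {"film_a": 3.0, "film_b": 1.0, "film_c": 1.0}

# 3 / (3 + 1) = 0.75
prob_ab = preference_probability(strengths["film_a"], strengths["film_b"])

# Normalizing so the strengths sum to 1 rescales every p by the same
# constant, so the pairwise probability is unchanged.
total = sum(strengths.values())
normalized = {name: p / total for name, p in strengths.items()}
prob_ab_normalized = preference_probability(
    normalized["film_a"], normalized["film_b"]
)

print(prob_ab, prob_ab_normalized)
```

Both calls give the same probability, which is the identifiability point made above: multiplying all strengths by a constant leaves every comparison unchanged.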

Given a dataset of paired comparison outcomes, the parameters $p_1, p_2, \ldots, p_n$ are estimated by maximum likelihood. If $w_{ij}$ denotes the number of times item $i$ was preferred over item $j$, the log-likelihood is:

$$\ell(p) = \sum_{i \neq j} w_{ij} \log \frac{p_i}{p_i + p_j}$$

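This function is concave in the log-transformed parameters $\lambda_i = \log p_i$, so any local maximum is also a global one. The maximum exists and is unique (up to the normalization constraint) whenever the directed comparison graph is strongly connected, meaning that for every split of the items into two nonempty groups, some item in each group has been preferred over some item in the other. Plain connectivity is not enough: an item that wins every one of its comparisons would have its estimated strength diverge to infinity.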
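One common way to find this maximum is the classic iterative update often attributed to Zermelo (also known as the minorization-maximization, or MM, algorithm): repeatedly set $p_i$ to item $i$'s total wins divided by $\sum_{j \neq i} (w_{ij} + w_{ji}) / (p_i + p_j)$, then renormalize. The sketch below is one possible implementation under that assumption, not necessarily what any particular project uses. The function name `fit_bradley_terry`, its parameters, and the win counts are all hypothetical.

```python
def fit_bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Estimate Bradley-Terry strengths with the MM (Zermelo) update.

    wins[i][j] is the number of times item i was preferred over item j.
    Assumes every item has at least one win and one loss and that the
    comparison graph is strongly connected; otherwise the estimates may
    not converge.
    """
    n = len(wins)
    p = [1.0 / n] * n  # start from equal strengths
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        # Normalize so the strengths sum to 1.
        scale = sum(new_p)
        new_p = [x / scale for x in new_p]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical win counts for three items (illustrative data only).
example_wins = [
    [0, 3, 2],  # item 0 beat item 1 three times and item 2 twice
    [1, 0, 2],
    [1, 1, 0],
]
estimated = fit_bradley_terry(example_wins)
ranking = sorted(range(len(estimated)), key=lambda i: estimated[i], reverse=True)
print(estimated, ranking)
```

With only two items the fit reduces to the observed win fraction: if item 0 wins 3 of 4 comparisons, the normalized estimates are 0.75 and 0.25.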

Note: I also built a project using this model, which you can find at whichisbetter.dev.


