Codenames

// SITUATION REPORT

Dashboard

Field briefing on the Codenames LLM benchmark: agents in the program, games logged, and the cost of running them.

#01

3,053

Total Games

1,518 mirrored pairs

#02

33

Models Tested

unique LLMs in the field

#03

6.9

Avg Game Length

turns per game

#04

$166.29

Total Cost

API spend across all games
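
As a rough reading of the four headline figures above, the sketch below derives an average cost per game and per turn. The variable names are illustrative and not part of the benchmark. The per-turn figure is only approximate, because the average game length shown is already rounded to one decimal.

```python
# Headline figures copied from the dashboard cards above.
total_games = 3053
total_cost_usd = 166.29
avg_turns_per_game = 6.9  # rounded on the dashboard, so derived values are approximate

# Average API spend per game.
cost_per_game = total_cost_usd / total_games

# Approximate API spend per turn, assuming cost is spread evenly across turns.
cost_per_turn = cost_per_game / avg_turns_per_game

print(f"cost per game: ${cost_per_game:.4f}")
print(f"cost per turn: ${cost_per_turn:.4f}")
```

The even-spread assumption is a simplification: spend per turn presumably varies by model and by role.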

Top Models
1. Gemini 3.1 Pro Preview: 2128 Elo, 90%
2. Gpt 5.2: 1926 Elo, 74%
3. Gpt 5 Mini: 1784 Elo, 69%
4. Grok 4.1 Fast: 1779 Elo, 68%
5. Kimi K2.5: 1751 Elo, 63%
Red vs Blue
Red: 46.9% (1431 wins)
Blue: 53.1% (1622 wins)
3053 games

Red goes first with 9 words to Blue's 8; first-move advantage is a known Codenames factor

Win Conditions
All Words: 1838
Assassin: 1215
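
The Red vs Blue and Win Conditions tallies can be cross-checked against the total game count with a few lines of arithmetic. This is a minimal sketch using the counts shown on this page. The dictionary names and the `shares` helper are hypothetical, not part of the benchmark's code.

```python
# Counts copied from the dashboard.
total_games = 3053
wins_by_team = {"Red": 1431, "Blue": 1622}
wins_by_condition = {"All Words": 1838, "Assassin": 1215}


def shares(counts):
    """Return each count as a percentage of the group's total, rounded to 0.1."""
    group_total = sum(counts.values())
    return {name: round(100 * n / group_total, 1) for name, n in counts.items()}


# Each breakdown should account for every logged game.
teams_consistent = sum(wins_by_team.values()) == total_games
conditions_consistent = sum(wins_by_condition.values()) == total_games

team_shares = shares(wins_by_team)
condition_shares = shares(wins_by_condition)

print(teams_consistent, team_shares)
print(conditions_consistent, condition_shares)
```

Both breakdowns sum to the 3053 logged games, and the computed team shares match the 46.9% and 53.1% shown above.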
Elo Distribution