Code Arena | Overall

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning, tool use, and production-style workflows.

Mar 4, 2026
184,734 votes
51 models
Rank  Rank Spread  Organization  License       Score  CI        Votes
1     1-2          Anthropic     Proprietary   1556   +14/-14   2,553
2     1-2          Anthropic     Proprietary   1555   +13/-13   3,273
3     3-3          Anthropic     Proprietary   1523   +12/-12   3,150
4     4-4          Anthropic     -             1497   +8/-8     11,859
5     5-7          Anthropic     Proprietary   1475   +7/-7     11,994
6     5-8          OpenAI        Proprietary   1472   +16/-16   1,696
7     5-12         Google        Proprietary   1461   +15/-15   1,829
8     6-12         Z.ai          MIT           1447   +12/-12   3,166
9     7-12         Google        Proprietary   1442   +7/-7     17,443
10    7-12         Google        Proprietary   1442   +7/-7     13,601
11    7-13         Z.ai          MIT           1441   +10/-10   5,131
12    7-13         Moonshot      Modified MIT  1438   +10/-10   4,626
13    11-14        MiniMax       Modified MIT  1422   +10/-10   4,259
14    13-19        Moonshot      Modified MIT  1415   +11/-11   3,539
15    14-23        -             -             1404   +8/-8     9,579
16    14-23        MiniMax       MIT           1402   +8/-8     9,793
17    14-26        OpenAI        Proprietary   1396   +15/-16   1,634
18    14-25        Alibaba       Apache 2.0    1396   +11/-11   3,136
19    14-26        OpenAI        Proprietary   1394   +12/-12   3,929
20    15-25        Anthropic     -             1390   +7/-7     14,795
21    15-26        Anthropic     Proprietary   1390   +8/-8     8,983
22    15-26        OpenAI        Proprietary   1389   +9/-9     6,441
23    17-26        Anthropic     Proprietary   1386   +7/-7     16,570
24    15-26        Alibaba       Apache 2.0    1384   +15/-15   1,675
25    17-27        Alibaba       Apache 2.0    1375   +16/-16   1,593
26    19-27        DeepSeek      MIT           1373   +9/-9     6,560
27    25-30        Z.ai          MIT           1357   +8/-8     8,746
28    27-32        OpenAI        Proprietary   1344   +7/-7     13,268
29    27-32        -             -             1342   +8/-8     6,929
30    27-32        OpenAI        Proprietary   1341   +8/-8     6,476
31    28-33        Moonshot      Modified MIT  1331   +7/-7     13,314
32    28-35        OpenAI        Proprietary   1330   +9/-9     6,508
33    31-36        DeepSeek      MIT           1322   +8/-8     7,910
34    32-36        MiniMax       Apache 2.0    1313   +9/-9     8,833
35    33-36        Anthropic     Proprietary   1310   +7/-7     14,613
36    32-37        -             -             1308   +13/-13   2,146
37    36-38        DeepSeek      MIT           1287   +10/-10   5,129
38    37-38        Alibaba       Apache 2.0    1282   +7/-7     14,283
39    39-44        KwaiKAT       Proprietary   1260   +15/-15   1,954
40    39-45        Alibaba       Apache 2.0    1257   +16/-16   1,698
41    39-45        Google        Proprietary   1255   +17/-17   1,391
42    39-45        OpenAI        Proprietary   1244   +17/-17   1,534
43    39-45        Alibaba       Proprietary   1243   +17/-17   1,486
44    39-45        xAI           Proprietary   1236   +9/-9     7,131
45    40-48        Mistral       Apache 2.0    1224   +20/-20   1,039
46    45-48        Google        Proprietary   1206   +13/-13   3,453
47    45-48        xAI           Proprietary   1205   +19/-19   1,267
48    45-48        Mistral       Modified MIT  1199   +16/-16   1,687
49    49-50        xAI           Proprietary   1154   +22/-22   968
50    49-51        xAI           Proprietary   1142   +21/-21   1,017
51    50-51        Mistral       Proprietary   1100   +22/-22   1,020

Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)

Battle Count for Each Combination of Models (without Ties)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
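
The plot titles above name four summaries of pairwise battle data. This page does not show how they are computed, so the following is only a minimal Python sketch under stated assumptions: the battle records, the model names (`m1`, `m2`, `m3`), and every function name are hypothetical. The bootstrap here resamples battles to put an interval on the average win rate, which is a simpler stand-in for an interval on a fitted model-strength score.

```python
import random
from collections import defaultdict

# Hypothetical toy battles: (model_a, model_b, winner), winner in {"a", "b", "tie"}.
BATTLES = [
    ("m1", "m2", "a"), ("m1", "m2", "a"), ("m2", "m1", "a"),
    ("m1", "m3", "a"), ("m3", "m1", "tie"),
    ("m2", "m3", "b"), ("m2", "m3", "a"), ("m3", "m2", "b"),
]
MODELS = ["m1", "m2", "m3"]


def pairwise_counts(battles):
    """Return wins[x][y] = number of non-tied battles in which x beat y."""
    wins = defaultdict(lambda: defaultdict(int))
    for a, b, winner in battles:
        if winner == "a":
            wins[a][b] += 1
        elif winner == "b":
            wins[b][a] += 1
        # Ties are skipped, matching the "without ties" wording of the plots.
    return wins


def battle_count(wins, x, y):
    """Number of non-tied battles between x and y."""
    return wins[x][y] + wins[y][x]


def win_fraction(wins, x, y):
    """Fraction of non-tied x-vs-y battles won by x, or None if they never met."""
    n = battle_count(wins, x, y)
    return wins[x][y] / n if n else None


def average_win_rate(wins, models):
    """Unweighted mean of win fractions over opponents (uniform opponent sampling)."""
    rates = {}
    for x in models:
        fracs = [win_fraction(wins, x, y) for y in models if y != x]
        fracs = [f for f in fracs if f is not None]
        rates[x] = sum(fracs) / len(fracs) if fracs else None
    return rates


def bootstrap_interval(battles, models, target, rounds=1000, seed=0):
    """Percentile interval (2.5% to 97.5%) for target's average win rate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        resampled = rng.choices(battles, k=len(battles))  # sample with replacement
        rate = average_win_rate(pairwise_counts(resampled), models)[target]
        if rate is not None:
            samples.append(rate)
    samples.sort()
    low = samples[int(0.025 * (len(samples) - 1))]
    high = samples[int(0.975 * (len(samples) - 1))]
    return low, high


if __name__ == "__main__":
    wins = pairwise_counts(BATTLES)
    print(battle_count(wins, "m1", "m2"))
    print(win_fraction(wins, "m1", "m2"))
    print(average_win_rate(wins, MODELS))
    print(bootstrap_interval(BATTLES, MODELS, "m1"))
```

Averaging per-opponent win fractions, rather than pooling all battles, keeps a model from looking stronger just because it was paired more often with weak opponents. That is one reading of "uniform sampling" in the plot title, not a confirmed description of the site's method.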