WebDev Leaderboard

Compare the performance of AI models on web development tasks built in the Code Arena.
The legacy WebDev leaderboard is still available at web.lmarena.ai.

Last Updated

Dec 4, 2025

Total Votes

50,534

Total Models

/

Rank	Rank Spread (Upper-Lower)	Model	Score	Confidence Interval	Votes	Organization	License
1	1◄─►1	claude-opus-4-5-20251101-thinking-32k	1511	+14/-14	2,323	Anthropic	Proprietary
2	2◄─►3	gemini-3-pro	1476	+10/-10	7,154	Google	Proprietary
3	2◄─►3	claude-opus-4-5-20251101	1472	+14/-14	2,377	Anthropic	Proprietary
4	4◄─►8	gpt-5-medium	1399	+12/-12	3,943	OpenAI	Proprietary
5	4◄─►8	claude-sonnet-4-5-20250929-thinking-32k	1398	+9/-9	6,217	Anthropic	Proprietary
6	4◄─►8	gpt-5.1-medium	1395	+11/-11	3,429	OpenAI	Proprietary
7	4◄─►8	claude-opus-4-1-20250805	1392	+9/-9	6,028	Anthropic	Proprietary
8	4◄─►8	claude-sonnet-4-5-20250929	1387	+9/-9	7,311	Anthropic	Proprietary
9	9◄─►11	glm-4.6	1366	+10/-10	5,806	Z.ai	MIT
10	9◄─►12	gpt-5.1	1354	+10/-10	5,270	OpenAI	Proprietary
11	9◄─►12	kimi-k2-thinking-turbo	1350	+10/-10	5,118	Moonshot	Modified MIT
12	10◄─►12	gpt-5.1-codex	1341	+11/-11	3,614	OpenAI	Proprietary
13	13◄─►13	minimax-m2	1316	+10/-10	5,783	MiniMax	Apache 2.0
14	14◄─►16	deepseek-v3.2-exp	1293	+10/-10	5,154	DeepSeek AI	MIT
15	14◄─►16	qwen3-coder-480b-a35b-instruct	1289	+9/-9	5,972	Alibaba	Apache 2.0
16	14◄─►17	claude-haiku-4-5-20251001	1285	+9/-9	5,992	Anthropic	Proprietary
17	16◄─►18	KAT-Coder-Pro-V1	1264	+15/-15	1,943	KwaiKAT	Proprietary
18	17◄─►19	gpt-5.1-codex-mini	1252	+16/-16	1,564	OpenAI	Proprietary
19	18◄─►21	grok-4-1-fast-reasoning	1229	+13/-13	2,978	xAI	Proprietary
20	19◄─►21	gemini-2.5-pro	1213	+12/-12	3,504	Google	Proprietary
21	19◄─►21	grok-4.1-thinking	1205	+19/-19	1,258	xAI	Proprietary
22	22◄─►23	grok-4-fast-reasoning	1153	+22/-22	943	xAI	Proprietary
23	22◄─►24	grok-code-fast-1	1143	+21/-21	1,014	xAI	Proprietary
24	23◄─►24	devstral-medium-2507	1103	+21/-21	1,031	Mistral	Proprietary
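
The page does not say how the Rank Spread column is derived. One reading that reproduces the rows above is interval overlap: a model's best possible rank is one plus the number of models whose whole interval lies above its own, and its worst possible rank counts every model whose interval reaches above its lower bound. The sketch below is an assumption, not the leaderboard's documented method; `Row` and `rank_spread` are hypothetical names, and only the first three table rows are included.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: int
    ci: int  # symmetric half-width, e.g. 14 for "+14/-14"


def rank_spread(rows):
    """Return {model: (best_rank, worst_rank)} from interval overlap (assumed rule)."""
    spreads = {}
    for r in rows:
        low, high = r.score - r.ci, r.score + r.ci
        others = [o for o in rows if o is not r]
        # Models whose entire interval sits above ours must rank ahead of us.
        best = 1 + sum(1 for o in others if o.score - o.ci > high)
        # Models whose interval reaches above our lower bound could rank ahead of us.
        worst = 1 + sum(1 for o in others if o.score + o.ci > low)
        spreads[r.model] = (best, worst)
    return spreads


# First three rows of the table above.
TOP_ROWS = [
    Row("claude-opus-4-5-20251101-thinking-32k", 1511, 14),
    Row("gemini-3-pro", 1476, 10),
    Row("claude-opus-4-5-20251101", 1472, 14),
]

for model, (best, worst) in rank_spread(TOP_ROWS).items():
    print(f"{model}: {best} to {worst}")
```

A model's worst rank depends on every row below it, so checking lower-ranked models requires passing the full table rather than this three-row excerpt.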

Leaderboard Plots

Battle Count for Each Combination of Models (without Ties)

Confidence Intervals on Model Strength (via Bootstrapping)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles
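
The last two plot titles describe quantities that can be computed directly from battle records. As a minimal sketch, assume each battle is a `(model_a, model_b, winner)` tuple where `winner` is `"a"`, `"b"`, or `"tie"`. This record format, the function names, and the sample models `x`, `y`, `z` are all hypothetical, not the arena's actual data schema.

```python
from collections import defaultdict


def win_fractions(battles):
    """Fraction of non-tied battles that the first model of each ordered pair wins."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for a, b, winner in battles:
        if winner == "tie":
            continue  # ties are excluded, as in the plot title
        totals[(a, b)] += 1
        totals[(b, a)] += 1
        if winner == "a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1
    return {pair: wins[pair] / n for pair, n in totals.items()}


def average_win_rate(fractions):
    """Mean pairwise win fraction per model, weighting every opponent equally."""
    per_model = defaultdict(list)
    for (model, _opponent), fraction in fractions.items():
        per_model[model].append(fraction)
    return {m: sum(v) / len(v) for m, v in per_model.items()}


# Hypothetical sample data.
SAMPLE_BATTLES = [
    ("x", "y", "a"), ("x", "y", "a"), ("x", "y", "b"), ("x", "y", "tie"),
    ("y", "z", "a"),
    ("z", "x", "b"),
]

FRACTIONS = win_fractions(SAMPLE_BATTLES)
AVERAGES = average_win_rate(FRACTIONS)
print(FRACTIONS[("x", "y")], AVERAGES["x"])
```

Averaging the pairwise fractions, rather than pooling all battles, keeps heavily sampled matchups from dominating, which is one way to read "Uniform Sampling" in the plot title.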