LLM Leaderboard
5 models
Rank  Model            Provider   SWE-bench
1     GPT-5.5          OpenAI     88.7%
2     Claude Opus 4.7  Anthropic  87.6%
3     Grok 4           xAI        83.4%
4     Gemini 3 Pro     Google     80.6%
5     DeepSeek V3.1    DeepSeek   76.2%