We rank the top LLMs by what we really care about: how well they perform in n8n.

| Model | Average Run Cost | Overall | Tool Use | Hallucination | Logic | Scoring | Classification | Structured Output | Speed | Cost |
|---|---|---|---|---|---|---|---|---|---|---|
| Grok 4 Fast | $0.00125 | 88 | 64 | 83 | 91 | 38 | 50 | 100 | 86 | 98 |
| Qwen3 VL 235B A22B Instruct | $0.0016 | 86 | 51 | 91 | 96 | 55 | 39 | 83 | 86 | 98 |
| Grok 4.1 Fast | $0.00125 | 84 | 58 | 89 | 89 | 34 | 50 | 92 | 75 | 97 |
| GPT-5.1 Chat | $0.01125 | 82 | 51 | 83 | 84 | 26 | 50 | 92 | 97 | 91 |
| GPT-5.1-Codex | $0.01125 | 82 | 64 | 86 | 87 | 45 | 39 | 92 | 86 | 70 |
| Claude Haiku 4.5 | $0.0075 | 80 | 51 | 97 | 77 | 14 | 50 | 92 | 93 | 87 |
| Qwen3 Max | $0.009 | 79 | 64 | 97 | 91 | 28 | 29 | 83 | 85 | 87 |
| Devstral 2 2512 | $0.00036 | 79 | 58 | 71 | 84 | 51 | 29 | 75 | 86 | 99 |
| Grok 3 Mini | $0.00175 | 78 | 64 | 83 | 82 | 49 | 19 | 92 | 70 | 96 |
| Claude Sonnet 4.5 | $0.0225 | 78 | 64 | 100 | 89 | 16 | 39 | 100 | 86 | 60 |