Official n8n AI Benchmark

We rank the top LLMs by what we really care about: how they work in n8n.

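| Model | Average Run Cost | Overall | Tool Use | Hallucination | Logic | Scoring | Classification | Structured Output | Speed | Cost |
|---|---|---|---|---|---|---|---|---|---|---|
| Grok 4 Fast | $0.00125 | 88 | 64 | 83 | 91 | 38 | 50 | 100 | 86 | 98 |
| Qwen3 VL 235B A22B Instruct | $0.0016 | 86 | 51 | 91 | 96 | 55 | 39 | 83 | 86 | 98 |
| Grok 4.1 Fast | $0.00125 | 84 | 58 | 89 | 89 | 34 | 50 | 92 | 75 | 97 |
| GPT-5.1 Chat | $0.01125 | 82 | 51 | 83 | 84 | 26 | 50 | 92 | 97 | 91 |
| GPT-5.1-Codex | $0.01125 | 82 | 64 | 86 | 87 | 45 | 39 | 92 | 86 | 70 |
| Claude Haiku 4.5 | $0.0075 | 80 | 51 | 97 | 77 | 14 | 50 | 92 | 93 | 87 |
| Qwen3 Max | $0.009 | 79 | 64 | 97 | 91 | 28 | 29 | 83 | 85 | 87 |
| Devstral 2 2512 | $0.00036 | 79 | 58 | 71 | 84 | 51 | 29 | 75 | 86 | 99 |
| Grok 3 Mini | $0.00175 | 78 | 64 | 83 | 82 | 49 | 19 | 92 | 70 | 96 |
| Claude Sonnet 4.5 | $0.0225 | 78 | 64 | 100 | 89 | 16 | 39 | 100 | 86 | 60 |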