Model comparison
All 72 models, ranked by overall accuracy on a 505-question benchmark; per-domain performance is broken out below.
Leaderboard
| # | Model | Company | Open | Price In/Out ($/M tokens) | Accuracy | Correct |
|---|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview | Google | No | $2.00/$12.00 | 99.8% | 504/505 |
| 2 | glm-4.7 | Zhipu | Yes | $0.40/$1.50 | 98.6% | 498/505 |
| 3 | gemini-3-flash-preview | Google | No | $0.50/$3.00 | 98.2% | 496/505 |
| 4 | gemini-2.5-pro | Google | No | $1.25/$10.00 | 97.8% | 494/505 |
| 5 | grok-4.1-fast | xAI | No | $0.20/$0.50 | 97.6% | 493/505 |
| 6 | gpt-5.2-chat-medium | OpenAI | No | $1.75/$14.00 | 97.4% | 492/505 |
| 7 | kimi-k2-thinking | Moonshot | No | $0.40/$1.75 | 97.2% | 491/505 |
| 8 | claude-opus-4.5 | Anthropic | No | $5.00/$25.00 | 97.0% | 490/505 |
| 9 | gpt-5.2-chat-high | OpenAI | No | $1.75/$14.00 | 96.8% | 489/505 |
| 10 | gpt-5.2-chat-low | OpenAI | No | $1.75/$14.00 | 96.8% | 489/505 |
| 11 | gpt-5-mini-medium | OpenAI | No | $0.25/$2.00 | 96.4% | 487/505 |
| 12 | gpt-5.1-chat-medium | OpenAI | No | $1.25/$10.00 | 96.4% | 487/505 |
| 13 | deepseek-r1 | DeepSeek | Yes | $0.30/$1.20 | 96.2% | 486/505 |
| 14 | grok-4-fast | xAI | No | $0.20/$0.50 | 96.0% | 485/505 |
| 15 | gpt-5-mini-high | OpenAI | No | $0.25/$2.00 | 95.6% | 483/505 |
| 16 | gpt-5-mini-low | OpenAI | No | $0.25/$2.00 | 95.2% | 481/505 |
| 17 | o4-mini-high | OpenAI | No | $1.10/$4.40 | 95.2% | 481/505 |
| 18 | gemini-2.5-flash | Google | No | $0.30/$2.50 | 95.0% | 480/505 |
| 19 | o4-mini-medium | OpenAI | No | $1.10/$4.40 | 95.0% | 480/505 |
| 20 | grok-3-mini | xAI | No | $0.30/$0.50 | 95.0% | 480/505 |
| 21 | deepseek-v3.2 | DeepSeek | Yes | $0.22/$0.32 | 94.9% | 479/505 |
| 22 | gpt-5.1-chat-low | OpenAI | No | $1.25/$10.00 | 94.9% | 479/505 |
| 23 | o3-mini-low | OpenAI | No | $1.10/$4.40 | 94.9% | 479/505 |
| 24 | o3-mini-medium | OpenAI | No | $1.10/$4.40 | 94.9% | 479/505 |
| 25 | claude-3.7-sonnet | Anthropic | No | $3.00/$15.00 | 94.7% | 478/505 |
| 26 | o3-mini-high | OpenAI | No | $1.10/$4.40 | 94.7% | 478/505 |
| 27 | gpt-5-chat | OpenAI | No | $1.25/$10.00 | 94.5% | 477/505 |
| 28 | o4-mini-low | OpenAI | No | $1.10/$4.40 | 94.3% | 476/505 |
| 29 | gpt-5.1-chat-high | OpenAI | No | $1.25/$10.00 | 93.9% | 474/505 |
| 30 | gpt-4.1 | OpenAI | No | $2.00/$8.00 | 93.7% | 473/505 |
| 31 | gemini-2.0-flash-001 | Google | No | $0.10/$0.40 | 93.3% | 471/505 |
| 32 | gpt-5-nano-low | OpenAI | No | $0.05/$0.40 | 93.3% | 471/505 |
| 33 | llama-4-scout | Meta | Yes | $0.08/$0.30 | 93.1% | 470/505 |
| 34 | mistral-medium-3.1 | Mistral | Yes | $0.40/$2.00 | 93.1% | 470/505 |
| 35 | qwen3-235b-a22b-2507 | Alibaba | Yes | $0.07/$0.46 | 93.1% | 470/505 |
| 36 | qwen3-30b-a3b-thinking-2507 | Alibaba | Yes | $0.05/$0.34 | 93.1% | 470/505 |
| 37 | gpt-4o | OpenAI | No | $2.50/$10.00 | 92.9% | 469/505 |
| 38 | gpt-5-nano-high | OpenAI | No | $0.05/$0.40 | 92.9% | 469/505 |
| 39 | gpt-5-nano-medium | OpenAI | No | $0.05/$0.40 | 92.9% | 469/505 |
| 40 | minimax-m2 | MiniMax | No | $0.20/$1.00 | 92.9% | 469/505 |
| 41 | qwen3-14b | Alibaba | Yes | $0.05/$0.22 | 92.9% | 469/505 |
| 42 | qwen3-32b | Alibaba | Yes | $0.08/$0.24 | 92.1% | 465/505 |
| 43 | gpt-4.1-mini | OpenAI | No | $0.40/$1.60 | 91.7% | 463/505 |
| 44 | claude-haiku-4.5 | Anthropic | No | $1.00/$5.00 | 91.5% | 462/505 |
| 45 | gemini-2.5-flash-lite | Google | No | $0.10/$0.40 | 91.3% | 461/505 |
| 46 | gpt-oss-120b | OpenAI | Yes | $0.04/$0.19 | 90.7% | 458/505 |
| 47 | qwen3-vl-8b-thinking | Alibaba | Yes | $0.18/$2.10 | 90.3% | 456/505 |
| 48 | mistral-small-3.2-24b-instruct | Mistral | Yes | $0.06/$0.18 | 89.3% | 451/505 |
| 49 | gpt-oss-20b | OpenAI | Yes | $0.03/$0.14 | 89.3% | 451/505 |
| 50 | claude-sonnet-4.5 | Anthropic | No | $3.00/$15.00 | 89.1% | 450/505 |
| 51 | mistral-small-24b-instruct-2501 | Mistral | Yes | $0.03/$0.11 | 88.7% | 448/505 |
| 52 | qwen3-8b | Alibaba | Yes | $0.03/$0.11 | 88.7% | 448/505 |
| 53 | phi-4-reasoning-plus | Microsoft | Yes | $0.07/$0.35 | 87.7% | 443/505 |
| 54 | ministral-14b-2512 | Mistral | Yes | $0.20/$0.20 | 87.7% | 443/505 |
| 55 | qwen3-vl-8b-instruct | Alibaba | Yes | $0.06/$0.40 | 87.5% | 442/505 |
| 56 | glm-4-32b | Zhipu | Yes | $0.10/$0.10 | 87.3% | 441/505 |
| 57 | ministral-8b-2512 | Mistral | Yes | $0.15/$0.15 | 86.9% | 439/505 |
| 58 | gpt-4.1-nano | OpenAI | No | $0.10/$0.40 | 86.1% | 435/505 |
| 59 | gemma-3-27b-it | Google | Yes | $0.04/$0.15 | 85.3% | 431/505 |
| 60 | deepseek-r1-0528-qwen3-8b | DeepSeek | Yes | $0.02/$0.10 | 85.1% | 430/505 |
| 61 | gpt-4o-mini | OpenAI | No | $0.15/$0.60 | 84.8% | 428/505 |
| 62 | claude-3.5-haiku | Anthropic | No | $0.80/$4.00 | 84.0% | 424/505 |
| 63 | gemma-3-12b-it | Google | Yes | $0.03/$0.10 | 82.2% | 415/505 |
| 64 | nemotron-nano-9b-v2 | Nvidia | Yes | $0.04/$0.16 | 79.6% | 402/505 |
| 65 | ministral-3b-2512 | Mistral | Yes | $0.10/$0.10 | 79.2% | 400/505 |
| 66 | mistral-nemo | Mistral | Yes | $0.02/$0.04 | 78.8% | 398/505 |
| 67 | nemotron-3-nano-30b-a3b | Nvidia | Yes | $0.06/$0.24 | 77.4% | 391/505 |
| 68 | nemotron-nano-12b-v2-vl | Nvidia | Yes | $0.20/$0.60 | 77.4% | 391/505 |
| 69 | gemma-3n-e4b-it | Google | Yes | $0.02/$0.04 | 75.2% | 380/505 |
| 70 | llama-3.1-8b-instruct | Meta | Yes | $0.02/$0.03 | 72.5% | 366/505 |
| 71 | gemma-3-4b-it | Google | Yes | $0.02/$0.07 | 71.3% | 360/505 |
| 72 | llama-3.2-3b-instruct | Meta | Yes | $0.02/$0.02 | 57.6% | 291/505 |
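The Accuracy column is simply Correct divided by the 505 benchmark questions, rounded to one decimal place. A quick sanity check in Python, with names and counts copied from the table above:

```python
# Verify that Accuracy == Correct / 505, rounded to one decimal place.
# Names and correct-answer counts are copied from the leaderboard above.
TOTAL_QUESTIONS = 505

rows = [
    ("gemini-3-pro-preview", 504),   # listed as 99.8%
    ("glm-4.7", 498),                # listed as 98.6%
    ("llama-3.2-3b-instruct", 291),  # listed as 57.6%
]

for name, correct in rows:
    print(f"{name}: {100 * correct / TOTAL_QUESTIONS:.1f}%")
```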
Visualizations
Accuracy vs price
Higher-accuracy models tend to be more expensive. Green dots mark open-weight models.
[Scatter chart omitted; selectable views: top 33 models, top 33 open-weight models.]
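A minimal sketch of how this scatter could be reproduced from the leaderboard data. Assumptions: matplotlib as the plotting library (the page's actual charting stack is unknown) and output price on the x-axis (the original chart may use input or blended price instead); the five sample points are copied from the table above.

```python
import matplotlib.pyplot as plt

# (model, output price $/M tokens, accuracy %, open-weight?) — sampled from the leaderboard.
models = [
    ("gemini-3-pro-preview", 12.00, 99.8, False),
    ("glm-4.7", 1.50, 98.6, True),
    ("claude-opus-4.5", 25.00, 97.0, False),
    ("deepseek-v3.2", 0.32, 94.9, True),
    ("llama-3.2-3b-instruct", 0.02, 57.6, True),
]

fig, ax = plt.subplots()
for name, price, acc, open_weight in models:
    ax.scatter(price, acc, color="green" if open_weight else "gray")
    ax.annotate(name, (price, acc), fontsize=7)

ax.set_xscale("log")  # prices span three orders of magnitude
ax.set_xlabel("Output price ($ / M tokens)")
ax.set_ylabel("Accuracy (%)")
ax.set_title("Accuracy vs price")
plt.show()
```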
Domain performance heatmap
Per-domain accuracy (%) for the top 50 models; a minimal rendering sketch follows the table.
| Model | Drilling | Geophysics | Petroleum Geology | Petrophysics | Production | Reservoir | Sedimentology |
|---|---|---|---|---|---|---|---|
| gemini-3-pro-preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| glm-4.7 | 100 | 100 | 99 | 98 | 100 | 100 | 99 |
| gemini-3-flash-preview | 100 | 99 | 99 | 97 | 100 | 100 | 99 |
| gemini-2.5-pro | 96 | 99 | 99 | 97 | 93 | 100 | 100 |
| grok-4.1-fast | 96 | 100 | 99 | 96 | 100 | 100 | 100 |
| gpt-5.2-chat-medium | 96 | 100 | 99 | 96 | 100 | 100 | 99 |
| kimi-k2-thinking | 96 | 99 | 99 | 96 | 100 | 98 | 98 |
| claude-opus-4.5 | 96 | 96 | 98 | 96 | 100 | 100 | 97 |
| gpt-5.2-chat-high | 96 | 100 | 99 | 95 | 100 | 100 | 98 |
| gpt-5.2-chat-low | 96 | 99 | 99 | 96 | 100 | 98 | 98 |
| gpt-5-mini-medium | 96 | 100 | 98 | 95 | 93 | 100 | 99 |
| gpt-5.1-chat-medium | 96 | 98 | 99 | 95 | 100 | 100 | 98 |
| deepseek-r1 | 96 | 98 | 99 | 95 | 100 | 100 | 97 |
| grok-4-fast | 96 | 100 | 99 | 93 | 100 | 100 | 99 |
| gpt-5-mini-high | 96 | 100 | 99 | 93 | 93 | 100 | 100 |
| gpt-5-mini-low | 96 | 100 | 97 | 92 | 100 | 98 | 99 |
| o4-mini-high | 96 | 100 | 97 | 92 | 100 | 100 | 100 |
| gemini-2.5-flash | 88 | 98 | 99 | 93 | 100 | 100 | 98 |
| o4-mini-medium | 92 | 99 | 98 | 92 | 93 | 100 | 99 |
| grok-3-mini | 96 | 98 | 98 | 92 | 100 | 98 | 98 |
| deepseek-v3.2 | 92 | 96 | 97 | 92 | 100 | 100 | 97 |
| gpt-5.1-chat-low | 92 | 93 | 97 | 95 | 100 | 93 | 98 |
| o3-mini-low | 96 | 99 | 98 | 92 | 100 | 98 | 97 |
| o3-mini-medium | 96 | 99 | 99 | 92 | 100 | 100 | 97 |
| claude-3.7-sonnet | 92 | 94 | 95 | 93 | 100 | 100 | 96 |
| o3-mini-high | 96 | 99 | 98 | 92 | 100 | 95 | 97 |
| gpt-5-chat | 96 | 91 | 97 | 93 | 100 | 98 | 97 |
| o4-mini-low | 96 | 99 | 97 | 91 | 93 | 98 | 99 |
| gpt-5.1-chat-high | 96 | 89 | 96 | 93 | 100 | 93 | 99 |
| gpt-4.1 | 96 | 90 | 95 | 92 | 100 | 95 | 97 |
| gemini-2.0-flash-001 | 100 | 96 | 97 | 90 | 93 | 98 | 99 |
| gpt-5-nano-low | 100 | 95 | 97 | 90 | 86 | 95 | 98 |
| llama-4-scout | 88 | 98 | 96 | 90 | 100 | 98 | 98 |
| mistral-medium-3.1 | 96 | 95 | 97 | 89 | 100 | 100 | 98 |
| qwen3-235b-a22b-2507 | 92 | 93 | 97 | 91 | 79 | 95 | 96 |
| qwen3-30b-a3b-thinking-2507 | 100 | 96 | 98 | 89 | 93 | 98 | 97 |
| gpt-4o | 92 | 90 | 96 | 90 | 100 | 98 | 97 |
| gpt-5-nano-high | 96 | 96 | 98 | 89 | 86 | 100 | 97 |
| gpt-5-nano-medium | 96 | 95 | 98 | 89 | 93 | 100 | 96 |
| minimax-m2 | 96 | 94 | 95 | 90 | 86 | 98 | 96 |
| qwen3-14b | 96 | 95 | 97 | 90 | 93 | 95 | 96 |
| qwen3-32b | 88 | 96 | 95 | 89 | 86 | 100 | 97 |
| gpt-4.1-mini | 88 | 90 | 95 | 89 | 100 | 95 | 98 |
| claude-haiku-4.5 | 92 | 95 | 95 | 88 | 93 | 100 | 96 |
| gemini-2.5-flash-lite | 100 | 94 | 93 | 89 | 79 | 93 | 95 |
| gpt-oss-120b | 88 | 94 | 94 | 88 | 100 | 95 | 91 |
| qwen3-vl-8b-thinking | 92 | 94 | 93 | 87 | 93 | 95 | 95 |
| mistral-small-3.2-24b-instruct | 92 | 91 | 93 | 86 | 93 | 95 | 95 |
| gpt-oss-20b | 92 | 96 | 93 | 85 | 100 | 93 | 91 |
| claude-sonnet-4.5 | 88 | 86 | 89 | 91 | 100 | 95 | 83 |
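To re-render the heatmap locally, a minimal matplotlib sketch (an assumption; the original chart's styling is unknown) using the first three rows of the table above:

```python
import matplotlib.pyplot as plt
import numpy as np

domains = ["Drilling", "Geophysics", "Petroleum Geology", "Petrophysics",
           "Production", "Reservoir", "Sedimentology"]
models = ["gemini-3-pro-preview", "glm-4.7", "gemini-3-flash-preview"]

# Per-domain accuracy (%) copied from the first three rows of the heatmap table.
scores = np.array([
    [100, 100, 100, 100, 100, 100, 100],
    [100, 100,  99,  98, 100, 100,  99],
    [100,  99,  99,  97, 100, 100,  99],
])

fig, ax = plt.subplots()
im = ax.imshow(scores, cmap="viridis", vmin=75, vmax=100)
ax.set_xticks(range(len(domains)), labels=domains, rotation=45, ha="right")
ax.set_yticks(range(len(models)), labels=models)
# Overlay the numeric value on each cell, as in the original heatmap.
for i in range(scores.shape[0]):
    for j in range(scores.shape[1]):
        ax.text(j, i, scores[i, j], ha="center", va="center", color="white")
fig.colorbar(im, label="Accuracy (%)")
plt.tight_layout()
plt.show()
```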