FormationEval

72 language models evaluated on 505 petroleum geoscience multiple-choice questions (MCQs)

505 questions, 72 models, 7 domains

Model comparison

All 72 models ranked by accuracy. Filter by company or type, or view domain-specific performance.


Leaderboard

| # | Model | Company | Open | Price (In/Out) | Accuracy | Correct |
|---|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview | Google | No | $2.00/$12.00 | 99.8% | 504/505 |
| 2 | glm-4.7 | Zhipu | Yes | $0.40/$1.50 | 98.6% | 498/505 |
| 3 | gemini-3-flash-preview | Google | No | $0.50/$3.00 | 98.2% | 496/505 |
| 4 | gemini-2.5-pro | Google | No | $1.25/$10.00 | 97.8% | 494/505 |
| 5 | grok-4.1-fast | xAI | No | $0.20/$0.50 | 97.6% | 493/505 |
| 6 | gpt-5.2-chat-medium | OpenAI | No | $1.75/$14.00 | 97.4% | 492/505 |
| 7 | kimi-k2-thinking | Moonshot | No | $0.40/$1.75 | 97.2% | 491/505 |
| 8 | claude-opus-4.5 | Anthropic | No | $5.00/$25.00 | 97.0% | 490/505 |
| 9 | gpt-5.2-chat-high | OpenAI | No | $1.75/$14.00 | 96.8% | 489/505 |
| 10 | gpt-5.2-chat-low | OpenAI | No | $1.75/$14.00 | 96.8% | 489/505 |
| 11 | gpt-5-mini-medium | OpenAI | No | $0.25/$2.00 | 96.4% | 487/505 |
| 12 | gpt-5.1-chat-medium | OpenAI | No | $1.25/$10.00 | 96.4% | 487/505 |
| 13 | deepseek-r1 | DeepSeek | Yes | $0.30/$1.20 | 96.2% | 486/505 |
| 14 | grok-4-fast | xAI | No | $0.20/$0.50 | 96.0% | 485/505 |
| 15 | gpt-5-mini-high | OpenAI | No | $0.25/$2.00 | 95.6% | 483/505 |
| 16 | gpt-5-mini-low | OpenAI | No | $0.25/$2.00 | 95.2% | 481/505 |
| 17 | o4-mini-high | OpenAI | No | $1.10/$4.40 | 95.2% | 481/505 |
| 18 | gemini-2.5-flash | Google | No | $0.30/$2.50 | 95.0% | 480/505 |
| 19 | o4-mini-medium | OpenAI | No | $1.10/$4.40 | 95.0% | 480/505 |
| 20 | grok-3-mini | xAI | No | $0.30/$0.50 | 95.0% | 480/505 |
| 21 | deepseek-v3.2 | DeepSeek | Yes | $0.22/$0.32 | 94.9% | 479/505 |
| 22 | gpt-5.1-chat-low | OpenAI | No | $1.25/$10.00 | 94.9% | 479/505 |
| 23 | o3-mini-low | OpenAI | No | $1.10/$4.40 | 94.9% | 479/505 |
| 24 | o3-mini-medium | OpenAI | No | $1.10/$4.40 | 94.9% | 479/505 |
| 25 | claude-3.7-sonnet | Anthropic | No | $3.00/$15.00 | 94.7% | 478/505 |
| 26 | o3-mini-high | OpenAI | No | $1.10/$4.40 | 94.7% | 478/505 |
| 27 | gpt-5-chat | OpenAI | No | $1.25/$10.00 | 94.5% | 477/505 |
| 28 | o4-mini-low | OpenAI | No | $1.10/$4.40 | 94.3% | 476/505 |
| 29 | gpt-5.1-chat-high | OpenAI | No | $1.25/$10.00 | 93.9% | 474/505 |
| 30 | gpt-4.1 | OpenAI | No | $2.00/$8.00 | 93.7% | 473/505 |
| 31 | gemini-2.0-flash-001 | Google | No | $0.10/$0.40 | 93.3% | 471/505 |
| 32 | gpt-5-nano-low | OpenAI | No | $0.05/$0.40 | 93.3% | 471/505 |
| 33 | llama-4-scout | Meta | Yes | $0.08/$0.30 | 93.1% | 470/505 |
| 34 | mistral-medium-3.1 | Mistral | Yes | $0.40/$2.00 | 93.1% | 470/505 |
| 35 | qwen3-235b-a22b-2507 | Alibaba | Yes | $0.07/$0.46 | 93.1% | 470/505 |
| 36 | qwen3-30b-a3b-thinking-2507 | Alibaba | Yes | $0.05/$0.34 | 93.1% | 470/505 |
| 37 | gpt-4o | OpenAI | No | $2.50/$10.00 | 92.9% | 469/505 |
| 38 | gpt-5-nano-high | OpenAI | No | $0.05/$0.40 | 92.9% | 469/505 |
| 39 | gpt-5-nano-medium | OpenAI | No | $0.05/$0.40 | 92.9% | 469/505 |
| 40 | minimax-m2 | MiniMax | No | $0.20/$1.00 | 92.9% | 469/505 |
| 41 | qwen3-14b | Alibaba | Yes | $0.05/$0.22 | 92.9% | 469/505 |
| 42 | qwen3-32b | Alibaba | Yes | $0.08/$0.24 | 92.1% | 465/505 |
| 43 | gpt-4.1-mini | OpenAI | No | $0.40/$1.60 | 91.7% | 463/505 |
| 44 | claude-haiku-4.5 | Anthropic | No | $1.00/$5.00 | 91.5% | 462/505 |
| 45 | gemini-2.5-flash-lite | Google | No | $0.10/$0.40 | 91.3% | 461/505 |
| 46 | gpt-oss-120b | OpenAI | Yes | $0.04/$0.19 | 90.7% | 458/505 |
| 47 | qwen3-vl-8b-thinking | Alibaba | Yes | $0.18/$2.10 | 90.3% | 456/505 |
| 48 | mistral-small-3.2-24b-instruct | Mistral | Yes | $0.06/$0.18 | 89.3% | 451/505 |
| 49 | gpt-oss-20b | OpenAI | Yes | $0.03/$0.14 | 89.3% | 451/505 |
| 50 | claude-sonnet-4.5 | Anthropic | No | $3.00/$15.00 | 89.1% | 450/505 |
| 51 | mistral-small-24b-instruct-2501 | Mistral | Yes | $0.03/$0.11 | 88.7% | 448/505 |
| 52 | qwen3-8b | Alibaba | Yes | $0.03/$0.11 | 88.7% | 448/505 |
| 53 | phi-4-reasoning-plus | Microsoft | Yes | $0.07/$0.35 | 87.7% | 443/505 |
| 54 | ministral-14b-2512 | Mistral | Yes | $0.20/$0.20 | 87.7% | 443/505 |
| 55 | qwen3-vl-8b-instruct | Alibaba | Yes | $0.06/$0.40 | 87.5% | 442/505 |
| 56 | glm-4-32b | Zhipu | Yes | $0.10/$0.10 | 87.3% | 441/505 |
| 57 | ministral-8b-2512 | Mistral | Yes | $0.15/$0.15 | 86.9% | 439/505 |
| 58 | gpt-4.1-nano | OpenAI | No | $0.10/$0.40 | 86.1% | 435/505 |
| 59 | gemma-3-27b-it | Google | Yes | $0.04/$0.15 | 85.3% | 431/505 |
| 60 | deepseek-r1-0528-qwen3-8b | DeepSeek | Yes | $0.02/$0.10 | 85.1% | 430/505 |
| 61 | gpt-4o-mini | OpenAI | No | $0.15/$0.60 | 84.8% | 428/505 |
| 62 | claude-3.5-haiku | Anthropic | No | $0.80/$4.00 | 84.0% | 424/505 |
| 63 | gemma-3-12b-it | Google | Yes | $0.03/$0.10 | 82.2% | 415/505 |
| 64 | nemotron-nano-9b-v2 | Nvidia | Yes | $0.04/$0.16 | 79.6% | 402/505 |
| 65 | ministral-3b-2512 | Mistral | Yes | $0.10/$0.10 | 79.2% | 400/505 |
| 66 | mistral-nemo | Mistral | Yes | $0.02/$0.04 | 78.8% | 398/505 |
| 67 | nemotron-3-nano-30b-a3b | Nvidia | Yes | $0.06/$0.24 | 77.4% | 391/505 |
| 68 | nemotron-nano-12b-v2-vl | Nvidia | Yes | $0.20/$0.60 | 77.4% | 391/505 |
| 69 | gemma-3n-e4b-it | Google | Yes | $0.02/$0.04 | 75.2% | 380/505 |
| 70 | llama-3.1-8b-instruct | Meta | Yes | $0.02/$0.03 | 72.5% | 366/505 |
| 71 | gemma-3-4b-it | Google | Yes | $0.02/$0.07 | 71.3% | 360/505 |
| 72 | llama-3.2-3b-instruct | Meta | Yes | $0.02/$0.02 | 57.6% | 291/505 |
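
The Accuracy column appears to be the Correct count divided by the 505 questions, shown as a percentage with one decimal place. The following is a minimal sketch of that arithmetic on a few leaderboard rows; the function name `accuracy_pct` and the rounding rule are assumptions for illustration, not the benchmark's own code.

```python
TOTAL_QUESTIONS = 505

# (model, correct answers) copied from a few leaderboard rows
rows = [
    ("claude-opus-4.5", 490),
    ("llama-3.2-3b-instruct", 291),
    ("gemini-3-pro-preview", 504),
    ("glm-4.7", 498),
]


def accuracy_pct(correct, total=TOTAL_QUESTIONS):
    """Accuracy as a percentage, rounded to one decimal place (assumed rule)."""
    return round(100 * correct / total, 1)


# Rank by number of correct answers, highest first
ranked = sorted(rows, key=lambda row: row[1], reverse=True)

for rank, (model, correct) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {accuracy_pct(correct)}% ({correct}/{TOTAL_QUESTIONS})")
```

Because the benchmark has 505 questions, one additional correct answer moves accuracy by roughly 0.2 percentage points, so small gaps between neighbouring ranks correspond to only a handful of questions.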


Visualizations

Accuracy vs price

Higher-accuracy models tend to be more expensive. Green dots are open-weight models.

Chart views: top 33 models; top 33 open-weight models.
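
One way to read an accuracy-versus-price chart is to look for models that no cheaper model matches or beats on accuracy. The sketch below computes such a frontier for a handful of leaderboard rows. It is an illustration under stated assumptions: it compares output prices only, the page does not state the price unit, and the helper name `pareto_frontier` is hypothetical.

```python
# (model, output price in dollars as listed, accuracy %) from a few leaderboard rows
models = [
    ("gemini-3-pro-preview", 12.00, 99.8),
    ("glm-4.7", 1.50, 98.6),
    ("grok-4.1-fast", 0.50, 97.6),
    ("claude-opus-4.5", 25.00, 97.0),
    ("deepseek-v3.2", 0.32, 94.9),
    ("gpt-4o", 10.00, 92.9),
    ("gpt-oss-120b", 0.19, 90.7),
    ("llama-3.2-3b-instruct", 0.02, 57.6),
]


def pareto_frontier(rows):
    """Return model names that no cheaper-or-equal model matches or beats on accuracy."""
    frontier = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on a price tie, higher accuracy first
    for name, price, accuracy in sorted(rows, key=lambda r: (r[1], -r[2])):
        if accuracy > best_accuracy:
            frontier.append(name)
            best_accuracy = accuracy
    return frontier


for name in pareto_frontier(models):
    print(name)
```

In this sample, gpt-4o and claude-opus-4.5 fall off the frontier because a cheaper listed model scores higher. A different price measure, such as a blend of input and output prices, could change which models qualify.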

Domain performance heatmap

Accuracy breakdown by domain for the top 50 models.

| Model | Drilling | Geophysics | Pet. Geology | Petrophysics | Production | Reservoir | Sediment. |
|---|---|---|---|---|---|---|---|
| gemini-3-pro-preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| glm-4.7 | 100 | 100 | 99 | 98 | 100 | 100 | 99 |
| gemini-3-flash-preview | 100 | 99 | 99 | 97 | 100 | 100 | 99 |
| gemini-2.5-pro | 96 | 99 | 99 | 97 | 93 | 100 | 100 |
| grok-4.1-fast | 96 | 100 | 99 | 96 | 100 | 100 | 100 |
| gpt-5.2-chat-medium | 96 | 100 | 99 | 96 | 100 | 100 | 99 |
| kimi-k2-thinking | 96 | 99 | 99 | 96 | 100 | 98 | 98 |
| claude-opus-4.5 | 96 | 96 | 98 | 96 | 100 | 100 | 97 |
| gpt-5.2-chat-high | 96 | 100 | 99 | 95 | 100 | 100 | 98 |
| gpt-5.2-chat-low | 96 | 99 | 99 | 96 | 100 | 98 | 98 |
| gpt-5-mini-medium | 96 | 100 | 98 | 95 | 93 | 100 | 99 |
| gpt-5.1-chat-medium | 96 | 98 | 99 | 95 | 100 | 100 | 98 |
| deepseek-r1 | 96 | 98 | 99 | 95 | 100 | 100 | 97 |
| grok-4-fast | 96 | 100 | 99 | 93 | 100 | 100 | 99 |
| gpt-5-mini-high | 96 | 100 | 99 | 93 | 93 | 100 | 100 |
| gpt-5-mini-low | 96 | 100 | 97 | 92 | 100 | 98 | 99 |
| o4-mini-high | 96 | 100 | 97 | 92 | 100 | 100 | 100 |
| gemini-2.5-flash | 88 | 98 | 99 | 93 | 100 | 100 | 98 |
| o4-mini-medium | 92 | 99 | 98 | 92 | 93 | 100 | 99 |
| grok-3-mini | 96 | 98 | 98 | 92 | 100 | 98 | 98 |
| deepseek-v3.2 | 92 | 96 | 97 | 92 | 100 | 100 | 97 |
| gpt-5.1-chat-low | 92 | 93 | 97 | 95 | 100 | 93 | 98 |
| o3-mini-low | 96 | 99 | 98 | 92 | 100 | 98 | 97 |
| o3-mini-medium | 96 | 99 | 99 | 92 | 100 | 100 | 97 |
| claude-3.7-sonnet | 92 | 94 | 95 | 93 | 100 | 100 | 96 |
| o3-mini-high | 96 | 99 | 98 | 92 | 100 | 95 | 97 |
| gpt-5-chat | 96 | 91 | 97 | 93 | 100 | 98 | 97 |
| o4-mini-low | 96 | 99 | 97 | 91 | 93 | 98 | 99 |
| gpt-5.1-chat-high | 96 | 89 | 96 | 93 | 100 | 93 | 99 |
| gpt-4.1 | 96 | 90 | 95 | 92 | 100 | 95 | 97 |
| gemini-2.0-flash-001 | 100 | 96 | 97 | 90 | 93 | 98 | 99 |
| gpt-5-nano-low | 100 | 95 | 97 | 90 | 86 | 95 | 98 |
| llama-4-scout | 88 | 98 | 96 | 90 | 100 | 98 | 98 |
| mistral-medium-3.1 | 96 | 95 | 97 | 89 | 100 | 100 | 98 |
| qwen3-235b-a22b-2507 | 92 | 93 | 97 | 91 | 79 | 95 | 96 |
| qwen3-30b-a3b-thinking-2507 | 100 | 96 | 98 | 89 | 93 | 98 | 97 |
| gpt-4o | 92 | 90 | 96 | 90 | 100 | 98 | 97 |
| gpt-5-nano-high | 96 | 96 | 98 | 89 | 86 | 100 | 97 |
| gpt-5-nano-medium | 96 | 95 | 98 | 89 | 93 | 100 | 96 |
| minimax-m2 | 96 | 94 | 95 | 90 | 86 | 98 | 96 |
| qwen3-14b | 96 | 95 | 97 | 90 | 93 | 95 | 96 |
| qwen3-32b | 88 | 96 | 95 | 89 | 86 | 100 | 97 |
| gpt-4.1-mini | 88 | 90 | 95 | 89 | 100 | 95 | 98 |
| claude-haiku-4.5 | 92 | 95 | 95 | 88 | 93 | 100 | 96 |
| gemini-2.5-flash-lite | 100 | 94 | 93 | 89 | 79 | 93 | 95 |
| gpt-oss-120b | 88 | 94 | 94 | 88 | 100 | 95 | 91 |
| qwen3-vl-8b-thinking | 92 | 94 | 93 | 87 | 93 | 95 | 95 |
| mistral-small-3.2-24b-instruct | 92 | 91 | 93 | 86 | 93 | 95 | 95 |
| gpt-oss-20b | 92 | 96 | 93 | 85 | 100 | 93 | 91 |
| claude-sonnet-4.5 | 88 | 86 | 89 | 91 | 100 | 95 | 83 |
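
The heatmap can also be read row by row to find each model's weakest domain, or column by column to compare domains. Here is a minimal sketch using a few rows copied from the table. The helper names are hypothetical. Because the page does not give the number of questions per domain, the column means are unweighted and will not reproduce the overall accuracy figures.

```python
from statistics import mean

DOMAINS = [
    "Drilling", "Geophysics", "Pet. Geology", "Petrophysics",
    "Production", "Reservoir", "Sediment.",
]

# Per-domain accuracy (%) for a few heatmap rows, in DOMAINS order
scores = {
    "glm-4.7": [100, 100, 99, 98, 100, 100, 99],
    "claude-opus-4.5": [96, 96, 98, 96, 100, 100, 97],
    "qwen3-235b-a22b-2507": [92, 93, 97, 91, 79, 95, 96],
    "claude-sonnet-4.5": [88, 86, 89, 91, 100, 95, 83],
}


def weakest_domains(model):
    """Return (lowest score, list of domains with that score) for one model."""
    row = scores[model]
    lowest = min(row)
    return lowest, [d for d, s in zip(DOMAINS, row) if s == lowest]


def domain_means():
    """Unweighted mean score per domain across the sampled models."""
    columns = zip(*scores.values())
    return {d: round(mean(col), 1) for d, col in zip(DOMAINS, columns)}


for model in scores:
    lowest, domains = weakest_domains(model)
    print(f"{model}: lowest {lowest} in {', '.join(domains)}")

print(domain_means())
```

Returning every tied domain rather than only the first avoids singling out one column when several share the same minimum, as in the claude-opus-4.5 row.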