About FormationEval
Overview
FormationEval is a public benchmark suite for evaluating language models on petroleum geoscience and subsurface topics. The currently evaluated public benchmark is the 505-question MCQ v0.1 track (created Christmas 2025), which was built to serve the project's original goal of comparing models.
The suite now also includes two imported public tracks: 1027 DISKOS-QA items and 100 SPE MCQs. Both are exposed with provenance and licensing notes, but neither is yet part of the public leaderboard or quiz flow.
72 language models have been evaluated on the MCQ track. A full rerun on the expanded suite is pending.
Current suite status
The FormationEval OG benchmark (505 MCQs) was created at Christmas 2025. FormationEval now also includes the imported DISKOS-QA (17 March 2026) and SPE MCQ (21 March 2026) tracks. The public leaderboard and quiz still reflect the evaluated 505-question MCQ v0.1 track. A full rerun on the expanded suite is pending because this is a self-funded, one-person project and evaluating the expanded suite requires materially more token spend.
If you want to collaborate, support reruns or discuss related research and engineering work, contact almaz.ermilov@gmail.com.
MCQ v0.1
Evaluated
Created Christmas 2025. 505 questions, 72 evaluated models, public leaderboard and quiz live.
DISKOS-QA v0.2
Imported
Imported 17 March 2026. 1027 QA items added to the public suite with provenance and licensing notes.
SPE MCQ v0.3
Imported
Imported 21 March 2026. 100 MCQs from Yohanes Nuwara, added with explicit provenance and figure support. Rerun pending.
Domain distribution
The table below covers the evaluated MCQ v0.1 track only. Questions are tagged with 1 to 3 domains, so percentages sum to more than 100%.
| Domain | Count | Percentage |
|---|---|---|
| Petrophysics | 272 | 53.9% |
| Petroleum Geology | 151 | 29.9% |
| Sedimentology | 98 | 19.4% |
| Geophysics | 80 | 15.8% |
| Reservoir Engineering | 43 | 8.5% |
| Drilling Engineering | 24 | 4.8% |
| Production Engineering | 14 | 2.8% |
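The percentages in the table can be reproduced from the counts with a short Python sketch. It assumes each percentage is the domain count divided by the 505 questions in the track, rounded to one decimal place. The variable and function names are illustrative and are not taken from the project repository.

```python
# Counts copied from the domain table above (MCQ v0.1 track only).
TOTAL_QUESTIONS = 505

domain_counts = {
    "Petrophysics": 272,
    "Petroleum Geology": 151,
    "Sedimentology": 98,
    "Geophysics": 80,
    "Reservoir Engineering": 43,
    "Drilling Engineering": 24,
    "Production Engineering": 14,
}


def percentage(count, total=TOTAL_QUESTIONS):
    """Share of questions carrying a tag, as a percentage with one decimal."""
    return round(100 * count / total, 1)


# Assumption: each percentage is relative to the number of questions,
# not to the number of tags, so the column can exceed 100% in total.
domain_percentages = {d: percentage(c) for d, c in domain_counts.items()}

total_tags = sum(domain_counts.values())
mean_tags_per_question = total_tags / TOTAL_QUESTIONS

for domain, pct in domain_percentages.items():
    print(f"{domain}: {pct}%")
print(f"Total tags: {total_tags}")
print(f"Mean tags per question: {mean_tags_per_question:.2f}")
```

The counts add up to 682 tags across 505 questions, about 1.35 tags per question, which is why the percentage column totals roughly 135% rather than 100%.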
Difficulty distribution
The difficulty labels below also cover the evaluated MCQ v0.1 track only. The imported DISKOS-QA and SPE MCQ tracks do not use the same difficulty schema as the quiz and leaderboard flow.
| Difficulty | Count | Percentage |
|---|---|---|
| Easy | 132 | 26.1% |
| Medium | 274 | 54.3% |
| Hard | 99 | 19.6% |
Source materials
FormationEval combines an authored MCQ track with imported tracks. The MCQ items are concept-based derivations with source references. The imported DISKOS-QA rows keep upstream provenance and are published as a separate track.
Licensing note
The evaluated FormationEval MCQ track remains under the project license and paper citation. The imported DISKOS-QA track is not relicensed as project-authored content. It is redistributed with upstream attribution and a separate third-party notice. The website reflects this split and does not imply that the full suite shares one blanket license.
For implementation details and the current attribution wording, see the main repository README and third-party notices.
Citation
The current paper covers the FormationEval MCQ v0.1 track. Use the citation below for that release.
@misc{ermilov2026formationeval,
  title={FormationEval, an open multiple-choice benchmark for petroleum geoscience},
  author={Almaz Ermilov},
  year={2026},
  eprint={2601.02158},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.02158},
  doi={10.48550/arXiv.2601.02158}
}
About the author
Almaz Ermilov
Former petrophysicist, now a full-time software engineer. Current work and research focus on LLM transparency, control and security in high-hazard industries.
If you want to collaborate, support reruns or discuss related research and engineering work, contact almaz.ermilov@gmail.com.