About FormationEval

Overview

FormationEval is a public benchmark suite for evaluating language models on petroleum geoscience and subsurface topics. The currently evaluated public benchmark is the 505-question MCQ v0.1 track (created Christmas 2025), which was built to serve the project's original goal of comparing models.

The suite now also includes two imported public tracks: 1027 DISKOS-QA items and 100 SPE MCQs. Both are exposed with provenance and licensing notes, but neither is yet part of the public leaderboard or quiz flow.

72 language models have been evaluated on the MCQ v0.1 track. A full rerun on the expanded suite is pending.

Current suite status

The original FormationEval benchmark (505 MCQs) was created at Christmas 2025. FormationEval now also includes the imported DISKOS-QA (17 March 2026) and SPE MCQ (21 March 2026) tracks. The public leaderboard and quiz still reflect the evaluated 505-question MCQ v0.1 track. A full rerun on the expanded suite is pending because this is a self-funded, one-person project and evaluating the expanded suite requires materially more token spend.

If you want to collaborate, support reruns or discuss related research and engineering work, contact almaz.ermilov@gmail.com.

MCQ v0.1

Evaluated

Created Christmas 2025. 505 questions, 72 evaluated models, public leaderboard and quiz live.

DISKOS-QA v0.2

Imported

Imported 17 March 2026. 1027 QA items added to the public suite with provenance and licensing notes.

SPE MCQ v0.3

Imported

Imported 21 March 2026. 100 MCQs from Yohanes Nuwara added with explicit provenance and figure support. Rerun pending.

Domain distribution

The table below covers the evaluated MCQ v0.1 track only. Questions are tagged with 1 to 3 domains, so percentages sum to more than 100%.

Domain                    Count   Percentage
Petrophysics              272     53.9%
Petroleum Geology         151     29.9%
Sedimentology             98      19.4%
Geophysics                80      15.8%
Reservoir Engineering     43      8.5%
Drilling Engineering      24      4.8%
Production Engineering    14      2.8%
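
The percentages above can be reproduced from the counts with a short sketch. It assumes each percentage is the domain count divided by the 505 questions in the track, which matches the published figures. The function and variable names are illustrative and are not taken from the FormationEval repository.

```python
# Counts copied from the domain table for the MCQ v0.1 track.
TOTAL_QUESTIONS = 505

domain_counts = {
    "Petrophysics": 272,
    "Petroleum Geology": 151,
    "Sedimentology": 98,
    "Geophysics": 80,
    "Reservoir Engineering": 43,
    "Drilling Engineering": 24,
    "Production Engineering": 14,
}


def domain_percentages(counts, total):
    """Share of questions carrying each tag, as a percentage of all questions."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = domain_percentages(domain_counts, TOTAL_QUESTIONS)

# A question with several tags is counted once per tag, so the tag
# assignments outnumber the questions and the shares exceed 100% in total.
total_tags = sum(domain_counts.values())
mean_tags_per_question = total_tags / TOTAL_QUESTIONS

for name, pct in percentages.items():
    print(f"{name:<24}{pct:>6.1f}%")
print(f"Tag assignments: {total_tags}")
print(f"Mean tags per question: {mean_tags_per_question:.2f}")
```

The 682 tag assignments across 505 questions work out to about 1.35 tags per question, which is why the column totals roughly 135% rather than 100%.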

Difficulty distribution

The difficulty labels below also cover the evaluated MCQ v0.1 track only. The imported DISKOS-QA and SPE MCQ tracks do not use the same difficulty schema as the quiz and leaderboard flow.

Difficulty    Count   Percentage
Easy          132     26.1%
Medium        274     54.3%
Hard          99      19.6%

Source materials

FormationEval combines an authored MCQ track with two imported tracks, DISKOS-QA and SPE MCQ. The authored MCQ items are concept-based derivations with source references. The imported DISKOS-QA rows keep upstream provenance and are published as a separate track.

Well Logging for Earth Scientists, 2nd Edition

Darwin V. Ellis and Julian M. Singer (2007)

Textbook · Proprietary (Springer)

Petroleum Geoscience: From Sedimentary Environments to Rock Physics

Knut Bjørlykke (Ed.) (2010)

Textbook · Proprietary (Springer)

TU Delft OpenCourseWare, Applied Earth Sciences

TU Delft (2024)

Open course · CC BY-NC-SA 4.0

DISKOS-QA benchmark

George Ghon and collaborators (2026)

Imported QA benchmark · NLOD 2.0, as stated in the upstream README

Large Oil and Gas industry text dataset from Norwegian, UK and Dutch public oil and gas documents

FORCE and collaborators (2024)

Underlying corpus provenance · CC BY 4.0

SPE MCQ Dataset

Yohanes Nuwara (2025)

Imported MCQ track · MIT, as tagged in the upstream Hugging Face metadata

Study Guide for the SPE Petroleum Engineering Certification Examination (4th ed.)

Society of Petroleum Engineers (2011)

Origin note in upstream dataset card · See upstream source context and third-party notice

Licensing note

The evaluated FormationEval MCQ track remains under the project license and paper citation. The imported DISKOS-QA track is not relicensed as project-authored content. It is redistributed with upstream attribution and a separate third-party notice. The website reflects this split and does not imply that the full suite shares one blanket license.

For implementation details and the current attribution wording, see the main repository README and third-party notices.

Citation

The current paper covers the FormationEval MCQ v0.1 track. Use the citation below for that release.

@misc{ermilov2026formationeval,
  title={FormationEval, an open multiple-choice benchmark for petroleum geoscience},
  author={Almaz Ermilov},
  year={2026},
  eprint={2601.02158},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.02158},
  doi={10.48550/arXiv.2601.02158}
}

About the author

Almaz Ermilov

Former petrophysicist, now a full-time software engineer. Current work and research focus on LLM transparency, control and security in high-hazard industries.

If you want to collaborate, support reruns or discuss related research and engineering work, contact almaz.ermilov@gmail.com.

Resources