MiniF2F Runs (50 752) · SciLib

⊕ Dataset · minif2f-50k · r1

MiniF2F Bench Logs

Full logs of MiniF2F runs across four premise-retrieval pipelines: SciLib-GRC21, LeanSearch, LeanFinder, and LeanExplore. The logs are in JSON Lines format and include pass@1 metrics verified with the Lean REPL.

Raw logs of the premise-retrieval benchmark on MiniF2F. Each row is one run of one task through one pipeline in one configuration (104 configurations × 488 tasks = 50 752 rows).

Row content: MiniF2F task id, Lean goal, method, configuration, top-k results, latency, model version, seed. pass@1 is computed against the ground truth using the Lean REPL.
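As a rough illustration of how such a log could be consumed, the sketch below reads JSON Lines rows and aggregates pass@1 per method and configuration. The field names (`method`, `configuration`, `passed`) are assumptions made for this example, as is the idea that each row carries a boolean verification outcome; the actual schema may differ. The sample rows are synthetic placeholders, not data from the dataset.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Row count stated in the description: 104 configurations x 488 tasks.
EXPECTED_ROWS = 104 * 488


def pass_at_1(lines: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Return the fraction of passing rows per (method, configuration).

    Assumes hypothetical keys "method", "configuration" and "passed";
    these names are illustrative and not confirmed by the dataset.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        key = (row["method"], row["configuration"])
        total[key] += 1
        passed[key] += bool(row["passed"])
    return {key: passed[key] / total[key] for key in total}


# Synthetic rows for demonstration only.
sample = [
    '{"task_id": "t1", "method": "demo-a", "configuration": "cfg-1", "passed": true}',
    '{"task_id": "t2", "method": "demo-a", "configuration": "cfg-1", "passed": false}',
    '{"task_id": "t1", "method": "demo-b", "configuration": "cfg-1", "passed": true}',
]

if __name__ == "__main__":
    for key, score in sorted(pass_at_1(sample).items()):
        print(key, score)
```

With a real file, the same function could be fed an open file handle, since iterating over a text file yields one line at a time; comparing the number of parsed rows with `EXPECTED_ROWS` gives a quick completeness check.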

Linked to the MiniF2F bench experiment. Available on request.

Tags: benchmark, lean, premise-selection, minif2f, logs
