⊕ Dataset · slm-data-v2 · v2.0
sciLibRuModal v2
Training corpus for the SciLibModal model: Mathlib mathematical objects in five modalities (EN, RU, Lean 4, LaTeX, image) anchored in the SciLib ontology.
sciLibRuModal v2 is the multimodal corpus used to train SciLibModal. Each record represents one mathematical object from Mathlib in up to five mutually consistent modalities, all anchored to a single interpretation entity in the SciLib ontology.
Record content: English and Russian statements, the Lean signature and body, LaTeX (where generated), and a formula image (where available). Together these modalities describe one semantic object.
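To make the record layout concrete, here is a minimal Python sketch of how one record might be modelled. The class name, every field name, the identifier format, and the sample values are assumptions made for illustration; the dataset's actual schema and file format are not described here. The optional fields reflect that LaTeX and the image are present only for some records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModalRecord:
    """Hypothetical layout of one record; all field names are assumptions."""

    entity_id: str                    # assumed key of the ontology entity
    statement_en: str                 # English statement
    statement_ru: str                 # Russian statement
    lean_signature: str               # Lean signature
    lean_body: str                    # Lean body
    latex: Optional[str] = None       # None where LaTeX was not generated
    image_path: Optional[str] = None  # None where no image is available

    def available_modalities(self) -> List[str]:
        """List the modalities actually present in this record."""
        present = ["en", "ru", "lean"]
        if self.latex is not None:
            present.append("latex")
        if self.image_path is not None:
            present.append("image")
        return present


# Illustrative record written for this sketch, not taken from the corpus.
example = ModalRecord(
    entity_id="example:0001",
    statement_en="Addition of natural numbers is commutative.",
    statement_ru="Сложение натуральных чисел коммутативно.",
    lean_signature="theorem Nat.add_comm (n m : Nat) : n + m = m + n",
    lean_body="<omitted>",
)

print(example.available_modalities())
```

Modelling the optional modalities as `None` rather than empty strings lets a loader tell "not generated" apart from "generated but empty" when filtering records by modality.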
Access: on request via info@scilibai.ru with attribution.
Tags: dataset, multimodal, math, lean, latex, image