SciLibModal

Multimodal encoder: a scientific object is modelled as a simplex in a shared 312-dim space. Vertices are modalities (EN, RU, Lean4, LaTeX, image); the centroid is an aggregate representation invariant to the choice of modality.

SciLibModal is the multimodal encoder for mathematical objects. It takes five modalities of one object and places them in a shared semantic space R^D, D = 312.

Idea: simplex and centroid

Each representation of an object is a vertex of a simplex. The centroid is the aggregate representation invariant to a specific modality. Training shrinks the simplex of one object (modalities should be consistent) while pushing centroids of different objects apart in the shared space.

Loss

Alignment: (1/M) · Σ ‖z_i − μ‖² shrinks the simplex.
Contrastive: pushes centroids of different objects apart (a modification of Feng et al. 2014 where the centroid plays the role of a universal object representation).
Anti-collapse regularization: prevents degenerate configurations where simplex vertices line up.

Loss cost is O(M), in contrast to the pairwise CLIP-like scheme (O(M²)).

Metrics (EXP-012)

Cross-modal recall@1 = 0.7410, recall@10 = 0.8894 (e8c, ConvNeXt-Tiny + XLM-R base).
Symbolic cluster (EN ↔ RU ↔ Lean) — recall@1 = 0.80–0.93.
Visual-formal (LaTeX ↔ image) — recall@1 ≈ 0.90.
Leave-one-out R@1 ≈ 0.999 — robustness to a missing modality.

Use

(1) Semantic search over Mathlib through the Qdrant collection scilib_mathlib_v1; (2) optional vector augmentation of SciLib-GRC21 (step 8); (3) near-duplicate detection in formalized corpora.

Access

API via the MCP tool semantic_search. Model weights and training corpus (sciLibRuModal v2) — available on request via info@scilibai.ru.