SciLib
A laboratory for semantic analysis of scientific data. Ontology, knowledge graph and vector space — a single contract.
Active projects
What we do
Семантический контракт между формальной онтологией, графом знаний и векторным пространством. Первая предметная область — математика: Lean4 Mathlib.
MathLib Explorer
Semantic and graph navigation across 213 000+ Mathlib statements. Browse typed dependencies between theorems.
Open → ⊢Graph RAG
Premise retrieval for Lean4 goals over a formal knowledge graph. Eight pipeline steps, no LLM calls on the hot path.
Open → ∂Lab
Published research, open models and datasets of the SciLib lab.
Open →Journal
From the blog
Simplex over pairs: geometry of multimodal mathematics
Why SciLibModal uses centroid geometry instead of the pairwise CLIP scheme.
04 May 2026 · A. Khalov ∇SciLib-GRC21: eight steps without an LLM
We walk through the design of our pipeline that retrieves premises for Lean4 goals.
04 May 2026 · A. Khalov ∇scilibai.ru is open
We have launched the public part of the SciLib lab.
04 May 2026 · Команда SciLibAbout
SciLib is a laboratory for semantic analysis of scientific data. We build infrastructure where the meaning of a scientific object (what it means), its representation (how it is expressed — text, formula, code, image) and its resource (where and how it is stored) are kept separate and connected by explicit semantic relations. This architecture provides addressee-invariance: the same knowledge is read by humans and processed by programs without losing semantics.
Our first domain is formalized mathematics. Built on top of it: the SciLib ontology (a modular OWL/DL ontology with three levels of description), a knowledge graph over Mathlib (213 000+ statements, typed edges usesInType/usesInValue, 660+ subclasses with provenance), a multimodal vectorization, and a premise retrieval pipeline driven by deterministic graph expansion.
All components are reachable through public APIs: /grag/* for premise retrieval and Lean4 verification, /mcp/* for semantic search, graph queries and corpus access. Most artifacts are released under open licenses with attribution; some are available on request.