Domain-Specific Reasoning for Science
AstroMLab is a community-led effort building open domain models, scientific benchmarks, and autonomous research agents for astrophysics.

Open, Calibrated, and Sovereign AI
Commercial general-purpose AI models are closed, costly to run, and often struggle with domain-specific scientific reasoning. We train and align open domain models that run locally, trace the literature accurately, and bridge unstructured publications with structured databases.
Research Pipeline
Our pipeline spans scientific workflows, data integration layers, and autonomous agents.

Domain-specific benchmarking for physical sciences
Evaluating AI in astronomy requires questions that demand doctoral-level conceptual reasoning. We design benchmarks and testing methodologies that measure genuine knowledge recall, reasoning calibration, and cost-efficiency trade-offs.
- ARA&A-derived expert benchmark (AstroMLab-1)
- Evaluation framework for scientific research assistants (EAIRA)
- Multimodal benchmarks for scientific charts and visual plots

The Model Family
Specialized model weights optimized for physical-science reasoning tasks.
The AstroSage family
AstroSage-Llama-3.1-70B
Flagship reasoning model. Trained on two decades of physics and astronomy literature to carry out multi-step scientific reasoning.
AstroSage-Llama-3.1-8B
Compact and efficient domain-specific assistant. Brings GPT-4o-level in-domain accuracy to consumer-grade hardware running locally.
AstroLLaMA-2 / 3
The original foundation line of astronomy LLMs, which established the specialized pre-training and alignment practices used across the family.
The AstroMLab series
Trace our contributions from the first astronomy benchmark to model architecture, knowledge extraction, and recommender graphs.
AstroMLab 1: Who Wins Astronomy Jeopardy!?
The first astronomy-specific LLM benchmark and a broad evaluation of proprietary and open models.
Derived from ARA&A review articles to measure high-level scientific reasoning.
AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy
First rigorous benchmarking of specialized astronomy LLMs; the first 70B-parameter astronomy model.
Demonstrates that continual pre-training pays off at scale, where naive specialization falls short.
AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model
AstroSage-8B: a domain-specialized assistant trained on two decades of astronomy literature.
Shows that domain alignment yields compact models that match commercial models on in-domain tasks.
AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model
AstroSage-70B: reasoning-capable, domain-specialized, and evaluated against 119 models.
Combines domain expertise with reasoning traces for complex scientific Q&A.
AstroMLab 5: Structured Summaries and Concept Extraction for 400,000 Astrophysics Papers
A knowledge layer over all of astro-ph: structured summaries plus a curated concept vocabulary.
Bridges unstructured literature text and structured databases for AI search agents.
Autonomous Agents & Related Publications
Forecasting concept-object knowledge graph links using ALS with similarity smoothing.
Autonomous spectral line fitting with LLM visual QA.
Multi-agent tree search emulating human reasoning on SED fitting.
Evaluating state-of-the-art models on astronomy Olympiad theory exams.
Efficiently repurposing LLaMA-3.1-8B to predict galaxy redshifts via LoRA.
Integrating AI literacy into undergraduate education, including AstroTutor.
Argonne-led multifaceted evaluation framework for scientific assistants.
A 5-step data synthesis pipeline for multimodal chart reading.
Across the community
Astrophysicists, cosmologists, and ML researchers led by Yuan-Sen Ting, spanning national labs and universities worldwide.
