AstroMLab is a dynamic group of astrophysicists and computer scientists developing Large Language Models (LLMs) for astronomy.
We’ve achieved:
Our flagship model, AstroSage-LLaMA-3.1-8B, achieves 80.9% accuracy on the AstroMLab-1 benchmark, on par with OpenAI's GPT-4o and an eight-point improvement over the base LLaMA-3.1-8B. It operates at a fraction of the cost of other models (see AstroBench).
| Model | Score (%) |
|---|---|
| Claude-3.5-Sonnet | 85.0 |
| O1-Preview | 81.6 |
| AstroSage-LLaMA-3.1-8B (AstroMLab) | 80.9 |
| Mistral-Large-2 | 80.8 |
| O1-Mini | 80.1 |
| Grok-Beta | 79.5 |
| Gemini-1.5-Pro-002 | 78.2 |
| LLaMA-3.1-8B | 73.7 |
| AstroLLaMA-2-70B (AstroMLab) | 72.3 |
| AstroLLaMA-2-7B (UniverseTBD) | 44.3 |
All our models are available on Hugging Face.
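The snippet below is a minimal sketch of loading AstroSage-LLaMA-3.1-8B with the Hugging Face `transformers` library; the repository id `AstroMLab/AstroSage-8B` and the generation settings are assumptions, so check the AstroMLab organization page on Hugging Face for the exact names.

```python
# Minimal sketch (assumed repo id and settings), not an official quickstart.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AstroMLab/AstroSage-8B"  # assumption: verify on the AstroMLab Hugging Face page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # an 8B model in bf16 fits on a single ~24 GB GPU
    device_map="auto",
)

# AstroSage is instruction-tuned, so we assume a chat template is provided.
messages = [{"role": "user",
             "content": "What physical scale does the Jeans length describe?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```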
Contact us: astromachinelearninglab@gmail.com
Our team includes:

- Yuan-Sen Ting (The Ohio State University)
- Tirthankar Ghosal (Oak Ridge National Laboratory)
- Tijmen de Haan (KEK)
- Josh Nguyen (University of Pennsylvania)
- Rui Pan (University of Illinois Urbana-Champaign)
- Hardik Arora (Indian Institutes of Technology)
- Emily Herron (Oak Ridge National Laboratory)
- Yuwei Yang (Australian National University)
- Zechang Sun (Tsinghua University)
- Alberto Accomazzi (NASA Astrophysics Data System)
- Azton Wells (Argonne National Laboratory)
- Nesar Ramachandra (Argonne National Laboratory)
- Sandeep Madireddy (Argonne National Laboratory)
We present AstroSage-LLaMA-3.1-8B, a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, and cosmology. Through extensive data curation, massive continued pre-training, and supervised fine-tuning, we demonstrate that proper specialization of a relatively small model can achieve performance comparable to much larger flagship models.
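As a rough illustration of the general recipe described above (continued pre-training on astronomy text followed by supervised fine-tuning), the sketch below uses the Hugging Face `Trainer` on a causal LM. The corpus file, hyperparameters, and repository id are placeholders; this is not AstroMLab's actual pipeline.

```python
# Generic continued pre-training sketch; all names and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "meta-llama/Llama-3.1-8B"   # base model named in the text; repo id may differ
corpus = load_dataset("json", data_files="astronomy_corpus.jsonl")["train"]  # placeholder corpus

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

def tokenize(batch):
    # Each record is assumed to carry a plain-text "text" field (e.g. a paper chunk).
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = corpus.map(tokenize, batched=True, remove_columns=corpus.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="astro-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=1e-5,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    # mlm=False gives the standard next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# The supervised fine-tuning stage would be a second, analogous run on curated
# instruction/answer conversations rendered through the model's chat template.
```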
Key points:
Rui Pan, Josh Nguyen, et al., 2024
We introduce two new models, AstroLLaMA-3-8B and AstroLLaMA-2-70B, building on the previous AstroLLaMA series, and quantitatively assess specialized LLMs in astronomy using recently curated, high-quality astronomical MCQs.
Key points:
Yuan-Sen Ting, et al., 2024, arXiv:2407.11194
We present a comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset. This dataset comprises 4,425 multiple-choice questions curated from the Annual Review of Astronomy and Astrophysics, covering a broad range of astrophysical topics.
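To make the evaluation concrete, here is a small, self-contained sketch of how a model could be scored on such multiple-choice questions; the prompt format and the field names (`question`, `options`, `answer`) are assumptions, not the paper's exact protocol.

```python
# Hedged sketch of MCQ accuracy scoring; not the paper's exact evaluation protocol.
def score_mcq(generate, questions):
    """`generate` maps a prompt string to the model's text reply.
    `questions` is a list of dicts with a question, four options, and the correct letter."""
    correct = 0
    for q in questions:
        prompt = (
            q["question"] + "\n"
            + "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", q["options"]))
            + "\nAnswer with a single letter (A, B, C, or D)."
        )
        reply = generate(prompt).upper()
        predicted = next((ch for ch in reply if ch in "ABCD"), None)  # first A-D in the reply
        correct += int(predicted == q["answer"])
    return correct / len(questions)

# Toy usage with a dummy "model" that always answers A:
print(score_mcq(lambda prompt: "A", [
    {"question": "Which of these is a spiral galaxy?",
     "options": ["M31", "Vega", "Ceres", "Io"], "answer": "A"},
]))
```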
Key findings:
The first open-source conversational AI tools tailored for the astronomy community: AstroLLaMA-2-7B and AstroLLaMA-2-7B-Chat.