1 September 2025


Intern-S1: Multimodal LLM for Science

Abstract

In this AI Research Roundup episode, Alex discusses the paper 'Intern-S1: A Scientific Multimodal Foundation Model'. Intern-S1 introduces a multimodal Mixture-of-Experts (MoE) LLM family targeting complex scientific domains that require understanding of vision, molecular and biological sequences, and time series. Built on a Qwen3-235B MoE backbone (241B total parameters, 28B activated), it is trained on ~5T tokens, of which over 2.5T are scientific. The system integrates an InternViT-6B vision encoder, a dynamic tokenizer for SMILES/FASTA sequences with ~70% better compression, and a time-series encoder for long signals. A three-pronged strategy spans data mining and filtering to raise the purity of scientific tokens, page-level PDF parsing with VLMs, and scalable infrastructure for long-range reasoning.

Paper URL: https://arxiv.org/abs/2508.15763
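
The "dynamic tokenizer" claim is easiest to picture as a router that sends spans resembling scientific sequences to domain-aware tokenizers and everything else to a general one. Below is a minimal Python sketch of that idea under hypothetical assumptions: the regexes, token rules, and function names are illustrative only and do not reflect Intern-S1's actual implementation.

```python
import re

# Hypothetical sketch of dynamic tokenization: route spans that look like
# scientific sequences (SMILES, FASTA) to domain-aware tokenizers, and
# everything else to a general tokenizer. All patterns and rules below are
# assumptions for illustration, not the paper's method.

FASTA_RE = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]{10,}$")      # amino-acid alphabet
SMILES_RE = re.compile(r"^[A-Za-z0-9@+\-\[\]()=#$%/\\.]+$")  # common SMILES chars

def tokenize_general(span):
    # Stand-in for a generic subword tokenizer: whitespace split.
    return span.split()

def tokenize_smiles(span):
    # Stand-in for a SMILES-aware tokenizer: two-letter atoms,
    # bracketed atoms, then single symbols.
    return re.findall(r"Cl|Br|\[[^\]]+\]|.", span)

def tokenize_fasta(span, k=3):
    # Stand-in for a FASTA-aware tokenizer: non-overlapping k-mers.
    return [span[i:i + k] for i in range(0, len(span), k)]

def dynamic_tokenize(span):
    """Route a span to a domain tokenizer when it matches a sequence
    pattern; otherwise fall back to the general tokenizer."""
    if FASTA_RE.match(span):
        return tokenize_fasta(span)
    if SMILES_RE.match(span) and not span.isalpha():
        return tokenize_smiles(span)
    return tokenize_general(span)

print(dynamic_tokenize("CC(=O)Oc1ccccc1C(=O)O"))         # aspirin SMILES
print(dynamic_tokenize("MKTAYIAKQRQISFVKSHFSRQLEERLG"))  # protein fragment
print(dynamic_tokenize("the model is trained on 5T tokens"))
```

The intuition behind the reported ~70% compression gain is visible even in this toy version: a sequence-aware tokenizer emits one token per atom or k-mer instead of fragmenting SMILES/FASTA strings across generic subword merges.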

An Intern-S1 mini model has also been released, with tests covering logic, mathematics, engineering calculations, and coding.