Stop Thinking, Just Do!

Sungsoo Kim's Blog

From Large Language Models to Reasoning Language Models


13 January 2025


Abstract

In this talk, we explore the evolution of Large Language Models (LLMs) through the lenses of computation and optimization. We begin by tracing the origins of LLMs, highlighting how advances in computation and optimization were pivotal to their development. We then examine the key optimizations behind a staggering 1,000x cost reduction, which has made LLMs widely accessible, even on portable devices. Next, we address the limitations of human-generated data and introduce the concept of constructive hallucination in LLMs: a technique that generates new hypotheses and validates them through reasoning chains, pushing the boundaries of knowledge creation. We then provide an overview of the technology fundamentals and early successes of reasoning models, such as OpenAI's o1 and o3 preview. While these models significantly enhance reasoning capabilities, they also exponentially increase computational demands. Finally, we conclude by presenting our ambitious Ultra Ethernet effort, which aims to establish the interconnect standard for future AI workloads. This initiative is crucial to meeting the growing system-level demands and ensuring seamless, efficient operation in the age of reasoning models.

DeepSeek-R1 Paper Review

