
Sungsoo Kim's Blog

Language Modeling from Scratch


18 April 2025


Article Source


Stanford CS336 Language Modeling from Scratch

  • For more information about Stanford’s online Artificial Intelligence programs, visit: https://stanford.io/ai

  • To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs336-language-modeling-scratch

  • To follow along with the course schedule and syllabus, visit: https://stanford-cs336.github.io/spring2025/

Abstract

Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general-purpose system address a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, a deep understanding of language models becomes essential for scientists and engineers alike. This course is designed to give students a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses that create an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.

Due to the high compute requirements and heavy workload of this class, we unfortunately have to limit enrollment. Instead of submitting a direct enrollment request, all applicants will join a waitlist. If you would like to be considered, please submit a completed non-degree application and the course-specific application: https://docs.google.com/forms/d/e/1FAIpQLSdSby5uGJGfsw6Q-R0e-BWKDd0qBbtfwraJ5C2VkWBmkZMfOQ/viewform?usp=sf_link

Content

What is this course about?

Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general-purpose system address a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, a deep understanding of language models becomes essential for scientists and engineers alike. This course is designed to give students a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses that create an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.

Prerequisites

Proficiency in Python

The majority of class assignments will be in Python. Unlike most other AI classes, students will be given minimal scaffolding. The amount of code you will write will be at least an order of magnitude greater than for other classes. Therefore, being proficient in Python and software engineering is paramount.

Experience with deep learning and systems optimization

A significant part of the course will involve making neural language models run quickly and efficiently on GPUs across multiple machines. We expect students to have a strong familiarity with PyTorch and to know basic systems concepts like the memory hierarchy.

College Calculus, Linear Algebra (e.g. MATH 51, CME 100)

You should be comfortable understanding matrix/vector notation and operations.

Basic Probability and Statistics (e.g. CS 109 or equivalent)

You should know the basics of probabilities, Gaussian distributions, mean, standard deviation, etc.

Machine Learning (e.g. CS221, CS229, CS230, CS124, CS224N)

You should be comfortable with the basics of machine learning and deep learning.

Note that this is a 5-unit class. This is a very implementation-heavy class, so please allocate enough time for it.

