
Scaling Up Graph Neural Networks to Large Graphs


17 June 2021



In real-world applications, such as recommendation systems and social networks, graphs can be very large, with millions, if not billions, of nodes and edges. This makes naive full-batch GNN training and inference extremely hard, because the entire graph and all of its node embeddings cannot fit in limited GPU memory. In this lecture, we introduce three methods that scale up GNNs: 1) Neighbor Sampling, 2) Cluster-GCN, and 3) Simplified GCN.
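As a quick illustration of the first method, the snippet below sketches Neighbor Sampling with PyTorch Geometric's NeighborLoader. The library, the Cora dataset, and the fan-out and batch-size values are illustrative assumptions, not part of the lecture.

```python
# A minimal sketch of Neighbor Sampling, assuming PyTorch Geometric is
# installed; the dataset and sampling parameters are illustrative.
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

# For each seed node in a mini-batch, sample at most 10 neighbors per hop
# (for a 2-layer GNN), building a small computation graph per seed node.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],   # fan-out per GNN layer
    batch_size=128,
    input_nodes=data.train_mask,
)

batch = next(iter(loader))
# The first `batch.batch_size` nodes are the seed nodes; the remaining
# nodes are sampled neighbors needed to compute the seeds' embeddings.
print(batch.num_nodes, batch.batch_size)
```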

Cluster-GCN: Scaling up GNNs

Neighbor Sampling, presented in the previous lecture (17.2), constructs a computation graph separately for each node in a mini-batch. This creates a lot of redundancy, because the embeddings of shared neighbors are recomputed for every node that samples them. A different approach is to sample from the large graph a subgraph that is small enough to fit on a GPU, and then apply the efficient, non-redundant full-batch GNN over the sampled subgraph. Cluster-GCN is an example of this approach. It first pre-processes the large graph by partitioning it into clusters of nodes; then, during training, it samples clusters of nodes in each mini-batch and applies the full-batch GNN over the subgraph they induce, as in the sketch below.
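Here is a minimal sketch of that workflow, assuming PyTorch Geometric: ClusterData performs the pre-partitioning (METIS under the hood, so a METIS-enabled sampling backend is required), and ClusterLoader samples clusters per mini-batch. The dataset, the number of partitions, and the model sizes are illustrative choices, not part of the lecture.

```python
# A minimal sketch of Cluster-GCN-style training, assuming PyTorch Geometric.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import ClusterData, ClusterLoader
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

# Pre-processing: partition the graph into clusters of nodes.
cluster_data = ClusterData(data, num_parts=32)

# Each mini-batch is the subgraph induced by `batch_size` sampled clusters;
# edges between the selected clusters are kept.
loader = ClusterLoader(cluster_data, batch_size=4, shuffle=True)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(dataset.num_features, 64, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for batch in loader:
    optimizer.zero_grad()
    # Full-batch GNN applied to the sampled induced subgraph only.
    out = model(batch.x, batch.edge_index)
    loss = F.cross_entropy(out[batch.train_mask], batch.y[batch.train_mask])
    loss.backward()
    optimizer.step()
```

Sampling several clusters per batch (batch_size=4 above) restores some of the between-cluster edges that partitioning removes; this matches the stochastic multiple-partition variant proposed in the Cluster-GCN paper.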

