Article Source
ST-UNet: A Spatio-Temporal U-Network for Graph-structured Time Series Modeling
Abstract
The spatio-temporal graph learning is becoming an increasingly important object of graph study. Many application domains involve highly dynamic graphs where temporal information is crucial, e.g. traffic networks and financial transaction graphs. Despite the constant progress made on learning structured data, there is still a lack of effective means to extract dynamic complex features from spatio-temporal structures. Particularly, conventional models such as convolutional networks or recurrent neural networks are incapable of revealing the temporal patterns in short or long terms and exploring the spatial properties in local or global scope from spatio-temporal graphs simultaneously. To tackle this problem, we design a novel multi-scale architecture, Spatio-Temporal U-Net (ST-UNet), for graph-structured time series modeling. In this U-shaped network, a paired sampling operation is proposed in spacetime domain accordingly: the pooling (ST-Pool) coarsens the input graph in spatial from its deterministic partition while abstracts multi-resolution temporal dependencies through dilated recurrent skip connections; based on previous settings in the downsampling, the unpooling (ST-Unpool) restores the original structure of spatio-temporal graphs and resumes regular intervals within graph sequences. Experiments on spatio-temporal prediction tasks demonstrate that our model effectively captures comprehensive features in multiple scales and achieves substantial improvements over mainstream methods on several real-world datasets.
An illustration of the proposed Spatio-Temporal U-Net architecture. ST-UNet employs graph convolutional gated recurrent units (GCGRU) as its backbone. In this example, the proposed framework contains three GCGRU layers formed as a U-shaped structure with one ST-Pool and one ST-Unpool applied in one side respectively. Spatio-temporal features obtained from the input are downsampled into multi-resolution representations through a ST-Pooling operation. As subgraph (a) represents, the input graph at each time step is equally coarsened into nearly a quarter of its original size at the level 2 combining with feature pooling regarding the channel dimension. Meanwhile, the temporal dependency of the input sequence is dilated to 2 with skip-connections crossing every other recurrent unit, as shown in subgraph (b). The ST-Unpooling, as a reverse operation, restores the spatio-temporal graph into its original structure with upsampling in spatial features and resumes regular dependencies of time series concurrently. To assemble a more precise output with better localized representations, high-level features of the pooling side are fused with the upsampled output through a skip connection at the same level. The final output can be utilized for predicting node attributes or the entire graph in the next few time steps.