Data-Driven Mesoscopic Simulation Models of Large-Scale Surface Transit Networks
Transit service planning, the assessment of operational strategies, and the evaluation of service changes can all benefit substantially from a high-fidelity model of the entire transit network. Microsimulation is an attractive option, but it is typically used to model a single corridor or a small sub-network; building a microsimulation model of an entire transit network is a daunting and often practically infeasible task, particularly for extensive networks. In response to this challenge, this study presents an alternative data-driven mesoscopic simulation pipeline that models surface transit movements using open data and machine learning. The mesoscopic transit simulations were driven by vehicle running speed regression models and lognormal dwell time distribution models. A comprehensive comparison of running speed models based on multiple linear regression (MLR), support vector machine (SVM), linear mixed-effects model (LME), regression tree (RT), and random forest (RF) showed that LME and RF outperformed MLR in terms of root mean squared error (RMSE) by 7.9% and 5.9%, respectively. The dwell time models and the running time model together were also found to adequately replicate the variations in headways, delays, and dwell times. Validation of the simulation at the stop level and the route level showed encouraging results, but also demonstrated the need for future studies to capture variations in passenger demand and congestion using additional data.
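To illustrate the mesoscopic idea described above, the following minimal sketch advances one vehicle along a route by combining a running time per segment with a dwell time drawn from a lognormal distribution at each stop. It is an illustration under stated assumptions, not the study's implementation: the `Segment` class, the `simulate_trip` function, and all parameter values are hypothetical, and a fixed speed per segment stands in for the prediction of a fitted running speed model.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    """One stop-to-stop link followed by a stop (hypothetical structure)."""
    length_m: float     # distance to the next stop, in metres
    speed_mps: float    # stand-in for a running speed model's prediction
    dwell_mu: float     # lognormal location parameter (log-seconds), assumed
    dwell_sigma: float  # lognormal shape parameter, assumed


def simulate_trip(segments, departure_s=0.0, rng=None):
    """Return a list of (arrival_s, dwell_s, departure_s), one per stop."""
    rng = rng or random.Random()
    clock = departure_s
    events = []
    for seg in segments:
        # Running time follows from segment length and predicted speed.
        clock += seg.length_m / seg.speed_mps
        arrival = clock
        # Dwell time is sampled from a lognormal distribution.
        dwell = rng.lognormvariate(seg.dwell_mu, seg.dwell_sigma)
        clock += dwell
        events.append((arrival, dwell, clock))
    return events
```

Running `simulate_trip` for successive trips on the same route and differencing the arrival times at a stop would give simulated headways, which could then be compared against observed ones.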