Parallel and Distributed Computing in Python with Dask
Abstract
Dask is a library for scaling and parallelizing Python code on a single machine or across a cluster. Dask provides familiar, high-level interfaces to extend the SciPy ecosystem (e.g. NumPy, Pandas, Scikit-Learn) to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms and workflows. This tutorial will cover the ins and outs of Dask for new users, including the Dask Array and Dask DataFrame collections, low-level Dask Delayed and Futures interfaces, pros and cons of Dask’s task schedulers, and interactive diagnostic tools to help users better understand their computational performance. No previous Dask experience is required, though familiarity with Python, NumPy, and Pandas basics is preferred.
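Below is a minimal sketch that assumes Dask is installed. It contrasts a high-level collection (Dask Array) with the low-level Dask Delayed interface, and it also shows one way to select a task scheduler. The array shape, the chunk size, and the `inc`/`add` helpers are illustrative choices only. They are not taken from the tutorial material.

```python
import dask.array as da
from dask import delayed

# Dask Array: a NumPy-like array split into chunks. Operations on it only
# build a lazy task graph, and nothing runs until .compute() is called.
x = da.ones((2000, 2000), chunks=(500, 500))  # 16 chunks of 500x500
total = (x + x.T).sum()                        # still lazy
result = total.compute()                       # runs on the default (threaded) scheduler

# Dask Delayed: wrap ordinary Python functions to parallelize a custom workflow.
# These two helpers are hypothetical examples.
@delayed
def inc(i):
    return i + 1

@delayed
def add(a, b):
    return a + b

a = inc(1)
b = inc(2)
c = add(a, b)  # graph: inc(1) and inc(2) are independent and may run in parallel

# The scheduler can be chosen per call. "synchronous" runs everything in a
# single thread, which is handy for debugging.
c_sync = c.compute(scheduler="synchronous")

print(result)  # 8000000.0
print(c_sync)  # 5
```

Both interfaces build a task graph first and execute it on request. That deferred execution lets Dask's schedulers decide how to run the work in parallel.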
Find additional information and setup instructions for the SciPy 2020 Tutorials.