Sungsoo Kim's Blog

Parallel and Distributed Computing in Python with Dask

5 April 2021


Article Source


Parallel and Distributed Computing in Python with Dask

Abstract

Dask is a library for scaling and parallelizing Python code on a single machine or across a cluster. Dask provides familiar, high-level interfaces to extend the SciPy ecosystem (e.g. NumPy, Pandas, Scikit-Learn) to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms and workflows. This tutorial will cover the ins and outs of Dask for new users, including the Dask Array and Dask DataFrame collections, low-level Dask Delayed and Futures interfaces, pros and cons of Dask’s task schedulers, and interactive diagnostic tools to help users better understand their computational performance. No previous Dask experience is required, though familiarity with Python, NumPy, and Pandas basics is preferred.
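To make the high-level and low-level interfaces mentioned above concrete, here is a minimal sketch, not taken from the tutorial itself. It assumes Dask and NumPy are installed; the array shape, chunk sizes, and the helper functions `square` and `add_all` are hypothetical and chosen only for illustration.

```python
import dask.array as da
from dask import delayed

# Dask Array: a chunked, lazily evaluated counterpart to a NumPy array.
# Shape and chunk sizes here are arbitrary illustrative choices.
x = da.ones((1000, 1000), chunks=(250, 250))  # split into a 4 x 4 grid of chunks
total = x.sum()                # builds a task graph; nothing is computed yet
array_sum = total.compute()    # executes the graph and returns a concrete value

# Dask Delayed: wrap ordinary Python functions to parallelize custom code.
@delayed
def square(n):
    # Hypothetical unit of work
    return n * n

@delayed
def add_all(values):
    # Hypothetical reduction step over the results of square()
    return sum(values)

lazy_result = add_all([square(i) for i in range(5)])

# scheduler="threads" explicitly selects the local threaded scheduler.
delayed_sum = lazy_result.compute(scheduler="threads")

print(array_sum, delayed_sum)
```

In both cases the work is described first and only executed on `compute()`, which is what lets Dask choose how to schedule tasks across threads, processes, or a cluster.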

Find additional information and setup instructions for the SciPy 2020 Tutorials.

