Checkout, Frequently Asked Questions!

Dask#

Dask is a library for parallel computing in Python. From a programmer’s point of view, it has two components:

  1. Dynamic task scheduler to optimize computation as farm of tasks. Once tasks are formed Dask can pipeline them as a workflow engine/manager and schedule them on available resources

  2. Data structure representations called collections which are conducive to larger than memory computation when dealing with Big Data. When working with these data structures, a Python developer trades in familiar territories as working with Numpy, Pandas or Python futures etc. These collections leverage from the task scheduler to run.