Combine many results/files into one (Map-Reduce)

A beginner-friendly map-reduce pattern for combining many outputs into one file.

Map-reduce means:

  • map: run many function calls in parallel

  • reduce: combine their outputs into one result
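The two phases can be tried locally before involving a cluster. The sketch below is only an illustration using Python's standard library, not Burla itself; the `square` function and the inputs 1 to 5 are made-up placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """Placeholder work for one input (hypothetical example function)."""
    return x * x


def map_reduce(inputs: list[int]) -> int:
    # Map: call the function once per input, using a local thread pool.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(square, inputs))
    # Reduce: combine every mapped output into one result.
    return sum(mapped)


total = map_reduce([1, 2, 3, 4, 5])
print(total)  # prints 55
```

On a cluster the shape stays the same: only the place where the map calls run changes.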

Why you might need this

Use this when you want to do lots of work in parallel, but end with one final output.

  • Many files → one report

  • Many small results → one total (this example)

In this example, the map step writes its outputs to /workspace/shared, and the reduce step reads them back and combines them.

Before you start

Make sure you have already:

  1. installed Burla: pip install burla

  2. connected your machine: burla login

  3. started your cluster in the Burla dashboard

If you’re new to /workspace/shared, start with "Read and Write GCS Files". If you’re new to func_cpu and func_ram, start with "Run code on one big cloud machine".

Step 1 (Map): Write one file per input

The map step runs once per input and creates 5 files in /workspace/shared/map-reduce-demo/parts/, one for each input.
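A minimal sketch of what such a map step might look like. Several details are assumptions made for illustration: the inputs 1 to 5, the squaring, the `part-<n>.txt` file names, and the helper names `write_part` and `run_map`. It also assumes `remote_parallel_map` from the `burla` package accepts a function and a list of inputs and returns the list of return values.

```python
from pathlib import Path

# Folder named in this guide; every function call writes one file here.
PARTS_DIR = Path("/workspace/shared/map-reduce-demo/parts")


def write_part(x: int, parts_dir: Path = PARTS_DIR) -> str:
    """Map step for one input: compute a small result and save it as its own file."""
    parts_dir = Path(parts_dir)
    parts_dir.mkdir(parents=True, exist_ok=True)
    part_path = parts_dir / f"part-{x}.txt"  # hypothetical naming scheme
    part_path.write_text(str(x * x))  # placeholder work: square the input
    return str(part_path)


def run_map() -> list:
    # Imported here so write_part can be tried locally without Burla installed.
    from burla import remote_parallel_map

    inputs = [1, 2, 3, 4, 5]  # 5 inputs, so 5 files
    # Assumption: one call to write_part per input, run in parallel on the cluster.
    return remote_parallel_map(write_part, inputs)
```

Calling `run_map()` from a machine that has completed the setup steps should start the map phase. Giving each input its own file means no two calls ever write to the same path.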

Step 2 (Reduce): Combine all files into one file

The reduce step runs only once, so it is a common place to request a bigger machine (more CPU and RAM, set with func_cpu and func_ram).

The final combined file path is:

  • /workspace/shared/map-reduce-demo/final/total.txt
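A minimal sketch of a reduce step that produces this file, assuming each file in parts/ holds a single integer. The helper names `combine_parts` and `run_reduce` and the `func_cpu=8`, `func_ram=32` values are illustrative assumptions, not recommended settings.

```python
from pathlib import Path

BASE_DIR = Path("/workspace/shared/map-reduce-demo")


def combine_parts(base_dir: str = str(BASE_DIR)) -> int:
    """Reduce step: add up every part file and write the total to final/total.txt."""
    base = Path(base_dir)
    # Assumption: each part file contains exactly one integer.
    total = sum(int(p.read_text()) for p in sorted((base / "parts").glob("*.txt")))
    final_dir = base / "final"
    final_dir.mkdir(parents=True, exist_ok=True)
    (final_dir / "total.txt").write_text(str(total))
    return total


def run_reduce() -> int:
    # Imported here so combine_parts can be tried locally without Burla installed.
    from burla import remote_parallel_map

    # A single input means the function runs exactly once.
    # func_cpu and func_ram request a larger machine; the values are examples only.
    results = remote_parallel_map(
        combine_parts, [str(BASE_DIR)], func_cpu=8, func_ram=32
    )
    return results[0]
```

Run the reduce only after the map step has finished, otherwise some part files may not exist yet and the total will be incomplete.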