Decide how to split your work

Pick the input unit for a Burla job.

Use this when you know what code should run, but not what to pass as the input list. Do not use this to tune a job whose input unit is already obvious. The unit of work is one item from the list you pass to remote_parallel_map. Each worker should own enough work to be worth starting, but not so much that one failure wastes an hour. The output should be small, or a path to a file written by the worker.

The main decision in a Burla job is not the cluster size. It is the shape of the input list.

The rule

Pick an input unit that is:

  1. independent from the other inputs

  2. large enough to amortize startup and setup

  3. small enough to fit in worker memory

  4. cheap enough to retry

  5. aligned with the output you need

If the input unit is wrong, more machines usually make the wrong thing happen faster.

Common split patterns

Use one file per input when files are already the natural boundary.

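```python
from pathlib import Path

# One input per Parquet file; sorted so the input order is stable between runs.
file_paths = sorted(str(path) for path in Path("/workspace/shared/raw").glob("*.parquet"))
```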

Use several files per input when each file is tiny and startup would dominate.
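A minimal sketch of batching, assuming the file listing is already in memory. The `batch` helper, the file names, and the batch size of 100 are illustrative and not part of Burla.

```python
def batch(items, batch_size):
    """Group items into consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical listing of 1,050 tiny files; in practice this would come from a glob.
file_paths = [f"/workspace/shared/raw/part-{i:05d}.parquet" for i in range(1050)]

# Each input is now a list of up to 100 paths instead of a single path.
inputs = batch(file_paths, 100)
```

The worker then loops over the paths it receives, so startup cost is paid once per batch rather than once per file.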

Use row ranges when a database table has an indexed numeric column.
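One way to build row-range inputs, sketched under the assumption that you already know the smallest and largest value of the indexed column. The `row_ranges` helper, the ID bounds, and the range size are hypothetical.

```python
def row_ranges(min_id, max_id, rows_per_input):
    """Return half-open (start, stop) ranges that cover min_id..max_id inclusive."""
    if rows_per_input < 1:
        raise ValueError("rows_per_input must be at least 1")
    return [
        (start, min(start + rows_per_input, max_id + 1))
        for start in range(min_id, max_id + 1, rows_per_input)
    ]


# Assumed bounds; in practice query MIN(id) and MAX(id) once before building the list.
inputs = row_ranges(1, 1_000_000, 250_000)

# Each worker would then run a bounded query such as:
#   SELECT ... FROM some_table WHERE id >= start AND id < stop
```

Half-open ranges avoid reading the boundary row twice. If the IDs have large gaps, equal-width ranges will hold unequal numbers of rows.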

Use byte ranges when one line-oriented file is too large to read on one machine.
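Byte ranges rarely land on line boundaries, so each worker has to decide which lines it owns. The sketch below uses a common convention: a worker owns every line whose first byte falls inside its range. The helper names are illustrative, and it assumes newline-terminated lines in a file that workers can open by path.

```python
import os


def byte_ranges(path, bytes_per_input):
    """Split a file into (path, start, stop) inputs of roughly equal byte size."""
    size = os.path.getsize(path)
    return [
        (path, start, min(start + bytes_per_input, size))
        for start in range(0, size, bytes_per_input)
    ]


def read_lines(path, start, stop):
    """Yield every line whose first byte lies in [start, stop)."""
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and read to the next newline. This lands on the
            # first line that begins at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < stop:
            line = f.readline()
            if not line:
                break
            # A line that starts before `stop` is read in full, even past `stop`.
            yield line
```

Because the previous range reads its last line to completion and the next range skips that partial line, every line is processed exactly once.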

Use chunks of URLs, IDs, or prompts when the bottleneck is an API, website, database, or model provider.

Use one tile, scene, sample, or shard when the source system already has useful boundaries.

Write the worker around one input

The worker should make one input useful. Keep source reads, local work, and output writes together.
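As an illustration, here is a worker that handles one JSONL file end to end. The file layout, the `amount` field, and the `.summary.json` naming are assumptions made for the sketch. The built-in `map` stands in for the cluster so the example runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def process_file(path):
    """Handle one input end to end: read the source, do the work, write the result."""
    source = Path(path)

    # Source read: one JSON record per non-empty line.
    records = [json.loads(line) for line in source.read_text().splitlines() if line.strip()]

    # Local work: a stand-in aggregation over a hypothetical "amount" field.
    total = sum(record["amount"] for record in records)

    # Output write: the result is stored next to the source file.
    out_path = source.with_name(source.stem + ".summary.json")
    out_path.write_text(json.dumps({"rows": len(records), "total": total}))

    # Return something small: where the output is, plus a row count.
    return {"input": str(source), "output": str(out_path), "rows": len(records)}


with tempfile.TemporaryDirectory() as tmp:
    inputs = []
    for name, amounts in {"a": [1, 2, 3], "b": [10]}.items():
        p = Path(tmp) / f"{name}.jsonl"
        p.write_text("".join(json.dumps({"amount": x}) + "\n" for x in amounts))
        inputs.append(str(p))

    # Local stand-in for the remote call over the same function and list.
    results = list(map(process_file, inputs))
    totals = [json.loads(Path(r["output"]).read_text())["total"] for r in results]
```

With Burla, the same function and input list would be passed to `remote_parallel_map(process_file, inputs)`.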

Then run that worker over the input list.

Return small outputs

Returning small dicts, numbers, or short strings is fine.

For large outputs, write a file from inside the worker and return the path.

This avoids sending large dataframes or arrays back through the client.

Pick chunk size from the bottleneck

If startup or model load time is expensive, make each input bigger.

If failures are common, make each input smaller.

If memory is tight, make each input smaller or ask for more func_ram.

If the output is large, write files to /workspace/shared, return their paths, and reduce over those paths later.

If the bottleneck is outside Burla, start with the external limit. API quotas, website politeness, database connections, object storage bandwidth, and GPU memory matter more than CPU count.
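The first two rules pull in opposite directions, so a back-of-envelope calculation helps. This is a rough heuristic, not a Burla rule. The function name, the 10% overhead target, and the 10-minute retry budget are all assumptions to adjust for your job.

```python
import math


def items_per_input(setup_seconds, seconds_per_item,
                    max_overhead=0.10, max_retry_seconds=600):
    """Pick how many items one input should hold.

    Large enough that setup is at most max_overhead of the input's runtime,
    but capped so that retrying one failed input costs at most max_retry_seconds.
    """
    # setup / (setup + n * per_item) <= max_overhead
    #   =>  n >= setup * (1 - max_overhead) / (max_overhead * per_item)
    needed = math.ceil(
        setup_seconds * (1 - max_overhead) / (max_overhead * seconds_per_item)
    )
    # setup + n * per_item <= max_retry_seconds
    cap = math.floor((max_retry_seconds - setup_seconds) / seconds_per_item)
    return max(1, min(needed, cap))


# Assumed numbers: 30 s to load a model, 0.5 s of work per item.
n = items_per_input(setup_seconds=30, seconds_per_item=0.5)
```

When the retry cap wins over the overhead target, setup cost is the real problem, and caching or a lighter setup step may help more than tuning chunk size.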

When to reduce

Reduce when the final answer needs a global view.

Examples:

  1. many file reports into one CSV

  2. many JSONL shards into one table

  3. many heaps into one top-K list

  4. many image manifests into one training manifest

For small outputs, reduce locally after remote_parallel_map returns.

For large outputs, run a second Burla call over output paths.
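For the top-K case, a local reduce can be sketched with the standard library. It assumes each worker returns its own top-K list of `(score, item)` pairs. The helper names and sample scores are illustrative.

```python
import heapq
from itertools import chain


def top_k(scored_items, k):
    """Worker-side step: keep only the k highest-scoring (score, item) pairs."""
    return heapq.nlargest(k, scored_items)


def merge_top_k(partials, k):
    """Client-side reduce: merge per-worker top-k lists into one global top-k."""
    return heapq.nlargest(k, chain.from_iterable(partials))


# Stand-ins for what two workers might return.
partials = [
    top_k([(0.91, "a"), (0.15, "b"), (0.77, "c")], 2),
    top_k([(0.88, "d"), (0.95, "e")], 2),
]
best = merge_top_k(partials, 2)
```

Each worker returns at most K pairs, so the client only handles K times the number of inputs, however large the underlying data is.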

Start with this checklist

Before you run the full job, write down:

  1. What is one input?

  2. What does one worker read?

  3. What does one worker write or return?

  4. What is the external bottleneck?

  5. What resource does one worker need?

  6. What happens if one input fails?

  7. Does the final answer need a reduce step?

That checklist catches most bad Burla job shapes before you spend money on them.

Examples that use this pattern