
Decide how to split your work
Pick the input unit for a Burla job.
Use this when you know what code should run, but not what to pass as the input list. Do not use this to tune a job whose input unit is already obvious. The unit of work is one item from the list you pass to remote_parallel_map. Each worker should own enough work to be worth starting, but not so much that one failure wastes an hour. The output should be small, or a path to a file written by the worker.
The main decision in a Burla job is not the cluster size. It is the shape of the input list.
The rule
Pick an input unit that is:
independent from the other inputs
large enough to amortize startup and setup
small enough to fit in worker memory
cheap enough to retry
aligned with the output you need
If the input unit is wrong, more machines usually make the wrong thing happen faster.
Common split patterns
Use one file per input when files are already the natural boundary.
from pathlib import Path
file_paths = [str(path) for path in Path("/workspace/shared/raw").glob("*.parquet")]
Use several files per input when each file is tiny and startup would dominate.
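A minimal sketch of batching tiny files, so that each input is a short list of paths rather than a single path. The helper name `batched_paths`, the file names, and the batch size of 3 are illustrative assumptions, not part of Burla.

```python
from typing import List, Sequence


def batched_paths(paths: Sequence[str], files_per_input: int) -> List[List[str]]:
    """Group paths into consecutive batches of at most files_per_input."""
    if files_per_input < 1:
        raise ValueError("files_per_input must be at least 1")
    return [
        list(paths[i : i + files_per_input])
        for i in range(0, len(paths), files_per_input)
    ]


# Hypothetical file names; in practice this would be the globbed list.
tiny_files = [f"/workspace/shared/raw/part-{n:04d}.parquet" for n in range(7)]

# Each element of `inputs` is one unit of work: a list of up to 3 paths.
inputs = batched_paths(tiny_files, files_per_input=3)
```

The worker then loops over the paths in its batch, so startup cost is paid once per batch instead of once per file.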
Use row ranges when a database table has an indexed numeric column.
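One way to build row ranges, sketched under the assumption that the table has an indexed integer column (called `id` here) whose minimum and maximum are known. The helper `row_ranges` is hypothetical, and the ranges are half-open so that no row is read twice.

```python
from typing import List, Tuple


def row_ranges(min_id: int, max_id: int, rows_per_input: int) -> List[Tuple[int, int]]:
    """Split the inclusive id span [min_id, max_id] into half-open (start, end) ranges."""
    if rows_per_input < 1:
        raise ValueError("rows_per_input must be at least 1")
    return [
        (start, min(start + rows_per_input, max_id + 1))
        for start in range(min_id, max_id + 1, rows_per_input)
    ]


ranges = row_ranges(min_id=1, max_id=10, rows_per_input=4)
# Each worker would then run a query shaped like:
#   SELECT ... WHERE id >= start AND id < end
```

If ids are sparse, ranges of equal width can hold very different row counts, so the width may need tuning against the real distribution.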
Use byte ranges when one line-oriented file is too large to read on one machine.
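A sketch of byte-range splitting for a newline-delimited file. The convention assumed here is that a worker owns every line whose first byte falls inside its range, which lets each line be read exactly once even when a range boundary lands mid-line. The function names and the tiny demo file are illustrative.

```python
import os
import tempfile
from typing import Iterator, List, Tuple


def byte_ranges(file_size: int, bytes_per_input: int) -> List[Tuple[int, int]]:
    """Split [0, file_size) into half-open byte ranges."""
    return [
        (start, min(start + bytes_per_input, file_size))
        for start in range(0, file_size, bytes_per_input)
    ]


def lines_in_range(path: str, start: int, end: int) -> Iterator[bytes]:
    """Yield every line whose first byte lies in [start, end)."""
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard through the next newline, which
            # leaves the cursor on the first line beginning at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            yield line


# Local demo on a small temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"alpha\nbravo\ncharlie\ndelta\n")
    demo_path = tmp.name

ranges = byte_ranges(os.path.getsize(demo_path), bytes_per_input=10)
recovered = [
    line for start, end in ranges for line in lines_in_range(demo_path, start, end)
]
os.remove(demo_path)
```

A worker may read slightly past `end` to finish its last line. That is intentional, because the next worker skips that same partial line.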
Use chunks of URLs, IDs, or prompts when the bottleneck is an API, website, database, or model provider.
Use one tile, scene, sample, or shard when the source system already has useful boundaries.
Write the worker around one input
The worker should make one input useful. Keep source reads, local work, and output writes together.
Then run that worker over the input list.
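A minimal sketch of a worker built around one input, followed by a local dry run. The worker `count_rows` and the CSV files are hypothetical. The list comprehension stands in for the cluster call, and the commented line assumes `remote_parallel_map` takes the function and the input list as its first two arguments and returns one result per input.

```python
import csv
import tempfile
from pathlib import Path


def count_rows(csv_path: str) -> dict:
    """Read one file, do the local work, and return a small summary."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        n_rows = sum(1 for _ in reader)
    return {"path": csv_path, "rows": n_rows}


# Build two small demo files so the sketch runs anywhere.
workdir = Path(tempfile.mkdtemp())
inputs = []
for name, n in [("a.csv", 2), ("b.csv", 3)]:
    p = workdir / name
    p.write_text("id\n" + "".join(f"{i}\n" for i in range(n)))
    inputs.append(str(p))

# Local dry run over the input list.
results = [count_rows(path) for path in inputs]

# On a cluster, the line above would become something like (assumed call shape):
#   from burla import remote_parallel_map
#   results = remote_parallel_map(count_rows, inputs)
```

Running the worker locally on two or three inputs first is a cheap way to confirm that one input really is self-contained.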
Return small outputs
Returning small dicts, numbers, or short strings is fine.
For large outputs, write a file from inside the worker and return the path.
This avoids sending large dataframes or arrays back through the client.
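A sketch of the write-then-return-path pattern. `OUTPUT_DIR` is a temporary directory standing in for a shared location such as a folder under /workspace/shared, and `expand_shard` is a hypothetical worker whose result is too large to return directly.

```python
import json
import tempfile
from pathlib import Path

# Local stand-in for a shared output directory visible to every worker.
OUTPUT_DIR = Path(tempfile.mkdtemp())


def expand_shard(shard_id: int) -> str:
    """Produce a large result, write it as JSONL, and return only its path."""
    out_path = OUTPUT_DIR / f"shard-{shard_id:05d}.jsonl"
    with open(out_path, "w") as f:
        for i in range(1000):  # stands in for a large dataframe or array
            f.write(json.dumps({"shard": shard_id, "row": i}) + "\n")
    return str(out_path)


# Local dry run: the client ends up holding three short strings, not 3000 rows.
paths = [expand_shard(shard_id) for shard_id in range(3)]
```

Deriving the file name from the input makes a retry overwrite its own earlier output instead of creating a duplicate.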
Pick chunk size from the bottleneck
If startup or model load time is expensive, make each input bigger.
If failures are common, make each input smaller.
If memory is tight, make each input smaller or ask for more func_ram.
If the output is large, write files to /workspace/shared and reduce paths later.
If the bottleneck is outside Burla, start with the external limit. API quotas, website politeness, database connections, object storage bandwidth, and GPU memory matter more than CPU count.
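The trade-offs above can be turned into rough arithmetic. This is a back-of-the-envelope sketch, not a Burla feature: both function names and every number in it are assumptions chosen for illustration.

```python
import math


def items_per_input(setup_s: float, per_item_s: float,
                    max_overhead: float, max_retry_s: float) -> int:
    """Pick a chunk size that amortizes setup but keeps one retry affordable."""
    # Setup overhead is setup_s / (setup_s + n * per_item_s).
    # Requiring overhead <= max_overhead gives a lower bound on n.
    lower = math.ceil(setup_s * (1 - max_overhead) / (max_overhead * per_item_s))
    # Re-running one failed input costs about setup_s + n * per_item_s seconds.
    upper = math.floor((max_retry_s - setup_s) / per_item_s)
    if upper < 1:
        raise ValueError("setup alone exceeds the retry budget")
    # If the two goals conflict, favor cheap retries over amortized setup.
    return max(1, min(lower, upper))


def max_concurrent_workers(quota_per_s: float, per_worker_per_s: float) -> int:
    """Cap parallelism at what an external rate limit can absorb."""
    return max(1, math.floor(quota_per_s / per_worker_per_s))


# 30 s model load, 0.5 s per item, at most 25% overhead, retries under 10 minutes.
chunk = items_per_input(setup_s=30, per_item_s=0.5, max_overhead=0.25, max_retry_s=600)

# An API that allows 50 requests/s, with each worker issuing about 4 requests/s.
workers = max_concurrent_workers(quota_per_s=50, per_worker_per_s=4)
```

Measured timings from a small trial run are a better source for `setup_s` and `per_item_s` than guesses.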
When to reduce
Reduce when the final answer needs a global view.
Examples:
many file reports into one CSV
many JSONL shards into one table
many heaps into one top-K list
many image manifests into one training manifest
For small outputs, reduce locally after remote_parallel_map returns.
For large outputs, run a second Burla call over output paths.
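As one illustration of a local reduce, the top-K case can be sketched with the standard library. Each worker returns only its own top K, and the client merges those small lists. The shard values and `top_k_local` are made up for the example.

```python
import heapq
from itertools import chain

K = 3


def top_k_local(values, k=K):
    """Worker side: keep only the k largest values from one shard."""
    return heapq.nlargest(k, values)


# Hypothetical shards; on a cluster each would be one input.
shards = [[5, 1, 9, 3], [8, 2, 7], [4, 10, 6]]
partials = [top_k_local(shard) for shard in shards]

# Client side: the global top K must appear among the per-shard top K lists,
# so merging the partial results is enough.
final = heapq.nlargest(K, chain.from_iterable(partials))
```

The same two-level shape applies to a second-stage call over output paths: batch the paths, reduce each batch in a worker, then merge the few remaining results locally.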
Start with this checklist
Before you run the full job, write down:
What is one input?
What does one worker read?
What does one worker write or return?
What is the external bottleneck?
What resource does one worker need?
What happens if one input fails?
Does the final answer need a reduce step?
That checklist catches most bad Burla job shapes before you spend money on them.