Use custom Docker images & GPUs

Run Burla workers with custom images, native tools, and GPUs.

Use this when your worker needs CUDA, native binaries, pinned system packages, large model weights, or a private runtime. Do not build an image for a small pure-Python job unless package install time is already the problem. The unit of work stays the same: one file, batch, tile, sample, or shard per input. Each worker runs your function inside the image you pass to remote_parallel_map. The output should be small metadata or a path to files written by the worker.

An image changes the worker environment. It should not change the shape of the job.

When to use an image

Use a custom image when the worker needs:

  1. native tools such as bwa, samtools, gdal, ffmpeg, or OCR libraries

  2. CUDA libraries for PyTorch, TensorFlow, CLIP, YOLO, or embedding models

  3. large model weights that should not download on every worker startup

  4. system packages that pip install cannot provide

  5. a pinned Python environment that must match production

For ordinary Python packages, start without a custom image. Burla can install many Python dependencies at runtime.

Build the smallest image that contains the slow parts

For native command-line tools, start from an image that already has the system packages you need.

FROM python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    bwa \
    samtools \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir boto3 awscli

Build and push it to a registry your Burla workers can pull from.

Run native tools from a worker

Plan one sample per input.
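A minimal sketch of that planning step. The bucket name, key layout, and sample IDs are hypothetical; the point is only that each input is a small dict describing one sample.

```python
# Hypothetical bucket and key layout, used for illustration only.
BUCKET = "s3://example-genomics"


def plan_samples(sample_ids):
    """Return one small, picklable dict per sample."""
    return [
        {
            "sample_id": sid,
            "fastq_r1": f"{BUCKET}/fastq/{sid}_R1.fastq.gz",
            "fastq_r2": f"{BUCKET}/fastq/{sid}_R2.fastq.gz",
            "bam_out": f"{BUCKET}/bam/{sid}.sorted.bam",
        }
        for sid in sample_ids
    ]


inputs = plan_samples(["S001", "S002", "S003"])
```

Each input carries paths, not file contents, so the client only ships a few strings per sample.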

The worker can call the tools directly.
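A sketch of such a worker, assuming the image above provides bwa, samtools, and the aws CLI on PATH. The reference path and the keys of the sample dict are hypothetical, and the worker is assumed to already have permission to read and write the bucket.

```python
import subprocess
import tempfile
from pathlib import Path

REFERENCE = "/ref/genome.fa"  # hypothetical: a bwa-indexed reference baked into the image


def align_commands(sample, workdir, threads=8):
    """Build the argv lists for download, align, sort, and upload."""
    work = Path(workdir)
    r1, r2 = work / "r1.fastq.gz", work / "r2.fastq.gz"
    bam = work / "out.sorted.bam"
    return {
        "fetch": [
            ["aws", "s3", "cp", sample["fastq_r1"], str(r1)],
            ["aws", "s3", "cp", sample["fastq_r2"], str(r2)],
        ],
        "bwa": ["bwa", "mem", "-t", str(threads), REFERENCE, str(r1), str(r2)],
        "sort": ["samtools", "sort", "-@", str(threads), "-o", str(bam), "-"],
        "upload": ["aws", "s3", "cp", str(bam), sample["bam_out"]],
    }


def align_sample(sample):
    """Worker function: align one sample and return a small report."""
    with tempfile.TemporaryDirectory() as workdir:
        cmds = align_commands(sample, workdir)
        for cmd in cmds["fetch"]:
            subprocess.run(cmd, check=True)
        # Pipe bwa's SAM output straight into samtools sort.
        with subprocess.Popen(cmds["bwa"], stdout=subprocess.PIPE) as bwa:
            subprocess.run(cmds["sort"], stdin=bwa.stdout, check=True)
        if bwa.returncode != 0:
            raise RuntimeError(f"bwa failed for {sample['sample_id']}")
        subprocess.run(cmds["upload"], check=True)
    return {"sample_id": sample["sample_id"], "bam": sample["bam_out"], "status": "ok"}
```

Building the commands in a separate function keeps the worker easy to check locally without the tools installed.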

Run the job with the image and the resources one sample needs.
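A sketch of the call, using the argument names this page lists for remote_parallel_map. The image URI and the numbers are placeholders, func_ram is assumed to be in GB, and the call is assumed to return one result per input.

```python
ALIGN_OPTIONS = {
    # Hypothetical registry path; use the image you built and pushed.
    "image": "registry.example.com/genomics/bwa-samtools:latest",
    "func_cpu": 8,          # threads bwa and samtools use for one sample
    "func_ram": 32,         # assumed peak memory for one sample, in GB
    "max_parallelism": 50,  # placeholder cap; set it from your quota
}


def run_alignment(worker, inputs):
    """Submit one task per sample with the image and per-task resources."""
    # Imported here so the rest of the script can be read and tested without Burla.
    from burla import remote_parallel_map

    return remote_parallel_map(worker, inputs, **ALIGN_OPTIONS)
```

Call it as `reports = run_alignment(your_worker, your_inputs)`, with the worker function and the per-sample input list you planned on the client.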

The output is a report. The BAM files are written to object storage by the worker.

Use a CUDA image for GPU work

For GPU jobs, start from a CUDA runtime image or a framework image with CUDA already installed.

Baking model weights into the image makes the build slower and worker startup faster, because workers no longer download the weights at job time. That is usually the right trade when many GPU workers load the same model.

Cache heavy models on the worker

Plan text shards or document batches on the client.
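A minimal sketch of client-side sharding. The storage prefix and shard size are hypothetical; pick a shard size that keeps one task busy for at least a few seconds of GPU time.

```python
def plan_shards(paths, shard_size):
    """Group document paths into fixed-size shards, one shard per input."""
    if shard_size < 1:
        raise ValueError("shard_size must be at least 1")
    return [
        {"shard_id": start // shard_size, "paths": paths[start : start + shard_size]}
        for start in range(0, len(paths), shard_size)
    ]


# Hypothetical document locations.
doc_paths = [f"gs://example-docs/raw/doc-{n:05d}.txt" for n in range(10)]
shards = plan_shards(doc_paths, shard_size=4)
```

Ten documents with a shard size of four give three shards; the last one holds the two leftover paths.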

Cache the model on the worker process so later inputs on the same worker do not reload it.

Ask for a GPU and cap parallelism to your GPU quota.
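A sketch of the GPU call under the same assumptions as before: the argument names come from this page, while the image URI, the "A100" GPU type string, and the resource numbers are placeholders to replace with values valid for your cluster.

```python
GPU_OPTIONS = {
    "image": "registry.example.com/ml/embedder-cuda:latest",  # hypothetical image
    "func_gpu": "A100",  # assumed GPU type string
    "func_cpu": 4,       # placeholder
    "func_ram": 32,      # placeholder, assumed GB
}


def gpu_parallelism(num_shards, gpu_quota):
    """Cap concurrency at the GPU quota, and never below one task."""
    return max(1, min(num_shards, gpu_quota))


def run_embedding(worker, shards, gpu_quota):
    """Submit one task per shard, with at most gpu_quota tasks in flight."""
    # Imported here so the planning code runs without Burla installed.
    from burla import remote_parallel_map

    cap = gpu_parallelism(len(shards), gpu_quota)
    return remote_parallel_map(worker, shards, max_parallelism=cap, **GPU_OPTIONS)
```

With 100 shards and a quota of 8 GPUs, the cap is 8; with 3 shards it is 3.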

Then reduce paths, not arrays.
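A sketch of that reduce step, assuming each worker returns a small dict with hypothetical keys `shard_id`, `rows`, `vectors_path`, and `status`. The client builds a manifest of paths and never pulls the vectors themselves.

```python
def reduce_reports(reports):
    """Combine per-shard reports into one manifest of output paths."""
    ok = [r for r in reports if r.get("status") == "ok"]
    failed = [r["shard_id"] for r in reports if r.get("status") != "ok"]
    return {
        "vector_paths": [
            r["vectors_path"] for r in sorted(ok, key=lambda r: r["shard_id"])
        ],
        "total_rows": sum(r["rows"] for r in ok),
        "failed_shards": failed,
    }


# Example reports, shaped like what a worker might return.
reports = [
    {"shard_id": 1, "rows": 4, "vectors_path": "gs://example-docs/vectors/shard-00001.npy", "status": "ok"},
    {"shard_id": 0, "rows": 4, "vectors_path": "gs://example-docs/vectors/shard-00000.npy", "status": "ok"},
    {"shard_id": 2, "rows": 0, "vectors_path": None, "status": "error"},
]
manifest = reduce_reports(reports)
```

A downstream step can read the manifest and load only the shards it needs.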

Match Python versions

The Python version in your client and the image should match.

If your image runs Python 3.12, run your local script with Python 3.12. Version drift can look like a Burla or Docker problem when it is really a serialization problem.
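A small guard you could run on the client before submitting work. The expected version is an assumption that matches an image built from python:3.12-slim; change it to whatever your image runs.

```python
import sys

IMAGE_PYTHON = (3, 12)  # assumed: major.minor version of Python inside the image


def check_python(expected=IMAGE_PYTHON, actual=None):
    """Raise early if the client's major.minor version differs from the image's."""
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual)
    if actual != tuple(expected):
        raise RuntimeError(
            f"client runs Python {actual[0]}.{actual[1]}, "
            f"image runs Python {expected[0]}.{expected[1]}"
        )
    return actual
```

Calling `check_python()` at the top of the script turns a confusing remote failure into a clear local one.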

Keep credentials out of the image

Do not bake API keys, database passwords, or cloud credentials into an image.

Use runtime environment variables, workload identity, service accounts, or the cloud permissions already attached to the worker.

The image should contain code dependencies. Runtime credentials should stay runtime credentials.
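A sketch of reading a credential at call time inside the worker. The variable name is hypothetical, and how the variable reaches the worker depends on your deployment; this only shows the read and the failure mode.

```python
import os


def require_env(name):
    """Return a required environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set in the worker environment")
    return value


def fetch_record(record_id):
    """Worker function: look up the credential when the task runs."""
    api_key = require_env("EXAMPLE_API_KEY")  # hypothetical variable name
    # Use api_key to call your service here; never return or log it.
    return {"record_id": record_id, "authenticated": bool(api_key)}
```

The secret never appears in the image, in the function's arguments, or in the returned metadata.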

Choose resources from the worker

Set each argument from what a single task does:

  1. func_cpu: threads used by one task

  2. func_ram: peak memory for one task

  3. func_gpu: GPU type needed by one task

  4. max_parallelism: quota or external bottleneck

  5. image: environment needed by one task

Do not ask for a GPU because the whole pipeline has a GPU step. Ask for a GPU only on the call whose worker uses CUDA.
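A sketch of a two-stage pipeline where only the second call requests a GPU. Stage names, image URIs, the GPU type string, and the numbers are all hypothetical.

```python
STAGES = {
    # CPU-only stage: no func_gpu, so no GPU is requested for it.
    "extract_frames": {
        "image": "registry.example.com/video/ffmpeg:latest",
        "func_cpu": 4,
        "func_ram": 8,
    },
    # GPU stage: the only call whose worker uses CUDA.
    "detect_objects": {
        "image": "registry.example.com/video/yolo-cuda:latest",
        "func_cpu": 4,
        "func_ram": 16,
        "func_gpu": "A100",    # assumed GPU type string
        "max_parallelism": 8,  # placeholder GPU quota
    },
}


def run_stage(name, worker, inputs):
    """Run one stage with only the resources that stage needs."""
    # Imported here so the stage table can be inspected without Burla installed.
    from burla import remote_parallel_map

    return remote_parallel_map(worker, inputs, **STAGES[name])
```

Keeping the options per stage makes it obvious which call pays for GPU time.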

Examples that use this pattern