
Tune XGBoost on 1,000 CPUs
In this example we:
Download a 1.4GB Kaggle flight-delay dataset.
Boot 13 machines with 80 CPUs each.
Train 36 XGBoost models in parallel and pick the best one.
Dataset: 2022 flight delays
The dataset is Combined_Flights_2022.csv, a commercial flight-delay CSV from Kaggle.
The question is simple: given the route, airline, schedule, airport ids, and date fields, can we predict whether a flight arrives at least 15 minutes late?
This is the kind of job where I don't want to build a training platform. I want to try a bunch of XGBoost settings, look at the AUCs, and move on.
Step 1: Upload the CSV
Download the CSV here:
Then upload it to the Burla filesystem.

Files uploaded here appear at ./shared inside every worker container. Files your code writes there show up back in the dashboard, so you can download results later.
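Since every later step depends on that path, a small check like the following can fail fast when the file is missing. This is a minimal sketch: it assumes the upload keeps its original filename inside ./shared, and find_csv is a hypothetical helper, not part of Burla.

```python
from pathlib import Path

# Assumption: the uploaded file keeps its original name inside ./shared.
CSV_NAME = "Combined_Flights_2022.csv"


def find_csv(shared_dir="./shared", name=CSV_NAME):
    """Return the path to the uploaded CSV, or raise if it is not visible."""
    path = Path(shared_dir) / name
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; did the upload finish?")
    return path
```

Calling find_csv() at the top of the worker function turns a confusing pandas error into a clear message about the upload.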
Step 2: Boot some VMs
For this run we boot 13 VMs with 80 CPUs each, 1,040 CPUs in total.

The workers use the python:3.12 image. That image does not already have XGBoost, pandas, or sklearn installed, which is fine. Burla detects the local packages the function needs and installs them inside the worker containers when the job starts.
Step 3: Write the model function
Each function call trains one model with one set of hyperparameters. It loads the CSV from ./shared, cleans a few columns, trains XGBoost with all 80 CPUs, and returns the AUC.
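A function along these lines might look like the sketch below. It is not the original code from this example. The column names, the 80/20 split, and the integer encoding of categorical columns are all assumptions, so check them against the CSV header before running. The imports sit inside the function only so the sketch can be defined without the libraries installed. It returns the parameters together with the AUC so that each score stays attached to its settings.

```python
# Assumed column names; verify them against the CSV header.
CATEGORICAL = ["Airline", "Origin", "Dest"]
NUMERIC = ["Month", "DayofMonth", "DayOfWeek", "CRSDepTime", "CRSArrTime", "Distance"]
DELAY_COLUMN = "ArrDelayMinutes"
LATE_THRESHOLD_MINUTES = 15
CSV_PATH = "./shared/Combined_Flights_2022.csv"


def train_model(params):
    """Train one XGBoost classifier with `params` and return its held-out AUC."""
    # Imported here only to keep the sketch definable without these packages.
    import pandas as pd
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    df = pd.read_csv(CSV_PATH, usecols=CATEGORICAL + NUMERIC + [DELAY_COLUMN])
    # Rows without an arrival delay cannot be labeled, so drop them.
    df = df.dropna(subset=[DELAY_COLUMN])
    # Label: 1 if the flight arrived at least 15 minutes late, else 0.
    y = (df[DELAY_COLUMN] >= LATE_THRESHOLD_MINUTES).astype(int)

    X = df[CATEGORICAL + NUMERIC].copy()
    for col in CATEGORICAL:
        # Integer codes are a simple encoding that tree models can split on.
        X[col] = X[col].astype("category").cat.codes

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    # n_jobs=80 matches the 80 CPUs reserved for each call.
    model = XGBClassifier(tree_method="hist", n_jobs=80, **params)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return {"params": params, "auc": float(auc)}
```

Only schedule-time features are used here. Columns such as the departure delay are known only after the flight leaves and would leak the answer.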
Step 4: Test one model
Before launching the whole grid, run one model. This is the smoke test. It tells us the CSV path, packages, memory, and function shape are all sane.
We pass func_cpu=80 because the XGBoost call uses n_jobs=80. There is no magic here. The function asks for the same hardware the training code is going to use.
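As a rough sketch, the smoke test can be a single-input call. This assumes the Burla client exposes remote_parallel_map and accepts func_cpu as a keyword, as the text above suggests. smoke_test and its mapper argument are hypothetical, and mapper exists only so the wrapper can be dry-run locally with a stand-in.

```python
def smoke_test(train_fn, params, mapper=None):
    """Run a single training call and return its result."""
    if mapper is None:
        # Assumption: Burla's client provides remote_parallel_map.
        from burla import remote_parallel_map
        mapper = remote_parallel_map
    # One input means one function call; func_cpu=80 reserves 80 CPUs for it.
    results = list(mapper(train_fn, [params], func_cpu=80))
    if len(results) != 1:
        raise RuntimeError(f"expected 1 result, got {len(results)}")
    return results[0]
```

If this one call returns a plausible AUC, the path, packages, and memory are fine, and the full grid is just the same call with more inputs.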
Step 5: Run the grid
Now we send 36 parameter sets to the cluster.
Each model reserves a full 80-CPU machine, so with 13 machines the first 13 models start immediately and the remaining 23 wait in the queue. Burla can queue millions of inputs, so 36 models is nothing.
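One way to build 36 parameter sets is a Cartesian product over a small search space. The values below are hypothetical, chosen only so that 3 × 3 × 2 × 2 = 36. The helpers build_grid, waves, and pick_best are illustrative names, and pick_best assumes each result is a dict with an "auc" key.

```python
import itertools
import math

# Hypothetical search space: 3 * 3 * 2 * 2 = 36 combinations.
SEARCH_SPACE = {
    "max_depth": [4, 6, 8],
    "learning_rate": [0.03, 0.1, 0.3],
    "n_estimators": [200, 400],
    "subsample": [0.8, 1.0],
}


def build_grid(space=SEARCH_SPACE):
    """Expand the search space into a list of parameter dicts."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*space.values())]


def waves(n_models, n_machines):
    """Rounds needed if each model occupies one whole machine."""
    return math.ceil(n_models / n_machines)


def pick_best(results):
    """Return the result dict with the highest AUC."""
    return max(results, key=lambda r: r["auc"])


grid = build_grid()
print(len(grid), "models in", waves(len(grid), 13), "waves")  # 36 models in 3 waves
```

With a training function like the one from Step 3, the full run would then be something like remote_parallel_map(train_model, grid, func_cpu=80), with pick_best applied to the returned list. The three-wave figure is only a rough estimate, since queued models start as soon as any machine frees up.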
You can watch logs from the Jobs tab while the models train.

What's the point?
The point is not that XGBoost is hard to parallelize. It isn't.
The annoying part is usually the stuff around it: picking machines, getting the CSV onto them, installing packages, running many trainings at once, and collecting the scores without turning the notebook into a platform project.
If someone asked me to try a parameter grid on this dataset, this is the version I would actually want to run. Same CSV, same model code, no rewrite into a different framework just because my laptop is too small.